A while ago I noticed that, for one of our instances, the memory usage of the XConnect Search Indexer was extremely high and would not go down. In addition, I saw that changes to XDB-related data were not reflected in Solr. Scaling the web app up to a higher plan just resulted in the extra resources being consumed as well.
Change Tracking
Let me first give you some background on how the XConnect Search Indexer knows what it should index. A few tables in the XDB Shard databases have MSSQL Change Tracking enabled, which means the database records which rows have changed (inserted, updated or deleted). For these rows it doesn't store what changed, only that something changed. Each change record also carries a version number, which is an incrementing value.
Solr and the sync-token
The XConnect Search Indexer stores a sync token in the XDB Solr index. The token holds a reference to each Shard database, together with the last version number that was indexed for it.
The sync token consists of a field called xdbtokenbytes_s, which contains a Base64-encoded Dictionary<string,long?> serialized with the default .NET BinaryFormatter. You can retrieve the sync token by querying your XDB Solr index with the following query: id:xdb-index-token
You can decode the value in C# as follows:
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Replace the placeholder with the value of the xdbtokenbytes_s field.
var bytes = Convert.FromBase64String("The value of the xdbtokenbytes_s field");
Dictionary<string, long?> deserializedDictionary;
using (var memoryStream = new MemoryStream(bytes))
{
    var binaryFormatter = new BinaryFormatter();
    deserializedDictionary = (Dictionary<string, long?>)binaryFormatter.Deserialize(memoryStream);
}
// deserializedDictionary now contains the data: one entry per shard,
// with the last indexed version number as its value.
An example JSON representation of the decoded value is as follows:
{
  "[DataSource=myclient-prd-sql.database.windows.net Database=myclient-prd-shard1-db]": 33844557,
  "[DataSource=myclient-prd-sql.database.windows.net Database=myclient-prd-shard0-db]": 33706449
}
Processing of the changed items
So the XConnect Search Indexer now knows what it has successfully indexed and, thanks to the version number, where it should continue. Using the version stored in the XDB index, the indexer calls the following stored procedures on the Shard databases to retrieve the changesets:
- GetContactFacetsChanges
- GetContactsChanges
- GetInteractionFacetsChanges
- GetInteractionsChanges
When it retrieves a changeset, it splits the records according to the value configured in the SplitRecordsThreshold setting, resulting in multiple lists of at most N changes each. These lists are processed in parallel, using a number of threads equal to the processor count.
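The splitting and parallel processing described above can be sketched roughly as follows. This is not the indexer's actual code: the class name ChangeSetSplitter and the methods Split and ProcessAll are hypothetical names chosen for illustration, and the only things taken from the description are the threshold-sized lists and the degree of parallelism equal to the processor count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only; not part of Sitecore.
public static class ChangeSetSplitter
{
    // Splits a changeset into consecutive lists of at most `threshold` items.
    public static List<List<T>> Split<T>(IReadOnlyList<T> changes, int threshold)
    {
        if (threshold <= 0) throw new ArgumentOutOfRangeException(nameof(threshold));

        var chunks = new List<List<T>>();
        for (int start = 0; start < changes.Count; start += threshold)
        {
            int count = Math.Min(threshold, changes.Count - start);
            var chunk = new List<T>(count);
            for (int offset = 0; offset < count; offset++)
            {
                chunk.Add(changes[start + offset]);
            }
            chunks.Add(chunk);
        }
        return chunks;
    }

    // Processes the lists in parallel, with at most one worker per logical processor.
    public static void ProcessAll<T>(IEnumerable<List<T>> chunks, Action<List<T>> processChunk)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
        Parallel.ForEach(chunks, options, processChunk);
    }

    public static void Main()
    {
        // 60000 changed IDs with a threshold of 25000 give lists of 25000, 25000 and 10000.
        List<int> changedIds = Enumerable.Range(1, 60000).ToList();
        List<List<int>> chunks = Split(changedIds, 25000);
        ProcessAll(chunks, chunk => Console.WriteLine($"Processing a list of {chunk.Count} changes"));
    }
}
```

The order in which the lists are processed is not deterministic, since they run in parallel.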
The processing method does the following:
- Retrieve the data from the database using the IDs from the list that is passed in.
- For each item that is retrieved from the database, process it to get the data it needs for the Solr index.
- Create a JSON document that can be sent to Solr, and store it in a batch.
- When everything is done, commit the Solr batch.
As you can see in the above process, the Solr batch is kept in memory until the entire list has been processed. Because the list size is determined by SplitRecordsThreshold, this batch can grow very large. Interactions and InteractionFacets can be really large, which causes high memory usage, especially when SplitRecordsThreshold is set to a value that is too high.
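To get a feeling for the numbers, here is a rough back-of-the-envelope estimate. The average document size of 20 KiB and the 4 logical processors are assumptions made up for this example, not measured values, and the helper BatchMemoryEstimate is hypothetical. It only counts the raw document payload held in the batches, so real usage (object overhead, data loaded from the database) will be higher.

```csharp
using System;

// Hypothetical helper, for illustration only; not part of Sitecore.
public static class BatchMemoryEstimate
{
    // Payload held in memory when `workers` lists of `splitRecordsThreshold`
    // documents are being batched at the same time.
    public static long PeakBatchBytes(int splitRecordsThreshold, long averageDocumentBytes, int workers)
    {
        return (long)splitRecordsThreshold * averageDocumentBytes * workers;
    }

    public static void Main()
    {
        const long averageDocumentBytes = 20 * 1024; // assumption: 20 KiB per Solr document
        const int workers = 4;                       // assumption: 4 logical processors

        foreach (int threshold in new[] { 25000, 1000 })
        {
            double mib = PeakBatchBytes(threshold, averageDocumentBytes, workers) / (1024.0 * 1024.0);
            Console.WriteLine($"SplitRecordsThreshold {threshold}: ~{mib:F0} MiB");
        }
        // Prints:
        // SplitRecordsThreshold 25000: ~1953 MiB
        // SplitRecordsThreshold 1000: ~78 MiB
    }
}
```

Under these assumptions the batches alone differ by a factor of 25, which is simply the ratio between the two threshold values.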
Conclusion
For Sitecore 9.0 through 9.2, the SplitRecordsThreshold setting defaults to 25000, which makes the indexer prone to high memory usage. Sitecore 9.3 and later ship with a default of 1000, and I would advise using that value for all Sitecore versions. The change is listed in the Sitecore 9.3 release notes under reference number 340608.
Verification
If you want to be sure that the memory issue is caused by the SplitRecordsThreshold setting, you can create a memory dump of the Search Indexer and analyze it with the DebugDiag analysis tool. If the result is similar to the image below, then try reducing the value of SplitRecordsThreshold to 1000.