Sitecore – Breaking changes when using the Solr ContentSearch provider in Sitecore 10.1+

Over the past few days, my main focus has been upgrading a solution from Sitecore 9.2 to Sitecore 10.2. The upgrade went smoothly, until the client started testing the search behaviour.

Apparently, Sitecore 10.1 introduced a breaking change in the way queries using Contains are executed. The change is not flagged as breaking, and is mentioned as follows in the Sitecore 10.1 release notes:

It looks like Sitecore changed the query handling so that the passed query is split on delimiters: whitespace, symbols, punctuation, control characters, quotes and dots.

The search behaviour in Sitecore 9.2

When we use the Contains method to search for the term My Search Term, Sitecore 9.2 outputs the following query: (*My Search Term*).
Solr parses this into the following query (the field I’m searching in is called _content):
_content:*my _content:search _content:term*

The search behaviour in Sitecore 10.2

When we use the Contains method to search for the same term, My Search Term, Sitecore 10.2 sends the following query to Solr: (*My* *Search* *Term*). As you can see, this is very different from the query generated by Sitecore 9.2. Each token now gets its own pair of wildcards.
Solr parses this into the following query (the field I’m searching in is called _content):
_content:*my* _content:*search* _content:*term*
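To make the difference concrete, the sketch below rebuilds both query strings for the same input. It is an approximation, not Sitecore's code. The helper names LegacyContains and TokenizedContains are hypothetical. The 10.x variant only splits on whitespace here, while the real tokenizer also splits on symbols, punctuation, control characters, quotes and dots.

```csharp
using System;
using System.Linq;

// Hypothetical helper approximating the Sitecore 9.2 shape:
// the whole search term is wrapped in a single pair of wildcards.
static string LegacyContains(string term) => "(*" + term + "*)";

// Hypothetical helper approximating the Sitecore 10.1+ shape:
// the term is split into tokens (on whitespace only, in this sketch)
// and every token gets its own leading and trailing wildcard.
static string TokenizedContains(string term)
{
    // A null separator array makes string.Split split on whitespace.
    var tokens = term.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    return "(" + string.Join(" ", tokens.Select(t => "*" + t + "*")) + ")";
}

Console.WriteLine(LegacyContains("My Search Term"));    // (*My Search Term*)
Console.WriteLine(TokenizedContains("My Search Term")); // (*My* *Search* *Term*)
```

Because the legacy query is not quoted, Solr itself splits it on whitespace. That is why the leading and trailing wildcards only end up on the first and last terms in the 9.2 parsed query.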

The implementation in Sitecore 9.2

In Sitecore 9.2, these methods are handled directly by the SolrQueryMapper (Sitecore.ContentSearch.Linq.Solr.SolrQueryMapper), which simply returns a wildcard search for the complete query. Please see the implementation below.

The implementation in Sitecore 10.2

In Sitecore 10.2, these special cases have moved to Query Translator implementations, so supporting these methods no longer requires changes to the SolrQueryMapper. At the time of writing, 10 different Query Translators are defined:

  • default
  • phrase
  • cs.linq.equals
  • cs.linq.starts_with
  • cs.linq.ends_with
  • cs.linq.contains
  • cs.linq.matches.regex
  • cs.linq.matches.wildcard
  • cs.linq.like.fuzzy
  • cs.linq.like.proximity

The Visit methods in the SolrQueryMapper now check whether the ContentSearch.Solr.Linq.UseLegacyFormatQueryPipeline setting is disabled (the default value). If it is, they invoke the new behaviour: the configured Query Translator builds and returns the matching query.

The Query Translator for Contains looks as follows:

As you can see in the Translate method above, it uses the GetTokens method to retrieve the tokens from the search query. That method in turn calls GetTokens on the Sitecore.ContentSearch.Linq.Solr.Parsing.TextTypeSolrTokenizer class, which parses the query and returns a list of tokens.

The tokenizer splits the query on whitespace, symbols, punctuation, control characters, quotes and dots. This means that if the input is “My Search Term”, the output is a list of strings containing “My”, “Search” and “Term”.
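The splitting rule can be approximated with .NET's character classification methods. This is a sketch under stated assumptions, not the decompiled TextTypeSolrTokenizer. The Tokenize helper is hypothetical. Mapping "quotes and dots" onto char.IsPunctuation is also an assumption; in .NET, both `"` and `.` are classified as punctuation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical approximation of the delimiter splitting described above:
// whitespace, symbols, punctuation (which includes quotes and dots in .NET)
// and control characters all end the current token.
static List<string> Tokenize(string query)
{
    var tokens = new List<string>();
    var current = new StringBuilder();

    foreach (var c in query)
    {
        bool isDelimiter = char.IsWhiteSpace(c) || char.IsSymbol(c)
            || char.IsPunctuation(c) || char.IsControl(c);

        if (!isDelimiter)
        {
            current.Append(c);
            continue;
        }

        // Flush the token collected so far; consecutive delimiters add no empty tokens.
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
            current.Clear();
        }
    }

    // Flush the last token if the query does not end with a delimiter.
    if (current.Length > 0)
        tokens.Add(current.ToString());

    return tokens;
}

Console.WriteLine(string.Join(", ", Tokenize("My Search Term")));         // My, Search, Term
Console.WriteLine(string.Join(", ", Tokenize("\"Sitecore 10.2\" docs"))); // Sitecore, 10, 2, docs
```

The second call hints at why upgraded searches can behave unexpectedly. If the real tokenizer behaves like this sketch, quotes no longer keep a phrase together. A dotted value such as a version number is also split into separate wildcard clauses.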

These tokens are then passed to the RootWildcardQueryByField class, which eventually produces a query like (*My* *Search* *Term*).

So, how do we turn this off?

There are two ways to restore the behaviour of Sitecore versions before 10.1. You can either disable the new Query Translator pipeline entirely by changing a setting, or add custom implementations for Contains, StartsWith and EndsWith.

Changing the setting

You can easily disable the entire Query Translator functionality and revert to the old behaviour by setting ContentSearch.Solr.Linq.UseLegacyFormatQueryPipeline to true. Please note that, according to the comments in the code, this is a temporary setting that will be removed in the next version.

<configuration xmlns:set="http://www.sitecore.net/xmlconfig/set/">
  <sitecore>
    <settings>
      <setting name="ContentSearch.Solr.Linq.UseLegacyFormatQueryPipeline" set:value="true" />
    </settings>
  </sitecore>
</configuration>

Adding a custom implementation

To resolve the issue with a custom implementation, create your own Query Translator and patch it in. Don’t forget to add the NuGet packages Sitecore.ContentSearch.SolrNetExtension and SolrNet.Core.

ContainsTermSolrTextQueryTranslator.cs

using Sitecore.ContentSearch.Linq.Parsing;
using Sitecore.ContentSearch.SolrNetExtension.Queries;
using SolrNet;

namespace Example
{
    public class ContainsTermSolrTextQueryTranslator : BaseTextQueryTranslator<AbstractSolrQuery>
    {
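        public override AbstractSolrQuery Translate(string query, TextQueryTranslatorContext context)
        {
            // Unlike the default cs.linq.contains translator, the query is not tokenized:
            // the complete search term is passed as a single token, so the wildcards
            // wrap the term as a whole instead of each token, as in Sitecore 9.2.
            return new RootWildcardQueryByField(context.FieldName, new[] { query }, Operator.OR, RootWildcardQueryByField.WildcardPosition.Both);
        }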
    }
}

Don’t forget to add a config file that patches in your translator. You can do that as follows:

<configuration xmlns:search="http://www.sitecore.net/xmlconfig/search/" xmlns:set="http://www.sitecore.net/xmlconfig/set/">
  <sitecore search:require="solr">
    <contentSearch>
      <indexConfigurations>
        <defaultSolrIndexConfiguration type="Sitecore.ContentSearch.SolrProvider.SolrIndexConfiguration, Sitecore.ContentSearch.SolrProvider">
          <searchOptions>
            <textQueryTranslatorStore>
              <queryTranslators>
                <queryTranslator key="cs.linq.contains" set:type="Example.ContainsTermSolrTextQueryTranslator, Example"/>
              </queryTranslators>
            </textQueryTranslatorStore>
          </searchOptions>
        </defaultSolrIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
