Crafting Smart Search: Building a Custom Query Parser for Apache Solr

This technical guide demonstrates how to implement a custom query parser plugin for Apache Solr that enhances search capabilities beyond the default functionality. The implementation creates a sophisticated ranking system that prioritizes exact phrase matches while still capturing looser term associations, providing more nuanced search results. The article covers the basic plugin structure, configuration requirements, and integration with Solr, offering practical insights into extending Solr's search capabilities through its plugin API.

4 Minutes reading time

Behold the masterpiece that AI hallucinated while reading this post:

"How Little Query Parser Learned to Understand What Users Really Want"

(after I fed it way too many marketing blogs and memes)

Created using DALL-E 3

AI-Generated: How Little Query Parser Learned to Understand What Users Really Want

Apache Solr is a very powerful and mature enterprise search server. It comes with a lot of handy and useful features. One of its features is the query API.

Now, what is the query API? This API is used to search thru the indexed documents, as the name suggests. But how are the documents searched? Well, the search is based on a search query. Basically the search query is a string, and this string is passed to a so called query parser. The query parser then transforms the query string to a Lucene query instance, which is then used by Solr to crawl the index and return found documents.

Apache Solr comes with a default query parser which supports the Lucene query syntax. There are also some more advanced query parser available, and a plugin API so implement a custom query parser. Now I want to take a look at the plugin API to implement a custom query parser and configure Solr to use it.

Solr plugin implementation

What should the new query parser do? Well, it should transform a simple user entered phrase to a more advanced Lucene query, which does the following:

Search string is : domain driven design

Generated Lucene query in prose would be:

Highest raking

exact match of terms "domain" and "driven" and "design" is this order

A little bit lower ranking

terms "domain" and "driven" and "design" with a slop of one and in any order

Even lower ranking

terms "domain" and "driven" and "design" with a slop of two and in any order

Lowest ranking

existence of the terms "domain" and "driven" and "design" at any place in the document in any order

Where would we start to implement such a query parser? Well, we start with the Solr query parser plugin. And for starters, here is the annotated source code for our new plugin:

package de.mirkosertic.desktopsearch;

import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;

public class QueryParserPlugin extends QParserPlugin { (1)

    @Override
    public void init(NamedList args) { (2)
        super.init(args);
    }

    @Override
    public QParser createParser(String aQueryString, SolrParams aLocalParams, SolrParams aParams, SolrQueryRequest aRequest) { (3)
        return new QParser(aQueryString, aLocalParams, aParams, aRequest) {
            @Override
            public Query parse() throws SyntaxError { (4)
                IndexSchema theSchema = aRequest.getSchema();

                return ... (5)
            }
        };
    }
}
1Every custom query parser must extend the org.apache.solr.search.QParserPlugin class
2The init method is called once after Solr has instantiated the class
3This method is called for every search request to retrieve a new org.apache.solr.search.QParser instance
4The parse Method is invoked by Solr for every search request
5Here happens the magic to transform a query string into a Lucene org.apache.lucene.search.Query instance

Solr configuration

Now we need to configure Solr to make the plugin available. Part of the configuration is to build a JAR file with all of the plugin dependencies and add it to the Solr Core classpath. Then we need to register the plugin in the solrconfig.xml file as follows:

<queryParser
    name="customqueryparser"  (1)
    class="de.mirkosertic.desktopsearch.QueryParserPlugin" (2)
    />
1A unique name for the query parser plugin
2The full qualified classname of the query parser plugin

It is query time!

Finally we can fire a search query to Solr. To use our new query parser for this query, we have to add a defType=customqueryparser to the search request. The passed value matches the name attribute of the added queryParser element in solrconfig.xml.

Details I’ve missed

You will have noticed that I’ve left out the complete query parser implementation. Under the hood I am using a Lucene Boolean Query with a lot of nesting SpanNear and TermQueries. Showing the hole process would be too much at this point, as I am focusing on the Solr plugin API here. If you want to dive deeper into the Lucene query construction process, I’d suggest to take a look at my JavaFX Desktop Search Project hosted at GitHub.

Git revision: 2e692ad