Crafting Smart Search: Building a Custom Query Parser for Apache Solr

Thu, Oct 26, 2017 by Mirko Sertic

This technical guide demonstrates how to implement a custom query parser plugin for Apache Solr that enhances search capabilities beyond the default functionality. The implementation creates a sophisticated ranking system that prioritizes exact phrase matches while still capturing looser term associations, providing more nuanced search results. The article covers the basic plugin structure, configuration requirements, and integration with Solr, offering practical insights into extending Solr's search capabilities through its plugin API.

4 Minutes reading time

Enterprise ,Interesting ,Search

Apache Solr is a very powerful and mature enterprise search server. It comes with a lot of handy and useful features. One of its features is the query API.

Now, what is the query API? This API is used to search thru the indexed documents, as the name suggests. But how are the documents searched? Well, the search is based on a search query. Basically the search query is a string, and this string is passed to a so called query parser. The query parser then transforms the query string to a Lucene query instance, which is then used by Solr to crawl the index and return found documents.

Apache Solr comes with a default query parser which supports the Lucene query syntax. There are also some more advanced query parser available, and a plugin API so implement a custom query parser. Now I want to take a look at the plugin API to implement a custom query parser and configure Solr to use it.

Solr plugin implementation

What should the new query parser do? Well, it should transform a simple user entered phrase to a more advanced Lucene query, which does the following:

Search string is : domain driven design

Generated Lucene query in prose would be:

Highest raking: exact match of terms "domain" and "driven" and "design" is this order
A little bit lower ranking: terms "domain" and "driven" and "design" with a slop of one and in any order
Even lower ranking: terms "domain" and "driven" and "design" with a slop of two and in any order
Lowest ranking: existence of the terms "domain" and "driven" and "design" at any place in the document in any order

Where would we start to implement such a query parser? Well, we start with the Solr query parser plugin. And for starters, here is the annotated source code for our new plugin:

package de.mirkosertic.desktopsearch;

import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;

public class QueryParserPlugin extends QParserPlugin { (1)

    @Override
    public void init(NamedList args) { (2)
        super.init(args);
    }

    @Override
    public QParser createParser(String aQueryString, SolrParams aLocalParams, SolrParams aParams, SolrQueryRequest aRequest) { (3)
        return new QParser(aQueryString, aLocalParams, aParams, aRequest) {
            @Override
            public Query parse() throws SyntaxError { (4)
                IndexSchema theSchema = aRequest.getSchema();

                return ... (5)
            }
        };
    }
}

1	Every custom query parser must extend the org.apache.solr.search.QParserPlugin class
2	The init method is called once after Solr has instantiated the class
3	This method is called for every search request to retrieve a new org.apache.solr.search.QParser instance
4	The parse Method is invoked by Solr for every search request
5	Here happens the magic to transform a query string into a Lucene org.apache.lucene.search.Query instance

Solr configuration

Now we need to configure Solr to make the plugin available. Part of the configuration is to build a JAR file with all of the plugin dependencies and add it to the Solr Core classpath. Then we need to register the plugin in the solrconfig.xml file as follows:

<queryParser
    name="customqueryparser"  (1)
    class="de.mirkosertic.desktopsearch.QueryParserPlugin" (2)
    />

1	A unique name for the query parser plugin
2	The full qualified classname of the query parser plugin

It is query time!

Finally we can fire a search query to Solr. To use our new query parser for this query, we have to add a defType=customqueryparser to the search request. The passed value matches the name attribute of the added queryParser element in solrconfig.xml.

Details I’ve missed

You will have noticed that I’ve left out the complete query parser implementation. Under the hood I am using a Lucene Boolean Query with a lot of nesting SpanNear and TermQueries. Showing the hole process would be too much at this point, as I am focusing on the Solr plugin API here. If you want to dive deeper into the Lucene query construction process, I’d suggest to take a look at my JavaFX Desktop Search Project hosted at GitHub.

<<Pevious posting: Bytecoder : A Low Level Bytecode to JavaScript Transpiler Next posting: Mastering Object-Oriented Programming in WebAssembly: From High-Level to Low-Level>>

Git revision: 2e692ad