Query Parser

To make searching databases simpler, Xapian provides a QueryParser class which converts a human readable query string into a Xapian Query object, for example:

apple AND a NEAR word OR "a phrase" NOT (too difficult) +eh

The above example shows how some of the basic modifiers are interpreted by the QueryParser; the operators it supports follow the operators described earlier, for example:

  • apple AND pear matches documents where both terms are present
  • apple OR pear matches documents where either term (or both) are present
  • apple NOT pear matches documents where apple is present and pear is not

Term Generation

The QueryParser uses an internal process to convert the query string into terms. This is similar to the process used by the TermGenerator, which can be used at index time to convert a string into terms. It is often easiest to use QueryParser and TermGenerator on the same database.

Operators

As well as the basic logical operators, QueryParser supports the additional operators discussed earlier and introduces some new ones, for example:

apple NEAR dessert
president "united states"
"race condition" -horse
+recipe +apple pie cake dessert

The NEAR and phrase support behaves in the same way as described earlier; the new features are the + and - operators, which select documents based on the presence or absence of specified terms, for example:

"race condition" -horse

Matches all documents with the phrase "race condition" but not horse; and:

+recipe +apple pie cake desert

Which matches all documents which have the terms recipe and apple; then all documents with these terms are weighted according to the weight of the additional terms.

One thing to note is that the behaviour of the +/- operators vary depending on the default operator used and the above examples assume that OP_OR is used.

Bracketed Expressions

When queries contain both OR and AND operators, AND takes precedence. To change the precedence of parts of the query, brackets can be used. For example, with the query:

apple OR pear AND dessert

The query parser will interpret this query as:

apple OR (pear AND dessert)

So to change the precedence and make the dessert a requirement, you would write the query initially as:

(apple OR pear) AND dessert

Stop words

Xapian also supports a stop word list, which allows you to specify words which should be removed from a query before processing. This list can be overridden within user search, so stop words can still be searched for if desired, for example if a stop word list contained ‘the’ and a search was for:

+the +document

Then the search would find relevant documents which contained both ‘the’ and ‘document’. Also, when searching for phrases, stop words do not apply, for example here we will retrieve documents with the exact phrase including ‘the’:

"the green space"

Searching with Prefixes

When a database is populated using prefixed terms (for example, title, author) it is possible to tell the QueryParser that these fields can be searched for using a human-readable prefix; for example:

author:"william shakespeare" title:juliet

Ranges

The QueryParser also supports range searches on document values, matching documents which have values within a given range. There are several types of range processors available, but the two discussed here are Date and Number, which require that values are serialised as they are indexed.

To use a range, additional programming is required to tell the QueryParser what format a range is specified in and which value is to be searched for matches within that range. This then gives rise to the ability to specify ranges as:

$10..50
5..10kg
01/01/1970..01/03/1970
size:3..7

When date ranges are configured (as a DateRangeProcessor), you can configure which format dates are to be interpreted as (i.e. month-day-year) or otherwise.

Wildcards

It is also possible to use wildcards to match any number of trailing characters within a term; for example:

wild* matches wild, wildcard, wildcat, wilderness

This feature is disabled by default; to enable it, see ‘Parser Flags’ below. In Xapian 1.2.x and earlier it also required a database to be set on the QueryParser (so that it could find the list of terms to expand the wildcard to) - in Xapian 1.4.x and later the wildcard expansion doesn’t happen until xapian.Enquire.get_mset() is called.

By default the wildcard will expand to as many terms as there are with the specified prefix. This can cause performance problems, so you can limit the number of terms a wildcard will expand to by calling xapian.QueryParser.set_max_wildcard_expansion(). If this limit would be exceeded then an exception will be thrown. The exception may be thrown by the QueryParser, or later when Enquire handles the query.

Searching for Partially Entered Queries

The QueryParser also supports performing a search with a query which has only been partially entered. This is intended for use with “incremental search” systems, which don’t wait for the user to finish typing their search before displaying an initial set of results. For example, in such a system a user would enter a search, and the system would display a new set of results after each letter, or whenever the user pauses for a short period of time (or some other similar strategy).

The problem with this kind of search is that the last word in a partially entered query often has no semantic relation to the completed word. For example, a search for “dynamic cat” would return a quite different set of results to a search for “dynamic categorisation”. This results in the set of results displayed flicking rapidly as each new character is entered. A much smoother result can be obtained if the final word is treated as having an implicit terminating wildcard, so that it matches all words starting with the entered characters - thus, as each letter is entered, the set of results displayed narrows down to the desired subject.

A similar effect could be obtained simply by enabling the wildcard matching option, and appending a “*” character to each query string. However, this would be confused by searches which ended with punctuation or other characters.

This feature is disabled by default - pass FLAG_PARTIAL flag in the flags argument of xapian.QueryParser.parse_query(query_string, flags) to enable it, and tell the QueryParser which database to expand wildcards from using the xapian.QueryParser.set_database(database) method.

Default Operator

When the QueryParser receives a query, it joins together its component queries using a default operator which defaults to OP_OR but can be modified at run time.

Parser Flags

The operation of the QueryParser can be altered through the use of flags, combined with the bitwise OR operator; these flags include:

  • FLAG_BOOLEAN: enables support for AND, OR, etc and bracketed expressions
  • FLAG_PHRASE: enables support for phrase expressions
  • FLAG_LOVEHATE: enables support for + and - operators
  • FLAG_BOOLEAN_ANY_CASE: enables support for lower/mixed case boolean operators
  • FLAG_WILDCARD: enables support for wildcards

You can find more information about the available flags in the API documentation.

By default, the QueryParser enables FLAG_BOOLEAN, FLAG_PHRASE and FLAG_LOVEHATE.