27.5. Query Construction API

In addition to parsing a query string automatically, it is also possible to construct queries through the API.

User queries can be combined with queries created through the API. Use the query parser to construct a query from a string:

<?php
$query = Zend_Search_Lucene_Search_QueryParser::parse($queryString);

27.5.1. Query Parser Exceptions

The query parser may throw two types of exceptions.

  • Zend_Search_Lucene_Exception is thrown if something goes wrong in the query parser itself.

  • Zend_Search_Lucene_Search_QueryParserException is thrown in the case of a query syntax error.

Thus it is a good idea to catch Zend_Search_Lucene_Search_QueryParserException and display an appropriate message:

<?php
try {
    $query = Zend_Search_Lucene_Search_QueryParser::parse($queryString);
} catch (Zend_Search_Lucene_Search_QueryParserException $e) {
    echo "Query syntax error: " . $e->getMessage() . "\n";
}

The same technique may (and should) be used for the find() method of a Zend_Search_Lucene object.

27.5.2. Term Query

Term queries are intended for searching for a single term.

Query string:

word1

or

Query construction by API:

<?php
$term  = new Zend_Search_Lucene_Index_Term('word1', 'field1');
$query = new Zend_Search_Lucene_Search_Query_Term($term);
$hits  = $index->find($query);

The term field is optional. Zend_Search_Lucene searches through all fields if the field is not specified:

<?php
$term  = new Zend_Search_Lucene_Index_Term('word1');  // Search 'word1' through all indexed fields
$query = new Zend_Search_Lucene_Search_Query_Term($term);
$hits  = $index->find($query);

27.5.3. Multi-Term Query

Multi-term queries are intended for searching for a set of terms.

Each term in a set can be defined as required, prohibited, or neither.

  • required means that documents not matching this term will not match the query;

  • prohibited means that documents matching this term will not match the query;

  • neither, in which case matched documents are neither prohibited from, nor required to, match the term. A document must match at least 1 term, however, to match the query.

If optional terms are added to a query that has required terms, both queries return the same result set, but in the second query the hits that match the optional terms are moved to the top of the result set.

Both search methods can be used for multi-term queries.

Query string:

+word1 author:word2 -word3

  • '+' is used to define a required term.

  • '-' is used to define a prohibited term.

  • 'field:' prefix is used to indicate a document field for a search. If it's omitted, then 'contents' is used.

or

Query construction by API:

<?php
$query = new Zend_Search_Lucene_Search_Query_MultiTerm();

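// Required term (like '+word1').
$query->addTerm(new Zend_Search_Lucene_Index_Term('word1'), true);
// Term in the 'author' field, neither required nor prohibited (like 'author:word2').
$query->addTerm(new Zend_Search_Lucene_Index_Term('word2', 'author'), null);
// Prohibited term (like '-word3').
$query->addTerm(new Zend_Search_Lucene_Index_Term('word3'), false);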

$hits  = $index->find($query);

The second argument of addTerm() is the term's sign. Together these signs form the $signs array, which contains information about the term type:

  • true is used to define a required term.

  • false is used to define a prohibited term.

  • null is used to define a term that is neither required nor prohibited.
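
The required/prohibited/neither rules can be illustrated with a small model in plain PHP. This is only a sketch of the matching rules as described in this section, not code from Zend_Search_Lucene: the function name matchesMultiTerm() is hypothetical, a document is reduced to a flat list of its terms, and fields and scoring are ignored.

```php
<?php
/**
 * Illustrative model only (not library code): decide whether a document,
 * given as a flat list of its terms, matches a multi-term query.
 *
 * $terms and $signs are parallel arrays. Each sign is true (required),
 * false (prohibited) or null (neither required nor prohibited).
 */
function matchesMultiTerm(array $docTerms, array $terms, array $signs)
{
    $matchedAny = false;

    foreach ($terms as $i => $term) {
        $present = in_array($term, $docTerms, true);
        $sign    = $signs[$i];

        if ($sign === true && !$present) {
            return false; // a required term is missing
        }
        if ($sign === false && $present) {
            return false; // a prohibited term is present
        }
        if ($sign !== false && $present) {
            $matchedAny = true;
        }
    }

    // A document must match at least one term to match the query.
    return $matchedAny;
}

// Model of the query '+word1 word2 -word3'.
$terms = array('word1', 'word2', 'word3');
$signs = array(true, null, false);

var_dump(matchesMultiTerm(array('word1', 'word2'), $terms, $signs)); // bool(true)
var_dump(matchesMultiTerm(array('word1'), $terms, $signs));          // bool(true)
var_dump(matchesMultiTerm(array('word2'), $terms, $signs));          // bool(false)
var_dump(matchesMultiTerm(array('word1', 'word3'), $terms, $signs)); // bool(false)
```

The first two documents both match. In a real search the first one would additionally be ranked higher because it also contains the optional term.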

27.5.4. Phrase Query

Phrase Queries are intended for searching for a phrase.

Phrase Queries are very flexible and allow searching for exact phrases as well as sloppy phrases.

Phrases can also contain gaps or several terms in the same position. These can be generated by the analyzer for different purposes. For example, a term can be duplicated to increase the term weight, or several synonyms can be placed into one position.

<?php
$query1 = new Zend_Search_Lucene_Search_Query_Phrase();

// Add 'word1' at relative position 0.
$query1->addTerm(new Zend_Search_Lucene_Index_Term('word1'));

// Add 'word2' at relative position 1.
$query1->addTerm(new Zend_Search_Lucene_Index_Term('word2'));

// Add 'word3' at relative position 3 (position 2 is left as a gap).
$query1->addTerm(new Zend_Search_Lucene_Index_Term('word3'), 3);

// ...

// The same terms and positions, passed to the constructor.
$query2 = new Zend_Search_Lucene_Search_Query_Phrase(
                array('word1', 'word2', 'word3'), array(0,1,3));

// ...

// Query without a gap.
$query3 = new Zend_Search_Lucene_Search_Query_Phrase(
                array('word1', 'word2', 'word3'));

// ...

// Query restricted to the 'annotation' field.
$query4 = new Zend_Search_Lucene_Search_Query_Phrase(
                array('word1', 'word2'), array(0,1), 'annotation');

A phrase query can be constructed in one step with a class constructor or step by step with a Zend_Search_Lucene_Search_Query_Phrase::addTerm() method call.

Zend_Search_Lucene_Search_Query_Phrase class constructor takes three optional arguments:

<?php
Zend_Search_Lucene_Search_Query_Phrase([array $terms[, array $offsets[, string $field]]]);

$terms is an array of strings that contains a set of phrase terms. If it's omitted or equal to null, then an empty query is constructed.

$offsets is an array of integers that contains the offsets of the terms in a phrase. If it's omitted or equal to null, then the terms' positions are assumed to be array(0, 1, 2, 3, ...).

$field is a string that indicates the searched document field. If it's omitted or equal to null, then the default field is searched. This version of Zend_Search_Lucene treats the 'contents' field as a default, but it's planned to change this behavior to "any field" in upcoming versions.

Thus:

<?php
$query = new Zend_Search_Lucene_Search_Query_Phrase(array('zend', 'framework'));

will search for the phrase 'zend framework'.

<?php
$query = new Zend_Search_Lucene_Search_Query_Phrase(array('zend', 'download'), array(0, 2));

will search for the phrase 'zend ????? download' and match 'zend platform download', 'zend studio download', 'zend core download', 'zend framework download', and so on.

<?php
$query = new Zend_Search_Lucene_Search_Query_Phrase(array('zend', 'framework'), null, 'title');

will search for the phrase 'zend framework' in the 'title' field.
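
To make the effect of $offsets more concrete, the following sketch models exact phrase matching with offsets in plain PHP. It is an illustration only, not the library's implementation: the function name matchesPhrase() is hypothetical, the document is reduced to a list of tokens from a single field, and slop is not considered.

```php
<?php
/**
 * Illustrative model only (not library code): check whether the phrase
 * terms occur in a token list at the given relative offsets.
 *
 * If $offsets is null, the terms are assumed to sit at 0, 1, 2, ...
 */
function matchesPhrase(array $tokens, array $terms, $offsets = null)
{
    $tokens = array_values($tokens);
    $terms  = array_values($terms);

    if (count($terms) === 0) {
        return false; // an empty phrase matches nothing in this model
    }
    if ($offsets === null) {
        $offsets = array_keys($terms); // 0, 1, 2, ...
    }

    // Try every token position as the start of the phrase.
    for ($start = 0; $start < count($tokens); $start++) {
        $found = true;
        foreach ($terms as $i => $term) {
            $pos = $start + $offsets[$i];
            if (!isset($tokens[$pos]) || $tokens[$pos] !== $term) {
                $found = false;
                break;
            }
        }
        if ($found) {
            return true;
        }
    }

    return false;
}

$tokens = array('zend', 'studio', 'download');

// Offsets 0 and 2 leave one position free between the two terms.
var_dump(matchesPhrase($tokens, array('zend', 'download'), array(0, 2))); // bool(true)

// Without offsets the terms must be adjacent.
var_dump(matchesPhrase($tokens, array('zend', 'download')));              // bool(false)
```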

Zend_Search_Lucene_Search_Query_Phrase::addTerm() takes two arguments, a required Zend_Search_Lucene_Index_Term object and an optional position:

<?php
Zend_Search_Lucene_Search_Query_Phrase::addTerm(Zend_Search_Lucene_Index_Term $term[, integer $position]);

$term describes the next term in a phrase. It must indicate the same field as previous terms, or an exception will be thrown.

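$position indicates the term position. If it's omitted, the term is placed directly after the previous term, as in the first example below.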

Thus:

<?php
$query = new Zend_Search_Lucene_Search_Query_Phrase();
$query->addTerm(new Zend_Search_Lucene_Index_Term('zend'));
$query->addTerm(new Zend_Search_Lucene_Index_Term('framework'));

will search for the phrase 'zend framework'.

<?php
$query = new Zend_Search_Lucene_Search_Query_Phrase();
$query->addTerm(new Zend_Search_Lucene_Index_Term('zend'), 0);
$query->addTerm(new Zend_Search_Lucene_Index_Term('download'), 2);

will search for the phrase 'zend ????? download' and match 'zend platform download', 'zend studio download', 'zend core download', 'zend framework download', and so on.

<?php
$query = new Zend_Search_Lucene_Search_Query_Phrase();
$query->addTerm(new Zend_Search_Lucene_Index_Term('zend', 'title'));
$query->addTerm(new Zend_Search_Lucene_Index_Term('framework', 'title'));

will search for the phrase 'zend framework' in the 'title' field.

The slop factor sets the number of other words permitted between the words in a query phrase. If it is zero, then this is an exact phrase search. For larger values this works like a WITHIN or NEAR operator.

The slop is in fact an edit-distance, where the units correspond to moves of terms in the query phrase out of position. For example, to switch the order of two words requires two moves (the first move places the words atop one another), so to permit re-orderings of phrases, the slop must be at least two.

More exact matches are scored higher than sloppier matches; thus, search results are sorted by exactness. The slop is zero by default, requiring exact matches.

The slop factor can be assigned after query creation:

<?php
// Query without a gap.
$query = new Zend_Search_Lucene_Search_Query_Phrase(array('word1', 'word2'));

// Search for 'word1 word2', 'word1 ... word2'
$query->setSlop(1);
$hits1 = $index->find($query);

// Search for 'word1 word2', 'word1 ... word2',
// 'word1 ... ... word2', 'word2 word1'
$query->setSlop(2);
$hits2 = $index->find($query);
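
The slop values in the comments above can be reproduced with a simplified model in plain PHP. The sketch assumes the usual description of sloppy phrase matching in Lucene-style engines: each term's document position is shifted back by its offset in the phrase, and the spread of the shifted positions is the slop needed for a match. The function name requiredSlop() is hypothetical, and whether Zend_Search_Lucene computes slop exactly this way is an assumption.

```php
<?php
/**
 * Simplified model (not library code): the smallest slop under which a
 * phrase matches, given where each phrase term occurs in the document.
 *
 * $docPositions[$i] is the document position of the i-th phrase term,
 * $offsets[$i] is that term's offset within the query phrase.
 */
function requiredSlop(array $docPositions, array $offsets)
{
    $shifts = array();
    foreach ($docPositions as $i => $position) {
        // Where the phrase would have to start for this term to be in place.
        $shifts[] = $position - $offsets[$i];
    }

    // All terms agree on the start for an exact match (spread 0).
    return max($shifts) - min($shifts);
}

$offsets = array(0, 1); // query phrase 'word1 word2'

echo requiredSlop(array(0, 1), $offsets), "\n"; // 0: 'word1 word2'
echo requiredSlop(array(0, 2), $offsets), "\n"; // 1: 'word1 ... word2'
echo requiredSlop(array(0, 3), $offsets), "\n"; // 2: 'word1 ... ... word2'
echo requiredSlop(array(1, 0), $offsets), "\n"; // 2: 'word2 word1'
```

In this model a document matches when the computed value is less than or equal to the slop set with setSlop(), which agrees with the statement that reordering two words requires a slop of at least two.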