Lucene is a great technology not only for searching, but also for doing relational operations on the content hierarchy in Sitecore. This article describes the different kinds of queries you can use to get data out of Lucene.
A typical scenario for using Lucene is when you need to get the five latest news items matching different constraints. Imagine you have a news template with a category field that holds values such as press releases, corporate news, etc. This article introduces the query types you can use to retrieve such data from the index.
This is the first of three articles on the subject. You can read the second part here and the third part here.
Tools
First of all, I would like to advertise my shared source tool: the Sitecore integrated Index Viewer. You can read more about it here: http://trac.sitecore.net/IndexViewer. It is a great tool for seeing which documents are actually stored in the different indexes, and it also lets you try out queries to test your searches.
Searching
Performing a search in Sitecore is pretty easy. All you need to do is get an IndexSearcher and then call its Search method with a Query as the parameter. The result is a collection of hits. You get the IndexSearcher this way:
using Sitecore.Data;          // Database
using Lucene.Net.Search;      // IndexSearcher, Hits, Query

// First you need to get the IndexSearcher through the Sitecore API.
// This is done through the database.
Database db = Sitecore.Context.Database;
// The indexName is specified in the web.config. See more in index definition
Sitecore.Data.Indexing.Index index = db.Indexes[indexName];
IndexSearcher indexSearcher = index.GetSearcher(db);
// Then you need to create a query, which is described in the next section.
// When you have created the query, you can get the hits like this:
Hits hits = indexSearcher.Search(query);
Queries
Lucene provides different types of queries. Most people tend to use the QueryParser, which translates a string into a query. The string uses a dedicated query syntax. For instance, the string “title:testtitle” tells the QueryParser to search for the term “testtitle” in the field title.
The QueryParser is sometimes difficult to use and doesn’t always return the result you expect. Some people have written that the QueryParser works in a nondeterministic way ;). If you want to be more specific and have more type-safe code, there are many other queries you can use.
The most important queries are described here:
TermQuery:
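TermQuery is probably the simplest of all the queries. It searches a field for a specific term and only returns documents whose field contains exactly that term. The comparison is case sensitive, and the term must match the value as it was stored in the index (analyzers often lowercase text during indexing). This makes the query good for key or id searches.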
You can create a TermQuery like this:
Term term = new Term("field", "searchstring");
Query q = new TermQuery(term);
If the field “field” contains the word “searchstring” it’s a match. The exact word needs to match, so if a field holds the word “1234” and you search for “123”, you won’t get a match.
When the field is indexed, its value is split into separate terms at separator characters; depending on the analyzer, examples are " ", "_", "," and "|". This means that you can use the TermQuery to search for an id in an indexed multiselect field or similar.
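The matching rule for a TermQuery can be modeled in a few lines of plain C#. This is a simplified sketch, not Lucene code: the separator set is an assumption taken from the examples above, whereas in a real index the splitting is done by whichever analyzer the index is configured with. It shows why “123” does not match “1234” and why the comparison is case sensitive.

```csharp
using System;
using System.Linq;

// Simplified model of TermQuery matching: the field value is split into terms,
// and the search term must equal one of them exactly.
public static class TermMatchDemo
{
    // Assumed separators for illustration; a real Lucene analyzer decides this.
    private static readonly char[] Separators = { ' ', '_', ',', '|' };

    // Splits a raw field value into individual terms, dropping empty entries.
    public static string[] Tokenize(string fieldValue) =>
        fieldValue.Split(Separators, StringSplitOptions.RemoveEmptyEntries);

    // Whole-term, ordinal (case-sensitive) comparison against every term.
    public static bool Matches(string fieldValue, string searchTerm) =>
        Tokenize(fieldValue).Any(t => string.Equals(t, searchTerm, StringComparison.Ordinal));
}
```

For example, `TermMatchDemo.Matches("1234|5678", "1234")` is true, while `TermMatchDemo.Matches("1234|5678", "123")` is false because only whole terms are compared.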
So… use the query for key and id searches.
RangeQuery:
A RangeQuery is, as the name indicates, used for retrieving documents that hold a value in a specific range. This is the query to use when, for example, you want all documents within a specific date range.
The query takes two values (in this case dates in the format yyyyMMdd) and returns the documents whose value falls within that range. The terms are compared as text, which is why the date must be stored with the year first:
Term beginDate = new Term("newsDate", "20090220");
Term endDate = new Term("newsDate", "20090228");
// The last parameter indicates whether the two boundary values are included in the range
RangeQuery query = new RangeQuery(beginDate, endDate, true);
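Because the range is evaluated on text, the date format decides whether the result is correct. The sketch below, which assumes the "newsDate" field is indexed with the hypothetical helper `ToIndexValue`, mimics an inclusive range check with ordinal string comparison. It shows that yyyyMMdd keeps text order equal to date order, while a format such as ddMMyyyy does not.

```csharp
using System;
using System.Globalization;

public static class DateRangeDemo
{
    // Formats a date the way the example assumes the "newsDate" field is indexed.
    public static string ToIndexValue(DateTime date) =>
        date.ToString("yyyyMMdd", CultureInfo.InvariantCulture);

    // Inclusive range check on text terms, mirroring RangeQuery(lower, upper, true).
    public static bool InRange(string value, string lower, string upper) =>
        string.CompareOrdinal(value, lower) >= 0 && string.CompareOrdinal(value, upper) <= 0;
}
```

With yyyyMMdd, 25 February 2009 ("20090225") falls inside "20090220" to "20090228" and 1 March ("20090301") does not. With ddMMyyyy, however, 25 March ("25032009") would wrongly fall inside "20022009" to "28022009".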
So… use the RangeQuery (obviously) when you want documents that hold a value in a given range.
PrefixQuery:
This query matches documents in which a given field contains a term starting with the specified prefix. This is excellent if you want to query subtrees of your content tree. Imagine that you index the path of all items (stored as a single, untokenized term). You can then search for items in a specific subtree by using the PrefixQuery with a term like “/sitecore/content/home/subtree1”. The PrefixQuery is used like this:
Term startPathTerm = new Term("path", "/sitecore/content/site/home");
PrefixQuery query = new PrefixQuery(startPathTerm);
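This lets you search in the tree “/sitecore/content/site/home/”. Note that the prefix also matches sibling paths such as “/sitecore/content/site/homepage”; end the prefix with a slash if you only want the descendants.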
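The prefix behaviour can be tried out without an index. This sketch assumes each item's full path is indexed as one untokenized term, as described above, and simply applies an ordinal "starts with" test to a list of hypothetical paths. It illustrates the sibling pitfall and the effect of a trailing slash.

```csharp
using System;
using System.Linq;

public static class PrefixDemo
{
    // Returns the indexed paths a PrefixQuery with the given prefix would match,
    // assuming one untokenized path term per item.
    public static string[] Match(string[] indexedPaths, string prefix) =>
        indexedPaths.Where(p => p.StartsWith(prefix, StringComparison.Ordinal)).ToArray();
}
```

Given the paths "/sitecore/content/site/home", ".../home/news", ".../homepage" and ".../about", the prefix "/sitecore/content/site/home" returns three paths (including "homepage"), while "/sitecore/content/site/home/" returns only ".../home/news".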
So… use PrefixQuery when you need to search a specific area in a hierarchy.
WildcardQuery:
WildcardQuery lets you query data using wildcards. This is a bit like a regular expression; other than that, it works much like the TermQuery. The wildcard operators are “*” and “?”: a "?" matches exactly one character, and a "*" matches any number of characters, including none. You can use these operators if you don’t care which characters a term starts with. For instance, you can search for all words ending with “core” like this: “*core”. This will return Sitecore, hardcore, moltencore, etc.
The query can be used like this:
Term term = new Term("content", "?itecore*");
Query q = new WildcardQuery(term);
This will find records with field values such as “sitecore”, “titecore” and “mitecoreasdsdad”, but not “skitecore” or “itecore”, as the question mark operator requires exactly one character in that position.
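To experiment with wildcard patterns, you can translate them into an equivalent .NET regular expression. This is only a model of the matching semantics ("?" is exactly one character, "*" is zero or more), not how Lucene evaluates a WildcardQuery internally; `WildcardDemo` is a hypothetical helper.

```csharp
using System.Text.RegularExpressions;

public static class WildcardDemo
{
    // Escapes the pattern, then turns the escaped wildcard operators back into regex syntax.
    public static bool IsMatch(string pattern, string term)
    {
        string regex = "^" + Regex.Escape(pattern)
            .Replace(@"\*", ".*")   // "*" -> any run of characters, possibly empty
            .Replace(@"\?", ".")    // "?" -> exactly one character
            + "$";
        return Regex.IsMatch(term, regex);
    }
}
```

For instance, `WildcardDemo.IsMatch("?itecore*", "mitecoreasdsdad")` is true, while the same pattern rejects "skitecore" and "itecore".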
So… use this query when you want to use wildcards.
BooleanQuery:
The BooleanQuery lets you combine several queries. Often you would like to combine queries to narrow down your search. You can use it like this:
BooleanQuery query = new BooleanQuery();
/* Iterate over a custom collection of constraints where the field and values are stored */
foreach (string value in myConstraints.CollectionValues)
{
    Term term = new Term(myConstraints.CollectionField, value);
    Query termQuery = new TermQuery(term);
    /* Add the query to the BooleanQuery. The second parameter is a BooleanClause.Occur value
       that specifies whether the query MUST, MUST_NOT or SHOULD match. */
    query.Add(termQuery, BooleanClause.Occur.MUST);
}
You can also add BooleanQueries to the BooleanQuery, which means that you can build complete trees of queries.
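The choice of Occur value decides what the combined query returns. The sketch below is a simplified, dependency-free model of the match rules (scoring is ignored): `Occur`, `Clause` and `BooleanDemo` are hypothetical stand-ins, not Lucene types. It shows that adding several category values with MUST, as in the loop above, requires a document to contain all of them, whereas SHOULD clauses in a query without MUST clauses accept any one of them. To get "any of these categories" combined with other required constraints, you would put the SHOULD clauses in their own BooleanQuery and add that to the outer query with MUST.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Lucene's BooleanClause.Occur values.
public enum Occur { Must, Should, MustNot }

// A "document" is modeled as a map from field name to the terms indexed for it.
public sealed record Clause(Func<IDictionary<string, string[]>, bool> IsMatch, Occur Occur);

public static class BooleanDemo
{
    // No MUST_NOT clause may match, every MUST clause must match, and if there
    // are no MUST clauses, at least one SHOULD clause must match.
    public static bool Matches(IDictionary<string, string[]> doc, IReadOnlyList<Clause> clauses)
    {
        if (clauses.Any(c => c.Occur == Occur.MustNot && c.IsMatch(doc))) return false;
        if (!clauses.Where(c => c.Occur == Occur.Must).All(c => c.IsMatch(doc))) return false;
        bool hasMust = clauses.Any(c => c.Occur == Occur.Must);
        return hasMust || clauses.Any(c => c.Occur == Occur.Should && c.IsMatch(doc));
    }

    // Builds a clause that behaves like a TermQuery on one field.
    public static Clause Term(string field, string text, Occur occur) =>
        new Clause(doc => doc.TryGetValue(field, out var terms) && terms.Contains(text), occur);
}
```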
So… use BooleanQuery to combine different queries.
Other queries
There are quite a few other queries you can use. These mostly target full-text search, which is useful if you want to use Lucene as a normal search engine on your site.
These are the other options:
- PhraseQuery: You can use this if you need to search for content where two or more terms should appear next to or near each other in a sentence.
- FuzzyQuery: You can use this if you want to search for words that are similar to a given term. This can be used if you need to return results even though a visitor misspells a search word.
- QueryParser: The QueryParser lets you construct a search string and automatically converts it into specific queries. It is often used because it can be easier to understand; many of the operators work a bit like Google's. For instance, you can create a search string like this: (Learn OR Learning) AND Sitecore. This will construct three TermQueries and combine them with BooleanQueries.
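FuzzyQuery is based on edit distance: how many single-character insertions, deletions and substitutions it takes to turn one word into another. The sketch below computes the classic Levenshtein distance to show what "similar" means here; how Lucene converts that distance into its similarity threshold depends on the Lucene version and is deliberately not modeled. `EditDistanceDemo` is a hypothetical helper.

```csharp
using System;

public static class EditDistanceDemo
{
    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j; // distance from "" to b[..j]
        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // distance from a[..i] to ""
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev); // the finished row becomes the previous row
        }
        return prev[b.Length];
    }
}
```

A misspelling such as "sitcore" is only one edit away from "sitecore", which is why a fuzzy search can still find it.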
You can read the second part of the article here and the third part here.