Level: 300
 

Lucene Query walk-through – Part 1

Searching

Lucene is a great technology not only for searching, but also for doing relational operations on the content hierarchy in Sitecore. This article describes the different kinds of queries you can use to get data out of Lucene.

 

A typical scenario for using Lucene is when you need to get the five latest news items that satisfy a set of constraints. Imagine you have a news template with a category field that holds values such as press releases, corporate news etc. The query types introduced below let you retrieve exactly those items from the index.

 

This is the first of three articles on the subject. You can read the second part here and the third part here.

Written by: Jens Mikkelsen
Fri, Jun 26 2009

Tools

First of all I would like to advertise my shared source tool, the Sitecore integrated index viewer. You can read more about it here: http://trac.sitecore.net/IndexViewer. It is a great tool for seeing which documents are actually indexed in the different indexes, and it also lets you try out different queries to test your searches.

 

 

Searching
Performing a search in Sitecore is pretty easy. You get an IndexSearcher and then call its Search method with a Query as parameter. The result is a collection of hits. You get the IndexSearcher this way:

 

// First you need to get the IndexSearcher through the Sitecore API.
// This is done through the database.
Database db = Sitecore.Context.Database;

// The indexName is specified in the web.config. See more in index definition.
Sitecore.Data.Indexing.Index index = db.Indexes[indexName];
IndexSearcher indexSearcher = index.GetSearcher(db);

// Then you need to create a query, which is described in the next section.
// When you have created the query you can get the hits like this:
Hits hits = indexSearcher.Search(query);

 

Queries

Lucene holds different types of queries. Most people tend to use the QueryParser, which translates a string written in Lucene's query syntax into a query. For instance the string “title:testtitle” tells the QueryParser to search for the string “testtitle” in the field title.

The QueryParser is sometimes difficult to use, and doesn’t always return the result you expect. Some people have written that the QueryParser works in a nondeterministic way ;). If you want to be more specific and have more typesafe code, there are a lot of other queries you can use.

 

The most important queries are described here:

 

TermQuery:

TermQuery is probably the simplest of all the queries. It searches a field for a specific term. The search only returns documents that contain exactly that term, and the comparison is case sensitive. The search term is not analyzed, so it must match the term exactly as it is stored in the index. This makes the query good for key or id searches.

 

You can create a TermQuery like this:

 

Term term = new Term("field", "searchstring");
Query q = new TermQuery(term);

 

If the field “field” contains the word “searchstring” it’s a match. The exact word needs to match, so if a field holds the word “1234” and you search for “123”, you won’t get a match.

 

At index time the analyzer splits the field value into separate terms at separator characters. Examples are " ", "_", "," and "|". This means that you can use the TermQuery to find an id in an indexed multiselect field or similar.

 

So… use the query for key and id searches.
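The whole-term rule can be modelled in a few lines of plain C#. This is only a sketch of the matching behaviour and not the Lucene API. The class name `TermMatchSketch` and the sample ids are made up for illustration, and the separator list is simply the one mentioned above; the real splitting depends on the analyzer your index uses.

```csharp
using System;
using System.Linq;

public static class TermMatchSketch
{
    // Separators named in the article; the real set depends on the analyzer (assumption).
    private static readonly char[] Separators = { ' ', '_', ',', '|' };

    // Split a raw field value into terms, roughly as an analyzer would at index time.
    public static string[] Tokenize(string fieldValue)
    {
        return fieldValue.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }

    // A search term matches only when it equals one whole term, case included.
    public static bool Matches(string fieldValue, string searchTerm)
    {
        return Tokenize(fieldValue).Contains(searchTerm, StringComparer.Ordinal);
    }

    public static void Main()
    {
        string multiselect = "1234|5678|9012"; // hypothetical multiselect value
        Console.WriteLine(Matches(multiselect, "5678")); // True: equals a whole term
        Console.WriteLine(Matches(multiselect, "567"));  // False: only part of a term
    }
}
```

The second call shows why a search for “123” does not find “1234”: the comparison is made per term, never per substring.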

 

RangeQuery:

A RangeQuery is, as the name indicates, used for retrieving documents that hold a value in a specific range. Use this query when you want all documents in a specific date range.

The query takes two values (in this case dates in the format yyyyMMdd) and returns the documents whose value falls within that range:

 

Term beginDate = new Term("newsDate", "20090220");
Term endDate = new Term("newsDate", "20090228");
// The last parameter indicates whether the two values should be included in the range
RangeQuery query = new RangeQuery(beginDate, endDate, true);

 

So… use the RangeQuery (obviously) when you want documents that hold a value in a given range.
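One thing worth knowing is that the range is compared on the term text, as strings, and not as real dates. The sketch below models that comparison in plain C#. It is not the Lucene API, and `RangeSketch` and `InRange` are hypothetical names used only for this illustration. It shows why values written year first, like 20090220 in the example above, sort correctly, while a day-first format would not.

```csharp
using System;

public static class RangeSketch
{
    // True when value lies between lower and upper, compared as plain strings.
    public static bool InRange(string value, string lower, string upper, bool inclusive)
    {
        int toLower = string.CompareOrdinal(value, lower);
        int toUpper = string.CompareOrdinal(value, upper);
        return inclusive
            ? toLower >= 0 && toUpper <= 0
            : toLower > 0 && toUpper < 0;
    }

    public static void Main()
    {
        // yyyyMMdd: string order equals date order.
        Console.WriteLine(InRange("20090225", "20090220", "20090228", true)); // True
        Console.WriteLine(InRange("20090301", "20090220", "20090228", true)); // False

        // ddMMyyyy: 25 March 2009 sorts between 20 and 28 February 2009.
        Console.WriteLine(InRange("25032009", "20022009", "28022009", true)); // True, which is wrong
    }
}
```

So if you index dates for range searches, store them year first and zero padded.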

 

PrefixQuery:

This query identifies index records that have a given field that starts with a specified term. This is excellent if you want to query in subtrees of your complete content tree. Imagine that you index the path of all items. Now you can search for items in a specific subtree by using the PrefixQuery with a term like “/sitecore/content/home/subtree1”. The PrefixQuery is used like this:

 

Term startPathTerm = new Term("path", "/sitecore/content/site/home");
PrefixQuery query = new PrefixQuery(startPathTerm);

 

This lets you search in the tree “/sitecore/content/site/home/”.

So… use PrefixQuery when you need to search a specific area in a hierarchy.
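A small detail to watch is the trailing slash. The sketch below models prefix matching in plain C#, assuming the path is indexed as one single, untokenized term. It is not the Lucene API, and `PrefixSketch` and the sample paths are made up for illustration.

```csharp
using System;

public static class PrefixSketch
{
    // True when the indexed path starts with the given prefix, character by character.
    public static bool HasPrefix(string indexedPath, string prefix)
    {
        return indexedPath.StartsWith(prefix, StringComparison.Ordinal);
    }

    public static void Main()
    {
        string prefix = "/sitecore/content/site/home";

        Console.WriteLine(HasPrefix("/sitecore/content/site/home/news/item1", prefix)); // True
        Console.WriteLine(HasPrefix("/sitecore/content/site/homepage", prefix));        // True: a sibling item
        Console.WriteLine(HasPrefix("/sitecore/content/site/homepage", prefix + "/"));  // False
    }
}
```

Without the trailing slash the prefix also matches siblings whose names merely start with “home”. With the trailing slash you get only the descendants, but then the item “/sitecore/content/site/home” itself is no longer included.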

 

WildcardQuery:

WildcardQuery lets you query data using wildcards. This is a bit like a simple regular expression. Other than that it works much like the TermQuery. The wildcard operators are “*” and “?”, where "?" stands for exactly one character and "*" stands for any number of characters, including none. You can use these operators if you don’t care what characters a term starts with. For instance you can search for all words ending with “core” like this: “*core”. This will return Sitecore, hardcore, moltencore etc.

The query can be used like this:

Term term = new Term("content", "?itecore*");
Query q = new WildcardQuery(term);

This will find records with the following field values: “sitecore”, “titecore”, “mitecoreasdsdad”. It will not find a term like “skitecore”, as the question mark operator denotes that exactly one character must be present.

So… use this query when you want to use wildcards.
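If you want to check a pattern before running it against the index, the two operators are easy to model. The sketch below translates a wildcard pattern into a .NET regular expression, where "?" matches exactly one character and "*" matches any number of characters. It is plain C# and not the Lucene implementation; `WildcardSketch` is a hypothetical helper for illustration.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class WildcardSketch
{
    // Translate a wildcard pattern into a regular expression anchored at both ends.
    public static Regex ToRegex(string pattern)
    {
        var sb = new StringBuilder(@"\A");
        foreach (char c in pattern)
        {
            if (c == '*') sb.Append(".*");      // any number of characters, including none
            else if (c == '?') sb.Append('.');  // exactly one character
            else sb.Append(Regex.Escape(c.ToString()));
        }
        sb.Append(@"\z");
        return new Regex(sb.ToString(), RegexOptions.Singleline);
    }

    // True when the whole term matches the wildcard pattern.
    public static bool Matches(string term, string pattern)
    {
        return ToRegex(pattern).IsMatch(term);
    }

    public static void Main()
    {
        Console.WriteLine(Matches("sitecore", "?itecore*"));        // True
        Console.WriteLine(Matches("mitecoreasdsdad", "?itecore*")); // True
        Console.WriteLine(Matches("skitecore", "?itecore*"));       // False: two characters before "itecore"
        Console.WriteLine(Matches("itecore", "?itecore*"));         // False: "?" needs one character
    }
}
```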

 

BooleanQuery:

The BooleanQuery lets you combine several queries. Often you would like to combine queries to narrow down your search. You can use it like this:

 

BooleanQuery query = new BooleanQuery();

/* Iterate over a custom collection of constraints where the field and values are stored */
foreach (string value in myConstraints.CollectionValues)
{
  Term term = new Term(myConstraints.CollectionField, value);
  Query termQuery = new TermQuery(term);

  /* Add the query to the BooleanQuery. The second parameter is an enumeration
     where you can specify if the query must, must not or should match. */
  query.Add(termQuery, BooleanClause.Occur.MUST);
}

 

You can also add BooleanQueries to the BooleanQuery, which means that you can build complete trees of queries.

So… use BooleanQuery to combine different queries. 
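How the three flags interact is not always obvious, so here is a small model of the rule in plain C#. It is a sketch and not the Lucene API: the `Occur` enum, `Clause` and `BooleanSketch` are hypothetical names that only mimic the must, should and must not flags, and a document is reduced to a set of terms.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Occur { Must, Should, MustNot }

public sealed class Clause
{
    public readonly string Term;
    public readonly Occur Flag;
    public Clause(string term, Occur flag) { Term = term; Flag = flag; }
}

public static class BooleanSketch
{
    // Decide whether a document (its set of terms) matches the combined clauses.
    public static bool Matches(HashSet<string> docTerms, IEnumerable<Clause> clauses)
    {
        List<Clause> list = clauses.ToList();

        // Any matching "must not" clause rejects the document.
        if (list.Any(c => c.Flag == Occur.MustNot && docTerms.Contains(c.Term))) return false;

        // If there are "must" clauses, all of them have to match.
        List<Clause> must = list.Where(c => c.Flag == Occur.Must).ToList();
        if (must.Count > 0) return must.All(c => docTerms.Contains(c.Term));

        // Otherwise at least one "should" clause has to match.
        return list.Any(c => c.Flag == Occur.Should && docTerms.Contains(c.Term));
    }

    public static void Main()
    {
        var doc = new HashSet<string> { "news", "pressrelease" };

        Console.WriteLine(Matches(doc, new[] { new Clause("news", Occur.Must), new Clause("archived", Occur.MustNot) }));        // True
        Console.WriteLine(Matches(doc, new[] { new Clause("news", Occur.Must), new Clause("corporate", Occur.Must) }));          // False
        Console.WriteLine(Matches(doc, new[] { new Clause("corporate", Occur.Should), new Clause("pressrelease", Occur.Should) })); // True
        Console.WriteLine(Matches(doc, new[] { new Clause("archived", Occur.MustNot) }));                                        // False
    }
}
```

The last call is the one that surprises people: a query made only of must not clauses selects nothing, so it needs at least one must or should clause to exclude documents from.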

 

Other queries

There are quite a few other queries you can use. Most of these make use of Lucene's full-text search capabilities, which gives you some options if you want to use Lucene as a normal search engine on your site.

 

These are the other query types:

 

  • PhraseQuery: You can use this if you need to search for content where two terms should be near each other in a sentence.
  • FuzzyQuery: You can use this if you want to search for words that are similar to a given term. This can be used if you need to return results even though a visitor misspells a search word.
  • QueryParser: The QueryParser lets you write a search string and automatically converts it into specific queries. It is often used because it can be a bit easier to understand, as many of the operators work a bit like Google's. For instance you can create a search string like this: (Learn OR Learning) AND Sitecore. This will construct three term queries and combine them with Boolean queries.

 

You can read the second part of the article here and the third part here.

 

 


  • About the author:

    Jens Mikkelsen

    Jens Mikkelsen is a partner at Inmento Solutions, a Sitecore consulting firm. He works as a Sitecore specialist and consultant, helping clients architect and build quality Sitecore solutions using the newest modules and tools.

    He has also been deeply involved in various complex solutions and has built up a strong knowledge of Sitecore architecture and best practices. He specializes in debugging and analyzing Sitecore solutions.

     

    Jens is very interested in the technical mechanisms in the new marketing products such as Sitecore DMS and Sitecore ECM.

    My Sitecore Freelance CV

6 responses to "Lucene Query walk-through – Part 1"

Hi Jens,

Thanks for the article and index viewer tool. This is only partially related, but when I tried to run your tool in my environment I got a serialization error, since we have our web server set up to store session state in a database. I marked your classes as serializable (since they appeared to be) and rebuilt, but am now seeing the same issue for 'Sitecore.Data.Indexing.Index' in Sitecore.Kernel. Have you seen this or have any ideas on how to work around it?

Thanks!

- will
Posted: Thursday, December 03, 2009 1:40 AM
Hi Will,

Thanks for the feedback!

I haven't tried using the index viewer with a session state server, so I am unable to help you there. :(

You could try and change the code to use the ViewState instead of the Session. That might solve the problem.

Cheers
Jens
Posted: Tuesday, December 08, 2009 10:25 AM
Hi,

Can you please update this guide and make an implementation using Sitecore.Search API.
Because: The Sitecore.Data.Indexing API, widely referred to in this document, will be deprecated in Sitecore CMS 6.5. In Sitecore CMS 7.0, it will be completely removed.

Thanks
Posted: Monday, July 11, 2011 11:57 PM
Hi Lucian,

Thanks for your comment. Although the Sitecore.Data.Indexing API is deprecated, you can still use the Lucene API mentioned in this article. I might update all the Sitecore related stuff.

Cheers
Jens Mikkelsen
Posted: Tuesday, July 12, 2011 8:38 AM
Hi Jens,

I was trying to work with the new API but for some reason it says that it can't find my old index. Do you know if something was changed in the configuration? Do I need to configure the index in another way? On the Sitecore site I could not find any reference...

Here is where I get the error:

string indexName = "raccustom2";
Database db = Factory.GetDatabase("web");
siteRoot = db.GetRootItem();

var searchIndex = Sitecore.Search.SearchManager.GetIndex(indexName); //I get error : The given key was not present in the dictionary.


Thanks in advance.
Posted: Tuesday, July 12, 2011 9:17 AM
Hi Jens,

I am using Lucene.Net.dll v3.0.3.0, but I am unable to use the same syntax as I used in Lucene 2.0 for BooleanClause.Occur. It now gives the error: The type name 'Occur' does not exist in the type 'Lucene.Net.Search.BooleanClause'

Any input would be appreciated

Best Regards
K
Posted: Friday, February 28, 2014 8:39 AM
