Level: 300
 

Lucene walk through – Part 2: The example

Searching

This is the second part of the series Lucene Query Walk Through. In this part we get our hands dirty with a practical example that uses the query types covered in part 1.

The article won’t go into how to use .NET controls such as Repeaters, nor will it focus on architecture or reusability. It focuses only on how to use the different Query types in Lucene.

 

You can read the first part of the article here and the third part here

Written by: Jens Mikkelsen
Fri, Jun 26 2009

The task

So you finished developing a solution for a client of yours. The solution has gone into production and the editors have created tens of thousands of content items across three different websites. You created a news template for the editors and made it possible for them to create news anywhere in the content tree, as they may want different areas for news. Everything works great and the client is very happy. However, they have noticed that not many visitors navigate to the news sections, so now they want a page with a list of all the latest news from the different sites.

 

Your project manager says that this is an easy task and tells the client it will take no more than 3 hours to implement. (Aren’t project managers always like that?) You are assigned to the task, and after looking at the content tree you see that there is no way you can iterate through all content items and keep performance at an acceptable level. You tell your project manager that you can’t create a simple XSL for this task; you need to use an index. You somehow convince the project manager that this takes a bit longer than first assumed. (This is a hypothetical situation.)

 

So… you now have the time to implement the list using an index. The requirements for the task are as follows:

 

  • The list should only contain news items.
  • A news item can be categorized using a multilist field, which assigns globally defined categories. It should be possible to filter the listed news by category.
  • As it is a multisite solution, it should be possible to define a root node. Only news items placed under that node should be included in the list.
  • There is a content field on the news template. It should be possible to specify a word so that only news with that word in the content field is included in the list.

 

 

The Sitecore templates and content hierarchy

The news template has the following fields:

 

  • Title (single-line text)
  • Content (Rich text)
  • Categories (Multilist)

 

You now create a template for the news list. On this template you want to be able to define the constraints and filters for the list, so you create a newslist template with the following fields:

 

  • Category (Droplink – for selecting the category filter)
  • RootNode (Internal Link – for defining the root node, from which all news should be included)
  • Keyword (Single-Line text – used to define a keyword that should be in the content field of the news item)

 

You have now got your data templates, and you also create a sublayout for presenting the list.

 

 

The index definition

In this article I won’t dig into how you define your index in the web.config or how you build a custom indexer so you can index the path. This will be covered in the third article in the series, and you can take a closer look at the Sitecore description of the indexes here.

 

For now, all you need to know is that I created a custom index that includes only the news items and has the following fields: Categories, Content, Date and Path. The index is called newsIndex.

 

 

The sublayout - Newslist

You have created a sublayout for presenting the news. For now it is very simple and just iterates over some items and prints out the title of the news. For this you have created a simple placeholder, to which you add the news titles. The Page_Load event then looks like this:

 

protected void Page_Load(object sender, EventArgs e)
{
  /* First get the constraints from the current item.
     Normally you should check for null references. */
  string category = Sitecore.Context.Item["Category"];
  string rootNode = Sitecore.Context.Item["RootNode"];
  string keyword = Sitecore.Context.Item["Keyword"];

  /* Here goes the logic that will retrieve the news.
     You will see how it is implemented in a minute. */
  NewsRetriever retriever = new NewsRetriever("newsIndex");
  Hits news = retriever.GetNews(category, rootNode, keyword);

  /* Iterate over all the hits and print the title */
  for (int i = 0; i < news.Length(); i++)
  {
    /* Get a Sitecore item from a hit using the Sitecore API */
    Item newsItem = Sitecore.Data.Indexing.Index.GetItem(news.Doc(i), Sitecore.Context.Database);
    /* Add the title to the page */
    ListPlaceholder.Controls.Add(new LiteralControl(newsItem["Title"] + " "));
  }
}

 

This is pretty simple. The fun stuff is about to unravel :)

 

 

The NewsRetriever

This class is where all the logic lives. If you plan on implementing something similar, you could consider making it more generic so that it can retrieve all types of items. You could also model the constraints generically instead of hard-coding them. But for the sake of simplicity, this class is pretty static.

 

You take a look at the different constraints and soon discover that you need different query types. For the category you decide to use a TermQuery, as the indexed item needs to contain a specific ID. For the rootNode you decide to use a PrefixQuery, as the indexed item path needs to start with the root path. For the keyword you decide to use a WildcardQuery, as the keyword just needs to be present somewhere in the content field.

 

This gives the following methods that create each of the queries:

 

private WildcardQuery GetWildCardQuery(string field, string keyword)
{
  /* First we need to remove any wildcard operators from the string */
  keyword = keyword.Replace("*", "");
  keyword = keyword.Replace("?", "");
  /* Now we can create the term. We need * on both sides of the keyword to indicate that it doesn't matter where in the text the keyword is placed */
  Term term = new Term(field, "*" + keyword + "*");
  return new WildcardQuery(term);
}

 

 

private PrefixQuery GetPrefixQuery(string fieldName, string rootNode)
{
  /* Create the term and the query and return it */
  Term term = new Term(fieldName, rootNode);
  return new PrefixQuery(term);
}
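
One thing to watch with a prefix match is that a root path without a trailing separator also matches sibling paths that merely start with the same characters. The sketch below imitates the prefix comparison with plain strings. It assumes the path field holds the full item path as a single term, which depends on the custom indexer covered in part 3. The class PrefixDemo, its methods and the sample paths are hypothetical and only there for illustration.

```csharp
using System;

// Hypothetical demo class: plain string comparison, no Lucene types involved.
public static class PrefixDemo
{
    // Imitates a prefix match: the indexed value only has to start with the prefix.
    public static bool Matches(string indexedPath, string rootPath)
    {
        return indexedPath.StartsWith(rootPath, StringComparison.Ordinal);
    }

    // Appending the separator restricts matches to descendants of the root.
    public static string AsDescendantPrefix(string rootPath)
    {
        return rootPath.EndsWith("/", StringComparison.Ordinal) ? rootPath : rootPath + "/";
    }

    public static void Main()
    {
        // Sample paths are assumptions, not taken from a real solution.
        string root = "/sitecore/content/site1";
        string sibling = "/sitecore/content/site10/news/item";
        string child = "/sitecore/content/site1/news/item";

        Console.WriteLine(Matches(sibling, root));                     // True
        Console.WriteLine(Matches(sibling, AsDescendantPrefix(root))); // False
        Console.WriteLine(Matches(child, AsDescendantPrefix(root)));   // True
    }
}
```

With the trailing separator the root item itself no longer matches, which is usually fine here because the root node is a container and not a news item.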

 

 

private TermQuery GetTermQuery(string fieldName, string termString)
{
  /* The TermQuery doesn't like chars like { and } so we need to remove them */
  termString = termString.Replace("}", "");
  termString = termString.Replace("{", "");
  /* Make the string lowercase */
  termString = termString.ToLower();

  /* Create the term by passing in the field name and the term string. Create the query and return it */
  Term term = new Term(fieldName, termString);
  return new TermQuery(term);
}
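
To make the string handling in these helpers concrete, here is a small stand-alone sketch that applies the same clean-up steps to sample values and prints the text each term would receive. The class QueryInputDemo, its method names and the sample values are made up for illustration, and no Lucene types are involved.

```csharp
using System;

// Hypothetical demo class: mirrors only the string clean-up done in
// GetTermQuery and GetWildCardQuery, without touching Lucene.
public static class QueryInputDemo
{
    // Same steps as GetTermQuery: drop the braces and lowercase the ID.
    public static string ToTermText(string id)
    {
        return id.Replace("{", "").Replace("}", "").ToLower();
    }

    // Same steps as GetWildCardQuery: strip user-supplied wildcard
    // operators, then wrap the keyword in * on both sides.
    public static string ToWildcardText(string keyword)
    {
        string cleaned = keyword.Replace("*", "").Replace("?", "");
        return "*" + cleaned + "*";
    }

    public static void Main()
    {
        // Sample values are assumptions, not taken from a real solution.
        Console.WriteLine(ToTermText("{110D559F-DEA5-42EA-9C1C-8A5DF7E70EF9}"));
        // prints 110d559f-dea5-42ea-9c1c-8a5df7e70ef9

        Console.WriteLine(ToWildcardText("sit*e?core"));
        // prints *sitecore*
    }
}
```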

 

 

 

Now all you have to do is combine the queries so that you can perform the search. You create a method that does this and returns the hits. You call it GetNews:

 

/* The main method for retrieving news, passing in the different constraints. */
public Hits GetNews(string category, string rootNode, string keyword)
{
  /* First we need to create a BooleanQuery so we can combine the different queries */
  BooleanQuery completeQuery = new BooleanQuery();

  if (!String.IsNullOrEmpty(category))
    completeQuery.Add(GetTermQuery("category", category), BooleanClause.Occur.MUST);

  if (!String.IsNullOrEmpty(rootNode))
    completeQuery.Add(GetPrefixQuery("path", rootNode), BooleanClause.Occur.MUST);

  if (!String.IsNullOrEmpty(keyword))
    completeQuery.Add(GetWildCardQuery("content", keyword), BooleanClause.Occur.MUST);

  /* You want the hits sorted by date, newest first, so you create the sort to pass on to the search method. The first parameter is the name of the field to sort on. The second parameter reverses the sort order; it is true here because the default order runs from lowest to highest. */
  Sort sort = new Sort("date", true);

  /* Now we can perform the search */
  Index newsIndex = Sitecore.Context.Database.Indexes["newsIndex"];
  IndexSearcher searcher = newsIndex.GetSearcher(Sitecore.Context.Database);

  Hits hits;
  try
  {
    hits = searcher.Search(completeQuery, sort);
  }
  finally
  {
    /* Always remember to close the searcher when done.
       Caveat: Hits loads its documents lazily through the searcher, so hits.Doc(i) fails once the searcher is closed. Read the values you need before closing, or let the caller close the searcher after iterating. */
    searcher.Close();
  }
  return hits;
}
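
GetNews closes the searcher before the caller has read any documents from the returned Hits, and readers have reported null reference errors from hits.Doc(i) as a result. One way around that, sketched below, is to turn the hits into Sitecore items while the searcher is still open and return those instead. The class NewsSearch and the method GetNewsItems are hypothetical names. The Sitecore and Lucene calls are the same ones used elsewhere in this article, and the namespaces in the using lines are assumed to match the Sitecore and Lucene.Net assemblies in your solution.

```csharp
using System.Collections.Generic;
using Lucene.Net.Search;
using Sitecore.Data;
using Sitecore.Data.Items;

// Hypothetical helper, not part of the original NewsRetriever.
public class NewsSearch
{
    // Runs the query and returns the matching items. All documents are read
    // inside the try block, so the searcher can be closed before returning.
    public List<Item> GetNewsItems(Query query, Sort sort)
    {
        Database database = Sitecore.Context.Database;
        Sitecore.Data.Indexing.Index newsIndex = database.Indexes["newsIndex"];
        IndexSearcher searcher = newsIndex.GetSearcher(database);

        List<Item> items = new List<Item>();
        try
        {
            Hits hits = searcher.Search(query, sort);
            for (int i = 0; i < hits.Length(); i++)
            {
                // hits.Doc(i) needs the open searcher, so resolve the item here.
                Item item = Sitecore.Data.Indexing.Index.GetItem(hits.Doc(i), database);
                if (item != null)
                {
                    items.Add(item);
                }
            }
        }
        finally
        {
            // Safe to close now: nothing returned depends on the searcher.
            searcher.Close();
        }
        return items;
    }
}
```

The calling code would then loop over the returned list and read newsItem["Title"] directly, without touching Hits at all. The null check is a precaution for index entries whose item can no longer be resolved.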

 

Wow! You just made your first index-based constrained list! And it didn’t even take more than 3 hours. You can now goof off for a while and then tell your project manager that you’re done. :)

 

Read the third part of the article here.

 


  • About the author:

    Jens Mikkelsen

    Jens Mikkelsen is a partner at Inmento Solutions, a Sitecore consulting firm. He works as a Sitecore specialist and consultant, helping clients architect and build quality Sitecore solutions using the newest modules and tools.

    Further, he has been deeply involved in various complex solutions and has built up a strong knowledge of Sitecore architecture and best practices. He specializes in debugging and analyzing Sitecore solutions.

     

    Jens is very interested in the technical mechanisms in the new marketing products such as Sitecore DMS and Sitecore ECM.

    My Sitecore Freelance CV

27 responses to "Lucene walk through – Part 2: The example"

I'm completely new to Sitecore and I don't understand how the newslist template is being used in this example. Why use a template for this? Couldn't you just have a sublayout collect the search parameters from the user and then use them to generate the results? I'm having some difficulty seeing this from the Sitecore perspective.

Great article BTW; one of the best on Sitecore search that I have come across.
Posted: Friday, July 10, 2009 10:59 PM
Hi Scott, The news template is used, as it might have separate fields from other templates. I have limited the index to only contain items of this template, as we don't want any other document to appear in the list I have created. But yes, you could omit the news template part if you wanted this to work for all documents. Please let me know if you have any other questions.
Posted: Monday, July 13, 2009 8:42 PM
Thank you, Jens, for this helpful article. I noticed in your example that the instance of IndexSearcher is not being closed after it is used. Doesn't this present a problem when querying an index that is updated frequently?
Posted: Monday, July 13, 2009 11:25 PM
Hi Eric, You are absolutely right! There should be a final statement closing the IndexSearcher. Thanks for pointing that out. I will add it to the article.
Posted: Tuesday, July 14, 2009 12:00 AM
Thanks Jens.
Posted: Tuesday, July 14, 2009 12:11 AM
I am returning the hits object from an external class. I get an "object reference not set to an instance of an object" error in the calling method when retrieving the hits document in the iterator, e.g. hits.Doc(i)
Posted: Thursday, October 22, 2009 4:21 PM
Figured out that you can't close the IndexSearcher in the finally block, because the returned Hits will not work. Instead I created a void method that closes the IndexSearcher (which is a public member of the NewsRetriever class), which I then call from the calling method
Posted: Thursday, October 22, 2009 4:36 PM
You're right Nick. What I normally do is parse the hits, get the values I need and return them in a class of their own.

In that way I can close the searcher right away, which is good, as you might forget to close it later on.

Thanks for the "correction".

Cheers
Jens
Posted: Thursday, October 22, 2009 5:13 PM
Using your steps I have been able to successfully get a RangeQuery up and running but can not figure out how to put the Sort into it. I see the code where you define the sort but then where do you put it after it is defined?

Thank you
Bill
Posted: Monday, January 11, 2010 7:57 PM
Hi Bill,

I can see that I need a parameter in the Search method. You can pass it to IndexSearcher.Search(Query, Sort). That should pass the sort in.

I will update the article.

Cheers
Jens
Posted: Monday, January 11, 2010 8:47 PM
I have my sort defined:

Sort sort = new Sort("ArticleDateTime", true);

Then I have added it like this:

hits = searcher.Search(queryString, sort);

When I run the code with the sort I get this error:

Exception Details: System.NullReferenceException: Object reference not set to an instance of an object.

With out the sort it runs fine but the records returned are not sorted.

Any idea why?

Thank you

Bill
Posted: Monday, January 11, 2010 9:07 PM
Do you have a callstack?
Posted: Monday, January 11, 2010 9:13 PM
Ok you lost me there, what do you mean by callstack?
Posted: Monday, January 11, 2010 9:16 PM
When you get the error, there is a list of methods called just before the error. It comes in a yellow box under the exception, when you hit the page, which throws the error.

Cheers
Jens
Posted: Monday, January 11, 2010 9:19 PM
Sorry, thought you were referring to something within the code. Here is the error message from the .NET page.

Server Error in '/' Application.
--------------------------------------------------------------------------------

Object reference not set to an instance of an object.
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

Exception Details: System.NullReferenceException: Object reference not set to an instance of an object.

Source Error:


Line 303: // get a document referrer..
Line 304: Document document = hits.Doc(i);
Line 305: // .. so we can get the id..
Line 306: string itemID = document.Get("_docID");
Line 307: // .. so we can get a pointer to the item in itself..


Source File: C:\Inetpub\wwwroot\Sosland\Website\map\layouts\Browser Layout.aspx.cs Line: 305

Stack Trace:


[NullReferenceException: Object reference not set to an instance of an object.]
Sosland.map.layouts.Browser_Layout.GetTodaysNews() in C:\Inetpub\wwwroot\Sosland\Website\map\layouts\Browser Layout.aspx.cs:305
Sosland.map.layouts.Browser_Layout.Page_Load(Object sender, EventArgs e) in C:\Inetpub\wwwroot\Sosland\Website\map\layouts\Browser Layout.aspx.cs:63
System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr fp, Object o, Object t, EventArgs e) +14
System.Web.Util.CalliEventHandlerDelegateProxy.Callback(Object sender, EventArgs e) +35
System.Web.UI.Control.OnLoad(EventArgs e) +99
System.Web.UI.Control.LoadRecursive() +50
System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) +627




--------------------------------------------------------------------------------
Version Information: Microsoft .NET Framework Version:2.0.50727.3603; ASP.NET Version:2.0.50727.3082
Posted: Monday, January 11, 2010 9:24 PM
Hmmm. I don't think that has something to do directly with the sort. I think you have some "bad" records in the index, and when you sort all records the "bad" ones get to the top.

You can try and debug it with Visual Studio to see what is actually a null reference or you can try and use the indexviewer, to see what is in the index.
Posted: Monday, January 11, 2010 9:49 PM
When sorting by a datetime field is there anything else that has to be done. I ran a query pulling all articles out to a sql database and ran queries on them and all articles seem ok. So then I started doing some test. If I sort by id it runs, if I sort by title it runs, if I sort by headline it runs. When I try to sort by ArticleDateTime, Created, Updated, Published I get the same exact error every time. Do I need to do some type of conversion some place for the sort on datetime to work?
Posted: Monday, January 11, 2010 11:49 PM
Hi Bill,

Have you looked through the records using the IndexViewer? I would definitely recommend that.

How are the data fields indexed? As integers (e.g. 20090107)? Then you can try the following:

SortField sortField = new SortField("FieldName", SortField.INT, false);
Sort sort = new Sort(sortField);

It might be that the type of field is wrong. But I would definitely just look at the IndexViewer to see the field.
Posted: Tuesday, January 12, 2010 10:44 AM
Jens,

Wanted to thank you for all your help on this. With your help I finally got it working. I have a quick question: is there an interval or time frame for the indexer? If so, what is it set to and can it be changed? Here is what we are noticing. We create an article and then publish it. Sometimes it takes a minute or so for it to appear in the search, and sometimes 5 minutes or so. I didn't know if that is normal or how it actually works to keep the index updated with new items. We are hoping it can somehow be changed to always be about a minute and not the 5+ minutes it sometimes takes.

Thank you
Bill
Posted: Thursday, January 14, 2010 5:43 PM
Hi Bill,

Good to hear you got it working and glad that I could help. The index is by default updated every 5 minutes. You can set the interval in the web.config with the setting Indexing.UpdateInterval.
However, you can set it up so that it updates right away when the history engine is updated. You can follow this guide: http://sdn.sitecore.net/Scrapbook/Lucene%20in%20staged%20environments.aspx

Hope it helps.

Cheers
Posted: Thursday, January 14, 2010 10:24 PM
Jens,

Thank you for the info. The version we are running is 6.0.2 and we are having all kinds of problems with the index staying up to date. Sometimes I have to delete the indexes and reindex, and sometimes I just have to reindex. For some reason the index is not keeping up with the amount of items published, and content gets missed until I reindex again. Should we upgrade to 6.1 to fix this issue, or is there a patch for it?

Thanks again
Bill
Posted: Thursday, January 14, 2010 11:13 PM
Hi Jens,

Great article! I am looking for the faceted navigation and thinking of using it for a product selector tool. Do you have some live site examples where this has worked that I can view so see if this fits what I am looking for.

Thanks for your help.

Shuchi
Posted: Wednesday, August 10, 2011 6:26 PM
Hi Shuchi,

I haven't tried to build a faceted navigation with Lucene in Sitecore. I have heard some people recommend that you use Solr (a search server built on top of Lucene) for it. However, at Dreamcore Europe this year I talked to a guy who described how he had built faceted navigation using Lucene only. I am afraid I can't remember the details of it.

Cheers
Jens
Posted: Thursday, August 11, 2011 9:42 AM
Hi Jens Mikkelsen,

I am new to Sitecore and facing an issue related to sorting. I am using Lucene Search.

Please help me, I need to resolve this ASAP.


Problem: sorting the data (result) when it comprises different language versions (English, German, Chinese, Japanese etc.)

My result is bound to a repeater control and consists of different language versions, and my job is to sort the data alphabetically. For single-language content sorting works fine, but with different versions I do not get the expected output: the data is mixed (some English items, then Chinese items, then English items again).

I need to resolve this issue ASAP. Please help me in this regard. Thanks in advance.

Posted: Tuesday, July 10, 2012 5:40 PM
How do I search content in other language versions? For example, I need to search with German text as input and get the items that contain the text.
Posted: Tuesday, May 05, 2015 8:39 AM
It is really helpful for understanding Lucene queries.

I am very thankful to Jens Mikkelsen.
Posted: Thursday, March 09, 2017 8:30 AM
