Level: 200
 

Lucene walk-through – Part 3: Setting up the index


In this last part of the three-part article series you will learn how to set up the index in web.config and how to build a custom indexer, so you can index values that the default implementation does not support.


You can read the other parts here:


Part 1: The query types
Part 2: The example 

Written by: Jens Mikkelsen
Fri, Jun 26 2009

The index definition

 

All indexes are defined in the web.config under /configuration/sitecore/indexes. Here you can see the existing index definitions, such as system. We now want to add a separate index for the news. As we don’t want items based on any other templates, we will put a constraint on the index so that it only indexes items based on our news template.


Furthermore, we want to index the path of the item, so we can query on that field. As Sitecore doesn’t support this by default, we will implement our own indexer.


An index is defined by an index element which looks something like this:

 

<index id="newsIndex" singleInstance="true" type="Examples.LuceneQuery.CustomPathIndexer, Examples">

 

The id defines the name of the index. In this case we call it “newsIndex”. The type specifies the indexer to use by giving a fully qualified class name and an assembly name. Normally this is set to Sitecore’s indexer: Sitecore.Data.Indexing.Index, Sitecore.Kernel, but as we need a custom indexer it has been replaced with the class described later.


Under the index element you need to specify a parameter for the index. It passes the id of the index to the indexer as its name. You don’t need to worry about this too much, as it will probably be the same for all your index definitions:

<param desc="name">$(id)</param>

 

Under this parameter you can specify a filter. In this case we only want items based on the news template, so we specify it in a templates element:

 

<templates hint="list:AddTemplate">
  <template>{B2612CF6-16D6-426D-9E74-EE3A4E3989B2}</template>
</templates>

 

The id points to the news template.


Under the templates element you can specify which fields the index should hold:

 

<fields hint="raw:AddField">
  <field target="category">Categories</field>
  <field>path</field>
  <field target="content">Content</field>
  <field target="date">__updated</field>
</fields>

 

Here each field is created in the index. The text of each field element names the Sitecore field the data is pulled from, and the target attribute names the index field it is written to. In this example you will for instance get a category field in the index, which is filled from the Sitecore field Categories. If the Categories field isn’t present on the given item, the index field will just be empty.


The path field doesn’t have a target, as this is filled in by our custom indexer.


This gives the following definition:

 

<index id="newsIndex" singleInstance="true" type="Examples.LuceneQuery.CustomPathIndexer, Examples">
  <param desc="name">$(id)</param>
  <templates hint="list:AddTemplate">
    <template>{B2612CF6-16D6-426D-9E74-EE3A4E3989B2}</template>
  </templates>
  <fields hint="raw:AddField">
    <field target="category">Categories</field>
    <field>path</field>
    <field target="content">Content</field>
    <field target="date">__updated</field>
  </fields>
</index>


Now you need to add the index to the database it should be based on. In this case we want it to operate on the web database, as we only want published content. We add the index under the /configuration/sitecore/databases/database element in the web.config. So, as a child of this element:

 

<database id="web" singleInstance="true" type="Sitecore.Data.Database, Sitecore.Kernel">

 

Add the following element: 

 

<indexes hint="list:AddIndex">
  <index path="indexes/index[@id='newsIndex']" />
</indexes>

  

You can add the element to all databases if you like.
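
Put together, the web database definition might look like the sketch below. Only the indexes child comes from this article; the comment stands in for whatever child elements your web database definition already contains, which should be left untouched. If your configuration already has an indexes element under the database (an assumption, this depends on your web.config), add only the index line to it rather than a second indexes element.

```xml
<database id="web" singleInstance="true" type="Sitecore.Data.Database, Sitecore.Kernel">
  <!-- existing child elements of the web database stay as they are -->
  <indexes hint="list:AddIndex">
    <index path="indexes/index[@id='newsIndex']" />
  </indexes>
</database>
```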

 

Now you’re done with the config changes and can move on to coding the indexer.


 

The custom indexer

To build a custom indexer that adds a special field, create a class that inherits from Sitecore.Data.Indexing.Index. There you can override the AddFields method to support your changes. The AddFields method is called for each item that is being indexed.


In our case we want to add the path field in addition to all the normal fields. This gives the following class:

 

using Lucene.Net.Documents;

namespace Examples.LuceneQuery
{
  public class CustomPathIndexer : Sitecore.Data.Indexing.Index
  {
    // Pass the index name on to the base constructor
    public CustomPathIndexer(string name) : base(name) { }

    protected override void AddFields(Sitecore.Data.Items.Item item, Lucene.Net.Documents.Document document)
    {
      // Call the base to add all fields normally
      base.AddFields(item, document);

      // Now we want to add the path.
      // First define the field by specifying the field name, the path,
      // whether the value should be stored in Lucene for output, and the type of the index mechanism.
      // Field.Index.TOKENIZED runs the value through the analyzer; the reader comments
      // below discuss Field.Index.UN_TOKENIZED as an alternative.
      Field pathField = new Field("path", item.Paths.Path, Field.Store.YES, Field.Index.TOKENIZED);

      // Then we add the field to the document
      document.Add(pathField);
    }
  }
}

 

It is as simple as that. All we do is implement the constructor to call the base, and override the AddFields method to handle our path field.
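
Readers in the comments below suggest two variations: indexing the path untokenized, and indexing the ID path (item.Paths.LongID). A minimal sketch of both follows, assuming the same Lucene.Net Field API as in the class above. The class name UntokenizedPathIndexer and the index field name "idpath" are hypothetical and only used for illustration.

```csharp
using Lucene.Net.Documents;
using Sitecore.Data.Items;

namespace Examples.LuceneQuery
{
  // Hypothetical variant of CustomPathIndexer; the class name and the
  // "idpath" field name are made up for this sketch.
  public class UntokenizedPathIndexer : Sitecore.Data.Indexing.Index
  {
    // Pass the index name on to the base constructor
    public UntokenizedPathIndexer(string name) : base(name) { }

    protected override void AddFields(Item item, Document document)
    {
      // Add all the normally configured fields first
      base.AddFields(item, document);

      // UN_TOKENIZED indexes the whole path as a single term
      // instead of running it through the analyzer.
      document.Add(new Field("path", item.Paths.Path, Field.Store.YES, Field.Index.UN_TOKENIZED));

      // The ID path is not affected by white space in item names.
      document.Add(new Field("idpath", item.Paths.LongID, Field.Store.YES, Field.Index.UN_TOKENIZED));
    }
  }
}
```

To try it, the type attribute of the index element would have to point at this class instead. Keep in mind that an untokenized value is matched exactly as it was stored, so the query has to use the same casing as the indexed path.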

You’re now done and can rebuild the search index from the control panel in Sitecore. You can view the index in IndexViewer to ensure that all fields are indexed correctly.

 

 

 


  • About the author:

    Jens Mikkelsen

    Jens Mikkelsen is a partner at Inmento Solutions, a Sitecore consulting firm. He works as a Sitecore specialist and consultant, helping clients architect and build quality Sitecore solutions using the newest modules and tools.

    Furthermore, he has been deeply involved in various complex solutions and has built up a strong knowledge of Sitecore architecture and best practices. He has especially focused on, and is specialized in, debugging and analyzing Sitecore solutions.

     

    Jens is very interested in the technical mechanisms in the new marketing products such as Sitecore DMS and Sitecore ECM.


14 responses to "Lucene walk-through – Part 3: Setting up the index"

Jens, is there a way to tell the index to only index certain parts of your content tree?
Posted: Thursday, July 30, 2009 7:25 PM
Out of the box it is only possible to restrict the index to specific templates. However, in your custom indexer (also used for indexing the path) you can override the two UpdateItem methods. There you can stop the indexing by not calling base.UpdateItem if the item isn't in the part of the content tree you want to index.
Posted: Friday, July 31, 2009 12:05 AM
Jens, I can't quite seem to get this to work. What is the purpose of the Custom Indexer? Do I have to create a Custom Indexer? I have set up my index and I have used your Index Viewer and everything is there. However when I run queries in the Index Viewer the only ones that work are if I use QueryParser type of query. All of the other types (Term, Wildcard, etc.) don't return any results. I have a field that is a Droplink - so it contains IDs. I am trying to search on that field but the Term doesn't seem to work. Any thoughts?
Posted: Tuesday, May 18, 2010 1:29 PM
Do you mind sharing the source code from this sample.
Posted: Saturday, May 22, 2010 1:23 AM
You might want to store the path field untokenized in order to be able to use wildcard or prefix queries the way you specify:

Field pathField = new Field("path", item.Paths.Path, Field.Store.YES, Field.Index.UN_TOKENIZED);

Perhaps that answers Corey's question, given that Wildcard and Prefix queries are not passed through an Analyzer, so the search term is not tokenized before the search is performed.

That aside - excellent write-up, Jens.
Posted: Thursday, June 17, 2010 11:43 AM
Would it be an idea to use the ID path instead (item.Paths.LongID)? That way you won't have to use wildcard/prefix queries to determine if an item resides under a particular item and you avoid problems with names containing white space.
Posted: Monday, July 26, 2010 5:33 PM
Hi Kern,

Yes that would be the best way to do it and that is actually the way we do it in Pentia.

Thanks!

Cheers
Jens
Posted: Monday, July 26, 2010 6:42 PM
Legend!
Posted: Friday, September 10, 2010 6:28 AM
Ben
Hi Jens,

Thanks for the article, was helpful. Any chance of updating it with the correction that Anders suggested? I lost a couple of hours with the same issue that Corey had before I eventually read the comments.

Cheers, Ben
Posted: Thursday, September 23, 2010 11:28 PM
Hi Jens.

First let me say that your site is "first read" for any new (and some old ones too :) Sitecore developers here at Magnetix. Keep up the good work and please conclude the "Simple information site in Sitecore" series ASAP.

Now for my input to this article. It seems that in the latest releases of Sitecore no History Engine is configured for the web database. This means that indexes are not updated when publishing. So unless you want to update the index for the web database manually, I suggest you copy the Engines.HistoryEngine.Storage node from the master database configuration.

/Lars
Posted: Monday, November 08, 2010 9:39 AM
Hi Lars,

Thanks for the kind words. :) I will try to intensify the work on the Simple information site series.

The approach with the history engine is correct if you need (more or less) instantaneous updates of the indexes after publishing. However, this can cause a lot of work dedicated to updating/optimizing indexes if the solution is publishing all the time.
So we use this approach if the solution requires instant updates; otherwise you can do it with the scheduled update set in the web.config.

Thanks for your input!

Cheers
Jens
Posted: Tuesday, November 09, 2010 10:19 PM
NXM
Jen,

If we have a publishing target set for a CD database, call it Pub, then when we rebuild the search index for the Pub database on CMS, how will we push/force the indexes to be created on the CD web-server and not on CMS web-server?

I can create the indexes just fine for the web and pub database but not sure on how to push the index from data folder to CD data folder.

thanks.
Posted: Monday, March 28, 2011 7:55 PM
Hi NXM,

You have to use the HistoryEngine on the content delivery environment. It updates the indexes according to changes, but isn't enabled by default. You can check out how to set it up here: http://sdn.sitecore.net/Scrapbook/Lucene%20in%20staged%20environments.aspx

Cheers
Jens Mikkelsen
Posted: Monday, March 28, 2011 8:28 PM
Is there any way to limit an index to only index one language? I want to create separate indexes for English and Spanish.

The reason I want to do this is because when I search for a term, it searches in both the english and spanish version. e.g. if I do a wildcardsearch for "a*" then I get words that begin with a in english or spanish. I want to prevent this from happening.
Posted: Thursday, March 01, 2012 6:14 PM
