Level: 200
 

Lucene walk-through – Part 3: Setting up the index

Searching

In this last part of the three-part article series you will learn how to set up the index in web.config and how to build a custom indexer, so you can index values not supported by the default implementation.


You can read the other parts here:


Part 1: The query types
Part 2: The example 

Written by: Jens Mikkelsen
Fri, Jun 26 2009

The index definition

 

All indexes are defined in the web.config under /configuration/sitecore/indexes. Here you can see the existing index definitions such as system. We now want to add a separate index for the news. As we don’t want any other templates, we will put a constraint on the index, so it only indexes items based on our news template.


Furthermore, we want to index the path of the item, so we can query on that field. As Sitecore doesn’t support this by default, we will implement our own indexer.


An index is defined by an index element which looks something like this:

 

<index id="newsIndex" singleInstance="true" type="Examples.LuceneQuery.CustomPathIndexer, Examples">

 

The id defines the name of the index. In this case we call it “newsIndex”. The type specifies the indexer to use by giving a fully qualified class name and an assembly name. Normally this is set to Sitecore’s indexer, Sitecore.Data.Indexing.Index, Sitecore.Kernel, but as we need a custom indexer it has been replaced with the class described later.


Under the index element you need to specify a parameter for the index. This is more or less always the id of the index, which is passed to the indexer. You don’t need to worry about this too much, as it is probably going to be the same for all your index definitions:

<param desc="name">$(id)</param>

 

Under this parameter you can specify a filter. In this case we only want items based on the news template, so we specify it in a templates element:

 

<templates hint="list:AddTemplate">
  <template>{B2612CF6-16D6-426D-9E74-EE3A4E3989B2}</template>
</templates>

 

The id points to the news template.


Under the templates element you can specify which fields the index should hold:

 

<fields hint="raw:AddField">
  <field target="category">Categories</field>
  <field>path</field>
  <field target="content">Content</field>
  <field target="date">__updated</field>
</fields>

 

Each field element creates a field in the index. The target attribute names the field in the index, and the text of the element names the Sitecore field it pulls its data from. In this example you will, for instance, get a category field in the index, which is filled from the Sitecore field Categories. If the Categories field isn’t present on the given item, the index field will just be empty.


The path field doesn’t have a target, as this is filled in by our custom indexer.


This gives the following definition:

 

<index id="newsIndex" singleInstance="true" type="Examples.LuceneQuery.CustomPathIndexer, Examples">
  <param desc="name">$(id)</param>
  <templates hint="list:AddTemplate">
    <template>{B2612CF6-16D6-426D-9E74-EE3A4E3989B2}</template>
  </templates>
  <fields hint="raw:AddField">
    <field target="category">Categories</field>
    <field>path</field>
    <field target="content">Content</field>
    <field target="date">__updated</field>
  </fields>
</index>


Now you need to add the index to the database it should be based upon. In this case we want it to operate on the web database, as we only want published content. Indexes are added under the /configuration/sitecore/databases/database element in the web.config. So, as a child of this element:

 

<database id="web" singleInstance="true" type="Sitecore.Data.Database, Sitecore.Kernel">

 

Add the following element: 

 

<indexes hint="list:AddIndex">
  <index path="indexes/index[@id='newsIndex']" />
</indexes>

  

You can add the element to all databases if you like.

 

Now you’re done with the config changes and can move on to coding the indexer.


 

The custom indexer

When building a custom indexer that adds a special field, you create a class that inherits from the Sitecore.Data.Indexing.Index class. Here you can override the AddFields method to support your changes. The AddFields method is called for each item that is being indexed.


In our case we want to add the path field in addition to all the normal fields. This gives the following class:

 

using Lucene.Net.Documents;

// The namespace matches the type attribute in the index definition
namespace Examples.LuceneQuery
{
  public class CustomPathIndexer : Sitecore.Data.Indexing.Index
  {
    // Pass the index name on to the base constructor
    public CustomPathIndexer(string name) : base(name) { }

    protected override void AddFields(Sitecore.Data.Items.Item item, Document document)
    {
      // Call the base to add all fields normally
      base.AddFields(item, document);

      // Now we want to add the path.
      // First define the field by specifying the field name, the path value,
      // whether the value should be stored in Lucene for output, and how it is indexed
      Field pathField = new Field("path", item.Paths.Path, Field.Store.YES, Field.Index.TOKENIZED);

      // Then we add the field to the document
      document.Add(pathField);
    }
  }
}

 

It is as simple as that. All we do is implement the constructor to call the base, and override the AddFields method to add our path field.
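Field.Index.TOKENIZED runs the path through the analyzer, so it is split into separate terms in the index. Wildcard and prefix queries built in code are not analyzed, so they may not match a tokenized path the way you expect. As a minimal sketch of a variant of the indexer above, the path can be indexed as one single term with Field.Index.UN_TOKENIZED. The class name UntokenizedPathIndexer is hypothetical and only used for illustration, and the sketch assumes the same Sitecore and Lucene.Net versions as the class above:

```csharp
using Lucene.Net.Documents;
using Sitecore.Data.Items;

namespace Examples.LuceneQuery
{
  // Hypothetical variant of CustomPathIndexer; the class name is an assumption
  public class UntokenizedPathIndexer : Sitecore.Data.Indexing.Index
  {
    // Pass the index name on to the base constructor
    public UntokenizedPathIndexer(string name) : base(name) { }

    protected override void AddFields(Item item, Document document)
    {
      // Add all the normal fields first
      base.AddFields(item, document);

      // UN_TOKENIZED skips the analyzer, so the whole path is
      // indexed as one term, exactly as it is written
      Field pathField = new Field("path", item.Paths.Path, Field.Store.YES, Field.Index.UN_TOKENIZED);
      document.Add(pathField);
    }
  }
}
```

To try it, the type attribute of the index definition would have to point to this class instead, and the index must be rebuilt. Keep in mind that an untokenized value is indexed verbatim, so a prefix or wildcard query on the path has to match it exactly, including casing.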

You’re now done and can rebuild the search index from the control panel in Sitecore. You can view the index in IndexViewer to ensure that all fields are indexed correctly.

 

 

 


About the author: Jens Mikkelsen

Jens Mikkelsen is currently employed by Pentia A/S, a Sitecore consulting firm, where he holds a position in Pentia’s core team as Core Technology Specialist. The position covers many different roles, from architect to lead developer to consultant. The core team is responsible for supporting all departments in Pentia, including the project, sales and module departments. The tasks include everything from code reviews and architecture to development methodology and technical presales.

He has also been deeply involved in various complex solutions and has built up a strong knowledge of Sitecore architecture and development. He has focused especially on, and specializes in, debugging and analyzing Sitecore solutions.

7 responses to "Lucene walk-through – Part 3: Setting up the index"

Jens, is there a way to tell the index to only index certain parts of your content tree?
Posted: Thursday, July 30, 2009 7:25 PM
Out of the box, it is only possible to restrict the index to specific templates. However, in your custom indexer (the one also used for indexing the path) you can override the two UpdateItem methods. There you can stop the indexing by not calling base.UpdateItem if the item isn't in the part of the content tree you want to index.
Posted: Friday, July 31, 2009 12:05 AM
Jens, I can't quite seem to get this to work. What is the purpose of the Custom Indexer? Do I have to create a Custom Indexer? I have set up my index and I have used your Index Viewer and everything is there. However when I run queries in the Index Viewer the only ones that work are if I use QueryParser type of query. All of the other types (Term, Wildcard, etc.) don't return any results. I have a field that is a Droplink - so it contains IDs. I am trying to search on that field but the Term doesn't seem to work. Any thoughts?
Posted: Tuesday, May 18, 2010 1:29 PM
Do you mind sharing the source code from this sample.
Posted: Saturday, May 22, 2010 1:23 AM
You might want to store the path field untokenized in order to be able to use wildcard or prefix queries the way you specify:

Field pathField = new Field("path", item.Paths.Path, Field.Store.YES, Field.Index.UN_TOKENIZED);

Perhaps that answers Corey's question, since Wildcard and Prefix queries are not passed through an Analyzer, so the search term is not tokenized before the search is performed.

That aside - excellent write-up, Jens.
Posted: Thursday, June 17, 2010 11:43 AM
Would it be an idea to use the ID path instead (item.Paths.LongID)? That way you won't have to use wildcard/prefix queries to determine if an item resides under a particular item and you avoid problems with names containing white space.
Posted: Monday, July 26, 2010 5:33 PM
Hi Kern,

Yes that would be the best way to do it and that is actually the way we do it in Pentia.

Thanks!

Cheers
Jens
Posted: Monday, July 26, 2010 6:42 PM
