Thursday, March 12, 2009

Undocumented dtSearch Synopsis Trick

I've worked with dtSearch on earlier projects and have used other search engines since then. I'm currently working on a project that uses dtSearch as the primary search engine. A search result always needs a synopsis, and the dtSearch module actually provides an XSLT helper for that.

If you just want to know how to do it, follow these quick steps:
  1. Create an index in dtSearch using the Create Index (advanced) option.
  2. In the Create Index (advanced) window, make sure to check "Cache document text in index" and/or "Cache documents in index". This tells the dtSearch indexer to keep a copy of the content that's being indexed.
  3. Add a Sitecore setting in web.config named "dtSearchSynopsis" with the value "true".
  4. Update your XSLT Rendering to get the content in the search result set that comes back from the Search function.

Now, if you want a more detailed explanation, read on.

The dtSearch XSLT Helper's GetSynopsisNavigator() method requires:

  • pageUrl
  • search term
  • css highlighter style name
  • length of the synopsis

The method uses the page URL to access the page and extract its content. It then truncates the content while making sure that the search term stays in the synopsis. Finally, it looks for the search term, wraps it in a tag with the specified CSS class name, and returns the string. Of course, it also does some other things, such as removing scripts, tags, etc., to give you a nice clean synopsis.

Well, sometimes it's clean. The problem with this approach is that every result gets almost the same synopsis when the search term appears in the top part of the page, such as your navigation. For instance, say your site has a ubiquitous utility bar with "Contact Us", so that phrase exists on every page. If you search for the word "Contact", dtSearch will return basically every page in the result set. The synopsis that GetSynopsisNavigator() produces for each one will look very odd, because it will always include the utility bar content even though those parts of the site are really skeletal pieces that should not be searched.

So, you might think that it's an issue with the indexer. Well, that's partly true. To let dtSearch index only particular parts of the page, you need to use the comment markers that are documented in SDN5 and at dtSearch's site. You basically put the comment markers in the appropriate places, and dtSearch will only index, say, the main body content of the page and not the footer text, headers, callouts, and other supporting/related content.

At this point, we've only made sure that dtSearch indexes the appropriate content so that it doesn't return "ALL" the pages. The next part is to get the synopsis corrected. I found this out by accident while reading through the intermediate language (IL) for the dtSearch module to figure out how it works. I noticed that it's actually possible to generate a synopsis as part of the result set returned by the module's XSLT Helper Search() function.

To do so, you'll need to add an undocumented Sitecore setting for dtSearch named "dtSearchSynopsis" with the value "true". This tells the dtSearch module's XSLT Helper to add a tag in the result set that comes back from the Search() function. The synopsis automatically adds "<b>" around the search term. It looks pretty good, and no words get truncated by a character-count limit.
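As a rough sketch, the setting would go with the other Sitecore settings in web.config. The setting name and value are the ones given above. The surrounding elements follow the usual Sitecore settings layout and are only an assumption about your file, so add the line to your existing settings section rather than pasting the whole fragment.

```xml
<configuration>
  <sitecore>
    <settings>
      <!-- Undocumented dtSearch module setting: asks the XSLT Helper's
           Search() function to include a synopsis in the result set. -->
      <setting name="dtSearchSynopsis" value="true" />
    </settings>
  </sitecore>
</configuration>
```

Remember that the index also needs to be built with the text caching options checked, as in the quick steps at the top.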

Sometimes you might get some odd results, so you just need to fine-tune where you place your comment markers. Make sure that the indexer can still see the scID and scPath meta tags, since the dtSearch module uses them to find items.

Well, that's it. I hope that helps.