Enhancing Search Using Lucene (symposiumna.s3.amazonaws.com/2012/Developer-Enhancing Searc…)

Amsterdam Las Vegas Melbourne

Enhancing Search Using Lucene

Las Vegas, October 22nd, 2012

Scott Rogers, Sean MacLean

CX Interactive

Introductions

• Scott Rogers – Director, Application Development

• Sean MacLean – Senior Application Developer

• CX Interactive – Digital agency specializing in web-based solutions

• I cannot do anything about my Canadian accent, eh!

Business Problem

• Need to allow searching over the full text of documents (in a variety of formats)

• Need to allow filtering on metadata (data about the data)

• Replacing an older system with poor data: garbage in, garbage out

User Requirements

• Capital markets firms

– Investment dealer (includes investment banking, sales & trading, advisory and product management)

– Publish multiple documents daily

– Documents may be relevant on the day they are published (e.g. date-based market reports) or later (found through text or metadata searching)

Technology Problem

• Many document formats are not fully indexed by ‘out of the box’ Lucene

• Metadata can be made searchable; making it filterable is harder

• Needed to connect documents to their Sitecore content items

• How do we build a system that doesn’t require code modifications as content and metadata are added, changed or removed?

Design Overview

Design Ideals:

• Configurable

• Flexible

• Extensible

• Puppies, Rainbows and Unicorns

Solution Overview

• Content Browser – something to display items

• Search Manager – something to search items

• Configuration / Sublayouts – some way to configure

Solution Details: Content Browser

• Update Panels

• jQuery

• Dynamic Row Controls and ‘Lazy’ Loading

Solution Details: Search Manager

• Sitecore Search API (Lucene)

• Custom Indexer (Tika)

• Index Search Context

Solution Details: Configuration

• Sitecore Content Tree

• Templates

• Layouts and Devices

Search Manager Overview

Input: Query Text + IFilters

Output: Search Result Collection

Uses the Sitecore Search API (Lucene) and a custom file crawler to provide context-based search

IKVM-wrapped, Java-based Apache Tika parses the documents, which are then linked back to their content items
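The input/output contract above (query text plus IFilters in, a result collection out) can be sketched in Java as follows; the talk's own .NET code is not reproduced here. This is a minimal in-memory sketch: `IndexedDocument` and `FieldEqualsFilter` are hypothetical types, a plain case-insensitive substring match stands in for Lucene querying and scoring, and the method names are transliterated to Java's camelCase `search` / `getSearchResults`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical filter contract: decides whether an indexed document stays in the result set. */
interface IFilter {
    boolean accept(IndexedDocument doc);
}

/** Hypothetical stand-in for an index entry: owner item ID, searchable text, metadata fields. */
final class IndexedDocument {
    final String itemId;
    final String text;
    final Map<String, String> fields;

    IndexedDocument(String itemId, String text, Map<String, String> fields) {
        this.itemId = itemId;
        this.text = text;
        this.fields = fields;
    }
}

/** Keeps documents whose metadata field exactly equals the selected value. */
final class FieldEqualsFilter implements IFilter {
    private final String field;
    private final String value;

    FieldEqualsFilter(String field, String value) {
        this.field = field;
        this.value = value;
    }

    @Override
    public boolean accept(IndexedDocument doc) {
        return value.equals(doc.fields.get(field));
    }
}

/** In-memory sketch of the Search(query, filters) / GetSearchResults() pair; not Lucene. */
class SearchManager {
    private final List<IndexedDocument> index;
    private List<IndexedDocument> results = Collections.emptyList();

    SearchManager(List<IndexedDocument> index) {
        this.index = index;
    }

    /** Case-insensitive substring match on the text, then every filter must accept (logical AND). */
    void search(String query, List<IFilter> filters) {
        String needle = query.toLowerCase(Locale.ROOT);
        List<IndexedDocument> hits = new ArrayList<>();
        for (IndexedDocument doc : index) {
            if (!doc.text.toLowerCase(Locale.ROOT).contains(needle)) {
                continue;
            }
            boolean keep = true;
            for (IFilter filter : filters) {
                if (!filter.accept(doc)) {
                    keep = false;
                    break;
                }
            }
            if (keep) {
                hits.add(doc);
            }
        }
        results = hits;
    }

    /** Returns the results of the most recent search (empty before the first call). */
    List<IndexedDocument> getSearchResults() {
        return Collections.unmodifiableList(results);
    }
}
```

Leaving `SearchManager` non-final is one way to keep it extensible for complex data: a subclass can override `search` to add, for example, date-range handling without touching callers.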

Search Manager Overview

[Diagram: Sitecore content and PDFs, Cxi IndexCrawler, Lucene, Search Context, Search Manager; Query + IFilters in, SearchResults collection out]

Search Manager Details

• Search(String query, List<IFilter> filters)

• SearchResultCollection GetSearchResults()

• SearchManager class extensible for complex data

• CxiIndexCrawler dynamically parses file content and links it to its owner content item
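The crawler's key job, linking parsed file content back to the owning content item, can be sketched as below. This is a hypothetical in-memory version, not the CxiIndexCrawler itself: `ContentItem` and `OwnerLinkingCrawler` are illustrative names, the `extractText` function stands in for Tika, and a real crawler would write Lucene documents rather than a map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical content item: its ID, its own text and the IDs of files (e.g. PDFs) it owns. */
final class ContentItem {
    final String id;
    final String body;
    final List<String> fileIds;

    ContentItem(String id, String body, List<String> fileIds) {
        this.id = id;
        this.body = body;
        this.fileIds = fileIds;
    }
}

/**
 * Sketch of an owner-linking crawler: each file's extracted text is indexed under the
 * ID of the content item that owns it, so a hit inside a PDF resolves to that item.
 */
final class OwnerLinkingCrawler {
    // Stand-in for the document parser (Tika in the talk): file ID -> plain text.
    private final Function<String, String> extractText;

    OwnerLinkingCrawler(Function<String, String> extractText) {
        this.extractText = extractText;
    }

    /** Returns owner item ID -> text to index (item body followed by each owned file's text). */
    Map<String, String> crawl(List<ContentItem> items) {
        Map<String, String> entries = new HashMap<>();
        for (ContentItem item : items) {
            StringBuilder text = new StringBuilder(item.body);
            for (String fileId : item.fileIds) {
                text.append('\n').append(extractText.apply(fileId));
            }
            entries.put(item.id, text.toString());
        }
        return entries;
    }
}
```

Keying the entry by the owner item's ID means search results never point at a bare file; they always resolve to a content item the browser knows how to display.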

Content Browser Overview

Custom UserControl that extends UpdatePanel and is content- and filter-‘agnostic’

A large update panel, along with jQuery Ajax calls for the previews and slide-downs, links the components together

Allowed the main browser to point at a data source and add any number of facets without changes to the underlying code

Content Browser Overview

[Diagram: Browser (filters), Facets, Update Panel]

Content Browser Overview

• Search Results ListView

• Facets placeholder

• Pager control

• Options such as tabs using pre-assigned filters and sorted results
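The tab and pager options can be sketched as below, assuming a tab is simply a label plus a pre-assigned filter and a sort order applied to the browser's results. `Tab` and `Pager` are hypothetical names; the slides use a ListView and a pager control for this in ASP.NET.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical tab: a label plus a pre-assigned filter and a sort order for the results. */
final class Tab<T> {
    final String label;
    private final Predicate<T> preassignedFilter;
    private final Comparator<T> sortOrder;

    Tab(String label, Predicate<T> preassignedFilter, Comparator<T> sortOrder) {
        this.label = label;
        this.preassignedFilter = preassignedFilter;
        this.sortOrder = sortOrder;
    }

    /** Applies the tab's fixed filter, then its sort order, returning a new list. */
    List<T> apply(List<T> results) {
        return results.stream()
                .filter(preassignedFilter)
                .sorted(sortOrder)
                .collect(Collectors.toList());
    }
}

/** Pager arithmetic: 1-based page numbers, out-of-range requests clamped to the nearest page. */
final class Pager {
    private Pager() {
    }

    static int pageCount(int total, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (total + pageSize - 1) / pageSize; // ceiling division
    }

    static <T> List<T> page(List<T> results, int pageNumber, int pageSize) {
        int pages = Math.max(1, pageCount(results.size(), pageSize));
        int current = Math.min(Math.max(pageNumber, 1), pages);
        int from = (current - 1) * pageSize;
        int to = Math.min(from + pageSize, results.size());
        return results.subList(from, to);
    }
}
```

Filtering and sorting before paging keeps page boundaries stable: page 2 of a tab is always the second slice of that tab's sorted results.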

Content Browser Overview

• A ‘summary’ row user control injected for each result

• Content ‘slide-down’ via Ajax where view=content

• Mouse-over ‘pop-ups’ via Ajax where view=preview

• Href=LinkManager(item.ID) + “?view=”
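Building the `?view=` links takes slightly more than string concatenation when an item URL may already carry a query string or a fragment. A minimal sketch, assuming the item URL has already been produced (by LinkManager in the slides) and that `content` and `preview` are the only views; `ViewLinks.withView` is a hypothetical helper.

```java
import java.util.Set;

/** Builds view links for the Ajax slide-down (content) and mouse-over (preview) calls. */
final class ViewLinks {
    // The two views named on the slides; anything else is rejected.
    private static final Set<String> VIEWS = Set.of("content", "preview");

    private ViewLinks() {
    }

    /** Appends view=VIEW to an item URL, using '&' if a query string exists and keeping any #fragment last. */
    static String withView(String itemUrl, String view) {
        if (!VIEWS.contains(view)) {
            throw new IllegalArgumentException("unknown view: " + view);
        }
        int hash = itemUrl.indexOf('#');
        String base = hash >= 0 ? itemUrl.substring(0, hash) : itemUrl;
        String fragment = hash >= 0 ? itemUrl.substring(hash) : "";
        char separator = base.indexOf('?') >= 0 ? '&' : '?';
        return base + separator + "view=" + view + fragment;
    }
}
```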

Content Browser Overview

Content Browser Content

Ajax …?view=content

Content Browser Preview

Ajax …?view=preview

Content Browser Facets

• 1-N ‘Facets’ (sublayouts) can be added

• Facets use FindControl(“Browser”) and Browser.Filters.Add(this)

• Each facet operates within its own update panel and is a ‘stand-alone’ control
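The self-registration pattern (each facet finds the browser and adds itself to its filters) can be sketched as follows. `ControlTree`, `Browser`, `FacetFilter` and `FieldFacet` are hypothetical stand-ins for the ASP.NET control tree, the content browser and the IFilter facets; `onLoad` plays the part of the facet's load event, and `findControl` the part of FindControl("Browser").

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical filter contract for this sketch: does an item (field name -> value) pass? */
interface FacetFilter {
    boolean accept(Map<String, String> item);
}

/** Stand-in for the page's control tree; findControl mirrors FindControl("Browser"). */
final class ControlTree {
    private final Map<String, Object> controls = new HashMap<>();

    void add(String id, Object control) {
        controls.put(id, control);
    }

    Object findControl(String id) {
        return controls.get(id);
    }
}

/** The browser knows nothing about specific facets; it applies whatever registered itself. */
final class Browser {
    final List<FacetFilter> filters = new ArrayList<>();

    List<Map<String, String>> apply(List<Map<String, String>> items) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> item : items) {
            boolean pass = true;
            for (FacetFilter filter : filters) {
                if (!filter.accept(item)) {
                    pass = false;
                    break;
                }
            }
            if (pass) {
                kept.add(item);
            }
        }
        return kept;
    }
}

/** A stand-alone facet on one field; with nothing selected it lets every item through. */
final class FieldFacet implements FacetFilter {
    private final String field;
    private String selected; // null means no selection

    FieldFacet(String field) {
        this.field = field;
    }

    void select(String value) {
        selected = value;
    }

    /** Called when the facet loads: find the browser and add this facet to its filters. */
    void onLoad(ControlTree page) {
        Object control = page.findControl("Browser");
        if (control instanceof Browser) {
            ((Browser) control).filters.add(this);
        }
    }

    @Override
    public boolean accept(Map<String, String> item) {
        return selected == null || selected.equals(item.get(field));
    }
}
```

Because the browser only iterates over whatever registered, adding a facet means placing another sublayout on the page; the browser code does not change.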

Content Browser Facets

Configuration – Overview

• Add a Content Browser and point it at a collection of similar content items

• Add ‘Facet’ filters that map content item fields to ‘lookup’ items

• Map the ‘summary’, preview and content views
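A facet that maps a content item field to lookup items mostly comes down to reading the lookup IDs stored in that field. The sketch below assumes the field holds lookup item IDs separated by `|`, which is how Sitecore list fields such as Multilist store their values; `LookupFacet` and its methods are hypothetical, and IDs are compared case-insensitively by upper-casing them.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of a facet over one field whose raw value is a '|'-separated list of lookup item IDs. */
final class LookupFacet {
    private final String fieldName;

    LookupFacet(String fieldName) {
        this.fieldName = fieldName;
    }

    /** Splits a raw field value into upper-cased lookup IDs; empty for a null or blank value. */
    static List<String> lookupIds(String raw) {
        List<String> ids = new ArrayList<>();
        if (raw == null || raw.isBlank()) {
            return ids;
        }
        for (String part : raw.split("\\|")) {
            String id = part.trim();
            if (!id.isEmpty()) {
                ids.add(id.toUpperCase(Locale.ROOT));
            }
        }
        return ids;
    }

    /** True if the item's field references the selected lookup item. */
    boolean matches(Map<String, String> item, String selectedLookupId) {
        return lookupIds(item.get(fieldName)).contains(selectedLookupId.toUpperCase(Locale.ROOT));
    }

    /** Counts how many items reference each lookup item (each item counted once per ID). */
    Map<String, Integer> counts(List<Map<String, String>> items) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> item : items) {
            for (String id : new LinkedHashSet<>(lookupIds(item.get(fieldName)))) {
                counts.merge(id, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

The counts are what a facet would typically show next to each lookup option; the field name is the only per-facet setting, which keeps new facets a matter of configuration.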

Browser Configuration (Model View Controller)

• Model – Content (/templates/Publication): Default Redirect, Preview Device, Content Device

• Controller – Configuration (/…/PublicationBrowser): Browser, SummaryRow, Facet A, Facet B, Facet C

• View – Browser (/home/Publications.aspx): Browser, MultiList, DropDown, SingleLine, Textbox

“Model” - Device Layouts

• Needed to keep the browser content-agnostic

• Wanted content items to ‘display themselves’ upon request

• Assigned a standard markup container to the preview and content device layouts, which allows the mouse-over to make its Ajax call

• Summary list uses a dynamically loaded row or a custom .ascx
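The device-per-view idea can be illustrated with a small resolver that maps the `view` query parameter onto a device layout. Only `view=preview` and `view=content` come from the slides; the device names `Preview`, `Content` and `Default`, and the fallback to the full page when no recognised view is present, are assumptions for this sketch, and `DeviceResolver` is a hypothetical name.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Sketch: choose which device layout renders an item from the request's query string.
 * Device names are hypothetical; view=preview and view=content come from the slides.
 */
final class DeviceResolver {
    private static final Map<String, String> DEVICE_BY_VIEW =
            Map.of("preview", "Preview", "content", "Content");

    private DeviceResolver() {
    }

    /** Returns the device for the first recognised view=... parameter, else "Default" (the full page). */
    static String resolve(String queryString) {
        if (queryString == null) {
            return "Default";
        }
        String qs = queryString.startsWith("?") ? queryString.substring(1) : queryString;
        for (String pair : qs.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("view")) {
                String device = DEVICE_BY_VIEW.get(pair.substring(eq + 1).toLowerCase(Locale.ROOT));
                if (device != null) {
                    return device;
                }
            }
        }
        return "Default";
    }
}
```

Because the item chooses its own rendering from the request, the browser only has to know the URL convention, not the item's template.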

“Model” - Device Layouts

http://sitecorecxi/Publications.aspx?id={C4148482-...-204FE082D00F}

<a href="http://sitecorecxi/.../Publications/CoverageListResearch/2011/August/ABH_T_08112011.aspx?view=preview">

“Controller” - Browser Config

“Controller” - Facet Config

“View” – Public Page Config

Demo

Source Code

Key Classes

Solution Summary

• Started with a Shared Source module

• First iteration: added full-text search on PDFs

• Second iteration: added facets, IFilters and searching on multiple document types

• Extended configurability

Future Enhancements

• Reflection class loading technique

• Dynamic facet loading with sort order

• Default search using a custom Lucene ‘path’ term with sorting

• Integrate new Lucene APIs and pre-filter

• Configurable sortable fields
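The first two enhancements, reflection-based class loading and dynamic facet loading with a sort order, fit together: configuration names each facet class and its position, and a loader instantiates them in order. A minimal sketch using Java reflection (`Class.forName`, `asSubclass`, `getDeclaredConstructor().newInstance()`); in .NET the counterparts would be `Type.GetType` and `Activator.CreateInstance`. `Facet`, `FacetConfig`, `FacetLoader`, `SectorFacet` and `DateFacet` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical facet contract for this sketch. */
interface Facet {
    String label();
}

/** Two hypothetical facet implementations that configuration could name. */
final class SectorFacet implements Facet {
    @Override
    public String label() {
        return "Sector";
    }
}

final class DateFacet implements Facet {
    @Override
    public String label() {
        return "Date";
    }
}

/** One configuration entry: a fully qualified class name and a sort order. */
final class FacetConfig {
    final String className;
    final int sortOrder;

    FacetConfig(String className, int sortOrder) {
        this.className = className;
        this.sortOrder = sortOrder;
    }
}

/** Loads facets by class name via reflection, in ascending sort order. */
final class FacetLoader {
    private FacetLoader() {
    }

    static List<Facet> load(List<FacetConfig> configs) throws ReflectiveOperationException {
        List<FacetConfig> ordered = new ArrayList<>(configs);
        ordered.sort(Comparator.comparingInt((FacetConfig c) -> c.sortOrder));
        List<Facet> facets = new ArrayList<>();
        for (FacetConfig config : ordered) {
            // asSubclass throws ClassCastException if the named class does not implement Facet.
            Class<? extends Facet> type = Class.forName(config.className).asSubclass(Facet.class);
            facets.add(type.getDeclaredConstructor().newInstance());
        }
        return facets;
    }
}
```

With this shape, adding or reordering a facet is a configuration change; only a genuinely new kind of facet needs new code.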

Questions

Feedback Appreciated!

Please take a moment to provide session feedback via the mobile site.

http://www.sitecore.net/SymNA