Intro to Solr in Drupal
Intro to Solr
DrupalCon Portland
Andrew Riley
Director of Drupal Development
@andrewmriley
Agenda
•Search?
•Why Solr?
•Searching
•Behind the Scenes
Search?
What is Search?
Search (v): to go or look through (a place, area, etc.) carefully in order to find something missing or lost: I searched the desk for the letter.
Source: http://dictionary.reference.com/browse/search
@Mediacurrent
Why Users Search
•Navigation doesn't make sense
•It can be faster
•Lots of data
•Frequent data changes
•Might just be looking for something
Search Problems
•Search accuracy
•Too much data
•Slow response
•Wrong results
Why Solr?
History
Solr was initially created in 2004 as an in-house project for CNET. It was open sourced in 2006 and donated to the Apache Software Foundation.
Lucene
•Solr is a layer on top of Lucene
•Lucene is a library
•Solr stores files in Lucene format
Speed
Search speed is important!
Speed
Source: Web Performance Today http://j.mp/12h8wLZ
Speed
•Important!
•It scales well
•No database required
•Clustering & Sharding
•Netflix runs 1.2 million queries/day on 4 servers*
*http://wiki.apache.org/solr/SolrPerformanceData
Natural Results
•Stemming: Blogging vs. Blog
•Stop Word Removal: The
•Synonyms: Tissue vs Kleenex
•Highly Configurable
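To make the synonym bullet concrete, here is a minimal sketch in plain Python. It is not Solr's synonym filter or its configuration format; the `SYNONYMS` table and the `normalize` and `matches` helpers are hypothetical names used only for illustration.

```python
# Toy synonym table. The single entry mirrors the slide's example and is an
# assumption for illustration, not a real Solr synonyms file.
SYNONYMS = {"kleenex": "tissue"}


def normalize(term):
    """Lowercase a term and map it to its canonical synonym, if it has one."""
    term = term.lower()
    return SYNONYMS.get(term, term)


def matches(query, document_text):
    """Return True if any normalized query term appears in the document."""
    doc_terms = {normalize(t) for t in document_text.split()}
    return any(normalize(t) in doc_terms for t in query.split())
```

With this table, a search for "Kleenex" finds a document that only says "tissue", and the reverse also holds, because both sides are reduced to the same term before comparison.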
Drupal Search
•Not stemmed by default
•Queries the database
•Stores tokenized words in a single large table
•Much slower to index
VS
Searching
Ordering
•Score
•Comes from Lucene
•Not "out of 100"
•Bigger score first
More Info: http://lucene.apache.org/core/3_6_1/scoring.html
Example scores, biggest first: 201, 200, 199, 184
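The ordering rule can be sketched in a few lines of Python. The result titles are made up, and the numbers reuse the values from the slide on the assumption that they are scores. The point is only that results are sorted by raw score, which is not capped at 100.

```python
# Hypothetical hits. Titles are invented; the scores reuse the slide's numbers,
# assuming they represent Lucene scores.
hits = [
    {"title": "Page C", "score": 199.0},
    {"title": "Page A", "score": 201.0},
    {"title": "Page D", "score": 184.0},
    {"title": "Page B", "score": 200.0},
]


def rank(hits):
    """Order hits so that the biggest score comes first."""
    return sorted(hits, key=lambda hit: hit["score"], reverse=True)


ranked = rank(hits)
for hit in ranked:
    print(hit["title"], hit["score"])
```

Only the relative order matters here: a score of 201 says nothing by itself, it simply ranks above 200.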
Facets
•Users do the work
•Fixes too much data
•Native to Solr
•Requires the Facet API module
•Shopping Sites
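As a rough illustration of what a facet is, the sketch below counts field values across a result set and then narrows the results when the user picks one. It is plain Python over in-memory data, not the Facet API module or a Solr request; the documents, field names, and helper functions are all hypothetical.

```python
from collections import Counter

# Hypothetical result set for a shopping site. Field names are assumptions.
results = [
    {"title": "Red shirt", "color": "red", "type": "shirt"},
    {"title": "Blue shirt", "color": "blue", "type": "shirt"},
    {"title": "Red hat", "color": "red", "type": "hat"},
]


def facet_counts(docs, field):
    """Count how many documents carry each value of the given field."""
    return Counter(doc[field] for doc in docs if field in doc)


def apply_facet(docs, field, value):
    """Keep only the documents whose field equals the chosen facet value."""
    return [doc for doc in docs if doc.get(field) == value]


color_counts = facet_counts(results, "color")    # counts shown next to each link
red_only = apply_facet(results, "color", "red")  # results after the user clicks "red"
```

The counts are what the user sees beside each facet link, and clicking a link is what shrinks "too much data" down to a manageable list.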
Behind the Scenes
Index?
•Index contains Documents
•Documents have Fields
•Fields have Terms
•~2 minutes for updates
•Uses Lucene syntax
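The Index, Documents, Fields, Terms hierarchy can be pictured as a toy inverted index. This is a simplified sketch in Python, not how Lucene stores data on disk; the sample documents and the `build_index` and `lookup` helpers are hypothetical. The `field:term` query form is the basic building block of Lucene query syntax.

```python
from collections import defaultdict

# Hypothetical documents. Ids, field names, and text are made up.
documents = {
    1: {"title": "Intro to Solr", "body": "Solr is fast"},
    2: {"title": "Drupal search", "body": "Drupal queries the database"},
}


def build_index(docs):
    """Map each (field, term) pair to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for term in text.lower().split():
                index[(field, term)].add(doc_id)
    return index


def lookup(index, query):
    """Resolve a single 'field:term' query against the toy index."""
    field, term = query.split(":", 1)
    return sorted(index.get((field, term.lower()), set()))


index = build_index(documents)
```

A lookup such as `lookup(index, "title:solr")` never scans document text; it goes straight from the term to the matching document ids, which is why this structure is fast to search.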
Tokenizing
•Splits words and numbers: "this" "is" "blogging"
•Excludes Stopwords: "this" "blogging"
•Handles Stemming (if enabled): "this" "blog"
•Very configurable
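The three steps above can be sketched as a small Python pipeline. This is a toy, not Solr's analyzer chain: the one-word stop list is an assumption chosen so the output matches the slide's example, and the `stem` function is a deliberately naive suffix stripper rather than a real stemming algorithm.

```python
import re

# Assumption: a one-word stop list, chosen so the output matches the slide.
STOPWORDS = {"is"}


def tokenize(text):
    """Split text into lowercase word and number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stop list."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token):
    """Naive stemming: strip a trailing 'ing' and undo a doubled consonant."""
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
        if len(token) >= 2 and token[-1] == token[-2]:
            token = token[:-1]
    return token


def analyze(text):
    """Run the full pipeline: tokenize, remove stop words, then stem."""
    return [stem(t) for t in remove_stopwords(tokenize(text))]
```

Running `analyze("this is blogging")` walks through the same stages as the slide: three tokens, then two after stop word removal, then "blogging" reduced to "blog".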
Bias
•Adjusts the order of search results
•Works on: Content Type, Fields, Comments, Promoted to Home Page and more
•Can be dynamic with custom modules.
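One way to picture bias is as a multiplier applied to each result's base score. The sketch below is a simplification in plain Python, not the Apache Solr module's bias settings: the boost values, field names, and the purely multiplicative formula are all assumptions for illustration.

```python
# Hypothetical boost factors. The values are illustrative only.
TYPE_BOOST = {"article": 2.0, "page": 1.0}
PROMOTED_BOOST = 1.5


def biased_score(hit):
    """Scale a hit's base score by content type and promoted status."""
    score = hit["score"] * TYPE_BOOST.get(hit["type"], 1.0)
    if hit.get("promoted"):
        score *= PROMOTED_BOOST
    return score


# Hypothetical search hits with made-up base scores.
pages = [
    {"title": "About us", "type": "page", "score": 3.0, "promoted": False},
    {"title": "Solr tips", "type": "article", "score": 2.0, "promoted": False},
    {"title": "Welcome", "type": "page", "score": 2.5, "promoted": True},
]

reordered = sorted(pages, key=biased_score, reverse=True)
```

Here "About us" has the highest base score, but the article boost and the promoted boost move the other two results ahead of it, which is the reordering effect that bias is meant to give.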
Recap
Modules
•Apache Solr (apachesolr)
•Facet API (facetapi)
•Chaos tool suite (ctools)
Overall
•Search is becoming more and more important
•You want to control your search results
•If you don't provide a good search experience, somebody else will.
•Solr doesn't have to be complex.
•Solr is fast and scales.
Thank You!
Questions?
@Mediacurrent Mediacurrent.com
@andrewmriley
slideshare.net/mediacurrent