How Solr Search Works
Rajat Jain - 20th Dec, 2016
Agenda
• What do you mean by Search?
• Search Requirements
• Comparison of SOLR with SQL/NoSQL
• SOLR Architecture
• SOLR Usage in Trellis
• How Google Search Works
• Other Search Technologies
What do you mean by Search?
Search Requirements
• Text Search – e.g. “Architects”
• Filters – e.g. “In New Delhi”, “iOS”
• Sorting – e.g. “Best Match”, “Highest Rating”, etc.
• And More:
  – Facets
  – Stemming
  – Fuzzy Matching
  – Image Search, etc.
Search Requirements
• Full Text Search
• Fast reads (writes can be slower)
• Various Combinations of Filters
• Various Combinations of Sorting
• Non-features:
  – Real-time – staleness is usually not a problem
  – Data Integrity – usually not the primary store of the data, so the index can be ‘lossy’
Search Requirements – Faceted Search
• A type of filtering with suggestions
• In most cases, sorted by count (number of matching results)
• Helps the user narrow down the search without having to ‘guess’ how to narrow it
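A facet is essentially a count of field values over the current result set, sorted by count. The sketch below shows that idea in plain Python; the documents, the `city` field and the `matches` list are hypothetical. In Solr itself, facets are requested with parameters such as `facet=true&facet.field=city`.

```python
from collections import Counter


def facet_counts(docs, matches, field):
    """Count each value of `field` among the matched documents, most common first."""
    counts = Counter(docs[d][field] for d in matches if field in docs[d])
    # Sort by count descending, then by value so ties come out in a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {
    1: {"title": "Architect", "city": "New Delhi"},
    2: {"title": "Architect", "city": "Mumbai"},
    3: {"title": "Architect", "city": "New Delhi"},
    4: {"title": "Plumber", "city": "Pune"},
}
matches = [1, 2, 3]  # hypothetical result of a text search for "architect"
print(facet_counts(docs, matches, "city"))  # [('New Delhi', 2), ('Mumbai', 1)]
```

Each (value, count) pair is what a UI would show as a clickable suggestion, so the user narrows the search without guessing a filter.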
Conventional Storage for Search
• SQL (MySQL)
  – Relational Tables
  – Normalized Data
  – Assumes Keys / Indexes are used for reads & writes
  – Optimized for reads, writes & transactional data (ACID transactions)
  – Lots of security, etc.
  – Table Data stored in File System
  – Indexing – individual columns or sets of columns
  – Full Text Search – a recent addition (FULLTEXT index)
Conventional Storage for Search
• NoSQL (think MongoDB)
  – Key Value Pairs
  – De-normalized Data
  – Unstructured Data
  – Optimized for Reads – writes can be slightly slower (in the transactional case)
  – Data stored in File System
  – Indexing – individual fields
  – Full Text Search – built-in support
Advantages of SOLR over MySQL/NoSQL
• Inverted Index
• Powerful text analysis / stemming / scoring / fuzzy matching
• Weighting fields / boosting – custom scoring functions
• Single document concept – no relations (in general)
• Faceting support out of the box
• Optimized for search and search alone (scales without a drop in performance)
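Field weighting means a match in one field (say the title) counts for more than a match in another. Below is a toy scoring sketch, not Lucene's real formula: it simply multiplies per-field term counts by a hypothetical boost. In Solr, weights like these are usually expressed through the (e)dismax `qf` parameter, e.g. `qf=title^3 description`.

```python
def boosted_score(doc, query_terms, boosts):
    """Toy score: for each field, count query-term occurrences and multiply by the field's boost."""
    score = 0.0
    for field, weight in boosts.items():
        tokens = doc.get(field, "").lower().split()
        matches = sum(tokens.count(term) for term in query_terms)
        score += weight * matches
    return score


boosts = {"title": 3.0, "description": 1.0}  # hypothetical field weights
a = {"title": "Architect", "description": "Interior design studio"}
b = {"title": "Design studio", "description": "We hire an architect"}
print(boosted_score(a, ["architect"], boosts), boosted_score(b, ["architect"], boosts))  # 3.0 1.0
```

Both documents contain "architect" once, but the title match ranks document `a` higher.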
SOLR Architecture – Indexing
• Take a ‘document’ and its fields
• For each field, apply the field's analyzer (a tokenizer plus a set of filters)
• Convert the text into individual tokens
• Update the ‘inverted’ index with the tokens
• The index also keeps statistics for each term (e.g. how many documents contain it)
• A separate index per field
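The indexing steps above can be sketched in a few lines. This is a simplified model, assuming a toy analyzer (lowercase, split on non-alphanumerics) in place of Solr's configurable tokenizers and filters. The per-field inverted index maps each term to the documents containing it, along with a term frequency.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs):
    """Per-field inverted index: field -> term -> {doc_id: term frequency}."""
    index = defaultdict(lambda: defaultdict(dict))
    for doc_id, doc in docs.items():
        for field, text in doc.items():
            for token in analyze(text):
                postings = index[field][token]
                postings[doc_id] = postings.get(doc_id, 0) + 1
    return index


docs = {
    1: {"title": "Architects in New Delhi", "body": "Award-winning architects"},
    2: {"title": "iOS developers", "body": "Delhi based iOS team"},
}
index = build_index(docs)
print(dict(index["title"]["architects"]))  # {1: 1}
print(len(index["body"]["delhi"]))          # document frequency of "delhi" in body: 1
```

The length of a postings dict is the document frequency, which is the kind of per-term statistic a scorer uses later.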
SOLR Architecture - Indexing
[Diagram: SOLR indexing pipeline. Clients send data by HTTP POST to update handlers: the XML Update Handler (/update), the CSV Update Handler (/update/csv), an XML update with a custom processor chain (/update/xml) and the Extracting RequestHandler for PDF, Word, … (/update/extract). The Data Import Handler pulls from a SQL DB or an RSS feed and applies simple transforms. Each handler has its own Update Processor Chain (e.g. Remove Duplicates, Logging, Custom Transform and Index processors), which passes documents through the Lucene text index analyzers into the Lucene index.]
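An update processor chain runs each incoming document through a sequence of steps, and any step may drop the document. The sketch below models that pattern in Python with hypothetical processors mirroring the diagram (remove duplicates, log, index). Real Solr chains are configured in `solrconfig.xml` using Java classes such as `LogUpdateProcessorFactory` and `RunUpdateProcessorFactory`.

```python
def remove_duplicates(seen):
    """Processor: drop documents whose id has already been processed."""
    def process(doc):
        if doc["id"] in seen:
            return None
        seen.add(doc["id"])
        return doc
    return process


def log_processor(log):
    """Processor: record each document that reaches this step."""
    def process(doc):
        log.append(f"add {doc['id']}")
        return doc
    return process


def index_processor(index):
    """Final processor: store the document in a (dict-based) stand-in index."""
    def process(doc):
        index[doc["id"]] = doc
        return doc
    return process


def run_chain(chain, docs):
    for doc in docs:
        for processor in chain:
            doc = processor(doc)
            if doc is None:  # a processor dropped the document; stop the chain for it
                break


log, index = [], {}
chain = [remove_duplicates(set()), log_processor(log), index_processor(index)]
run_chain(chain, [{"id": "a"}, {"id": "b"}, {"id": "a"}])
print(log)            # ['add a', 'add b']
print(sorted(index))  # ['a', 'b']
```

Putting the duplicate check first means dropped documents never reach the logging or indexing steps.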
SOLR Architecture – Searching
• User enters a query
• Parse the query, i.e. apply the required filters and tokenizers
• Convert it into tokens
• Search the per-field indexes in parallel
• Score all matching documents
• Sort the scored documents and return the top results
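The search steps can be sketched end to end as follows, assuming the same toy analyzer at index and query time and a simple TF-IDF score. Lucene's real default scoring is more refined (BM25 in recent versions). Keeping only the top-k results with a heap mirrors how a search engine avoids fully sorting every match.

```python
import heapq
import math
import re
from collections import defaultdict


def analyze(text):
    # Same toy analyzer at index and query time, so query tokens match indexed tokens.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs, field):
    index = defaultdict(dict)  # term -> {doc_id: term frequency}
    for doc_id, doc in docs.items():
        for token in analyze(doc.get(field, "")):
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return index


def search(index, n_docs, query, k=10):
    scores = defaultdict(float)
    for term in analyze(query):                     # 1. parse the query into tokens
        postings = index.get(term, {})              # 2. look up matching documents
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf              # 3. score (simple TF-IDF)
    # 4. keep only the k best (doc_id, score) pairs
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])


docs = {1: {"title": "iOS architects"}, 2: {"title": "Architects in Delhi"}, 3: {"title": "Delhi plumbers"}}
index = build_index(docs, "title")
print(search(index, len(docs), "Delhi architects", k=2))
```

Document 2 matches both query terms, so it ranks first.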
SOLR Architecture - Full
SOLR Architecture – Updating Index
• Types of Index Updates
  – Instant Indexing
  – Incremental Indexing
  – Full Indexing
• Index Update Strategies
  – Instant / incremental updates cannot run continuously
  – Too many small updates degrade performance
  – Run a full index periodically to optimize the index
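Because frequent tiny updates are expensive, incremental updates are usually batched. Here is a toy model of that strategy: buffer documents and hand them to a hypothetical `commit_fn` callback once a batch is full, plus a final flush. Solr exposes similar knobs through settings such as `autoCommit` (`maxDocs`, `maxTime`) and the `commitWithin` parameter.

```python
class BatchingIndexer:
    """Buffer document updates and commit them in batches instead of one by one (toy model)."""

    def __init__(self, commit_fn, batch_size=100):
        self.commit_fn = commit_fn  # hypothetical callback that writes one batch to the index
        self.batch_size = batch_size
        self.buffer = []

    def add(self, doc):
        self.buffer.append(doc)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.commit_fn(list(self.buffer))
            self.buffer.clear()


commits = []
indexer = BatchingIndexer(commits.append, batch_size=3)
for i in range(7):
    indexer.add({"id": i})
indexer.flush()  # commit whatever is left, e.g. on a timer or at shutdown
print([len(batch) for batch in commits])  # [3, 3, 1]
```

Seven updates become three commits instead of seven, which is the trade-off the slide describes: slightly staler results for less indexing overhead.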
SOLR Architecture – Scalability
• Sharding
  – Splitting collections across servers, searched in parallel
• Replication
  – More than one copy of the data, for failover
• SolrCloud
  – Uses ZooKeeper to manage clusters
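Sharding has two halves: route each document to one shard at index time, then merge each shard's top hits at query time. This is a minimal sketch of both halves. The CRC32-modulo routing is an illustrative assumption only, since SolrCloud's default compositeId router assigns hash ranges to shards. The per-shard scores are made up.

```python
import heapq
import zlib


def shard_for(doc_id, n_shards):
    """Deterministic hash-based routing (illustration only, not SolrCloud's actual router)."""
    return zlib.crc32(doc_id.encode("utf-8")) % n_shards


def merge_top_k(shard_results, k):
    """Merge per-shard (score, doc_id) lists into a global top-k."""
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(k, all_hits)


shards = [[] for _ in range(3)]
for doc_id in ["a", "b", "c", "d", "e"]:
    shards[shard_for(doc_id, 3)].append(doc_id)

# Hypothetical (score, doc_id) results returned by three shards searched in parallel.
per_shard = [[(0.9, "a"), (0.4, "d")], [(0.7, "b")], [(0.8, "c"), (0.1, "e")]]
print(merge_top_k(per_shard, 3))  # [(0.9, 'a'), (0.8, 'c'), (0.7, 'b')]
```

Each shard only needs to return its own top k, because the global top k must be among them.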
SOLR Architecture – Other Features
• Stemming
  – Identify the root word and its variations, e.g. "stems", "stemmer", "stemming", "stemmed" all reduce to "stem"
• Fuzzy Matching
  – Similar words / misspellings
  – Edit distance
• NLP
  – Identify entities / nouns in the search query
  – OpenNLP plugin for SOLR
• And much more…
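Fuzzy matching is based on edit distance: how many single-character insertions, deletions or substitutions turn one word into another. Below is a plain Levenshtein sketch with a naive vocabulary scan. Lucene's own fuzzy matching (e.g. the query syntax `delhi~2`, up to 2 edits) is implemented with automata and may differ in detail.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(term, vocabulary, max_edits=2):
    """Return vocabulary words within max_edits of term."""
    return [w for w in vocabulary if edit_distance(term, w) <= max_edits]


print(edit_distance("architect", "archtect"))                   # 1
print(fuzzy_matches("delhi", ["delhi", "dehli", "dubai"]))      # ['delhi', 'dehli']
```

The misspelling "dehli" is two edits away and still matches, while "dubai" (three edits) does not.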
SOLR Usage in Trellis
• Architecture
  – Data-in from MySQL
  – Index Update Strategy
• AutoComplete
• Basic Search
• Advanced Search
• Filters / Sorting / Facets & More
• Demo (Incl. Config Files)
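Text search, filters, sorting and facets all map onto standard Solr query parameters (`q`, `fq`, `sort`, `facet`, `facet.field`, `rows`). This sketch only builds such a `/select` URL. The host, core name (`listings`) and field names (`city`, `platform`, `rating`) are hypothetical and are not taken from the Trellis configuration.

```python
from urllib.parse import urlencode


def build_select_url(base, text, filters=(), sort=None, facet_fields=(), rows=10):
    """Build a Solr /select URL combining text search, filters, sorting and facets."""
    params = [("q", text), ("rows", rows)]
    params += [("fq", f) for f in filters]  # one fq per filter, so each can be cached separately
    if sort:
        params.append(("sort", sort))
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", f) for f in facet_fields]
    return f"{base}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr/listings",  # hypothetical host and core
    "architects",
    filters=['city:"New Delhi"', "platform:iOS"],
    sort="rating desc",
    facet_fields=["city"],
)
print(url)
```

Keeping filters in `fq` rather than folding them into `q` means they narrow the results without affecting relevance scores.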
How Google Search Works
• Crawling
  – Robots.txt
• Indexing
  – Multiple Indexes – Instant / Daily / Weekly / Long Tail
• Searching
  – NLP, Stemming, Auto-correct, etc.
• Ranking – PageRank
• Video - https://www.youtube.com/watch?v=BNHR6IQJGZs
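PageRank scores a page by the rank of the pages linking to it. This is the textbook power-iteration version with a damping factor of 0.85, run on a made-up three-page link graph. Google's production ranking combines many more signals, so this only illustrates the original idea.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration for PageRank over a dict: page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


links = {"home": ["about", "blog"], "about": ["home"], "blog": ["home", "about"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # home
```

"home" wins because both other pages link to it, and "about" passes on its entire rank to "home".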
Other Search Technologies
• Elasticsearch
  – Much newer than Solr
  – Built-in scalability
  – Also built on Lucene
  – JSON instead of XML
  – Good for analytical querying
• Others
  – Splunk
  – Sphinx
That’s All Folks