Practical Elasticsearch - real world use cases
![Page 2: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/2.jpg)
Me?
• Itamar Syn-Hershko / @synhershko
• Lucene.NET PMC and lead committer
• Microsoft MVP
• RavenDB
– X-Core developer
– “RavenDB in Action” author
– Consulting Partner
![Page 3: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/3.jpg)
![Page 4: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/4.jpg)
An index
![Page 5: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/5.jpg)
Elasticsearch
• Powered by Apache Lucene
• Open-source
• Rapid growth
• High profile users world-wide
![Page 6: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/6.jpg)
REST API
• Indexes
• Types
• IDs
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "synhershko",
    "post_date" : "2013-05-30T14:12:12",
    "message" : "trying out Elastic Search",
    "followers": 3,
    "registered": true
}'
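The same indexing call can be sketched in Python. This is a minimal illustration using only the standard library; the helper name `build_index_request` is an assumption made for this example, and the request is built but not sent, so no running node is needed. The `Content-Type` header is included because newer Elasticsearch versions require it on requests with a body.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request addressing index/type/id."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


tweet = {
    "user": "synhershko",
    "post_date": "2013-05-30T14:12:12",
    "message": "trying out Elastic Search",
    "followers": 3,
    "registered": True,
}

request = build_index_request("http://localhost:9200", "twitter", "tweet", 1, tweet)
print(request.get_method(), request.full_url)
# To send it to a running node: urllib.request.urlopen(request)
```

The URL path carries the three coordinates from the slide: index (`twitter`), type (`tweet`) and ID (`1`).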
![Page 7: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/7.jpg)
Full-Text Search
![Page 8: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/8.jpg)
Full-text Search 101: The inverted index

6 documents to index:

1. The old night keeper keeps the keep in the town
2. In the big old house in the big old gown.
3. The house in the town had the big old keep
4. Where the old night keeper never did sleep.
5. The night keeper keeps the keep in the night
6. And keeps in the dark and sleeps in the light.

The index: dictionary and posting lists

| Term | Documents |
|---|---|
| and | <6> |
| big | <2> <3> |
| dark | <6> |
| did | <4> |
| gown | <2> |
| had | <3> |
| house | <2> <3> |
| in | <1> <2> <3> <5> <6> |
| keep | <1> <3> <5> |
| keeper | <1> <4> <5> |
| keeps | <1> <5> <6> |
| light | <6> |
| never | <4> |
| night | <1> <4> <5> |
| old | <1> <2> <3> <4> |
| sleep | <4> |
| sleeps | <6> |
| the | <1> <2> <3> <4> <5> <6> |
| town | <1> <3> |
| where | <4> |

Example from: Justin Zobel, Alistair Moffat, Inverted files for text search engines, ACM Computing Surveys (CSUR) v.38 n.2, p.6-es, 2006
![Page 9: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/9.jpg)
Full-text Search 101: The inverted index
User queries for “keeper”
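A minimal Python sketch of the idea, using the six example documents: build a dictionary that maps each term to its posting list, then answer the query with a single lookup. The tokenizer here (lowercase, letters only) is an assumption made for the example, not what Lucene does internally.

```python
import re
from collections import defaultdict

DOCUMENTS = {
    1: "The old night keeper keeps the keep in the town",
    2: "In the big old house in the big old gown.",
    3: "The house in the town had the big old keep",
    4: "Where the old night keeper never did sleep.",
    5: "The night keeper keeps the keep in the night",
    6: "And keeps in the dark and sleeps in the light.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(documents):
    """Map every term to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


index = build_inverted_index(DOCUMENTS)
print(index["keeper"])  # [1, 4, 5]
```

The query never scans the documents themselves; it only reads the posting list stored under the term.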
![Page 10: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/10.jpg)
Term Normalization
• Lowercasing
• Stop words (shown in grey on the slide)
– Not best practice anymore
• Stemming
– Porter stemmer
– s-stemmer
• Relevance++
• SizeOnDisk--
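The normalization steps can be sketched in a few lines of Python. This is a simplified illustration: the stop-word set is an assumption picked for the sample text, and `s_stem` is a loose interpretation of the s-stemmer's plural-stripping rules, not the Lucene implementation.

```python
import re

# Assumption: a tiny stop-word list chosen for the sample sentence.
STOP_WORDS = {"and", "in", "the"}


def s_stem(word):
    """Strip common English plural endings (simplified s-stemmer rules)."""
    if len(word) > 3 and word.endswith("ies") and not word.endswith(("eies", "aies")):
        return word[:-3] + "y"
    if word.endswith("es") and not word.endswith(("aes", "ees", "oes")):
        return word[:-1]
    if word.endswith("s") and not word.endswith(("us", "ss")):
        return word[:-1]
    return word


def normalize(text, remove_stop_words=True):
    """Lowercase, tokenize, optionally drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return [s_stem(t) for t in tokens]


terms = normalize("The night keeper keeps the keep in the night")
print(terms)  # ['night', 'keeper', 'keep', 'keep', 'night']
```

After stemming, "keeps" and "keep" share one dictionary entry, which is where the relevance gain and the smaller index come from.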
![Page 11: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/11.jpg)
Full-Text Search
Your data store
![Page 12: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/12.jpg)
How hard is it to get search right, anyway?
![Page 13: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/13.jpg)
Relevance
• Precision: the fraction of the retrieved documents that are relevant
• Recall: the fraction of the relevant documents that are retrieved
• Order of results
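Both measures follow directly from the two definitions. A small sketch, where the document IDs and relevance judgments are made up for illustration:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four documents came back, three are actually relevant.
precision, recall = precision_recall(retrieved=[1, 3, 4, 5], relevant=[1, 4, 5])
print(precision, recall)  # 0.75 1.0
```

Here every relevant document was found (recall 1.0), but one of the four results is noise (precision 0.75). Neither number says anything about the order of results, which is why ordering is listed separately.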
![Page 14: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/14.jpg)
Challenges with search
• Relevance
• Getting the tokens right
– Tokenization
– Stemming
• Multi-lingual content
– Or other cross-cutting search concerns
• Tolerance
![Page 15: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/15.jpg)
Real-time Analytics
![Page 16: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/16.jpg)
Real-time Analytics
Queue (Redis)
“Shippers”
“Indexer”
![Page 17: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/17.jpg)
Scaling out
![Page 18: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/18.jpg)
Moar use cases!
![Page 19: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/19.jpg)
#1: Real-Time Alerting System
![Page 20: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/20.jpg)
Percolation
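Percolation turns search around: queries are registered ahead of time, and each incoming document is matched against them. The toy sketch below shows only the concept; `ToyPercolator` and its "all terms must appear" query model are assumptions made for this example and are not the Elasticsearch API.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ToyPercolator:
    """Stores queries up front, then matches each new document against them."""

    def __init__(self):
        self.queries = {}

    def register(self, query_id, required_terms):
        # A query here is just a set of terms that must all be present.
        self.queries[query_id] = {term.lower() for term in required_terms}

    def percolate(self, document_text):
        tokens = tokenize(document_text)
        return sorted(qid for qid, terms in self.queries.items() if terms <= tokens)


percolator = ToyPercolator()
percolator.register("disk-alert", ["disk", "full"])
percolator.register("auth-alert", ["login", "failed"])

matches = percolator.percolate("Disk /dev/sda1 is 98% full on node-3")
print(matches)  # ['disk-alert']
```

Each matching query ID can then trigger its alert, which is what makes this a fit for real-time alerting.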
![Page 21: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/21.jpg)
#2: Smarter query parsing
![Page 22: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/22.jpg)
Matching inexact queries
• Phrase slop
– “Bridge of London” -> “London Bridge”
• Word-level edit distance with fuzzy queries
– ditsance -> distance
– color -> colour
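Word-level edit distance can be sketched as follows. This version counts insertions, deletions, substitutions, and swaps of adjacent characters as one edit each (the "optimal string alignment" variant); treating a transposition as a single edit is an assumption of this example.

```python
def edit_distance(a, b):
    """Edit distance where an adjacent transposition costs a single edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


print(edit_distance("ditsance", "distance"))  # 1
print(edit_distance("color", "colour"))       # 1
```

Both examples from the slide are one edit away from the intended word, so a fuzzy query with a maximum distance of 1 would match them.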
![Page 23: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/23.jpg)
#3: Offline Classification
![Page 24: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/24.jpg)
Structuring the unstructured
• Record linkage
– Bag of words model
– “More Like This” functionality
• NLP
• Entity extraction
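The bag-of-words model behind record linkage can be illustrated with cosine similarity between term-count vectors. The records below are invented for the example, and this is only a sketch of the idea, not how "More Like This" is implemented.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphabetic tokens, ignoring word order."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical records that may or may not describe the same entity.
record_a = bag_of_words("Acme Corp, 12 High Street, London")
record_b = bag_of_words("ACME Corporation, 12 High St, London")
record_c = bag_of_words("Globex Ltd, Berlin")

similar = cosine_similarity(record_a, record_b)    # roughly 0.6
unrelated = cosine_similarity(record_a, record_c)  # 0.0
print(similar, unrelated)
```

Records whose similarity exceeds a chosen threshold become candidates for linking.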
![Page 25: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/25.jpg)
#4: Everything is searchable
![Page 26: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/26.jpg)
Geo-spatial search
• Distance
• Shape interactions
• Multiple algorithms
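As an illustration of the distance case, here is a great-circle (haversine) distance filter in Python. The coordinates are approximate city centers, and haversine is just one of several possible distance algorithms, chosen for this sketch rather than taken from the talk.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(origin, places, radius_km):
    """Names of all places no farther than radius_km from origin."""
    return [
        name
        for name, (lat, lon) in places.items()
        if haversine_km(origin[0], origin[1], lat, lon) <= radius_km
    ]


LONDON = (51.5074, -0.1278)
PLACES = {
    "Oxford": (51.7520, -1.2577),
    "Paris": (48.8566, 2.3522),
    "Cambridge": (52.2053, 0.1218),
}

nearby = within_radius(LONDON, PLACES, radius_km=100)
print(nearby)  # ['Oxford', 'Cambridge']
```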
![Page 27: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/27.jpg)
Geo-spatial search
![Page 28: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/28.jpg)
![Page 30: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/30.jpg)
http://cs.stanford.edu/people/karpathy/deepimagesent
Deep Visual-Semantic Alignments for Generating Image Descriptions
![Page 31: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/31.jpg)
#5: Anomaly detection
![Page 32: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/32.jpg)
The Significant Terms Aggregation
![Page 33: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/33.jpg)
Uncommonly common
Mark Harwood’s talk at
http://www.infoq.com/presentations/elasticsearch-revealing-uncommonly-common
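"Uncommonly common" means a term is far more frequent in a subset of documents than in the index as a whole. The sketch below scores terms by combining the absolute and the relative change in frequency; the scoring function and all counts are assumptions made for illustration, not necessarily what the aggregation computes.

```python
def significance(fg_count, fg_total, bg_count, bg_total):
    """Score a term by absolute change times relative change in frequency."""
    fg_rate = fg_count / fg_total
    bg_rate = bg_count / bg_total
    if bg_rate == 0 or fg_rate <= bg_rate:
        return 0.0
    return (fg_rate - bg_rate) * (fg_rate / bg_rate)


# Hypothetical counts: 100 documents in the subset, 10,000 in the whole index.
FG_TOTAL, BG_TOTAL = 100, 10_000
TERM_COUNTS = {
    # term: (documents in subset, documents in whole index)
    "the": (100, 10_000),
    "error": (40, 400),
    "timeout": (30, 60),
}

ranked = sorted(
    TERM_COUNTS,
    key=lambda term: significance(
        TERM_COUNTS[term][0], FG_TOTAL, TERM_COUNTS[term][1], BG_TOTAL
    ),
    reverse=True,
)
print(ranked)  # ['timeout', 'error', 'the']
```

"the" is the most common term in the subset but scores zero because it is equally common everywhere; "timeout" is rare overall yet concentrated in the subset, so it ranks first.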
![Page 34: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/34.jpg)
#6: Debugging a distributed system
Queue (Redis)
![Page 35: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/35.jpg)
#6: Debugging a distributed system
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
System.NullReferenceException: Object reference not set to an instance of an object.
   at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
   at AjaxControlToolkit.ToolkitScriptManager.GetScriptCombineAttributes(Assembly assembly)
   at AjaxControlToolkit.ToolkitScriptManager.IsScriptCombinable(ScriptEntry scriptEntry)
   at AjaxControlToolkit.ToolkitScriptManager.OnResolveScriptReference(ScriptReferenceEventArgs e)
   at System.Web.UI.ScriptManager.RegisterScripts()
   at System.Web.UI.ScriptManager.OnPagePreRenderComplete(Object sender, EventArgs e)
   at System.Web.UI.Page.OnPreRenderComplete(EventArgs e)
   at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)
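Raw lines like the access-log entry become searchable once they are split into fields. A minimal Python sketch, assuming the Apache combined log format; the field names are choices made for this example, and in the pipeline from the talk this step would belong to the indexer rather than hand-written code.

```python
import re
from datetime import datetime

COMBINED_LOG = re.compile(
    r'(?P<client>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_access_log(line):
    """Turn one combined-format log line into a dict, or None if it doesn't match."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    doc["status"] = int(doc["status"])
    doc["bytes"] = 0 if doc["bytes"] == "-" else int(doc["bytes"])
    # Convert the Apache timestamp into ISO 8601 so it can be treated as a date.
    doc["timestamp"] = datetime.strptime(
        doc["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    ).isoformat()
    return doc


LINE = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)

doc = parse_access_log(LINE)
print(doc["timestamp"], doc["status"], doc["path"])
```

With fields such as `status`, `path` and `timestamp` separated out, log lines from every node can be filtered and aggregated in one place.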
![Page 36: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/36.jpg)
#7: Distributed git storage
• PoC in C# using libgit2sharp
• https://github.com/synhershko/libgit2sharp.Elasticsearch
• Kudos @nulltoken
![Page 37: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/37.jpg)
Putting this to practice
• Search on your data
– Data doesn’t have to be structured to be queried
• Use your logs to gain insight
– Metrics
– Establish a baseline
– Investigate unexpected / unfamiliar behaviors
![Page 38: Practical Elasticsearch - real world use cases](https://reader033.fdocuments.net/reader033/viewer/2022042615/55c399a2bb61ebca718b4645/html5/thumbnails/38.jpg)
Thank you. Questions?
Itamar Syn-Hershko
http://code972.com
@synhershko