EyeEm
Lars Fronius @LarsFronius
ElasticSearch in production at EyeEm
ABOUT ME
• Happy Grumpy Cat of EyeEm
• Started as an ops in a scientific data center
• Now dev
• Developers hate me sometimes
EyeEm is the world’s premier community and marketplace for the photographer inside all of us
API STACK
• PHP
• MySQL (~10k commands per second)
• Memcached (~50k commands per second)
• Redis (~3k commands per second)
• S3 (~1k commands per second, 40m photos stored)
• Elasticsearch (~250 commands per second - elasticsearch-php)
• All writes are async
• Metrics everywhere
CURRENT CLUSTER SPECS
• 3 x m3.xlarge (4 cores, 15GiB Mem, 2 x 40GB SSD)
• cloud-aws plugin to interconnect.
• OpenJDK 1.6
• 60% heap size (9 GiB)
• 4 Indexes, 5 Shards each. From 1GB to 15GB
CURRENT PRODUCTION USE-CASES
• Album Search
• People Search
• Discover: City Search, Live Nearby
CURRENT BETA USE-CASES
LONG STORY
• MyISAM full-text search
• Album Search on one ElasticSearch node
• People Search added
• Scale-Out to 3 instances for Photo Search (+ Live Nearby)
ELASTICSEARCH - INTERNALS
• Index
• What your application sees.
• View for a logical namespace inside ElasticSearch.
• Consists of a fixed number of shards
• “To Index” means to “put” your data into ElasticSearch to make it available for search and for persistence.
ELASTICSEARCH - INTERNALS
• Inverted Index/Mapping
• The mapping tells Lucene how to create the inverted index in order to make data searchable.
• E.g. “EyeEm” as an nGram{2,3} gets “indexed” as [“Ey”, “ye”, “eE”, “Em”, “Eye”, “yeE”, “eEm”]; “yeah” would be [“ye”, “ea”, “ah”, “yea”, “eah”].
ELASTICSEARCH - INTERNALS
• Inverted Index/Mapping by example
Term  Document IDs
Ey    1
ye    1,2
eE    1
Em    1
Eye   1
yeE   1
eEm   1
ea    2
ah    2
yea   2
eah   2
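The nGram example can be reproduced with a few lines of Python. This is only a sketch of the idea, not what Lucene does internally: the helper names `ngrams` and `build_inverted_index` are made up for illustration, and analysis steps such as lowercasing are ignored. Note that a sliding window over "yeah" also yields the 2-gram "ea".

```python
from collections import defaultdict


def ngrams(text, min_n=2, max_n=3):
    """Return all character n-grams of length min_n..max_n, shortest first."""
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(text) - n + 1):
            grams.append(text[start:start + n])
    return grams


def build_inverted_index(docs):
    """Map every n-gram to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text):
            index[gram].add(doc_id)
    return {gram: sorted(ids) for gram, ids in index.items()}


# Document 1 is "EyeEm", document 2 is "yeah", as on the slide.
docs = {1: "EyeEm", 2: "yeah"}
index = build_inverted_index(docs)

print(ngrams("EyeEm"))  # ['Ey', 'ye', 'eE', 'Em', 'Eye', 'yeE', 'eEm']
print(index["ye"])      # [1, 2]
```

A search for "ye" is then a single dictionary lookup that returns both documents, which is the whole point of an inverted index.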
SCHEMA-LESS OR WHAT?
• Yes and no.
• Yes: you can put anything that can be formatted as JSON into your index, and you get a readable document.
• No: you have to think first, because changing your mapping is expensive, since you have to reindex.
ELASTICSEARCH - INTERNALS
• Shard
• Instance of Lucene
• Consists of multiple Lucene segments
• Manages segments (Merging, fsync, deletion etc.)
ELASTICSEARCH - INTERNALS
Segments API: http://example.es:9200/yourindex/_segments
indices: {
  eyephoto6: {
    shards: {
      0: [
        {
          routing: {
            state: "STARTED",
            primary: true,
            node: "PiVDZW-VRYmeaVOy7afoWQ"
          },
          num_committed_segments: 2,
          num_search_segments: 3,
          segments: {
            _l: {
              generation: 21,
              num_docs: 13,
              deleted_docs: 0,
              size_in_bytes: 30810,
              memory_in_bytes: 589,
              committed: true,
              search: true,
              version: "4.7",
              compound: true
            },
            _m: {
              generation: 22,
              num_docs: 371,
              deleted_docs: 16,
              size_in_bytes: 408548,
              memory_in_bytes: 7365,
              committed: false,
              search: true,
              version: "4.7",
              compound: false
            },
            _n: {
              generation: 23,
              num_docs: 16,
              deleted_docs: 0,
              size_in_bytes: 38514,
              memory_in_bytes: 615,
              committed: false,
              search: true,
              version: "4.7",
              compound: true
            }
          }
        }
      ],
      1: [
        {
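One way to put the segments output to use is to track how many deleted documents are still sitting in the segments of a shard. The sketch below assumes the response has already been parsed into a Python dict; the helper name `segment_stats` is hypothetical, and the sample data only carries the two counters copied from shard 0 above.

```python
def segment_stats(shard_copy):
    """Sum live and deleted document counts over all segments of one shard copy."""
    segments = shard_copy["segments"].values()
    live = sum(seg["num_docs"] for seg in segments)
    deleted = sum(seg["deleted_docs"] for seg in segments)
    total = live + deleted
    ratio = deleted / total if total else 0.0
    return live, deleted, ratio


# Counters copied from shard 0 in the response above (all other fields omitted).
shard0 = {
    "segments": {
        "_l": {"num_docs": 13, "deleted_docs": 0},
        "_m": {"num_docs": 371, "deleted_docs": 16},
        "_n": {"num_docs": 16, "deleted_docs": 0},
    }
}

live, deleted, ratio = segment_stats(shard0)
print(live, deleted, round(ratio, 3))  # 400 16 0.038
```

A ratio that keeps growing is a hint that a lot of merge work is still pending for that shard.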
ELASTICSEARCH - INTERNALS
• Segments
• Managed by ElasticSearch
• Is the storage for the inverted index
ELASTICSEARCH - INTERNALS
• Basically ElasticSearch is a Lucene cluster manager and API
LESSONS LEARNED - SHARDS/SEGMENTS
• Deleting a document only marks it as deleted; it is not removed immediately.
• Updating a document only creates a new one and marks the old one as deleted.
• The actual cleanup happens in the background, when segments are merged, and can result in nice performance surprises.
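The three points above can be illustrated with a toy model. It only mimics the described behaviour (append-only segments, deletion marks, cleanup on merge); the class `ToyShard` and all of its methods are invented for this sketch and say nothing about how Lucene lays out data on disk.

```python
class ToyShard:
    """Toy shard: append-only segments plus a set of deletion marks."""

    def __init__(self):
        self.segments = [[]]   # each segment is a list of (doc_id, body)
        self.deleted = set()   # (segment_no, position) pairs marked as deleted

    def _find_live(self, doc_id):
        """Return the location of the live copy of doc_id, or None."""
        for seg_no, segment in enumerate(self.segments):
            for pos, (stored_id, _body) in enumerate(segment):
                if stored_id == doc_id and (seg_no, pos) not in self.deleted:
                    return seg_no, pos
        return None

    def delete(self, doc_id):
        """Deleting only records a mark; the stored copy stays where it is."""
        location = self._find_live(doc_id)
        if location is not None:
            self.deleted.add(location)

    def index(self, doc_id, body):
        """Index or update: mark any old copy as deleted, then append a new one."""
        self.delete(doc_id)
        self.segments[-1].append((doc_id, body))

    def flush(self):
        """Start a new segment; older segments are not written to again."""
        self.segments.append([])

    def merge(self):
        """Rewrite all segments into one, dropping the marked documents."""
        live = [doc
                for seg_no, segment in enumerate(self.segments)
                for pos, doc in enumerate(segment)
                if (seg_no, pos) not in self.deleted]
        self.segments = [live]
        self.deleted = set()

    def stored_docs(self):
        return sum(len(segment) for segment in self.segments)


shard = ToyShard()
shard.index(1, "first version")
shard.flush()
shard.index(1, "second version")  # update = new copy + deletion mark
print(shard.stored_docs())        # 2: both copies are stored until a merge runs
shard.merge()
print(shard.stored_docs())        # 1
```

The merge step is where the real work happens, which is why heavy update or delete traffic shows up later as background I/O rather than at write time.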
LESSONS LEARNED - SHARDS/SEGMENTS
• Nested documents live in the same Lucene segment.
• Can bloat up memory usage a lot.
• They are treated like any other document.
• If you do not always need to search in them, go for parent-child.
LESSONS LEARNED - ELASTICSEARCH
• Start with more than one instance; it is just too simple not to.
• Major upgrades are a pain (0.90 -> 1.1)
• PHP client libraries mostly do not handle connection pools properly; use elasticsearch-php
• 'connectionPoolClass' => '\Elasticsearch\ConnectionPool\StaticConnectionPool'
• Let an intermediate web server handle it
LESSONS LEARNED - ELASTICSEARCH
• You will index more than one time. Promise. Be prepared.
• Rebalancing is smooth, don’t worry.
• Have your metrics ready.
• “You can have a good time with ElasticSearch, if you don't ignore the complexity and internals of this distributed database.”
LESSONS LEARNED - INDEX/MAPPING
• Different analysers should go into separate fields
• Score individually - iterative optimisations possible
• Keep a raw field
• Use dynamic_templates once you have found the holy grail of field analysis.
• Filter first! Querying and scoring are expensive.
LESSONS LEARNED - INDEX/MAPPING
• Different analysers should go into separate fields
GET /eyephoto/_mapping
{
  "eyephoto6": {
    "mappings": {
      "photo": {
        "dynamic_templates": [
          {
            "string": {
              "mapping": {
                "type": "string",
                "index_analyzer": "photo_names",
                "search_analyzer": "photo_standard",
                "fields": {
                  "raw": {
                    "type": "string",
                    "index": "not_analyzed"
                  },
                  "split": {
                    "type": "string",
                    "analyzer": "standard"
                  }
                }
              },
              "match": "*",
              "match_mapping_type": "string"
            }
          }
        ]
LESSONS LEARNED - INDEX/MAPPING
• Different analysers should go into separate fields
POST /eyephoto/photo/_search
{
  "size": 1,
  "fields": [
    "id"
  ],
  "query": {
    "multi_match": {
      "query": "coff",
      "fields": [
        "topics"
      ]
    }
  },
  "facets": {
    "topic": {
      "terms": {
        "field": "topics.raw",
        "size": 1
      }
    }
  }
}
Response:
{
  "took": 18,
  "timed_out": false,
  "_shards": {
    ##########
  },
  "hits": {
    "total": 125,
    "max_score": 6.44889,
    "hits": [
      {
        #####
        "_id": "167480",
        #####
      }
      }
    ]
  },
  "facets": {
    "topic": {
      "_type": "terms",
      "missing": 0,
      "total": 138,
      "other": 57,
      "terms": [
        {
          "term": "Coffee",
          "count": 81
        }
      ]
    }
  }
}
LESSONS LEARNED - INDEX/MAPPING
• Different analysers should go into separate fields
POST /eyephoto/photo/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "lars",
            "operator": "and",
            "fields": [
              "name.raw^3",
              "name.split^2",
              "name"
            ]
          }
        },
        {
          "multi_match": {
            "query": "lars",
            "fields": [
              "name.raw^3",
              "name.split^2",
              "name"
            ]
          }
        }
      ]
    }
  }
}
LESSONS LEARNED - INDEX/MAPPING
• Read and write only to index aliases.
Index Name   Index Aliases
eyephoto5    “eyephotoread”
eyephoto6    “eyephotowrite”
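Because the application only ever talks to aliases, a freshly built index can be put into service by repointing an alias. The sketch below builds a request body for the Elasticsearch `_aliases` endpoint, which applies a remove and an add in one call; the function name `alias_swap_actions` is hypothetical, and sending the request is left to whatever HTTP client you use.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build a body for POST /_aliases that moves an alias to a new index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Example: once eyephoto6 is fully indexed, move the read alias over from eyephoto5.
body = alias_swap_actions("eyephotoread", "eyephoto5", "eyephoto6")
print(json.dumps(body, indent=2))
```

Readers never see a half-built index this way, and rolling back is the same call with the two index names swapped.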
LESSONS LEARNED - INDEX/MAPPING
• If you have a string or integer field, you can put an array into it as well.
Term  Document IDs
Ey    1
ye    1,2
eE    1
Em    1
Eye   1
yeE   1
eEm   1
ea    2
ah    2
yea   2
eah   2
LESSONS LEARNED - INDEX/MAPPING
• Use geohash wherever you query on lat/lng.
POST /eyephoto/photo/_search
{
  "query": {
    "function_score": {
      "query": {
        "filtered": {
          "query": {
            "match_all": {}
          },
          "filter": {
            "geohash_cell": {
              "location": {
                "lat": 52.5311,
                "lon": 13.404
              },
              "precision": 4,
              "neighbors": true
            }
          }
        }
      },
      "functions": [
        {
          "gauss": {
            "location": {
              "origin": "52.5311,13.404",
              "scale": "10km"
            }
          }
        },
        {
          "exp": {
            "uploaded": {
              "origin": "now",
              "scale": "2d"
            }
          }
        }
      ]
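To get a feeling for what the two decay functions in this query do to a score, they can be recomputed by hand. The sketch below follows the documented shape of the Elasticsearch `gauss` and `exp` decay functions, assuming the default `decay` of 0.5 and no `offset`; the function names are invented here, distances are in km and ages in days. Multiplying the two factors is also an assumption, since the end of the query (where a `score_mode` could be set) is not shown.

```python
import math


def gauss_decay(distance, scale, decay=0.5, offset=0.0):
    """Gaussian decay: 1.0 at the origin, `decay` at `scale` away from it."""
    d = max(0.0, abs(distance) - offset)
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-d ** 2 / (2.0 * sigma_sq))


def exp_decay(distance, scale, decay=0.5, offset=0.0):
    """Exponential decay: 1.0 at the origin, `decay` at `scale` away from it."""
    d = max(0.0, abs(distance) - offset)
    lam = math.log(decay) / scale
    return math.exp(lam * d)


# A photo taken 10 km from the origin and uploaded 2 days ago,
# with the scales from the query ("10km" and "2d").
geo_factor = gauss_decay(distance=10.0, scale=10.0)
age_factor = exp_decay(distance=2.0, scale=2.0)
print(round(geo_factor, 3), round(age_factor, 3))  # 0.5 0.5
print(round(geo_factor * age_factor, 3))           # 0.25
```

Both curves hit 0.5 exactly at their scale, so "scale" reads as "the distance or age at which a photo is worth half as much".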
LESSONS LEARNED - AGGREGATIONS
• Aggregations give you recursive facets, handle with care.
"aggregations": {
  "user_fullname": {
    "filter": {
      "query": {
        "match": {
          "topics": {
            "query": "lars beer",
            "operator": "or"
          }
        }
      }
    },
    "aggs": {
      "user_fullname": {
        "terms": {
          "field": "user_fullname.raw",
          "size": 3
        },
        "aggs": {
          "topics": {
            "filter": {
              "query": {
                "match": {
                  "topics": {
                    "query": "lars beer",
                    "operator": "or"
                  }
                }
              }
            },
            "aggs": {
              "topics": {
                "terms": {
                  "field": "topics.raw",
                  "size": 3
                }
              }
            }
          },
LESSONS LEARNED - AGGREGATIONS
• Aggregations give you recursive facets, handle with care.
"user_fullname": {
  "doc_count": 678,
  "user_fullname": {
    "buckets": [
      {
        "key": "Lars 🍻 ",
        "doc_count": 678,
        "topics": {
          "doc_count": 5,
          "topics": {
            "buckets": [
              {
                "key": "Beer",
                "doc_count": 1
              },
              {
                "key": "BeerOps",
                "doc_count": 1
              },
              {
                "key": "Birthday beer in the snow",
                "doc_count": 1
              }
            ]
          }
        }
      }
    ]
  }
}
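On the application side, a nested response like this usually has to be flattened before it can be rendered. The sketch below walks the two bucket levels of the response above; the helper name `flatten_topic_counts` is made up, and the sample dict is an abridged copy of the response with the filter-level `doc_count` values left out.

```python
def flatten_topic_counts(aggregations):
    """Turn user -> topic buckets into flat (user, topic, doc_count) rows."""
    rows = []
    for user in aggregations["user_fullname"]["user_fullname"]["buckets"]:
        for topic in user["topics"]["topics"]["buckets"]:
            rows.append((user["key"], topic["key"], topic["doc_count"]))
    return rows


# Abridged copy of the response shown above.
aggregations = {
    "user_fullname": {
        "user_fullname": {
            "buckets": [
                {
                    "key": "Lars 🍻 ",
                    "doc_count": 678,
                    "topics": {
                        "topics": {
                            "buckets": [
                                {"key": "Beer", "doc_count": 1},
                                {"key": "BeerOps", "doc_count": 1},
                                {"key": "Birthday beer in the snow", "doc_count": 1},
                            ]
                        }
                    },
                }
            ]
        }
    }
}

for row in flatten_topic_counts(aggregations):
    print(row)
```

With a terms size of 3 on both levels there are at most 3 x 3 rows here; every extra level multiplies that number again, which is one reason to handle nested aggregations with care.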
OUTLOOK
• 1-liner search
• public release
• Localisation (snowball / stopwords)
• Keep indexed documents (e.g. albums) updated
NEXT ITERATION (EVALUATING)
• Elasticsearch 1.1
• Oracle Java 1.8 (GC)
• more indexes and even more shards.
• restore API
WE ARE HIRING