Elasticsearch at EyeEm

EyeEm Lars Fronius @LarsFronius ElasticSearch in production at EyeEm

ElasticSearch in production at EyeEm: use-cases and pitfalls

Transcript of Elasticsearch at EyeEm

Page 1: Elasticsearch at EyeEm

EyeEm

Lars Fronius @LarsFronius

ElasticSearch in production at EyeEm

Page 2: Elasticsearch at EyeEm

ABOUT ME

• HAPPY GRUMPY CAT OF EYEEM

• STARTED AS AN OPS IN A SCIENTIFIC DATA CENTER

• NOW DEV

• DEVELOPERS HATE ME SOMETIMES

Page 3: Elasticsearch at EyeEm

EyeEm is the world’s premier community and marketplace for the photographer inside all of us

Page 6: Elasticsearch at EyeEm

API STACK

• PHP

• MySQL (~10k commands per second)

• Memcached (~50k commands per second)

• Redis (~3k commands per second)

• S3 (~1k commands per second, 40m photos stored)

• Elasticsearch (~250 commands per second - elasticsearch-php)

• All writes are async

• Metrics everywhere

Page 7: Elasticsearch at EyeEm

CURRENT CLUSTER SPECS

• 3 x m3.xlarge (4 cores, 15GiB Mem, 2 x 40GB SSD)

• cloud-aws plugin to interconnect.

• OpenJDK 1.6

• 60% heap size (9 GiB)

• 4 indexes with 5 shards each, ranging from 1 GB to 15 GB

Page 8: Elasticsearch at EyeEm

CURRENT PRODUCTION USE-CASES

Page 9: Elasticsearch at EyeEm

CURRENT PRODUCTION USE-CASES

ALBUM SEARCH

Page 10: Elasticsearch at EyeEm

CURRENT PRODUCTION USE-CASES

PEOPLE SEARCH

Page 11: Elasticsearch at EyeEm

CURRENT PRODUCTION USE-CASES

DISCOVER

• CITY-SEARCH

• LIVE NEARBY

Page 12: Elasticsearch at EyeEm

CURRENT PRODUCTION USE-CASES

LIVE NEARBY

Page 13: Elasticsearch at EyeEm

CURRENT BETA USE-CASES

Page 15: Elasticsearch at EyeEm

LONG STORY

• MyISAM full-text search

• Album Search on one ElasticSearch node

• People Search added

• Scale-Out to 3 instances for Photo Search (+ Live Nearby)

Page 16: Elasticsearch at EyeEm

ELASTICSEARCH - INTERNALS

• Index

• What your application sees.

• View for a logical namespace inside ElasticSearch.

• Consists of a fixed number of shards

• "To index" means to put your data into ElasticSearch, which both persists it and makes it available for search.

Page 17: Elasticsearch at EyeEm

ELASTICSEARCH - INTERNALS

• Inverted-Index/Mapping

• The Mapping tells Lucene how to create the inverted-index in order to make data searchable.

• e.g. "EyeEm" as an nGram{2,3} gets "indexed" as ["Ey","ye","eE","Em","Eye","yeE","eEm"], "yeah" would be ["ye","ea","ah","yea","eah"]

Page 18: Elasticsearch at EyeEm

ELASTICSEARCH - INTERNALS

• Inverted Index/Mapping by example

Term   Documents
Ey     1
ye     1,2
eE     1
Em     1
Eye    1
yeE    1
eEm    1
ea     2
ah     2
yea    2
eah    2

(document 1 = "EyeEm", document 2 = "yeah")
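The nGram{2,3} example can be reproduced in a few lines. This is a minimal sketch in plain Python, not Lucene's actual tokenizer; the function names `ngrams` and `build_inverted_index` are made up for illustration, and the document ids 1 and 2 stand for "EyeEm" and "yeah".

```python
from collections import defaultdict


def ngrams(text, min_n=2, max_n=3):
    """Return all character n-grams of length min_n..max_n, shortest first."""
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


def build_inverted_index(docs):
    """Map each n-gram to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text):
            index[gram].add(doc_id)
    return {gram: sorted(ids) for gram, ids in index.items()}


# Hypothetical two-document corpus matching the slide.
docs = {1: "EyeEm", 2: "yeah"}
index = build_inverted_index(docs)

print(ngrams("EyeEm"))
print(index["ye"])  # the only n-gram shared by both documents
```

A search for "ye" then becomes a single dictionary lookup instead of a scan over every document, which is the whole point of the inverted index.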

Page 19: Elasticsearch at EyeEm

SCHEMA-LESS OR WHAT?

• Yes and No.

Page 20: Elasticsearch at EyeEm

SCHEMA-LESS OR WHAT?

• Yes - You can put anything that can be formatted as JSON into your index, and you get a readable document.

Page 21: Elasticsearch at EyeEm

SCHEMA-LESS OR WHAT?

• No - you have to think first: changing your Mapping is expensive, because you have to reindex.

Page 22: Elasticsearch at EyeEm

ELASTICSEARCH - INTERNALS

• Shard

• Instance of Lucene

• Consists of multiple Lucene segments

• Manages segments (Merging, fsync, deletion etc.)

Page 23: Elasticsearch at EyeEm

ELASTICSEARCH - INTERNALS

segments API: http://example.es:9200/yourindex/_segments

indices: {
  eyephoto6: {
    shards: {
      0: [
        {
          routing: {
            state: "STARTED",
            primary: true,
            node: "PiVDZW-VRYmeaVOy7afoWQ"
          },
          num_committed_segments: 2,
          num_search_segments: 3,
          segments: {
            _l: {
              generation: 21,
              num_docs: 13,
              deleted_docs: 0,
              size_in_bytes: 30810,
              memory_in_bytes: 589,
              committed: true,
              search: true,
              version: "4.7",
              compound: true
            },
            _m: {
              generation: 22,
              num_docs: 371,
              deleted_docs: 16,
              size_in_bytes: 408548,
              memory_in_bytes: 7365,
              committed: false,
              search: true,
              version: "4.7",
              compound: false
            },
            _n: {
              generation: 23,
              num_docs: 16,
              deleted_docs: 0,
              size_in_bytes: 38514,
              memory_in_bytes: 615,
              committed: false,
              search: true,
              version: "4.7",
              compound: true
            }
          }
        }
      ],
      1: [
        {
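To make the response above easier to read at a glance, the per-segment numbers can be rolled up per shard. The following is only a sketch: the values are copied by hand from shard 0 on the slide, `summarize_segments` is a hypothetical helper, and a real script would fetch the JSON from the `_segments` endpoint instead of hard-coding it.

```python
# Segment stats copied from shard 0 of the response shown on the slide.
segments = {
    "_l": {"num_docs": 13, "deleted_docs": 0, "size_in_bytes": 30810, "committed": True},
    "_m": {"num_docs": 371, "deleted_docs": 16, "size_in_bytes": 408548, "committed": False},
    "_n": {"num_docs": 16, "deleted_docs": 0, "size_in_bytes": 38514, "committed": False},
}


def summarize_segments(segments):
    """Roll per-segment stats up into one summary dict for the shard."""
    return {
        "segments": len(segments),
        "num_docs": sum(s["num_docs"] for s in segments.values()),
        "deleted_docs": sum(s["deleted_docs"] for s in segments.values()),
        "size_in_bytes": sum(s["size_in_bytes"] for s in segments.values()),
        # Segments that are searchable but not yet committed to disk.
        "uncommitted": sorted(n for n, s in segments.items() if not s["committed"]),
    }


summary = summarize_segments(segments)
print(summary)
```

Tracking `deleted_docs` over time is one way to see how much cleanup work is still waiting for the next merge.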

Page 24: Elasticsearch at EyeEm

ELASTICSEARCH - INTERNALS

• Segments

• Managed by ElasticSearch

• Are the storage for the inverted index

Page 25: Elasticsearch at EyeEm

ELASTICSEARCH - INTERNALS

• Basically ElasticSearch is a Lucene cluster manager and API

Page 26: Elasticsearch at EyeEm

LESSONS LEARNED - SHARDS / SEGMENTS

• Deleting a document only marks it as deleted; it is not removed immediately.

• Updating a document creates a new one and marks the old one as deleted.

• The actual cleanup happens in the background and can result in nice performance surprises.
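The three points above can be illustrated with a toy model. This is a deliberately simplified sketch, not how Lucene is implemented: `ToyShard` and its methods are invented names, every write creates its own tiny segment, and `merge` stands in for the background cleanup.

```python
class ToyShard:
    """Toy model of a shard: append-only segments plus delete markers."""

    def __init__(self):
        self.segments = []  # each segment is a dict: doc_id -> source
        self.deleted = []   # per segment, the set of doc ids marked deleted

    def index(self, doc_id, source):
        """Index or update: mark any old copy deleted, then write a new segment."""
        self.delete(doc_id)
        self.segments.append({doc_id: source})
        self.deleted.append(set())

    def delete(self, doc_id):
        """Only mark the document as deleted; the stored data is untouched."""
        for segment, tombstones in zip(self.segments, self.deleted):
            if doc_id in segment:
                tombstones.add(doc_id)

    def live_docs(self):
        """Documents a search would still see."""
        return {
            doc_id: source
            for segment, tombstones in zip(self.segments, self.deleted)
            for doc_id, source in segment.items()
            if doc_id not in tombstones
        }

    def stored_docs(self):
        """Documents still taking up space, deleted or not."""
        return sum(len(segment) for segment in self.segments)

    def merge(self):
        """Cleanup: rewrite only the live documents into one new segment."""
        self.segments = [self.live_docs()]
        self.deleted = [set()]


shard = ToyShard()
shard.index(1, "v1")
shard.index(1, "v2")  # update: the old copy is only marked deleted
shard.index(2, "x")
shard.delete(2)
print(shard.stored_docs(), shard.live_docs())
shard.merge()
print(shard.stored_docs(), shard.live_docs())
```

The gap between stored and live documents is the work the merge has to do later, which is where the performance surprises come from.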

Page 27: Elasticsearch at EyeEm

LESSONS LEARNED - SHARDS / SEGMENTS

• Nested documents live in the same Lucene Segment.

• Can bloat up memory usage a lot.

• They are treated like every other document.

• If you don't always have to search in them, go for parent-child.

Page 28: Elasticsearch at EyeEm

LESSONS LEARNED - ELASTICSEARCH

• Start with more than one instance - it is just too simple not to

• Major upgrades are a pain (0.90 -> 1.1)

• PHP Client Libraries mostly do not handle connection pools properly, use elasticsearch-php

• 'connectionPoolClass' => '\Elasticsearch\ConnectionPool\StaticConnectionPool'

• let an intermediate webserver handle it
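What "handling a connection pool properly" means can be sketched as follows. This is not the elasticsearch-php implementation; it is a hypothetical Python illustration of the general idea behind a static pool: a fixed host list, round-robin selection, and a cool-down for hosts that failed. The class name, the `retry_after` parameter, and the host names are all assumptions.

```python
import itertools
import time


class StaticPool:
    """Round-robin over a fixed host list, skipping hosts marked dead."""

    def __init__(self, hosts, retry_after=60.0, clock=time.monotonic):
        self.hosts = list(hosts)
        self.retry_after = retry_after  # seconds before a dead host is retried
        self.clock = clock
        self.dead_until = {}            # host -> time at which it may be retried
        self._cycle = itertools.cycle(self.hosts)

    def mark_dead(self, host):
        """Call after a failed request so the host is skipped for a while."""
        self.dead_until[host] = self.clock() + self.retry_after

    def mark_alive(self, host):
        """Call after a successful request to clear the cool-down."""
        self.dead_until.pop(host, None)

    def next_host(self):
        """Return the next usable host; fall back to any host if all are dead."""
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            if self.dead_until.get(host, 0.0) <= self.clock():
                return host
        return next(self._cycle)


# Hypothetical host names for a three-node cluster.
pool = StaticPool(["es1:9200", "es2:9200", "es3:9200"])
pool.mark_dead("es2:9200")
print([pool.next_host() for _ in range(4)])
```

The host list never changes at runtime, so a node that drops out only costs a skipped turn rather than a failed request on every call.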

Page 29: Elasticsearch at EyeEm

LESSONS LEARNED - ELASTICSEARCH

• You will reindex more than once. Promise. Be prepared.

• Rebalancing is smooth, don’t worry.

• Have your metrics ready.

• “You can have a good time with ElasticSearch, if you don't ignore the complexity and internals of this distributed database.”

Page 32: Elasticsearch at EyeEm

LESSONS LEARNED - INDEX / MAPPING

• Different analysers should go into separate fields

• Score individually - iterative optimisations possible

• Keep a raw field

• Use dynamic_templates once you have found the holy grail of field analysis.

• Filter first! Querying and scoring are expensive.
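"Filter first" can be sketched as a request body in which cheap, unscored filters narrow down the candidates before the scored query runs. The helper below builds a 1.x-style `filtered` query (the form this deck uses elsewhere; later Elasticsearch versions replaced it with a `bool` query and a `filter` clause). The function name and the `user_id` field are hypothetical.

```python
import json


def filtered_search(text, fields, term_filters):
    """Build a `filtered` query: term filters narrow the set, the query scores it."""
    filters = [
        {"term": {field: value}}
        for field, value in sorted(term_filters.items())
    ]
    return {
        "query": {
            "filtered": {
                # Scored part: only runs on documents that pass the filter.
                "query": {"multi_match": {"query": text, "fields": fields}},
                # Unscored part: yes/no match, no relevance calculation.
                "filter": {"bool": {"must": filters}},
            }
        }
    }


# "user_id" is a made-up field used only for this illustration.
body = filtered_search("coff", ["topics"], {"user_id": 42})
print(json.dumps(body, indent=2))
```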

Page 34: Elasticsearch at EyeEm

LESSONS LEARNED - INDEX / MAPPING

• Different analysers should go into separate fields

GET /eyephoto/_mapping
{
  "eyephoto6": {
    "mappings": {
      "photo": {
        "dynamic_templates": [
          {
            "string": {
              "mapping": {
                "type": "string",
                "index_analyzer": "photo_names",
                "search_analyzer": "photo_standard",
                "fields": {
                  "raw": {
                    "type": "string",
                    "index": "not_analyzed"
                  },
                  "split": {
                    "type": "string",
                    "analyzer": "standard"
                  }
                }
              },
              "match": "*",
              "match_mapping_type": "string"
            }
          }
        ]

Page 35: Elasticsearch at EyeEm

LESSONS LEARNED - INDEX / MAPPING

• Different analysers should go into separate fields

POST /eyephoto/photo/_search
{
  "size": 1,
  "fields": [
    "id"
  ],
  "query": {
    "multi_match": {
      "query": "coff",
      "fields": [
        "topics"
      ]
    }
  },
  "facets": {
    "topic": {
      "terms": {
        "field": "topics.raw",
        "size": 1
      }
    }
  }
}

Response:

{
  "took": 18,
  "timed_out": false,
  "_shards": {
    ##########
  },
  "hits": {
    "total": 125,
    "max_score": 6.44889,
    "hits": [
      {
        #####
        "_id": "167480",
        #####
        }
      }
    ]
  },
  "facets": {
    "topic": {
      "_type": "terms",
      "missing": 0,
      "total": 138,
      "other": 57,
      "terms": [
        {
          "term": "Coffee",
          "count": 81
        }
      ]
    }
  }
}

Page 36: Elasticsearch at EyeEm

LESSONS LEARNED - INDEX / MAPPING

• Different analysers should go into separate fields

POST /eyephoto/photo/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "lars",
            "operator": "and",
            "fields": [
              "name.raw^3",
              "name.split^2",
              "name"
            ]
          }
        },
        {
          "multi_match": {
            "query": "lars",
            "fields": [
              "name.raw^3",
              "name.split^2",
              "name"
            ]
          }
        }
      ]
    }
  }
}

Page 37: Elasticsearch at EyeEm

LESSONS LEARNED - INDEX / MAPPING

• Read and write only to index aliases.

Index Name    Index Aliases
eyephoto5     "eyephotoread"
eyephoto6     "eyephotowrite"
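One way to read the table: the new index (eyephoto6) is being filled through the write alias while searches still hit the old one (eyephoto5). Once reindexing is done, the read alias can be moved in a single call to the `_aliases` endpoint. The sketch below only builds the request body; the helper name `alias_swap` is hypothetical, and whether EyeEm switched aliases exactly this way is an assumption.

```python
import json


def alias_swap(alias, old_index, new_index):
    """Build an `_aliases` body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = alias_swap("eyephotoread", "eyephoto5", "eyephoto6")
print("POST /_aliases")
print(json.dumps(body, indent=2))
```

Because both actions travel in one request, the application keeps talking to "eyephotoread" and never needs to know which physical index is behind it.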

Page 38: Elasticsearch at EyeEm

LESSONS LEARNED - INDEX / MAPPING

• If you have a string or integer field, you can put an array into it as well.

Term   Documents
Ey     1
ye     1,2
eE     1
Em     1
Eye    1
yeE    1
eEm    1
ea     2
ah     2
yea    2
eah    2

Page 39: Elasticsearch at EyeEm

LESSONS LEARNED - INDEX / MAPPING

• Use geohash wherever you query on lat/lng.

POST /eyephoto/photo/_search
{
  "query": {
    "function_score": {
      "query": {
        "filtered": {
          "query": {
            "match_all": {}
          },
          "filter": {
            "geohash_cell": {
              "location": {
                "lat": 52.5311,
                "lon": 13.404
              },
              "precision": 4,
              "neighbors": true
            }
          }
        }
      },
      "functions": [
        {
          "gauss": {
            "location": {
              "origin": "52.5311,13.404",
              "scale": "10km"
            }
          }
        },
        {
          "exp": {
            "uploaded": {
              "origin": "now",
              "scale": "2d"
            }
          }
        }
      ]
    }
  }
}
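The two scoring functions in the query above can be approximated in plain Python to get a feeling for the numbers. This is a sketch under assumptions: it uses the decay formulas as documented for `function_score` with the default `decay` of 0.5 and no offset, it assumes the function results are multiplied together, and the helper names are made up.

```python
import math


def gauss_decay(distance, scale, decay=0.5, offset=0.0):
    """Gaussian decay: 1.0 at the origin, `decay` at `scale` away from it."""
    d = max(0.0, abs(distance) - offset)
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(d ** 2) / (2.0 * sigma_sq))


def exp_decay(distance, scale, decay=0.5, offset=0.0):
    """Exponential decay: 1.0 at the origin, `decay` at `scale` away from it."""
    d = max(0.0, abs(distance) - offset)
    lam = math.log(decay) / scale
    return math.exp(lam * d)


def combined_score(km_from_origin, days_since_upload):
    """Assumed combination: distance (scale 10km) times age (scale 2d)."""
    return gauss_decay(km_from_origin, 10.0) * exp_decay(days_since_upload, 2.0)


# A photo taken right here and now versus one 10 km away and 2 days old.
print(combined_score(0.0, 0.0))
print(combined_score(10.0, 2.0))
```

Under these assumptions a photo 10 km away and two days old ends up with a quarter of the score of a fresh photo at the origin, since each function contributes a factor of one half at its scale.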

Page 40: Elasticsearch at EyeEm

LESSONS LEARNED - AGGREGATIONS

• Aggregations give you recursive facets, handle with care.

"aggregations": {
  "user_fullname": {
    "filter": {
      "query": {
        "match": {
          "topics": {
            "query": "lars beer",
            "operator": "or"
          }
        }
      }
    },
    "aggs": {
      "user_fullname": {
        "terms": {
          "field": "user_fullname.raw",
          "size": 3
        },
        "aggs": {
          "topics": {
            "filter": {
              "query": {
                "match": {
                  "topics": {
                    "query": "lars beer",
                    "operator": "or"
                  }
                }
              }
            },
            "aggs": {
              "topics": {
                "terms": {
                  "field": "topics.raw",
                  "size": 3
                }
              }
            }
          },

Page 41: Elasticsearch at EyeEm

LESSONS LEARNED - AGGREGATIONS

• Aggregations give you recursive facets, handle with care.

"user_fullname": {
  "doc_count": 678,
  "user_fullname": {
    "buckets": [
      {
        "key": "Lars 🍻 ",
        "doc_count": 678,
        "topics": {
          "doc_count": 5,
          "topics": {
            "buckets": [
              {
                "key": "Beer",
                "doc_count": 1
              },
              {
                "key": "BeerOps",
                "doc_count": 1
              },
              {
                "key": "Birthday beer in the snow",
                "doc_count": 1
              }
            ]
          }
        }
      }
    ]
  }
}
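The nesting in the response above (filter, then terms, then filter, then terms) is what makes these "recursive facets" awkward to consume. A small sketch of how a client might flatten it into rows follows; the response is copied from the slide, and the `flatten` helper is a hypothetical name.

```python
# Aggregation response copied from the slide.
response = {
    "user_fullname": {
        "doc_count": 678,
        "user_fullname": {
            "buckets": [
                {
                    "key": "Lars 🍻 ",
                    "doc_count": 678,
                    "topics": {
                        "doc_count": 5,
                        "topics": {
                            "buckets": [
                                {"key": "Beer", "doc_count": 1},
                                {"key": "BeerOps", "doc_count": 1},
                                {"key": "Birthday beer in the snow", "doc_count": 1},
                            ]
                        },
                    },
                }
            ]
        },
    }
}


def flatten(aggs):
    """Turn the nested user -> topic buckets into (user, topic, count) rows."""
    rows = []
    for user in aggs["user_fullname"]["user_fullname"]["buckets"]:
        for topic in user["topics"]["topics"]["buckets"]:
            rows.append((user["key"].strip(), topic["key"], topic["doc_count"]))
    return rows


rows = flatten(response)
for row in rows:
    print(row)
```

Every extra level multiplies the number of buckets the cluster has to compute, which is a good reason to keep `size` small at each level, as the request does.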

Page 43: Elasticsearch at EyeEm

OUTLOOK

• 1-liner search

• public release

• Localisation (snowball / stopwords)

• Keep indexed documents (e.g. albums) updated

Page 44: Elasticsearch at EyeEm

NEXT ITERATION (EVALUATING)

• Elasticsearch 1.1

• Oracle Java 1.8 (GC)

• more indexes and even more shards.

• restore API

Page 45: Elasticsearch at EyeEm


WE ARE HIRING