Scaling real-time search and analytics with Elasticsearch

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited.
Clinton Gormley @clintongormley
Scaling real time search and analytics with elasticsearch

description

See the video here: https://www.youtube.com/watch?v=o6lSeNatVFM

A look at the elements required by Elasticsearch to turn a simple inverted index into an auto-clustering, horizontally scalable, real-time search and analytics engine. The talk starts from first principles, explaining how an inverted index works, how to make an inverted index suitable for real-time search, how to scale that out, and how to add reliability and failover to the cluster.

Transcript of Scaling real-time search and analytics with Elasticsearch

Page 1: Scaling real-time search and analytics with Elasticsearch

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited.

Clinton Gormley @clintongormley

Scaling real time search and analytics with elasticsearch

Page 2: Scaling real-time search and analytics with Elasticsearch

Page 3: Scaling real-time search and analytics with Elasticsearch

elasticsearch.org/guide

Page 4: Scaling real-time search and analytics with Elasticsearch

elasticsearch

Page 5: Scaling real-time search and analytics with Elasticsearch

elasticsearch
• real-time

Page 6: Scaling real-time search and analytics with Elasticsearch

elasticsearch
• real-time
• distributed

Page 7: Scaling real-time search and analytics with Elasticsearch

elasticsearch
• real-time
• distributed
• search

Page 8: Scaling real-time search and analytics with Elasticsearch

elasticsearch
• real-time
• distributed
• search
• analytics

Page 9: Scaling real-time search and analytics with Elasticsearch

how to use it?

Page 10: Scaling real-time search and analytics with Elasticsearch

how to use it?

Page 11: Scaling real-time search and analytics with Elasticsearch

how does it work?

Page 12: Scaling real-time search and analytics with Elasticsearch

step 1: making text searchable

Page 13: Scaling real-time search and analytics with Elasticsearch

Page 14: Scaling real-time search and analytics with Elasticsearch

where content like “%darling%buds%”

Page 15: Scaling real-time search and analytics with Elasticsearch

slow & inflexible

Page 16: Scaling real-time search and analytics with Elasticsearch

Page 17: Scaling real-time search and analytics with Elasticsearch

Columns: Term, Doc 1, Doc 2, Doc 3
Terms: breathe, brings, buds, but, by, can, …, damasked, darling, date, day, deaf, death, declines, delight

sorted list of unique terms

Page 18: Scaling real-time search and analytics with Elasticsearch

Columns: Term, Doc 1, Doc 2, Doc 3
Terms: breathe, brings, buds, but, by, can, …, damasked, darling, date, day, deaf, death, declines, delight

where they occur

Page 19: Scaling real-time search and analytics with Elasticsearch

Columns: Term, Doc 1, Doc 2, Doc 3
Terms: breathe, brings, buds, but, by, can, …, damasked, darling, date, day, deaf, death, declines, delight

Page 20: Scaling real-time search and analytics with Elasticsearch

Columns: Term, Doc 1, Doc 2, Doc 3
Terms: breathe, brings, buds, but, by, can, …, damasked, darling, date, day, deaf, death, declines, delight

Page 21: Scaling real-time search and analytics with Elasticsearch

inverted index
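
A minimal sketch of the idea on the preceding slides: a sorted list of unique terms, each pointing at the documents where it occurs. The three sample documents and the function names `build_inverted_index` and `search_all` are made up for illustration, and whitespace splitting stands in for real text analysis.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of doc ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    # Sorted list of unique terms, each pointing at where it occurs.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}

def search_all(index, *terms):
    """Return ids of docs that contain every given term."""
    result = None
    for term in terms:
        ids = set(index.get(term, []))
        result = ids if result is None else result & ids
    return sorted(result or [])

# Hypothetical sample documents.
docs = {
    1: "rough winds do shake the darling buds of may",
    2: "the darling day brings delight",
    3: "death declines the date",
}
index = build_inverted_index(docs)
both = search_all(index, "darling", "buds")
```

Unlike the `like “%darling%buds%”` scan, the lookup touches only the postings of the two terms and does not care about the order in which they were typed.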

Page 22: Scaling real-time search and analytics with Elasticsearch

inverted index
• term frequencies

Page 23: Scaling real-time search and analytics with Elasticsearch

inverted index
• term frequencies » relevance

Page 24: Scaling real-time search and analytics with Elasticsearch

inverted index
• term frequencies » relevance
• text length

Page 25: Scaling real-time search and analytics with Elasticsearch

inverted index
• term frequencies » relevance
• text length » doc weight

Page 26: Scaling real-time search and analytics with Elasticsearch

inverted index
• term frequencies » relevance
• text length » doc weight
• term positions

Page 27: Scaling real-time search and analytics with Elasticsearch

inverted index
• term frequencies » relevance
• text length » doc weight
• term positions » word proximity

Page 28: Scaling real-time search and analytics with Elasticsearch

inverted index
• term frequencies » relevance
• text length » doc weight
• term positions » word proximity
• char offsets

Page 29: Scaling real-time search and analytics with Elasticsearch

inverted index
• term frequencies » relevance
• text length » doc weight
• term positions » word proximity
• char offsets » highlighting
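
The extra per-term data listed above can be sketched as postings that record, for each occurrence, the token position and the character offsets. This is an illustrative toy, not Lucene's on-disk format: the names `index_with_positions`, `phrase_match` and `highlight` are hypothetical, and the sample text is assumed to be plain ASCII so that lowercasing does not shift offsets.

```python
import re
from collections import defaultdict

def index_with_positions(docs):
    """Build term -> {doc_id: [(position, start, end), ...]} plus doc lengths."""
    index = defaultdict(dict)
    doc_lengths = {}
    for doc_id, text in docs.items():
        tokens = list(re.finditer(r"\w+", text.lower()))
        doc_lengths[doc_id] = len(tokens)          # text length
        for position, match in enumerate(tokens):
            occurrences = index[match.group()].setdefault(doc_id, [])
            occurrences.append((position, match.start(), match.end()))
    return index, doc_lengths

def phrase_match(index, doc_id, first, second):
    """True if `second` occurs directly after `first` (word proximity)."""
    first_positions = {p for p, _, _ in index.get(first, {}).get(doc_id, [])}
    second_hits = index.get(second, {}).get(doc_id, [])
    return any(p - 1 in first_positions for p, _, _ in second_hits)

def highlight(text, index, doc_id, term):
    """Wrap each occurrence of `term` in asterisks using char offsets."""
    parts, last = [], 0
    for _, start, end in index.get(term, {}).get(doc_id, []):
        parts.append(text[last:start])
        parts.append("*" + text[start:end] + "*")
        last = end
    parts.append(text[last:])
    return "".join(parts)

# Hypothetical sample documents.
docs = {1: "the darling buds of May", 2: "buds, darling"}
index, doc_lengths = index_with_positions(docs)
term_frequency = len(index["darling"][1])
marked = highlight(docs[1], index, 1, "buds")
```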

Page 30: Scaling real-time search and analytics with Elasticsearch

inverted index
not just for text

Page 31: Scaling real-time search and analytics with Elasticsearch

inverted index
numbers, dates, bools, enums,
geopoints, geoshapes, etc

Page 32: Scaling real-time search and analytics with Elasticsearch

step 2: analytics

Page 33: Scaling real-time search and analytics with Elasticsearch

for search
map values → doc_ids

Page 34: Scaling real-time search and analytics with Elasticsearch

for search
map values → doc_ids

for analytics
map doc_ids → values

Page 35: Scaling real-time search and analytics with Elasticsearch

uninvert the index

Page 36: Scaling real-time search and analytics with Elasticsearch

uninvert the index
cache values in memory
called “fielddata”

Page 37: Scaling real-time search and analytics with Elasticsearch

uninvert the index
data access from RAM
very fast
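
Uninverting can be pictured as flipping the mapping from values → doc_ids into doc_ids → values and keeping the result in memory. The sketch below uses a made-up index and a hypothetical `uninvert` function; the real fielddata structures are far more compact than a Python dict.

```python
def uninvert(inverted_index):
    """Turn term -> [doc_ids] into doc_id -> [terms]."""
    fielddata = {}
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            fielddata.setdefault(doc_id, []).append(term)
    return fielddata

# Hypothetical inverted index for a "color" field.
inverted = {"blue": [1, 3], "green": [2], "red": [1, 2]}
fielddata = uninvert(inverted)   # built once, then read from memory
colors_of_doc_1 = fielddata[1]
```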

Page 38: Scaling real-time search and analytics with Elasticsearch

on-the-fly analytics
in the context of a user’s query

Page 39: Scaling real-time search and analytics with Elasticsearch

on-the-fly analytics
relevant analytics for each user

Page 40: Scaling real-time search and analytics with Elasticsearch

calculate metrics
count, min, max, sum, avg, percentiles, cardinality, stddev, variance, sum of squares

Page 41: Scaling real-time search and analytics with Elasticsearch

grouped by
popular terms, significant terms, ranges, dates, geolocation, etc

Page 42: Scaling real-time search and analytics with Elasticsearch

grouped by
groups can
… contain subgroups
… which contain subgroups
etc
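
A rough sketch of metrics computed per group, with groups nested inside groups. It works on a plain list of hits rather than on fielddata, and the `aggregate` function, field names and sample hits are all made up for illustration; it does not use the Elasticsearch aggregations API.

```python
from collections import defaultdict
from statistics import mean

def aggregate(hits, group_fields, metric_field):
    """Compute metrics for `hits`, then recurse into one group per field value."""
    values = [hit[metric_field] for hit in hits]
    result = {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "avg": mean(values),
    }
    if group_fields:
        field, rest = group_fields[0], group_fields[1:]
        buckets = defaultdict(list)
        for hit in hits:
            buckets[hit[field]].append(hit)
        # Each group carries its own metrics and, possibly, subgroups.
        result["groups"] = {
            key: aggregate(members, rest, metric_field)
            for key, members in buckets.items()
        }
    return result

# Hypothetical hits matching some user's query.
hits = [
    {"color": "red", "size": "S", "price": 10},
    {"color": "red", "size": "M", "price": 20},
    {"color": "red", "size": "S", "price": 30},
    {"color": "blue", "size": "M", "price": 40},
]
stats = aggregate(hits, ["color", "size"], "price")
```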

Page 43: Scaling real-time search and analytics with Elasticsearch

step 3: building the inverted index

Page 44: Scaling real-time search and analytics with Elasticsearch

inverted index

Page 45: Scaling real-time search and analytics with Elasticsearch

immutable

Page 46: Scaling real-time search and analytics with Elasticsearch

immutable
• cache friendly

Page 47: Scaling real-time search and analytics with Elasticsearch

immutable
• cache friendly
• reads from RAM

Page 48: Scaling real-time search and analytics with Elasticsearch

immutable
• cache friendly
• reads from RAM
• fielddata never changes

Page 49: Scaling real-time search and analytics with Elasticsearch

immutable
• cache friendly
• reads from RAM
• fielddata never changes
• compressible

Page 50: Scaling real-time search and analytics with Elasticsearch

immutable
• cache friendly
• reads from RAM
• fielddata never changes
• compressible
• no locking

Page 51: Scaling real-time search and analytics with Elasticsearch

but, immutable…

Page 52: Scaling real-time search and analytics with Elasticsearch

step 4: dynamic inverted index

Page 53: Scaling real-time search and analytics with Elasticsearch

in-memory buffer

commit

segment

Page 54: Scaling real-time search and analytics with Elasticsearch

commit point

segment

searchable

Page 55: Scaling real-time search and analytics with Elasticsearch

commit point

segment

searchable

commit

Page 56: Scaling real-time search and analytics with Elasticsearch

commit point

segment

searchable

Page 57: Scaling real-time search and analytics with Elasticsearch

commit point

segment

searchable

commit

Page 58: Scaling real-time search and analytics with Elasticsearch

commit point

segment

searchable

Page 59: Scaling real-time search and analytics with Elasticsearch

lucene commit

Page 60: Scaling real-time search and analytics with Elasticsearch

lucene commit
• write new segment

Page 61: Scaling real-time search and analytics with Elasticsearch

lucene commit
• write new segment
• write new commit point

Page 62: Scaling real-time search and analytics with Elasticsearch

lucene commit
• write new segment
• write new commit point
• fsync

Page 63: Scaling real-time search and analytics with Elasticsearch

lucene commit
• write new segment
• write new commit point
• fsync
• clear buffer

Page 64: Scaling real-time search and analytics with Elasticsearch

lucene commit
• write new segment
• write new commit point
• fsync
• clear buffer
• reopen index

Page 65: Scaling real-time search and analytics with Elasticsearch

lucene commit
• write new segment
• write new commit point
• fsync ← expensive!
• clear buffer
• reopen index
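
The commit steps above can be sketched with ordinary files. This is a toy under stated assumptions, not how Lucene stores data: segments are JSON files, the commit point is a JSON list of segment names, and the names `commit`, `open_index` and `commit_point.json` are invented for the example. Writing the commit point to a temporary file and renaming it is an added safety detail, not something the slides describe.

```python
import json
import os
import tempfile

def commit(index_dir, buffer, segment_names):
    """Write buffered docs as a new segment, then write a new commit point."""
    name = f"segment_{len(segment_names)}.json"
    with open(os.path.join(index_dir, name), "w") as f:
        json.dump(buffer, f)
        f.flush()
        os.fsync(f.fileno())        # force data to disk: the expensive step
    segment_names = segment_names + [name]
    commit_path = os.path.join(index_dir, "commit_point.json")
    tmp_path = commit_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(segment_names, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, commit_path)
    buffer.clear()                  # buffered docs now live in the segment
    return segment_names

def open_index(index_dir):
    """Reopen: load only the segments listed in the latest commit point."""
    with open(os.path.join(index_dir, "commit_point.json")) as f:
        names = json.load(f)
    segments = []
    for name in names:
        with open(os.path.join(index_dir, name)) as f:
            segments.append(json.load(f))
    return segments

index_dir = tempfile.mkdtemp()
buffer = [{"_id": 1, "text": "darling buds"}]
segments = commit(index_dir, buffer, [])
buffer.append({"_id": 2, "text": "delight"})
segments = commit(index_dir, buffer, segments)
searchable = open_index(index_dir)
```

Existing segment files are never modified; each commit only adds a file and replaces the commit point.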

Page 66: Scaling real-time search and analytics with Elasticsearch

step 5: near real-time search

Page 67: Scaling real-time search and analytics with Elasticsearch

in-memory buffer

flush

segment

Page 68: Scaling real-time search and analytics with Elasticsearch

segment

searchable

Page 69: Scaling real-time search and analytics with Elasticsearch

segment

searchable

flush

Page 70: Scaling real-time search and analytics with Elasticsearch

segment

searchable

Page 71: Scaling real-time search and analytics with Elasticsearch

segment

searchable

flush

Page 72: Scaling real-time search and analytics with Elasticsearch

segment

searchable

Page 73: Scaling real-time search and analytics with Elasticsearch

segment

searchable

commit

Page 74: Scaling real-time search and analytics with Elasticsearch

commit point

segment

searchable

Page 75: Scaling real-time search and analytics with Elasticsearch

lucene flush

Page 76: Scaling real-time search and analytics with Elasticsearch

lucene flush
• write new segment
• clear buffer
• reopen index

Page 77: Scaling real-time search and analytics with Elasticsearch

lucene flush
• write new segment
• clear buffer
• reopen index
• no fsync

Page 78: Scaling real-time search and analytics with Elasticsearch

lucene flush
• write new segment
• clear buffer
• reopen index
• no fsync → lightweight

Page 79: Scaling real-time search and analytics with Elasticsearch

but…
data not safe until fsync’ed!

Page 80: Scaling real-time search and analytics with Elasticsearch

step 6: don’t lose data

Page 81: Scaling real-time search and analytics with Elasticsearch

step 6: don’t lose data → transaction log

Page 82: Scaling real-time search and analytics with Elasticsearch

in-memory buffer

flush

segment

translog

Page 83: Scaling real-time search and analytics with Elasticsearch

segment

searchable

translog

Page 84: Scaling real-time search and analytics with Elasticsearch

segment

searchable

flush

translog

Page 85: Scaling real-time search and analytics with Elasticsearch

segment

searchable

translog

Page 86: Scaling real-time search and analytics with Elasticsearch

segment

searchable

translog

commit

Page 87: Scaling real-time search and analytics with Elasticsearch

segment

searchable

translog

commit point

Page 88: Scaling real-time search and analytics with Elasticsearch

elasticsearch “refresh”
• lucene “flush”
• makes changes searchable
• lightweight

Page 89: Scaling real-time search and analytics with Elasticsearch

elasticsearch “flush”
• lucene “commit”
• clears transaction log
• persists changes
• heavy
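
The split between refresh and flush can be sketched as a toy in-memory shard. Everything here is an assumption made for illustration: the `Shard` class and its methods are hypothetical, the `committed` counter stands in for fsync plus a commit point, and the translog is a plain list, whereas a real transaction log is written to disk.

```python
class Shard:
    """Toy shard: in-memory buffer, searchable segments, transaction log."""

    def __init__(self):
        self.buffer = []      # indexed but not yet searchable
        self.segments = []    # searchable, immutable
        self.committed = 0    # segments covered by the last commit
        self.translog = []    # operations since the last commit

    def index(self, doc):
        self.translog.append(doc)   # logged so the change can be replayed
        self.buffer.append(doc)

    def refresh(self):
        """Lucene 'flush': a new segment becomes searchable, no fsync."""
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def flush(self):
        """Lucene 'commit': persist segments, then clear the translog."""
        self.refresh()
        self.committed = len(self.segments)   # stands in for fsync
        self.translog = []

    def search(self, term):
        return [doc for segment in self.segments for doc in segment
                if term in doc["text"].split()]

    def recover(self):
        """After a crash: keep committed segments, replay the translog."""
        self.segments = self.segments[:self.committed]
        self.buffer = []
        replay, self.translog = self.translog, []
        for doc in replay:
            self.index(doc)

shard = Shard()
shard.index({"_id": 1, "text": "darling buds"})
before_refresh = shard.search("buds")    # still only in the buffer
shard.refresh()
after_refresh = shard.search("buds")
shard.flush()
shard.index({"_id": 2, "text": "day of delight"})
shard.recover()                          # simulated crash and restart
shard.refresh()
after_recovery = shard.search("delight")
```

The document indexed after the last flush survives the simulated crash only because it was still in the transaction log.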

Page 90: Scaling real-time search and analytics with Elasticsearch

refresh every second

Page 91: Scaling real-time search and analytics with Elasticsearch

near real-time search!

Page 92: Scaling real-time search and analytics with Elasticsearch

near real-time search!
near real-time analytics!

Page 93: Scaling real-time search and analytics with Elasticsearch

but…

Page 94: Scaling real-time search and analytics with Elasticsearch

too many segments
• slow searches
• poor term frequencies
• poor compression

Page 95: Scaling real-time search and analytics with Elasticsearch

step 7: reduce segments

Page 96: Scaling real-time search and analytics with Elasticsearch

searchable

Page 97: Scaling real-time search and analytics with Elasticsearch

searchable

Page 98: Scaling real-time search and analytics with Elasticsearch

searchable

Page 99: Scaling real-time search and analytics with Elasticsearch

searchable

Page 100: Scaling real-time search and analytics with Elasticsearch

searchable

Page 101: Scaling real-time search and analytics with Elasticsearch

searchable

Page 102: Scaling real-time search and analytics with Elasticsearch

searchable

Page 103: Scaling real-time search and analytics with Elasticsearch

searchable

Page 104: Scaling real-time search and analytics with Elasticsearch

searchable

Page 105: Scaling real-time search and analytics with Elasticsearch

searchable

Page 106: Scaling real-time search and analytics with Elasticsearch

searchable

Page 107: Scaling real-time search and analytics with Elasticsearch

searchable

Page 108: Scaling real-time search and analytics with Elasticsearch

merge process
• many small → one big
• removes deleted docs
• runs in background
• throttled
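
A simplified sketch of what a merge produces: several small segments become one, and deleted docs are left behind. Segments are modelled as plain dicts of `_id` to document, and letting a later segment overwrite an earlier copy of the same id is a simplification chosen for the example. Background scheduling and throttling are not modelled, and `merge_segments` is a hypothetical name.

```python
def merge_segments(segments, deleted_ids):
    """Combine many small segments into one, dropping deleted docs."""
    merged = {}
    for segment in segments:                 # oldest segment first
        for doc_id, doc in segment.items():
            if doc_id not in deleted_ids:
                merged[doc_id] = doc         # newer copy replaces older one
    return merged

# Hypothetical small segments and a set of deleted ids.
small_segments = [{1: "a", 2: "b"}, {3: "c"}, {2: "b2"}]
big_segment = merge_segments(small_segments, deleted_ids={1})
```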

Page 109: Scaling real-time search and analytics with Elasticsearch

but…

Page 110: Scaling real-time search and analytics with Elasticsearch

“Any wonder it broke down” by Brian Snelson is licensed under CC BY 2.0

Page 111: Scaling real-time search and analytics with Elasticsearch

sometimes you need another truck

Page 112: Scaling real-time search and analytics with Elasticsearch

step 8: scale out, not up

Page 113: Scaling real-time search and analytics with Elasticsearch

shard your data

Page 114: Scaling real-time search and analytics with Elasticsearch

shard your data
transparent in elasticsearch

Page 115: Scaling real-time search and analytics with Elasticsearch

many segments

Page 116: Scaling real-time search and analytics with Elasticsearch

many segments → one shard

Page 117: Scaling real-time search and analytics with Elasticsearch

many segments → one shard

many shards

Page 118: Scaling real-time search and analytics with Elasticsearch

many segments → one shard

many shards → one index

Page 119: Scaling real-time search and analytics with Elasticsearch

“node”
running instance of elasticsearch
≈ one server

Page 120: Scaling real-time search and analytics with Elasticsearch

“shard”
bucket of data
lives on one node
physical worker unit

Page 121: Scaling real-time search and analytics with Elasticsearch

“index”
logical namespace
points to one or more shards

Page 122: Scaling real-time search and analytics with Elasticsearch

“index”
logical namespace
points to one or more shards

shard = hash(_id) % no_of_shards
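
The routing formula can be tried out directly. In this sketch CRC32 from the standard library stands in for the hash function; that choice is an assumption made so the result is stable between runs (Python's built-in `hash()` of a string is randomized per process), and it is not the hash Elasticsearch uses. `shard_for` is a hypothetical helper name.

```python
import zlib

def shard_for(doc_id, no_of_shards):
    """shard = hash(_id) % no_of_shards, with CRC32 standing in for the hash."""
    return zlib.crc32(str(doc_id).encode("utf-8")) % no_of_shards

# The same _id always lands on the same shard, for writes and reads alike.
put_target = shard_for(1, 3)
get_target = shard_for(1, 3)
placement = {doc_id: shard_for(doc_id, 3) for doc_id in range(1, 11)}
```

Because the shard count is part of the formula, changing it would send existing ids to different shards.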

Page 123: Scaling real-time search and analytics with Elasticsearch

PUT doc _id:1

hash(1) % 3 ⇒ shard_2

node_A

shard_0

node_B

shard_1

node_C

shard_2

Page 124: Scaling real-time search and analytics with Elasticsearch

GET doc _id:2

hash(2) % 3 ⇒ shard_0

node_A

shard_0

node_B

shard_1

node_C

shard_2

Page 125: Scaling real-time search and analytics with Elasticsearch

Search all docs

shard = hash(_id) % no_of_shards

node_A

shard_0

node_B

shard_1

node_C

shard_2
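
A search cannot be routed by `_id`, so it has to ask every shard and merge the answers. The sketch below assumes three in-process dicts as shards, a raw term count as the score, and CRC32 as a stand-in hash; `search_all_shards` and the sample docs are made up for illustration.

```python
import heapq
import zlib

def shard_for(doc_id, no_of_shards):
    """Route a doc by id; CRC32 is a stand-in hash for this example."""
    return zlib.crc32(str(doc_id).encode("utf-8")) % no_of_shards

def search_all_shards(shards, term, size):
    """Query every shard, then merge the per-shard top hits by score."""
    per_shard = []
    for shard in shards:
        hits = [(text.split().count(term), doc_id)
                for doc_id, text in shard.items()
                if term in text.split()]
        per_shard.append(heapq.nlargest(size, hits))   # local top hits
    all_hits = (hit for hits in per_shard for hit in hits)
    return heapq.nlargest(size, all_hits)              # global top hits

# Hypothetical documents spread over three shards.
docs = {1: "buds buds darling", 2: "darling day", 3: "buds of may", 4: "death"}
shards = [{} for _ in range(3)]
for doc_id, text in docs.items():
    shards[shard_for(doc_id, 3)][doc_id] = text

top = search_all_shards(shards, "buds", size=2)   # (score, _id) pairs
```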

Page 126: Scaling real-time search and analytics with Elasticsearch

step 9: scaling elastically

Page 127: Scaling real-time search and analytics with Elasticsearch

start small

node_A

shard_0

shard_1

shard_2

Page 128: Scaling real-time search and analytics with Elasticsearch

add more nodes

node_A

shard_0

shard_1

shard_2

node_B node_C

Page 129: Scaling real-time search and analytics with Elasticsearch

shards migrate

node_A

shard_0

shard_1

shard_2

node_B

shard_1

node_C

shard_2

Page 130: Scaling real-time search and analytics with Elasticsearch

rebalanced

node_A

shard_0

node_B

shard_1

node_C

shard_2
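
The migration shown on the last few slides can be imitated with a greedy loop that moves shards from the fullest node to the emptiest until the counts differ by at most one. This is only an illustration of the idea: the `rebalance` function is hypothetical, and the real allocator weighs more than shard counts.

```python
def rebalance(allocation, nodes):
    """Move shards from the fullest node to the emptiest until balanced."""
    allocation = {node: list(allocation.get(node, [])) for node in nodes}
    moves = []
    while True:
        fullest = max(nodes, key=lambda n: len(allocation[n]))
        emptiest = min(nodes, key=lambda n: len(allocation[n]))
        if len(allocation[fullest]) - len(allocation[emptiest]) <= 1:
            return allocation, moves
        shard = allocation[fullest].pop()
        allocation[emptiest].append(shard)
        moves.append((shard, fullest, emptiest))

# Start small on node_A, then add node_B and node_C.
start = {"node_A": ["shard_0", "shard_1", "shard_2"]}
balanced, moves = rebalance(start, ["node_A", "node_B", "node_C"])
```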

Page 131: Scaling real-time search and analytics with Elasticsearch

add new index

node_A

shard_0

shard_1

node_B

shard_1

shard_2

node_C

shard_0

shard_2

Page 132: Scaling real-time search and analytics with Elasticsearch

but…

Page 133: Scaling real-time search and analytics with Elasticsearch

but…

more hardware?

Page 134: Scaling real-time search and analytics with Elasticsearch

but…

more hardware?

more hardware failure

Page 135: Scaling real-time search and analytics with Elasticsearch

at 3am on sunday…

node_A

shard_0

shard_1

node_B

shard_1

shard_2

node_C

shard_0

shard_2

Page 136: Scaling real-time search and analytics with Elasticsearch

boom!

node_A

shard_0

shard_1

node_B

shard_1

shard_2

node_C

shard_0

shard_2

Page 137: Scaling real-time search and analytics with Elasticsearch

step 10: add redundancy

Page 138: Scaling real-time search and analytics with Elasticsearch

for every shard…

make a copy

Page 139: Scaling real-time search and analytics with Elasticsearch

“primary shard”

main shard

Page 140: Scaling real-time search and analytics with Elasticsearch

“replica shard(s)”

copy of primary shard

Page 141: Scaling real-time search and analytics with Elasticsearch

one node

node_A

P0

P1

P2

Page 142: Scaling real-time search and analytics with Elasticsearch

add a node

node_A

P0

P1

P2

node_B

Page 143: Scaling real-time search and analytics with Elasticsearch

add a node

node_A

P0

P1

P2

node_B

R0

R1

R2

Page 144: Scaling real-time search and analytics with Elasticsearch

redundancy

node_A

P0

P1

P2

node_B

R0

R1

R2
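The one node, add a node, redundancy sequence can be sketched as follows. The rule modelled here is that a replica is never placed on the node that holds its primary. Choosing the least-loaded other node is an assumption, and `assign_replicas` is a hypothetical name.

```python
def assign_replicas(primaries):
    """primaries: dict of node -> list of shard numbers held as primary.

    Returns (allocation, unassigned). A replica only goes to a node that
    does not already hold the primary of the same shard.
    """
    allocation = {node: [f"P{s}" for s in held] for node, held in primaries.items()}
    unassigned = []
    for node, held in primaries.items():
        for s in held:
            candidates = [n for n in allocation if n != node]
            if not candidates:
                # With a single node there is nowhere safe to put the copy
                unassigned.append(f"R{s}")
                continue
            # Assumption: pick the other node currently holding the fewest copies
            target = min(candidates, key=lambda n: len(allocation[n]))
            allocation[target].append(f"R{s}")
    return allocation, unassigned


one_node = assign_replicas({"node_A": [0, 1, 2]})
two_nodes = assign_replicas({"node_A": [0, 1, 2], "node_B": []})
```

On a single node the replicas stay unassigned. Once node_B joins, every shard has a copy on a second machine, which is the layout in the "redundancy" diagram.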

Page 145: Scaling real-time search and analytics with Elasticsearch

add a node

node_A

P0

P1

P2

node_B

R0

R1

R2

node_C

Page 146: Scaling real-time search and analytics with Elasticsearch

add a node

node_A

P0

P1

P2

node_B

R0

R1

R2

R1

node_C

P0

Page 147: Scaling real-time search and analytics with Elasticsearch

rebalanced

node_A

P0

P1

P2

node_B

R0

R1

R2

node_C

P0

Page 148: Scaling real-time search and analytics with Elasticsearch

lose a node

node_A

P0

P1

P2

node_B

R0

R2

R1

node_C

P0

Page 149: Scaling real-time search and analytics with Elasticsearch

replica ⇒ primary

node_A

P0

P1

P2

node_B

R0

R2

Page 150: Scaling real-time search and analytics with Elasticsearch

replica ⇒ primary

node_A

P0

P1

P2

node_B

P0

R2

Page 151: Scaling real-time search and analytics with Elasticsearch

allocate replicas

node_A

P0

P1

P2

node_B

P0

R2

R0

R1

Page 152: Scaling real-time search and analytics with Elasticsearch

rebalanced

node_A

P0

P1

P2

node_B

P0

R2

R0

R1
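The lose a node, replica ⇒ primary, allocate replicas sequence can be sketched as below. The starting layout is an illustrative assumption rather than a copy of the diagrams, and `lose_node` and `missing_replicas` are hypothetical names.

```python
def lose_node(allocation, dead_node):
    """Drop dead_node and promote a surviving replica for each lost primary."""
    survivors = {n: list(copies) for n, copies in allocation.items() if n != dead_node}
    for copy in allocation[dead_node]:
        if not copy.startswith("P"):
            continue  # a lost replica needs no promotion, only re-allocation later
        replica = "R" + copy[1:]
        for copies in survivors.values():
            if replica in copies:
                copies[copies.index(replica)] = copy  # replica => primary
                break
    return survivors


def missing_replicas(allocation, no_of_shards):
    """Replicas that must be allocated again to restore redundancy."""
    present = {copy for copies in allocation.values() for copy in copies}
    return [f"R{s}" for s in range(no_of_shards) if f"R{s}" not in present]


# Assumed layout: three primaries with one replica each, spread over three nodes
cluster = {
    "node_A": ["P1", "R0"],
    "node_B": ["P2", "R1"],
    "node_C": ["P0", "R2"],
}
after_failure = lose_node(cluster, "node_C")
to_allocate = missing_replicas(after_failure, 3)
```

After node_C fails, R0 on node_A takes over as P0, and the cluster still has to allocate fresh copies of R0 and R2 on the surviving nodes.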

Page 153: Scaling real-time search and analytics with Elasticsearch

primary shard

• just a role

• receives doc changes first

• forwards new doc to replicas in parallel

• number of primaries fixed
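The write path in the bullets above (primary first, then replicas in parallel) can be sketched as follows. This is a toy in-memory model using a thread pool. `ShardCopy` and `index_doc` are hypothetical names, and real replication also has to handle failures and versioning.

```python
from concurrent.futures import ThreadPoolExecutor


class ShardCopy:
    """A primary or replica copy of one shard, reduced to a dict of docs."""

    def __init__(self, name):
        self.name = name
        self.docs = {}

    def apply(self, doc_id, doc):
        self.docs[doc_id] = doc
        return self.name  # acknowledge with the copy's name


def index_doc(primary, replicas, doc_id, doc):
    """Apply the change on the primary first, then fan out to replicas in parallel."""
    acks = [primary.apply(doc_id, doc)]
    if replicas:
        with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
            futures = [pool.submit(r.apply, doc_id, doc) for r in replicas]
            acks.extend(f.result() for f in futures)
    return acks


primary = ShardCopy("P0")
replicas = [ShardCopy("R0_a"), ShardCopy("R0_b")]
acks = index_doc(primary, replicas, "1", {"title": "hello"})
```

The request is only reported as done once every copy has acknowledged, so a later read from any copy sees the document.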

Page 154: Scaling real-time search and analytics with Elasticsearch

replica shard

• copy of primary shard

• serves read/search requests

• number of replicas can be changed

• more replicas → more read throughput

*if you have more hardware*
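The difference between the fixed number of primaries and the adjustable number of replicas shows up in the index settings. The sketch below only builds request descriptions and sends nothing. The index name `blogs` and the helper names are hypothetical, and the paths and setting names (`number_of_shards`, `number_of_replicas`) are given as I understand the REST API, so check them against the documentation for your version.

```python
import json


def create_index_request(index, no_of_shards, no_of_replicas):
    # number_of_shards is chosen once, when the index is created
    body = {
        "settings": {
            "number_of_shards": no_of_shards,
            "number_of_replicas": no_of_replicas,
        }
    }
    return "PUT", f"/{index}", json.dumps(body, sort_keys=True)


def update_replicas_request(index, no_of_replicas):
    # number_of_replicas can be changed later on a live index
    body = {"number_of_replicas": no_of_replicas}
    return "PUT", f"/{index}/_settings", json.dumps(body, sort_keys=True)


def total_shard_copies(no_of_shards, no_of_replicas):
    # Each primary plus its replicas; extra copies only add read throughput
    # if there are enough nodes to host them on separate hardware
    return no_of_shards * (1 + no_of_replicas)


create = create_index_request("blogs", 3, 1)
scale_reads = update_replicas_request("blogs", 2)
```

Going from one replica to two raises the total from 6 to 9 shard copies, which is where the "if you have more hardware" caveat comes in.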

Page 155: Scaling real-time search and analytics with Elasticsearch

but…

Page 156: Scaling real-time search and analytics with Elasticsearch

but…

who controls all this?

Page 157: Scaling real-time search and analytics with Elasticsearch

step 11: the master node

Page 158: Scaling real-time search and analytics with Elasticsearch

“Master Yoda” by Gonzalo Martín is licensed under CC BY-SA 2.0

Page 159: Scaling real-time search and analytics with Elasticsearch

“node”

running instance of Elasticsearch

Page 160: Scaling real-time search and analytics with Elasticsearch

“node”

running instance of Elasticsearch

node_A

Page 161: Scaling real-time search and analytics with Elasticsearch

“cluster”

one or more nodes with same cluster name working together

Page 162: Scaling real-time search and analytics with Elasticsearch

“cluster”

node_A node_B

Page 163: Scaling real-time search and analytics with Elasticsearch

node_A node_B node_C

discover a cluster

with multicast/unicast

Page 164: Scaling real-time search and analytics with Elasticsearch

node_A node_B node_C

discover a cluster

with multicast/unicast

Page 165: Scaling real-time search and analytics with Elasticsearch

node_A node_B node_C

request routing

send request to any node

Page 166: Scaling real-time search and analytics with Elasticsearch

node_A node_B node_C

request routing

forwards to correct node

Page 167: Scaling real-time search and analytics with Elasticsearch

how?

Page 168: Scaling real-time search and analytics with Elasticsearch

how?

every node knows where every document is

Page 169: Scaling real-time search and analytics with Elasticsearch

cluster state

every node knows where every document is

Page 170: Scaling real-time search and analytics with Elasticsearch

cluster state

cluster level information

Page 171: Scaling real-time search and analytics with Elasticsearch

cluster state

cluster level information

indices ⇔ shards ⇔ nodes
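The indices ⇔ shards ⇔ nodes mapping is what lets any node route a request. A minimal sketch follows, assuming a hypothetical index named `blogs`, a plain dict as the cluster state, and `zlib.crc32` as a stand-in hash. `node_for` and `handle_get` are hypothetical names.

```python
import zlib

# Assumed shape: index -> shard number -> node currently holding that shard
CLUSTER_STATE = {
    "blogs": {
        0: "node_A",
        1: "node_B",
        2: "node_C",
    }
}


def node_for(cluster_state, index, doc_id):
    """Work out which shard, and therefore which node, owns a document."""
    shards = cluster_state[index]
    shard = zlib.crc32(doc_id.encode("utf-8")) % len(shards)  # stand-in hash
    return shard, shards[shard]


def handle_get(receiving_node, cluster_state, index, doc_id):
    """Any node can take the request; it serves it or forwards it."""
    shard, owner = node_for(cluster_state, index, doc_id)
    if owner == receiving_node:
        return f"{receiving_node} serves {index}/{doc_id} from shard_{shard}"
    return f"{receiving_node} forwards {index}/{doc_id} to {owner} (shard_{shard})"


shard, owner = node_for(CLUSTER_STATE, "blogs", "42")
reply = handle_get("node_A", CLUSTER_STATE, "blogs", "42")
```

Since every node holds the same copy of the cluster state, each one computes the same owner for a given `_id`, so the client does not need to know the layout.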

Page 172: Scaling real-time search and analytics with Elasticsearch

cluster state

can only be updated by the master node

Page 173: Scaling real-time search and analytics with Elasticsearch

node_A

master node

elected when cluster forms

Page 174: Scaling real-time search and analytics with Elasticsearch

node_A node_B

master node

Page 175: Scaling real-time search and analytics with Elasticsearch

node_A node_B

master node

node_C

Page 176: Scaling real-time search and analytics with Elasticsearch

node_A node_B node_C

master node

Page 177: Scaling real-time search and analytics with Elasticsearch

node_A node_B node_C

master node

just a role

Page 178: Scaling real-time search and analytics with Elasticsearch

node_A node_B node_C

master node

re-elected if master fails

Page 179: Scaling real-time search and analytics with Elasticsearch

node_B node_C

master node

node_A

re-elected if master fails

Page 180: Scaling real-time search and analytics with Elasticsearch

node_B node_C

master node

re-elected if master fails
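Re-election after a master failure can be sketched with a deliberately simple rule. The "lowest node name wins" rule below is an assumption chosen for brevity, not the actual election algorithm, and `elect_master` and `Cluster` are hypothetical names.

```python
def elect_master(live_nodes):
    """Pick a master deterministically; here the lowest node name wins."""
    if not live_nodes:
        raise ValueError("no live nodes, cannot form a cluster")
    return min(live_nodes)


class Cluster:
    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.master = elect_master(self.nodes)  # elected when the cluster forms

    def node_failed(self, node):
        self.nodes.discard(node)
        if node == self.master:
            # Master is just a role: another live node takes it over
            self.master = elect_master(self.nodes)
        return self.master


cluster = Cluster(["node_A", "node_B", "node_C"])
first_master = cluster.master
after_c_fails = cluster.node_failed("node_C")   # a non-master fails: no election
after_a_fails = cluster.node_failed("node_A")   # the master fails: re-elect
```

Any deterministic rule that all live nodes agree on would serve for this illustration. Whichever node ends up holding the role does the same job.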

Page 181: Scaling real-time search and analytics with Elasticsearch

master node

only manages cluster level changes

Page 182: Scaling real-time search and analytics with Elasticsearch

master node

not doc-level get/put/search
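The split between cluster-level and doc-level work can be sketched as below. This is a toy model under stated assumptions: the cluster state is a versioned dict, and `Node`, `create_index`, `put_doc` and `NotMasterError` are hypothetical names.

```python
class NotMasterError(Exception):
    """Raised when a non-master node tries to change the cluster state."""


class Node:
    def __init__(self, name, is_master=False):
        self.name = name
        self.is_master = is_master
        self.cluster_state = {"version": 0, "indices": {}}
        self.docs = {}


def create_index(node, all_nodes, index, no_of_shards):
    """Cluster-level change: only the master builds and publishes a new state."""
    if not node.is_master:
        raise NotMasterError(f"{node.name} is not the master")
    new_state = {
        "version": node.cluster_state["version"] + 1,
        "indices": {**node.cluster_state["indices"], index: no_of_shards},
    }
    for peer in all_nodes:
        peer.cluster_state = new_state  # every node receives the same state
    return new_state


def put_doc(node, doc_id, doc):
    """Doc-level change: handled by the node itself, the master is not involved."""
    node.docs[doc_id] = doc


node_a = Node("node_A", is_master=True)
node_b = Node("node_B")
nodes = [node_a, node_b]
create_index(node_a, nodes, "blogs", 3)
put_doc(node_b, "1", {"title": "hello"})
```

Keeping get/put/search off the master means it does not sit in the path of document traffic.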

Page 183: Scaling real-time search and analytics with Elasticsearch

the result?

Page 184: Scaling real-time search and analytics with Elasticsearch

distributed real-time search & analytics

Page 185: Scaling real-time search and analytics with Elasticsearch

which works in the same way on your laptop…

Page 186: Scaling real-time search and analytics with Elasticsearch

…as on your 1,000 node cluster

Page 187: Scaling real-time search and analytics with Elasticsearch

who is using it?

Page 188: Scaling real-time search and analytics with Elasticsearch

• full text search

• highlighted search snippets

• search-as-you-type

• did-you-mean suggestions

Page 189: Scaling real-time search and analytics with Elasticsearch

• combine visitor logs with social network data

• real-time feedback to editors

Page 190: Scaling real-time search and analytics with Elasticsearch

• combines full text search with geolocation

• uses more-like-this to find related questions and answers

Page 191: Scaling real-time search and analytics with Elasticsearch

• search repositories, users, issues, pull requests

• search 130 billion lines of code

• track all alerts, events, logs

Page 192: Scaling real-time search and analytics with Elasticsearch

• index and analyse 5TB of log data every day

Page 193: Scaling real-time search and analytics with Elasticsearch

thank you

@clintongormley

elasticsearch.org/downloads

elasticsearch.com/support

elasticsearch.com/jobs