99 problems but the search aint one, confoo 2011, andrei zmievski

99 Problems, But The Search Ain’t One Andrei Zmievski • ConFoo • Mar 9, 2011


Transcript of 99 problems but the search aint one, confoo 2011, andrei zmievski

Page 1: 99 problems but the search aint one, confoo 2011, andrei zmievski

99 Problems, But The Search Ain’t One

Andrei Zmievski • ConFoo • Mar 9, 2011

Page 2: 99 problems but the search aint one, confoo 2011, andrei zmievski

who am i?

curl http://localhost:9200/speaker/info/andrei

{
    "name": "Andrei Zmievski",
    "works": "Analog Co-op",
    "projects": ["PHP", "PHP-GTK", "Smarty", "Unicode/i18n"],
    "likes": ["coding", "beer", "brewing", "photography"],
    "twitter": "@a",
    "email": "[email protected]"
}

Page 3: 99 problems but the search aint one, confoo 2011, andrei zmievski

what is elasticsearch?

a search engine for the NoSQL generation

domain-driven

distributed

RESTful

Hitchhiker’s Guide to the Galaxy (no, really)

Page 4: 99 problems but the search aint one, confoo 2011, andrei zmievski

document model

document-oriented

JSON-based

schema-free

Page 5: 99 problems but the search aint one, confoo 2011, andrei zmievski

engine

based on Lucene

multi-tenancy

distributed, out of the box

Page 6: 99 problems but the search aint one, confoo 2011, andrei zmievski

3 easy steps

Page 7: 99 problems but the search aint one, confoo 2011, andrei zmievski

1. index

request:

curl http://localhost:9200/conf/speaker/1 -d '
{
    "name": "Andrei Zmievski",
    "talk": "99 Problems, but the Search Ain'\''t One",
    "likes": ["coding", "beer", "photography"],
    "twitter": "a",
    "height": 1??
}'

response:

{
    "ok": true,
    "_index": "conf",
    "_type": "speaker",
    "_id": "1"
}

Page 8: 99 problems but the search aint one, confoo 2011, andrei zmievski

2. search

request:

curl http://localhost:9200/conf/speaker/_search?q=beer

response:

{ "took" : ?,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.?90??09,
    "hits" : [ {
      "_index" : "conf",
      "_type" : "speaker",
      "_id" : "1",
      "_score" : 0.?90??09,
      "_source" :
{
    "name": "Andrei Zmievski",
    "talk": "99 Problems, but the Search Ain't One",
    "likes": ["coding", "beer", "photography"],
    "twitter": "a",
    "height": 1??
} } ] } }

Page 9: 99 problems but the search aint one, confoo 2011, andrei zmievski

2. search

request: curl http://localhost:9200/conf/speaker/_search?q=beer

response (highlighted):

    "total" : 1,

total number of hits

Page 10: 99 problems but the search aint one, confoo 2011, andrei zmievski

2. search

request: curl http://localhost:9200/conf/speaker/_search?q=beer

response (highlighted):

    "_index" : "conf",

the index of the doc

Page 11: 99 problems but the search aint one, confoo 2011, andrei zmievski

2. search

request: curl http://localhost:9200/conf/speaker/_search?q=beer

response (highlighted):

    "_type" : "speaker",

the type of the doc

Page 12: 99 problems but the search aint one, confoo 2011, andrei zmievski

2. search

request: curl http://localhost:9200/conf/speaker/_search?q=beer

response (highlighted):

    "_id" : "1",

the id of the doc

Page 13: 99 problems but the search aint one, confoo 2011, andrei zmievski

2. search

request: curl http://localhost:9200/conf/speaker/_search?q=beer

response (highlighted):

    "_score" : 0.?90??09,

the hit score

Page 14: 99 problems but the search aint one, confoo 2011, andrei zmievski

2. search

request: curl http://localhost:9200/conf/speaker/_search?q=beer

response (highlighted):

    "_source" :
    {
        "name": "Andrei Zmievski",
        "talk": "99 Problems, but the Search Ain't One",
        "likes": ["coding", "beer", "photography"],
        "twitter": "a",
        "height": 1??
    }

the original doc contents

Page 15: 99 problems but the search aint one, confoo 2011, andrei zmievski

2. search

request: curl http://localhost:9200/conf/speaker/_search?q=beer

response (highlighted):

    "took" : ?,

the execution time

Page 16: 99 problems but the search aint one, confoo 2011, andrei zmievski

3. profit

that’s up to you

Page 17: 99 problems but the search aint one, confoo 2011, andrei zmievski

demo

Page 18: 99 problems but the search aint one, confoo 2011, andrei zmievski

distributed model

provides:

performance

resiliency (high-availability)

Page 19: 99 problems but the search aint one, confoo 2011, andrei zmievski

shards

a portion of the document space

each one is a separate Lucene index

thus, many per-index settings are available

document is sharded by its _id value

but can be assigned (routed) to a shard deterministically
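A minimal sketch of the routing idea on this slide: hash the routing value (the _id unless a custom one is given) and take it modulo the shard count. The helper name `shard_for` is hypothetical, and `zlib.crc32` is only a stand-in; it is not claimed to be the hash elasticsearch uses.

```python
import zlib


def shard_for(doc_id, number_of_shards, routing=None):
    """Pick a shard deterministically from the routing value (defaults to the _id)."""
    key = routing if routing is not None else doc_id
    # crc32 is a stand-in hash chosen for illustration only.
    return zlib.crc32(str(key).encode("utf-8")) % number_of_shards


# Documents with different ids but the same routing value share a shard.
same_shard = shard_for("1", 2, routing="user-42") == shard_for("2", 2, routing="user-42")
print(same_shard)
```

Because the result depends only on the key and the shard count, any node can compute where a document lives without asking the others.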

Page 20: 99 problems but the search aint one, confoo 2011, andrei zmievski

zero-conf discovery

zen (multicast and unicast)

cloud (EC2 via API)

Page 21: 99 problems but the search aint one, confoo 2011, andrei zmievski

auto-routing

master node:

maintains cluster state

reassigns shards if nodes leave/join cluster

any node can process the search request

the query is handled via scatter-gather mechanism
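A toy sketch of scatter-gather, under loud assumptions: each shard is a plain dict, the score is just the term count, and `search_shard` and `scatter_gather` are hypothetical names. It only shows the shape of the mechanism, not how elasticsearch scores or merges.

```python
import heapq


def search_shard(shard_docs, term):
    """Score documents in one shard; here the score is the term count in the text."""
    hits = []
    for doc_id, text in shard_docs.items():
        score = text.lower().split().count(term)
        if score > 0:
            hits.append((score, doc_id))
    return hits


def scatter_gather(shards, term, size=10):
    """Send the query to every shard (scatter), then merge the partial hits (gather)."""
    partial = [search_shard(docs, term) for docs in shards]
    merged = [hit for hits in partial for hit in hits]
    # Keep the globally best `size` hits, highest score first.
    return heapq.nlargest(size, merged)


shards = [
    {"1": "thomas likes beer", "3": "andrei likes beer and more beer"},
    {"2": "thomas thomas writes php"},
]
print(scatter_gather(shards, "thomas"))
```

The node that receives the request plays the role of `scatter_gather`; the other nodes only ever run the per-shard part.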

Page 22: 99 problems but the search aint one, confoo 2011, andrei zmievski

replicas

each shard can have 1 or more replicas

# of replicas can be updated dynamically after index creation

replicas can be used for querying in parallel

Page 23: 99 problems but the search aint one, confoo 2011, andrei zmievski

shard allocation

[diagram: node 1]

start with a single node

Page 24: 99 problems but the search aint one, confoo 2011, andrei zmievski

shard allocation

PUT /person { "index": { "number_of_shards": 2, "number_of_replicas": 1 } }

[diagram: node 1 holds shards person1 and person2]

Page 25: 99 problems but the search aint one, confoo 2011, andrei zmievski

shard allocation

[diagram: node 1 holds person1 and person2; node 2 holds person1 and person2]

start the second node

Page 26: 99 problems but the search aint one, confoo 2011, andrei zmievski

shard allocation

[diagram: node 1, node 2, node 3, node 4; shards person1 and person2, two copies each]

start 2 more nodes

Page 27: 99 problems but the search aint one, confoo 2011, andrei zmievski

shard allocation

[diagram: node 1, node 2, node 3, node 4; shards person1 and person2, two copies each]

start 2 more nodes
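The allocation steps on these slides can be imitated with a small sketch. It is a toy balancer, not elasticsearch's allocator: `allocate` is a hypothetical helper that places each shard copy on the least-loaded node that does not already hold a copy of that shard, and leaves copies unassigned when no such node exists.

```python
def allocate(index, number_of_shards, number_of_replicas, nodes):
    """Return {node: [shard names]}; two copies of a shard never share a node."""
    allocation = {node: [] for node in nodes}
    copies = [
        f"{index}{shard}"
        for _ in range(1 + number_of_replicas)      # primary plus replicas
        for shard in range(1, number_of_shards + 1)
    ]
    for shard in copies:
        candidates = [n for n in nodes if shard not in allocation[n]]
        if not candidates:
            continue  # stays unassigned until another node joins
        target = min(candidates, key=lambda n: len(allocation[n]))
        allocation[target].append(shard)
    return allocation


for cluster in (["node 1"], ["node 1", "node 2"],
                ["node 1", "node 2", "node 3", "node 4"]):
    print(allocate("person", 2, 1, cluster))
```

With one node only the two primaries fit; with two nodes each holds both shards; with four nodes each ends up with a single shard copy.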

Page 28: 99 problems but the search aint one, confoo 2011, andrei zmievski

document sharding

[diagram: node 1, node 2, node 3, node 4; shards person1 and person2, two copies each]

PUT /person/info/1 { … }

Page 29: 99 problems but the search aint one, confoo 2011, andrei zmievski

document sharding

[diagram: node 1, node 2, node 3, node 4; shards person1 and person2, two copies each]

PUT /person/info/1 { … }

hashed to shard 1

Page 30: 99 problems but the search aint one, confoo 2011, andrei zmievski

document sharding

[diagram: node 1, node 2, node 3, node 4; shards person1 and person2, two copies each]

PUT /person/info/1 { … }

replicated

Page 31: 99 problems but the search aint one, confoo 2011, andrei zmievski

document sharding

[diagram: node 1, node 2, node 3, node 4; shards person1 and person2, two copies each]

PUT /person/info/2 { … }

Page 32: 99 problems but the search aint one, confoo 2011, andrei zmievski

document sharding

[diagram: node 1, node 2, node 3, node 4; shards person1 and person2, two copies each]

PUT /person/info/2 { … }

hashed to shard 2

Page 33: 99 problems but the search aint one, confoo 2011, andrei zmievski

document sharding

[diagram: node 1, node 2, node 3, node 4; shards person1 and person2, two copies each]

PUT /person/info/2 { … }

replicated

Page 34: 99 problems but the search aint one, confoo 2011, andrei zmievski

scatter-gather

[diagram: node 1, node 2, node 3, node 4; shards person1 and person2, two copies each]

GET /person/_search?q=name:thomas

Page 35: 99 problems but the search aint one, confoo 2011, andrei zmievski

shard allocation

[diagram: node 1, node 2, node 3, node 4; shards person1 and person2, two copies each]

GET /person/_search?q=name:thomas

Page 36: 99 problems but the search aint one, confoo 2011, andrei zmievski

shard allocation

[diagram: node 1, node 2, node 3, node 4; shards person1 and person2, two copies each]

GET /person/_search?q=name:thomas

Page 37: 99 problems but the search aint one, confoo 2011, andrei zmievski

shard allocation

[diagram: node 1, node 2, node 3, node 4; shards person1 and person2, two copies each]

GET /person/_search?q=name:thomas

Page 38: 99 problems but the search aint one, confoo 2011, andrei zmievski

demo

Page 39: 99 problems but the search aint one, confoo 2011, andrei zmievski

transactional model

per-document consistency

no need to commit/flush

uses write-ahead transaction log

write consistency (W) can be controlled

one, quorum, or all
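As a rough illustration of the three write consistency levels, the sketch below computes how many shard copies would need to be active before a write is accepted. It assumes a plain majority for "quorum"; the exact rule elasticsearch applies (especially with a single replica) may differ, and `required_copies` is a hypothetical name.

```python
def required_copies(consistency, number_of_replicas):
    """How many copies of a shard must be active before a write is accepted."""
    total = 1 + number_of_replicas  # the primary plus its replicas
    if consistency == "one":
        return 1
    if consistency == "quorum":
        return total // 2 + 1       # simple majority (an assumption)
    if consistency == "all":
        return total
    raise ValueError("unknown consistency level: %r" % (consistency,))


for level in ("one", "quorum", "all"):
    print(level, required_copies(level, number_of_replicas=2))
```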

Page 40: 99 problems but the search aint one, confoo 2011, andrei zmievski

(near) real-time search

1 second refresh rate by default

_refresh API also

Page 41: 99 problems but the search aint one, confoo 2011, andrei zmievski

index storage

node data considered transient

can be stored in local file system, JVM heap, native OS memory, or FS & memory combination

persistent storage requires a gateway

Page 42: 99 problems but the search aint one, confoo 2011, andrei zmievski

gateways

persistent store for cluster state and indices

asynchronous, translog-based write strategy

allows full recovery if a cluster restart is needed

supported gateways:

local

shared FS

Hadoop via HDFS

S3

Page 43: 99 problems but the search aint one, confoo 2011, andrei zmievski

mapping

describes document structure to the search engine

automatically created with sensible defaults

explicit mapping can be provided (generally, a good idea)

can run into merge conflicts

Page 44: 99 problems but the search aint one, confoo 2011, andrei zmievski

mapping

important meta fields:

_source

_all

there are more

Page 45: 99 problems but the search aint one, confoo 2011, andrei zmievski

mapping types

simple:

string, integer/long, float/double, boolean, and null

complex:

array, object

Page 46: 99 problems but the search aint one, confoo 2011, andrei zmievski

sample mapping

document:

{"user":      "derick",
 "title":     "Don’t Panic",
 "tags":      ["profiling", "debugging", "php"],
 "postDate":  "2010-12-22T1?:1?:12",
 "priority":  2}

mapping:

{"post": {
  "properties" : {
    "user":      {"type": "string", "index": "not_analyzed"},
    "message":   {"type": "string", "boost": 1.?},
    "tags":      {"type": "string", "include_in_all": "no"},
    "postDate" : {"type" : "date", "store": "no"},
    "priority" : {"type" : "integer"}
}}}

Page 47: 99 problems but the search aint one, confoo 2011, andrei zmievski

analyzers

break down (tokenize) and normalize fields during indexing and query strings at search time

analyzer = tokenizer + token filters (0 or more)

Standard Analyzer =
   Standard Tokenizer +
   Standard Token Filter +
   Lowercase Token Filter +
   Stop Token Filter
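The "tokenizer + token filters" composition can be sketched in a few lines. This is only an illustration of the chain, assuming a regex word splitter and a tiny sample stop list; it does not reproduce Lucene's Standard Analyzer, and all function names are hypothetical.

```python
import re

# Assumed sample stop list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "but", "is", "to"}


def tokenize(text):
    """Split text into word tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Normalize every token to lower case."""
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    """Drop tokens that appear in the stop list."""
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text, tokenizer=tokenize, filters=(lowercase_filter, stop_filter)):
    """Run the tokenizer, then each token filter in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("99 Problems, But The Search Ain't One"))
```

The same `analyze` call has to be applied to the query string at search time, otherwise the query terms would not line up with the indexed tokens.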

Page 48: 99 problems but the search aint one, confoo 2011, andrei zmievski

analyzers

analyzers, tokenizers, and filters can be customized

elasticsearch.yml:

index:
  analysis:
    analyzer:
      eulang:
        type: custom
        tokenizer: standard
        filter: [standard, lowercase, stop,
                 asciifolding, porterStem]

mapping:

…
"title": {"type": "string", "analyzer": "eulang"},
…

Page 49: 99 problems but the search aint one, confoo 2011, andrei zmievski

API

Page 50: 99 problems but the search aint one, confoo 2011, andrei zmievski

API conventions

append ?pretty=true to get readable JSON

boolean values: false/0/off = false, rest is true

JSONP support via callback parameter
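The boolean convention above is easy to get backwards, so here is a small sketch of it. `parse_bool` is a hypothetical helper and compares the raw parameter string exactly as listed on the slide; whether the server also ignores case is not stated and is not assumed here.

```python
FALSE_VALUES = {"false", "0", "off"}


def parse_bool(value):
    """Only 'false', '0' and 'off' are false; every other value counts as true."""
    return value not in FALSE_VALUES


for raw in ("true", "1", "off", "no"):
    print(raw, parse_bool(raw))
```

Note that a value such as "no" is not in the false list, so under this rule it is treated as true.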

Page 51: 99 problems but the search aint one, confoo 2011, andrei zmievski

API structure

http://host:port/[index]/[type]/[_action/id]

GET http://es:9200/_status

GET http://es:9200/twitter/_status

POST http://es:9200/twitter/tweet/1

GET http://es:9200/twitter/tweet/1

Page 52: 99 problems but the search aint one, confoo 2011, andrei zmievski

API structure

http://host:port/[index]/[type]/[_action/id]

GET http://es:9200/twitter/tweet/_search

GET http://es:9200/twitter/user/_search

GET http://es:9200/twitter/tweet,user/_search

GET http://es:9200/twitter,facebook/_search

GET http://es:9200/_search
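The URL pattern on these two slides can be captured by a small builder. `search_url` is a hypothetical helper; it joins multiple indices or types with commas as in the examples above, and it refuses types without an index simply because the slides show no such form.

```python
def search_url(host, indices=(), types=()):
    """Build a _search URL scoped to zero or more indices and types."""
    if types and not indices:
        raise ValueError("types need at least one index in this sketch")
    parts = [host.rstrip("/")]
    if indices:
        parts.append(",".join(indices))
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/".join(parts)


print(search_url("http://es:9200"))
print(search_url("http://es:9200", ["twitter"], ["tweet", "user"]))
print(search_url("http://es:9200", ["twitter", "facebook"]))
```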

Page 53: 99 problems but the search aint one, confoo 2011, andrei zmievski

API query example

{
    "query": {
        "filtered": {
            "query": {
                "query_string": {
                    "query": "foo bar",
                    "default_operator": "AND",
                    "fields": ["title", "description"],
                    "boost": 2.0
                }
            },
            "filter": {
                "range": {"date": {"gt": "2011-0?-09"}}
            }
        }
    },
    "from": 10,
    "size": 10
}
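The from/size pair at the end of the query is plain offset paging. A minimal sketch of what it selects from an already-sorted hit list, with `page` as a hypothetical helper name:

```python
def page(sorted_hits, from_=0, size=10):
    """Return the window of hits that from/size would select."""
    return sorted_hits[from_:from_ + size]


hits = ["doc%d" % i for i in range(25)]
print(page(hits, from_=10, size=10))   # second page of ten
print(page(hits, from_=20, size=10))   # last, shorter page
```

With "from": 10 and "size": 10, as in the query above, the response would carry the eleventh through twentieth best hits.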

Page 54: 99 problems but the search aint one, confoo 2011, andrei zmievski

API {core}

index

bulk

delete

delete by query

get

count

search

query

from/size paging

sort

highlighting

selective fields

Page 55: 99 problems but the search aint one, confoo 2011, andrei zmievski

API {indices}

create

delete

open/close

get/put/delete mapping

refresh

optimize

snapshot

update settings

analyze

status

flush

Page 56: 99 problems but the search aint one, confoo 2011, andrei zmievski

Query DSL

term / terms

range

prefix

bool

fuzzy

wildcard

query_string

default_operator

analyzer

phrase_slop

etc

Page 57: 99 problems but the search aint one, confoo 2011, andrei zmievski

filters

share some similar features with queries (term, range, etc)

why use a filter?

Page 58: 99 problems but the search aint one, confoo 2011, andrei zmievski

filters

faster than queries

cached (depends on the filter)

the cache is used for different queries against the same filter

no scoring

more useful ones: term, terms, range, prefix, and, or, not, exists, missing, query

Page 59: 99 problems but the search aint one, confoo 2011, andrei zmievski

facets

provide aggregated data based on the search request

terms, histogram, date histogram, range, statistical, and more
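A terms facet is essentially a count of field values over the documents that matched the search. The sketch below shows that idea with `collections.Counter`; `terms_facet` is a hypothetical helper working on plain dicts, not the elasticsearch response format.

```python
from collections import Counter


def terms_facet(hits, field, size=10):
    """Count how often each value of `field` occurs among the matching docs."""
    counts = Counter()
    for doc in hits:
        value = doc.get(field, [])
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts.most_common(size)


hits = [
    {"likes": ["coding", "beer"]},
    {"likes": ["beer"]},
    {"likes": "php"},
]
print(terms_facet(hits, "likes"))
```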

Page 60: 99 problems but the search aint one, confoo 2011, andrei zmievski

geo search

implemented as filters (and a facet)

geo_distance

geo_bounding_box

geo_polygon
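What a geo_distance filter does can be sketched as "keep documents within a radius of a point". The version below uses the haversine great-circle formula with a mean Earth radius of 6371 km; both choices are assumptions for illustration and say nothing about how elasticsearch computes distances. The function names and the `lat`/`lon` keys are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, assumed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_distance_filter(docs, center, max_km):
    """Keep only the docs whose location lies within max_km of center."""
    lat, lon = center
    return [d for d in docs if haversine_km(lat, lon, d["lat"], d["lon"]) <= max_km]


places = [
    {"name": "Montreal", "lat": 45.50, "lon": -73.57},
    {"name": "Toronto", "lat": 43.65, "lon": -79.38},
]
print([p["name"] for p in geo_distance_filter(places, (45.50, -73.57), 100)])
```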

Page 61: 99 problems but the search aint one, confoo 2011, andrei zmievski

interfaces

REST

Java / Groovy

Language clients (REST/Thrift):

pyes, PHP (standalone and symfony), Ruby, Perl

Flume sink implementation

Page 62: 99 problems but the search aint one, confoo 2011, andrei zmievski

data import

ES is not the primary data store (usually)

to import/synchronize data:

write an agent (Gearman, message queues, etc)

use rivers (CouchDB, RabbitMQ, Twitter)

Page 63: 99 problems but the search aint one, confoo 2011, andrei zmievski

10 more features

versioning

index aliases

parent/child docs

scripting

dynamic mapping templates

load balancing nodes

plugins

more_like_this

multi_field mapping

percolation