Craig Brown speaks on ElasticSearch


Transcript of Craig Brown speaks on ElasticSearch

Page 1: Craig Brown speaks on ElasticSearch

3 DBAs walk into a NOSQL bar. A little while later they walk out ..... because they couldn't find a table.

Page 2: Craig Brown speaks on ElasticSearch

3 NoSQL guys walk into a SQL bar. A little while later they leave ..... because they couldn't find a relationship.

Page 3: Craig Brown speaks on ElasticSearch

Objective – An understanding of ElasticSearch as a search engine and NoSQL datastore.

Page 4: Craig Brown speaks on ElasticSearch

About Me

Search Architect

Big Data/Hadoop Engineer

NoSQL Advocate

blog.nosqltips.com

@nosqltips on Twitter

Page 5: Craig Brown speaks on ElasticSearch

ElasticSearch

You know, for search

Created by Shay Banon (author of Compass)

Elasticsearch.org

Built on Lucene core (4.3 as of 0.90.3)

JSON over HTTP (REST)

Page 6: Craig Brown speaks on ElasticSearch

ElasticSearch

Very easy scalability – multicast, unicast, AWS

Open Source – Apache 2 license

https://github.com/elasticsearch

Transports – HTTP, memcached, thrift

Scripting – MVEL, JavaScript, Java, Python, Groovy

custom scoring, document updates

Page 7: Craig Brown speaks on ElasticSearch

Schema Free & Document Oriented

$ curl -XPUT http://localhost:9200/twitter/user/kimchy -d '{ "name" : "Shay Banon"}'

$ curl -XPUT http://localhost:9200/twitter/tweet/1 -d '{ "user": "kimchy", "post_date": "2009-11-15T13:12:00", "message": "Trying out elasticsearch, so far so good?"}'

$ curl -XPUT http://localhost:9200/twitter/tweet/2 -d '{ "user": "kimchy", "post_date": "2009-11-15T14:12:12", "message": "You know, for Search"}'

Page 8: Craig Brown speaks on ElasticSearch

Search

$ curl -XGET 'http://localhost:9200/twitter/tweet/_search?q=user:kimchy'

$ curl -XGET http://localhost:9200/twitter/tweet/_search -d '{ "query" : { "term" : { "user": "kimchy" } }}'

$ curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -d '{ "query" : { "range" : { "post_date" : { "from" : "2009-11-15T13:00:00", "to" : "2009-11-15T14:30:00" } } }}'

Page 9: Craig Brown speaks on ElasticSearch

GETting Some Data

$ curl -XPUT http://localhost:9200/twitter/tweet/2 -d '{ "user": "kimchy", "post_date": "2009-11-15T14:12:12", "message": "You know, for Search"}'

$ curl -XGET http://localhost:9200/twitter/tweet/2

Page 10: Craig Brown speaks on ElasticSearch

Schema Mapping

$ curl -XPUT http://localhost:9200/twitter

$ curl -XPUT http://localhost:9200/twitter/user/_mapping -d '{ "properties" : { "name" : { "type" : "string" } }}'

Page 11: Craig Brown speaks on ElasticSearch

Multi Tenancy

$ curl -XPUT http://localhost:9200/kimchy

$ curl -XPUT http://localhost:9200/elasticsearch

$ curl -XPUT http://localhost:9200/elasticsearch/tweet/1 -d '{ "post_date": "2009-11-15T14:12:12", "message": "Zug Zug", "tag": "warcraft"}'

$ curl -XPUT http://localhost:9200/kimchy/tweet/1 -d '{ "post_date": "2009-11-15T14:12:12", "message": "Whatyouwant?", "tag": "warcraft"}'

$ curl -XGET 'http://localhost:9200/kimchy,elasticsearch/tweet/_search?q=tag:warcraft'

$ curl -XGET 'http://localhost:9200/_all/tweet/_search?q=tag:warcraft'

Page 12: Craig Brown speaks on ElasticSearch

Settings

$ curl -XPUT http://localhost:9200/elasticsearch/ -d '{ "settings" : { "number_of_shards" : 2, "number_of_replicas" : 3 }}'
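
As a rough sanity check on the settings above: each primary shard gets number_of_replicas extra copies, so the cluster has to place shards × (1 + replicas) shard copies in total. A minimal sketch of that arithmetic; total_shard_copies is an illustrative helper name, not an ElasticSearch command.

```shell
#!/bin/sh
# Total shard copies = primaries * (1 primary copy + its replicas).
# total_shard_copies is a hypothetical helper, not part of ElasticSearch.
total_shard_copies() {
    shards=$1
    replicas=$2
    echo $(( shards * (1 + replicas) ))
}

# Settings from the slide: 2 shards, 3 replicas.
total_shard_copies 2 3
```

For the slide's settings this comes to 8 shard copies. Since two copies of the same shard are not placed on the same node, at least 4 nodes would be needed before every replica can be assigned.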

Page 13: Craig Brown speaks on ElasticSearch

Distributed

Shards – write scale

Replicas – read scale, durability

Segments

Routing – index and search

Discovery – multicast, unicast, AWS

http://www.youtube.com/watch?v=l4ReamjCxHo
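
The routing bullet can be pictured as "hash the routing value, then take it modulo the number of primary shards". By default the routing value is the document ID. The sketch below is only an illustration: it assumes cksum as a stand-in hash, so the shard numbers it prints will not match what a real cluster computes, and shard_for is a made-up helper name.

```shell
#!/bin/sh
# Pick a shard for a routing value: hash(routing) mod number_of_shards.
# ASSUMPTION: cksum (a POSIX CRC) stands in for ElasticSearch's internal
# hash function, so these shard numbers will not match a real cluster.
shard_for() {
    routing=$1
    num_shards=$2
    hash=$(printf '%s' "$routing" | cksum | cut -d ' ' -f 1)
    echo $(( hash % num_shards ))
}

# Route by user name (custom routing) and by document ID (the default).
echo "kimchy -> shard $(shard_for kimchy 2)"
echo "1      -> shard $(shard_for 1 2)"
```

Because the result depends on the shard count, the number of primary shards is fixed when the index is created. Supplying the same routing value at search time lets a query go to one shard instead of all of them.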

Page 14: Craig Brown speaks on ElasticSearch

Consistency

Reads always consistent with real-time (RT) GET

View always consistent after write

Document searchable after a short delay (1s)

Write tunable – one, quorum, all
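
To make the three write levels concrete, here is a small sketch of how many shard copies would need to be available before a write goes ahead. It assumes quorum means a simple majority of all copies (the primary plus its replicas) and ignores any special cases; required_copies is a hypothetical helper name.

```shell
#!/bin/sh
# Shard copies that must be active before a write proceeds, per level.
# ASSUMPTION: quorum is a simple majority of all copies (primary + replicas).
# required_copies is a hypothetical helper; special cases are ignored.
required_copies() {
    level=$1
    replicas=$2
    copies=$(( replicas + 1 ))
    case "$level" in
        one)    echo 1 ;;
        quorum) echo $(( copies / 2 + 1 )) ;;
        all)    echo "$copies" ;;
        *)      echo "unknown level: $level" >&2; return 1 ;;
    esac
}

# With 3 replicas there are 4 copies of each shard.
for level in one quorum all; do
    echo "$level: $(required_copies "$level" 3)"
done
```

With 3 replicas this prints 1, 3, and 4 for one, quorum, and all respectively.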

Page 15: Craig Brown speaks on ElasticSearch

Gateway

Local/NFS

Amazon S3

Hadoop

Page 16: Craig Brown speaks on ElasticSearch

River

Twitter

CouchDB

MongoDB

RabbitMQ

Wikipedia

Logstash

Page 17: Craig Brown speaks on ElasticSearch

SOLR over ElasticSearch

Release synchronized with Lucene

Larger community

Larger tool set

Feature set a bit better

XML configuration

Page 18: Craig Brown speaks on ElasticSearch

ElasticSearch over SOLR

Natively distributed

JSON based

Dynamic, template, and defined schema

Returns source document by default

Avoids overhead of index commit after write

Mock SOLR interface

Rivers

Page 19: Craig Brown speaks on ElasticSearch

Who uses ElasticSearch

StumbleUpon

Mozilla Foundation

Sony Computer Entertainment

Infochimps

Foursquare

GitHub

Ataxo Social Insider

Sonian Inc.

Page 20: Craig Brown speaks on ElasticSearch

Demo

Page 21: Craig Brown speaks on ElasticSearch

Hadoop Integration

Hadoop as gateway storage

LoadFunc/StoreFunc – Pig/MapReduce

Hadoop streaming interface

Manual export and import of data
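
For the manual import route, one plausible approach is to reshape exported documents (one JSON document per line) into the line pairs the _bulk endpoint expects: an action line followed by the document itself. The sketch below assumes sequential line numbers are acceptable as document IDs; to_bulk is a hypothetical helper name.

```shell
#!/bin/sh
# Turn JSON documents (one per line on stdin) into _bulk request lines.
# ASSUMPTIONS: to_bulk is a hypothetical helper; IDs are just line numbers.
to_bulk() {
    index=$1
    type=$2
    id=0
    while IFS= read -r doc; do
        id=$(( id + 1 ))
        # Action line first, then the document source on its own line.
        printf '{"index":{"_index":"%s","_type":"%s","_id":"%s"}}\n' "$index" "$type" "$id"
        printf '%s\n' "$doc"
    done
}

printf '%s\n' \
    '{"user":"kimchy","message":"Zug Zug"}' \
    '{"user":"kimchy","message":"Whatyouwant?"}' | to_bulk twitter tweet
```

Against a running node, the output could be piped to something like curl -XPOST http://localhost:9200/_bulk --data-binary @- so that the newlines are preserved.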

Page 22: Craig Brown speaks on ElasticSearch

ES in Big Data

Endpoint for processed data

Aggregator for BI or dashboard (facets)

Used to query reduced data sets for machine learning algorithms

Data storage engine in its own right plus full search capabilities
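
As an illustration of the dashboard use case, a terms facet counts documents per distinct value of a field, which is roughly what a BI widget needs. The sketch below only builds the request body, using the facets syntax of the 0.90 line; terms_facet_body is a hypothetical helper, and "tags" is an arbitrary facet name chosen here.

```shell
#!/bin/sh
# Build a search body with a terms facet (facets API of the 0.90 line).
# terms_facet_body is a hypothetical helper; the facet name is arbitrary.
terms_facet_body() {
    name=$1
    field=$2
    printf '{"query":{"match_all":{}},"facets":{"%s":{"terms":{"field":"%s"}}}}\n' "$name" "$field"
}

body=$(terms_facet_body tags tag)
echo "$body"

# Against a running node this could be sent with, for example:
#   curl -XGET 'http://localhost:9200/_all/tweet/_search?pretty=true' -d "$body"
```

Run against the warcraft-tagged tweets from the multi tenancy slide, the response would be expected to carry a per-tag count alongside the hits.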

Page 23: Craig Brown speaks on ElasticSearch

ElasticStore

Make ES look and function more like a document store while exposing advanced ES features

Influenced by Mongo API

Expose a simpler, more programmer-centric API

Expose a QueryBuilder-style API (HQL)

Expose annotations for easier schema definition, properties, analyzers, etc.

Allow both strong and weak object mapping

https://github.com/nosqltips/elasticstore

Page 24: Craig Brown speaks on ElasticSearch

Resources

www.elasticsearch.org

www.elasticsearch.com

https://github.com/elasticsearch

http://lucene.apache.org/core