Transit from SQL to Elastic Search - Meetup (files.meetup.com/19156515/ElasticSearch_Session1.pdf)

Post on 22-May-2020

Transit from SQL to Elastic Search By Manjunathan Raman

WHY ELASTIC SEARCH

Highly scalable open-source full-text search and analytics engine

Product Search, Log Analysis (ELK), Search As you Type, Did you mean

Usability

Schema-less

Default Lucene Standard Analyzer

Fuzzy, Facets/aggregations, Histogram, Filter cache, date range, geo distance, boost, doc_values, paging

Being fast

Relevance Tuning

Percolate Search

Document Mapping

PUT my_index
{
  "mappings": {
    "user": {
      "_all": { "enabled": false },
      "properties": {
        "title": { "type": "string" },
        "name": { "type": "string" },
        "age": { "type": "integer" }
      }
    },
    "blogpost": {
      "properties": {
        "title": { "type": "string" },
        "body": { "type": "string" },
        "user_id": { "type": "string", "index": "not_analyzed" },
        "created": { "type": "date", "format": "strict_date_optional_time||epoch_millis" }
      }
    }
  }
}

Field Types

a simple type like string, date, long, double, boolean or ip.

a type which supports the hierarchical nature of JSON such as object or nested.

or a specialised type like geo_point, geo_shape, or completion.

"index": "not_analyzed"

"index": "no"

Analyzer

Inverted Index
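The relationship between an analyzer, the "not_analyzed" setting and the inverted index can be illustrated with a small Python sketch. This is not Lucene's implementation: `analyze` is a hypothetical stand-in that only lowercases and splits on non-alphanumeric characters, and the sample documents are invented for illustration.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def build_inverted_index(docs, analyzed=True):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # A not_analyzed field is indexed as one exact, unmodified term.
        terms = analyze(text) if analyzed else [text]
        for term in terms:
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {1: "Trying out Elasticsearch", 2: "Elasticsearch is fast"}
analyzed_index = build_inverted_index(docs)
exact_index = build_inverted_index(docs, analyzed=False)

print(analyzed_index["elasticsearch"])       # [1, 2]
print(exact_index.get("elasticsearch"))      # None
print(exact_index["Elasticsearch is fast"])  # [2]
```

With the analyzed index a single word finds both documents; with the exact index only the full original value matches, which is the behaviour wanted for identifiers such as user_id.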

Query Related

Index

curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
  "user" : "kimchy",
  "post_date" : "2009-11-15T14:12:12",
  "message" : "trying out Elasticsearch"
}'

Get

curl -XGET 'http://localhost:9200/twitter/tweet/1'

Delete

curl -XDELETE 'http://localhost:9200/twitter/tweet/1'

Update

curl -XPUT localhost:9200/test/type1/1 -d '{
  "counter" : 1,
  "tags" : ["red"]
}'

Multi Get

curl 'localhost:9200/_mget' -d '{
  "docs" : [
    { "_index" : "test", "_type" : "type", "_id" : "1" },
    { "_index" : "test", "_type" : "type", "_id" : "2" }
  ]
}'

Bulk API

The bulk API makes it possible to perform many index/delete operations in a single API call.
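As a rough illustration, a bulk request body is newline-delimited JSON: one action line per operation, followed by a source line for index operations (delete has none), with a trailing newline at the end. The Python sketch below only builds such a body. The helper name `build_bulk_body` and the sample documents are assumptions made for this example, and actually sending the body to the `_bulk` endpoint is left out.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, metadata, source) triples into a newline-delimited body."""
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))   # action and metadata line
        if action != "delete":                     # delete carries no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"                 # the body must end with a newline


actions = [
    ("index", {"_index": "test", "_type": "type1", "_id": "1"},
     {"counter": 1, "tags": ["red"]}),
    ("delete", {"_index": "test", "_type": "type1", "_id": "2"}, None),
]
body = build_bulk_body(actions)
print(body)
```

Batching operations this way replaces many HTTP round trips with one, which is the main reason to prefer it for large loads.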

Query

{
  "fields" : ["tpnb", "name"],
  "query" : {
    "filtered": {
      "filter": {
        "bool" : {
          "must" : [
            { "term": { "store_price.store": 2396 } },
            { "term": { "tpnb": 50006459 } },
            { "term": { "price": "1.65" } }
          ]
        }
      }
    }
  }
}

{
  "query" : {
    "nested" : {
      "path" : "stores",
      "query" : {
        "filtered": {
          "filter": {
            "bool" : {
              "must" : [
                { "term": { "stores.store": 2396 } },
                { "term": { "stores.price": "1.73" } }
              ]
            }
          }
        }
      }
    }
  },
  "sort": [
    {
      "stores.availability": {
        "order": "asc",
        "mode": "min",
        "nested_filter" : {
          "bool" : {
            "must" : [
              { "term": { "stores.store": 2396 } },
              { "term": { "stores.price": "1.73" } }
            ]
          }
        }
      }
    },
    "popularity"
  ],
  "fields" : ["tpnb"],
  "aggs" : {
    "storesprice" : {
      "nested" : { "path" : "stores" },
      "aggs" : {
        "min_price" : { "min" : { "field" : "stores.price" } }
      }
    }
  }
}
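Why the second query uses "nested" can be shown with a small Python sketch. When inner objects are indexed as plain objects, their values are pooled per field, so a store from one entry can match together with a price from another. The nested type keeps each entry separate. The functions and the second store entry (2400) below are hypothetical and only mimic the matching semantics, not how Elasticsearch stores the data.

```python
product = {
    "tpnb": 50006459,
    "stores": [
        {"store": 2396, "price": "1.73"},
        {"store": 2400, "price": "1.65"},   # invented second entry for illustration
    ],
}


def flattened_match(doc, store, price):
    """Object-style matching: values of the inner objects are pooled per field."""
    stores = {s["store"] for s in doc["stores"]}
    prices = {s["price"] for s in doc["stores"]}
    return store in stores and price in prices


def nested_match(doc, store, price):
    """Nested-style matching: both conditions must hold within the same inner object."""
    return any(s["store"] == store and s["price"] == price for s in doc["stores"])


print(flattened_match(product, 2396, "1.65"))  # True: a cross-object false positive
print(nested_match(product, 2396, "1.65"))     # False
print(nested_match(product, 2396, "1.73"))     # True
```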

Distributed Storage

An index should be sharded proportionally with the anticipated growth. As more nodes are added to an Elasticsearch cluster, it does a good job at reallocating and moving shards around. As such, Elasticsearch is very easy to scale out.
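One reason to plan the shard count up front is that a document's shard is derived from a hash of its routing value (the document id by default) modulo the number of primary shards. The Python sketch below illustrates this under an assumption: `zlib.crc32` is only a stand-in for the hash function Elasticsearch uses internally, and `shard_for` is a hypothetical helper.

```python
import zlib


def shard_for(doc_id, num_primary_shards):
    """Pick a shard from a hash of the routing value (here: the document id)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


ids = [str(i) for i in range(1000)]
before = {i: shard_for(i, 5) for i in ids}
after = {i: shard_for(i, 6) for i in ids}
moved = sum(1 for i in ids if before[i] != after[i])
print(f"{moved} of {len(ids)} documents would map to a different shard")
```

Because changing the divisor remaps most documents, growth is normally absorbed by moving whole shards onto new nodes rather than by changing the number of primary shards of an existing index.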

Each shard contains multiple "segments", where a segment is an inverted index

While you are indexing documents, Elasticsearch collects them in memory.

Then, every second or so, it writes a new small segment to disk and "refreshes" the search.
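The buffer-then-refresh behaviour described above can be modelled in a few lines of Python. This is a toy model under simplifying assumptions: the `Shard` class is hypothetical, segments are plain dictionaries, nothing is written to disk, and refresh is called by hand instead of on a timer.

```python
class Shard:
    """Toy shard: an in-memory buffer plus a list of immutable, searchable segments."""

    def __init__(self):
        self.buffer = []     # indexed but not yet searchable
        self.segments = []   # each segment is a small inverted index

    def index(self, doc_id, text):
        self.buffer.append((doc_id, text))

    def refresh(self):
        """Turn the buffer into a new segment; existing segments are never modified."""
        if not self.buffer:
            return
        segment = {}
        for doc_id, text in self.buffer:
            for term in text.lower().split():
                segment.setdefault(term, set()).add(doc_id)
        self.segments.append(segment)
        self.buffer = []

    def search(self, term):
        hits = set()
        for segment in self.segments:   # a search consults every segment
            hits |= segment.get(term.lower(), set())
        return sorted(hits)


shard = Shard()
shard.index(1, "trying out Elasticsearch")
print(shard.search("elasticsearch"))   # [] because the buffer is not refreshed yet
shard.refresh()
shard.index(2, "Elasticsearch segments")
shard.refresh()
print(shard.search("elasticsearch"))   # [1, 2], served from two segments
```

The gap between indexing and the next refresh is why search is described as near real time rather than real time.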

Key Points

Write once

Query then fetch

Query vs. filter, and term vs. match

Nested, Inner Hits

Concurrency Control by document version

You can also specify the consistency level of index-operations, in terms of how many replicas must acknowledge the operation before returning

Elasticsearch has a concept of "query time" joining with parent/child-relations, and "index time" joining with nested types.

Index Vs Type

Alias – Filtered, Routing, multiple indices (array or pattern)

.scripts (groovy) - evaluated custom expression or Function score query

Mustache Template

PostMan, Head plugin
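The "Concurrency Control by document version" point above is a form of optimistic concurrency control: a writer states which version it read, and the write is rejected if the document has changed since. The Python sketch below shows the idea in memory only. `VersionedStore` and `VersionConflict` are hypothetical names, not part of any Elasticsearch client.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the document."""


class VersionedStore:
    def __init__(self):
        self.docs = {}   # doc_id -> (version, source)

    def get(self, doc_id):
        return self.docs[doc_id]

    def index(self, doc_id, source, expected_version=None):
        current_version = self.docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(f"expected {expected_version}, found {current_version}")
        self.docs[doc_id] = (current_version + 1, source)   # every write bumps the version
        return current_version + 1


store = VersionedStore()
store.index("1", {"counter": 1})                               # creates version 1
version, source = store.get("1")
store.index("1", {"counter": 2}, expected_version=version)     # succeeds, now version 2
try:
    store.index("1", {"counter": 3}, expected_version=version) # stale: still expects 1
except VersionConflict as err:
    print("conflict:", err)
```

No locks are held between the read and the write. The losing writer is expected to re-read the document and retry.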

Elastic Search - Limits

Architect your application with the right set of databases

Relational No SQL – Graph Oriented Database

Non-relational > Denormalized > Document Oriented

Isolation > Transaction > ACID > Distribution Transactions

Nearly Real Time

Robust – costly query – cancel

Split Brain – Data Loss

Security

Different Models - Legacy

Different Model – with Elasticsearch

Java Clients for Elastic Search

The Native Client – The application joins the Elasticsearch cluster as a node client, so it knows the cluster state and needs fewer hops to get a document from a specific shard on a node.

Jest – Lightweight client that uses the Elasticsearch REST API

Spring Data Elasticsearch – Has a similar feel to other Spring Data projects and goes one step further than Jest: you can annotate data objects with annotations such as @Id, @Field, @Document

References

https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

https://www.elastic.co/blog/found-elasticsearch-as-nosql

https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up

https://qbox.io/blog/optimizing-search-results-in-elasticsearch-with-scoring-and-boosting

https://www.elastic.co/blog/understanding-query-then-fetch-vs-dfs-query-then-fetch

http://stackoverflow.com/questions/15426441/understanding-segments-in-elasticsearch

Quiz

What query language does Elasticsearch use?

A. SQL
B. Query DSL
C. Query SSL
D. ElasticClient

What filter can be used to combine multiple filters?

A. Term
B. Range
C. Exists
D. Bool

"Synonym" is an example of:

A. Tokenizer
B. Analyzer
C. Token Filter
D. Character Filter

Quiz Answers

B. Query DSL - Query DSL is Elasticsearch's native query domain-specific language.

D. Bool, which combines multiple filters using must/should.

C. Token filter - a synonym token filter rewrites the token stream based on a configured list of synonyms.

E.g. "synonyms": ["british,english", "hen,chicken"]
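What a synonym token filter does with rules like the ones above can be sketched in Python. This is a simplified illustration, not the Lucene filter: `synonym_filter` is a hypothetical helper, the tokenizer is a plain whitespace split, and every token is simply expanded to its whole synonym group.

```python
def synonym_filter(tokens, synonym_rules):
    """Expand each token to its whole synonym group, using "a,b" style rules."""
    groups = {}
    for rule in synonym_rules:
        words = [w.strip() for w in rule.split(",")]
        for word in words:
            groups[word] = words
    expanded = []
    for token in tokens:
        expanded.extend(groups.get(token, [token]))   # unknown tokens pass through
    return expanded


rules = ["british,english", "hen,chicken"]
tokens = "the british hen".split()   # stand-in for the tokenizer's output
print(synonym_filter(tokens, rules))
# ['the', 'british', 'english', 'hen', 'chicken']
```

The filter runs after the tokenizer and works on individual tokens, which is why "synonym" is classed as a token filter and not as a tokenizer or character filter.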