Elasticsearch for SQL Users - Percona
Elasticsearch for SQL Users
Philipp Krenn @xeraa
Infrastructure | Developer Advocate
ViennaDB
Papers We Love Vienna
Agenda
Ecosystem
Architecture
Queries
Schema
Conclusion
Ecosystem
You Know, for Search
ELK Stack
Elastic Stack
Architecture
Apache Lucene
Library
Index, store, search
Elasticsearch
Distribution
REST
Query DSL
Index
Type
Document
Cluster
Index
Shard & Replica
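To make the shard idea concrete, here is a minimal sketch of how a document could be assigned to one of an index's primary shards. It assumes the usual "hash of the routing value modulo the number of primary shards" scheme; the function name shard_for is hypothetical, and MD5 merely stands in for the Murmur3 hash that Elasticsearch uses internally.

```python
import hashlib


def shard_for(routing_key: str, num_primary_shards: int) -> int:
    """Pick a primary shard for a document (toy version).

    Assumption: MD5 is a stand-in hash chosen because it ships with the
    standard library; it is not what Elasticsearch uses.
    """
    digest = hashlib.md5(routing_key.encode("utf-8")).digest()
    # Interpret the first four bytes as an unsigned integer, then wrap it
    # into the range of available primary shards.
    return int.from_bytes(digest[:4], "big") % num_primary_shards


if __name__ == "__main__":
    for doc_id in ["1", "2", "3"]:
        print(doc_id, "->", shard_for(doc_id, 5))
```

Because the shard number depends on the primary shard count, changing that count later would send existing documents to different shards, which is one reason it is fixed when the index is created.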
Topology
Write
Immutable Segments
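A rough sketch of what immutable segments imply for writes, assuming a heavily simplified model: every flush creates a new segment that is never rewritten, an update or delete only marks the older copy as deleted, and a merge rewrites the live documents into a fresh segment. The Segment and ToyIndex classes are hypothetical and are not Lucene's API.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    docs: tuple  # ((doc_id, source), ...), never modified after creation
    deleted: set = field(default_factory=set)  # tombstones, the only mutable part


class ToyIndex:
    def __init__(self):
        self.segments = []

    def flush(self, batch: dict):
        """Write a batch as one new segment; older copies get tombstones."""
        for doc_id in batch:
            self.delete(doc_id)
        self.segments.append(Segment(docs=tuple(batch.items())))

    def delete(self, doc_id):
        """Mark a document as deleted without touching the stored segment data."""
        for seg in self.segments:
            if any(existing_id == doc_id for existing_id, _ in seg.docs):
                seg.deleted.add(doc_id)

    def live_docs(self) -> dict:
        """Return every document that has no tombstone."""
        return {
            doc_id: source
            for seg in self.segments
            for doc_id, source in seg.docs
            if doc_id not in seg.deleted
        }

    def merge(self):
        """Rewrite all live documents into a single new segment."""
        live = self.live_docs()
        self.segments = [Segment(docs=tuple(live.items()))] if live else []
```

In this model an update costs a tombstone plus a new copy, and the space is only reclaimed when segments are merged.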
Schema-Free Is a Lie!
Dynamic mapping
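Dynamic mapping can be pictured as type inference on the first document that contains a field. The sketch below is a simplification under stated assumptions: the helper names infer_field_type and dynamic_mapping are hypothetical, and real Elasticsearch does more (for example date detection and an extra keyword sub-field for strings).

```python
def infer_field_type(value):
    """Guess a field type from a JSON value (toy rules)."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, list):
        # Arrays take the type of their first element; empty arrays map nothing.
        return infer_field_type(value[0]) if value else None
    if isinstance(value, dict):
        return "object"
    return None


def dynamic_mapping(doc, existing=None):
    """Extend a mapping with fields seen for the first time in doc."""
    mapping = dict(existing or {})
    for name, value in doc.items():
        inferred = infer_field_type(value)
        if inferred is not None:
            # The first document wins: an existing field type is never changed.
            mapping.setdefault(name, inferred)
    return mapping
```

Since the first document fixes the type, a later document with a conflicting value does not change the mapping, so there is a schema after all.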
CREATE TABLE IF NOT EXISTS emails (
  sender VARCHAR(255) NOT NULL,
  recipients TEXT,
  cc TEXT,
  bcc TEXT,
  subject VARCHAR(1024),
  body MEDIUMTEXT,
  datetime DATETIME
);
CREATE INDEX emails_sender ON emails(sender);
CREATE FULLTEXT INDEX emails_subject ON emails(subject);
CREATE FULLTEXT INDEX emails_body ON emails(body);
PUT /communication
{
  "mappings": {
    "email": {
      "properties": {
        "sender": { "type": "keyword" },
        "recipients": { "type": "keyword" },
        "cc": { "type": "keyword" },
        "bcc": { "type": "keyword" },
        "subject": { "type": "text", "analyzer": "english" },
        "body": { "type": "text", "analyzer": "english" }
      }
    }
  }
}
Beware of the Reindex
Queries
Let's Add Some Data
POST /communication/email
{
  "sender": "[email protected]",
  "recipients": [ "[email protected]" ],
  "cc": [],
  "subject": "Elasticsearch is pretty cool",
  "body": "Hey Philipp, this is great stuff. Check it out!"
}
Let's Add Some Data
POST /communication/email
{
  "sender": "[email protected]",
  "recipients": [ "[email protected]" ],
  "cc": [ "[email protected]" ],
  "subject": "Thanks for creating Elasticsearch",
  "body": "David pointed me to your project. It's awesome"
}
Let's Add Some Data
POST /communication/email
{
  "sender": "[email protected]",
  "recipients": [ "[email protected]" ],
  "cc": [],
  "subject": "We should hire Philipp",
  "body": "He is really into the project and will do great things."
}
Get All the Data
POST /communication/_search
{
  "query": {
    "match_all": { }
  }
}
Find great
POST /communication/_search
{
  "query": {
    "match": {
      "body": "great"
    }
  }
}
text vs keyword
text (default) is analyzed
keyword is not
Analyzed Text
PUT /cities/city/1
{
  "city": "Sandusky",
  "population": 25340
}
PUT /cities/city/2
{
  "city": "New Albany",
  "population": 8829
}
PUT /cities/city/3
{
  "city": "New York",
  "population": 8406000
}
Analyzed Text
Stop words, stemming, synonyms, fuzziness, ...
Analyzed Text
POST /cities/_search
{
  "query": {
    "match": {
      "city": "New Albany"
    }
  }
}
Not Analyzed Text
DELETE /cities
PUT /cities
{
  "mappings": {
    "city": {
      "properties": {
        "city": { "type": "keyword" }
      }
    }
  }
}
Not Analyzed Text
POST /cities/_search
{
  "query": {
    "match": {
      "city": "New Albany"
    }
  }
}
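Why the same "New Albany" query behaves differently on the two mappings can be sketched with a toy analyzer. This is a simplified model, assuming analysis is just lowercasing plus whitespace splitting and that a match query on a text field ORs its terms; the function names are hypothetical.

```python
CITIES = [
    {"city": "Sandusky", "population": 25340},
    {"city": "New Albany", "population": 8829},
    {"city": "New York", "population": 8406000},
]


def analyze(text):
    """Toy analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def match_text(docs, field_name, query):
    """Analyzed field: a document matches if it shares any term with the query."""
    query_terms = set(analyze(query))
    return [d for d in docs if query_terms & set(analyze(d[field_name]))]


def match_keyword(docs, field_name, query):
    """Keyword field: the whole value is a single term, so only exact values match."""
    return [d for d in docs if d[field_name] == query]


if __name__ == "__main__":
    print([d["city"] for d in match_text(CITIES, "city", "New Albany")])
    print([d["city"] for d in match_keyword(CITIES, "city", "New Albany")])
```

On the analyzed field the term "new" alone is enough for New York to match; on the keyword field only the exact value New Albany does. The sketch returns hits in input order and leaves out relevance scoring.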
Search for great stuff
POST /communication/_search
{
  "query": {
    "match": {
      "body": "great stuff"
    }
  }
}
Search for the Phrase great stuff
POST /communication/_search
{
  "query": {
    "match_phrase": {
      "body": "great stuff"
    }
  }
}
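The difference between match and match_phrase can be sketched on the three sample email bodies. This assumes a toy tokenizer (lowercase word characters, no stemming or stop words, unlike the english analyzer in the mapping) and hypothetical helper names.

```python
import re

BODIES = [
    "Hey Philipp, this is great stuff. Check it out!",
    "David pointed me to your project. It's awesome",
    "He is really into the project and will do great things.",
]


def tokens(text):
    """Toy tokenizer: lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def match(body, query):
    """match: any query term may occur anywhere in the body (OR)."""
    terms = tokens(body)
    return any(term in terms for term in tokens(query))


def match_phrase(body, query):
    """match_phrase: all query terms must occur next to each other, in order."""
    terms, phrase = tokens(body), tokens(query)
    if not phrase:
        return False
    n = len(phrase)
    return any(terms[i:i + n] == phrase for i in range(len(terms) - n + 1))


if __name__ == "__main__":
    print([match(b, "great stuff") for b in BODIES])
    print([match_phrase(b, "great stuff") for b in BODIES])
```

The first and third emails both contain "great", so both satisfy match; only the first contains the two words side by side and satisfies match_phrase.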
Multiple Fields
POST /communication/_search
{
  "query": {
    "multi_match": {
      "query": "Philipp",
      "fields": [ "subject", "body" ]
    }
  }
}
Boosting
Positive (>1) or negative (<1)
POST /communication/_search
{
  "query": {
    "multi_match": {
      "query": "Philipp",
      "fields": [ "subject^3", "body" ]
    }
  }
}
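A sketch of what the ^3 suffix does to ranking. The scoring here is deliberately naive (term count times boost, best field wins) and is an assumption made for illustration; it is not the relevance formula Elasticsearch applies. All function names are hypothetical.

```python
import re

DOCS = [
    {"subject": "Elasticsearch is pretty cool",
     "body": "Hey Philipp, this is great stuff. Check it out!"},
    {"subject": "Thanks for creating Elasticsearch",
     "body": "David pointed me to your project. It's awesome"},
    {"subject": "We should hire Philipp",
     "body": "He is really into the project and will do great things."},
]


def tokens(text):
    return re.findall(r"\w+", text.lower())


def parse_field(spec):
    """Split 'subject^3' into ('subject', 3.0); no suffix means boost 1.0."""
    name, _, boost = spec.partition("^")
    return name, (float(boost) if boost else 1.0)


def score(doc, query, field_specs):
    """Naive score: boost times term count, taking the best-scoring field."""
    term = query.lower()
    best = 0.0
    for spec in field_specs:
        name, boost = parse_field(spec)
        best = max(best, boost * tokens(doc.get(name, "")).count(term))
    return best


def search(docs, query, field_specs):
    """Return (score, subject) pairs for matching documents, best first."""
    scored = [(score(d, query, field_specs), d["subject"]) for d in docs]
    return sorted((s for s in scored if s[0] > 0), reverse=True)
```

With the boost, the email that mentions Philipp in its subject outranks the one that mentions him only in the body; without it, this toy scoring treats them as a tie.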
Fuzziness
POST /communication/_search
{
  "query": {
    "match": {
      "body": {
        "query": "awsome",
        "fuzziness": 1
      }
    }
  }
}
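Fuzziness is measured as an edit distance between terms. The sketch below uses plain Levenshtein distance (insert, delete, substitute) to show why "awsome" still finds "awesome" with a fuzziness of 1. It is a simplification: Elasticsearch by default also counts a swap of two adjacent characters as one edit, and the helper names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        cur = [i]
        for j, char_b in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                       # delete char_a
                cur[j - 1] + 1,                    # insert char_b
                prev[j - 1] + (char_a != char_b),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]


def fuzzy_match(term: str, candidate: str, fuzziness: int) -> bool:
    """True if candidate is within the allowed number of edits of term."""
    return edit_distance(term, candidate) <= fuzziness


if __name__ == "__main__":
    print(edit_distance("awsome", "awesome"))
    print(fuzzy_match("awsome", "great", 1))
```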
Lots More
Boolean (must, must_not, should, minimum_should_match)
Highlighting
Search and scroll
Geo
(Pipelined) Aggregations
Example: Group by day, sum, moving average
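The aggregation example (group by day, sum, then a moving average over the daily sums) can be sketched in plain Python. The event data is hypothetical, and the function names and window handling are assumptions for illustration rather than the exact semantics of Elasticsearch's pipeline aggregations.

```python
from collections import defaultdict

# Hypothetical (timestamp, value) events; ISO timestamps start with the date.
EVENTS = [
    ("2017-01-01T09:00", 1),
    ("2017-01-01T17:00", 2),
    ("2017-01-02T10:00", 3),
    ("2017-01-03T08:00", 6),
]


def sum_by_day(events):
    """Bucket events by calendar day and sum the values per bucket."""
    totals = defaultdict(float)
    for timestamp, value in events:
        totals[timestamp[:10]] += value
    return sorted(totals.items())


def moving_average(series, window):
    """Average each value with up to window - 1 preceding values."""
    averages = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


if __name__ == "__main__":
    daily = sum_by_day(EVENTS)
    print(daily)
    print(moving_average([total for _, total in daily], window=2))
```

The second step consumes the output of the first, which is the "pipelined" part: the moving average runs over the buckets, not over the raw documents.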
Schema
Application-Side Joins
PUT /blog/author/1
{
  "name": "Philipp",
  "bio": "..."
}
PUT /blog/post/1
{
  "author_id": 1,
  "title": "...",
  "body": "..."
}
PUT /blog/post/2
{
  "author_id": 1,
  "title": "...",
  "body": "..."
}
Application-Side Joins
POST /blog/author/_search
{
  "query": {
    "match": {
      "name": "Philipp"
    }
  }
}
POST /blog/post/_search
{
  "query": {
    "match": {
      "author_id": <each id from query 1 result>
    }
  }
}
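The two-step lookup of an application-side join can be sketched against in-memory stand-ins for the author and post types. The dictionaries, the second author, the post titles and the helper names are hypothetical; in a real application each search call would be a request to Elasticsearch.

```python
# Hypothetical stand-ins for /blog/author and /blog/post, keyed by document id.
AUTHORS = {
    1: {"name": "Philipp", "bio": "..."},
    2: {"name": "David", "bio": "..."},
}
POSTS = {
    1: {"author_id": 1, "title": "First post"},
    2: {"author_id": 1, "title": "Second post"},
    3: {"author_id": 2, "title": "Another post"},
}


def search(collection, field_name, value):
    """Stand-in for one search request: exact match on a single field."""
    return [(doc_id, doc) for doc_id, doc in collection.items()
            if doc.get(field_name) == value]


def posts_by_author_name(name):
    """Join in the application: find author ids first, then their posts."""
    author_ids = [doc_id for doc_id, _ in search(AUTHORS, "name", name)]
    posts = []
    for author_id in author_ids:  # one follow-up query per matching author
        posts.extend(search(POSTS, "author_id", author_id))
    return posts


if __name__ == "__main__":
    for post_id, post in posts_by_author_name("Philipp"):
        print(post_id, post["title"])
```

The cost of this approach is visible in the loop: the number of follow-up queries grows with the number of hits in the first query.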
Data Denormalization
PUT /blog/post/1
{
  "author_name": "Philipp",
  "title": "...",
  "body": "..."
}
PUT /blog/post/2
{
  "author_name": "Philipp",
  "title": "...",
  "body": "..."
}
Data Denormalization
POST /blog/post/_search
{
  "query": {
    "match": {
      "author_name": "Philipp"
    }
  }
}
Nested Objects
PUT /blog/author/1
{
  "name": "Philipp",
  "bio": "...",
  "blog_posts": [
    { "title": "...", "body": "..." },
    { "title": "...", "body": "..." },
    { "title": "...", "body": "..." }
  ]
}
Nested Objects
POST /blog/author/_search
{
  "query": {
    "match": {
      "name": "Philipp"
    }
  }
}
Parent-Child Documents
PUT /blog
{
  "mappings": {
    "author": {},
    "post": {
      "_parent": {
        "type": "author"
      }
    }
  }
}
Parent-Child Documents
PUT /blog/author/1
{
  "name": "Philipp",
  "bio": "..."
}
PUT /blog/post/1?parent=1
{
  "title": "...",
  "body": "..."
}
PUT /blog/post/2?parent=1
{
  "title": "...",
  "body": "..."
}
Parent-Child Documents
POST /blog/post/_search
{
  "query": {
    "has_parent": {
      "type": "author",
      "query": {
        "match": {
          "name": "Philipp"
        }
      }
    }
  }
}
Conclusion
Many Queries
Search, boolean, aggregate, geo, ...
text vs keyword
Schema
Application-side joins, denormalization, nesting, parent-child
Getting Data from Your RDBMS
RDBMS trigger
Logstash
Forked writes from your application
Can I use it as a primary datastore?
It depends
https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html
How fast is it?
Datastore, logs, metrics, analytics,...
Thanks!
Questions?
Philipp Krenn @xeraa
PS: Stickers
Elk horn: https://www.theexplora.com/the-irish-elk-megaloceros-giganteus/