Elasticsearch Introduction to Data model, Search & Aggregations
By alae-samer
Index / Type
- An index is a collection of documents that should be grouped together for a common reason.
- A type is a collection of documents that all share an identical (or very similar) schema. Mapping types were deprecated in Elasticsearch 7.0 and removed in 8.0.
Relationships
● Application Side Joins
● Parent-Child
● Nested objects
- Parent-child queries can be 5 to 10 times slower than the equivalent nested query!
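To make the nested vs. parent-child comparison concrete, here is a minimal sketch of the mappings and request bodies each approach uses, written as Python dicts. It assumes Elasticsearch 7 or later (typeless mappings and the join field type). The field names comments, comments.author, author and post_comment and the relation names post/comment are hypothetical. Sending the bodies to a cluster is left out.

```python
import json

# Hypothetical mappings (Elasticsearch 7+ typeless syntax).
NESTED_MAPPING = {
    "mappings": {"properties": {"comments": {"type": "nested"}}}
}
JOIN_MAPPING = {
    "mappings": {
        "properties": {
            # One join field declaring "post" as the parent of "comment".
            "post_comment": {"type": "join", "relations": {"post": "comment"}}
        }
    }
}


def nested_comment_query(author: str) -> dict:
    """Posts with at least one nested comment written by `author`."""
    return {
        "query": {
            "nested": {
                "path": "comments",
                "query": {"match": {"comments.author": author}},
            }
        }
    }


def has_child_comment_query(author: str) -> dict:
    """Parent posts with at least one child comment document written by `author`."""
    return {
        "query": {
            "has_child": {
                "type": "comment",
                "query": {"match": {"author": author}},
            }
        }
    }


print(json.dumps(nested_comment_query("alice"), indent=2))
print(json.dumps(has_child_comment_query("alice"), indent=2))
```

With nested objects the comments are stored in the same document as the post. With parent-child they are separate documents that the query has to join, which is where the extra cost comes from.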
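An application-side join runs two searches and joins the results in application code. The sketch below builds both request bodies as Python dicts. The users and blogposts indices and the name and user fields are hypothetical. The first response is a hand-written stand-in that has only the shape of a search response (hits.hits[]._id).

```python
import json


def find_users_query(name: str) -> dict:
    # Step 1: search the hypothetical "users" index by name.
    return {"query": {"match": {"name": name}}}


def posts_by_users_query(users_response: dict) -> dict:
    # Step 2: collect the IDs of the users found in step 1 ...
    user_ids = [hit["_id"] for hit in users_response["hits"]["hits"]]
    # ... and filter the hypothetical "blogposts" index on its "user" field.
    return {"query": {"bool": {"filter": [{"terms": {"user": user_ids}}]}}}


# Hand-written stand-in for the step-1 response (only the fields used here).
users_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"name": "John Smith"}},
            {"_id": "3", "_source": {"name": "John Doe"}},
        ]
    }
}

print(json.dumps(find_users_query("John"), indent=2))
print(json.dumps(posts_by_users_query(users_response), indent=2))
```

The price of this approach is the extra round trip, and a terms list that grows with the number of matching users.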
Searching
STRUCTURED SEARCH
A filter asks a yes/no question of every document and is used for fields that contain exact values:
- Is the date within the range 2012 to 2015?
- Is the status “Approved”?
- Is the language code “DE”?
UNSTRUCTURED SEARCH
A query calculates how relevant each document is to the query and assigns it a relevance score (_score), which is later used to sort the matching documents by relevance:
- Does the field contain the word “run”, possibly also matching “runs”, “running”, “jog”, or “sprint”?
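Both kinds of search are usually combined in one bool query. Clauses under must are scored. Clauses under filter only include or exclude documents and do not affect the score. Here is a minimal sketch as a Python dict. It assumes hypothetical fields: body (analyzed text), date (a date), and status and language (mapped as keyword, so term matches the exact value).

```python
import json


def build_search(text: str) -> dict:
    return {
        "query": {
            "bool": {
                # Unstructured search: full-text match, contributes to _score.
                "must": [{"match": {"body": text}}],
                # Structured search: yes/no filters on exact values, no scoring.
                "filter": [
                    {"range": {"date": {"gte": "2012-01-01", "lte": "2015-12-31"}}},
                    {"term": {"status": "Approved"}},
                    {"term": {"language": "DE"}},
                ],
            }
        }
    }


print(json.dumps(build_search("run"), indent=2))
```

Whether "run" also matches "running", "jog" or "sprint" depends on the analyzer configured for body (stemming, synonyms), not on the query itself.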
Unstructured Search (Full Text)
- Original text: Quick brown foxes leap over lazy dogs in summer
- Tokenize: Quick, brown, foxes, leap, over, lazy, dogs, in, summer
- Remove stop words: Quick, brown, foxes, leap, lazy, dogs, summer
- Stem: Quick, brown, fox, leap, lazy, dog, summer
- Lowercase and apply synonyms: fast, brown, fox, jump, lazy, dog, summer
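The analysis pipeline above can be sketched in plain Python. This is a toy analyzer, not Elasticsearch's own. The stop-word list, the synonym map and naive_stem (which only strips plural endings) are illustrative assumptions. It also lowercases first rather than last, which gives the same final tokens for this sentence.

```python
import re

STOP_WORDS = {"a", "an", "and", "in", "of", "over", "the"}  # tiny illustrative list
SYNONYMS = {"quick": "fast", "leap": "jump"}                # illustrative synonym map


def tokenize(text: str) -> list:
    # Split on anything that is not a letter.
    return re.findall(r"[A-Za-z]+", text)


def naive_stem(token: str) -> str:
    # Toy plural stripping, NOT a real stemmer such as Porter or Snowball.
    if token.endswith("es") and len(token) > 4:
        return token[:-2]
    if token.endswith("s") and len(token) > 3:
        return token[:-1]
    return token


def analyze(text: str) -> list:
    tokens = [t.lower() for t in tokenize(text)]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    tokens = [naive_stem(t) for t in tokens]
    return [SYNONYMS.get(t, t) for t in tokens]


print(analyze("Quick brown foxes leap over lazy dogs in summer"))
# ['fast', 'brown', 'fox', 'jump', 'lazy', 'dog', 'summer']
```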
- Fuzzy matching: tsar -> star (a transposition of two adjacent characters)
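A transposition is what separates Damerau-Levenshtein distance from plain Levenshtein distance. "tsar" and "star" differ by one transposition but by two substitutions. Below is a minimal sketch of the optimal string alignment variant, a restricted form of Damerau-Levenshtein. Elasticsearch's fuzzy query counts transpositions as single edits by default (its transpositions parameter), but this code is only an illustration and not its implementation.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions, substitutions
    and transpositions of adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


print(osa_distance("tsar", "star"))  # 1: swap "t" and "s"
```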
Inverted Index
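An inverted index maps each term to the list of documents that contain it (its postings list). Below is a minimal sketch in Python. It uses a naive lowercase-and-split tokenizer instead of a real analyzer, and the two sample documents are illustrative. Note that "dog" and "dogs" end up as separate terms, which is why the analysis steps above matter.

```python
import re
from collections import defaultdict


def build_inverted_index(docs: dict) -> dict:
    """Map each term to the sorted list of IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def docs_with_all_terms(index: dict, terms: list) -> list:
    """Intersect the postings lists of all query terms."""
    if not terms:
        return []
    sets = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*sets))


docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}
index = build_inverted_index(docs)
print(index["brown"])                                # [1, 2]
print(docs_with_all_terms(index, ["quick", "dog"]))  # [1]: doc 2 only has "dogs"
```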
Scoring & Relevance in Full-Text Search
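Relevance scoring is the algorithm used to calculate how similar the contents of a field are to a query. Elasticsearch used TF/IDF as its default similarity before version 5.0 and has used BM25 since then.
TF/IDF
Term Frequency
How often does the term appear in the field? The more often, the higher the weight.
Inverse Document Frequency
How often does each term appear in the index? The more often, the lower the weight, so common terms count for less than rare ones.
Field Length Norm
How long is the field? The shorter the field, the higher the weight: a term in a short title field carries more weight than the same term in a long body field.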
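The three factors can be multiplied into a per-term weight. The sketch below uses the formulas from Lucene's classic TF/IDF similarity: tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)) and norm = 1 / √numTerms. It leaves out query normalization, the coordination factor and boosts, so it will not reproduce Elasticsearch's actual scores. The corpus is a hypothetical list of already-analyzed fields.

```python
import math


def tf(freq: int) -> float:
    # Term frequency: more occurrences -> higher weight, with diminishing returns.
    return math.sqrt(freq)


def idf(num_docs: int, doc_freq: int) -> float:
    # Inverse document frequency: the rarer the term in the index, the higher the weight.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms: int) -> float:
    # Field-length norm: the shorter the field, the higher the weight.
    return 1.0 / math.sqrt(num_terms)


def term_weight(term: str, field: list, corpus: list) -> float:
    """Weight of `term` in one analyzed field, relative to a corpus of analyzed fields."""
    freq = field.count(term)
    if freq == 0:
        return 0.0
    doc_freq = sum(1 for other in corpus if term in other)
    return tf(freq) * idf(len(corpus), doc_freq) * field_length_norm(len(field))


corpus = [
    ["quick", "brown", "fox"],
    ["quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"],
    ["lazy", "dog"],
]
print(term_weight("fox", corpus[0], corpus))    # short field
print(term_weight("fox", corpus[1], corpus))    # same term, longer field: lower weight
print(term_weight("jumps", corpus[1], corpus))  # rarer term: higher weight than "lazy"
print(term_weight("lazy", corpus[1], corpus))
```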
Vector Space Model
The vector space model provides a way of comparing a multiterm query against a document.
- The model represents both the document and the query as vectors.
For example, take the query “happy hippopotamus” and three documents:
1. I am happy in summer.
2. After Christmas I’m a hippopotamus.
3. The happy hippopotamus helped Harry.
- Each vector has one dimension per query term (happy, hippopotamus).
- By measuring the angle between the query vector and a document vector, it is possible to assign a relevance score to each document.
- If the angle between a document and the query is large, the document is of low relevance.
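Here is a minimal sketch of cosine similarity (the cosine of the angle between two vectors) applied to the three example documents. It assumes the query "happy hippopotamus" and made-up term weights. "hippopotamus" is given a higher weight than "happy" on the assumption that it is rarer in the index. Real weights would come from TF/IDF.

```python
import math


def cosine_similarity(u: list, v: list) -> float:
    """Cosine of the angle between u and v: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Dimensions: [weight of "happy", weight of "hippopotamus"]; weights are illustrative.
query = [2, 5]
docs = {
    1: [2, 0],  # "I am happy in summer."
    2: [0, 5],  # "After Christmas I'm a hippopotamus."
    3: [2, 5],  # "The happy hippopotamus helped Harry."
}
scores = {doc_id: cosine_similarity(query, vec) for doc_id, vec in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # [3, 2, 1]
```

Document 3 points in the same direction as the query, so its angle is zero. Document 1 has the largest angle and therefore the lowest relevance.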
Aggregation
Search vs. Analytics
- Search. Business requirement: “Help me find the best documents.” Enablers: matching, relevance, filtering, auto-completion,...
- Analytics. Business requirement: “What do these documents tell me about my business?” Enablers: summaries, patterns, trends, outliers, predictions, visualization.
- Aggregations help build complex summaries & analytics of the indexed data.
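Aggregations are requested in the search body, alongside the hits or instead of them. Below is a minimal sketch of a request that groups documents into buckets by a hypothetical status keyword field and computes the average of a hypothetical numeric price field inside each bucket.

```python
import json


def status_price_report() -> dict:
    return {
        "size": 0,  # return only aggregation results, no hits
        "aggs": {
            "by_status": {
                "terms": {"field": "status"},  # bucket aggregation: one bucket per status
                "aggs": {
                    # Metric aggregation computed inside each bucket.
                    "avg_price": {"avg": {"field": "price"}},
                },
            },
        },
    }


print(json.dumps(status_price_report(), indent=2))
```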
Metrics Aggregations
● Extended Stats Aggregation
● Geo Bounds Aggregation
● Geo Centroid Aggregation
● Percentiles Aggregation
● Stats Aggregation
● Value Count Aggregation
● Avg, Sum, Min, Max Aggregations
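Here is a sketch in plain Python of what the stats, extended stats and value count aggregations summarize for a list of numbers. The keys mirror the field names in Elasticsearch's extended_stats response. Treating variance as the population variance is an assumption, and the sketch only works on an in-memory list.

```python
import math


def extended_stats(values: list) -> dict:
    """Summaries similar to the stats / extended_stats / value_count aggregations."""
    count = len(values)  # what value_count reports
    if count == 0:
        # Min, max, average and variance are undefined without values.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = float(sum(values))
    avg = total / count
    sum_of_squares = float(sum(v * v for v in values))
    # Population variance: mean of squares minus square of the mean.
    # max() guards against tiny negative results from floating-point rounding.
    variance = max(0.0, sum_of_squares / count - avg * avg)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": math.sqrt(variance),
    }


print(extended_stats([10, 20, 30, 40]))
```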
Significant Terms
- The significant_terms aggregation analyzes your data and finds terms that appear with a frequency that is statistically anomalous compared to the background data.
- It can uncover surprisingly sophisticated trends and correlations in your data.
- It is used for discovering anomalies.
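Significance can be scored in several ways. Elasticsearch's default heuristic is JLH, which multiplies the absolute change in a term's popularity (foreground share minus background share) by the relative change (foreground share divided by background share). Below is a minimal sketch. The foreground is the set of documents matching the query, the background is the whole index, and the counts in the usage example are made up.

```python
def jlh_score(subset_freq: int, subset_size: int,
              superset_freq: int, superset_size: int) -> float:
    """JLH significance: absolute change in popularity times relative change."""
    foreground = subset_freq / subset_size      # share of matching docs with the term
    background = superset_freq / superset_size  # share of all docs with the term
    if background == 0 or foreground <= background:
        return 0.0  # not over-represented in the foreground
    return (foreground - background) * (foreground / background)


# Made-up counts: 100 matching documents out of 10,000 in the index.
print(jlh_score(50, 100, 500, 10_000))    # rare overall, common in the subset: roughly 4.5
print(jlh_score(60, 100, 6_000, 10_000))  # equally common everywhere: 0.0
```

The second term appears in more matching documents than the first, but it scores zero because it is just as common in the background. That is the difference between "frequent" and "significant".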
Significant Terms example
1. Find all people who like these products.
2. Summarise how their style differs from everyone else.
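These two steps map onto a single request. The query selects the people who like the given products (the foreground set), and significant_terms reports the products that are unusually common in that group compared with everyone else. Below is a minimal sketch as a Python dict. The liked_products field and the product IDs are hypothetical, and the input products themselves will likely show up in the results too.

```python
import json


def recommendation_request(product_ids: list) -> dict:
    return {
        "size": 0,  # only the aggregation is needed
        # Step 1: foreground set = people who like any of the given products.
        "query": {"terms": {"liked_products": product_ids}},
        "aggs": {
            "recommendations": {
                # Step 2: products over-represented in this group vs. the whole index.
                "significant_terms": {"field": "liked_products"}
            }
        },
    }


print(json.dumps(recommendation_request(["p1", "p2"]), indent=2))
```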