OUTLINE
➤ Core data types
➤ String, numeric, date, boolean, binary
➤ Complex data types
➤ Object, array, nested
➤ Geo data types
➤ Geo-point, Geo-shape
➤ Specialized data types
➤ IPv4, completion, token count, attachment
STRING
➤ String field types accept string values
➤ Can be sub-divided into full text and keywords
➤ We will take a look at these next
STRING - FULL TEXT
➤ Typically used for text-based relevance searches (e.g. search for products by name)
➤ Full text fields are analyzed
➤ Data is passed through an analyzer to convert the string into a list of individual terms before being indexed
➤ This allows Elasticsearch to search for individual words within a full text field
➤ Full text fields are not used for sorting and are rarely used for aggregations
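A minimal Python sketch of what analysis buys you, assuming a toy analyzer (the hypothetical `simple_analyze`) that only lowercases and splits on non-word characters. Elasticsearch's real analyzers do more than this; the point is that each word becomes its own term in an inverted index, so a single word can be looked up.

```python
import re


def simple_analyze(text: str) -> list[str]:
    """Hypothetical stand-in for an analyzer: lowercase, then split on non-word characters."""
    # Lowercasing makes "Running" and "running" produce the same term
    lowered = text.lower()
    # \w+ keeps runs of letters, digits and underscores; everything else separates terms
    return re.findall(r"\w+", lowered)


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the ids of the documents that contain it."""
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for term in simple_analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


products = {1: "Red Running Shoe", 2: "Blue running jacket"}
index = build_inverted_index(products)
print(index["running"])  # {1, 2}
```

Searching for the single word "running" finds both products, even though neither name is exactly "running".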
STRING - KEYWORDS
➤ Exact values such as tags, status, e-mail addresses, etc.
➤ Keyword fields are not analyzed
➤ The exact string value is added to the index as a single term
➤ Typically used for filtering
➤ E.g. find all products where status is "On Discount"
➤ Also often used for sorting and aggregations
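For contrast, a sketch of keyword-style indexing under the same toy model: the whole string is one term, so only an exact match on "On Discount" finds the documents. The helper name `index_keyword` is hypothetical.

```python
def index_keyword(values: dict[int, str]) -> dict[str, set[int]]:
    """Index each value as a single, unmodified term (no lowercasing, no splitting)."""
    index: dict[str, set[int]] = {}
    for doc_id, value in values.items():
        index.setdefault(value, set()).add(doc_id)
    return index


statuses = {1: "On Discount", 2: "Sold Out", 3: "On Discount"}
status_index = index_keyword(statuses)

# Filtering: exact value matches, partial or differently cased values do not
on_discount = status_index.get("On Discount", set())  # {1, 3}
partial = status_index.get("Discount", set())         # empty set

# Sorting works on the whole value, since each document has exactly one term here
sorted_ids = sorted(statuses, key=lambda doc_id: statuses[doc_id])
```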
NUMERIC
➤ Supports the following numeric types
➤ long (signed 64-bit integer)
➤ integer (signed 32-bit integer)
➤ short (signed 16-bit integer)
➤ byte (signed 8-bit integer)
➤ double (double-precision 64-bit floating point)
➤ float (single-precision 32-bit floating point)
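The ranges of the integer types follow directly from their bit widths (two's complement). A small Python sketch, where `smallest_integer_type` is a hypothetical helper for picking the narrowest type that fits, plus a round trip through 4 bytes to show that `float` keeps less precision than `double`:

```python
import struct

# Bit widths of the signed integer types listed above
INTEGER_BITS = {"byte": 8, "short": 16, "integer": 32, "long": 64}


def integer_range(type_name: str) -> tuple[int, int]:
    """Two's complement range: -(2^(n-1)) .. 2^(n-1) - 1."""
    bits = INTEGER_BITS[type_name]
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_integer_type(value: int) -> str:
    """Hypothetical helper: the narrowest integer type whose range contains value."""
    for name in ("byte", "short", "integer", "long"):
        low, high = integer_range(name)
        if low <= value <= high:
            return name
    raise ValueError(f"{value} does not fit in a signed 64-bit long")


def to_float32(x: float) -> float:
    """Round-trip a Python float (a double) through single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]
```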
DATE
➤ Dates in Elasticsearch can be either
➤ Strings containing formatted dates
➤ E.g. 2016-01-01 or 2016/01/01 12:00:00
➤ A long number representing milliseconds since the epoch
➤ An integer representing seconds since the epoch (if the epoch_second format is configured)
➤ Internally stored as a long number representing milliseconds since the epoch
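A rough Python sketch of that normalization, assuming formatted strings are interpreted as UTC and only the two example patterns above are supported. `to_epoch_millis` and its `numbers_are_seconds` flag are hypothetical names standing in for the mapping's format configuration.

```python
from datetime import datetime, timezone


def to_epoch_millis(value, numbers_are_seconds: bool = False) -> int:
    """Convert a date value to milliseconds since the epoch (sketch, UTC only)."""
    if isinstance(value, int):
        # Numbers are millis by default; seconds only when configured that way
        return value * 1000 if numbers_are_seconds else value
    # Only the two example patterns from the slide are handled here
    for pattern in ("%Y-%m-%d", "%Y/%m/%d %H:%M:%S"):
        try:
            parsed = datetime.strptime(value, pattern).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
        return int(parsed.timestamp()) * 1000
    raise ValueError(f"unsupported date: {value!r}")
```

Whichever form the date arrives in, the stored value is the same kind of long.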
DATE - FORMATS
➤ Defaults to strict_date_optional_time||epoch_millis
➤ I.e. dates with an optional time component that conform to strict_date_optional_time, or milliseconds since the epoch
➤ Examples
➤ 2016-01-01 (date only)
➤ 2016-01-01T12:00:00Z (date including time)
➤ 1410020500000 (milliseconds since the epoch)
➤ Multiple formats can be specified by separating them with the || separator
➤ E.g. yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis
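One way to picture the `||` separator: each format is tried in order and the first one that parses wins. This Python sketch assumes a hypothetical translation table from the Java-style patterns to `strptime` codes and treats dates as UTC; it is not how Elasticsearch parses internally.

```python
from datetime import datetime, timezone

# Hypothetical translation of two Java-style patterns to strptime codes
PATTERNS = {
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",
    "yyyy-MM-dd": "%Y-%m-%d",
}


def parse_with_formats(value: str, format_spec: str) -> int:
    """Try each ||-separated format in order; return epoch millis from the first match."""
    for fmt in format_spec.split("||"):
        if fmt == "epoch_millis":
            if value.isdigit():
                return int(value)
            continue
        pattern = PATTERNS.get(fmt)
        if pattern is None:
            continue  # format not supported by this sketch
        try:
            parsed = datetime.strptime(value, pattern).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # this format did not match, try the next one
        return int(parsed.timestamp()) * 1000
    raise ValueError(f"{value!r} matches none of {format_spec!r}")


spec = "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
```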
BOOLEAN
➤ Boolean fields accept true and false values as in JSON
➤ Can also accept strings and numbers which are interpreted as either true or false
➤ False values
➤ false, "false", "off", "no", "0", "" (empty string), 0, 0.0
➤ True values
➤ Anything that is not false
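The rule above as a small Python function (`coerce_boolean` is a hypothetical name). It mirrors the list on this slide; newer Elasticsearch versions are stricter about which strings they accept.

```python
FALSE_STRINGS = {"false", "off", "no", "0", ""}


def coerce_boolean(value) -> bool:
    """Interpret a value as a boolean using the false values listed above."""
    if isinstance(value, str):
        return value not in FALSE_STRINGS
    if isinstance(value, (bool, int, float)):
        # False, 0 and 0.0 all compare equal to 0
        return value != 0
    # Anything that is not one of the false values is true
    return True
```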
BINARY
➤ Accepts a binary value as a Base64-encoded string
➤ E.g. aHR0cDovL2NvZGluZ2V4cGxhaW5lZC5jb20=
➤ Not searchable
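Base64 is just a text-safe encoding of bytes. Python's standard `base64` module shows what the example value contains; `decode_binary_field` is a hypothetical helper name.

```python
import base64


def decode_binary_field(value: str) -> bytes:
    """Turn the Base64 string stored in a binary field back into raw bytes."""
    return base64.b64decode(value)


encoded = "aHR0cDovL2NvZGluZ2V4cGxhaW5lZC5jb20="
decoded = decode_binary_field(encoded).decode("utf-8")
print(decoded)  # http://codingexplained.com
```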
OBJECT
➤ JSON documents are hierarchical
➤ A document may contain inner objects, which in turn may contain inner objects
➤ In Elasticsearch, documents are indexed as flat lists of key-value pairs, with inner objects turned into dotted field names, e.g.:
{
  "message": "Some text...",
  "customer.age": 26,
  "customer.address.city": "Copenhagen",
  "customer.address.country": "Denmark"
}
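The flattening can be sketched as a short recursive Python function (`flatten` is a hypothetical name, not an Elasticsearch API) that joins nested keys with dots, starting from the hierarchical document that produces the flat list above:

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested objects into dotted key-value pairs."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Inner object: recurse, extending the dotted path
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


document = {
    "message": "Some text...",
    "customer": {
        "age": 26,
        "address": {"city": "Copenhagen", "country": "Denmark"},
    },
}
flat_document = flatten(document)
```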
ARRAY
➤ Elasticsearch does not have a dedicated array type
➤ Any field can contain zero or more values by default
➤ All values in an array must be of the same data type
➤ When adding a field dynamically, the first value in the array determines the field type
➤ Examples
➤ Array of strings: ["Elasticsearch", "rocks"]
➤ Array of integers: [1, 2]
➤ Array of arrays: [1, [2, 3]] - equivalent to [1, 2, 3]
➤ Array of objects: [{ "name": "Andy", "age": 26 }, { "name": "Brenda", "age": 32 }]
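A Python sketch of the two rules above: nested arrays are flattened, and with dynamic mapping the first value decides the type. The helpers `flatten_array` and `infer_field_type` are hypothetical, and the type check is stricter than Elasticsearch, which may coerce compatible values (e.g. numeric strings).

```python
def flatten_array(values: list) -> list:
    """[1, [2, 3]] becomes [1, 2, 3]."""
    flat = []
    for value in values:
        if isinstance(value, list):
            flat.extend(flatten_array(value))
        else:
            flat.append(value)
    return flat


def infer_field_type(values: list):
    """Hypothetical dynamic-mapping sketch: the first value decides the type."""
    flat = flatten_array(values)
    if not flat:
        return None  # zero values: nothing to infer a type from
    first_type = type(flat[0])
    for value in flat[1:]:
        if type(value) is not first_type:
            raise TypeError(f"{value!r} is not of type {first_type.__name__}")
    return first_type.__name__
```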
ARRAY - OBJECTS
➤ Arrays of objects do not work as you would expect
➤ You cannot query each object independently of the other objects in the array
➤ Lucene has no concept of inner objects
➤ Elasticsearch flattens object hierarchies into a list of field names and values
➤ E.g. the following document:
{ "users": [{ "name": "Andy", "age": 26 }, { "name": "Brenda", "age": 32 }] }
is stored similar to this:
{ "users.name": ["Andy", "Brenda"], "users.age": [32, 26] }
➤ The association between "Andy" and 26 is lost
➤ A search for a user named "Andy" who is 32 years old would incorrectly match this document!
➤ If you need to be able to do this, then you must use the nested data type
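A Python sketch of why the association is lost, assuming a toy model where each flattened field is a list of values and each query condition is checked against the whole field. `flatten_object_array` and `matches` are hypothetical helpers.

```python
def flatten_object_array(field: str, objects: list[dict]) -> dict[str, list]:
    """Collect each inner key's values into one multi-valued field."""
    flat: dict[str, list] = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


doc = {"users": [{"name": "Andy", "age": 26}, {"name": "Brenda", "age": 32}]}
flat = flatten_object_array("users", doc["users"])


def matches(flat_doc: dict[str, list], name: str, age: int) -> bool:
    # Each condition looks at the whole field, not at one particular object
    return name in flat_doc["users.name"] and age in flat_doc["users.age"]


# No user is Andy aged 32, yet the flattened document matches
false_positive = matches(flat, "Andy", 32)
```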
NESTED
➤ If you need to index arrays of objects and maintain the independence of each object in the array, you should use the nested data type
➤ Internally, nested objects index each object in the array as a separate hidden document
➤ Each nested object can be queried independently of the others with a nested query
➤ A nested query is executed against the nested objects as if they were indexed as separate documents (internally, this is actually the case)
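Continuing the same toy model, a sketch of the nested approach: each object becomes its own hidden document that points back to its parent, and a condition set only matches when one hidden document satisfies all of it. The `_parent` key and both function names are hypothetical.

```python
def index_nested(doc_id: int, objects: list[dict]) -> list[dict]:
    """Store each object as a separate hidden document tagged with its parent id."""
    return [{"_parent": doc_id, **obj} for obj in objects]


def nested_query(hidden_docs: list[dict], **conditions) -> set[int]:
    """Return parent ids where a single hidden document satisfies every condition."""
    return {
        hidden["_parent"]
        for hidden in hidden_docs
        if all(hidden.get(key) == value for key, value in conditions.items())
    }


users = [{"name": "Andy", "age": 26}, {"name": "Brenda", "age": 32}]
hidden_docs = index_nested(1, users)
```

Because "Andy" and 32 never appear in the same hidden document, the false match from the flattened version disappears.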
GEO-POINT
➤ Latitude-longitude pairs
➤ Used for geographical operations on documents (searching, sorting, ...)
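➤ Can be specified in four formats:
➤ 1. An object with lat and lon keys
{
  "location": {
    "lat": 33.5206608,
    "lon": -86.8024900
  }
}
➤ 2. A string in the format "lat,lon"
{
  "location": "33.5206608,-86.8024900"
}
➤ 3. A geohash
{
  "location": "drm3btev3e86"
}
➤ 4. An array in the format [lon, lat]
{
  "location": [-86.8024900, 33.5206608]
}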
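A Python sketch that normalizes the four input formats to a `(lat, lon)` tuple. `normalize_geo_point` and `decode_geohash` are hypothetical names. The geohash decoder is the standard bisection algorithm, and the test checks it against the well-known geohash "ezs42" rather than the value on the slide. Note the array form, which puts longitude first.

```python
# Geohash alphabet: base32 without a, i, l, o
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def decode_geohash(geohash: str) -> tuple[float, float]:
    """Return the center (lat, lon) of a geohash cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for char in geohash:
        bits = BASE32.index(char)
        for shift in range(4, -1, -1):
            bit = (bits >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


def normalize_geo_point(value) -> tuple[float, float]:
    """Convert any of the four geo-point formats to (lat, lon)."""
    if isinstance(value, dict):
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, (list, tuple)):
        lon, lat = value  # array form is [lon, lat]
        return float(lat), float(lon)
    if isinstance(value, str):
        if "," in value:
            lat, lon = value.split(",")  # string form is "lat,lon"
            return float(lat), float(lon)
        return decode_geohash(value)
    raise TypeError(f"unsupported geo-point: {value!r}")
```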
GEO-SHAPE
➤ Geo shapes such as rectangles and polygons
➤ Should be used when either the data being indexed or the queries being executed contain shapes other than just points
➤ LineString
➤ An array of two or more positions (an array of arrays); a straight line in the case of two points
➤ Polygon
➤ An array of rings, where each ring is an array of points
➤ The first and last points of each ring must be the same (to close the polygon)
➤ ...
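A Python sketch of the structural rules above, using GeoJSON-style coordinate lists in [lon, lat] order. The "at least four positions per ring" check comes from the GeoJSON specification (a closed triangle needs four). Both function names are hypothetical.

```python
Position = list[float]  # [lon, lat]


def is_valid_linestring(coordinates: list[Position]) -> bool:
    """A LineString needs two or more positions."""
    return len(coordinates) >= 2


def is_valid_polygon(rings: list[list[Position]]) -> bool:
    """Every ring needs at least four positions, and its first and last must match."""
    if not rings:
        return False
    for ring in rings:
        if len(ring) < 4 or ring[0] != ring[-1]:
            return False
    return True


square = [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]]]
open_square = [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0]]]
```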
COMPLETION
➤ The completion suggester is a so-called prefix suggester
➤ It does not do spell correction, but enables basic auto-complete functionality
➤ Useful for providing the user with suggestions while searching, e.g. as on Google
➤ Stores an FST (Finite State Transducer) as part of the index
➤ Allows for very fast loads and executions
➤ You don't have to worry about this - just know when to use this type
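To illustrate what a prefix suggester returns (not how it is built), here is a Python sketch that swaps the FST for a sorted list plus binary search. `PrefixSuggester` is a hypothetical class. Like the completion suggester, it only completes prefixes and does no spell correction.

```python
import bisect


class PrefixSuggester:
    """Prefix lookup over a sorted list; a stand-in for the FST, not a reimplementation."""

    def __init__(self, inputs: list[str]) -> None:
        self._terms = sorted(set(inputs))

    def suggest(self, prefix: str, size: int = 5) -> list[str]:
        # Binary search jumps to the first term >= prefix
        start = bisect.bisect_left(self._terms, prefix)
        results = []
        for term in self._terms[start:]:
            if not term.startswith(prefix):
                break  # sorted order: no later term can match
            results.append(term)
            if len(results) == size:
                break
        return results


suggester = PrefixSuggester(["nirvana", "nevermind", "nirvana unplugged", "pearl jam"])
```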
TOKEN COUNT
➤ An integer field which accepts string values
➤ The string values are analyzed, and the number of tokens is indexed
➤ Example
➤ A name property could have a length field of the type token_count
➤ Then, a search query could be executed to find persons whose name contains X tokens (split by space, for instance)
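A Python sketch of the name example, assuming whitespace splitting as the analyzer. `token_count` is a hypothetical helper; a real token_count field uses whichever analyzer is configured in the mapping.

```python
def token_count(value: str) -> int:
    """Count tokens, using whitespace splitting as a stand-in analyzer."""
    return len(value.split())


people = {"p1": "John Smith", "p2": "Mary Ann Jones", "p3": "Cher"}

# "Find persons whose name contains 3 tokens"
three_token_names = [pid for pid, name in people.items() if token_count(name) == 3]
```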
ATTACHMENT
➤ Lets Elasticsearch index attachments in common formats
➤ E.g. PDF, XLS, PPT, ...
➤ Attachment content is stored as a Base64 encoded string
➤ This functionality is available as a plugin that must be installed
➤ sudo /path/to/elasticsearch/bin/plugin install mapper-attachments
➤ Must be installed on every node of a cluster
➤ Nodes must be restarted after the installation
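Because attachment content travels as a Base64-encoded string, the client must encode the file bytes before indexing. A minimal Python sketch that builds such a JSON body. The field name `data` is a hypothetical placeholder for whatever field the mapping declares as an attachment.

```python
import base64
import json


def attachment_document(file_bytes: bytes, field: str = "data") -> str:
    """Build a JSON document whose attachment field holds Base64-encoded file content."""
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return json.dumps({field: encoded})


# In practice the bytes would come from e.g. open("report.pdf", "rb").read()
body = attachment_document(b"hello")
```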