SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"

Exploring data with Elasticsearch and Kibana
Patrick Puecher, Developer
SFSCon, November 10th 2017

Elastic Stack (the ELK Stack)

- Elasticsearch
- Kibana
- Beats
- Logstash

Elasticsearch
- Distributed, RESTful search engine
- Based on Lucene
- Written in Java
- Apache License
- APIs
  - Indexes APIs
  - Document APIs
  - Search APIs
  - …

Kibana
- Visualize your data
- Histograms, line graphs, pie charts, …
- Time Series with Timelion

Logstash
- Server-side data processing pipeline
- How Logstash works
  - Inputs: file, syslog, redis, beats, …
  - Filters: split, mutate (convert, rename, add_field, remove_field), date, …
  - Outputs: elasticsearch, file, email, exec, …

Beats
- Send data from machines to Logstash and Elasticsearch
- Beats family
  - Filebeat: log files
  - Metricbeat: system and service metrics
  - Packetbeat: network data
  - Winlogbeat: Windows event logs
  - Heartbeat (beta): uptime monitoring

Demo time!

1. Big Data 4 Tourism
- Input: CSV file
- Data processing: Java API
- Visualizing: Kibana

2. Instagram Data
- Input: JSON files
- Data processing: Logstash & jq
- Visualizing: Kibana

Demo 1: Big Data 4 Tourism
- Collect and visualize accommodation enquiries and bookings
  ○ Create Elasticsearch index
  ○ Tourism Data Collector (https://github.com/idm-suedtirol/big-data-for-tourism)
    - Upload and process CSV files
    - Written in Java
    - Open Source ツ
  ○ Kibana to visualize
- Big Data 4 Tourism working group by IDM Südtirol - Alto Adige
  ○ Brandnamic, HGV, IDM Südtirol - Alto Adige, Internet Consulting, Limitis, LTS, Peer GmbH, SiMedia …

Demo 1: Create Elasticsearch index

PUT /tourism-data_2017
{
  "mappings" : {
    "enquiry" : {
      "properties" : {
        "arrival" : { "type" : "date", "format" : "epoch_millis||date" },
        "departure" : { "type" : "date", "format" : "epoch_millis||date" },
        "country.code" : { "type" : "keyword" },
        "country.name" : { "type" : "keyword" },
        "country.latlon" : { "type" : "geo_point" },
        "adults" : { "type" : "byte" },
        "children" : { "type" : "byte" },
        "destination.code" : { "type" : "short" },
        "destination.name" : { "type" : "keyword" },
        "destination.latlon" : { "type" : "geo_point" },
        "category.code" : { "type" : "byte" },
        "category.name" : { "type" : "keyword" },
        "booking" : { "type" : "boolean" },
        "cancellation" : { "type" : "boolean" },
        "submitted_on" : { "type" : "date", "format" : "epoch_millis||date||date_hour_minute_second" },
        "length_of_stay" : { "type" : "short" }
      }
    }
  }
}

"2015-01-01","2015-01-03","","2","0","21027","1","1","0","2015-01-01T01:59:00"
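To make the link between the sample CSV row and the mapping on this slide more concrete, here is a minimal Python sketch. The column order is an assumption inferred from the mapping fields, the derivation of length_of_stay as the number of nights is also an assumption, and the name row_to_document is hypothetical. The actual Tourism Data Collector is written in Java and may process the file differently.

```python
import csv
import io
from datetime import date

# Assumed column order, inferred from the mapping (not confirmed by the slides).
COLUMNS = ["arrival", "departure", "country.code", "adults", "children",
           "destination.code", "category.code", "booking", "cancellation",
           "submitted_on"]

def row_to_document(line):
    """Turn one CSV line into a dict shaped like the 'enquiry' mapping."""
    values = next(csv.reader(io.StringIO(line)))
    doc = dict(zip(COLUMNS, values))
    for field in ("adults", "children", "destination.code", "category.code"):
        doc[field] = int(doc[field])
    for field in ("booking", "cancellation"):
        doc[field] = doc[field] == "1"
    # length_of_stay is not a CSV column; here it is derived as nights between the dates.
    nights = date.fromisoformat(doc["departure"]) - date.fromisoformat(doc["arrival"])
    doc["length_of_stay"] = nights.days
    return doc

sample = '"2015-01-01","2015-01-03","","2","0","21027","1","1","0","2015-01-01T01:59:00"'
document = row_to_document(sample)
print(document["length_of_stay"], document["booking"], document["cancellation"])
```

Under these assumptions the sample row describes a two-night booking for two adults that was not cancelled.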

Demo 1: Tourism Data Collector

Demo 1: Visualize sample data (I)

Demo 1: Visualize sample data (II)

Demo 2: Instagram data
Mission: Must-see places for route planner
How to get Instagram posts from South Tyrol?

Demo 2: Instagram data
1. Get a shape file of South Tyrol (http://geoportal.buergernetz.bz.it/)
2. Use QGIS to create a regularly-spaced grid of points
3. Export points as latitude and longitude coordinates
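Steps 2 and 3 are done in QGIS in the talk. As a rough illustration of the same idea, the following Python sketch generates a regular grid over a bounding box. The bounding box and the spacing are placeholder values and not taken from the talk, and the function name grid_points is hypothetical. Clipping the points to the South Tyrol shape, which QGIS takes care of, is left out.

```python
def grid_points(lat_min, lat_max, lon_min, lon_max, step_deg):
    """Return (lat, lon) tuples on a regular grid covering the bounding box."""
    points = []
    rows = int(round((lat_max - lat_min) / step_deg)) + 1
    cols = int(round((lon_max - lon_min) / step_deg)) + 1
    for i in range(rows):
        for j in range(cols):
            points.append((round(lat_min + i * step_deg, 6),
                           round(lon_min + j * step_deg, 6)))
    return points

# Hypothetical bounding box and spacing, for illustration only.
points = grid_points(46.2, 47.1, 10.4, 12.5, 0.1)
for lat, lon in points[:3]:
    print(f"{lat},{lon}")
```

Each resulting coordinate pair can then serve as the centre of one search request.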

Demo 2: Instagram rate limits & scopes
- Global rate limits on the Instagram platform (https://www.instagram.com/developer/limits/)
  - 5000 API calls / hour
- Scopes
  - public_content - to read any public profile info and media on a user’s behalf (applications no longer accepted) :’(
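The rate limit bounds how many grid points can be polled. A back-of-the-envelope calculation, assuming one API call per grid point per poll, a single access token, and the 10-minute polling interval used later in the talk:

```python
RATE_LIMIT_PER_HOUR = 5000   # from the slide
POLL_INTERVAL_MINUTES = 10   # polling interval used later in the talk

calls_per_point_per_hour = 60 // POLL_INTERVAL_MINUTES
max_points = RATE_LIMIT_PER_HOUR // calls_per_point_per_hour

print(calls_per_point_per_hour, max_points)
```

Under these assumptions each point costs 6 calls per hour, so at most 833 points fit within the limit.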

Demo 2: Instagram search API

https://api.instagram.com/v1/media/search?lat=47.051124693028548&lng=12.039835734128651&access_token=key&distance=5000

{
  "data": [
    {
      "id": "1614761577805643016_1157147895",
      "user": { "id": "1157147895", "full_name": "Marc Hochstaffl", "profile_picture": "…", "username": "marc_hochstaffl" },
      "images": { "thumbnail": { "width": 150, "height": 150, "url": "…" }, "low_resolution": {…}, "standard_resolution": {…} },
      "created_time": "1506714602",
      "caption": { … },
      "user_has_liked": false,
      "likes": { "count": 181 },
      "tags": ["sam", "karposfasttrail", "autumn\ud83c\udf41", "ahrntal", "hundskehljoch"],
      "filter": "Normal",
      "comments": { "count": 3 },
      "type": "image",
      "link": "https://www.instagram.com/p/BZoyRmCDf0I/",
      "location": { "latitude": 47.05, "longitude": 12.06667, "name": "Hundskehljoch", "id": 1033509208 },
      "attribution": null,
      "users_in_photo": []
    }
  ],
  "meta": { "code": 200 }
}
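The search request on this slide is built from one grid point, an access token and a search radius. A small Python sketch of assembling such a URL follows. The helper name search_url is hypothetical, "key" is the placeholder token from the slide, and the reading of distance as a radius in metres is my understanding of the API rather than something stated in the talk.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.instagram.com/v1/media/search"

def search_url(lat, lng, access_token, distance=5000):
    """Build a media search URL for one point; distance is assumed to be in metres."""
    query = urlencode({"lat": lat, "lng": lng,
                       "access_token": access_token, "distance": distance})
    return f"{BASE_URL}?{query}"

# Coordinates are passed as strings to keep every digit of the exported values.
print(search_url("47.051124693028548", "12.039835734128651", "key"))
```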

Demo 2: Create Elasticsearch index

PUT /_template/instagram
{
  "template" : "instagram*",
  "mappings" : {
    "_default_" : {
      "properties" : {
        "images" : { … },
        "carousel_media" : { … },
        "geoip" : { "type": "geo_point" },
        "users_in_photo" : { … },
        "link" : { … },
        "created_time" : { "type" : "date", "format" : "strict_date_optional_time||epoch_second" },
        "caption" : { … },
        "type" : { "type": "keyword" },
        "tags" : { "type": "keyword" },
        "filter" : { "type": "keyword" },
        "likes.count" : { "type" : "integer" },
        "comments.count" : { "type" : "integer" },
        "location" : { … },
        "id" : { "type" : "keyword" },
        "user" : { … }
      }
    }
  }
}

Demo 2: Grab and store posts using Logstash (I)

input {
  http_poller {
    # one search URL per grid point
    urls => {
      insta1 => "https://api.instagram.com/v1/media/search?lat=47.051124693028548&lng=12.039835734128651&access_token=key&distance=5000"
      insta2 => "https://api.instagram.com/v1/media/search?lat=47.049359378811829&lng=12.105570031601609&access_token=key&distance=5000"
      …
    }
    keepalive => false
    cookies => false
    request_timeout => 30
    # poll every 10 minutes
    schedule => { every => "10m" }
    codec => "json"
  }
}

output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    # one index per month; the post id is used as the document id
    index => "instagram-%{+YYYYMM}"
    document_id => "%{id}"
  }
}

Demo 2: Grab and store posts using Logstash (II)

filter {
  split { field => "data" }
  if [data][id] {
    mutate {
      convert => {
        "[data][comments][count]" => "integer"
        "[data][likes][count]" => "integer"
      }
      rename => {
        "[data][created_time]" => "[created_time]"
        "[data][images]" => "[images]"
        "[data][comments][count]" => "[comments_count]"
        …
        "[data][id]" => "[id]"
        "[data][user]" => "[user]"
        "[data][likes][count]" => "[likes_count]"
      }
      add_field => [ "geoip", "%{[location][latitude]},%{[location][longitude]}" ]
      remove_field => ["data", "meta"]
    }
    date {
      match => ["[caption][created_time]" , "UNIX"]
      target => [ "[caption][created_time]" ]
    }
    date {
      match => ["[created_time]" , "UNIX"]
      remove_field => [ "[created_time]" ]
    }
  }
}
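For readers less familiar with Logstash filters, the following Python sketch mirrors what the split and mutate steps do to one API response: one event per entry in "data", selected fields moved to the top level, and a "lat,lon" string added as geoip. Only the rename pairs visible on the slide are included; the elided entries ("…") are not reproduced, so the sketch reads location directly from the post. The names split_and_flatten and dig are hypothetical, and the sample response is a trimmed copy of the one shown earlier.

```python
import json

# Rename pairs visible on the slide; the elided entries ("…") are not reproduced.
RENAMES = {
    ("created_time",): "created_time",
    ("images",): "images",
    ("comments", "count"): "comments_count",
    ("id",): "id",
    ("user",): "user",
    ("likes", "count"): "likes_count",
}

def dig(item, path):
    """Follow a path of keys into nested dicts; return None if any key is missing."""
    for key in path:
        if not isinstance(item, dict) or key not in item:
            return None
        item = item[key]
    return item

def split_and_flatten(response):
    """Yield one flat event per entry of response["data"], like split + mutate."""
    for item in response.get("data", []):
        if "id" not in item:
            continue
        event = {}
        for path, target in RENAMES.items():
            value = dig(item, path)
            if value is not None:
                event[target] = value
        location = item.get("location") or {}
        if "latitude" in location and "longitude" in location:
            event["geoip"] = f'{location["latitude"]},{location["longitude"]}'
        yield event

response = json.loads("""
{"data": [{"id": "1614761577805643016_1157147895",
           "created_time": "1506714602",
           "likes": {"count": 181}, "comments": {"count": 3},
           "location": {"latitude": 47.05, "longitude": 12.06667}}],
 "meta": {"code": 200}}
""")
events = list(split_and_flatten(response))
print(events[0]["geoip"], events[0]["likes_count"], events[0]["comments_count"])
```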

Demo 2: Grab and store posts using Linux Shell

#!/bin/bash

# one search URL per grid point
insta=(
  'https://api.instagram.com/v1/media/search?lat=47.051124693028548&lng=12.039835734128651&access_token=key&distance=5000'
  'https://api.instagram.com/v1/media/search?lat=47.049359378811829&lng=12.105570031601609&access_token=key&distance=5000'
)

count=0
while [ "x${insta[count]}" != "x" ]
do
  MIN=$(date -d '11 minutes ago' +"%s")  # reduce bandwidth
  URL="${insta[count]}&min_timestamp=$MIN"
  # jq emits a bulk action line followed by the post itself (with geoip added)
  # the Content-Type header is required from Elasticsearch 6.0 on
  curl -s "$URL" \
    | jq -c '.data[] | .geoip = ((.location.latitude | tostring) + "," + (.location.longitude | tostring)) | {index: {_index: ("instagram-" + (.created_time | tonumber | gmtime | strftime("%Y%m"))), _type: "feed", _id: .id}}, .' \
    | curl -s -XPOST -H 'Content-Type: application/x-ndjson' localhost:9200/_bulk --data-binary @- &  # start in background

  if [ $(( (count + 1) % 20 )) = 0 ]; then  # parallelize
    wait
  fi

  count=$(( count + 1 ))
done
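The jq filter in the script turns each post into two lines for the Elasticsearch bulk API: an action line naming the index, type and id, followed by the document itself. The Python sketch below reproduces that transformation for illustration. The function name bulk_lines is hypothetical, and the sample input is a trimmed copy of the search response shown earlier.

```python
import json
from datetime import datetime, timezone

def bulk_lines(response):
    """Return the NDJSON lines of a _bulk request: action line, then document."""
    lines = []
    for post in response["data"]:
        location = post["location"]
        post = dict(post, geoip=f'{location["latitude"]},{location["longitude"]}')
        # Monthly index name derived from the post's Unix timestamp (UTC).
        month = datetime.fromtimestamp(int(post["created_time"]),
                                       tz=timezone.utc).strftime("%Y%m")
        action = {"index": {"_index": "instagram-" + month,
                            "_type": "feed", "_id": post["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(post))
    return lines

# Trimmed-down version of the sample search response.
response = {"data": [{"id": "1614761577805643016_1157147895",
                      "created_time": "1506714602",
                      "location": {"latitude": 47.05, "longitude": 12.06667}}]}
lines = bulk_lines(response)
body = "\n".join(lines) + "\n"  # the bulk body must end with a newline
print(body, end="")
```

For the sample post, created in late September 2017, the action line targets the index instagram-201709.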

Use a cron job to run the shell script every 10 minutes!
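The script requests posts from the last 11 minutes while cron starts it every 10 minutes, so consecutive runs overlap slightly. Because the post id is used as the document id, a post fetched twice overwrites the existing document instead of creating a duplicate. The sketch below works through that window arithmetic and the batches of 20 parallel requests. The function names and the example start time are hypothetical.

```python
def min_timestamp(now_epoch, lookback_minutes=11):
    """Unix time passed as min_timestamp: only newer posts are requested."""
    return now_epoch - lookback_minutes * 60

def batches(items, size=20):
    """Split the URL list into groups that are fetched in parallel."""
    return [items[i:i + size] for i in range(0, len(items), size)]

RUN_INTERVAL_SECONDS = 10 * 60  # cron schedule from the slide

first_run = 1506714600          # arbitrary example start time
second_run = first_run + RUN_INTERVAL_SECONDS
overlap = first_run - min_timestamp(second_run)

print(overlap)                                     # seconds covered by both runs
print([len(b) for b in batches(list(range(45)))])  # 45 URLs as an example
```

With these values the two runs share a 60-second window, and 45 URLs are processed as groups of 20, 20 and 5.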

Demo 2: Visualize posts by date
July - August

Demo 2: Daily rhythm (1 for Monday … 7 for Sunday)

Sunday… 2 pm - 7 pm
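The numbering 1 for Monday to 7 for Sunday matches ISO weekday numbering. The talk does not show how the weekday and hour were derived, so the following Python sketch is only one possible way, using the created_time of the sample post. It works in UTC for simplicity; an analysis of local habits would probably convert to local time first.

```python
from datetime import datetime, timezone

def weekday_and_hour(created_time):
    """Return (ISO weekday 1..7, hour 0..23) in UTC for a Unix timestamp."""
    moment = datetime.fromtimestamp(int(created_time), tz=timezone.utc)
    return moment.isoweekday(), moment.hour

# created_time of the sample post from the search response.
print(weekday_and_hour("1506714602"))
```

The sample post falls on a Friday (5) in the 19:00 UTC hour.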

Demo 2: Top locations by number of posts (I)

- Riva del Garda
- Trento
- Bolzano
- Tre Cime
- Merano

Demo 2: Top tags by number of posts
- snukieful
- martinisisters
- giuliavalentina
- valentinavignali
- querly_official
- igworld_global
- manueldietrichphotography

Demo 2: Top travellers

Demo 2: Influencer Trentino

Demo 2: Influencer South Tyrol

Demo 2:

Glassyhuman

Data-Driven Advertising