Devoxx France 2015: InfluxDB

@zepouet #InfluxDB :: InfluxDB :: @zepouet http://www.treeptik.fr http://www.cloudunit.fr http://www.labaixbidouille.com


:: InfluxDB :: Time Series ::

• About Me

• What is a time series?

• State of the Art in 2015

• Why yet another product for time series?

• Live Demo

• Q/A

About Me

• Treeptik

• MarsJUG

• LabAixBidouille

What is a time series?

Things happening in time…

Events, events… events

• Measurements (physical sensors…)

• Exceptions (applications)

• Page views

• User actions

• Git commits

• Webapp deployments

• Things happening in time

State of the Art :: 2015

What do we have for storage?

• At the moment, we have:

  • Graphite

  • OpenTSDB (events, Hadoop, HBase…)

  • KairosDB (events, a rewrite of OpenTSDB)

  • Ganglia (more common in Big Data/Hadoop setups)

  • And others…

What do we have for collection?

• At the moment, we have:

  • collectd

  • Sensu

  • Dropwizard Metrics

  • JMXTrans

  • Jolokia


Something missing…

Because in 2015, we need

• A product that is simple to install and manage

• Storage for millions of points (IoT is here)

• Native HTTP support (JSON)

• A product built with an API

• Automatic clean-up of old data

• Easy scalability: cloud is a buzzword

Use case: Fablab

wiki.labaixbidouille.com/index.php?title=Domotique

Feedback

• Data volume:

  • 1 event / sensor / minute

  • 1 * 60 * 24 = 1,440 events per day

  • 43,200 events per month (30 days)

  • 518,400 events per year (12 months of 30 days)

• First mistake: using MySQL

• Second mistake: a bad pattern with InfluxDB
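The volume figures can be checked with a few lines of arithmetic. This is a minimal sketch that assumes one sensor, one event per minute, and a 30-day month; the constant and function names are illustrative only.

```python
# Assumptions (not from the slides): 30-day months, 12 such months per year.
EVENTS_PER_MINUTE = 1
MINUTES_PER_DAY = 60 * 24
DAYS_PER_MONTH = 30
MONTHS_PER_YEAR = 12


def events_per_day(sensors: int = 1) -> int:
    """Events produced in one day by `sensors` sensors."""
    return sensors * EVENTS_PER_MINUTE * MINUTES_PER_DAY


def events_per_month(sensors: int = 1) -> int:
    """Events produced in one 30-day month."""
    return events_per_day(sensors) * DAYS_PER_MONTH


def events_per_year(sensors: int = 1) -> int:
    """Events produced in twelve 30-day months."""
    return events_per_month(sensors) * MONTHS_PER_YEAR


if __name__ == "__main__":
    print(events_per_day(), events_per_month(), events_per_year())
```

Passing a sensor count scales every figure linearly, which is how a small home-automation setup quickly reaches millions of points.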

1.21 GIGAWATTS

About InfluxDB

• An open source distributed time series database

• Created by Errplane

• MIT License

• Written in Go

• Young but awesome project

InfluxDB :: design goals

• Simple to install and manage, thanks to Go.

• No external dependencies such as ZooKeeper or Hadoop.

• HTTP(S) interface for reading and writing data.

• Horizontally scalable.

• On disk and in memory. Most data is cold.

• Compute percentiles and other functions on the fly.

• Downsample data on different windows of time.

InfluxDB :: installing

• Mac OS X: $ brew install influxdb

• Debian: $ sudo dpkg -i influxdb_latest_amd64.deb

• CentOS: $ sudo rpm -ivh influxdb-latest-1.x86_64.rpm

• Docker: $ docker run tutum/influxdb

• Coming soon: ARM and Windows

InfluxDB :: running

• $ influxdb -config=/usr/local/etc/influxdb.conf

• Ports

  • 8083: UI

  • 8086: API

  • 8090: cluster management (Raft)

  • 8099: cluster management (protobuf)

InfluxDB :: design

• Databases (as in MySQL, Postgres…)

• Time series (kind of like tables, with time, sequence number and columns)

• A time series is composed of points or events (kind of like rows)

• Primary index is always time

• Null values are not stored

• You can have millions of series

InfluxDB :: security

• Cluster admins

• Database admins

• Database users

  • Read permissions

    • only certain series

    • only queries with a column having a specific value (e.g. customer_id = 32)

  • Write permissions

    • only certain series

    • only columns having a specific value

InfluxDB :: create points

curl -X POST -d '[{"name":"temp","columns":["celsius"],"points":[[23]]}]' 'http://localhost:8086/db/mydb/series?u=root&p=root'

curl -G 'http://localhost:8086/db/mydb/series?u=root&p=root' --data-urlencode "q=select * from temp"
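The same two requests can be prepared from code. The sketch below only builds the URL and JSON body shown in the curl commands (host, port, database and root/root credentials are taken from the slide); it does not send anything, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Values copied from the curl examples; adjust for a real server.
BASE_URL = "http://localhost:8086/db/mydb/series"
CREDENTIALS = {"u": "root", "p": "root"}


def write_request(series, columns, points):
    """Return (url, body) for writing `points` into `series`."""
    body = json.dumps([{"name": series, "columns": columns, "points": points}])
    return f"{BASE_URL}?{urlencode(CREDENTIALS)}", body


def query_url(query):
    """Return the GET URL for a query, URL-encoding it like --data-urlencode."""
    params = dict(CREDENTIALS, q=query)
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    url, body = write_request("temp", ["celsius"], [[23]])
    print("POST", url)
    print(body)
    print("GET", query_url("select * from temp"))
```

Any HTTP client can then POST the body to the first URL and GET the second one.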

InfluxDB :: Pitfalls

• Schemaless warning

• Data partitioning with a single series

Time     Name     Host   Metrics
3236765  cpu      web0   78
3236765  disk_io  web0   98344
3236765  load     db1    5
3236765  eth_0    ldap0  8755

Time     Name     Host   Metrics
3236765  disk_io  web0   98344
3236766  disk_io  web0   98354
3236767  disk_io  web0   98224
3236768  disk_io  web0   98994

Time     Name     Host   Metrics
3236765  eth_0    ldap0  8755
3236766  eth_0    ldap0  8721
3236767  eth_0    ldap0  8734
3236768  eth_0    ldap0  8723

Time     Name     Host   Metrics
3236765  cpu      web0   78
3236766  cpu      web0   77
3236767  cpu      web0   79
3236768  cpu      web0   76

Time     Name     Host   Metrics
3236765  load     db1    5
3236766  load     db1    6
3236767  load     db1    5
3236768  load     db1    7
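Splitting one mixed series into one series per host and metric can be sketched as below. The rows reuse values from the tables; the "<host>.<name>" series key is an assumption made for illustration, not a naming rule from the talk.

```python
from collections import defaultdict

# Rows of a single mixed series: (time, name, host, value).
MIXED = [
    (3236765, "cpu", "web0", 78),
    (3236765, "disk_io", "web0", 98344),
    (3236765, "load", "db1", 5),
    (3236765, "eth_0", "ldap0", 8755),
    (3236766, "cpu", "web0", 77),
]


def split_by_series(rows):
    """Group rows into one series per (host, metric), keyed '<host>.<name>'."""
    series = defaultdict(list)
    for time, name, host, value in rows:
        series[f"{host}.{name}"].append((time, value))
    return dict(series)


if __name__ == "__main__":
    for key, points in split_by_series(MIXED).items():
        print(key, points)
```

Each resulting series holds only (time, value) points, so a query for one metric never has to scan the others.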

InfluxDB :: Why so many series?

• To take advantage of the storage engines

• Points are indexed by time, not by any other column

• Trick: this layout works easily with Grafana

InfluxDB works best with a large number of series, each with few columns.

:: Query Language

• select * from /.*/ limit 1

• select val1, val2 from serverA

• select cpu from /server.*/

• select * from /.*/ where time > now() - 1h

• select * from /.*/ where time > '2013-08-12 23:32:00'

• select * from /.*/ group by time(10m)

• select count(val) from /.*/ group by time(10m)

• select percentile(val, 95) from /.*/ group by time(10m)

• select count(distinct(val)) from /.*/
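Several of these queries select series with a regular expression such as /server.*/. The sketch below illustrates the idea with Python's re module; it assumes unanchored matching (like re.search), which is an assumption about InfluxDB's behaviour, and the series names are invented.

```python
import re


def select_series(pattern, series_names):
    """Return the series names matched by `pattern`, using an unanchored search."""
    regex = re.compile(pattern)
    return [name for name in series_names if regex.search(name)]


# Hypothetical series names, for illustration only.
NAMES = ["serverA", "serverB", "db1", "webserver1"]

if __name__ == "__main__":
    print(select_series("server.*", NAMES))
    print(select_series("^server.*", NAMES))
    print(select_series(".*", NAMES))
```

Under this assumption, anchoring with ^ is what restricts a pattern to names that start with the given prefix.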

:: Query Language

• DELETE

  • delete from response_times where time < now() - 1h

  • delete from /^stats.*/ where time < now() - 7d

  • drop series response_times

• GROUP BY

  • select count(type) from events group by time(10m);

  • select count(type), type from events group by time(10m), type;
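What "group by time(10m), type" computes can be pictured as bucketing events into 10-minute windows and counting per type. This is a rough sketch of the idea, not InfluxDB's implementation; timestamps are plain seconds and the sample events are invented.

```python
from collections import Counter

WINDOW = 10 * 60  # 10 minutes, in seconds


def count_by_window_and_type(events, window=WINDOW):
    """events: iterable of (unix_seconds, type). Returns {(window_start, type): count}."""
    counts = Counter()
    for timestamp, event_type in events:
        window_start = timestamp - timestamp % window  # floor to the window boundary
        counts[(window_start, event_type)] += 1
    return dict(counts)


# Invented sample data.
EVENTS = [(0, "click"), (30, "click"), (90, "error"), (600, "click"), (1199, "error")]

if __name__ == "__main__":
    for key, count in sorted(count_by_window_and_type(EVENTS).items()):
        print(key, count)
```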

:: Visualize and summarize

• Graphs

  • Last 10 minutes

  • Last 4 hours

  • Last 24 hours

  • Past week

  • Past month

  • All time


:: Merging :: Series

• select count(type) from user_events merge admin_events group by time(10m)

• select mean(value) from merge(/.*az\.1.*\.cpu/) group by time(1h)
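Conceptually, merging treats the points of several series as a single time-ordered stream before aggregating. A minimal sketch of that idea, assuming each input series is already sorted by time (the sample points are invented):

```python
import heapq


def merge_series(*series):
    """Merge time-sorted lists of (time, value) points into one time-sorted list."""
    return list(heapq.merge(*series, key=lambda point: point[0]))


# Invented sample data.
USER_EVENTS = [(1, "login"), (4, "click")]
ADMIN_EVENTS = [(2, "ban"), (3, "login")]

if __name__ == "__main__":
    print(merge_series(USER_EVENTS, ADMIN_EVENTS))
```

An aggregate such as count or mean can then run over the merged stream exactly as it would over a single series.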


:: Joining :: Series

• select hosta.value + hostb.value from cpu_load as hosta inner join cpu_load as hostb where hosta.host = 'hosta.influxdb.org' and hostb.host = 'hostb.influxdb.org';

• select errors_per_minute.value / page_views_per_minute.value from errors_per_minute inner join page_views_per_minute
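The second join computes an error rate by pairing points from two series. The sketch below shows the idea under a simplifying assumption: points are paired only when their timestamps are identical, which may differ from how InfluxDB actually matches points. The sample data is invented.

```python
def join_ratio(errors, page_views):
    """Divide errors by page views for every timestamp present in both series."""
    views_by_time = dict(page_views)
    return [
        (time, value / views_by_time[time])
        for time, value in errors
        # Skip timestamps missing on one side, and avoid dividing by zero.
        if time in views_by_time and views_by_time[time] != 0
    ]


# Invented sample data: (time, value) points.
ERRORS = [(60, 2), (120, 1), (180, 4)]
PAGE_VIEWS = [(60, 100), (120, 50), (240, 10)]

if __name__ == "__main__":
    print(join_ratio(ERRORS, PAGE_VIEWS))
```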


:: Naming Strategy :: 0.8

• Tag versus Value

• Rule: <tagName>.<tagValue>.seriesName

• Examples:

arduino.uno.shield.ethernet.sensor.dht11.temperature

arduino.uno.shield.wifi.sensor.dht22.humidity
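The naming rule can be applied mechanically. Below is a small sketch that builds a dotted series name from ordered tag pairs and parses it back; the function names are hypothetical, and it assumes that tags, values and names never contain a dot.

```python
def series_name(tags, name):
    """Build '<tagName>.<tagValue>. ... .<name>' from ordered (tag, value) pairs."""
    parts = [part for tag, value in tags for part in (tag, value)]
    return ".".join(parts + [name])


def parse_series_name(full_name):
    """Inverse of series_name: return (list of (tag, value) pairs, name)."""
    *pairs, name = full_name.split(".")
    if len(pairs) % 2:
        raise ValueError(f"odd number of tag parts in {full_name!r}")
    return list(zip(pairs[0::2], pairs[1::2])), name


if __name__ == "__main__":
    tags = [("arduino", "uno"), ("shield", "ethernet"), ("sensor", "dht11")]
    full = series_name(tags, "temperature")
    print(full)
    print(parse_series_name(full))
```

Because tags live in the name, a regex such as /.*sensor\.dht11.*/ is the only way to filter on them, which is the limitation that indexed tags address.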


:: Naming Strategy :: 0.9+

• Migration process

• Rule: seriesName = seriesName

• Tags are defined in the JSON and indexed

{
  "database": "domotic",
  "points": [
    {
      "name": "temperature_x",
      "tags": {
        "arduino": "uno",
        "shield": "wifi",
        "position": "indoor",
        "sensor": "dht22"
      },
      "timestamp": "2015-03-28T14:50:00Z",
      "fields": {
        "celsius": 23.2,
        "fahrenheit": 73.76
      }
    }
  ]
}
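One way to picture the migration is to turn a dotted 0.8 series name into a point that carries tags, using the JSON layout shown on the slide. This is only a sketch: the helper names, the field name and the field value are invented, and the payload layout is copied from the slide rather than from a released API.

```python
import json


def migrate_point(series_08, fields, timestamp):
    """Turn '<tag>.<value>. ... .<name>' into a point with a name and a tags dict."""
    *pairs, name = series_08.split(".")
    if len(pairs) % 2:
        raise ValueError(f"odd number of tag parts in {series_08!r}")
    tags = dict(zip(pairs[0::2], pairs[1::2]))
    return {"name": name, "tags": tags, "timestamp": timestamp, "fields": fields}


def payload(database, points):
    """Serialize points into the JSON layout shown on the slide."""
    return json.dumps({"database": database, "points": points}, indent=2)


if __name__ == "__main__":
    point = migrate_point(
        "arduino.uno.shield.wifi.sensor.dht22.humidity",
        {"percent": 40},  # invented field, for illustration
        "2015-03-28T14:50:00Z",
    )
    print(payload("domotic", [point]))
```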

Continuous Queries


:: Continuous Queries

• select count(type) from events group by time(10m), type into events.count_per_type.10m

DOWNSAMPLING

Next release

Coming in April 2015

• New clustering model

• Influx shell

• Indexed tags

• Backup

For Java Devs and DevOps

Libraries

• https://github.com/influxdb/influxdb-java Official Java client

• https://github.com/davidB/metrics-influxdb A reporter for metrics which announces measurements to an InfluxDB server.

• https://github.com/vietj/vertx-influxdb-metrics Proof of concept of reporting to InfluxDB

davidB/metrics-influxdb

Unofficial plugin for https://github.com/dropwizard/metrics


Carbon-influxdb

https://github.com/dropwizard/metrics


Demo


Q & A