
Page 1: Graph and Timeseries Databases - Graz University of Technology, kti.tugraz.at/staff/rkern/courses/dbase2/slides_graphdbase.pdf

Graph and Timeseries Databases

Databases 2 (VU) (706.711 / 707.030)

Roman Kern

Institute of Interactive Systems and Data Science, Technical University Graz

2018-10-22

Roman Kern (ISDS, TU Graz) Graph/TimeDBs 2018-10-22 1 / 30

Page 2

Graph Databases

Motivation and Basics of Graph Databases

Page 3

Introduction - Graph Databases

What is a graph database?

Data storage optimised for graph data structures

▶ i.e., efficient storage and access

Scales gracefully with the amount of data

Additional indices for look-ups (e.g., to find the start nodes of a traversal)
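To make "efficient storage and access" and the additional look-up index concrete, here is a minimal in-memory sketch. It assumes a layout where each node keeps its own list of neighbours, so following an edge needs no global search, while a separate dictionary index maps a property value to node ids and is used only to find start nodes. `TinyGraphStore` and its methods are hypothetical and not the API of any real graph database.

```python
from collections import defaultdict


class TinyGraphStore:
    """Hypothetical toy store: per-node adjacency lists plus a property index."""

    def __init__(self):
        self.props = {}                  # node id -> dict of properties
        self.out = defaultdict(list)     # node id -> list of neighbour ids
        self.index = defaultdict(set)    # (key, value) -> set of node ids

    def add_node(self, node_id, **props):
        self.props[node_id] = props
        for key, value in props.items():
            self.index[(key, value)].add(node_id)   # keep the look-up index in sync

    def add_edge(self, src, dst):
        self.out[src].append(dst)        # neighbours are stored with the node itself

    def find(self, key, value):
        # Index look-up: locate start nodes without scanning all nodes
        return sorted(self.index.get((key, value), set()))

    def neighbours(self, node_id):
        # Traversal step: read the node's own neighbour list, no join or global scan
        return list(self.out.get(node_id, []))


g = TinyGraphStore()
g.add_node(1, name="Alice")
g.add_node(2, name="Bob")
g.add_edge(1, 2)
start = g.find("name", "Alice")[0]
print(g.neighbours(start))   # [2]
```

The index is consulted once to locate the entry point; every further hop only reads a node's neighbour list, so the cost of a hop depends on the local neighbourhood rather than on the total number of nodes.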

Page 4

Introduction - Graph Databases

Why should graph databases work?

Networks usually have certain properties

▶ Small-world phenomenon
▶ … even in big networks, only a few hops are required on average to reach distant nodes

Access to data follows certain patterns

▶ Locality of reference
▶ … operations are focused on certain areas

Page 5

Introduction - Graph Databases

Why should one not use a graph database?

… if all the data is updated at once

▶ e.g., an operation applied to all nodes

… if the query cannot easily be expressed as a graph traversal operation

▶ e.g., lots of random access, or aggregate functions

Page 6

Introduction

Graph database vs. relational database

In principle, a graph database can be implemented

▶ … via a relational database
▶ … using joins or consecutive queries

However, relational databases are not optimised for such graph models

▶ … i.e., lots of sparse (semi-empty) rows

Additionally, relational databases are not designed

▶ … for changes in the schema, e.g., dynamic types of relations
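Storing a graph in a relational database and querying it with joins can be sketched with Python's built-in `sqlite3` module. The `nodes`/`edges` table layout and the data are assumptions for illustration. A fixed two-hop query needs one self-join of the edge table per hop, while a traversal of arbitrary depth needs a recursive common table expression (or consecutive queries).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edges (src INTEGER REFERENCES nodes(id),
                        dst INTEGER REFERENCES nodes(id));
    INSERT INTO nodes VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dave');
    INSERT INTO edges VALUES (1, 2), (2, 3), (3, 4);
""")

# Exactly two hops away from Alice: one self-join of the edge table per hop
two_hops = conn.execute("""
    SELECT n.name
    FROM edges e1
    JOIN edges e2 ON e2.src = e1.dst
    JOIN nodes n ON n.id = e2.dst
    WHERE e1.src = 1
""").fetchall()
print(two_hops)    # [('Carol',)]

# Everything reachable from Alice at any depth: a recursive CTE
# (UNION, not UNION ALL, discards duplicates and so also stops on cycles)
reachable = conn.execute("""
    WITH RECURSIVE reach(id) AS (
        SELECT dst FROM edges WHERE src = 1
        UNION
        SELECT e.dst FROM edges e JOIN reach r ON e.src = r.id
    )
    SELECT n.name FROM reach JOIN nodes n ON n.id = reach.id ORDER BY n.id
""").fetchall()
print(reachable)   # [('Bob',), ('Carol',), ('Dave',)]
```

Each additional hop in the first query means another join over the whole edge table (helped by indices), which is the overhead that a dedicated graph store tries to avoid.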

Page 7

Introduction

Figure: Comparison of import time for two graph databases vs. a relational DB

Page 8

Introduction - Graph Databases

Main concepts

Many contemporary graph databases are based on property graphs

▶ i.e., each node and edge is associated with a set of key/value pairs
▶ … where edges are directed (and often carry a label)

Often support ACID properties

▶ i.e., each modification takes place within a transaction
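A property graph as described here can be sketched with plain data classes: nodes and edges each carry a key/value dictionary, and edges are directed and labelled. The `transaction` helper is a hypothetical, single-threaded illustration of atomicity only (either all changes inside the block are kept or none are); it is not how a real database implements ACID, and isolation and durability are not modelled. All names and data are made up.

```python
import copy
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    props: dict = field(default_factory=dict)      # key/value pairs on the node


@dataclass
class Edge:
    src: int
    dst: int
    label: str                                     # directed: src -> dst
    props: dict = field(default_factory=dict)      # key/value pairs on the edge


@dataclass
class PropertyGraph:
    nodes: dict = field(default_factory=dict)      # id -> Node
    edges: list = field(default_factory=list)      # list of Edge

    def out_edges(self, node_id, label=None):
        return [e for e in self.edges
                if e.src == node_id and (label is None or e.label == label)]


@contextmanager
def transaction(graph):
    # Hypothetical atomicity sketch: take a snapshot, restore it if anything fails
    snapshot = copy.deepcopy((graph.nodes, graph.edges))
    try:
        yield graph
    except Exception:
        graph.nodes, graph.edges = snapshot
        raise


g = PropertyGraph()
with transaction(g):
    g.nodes[1] = Node(1, {"name": "Alice"})
    g.nodes[2] = Node(2, {"name": "ACME Ltd."})
    g.edges.append(Edge(1, 2, "OFFICER_OF", {"since": 2010}))

try:
    with transaction(g):
        g.nodes[3] = Node(3, {"name": "Bob"})
        raise RuntimeError("abort")                # simulated failure mid-transaction
except RuntimeError:
    pass

print(sorted(g.nodes))                             # [1, 2]  (node 3 was rolled back)
print(g.out_edges(1, "OFFICER_OF")[0].props)       # {'since': 2010}
```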

Page 9

Introduction - Graph Databases

Query types for graph databases

Lookup of nodes

Traversal of a graph

▶ Start at a node
▶ … continue following edges
▶ … until a stopping criterion has been reached
▶ Breadth-first vs. depth-first

Path finding

▶ Find a path between two nodes (e.g., Dijkstra, A*)

Path matching

▶ Matching patterns in the graph
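The query types above can be illustrated on small in-memory graphs (data and weights made up). This sketch shows breadth-first and depth-first traversal that stop at a maximum depth (one possible stopping criterion, chosen here for illustration), Dijkstra's algorithm for path finding, and a very simple pattern match over labelled edges, written as a Cypher-like pattern in the docstring. A* would additionally use a heuristic estimate of the remaining distance; it is not shown.

```python
import heapq
from collections import deque

# Weighted, directed example graph: node -> list of (neighbour, weight)
GRAPH = {
    "A": [("B", 1), ("C", 4)],
    "B": [("D", 5), ("C", 2)],
    "C": [("D", 1)],
    "D": [],
}


def bfs(graph, start, max_depth):
    """Breadth-first traversal; nodes at max_depth are not expanded further."""
    seen, order = {start}, []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        order.append(node)
        if depth == max_depth:                 # stopping criterion
            continue
        for nxt, _ in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order


def dfs(graph, node, max_depth, seen=None):
    """Depth-first traversal (recursive) with the same stopping criterion."""
    seen = set() if seen is None else seen
    seen.add(node)
    order = [node]
    if max_depth > 0:
        for nxt, _ in graph[node]:
            if nxt not in seen:
                order += dfs(graph, nxt, max_depth - 1, seen)
    return order


def dijkstra(graph, source, target):
    """Cheapest path from source to target; weights must be non-negative."""
    queue, done = [(0, source, [source])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, weight in graph[node]:
            if nxt not in done:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return None                                # target not reachable


# Labelled edges (src, label, dst) for pattern matching
TRIPLES = [
    ("alice", "OFFICER_OF", "acme"),
    ("bob", "OFFICER_OF", "acme"),
    ("acme", "REGISTERED_AT", "addr1"),
]


def match_officer_address(triples):
    """Match (p)-[:OFFICER_OF]->(c)-[:REGISTERED_AT]->(a)."""
    return sorted((p, c, a)
                  for p, l1, c in triples if l1 == "OFFICER_OF"
                  for c2, l2, a in triples if l2 == "REGISTERED_AT" and c2 == c)


print(bfs(GRAPH, "A", 2))            # ['A', 'B', 'C', 'D']
print(dfs(GRAPH, "A", 2))            # ['A', 'B', 'D', 'C']
print(dijkstra(GRAPH, "A", "D"))     # (4, ['A', 'B', 'C', 'D'])
print(match_officer_address(TRIPLES))
```

BFS visits all nodes one hop away before going deeper, while DFS follows `B -> D` first; Dijkstra prefers `A -> B -> C -> D` (cost 4) over the shorter-looking `A -> B -> D` (cost 6).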

Page 10

Modelling of Graph Databases

How to represent the data and how to model

Page 11

Modelling of Graph Databases

Approach

What should a graph database schema look like?

▶ i.e., how is the data represented as nodes and edges

Many different ways of modelling

▶ Graphs are a very flexible data structure
▶ … capable of capturing many domain models

In many cases, a direct mapping is possible

▶ … between the domain model and the graph model

Need to review the model

▶ Validate that the graph is suited for the queries being used
▶ e.g., don’t mix entities with relations
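One common reading of "don’t mix entities with relations": something that has its own properties, or connects more than two things, is an entity and should become a node rather than being squeezed into edges. The sketch below uses made-up data and hypothetical labels (`MET`, `ATTENDED`) to contrast the two options for a meeting with three participants.

```python
# Option 1: the meeting squeezed into pairwise edges; its date is copied onto
# every edge, and the meeting itself has no identity other data can point to
meeting_as_edges = [
    ("alice", "MET", "bob", {"date": "2018-01-15"}),
    ("alice", "MET", "carol", {"date": "2018-01-15"}),
    ("bob", "MET", "carol", {"date": "2018-01-15"}),
]

# Option 2: the meeting is an entity (node); participation is the relation
nodes = {
    "alice": {"type": "Person"},
    "bob": {"type": "Person"},
    "carol": {"type": "Person"},
    "m1": {"type": "Meeting", "date": "2018-01-15"},
}
edges = [
    ("alice", "ATTENDED", "m1"),
    ("bob", "ATTENDED", "m1"),
    ("carol", "ATTENDED", "m1"),
]


def participants(edges, meeting):
    """Everyone linked to one meeting node: a single hop."""
    return sorted(src for src, label, dst in edges
                  if dst == meeting and label == "ATTENDED")


def meetings_of(edges, person):
    """All meeting nodes a person is linked to."""
    return sorted(dst for src, label, dst in edges
                  if src == person and label == "ATTENDED")


print(participants(edges, "m1"))   # ['alice', 'bob', 'carol']
print(meetings_of(edges, "bob"))   # ['m1']
```

With n participants, option 1 needs n(n-1)/2 edges per meeting and repeats its properties on each of them; option 2 needs one node and n edges, and further facts about the meeting (location, minutes) can be attached to that node later without touching existing edges.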

Page 12

Modelling of Graph Databases

How to model for graph databases

Page 13

Application

Practical aspects of graph databases

Page 14

Application of Graph Databases

Main software tools for graph databases

Neo4j

OrientDB

TitanDB

… and many others

Page 15

Application of Graph Databases

Panama Papers - Intro

Leaked documents from a firm in Panama (2.6 TB of data)

▶ … about offshore activities

Journalists (around the world) were working on analysing the data

▶ … a graph database was the back-end of these activities
▶ Neo4j (plus Apache Solr and Tika)

Page 16

Application of Graph Databases

Panama Papers - Steps

Populate the database

▶ Analyse the documents
⋆ e.g., entity extraction (detect names)
▶ → entities in the graph (entity types: company, officer, client, address, …)
▶ Extract metadata of documents
▶ → properties for nodes

Detect relationships

▶ e.g., using the sender/receiver connections of e-mails
▶ → connections between the nodes

Refinement of graph

▶ Manual work conducted by the journalists
▶ More entity types, e.g., money flow, document types
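The "detect relationships" step can be sketched as follows, assuming the document-analysis step has already produced e-mail metadata records with a sender and a list of recipients. The record format, the addresses and the `EMAILED` label are hypothetical. Each address becomes a node, and each sender/receiver pair becomes one directed edge whose `count` property records how many e-mails were exchanged.

```python
from collections import Counter

# Hypothetical output of the document-analysis step: one record per e-mail
emails = [
    {"from": "a@firm.example", "to": ["b@client.example"]},
    {"from": "a@firm.example", "to": ["b@client.example", "c@bank.example"]},
    {"from": "b@client.example", "to": ["a@firm.example"]},
]


def build_graph(records):
    """Turn sender/receiver metadata into nodes and weighted, directed edges."""
    nodes = set()
    pair_counts = Counter()
    for rec in records:
        nodes.add(rec["from"])
        for rcpt in rec["to"]:
            nodes.add(rcpt)
            pair_counts[(rec["from"], rcpt)] += 1      # directed: sender -> receiver
    # One edge per pair, with the number of e-mails stored as an edge property
    edges = [(src, "EMAILED", dst, {"count": n})
             for (src, dst), n in sorted(pair_counts.items())]
    return sorted(nodes), edges


nodes, edges = build_graph(emails)
print(len(nodes))    # 3
for edge in edges:
    print(edge)
```

Aggregating repeated contacts into a single edge with a count keeps the graph small; storing every e-mail as its own node would instead preserve per-message metadata such as dates and subjects.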

Page 17

Application of Graph Databases

https://neo4j.com/blog/analyzing-panama-papers-neo4j/

Page 18

Time Series Databases

Motivation and Basics of Time Series Databases

Page 19

Introduction Time Series Databases

What is a time series database?

Data storage optimised for temporal data

▶ “Endless” stream of incoming data
▶ … challenge for traditional databases

Often accompanied by

▶ Tools to acquire time series data
▶ Tools to visualise and analyse such data

Page 20

Introduction Time Series Databases

Typical characteristics of time series databases

Fast write/append operations

Slow update/delete operations

Scales well to huge amounts of data

Retention policy

▶ i.e., to forget old data

Access restrictions

Provide (SQL-like) query languages

▶ Optimised for time range queries
▶ Specialised queries for aggregates

Often rely on other storage mechanisms

▶ e.g., key-value stores, wide-column stores
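Several of these characteristics can be shown in a toy in-memory store, assuming timestamps arrive in increasing order: writes are cheap appends, a time range query is a binary search (`bisect`) over the sorted timestamps, a retention policy drops everything older than a cut-off, and an aggregate is computed over a range. The class `TinySeries` is hypothetical; real systems add persistence, compression and indexing.

```python
import bisect


class TinySeries:
    """Hypothetical append-only series of (timestamp, value) points."""

    def __init__(self, retention):
        self.retention = retention      # keep only points with ts >= now - retention
        self.times, self.values = [], []

    def append(self, ts, value):
        # Fast path: points arrive in time order, so a write is just an append
        if self.times and ts < self.times[-1]:
            raise ValueError("out-of-order write (not supported in this sketch)")
        self.times.append(ts)
        self.values.append(value)

    def query_range(self, start, end):
        """All points with start <= ts < end, located via binary search."""
        lo = bisect.bisect_left(self.times, start)
        hi = bisect.bisect_left(self.times, end)
        return list(zip(self.times[lo:hi], self.values[lo:hi]))

    def mean(self, start, end):
        """Aggregate over a time range (None if the range is empty)."""
        points = self.query_range(start, end)
        return sum(v for _, v in points) / len(points) if points else None

    def enforce_retention(self, now):
        """Retention policy: forget everything older than now - retention."""
        cut = bisect.bisect_left(self.times, now - self.retention)
        del self.times[:cut], self.values[:cut]


s = TinySeries(retention=60)
for t in range(0, 100, 10):
    s.append(t, float(t))
print(s.query_range(20, 50))   # [(20, 20.0), (30, 30.0), (40, 40.0)]
print(s.mean(20, 50))          # 30.0
s.enforce_retention(now=100)
print(s.times)                 # [40, 50, 60, 70, 80, 90]
```

Deleting from the front of the arrays is the only "delete" this sketch supports; updating an arbitrary old point would require locating and rewriting it, which mirrors why such operations are slow in append-optimised stores.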

Page 21

Introduction Time Series Databases

Sources of time series

Observations

▶ Environmental, e.g., weather data, CO2

Economy

▶ e.g., stock exchange data

Sensor data

▶ e.g., human sensor data

Log files

Page 22

Introduction Time Series Databases

Types of time series databases

Limited type of payload

▶ e.g., limited to just timestamp + number
▶ → least amount of memory needed

Flexible payload

▶ Allows for richer representation
▶ e.g., timestamp + document

Wide tables

▶ Each row consists of many columns
▶ … often hundreds of columns
▶ → sparse rows
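The memory difference between a limited and a flexible payload can be made concrete with Python's `struct` and `json` modules. Assuming a point is stored as a 64-bit integer timestamp plus a 64-bit float, it takes exactly 16 bytes; the same point as a JSON document (field names and tag values made up) is several times larger before any compression.

```python
import json
import struct

ts, value = 1540198320, 0.87            # arbitrary example timestamp (seconds) and value

# Limited payload: fixed binary layout, little-endian int64 + float64
packed = struct.pack("<qd", ts, value)
print(len(packed))                      # 16

# Flexible payload: a self-describing document per point
doc = json.dumps({"timestamp": ts, "metric": "cpu-usage", "value": value,
                  "tags": {"host": "example.com", "cpu": "01"}})
print(len(doc.encode("utf-8")) > len(packed))   # True

# The fixed layout is decoded with the same format string; no field names are stored per point
print(struct.unpack("<qd", packed))     # (1540198320, 0.87)
```

The fixed layout only works because every point has the same shape; the document form pays for its size with the freedom to add fields per point.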

Page 23

Modelling of Time Series Databases

How to represent the data and how to model

Page 24

Modelling of Time Series Databases

Approach

Often based on single samples (observations)

▶ Univariate, bivariate, multivariate
⋆ e.g., readouts of multiple sensors (temperature, air pressure)
▶ Example: a measurement consists of
⋆ Timestamp, metric name, value, list of filters
⋆ e.g., 10:32, cpu-usage, 0.87, host=example.com, cpu=01

Flat file

▶ Generic vs. specific
▶ Store the name of the time series with each observation (generic)
⋆ Needed in case of dynamic systems
⋆ e.g., different sensors become available or disappear
▶ Have dedicated time series (specific)
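The example measurement from the slide (`10:32, cpu-usage, 0.87, host=example.com, cpu=01`) can be modelled as a small record. The sketch below, with hypothetical names and values beyond that example made up, contrasts the generic flat layout, where every observation carries its series name and filters (tags), with the specific layout, where each series has its own container and observations hold only timestamp and value.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    timestamp: str      # kept as "HH:MM" only to mirror the slide's example
    metric: str
    value: float
    tags: tuple         # filters as sorted (key, value) pairs, hashable


CPU_TAGS = (("cpu", "01"), ("host", "example.com"))

# Generic flat layout: one list; the series name is repeated in every row
generic = [
    Measurement("10:32", "cpu-usage", 0.87, CPU_TAGS),
    Measurement("10:33", "cpu-usage", 0.91, CPU_TAGS),
    Measurement("10:33", "temperature", 21.5, (("room", "lab"),)),  # new sensor, no schema change
]

# Specific layout: one dedicated series per (metric, tags); rows hold only (time, value)
specific = defaultdict(list)
for m in generic:
    specific[(m.metric, m.tags)].append((m.timestamp, m.value))

print(specific[("cpu-usage", CPU_TAGS)])   # [('10:32', 0.87), ('10:33', 0.91)]
print(len(specific))                       # 2
```

The generic form absorbs sensors that appear or disappear at run time; the specific form stores less per observation but needs a new series whenever a new metric/tag combination shows up.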

Page 25

Modelling of Time Series Databases

Approach

Windowed storage

▶ Each row represents a time window
▶ Columns for a more fine-grained resolution
⋆ Typically between 100 and 1000 observations per row
▶ Alternatively, multiple observations are stored in a single column
⋆ Using a custom (compressed) format

Special case: temporal and spatial data

▶ Requires specialised look-up methods
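Windowed storage can be sketched by grouping observations into rows keyed by series and window start, with the offset inside the window as the column name, in the spirit of a wide-column store. The window length of 100 seconds is an arbitrary assumption that gives up to 100 columns per row at one sample per second. The second part shows one simple compression idea, delta encoding of the timestamps stored together in a row, as a stand-in for a "custom (compressed) format"; it is not the format of any particular database.

```python
from collections import defaultdict

WINDOW = 100  # seconds per row (assumption); 1 sample/s -> up to 100 columns per row


def to_rows(series, points):
    """Map (ts, value) points to rows {(series, window_start): {offset: value}}."""
    rows = defaultdict(dict)
    for ts, value in points:
        window_start = ts - ts % WINDOW                      # row key: start of the window
        rows[(series, window_start)][ts - window_start] = value   # column: offset in window
    return dict(rows)


def delta_encode(timestamps):
    """Keep the first timestamp, then only the (small) differences."""
    return timestamps[:1] + [b - a for a, b in zip(timestamps, timestamps[1:])]


def delta_decode(deltas):
    """Invert delta_encode by summing the differences back up."""
    out = []
    for d in deltas:
        out.append(d if not out else out[-1] + d)
    return out


points = [(1000, 0.5), (1001, 0.6), (1099, 0.7), (1100, 0.8)]
rows = to_rows("cpu-usage", points)
print(sorted(rows))                    # [('cpu-usage', 1000), ('cpu-usage', 1100)]
print(rows[("cpu-usage", 1000)])       # {0: 0.5, 1: 0.6, 99: 0.7}

ts = [1000, 1001, 1002, 1010]
print(delta_encode(ts))                # [1000, 1, 1, 8]
print(delta_decode(delta_encode(ts)) == ts)   # True
```

A time range query then reads only the rows whose window overlaps the range, and the small deltas fit into fewer bytes than full timestamps when written with a variable-length integer encoding.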

Page 26

Example for Time Series Databases

Practical Aspects of Time Series Databases

Page 27

Time Series Databases Example

TICK Stack

Collection of tools:

▶ Telegraf: server agent for collecting and reporting metrics (stream or batch processing) to write data into the DB
▶ InfluxDB: the time series database component
▶ Chronograf: graphing and visualisation frontend for exploration
▶ Kapacitor: data processing engine, can process stream and batch data

https://www.influxdata.com/wp-content/themes/influx/images/TICK-Stack.png

Page 28

Time Series Databases Example

InfluxDB Features

Tags

▶ Tags are indexed
▶ Store commonly queried metadata
▶ Use them if “GROUP BY” should be applied to the data

Fields

▶ Fields are not indexed
▶ Everything that should not be stored as a string
▶ Use them if aggregation functions should be applied to the data (COUNT, MAX, PERCENTILE, CUMULATIVE_SUM)

Page 29

Time Series Databases Example

Figure: Screenshot of example data stored in InfluxDB

Page 30

The End

Next: Map/Reduce
