Apache Big_Data Europe event: "Demonstrating the Societal Value of Big & Smart Data Management"

DEMONSTRATING THE SOCIETAL VALUE OF BIG & SMART DATA MANAGEMENT
Apache Big_Data Europe, Seville, 14 November 2016

Transcript of Apache Big_Data Europe event: "Demonstrating the Societal Value of Big & Smart Data Management"



Talk outline

• The BigDataEurope Project & Mission
• The Big Data Integrator (BDI) platform
• 7 Pilots for the 7 Societal Challenge Domains
• A look into the BDI platform [DEMO]
• Collocated Event – Today @ 16:30

14-nov.-16 www.big-data-europe.eu


Supporting the Societal Domains with Big Data Technology

BigDataEurope Project


BigDataEurope Action

• EC Horizon 2020 Coordination & Support Action
  o ~€5 million, 2015-2017
• Show the societal value of Big Data
  o Across all societal challenges addressed by H2020
• Lower the barrier to using Big Data technologies
  o Effort and resources needed to convert tools and workflows
  o Skills and expertise
• Help establish data value chains across domains & organisations


Consortium

NCSR Demokritos


Stakeholder Engagement Cycle

• Present the action and showcase deployments
• Raise awareness of BDE results and what they mean for stakeholders
• Collect requirements to drive further development


[Figure: stakeholder engagement timeline at project months M6, M12, M18, M24, M30]


Data Value Chain Evolution


[Figure: data value chain evolution over TIME. Value chain stages: Extraction, Curation, Quality, Linking, Integration, Publication, Visualization, Analysis. Domains: Health, Transport, Security, Food, Societies, Climate, Energy. Labels: Data Repositories; Linked Open Data; Proprietary, ‘locked-in’ solutions; OS Solutions, Big Data Stacks.]


Source: Gesellschaft für Informatik

Variety – The most neglected V?

• Data source heterogeneity
• Lack of interoperability/semantics


A flexible, generic platform for (Big) Data Value Chain Deployment

Big Data Integrator


Big Data Integrator

• Prototype developed by BDE
  o Incorporates existing Big Data technology
  o Facilitates integration and deployment
• Main points of the architecture
  o Dockerization
  o Support layer, including an integrated UI
  o Semantification layer


Generic Architecture


• Plug-and-play Big Data platform
• Cloud-deployment ready
• Domain-independent, customisable
• Stacks open-source solutions

BDI Prototype Releases

1. [July 2016]
2. December 2016
3. …


Docker containers


• Docker offers lightweight virtualization
  o Containers can be shared/provisioned on different Linux distributions/versions
• Identical base system
  o NOT required
• All BDI components
  o Docker containers


BDI Docker Containers (so far)


• Data serving: HDFS, Cassandra, 4store, PostGIS, Strabon, Elasticsearch, Hive, Semagrow
• Processing: Spark, Flink, SANSA
• Stream ingestion middleware: Flume, Kafka


BDI Instances – An example


• Processing and storage components
  o Re-used existing Docker containers (where available)
  o Dockerized by BDE otherwise
  o Ensuring all can be provisioned through Docker Swarm
• Other BDI components:
  o Support Layer
  o Semantic Layer


Support Layer


• Integrator UI
  o Web UIs of the BDE Docker images (including third-party components) follow the BDE stylesheets
• Stack Monitor App
  o Configure the stack order
• Swarm UI
  o Launch, install and manage stacks
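The slides do not say how the Stack Monitor App determines the stack order. A minimal sketch, assuming the order is derived from declared service dependencies: a topological sort (here via Python's standard `graphlib`, 3.9+) yields a start-up sequence in which every service starts only after the services it depends on. The service names and the dependency graph are hypothetical, not actual BDE stack definitions.

```python
from graphlib import TopologicalSorter

# Hypothetical stack: each service maps to the set of services that must be
# running before it starts. Names are illustrative, not actual BDE images.
STACK_DEPENDENCIES = {
    "hdfs-namenode": set(),
    "hdfs-datanode": {"hdfs-namenode"},
    "spark-master": {"hdfs-namenode"},
    "spark-worker": {"spark-master", "hdfs-datanode"},
    "integrator-ui": {"spark-master"},
}


def startup_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return a start-up order in which every service follows its dependencies.

    Raises graphlib.CycleError if the dependency graph contains a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for position, service in enumerate(startup_order(STACK_DEPENDENCIES), start=1):
        print(position, service)
```

A topological sort is a natural fit here because it rejects circular dependencies with a `CycleError` instead of silently choosing an order that cannot work.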


Semantic Layer


• Semantic Data Lakes
  o Minimal pre-processing at ingestion
  o Semantic layer maintains metadata
  o Meaning is added when retrieving/processing
• Ongoing research on Semantic Big Data & Analytics

[Figure: Data Lake (scalable unstructured data store); relationship definitions and metadata; JSON-LD, CSVW, R2RML, XML2RDF; Knowledge Graphs]
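The idea of adding meaning at retrieval time can be illustrated as schema-on-read: raw files are stored untouched, and a separately maintained mapping turns them into subject-predicate-object triples only when they are read. A minimal sketch; the file path, column names and `example.org` IRIs are hypothetical, and the plain dictionary merely stands in for the mapping languages named on the slide (such as CSVW or R2RML).

```python
import csv
import io

# Raw data is ingested as-is: no schema is enforced at load time.
RAW_LAKE = {
    "budgets/city_a.csv": "municipality,year,amount\nCity A,2016,1200000\n",
}

# Hypothetical semantic metadata, kept separately from the data. It maps
# raw columns to RDF-style predicates (all IRIs are illustrative).
SEMANTIC_METADATA = {
    "budgets/city_a.csv": {
        "subject_column": "municipality",
        "subject_prefix": "http://example.org/municipality/",
        "predicates": {
            "year": "http://example.org/vocab/fiscalYear",
            "amount": "http://example.org/vocab/budgetAmount",
        },
    },
}


def read_as_triples(path):
    """Yield (subject, predicate, object) triples, applying the mapping at read time."""
    mapping = SEMANTIC_METADATA[path]
    for row in csv.DictReader(io.StringIO(RAW_LAKE[path])):
        # Build a subject IRI from the configured column (spaces -> underscores).
        local_name = row[mapping["subject_column"]].replace(" ", "_")
        subject = mapping["subject_prefix"] + local_name
        for column, predicate in mapping["predicates"].items():
            yield (subject, predicate, row[column])


if __name__ == "__main__":
    for triple in read_as_triples("budgets/city_a.csv"):
        print(triple)
```

Because the mapping lives outside the data, it can be revised without re-ingesting the raw files.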


Semantic Layer tools


• BDE tooling for the Semantic Data Lake:
  o Swagger: semantics of RESTful APIs
  o Semantic Analytics Stack (SANSA): distributed data processing over large-scale Knowledge Graphs
  o Semagrow: SPARQL over Big Data stores
  o Ontario: querying over Semantic Data Lakes


More Information

• Big Data Integrator: https://github.com/big-data-europe
  o README includes extensive documentation, instructions and information on supported components
• “Integrators at Work! Real-Life Applications of Apache Big Data Components” @ 4:30 PM
  o Includes more details & a demo


Demonstrating the Societal Value through 7 Pilot ‘Real-world’ use-cases

BigDataEurope Pilots


Pilots: Overview

• SC1: Health & Pharmacology
• SC2: Food & Agriculture
• SC3: Energy
• SC4: Transport
• SC5: Climate
• SC6: Social Sciences
• SC7: Security


7 Pilots

◎ BDI Platform Instantiations
  o Allow end-users to easily deploy functionality in their own system environment
  o Modularized Docker approach makes it easier to replace components
  o Reduces the effort to keep 3rd-party software updated & integrated

◎ 7 Societal Challenge Pilots
  o Aligned with the 7 European Commission H2020 Societal Challenges
  o Real-world use-cases (Data, Objectives, Solutions)
  o Some pilots have different data & objectives but a similar solution


SC1: Pharmacology research


Life Sciences & Health

• Query a large number of datasets, some of them large
• Existing elaborate ingestion and homogenization by Open PHACTS
• Extensive toolset developed by the Open PHACTS Foundation (OPF) and others

Objective: Large-scale linking & integration of heterogeneous pharmaceutical research data


SC1: Architecture & Components


• Replicate Open PHACTS functionality on the BDE infrastructure using open-source solutions
  o Open PHACTS is based on Virtuoso, a proprietary distributed database
• Apply to other domains (e.g. Agriculture)
• Porting to BDI gives flexibility and enables new functionality
  o Logging & system health monitoring


SC2: Viticulture resources


Food and Agriculture

Objective: Automate publication ingestion and thematic classification

• AgInfra is a major infrastructure for agriculture researchers, serving cross-linked bibliography, data, and processing services


SC2: Architecture & Components

• BDI deployed as an external infrastructure for processing text (viticulture publications)
• Storing and processing text at a larger scale than AgInfra can currently manage
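The slides do not describe the classifier behind the thematic classification. As an illustration only, the sketch below assigns topics to a publication by counting keyword matches against a hypothetical topic vocabulary; a real system would more likely rely on a curated thesaurus or a trained model, and could apply such a function per document in parallel on the BDI processing components.

```python
import re
from collections import Counter

# Hypothetical topic vocabulary (illustrative only, not the pilot's thesaurus).
TOPIC_KEYWORDS = {
    "grapevine diseases": {"mildew", "botrytis", "phylloxera"},
    "irrigation": {"irrigation", "drought", "water"},
    "wine quality": {"tannin", "aroma", "fermentation"},
}


def classify(text, min_hits=1):
    """Return topics ranked by keyword hits, keeping those with at least min_hits."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        topic: sum(tokens[word] for word in words)
        for topic, words in TOPIC_KEYWORDS.items()
    }
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [topic for topic, score in ranked if score >= min_hits]


if __name__ == "__main__":
    print(classify("Downy mildew and botrytis under drought stress"))
```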


SC3: Predictive maintenance


Energy

• Wind turbine monitoring applies computational models to sensor data streams
• Models are re-parameterized weekly using the week’s data from multiple turbines

Objective: Real-time stream processing and analytics for turbine monitoring


SC3: Architecture & Components

• Existing in-house, non-scalable solution for model parameterization
  o Reliable Fortran software for data analysis
  o Efficient, but does not scale to the data volume
• Developing a BDI orchestrator
  o Re-uses the existing software unmodified
  o Makes it easy to apply that software in parallel to many datasets and manage the outputs
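A minimal sketch of the orchestrator idea, assuming the existing analysis tool is a command-line executable that takes a dataset path as an argument (the slides do not specify its interface). The unmodified program is launched once per dataset from a thread pool, and each run's exit code and output are collected by dataset name. The `{dataset}` placeholder convention is hypothetical; in the usage example a Python one-liner stands in for the Fortran binary.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_unmodified(command_template, dataset):
    """Run an existing executable on one dataset without changing the program."""
    # Substitute the hypothetical "{dataset}" placeholder in each argument.
    command = [part.replace("{dataset}", dataset) for part in command_template]
    result = subprocess.run(command, capture_output=True, text=True)
    return {
        "dataset": dataset,
        "returncode": result.returncode,
        "output": result.stdout.strip(),
    }


def orchestrate(command_template, datasets, max_workers=4):
    """Apply the same tool to many datasets in parallel; collect outputs by dataset."""
    # Threads suffice: the real work happens in separate child processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        runs = pool.map(lambda d: run_unmodified(command_template, d), datasets)
        return {run["dataset"]: run for run in runs}


if __name__ == "__main__":
    # Stand-in for the legacy analysis binary: upper-cases its argument.
    demo = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())", "{dataset}"]
    print(orchestrate(demo, ["turbine_01", "turbine_02"]))
```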


SC4: Traffic conditions estimation


Transport

• Combines:
  o Traffic modelling from historical data
  o Current measurements from a taxi fleet of 1200 vehicles

Objective: Estimation of real-time traffic conditions in Thessaloniki


SC4: Architecture & Components

• New Flink implementations of map matching and traffic prediction algorithms
• BDI provides access to varied data sources
  o PostGIS database with the city map
  o Elasticsearch database of historical data
  o Kafka stream of real-time data
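The slides do not detail the Flink map-matching algorithm. The simplest form of map matching snaps each GPS fix to the nearest road segment; the sketch below shows only that geometric core in plain Python, assuming planar (projected) coordinates and a hypothetical dictionary of road segments. Practical map matchers typically also use heading and trajectory history, and in the pilot the logic runs inside Flink rather than as standalone Python.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def match_to_road(point, road_segments):
    """Return the id of the road segment closest to a GPS point."""
    return min(
        road_segments,
        key=lambda seg_id: point_segment_distance(point, *road_segments[seg_id]),
    )


if __name__ == "__main__":
    # Hypothetical road network: two parallel streets.
    roads = {"road_A": ((0.0, 0.0), (10.0, 0.0)), "road_B": ((0.0, 5.0), (10.0, 5.0))}
    print(match_to_road((3.0, 1.0), roads))
```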


SC5: Climate modelling


Climate

• Preparing modelling experiments
  o Slicing, transforming and combining datasets
  o Submission to and retrieval from the modelling infrastructure
• Discovering and re-using previously computed derivatives
  o Lineage annotation: derivatives computed from datasets and model parameters
  o Finding appropriate past runs avoids repeating weeks-long modelling runs

Objective: Supporting data-intensive climate research
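One way to make past runs discoverable, sketched under assumptions: derive a deterministic key from the input dataset identifiers and the model parameters, and look that key up before launching a new run. The `LineageCatalogue` class and the dataset names are hypothetical. The pilot keeps lineage metadata in Cassandra, whereas this sketch uses an in-memory dictionary.

```python
import hashlib
import json


def lineage_key(input_datasets, model_parameters):
    """Deterministic key: the same inputs and parameters always give the same key."""
    # Sort inputs and dict keys so that ordering differences do not matter.
    record = {"inputs": sorted(input_datasets), "parameters": model_parameters}
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class LineageCatalogue:
    """Hypothetical in-memory catalogue of previously computed derivatives."""

    def __init__(self):
        self._runs = {}

    def record(self, input_datasets, model_parameters, output_location):
        """Annotate a finished run with its lineage."""
        self._runs[lineage_key(input_datasets, model_parameters)] = output_location

    def find_previous_run(self, input_datasets, model_parameters):
        """Return the stored output location, or None if the run must be computed."""
        return self._runs.get(lineage_key(input_datasets, model_parameters))


if __name__ == "__main__":
    catalogue = LineageCatalogue()
    catalogue.record(["reanalysis_2015", "land_use"], {"resolution_km": 25}, "runs/0001")
    print(catalogue.find_previous_run(["land_use", "reanalysis_2015"], {"resolution_km": 25}))
```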


SC5: Architecture & Components

• BDI offers:
  o Hive for managing data as records that can be retrieved and manipulated, rather than as file blocks
  o Cassandra for storing structured and textual metadata, used to search headers and lineage
• Existing infrastructure: stable, reliable software for parallel computation of models
  o BDI is deployed as an external infrastructure for preparing and managing datasets


SC6: Municipality budgets


Social Sciences

• Ingestion of budget and budget execution data
• Multiple municipalities, with varied formats and data models

Objective: Homogenized budgetary data made available for analysis and comparison


SC6: Architecture & Components

• BDI deployed as ingestion and storage infrastructure for external tools
  o Homogenizes a variety of data (JSON, CSV, XML, etc.)
  o Exposes the homogenized data through a SPARQL endpoint
• Existing analytics and visualization tools
  o Use SPARQL queries to retrieve only the relevant slices of the overall data
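A minimal sketch of the homogenization step: three parsers map JSON, CSV and XML inputs onto one common record layout, which could then be converted to RDF and served through the SPARQL endpoint. The input layouts, field names and the common data model are all hypothetical; the slides do not specify the municipalities' actual formats.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def budget_record(municipality, year, category, amount):
    """Hypothetical common data model for one budget line."""
    return {
        "municipality": municipality,
        "year": int(year),
        "category": category,
        "amount": float(amount),
    }


def from_json(text):
    # Assumed layout: {"city": ..., "items": [{"year": ..., "code": ..., "value": ...}]}
    doc = json.loads(text)
    return [budget_record(doc["city"], i["year"], i["code"], i["value"]) for i in doc["items"]]


def from_csv(text):
    # Assumed header: municipality;year;category;amount (semicolon-separated)
    rows = csv.DictReader(io.StringIO(text), delimiter=";")
    return [budget_record(r["municipality"], r["year"], r["category"], r["amount"]) for r in rows]


def from_xml(text):
    # Assumed layout: <budget city="..."><line year="..." category="..." amount="..."/></budget>
    root = ET.fromstring(text)
    return [
        budget_record(root.get("city"), line.get("year"), line.get("category"), line.get("amount"))
        for line in root.iter("line")
    ]


if __name__ == "__main__":
    print(from_csv("municipality;year;category;amount\nTown B;2016;schools;250.5\n"))
```

Keeping one small parser per source format means a new municipality only needs a new parser, while everything downstream of the common model stays unchanged.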


SC7: Change detection & verification


Secure Societies

• Events are extracted from text published by news agencies and on social networking sites
• Events are geo-located, and relevant changes are detected by comparing current and previous satellite images

Objective: Detect and verify events based on satellite imagery, news and social media


SC7: Architecture & Components

[Figure components: Event Detection; Change Detection]

• Re-implementation of change detection algorithms for Spark
• Parallel orchestrator for text analytics
  o Re-uses existing software
  o Scales to many input streams
• BDI provides:
  o Cassandra for text content and metadata
  o Strabon GIS store for detected change locations
  o Homogeneous access to both for analysis and visualization
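The slides do not name the change-detection algorithms ported to Spark. The simplest baseline is pixel-wise differencing of two co-registered images followed by a threshold; the plain-Python sketch below shows only that baseline, with hypothetical intensity values, and is not the pilot's method.

```python
def change_mask(before, after, threshold):
    """Flag pixels whose absolute intensity change exceeds threshold.

    before/after: equally sized 2-D lists of co-registered pixel intensities.
    """
    if len(before) != len(after) or any(
        len(row_before) != len(row_after) for row_before, row_after in zip(before, after)
    ):
        raise ValueError("images must have the same shape")
    return [
        [abs(a - b) > threshold for b, a in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]


def changed_fraction(mask):
    """Share of pixels flagged as changed (0.0 for an empty mask)."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


if __name__ == "__main__":
    mask = change_mask([[10, 10], [10, 10]], [[10, 80], [12, 10]], threshold=20)
    print(mask, changed_fraction(mask))
```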


Free Workshops, Hangouts & Webinars

BigDataEurope Activities


2nd round of Societal Workshops


• Transport: 22 September 2016, Brussels. Collocated with Big Data for Transport, TISA workshop
• Food & Agri: 30 September 2016, Brussels. Collocated with DG AGRI WP2018-20 stakeholder consultation
• Energy: 4 October 2016, Brussels. Collocated with EC H2020 Info Day on “Smart Grids and Storage”
• Climate: 11 October 2016, Brussels. Collocated with Melodies Project Event – Exploiting Open Data
• Security: 18 October 2016, Brussels. Standalone workshop
• Societies: 5 December 2016, Cologne. Collocated with EDDI16 – 8th Annual European DDI User Conference
• Health: 9 December 2016, Brussels. Standalone workshop


Other Activities

• A fresh set of 7 Societal Workshops in 2017
• Various SC-focused and general hangouts; follow us!
  o Apache Flink & BDE (20 Oct) – available online
  o More to follow!
  o Keep track via the BDE website (Events)


Demonstrating the ease-of-use in deploying custom instances of the BDI Platform

BDI Platform – A Demo


WEB: www.big-data-europe.eu
EMAIL: [email protected]

BIG DATA INTEGRATOR

www.github.com/big-data-europe

PROJECT COORDINATION (Fraunhofer IAIS)

Prof. Sören Auer, auer © cs.uni-bonn · de
Dr. Simon Scerri, scerri © cs.uni-bonn · de
EIS Department/Group, Fraunhofer IAIS & CS Department, Uni-Bonn, Bonn, Germany

Questions & Contacts


#BigDataEurope

leads the Fraunhofer

Big Data Alliance