www.scads.de
GRAPH-BASED DATA INTEGRATION AND ANALYSIS FOR BIG DATA
ERHARD RAHM
HOW BIG IS BIG DATA?
"Big" is changing quickly: Gigabytes, Terabytes (10^12), Petabytes (10^15), Exabytes (10^18), Zettabytes (10^21), Yottabytes (10^24), Brontobytes (10^27), …
by 2020 about 40 ZB of data will be generated
[Figure: data growth (in Exabytes). Source: IDC]
DATA CENTER
Source: Google Inc.
BIG DATA CHALLENGES
Volume: petabytes / exabytes of data
Velocity: fast analysis of data streams
Variety: heterogeneous data of different kinds
Veracity: high data quality
Value: useful analysis results
BIG DATA ANALYSIS PIPELINE
Data acquisition → Data extraction / cleaning → Data integration / annotation → Data analysis and visualization → Interpretation
Challenges across the pipeline: Variety, Volume, Velocity, Veracity, Privacy
"GRAPHS ARE EVERYWHERE"
Facebook: ca. 1.3 billion users, ca. 340 friends per user
Twitter: ca. 300 million users, ca. 500 million tweets per day
Internet: ca. 2.9 billion users
Gene (human): 20,000-25,000, ca. 4 million individuals
Patients: > 18 million (Germany)
Illnesses: > 30,000
World Wide Web: ca. 1 billion websites
LOD-Cloud: ca. 90 billion triples
Domains: social science, engineering, life science, information science
[Figure: example graph with vertices Alice, Bob, Eve, Dave, Carol, Mallory, Peggy, Trent]
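"GRAPHS ARE HETEROGENEOUS"
[Figure: example graph with vertices Alice, Bob, Dave, Carol, Mallory, Peggy, AC/DC, Metallica; the vertex set and the edge set are each a union of two sets (… ∪ …, … ∪ …)]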
"GRAPHS CAN BE ANALYZED"
[Figure: the example graph (Alice, Bob, Dave, Carol, Mallory, Peggy, AC/DC, Metallica) annotated with numeric analysis results such as 0.2, 0.28, 0.26, 0.33, 0.25, 0.26]
"GRAPHS CAN BE ANALYZED"
Assuming a social network:
1. Determine subgraph
2. Find communities
3. Filter communities
4. Find common subgraph
GRAPH DATA ANALYTICS: REQUIREMENTS
• all V challenges (volume, variety, velocity, veracity)
• ease-of-use
• high cost-effectiveness
• powerful but easy to use graph data model
• support for heterogeneous, schema-flexible vertices and edges
• support for collections of graphs (not only 1 graph)
• powerful graph operators
• graph-based integration of many data sources
• versioning and evolution (dynamic / temporal graphs)
• interactive, declarative graph queries
• scalable graph mining
• comprehensive visualization support
COMPARISON

| | Graph Database Systems (Neo4j, OrientDB) | Graph Processing Systems (Pregel, Giraph) | Distributed Dataflow Systems (Flink Gelly, Spark GraphX) |
|---|---|---|---|
| data model | rich graph models (PGM) | generic graph models | generic graph models |
| focus | queries | analytic | analytic |
| query language | yes | no | no |
| graph analytics | no | yes | yes |
| scalability | vertical | horizontal | horizontal |
| workflows | no | no | yes |
| persistency | yes | no | no |
| dynamic graphs / versioning | no | no | no |
| data integration | no | no | no |
| visualization | (yes) | no | no |
WHAT'S MISSING?
An end-to-end framework and research platform for efficient, distributed and domain-independent graph data management and analytics.
[Figure: Graph Databases, Graph Processing Systems and Graph Dataflow Systems (Gelly) positioned by ease-of-use versus data volume and problem complexity]
AGENDA
• Intro Graph Analytics
  • Graph data
  • Requirements
  • Graph database vs graph processing systems
• Gradoop architecture and data integration
• Extended Property Graph Model (EPGM)
  • Data organization and operators
  • Implementation
• Performance Evaluation
• Summary/Outlook
GRADOOP CHARACTERISTICS
• Hadoop-based framework for graph data management and analysis
  • persistent graph storage in scalable distributed store (HBase)
  • utilization of powerful dataflow system (Apache Flink) for parallel, in-memory processing
• Extended property graph data model (EPGM)
  • operators on graphs and sets of (sub)graphs
  • support for semantic graph queries and mining
• Declarative specification of graph analysis workflows
  • Graph Analytical Language - GrALa
• End-to-end functionality
  • graph-based data integration, data analysis and visualization
• Open-source implementation: www.gradoop.org
END-TO-END GRAPH ANALYTICS
Data Integration: integrate data from one or more sources into a dedicated graph store with common graph data model
Graph Analytics: definition of analytical workflows from operator algebra
Visualization: result representation in meaningful way
HIGH LEVEL ARCHITECTURE
[Figure: layered architecture with the components HDFS/YARN Cluster; HBase Distributed Graph Store; Extended Property Graph Model; Flink Operator Implementations; Flink Operator Execution; Workflow Declaration (Visual, GrALa DSL); Data Integration, Graph Analytics, Representation. Arrows mark data flow and control flow.]
GRAPH-BASED BUSINESS INTELLIGENCE
BIIIG: Business Intelligence on Integrated Instance Graphs
• heterogeneous data sources are integrated within an instance graph by preserving original relationships between data objects
  • transactional and master data
• largely automated extraction of metadata and instance data and transformation into graphs
• fusion of matching entities and relations
• extraction of subgraphs (business transaction graphs) related to interrelated business activities
• analysis of graphs/subgraphs with aggregation queries, pattern mining etc.
DATA INTEGRATION AND ANALYSIS WORKFLOW
[Figure: workflow from Data Sources via (1) Graph transformation, (2) Graph integration and (3) Subgraph isolation; it involves metadata and a domain expert and yields the Integrated Instance Graph and the Business Transaction Graphs]
"Business Intelligence on Integrated Instance Graphs (BIIIG)" (PVLDB 2014)
SAMPLE INSTANCE GRAPH
SCREENSHOT OF NEO4J IMPLEMENTATION
EXTENDED PROPERTY GRAPH MODEL (EPGM)
• includes PGM as special case
• support for collections of logical graphs / subgraphs
  • can be defined explicitly
  • can be result of graph algorithms / operators
• support for graph properties
• powerful operators on both graphs and graph collections
• Graph Analytical Language - GrALa
  • domain-specific language (DSL) for EPGM
  • flexible use of operators with application-specific UDFs
  • plugin concept for graph mining algorithms
• Vertices and directed Edges
• Logical Graphs
• Identifiers
• Type Labels
• Properties
[Figure: example EPGM graph with vertices 1-5, edges 1-5 and logical graphs 1-2]
Vertices: Person (name: Alice, born: 1984); Band (name: Metallica, founded: 1981); Person (name: Bob); Person (name: Eve); Band (name: AC/DC, founded: 1973)
Edges: likes (since: 2014); likes (since: 2013); likes (since: 2015); knows; likes (since: 2014)
Logical graphs: 1|Community|interest: Heavy Metal; 2|Community|interest: Hard Rock
Operators

| | Unary | Binary |
|---|---|---|
| Logical Graph | Aggregation, Pattern Matching, Transformation, Grouping, Subgraph, Call * | Combination, Overlap, Exclusion, Equality |
| Graph Collection | Selection, Distinct, Sort, Limit, Apply *, Reduce *, Call * | Union, Intersection, Difference, Equality |

Algorithms: Gelly Library, BTG Extraction, Frequent Subgraphs, Adaptive Partitioning
* auxiliary
BASIC BINARY OPERATORS
Combination, Overlap, Exclusion
LogicalGraph graph3 = graph1.combine(graph2);
LogicalGraph graph4 = graph1.overlap(graph2);
LogicalGraph graph5 = graph1.exclude(graph2);
[Figure: graph 1 and graph 2 over vertices 1-5 and the results of combination, overlap and exclusion]
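The three operators can be read as set operations on vertex and edge identifiers. A minimal Python sketch under that assumption; the `Graph` class and the sample ids are hypothetical and are not the Gradoop API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    """Hypothetical stand-in for a logical graph: vertex ids and (source, target) edges."""
    vertices: frozenset
    edges: frozenset


def combine(g1, g2):
    # Combination: union of the vertex sets and of the edge sets.
    return Graph(g1.vertices | g2.vertices, g1.edges | g2.edges)


def overlap(g1, g2):
    # Overlap: vertices and edges contained in both graphs.
    return Graph(g1.vertices & g2.vertices, g1.edges & g2.edges)


def exclude(g1, g2):
    # Exclusion: vertices only in g1, plus the g1 edges whose endpoints both remain.
    vertices = g1.vertices - g2.vertices
    edges = frozenset((s, t) for (s, t) in g1.edges if s in vertices and t in vertices)
    return Graph(vertices, edges)


graph1 = Graph(frozenset({1, 2, 3}), frozenset({(1, 2), (3, 2)}))
graph2 = Graph(frozenset({3, 4, 5}), frozenset({(3, 4), (5, 4)}))

graph3 = combine(graph1, graph2)
graph4 = overlap(graph1, graph2)
graph5 = exclude(graph1, graph2)
```

Here vertex 3 is shared, so the overlap contains only that vertex, and the exclusion drops the edge (3, 2) because one of its endpoints is gone.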
AGGREGATION
udf = (graph => graph['vertexCount'] = graph.vertices.size())
graph3 = graph3.aggregate(udf)
[Figure: graph 3 before and after applying the UDF; the result carries the graph property vertexCount: 5]
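Aggregation computes a value from a graph and stores it as a graph property. A small Python sketch of that idea; the dict-based graph and the `aggregate` helper are assumptions for illustration, not GrALa or Gradoop code:

```python
def aggregate(graph, property_key, udf):
    """Return a copy of the graph whose properties include property_key = udf(graph)."""
    properties = dict(graph["properties"])
    properties[property_key] = udf(graph)
    return {**graph, "properties": properties}


# Hypothetical graph 3 with five vertices, as in the figure.
graph3 = {
    "id": 3,
    "vertices": [1, 2, 3, 4, 5],
    "edges": [(1, 2), (3, 2), (3, 4), (3, 5), (5, 4)],
    "properties": {},
}

graph3 = aggregate(graph3, "vertexCount", lambda g: len(g["vertices"]))
print(graph3["properties"])  # {'vertexCount': 5}
```

Returning a new graph rather than mutating the input keeps the operator composable with later steps such as selection.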
SUBGRAPH
LogicalGraph graph4 = graph3.subgraph((vertex => vertex[:label] == 'green'))
LogicalGraph graph5 = graph3.subgraph((edge => edge[:label] == 'blue'))
LogicalGraph graph6 = graph3.subgraph(
  (vertex => vertex[:label] == 'green'),
  (edge => edge[:label] == 'orange'))
[Figure: graph 3 and the results graph 4 (vertex UDF), graph 5 (edge UDF) and graph 6 (vertex and edge UDF)]
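A Python sketch of the three subgraph variants. It assumes that a vertex predicate alone induces the edges between kept vertices and that an edge predicate alone induces the touched vertices; the data layout and labels are hypothetical:

```python
def subgraph(graph, vertex_pred=None, edge_pred=None):
    """Hypothetical subgraph operator on a dict-based graph."""
    if vertex_pred is None and edge_pred is None:
        raise ValueError("at least one predicate is required")
    vertices, edges = graph["vertices"], graph["edges"]
    if vertex_pred is None:
        # Edge-induced: keep matching edges and the vertices they touch.
        kept_edges = [e for e in edges if edge_pred(e)]
        ids = {e["source"] for e in kept_edges} | {e["target"] for e in kept_edges}
        kept_vertices = {vid: v for vid, v in vertices.items() if vid in ids}
    else:
        # Vertex-induced, optionally restricted further by the edge predicate.
        kept_vertices = {vid: v for vid, v in vertices.items() if vertex_pred(v)}
        kept_edges = [
            e for e in edges
            if e["source"] in kept_vertices and e["target"] in kept_vertices
            and (edge_pred is None or edge_pred(e))
        ]
    return {"vertices": kept_vertices, "edges": kept_edges}


graph3 = {
    "vertices": {1: {"label": "green"}, 2: {"label": "green"}, 3: {"label": "orange"},
                 4: {"label": "green"}, 5: {"label": "orange"}},
    "edges": [
        {"source": 1, "target": 2, "label": "blue"},
        {"source": 2, "target": 3, "label": "orange"},
        {"source": 4, "target": 1, "label": "orange"},
        {"source": 3, "target": 5, "label": "blue"},
    ],
}

graph4 = subgraph(graph3, vertex_pred=lambda v: v["label"] == "green")
graph5 = subgraph(graph3, edge_pred=lambda e: e["label"] == "blue")
graph6 = subgraph(graph3, lambda v: v["label"] == "green", lambda e: e["label"] == "orange")
```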
PATTERN MATCHING
GraphCollection collection = graph3.match("(:Green)-[:orange]->(:Orange)");
[Figure: graph 3, the pattern, and the resulting graph collection with graphs 4 and 5]
new: support of Cypher query language for pattern matching*
q = "MATCH (p1:Person)-[e:knows*1..3]->(p2:Person) WHERE p1.gender <> p2.gender RETURN *"
GraphCollection matches = g.cypher(q)
* Junghanns et al.: Cypher-based Graph Pattern Matching in Gradoop. Proc. GRADES 2017
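For the single-edge pattern above, matching reduces to a filter over edges that checks both endpoint labels. The following Python sketch covers only that special case; general patterns and Cypher features such as variable-length paths need a real matching engine. The function name and the data are hypothetical:

```python
def match_edge_pattern(graph, source_label, edge_label, target_label):
    """Return one small graph per edge matching (:source_label)-[:edge_label]->(:target_label)."""
    labels = graph["vertices"]  # vertex id -> type label
    matches = []
    for source, target, label in graph["edges"]:
        if (label == edge_label
                and labels[source] == source_label
                and labels[target] == target_label):
            matches.append({
                "vertices": {source: labels[source], target: labels[target]},
                "edges": [(source, target, label)],
            })
    return matches


graph3 = {
    "vertices": {1: "Green", 2: "Orange", 3: "Green", 4: "Orange", 5: "Green"},
    "edges": [(1, 2, "orange"), (3, 4, "orange"), (3, 5, "blue"), (2, 3, "orange")],
}

# Each match becomes its own graph, so the result is a collection of graphs.
collection = match_edge_pattern(graph3, "Green", "orange", "Orange")
print(len(collection))  # 2
```

The edge (2, 3) carries the right label but points from an Orange to a Green vertex, so it does not match the directed pattern.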
GROUPING
LogicalGraph grouped = graph3.groupBy(
  [:label],  // vertex keys
  [:label])  // edge keys
LogicalGraph grouped = graph3.groupBy([:label], [COUNT()], [:label], [MAX('a')])
[Figure: graph 3 (edge properties a: 23, 84, 42, 12, 13, 21) grouped by keys into graph 4 with super vertices 6 and 7; with aggregates the super vertices carry count: 2 and count: 3 and the super edges carry max(a): 42, 84, 13, 21]
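Grouping by label with a COUNT aggregate on vertices and a MAX aggregate on edges can be sketched in a few lines of Python. The property values are borrowed from the figure, but the topology below is an assumption, as are the function and variable names:

```python
from collections import defaultdict


def group_by_label(vertices, edges):
    """Group vertices and edges by label; COUNT per vertex group, MAX(a) per edge group."""
    vertex_count = defaultdict(int)
    for label in vertices.values():
        vertex_count[label] += 1
    # A super edge is identified by (source group, target group, edge label).
    edge_max = {}
    for source, target, label, a in edges:
        key = (vertices[source], vertices[target], label)
        edge_max[key] = max(a, edge_max.get(key, a))
    return dict(vertex_count), edge_max


# vertex id -> label; edges as (source, target, label, a). Hypothetical topology.
vertices = {1: "green", 2: "green", 3: "orange", 4: "orange", 5: "orange"}
edges = [
    (1, 2, "e", 23), (2, 1, "e", 42),
    (1, 3, "e", 84), (2, 4, "e", 12),
    (3, 1, "e", 13), (4, 5, "e", 21),
]

counts, maxima = group_by_label(vertices, edges)
```

Each distinct key combination yields one super vertex or super edge, which is why the result is much smaller than the input graph.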
SAMPLE GRAPH
Vertices:
[0] Tag {name: Databases}
[1] Tag {name: Graphs}
[2] Tag {name: Hadoop}
[3] Forum {title: Graph Databases}
[4] Forum {title: Graph Processing}
[5] Person {name: Alice, gender: f, city: Leipzig, age: 23}
[6] Person {name: Bob, gender: m, city: Leipzig, age: 30}
[7] Person {name: Carol, gender: f, city: Dresden, age: 30}
[8] Person {name: Dave, gender: m, city: Dresden, age: 42}
[9] Person {name: Eve, gender: f, city: Dresden, age: 35, speaks: en}
[10] Person {name: Frank, gender: m, city: Berlin, age: 23, IP: 169.32.1.3}
Edges (ids 0-23): 10 x knows (since: 2013, 2014 or 2015), 4 x hasInterest, 2 x hasModerator, 4 x hasMember, 4 x hasTag
GROUPING: TYPE LEVEL (SCHEMA GRAPH)
vertexGrKeys = [:label]
edgeGrKeys = [:label]
sumGraph = databaseGraph.groupBy(vertexGrKeys, [COUNT()], edgeGrKeys, [COUNT()])
Result:
Super vertices: [11] Person {count: 6}, [12] Forum {count: 2}, [13] Tag {count: 3}
Super edges (ids 24-28): hasMember {count: 4}, knows {count: 10}, hasInterest {count: 4}, hasTag {count: 4}, hasModerator {count: 2}
GROUPING: PROPERTY-SPECIFIC
personGraph = databaseGraph.subgraph(
  (vertex => vertex[:label] == 'Person'),
  (edge => edge[:label] == 'knows'))
vertexGrKeys = [:label, "city"]
edgeGrKeys = [:label]
sumGraph = personGraph.groupBy(vertexGrKeys, [COUNT()], edgeGrKeys, [COUNT()])
Result:
Super vertices: [11] Person {city: Leipzig, count: 2}, [12] Person {city: Dresden, count: 3}, [13] Person {city: Berlin, count: 1}
Super edges (ids 24-28): knows with count: 3, 1, 2, 2, 2
SELECTION
GraphCollection filtered = collection.select((graph => graph['vertexCount'] > 4));
[Figure: collection with graph 1 (vertexCount: 5) and graph 2 (vertexCount: 4); the UDF vertexCount > 4 keeps only graph 1]
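Selection is a filter over a graph collection driven by a predicate on graph properties. A minimal Python sketch with a hypothetical `select` helper and the two graphs from the figure:

```python
def select(collection, predicate):
    """Keep the graphs of a collection for which the predicate holds."""
    return [graph for graph in collection if predicate(graph)]


collection = [
    {"id": 1, "properties": {"vertexCount": 5}},
    {"id": 2, "properties": {"vertexCount": 4}},
]

filtered = select(collection, lambda g: g["properties"]["vertexCount"] > 4)
print([g["id"] for g in filtered])  # [1]
```

The predicate only reads graph-level properties, so a preceding aggregation step has to provide `vertexCount`.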
CALL (E.G. FREQUENT SUBGRAPHS)
GraphCollection frequentPatterns = collection.callForCollection(new TransactionalFSM(0.5))
[Figure: frequent subgraph mining (FSM) with threshold 50% applied to a graph collection; the result is a collection of frequent subgraphs]
Implementation
GRAPH REPRESENTATION
EPGMGraphHead (POJO): Id, Label, Properties; stored as DataSet<EPGMGraphHead>
EPGMVertex (POJO): Id, Label, Properties, Graphs; stored as DataSet<EPGMVertex>
EPGMEdge (POJO): Id, Label, Properties, SourceId, TargetId, Graphs; stored as DataSet<EPGMEdge>
Field types of EPGMVertex:
Id: GradoopId := UUID (128-bit)
Label: String
Properties: PropertyList := List<Property>, Property := (String, PropertyValue), PropertyValue := byte[]
Graphs: GradoopIdSet := Set<GradoopId>
GRAPH REPRESENTATION: EXAMPLE

DataSet<EPGMGraphHead>

| Id | Label | Properties |
|---|---|---|
| 1 | Community | {interest:Heavy Metal} |
| 2 | Community | {interest:Hard Rock} |

DataSet<EPGMVertex>

| Id | Label | Properties | Graphs |
|---|---|---|---|
| 1 | Person | {name:Alice, born:1984} | {1} |
| 2 | Band | {name:Metallica, founded:1981} | {1} |
| 3 | Person | {name:Bob} | {1,2} |
| 4 | Band | {name:AC/DC, founded:1973} | {2} |
| 5 | Person | {name:Eve} | {2} |

DataSet<EPGMEdge>

| Id | Label | Source | Target | Properties | Graphs |
|---|---|---|---|---|---|
| 1 | likes | 1 | 2 | {since:2014} | {1} |
| 2 | likes | 3 | 2 | {since:2013} | {1} |
| 3 | likes | 3 | 4 | {since:2015} | {2} |
| 4 | knows | 3 | 5 | {} | {2} |
| 5 | likes | 5 | 4 | {since:2014} | {2} |
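The three tables can be mirrored with plain Python dataclasses, which makes the role of the Graphs column visible: a logical graph is recovered by filtering on graph membership. This is a sketch, not the Gradoop POJOs; small integers stand in for the 128-bit GradoopId and the class names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class GraphHead:
    id: int
    label: str
    properties: dict


@dataclass
class Vertex:
    id: int
    label: str
    properties: dict
    graphs: set


@dataclass
class Edge:
    id: int
    label: str
    source: int
    target: int
    properties: dict
    graphs: set


heads = [GraphHead(1, "Community", {"interest": "Heavy Metal"}),
         GraphHead(2, "Community", {"interest": "Hard Rock"})]
vertices = [Vertex(1, "Person", {"name": "Alice", "born": 1984}, {1}),
            Vertex(2, "Band", {"name": "Metallica", "founded": 1981}, {1}),
            Vertex(3, "Person", {"name": "Bob"}, {1, 2}),
            Vertex(4, "Band", {"name": "AC/DC", "founded": 1973}, {2}),
            Vertex(5, "Person", {"name": "Eve"}, {2})]
edges = [Edge(1, "likes", 1, 2, {"since": 2014}, {1}),
         Edge(2, "likes", 3, 2, {"since": 2013}, {1}),
         Edge(3, "likes", 3, 4, {"since": 2015}, {2}),
         Edge(4, "knows", 3, 5, {}, {2}),
         Edge(5, "likes", 5, 4, {"since": 2014}, {2})]


def logical_graph(graph_id):
    """Select the vertices and edges whose membership set contains graph_id."""
    return ([v for v in vertices if graph_id in v.graphs],
            [e for e in edges if graph_id in e.graphs])


hard_rock_vertices, hard_rock_edges = logical_graph(2)
```

Bob (vertex 3) belongs to both communities, so overlapping logical graphs share the same vertex record instead of duplicating it.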
OPERATOR IMPLEMENTATION
Exclusion
// input: firstGraph (G[1]), secondGraph (G[2])
// id of the second graph, broadcast to the filters below
DataSet<GradoopId> graphId = secondGraph.getGraphHead()
  .map(new Id<G>());

// vertices of the first graph that are not contained in the second graph
DataSet<V> newVertices = firstGraph.getVertices()
  .filter(new NotInGraphBroadCast<V>())
  .withBroadcastSet(graphId, GRAPH_ID);

// edges not contained in the second graph whose source and target vertices both remain
DataSet<E> newEdges = firstGraph.getEdges()
  .filter(new NotInGraphBroadCast<E>())
  .withBroadcastSet(graphId, GRAPH_ID)
  .join(newVertices)
  .where(new SourceId<E>()).equalTo(new Id<V>())
  .with(new LeftSide<E, V>())
  .join(newVertices)
  .where(new TargetId<E>()).equalTo(new Id<V>())
  .with(new LeftSide<E, V>());
[Figure: the example graph with logical graphs 1 (Community, interest: Heavy Metal) and 2 (Community, interest: Hard Rock)]
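The dataflow above can be traced on the example graph with ordinary Python collections. This sketch replaces the broadcast filter with a membership test and the two joins with lookups in a set of remaining vertex ids; all names are hypothetical and nothing here is Flink code:

```python
def exclude(first_vertices, first_edges, second_graph_id):
    """Exclusion of the graph with id second_graph_id from the first graph."""
    # Filter: vertices of the first graph that are not members of the second graph.
    new_vertices = [v for v in first_vertices if second_graph_id not in v["graphs"]]
    remaining_ids = {v["id"] for v in new_vertices}
    # Filter plus two semi-joins: drop edges of the second graph, then keep
    # only edges whose source and target are both among the remaining vertices.
    new_edges = [
        e for e in first_edges
        if second_graph_id not in e["graphs"]
        and e["source"] in remaining_ids
        and e["target"] in remaining_ids
    ]
    return new_vertices, new_edges


# Graph 1 of the example: Alice (1), Metallica (2), Bob (3, also in graph 2).
g1_vertices = [
    {"id": 1, "graphs": {1}},
    {"id": 2, "graphs": {1}},
    {"id": 3, "graphs": {1, 2}},
]
g1_edges = [
    {"id": 1, "source": 1, "target": 2, "graphs": {1}},
    {"id": 2, "source": 3, "target": 2, "graphs": {1}},
]

new_vertices, new_edges = exclude(g1_vertices, g1_edges, second_graph_id=2)
```

Edge 2 belongs only to graph 1 and passes the filter, but it is removed by the join on the source id because Bob is no longer part of the result.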
IMPLEMENTATION OF GRAPH GROUPING (PROC. BTW 2017)
Vertices (V):
1. Map: extract attributes
2. GroupBy(1) + GroupReduce*: assign vertices to groups, compute aggregates, create super vertex tuples, forward updated group members
3. Filter + Map: extract super vertex tuples, build super vertices
4. Filter + Map: extract group members, reduce memory footprint
Edges (E):
1. Map: extract attributes
2. Join*: replace Source/TargetId with corresponding super vertex id
3. GroupBy(1,2,3) + GC + GR* + Map: assign edges to groups, compute aggregates, build super edges
Intermediate and result datasets in the figure: V1, V2, V3, V', E1, E2, E'
* requires worker communication
ITERATIVE COMPUTATION OF FREQUENT SUBGRAPHS
[Figure: iterations over the search space from 1-edge, 2-edge and 3-edge up to n-edge patterns; intermediate iteration results are collected into the result]
1-edge: R C F
2-edge: G R C F
3-edge: G R C F
n-edge: G R C F
G: grow frequent patterns
R: report supported patterns
C: count global frequency
F: filter by min frequency
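The first iteration (R, C, F for 1-edge patterns) is simple enough to sketch in Python. A pattern is taken to be a (source label, edge label, target label) triple; the growth step G for larger patterns is omitted, and the function name and sample collection are hypothetical:

```python
from collections import Counter


def frequent_single_edge_patterns(collection, min_support):
    """First FSM iteration: report, count and filter 1-edge patterns."""
    counts = Counter()
    for graph in collection:
        labels = graph["labels"]
        # R: a graph reports each distinct pattern it supports exactly once.
        reported = {(labels[s], edge_label, labels[t])
                    for s, t, edge_label in graph["edges"]}
        # C: count the number of graphs supporting each pattern.
        counts.update(reported)
    # F: keep patterns whose support reaches the minimum frequency.
    threshold = min_support * len(collection)
    return {pattern: n for pattern, n in counts.items() if n >= threshold}


collection = [
    {"labels": {1: "A", 2: "B"}, "edges": [(1, 2, "x")]},
    {"labels": {1: "A", 2: "B", 3: "C"}, "edges": [(1, 2, "x"), (2, 3, "y")]},
    {"labels": {1: "A", 2: "C"}, "edges": [(1, 2, "y")]},
    {"labels": {1: "B", 2: "C"}, "edges": [(1, 2, "y"), (1, 2, "y")]},
]

frequent = frequent_single_edge_patterns(collection, min_support=0.5)
```

With four graphs and a 50% threshold a pattern needs support in at least two graphs; the duplicate edge in the last graph counts only once.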
TEST WORKFLOW: SUMMARIZED COMMUNITIES
http://ldbcouncil.org/
1. Extract subgraph containing only Persons and knows relations
2. Transform Persons to necessary information
3. Find communities using Label Propagation
4. Aggregate vertex count for each community
5. Select communities with more than 50K users
6. Combine large communities to a single graph
7. Group graph by Persons location and gender
8. Aggregate vertex and edge count of grouped graph
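Step 3 can be illustrated with a simplified, synchronous label propagation in Python. This variant is an assumption for illustration: every vertex starts with its own id as label, adopts the most frequent label among its neighbours, and breaks ties by the smallest label. It is not the Gelly implementation used in the benchmark:

```python
from collections import Counter


def label_propagation(vertices, edges, max_iterations=20):
    """Simplified synchronous label propagation on an undirected view of the edges."""
    neighbours = {v: set() for v in vertices}
    for s, t in edges:
        neighbours[s].add(t)
        neighbours[t].add(s)
    labels = {v: v for v in vertices}
    for _ in range(max_iterations):
        new_labels = {}
        for v in vertices:
            counts = Counter(labels[n] for n in neighbours[v])
            if not counts:
                new_labels[v] = labels[v]  # isolated vertex keeps its label
                continue
            best = max(counts.values())
            new_labels[v] = min(l for l, c in counts.items() if c == best)
        if new_labels == labels:
            break  # converged
        labels = new_labels
    return labels


# Two triangles of Persons connected by a single knows edge (3, 4).
persons = [1, 2, 3, 4, 5, 6]
knows = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]

labels = label_propagation(persons, knows)
community_sizes = Counter(labels.values())  # input for steps 4 and 5
```

On this toy graph the two triangles end up as two communities of three vertices each; counting labels then gives the per-community vertex count used by the following aggregation and selection steps.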
https://git.io/vgozj
BENCHMARK RESULTS
| Dataset | # Vertices | # Edges |
|---|---|---|
| Graphalytics.1 | 61,613 | 2,026,082 |
| Graphalytics.10 | 260,613 | 16,600,778 |
| Graphalytics.100 | 1,695,613 | 147,437,275 |
| Graphalytics.1000 | 12,775,613 | 1,363,747,260 |
| Graphalytics.10000 | 90,025,613 | 10,872,109,028 |

Setup: 16x Intel(R) Xeon(R) 2.50GHz (6 Cores), 16x 48 GB RAM, 1 Gigabit Ethernet, Hadoop 2.6.0, Flink 1.0-SNAPSHOT
Figure: runtime [s] vs. number of workers (1, 2, 4, 8, 16) for Graphalytics.100
Figure: speedup vs. number of workers (1, 2, 4, 8, 16) for Graphalytics.100, compared with linear speedup
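For reading the speedup plot, speedup on n workers is the runtime on the baseline configuration divided by the runtime on n workers, and linear speedup means this ratio equals n. A minimal sketch of that arithmetic follows; the function name `speedup_table` and the runtimes in the usage example are hypothetical and are not the measured values from the slide.

```python
def speedup_table(runtimes):
    """Return (workers, speedup, efficiency) rows from {workers: runtime in s}.

    The smallest worker count serves as the baseline. Efficiency is the
    speedup divided by the ideal (linear) speedup, so 1.0 means linear scaling.
    """
    base_workers = min(runtimes)
    base_runtime = runtimes[base_workers]
    rows = []
    for workers in sorted(runtimes):
        speedup = base_runtime / runtimes[workers]
        efficiency = speedup / (workers / base_workers)
        rows.append((workers, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Hypothetical runtimes in seconds, NOT taken from the benchmark plot.
    example = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 160.0, 16: 100.0}
    for workers, speedup, efficiency in speedup_table(example):
        print(f"{workers:>2} workers: speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")
```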
55
![Page 56: GRAPH-BASED DATA INTEGRATION AND ANALYSIS FOR BIG DATA … · GRAPH-BASED DATA INTEGRATION AND ANALYSIS FOR BIG DATA ERHARD RAHM Big is changing quickly Gigabytes Terabytes (1012)](https://reader031.fdocuments.net/reader031/viewer/2022021914/5c6df3ea09d3f201028c5e11/html5/thumbnails/56.jpg)
BENCHMARK RESULTS 2
Figure: runtime [s] per dataset (tick labels 1, 10, 100, 1000, 10000)
56
![Page 57: GRAPH-BASED DATA INTEGRATION AND ANALYSIS FOR BIG DATA … · GRAPH-BASED DATA INTEGRATION AND ANALYSIS FOR BIG DATA ERHARD RAHM Big is changing quickly Gigabytes Terabytes (1012)](https://reader031.fdocuments.net/reader031/viewer/2022021914/5c6df3ea09d3f201028c5e11/html5/thumbnails/57.jpg)
EVALUATION OF GROUPING: SCALABILITY
Figure: speedup for grouping on type
Figure: runtime for grouping on type
57
![Page 58: GRAPH-BASED DATA INTEGRATION AND ANALYSIS FOR BIG DATA … · GRAPH-BASED DATA INTEGRATION AND ANALYSIS FOR BIG DATA ERHARD RAHM Big is changing quickly Gigabytes Terabytes (1012)](https://reader031.fdocuments.net/reader031/viewer/2022021914/5c6df3ea09d3f201028c5e11/html5/thumbnails/58.jpg)
58
![Page 59: GRAPH-BASED DATA INTEGRATION AND ANALYSIS FOR BIG DATA … · GRAPH-BASED DATA INTEGRATION AND ANALYSIS FOR BIG DATA ERHARD RAHM Big is changing quickly Gigabytes Terabytes (1012)](https://reader031.fdocuments.net/reader031/viewer/2022021914/5c6df3ea09d3f201028c5e11/html5/thumbnails/59.jpg)
SUMMARY
Big Graph Analytics
  Hadoop-based graph processing frameworks based on generic graphs
  Spark/Flink: batch/streaming-oriented workflows (rather than interactive OLAP)
  graph collections not generally supported
  generally missing: graph-based data integration, built-in support for dynamic graph data
Gradoop (www.gradoop.org)
  open-source infrastructure for the entire processing pipeline: graph acquisition, storage, integration, transformation, analysis (queries + graph mining), visualization
  extended property graph model (EPGM) with powerful operators (e.g., grouping, pattern matching) and support for graph collections
  leverages the Hadoop ecosystem
    Apache HBase for permanent graph storage
    Apache Flink to implement operators
  ongoing implementation
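To make the idea of an extended property graph model with graph collections more concrete, here is a minimal data-structure sketch. It assumes that vertices, edges, and logical graphs all carry a label and properties, and that a logical graph refers to its members by id. All class and function names are hypothetical and do not reflect Gradoop's actual Java API.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Anything with a label and key-value properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge(Element):
    source: int = 0
    target: int = 0


@dataclass
class LogicalGraph(Element):
    """A graph is itself an element and lists its member ids."""
    vertex_ids: set = field(default_factory=set)
    edge_ids: set = field(default_factory=set)


def combine(a, b, label="Combined"):
    """Binary operator sketch: union of the vertex and edge sets of two graphs."""
    return LogicalGraph(label, {}, a.vertex_ids | b.vertex_ids, a.edge_ids | b.edge_ids)


# Shared vertex and edge space (toy data).
vertices = {1: Element("Person", {"name": "Alice"}),
            2: Element("Person", {"name": "Bob"}),
            3: Element("Person", {"name": "Carol"})}
edges = {10: Edge("knows", {}, 1, 2), 11: Edge("knows", {}, 2, 3)}

# Two overlapping logical graphs form a graph collection.
g1 = LogicalGraph("Community", {"size": 2}, {1, 2}, {10})
g2 = LogicalGraph("Community", {"size": 2}, {2, 3}, {11})
collection = [g1, g2]

combined = combine(g1, g2)
print(sorted(combined.vertex_ids), sorted(combined.edge_ids))
```

Keeping vertices and edges in one shared space lets several logical graphs overlap, as vertex 2 does here, without duplicating data.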
59
![Page 60: GRAPH-BASED DATA INTEGRATION AND ANALYSIS FOR BIG DATA … · GRAPH-BASED DATA INTEGRATION AND ANALYSIS FOR BIG DATA ERHARD RAHM Big is changing quickly Gigabytes Terabytes (1012)](https://reader031.fdocuments.net/reader031/viewer/2022021914/5c6df3ea09d3f201028c5e11/html5/thumbnails/60.jpg)
COMPARISON
60
| | Graph Database Systems (Neo4j, OrientDB) | Graph Processing Systems (Pregel, Giraph) | Distributed Dataflow Systems (Flink Gelly, Spark GraphX) | Gradoop |
|---|---|---|---|---|
| data model | rich graph models (PGM) | generic graph models | generic graph models | Extended PGM |
| focus | queries | analytic | analytic | analytic |
| query language | yes | no | no | (yes) |
| graph analytics | no | yes | yes | yes |
| scalability | vertical | horizontal | horizontal | horizontal |
| workflows | no | no | yes | yes |
| persistency | yes | no | no | yes |
| dynamic graphs / versioning | no | no | no | no |
| data integration | no | no | no | (yes) |
| visualization | (yes) | no | no | limited |
![Page 61: GRAPH-BASED DATA INTEGRATION AND ANALYSIS FOR BIG DATA … · GRAPH-BASED DATA INTEGRATION AND ANALYSIS FOR BIG DATA ERHARD RAHM Big is changing quickly Gigabytes Terabytes (1012)](https://reader031.fdocuments.net/reader031/viewer/2022021914/5c6df3ea09d3f201028c5e11/html5/thumbnails/61.jpg)
OUTLOOK / CHALLENGES
Graph-based data integration
  refined operators for data import, matching and fusion
  holistic data integration for many sources (clusters of matching entities instead of binary "sameAs" links)
Graph analytics
  optimized graph partitioning approaches
  automatic load balancing techniques
  visualization of graphs and analysis results
  interactive graph analytics
  dynamic graph data
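The contrast between binary "sameAs" links and clusters of matching entities can be illustrated with a small sketch. It takes the simplest possible route, treating the links as transitive and merging them with union-find. This is an assumption made for the example and not the approach proposed in the talk; holistic clustering methods are typically more refined. The entity identifiers are hypothetical.

```python
def cluster_entities(entities, same_as_links):
    """Group entities into clusters by following sameAs links transitively."""
    parent = {e: e for e in entities}

    def find(e):
        # Walk to the cluster representative, shortening the path on the way.
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    for a, b in same_as_links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for e in entities:
        clusters.setdefault(find(e), set()).add(e)
    # Sort clusters by their smallest member for a stable output order.
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Hypothetical entities from three sources and pairwise match results.
entities = ["srcA:Leipzig", "srcB:Leipzig", "srcC:Leipzig",
            "srcA:Dresden", "srcB:Dresden", "srcC:Berlin"]
links = [("srcA:Leipzig", "srcB:Leipzig"),
         ("srcB:Leipzig", "srcC:Leipzig"),
         ("srcA:Dresden", "srcB:Dresden")]

for cluster in cluster_entities(entities, links):
    print(sorted(cluster))
```

With k sources, a cluster of k matching entities stands in for up to k(k-1)/2 pairwise links, which is one reason the cluster view suits integration across many sources.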
61
![Page 62: GRAPH-BASED DATA INTEGRATION AND ANALYSIS FOR BIG DATA … · GRAPH-BASED DATA INTEGRATION AND ANALYSIS FOR BIG DATA ERHARD RAHM Big is changing quickly Gigabytes Terabytes (1012)](https://reader031.fdocuments.net/reader031/viewer/2022021914/5c6df3ea09d3f201028c5e11/html5/thumbnails/62.jpg)
62