URL: MESH.CS.UMN.EDU e3e1 Printingmesh.cs.umn.edu/posters/mesh-OpenHouse_Nov15.pdf ·...

1
MESH : Enabling Scalable Social Group Analytics Via Hyper Graph Analysis Systems MOTIVATION Leveraging existing partitioning methods: Vertex cut Hyperedge cut Edge cut – 1D/2D Partitioning for load balancing. Sources of imbalance in hypergraph: Dataset – Ratio of vertices to hyperedges Skewed degree distribution –vertex or hyperedge or both Computation – vertex Vs hyperedge program Interaction – between vertices and hyperedges MESH : PROTOTYPE CHALLENGES: PARTITIONING CHALLENGES: REPRESENTATION Rapid growth of social data: Likes. Tweets, Publications Need to transform data into knowledge Importance / centrality / influence Community detection, Shortest paths Information flow State of the art: Graph Computational Systems Pregel, GraphLab (Dato), Apache Spark GraphX Students: Benjamin Heintz, Janani Thirunavukkarasu, Jayapriya Surendran, Shivangi Singh PIs: Abhishek Chandra, Jaideep Srivastava (University Of Minnesota, Twin Cities) URL: MESH.CS.UMN.EDU Sponsor : National Science Foundation To better model social group structure and behavior, we need hypergraph computing systems. v1 v2 v3 hyperedge vertex v1 v2 v3 v1 v2 v3 v1 v2 v3 e1 e2 e3 e1 e2 e3 v2 v1 v3 v2 v2 Multigraph Limited system support Hard to implement with existing APIs Can exploit differences between vertices, hyperedges Bipartite graph Obscures differences between hyperedges, vertices Portable to any graph system Dataset Vertices Hyperedges Bipartite Edges 1-mode Projection Edges DBLP 952,115 authors 916,947 collaborations 2,768,930 21,592,883 Friendster 7,944,949 users 1,620,991 communities 23,479,217 > 15.1 B Proof-of-concept prototype Implemented on Apache Spark GraphX 1.2.1 Run on shared 6-node cluster (2x6-core, 24GB RAM each) Using bipartite graph representation Performance heavily affected by Dataset Characteristics + Algorithm Characteristics Representation + Partitioning 0 300 600 900 0 0.5 1 1.5 2 2.5 3 Execution Time (s) Number of Bipartite Edges (millions) Hypergraph PageRank: DBLP PR PR-Entropy 0 1000 2000 3000 0 5 10 15 20 Execution Time (s) Number of Bipartite Edges (millions) Hypergraph PageRank: Friendster PR PR-Entropy 0 100 200 300 400 500 600 700 PR PR-Entropy Execution Time (s) Partitioning: DBLP Cut Vertices Cut Hyperedges Cut Both Scalability is a real challenge. Iterative computation Vertex and hyperedge programs Message flow: vertex hyperedge hyperedge vertex RESULTS 0 500 1000 1500 2000 2500 3000 3500 4000 4500 PR CC SP Execution Time (s) Algorithms Algorithm Comparison Friendster DBLP

Transcript of URL: MESH.CS.UMN.EDU e3e1 Printingmesh.cs.umn.edu/posters/mesh-OpenHouse_Nov15.pdf ·...

Page 1: URL: MESH.CS.UMN.EDU e3e1 Printingmesh.cs.umn.edu/posters/mesh-OpenHouse_Nov15.pdf · •Implemented on Apache Spark GraphX 1.2.1 •Run on shared 6-node cluster (2x6-core, 24GB RAM

Printing: This poster is 48” wide by 36” high. It’s designed to be printed on a large-format printer.

Customizing the Content: The placeholders in this poster are formatted for you. Type in the placeholders to add text, or click an icon to add a table, chart, SmartArt graphic, picture or multimedia file.

To add or remove bullet points from text, just click the Bullets button on the Home tab.

If you need more placeholders for titles, content or body text, just make a copy of what you need and drag it into place. PowerPoint’s Smart Guides will help you align it with everything else.

Want to use your own pictures instead of ours? No problem! Just right-click a picture and choose Change Picture. Maintain the proportion of pictures as you resize by dragging a corner.

MESH : Enabling Scalable Social Group Analytics Via Hyper Graph Analysis Systems

MOTIVATION

Leveraging existing partitioning methods:

Vertex cut Hyperedge cut Edge cut – 1D/2D

Partitioning for load balancing.

Sources of imbalance in hypergraph:

• Dataset – Ratio of vertices to hyperedges

• Skewed degree distribution –vertex or hyperedge or both

• Computation – vertex Vs hyperedge program

• Interaction – between vertices and hyperedges

MESH : PROTOTYPE CHALLENGES: PARTITIONING

CHALLENGES: REPRESENTATION

Rapid growth of social data: Likes. Tweets, Publications

Need to transform data into knowledge

• Importance / centrality / influence

• Community detection, Shortest paths

• Information flow

State of the art: Graph Computational Systems

• Pregel, GraphLab (Dato), Apache Spark GraphX

Students: Benjamin Heintz, Janani Thirunavukkarasu, Jayapriya Surendran, Shivangi Singh PIs: Abhishek Chandra, Jaideep Srivastava (University Of Minnesota, Twin Cities)

URL: MESH.CS.UMN.EDU Sponsor : National Science Foundation

To better model social group structure and behavior, we need hypergraph computing systems.

v1 v2

v3

hyperedge vertex

v1 v2

v3

v1 v2

v3

v1

v2

v3

e1

e2

e3

e1

e2

e3

v2v1

v3v2

v2

Multigraph Limited system support Hard to implement with existing APIs Can exploit differences between vertices, hyperedges

Bipartite graph Obscures differences between hyperedges, vertices Portable to any graph system

Dataset Vertices Hyperedges Bipartite

Edges 1-mode

Projection Edges

DBLP 952,115 authors

916,947 collaborations

2,768,930 21,592,883

Friendster 7,944,949

users 1,620,991

communities 23,479,217 > 15.1 B

Proof-of-concept prototype • Implemented on Apache Spark GraphX 1.2.1 • Run on shared 6-node cluster (2x6-core, 24GB RAM each) • Using bipartite graph representation

Performance heavily affected by • Dataset Characteristics + Algorithm Characteristics • Representation + Partitioning

0

300

600

900

0 0.5 1 1.5 2 2.5 3

Exe

cuti

on

Tim

e (

s)

Number of Bipartite Edges (millions)

Hypergraph PageRank: DBLP

PR PR-Entropy

0

1000

2000

3000

0 5 10 15 20

Exe

cuti

on

Tim

e (

s)

Number of Bipartite Edges (millions)

Hypergraph PageRank: Friendster

PR PR-Entropy

0

100

200

300

400

500

600

700

PR PR-Entropy

Exe

cuti

on

Tim

e (

s)

Partitioning: DBLP

Cut Vertices Cut Hyperedges Cut Both

Scalability is a real challenge.

Iterative computation

Vertex and hyperedge programs

Message flow: • vertex hyperedge • hyperedge vertex

RESULTS

0

500

1000

1500

2000

2500

3000

3500

4000

4500

PR CC SP

Exe

cuti

on

Tim

e (

s)

Algorithms

Algorithm Comparison

Friendster DBLP