URL: MESH.CS.UMN.EDU e3e1 Printingmesh.cs.umn.edu/posters/mesh-OpenHouse_Nov15.pdf ·...
Transcript of URL: MESH.CS.UMN.EDU e3e1 Printingmesh.cs.umn.edu/posters/mesh-OpenHouse_Nov15.pdf ·...
Printing: This poster is 48” wide by 36” high. It’s designed to be printed on a large-format printer.
Customizing the Content: The placeholders in this poster are formatted for you. Type in the placeholders to add text, or click an icon to add a table, chart, SmartArt graphic, picture or multimedia file.
To add or remove bullet points from text, just click the Bullets button on the Home tab.
If you need more placeholders for titles, content or body text, just make a copy of what you need and drag it into place. PowerPoint’s Smart Guides will help you align it with everything else.
Want to use your own pictures instead of ours? No problem! Just right-click a picture and choose Change Picture. Maintain the proportion of pictures as you resize by dragging a corner.
MESH : Enabling Scalable Social Group Analytics Via Hyper Graph Analysis Systems
MOTIVATION
Leveraging existing partitioning methods:
Vertex cut Hyperedge cut Edge cut – 1D/2D
Partitioning for load balancing.
Sources of imbalance in hypergraph:
• Dataset – Ratio of vertices to hyperedges
• Skewed degree distribution –vertex or hyperedge or both
• Computation – vertex Vs hyperedge program
• Interaction – between vertices and hyperedges
MESH : PROTOTYPE CHALLENGES: PARTITIONING
CHALLENGES: REPRESENTATION
Rapid growth of social data: Likes. Tweets, Publications
Need to transform data into knowledge
• Importance / centrality / influence
• Community detection, Shortest paths
• Information flow
State of the art: Graph Computational Systems
• Pregel, GraphLab (Dato), Apache Spark GraphX
Students: Benjamin Heintz, Janani Thirunavukkarasu, Jayapriya Surendran, Shivangi Singh PIs: Abhishek Chandra, Jaideep Srivastava (University Of Minnesota, Twin Cities)
URL: MESH.CS.UMN.EDU Sponsor : National Science Foundation
To better model social group structure and behavior, we need hypergraph computing systems.
v1 v2
v3
hyperedge vertex
v1 v2
v3
v1 v2
v3
v1
v2
v3
e1
e2
e3
e1
e2
e3
v2v1
v3v2
v2
Multigraph Limited system support Hard to implement with existing APIs Can exploit differences between vertices, hyperedges
Bipartite graph Obscures differences between hyperedges, vertices Portable to any graph system
Dataset Vertices Hyperedges Bipartite
Edges 1-mode
Projection Edges
DBLP 952,115 authors
916,947 collaborations
2,768,930 21,592,883
Friendster 7,944,949
users 1,620,991
communities 23,479,217 > 15.1 B
Proof-of-concept prototype • Implemented on Apache Spark GraphX 1.2.1 • Run on shared 6-node cluster (2x6-core, 24GB RAM each) • Using bipartite graph representation
Performance heavily affected by • Dataset Characteristics + Algorithm Characteristics • Representation + Partitioning
0
300
600
900
0 0.5 1 1.5 2 2.5 3
Exe
cuti
on
Tim
e (
s)
Number of Bipartite Edges (millions)
Hypergraph PageRank: DBLP
PR PR-Entropy
0
1000
2000
3000
0 5 10 15 20
Exe
cuti
on
Tim
e (
s)
Number of Bipartite Edges (millions)
Hypergraph PageRank: Friendster
PR PR-Entropy
0
100
200
300
400
500
600
700
PR PR-Entropy
Exe
cuti
on
Tim
e (
s)
Partitioning: DBLP
Cut Vertices Cut Hyperedges Cut Both
Scalability is a real challenge.
Iterative computation
Vertex and hyperedge programs
Message flow: • vertex hyperedge • hyperedge vertex
RESULTS
0
500
1000
1500
2000
2500
3000
3500
4000
4500
PR CC SP
Exe
cuti
on
Tim
e (
s)
Algorithms
Algorithm Comparison
Friendster DBLP