Gephi, Graphx, and Giraph


Page 1: Gephi, Graphx, and Giraph

Graph Theory at work

[email protected]

Page 2: Gephi, Graphx, and Giraph

• @dougneedham

• Data Guy - Started as a DBA in the Marine Corps, evolved to Architect, now aspiring Data Scientist.

• Oracle, SQL Server, Cassandra, Hadoop, MySQL.

• I have a strong relational/traditional background.

• Perpetual Student

• Learning new things challenges our assumptions. Forces us to take a new perspective on “old” problems. Eventually maybe even shows us that there is a better way to solve a problem.

Page 3: Gephi, Graphx, and Giraph

• Stand back, we are going to talk about math!

• Basically we are talking about a bunch of dots joined together by lines.

• Vertex – A dot on a graph

• Edge – A line connecting two vertices

• Triangle – 3 vertices, 3 edges

• Square – 4 vertices, 4 edges

• Open Triangle – 3 vertices, 2 edges

• A lot of things are networks if you look at them the right way.

• Mark Newman has done a number of really cool presentations, available on YouTube, about network analysis.

• https://www.youtube.com/watch?v=lETt7IcDWLI
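
The vocabulary above maps directly onto a small data structure. What follows is a minimal Python sketch, not anything from the talk itself: the toy edge list and the helper names `build_adjacency` and `count_triangles` are made up for illustration. It stores a graph as a dictionary of neighbor sets and counts closed triangles (3 vertices, 3 edges) and open triangles (3 vertices, 2 edges).

```python
from itertools import combinations

# Hypothetical toy graph: each pair is an edge (a line) between two vertices (dots).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]

def build_adjacency(edge_list):
    """Map each vertex to the set of vertices it shares an edge with."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def count_triangles(adjacency):
    """Return (closed, open) triangle counts for an undirected simple graph."""
    closed = 0
    open_count = 0
    for center, neighbors in adjacency.items():
        # Look at every pair of neighbors of the same center vertex.
        for u, v in combinations(sorted(neighbors), 2):
            if v in adjacency[u]:
                closed += 1      # u-v edge exists: a closed triangle
            else:
                open_count += 1  # u-v edge missing: an open triangle at this center
    # Each closed triangle was seen once from each of its three corners.
    return closed // 3, open_count

adjacency = build_adjacency(edges)
closed_triangles, open_triangles = count_triangles(adjacency)
print(closed_triangles, open_triangles)
```

Here a-b-c is the one closed triangle, while a-c-d and b-c-d are open, because d is connected only to c.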

Page 4: Gephi, Graphx, and Giraph
Page 5: Gephi, Graphx, and Giraph

• The 7 Bridges of Königsberg

• Every tome on graph theory or network analysis devotes a small portion of its pages to the 7 Bridges of Königsberg.

• If I don’t cover this with you, the gods of mathematics will strike me down, and never allow me to do analysis again in the future.

Page 6: Gephi, Graphx, and Giraph
Page 7: Gephi, Graphx, and Giraph

• Folks enjoyed their Sunday afternoon strolls across the bridges, but occasionally people would wonder if one particular route was more efficient than another, and in particular whether a single walk could cross every bridge exactly once.

• Eventually Leonhard Euler was brought into the debate.

• Euler used vertices to represent the land masses and edges (or arcs, at the time) to represent bridges. He realized that every land mass touched an odd number of bridges, and that a walk crossing each bridge exactly once can have at most two such odd vertices (its start and its end), so the problem was unsolvable.

• And here is the cool thing about mathematicians. If we tell you something is impossible, we have to tell you why in a way you can understand it. Along the way he also invented the branch of mathematics we today call graph theory.

• http://en.wikipedia.org/wiki/Leonhard_Euler
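
Euler's argument can be checked in a few lines. This is a minimal sketch, assuming the usual four-land-mass layout of the puzzle. The labels `N`, `S`, `K`, and `E` and the function names are illustrative, and the check assumes the graph is connected.

```python
from collections import Counter

# The seven bridges as a multigraph (parallel edges allowed).
# Illustrative labels: N = north bank, S = south bank,
# K = central island, E = eastern island.
bridges = [
    ("N", "K"), ("N", "K"),
    ("S", "K"), ("S", "K"),
    ("N", "E"), ("S", "E"),
    ("K", "E"),
]

def degrees(edge_list):
    """Count how many bridge ends touch each land mass."""
    degree = Counter()
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    return degree

def has_euler_walk(edge_list):
    """True if a connected multigraph has a walk using every edge exactly once.

    Such a walk exists only when zero or two vertices have odd degree.
    Connectivity is assumed here, not checked.
    """
    odd = [v for v, d in degrees(edge_list).items() if d % 2 == 1]
    return len(odd) in (0, 2)

print(dict(degrees(bridges)))
print(has_euler_walk(bridges))
```

All four land masses come out with odd degree, which is why no such stroll exists. Dropping any single bridge leaves exactly two odd vertices, and the walk becomes possible.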

Page 8: Gephi, Graphx, and Giraph

• http://gephi.github.io/

• From the website: “Gephi is an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs.”

• To get this yourself, go into Facebook and search for: Netvizz. (You have to authorize it. You can un-authorize it later.)

• Click the application.

• Click “personal network”

• Click Start

• Download your GDF file

• Quick Demo:
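
If you want to look inside a GDF file before opening it in Gephi, the format is plain text: a `nodedef>` header followed by node rows, then an `edgedef>` header followed by edge rows. The sketch below is a minimal reader under that assumption. The sample text and its column names are made up for illustration and are not actual Netvizz output, and quoting rules in real exports may differ from what the `csv` module expects.

```python
import csv

# Hypothetical sample; a real export will have more columns.
sample_gdf = """nodedef>name VARCHAR,label VARCHAR
1,Alice
2,Bob
3,Carol
edgedef>node1 VARCHAR,node2 VARCHAR
1,2
2,3
"""

def parse_gdf(text):
    """Split GDF text into a list of node records and a list of edge records."""
    nodes, edges = [], []
    section, columns = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("nodedef>") or line.startswith("edgedef>"):
            # Header line: remember the section and keep only the column names.
            section, header = line.split(">", 1)
            columns = [field.split()[0] for field in header.split(",")]
            continue
        row = next(csv.reader([line]))
        record = dict(zip(columns, row))
        if section == "nodedef":
            nodes.append(record)
        elif section == "edgedef":
            edges.append(record)
    return nodes, edges

nodes, edges = parse_gdf(sample_gdf)
print(len(nodes), "nodes,", len(edges), "edges")
```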

Page 9: Gephi, Graphx, and Giraph

• Shortest path – How are two vertices connected?

• What is a path?

• Centrality

• Transitivity

• Homophily

• Directed Graphs – or Digraphs

• Contagion – How do things “spread” through a network?

• Let’s rearrange things: how does the layout affect understanding?

• This is not just data visualization, it can also be used for prediction. https://www.youtube.com/watch?v=rwA-y-XwjuU
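
Two of the ideas in this list, shortest path and centrality, are easy to sketch by hand. The Python below is illustrative only: the friendship list and function names are hypothetical, the path search is a plain breadth-first search on an unweighted graph, and the centrality shown is simple degree centrality, which may not be the measure used in the demo.

```python
from collections import deque

# Hypothetical friendship network.
friendships = [
    ("ann", "bob"), ("bob", "cat"), ("cat", "dan"),
    ("bob", "dan"), ("dan", "eve"),
]

def build_adjacency(edge_list):
    """Map each person to the set of people they are directly connected to."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def shortest_path(adjacency, start, goal):
    """Breadth-first search: return one path with the fewest edges, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the 'previous' links back to the start.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in sorted(adjacency[current]):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None

def degree_centrality(adjacency):
    """Fraction of the other vertices each vertex is directly connected to."""
    others = len(adjacency) - 1
    return {v: len(neighbors) / others for v, neighbors in adjacency.items()}

adjacency = build_adjacency(friendships)
print(shortest_path(adjacency, "ann", "eve"))
print(degree_centrality(adjacency))
```

In this toy network bob and dan tie for the highest degree centrality, and the shortest route from ann to eve runs through both of them.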

Page 10: Gephi, Graphx, and Giraph

• GraphX requires Spark, which is not a bad deal.

• Jump to Demo

• http://ampcamp.berkeley.edu/big-data-mini-course/graph-analytics-with-graphx.html

Page 11: Gephi, Graphx, and Giraph

• I haven’t really done as much with Giraph as I wanted to. Perhaps a later presentation will have a more detailed example comparing GraphX with Giraph.

Page 12: Gephi, Graphx, and Giraph

• I started doing some analysis some time ago using Graph models to understand metadata.

• I came up with two types of Graphs:

• Data Structure Graph Level 1 – This is roughly like an Entity Relationship Diagram (ERD): tables are vertices, foreign keys are edges.

• Data Structure Graph Level 2 – Each vertex in this graph is an application. Each edge is a data transfer. Roughly equivalent to what we used to call Data Flow diagrams.
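
As a rough illustration of a Data Structure Graph Level 1, the sketch below turns a list of foreign keys into a graph of tables. The schema, the table names, and the `build_dsg_level1` helper are all hypothetical. In practice the foreign key list would come from your database's catalog, which is not shown here.

```python
from collections import defaultdict

# Hypothetical schema: one (child_table, parent_table) pair per foreign key.
foreign_keys = [
    ("orders", "customers"),
    ("order_items", "orders"),
    ("order_items", "products"),
    ("payments", "orders"),
]

def build_dsg_level1(fks):
    """Tables become vertices; each foreign key becomes an undirected edge."""
    graph = defaultdict(set)
    for child, parent in fks:
        graph[child].add(parent)
        graph[parent].add(child)
    return dict(graph)

dsg1 = build_dsg_level1(foreign_keys)
for table in sorted(dsg1):
    print(table, "->", sorted(dsg1[table]))
```

A Level 2 graph could be built the same way, with applications in place of tables and data transfers in place of foreign keys.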

Page 13: Gephi, Graphx, and Giraph

• A DSG Level 1 can show you which of your tables are likely to have the most interesting query performance.

• A DSG Level 2 can show you where the most work is going on in your enterprise.
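
One simple way to read both claims is through vertex degree: the table with the most foreign key edges, or the application with the most data transfers, is a natural first place to look. That reading is an assumption on my part and only one of several possible measures. The edge lists and the `rank_by_degree` helper below are hypothetical.

```python
from collections import Counter

# Hypothetical Level 1 edges: (child_table, parent_table) per foreign key.
foreign_keys = [
    ("orders", "customers"),
    ("order_items", "orders"),
    ("order_items", "products"),
    ("payments", "orders"),
]

# Hypothetical Level 2 edges: (source_application, target_application) per transfer.
transfers = [
    ("crm", "warehouse"),
    ("billing", "warehouse"),
    ("warehouse", "reporting"),
    ("crm", "billing"),
]

def rank_by_degree(edge_list):
    """Return (vertex, degree) pairs, highest degree first, ties alphabetical."""
    degree = Counter()
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))

busiest_tables = rank_by_degree(foreign_keys)
busiest_apps = rank_by_degree(transfers)
print(busiest_tables[0], busiest_apps[0])
```

In this made-up example `orders` is the most connected table and `warehouse` is the application touching the most transfers.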

Page 14: Gephi, Graphx, and Giraph

• Network/Graph analysis is cool.

• It can show you some interesting things about your data.

• Some things to consider:

• Some thought needs to be put into how the raw data is organized for a graph analysis.

• Directed graph, undirected, bigraph? Some up-front setup work needs to be done.

• Tools help with the detailed calculations, and show the paths, walks, etc.

• However, due thought should be put towards a network analysis project.
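
To see why the directed versus undirected decision matters up front, here is a small sketch that loads the same raw rows both ways. The rows and the `load` helper are hypothetical. The point is only that the same data gives different neighborhoods depending on the choice.

```python
# Hypothetical raw rows, e.g. (sender, receiver) pairs from a message log.
raw_rows = [("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("dan", "ann")]

def load(rows, directed):
    """Build a neighbor-set dictionary, keeping or ignoring edge direction."""
    graph = {}
    for source, target in rows:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())
        if not directed:
            # Undirected: the edge is visible from both ends.
            graph[target].add(source)
    return graph

directed = load(raw_rows, directed=True)
undirected = load(raw_rows, directed=False)
print(sorted(directed["ann"]), sorted(undirected["ann"]))
```

As a directed graph, ann points only to bob. As an undirected graph, ann is connected to bob, cat, and dan.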

Page 15: Gephi, Graphx, and Giraph

• http://blog.revolutionanalytics.com/2012/05/facebook-class-social-network-analysis-with-r-and-hadoop.html