Social Network Analysis Introduction including Data Structure Graph overview.

Social Network Analysis An overview Presentation by @dougneedham

Transcript of Social Network Analysis Introduction including Data Structure Graph overview.

Page 1: Social Network Analysis Introduction including Data Structure Graph overview.

Social Network Analysis

An overview

Presentation by @dougneedham

Page 2: Social Network Analysis Introduction including Data Structure Graph overview.

Introduction

@dougneedham

Data Guy - Started as a DBA in the Marine Corps, evolved to Architect, now Data Scientist.

Oracle, SQL Server, Cassandra, Hadoop, MySQL, Spark.

I have a strong relational/traditional background.

Perpetual Student

Learning new things challenges our assumptions. Forces us to take a new perspective on “old” problems. Eventually maybe even shows us that there is a better way to solve a problem.

Page 3: Social Network Analysis Introduction including Data Structure Graph overview.

Why study social networks?

It is cool.

The concepts around Social Network Analysis can be applied to many interesting problems in a variety of business verticals.

The foundation of Social Network Analysis is Graph theory.

Some examples: Solving Crime, Introduction to Graph_Theory

Page 4: Social Network Analysis Introduction including Data Structure Graph overview.

What is Social Network Analysis?

“Social network analysis (SNA) is a strategy for investigating social structures through the use of network and graph theories. It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties or edges (relationships or interactions) that connect them. Examples of social structures commonly visualized through social network analysis include social media networks, friendship and acquaintance networks, kinship, disease transmission, and sexual relationships. These networks are often visualized through sociograms in which nodes are represented as points and ties are represented as lines.” – Wikipedia

https://en.wikipedia.org/wiki/Social_network_analysis

Page 5: Social Network Analysis Introduction including Data Structure Graph overview.

Example From wiki: "Kencf0618FacebookNetwork" by Kencf0618 - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - https://commons.wikimedia.org/wiki/File:Kencf0618FacebookNetwork.jpg#/media/File:Kencf0618FacebookNetwork.jpg

Page 6: Social Network Analysis Introduction including Data Structure Graph overview.

A little History

The 7 Bridges of Königsberg

Every tome on Graph theory or Network analysis devotes a small portion of its pages to the 7 Bridges of Königsberg.

If I don’t cover this with you, the gods of mathematics will strike me down, and never allow me to do analysis again in the future.

Page 7: Social Network Analysis Introduction including Data Structure Graph overview.

The Bridges

Page 8: Social Network Analysis Introduction including Data Structure Graph overview.

The Problem

Folks enjoyed their Sunday afternoon strolls across the bridges, but occasionally people would wonder whether there was a route that crossed every bridge exactly once.

Eventually Leonhard Euler was brought into the debate.

Euler used Vertices to represent the land masses and Edges (or arcs, at the time) to represent bridges. He realized that the odd number of edges at every vertex made the problem unsolvable: a walk that crosses each bridge exactly once can have at most two vertices with an odd number of edges.

Sarada Herke provides one of the best explanations of the solution: "Solution to Königsberg".

And here is the cool thing about mathematicians: if we tell you something is impossible, we have to tell you why in a way you can understand. In doing so, Euler also founded the branch of mathematics we today call Graph Theory.

http://en.wikipedia.org/wiki/Leonhard_Euler
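Euler's degree argument can be checked in a few lines. The following is a minimal sketch, not Euler's own formulation: the land-mass labels are made up for illustration, and the check assumes the graph is connected.

```python
from collections import Counter

# The seven bridges as edges between four land masses.
# Labels are illustrative: "island" is the central island, "north" and
# "south" are the river banks, "east" is the land between the river arms.
BRIDGES = [
    ("island", "north"), ("island", "north"),
    ("island", "south"), ("island", "south"),
    ("island", "east"),
    ("north", "east"),
    ("south", "east"),
]

def degrees(edges):
    """Count how many edge ends touch each vertex."""
    count = Counter()
    for a, b in edges:
        count[a] += 1
        count[b] += 1
    return count

def has_walk_using_every_edge_once(edges):
    """Euler's condition for a connected graph: 0 or 2 odd-degree vertices."""
    odd = [v for v, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)

print(dict(degrees(BRIDGES)))
print(has_walk_using_every_edge_once(BRIDGES))
```

All four land masses have an odd degree, so the condition fails and no such stroll exists.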

Page 9: Social Network Analysis Introduction including Data Structure Graph overview.

Why analyze Facebook data?

Facebook is something that most people use.

It is easy to see the relationships, and the concepts of the Graph/Network are intuitive to people who are looking at their “own” network.

The main idea is, if you can understand your own friend data, you can learn the concepts quickly, then apply these same concepts to more complicated problems.

We will talk a little about some complicated topics at the end.

Page 10: Social Network Analysis Introduction including Data Structure Graph overview.

A few terms

Stand back, we are going to talk about math!

Basically we are talking about a bunch of dots joined together by lines.

Vertex – Dot on a graph

Edge – Line connecting two vertices

Edge_Label – This is a term I coined, originally related to Data Structure Graphs, that helps trace a path. If you label your edges, and you have multiple edges with the same label in a Graph, you can quite easily identify walks, paths, and cycles through your graph.

Triangle – 3 Vertices, 3 Edges

Square – 4 Vertices, 4 Edges

Open Triangle – 3 Vertices, 2 Edges \/

A lot of things are networks if you look at them the right way.

Mark Newman has given a number of well-done presentations about Network analysis, available on YouTube. https://www.youtube.com/watch?v=lETt7IcDWLI
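To make the terms concrete, here is a small sketch of a graph stored as (source, target, edge_label) triples. The application names and labels are invented for illustration; the only idea taken from the slide is that grouping edges by label makes it easy to follow one kind of thing through the graph.

```python
from collections import defaultdict

# (source, target, edge_label) triples; all names are made up.
EDGES = [
    ("orders_app", "warehouse", "order"),
    ("warehouse", "reporting", "order"),
    ("crm", "warehouse", "customer"),
    ("crm", "billing", "customer"),
]

def vertices(edges):
    """Every dot that appears at either end of a line."""
    return {v for s, t, _ in edges for v in (s, t)}

def edges_by_label(edges):
    """Group the lines by their label so one label can be traced on its own."""
    grouped = defaultdict(list)
    for s, t, label in edges:
        grouped[label].append((s, t))
    return dict(grouped)

print(sorted(vertices(EDGES)))
print(edges_by_label(EDGES)["order"])
```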

Page 11: Social Network Analysis Introduction including Data Structure Graph overview.

More terms

Transitivity – The friend of my friend is my friend. Really?

Homophily – The tendency of similar things to be connected to each other

Directed Graphs – or Digraphs

Contagion – How do things “spread” through a network?

Let’s rearrange things: how does the layout affect understanding?

Order of a graph – number of vertices

Size of a graph – number of edges

This is not just data visualization, it can also be used for prediction.

https://www.youtube.com/watch?v=rwA-y-XwjuU
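Order, size, and transitivity are easy to compute by hand on a small graph. The sketch below assumes an undirected friendship graph with no duplicate edges; the names are invented. Transitivity is measured here as the fraction of two-edge paths (open or closed triangles) that are closed, which is one common definition.

```python
from itertools import combinations

# Undirected friendships; names are made up.
FRIENDS = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]

def adjacency(edges):
    """Map each vertex to the set of its neighbors."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def order_and_size(edges):
    """Order = number of vertices, size = number of edges."""
    return len(adjacency(edges)), len(edges)

def transitivity(edges):
    """Fraction of two-edge paths whose end points are also connected."""
    adj = adjacency(edges)
    paths = closed = 0
    for center, neighbors in adj.items():
        for a, b in combinations(sorted(neighbors), 2):
            paths += 1
            if b in adj[a]:
                closed += 1
    return closed / paths if paths else 0.0

print(order_and_size(FRIENDS))
print(transitivity(FRIENDS))
```

In this sample, ann, bob, and cat form a closed triangle, while dan only hangs off cat, so not every friend of a friend is a friend.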

Page 12: Social Network Analysis Introduction including Data Structure Graph overview.

Final terms

Centrality – Hub and Authority

This is almost a whole topic by itself, since there are different types of Centrality:

Degree Centrality – Simple: the Vertex with the highest degree (the most edges) is the most central.

Eigenvector Centrality – How important a particular Vertex is to a given network, where a Vertex scores highly when its neighbors also score highly.

PageRank – Similar to Eigenvector Centrality, only scaled: if a given Vertex is closely connected to a Vertex with a very high PageRank, it is itself given a high PageRank.

Serious nutshell definitions.

Shortest Path – How are two vertices connected?

Longest Path – Tracing the flow of an interesting item through a large collection of applications.
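For intuition, degree centrality and PageRank can both be sketched in plain Python. This is a simplified illustration, not what Gephi runs internally: the damping factor of 0.85, the fixed iteration count, and the vertex names are all assumptions, and edges are treated as directed links.

```python
from collections import Counter

# Directed links (source, target); names are made up.
LINKS = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]

def degree_centrality(edges):
    """Number of edges touching each vertex, ignoring direction."""
    degree = Counter()
    for s, t in edges:
        degree[s] += 1
        degree[t] += 1
    return degree

def pagerank(edges, damping=0.85, iterations=50):
    """Power iteration: each vertex passes its rank along its outgoing edges."""
    out_links = {}
    for s, t in edges:
        out_links.setdefault(s, []).append(t)
        out_links.setdefault(t, [])
    n = len(out_links)
    rank = {v: 1.0 / n for v in out_links}
    for _ in range(iterations):
        # Rank held by vertices with no outgoing edges is spread evenly.
        dangling = sum(rank[v] for v, outs in out_links.items() if not outs)
        new_rank = {v: (1 - damping) / n + damping * dangling / n
                    for v in out_links}
        for v, outs in out_links.items():
            for t in outs:
                new_rank[t] += damping * rank[v] / len(outs)
        rank = new_rank
    return rank

ranks = pagerank(LINKS)
print(degree_centrality(LINKS).most_common(1))
print(max(ranks, key=ranks.get))
```

Note how vertex a ends up with a much higher PageRank than b or c even though all three have few edges: it is pointed to by the hub.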

Page 13: Social Network Analysis Introduction including Data Structure Graph overview.

Why is a path important? More on this later…

The Original Joke: This is me in different stores

Page 14: Social Network Analysis Introduction including Data Structure Graph overview.

The Math doesn’t change.

One thing I like about Graphs – The Math does not change.

The math behind Graph theory can be a little intense, but it does not change regardless of the scale of the graph.

Once you understand how to “do the math” on a small graph, the same math applies whether it is a graph of the people in this room, or a graph of the people on this planet.

Now, let me introduce you to a tool that does much of the Mathematics for you…

Page 15: Social Network Analysis Introduction including Data Structure Graph overview.

But first, Netvizz…

Netvizz is a tool that extracts data from different sections of the Facebook Platform. It provides an interface to the Facebook Graph API.

https://www.youtube.com/watch?v=3vkKPcN7V7Q

For the version of data we will be looking at, I was able to extract friendship connections. Facebook has since changed their permissions such that you can no longer extract this information.

However, there are some other interesting things you can do with Netvizz. If you manage a Facebook Group, this might be interesting.

For this particular talk we are going to focus on Gephi interpretation. If we want to have a more in-depth talk on Facebook and the Graph API that Facebook has opened, we can discuss that at another time.

To get this yourself, go into Facebook and search for: Netvizz. (You have to authorize it. You can un-authorize it later.)

You will have a number of options: group data, page data, page like network, search, and link stats.

Click “group data”.

Select a group. If you need a sample id, use: 39462256584

It runs for a bit, then dumps to a zip file.

Save the file, then extract it.

Open Gephi, and use Gephi to import your GDF file.
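If you want to peek inside the GDF file before loading it into Gephi, a small reader is enough. The sketch below assumes the usual GDF layout, a `nodedef>` header line followed by node rows and an `edgedef>` header line followed by edge rows; the sample content and column names are invented, and a real Netvizz export may carry more columns.

```python
import csv
import io

# A tiny made-up GDF document; real exports may have more columns.
SAMPLE_GDF = """nodedef>name VARCHAR,label VARCHAR
1,Alice
2,Bob
3,Carol
edgedef>node1 VARCHAR,node2 VARCHAR
1,2
2,3
"""

def parse_gdf(text):
    """Return (nodes, edges) as lists of dicts keyed by column name."""
    nodes, edges = [], []
    section, columns = None, []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        if row[0].startswith(("nodedef>", "edgedef>")):
            section, first = row[0].split(">", 1)
            # Keep only the column name: "name VARCHAR" -> "name".
            columns = [c.split()[0] for c in [first] + row[1:]]
            continue
        record = dict(zip(columns, row))
        (nodes if section == "nodedef" else edges).append(record)
    return nodes, edges

nodes, edges = parse_gdf(SAMPLE_GDF)
print(len(nodes), len(edges))
```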

Page 16: Social Network Analysis Introduction including Data Structure Graph overview.

Gephi

http://gephi.github.io/

From the website: “Gephi is an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs.”

Java 1.7 required; you may have to set this in Gephi.conf.

Depending on the size of the network you are studying, you may need to increase the memory available to Java in Gephi.conf.

Page 17: Social Network Analysis Introduction including Data Structure Graph overview.

Gephi Startup

Page 18: Social Network Analysis Introduction including Data Structure Graph overview.

Gephi – Open GML file

Page 19: Social Network Analysis Introduction including Data Structure Graph overview.

Gephi – After opening

Page 20: Social Network Analysis Introduction including Data Structure Graph overview.

Layout

Page 21: Social Network Analysis Introduction including Data Structure Graph overview.

Behavior Options

Page 22: Social Network Analysis Introduction including Data Structure Graph overview.

After running

Page 23: Social Network Analysis Introduction including Data Structure Graph overview.

Partitioning

Page 24: Social Network Analysis Introduction including Data Structure Graph overview.
Page 25: Social Network Analysis Introduction including Data Structure Graph overview.

Metrics

Remember all those numbers we spoke about? Here are many of them.

Page 26: Social Network Analysis Introduction including Data Structure Graph overview.

Data Table


Page 27: Social Network Analysis Introduction including Data Structure Graph overview.

Configure Labels

Page 28: Social Network Analysis Introduction including Data Structure Graph overview.

Here is the layout with the labels as number of connections

Page 29: Social Network Analysis Introduction including Data Structure Graph overview.

Add Background

Page 30: Social Network Analysis Introduction including Data Structure Graph overview.

Visualization

File -> Export -> SVG/PDF/PNG…

Page 31: Social Network Analysis Introduction including Data Structure Graph overview.

Export to Excel

Page 32: Social Network Analysis Introduction including Data Structure Graph overview.

How do we use this?

Finding bottlenecks.

You have to ignore the fact that everyone on this graph is connected to you for a moment.

How would someone get a message to another given person?

They would have to pass it to someone they both know, or pass the message to someone who is more likely to be connected to the target of the message.

This was the heart of Milgram’s experiment that gave us the concept of 6 degrees of separation.
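The message-passing question can be sketched as a breadth-first search for the shortest chain of acquaintances. This is an illustrative sketch with invented names, assuming friendships are mutual (undirected); it is not a description of how Milgram's participants actually chose whom to forward to.

```python
from collections import deque

# Mutual friendships; names are made up.
FRIENDSHIPS = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("eve", "fay")]

def shortest_path(edges, start, goal):
    """Breadth-first search; returns the chain of people or None."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    if start not in adj or goal not in adj:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == goal:
            # Walk back through the recorded predecessors.
            path = []
            while person is not None:
                path.append(person)
                person = previous[person]
            return path[::-1]
        for friend in adj[person]:
            if friend not in previous:
                previous[friend] = person
                queue.append(friend)
    return None

print(shortest_path(FRIENDSHIPS, "ann", "dan"))
print(shortest_path(FRIENDSHIPS, "ann", "eve"))
```

The number of edges in the returned chain is the "degrees of separation" between the two people; anyone who appears on many such chains is a candidate bottleneck.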

Page 33: Social Network Analysis Introduction including Data Structure Graph overview.

Other Analysis

What else can be done with Social Network Analysis? How about risk exposure to banks? http://www.federalreserve.gov/newsevents/speech/yellen20130104a.htm

Page 34: Social Network Analysis Introduction including Data Structure Graph overview.
Page 35: Social Network Analysis Introduction including Data Structure Graph overview.

Application to Business Intelligence

What if the Vertices are not people? What if the Edges are not mutual connections?

Jonathan and others over the past few meetings have done a great job at explaining the underpinnings of how a particular BI framework is put together.

Within a Data Architecture there are lots of moving pieces. ETL, FTP, SFTP, Web-Services, External data feeds. Data moving into Data Marts, and Data Warehouses. Data Moving between applications.

Let’s imagine how to visualize this using the information we just gained.

Page 36: Social Network Analysis Introduction including Data Structure Graph overview.

Data Structure Graph

A Data Structure Graph is a group of atomic entities that are related to each other, stored in a repository, then moved from one persistence layer to another, rendered as a Graph.

A group of atomic entities.

Related to each other.

Stored in a repository.

Moved from one persistence layer to another.

Rendered as a Graph.

Page 37: Social Network Analysis Introduction including Data Structure Graph overview.

Introducing Data Structure Graphs

Data Structure Graph Level 1 (DSG-L1)– This is roughly like an Entity Relationship Diagram (ERD) Tables are Vertices, Foreign Keys are Edges.

Data Structure Graph Level 2 (DSG-L2) – Each Vertex in this graph is an application. Each Edge is data transfer. Roughly equivalent to what we used to call Data Flow diagrams.

Data Structure Graph Dependency (DSG-D) – Each Vertex is a job, script, program, or process that is dependent on something happening in sequence before it can do its work.

A DSG-L1 can show you where you are going to have the most interesting query performance of your tables.

A DSG-L2 can show you where the most work is going on in your Enterprise.

A DSG-D can show you the sequence of events that need to take place in order for something to be completed.
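The sequence a DSG-D reveals is a topological ordering of the dependency graph. A minimal sketch, assuming Python 3.9 or later for the standard-library `graphlib` module; the job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must finish before it can start.
# All job names are made up for illustration.
DEPENDS_ON = {
    "load_warehouse": {"extract_orders", "extract_customers"},
    "build_report": {"load_warehouse"},
    "email_report": {"build_report"},
}

def run_order(depends_on):
    """One valid order in which every job runs after its prerequisites."""
    return list(TopologicalSorter(depends_on).static_order())

print(run_order(DEPENDS_ON))
```

If the dependencies contain a cycle, `static_order()` raises `graphlib.CycleError`, which is itself a useful finding about a job schedule.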

Page 38: Social Network Analysis Introduction including Data Structure Graph overview.

New Project, Data Table, Import data.

Page 39: Social Network Analysis Introduction including Data Structure Graph overview.

Load as “Edges Table” Source, Target (required)

Page 40: Social Network Analysis Introduction including Data Structure Graph overview.

Choose Create Missing Nodes
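An edges table of the kind loaded in these steps is just a CSV with Source and Target columns, plus any extra columns you want to carry along. A small sketch that builds one, with invented application names and an Edge_Label column as the extra attribute:

```python
import csv
import io

# (source, target, edge_label) rows; all names are made up.
ROWS = [
    ("crm", "warehouse", "customer"),
    ("orders_app", "warehouse", "order"),
    ("warehouse", "reporting", "order"),
]

def edges_csv(rows):
    """Render the rows as CSV text with a Source,Target,Edge_Label header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Edge_Label"])
    writer.writerows(rows)
    return buffer.getvalue()

print(edges_csv(ROWS))
```

Writing the text to a file such as `edges.csv` (a hypothetical name) gives you something to import as an edges table.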

Page 41: Social Network Analysis Introduction including Data Structure Graph overview.

After a few calculations and layout runs

Page 42: Social Network Analysis Introduction including Data Structure Graph overview.

PageRank – Which application is most important?

Page 43: Social Network Analysis Introduction including Data Structure Graph overview.

A few more tweaks

Page 44: Social Network Analysis Introduction including Data Structure Graph overview.

Where is that Node with the highest PageRank?

Page 45: Social Network Analysis Introduction including Data Structure Graph overview.

Remember paths?

The Original Joke: This is me in different stores

Page 46: Social Network Analysis Introduction including Data Structure Graph overview.

Dijkstra's algorithm

Some of you may have heard of Dijkstra’s algorithm. It is a method for finding the shortest path between two nodes on a Graph.

This is a great optimization technique, but what if you need to find the longest path? What “Edge_Label” has the most influence on my organization?

Iterate through each Edge_Label, create a subgraph that consists of only the nodes this Edge_Label touches, then calculate the diameter of that Graph.

The data point represented by the Edge_Label that has the longest path has the most “value” to your organization.
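The per-label procedure can be sketched as follows. This is one possible reading, under stated assumptions: edges are treated as undirected, the subgraph for a label contains only edges carrying that label, and the diameter is taken as the longest shortest path between vertices that can reach each other. The application names and labels are invented.

```python
from collections import defaultdict, deque

# (source, target, edge_label) triples; all names are made up.
EDGES = [
    ("crm", "billing", "customer"),
    ("billing", "warehouse", "customer"),
    ("warehouse", "reporting", "customer"),
    ("billing", "warehouse", "invoice"),
]

def eccentricity(adj, start):
    """Longest shortest-path distance from start, found by BFS."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return max(dist.values())

def diameter_by_label(edges):
    """Build one subgraph per Edge_Label and measure its diameter."""
    subgraphs = defaultdict(lambda: defaultdict(set))
    for s, t, label in edges:
        subgraphs[label][s].add(t)
        subgraphs[label][t].add(s)
    return {label: max(eccentricity(adj, v) for v in list(adj))
            for label, adj in subgraphs.items()}

diameters = diameter_by_label(EDGES)
print(diameters)
print(max(diameters, key=diameters.get))
```

Running BFS from every vertex is fine for a graph of applications; a very large graph would call for a dedicated graph library.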

Page 47: Social Network Analysis Introduction including Data Structure Graph overview.

https://dougneedham.shinyapps.io/DataStructureGraph

Hard to see, I know, but the top diagram is the “master graph”, the bottom image is a single Edge_Label. You can see how an individual data entity flows through an organization.

Page 48: Social Network Analysis Introduction including Data Structure Graph overview.

My book

Goes through a number of examples of doing a Graph analysis of a fictional organization.

Page 49: Social Network Analysis Introduction including Data Structure Graph overview.

Consider the following:

If you need assistance, send a message to the group, or contact me directly (I am easy to find @dougneedham).

Network/Graph Analysis is cool.

It can show you some interesting things about your data that you may not have considered.

Due thought should be put towards a network analysis project.

Organizing the data requires a bit of thought. (From -> To vertices is just a start.)

Directed graph, undirected, bigraph? Setup work needs to be done.

Tools help with the detailed calculations, and show the paths, walks, etc.

Page 50: Social Network Analysis Introduction including Data Structure Graph overview.

What did I leave out?

Graphs that change over time – What happens when you remove a single Edge or Vertex?

Growth of a Network – Erdős-Rényi versus Barabási-Albert models (Random versus Preferential Attachment).

Scale Free networks – Graphs that conform to Power laws. (These are intrinsically Social Networks, but I didn’t give much detail.)

Comparing two networks – If you have the same number of edges and nodes, are two graphs the same? Is one graph isomorphic to another?

Contagion – Ceteris paribus, how will things (information, viruses, data, disease…) spread through the network? (Since a DSG represents different types of Edges based on Edge_Label, Contagion should not affect this type of network entirely.)

Large Graphs – GraphX, a part of Apache Spark, is best used for this purpose.

The strength of Weak Ties Paradox Social Capital

Page 51: Social Network Analysis Introduction including Data Structure Graph overview.

Finally… Want to do Data Science?

Challenge for members of the audience.

1. Download Gephi.

2. Put together a simple CSV: Source, Target, Edge_Label that describes your own data environment.

3. Load it in Gephi and have Gephi run the metrics, and perform the auto layout.

4. Answer this question: Did you get what you expected?

5. Get a colleague to do the same thing, compare the images. How similar are they?

Here is my hypothesis: If you have more than 5 data applications, including Hadoop, and Data Warehouse infrastructure, your Graph will follow the rules of preferential attachment. (To<->From ETL tools don’t count in the analysis.)

Tweet me @dougneedham #DataStructureGraph (anonymized, of course). What does your Graph look like?
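A first, rough look at the hypothesis is the degree distribution of your own CSV. Under preferential attachment you would expect a few hubs with very high degree and many vertices with low degree. The sketch below only counts degrees from a made-up sample; it does not fit a power law, so treat it as a starting point rather than a test.

```python
import csv
import io
from collections import Counter

# A made-up sample in the Source, Target, Edge_Label layout.
SAMPLE = """Source,Target,Edge_Label
crm,warehouse,customer
orders,warehouse,order
billing,warehouse,invoice
warehouse,reporting,order
"""

def degree_histogram(csv_text):
    """Map each degree value to the number of vertices that have it."""
    degree = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        degree[row["Source"]] += 1
        degree[row["Target"]] += 1
    return Counter(degree.values())

print(sorted(degree_histogram(SAMPLE).items()))
```

In the sample, one application (the warehouse) touches every edge while the rest touch one each, which is the hub-heavy shape the hypothesis predicts.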

Page 52: Social Network Analysis Introduction including Data Structure Graph overview.

Final Thoughts – Questions?