Graph Databases
Overview and Applications
By Rodger Lepinsky
University of Winnipeg
April 29, 2013
Overview
• Literature search from blogs, online articles,
company websites, videos, twitter
• Private research
• Only a little in the academic realm
• Originally intended to approach companies.
Copyright Rodger Lepinsky
2014
Rodger Lepinsky – Formal Training
• Bachelor of Commerce (Honours)
– Asper - University of Manitoba
• Bachelor of Applied Computer Science
– University of Winnipeg
• Passed Chartered Financial Analyst (CFA) Level 1
exam (pass rate: 33% to 40%)
Rodger Lepinsky – RDBMS
• RDBMS Expert
• DB Architecture, Design, Development, Warehousing, Tuning, DB Administration
• Working with databases since 1992
• With enterprise Oracle, SQL Server, Sybase databases since 1995
• Oracle User Groups Presentations:
– High Speed Database Tuning
– Cartesian products
• Technical Blog: rodgersnotes.wordpress.com
• Twitter: @rodgernotes
The RDBMS world is changing rapidly
Tim Gasper - Big Data Right Now: Five Trendy Open Source Technologies
http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies/
Copyright Rodger Lepinsky
November 2013
http://blogs.the451group.com/information_management/2013/06/10/updated-database-landscape-map-june-2013/
https://blogs.the451group.com/information_management/files/2013/06/451db_map_06.13.jpg
Big Data
Volume, velocity, variety of data
Often machine generated:
Internet logs/analytics
Sensors in machines like modern jets
Online gaming companies: ½ terabyte of new data daily
Google’s Paper – IEEE 2009
• The Unreasonable Effectiveness of Data: Alon Halevy, Peter Norvig, and Fernando Pereira
• “simple models and a lot of data trump more elaborate models based on less data.”
• “simple n-gram models or linear classifiers based on millions of specific features perform better than elaborate models that try to discover general rules.”
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/35179.pdf
Big Data - Google Ngram
“each n-gram sequence from a corpus of billions or trillions of words”
books.google.com/ngrams
Big Data, NOSQL Databases
• NOSQL: Not Only SQL
• Sometimes conflated with NewSQL, a separate category of relational databases built to scale like NOSQL systems
• Four main types of NOSQL Databases:
– Key Value
– Column
– Document
– Graph Database
NOSQL DB Compared
Michel Domenjoud - Graph databases: an overview
http://blog.octo.com/en/graph-databases-an-overview/
NOSQL DB – Key Value
• Works like a simple hashtable
• Tools: Memcached, Amazon’s Dynamo, Project Voldemort, Riak, Redis
• Used by: Twitter, Stack Overflow, Instagram, YouTube, Wikipedia
• Use: Store user information, like Session, Profiles, Preferences, Shopping Cart
• Drawback: Can’t query by value. No relationships. No rollbacks.
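The hashtable comparison can be sketched in a few lines of Python. This is only an illustration: a plain `dict` stands in for a real key-value product, and the names `put`, `get`, and `find_keys_by_value` are hypothetical, not any tool's API. It shows why a lookup by key is cheap while a query by value needs a full scan.

```python
# A plain dict stands in for a key-value store (assumption for illustration).
store = {}

def put(key, value):
    """Store an opaque value under a key."""
    store[key] = value

def get(key, default=None):
    """Fetch by key: a single hash lookup."""
    return store.get(key, default)

def find_keys_by_value(predicate):
    """There is no index on values, so a query by value scans every entry."""
    return [k for k, v in store.items() if predicate(v)]

put("session:42", {"user": "alice", "cart": ["book"]})
put("session:43", {"user": "bob", "cart": []})

alice_session = get("session:42")
non_empty_carts = find_keys_by_value(lambda v: v["cart"])
```

The values hold no links to each other, which is the "no relationships" drawback: any connection between sessions has to be rebuilt in application code.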
http://pauloortins.com/nosql-databases-why-we-should-use-and-which-one-we-should-choose/
NOSQL – Column Databases
• Store data in column families, e.g. a Person is usually queried by name or id, not salary.
• Tools: Cassandra, HBase
• Used by: eBay, Instagram, NASA, Twitter, Facebook, Yahoo
• Use: Logging and blogging. Tags, categories, posts in different column families.
• Drawback: No ACID transactions
• (Column databases are used in data warehouses)
http://pauloortins.com/nosql-databases-why-we-should-use-and-which-one-we-should-choose/
NOSQL – Document Databases
• Store data as documents using XML, JSON or BSON
• Tools: MongoDB, CouchDB, RavenDB
• Used by: SAP, Codecademy, Foursquare, NBC News
• Use: No fixed schema. Store different info.
• Drawback: Does not support transactions between documents
http://pauloortins.com/nosql-databases-why-we-should-use-and-which-one-we-should-choose/
NOSQL – Graph Databases
• Store data as graphs, not rows and columns.
• Tools: Neo4j, InfiniteGraph, OrientDB
• Used by: LinkedIn, Facebook, Google, NSA
• Use: with data that is connected.
• Not all data can be modeled as a graph, e.g. spreadsheet-style rows and columns fit better in an RDBMS.
http://pauloortins.com/nosql-databases-why-we-should-use-and-which-one-we-should-choose/
RDBMS Data Structure
• Rows and columns, like a spreadsheet
• Rows added/deleted, and columns updated frequently
• Table structures never change without a conscious decision
• Unlike programs, Relational DB Design is rarely refined
• Result: Awful DB designs are put into production, and huge amounts of code are required to make them work.
• See: DB Design Mistakes To Avoid by Lepinsky
http://rodgersnotes.wordpress.com/2010/09/14/database-design-mistakes-to-avoid/
RDBMS Data Model
Four tables, with many rows in each table
http://blog.octo.com/en/graph-databases-an-overview/
Graph DB Data Structure
• Nodes/Vertices, and Edges
• Adding or modifying nodes or edges changes the structure
• The structure changes constantly as nodes and edges are inserted and updated.
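A minimal sketch of that structure, assuming a simple in-memory design (the `Graph` class and its method names are hypothetical, not the API of Neo4j or any other product): nodes carry properties, edges carry a label, and adding either one extends the structure without any schema change.

```python
from collections import defaultdict

class Graph:
    """Minimal directed graph: nodes with properties, labelled edges."""

    def __init__(self):
        self.nodes = {}                      # node id -> properties dict
        self.out_edges = defaultdict(list)   # node id -> [(label, target id)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target):
        # Refuse dangling edges: both endpoints must already be nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.out_edges[source].append((label, target))

    def neighbours(self, node_id):
        return [target for _, target in self.out_edges[node_id]]

g = Graph()
g.add_node("alice", kind="person")
g.add_node("bob", kind="person")
g.add_edge("alice", "KNOWS", "bob")

# The structure grows with the data: a new kind of node and edge, no ALTER TABLE.
g.add_node("acme", kind="company")
g.add_edge("bob", "WORKS_AT", "acme")
```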
Graph DB Data Model
http://blog.octo.com/en/graph-databases-an-overview/
• Each row becomes a node, so there are many nodes
• Rows in M:N tables become edges between nodes
• New nodes can be inserted at will, e.g. news events
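The mapping can be sketched as follows. The tables and column names here (`person_rows`, `project_rows`, `assignment_rows`) are invented for illustration and are not the four tables in the original diagram; the point is only that entity rows turn into nodes and join-table rows turn into edges.

```python
# Hypothetical relational rows: two entity tables and one M:N join table.
person_rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
project_rows = [{"id": 10, "title": "Migration"}, {"id": 11, "title": "Audit"}]
assignment_rows = [
    {"person_id": 1, "project_id": 10},
    {"person_id": 1, "project_id": 11},
    {"person_id": 2, "project_id": 10},
]

# Each entity row becomes one node, keyed by (table, primary key).
nodes = {}
for row in person_rows:
    nodes[("person", row["id"])] = {"name": row["name"]}
for row in project_rows:
    nodes[("project", row["id"])] = {"title": row["title"]}

# Each row of the M:N table becomes one labelled edge between two nodes.
edges = [
    (("person", r["person_id"]), "WORKS_ON", ("project", r["project_id"]))
    for r in assignment_rows
]
```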
RDBMS vs Graph
• RDBMS:
– Good fit for static data structures that do not change much
– Ubiquitous in business.
• Graph:
– Good for semi-structured or unstructured data
– Fits complex and dynamic data better
– Assumption: the relationships are as important as the records
http://www.zdnet.com/facebook-neo4j-7000009866/
RDBMS vs Graph
• RDBMS:
– Join and query multiple tables to see a relationship
– Retrieve rows and columns
• Graph:
– Query nodes and edges
– Edges are the relationships
– Each relationship (edge) is labelled
– Queries can return edges only
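As a rough illustration of the graph side, the sketch below stores labelled edges as tuples and queries them directly, so the result is a list of edges rather than joined rows. The data and the `match` helper are assumptions made for this example, not a real query language.

```python
# Labelled edges as (source, label, target); sample data is made up.
edges = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("alice", "WORKS_AT", "acme"),
    ("carol", "WORKS_AT", "acme"),
]

def match(source=None, label=None, target=None):
    """Return the edges that fit the pattern; None acts as a wildcard."""
    return [
        (s, l, t)
        for (s, l, t) in edges
        if source in (None, s) and label in (None, l) and target in (None, t)
    ]

# The relationship itself is the query result: no join table is consulted.
works_at_acme = match(label="WORKS_AT", target="acme")
alice_edges = match(source="alice")
```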
http://www.zdnet.com/facebook-neo4j-7000009866/
Transitive Closure
http://techportal.inviqa.com/2009/09/07/graphs-in-the-database-sql-meets-social-networks
https://en.wikipedia.org/wiki/Transitive_closure
Lorenzo Alberton: SQL has historically been unable to express recursive functions needed to maintain the transitive closure of a graph without an auxiliary table.
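For readers unfamiliar with the term, the transitive closure of a graph is the set of all pairs (a, b) where b can be reached from a by following one or more edges. A minimal sketch in Python, using a breadth-first search from every node (one of several possible approaches, chosen here for brevity):

```python
from collections import deque

def transitive_closure(edges):
    """Return the set of (a, b) pairs such that b is reachable from a."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set())

    closure = set()
    for start in adjacency:
        seen = set()
        queue = deque(adjacency[start])
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            queue.extend(adjacency[node])
    return closure

closure = transitive_closure([("a", "b"), ("b", "c"), ("c", "d")])
```

Three direct edges produce six reachable pairs, which is the extra information an auxiliary table would have to hold and keep up to date.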
BIOIT Problem
2003 – BIOIT conference
How to model (DNA) molecules in an RDBMS?
http://www.nasa.gov/audience/foreducators/postsecondary/features/F_Space_Radiation_Project_prt.htm
Tree Structures
• Difficult to represent or use in an RDBMS
• Easy in a Graph DB
• Lorenzo Alberton: Trees In The Database, attempts to represent trees in RDBMS/SQL.
• 128 slides, but still no simple or complete solution.
http://www.slideshare.net/quipo/trees-in-the-database-advanced-data-structures
My First Graph Problem
• DBA_Objects are created in a tree structure.
• A type is used in a table, the table in a view, and the view by multiple procedures.
• You can have a single procedure reading 15 tables: pyramid
• Or one table, such as Customer or Error_Log, used by many procedures: inverted pyramid.
• What’s the order of operations to build the objects?
• N factorial?
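Once the dependencies are held as a graph, the build order is a topological sort rather than a search over N! orderings. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the object names are hypothetical and only echo the kinds of objects mentioned above.

```python
from graphlib import TopologicalSorter

# object -> set of objects it depends on (hypothetical names).
depends_on = {
    "ADDRESS_TYPE": set(),
    "CUSTOMER": {"ADDRESS_TYPE"},        # table uses the type
    "ERROR_LOG": set(),
    "CUSTOMER_VIEW": {"CUSTOMER"},       # view reads the table
    "LOAD_CUSTOMERS": {"CUSTOMER_VIEW", "ERROR_LOG"},
    "REPORT_CUSTOMERS": {"CUSTOMER_VIEW", "ERROR_LOG"},
}

# static_order() yields every object after all of its dependencies.
build_order = list(TopologicalSorter(depends_on).static_order())
```

Several valid orders usually exist, so the exact sequence is not fixed. If the dependencies contain a cycle, `graphlib` raises `CycleError` instead of returning an order.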
http://rodgersnotes.wordpress.com/2013/07/31/dba_objects-tree-modelled-as-a-graph/
DBA_Objects
• Oracle’s DBA_Dependencies only refers to one level up or down.
• Many recursive reads are required to see the whole structure and the correct order of operations.
• SQL output: more like directory structure.
• Ultimate problem: SQL output is in rows and
columns. But object structure is actually a tree.
• Software to solve problem by Yuri Slutsky:
• http://www.samtrest.com/
http://rodgersnotes.wordpress.com/2013/07/31/dba_objects-tree-modelled-as-a-graph/
DBA_Objects
• SQL output: single object found in multiple places and branches in the output. No clear order of operations.
OBJECT                            LVL  OBJID    ROWNUM
--------------------------------  ---  -------  ------
PACKAGE BODY APPS.HR_DELETE       1    278801   1
PACKAGE BODY APPS.HR_DELETE       1    278801   2
PACKAGE BODY APPS.HR_DELETE       1    278801   3
SYNONYM PUBLIC.USER_CATALOG       2    1167     4
VIEW SYS.USER_CATALOG             3    1166     5
VIEW SYS.USER_CATALOG             3    1166     6
VIEW SYS._CURRENT_EDITION_OBJ     4    3270113  7
VIEW SYS._CURRENT_EDITION_OBJ     4    3270113  8
PACKAGE BODY APPS.HR_DELETE       1    278801   9
SYNONYM PUBLIC.DBMS_SQL           2    2328     10
PACKAGE SYS.DBMS_SQL              3    2327     11
PACKAGE SYS.DBMS_SQL              3    2327     12
PACKAGE SYS.DBMS_SQL              3    2327     13
PACKAGE SYS.UTL_IDENT             4    3291213  14
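The duplicated rows in the listing above can be collapsed into distinct graph edges. The sketch below is one way to do that, under a stated assumption: a row at level n is taken to be a child of the most recent row seen at level n-1, which is how pre-order hierarchical output is normally read. The helper name `edges_from_levels` is hypothetical.

```python
# (object, level) pairs in output order, copied from the listing above.
rows = [
    ("PACKAGE BODY APPS.HR_DELETE", 1),
    ("PACKAGE BODY APPS.HR_DELETE", 1),
    ("PACKAGE BODY APPS.HR_DELETE", 1),
    ("SYNONYM PUBLIC.USER_CATALOG", 2),
    ("VIEW SYS.USER_CATALOG", 3),
    ("VIEW SYS.USER_CATALOG", 3),
    ("VIEW SYS._CURRENT_EDITION_OBJ", 4),
    ("VIEW SYS._CURRENT_EDITION_OBJ", 4),
    ("PACKAGE BODY APPS.HR_DELETE", 1),
    ("SYNONYM PUBLIC.DBMS_SQL", 2),
    ("PACKAGE SYS.DBMS_SQL", 3),
    ("PACKAGE SYS.DBMS_SQL", 3),
    ("PACKAGE SYS.DBMS_SQL", 3),
    ("PACKAGE SYS.UTL_IDENT", 4),
]

def edges_from_levels(rows):
    """Rebuild distinct parent -> child edges from a level-annotated listing."""
    latest_at_level = {}   # level -> most recent object seen at that level
    edges = []             # distinct edges, in first-seen order
    seen = set()
    for obj, level in rows:
        latest_at_level[level] = obj
        parent = latest_at_level.get(level - 1)   # assumption: parent is one level up
        if parent is not None and (parent, obj) not in seen:
            seen.add((parent, obj))
            edges.append((parent, obj))
    return edges

edges = edges_from_levels(rows)
```

Fourteen output rows reduce to six edges and seven distinct objects, which is the tree that the row-and-column output obscures.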
http://rodgersnotes.wordpress.com/2011/12/29/the-parents-and-the-order-of-operations/
DBA_OBJECTS as a Graph (Subset)
http://rodgersnotes.wordpress.com/2013/07/31/dba_objects-tree-modelled-as-a-graph/
DBA_OBJECTS as a Graph (Gephi)
http://rodgersnotes.wordpress.com/2013/08/06/visualizing-fifty-thousand-dba_objects/
Graph Structures
Figure: a directed graph and an undirected graph
http://techportal.inviqa.com/2009/09/07/graphs-in-the-database-sql-meets-social-networks/
Applications For Graphs
• Where the data model is connected:
• social
• telecommunications
• logistics
• master data management
• bioinformatics
• fraud detection
http://www.databaserevolution.com/2012/11/nosql-big-data-and-graphs/
Applications For Graphs
• Data Connections and complex interrelationships:
• network management
• content management
• property and asset management
• relationship management (CRM, ERM)
• An association between nodes not only states that a relationship exists, but also describes how.
• Most of the data inside the enterprise is very complex: key/value stores may not work.
http://readwrite.com/2011/10/26/latest-neo4j-nosql-release-tak
Applications For Graphs
• Aggregate data stores (key-value, column, document db) solve problems related to atomic intelligence
• Graph databases leverage connected intelligence
http://www.databaserevolution.com/2012/11/nosql-big-data-and-graphs/
Applications For Graphs
• social networking
• logistics networks (for package routing)
• financial transaction graphs (for detecting fraud)
• telecommunications networks
• ad optimization
• recommendation engines
• bioinformatics
http://www.zdnet.com/facebook-neo4j-7000009866/
Applications For Graphs
• Social
• Recommendations
• Geo
• Logistics Networks: for package routing, finding the shortest path
• Financial Transaction Graphs: for fraud detection
• Master Data Management
• Bioinformatics: Era7 relates a complex web of information that includes genes, proteins and enzymes
• Authorization and Access Control: Adobe Creative Cloud, Telenor
http://www.slideshare.net/gagana24/graph-db
Applications For Graphs
• friend-of-friend
• shortest path
• Gartner: “five richest big data sources on the Web:”
• social graph
• intent graph
• consumption graph
• interest graph
• mobile graph
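The first two items, friend-of-friend and shortest path, are both short traversals once the data is an adjacency structure. A minimal sketch on a made-up undirected friendship graph (the names and the two helper functions are illustrative assumptions):

```python
from collections import deque

# Undirected friendships stored as symmetric adjacency sets (sample data).
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
}

def friends_of_friends(person):
    """People exactly two hops away: friends' friends who are not already friends."""
    direct = friends[person]
    two_hops = set()
    for friend in direct:
        two_hops |= friends[friend]
    return two_hops - direct - {person}

def shortest_path(start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in friends[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

suggestions = friends_of_friends("ann")
path = shortest_path("ann", "eve")
```

Here `ann` and `eve` are three hops apart. Two equally short routes exist (via `bob` or via `cat`), and breadth-first search returns whichever it reaches first.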
http://www.databaserevolution.com/2012/11/nosql-big-data-and-graphs/
Organizations Using Graph Databases
• LinkedIn
• Cisco
• Mozilla (Firefox)
• T-Mobile
• NSA – US National Security Agency
Social Network Analysis
• Facebook became one of the most prominent technology companies in the world by understanding that the relationships connecting people are just as important as the people themselves.
• LinkedIn: Relationships matter
http://www.computerweekly.com/feature/Whiteboard-it-the-power-of-graph-databases
• Facebook’s Graph Search feature contains billions
of nodes and trillions of edges (understood to be in
the low trillions)
• Facebook users are generating more than 500
terabytes of new data every day.
http://gigaom.com/2013/06/06/heres-how-the-nsa-analyzes-all-that-call-data/
http://gigaom.com/2013/06/07/under-the-covers-of-the-nsas-big-data-effort/
Facebook User’s Network of Connections
Brendan Griffen – The Graph Of A Social Network
http://griffsgraphs.com/2012/07/02/a-facebook-network/
LinkedIn User’s Network
Inmaps.LinkedInLabs.com
Social Network Research Study
• Leskovec and Horvitz - 2008
• Analyzed the whole of the Microsoft Messenger system
• 30 billion conversations
• 240 million people
• Mean: 125 conversations per person
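The mean follows directly from the two totals on the slide, as this quick check shows (both figures are the rounded numbers quoted above):

```python
conversations = 30_000_000_000   # 30 billion conversations, as quoted above
people = 240_000_000             # 240 million people, as quoted above

# Mean conversations per person = total conversations / total people.
mean_conversations = conversations / people
```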
arXiv.org > physics > arXiv:0803.0939, Leskovec and Horvitz - 2008
Social Network Research Study
• Social network
• 180 million nodes
• 1.3 billion undirected edges
• Graph is well connected and robust to node
removal
• Average path length among messenger users: 6.6
• "Six degrees of separation"
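Average path length is the mean of the shortest-path distances over all connected pairs of nodes. The sketch below computes it exactly on a toy graph with one breadth-first search per node. It only illustrates the definition: an exact all-pairs computation like this would not scale to a graph of 180 million nodes, and it is not a description of the study's method.

```python
from collections import deque

def bfs_distances(adjacency, start):
    """Hop counts from start to every node reachable from it."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def average_path_length(adjacency):
    """Mean shortest-path length over all ordered pairs of distinct, connected nodes."""
    total = 0
    pairs = 0
    for start in adjacency:
        for node, d in bfs_distances(adjacency, start).items():
            if node != start:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Toy undirected chain: a - b - c - d
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
avg = average_path_length(chain)
```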
Leskovec and Horvitz - 2008
Simon Raper - Graphing the history of philosophy
http://drunks-and-lampposts.com/2012/06/13/graphing-the-history-of-philosophy/
Use Case: History of philosophy
• Each philosopher is a node in the network.
• Edges represent lines of influence.
• SPARQL: a language to query the semantic web
• It queries information that is structured as subject-relationship-object triples
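The triple structure can be imitated in plain Python to show what such a query does. This is a sketch, not SPARQL itself: the triples are a small illustrative sample, and the `match` helper is a hypothetical stand-in in which `None` plays the role of a query variable.

```python
# (subject, relationship, object) triples; a small illustrative sample.
triples = [
    ("Socrates", "influenced", "Plato"),
    ("Plato", "influenced", "Aristotle"),
    ("Plato", "bornIn", "Athens"),
    ("Aristotle", "bornIn", "Stagira"),
]

def match(subject=None, relationship=None, obj=None):
    """Return triples that fit the pattern; None stands for a query variable."""
    return [
        (s, r, o)
        for (s, r, o) in triples
        if subject in (None, s) and relationship in (None, r) and obj in (None, o)
    ]

# Roughly what a SPARQL pattern like { :Plato :influenced ?who } asks for.
influenced_by_plato = [o for _, _, o in match("Plato", "influenced")]

# Every line of influence, i.e. every edge of the philosopher graph.
influence_edges = [(s, o) for s, _, o in match(relationship="influenced")]
```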
Brendan Griffen - The Graph of Ideas:
http://griffsgraphs.com/2012/07/03/graphing-every-idea-in-history/
Social Network of Homer
The social network between characters in Homer’s Odyssey is remarkably similar to real social networks today. This suggests the story is based, at least in part, on real events.
http://www.technologyreview.com/view/516081/the-remarkable-properties-of-mythological-social-networks/
Social Network of Alexander The Great
Diane H. Cline, Ph.D. University of Cincinnati
http://www.academia.edu/2153390/The_Social_network_of_Alexander_the_Great_Social_Network_Analysis_in_Ancient_History
World Events
• US diplomatic relations with Algiers
• Network of parties involved
• 1785 to 1800
• Green: Algiers
• Red: United States
• Purple: England
• Light blue: Tripoli
• Darker blue: France
• Light purple: Spain
• Yellow: Portugal
• Orange: Sweden
A Graph of Diplomatic Wrangling in Algiers
http://abbymullen.org/a-graph-of-diplomatic-wrangling-in-algiers/
Marketing Intelligence
• Are there any conflicts of interest in our proposal?
• Who could refer or introduce me to Larry Ellison?
MarketVisual.com
http://v9.marketvisual.com/d/d06610aa-dafc-4a6c-b846-5eb3268e8780
Market Intelligence
MarketVisual.com
http://v9.marketvisual.com/Profile/MapFull?eid=d06610aa-dafc-4a6c-b846-5eb3268e8780
Financial Exposure
MarketVisual.com
http://v9.marketvisual.com/Profile/MapFull?eid=d06610aa-dafc-4a6c-b846-5eb3268e8780
Financial Networks
The network of the top borrowers.
http://www.nature.com/srep/2012/120802/srep00541/full/srep00541.html
Top borrowers, Financial Exposure
![Page 55: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/55.jpg)
Corporate Ownership
• Can be very complex.
• Corporate ownership can actually be circular:
• A owns B. B owns C. C owns stock in A.
• Accounting rules: conglomerates must consolidate their subsidiaries and eliminate intra-company sales.
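The circular case above (A owns B, B owns C, C owns stock in A) is a cycle in a directed ownership graph. A minimal sketch of how such a cycle could be detected with a depth-first search follows; the function name `find_ownership_cycle` and the `holdings` data are hypothetical and only mirror the A/B/C example.

```python
from typing import Dict, List, Optional


def find_ownership_cycle(owns: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one circular ownership chain (first company repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on the current path, finished
    state: Dict[str, int] = {}
    path: List[str] = []              # companies on the current DFS path

    def visit(company: str) -> Optional[List[str]]:
        state[company] = GREY
        path.append(company)
        for held in owns.get(company, []):
            held_state = state.get(held, WHITE)
            if held_state == GREY:
                # 'held' is already on the current path: ownership loops back.
                return path[path.index(held):] + [held]
            if held_state == WHITE:
                cycle = visit(held)
                if cycle:
                    return cycle
        path.pop()
        state[company] = BLACK
        return None

    for company in list(owns):
        if state.get(company, WHITE) == WHITE:
            cycle = visit(company)
            if cycle:
                return cycle
    return None


# Hypothetical data matching the slide: A owns B, B owns C, C owns stock in A.
holdings = {"A": ["B"], "B": ["C"], "C": ["A"]}
cycle = find_ownership_cycle(holdings)
print(cycle)
```

A graph database can answer the same question with a path query; the sketch only shows why a tree model is not enough once ownership loops back on itself.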
![Page 56: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/56.jpg)
Corporate Ownership – Goldman Sachs
http://opencorporates.com
Canada - 17
Cayman Islands - 739
USA - 1475
Australia - 116
New Zealand - 155
Netherlands - 50
Luxembourg - 202
Ireland - 90
United Kingdom - 127
Japan - 63
Brazil - 27
Mauritius - 50
South Africa - 14
Germany - 368
India - 27
Hong Kong - 8
Bermuda - 19
![Page 57: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/57.jpg)
Corporate Ownership – TransUnion Canada
http://opencorporates.com
![Page 58: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/58.jpg)
GSCP VI Parallel North Holding Corporation
http://opencorporates.com
![Page 59: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/59.jpg)
GS Asset Management International - UK
http://opencorporates.com
![Page 60: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/60.jpg)
Goldman Sachs Group – As A Tree
http://opencorporates.com/companies/us_de/2923466/network
![Page 61: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/61.jpg)
Goldman Sachs Group – As A Network
http://opencorporates.com/companies/us_de/2923466/network
![Page 62: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/62.jpg)
Old Navy Canada – Subsidiary of The Gap
http://opencorporates.com/companies/ca/3659372/network
![Page 63: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/63.jpg)
Pearson PLC
http://opencorporates.com/companies/ca/3659372/network
• UK Public Limited Company
• Owns Pearson Canada Finance Unlimited
![Page 64: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/64.jpg)
Google And Graphs
• Social networks: graphs that describe relationships among people.
• Transportation routes: create a graph of physical connections among geographical locations
• Paths of disease outbreaks form a graph
• Games among soccer teams
• Computer network topologies
• Citations among scientific papers
• Internet / World Wide Web: documents are vertices and links are edges.
http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html
![Page 65: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/65.jpg)
Google And Graphs
• Pregel: Google’s other data-processing infrastructure
• At Google, MapReduce (the model that Hadoop implements) is reported to handle about 80% of all data-processing needs: indexing web content, clustering engines for Google News, Google Trends, processing satellite imagery, language model processing for statistical machine translation, and data backup and restore.
• The other 20% is handled by a lesser-known infrastructure called “Pregel”, which is optimized to mine relationships from graphs.
http://www.royans.net/arch/pregel-googles-other-data-processing-infrastructure/
![Page 66: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/66.jpg)
Google And Graphs
• Google extracts more than 200 signals from the web graph, ranging from the language of a webpage to the number and quality of other pages pointing to it.
• Google built a scalable infrastructure, named Pregel, to mine a wide range of graphs. In Pregel, programs are expressed as a sequence of iterations.
• PageRank, for example, takes only about 15 lines of code.
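To illustrate what "a sequence of iterations" means, here is a minimal single-machine Python sketch of PageRank written in that style: in each superstep every page sends its rank, split evenly, along its outgoing links, then recomputes its own rank from what it received. This is not Google's code. The function name, the damping factor of 0.85, the 30 supersteps, and the tiny `web` graph are assumptions for illustration, and every page is assumed to have at least one outgoing link.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]],
             supersteps: int = 30,
             damping: float = 0.85) -> Dict[str, float]:
    """Iterative PageRank; links maps each page to the pages it points to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(supersteps):
        # "Message passing": each page sends an equal share of its rank
        # to every page it links to.
        inbox = {page: 0.0 for page in pages}
        for page in pages:
            share = rank[page] / len(links[page])
            for target in links[page]:
                inbox[target] += share
        # Each page updates its rank from the messages it received.
        rank = {page: (1.0 - damping) / n + damping * inbox[page]
                for page in pages}
    return rank


# Hypothetical three-page web: documents are vertices, links are edges.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items()):
    print(page, round(score, 3))
```

In a real Pregel-style system the inner loops run in parallel across many machines, one vertex at a time; the sketch keeps only the iteration structure.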
http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html
![Page 67: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/67.jpg)
Security – National Security Agency
• NSA application: determine who else is in contact with suspected terrorists
• Stores tens of petabytes of data
• Internal system, built on top of Hadoop
• Accumulo is able to process a 4.4-trillion-node, 70-trillion-edge graph
• For comparison, the human brain: about 100 billion nodes (vertices) and 100 trillion edges
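The "who else is in contact with" question is a bounded breadth-first search over a contact graph. The sketch below is only an illustration of that idea on an in-memory dictionary; it says nothing about how Accumulo stores or scans data, and the function name and the `calls` sample are hypothetical.

```python
from collections import deque
from typing import Dict, Set


def contacts_within(calls: Dict[str, Set[str]], start: str, max_hops: int) -> Dict[str, int]:
    """Return everyone reachable from start in at most max_hops, with hop counts."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for other in calls.get(person, ()):
            if other not in hops:
                hops[other] = hops[person] + 1
                queue.append(other)
    del hops[start]  # report contacts only, not the starting person
    return hops


# Hypothetical undirected call records.
calls = {
    "s": {"a", "b"},
    "a": {"s", "c"},
    "b": {"s"},
    "c": {"a", "d"},
    "d": {"c"},
}
print(contacts_within(calls, "s", 2))
```

With two hops from "s", the person "d" (three hops away) is excluded, which is the kind of cut-off such an analysis needs on a graph with trillions of edges.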
http://gigaom.com/2013/06/06/heres-how-the-nsa-analyzes-all-that-call-data/
http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf
![Page 68: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/68.jpg)
National Security Agency - NSA
http://gigaom.com/2013/06/06/heres-how-the-nsa-analyzes-all-that-call-data/
http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf
![Page 69: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/69.jpg)
National Security Agency - NSA
http://gigaom.com/2013/06/06/heres-how-the-nsa-analyzes-all-that-call-data/
http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf
• Largest supercomputer installations do not have enough memory to process the Brain Graph (3 PB)!
• Electrical power cost: at 10 cents per kilowatt-hour, $7 million per year
• Graph problem classes (class: scale, storage):
• Toy: scale 26, 17 GB
• Mini: scale 29, 140 GB
• Small: scale 32, 1 TB
• Medium: scale 36, 17 TB
• Large: scale 39, 140 TB
• Huge: scale 42, 1.1 PB
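The storage column can be reproduced with simple arithmetic if one assumes, as the Graph 500 benchmark does, that a graph of a given scale has 2^scale vertices, 16 edges per vertex, and 16 bytes per stored edge. The slide does not state these parameters, so they are assumptions here, as are the function names. The same sketch converts the power-cost figure into an average draw.

```python
EDGE_FACTOR = 16      # assumed: edges per vertex
BYTES_PER_EDGE = 16   # assumed: two 8-byte vertex ids per edge


def storage_bytes(scale: int) -> int:
    """Bytes needed to store the edge list of a graph with 2**scale vertices."""
    vertices = 2 ** scale
    edges = EDGE_FACTOR * vertices
    return edges * BYTES_PER_EDGE


def average_power_kw(annual_cost_dollars: float, dollars_per_kwh: float) -> float:
    """Average power draw implied by a yearly electricity bill."""
    hours_per_year = 365 * 24
    return annual_cost_dollars / dollars_per_kwh / hours_per_year


CLASSES = {"Toy": 26, "Mini": 29, "Small": 32, "Medium": 36, "Large": 39, "Huge": 42}

for name, scale in CLASSES.items():
    print(f"{name:<6} scale {scale}: {storage_bytes(scale) / 1e12:.3f} TB")

print(f"average draw: {average_power_kw(7_000_000, 0.10):,.0f} kW")
```

Under these assumptions scale 26 gives 2^34 bytes (about 17 GB) and scale 42 gives 2^50 bytes (about 1.1 PB), which agrees with the table; $7 million per year at 10 cents per kilowatt-hour works out to an average draw of roughly 8 MW.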
![Page 70: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/70.jpg)
Fraud – Cheque Kiting Scheme
http://photos.cleveland.com/plain-dealer/2012/02/19fvisualejpg.html
![Page 71: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/71.jpg)
Bernie Madoff Corporate Network
http://twinkle_toes_engineering.home.comcast.net/~twinkle_toes_engineering/ponzi_madoff.htm
Diagram of companies feeding money to Bernie Madoff
![Page 72: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/72.jpg)
Shortest Paths - Trains
• “In 2007, a colleague and I used Java with Oracle 9i to implement Dijkstra’s Algorithm. Our ‘MapQuest for Trains’ application would route a rail train over various right-of-ways while minimizing cost. The cost was a function of distance, fuel surcharge, and obstacles. The task to route a train from Los Angeles to Chicago had a grotesquely long response time. Nobody wanted their applications deployed on our nodes because we spiked the servers!”
• Solved by using a Neo4J graph database as the
underlying storage
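For reference, a minimal Python sketch of Dijkstra's algorithm on a weighted rail graph. It is not the code from the quoted article, which used Java with Oracle and later Neo4J; the function name, the cities other than Los Angeles and Chicago, and every cost value are hypothetical. Each edge cost stands in for the combined distance, fuel surcharge, and obstacle cost mentioned in the quote.

```python
import heapq
from typing import Dict, List, Tuple


def cheapest_route(track: Dict[str, Dict[str, float]],
                   origin: str,
                   destination: str) -> Tuple[float, List[str]]:
    """Return (total cost, list of cities) for the cheapest route, or (inf, [])."""
    best = {origin: 0.0}              # cheapest known cost to each city
    previous: Dict[str, str] = {}     # city -> city we arrived from
    frontier = [(0.0, origin)]        # min-heap ordered by cost so far
    while frontier:
        cost, city = heapq.heappop(frontier)
        if city == destination:
            break
        if cost > best.get(city, float("inf")):
            continue                  # stale heap entry, a cheaper one was processed
        for neighbour, segment_cost in track.get(city, {}).items():
            new_cost = cost + segment_cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = city
                heapq.heappush(frontier, (new_cost, neighbour))
    if destination not in best:
        return float("inf"), []
    route = [destination]
    while route[-1] != origin:
        route.append(previous[route[-1]])
    route.reverse()
    return best[destination], route


# Hypothetical segment costs between cities.
rail = {
    "Los Angeles": {"Denver": 10, "Kansas City": 18},
    "Denver": {"Chicago": 12, "Kansas City": 6},
    "Kansas City": {"Chicago": 5},
    "Chicago": {},
}
cost, route = cheapest_route(rail, "Los Angeles", "Chicago")
print(cost, route)
```

The algorithm itself is the same whichever store is used; what changes with a graph database is that neighbour lookups follow stored relationships instead of repeated relational joins.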
http://keyholesoftware.com/2013/01/28/mapping-shortest-routes-using-a-graph-database/
http://myjavaneo4j.herokuapp.com/
![Page 73: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/73.jpg)
Shortest Paths - Trains
http://keyholesoftware.com/2013/01/28/mapping-shortest-routes-using-a-graph-database/
http://myjavaneo4j.herokuapp.com/
![Page 74: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/74.jpg)
Social Network Prediction Engine
• Problem: predict which blog posts a WordPress
user would ‘like’ based on prior user activity and
blog content
http://www.overkillanalytics.net/kaggles-wordpress-challenge-the-like-graph/
![Page 75: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/75.jpg)
Social Network Prediction Engine
http://www.overkillanalytics.net/kaggles-wordpress-challenge-the-like-graph/
Results:
• Nearly 50% of all new likes are from blogs one ‘edge’ from the user.
• A distance of 3 edges (likes) traversed encompasses 90% of all new likes.
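One reading of these distances, assuming a bipartite graph in which each like is an edge between a user and a blog: blogs at distance 1 are those the user has already liked, and blogs at distance 3 are those liked by other users who share a liked blog with the user. A hedged sketch of generating such candidates follows; the function name and the sample data are hypothetical and this is not the code from the linked write-up.

```python
from typing import Dict, Set


def candidate_blogs(likes: Dict[str, Set[str]], user: str) -> Dict[str, int]:
    """Map each blog within 3 like-edges of user to its distance (1 or 3)."""
    # Invert the like graph: blog -> users who liked it.
    liked_by: Dict[str, Set[str]] = {}
    for person, blogs in likes.items():
        for blog in blogs:
            liked_by.setdefault(blog, set()).add(person)

    own = likes.get(user, set())
    distance = {blog: 1 for blog in own}          # user -> blog
    for blog in own:
        for peer in liked_by[blog]:               # blog -> other user
            if peer == user:
                continue
            for other in likes[peer]:             # other user -> blog
                distance.setdefault(other, 3)
    return distance


# Hypothetical like history.
likes = {
    "ann": {"b1", "b2"},
    "bob": {"b2", "b3"},
    "cat": {"b4"},
}
print(candidate_blogs(likes, "ann"))
```

Here "b3" becomes a candidate for "ann" because "bob" shares "b2" with her, while "b4" is left out because no chain of three likes reaches it.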
![Page 76: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/76.jpg)
Graph DB on the Market
http://blog.octo.com/en/graph-databases-an-overview/
![Page 77: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/77.jpg)
Visualization
http://www.yasiv.com/graphs
Andrei Kashcha: uses VivaGraphJS, Google App Engine, and the University of Florida sparse matrix collection
![Page 78: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/78.jpg)
Visualization
• Tools in the browser (Neo4J, Linkurious, D3,
Keylines)
• Gephi, on the desktop
• With Excel: nodexl.codeplex.com
• Nathan Yau, Flowing Data (more for R)
http://www.yasiv.com/graphs
![Page 79: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/79.jpg)
Job Trends
http://www.indeed.com/jobtrends?q=java&l=New+York%2C+NY
Slowing demand in Oracle and Java: from 3% to 2% in about 7 years
![Page 80: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/80.jpg)
Job Trends
http://www.indeed.com/jobtrends?q=%22Data+Science%22&l=New+York%2C+NY
Strong demand growth in “Data Science”, “Big Data”, and “R statistics”, although there are fewer jobs overall
![Page 81: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/81.jpg)
Data Science
• New profession
• Expertise involves:
• Computer and software
• Math and Statistics
• Data (often Big Data)
• Subject Matter Domain Knowledge
• Find significant inferences, trends
• Add value to the organization
• Jingjing’s thesis
http://rodgersnotes.wordpress.com/2013/12/28/detecting-diseases-by-analyzing-the-pulse-waves-of-heartbeats/
![Page 82: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/82.jpg)
Canadian Universities
• Queen’s
• Master’s Degree in Management Analytics (Business)
• University of Toronto
• Certificate: Management of Enterprise Data Analytics
• York University, Toronto, Ontario
• Master of Science in Business Analytics
• University of Ottawa
• Master in Electronic Business Technologies
• Simon Fraser:
• Master's program in Big Data
http://www.informationweek.com/big-data/big-data-analytics/big-data-analytics-masters-degrees-20-top-programs/d/d-id/1108042?page_number=19
http://www.kdnuggets.com/education/usa-canada.html
![Page 83: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/83.jpg)
US Universities
• Arizona State University
• Bentley University, Waltham, Mass.
• Carnegie Mellon University, Pittsburgh, Pa.
• Columbia University, New York, N.Y.
• DePaul University, Chicago, Ill.
• Drexel University, Philadelphia, Pa.
• Fordham University, New York, N.Y.
• Harvard University, Cambridge, Mass.
• Louisiana State University, Baton Rouge, La.
• Massachusetts Institute of Technology, Cambridge, Mass.
• New York University, New York, N.Y.
• North Carolina State University, Raleigh, N.C.
• Northwestern University, Evanston, Ill.
• Purdue University, Lafayette, Ind.
• Rutgers University, New Brunswick, N.J.
• University of San Francisco, San Francisco, Cal.
• Stanford University, Stanford, Calif.
• University of California at Berkeley, Berkeley, California
• University of Southern California, Los Angeles, California
• University of Cincinnati, Cincinnati, Ohio
• University of Connecticut, Graduate Learning Center, Hartford, Conn.
• University of Illinois, Champaign, Ill.
• University of Tennessee, Knoxville, Tenn.
http://www.informationweek.com/big-data/big-data-analytics/big-data-analytics-masters-degrees-20-top-programs/d/d-id/1108042
![Page 84: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/84.jpg)
US University Degrees
• Master of Business Administration
• Master of Business Administration, Business Analytics
• Master of Business Administration, specialization In Business Analytics
• Master of Business And Science degree in Operations Research and Business Analytics
• Master of Engineering
• Master of Information and Data Science
• Master of Information Systems Management, Business Intelligence and Data Analytics.
• Master of Science (MS), Applied Urban Science and Informatics
• Master of Science In Analytics
• Master of Science in Business Analytics
• Master of Science in Business Analytics and Project Management
• Master of Science in Computer Science - Data Science
• Master of Science In Computer Science, Specialization in Information Management and Analytics
• Master of Science In Marketing Analytics
• Master of Science in Predictive Analytics
• Master of Science in Statistics: Analytics Concentration
• Master of Science in Computational Science and Engineering
• Master of Science in Computer Science, Machine Learning
http://www.informationweek.com/big-data/big-data-analytics/big-data-analytics-masters-degrees-20-top-programs/d/d-id/1108042
![Page 85: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/85.jpg)
MOOC
• Massive Open Online Courses
• Coursera.com
• MIT
• Codecademy
• Khan Academy
https://en.wikipedia.org/wiki/Massive_Open_Online_Course
![Page 86: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/86.jpg)
Coursera
• Johns Hopkins: Data Science Specialization
• The Data Scientist’s Toolbox
• R Programming
• Getting and Cleaning Data
• Exploratory Data Analysis
• Reproducible Research
• Statistical Inference
• Regression Models
• Practical Machine Learning
• Developing Data Products
• Capstone Project
https://www.coursera.org/specialization/jhudatascience/1?utm_medium=courseDescripTop
![Page 87: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/87.jpg)
Coursera
• Core concepts in data analysis: getting started
• National Research University - Higher School of
Economics (HSE), Russia
• Duke University:
• Irrational Behavior – Dan Ariely
https://class.coursera.org/datan-001
https://www.coursera.org/course/behavioralecon
![Page 88: Graph Databases Overview and Applications - · PDF fileGraph Databases Overview and Applications By Rodger Lepinsky University of Winnipeg ... • Tools: Neo4J, Infinite Graph, OrientDB](https://reader030.fdocuments.net/reader030/viewer/2022020213/5a71697d7f8b9aa7538cd2bf/html5/thumbnails/88.jpg)
Questions