Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013
Transcript of Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013
![Page 1: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/1.jpg)
Discovering Emerging Technology Through Graph Analysis
GraphConnect | Chicago June 2013
![Page 2: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/2.jpg)
About Me
[email protected] || [email protected]
@henry74
henry74
Founder / Director of PwC's Emerging Tech Lab
![Page 3: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/3.jpg)
What is the Emerging Tech Lab?
We build stuff to help people get smart about applying technology to solve problems
● Founded 3 years ago to identify and experiment with new technologies relevant to but not widely adopted by the Enterprise
● Focuses on rapid prototyping & MVP build-outs for both tactical internal projects and more creative, exploratory ideas
● Permanent core team, but operates a rotational program for staff to provide them an opportunity for hands-on technical experience, learning agile & lean principles, and exposure to a startup-like environment
![Page 4: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/4.jpg)
The Challenge
![Page 5: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/5.jpg)
It usually starts with an idea…
“Build a platform to help discover emerging technologies.”
![Page 6: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/6.jpg)
…followed by some pretty mock-ups…
…to raise expectations.
![Page 7: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/7.jpg)
Envisioning success
● What are some emerging technologies?
● How are they being used to solve real problems?
● Who is talking about them?
● Who are the players?
● Are there related technologies?
● Get up to speed quickly
● Discover related topics
● Understand what is trending
● Find interesting applications
● See what's possible
![Page 8: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/8.jpg)
What makes technology “emerging”?
● Cannot already be mainstream technology
● Needs to be more than a single event to be an emerging trend
● Must be growing in popularity, but not yet popular
● "Technology" could be a thing (e.g. nanotubes), but also an aggregation or application of technologies (e.g. cloud computing, quantified self)
![Page 9: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/9.jpg)
The Journey
![Page 10: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/10.jpg)
Initial design
[Diagram: pipeline with stages Source, Analyze, Visualize. Components: Data Feeds (RSS); Pull & Store Raw Data; MongoDB; ?; Postgres]
![Page 11: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/11.jpg)
Breaking ground
● Natural Language Processing
● Named Entity Recognition
● ???
● ???
● ???
● ???
● ???
Extract Text
Understand Text
Discover Insights
![Page 12: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/12.jpg)
A bit more clarity
[Diagram: pipeline with stages Source, Analyze, Visualize. Components: Data Feeds (RSS); Pull & Store Raw Data; MongoDB; 3rd Party APIs; Tag & Update; ?; Postgres]
![Page 13: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/13.jpg)
Digging a little deeper
● Natural Language Processing
● Named Entity Recognition
● Collocation?
● K-means clustering?
● Information Ontology?
● ???
● ???
Extract Text
Understand Text
Discover Insights
![Page 14: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/14.jpg)
The Eureka moment...
…took a bit longer than it should have
Graphs are everywhere
![Page 15: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/15.jpg)
Final design
[Diagram: pipeline with stages Source, Analyze, Visualize. Components: Data Feeds (RSS); Pull & Store Raw Data; MongoDB; 3rd Party API; Tag & Update; Neo4j; Postgres]
![Page 16: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/16.jpg)
![Page 17: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/17.jpg)
Lesson #1 - Graph data modeling is iterative
…but more flexible than traditional data modeling
What should be a node, a relationship, or a property? Depends on:
● What will you search on?
● How do you start your searches?
● How much data do you expect to have? What data?
Expect to change your graph based on:
● Experimentation
● Query syntax available to extract and aggregate graph data
● Query performance
TIP: Plan to reload your graph many times - save the raw data, start small, use batch loading until you get it right
![Page 18: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/18.jpg)
Modeling the data
[Diagram: two document (DOC) nodes, each linked to surrounding nodes labeled P, C, K, T, and O]
Documents are described by their entities, concepts, and keywords through relationships
This means:
● Documents are related to other documents through shared entities, concepts, and keywords
● Concepts and entities are related to each other through shared documents
● The number of incoming relationships measures the # of referring documents
Simple, yet powerful
Relationship types shown in the diagram: TAGGED_AS, RELATES_TO, REFERS_TO, CONTAINS
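As a rough illustration of the model on this slide, the sketch below keeps document-to-node relationships in a plain Python list and derives the two things the slide highlights: documents related through shared nodes, and the number of referring documents per node. The sample data, the node naming scheme, and the pairing of each relationship type with a node kind are assumptions made for the example, not details from the talk.

```python
from collections import defaultdict

# Hypothetical sample data: (document, relationship type, target node).
# Which relationship type links a document to which kind of node is assumed.
EDGES = [
    ("doc1", "TAGGED_AS", "concept:cloud computing"),
    ("doc1", "CONTAINS", "keyword:virtualization"),
    ("doc2", "TAGGED_AS", "concept:cloud computing"),
    ("doc2", "REFERS_TO", "org:Google"),
    ("doc3", "REFERS_TO", "org:Google"),
]


def incoming_counts(edges):
    """Count the distinct referring documents for every target node."""
    referrers = defaultdict(set)
    for doc, _rel, target in edges:
        referrers[target].add(doc)
    return {target: len(docs) for target, docs in referrers.items()}


def related_documents(edges, doc):
    """Map each other document to the targets it shares with `doc`."""
    targets_by_doc = defaultdict(set)
    for d, _rel, target in edges:
        targets_by_doc[d].add(target)
    mine = targets_by_doc.get(doc, set())
    related = {}
    for other, targets in targets_by_doc.items():
        shared = mine & targets
        if other != doc and shared:
            related[other] = shared
    return related
```

In a graph database the same questions become short traversals; the point here is only that both insights fall out of the relationships without any extra bookkeeping.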
![Page 19: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/19.jpg)
Lesson #2 - Connections are important
Highly connected data creates richer graphs and increases potential for discovering greater insights
BUT unnecessary connections can create noise & extra work
Don't create artificial connections, but clean up data before importing when it makes sense (e.g. merge networking, networks, and network into one node)
Prevent duplication, which can distort insights based on aggregation (e.g. # of relationships) or certain patterns
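A minimal sketch of this kind of pre-import cleanup, using the slide's own example labels. The suffix rules are a deliberately crude stand-in chosen for brevity (they are not a real stemming algorithm), and the function names are hypothetical.

```python
def normalize_label(label):
    """Lowercase a label and strip one common suffix (crude, illustrative only)."""
    word = label.strip().lower()
    for suffix in ("ing", "s"):
        # Only strip when a reasonably long stem remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def merge_labels(labels):
    """Group raw labels by normalized form so each group becomes one node."""
    groups = {}
    for label in labels:
        groups.setdefault(normalize_label(label), []).append(label)
    return groups
```

With this, `merge_labels(["networking", "networks", "network"])` yields a single group keyed by `network`, so relationship counts are not split across three near-identical nodes.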
![Page 20: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/20.jpg)
Keeping it clean
Techniques and their graph benefits:
● Text extraction with readability scoring: better named entity extraction; improved neighbor relevance; fewer invalid nodes & relationships
● Similarity hashing: improved validity of relationships; increased graph connectedness
● Porter stemming: improved graph connectedness
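The talk does not say which similarity hashing scheme was used. As one possible illustration, the sketch below implements SimHash over word tokens using only the standard library; near-duplicate documents tend to produce fingerprints that differ in few bits, so a small Hamming distance can flag a document as a probable duplicate before it is loaded. The tokenization and the choice of MD5 as the per-token hash are assumptions.

```python
import hashlib
import re


def simhash(text):
    """Compute a 64-bit SimHash fingerprint over lowercase word tokens."""
    weights = [0] * 64
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        for i in range(64):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (value >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

A caller would pick a distance threshold empirically (for example by inspecting pairs of known duplicates) and skip or merge documents that fall under it.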
![Page 21: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/21.jpg)
Lesson #3 - Understand Cypher
● Cypher experimentation opens up the possible
● SQL users will be at home - tabular results, similar syntax
● Start without parameters, check with Neo4j shell, move to parameterized queries for security & performance (caching)
● Don't forget Lucene syntax
● Continues to evolve for the better - check new release changes (http://docs.neo4j.org/refcard/1.9/)
● Let Cypher do the work
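To make the point about parameterized queries concrete, here is a small sketch that builds a request body with the query text and the parameter values kept separate. It assumes a JSON body with `query` and `params` keys, as accepted by the Neo4j REST Cypher endpoint of the 1.9 era; the index name `keywords`, the `name` key, the `CONTAINS` direction, and the `title` property are hypothetical.

```python
import json

# Cypher 1.9-style query; {name} is a parameter placeholder, not string formatting.
# Index name, relationship direction, and property names are hypothetical.
QUERY = (
    "START k=node:keywords(name = {name}) "
    "MATCH k<-[:CONTAINS]-doc "
    "RETURN doc.title AS title "
    "LIMIT 10"
)


def build_cypher_request(keyword):
    """Return a JSON body that passes the keyword as a parameter."""
    return json.dumps({"query": QUERY, "params": {"name": keyword}}, sort_keys=True)
```

Because the query string never changes, the server can cache its execution plan, and user input never becomes part of the query text.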
![Page 22: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/22.jpg)
Useful Cypher Syntax
START with an index
MATCH defines your universe
WHERE filters it down
WITH combines multiple statements
HAS checks if a property exists
AS lets you name your return values
IN checks against an array
COLLECT aggregates into an array
ORDER just like SQL
LIMIT for performance
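The query below is an illustrative sketch of how several of these keywords fit together in Cypher 1.9-style syntax, held in a Python string so it can be handed to whatever client is in use. The index name, relationship directions, and property names are assumptions, and the helper only lists each line's leading keyword to show the order of the stages.

```python
# Hypothetical query: index, relationship directions, and properties are assumed.
TOP_CONCEPTS_QUERY = """
START k=node:keywords({lucene})
MATCH k<-[:CONTAINS]-doc-[:TAGGED_AS]->concept
WHERE HAS(doc.title)
WITH concept, COUNT(doc) AS docs, COLLECT(doc.title) AS titles
RETURN concept.name AS concept, docs, titles
ORDER BY docs DESC
LIMIT 10
"""


def leading_keywords(query):
    """Return the first word of each non-empty line of the query."""
    return [line.split()[0] for line in query.strip().splitlines() if line.strip()]
```

Read top to bottom: the index lookup picks starting keywords, MATCH expands to documents and their concepts, WHERE drops documents without a title, WITH aggregates per concept, and ORDER plus LIMIT keep only the most referenced concepts.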
![Page 23: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/23.jpg)
Prototype highlights
● 4 people & 4 months (first version)
● Data Stores - Neo4j, MongoDB, Redis, Postgres
● Visuals - D3.js, Vivagraph.js, Twitter Bootstrap
● Key Languages/Libraries - Ruby, Rails, Cypher, Knockout.js, Amplify.js, HTML5, CSS3, jQuery, Neography gem, Resque gem
● 3rd Party - Alchemy, OpenCalais, RSS feeds, Wikipedia
● Concepts - natural language processing, named entity extraction, text cleansing & de-duplication (map/reduce), similarity hashing, large-scale information retrieval
● 1M+ nodes, 3M+ relationships, 6M+ properties after 6 months
![Page 24: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/24.jpg)
Emerging Tech Radar Demo
![Page 25: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/25.jpg)
Tag Cloud / Search
[Diagram: document (DOC) nodes linked to concept (C) and keyword (K) nodes]
● Index keywords and search across keywords (tip: use Lucene syntax)
● Identify documents with strong relationships to keywords
● Locate concepts with strongest relationships to relevant documents
● Popularity based on number of incoming relationships
![Page 26: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/26.jpg)
Emerging Index / Popularity / Doc List
[Diagram: documents marked emerging (E) or non-emerging (NE), linked to a concept (C) and an organization (O)]
Cloud computing (Concept) and Google (Org)
● Strong relationships with documents shared between concepts to filter and rank relevant content
● Ratio and strength of relationships to quantify emerging index
● Popularity based on number of incoming relationships of each type of document (emerging versus non-emerging)
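The talk does not give the formula behind the emerging index. One plausible reading of "ratio and strength of relationships" is sketched below: the share of a topic's total relationship strength that comes from emerging documents, with popularity reported as plain counts per document type. The function names and the weighting scheme are assumptions.

```python
def emerging_index(emerging_weights, non_emerging_weights):
    """Share of total relationship strength contributed by emerging documents."""
    emerging = sum(emerging_weights)
    total = emerging + sum(non_emerging_weights)
    return emerging / total if total else 0.0


def popularity(emerging_weights, non_emerging_weights):
    """Number of incoming relationships from each type of document."""
    return {
        "emerging": len(emerging_weights),
        "non_emerging": len(non_emerging_weights),
    }
```

For a topic referenced by three emerging documents and one non-emerging document, all with weight 1.0, `emerging_index([1.0, 1.0, 1.0], [1.0])` gives 0.75.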
![Page 27: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/27.jpg)
Node Graph
[Diagram: document (DOC) nodes linked to nodes labeled C, K, and O]
● Existing relationships with documents shared between concepts to filter relevant neighbors
● Strength of relationships based on # and weight for ranking relevance (color)
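A rough sketch of ranking a concept's neighbors by the number and weight of shared documents. Scoring each shared document by the product of the two relationship weights is an assumption made for illustration, as are the sample links.

```python
from collections import defaultdict

# Hypothetical weighted links: (document, concept, relevance weight).
LINKS = [
    ("doc1", "cloud computing", 0.5),
    ("doc1", "virtualization", 0.5),
    ("doc2", "cloud computing", 1.0),
    ("doc2", "virtualization", 0.25),
    ("doc2", "big data", 0.25),
]


def rank_neighbors(links, concept):
    """Rank other concepts by summed weight over documents shared with `concept`."""
    by_doc = defaultdict(dict)
    for doc, name, weight in links:
        by_doc[doc][name] = weight
    scores = defaultdict(float)
    for weights in by_doc.values():
        if concept not in weights:
            continue  # this document is not shared with the concept
        for other, weight in weights.items():
            if other != concept:
                scores[other] += weights[concept] * weight
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

A visualization layer could then map each score to the color of the corresponding neighbor node.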
![Page 28: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/28.jpg)
The Takeaway
![Page 29: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/29.jpg)
Final Thoughts
● Graphs make it simple to generate complex insights - you don't need to be a data scientist
● Graphs are a natural fit for anything connected...which is most things (e.g. social media, internet of things, sensor data)
● Experimentation is the best way to learn the power of graphs
● Make graph databases a first class citizen in your technology toolkit - many things can be solved better with a graph
The best way to discover emerging technologies is to try them out
![Page 30: Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConnect Chicago 2013](https://reader033.fdocuments.net/reader033/viewer/2022051609/547cb030b4af9fbe158b5324/html5/thumbnails/30.jpg)
Thanks for Listening - Q & A
Special thanks to Max De Marzi for his neography gem (https://github.com/maxdemarzi/neography) and his ongoing advice, suggestions, and troubleshooting