Post on 16-Jul-2020
Elastic Stack in a Day Milano – 16 June 2016
SEARCH AND GRAPH ANALYTICS
• IN THIS SESSION I’LL EXPLAIN WHY …
RELATIONSHIPS MATTER
DO YOU MEAN THESE ONES?
NO! RELATIONSHIPS IN YOUR DATA!
RELATIONSHIPS IN YOUR DATA HOLD
TREMENDOUS VALUE
‣ INTRO
‣ RELATIONSHIPS MATTER
‣ GRAPH DATABASE
‣ ELASTIC GRAPH IS NOT A GRAPH DATABASE
‣ POLYGLOT PERSISTENCE
‣ GRAPH BASED SEARCH
• AGENDA
‣ LORENZO SPERANZONI
‣ lorenzo.speranzoni@larus-ba.it @inserpio inserpio.wordpress.com it.linkedin.com/in/lorenzosperanzoni
‣ Founder, CEO @ LARUS Business Automation
‣ I like studying, learning, talking, sharing, teaching innovative open source technologies, especially graph databases
‣ I love cycling, art and nature
• WHO AM I
‣ Founded in 2004
‣ Strongly rooted in Venice Area
‣ Strong attention to technological and methodological innovation
‣ Mission: bridging the “historical” gap between business and IT
‣ Our specialities:
‣ custom software design and development
‣ consulting and coaching on new technologies
‣ training on agile and lean methodologies
‣ Customer acknowledgements:
‣ reliability, competence, enthusiasm
• LARUS BUSINESS AUTOMATION
www.larus-ba.it
• LARUS BUSINESS AUTOMATION
ITALY’S #1 NEO4J CONSULTANCY
• LARUS BUSINESS AUTOMATION
• LARUS BUSINESS AUTOMATION
LARUS’ TRIP WITH NEO4J
[Timeline figure, 2011 to 2016 (years shown: 2011, 2013, 2015, 2016); milestones shown include "First Spikes" and "Neo4j Contrib"]
• WE ARE THE CREATORS OF THE NEO4J JDBC DRIVER 3.0
github.com/neo4j-contrib/neo4j-jdbc
• AND WORKING ON A NEO4J - COUCHBASE CONNECTOR
github.com/larusba/neo4j-couchbase-connector
“Big data grows bigger every year, but today’s enterprise leaders don’t only need to manage larger volumes of data, but they critically need to generate insight from their existing data.”
JIM WEBBER & IAN ROBINSON http://neo4j.com/blog/competitive-advantage-graph-databases-enterprise/
• ABOUT BIG DATA
“Without a doubt, the ability to connect the dots is rare, prized and valuable. Connecting dots, solving the problem that hasn't been solved before, seeing the pattern before it is made obvious, is more essential than ever before.”
SETH GODIN sethgodin.typepad.com/seths_blog/2014/04/connecting-dots-or-collecting-dots.html
• CONNECTING THE DOTS
“To paraphrase Seth Godin, businesses need to stop merely collecting data points, and start connecting them. In other words, the relationships between data points matter almost more than the individual points themselves.”
JIM WEBBER & IAN ROBINSON http://neo4j.com/blog/competitive-advantage-graph-databases-enterprise/
• CONNECTING THE DOTS
• WHAT ABOUT RELATIONAL DATABASES?
‣ Cannot model or store relationships without complexity
‣ Performance degrades with number and levels of relationships, and database size
‣ Query complexity grows with need for JOINs
‣ Adding new types of data and relationships requires schema redesign, increasing time to market
• RELATIONAL DATABASES CAN’T HANDLE RELATIONSHIPS WELL
… making traditional databases inappropriate when data relationships are valuable in real-time
• RELATIONAL DATABASES CAN’T HANDLE RELATIONSHIPS WELL
• WHAT ABOUT NO-SQL DATABASES?
• NO-SQL: COLUMN-ORIENTED DATABASES
• NO-SQL: KEY-VALUE STORE
• NO-SQL: DOCUMENT ORIENTED DATABASES
• AGGREGATE-ORIENTED NO-SQL DATABASES DON’T HANDLE RELATIONSHIPS WELL
“There is a significant downside - the whole approach works really well when data access is aligned with the aggregates, but what if you want to look at the data in a different way? Order entry naturally stores orders as aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the database is that it allows you to slice and dice your data different ways for different audiences.
This is why aggregate-oriented stores talk so much about map-reduce”
MARTIN FOWLER http://martinfowler.com/bliki/AggregateOrientedDatabase.html
‣ No data structure to model and store relationships
‣ No query construct to support data relationships
‣ Relating data requires JOIN logic in the application
‣ No ACID support for transactions
• AGGREGATE-ORIENTED NO-SQL DATABASES DON’T HANDLE RELATIONSHIPS WELL
… making them inappropriate when data relationships are valuable in real-time
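To make the point about aggregates concrete, here is a minimal Python sketch. The order data, the names and the helper `units_sold_per_product` are hypothetical and only illustrate the idea in the Martin Fowler quote: reading one order is a single lookup, while analysing product sales forces the application to walk every aggregate itself.

```python
from collections import Counter

# Hypothetical order aggregates, as a document or key-value store might hold them.
orders = {
    "order-1": {
        "customer": "Ann",
        "lines": [{"product": "book", "qty": 2}, {"product": "pen", "qty": 1}],
    },
    "order-2": {
        "customer": "Dan",
        "lines": [{"product": "book", "qty": 1}],
    },
}

# Access aligned with the aggregate: one lookup by key.
customer_of_first_order = orders["order-1"]["customer"]


def units_sold_per_product(all_orders):
    """Cuts across the aggregates: the application has to visit every order."""
    totals = Counter()
    for order in all_orders.values():
        for line in order["lines"]:
            totals[line["product"]] += line["qty"]
    return dict(totals)


print(customer_of_first_order)
print(units_sold_per_product(orders))
```

The second function is the "JOIN logic in the application" from the bullets above: nothing in the store relates a product to the orders that contain it.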
• WHAT ABOUT GRAPH DATABASES?
LEONHARD EULER
‣ 1707 - 1783
‣ Swiss Mathematician
‣ Inventor of Graph Theory
• SOME HISTORY …
The Seven Bridges of Königsberg is a historically notable problem in mathematics. Its negative resolution by Leonhard Euler in 1735 laid the foundations of graph theory and prefigured the idea of topology.
http://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg
• THE RISE OF GRAPH DATABASES
“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”
“Forrester estimates that over 25% of enterprises will be using graph databases by 2017”
• AND EVERYBODY IS BECOMING VERY SENSITIVE INDEED …
• NEO4J
http://db-engines.com/en/ranking/graph+dbms
THE WORLD’S LEADING GRAPH DATABASE
• NEO4J HAS A GRAPH NATIVE STORAGE
“Do one thing and do it well”
UNIX PHILOSOPHY
• GRAPH DATABASES ARE DIFFERENT
[Figure: a spectrum from discrete data (minimally connected data) to connected data (focused on data relationships); systems shown, from left to right: Other NoSQL, Relational DBMS, Neo4j Graph DB]
Neo4j is designed for data relationships
• GRAPH DATABASES ARE DIFFERENT
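[Figure: Relational Model versus Graph Model. Relational side: Person, Person-Friend and Friend tables holding ANDREAS, DELIA, TOBIAS and MICA. Graph side: nodes ANDREAS, TOBIAS, MICA and DELIA connected by KNOWS relationships]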
• PROPERTY GRAPH MODEL
Nodes:
‣ The objects in the graph
‣ Can have name-value properties
‣ Can be labeled
Relationships:
‣ Relate nodes by type and direction
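‣ Can have name-value properties
[Figure: two PERSON nodes (name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975) and a CAR node (brand: “Volvo”, model: “V70”), connected by LOVES, LOVES, LIVES WITH, DRIVES and OWNS relationships; one relationship carries the property since: Jan 10, 2011]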
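The property graph model described above can be sketched as a small data structure. This is an illustrative Python sketch, not how Neo4j stores data. The class names `Node` and `Relationship` and the helper `outgoing` are hypothetical, and the direction of each relationship (and which one carries `since`) is an assumption, because the flattened figure does not show it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An object in the graph: labels plus name-value properties."""
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """Relates two nodes by type and direction; can carry its own properties."""
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

# Assumption: directions and the owner of "since" are guesses made for illustration.
relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "LIVES_WITH", ann),
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),
    Relationship(ann, "OWNS", car),
]


def outgoing(node):
    """All relationships that start at the given node."""
    return [r for r in relationships if r.start is node]


for rel in outgoing(dan):
    print(rel.type, "->", rel.end.properties)
```

Relationships are stored as objects in their own right rather than derived through foreign keys, which is the sense in which a graph model treats them as first-class.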
• CYPHER QUERY LANGUAGE
MATCH (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
[Diagram: Dan -LOVES-> Ann; each side of the pattern is a NODE with a LABEL (Person) and a PROPERTY (name)]
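To show what the Cypher pattern above asks for, here is a small Python sketch that evaluates the same kind of pattern against a toy in-memory graph. The data layout and the functions `node_matches` and `match` are hypothetical. A real graph database does not work by scanning a list of edges, so this only illustrates the meaning of the pattern.

```python
# Toy graph: node id -> (label, properties); edges as (start id, type, end id).
nodes = {
    1: ("Person", {"name": "Dan"}),
    2: ("Person", {"name": "Ann"}),
    3: ("Car", {"brand": "Volvo"}),
}
edges = [(1, "LOVES", 2), (2, "LOVES", 1), (1, "DRIVES", 3)]


def node_matches(node_id, label, props):
    """True if the node has the label and every requested property value."""
    node_label, node_props = nodes[node_id]
    return node_label == label and all(
        node_props.get(key) == value for key, value in props.items()
    )


def match(start_label, start_props, rel_type, end_label, end_props):
    """Return (start id, end id) pairs for (start)-[:rel_type]->(end)."""
    return [
        (start, end)
        for start, edge_type, end in edges
        if edge_type == rel_type
        and node_matches(start, start_label, start_props)
        and node_matches(end, end_label, end_props)
    ]


# Same shape as: (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
print(match("Person", {"name": "Dan"}, "LOVES", "Person", {"name": "Ann"}))
```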
• NATIVE GRAPH PROCESSING
[Chart: Response Time (y-axis) versus Connectedness and Size of Data Set (x-axis). The x-axis ranges from "0 to 2 hops, 0 to 3 degrees, thousands of connections" to "tens to hundreds of hops, thousands of degrees, billions of connections". Curve shown: Relational and Other NoSQL Databases]
• GRAPHS ARE EVERYWHERE
‣ NETWORK IMPACT ANALYSIS
‣ FRAUD DETECTION
‣ REAL-TIME RECOMMENDATION
‣ IDENTITY AND ACCESS MANAGEMENT
‣ MASTER DATA MANAGEMENT
‣ ROUTE FINDING
‣ LOGISTICS
‣ GRAPH BASED SEARCH
‣ Centrality analysis: To identify the most central entities in your network, a very useful capability for influencer marketing
‣ Path analysis: To identify all the connections between a pair of entities, useful in understanding risks and exposure
‣ Community detection: To identify clusters or communities, which is of great importance to understanding issues in sociology and biology.
‣ Sub-graph isomorphism: To search for a pattern of relationships, useful for validating hypotheses and searching for abnormal situations, such as hacker attacks.
‣ Connectivity analysis: This type of graph analysis can be applied for determining weaknesses in networks such as a utility power grid. It also enables comparing connectivity across networks.
• GRAPH ANALYTICS
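Three of the analyses listed above can be sketched in plain Python on a toy network. The edges below are hypothetical (they reuse names from the earlier slides plus two extra people), and the algorithms are the simplest textbook versions: degree counting for centrality, breadth-first search for path analysis, and connected components for connectivity. Production graph analytics would use richer measures.

```python
from collections import deque

# Hypothetical undirected "knows" network.
edges = [
    ("Andreas", "Tobias"),
    ("Andreas", "Mica"),
    ("Andreas", "Delia"),
    ("Tobias", "Mica"),
    ("Ann", "Dan"),
]

graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def degree_centrality(g):
    """Centrality analysis, simplest form: number of direct connections."""
    return {node: len(neighbours) for node, neighbours in g.items()}


def shortest_path(g, source, target):
    """Path analysis: breadth-first search for one shortest path, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in sorted(g[current]):
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None


def connected_components(g):
    """Connectivity analysis: groups of nodes that can reach each other."""
    seen, components = set(), []
    for start in sorted(g):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(g[node] - component)
        seen |= component
        components.append(component)
    return components


scores = degree_centrality(graph)
print(max(scores, key=scores.get))
print(shortest_path(graph, "Delia", "Mica"))
print(connected_components(graph))
```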
‣ In order to leverage data relationships, organizations need a database technology that stores relationship information as a first-class entity:
THAT TECHNOLOGY IS A GRAPH DATABASE
• GRAPH DATABASES HANDLE RELATIONSHIPS
• ELASTIC GRAPH IS NOT A GRAPH DATABASE
                              | ELASTICSEARCH | NEO4J
Description                   | A modern search and analytics engine based on Apache Lucene | Open source graph database
Database Model                | Search Engine | Graph DBMS
Data Scheme                   | Schema-Free | Schema-Free
APIs and other Access Methods | Java API, RESTful HTTP/JSON API | Cypher Query Language, Java API, RESTful HTTP API
Implementation Language       | Java | Java
Partitioning Methods          | Sharding | None
Replication Methods           | Yes | Master - Slave
Transaction Concepts          | No | ACID
Source: http://db-engines.com/en/system/Elasticsearch;Neo4j
• ELASTIC GRAPH
‣ Elastic Graph provides a way to discover how items in an Elasticsearch index are related
‣ You can explore the connections between indexed terms and see which connections are the most meaningful
‣ What’s more, Elastic Graph is able to identify the most meaningful connections by taking advantage of Elasticsearch relevance scoring.
• ELASTIC GRAPH
‣ It works out of the box with existing Elasticsearch indices and doesn’t require you to store any additional data:
1) The vertices are simply the terms that you’ve already indexed
2) The connections are derived on the fly using Elasticsearch aggregations
‣ Graph consists of two components: an Elasticsearch plugin that provides a simple, yet powerful graph exploration API, and a Kibana plugin that provides an interactive Graph visualization tool.
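The idea that vertices are already-indexed terms and connections are derived on the fly can be sketched as follows. This is a deliberately simplified Python stand-in, not the Elastic Graph API. The documents and the function `derive_connections` are hypothetical, and raw co-occurrence counts are used here in place of the relevance scoring that, according to the slides, Elastic Graph relies on.

```python
from collections import Counter
from itertools import combinations

# Hypothetical indexed documents, each reduced to its set of terms.
documents = [
    {"neo4j", "graph", "cypher"},
    {"neo4j", "graph"},
    {"elasticsearch", "search", "graph"},
    {"elasticsearch", "search"},
]


def derive_connections(docs, min_docs=2):
    """Count the documents in which two terms co-occur.

    Vertices are the terms themselves; a connection is kept only when the
    pair appears together in at least min_docs documents.
    """
    counts = Counter()
    for terms in docs:
        for pair in combinations(sorted(terms), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_docs}


print(derive_connections(documents))
```

Nothing extra is stored: the connections are computed from the documents at query time, which mirrors the "no additional data" point above.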
seacom.it/la-campagna-elettorale-di-donald-trump-analizzata-con-graph-graph/
• LEARN MORE ABOUT ELASTIC GRAPH
• ELASTIC GRAPH
‣ Elasticsearch doesn't have the capability of graph databases to answer precise graph questions, for example questions that involve paths, conditions along those paths, and returning specific "instances"
E.g. it cannot answer a question such as: "How many friends of Lorenzo (friends index) have been to Paris (trip index) and have posted more than 3 times in the last weeks in forums (message index) mentioning the word Impressionism?"
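To show what kind of question this is, here is a Python sketch that answers it over a toy in-memory dataset. All names, trips, messages and the function `friends_matching` are hypothetical, and "the last weeks" is assumed to mean the last three weeks. The point is only that the answer requires following connections from one entity to the next and checking a condition at each step.

```python
from datetime import date, timedelta

today = date(2016, 6, 16)

# Hypothetical data, one structure per "index" named in the question.
friends = {"Lorenzo": {"Anna", "Marco", "Giulia"}}
trips = {"Anna": {"Paris", "Rome"}, "Marco": {"Paris"}, "Giulia": {"Berlin"}}
messages = (
    [("Anna", today - timedelta(days=d), "Notes on Impressionism") for d in range(4)]
    + [("Marco", today - timedelta(days=1), "Impressionism in Paris")]
    + [("Giulia", today, "Impressionism everywhere")] * 5
)


def friends_matching(person, city, word, min_messages, since):
    """Friends of person who visited city and posted more than min_messages
    messages mentioning word on or after the date since."""
    result = []
    for friend in sorted(friends.get(person, ())):
        if city not in trips.get(friend, ()):
            continue
        hits = sum(
            1
            for author, day, text in messages
            if author == friend and day >= since and word.lower() in text.lower()
        )
        if hits > min_messages:
            result.append(friend)
    return result


matching = friends_matching(
    "Lorenzo", "Paris", "Impressionism", 3, today - timedelta(weeks=3)
)
print(len(matching), matching)
```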
• POLYGLOT PERSISTENCE
“I'm confident to say that if you're starting a new strategic enterprise application you should no longer be assuming that your persistence should be relational. The relational option might be the right one - but you should seriously look at other alternatives. One of the interesting consequences of this is that we are gearing up for a shift to polyglot persistence where any decent sized enterprise will have a variety of different data storage technologies for different kinds of data.”
MARTIN FOWLER http://martinfowler.com/bliki/PolyglotPersistence.html
• SELECTION CRITERIA FOR NO-SQL DATABASES
[Figure annotations: "~90% threshold"; "This is still billions of nodes and relationships"]
‣ In the early days, the basic way to access data was ‘listing’ or ‘keyword’ search, where you type in a word or phrase and get back a list of all the items that include that keyword.
‣ This method is essentially plain pattern recognition: a cumbersome process of repeatedly redefining your search terms until you finally hit on something of interest.
• GRAPH BASED SEARCH
‣ Knowledge Graph is a database that enhances Google’s search engine results with semantic information gathered from a wide variety of sources
• GRAPH BASED SEARCH
‣ Facebook’s Graph Search enables users to combine search phrases to get more structured search results, rather than simply using standard keywords and getting the results that match those words
• GRAPH BASED SEARCH
‣ As graph systems understand the way data is related, they return much more precise and richer results:
the key to their enhanced search capability is that the first search takes into account the entire structure of connected data available.
• GRAPH BASED SEARCH
‣ Graph-based search is able to deliver relevant information that you may not have specifically asked for – offering a more proactive and targeted search experience that allows you to quickly triangulate onto the data points that are of the most interest to you.
‣ In essence, graph-based search is intelligent: you can ask much more precise and useful questions and get back the most relevant and meaningful information.
• GRAPH BASED SEARCH RETURNS MORE PRECISE RESULTS
• A DEMO?
ELASTICSEARCH - NEO4J INTEGRATION
‣ Each database handles the chore it is best at (e.g. data volume or data complexity), and their combination provides the best of both worlds: application development productivity and efficient large-scale data management.
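The division of labour described above can be sketched in Python with two in-memory stand-ins. The inverted index plays the role of the search engine and the relationship list plays the role of the graph database. All data and function names are hypothetical, and no Elasticsearch or Neo4j client is used, so this shows the shape of the integration rather than an implementation.

```python
# Stand-in for a search engine: term -> ids of matching documents.
inverted_index = {
    "impressionism": {"doc1", "doc2"},
    "cycling": {"doc3"},
}

# Stand-in for a graph database: directed, typed relationships between ids.
relationships = [
    ("doc1", "WRITTEN_BY", "ann"),
    ("doc2", "WRITTEN_BY", "dan"),
    ("doc3", "WRITTEN_BY", "ann"),
    ("ann", "KNOWS", "dan"),
]


def keyword_search(term):
    """Search side: find documents by keyword."""
    return sorted(inverted_index.get(term.lower(), set()))


def related(node_id, rel_type):
    """Graph side: follow one relationship type from a node."""
    return sorted(
        end for start, kind, end in relationships
        if start == node_id and kind == rel_type
    )


def search_with_context(term):
    """Keyword hits from the index, each enriched with its author from the graph."""
    return {doc: related(doc, "WRITTEN_BY") for doc in keyword_search(term)}


print(search_with_context("Impressionism"))
```

The search side narrows a large volume of documents down to a few ids, and the graph side adds the connected context for those ids.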
• A DEMO?
Q&A