RICERCA E GRAPH ANALYTICS (Search and Graph Analytics) - Seacom · 2018-10-24

Post on 16-Jul-2020

Elastic Stack in A Day, Milan – 16 June 2016

SEARCH AND GRAPH ANALYTICS

• IN THIS SESSION I’LL EXPLAIN WHY …

RELATIONSHIPS MATTER

DO YOU MEAN THESE ONES?

NO! RELATIONSHIPS IN YOUR DATA!

RELATIONSHIPS IN YOUR DATA HOLD TREMENDOUS VALUE

‣ INTRO

‣ RELATIONSHIPS MATTER

‣ GRAPH DATABASE

‣ ELASTIC GRAPH IS NOT A GRAPH DATABASE

‣ POLYGLOT PERSISTENCE

‣ GRAPH BASED SEARCH

• AGENDA

‣ LORENZO SPERANZONI

‣ lorenzo.speranzoni@larus-ba.it @inserpio inserpio.wordpress.com it.linkedin.com/in/lorenzosperanzoni

‣ Founder, CEO @ LARUS Business Automation

‣ I like studying, learning, talking, sharing, teaching innovative open source technologies, especially graph databases

‣ I love cycling, art and nature

• WHO AM I

‣ Founded in 2004

‣ Strongly rooted in the Venice area

‣ Strong attention to technological and methodological innovation

‣ Mission: bridging the “historical” gap between business and IT

‣ Our specialities:

‣ custom software design and development

‣ consulting and coaching on new technologies

‣ training on agile and lean methodologies

‣ Customer acknowledgements:

‣ reliability, competence, enthusiasm

• LARUS BUSINESS AUTOMATION

www.larus-ba.it

• LARUS BUSINESS AUTOMATION

ITALY’S #1 NEO4J CONSULTANCY

LARUS’ TRIP WITH NEO4J

[Timeline: milestones at 2011, 2013, 2015 and 2016; surviving labels: “First Spikes”, “Neo4j Contrib”]

• WE ARE THE CREATORS OF THE NEO4J JDBC DRIVER 3.0

github.com/neo4j-contrib/neo4j-jdbc

• AND WORKING ON A NEO4J - COUCHBASE CONNECTOR

github.com/larusba/neo4j-couchbase-connector

“Big data grows bigger every year, but today’s enterprise leaders don’t only need to manage larger volumes of data, but they critically need to generate insight from their existing data.”

JIM WEBBER & IAN ROBINSON http://neo4j.com/blog/competitive-advantage-graph-databases-enterprise/

• ABOUT BIG DATA

“Without a doubt, the ability to connect the dots is rare, prized and valuable. Connecting dots, solving the problem that hasn't been solved before, seeing the pattern before it is made obvious, is more essential than ever before.”

SETH GODIN sethgodin.typepad.com/seths_blog/2014/04/connecting-dots-or-collecting-dots.html

• CONNECTING THE DOTS

“To paraphrase Seth Godin, businesses need to stop merely collecting data points, and start connecting them. In other words, the relationships between data points matter almost more than the individual points themselves.”

JIM WEBBER & IAN ROBINSON http://neo4j.com/blog/competitive-advantage-graph-databases-enterprise/

• CONNECTING THE DOTS

• WHAT ABOUT RELATIONAL DATABASES?

‣ Cannot model or store relationships without complexity

‣ Performance degrades with number and levels of relationships, and database size

‣ Query complexity grows with need for JOINs

‣ Adding new types of data and relationships requires schema redesign, increasing time to market

• RELATIONAL DATABASES CAN’T HANDLE RELATIONSHIPS WELL

… making traditional databases inappropriate when data relationships are valuable in real-time

• RELATIONAL DATABASES CAN’T HANDLE RELATIONSHIPS WELL
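To make the JOIN point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `person` and `knows` tables, their columns, and the sample names are hypothetical and not taken from the talk. The sketch shows that each additional hop in a relationship query costs one more self-join on the join table.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a person table and a join table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE knows (src INTEGER NOT NULL, dst INTEGER NOT NULL);
        -- Hypothetical sample data: a simple chain of acquaintances.
        INSERT INTO person (id, name) VALUES
            (1, 'Andreas'), (2, 'Tobias'), (3, 'Mica'), (4, 'Delia');
        INSERT INTO knows (src, dst) VALUES (1, 2), (2, 3), (3, 4);
    """)
    return conn


def reachable_in_hops(conn, name, hops):
    """Names reachable from `name` in exactly `hops` steps.

    Every extra hop adds one more self-join on the knows table,
    which is where the query complexity comes from.
    """
    if hops < 1:
        raise ValueError("hops must be at least 1")
    extra_joins = "".join(
        f" JOIN knows k{i} ON k{i}.src = k{i - 1}.dst"
        for i in range(2, hops + 1)
    )
    sql = (
        "SELECT DISTINCT p_to.name FROM person p_from"
        " JOIN knows k1 ON k1.src = p_from.id"
        + extra_joins
        + f" JOIN person p_to ON p_to.id = k{hops}.dst"
        " WHERE p_from.name = ? ORDER BY p_to.name"
    )
    return [row[0] for row in conn.execute(sql, (name,))]
```

Calling `reachable_in_hops(build_db(), "Andreas", 2)` builds a query with two joins on `knows`; asking for three hops builds one with three, and so on.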

• WHAT ABOUT NO-SQL DATABASES?

• NO-SQL: COLUMN-ORIENTED DATABASES

• NO-SQL: KEY-VALUE STORE

• NO-SQL: DOCUMENT ORIENTED DATABASES

• AGGREGATE-ORIENTED NO-SQL DATABASES DON’T HANDLE RELATIONSHIPS WELL

“There is a significant downside - the whole approach works really well when data access is aligned with the aggregates, but what if you want to look at the data in a different way? Order entry naturally stores orders as aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the database is that it allows you to slice and dice your data different ways for different audiences.

This is why aggregate-oriented stores talk so much about map-reduce”

MARTIN FOWLER http://martinfowler.com/bliki/AggregateOrientedDatabase.html

‣ No data structure to model and store relationships

‣ No query construct to support data relationships

‣ Relating data requires JOIN logic in the application

‣ No ACID support for transactions

• AGGREGATE-ORIENTED NO-SQL DATABASES DON’T HANDLE RELATIONSHIPS WELL

… making them inappropriate when data relationships are valuable in real-time
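The following sketch illustrates the point about JOIN logic living in the application, using the order-entry example from the Martin Fowler quote. The order documents and field names are hypothetical. Reading one order is a single lookup, while product sales cut across the aggregates and have to be assembled by hand.

```python
from collections import Counter

# Hypothetical order aggregates, shaped the way a document store might hold them.
ORDERS = [
    {"order_id": 1, "customer": "ann", "lines": [
        {"product": "bike", "qty": 1},
        {"product": "helmet", "qty": 2},
    ]},
    {"order_id": 2, "customer": "dan", "lines": [
        {"product": "helmet", "qty": 1},
    ]},
]


def order_by_id(orders, order_id):
    """Access aligned with the aggregate: one lookup returns the whole order."""
    return next(o for o in orders if o["order_id"] == order_id)


def units_sold_per_product(orders):
    """Access that cuts across aggregates: the application walks every
    order and relates order lines to products itself."""
    totals = Counter()
    for order in orders:
        for line in order["lines"]:
            totals[line["product"]] += line["qty"]
    return dict(totals)
```

On real data volumes the second function is the kind of work that ends up expressed as a map-reduce job.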

• WHAT ABOUT GRAPH DATABASES?

LEONHARD EULER

‣ 1707 - 1783

‣ Swiss Mathematician

‣ Inventor of Graph Theory

• SOME HISTORY …

The Seven Bridges of Königsberg is a historically notable problem in mathematics. Its negative resolution by Leonhard Euler in 1735 laid the foundations of graph theory and prefigured the idea of topology.

http://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg

• THE RISE OF GRAPH DATABASES

http://db-engines.com/

• THE RISE OF GRAPH DATABASES

“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”

“Forrester estimates that over 25% of enterprises will be using graph databases by 2017”

• AND EVERYBODY IS BECOMING VERY SENSITIVE INDEED …

• NEO4J

http://db-engines.com/en/ranking/graph+dbms

THE WORLD’S LEADING GRAPH DATABASE

• NEO4J HAS A GRAPH NATIVE STORAGE

“Do one thing and do it well”

UNIX PHILOSOPHY

• GRAPH DATABASES ARE DIFFERENT

[Figure: a spectrum running from Other NoSQL through Relational DBMS to Neo4j Graph DB. One end is labelled “Discrete Data: minimally connected data”, the other “Connected Data: focused on data relationships”.]

Neo4j is designed for data relationships

• GRAPH DATABASES ARE DIFFERENT

[Figure: Relational Model versus Graph Model. The relational side shows Person, Person-Friend and Friend tables; the graph side shows the nodes ANDREAS, TOBIAS, MICA and DELIA linked by KNOWS relationships.]

• PROPERTY GRAPH MODEL

Nodes:

‣ The objects in the graph

‣ Can have name-value properties

‣ Can be labeled

Relationships:

‣ Relate nodes by type and direction

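‣ Can have name-value properties

[Figure: example property graph. Two PERSON nodes (name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975) and a CAR node (brand: “Volvo”, model: “V70”), connected by LOVES, LOVES, LIVES WITH, DRIVES and OWNS relationships; one relationship carries the property since: Jan 10, 2011.]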
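The property graph model can be sketched as a small data structure. This is an illustration in plain Python, not how Neo4j stores data. The `Node` and `Relationship` classes are hypothetical names, and the relationship directions and the placement of the `since` property on DRIVES are assumptions, because the original diagram does not survive in this transcript.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                   # e.g. {"Person"}
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node                                   # direction: start -> end
    type: str                                     # e.g. "LOVES"
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    # Assumption: directions and the `since` placement are guesses.
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),
    Relationship(ann, "OWNS", car),
]


def outgoing(node, rel_type):
    """Nodes reached from `node` by following relationships of one type."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]
```

Both nodes and relationships carry their own name-value properties, which is the part a plain adjacency list would not capture.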

• CYPHER QUERY LANGUAGE

MATCH ( :Person { name: "Dan" } ) -[ :LOVES ]-> ( :Person { name: "Ann" } )

[Figure: the same pattern annotated. Dan and Ann are each a NODE with a LABEL and a PROPERTY, joined by a LOVES relationship.]
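To show what the pattern above asks for, here is a naive matcher in plain Python. It is only a sketch of the semantics (label check, property check, typed and directed relationship), not how Neo4j evaluates Cypher. The `NODES` and `RELS` structures and the function names are hypothetical.

```python
# Hypothetical in-memory graph: node id -> labels and properties.
NODES = {
    1: {"labels": {"Person"}, "props": {"name": "Dan"}},
    2: {"labels": {"Person"}, "props": {"name": "Ann"}},
    3: {"labels": {"Car"}, "props": {"brand": "Volvo"}},
}
# Directed relationships as (start id, type, end id).
RELS = [(1, "LOVES", 2), (2, "LOVES", 1), (1, "DRIVES", 3)]


def node_matches(node, label, props):
    """True if the node has the label and every requested property value."""
    return label in node["labels"] and all(
        node["props"].get(key) == value for key, value in props.items()
    )


def match(start_label, start_props, rel_type, end_label, end_props):
    """Return (start_id, end_id) pairs for (start)-[:rel_type]->(end)."""
    return [
        (start, end)
        for start, kind, end in RELS
        if kind == rel_type
        and node_matches(NODES[start], start_label, start_props)
        and node_matches(NODES[end], end_label, end_props)
    ]
```

For example, `match("Person", {"name": "Dan"}, "LOVES", "Person", {"name": "Ann"})` keeps only the relationship that runs from Dan to Ann, not the one in the opposite direction.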

• NATIVE GRAPH PROCESSING

[Chart: Response Time (vertical axis) versus Connectedness and Size of Data Set (horizontal axis). Series label: Relational and Other NoSQL Databases. Range annotations: “0 to 2 hops, 0 to 3 degrees, thousands of connections” and “tens to hundreds of hops, thousands of degrees, billions of connections”.]

• GRAPHS ARE EVERYWHERE

‣ NETWORK IMPACT ANALYSIS

‣ FRAUD DETECTION

‣ REAL-TIME RECOMMENDATION

‣ IDENTITY AND ACCESS MANAGEMENT

‣ MASTER DATA MANAGEMENT

‣ ROUTE FINDING

‣ LOGISTICS

‣ GRAPH BASED SEARCH

‣ Centrality analysis: To identify the most central entities in your network, a very useful capability for influencer marketing

‣ Path analysis: To identify all the connections between a pair of entities, useful in understanding risks and exposure

‣ Community detection: To identify clusters or communities, which is of great importance to understanding issues in sociology and biology.

‣ Sub-graph isomorphism: To search for a pattern of relationships, useful for validating hypotheses and searching for abnormal situations, such as hacker attacks.

‣ Connectivity analysis: This type of graph analysis can be applied for determining weaknesses in networks such as a utility power grid. It also enables comparing connectivity across networks.

• GRAPH ANALYTICS
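Three of the analyses listed above can be sketched in a few lines of plain Python. The network is hypothetical, and each function is the simplest textbook variant (degree centrality, breadth-first shortest path, connected components), chosen for illustration rather than as what any particular product implements.

```python
from collections import deque

# Hypothetical undirected network as an adjacency mapping.
GRAPH = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": {"f"},
    "f": {"e"},
}


def degree_centrality(graph):
    """Centrality analysis (simplest variant): neighbours / (n - 1)."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def shortest_path(graph, source, target):
    """Path analysis: breadth-first search for one shortest path, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nbr in sorted(graph[node]):
            if nbr not in previous:
                previous[nbr] = node
                queue.append(nbr)
    return None


def connected_components(graph):
    """Connectivity analysis: groups of nodes that can reach each other."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components
```

In this sample network, node `a` has the highest degree centrality, and `connected_components` exposes that `e` and `f` are cut off from the rest.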

‣ In order to leverage data relationships, organizations need a database technology that stores relationship information as a first-class entity:

THAT TECHNOLOGY IS A GRAPH DATABASE

• GRAPH DATABASES HANDLE RELATIONSHIPS

• ELASTIC GRAPH IS NOT A GRAPH DATABASE

Property | ELASTICSEARCH | NEO4J
Description | A modern search and analytics engine based on Apache Lucene | Open source graph database
Database Model | Search Engine | Graph DBMS
Data Schema | Schema-Free | Schema-Free
APIs and other Access Methods | Java API, RESTful HTTP/JSON API | Cypher Query Language, Java API, RESTful HTTP API
Implementation Language | Java | Java
Partitioning Methods | Sharding | None
Replication Methods | Yes | Master - Slave
Transaction Concepts | No | ACID

http://db-engines.com/en/system/Elasticsearch;Neo4j

• ELASTIC GRAPH

‣ Elastic Graph provides a way to discover how items in an Elasticsearch index are related

‣ You can explore the connections between indexed terms and see which connections are the most meaningful

‣ What’s more, Elastic Graph is able to identify the most meaningful connections by taking advantage of Elasticsearch relevance scoring.
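The idea of ranking connections between indexed terms can be sketched as follows. The documents are hypothetical, and the score is a deliberately simple stand-in (how much more often a term appears alongside the seed term than in the index overall). It is not Elasticsearch's actual relevance scoring.

```python
from collections import Counter

# Hypothetical indexed documents, each reduced to its set of terms.
DOCS = [
    {"neo4j", "graph", "cypher"},
    {"neo4j", "graph", "java"},
    {"elasticsearch", "search", "java"},
    {"elasticsearch", "search", "lucene"},
    {"lucene", "java"},
]


def connections(docs, seed):
    """Rank the terms that co-occur with `seed`.

    Score = share of seed documents containing the term minus the share
    of all documents containing it, so terms that are common everywhere
    score low even if they co-occur often.
    """
    seed_docs = [d for d in docs if seed in d]
    if not seed_docs:
        return []
    foreground = Counter(t for d in seed_docs for t in d if t != seed)
    background = Counter(t for d in docs for t in d)
    scored = [
        (term, count / len(seed_docs) - background[term] / len(docs))
        for term, count in foreground.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

With this sample, `connections(DOCS, "neo4j")` ranks `graph` first and pushes `java` to the bottom, because `java` is popular across the whole index rather than specific to `neo4j`.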

• ELASTIC GRAPH

‣ It works out of the box with existing Elasticsearch indices and doesn’t require you to store any additional data:

1) The vertices are simply the terms that you’ve already indexed

2) The connections are derived on the fly using Elasticsearch aggregations

‣ Graph consists of two components: an Elasticsearch plugin that provides a simple, yet powerful graph exploration API, and a Kibana plugin that provides an interactive Graph visualization tool.
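As a rough sketch of what a request to the graph exploration API can look like, the helper below only builds the JSON body. The layout (a seed `query`, the `vertices` to extract, and the `connections` to follow) reflects the Graph explore API as I understand it and should be checked against the documentation for your version. The endpoint path differs between releases and is left out, and the field names and search text are hypothetical.

```python
import json


def build_explore_request(query_text, seed_field, connected_field, size=5):
    """Build the JSON body for a graph explore request.

    All argument values are placeholders supplied by the caller.
    """
    return {
        # Seed query: which documents to start exploring from.
        "query": {"match": {seed_field: query_text}},
        # Terms of this field become the first vertices.
        "vertices": [{"field": seed_field, "size": size}],
        # Terms of this field are connected to the vertices above.
        "connections": {
            "vertices": [{"field": connected_field, "size": size}]
        },
    }


body = build_explore_request("impressionism", "topic", "author")
print(json.dumps(body, indent=2))
```

Nothing extra has to be stored for this to work: both fields are ordinary indexed fields, and the connections are computed when the request runs.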

seacom.it/la-campagna-elettorale-di-donald-trump-analizzata-con-graph-graph/

• LEARN MORE ABOUT ELASTIC GRAPH

• ELASTIC GRAPH

‣ Elasticsearch lacks the ability of graph databases to answer precise graph questions, for example those involving paths, conditions along those paths, and the return of specific "instances"

E.g. it cannot answer: "How many friends of Lorenzo (friends index) have been to Paris (trip index) and have posted more than 3 times in the last weeks in forums (message index) mentioning the word Impressionism?"
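The question above combines a relationship hop with conditions on each friend. A minimal sketch of that traversal in plain Python follows; all data, the two-week window and the function name are hypothetical, and a graph database would express the same thing as a single query over stored relationships.

```python
from datetime import datetime, timedelta

NOW = datetime(2016, 6, 16)  # fixed "now" so the example is reproducible

# Hypothetical data standing in for the friends, trip and message indices.
FRIENDS = {"Lorenzo": {"Anna", "Marco", "Sara"}}
TRIPS = {"Anna": {"Paris", "Rome"}, "Marco": {"Paris"}, "Sara": {"Berlin"}}
MESSAGES = (
    [("Anna", NOW - timedelta(days=d), "Impressionism in Paris") for d in range(1, 5)]
    + [("Marco", NOW - timedelta(days=d), "More on Impressionism") for d in range(1, 3)]
    + [("Sara", NOW - timedelta(days=d), "Impressionism again") for d in range(1, 6)]
)


def count_matching_friends(person, city, word, min_messages, weeks, now):
    """Count friends of `person` who visited `city` and posted more than
    `min_messages` messages containing `word` in the last `weeks` weeks."""
    since = now - timedelta(weeks=weeks)
    count = 0
    for friend in FRIENDS.get(person, ()):
        if city not in TRIPS.get(friend, ()):
            continue  # condition on the friend -> trip path
        mentions = sum(
            1
            for author, posted_at, text in MESSAGES
            if author == friend
            and posted_at >= since
            and word.lower() in text.lower()
        )
        if mentions > min_messages:  # condition on the friend -> message path
            count += 1
    return count
```

Each condition is attached to a specific friend, which is exactly the per-instance precision that term-level connections cannot express.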

• POLYGLOT PERSISTENCE

“I'm confident to say that if you're starting a new strategic enterprise application you should no longer be assuming that your persistence should be relational. The relational option might be the right one - but you should seriously look at other alternatives. One of the interesting consequences of this is that we are gearing up for a shift to polyglot persistence where any decent sized enterprise will have a variety of different data storage technologies for different kinds of data.”

MARTIN FOWLER http://martinfowler.com/bliki/PolyglotPersistence.html

• SELECTION CRITERIA FOR NO-SQL DATABASES

[Chart annotations: “~90% threshold”; “This is still billions of nodes and relationships”]

‣ In the early days, the basic way to access data was ‘listing’ or ‘keyword’ search, where you type in a word or phrase and get back a list of all the items that include that keyword.

‣ This method is essentially plain pattern matching: a cumbersome process of repeatedly redefining your search terms until you finally hit on something of interest.

• GRAPH BASED SEARCH

‣ Knowledge Graph is a database that enhances Google’s search engine results with semantic information gathered from a wide variety of sources

• GRAPH BASED SEARCH

‣ Facebook’s Graph Search enables users to combine search phrases to get more structured search results, rather than simply using standard keywords and getting the results that match those words

• GRAPH BASED SEARCH

‣ Because graph systems understand how data is related, they return more precise and richer results: the key to their enhanced search capability is that even the first search takes into account the entire structure of connected data available.

• GRAPH BASED SEARCH

‣ Graph-based search is able to deliver relevant information that you may not have specifically asked for – offering a more proactive and targeted search experience that allows you to quickly triangulate onto the data points that are of the most interest to you.

‣ In essence, graph-based search is intelligent: you can ask much more precise and useful questions and get back the most relevant and meaningful information.

• GRAPH BASED SEARCH RETURNS MORE PRECISE RESULTS
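The difference between keyword search and graph-based search can be sketched with a toy catalogue. The items, relationship types and function names are hypothetical. The graph-based variant starts from the keyword hits and then follows relationships one step, so it also returns connected items whose text never mentions the search term.

```python
# Hypothetical catalogue: item id -> descriptive text.
ITEMS = {
    "monet": "Claude Monet, painter of Water Lilies",
    "renoir": "Pierre-Auguste Renoir, painter",
    "impressionism": "Impressionism, a 19th-century art movement",
    "orsay": "Musée d'Orsay, museum in Paris",
}
# Hypothetical relationships as (start, type, end).
RELATED = [
    ("monet", "BELONGS_TO", "impressionism"),
    ("renoir", "BELONGS_TO", "impressionism"),
    ("orsay", "EXHIBITS", "monet"),
]


def keyword_search(term):
    """Plain keyword search: items whose text contains the term."""
    return sorted(k for k, text in ITEMS.items() if term.lower() in text.lower())


def graph_search(term):
    """Keyword hits plus items one relationship away, with the reason."""
    hits = keyword_search(term)
    results = {hit: "keyword match" for hit in hits}
    for start, kind, end in RELATED:
        for hit in hits:
            if start == hit and end not in results:
                results[end] = f"{hit} {kind} {end}"
            elif end == hit and start not in results:
                results[start] = f"{start} {kind} {hit}"
    return results
```

Searching for "impressionism" by keyword returns a single item, while `graph_search` also surfaces the two painters and says why each one was included.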

• A DEMO?

ELASTICSEARCH - NEO4J INTEGRATION

‣ Each database handles the chore it is best at, e.g. data volume or complexity, and their combination provides the best of both worlds: application development productivity and efficient large-scale data management.

THANK YOU

www.larus-ba.it

Q&A