OrientDB: Unlock the Value of Document Data Relationships

OrientDB: Unlock the Value of Document Data Relationships Fabrizio Fortino @fabriziofortino 11th April 2016 #HUGIreland @boistartups


Page 1: OrientDB: Unlock the Value of Document Data Relationships

OrientDB: Unlock the Value of Document Data Relationships

Fabrizio Fortino @fabriziofortino, 11th April 2016, #HUGIreland @boistartups

Page 2: OrientDB: Unlock the Value of Document Data Relationships

The world is changing

Unstructured Data

Big Data Explosion

Connected Data

Mobile, IoT

http://destinhaus.com/internet-of-things-the-rise-of-smart-manufacturing/

Page 3: OrientDB: Unlock the Value of Document Data Relationships

Rethink how we store data

“… starting a new strategic enterprise application you should no longer be assuming that your persistence should be relational. The relational option might be the right one - but you should seriously look at other alternatives.”

Polyglot Persistence [2011] Martin Fowler

Page 4: OrientDB: Unlock the Value of Document Data Relationships

A Polyglot Persistence example

E-commerce Application

Primary Store+

Financial Data (RDBMS)

Recommendations (Graph)

Products Catalog (Document)

User Sessions (Key-Value)

ETL Jobs / Data Synchronisation

Page 5: OrientDB: Unlock the Value of Document Data Relationships

More flexibility, at what price?

• Hire experts for each database type

• No standards between NoSQL products

• Increased overall complexity

• High TCO

• Write and maintain ETL and data synchronisation

• Hard to refactor

• Testing can be tough

Page 6: OrientDB: Unlock the Value of Document Data Relationships

Entering Multi-Model Databases

Graph

Document

Object

Key/Value

Full-Text

Spatial

Multi-Model represents the intersection of multiple models in a single product

Page 7: OrientDB: Unlock the Value of Document Data Relationships

Product Positioning Quadrant

[Figure: quadrant with Relationship Complexity on the vertical axis and Data Complexity on the horizontal axis, positioning Relational, Key Value, Column, Graph, Document and Multi-Model]

Page 8: OrientDB: Unlock the Value of Document Data Relationships

OrientDB at a Glance

• First Multi-Model DBMS with a Graph Engine

• Community Edition FREE (Apache v2 License)

• Enterprise Edition (profiler, live monitor, telereporter, etc.)

• Vibrant community (≈ 100 contributors, ≈ 15K commits)

• Easy to install and use

• Zero configuration Multi-Master Architecture

• ACID

• Reactive (Live Queries)

Page 9: OrientDB: Unlock the Value of Document Data Relationships

Quite a long journey

[Figure: timeline with the years 1998 and 2009 to 2016. Milestones: Orient ODBMS: First ever ODBMS with index-free adjacency; R&D; OrientDB: First ever multi-model DBMS released as Open Source; OrientDB Enterprise Launch. A Downloads / month curve is labelled 0, 200, 1K, 3K, 12K and 70K.]

Page 10: OrientDB: Unlock the Value of Document Data Relationships

Under the hood

User Application

Document API: Handles Records as Documents

Graph API: TinkerPop Blueprints Implementation

Object API: POJO to Document mapping

Storage

Memory: Works in Memory Only (Ideal for Integration Testing)

PLocal: Write/Read to/from File System

Remote: Delegates all Operations to a Remote Server
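In OrientDB the storage type is chosen by the prefix of the database URL (`memory:`, `plocal:` or `remote:`). The following is only a minimal sketch of that idea: `describe_storage` is a hypothetical helper, not part of any OrientDB driver, and the database names and paths are made up for illustration.

```python
# Hypothetical helper, not part of any OrientDB driver: it only illustrates
# how the URL prefix selects one of the three storage types.
STORAGE_TYPES = {
    "memory": "works in memory only",
    "plocal": "reads and writes the local file system",
    "remote": "delegates all operations to a remote server",
}


def describe_storage(url: str) -> tuple:
    """Split 'engine:location' and return (engine, location, description)."""
    engine, sep, location = url.partition(":")
    if not sep or engine not in STORAGE_TYPES:
        raise ValueError(f"unknown storage type in URL: {url!r}")
    return engine, location, STORAGE_TYPES[engine]


# Made-up database names and paths, one per storage type.
for url in ("memory:test", "plocal:/tmp/databases/meetups", "remote:localhost/meetups"):
    print(describe_storage(url))
```

Because only the prefix changes, the same application code can run against an in-memory database in integration tests and against a local or remote one in production.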

Page 11: OrientDB: Unlock the Value of Document Data Relationships

Deployment options

• Embedded (in-process)

• Single, Standalone Node

• Multi-Master Replica

• Mixed

Page 12: OrientDB: Unlock the Value of Document Data Relationships

Document API

• Lowest level API

• A document (record) is the unit of storage

• An immutable id (ORID) is automatically assigned to each document

• Documents can contain key-value pairs or nested/embedded documents (no ORID)

• Transaction support (optimistic mode with MVCC)

• Classes are logical sets of documents
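The "optimistic mode with MVCC" bullet can be pictured as follows: nothing is locked while a document is being edited, every record carries a version number, and a write is accepted only if the version it was read at is still the current one. This is a simplified in-memory sketch, not OrientDB code; `Store` and `ConcurrentModificationError` are hypothetical names.

```python
class ConcurrentModificationError(Exception):
    """Raised when a record changed since it was read (hypothetical name)."""


class Store:
    def __init__(self):
        self._records = {}  # rid -> (version, fields)

    def create(self, rid, fields):
        self._records[rid] = (0, dict(fields))

    def read(self, rid):
        version, fields = self._records[rid]
        return version, dict(fields)  # copy, so callers cannot mutate the store

    def update(self, rid, expected_version, fields):
        current_version, _ = self._records[rid]
        if current_version != expected_version:
            raise ConcurrentModificationError(rid)
        self._records[rid] = (current_version + 1, dict(fields))


store = Store()
store.create("#12:216", {"name": "Fabrizio"})

# Two clients read the same version of the record.
v_a, doc_a = store.read("#12:216")
v_b, doc_b = store.read("#12:216")

doc_a["city"] = "Dublin"
store.update("#12:216", v_a, doc_a)      # first writer wins, version becomes 1

doc_b["city"] = "Cork"
try:
    store.update("#12:216", v_b, doc_b)  # stale version 0, so this is rejected
except ConcurrentModificationError:
    v_b, doc_b = store.read("#12:216")   # typical reaction: reload and retry
    doc_b["city"] = "Cork"
    store.update("#12:216", v_b, doc_b)
```

The trade-off is that conflicts are detected late, at write time, so the application has to be ready to reload and retry.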

Page 13: OrientDB: Unlock the Value of Document Data Relationships

Schema-less, Schema-full or Hybrid?

Schema-less: relaxed model, the type of each field is inferred for each document

Schema-full: strict model, schema with constraints on fields and validation rules

Hybrid: mixed model, schema with mandatory and optional fields with constraints and validation rules
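The difference between the three modes can be sketched with a tiny validator. This is an illustration of the concept only, not OrientDB's schema API: `Property`, `ClassSchema` and the `strict` flag are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Property:
    py_type: type
    mandatory: bool = False


@dataclass
class ClassSchema:
    properties: dict = field(default_factory=dict)  # name -> Property
    strict: bool = False  # True: reject fields that are not declared

    def validate(self, doc: dict) -> None:
        for name, prop in self.properties.items():
            if name not in doc:
                if prop.mandatory:
                    raise ValueError(f"missing mandatory field {name!r}")
                continue
            if not isinstance(doc[name], prop.py_type):
                raise ValueError(f"field {name!r} must be {prop.py_type.__name__}")
        if self.strict:
            extra = set(doc) - set(self.properties)
            if extra:
                raise ValueError(f"undeclared fields: {sorted(extra)}")


schema_less = ClassSchema()                                  # nothing declared
schema_full = ClassSchema({"name": Property(str, mandatory=True)}, strict=True)
hybrid = ClassSchema({"name": Property(str, mandatory=True)})

doc = {"name": "Fabrizio", "city": "Dublin"}
schema_less.validate(doc)   # accepted: nothing is declared
hybrid.validate(doc)        # accepted: 'name' is checked, 'city' is free-form
try:
    schema_full.validate(doc)
except ValueError as err:
    print(err)              # 'city' is not declared in the strict schema
```

The hybrid mode is the middle ground: the fields the application relies on are constrained, while everything else stays free-form.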

Page 14: OrientDB: Unlock the Value of Document Data Relationships

Class concept is taken from OOP

• Classes can inherit from other classes, creating a tree (similar to RDF Schema)

• A sub-class inherits all the schema fields from its parents

• An abstract class is used as the foundation for other classes (it cannot have records)

• Class hierarchies allow native polymorphic queries

• 1 to 1 mapping with domain objects
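A polymorphic query means that selecting from a class also returns the records of all its sub-classes. The sketch below models that behaviour in memory; it is not OrientDB code, and the class names `person` and `speaker`, the field names and the e-mail addresses are invented for the example.

```python
class DocClass:
    def __init__(self, name, parent=None, abstract=False, fields=()):
        self.name, self.parent, self.abstract = name, parent, abstract
        self.own_fields = tuple(fields)
        self.records = []

    def schema_fields(self):
        # A sub-class sees its parents' fields followed by its own.
        inherited = self.parent.schema_fields() if self.parent else ()
        return inherited + self.own_fields

    def is_a(self, other):
        cls = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.parent
        return False

    def insert(self, **doc):
        if self.abstract:
            raise TypeError(f"{self.name} is abstract and cannot hold records")
        self.records.append(doc)


def select_from(target, all_classes):
    """Polymorphic query: records of `target` and of every class below it."""
    return [r for c in all_classes if c.is_a(target) for r in c.records]


person = DocClass("person", abstract=True, fields=("name",))
user = DocClass("user", parent=person, fields=("email",))
speaker = DocClass("speaker", parent=user, fields=("talk",))
classes = [person, user, speaker]

user.insert(name="Uli", email="uli@example.com")
speaker.insert(name="Fabrizio", email="fabrizio@example.com", talk="OrientDB")

print(speaker.schema_fields())
print([r["name"] for r in select_from(person, classes)])
```

Querying the abstract `person` class returns both records, while querying `speaker` returns only the most specific one.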

Page 15: OrientDB: Unlock the Value of Document Data Relationships

Let’s create a Document

{
  "@rid": "#12:216",
  "@class": "user",
  "name": "Fabrizio",
  "meetups": [
    { "name": "HUG Ireland", "city": "Dublin", "since": "14-03-2014" }
  ],
  "details": {
    "@type": "d",
    "@class": "user_details",
    "city": "Dublin",
    "nationality": "IT"
  }
}

"@rid": Immutable Record ID

"@class": Logical set

"name": Property

"meetups": Array of objects

"details": Embedded document
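A record id such as `#12:216` has two parts: the id of the cluster that holds the record and the position of the record inside that cluster. The sketch below parses it from the document shown on this slide; `RecordId` is a hypothetical helper written for this example, not a class from an OrientDB driver.

```python
from typing import NamedTuple


class RecordId(NamedTuple):
    cluster: int    # id of the cluster (group of records) holding the document
    position: int   # position of the record inside that cluster

    @classmethod
    def parse(cls, text: str) -> "RecordId":
        cluster, position = text.lstrip("#").split(":")
        return cls(int(cluster), int(position))

    def __str__(self) -> str:
        return f"#{self.cluster}:{self.position}"


document = {
    "@rid": "#12:216",
    "@class": "user",
    "name": "Fabrizio",
    "meetups": [{"name": "HUG Ireland", "city": "Dublin", "since": "14-03-2014"}],
    "details": {"@type": "d", "@class": "user_details",
                "city": "Dublin", "nationality": "IT"},
}

rid = RecordId.parse(document["@rid"])
print(rid.cluster, rid.position)
# The embedded document lives inside its parent and has no record id of its own.
print("@rid" in document["details"])
```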

Page 16: OrientDB: Unlock the Value of Document Data Relationships

Let’s create a Document


With a traditional Document DB you have to duplicate your data to some degree. The degree depends on how complex the interdependencies of the application domain are.

OrientDB combines the unique flexibility of documents with the power of graphs to unlock the business value of Document Data Relationships.

Page 17: OrientDB: Unlock the Value of Document Data Relationships

Graphs: everything old is new again

https://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg

Page 18: OrientDB: Unlock the Value of Document Data Relationships

What is a Graph Database?

“A Graph Database is any storage system that provides index-free adjacency”

The Graph Traversal Pattern [2010] Marco A. Rodriguez

G = (V, E), where G is the Graph, V the set of Vertices and E the set of Edges

Page 19: OrientDB: Unlock the Value of Document Data Relationships

What’s wrong with joins?

• Given a User (Fabrizio)

• Find Fabrizio (id=10) in member table O(log n)

• Find 18 and 24 (HUG Ireland & Microservices) in Meetup table O(log n)

User

name      id
Fabrizio  10
Uli       12
John      13
Eddie     88

member

user_id  meetup_id
10       18
10       24
13       18
88       66

Meetup

id  name
18  HUG Ireland
57  AWS Users
24  Microservices
66  Scala

• Joins are computed every time you cross relationships

• The cost of each lookup grows with the data: O(log n)

• Joining 3-4 tables with millions of records could create billions of combinations
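The O(log n) cost comes from the index probe needed at every hop of the join. As a rough sketch, the sorted key lists below stand in for B-tree indexes and `bisect` plays the role of the index lookup; the rows are the ones from the slide, while `meetups_of` is a hypothetical function name.

```python
from bisect import bisect_left, bisect_right

# Rows from the slide, kept sorted by key so that binary search applies.
member = [(10, 18), (10, 24), (13, 18), (88, 66)]            # (user_id, meetup_id)
meetups = [(18, "HUG Ireland"), (24, "Microservices"),
           (57, "AWS Users"), (66, "Scala")]                 # (id, name)

# Sorted key columns stand in for B-tree indexes.
member_keys = [user_id for user_id, _ in member]
meetup_keys = [meetup_id for meetup_id, _ in meetups]


def meetups_of(user_id):
    # Probe 1: find the range of member rows for this user, O(log n).
    lo = bisect_left(member_keys, user_id)
    hi = bisect_right(member_keys, user_id)
    names = []
    for _, meetup_id in member[lo:hi]:
        # Probe 2: one more O(log n) search for every related row.
        i = bisect_left(meetup_keys, meetup_id)
        names.append(meetups[i][1])
    return names


print(meetups_of(10))
```

Every additional relationship crossed adds another round of probes, and each probe gets slower as the tables grow.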

Page 20: OrientDB: Unlock the Value of Document Data Relationships

The Graph as an Index

• Given a User (Fabrizio)

• Traverse the member edges to reach HUG Ireland O(1) & Microservices O(1)

• Fabrizio is the index to reach the linked Meetups!

• Every vertex and edge is “hard wired” to its adjacent vertex or edge

• Traversing an edge does not require complex computation, near O(1)

• The traversal time is not affected by the database size

[Figure: the vertex Fabrizio connected by one member edge to HUG Ireland and by another member edge to Microservices]

Easier to sketch!
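Index-free adjacency can be sketched with plain object references: each vertex stores direct links to its neighbours, so reaching them involves no search over the rest of the data. This is a conceptual model only; the `Vertex` class and its methods are hypothetical and say nothing about how OrientDB lays records out on disk.

```python
class Vertex:
    def __init__(self, name):
        self.name = name
        self.out_edges = {}   # edge label -> list of directly referenced vertices

    def link(self, label, target):
        self.out_edges.setdefault(label, []).append(target)

    def out(self, label):
        # No index probe and no table scan: just follow the stored references.
        return self.out_edges.get(label, [])


fabrizio = Vertex("Fabrizio")
hug = Vertex("HUG Ireland")
microservices = Vertex("Microservices")
fabrizio.link("member", hug)
fabrizio.link("member", microservices)

print([v.name for v in fabrizio.out("member")])
```

The work done by `out` depends on how many edges leave that one vertex, not on how many vertices exist in total, which is why traversal time stays flat as the database grows.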

Page 21: OrientDB: Unlock the Value of Document Data Relationships

Combine Documents with Graphs

{
  "@rid": "12:216",
  "@class": "user",
  "name": "Fabrizio",
  "details": {
    "@type": "d",
    "@class": "user_detail",
    "city": "Dublin",
    "nationality": "IT"
  }
}

{ "@rid": "13:12", "@class": "meetup", "name": "HUG Ireland", "city": "Dublin" }

{ "@rid": "14:32", "@class": "member", "since": "14-03-2014", "in": "12:216", "out": "13:12" }

out_member=14:32, in_member=14:32

{ "@rid": "15:79", "@class": "talk", "title": "OrientDB", "on": "11-04-2016", "in": "12:216", "out": "13:12" }

out_talk=15:79, in_talk=15:79

Page 22: OrientDB: Unlock the Value of Document Data Relationships

Combine Documents with Graphs


Multi-relational Document Graph
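The four records can be modelled as plain dictionaries to show how a traversal hops from a vertex document to an edge document and on to the neighbouring vertex. This is an in-memory sketch, not the OrientDB API. Which vertex carries `out_member` and which carries `in_member` cannot be recovered from the slide, so the placement below is an assumption based on the convention that an edge's `out` field points at its source vertex; the `both` function is hypothetical and follows edges in either direction so the result does not depend on that choice.

```python
# Records keyed by record id, mirroring the four documents on the slide.
# Assumption: the out_*/in_* lists are placed to match each edge's in/out fields.
db = {
    "12:216": {"@class": "user", "name": "Fabrizio",
               "details": {"city": "Dublin", "nationality": "IT"},
               "in_member": ["14:32"], "in_talk": ["15:79"]},
    "13:12": {"@class": "meetup", "name": "HUG Ireland", "city": "Dublin",
              "out_member": ["14:32"], "out_talk": ["15:79"]},
    "14:32": {"@class": "member", "since": "14-03-2014",
              "in": "12:216", "out": "13:12"},
    "15:79": {"@class": "talk", "title": "OrientDB", "on": "11-04-2016",
              "in": "12:216", "out": "13:12"},
}


def both(rid, edge_class):
    """Yield (edge, neighbour) pairs reachable from `rid` in either direction."""
    vertex = db[rid]
    for edge_rid in vertex.get(f"out_{edge_class}", []):
        edge = db[edge_rid]
        yield edge, db[edge["in"]]
    for edge_rid in vertex.get(f"in_{edge_class}", []):
        edge = db[edge_rid]
        yield edge, db[edge["out"]]


for edge, meetup in both("12:216", "member"):
    print(meetup["name"], "member since", edge["since"])
for edge, meetup in both("12:216", "talk"):
    print(edge["title"], "at", meetup["name"], "on", edge["on"])
```

Because edges are documents too, they carry their own properties (`since`, `title`, `on`), and two vertices can be connected by several edges of different classes at once.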

Page 23: OrientDB: Unlock the Value of Document Data Relationships

Would you believe me if I said you can query documents and graphs with an SQL-like syntax?

Show me something now! OK, time for a quick demo.

http://www.sharegoodstuffs.com/2011_12_12_archive.html

Page 24: OrientDB: Unlock the Value of Document Data Relationships

Use Case: raise standards in Irish Public Office

Page 25: OrientDB: Unlock the Value of Document Data Relationships

The challenges

• Aggressive deadline

• Large amount of data from different sources with different formats

• Messy, dirty data

• Connect records from different sources that represent the same thing without a common identifier

• Multi-step traversal of fixed and inferred links to identify disparate entities connected by a path

Page 26: OrientDB: Unlock the Value of Document Data Relationships

The solution

OrientDB

Fuzzy Inference Engine

Page 27: OrientDB: Unlock the Value of Document Data Relationships

Technical Details

• Main Language: Groovy

• Database Type: OrientDB Embedded

• Fuzzy Inference Engine: Duke

  • minHash proximity index based on Lucene to avoid cartesian product

  • probabilistic model with configurable statistical algorithms (Levenshtein, NGram, Soundex, Custom, etc.) to identify the same entities despite differences

• End-To-End Process Time < 10 min

• Deliverable: Database

  • Preset of queries to answer the main questions (analysts can add or modify WHERE conditions on their own)

  • Graph View to visually search and visualise data
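To give a feel for how a string comparator such as Levenshtein can link records that lack a common identifier, here is a standalone sketch. It is not Duke's API (Duke is a Java library with its own configuration): the functions `similarity` and `same_entity` and the 0.8 threshold are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalise the distance to a score in [0, 1]; 1.0 means identical."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def same_entity(a: str, b: str, threshold: float = 0.8) -> bool:
    # Hypothetical decision rule: treat two values as the same entity
    # when their similarity reaches the threshold.
    return similarity(a, b) >= threshold


print(levenshtein("kitten", "sitting"))
print(same_entity("Fabrizio Fortino", "Fabrizo Fortino"))
```

Comparing every record with every other one would be the cartesian product that the proximity index is there to avoid, so a comparator like this would only be run on candidate pairs that the index considers close.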

Page 28: OrientDB: Unlock the Value of Document Data Relationships

What people from home perceived

≈ 20K tweets

Top hashtag in Ireland for 24 hours: #rteinvestigates

Page 29: OrientDB: Unlock the Value of Document Data Relationships

OK but what about Big Data?

“While we’ve long understood the value of Big Data to better understand how people interact with us, we’ve noticed an alarming trend of Big Data envy: organizations using complex tools to handle “not-really-that-big” Data. Distributed map-reduce algorithms are a handy technique for large data sets, but many data sets we see could easily fit in a single node relational or graph database. Even if you do have more data than that, usually the best thing to do is to first pick out the data you need, which can often then be processed on such a single node”

ThoughtWorks Technology Radar, 5 April 2016

Page 30: OrientDB: Unlock the Value of Document Data Relationships

Begin the journey!

https://www.udemy.com/orientdb-getting-started/

Page 31: OrientDB: Unlock the Value of Document Data Relationships

Resources

• http://martinfowler.com/bliki/PolyglotPersistence.html

• https://en.wikipedia.org/wiki/Multi-model_database

• http://orientdb.com/

• https://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg

• http://arxiv.org/pdf/1004.1001.pdf

• https://www.udemy.com/orientdb-getting-started/

• http://www.rte.ie/news/investigations-unit/2015/1207/751833-rte-investigates/

• https://github.com/larsga/Duke

• https://www.thoughtworks.com/radar

Page 32: OrientDB: Unlock the Value of Document Data Relationships

Q & A

Thank you!

Fabrizio Fortino @fabriziofortino, 11th April 2016, #HUGIreland @boistartups