Codemotion 2015 Infinispan Tech lab

ROME 27-28 march 2015 – Ugo Landini

Quick Start Lab JBoss Data Grid

Ugo Landini, Senior Solution [email protected], March 26th 2015



• Big Data & NoSQL: super quick introduction to terminology
• What developers do to scale out
• Consistent Hashing
• What’s a Data Grid
• DEMO
• Infinispan/JDG features
• Q&A

Agenda





new generation of technologies ... designed to economically extract value from very large volumes of a wide variety of data, by enabling high velocity capture, discovery and/or analysis

IDC, 2012

Big Data


Not Only SQL

Just an alternative to RDBMS

NoSQL


• K/V Store
• Document Store
• Column based DB
• Graph DB
• XML, Object DB, Multidimensional, Grid/Cloud, …

see map on https://451research.com/images/Marketing/dataplatformsmapoctober2014.pdf

NoSQL



We’re here

NoSQL


• Very hard to categorise in a systematic way
• Many nuances
• Many cases of “Evolutionary Convergence”
  • i.e. evolving similar features having to adapt to similar environments

NoSQL


CAP Theorem


• Brewer’s Theorem (2000, proven in 2002)
• Three guarantees of a Distributed System
  • Consistency
  • Availability
  • Partition Tolerance

CAP Theorem


All nodes see the same data at the same time

Consistency


A guarantee that every request receives a response about whether it succeeded or failed

Availability


The system continues to operate despite arbitrary message loss or failure of part of the system

Partition Tolerance


Consistency: Transactions

Availability: Redundancy

Partition Tolerance: Scaleout

CAP: Popular Version





[Diagram: CAP diagram with regions labelled NO GO, RDBMS and NoSQL]



Brewer wrote an essay in 2012 to clarify some of the CAP implications

http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed

CAP: Modern Version



The "two out of three" concept can be misleading or misapplied and it should be considered as a tautology

Many vendors used CAP theorem just as an excuse to sacrifice Consistency

CAP: Modern Version


Partitions are rare, so there is little reason to forfeit C or A when the system is not partitioned

The choice between C and A can occur many times within the same system at very fine granularity

CAP: Modern Version


Different decisions about C and A:

• for different operations
• for different data
• in different moments

CAP: Modern Version


Finally, C, A and P are more continuous than binary:

• A is obviously continuous
• Many levels of Consistency (think isolation levels in a classic DB)
• Even Partitions have nuances, including disagreement within the system about whether a partition exists

CAP: Modern Version




[Diagram: Virtual Machine 1 hosts the Client and its Cache; the Cache reads & writes to the RDBMS]

• Single JVM
• Little memory
• No HA

Local Caching

[Diagram: Virtual Machine 1 (Client 1, Cache 1) and Virtual Machine 2 (Client 2, Cache 2) share one RDBMS]

1. Client 1 reads A
2. Client 1 writes A to Cache 1
3. Client 2 writes A2 to RDBMS
4. Client 1 reads A from Cache 1

First try at distributed caching


Distributed Caching on many nodes

What about dirty reads? (i.e. how to cope with multiple writes, invalidation, etc.)
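The stale read in the first try can be reproduced with two dictionaries. This is only a sketch of the scenario: `read_through`, `rdbms`, `cache_1` and the key name `"item"` are illustrative names, not part of any JDG API.

```python
# Illustrative stand-ins: a dict for the RDBMS and one for Cache 1.
rdbms = {"item": "A"}
cache_1 = {}


def read_through(cache, db, key):
    """Return the cached value, loading it from the database on a miss."""
    if key not in cache:
        cache[key] = db[key]
    return cache[key]


# Steps 1-2: Client 1 reads A and the value lands in Cache 1.
first_read = read_through(cache_1, rdbms, "item")

# Step 3: Client 2 writes A2 straight to the RDBMS; Cache 1 is not told.
rdbms["item"] = "A2"

# Step 4: Client 1 reads again and still gets the old value from Cache 1.
second_read = read_through(cache_1, rdbms, "item")
```

After step 4 the database holds `A2` while Client 1 still sees `A`, which is the dirty read the question above refers to.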


[Diagram: Virtual Machine 1 (Client 1, Cache 1) and Virtual Machine 2 (Client 2, Cache 2) share one RDBMS]

1. Client 2 writes A2 to RDBMS
2. Client 2 updates Cache 2
3. sync Caches
4. Client 1 reads A2 from Cache 1

Second try at distributed caching



New Cache topology
Startup time
State transfers
Incompatible JVM tunings
GCs
Non Java clients





Hashing Wheel: a mathematical “wheel” on which you hash Ks (keys) and Ns (nodes).

The relative position of Ks and Ns determines which Node is the “owner” of that particular K in a topology

Consistent Hashing


[Diagram: hashing wheel with three nodes: N1 (Node 1), N2 (Node 2) and N3 (Node 3)]

Consistent Hashing


Ns (nodes) on the “wheel” partition the hash space in segments

Every segment contains a range of Ks

Consistent Hashing


[Diagram: key K250 is placed on the wheel with N1, N2 and N3; K250 owner = N2]

Consistent Hashing


[Diagram: keys K250, K570, K700, K900 and K53 placed on the wheel with N1, N2 and N3]

Consistent Hashing


Going clockwise from the K:

• the first N is the owner
• the next N is the replica
• the N after that could be another replica, and so on

Consistent Hashing
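The clockwise rule can be sketched in a few lines. Everything here is an assumption made for illustration: a wheel of 1000 positions, MD5 as a stand-in hash (not the hash JDG uses), and node positions picked by hand so that the result matches the K250 example.

```python
import bisect
import hashlib

WHEEL_SIZE = 1000  # hypothetical size of the hash space


def position(name):
    """Hash a key or node name onto the wheel (MD5 is only a stand-in)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % WHEEL_SIZE


def owners(key_pos, node_positions, copies=2):
    """Walk clockwise from key_pos: the first node is the owner, the next ones are replicas."""
    ring = sorted((pos, node) for node, pos in node_positions.items())
    positions = [pos for pos, _ in ring]
    start = bisect.bisect_left(positions, key_pos)  # first node at or after the key
    count = min(copies, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hand-picked node positions (an assumption, chosen to mirror the slides).
NODES = {"N1": 100, "N2": 400, "N3": 800}

k250_owners = owners(250, NODES)  # owner first, then replica
k900_owners = owners(900, NODES)  # wraps around the end of the wheel
```

With these positions K250 gets N2 as owner and N3 as replica, and K900 wraps past the end of the wheel to N1. In a real cluster the node positions would come from `position(...)` or a similar hash rather than being fixed by hand.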


[Diagram: the same wheel and keys (K250, K570, K700, K900, K53), annotated with owner = N2, replica = N3]

Consistent Hashing


What happens if a node dies?

Consistent Hashing


[Diagram sequence: Node 2 (N2) leaves the wheel; only N1 and N3 remain to own keys K250, K570, K700, K900 and K53]

Consistent Hashing


• The real CH algorithm implemented in JDG is slightly different
• CH is optimized to minimize state transfer (i.e. the number of keys moving when a node dies or a new one joins the cluster)

Consistent Hashing
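A small sketch of why the wheel keeps state transfer low: when a node leaves, only the keys it owned get a new owner. This uses the simple textbook wheel, not the actual JDG algorithm, and it assumes the key labels double as positions on a 0-999 wheel and that the nodes sit at hand-picked positions.

```python
import bisect


def owner(key_pos, node_positions):
    """First node found going clockwise from key_pos."""
    ring = sorted((pos, node) for node, pos in node_positions.items())
    idx = bisect.bisect_left([pos for pos, _ in ring], key_pos)
    return ring[idx % len(ring)][1]


def moved_keys(keys, before, after):
    """Keys whose owner differs between the two topologies."""
    return [k for k in keys if owner(k, before) != owner(k, after)]


# Assumed positions, for illustration only.
BEFORE = {"N1": 100, "N2": 400, "N3": 800}
AFTER = {"N1": 100, "N3": 800}  # N2 has left the cluster
KEYS = [53, 250, 570, 700, 900]

moved = moved_keys(KEYS, BEFORE, AFTER)
```

Here only key 250, the one N2 owned, changes owner; the other four keys stay where they were.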





Distributed Memory Storage Engine
Networked Memory
A Distributed Cache “on steroids”
A Transactional NoSQL

What’s a Data Grid?


• Key/Value storage
• Search Engine (from K/V to Document storage)
• Linear Scalability, Elasticity and Fault tolerance
  • Thanks to CH
• Memory based
  • Persistence engines are optional

What’s a Data Grid?


• Different Topologies
• Querying
• Task Execution & Map/Reduce
• Partition Handling
• Data Affinity (to squeeze every bit of performance)

Data Grid > Distributed Caching


• LOCAL
  • Simple cache (EHCache-like)
• INVALIDATION
  • No sharing
• REPLICATED
  • All nodes are equal
  • 4 Nodes @ 8 GB = 8 GB
• DISTRIBUTED
  • For example: 1 Replica
  • 4 Nodes @ 8 GB = 16 GB

JDG Cache Topologies (Cluster modes)
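The two capacity figures follow from simple arithmetic, sketched here. The function name is hypothetical, and the calculation ignores JVM and per-entry overhead.

```python
def usable_capacity_gb(nodes, gb_per_node, mode, replicas=1):
    """Rough usable capacity of the grid, ignoring JVM and per-entry overhead."""
    if mode == "REPLICATED":
        # Every node holds a full copy, so the grid holds what one node holds.
        return float(gb_per_node)
    if mode == "DISTRIBUTED":
        # Each entry is stored once on its owner plus once per replica.
        return nodes * gb_per_node / (replicas + 1)
    raise ValueError("unsupported mode: " + mode)


replicated_gb = usable_capacity_gb(4, 8, "REPLICATED")
distributed_gb = usable_capacity_gb(4, 8, "DISTRIBUTED", replicas=1)
```

For 4 nodes of 8 GB this gives 8 GB in replicated mode and 16 GB in distributed mode with one replica.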


[Diagram: a cluster of 4 JDG nodes on 2 servers: JDG 1 and JDG 2 on Server A, JDG 3 and JDG 4 on Server B]

A Simple Grid

[Diagram: keys K0 to K9 spread across JDG 1, JDG 2, JDG 3 and JDG 4; each key is stored on exactly one node]

Distributed without Replica

[Diagram: keys K0 to K9 spread across the four nodes; each key appears on two nodes]

Distributed with Replica

[Diagram: each of the four nodes holds a full copy of keys K0 to K9]

Replicated


• Replicated:
  • “Small” set of data with high % of reads vs writes
• Distributed:
  • “Big” set of data: linear scaling
  • You need M/R & Distexec

How do I choose?


• You can have different Cache configurations in the same CacheManager
  • Mix & match Replicated and Distributed as needed



• Default hashing (Distributed mode): MurmurHash3
• It’s a simple and standard hash function
  • You can change it as you like, e.g. if your key already identifies a partitioning criterion

Tuning your hashing


• Can be “fine tuned” in 4 different ways:
  • Server Hinting
  • Virtual Servers
  • Grouping
  • Key Affinity

Tuning your hashing


• A triple (site, rack, server)
• You increase availability by preventing replicas from ending up on the same (site, rack, server) as the master

Server Hinting


• Number of “segments” into which the cluster is partitioned
• Improves the distribution of nodes on the hashing wheel, giving a better distribution of keys
• Default: 60

Virtual Servers


• Data colocation
  • A cache node contains K but also other relevant data related to K
  • Example: a customer and its bank transactions
• You just have to define a group; JDG will colocate all data of the same group on the same node

Grouping
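The idea behind grouping can be sketched as "hash the group instead of the key". This is not the JDG Grouping API: the node list, the MD5 stand-in hash, the modulo placement and the key names are all assumptions made to keep the example short.

```python
import hashlib

NODES = ["JDG 1", "JDG 2", "JDG 3", "JDG 4"]  # illustrative node names


def node_for(routing_value):
    """Pick a node from a stand-in hash of the routing value."""
    digest = hashlib.md5(routing_value.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def place(key, group=None):
    """Route by the group when one is defined, otherwise by the key itself."""
    return node_for(group if group is not None else key)


# A customer and its bank transactions share the group "customer:42",
# so all three entries land on the same node.
customer_node = place("customer:42", group="customer:42")
tx1_node = place("transaction:1001", group="customer:42")
tx2_node = place("transaction:1002", group="customer:42")
```

Because the three entries share a group, they resolve to the same node regardless of how their own keys would hash.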


• Like Grouping, but from another perspective:
  • You just ask a node for a key that will be hashed to that node
• Grouping/Affinity are your best friends if you want to reach JDG Nirvana!

Key Affinity


•All data needed by a node of your application are local, at the distance of a single Java method call

JDG Nirvana





•Small self-contained projects that can be used to simply explain JDG to customers

•https://github.com/redhat-italy/jdg-quickstarts

JDG Quickstarts






• If JDG detects a split brain, partitions enter degraded mode
• A degraded partition can read/write ONLY fully owned keys
  • A partition fully owns a key if it contains the master and replica nodes for that key
• You’ll get an AvailabilityException for other keys

Partition Handling
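The "fully owned" rule can be sketched as a subset check. The `AvailabilityException` class below is a local stand-in named after the exception mentioned above, and the node names and owner table are made up for the example.

```python
class AvailabilityException(Exception):
    """Local stand-in, named after the exception a degraded partition raises."""


def fully_owns(partition, key_owners):
    """True when the master and every replica of the key are inside the partition."""
    return set(key_owners) <= set(partition)


def write(partition, key, owners_by_key, store, value):
    """Write only if the degraded partition fully owns the key."""
    if not fully_owns(partition, owners_by_key[key]):
        raise AvailabilityException(key)
    store[key] = value


# Hypothetical 4-node cluster split into {N1, N2} and {N3, N4}.
OWNERS = {"K1": ["N1", "N2"], "K2": ["N2", "N3"]}
left_partition = {"N1", "N2"}
store = {}

write(left_partition, "K1", OWNERS, store, "v1")  # allowed: both owners are reachable
try:
    write(left_partition, "K2", OWNERS, store, "v2")
    k2_rejected = False
except AvailabilityException:
    k2_rejected = True  # N3 sits on the other side of the split
```

K1 is writable because both of its owners are in the same partition; K2 is rejected because its replica is on the other side of the split.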


• Cache Store
  • Not only in memory!
  • Write through & write behind (ACK sync or async)
• Pluggable “drivers”
  • File System, JPA, LevelDB (supported)
  • MongoDB, Cassandra, BerkeleyDB, etc. (community)

Persistence


• To avoid Out Of Memory
• Entries can be “passivated” to disk (you’ll need a CacheStore)

Eviction


•You assign a lifespan or a max idle time to a key

•The key will then be automatically removed after that time

•You don’t need to write “Garbage Clean code”

Expiry
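A minimal sketch of lifespan and max idle semantics, assuming expired entries are dropped lazily on read (a real grid may also remove them in the background). `ExpiringCache` and its injected clock are illustrative, not a JDG class.

```python
class ExpiringCache:
    """Toy cache with per-key lifespan and max idle time, both in clock units."""

    def __init__(self, clock):
        self._clock = clock  # callable returning the current time
        self._entries = {}   # key -> (value, created, last_access, lifespan, max_idle)

    def put(self, key, value, lifespan=None, max_idle=None):
        now = self._clock()
        self._entries[key] = (value, now, now, lifespan, max_idle)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, created, last_access, lifespan, max_idle = entry
        now = self._clock()
        too_old = lifespan is not None and now - created >= lifespan
        too_idle = max_idle is not None and now - last_access >= max_idle
        if too_old or too_idle:
            del self._entries[key]  # expired entries are removed on access
            return None
        self._entries[key] = (value, created, now, lifespan, max_idle)
        return value


# A fake clock makes the behaviour easy to follow.
now = [0]
cache = ExpiringCache(lambda: now[0])
cache.put("a", 1, lifespan=10)
now[0] = 5
before_lifespan = cache.get("a")
now[0] = 10
after_lifespan = cache.get("a")
```

The entry is readable at time 5 and gone at time 10, without the application writing any cleanup code.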




• Both avoid Out Of Memory
• “Evicted” data can be maintained in the Grid with Passivation
• Eviction is a Cache configuration
• Expiration is a Key configuration
• Expiration could be a business requirement
• Eviction is a system feature

Eviction/Expiry: differences


• JDG has full support for transactions
  • Local Transactions
  • Global Transactions (XA): when running inside an AS, JDG automatically uses its TX Manager
• Batching API

Transactions


• Cache/CacheManager events
• Topology changes
• Entries being added, removed, modified
• Cluster listeners

Listener/Notifications


• Infinispan-query module
• Hibernate Search & Lucene
• Querying via DSL
• Lucene indexes can be kept in memory, on disk or in the grid

Querying the grid


• With M/R you can implement distributed global operations on the grid
• Each node works on its data (Map)
• Results are later aggregated (Reduce)

Map/Reduce
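The two phases can be sketched with a word count. This is plain Python, not the JDG Map/Reduce API: each dictionary stands in for the entries held by one node, and the data is invented.

```python
from collections import Counter
from functools import reduce


def map_phase(local_entries):
    """Runs on one node: count the words in the values that node holds."""
    counts = Counter()
    for text in local_entries.values():
        counts.update(text.split())
    return counts


def reduce_phase(partials):
    """Aggregates the per-node results into one global count."""
    return reduce(lambda a, b: a + b, partials, Counter())


# One dictionary per node (illustrative data).
NODE_DATA = [
    {"k1": "data grid"},
    {"k2": "data cache"},
    {"k3": "grid"},
]

totals = reduce_phase(map_phase(entries) for entries in NODE_DATA)
```

Each node only touches its own entries in the map phase; the reduce phase sees three small counters rather than the raw data.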




• JDG 7 will implement the HDFS API
  • So it will be able to act as a super fast Hadoop store

Hadoop, coming soon…


•With Distexec you can submit “tasks” to the Grid

•The task can be executed on each node or on a subset of the nodes

•The task can modify data in the Grid

Distributed Execution (Distexec)




• “Follow the Sun” architectures
• Many different clusters that can be kept in sync

Cross Site Replication


• JSR-107
  • Java Temporary Caching API
  • Confirmed in January 2015
  • In roadmap for JDG 6.5
• JSR-347
  • Data Grids for the Java Platform
  • JSR Retired in January 2015

Standard APIs


• Command Line Console
• JMX
• JON Plugin

Management Tooling


• User Authentication
  • SASL
• Role Based Access Control (RBAC)
  • Users, Roles and mapping between roles and operations on Cache / Cache-Manager
• Node Authentication & Authorisation
• Encrypted communication between nodes

Data Security


• Library mode
  • Embedded in your JVM
• C/S mode
  • REST
  • Memcached
  • Hot Rod

Embedded vs Client/Server




                         Library mode  REST                    Memcached               Hot Rod
Protocol                 inVM          Text                    Text                    Binary
Client Libs              N/A           HTTP                    Many                    Java/Python/C++
Smart Routing            Yes           No                      No                      Yes
Load Balancing/Failover  Dynamic       Any HTTP load balancer  Predefined server list  Dynamic
TX                       Yes           No                      No                      Local w MVCC
Listeners                Yes           No                      No                      Yes (6.4)
M/R                      Yes           No                      No                      No
Dist                     Yes           No                      No                      No
Querying                 Yes           No                      No                      Yes (6.3)
Separated Cluster        No            Yes                     Yes                     Yes

Protocol Comparison


Q&A


Thank You! Leave your feedback on Joind.in!
https://joind.in/event/view/3347
