Codemotion 2015 Infinispan Tech lab

Rome, 27-28 March 2015. Ugo Landini, Senior Solution Architect. Quick Start Lab: JBoss Data Grid. March 26th 2015.

Transcript of Codemotion 2015 Infinispan Tech lab


Agenda

• Big Data & NoSQL: super quick introduction to terminology
• What developers do to scale out
• Consistent Hashing
• What’s a Data Grid
• DEMO
• Infinispan/JDG features
• Q&A


Big Data

“... new generation of technologies ... designed to economically extract value from very large volumes of a wide variety of data, by enabling high velocity capture, discovery and/or analysis” (IDC, 2012)

NoSQL

Not Only SQL

Just an alternative to RDBMS

• K/V Store
• Document Store
• Column based DB
• Graph DB
• XML, Object DB, Multidimensional, Grid/Cloud, …

See the map at https://451research.com/images/Marketing/dataplatformsmapoctober2014.pdf

[Figure: NoSQL landscape map, with a “We’re here” marker]

• Very hard to categorise in a systematic way
• Many nuances
• Many cases of “Evolutionary Convergence”, i.e. evolving similar features having to adapt to similar environments

CAP Theorem

• Brewer’s Theorem (2000, proven in 2002)
• Three guarantees of a Distributed System
  • Consistency
  • Availability
  • Partition Tolerance

Consistency

All nodes see the same data at the same time

Availability

A guarantee that every request receives a response about whether it succeeded or failed

Partition Tolerance

The system continues to operate despite arbitrary message loss or failure of part of the system

CAP: Popular Version

• Consistency: Transactions
• Availability: Redundancy
• Partition Tolerance: Scaleout

[Slide builds: the same list is shown three more times, overlaid in turn with “NO GO”, “RDBMS” and “NoSQL”]

CAP: Modern Version

Brewer wrote an essay in 2012 to clarify some of the CAP implications:

http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed

• The "two out of three" concept can be misleading or misapplied, and it should be considered a tautology
• Many vendors used the CAP theorem just as an excuse to sacrifice Consistency
• Partitions are rare, so there is little reason to forfeit C or A when the system is not partitioned
• The choice between C and A can occur many times within the same system at very fine granularity
• Different decisions about C and A can be taken:
  • for different operations
  • for different data
  • at different moments
• Finally, C, A and P are more continuous than binary:
  • A is obviously continuous
  • There are many levels of Consistency (think of isolation levels in a classic DB)
  • Even Partitions have nuances, including disagreement within the system about whether a partition exists


Local Caching

[Diagram: Virtual Machine 1 hosts a Client and its Cache; read & write operations go to the RDBMS]

• Single JVM
• Little memory
• No HA

First try at distributed caching

[Diagram: Virtual Machine 1 (Client 1, Cache 1) and Virtual Machine 2 (Client 2, Cache 2) share one RDBMS]

1. Client 1 reads A
2. Client 1 writes A to Cache 1
3. Client 2 writes A2 to RDBMS
4. Client 1 reads A from Cache 1

Distributed caching on many nodes: what about dirty reads? (i.e. how to cope with multiple writes, invalidation, etc.)
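The four steps of the first try can be replayed in a few lines. This is only a sketch: plain dicts stand in for the RDBMS and for the two local caches, and the `read` helper is a hypothetical read-through function, not a JDG API.

```python
# Hypothetical stand-ins: plain dicts play the RDBMS and the two local caches.
rdbms = {"A": "A"}
cache1, cache2 = {}, {}

def read(cache, key):
    """Read-through: serve from the local cache, fall back to the RDBMS."""
    if key not in cache:
        cache[key] = rdbms[key]
    return cache[key]

# 1-2. Client 1 reads A and keeps it in Cache 1
first = read(cache1, "A")
# 3. Client 2 writes A2 straight to the RDBMS
rdbms["A"] = "A2"
# 4. Client 1 reads A again: Cache 1 still holds the old value
second = read(cache1, "A")
```

Nothing ever tells Cache 1 that the RDBMS changed, so `second` is the stale value.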

Second try at distributed caching

[Diagram: Virtual Machine 1 (Client 1, Cache 1) and Virtual Machine 2 (Client 2, Cache 2) share one RDBMS]

1. Client 2 writes A2 to RDBMS
2. Client 2 updates Cache 2
3. sync Caches
4. Client 1 reads A2 from Cache 1

• New Cache topology
• Startup time
• State transfers
• Incompatible JVM tunings
• GCs
• Non Java clients
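A minimal sketch of the second try, assuming a toy `SyncedCache` class (a made-up name) that pushes every write to its peers. Real grids do this over the network, with all the topology and state-transfer concerns listed above; here the peers are just objects in the same process.

```python
class SyncedCache:
    """Toy local cache that pushes every update to its peers (illustrative only)."""

    def __init__(self, rdbms):
        self.rdbms = rdbms
        self.data = {}
        self.peers = []

    def write(self, key, value):
        self.rdbms[key] = value      # 1. write to the RDBMS
        self.data[key] = value       # 2. update the local cache
        for peer in self.peers:      # 3. sync the other caches
            peer.data[key] = value

    def read(self, key):
        if key not in self.data:
            self.data[key] = self.rdbms[key]
        return self.data[key]


rdbms = {"A": "A"}
cache1, cache2 = SyncedCache(rdbms), SyncedCache(rdbms)
cache1.peers.append(cache2)
cache2.peers.append(cache1)

cache1.read("A")             # Cache 1 now holds the old value
cache2.write("A", "A2")      # Client 2 writes A2
value = cache1.read("A")     # 4. Client 1 reads A2 from Cache 1
```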


Consistent Hashing

Hashing Wheel: a mathematical “wheel” on which you hash Ks (keys) and Ns (nodes).

The relative position of Ks and Ns determines which Node is the “owner” of a particular K in a topology.

[Diagram: hashing wheel with three nodes: N1 (Node 1), N2 (Node 2), N3 (Node 3)]

Ns (nodes) on the “wheel” partition the hash space into segments. Every segment contains a range of Ks.

[Diagram: key K250 placed on the wheel; K250 owner = N2]

[Diagram: keys K53, K250, K570, K700 and K900 placed on the wheel]

Going clockwise from the K:

• the first N is the owner
• the next N is the replica
• the N after that could be another replica, and so on

[Diagram: the same wheel and keys, annotated “owner = N2, replica = N3”]
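The clockwise rule fits in a few lines. In this sketch the wheel has 1000 positions and the node positions (100, 400, 800) are assumptions, since the slides do not give them; a key such as K250 is taken to sit at position 250.

```python
from bisect import bisect_left

# Assumed node positions on the wheel (not given in the slides).
NODES = {100: "N1", 400: "N2", 800: "N3"}

def owners(key_pos, nodes, num_owners=2):
    """Walk clockwise from key_pos: the first node is the owner, the next ones are replicas."""
    positions = sorted(nodes)
    start = bisect_left(positions, key_pos) % len(positions)  # wrap around the wheel
    count = min(num_owners, len(positions))
    return [nodes[positions[(start + i) % len(positions)]] for i in range(count)]

k250 = owners(250, NODES)   # owner first, then the replica
k900 = owners(900, NODES)   # past the last node, so the walk wraps to N1
```

With these positions K250 gets N2 as owner and N3 as replica, as in the diagram.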

Consistent Hashing

What happens if a node dies?

[Diagram: N2 leaves the wheel; the annotation changes from “owner = N2, replica = N3” to “owner = N3, replica = N1”]

The real CH algorithm implemented in JDG is slightly different. CH is optimized to minimize state transfer (i.e. the number of keys moving when a node dies or a new one joins the cluster).
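To see why few keys move, here is a sketch of the simple wheel (not the real JDG algorithm) with assumed node positions 100, 400 and 800. It compares key owners before and after N2 dies.

```python
from bisect import bisect_left

def owner(key_pos, nodes):
    """First node met going clockwise from key_pos (nodes maps position to name)."""
    positions = sorted(nodes)
    return nodes[positions[bisect_left(positions, key_pos) % len(positions)]]

# Assumed positions; key names follow the slides (K250 sits at 250, and so on).
before = {100: "N1", 400: "N2", 800: "N3"}
after = {pos: name for pos, name in before.items() if name != "N2"}  # N2 dies

keys = [53, 250, 570, 700, 900]
moved = [k for k in keys if owner(k, before) != owner(k, after)]
```

Only K250, the key that N2 owned, changes owner; every other key stays where it was.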


What’s a Data Grid?

• Distributed Memory Storage Engine
• Networked Memory
• A Distributed Cache “on steroids”
• A Transactional NoSQL

• Key/Value storage
• Search Engine (from K/V to Document storage)
• Linear Scalability, Elasticity and Fault tolerance, thanks to CH
• Memory based; persistence engines are optional

Data Grid > Distributed Caching

• Different Topologies
• Querying
• Task Execution & Map/Reduce
• Partition Handling
• Data Affinity (to squeeze every bit of performance)

JDG Cache Topologies (Cluster modes)

• LOCAL
  • simple cache (EHCache-like)
• INVALIDATION
  • no sharing
• REPLICATED
  • All nodes are equal
  • 4 Nodes @ 8 GB = 8 GB
• DISTRIBUTED
  • For example: 1 Replica
  • 4 Nodes @ 8 GB = 16 GB
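The two capacity figures follow from one division: total memory over the number of copies kept of each entry. A small sketch, where `usable_capacity_gb` is a made-up helper name and overhead is ignored:

```python
def usable_capacity_gb(nodes, gb_per_node, copies):
    """Total memory divided by the number of copies kept of each entry."""
    return nodes * gb_per_node / copies

# REPLICATED: every one of the 4 nodes holds every entry, so 4 copies
replicated = usable_capacity_gb(4, 8, copies=4)
# DISTRIBUTED with 1 replica: owner + 1 replica, so 2 copies
distributed = usable_capacity_gb(4, 8, copies=2)
```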

A Simple Grid

[Diagram: one cluster of 4 JDG nodes on 2 servers: JDG 1 and JDG 2 on Server A, JDG 3 and JDG 4 on Server B]

Distributed without Replica

[Diagram: keys K0 to K9 spread across JDG 1 to JDG 4, each key stored once]

Distributed with Replica

[Diagram: keys K0 to K9 spread across JDG 1 to JDG 4, each key stored twice]

Replicated

[Diagram: each of JDG 1 to JDG 4 holds all the keys K0 to K9]

How do I choose?

• Replicated:
  • “Small” set of data with high % of reads vs writes
• Distributed:
  • “Big” set of data: linear scaling
  • You need M/R & Distexec

JDG Cache Topologies (Cluster modes)

• You can have different Cache configurations in the same CacheManager
• Mix & match Replicated and Distributed as needed

Tuning your hashing

• Default hashing (Distributed mode): MurmurHash3
• It’s a simple and standard hash: you can change it as you like, e.g. if your key already identifies a partitioning criterion
• It can be “fine tuned” in 4 different ways:
  • Server Hinting
  • Virtual Servers
  • Grouping
  • Key Affinity

Server Hinting

• A triple (site, rack, server)
• You increase availability by keeping replicas off the same (site, rack, server) as the master

Virtual Servers

• Number of “segments” into which the cluster is partitioned
• Improves the node distribution on the hashing wheel, giving a better distribution of keys
• Default: 60

Grouping

• Data colocation
• A cache node contains K but also other relevant data related to K
• Example: a customer and its bank transactions
• You just have to define a group; JDG will colocate all data of the same group on the same node
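Grouping boils down to hashing the group instead of the full key. The sketch below assumes a hypothetical key format `customerId:entryId` and uses a plain modulo over four nodes for brevity; it is not how JDG picks owners.

```python
import hashlib

NODES = ["JDG 1", "JDG 2", "JDG 3", "JDG 4"]

def node_for(routing_key):
    """Pick a node from a stable hash of the routing key (simple modulo, for brevity)."""
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:4], "big") % len(NODES)]

def group_of(key):
    """Hypothetical key format 'customerId:entryId': the group is the customer id."""
    return key.split(":", 1)[0]

keys = ["c42:profile", "c42:tx-1", "c42:tx-2"]
# Hash the group, not the whole key, so one customer's data shares a node
placement = {key: node_for(group_of(key)) for key in keys}
```

All three entries of customer `c42` resolve to the same node, whichever node that is.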

Key Affinity

• Like Grouping, but from another perspective: you just ask a node for a key that will be hashed on that node
• Grouping/Affinity are your best friends if you want to reach JDG Nirvana!

JDG Nirvana

• All the data needed by a node of your application is local, at the distance of a single Java method call


JDG Quickstarts

• Small self-contained projects that can be used to simply explain JDG to customers
• https://github.com/redhat-italy/jdg-quickstarts


Partition Handling

• If JDG detects a split brain, partitions enter degraded mode
• A degraded partition can read/write ONLY fully owned keys
  • A partition fully owns a key if it contains the master and replica nodes for that key
• You’ll get an AvailabilityException for other keys
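The “fully owned” rule can be sketched as a subset check. Everything here is illustrative: the `AvailabilityException` class is a local stand-in named after the exception above, and the key layout is assumed.

```python
class AvailabilityException(Exception):
    """Local stand-in named after the exception mentioned above."""

def check_access(key, key_owners, partition_members):
    """In degraded mode, allow a key only if all of its owners are in this partition."""
    if not set(key_owners[key]).issubset(partition_members):
        raise AvailabilityException(key)
    return True

# Assumed layout: each key has a master and one replica.
key_owners = {"K1": ["N1", "N2"], "K2": ["N2", "N3"]}
partition = ["N1", "N2"]   # N3 is on the other side of the split

k1_available = check_access("K1", key_owners, partition)   # fully owned
try:
    k2_available = check_access("K2", key_owners, partition)
except AvailabilityException:
    k2_available = False   # the replica on N3 is unreachable
```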

Persistence

• Cache Store
  • Not only in memory!
  • Write through & write behind (ACK sync or async)
• Pluggable “drivers”
  • File System, JPA, LevelDB (supported)
  • MongoDB, Cassandra, BerkeleyDB, etc. (community)

Eviction

• To avoid Out Of Memory
• Entries can be “passivated” on disk (you’ll need a CacheStore)
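A toy version of eviction with passivation, to make the mechanics concrete. The LRU policy, the `EvictingCache` name and the dict used as the store are all assumptions of this sketch.

```python
from collections import OrderedDict

class EvictingCache:
    """Toy LRU cache: entries over max_entries are passivated to a store (a dict here)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.memory = OrderedDict()
        self.store = {}                      # stands in for a CacheStore on disk

    def put(self, key, value):
        self.store.pop(key, None)            # the in-memory copy supersedes a passivated one
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.max_entries:
            old_key, old_value = self.memory.popitem(last=False)
            self.store[old_key] = old_value  # passivate the least recently used entry

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        if key in self.store:
            self.put(key, self.store[key])   # activate: bring it back into memory
            return self.memory[key]
        return None


cache = EvictingCache(max_entries=2)
for k in ("K0", "K1", "K2"):
    cache.put(k, k.lower())
```

After the three puts, K0 has been passivated to the store; reading it again activates it and pushes the next least recently used entry out.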

Expiry

• You assign a lifespan or a max idle time to a key
• The key will then be automatically removed after that time
• You don’t need to write “garbage cleaning” code
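Lifespan and max idle time can be sketched with a toy cache that takes the clock as an argument, so the behaviour is easy to follow. The `ExpiringCache` class and its method signatures are made up for the example and are not the JDG API.

```python
class ExpiringCache:
    """Toy cache with a per-entry lifespan and max idle time, both in seconds."""

    def __init__(self):
        self.entries = {}   # key -> (value, created, last_access, lifespan, max_idle)

    def put(self, key, value, now, lifespan=None, max_idle=None):
        self.entries[key] = (value, now, now, lifespan, max_idle)

    def get(self, key, now):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, created, last_access, lifespan, max_idle = entry
        too_old = lifespan is not None and now - created >= lifespan
        too_idle = max_idle is not None and now - last_access >= max_idle
        if too_old or too_idle:
            del self.entries[key]    # expired: removed without any caller cleanup
            return None
        self.entries[key] = (value, created, now, lifespan, max_idle)
        return value


cache = ExpiringCache()
cache.put("session", "data", now=0, lifespan=60, max_idle=10)
```

Reads within 10 seconds of each other keep the entry alive until the 60 second lifespan runs out; a longer gap removes it.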

Eviction/Expiry: differences

• Both avoid Out Of Memory
• “Evicted” data can be maintained in the Grid with Passivation
• Eviction is a Cache configuration
• Expiration is a Key configuration
• Expiration could be a business requirement
• Eviction is a system feature

Transactions

• JDG has full support for transactions
  • Local Transactions
  • Global Transactions (XA): when running inside an application server, JDG automatically uses its TX Manager
• Batching API

Listener/Notifications

• Cache/CacheManager events
• Topology changes
• Entries being added, removed, modified
• Cluster listeners

Querying the grid

• Infinispan-query module
• Hibernate Search & Lucene
• Querying via DSL
• Lucene indexes can be kept in memory, on disk or in the grid

Map/Reduce

• With M/R you can implement distributed global operations on the grid
• Each node works on its data (Map)
• Results are later aggregated (Reduce)
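A word count makes the two phases concrete. In this sketch the nodes are plain dicts in one process and the data layout is assumed; on a real grid each map step would run on the node that holds the data.

```python
from collections import Counter
from functools import reduce

# Assumed layout: each node holds its own slice of the data.
nodes = {
    "JDG 1": {"K0": "infinispan data grid"},
    "JDG 2": {"K1": "data grid", "K2": "grid"},
}

def map_phase(local_entries):
    """Runs on one node, over the entries stored on that node only."""
    counts = Counter()
    for text in local_entries.values():
        counts.update(text.split())
    return counts

# Map: one partial result per node
partials = [map_phase(entries) for entries in nodes.values()]
# Reduce: merge the partial results into the global answer
totals = reduce(lambda a, b: a + b, partials, Counter())
```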

Hadoop, coming soon…

• JDG 7 will implement the HDFS API
• So it will be able to act as a super fast Hadoop store

Distributed Execution (Distexec)

• With Distexec you can submit “tasks” to the Grid
• The task can be executed on each node or on a subset of the nodes
• The task can modify data in the Grid

Cross Site Replication

• “Follow the Sun” architectures
• Many different clusters that can be kept in sync

Standard APIs

• JSR-107
  • Java Temporary Caching API
  • Confirmed in January 2015
  • In roadmap for JDG 6.5
• JSR-347
  • Data Grids for the Java Platform
  • JSR Retired in January 2015

Management Tooling

• Command Line Console
• JMX
• JON Plugin

Data Security

• User Authentication
  • SASL
• Role Based Access Control (RBAC)
  • Users, Roles and mapping between roles and operations on Cache / Cache-Manager
• Node Authentication & Authorisation
• Encrypted communication between nodes

Embedded vs Client/Server

• Library mode
  • Embedded in your JVM
• C/S mode
  • REST
  • Memcached
  • Hot Rod

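Protocol Comparison

|              | Protocol | Client Libs     | Smart Routing | Load Balancing/Failover | TX           | Listeners | M/R | Dist | Querying  | Separated Cluster |
|--------------|----------|-----------------|---------------|-------------------------|--------------|-----------|-----|------|-----------|-------------------|
| Library mode | inVM     | N/A             | Yes           | Dynamic                 | Yes          | Yes       | Yes | Yes  | Yes       | No                |
| REST         | Text     | HTTP            | No            | Any HTTP load balancer  | No           | No        | No  | No   | No        | Yes               |
| Memcached    | Text     | Many            | No            | Predefined server list  | No           | No        | No  | No   | No        | Yes               |
| Hot Rod      | Binary   | Java/Python/C++ | Yes           | Dynamic                 | Local w MVCC | Yes (6.4) | No  | No   | Yes (6.3) | Yes               |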


Q&A

Thank You! Leave your feedback on Joind.in: https://joind.in/event/view/3347
