Codemotion 2015 Infinispan Tech lab
ROME 27-28 march 2015 – Ugo Landini
Quick Start Lab JBoss Data Grid
Ugo Landini, Senior Solution [email protected] March 26th 2015
• Big Data & NoSQL: super quick introduction to terminology
• What developers do to scale out
• Consistent Hashing
• What’s a Data Grid
• DEMO
• Infinispan/JDG features
• Q&A
Agenda
new generation of technologies ... designed to
economically extract value from very large volumes of a
wide variety of data, by enabling high velocity
capture, discovery and/or analysis
IDC, 2012
Big Data
K/V Store
Document Store
Column based DB
Graph DB
XML, Object DB, Multidimensional, Grid/Cloud, …
see map on https://451research.com/images/Marketing/dataplatformsmapoctober2014.pdf
NoSQL
•Very hard to categorise in a systematic way
•Many nuances
•Many cases of “Evolutionary Convergence”
  •i.e. evolving similar features having to adapt to similar environments
NoSQL
•Brewer’s Theorem (2000, proven in 2002)
•Three guarantees of a Distributed System
  •Consistency
  •Availability
  •Partition Tolerance
CAP Theorem
A guarantee that every request receives a response about whether it succeeded or failed
Availability
The system continues to operate despite arbitrary message loss or failure of part of the system
Partition Tolerance
Consistency: Transactions
Availability: Redundancy
Partition Tolerance: Scaleout
CAP: Popular Version
Consistency: Transactions
Availability: Redundancy
Partition Tolerance: Scaleout
NO GO
CAP: Popular Version
Consistency: Transactions
Availability: Redundancy
Partition Tolerance: Scaleout
RDBMS
CAP: Popular Version
Consistency: Transactions
Availability: Redundancy
Partition Tolerance: Scaleout
NoSQL
CAP: Popular Version
Brewer wrote an essay in 2012 to clarify some of the CAP implications
http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
CAP: Modern Version
The "two out of three" concept can be misleading or misapplied and it should be considered as a tautology
Many vendors used CAP theorem just as an excuse to sacrifice Consistency
CAP: Modern Version
Partitions are rare, so there is little reason to forfeit C or A when the system is not partitioned
The choice between C and A can occur many times within the same system at very fine granularity
CAP: Modern Version
Different decisions about C and A:
•for different operations
•for different data
•at different times
CAP: Modern Version
Finally, C, A and P are more continuous than binary:
•A is obviously continuous
•Many levels of Consistency (think isolation levels in a classic DB)
•Even Partitions have nuances, including disagreement within the system about whether a partition exists
CAP: Modern Version
[Diagram: Client 1 with Cache 1 on Virtual Machine 1, Client 2 with Cache 2 on Virtual Machine 2, and one RDBMS]
1. Client 1 reads A
2. Client 1 writes A to Cache 1
3. Client 2 writes A2 to RDBMS
4. Client 1 reads A from Cache 1
Distributed Caching on many nodes
What about dirty reads? (i.e. how to cope with multiple writes, invalidation, etc.)
First try at distributed caching
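The four steps of this first attempt can be replayed in a few lines. This is only a sketch: `database` and `LocalCache` are hypothetical names, the RDBMS is reduced to a dictionary, and each cache knows nothing about the other VM.

```python
# Hypothetical stand-ins: a dict plays the RDBMS, LocalCache plays a per-VM cache.
database = {"A": "A"}

class LocalCache:
    """A cache local to one VM, unaware of writes made elsewhere."""
    def __init__(self, db):
        self.db = db
        self.entries = {}

    def read(self, key):
        # Serve from the local copy if present, otherwise load from the DB.
        if key not in self.entries:
            self.entries[key] = self.db[key]
        return self.entries[key]

cache1 = LocalCache(database)

first_read = cache1.read("A")   # steps 1-2: Client 1 reads A and caches it
database["A"] = "A2"            # step 3: Client 2 writes A2 to the RDBMS
stale_read = cache1.read("A")   # step 4: Client 1 reads A from Cache 1

print(first_read, stale_read, database["A"])
```

Client 1 keeps seeing the old value because nothing tells Cache 1 that the database changed.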
[Diagram: Client 1 with Cache 1 on Virtual Machine 1, Client 2 with Cache 2 on Virtual Machine 2, and one RDBMS]
1. Client 2 writes A2 to RDBMS
2. Client 2 updates Cache 2
3. sync Caches
4. Client 1 reads A2 from Cache 1
Second try at distributed caching
New Cache topology
Startup time
State transfers
Incompatible JVM tunings
GCs
Non Java clients
Second try at distributed caching
Hashing Wheel: a mathematical “wheel” on which you hash Ks (keys) and Ns (nodes).
The relative position of Ks and Ns determines which Node is the “owner” of that particular K in a topology
Consistent Hashing
Ns (nodes) on the “wheel” partition the hash space into segments
Every segment contains a range of Ks
Consistent Hashing
[Diagram: hashing wheel with nodes N1 (Node 1), N2 (Node 2) and N3 (Node 3); key K250 falls in N2's segment: owner = N2]
Consistent Hashing
[Diagram: the same wheel with keys K53, K250, K570, K700 and K900 placed on it]
Consistent Hashing
Going clockwise from the K:
•the first N is the owner
•the next N is the replica
•the N after that could be another replica, and so on
Consistent Hashing
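The clockwise rule can be sketched as a sorted list of node positions plus a binary search. The wheel size and the node positions (100, 400, 800) are assumptions chosen for illustration; the slides only give the key positions.

```python
from bisect import bisect_left

# Assumed values: the slides do not say where N1, N2 and N3 sit on the wheel.
WHEEL_SIZE = 1000
nodes = {100: "N1", 400: "N2", 800: "N3"}

def owners(key_position, nodes, copies=2):
    """Return the owner followed by its replicas, walking clockwise from the key."""
    positions = sorted(nodes)
    # Index of the first node at or after the key; wraps around via the modulo below.
    start = bisect_left(positions, key_position % WHEEL_SIZE)
    count = min(copies, len(positions))
    return [nodes[positions[(start + i) % len(positions)]] for i in range(count)]

for k in (53, 250, 570, 700, 900):
    print(f"K{k}: {owners(k, nodes)}")
```

With these assumed positions K250 resolves to N2 as owner and N3 as replica, and K900 wraps around the wheel to N1.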
[Diagram: wheel with nodes N1, N2, N3 and keys K53, K250, K570, K700, K900; for K250: owner = N2, replica = N3]
Consistent Hashing
[Diagram: the same wheel with only N1 and N3 left; for K250: owner = N3, replica = N1]
Consistent Hashing
The real CH algorithm implemented in JDG is slightly different
CH is optimized to minimize state transfer (i.e. number of keys moving when a node dies or a new one joins the cluster)
Consistent Hashing
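To see why the wheel keeps state transfer small, the sketch below counts how many keys change owner when N2 dies, once with a wheel and once with naive `hash % number_of_nodes` placement. It is not the JDG algorithm: MD5 is a stand-in hash, and the key and node names are made up.

```python
from bisect import bisect_left
import hashlib

def position(name, wheel_size=2**32):
    """Stable position on the wheel (MD5 is only a stand-in hash here)."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % wheel_size

def ring_owner(key, node_names):
    """First node clockwise from the key's position."""
    ring = sorted((position(n), n) for n in node_names)
    idx = bisect_left(ring, (position(key), ""))
    return ring[idx % len(ring)][1]

def modulo_owner(key, node_names):
    """Naive placement: depends directly on the number of nodes."""
    return node_names[position(key) % len(node_names)]

keys = [f"K{i}" for i in range(1000)]
before = ["N1", "N2", "N3"]
after = ["N1", "N3"]            # N2 dies

def moved(owner_fn):
    return sum(owner_fn(k, before) != owner_fn(k, after) for k in keys)

ring_moved = moved(ring_owner)
modulo_moved = moved(modulo_owner)
n2_keys = sum(ring_owner(k, before) == "N2" for k in keys)
print("wheel:", ring_moved, "keys owned by N2:", n2_keys, "modulo:", modulo_moved)
```

On the wheel, the keys that move are exactly the ones N2 owned. With modulo placement and a uniform hash, about two thirds of all keys would be expected to change owner in this 3-to-2 case.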
Distributed Memory Storage Engine
Networked Memory
A Distributed Cache “on steroids”
A Transactional NoSQL
What’s a Data Grid?
•Key/Value storage
•Search Engine (from K/V to Document storage)
•Linear Scalability, Elasticity and Fault tolerance
  •Thanks to CH
•Memory based
  •Persistence engines are optional
What’s a Data Grid?
•Different Topologies
•Querying
•Task Execution & Map/Reduce
•Partition Handling
•Data Affinity (to squeeze every bit of performance)
Data Grid > Distributed Caching
•LOCAL
  •simple cache (EHCache-like)
•INVALIDATION
  •no sharing
•REPLICATED
  •All nodes are equal
  •4 Nodes @ 8 GB = 8 GB
•DISTRIBUTED
  •For example: 1 Replica
  •4 Nodes @ 8 GB = 16 GB
JDG Cache Topologies (Cluster modes)
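The two capacity figures follow from one division: total memory divided by the number of copies kept of each entry. A small sketch of that arithmetic, with a hypothetical helper name, ignoring per-entry overhead:

```python
def usable_capacity_gb(nodes, gb_per_node, copies_per_entry):
    """Total memory divided by how many copies of each entry the grid keeps."""
    return nodes * gb_per_node / copies_per_entry

# REPLICATED: every node holds every entry, so copies == number of nodes.
replicated = usable_capacity_gb(nodes=4, gb_per_node=8, copies_per_entry=4)
# DISTRIBUTED with 1 replica: each entry lives on its owner plus one replica.
distributed = usable_capacity_gb(nodes=4, gb_per_node=8, copies_per_entry=2)

print(replicated, distributed)  # 8.0 16.0
```

Adding a fifth node leaves the replicated figure at 8 GB but raises the distributed one to 20 GB, which is the linear scaling the slides refer to.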
[Diagram: four nodes (JDG 1 to JDG 4); keys K0 to K9 are spread across the nodes and each key is stored on two of them]
Distributed with Replica
[Diagram: four nodes (JDG 1 to JDG 4); every node holds all the keys K0 to K9]
Replicated
•Replicated:
  •“Small” set of data with high % of reads vs writes
•Distributed:
  •“Big” set of data: linear scaling
  •You need M/R & Distexec
How do I choose?
•You can have different Cache configurations in the same CacheManager
  •mix&match Replicated and Distributed as needed
JDG Cache Topologies (Cluster modes)
•Default hashing (Distributed mode): MurmurHash3.
•It’s a simple and standard hash function:
  •you can change it as you like, for example if your key already identifies a partitioning criterion
Tuning your hashing
•Can be “fine tuned” in 4 different ways:
  •Server Hinting
  •Virtual Servers
  •Grouping
  •Key Affinity
Tuning your hashing
•A triple (site, rack, server)
•You increase availability by preventing replicas from ending up on the same (site, rack, server) as the master
Server Hinting
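One way to picture server hinting is a replica choice that prefers the candidate furthest from the master in (site, rack, server) terms. The sketch below is an assumed, simplified policy and not JDG's placement logic; the `Node` tuple and all node names are hypothetical.

```python
from collections import namedtuple

# Hypothetical node description carrying the (site, rack, server) triple.
Node = namedtuple("Node", "name site rack server")

def pick_replica(master, candidates):
    """Prefer another site, then another rack, then another server."""
    def separation(node):
        if node.site != master.site:
            return 3
        if node.rack != master.rack:
            return 2
        if node.server != master.server:
            return 1
        return 0
    others = [n for n in candidates if n.name != master.name]
    return max(others, key=separation)

master = Node("A", "rome", "rack1", "srv1")
candidates = [
    Node("B", "rome", "rack1", "srv1"),    # same physical server as the master
    Node("C", "rome", "rack2", "srv3"),    # same site, different rack
    Node("D", "milan", "rack1", "srv9"),   # different site
]
print(pick_replica(master, candidates).name)      # D
print(pick_replica(master, candidates[:2]).name)  # C
```

Without hints, B could be picked as the replica and a single server failure would take out both copies.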
•Number of “segments” into which the cluster is partitioned
•Improves the node distribution on the hashing wheel to get a better distribution of keys
•Default: 60
Virtual Servers
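The idea of segments can be sketched as a two-step lookup: a key hashes to one of a fixed number of segments, and segments (not individual keys) are assigned to nodes. Only the default of 60 comes from the slide. MD5 stands in for MurmurHash3, and the round-robin assignment is an assumed simplification.

```python
import hashlib
from collections import Counter

NUM_SEGMENTS = 60   # the default quoted on the slide

def segment_of(key, num_segments=NUM_SEGMENTS):
    """Map a key to a segment (MD5 is a stand-in for MurmurHash3)."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_segments

def assign_segments(node_names, num_segments=NUM_SEGMENTS):
    """Round-robin segment ownership: an assumed, simplified policy."""
    return {s: node_names[s % len(node_names)] for s in range(num_segments)}

segment_owner = assign_segments(["N1", "N2", "N3"])
key = "customer:42"
segment = segment_of(key)
print(key, "-> segment", segment, "->", segment_owner[segment])
print(Counter(segment_owner.values()))   # 20 segments per node
```

Because many small segments are spread over the nodes, each node ends up with a similar share of the hash space even when the cluster is small.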
•Data colocation
  •A cache node contains K but also other relevant data related to K
  •Example: a customer and their bank transactions
•You just have to define a group; JDG will colocate all data of the same group on the same node
Grouping
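Grouping can be pictured as hashing the group name instead of the key, so every entry of a group resolves to the same node. The sketch below uses the customer example from the slide. The key names, the three-node list and the MD5-based placement are all assumptions made for illustration.

```python
import hashlib

NODES = ["N1", "N2", "N3"]   # hypothetical cluster

def node_for(hash_input):
    """Toy placement: hash the input and pick a node."""
    digest = hashlib.md5(hash_input.encode()).digest()
    return NODES[int.from_bytes(digest[:4], "big") % len(NODES)]

def owner(key, group=None):
    """With a group, the group name is hashed instead of the key itself."""
    return node_for(group if group is not None else key)

customer = "customer:42"
transactions = ["transaction:1001", "transaction:1002", "transaction:1003"]

# Every entry tagged with the same group lands on the same node.
placements = {k: owner(k, group=customer) for k in [customer] + transactions}
print(placements)
```

Without the group, each transaction key would be hashed on its own and could land on any node.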
•Like Grouping, but from another perspective:
  •You just ask a node for a key that will be hashed on that node
•Grouping/Affinity are your best friends if you want to reach JDG Nirvana!
Key Affinity
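Asking for a key that hashes to a given node can be sketched as a generate-and-test loop: produce candidate keys until one lands on the target. The node list, the key prefix and the MD5-based placement below are assumptions, not the JDG API.

```python
import hashlib
import itertools

NODES = ["N1", "N2", "N3"]   # hypothetical cluster

def owner(key):
    """Toy placement: hash the key and pick a node."""
    digest = hashlib.md5(key.encode()).digest()
    return NODES[int.from_bytes(digest[:4], "big") % len(NODES)]

def key_for_node(target, prefix="session"):
    """Generate candidate keys until one hashes to the target node."""
    for i in itertools.count():
        candidate = f"{prefix}-{i}"
        if owner(candidate) == target:
            return candidate

key = key_for_node("N2")
print(key, "is owned by", owner(key))
```

The application then stores its data under the returned key, knowing the entry will live on the chosen node.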
•All the data needed by a node of your application is local, just a single Java method call away
JDG Nirvana
•Small self-contained projects that can be used to simply explain JDG to customers
•https://github.com/redhat-italy/jdg-quickstarts
JDG Quickstarts
•If JDG detects a split brain, partitions enter degraded mode
•A degraded partition can read/write ONLY fully owned keys
  •A partition fully owns a key if it contains the master and replica nodes for that key
•You’ll get an AvailabilityException for other keys
Partition Handling
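The "fully owned" rule is a subset check: a degraded partition may touch a key only if all of that key's owners are among its members. The sketch below models just that check. The exception class is a local stand-in named after the one on the slide, and the node and key names are hypothetical.

```python
class AvailabilityException(Exception):
    """Local stand-in for the exception named on the slide, not the real JDG class."""

def check_access(key, key_owners, partition_members, degraded):
    """Allow the operation only if this partition holds every owner of the key."""
    if degraded and not set(key_owners[key]) <= set(partition_members):
        raise AvailabilityException(f"{key} is not fully owned by this partition")
    return True

# Master and replica for each key (hypothetical placement).
key_owners = {"K1": ["N1", "N2"], "K2": ["N2", "N3"]}
partition = ["N1", "N2"]   # split brain: N3 ended up on the other side

print(check_access("K1", key_owners, partition, degraded=True))
try:
    check_access("K2", key_owners, partition, degraded=True)
except AvailabilityException as exc:
    print("rejected:", exc)
```

K1 stays readable and writable because both its owners are in the partition, while K2 is rejected because its replica sits on the other side of the split.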
•Cache Store
  •Not only in memory!
  •Write through & write behind (ACK sync or async)
•Pluggable “drivers”
  •File System, JPA, LevelDB (supported)
  •MongoDB, Cassandra, BerkeleyDB, etc. (community)
Persistence
•To avoid Out Of Memory
•Entries can be “passivated” to disk (you’ll need a CacheStore)
Eviction
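Eviction with passivation can be sketched as a bounded in-memory map that spills its oldest entries into a store instead of dropping them. The LRU policy and the dictionary used as the "CacheStore" are assumptions. The slide does not say which eviction strategy is used.

```python
from collections import OrderedDict

class EvictingCache:
    def __init__(self, max_entries, store):
        self.max_entries = max_entries
        self.memory = OrderedDict()   # least recently used entry first
        self.store = store            # stand-in for a CacheStore (e.g. disk)

    def put(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.max_entries:
            old_key, old_value = self.memory.popitem(last=False)
            self.store[old_key] = old_value   # passivate instead of losing it

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        if key in self.store:
            value = self.store.pop(key)       # activate: bring it back into memory
            self.put(key, value)
            return value
        return None

disk = {}
cache = EvictingCache(max_entries=2, store=disk)
for k in ("K1", "K2", "K3"):
    cache.put(k, k.lower())
print(list(cache.memory), list(disk))                   # ['K2', 'K3'] ['K1']
print(cache.get("K1"), list(cache.memory), list(disk))  # k1 ['K3', 'K1'] ['K2']
```

Memory use stays bounded at two entries, yet no data is lost: reading K1 brings it back and pushes K2 out to the store.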
•You assign a lifespan or a max idle time to a key
•The key will then be automatically removed after that time
•You don’t need to write “garbage cleaning” code
Expiry
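The difference between lifespan and max idle time is easier to see with a fake clock. In this sketch (hypothetical class, not the JDG API) lifespan counts from creation, max idle counts from the last access, and expired entries are removed lazily when they are next read.

```python
import time

class ExpiringCache:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}   # key -> (value, created_at, last_used, lifespan, max_idle)

    def put(self, key, value, lifespan=None, max_idle=None):
        now = self.clock()
        self.entries[key] = (value, now, now, lifespan, max_idle)

    def get(self, key):
        if key not in self.entries:
            return None
        value, created, last_used, lifespan, max_idle = self.entries[key]
        now = self.clock()
        too_old = lifespan is not None and now - created >= lifespan
        too_idle = max_idle is not None and now - last_used >= max_idle
        if too_old or too_idle:
            del self.entries[key]   # removed lazily, on access
            return None
        self.entries[key] = (value, created, now, lifespan, max_idle)
        return value

now = [0.0]   # fake clock so the example is deterministic
cache = ExpiringCache(clock=lambda: now[0])
cache.put("token", "abc", lifespan=10)
cache.put("session", "xyz", max_idle=5)

now[0] = 4
print(cache.get("token"), cache.get("session"))   # abc xyz
now[0] = 8
print(cache.get("token"), cache.get("session"))   # abc xyz (session was touched at 4)
now[0] = 14
print(cache.get("token"), cache.get("session"))   # None None
```

The session survives past its 5 second idle limit only because it keeps being read, while the token disappears 10 seconds after creation no matter how often it is used.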
•Both avoid Out Of Memory
  •“Evicted” data can be kept in the Grid with Passivation
•Eviction is a Cache configuration
•Expiration is a Key configuration
•Expiration could be a business requirement
•Eviction is a system feature
Eviction/Expiry: differences
•JDG has full support for transactions
  •Local Transactions
  •Global Transactions (XA): if running inside an AS, it automatically uses the AS’s TX Manager
•Batching API
Transactions
•Cache/CacheManager events
•Topology changes
•Entries being added, removed, modified
•Cluster listeners
Listener/Notifications
•Infinispan-query module
•Hibernate Search & Lucene
•Querying via DSL
•Lucene indexes can be kept in memory, on disk or in the grid
Querying the grid
•with M/R you can implement distributed global operations on the grid
  •Each node works on its data (Map)
  •Results are later aggregated (Reduce)
Map/Reduce
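The two phases can be sketched with a word count: each node counts words in its own entries only, and the partial counts are merged afterwards. The per-node data is made up, and the nodes are simulated with plain dictionaries in a single process.

```python
from collections import Counter
from functools import reduce

# Each node holds only its own slice of the data (hypothetical contents).
node_data = {
    "N1": {"K1": "data grid", "K2": "cache"},
    "N2": {"K3": "grid cache"},
    "N3": {"K4": "data data"},
}

def map_phase(local_entries):
    """Runs on one node and touches only that node's entries."""
    counts = Counter()
    for text in local_entries.values():
        counts.update(text.split())
    return counts

def reduce_phase(left, right):
    """Merges two partial results."""
    return left + right

partials = [map_phase(entries) for entries in node_data.values()]
total = reduce(reduce_phase, partials, Counter())
print(dict(total))
```

Only the small partial counters travel between nodes. The entries themselves never leave the node that owns them.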
•JDG 7 will implement HDFS API
  •So it will be able to act as a super fast Hadoop store
Hadoop, coming soon…
•With Distexec you can submit “tasks” to the Grid
•The task can be executed on each node or on a subset of the nodes
•The task can modify data in the Grid
Distributed Execution (Distexec)
•“Follow the Sun” architectures
•Many different clusters that can be kept in sync
Cross Site Replication
•JSR-107
  •Java Temporary Caching API
  •Confirmed in January 2015
  •In roadmap for JDG 6.5
•JSR-347
  •Data Grids for the Java Platform
  •JSR Retired in January 2015
Standard APIs
•User Authentication
  •SASL
•Role Based Access Control (RBAC)
  •Users, Roles and mapping between roles and operations on Cache / Cache-Manager
•Node Authentication & Authorisation
•Encrypted communication between nodes
Data Security
•Library mode
  •Embedded in your JVM
•C/S mode
  •REST
  •Memcached
  •Hot Rod
Embedded vs Client/Server
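Mode         | Protocol | Client Libs     | Smart Routing | Load Balancing/Failover | TX           | Listeners | M/R | Dist | Querying  | Separated Cluster
-------------+----------+-----------------+---------------+-------------------------+--------------+-----------+-----+------+-----------+------------------
Library mode | inVM     | N/A             | Yes           | Dynamic                 | Yes          | Yes       | Yes | Yes  | Yes       | No
REST         | Text     | HTTP            | No            | Any HTTP load balancer  | No           | No        | No  | No   | No        | Yes
Memcached    | Text     | Many            | No            | Predefined server list  | No           | No        | No  | No   | No        | Yes
Hot Rod      | Binary   | Java/Python/C++ | Yes           | Dynamic                 | Local w MVCC | Yes (6.4) | No  | No   | Yes (6.3) | Yes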
Protocol Comparison
Thank You! Leave your feedback on Joind.in! https://joind.in/event/view/3347