Infinispan, a distributed in-memory key/value data grid and cache


Infinispan

Distributed in-memory key/value data grid and cache

@infinispan

Agenda

• Introduction

• Part 1

• Hash Tables

• Distributed Hash Tables

• Consistent Hashing

• Chord Lookup Protocol

• Part 2

• Data Grids

• Infinispan

• Architecture

• Consistent Hashing / Split Clusters

• Other features

Part I – A (very) short introduction to distributed hash tables

Hash Tables

Source: Wikipedia http://commons.wikimedia.org/wiki/File:Hash_table_5_0_1_1_1_1_1_LL.svg#/media/File:Hash_table_5_0_1_1_1_1_1_LL.svg

Distributed Hash Tables (DHT)

Source: Wikipedia - http://commons.wikimedia.org/wiki/File:DHT_en.svg#/media/File:DHT_en.svg

• Decentralized Hash Table functionality

• Interface

• put(K,V)

• get(K) -> V

• Nodes can fail, join and leave

• The system has to scale
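A minimal sketch of that interface in Java (the names are illustrative, not taken from any particular DHT implementation):

// Hypothetical interface capturing the DHT contract described above.
public interface DistributedHashTable<K, V> {

    // Store a value under a key; the DHT decides which node(s) hold it.
    void put(K key, V value);

    // Look up the value for a key, wherever it is currently stored.
    V get(K key);

    // Membership changes the implementation has to tolerate.
    void nodeJoined(String nodeAddress);

    void nodeLeft(String nodeAddress);
}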

Distributed Hash Tables (DHT)

• Flooding in N nodes

• put() – store in any node O(1)

• get() – send query to all nodes O(N)

• Full replication in N nodes

• put() – store in all nodes O(N)

• get() – check any node O(1)

Simple solutions

Fixed Hashing

NodeID = hash(key) % TotalNodes.

Fixed Hashing with High Availability

NodeID = hash(key) % TotalNodes.

Fixed Hashing and Scalability

NodeID = hash(key) % (TotalNodes + 1).

2 nodes, key space = {0, 1, 2, 3, 4, 5}

With 2 nodes, NodeID = hash(key) % 2:

N0 (key mod 2 = 0): 0, 2, 4    N1 (key mod 2 = 1): 1, 3, 5

After adding a third node, NodeID = hash(key) % 3:

N0 (key mod 3 = 0): 0, 3    N1 (key mod 3 = 1): 1, 4    N2 (key mod 3 = 2): 2, 5

Four of the six keys change owner, as the sketch below shows.
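A small illustration of why this hurts scalability: when the modulus changes from 2 to 3, most keys hash to a different node and have to be migrated (plain Java, illustrative only):

// Illustrative only: count how many keys change owner when a node is added.
public class FixedHashingDemo {
    public static void main(String[] args) {
        int moved = 0;
        for (int key = 0; key <= 5; key++) {
            int before = key % 2;   // owner with 2 nodes
            int after = key % 3;    // owner with 3 nodes
            if (before != after) moved++;
            System.out.printf("key %d: N%d -> N%d%n", key, before, after);
        }
        // With key space {0..5}, 4 of the 6 keys move; consistent hashing avoids this.
        System.out.println("keys moved: " + moved);
    }
}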

Consistent Hashing

Consistent Hashing – The Hash Ring

[Hash ring diagram: nodes N0, N1 and N2 and keys K1–K6 placed on a ring starting at position 0; each key is stored on the first node found clockwise from the key. A code sketch of the ring follows the next slide.]

Consistent Hashing – Nodes Joining, Leaving

Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
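A minimal consistent-hash-ring sketch in plain Java using a TreeMap. The single position per node and the truncated MD5 hash are simplifications; real systems (including Infinispan) use segments and/or virtual nodes to balance the load:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

// Simplified hash ring: each node owns the arc that ends at its position.
public class HashRing {

    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(String node)    { ring.put(hash(node), node); }

    public void removeNode(String node) { ring.remove(hash(node)); }

    // The owner of a key is the first node clockwise from the key's position.
    public String ownerOf(String key) {
        long h = hash(key);
        SortedMap<Long, String> tail = ring.tailMap(h);
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
                 | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}

When a node joins or leaves, only the keys on the arc it claims or vacates change owner, which is exactly the property the diagram above illustrates.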

Chord: Peer-to-peer Lookup Protocol

• Load Balance – distributed hash function, spreading keys evenly over nodes

• Decentralization – fully distributed, no single point of failure (SPOF)

• Scalability – logarithmic growth of lookup cost with the number of nodes, large systems are feasible

• Availability – automatically adjusts its internal tables to ensure the node responsible for a key is always found

• Flexible naming – key-space is flat (flexibility in how to map names to keys)

Chord – Lookup O(N)

Source: Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications – Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan

Chord – Lookup O(logN)

Source: Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications – Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan

• K = 6, identifier space (0, 2^6 − 1)

• finger[i] = first node that succeeds (n + 2^(i−1)) mod 2^K, where 1 ≤ i ≤ K (see the sketch below)

• Successor/Predecessor – the next/previous node on circle
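A small sketch of the finger-table targets that fall out of this formula (K = 6, so the identifier space is 0..63; the node id is a made-up example, and the final successor() lookup on the ring is omitted):

// Illustrative: the identifiers a Chord node's fingers aim at.
public class ChordFingers {

    static final int K = 6;                 // identifier space: 0 .. 2^K - 1

    // finger[i] targets (n + 2^(i-1)) mod 2^K, for 1 <= i <= K;
    // the actual table entry is successor(target) on the ring.
    static int fingerTarget(int n, int i) {
        return (n + (1 << (i - 1))) % (1 << K);
    }

    public static void main(String[] args) {
        int n = 8;                          // example node id
        for (int i = 1; i <= K; i++) {
            System.out.printf("finger[%d] target = %d%n", i, fingerTarget(n, i));
        }
        // For n = 8 the targets are 9, 10, 12, 16, 24, 40: each finger roughly
        // doubles the distance covered, which is what gives O(log N) lookups.
    }
}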

Chord – Node Join

Source: Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications – Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan

• Node 26 joins the system between nodes 21 and 32.

• (a) Initial state: node 21 points to node 32;

• (b) node 26 finds its successor (i.e., node 32) and points to it;

• (c) node 26 copies all keys less than 26 from node 32;

• (d) the stabilize procedure updates the successor of node 21 to node 26.

The world of DHTs …

• CAN (Hypercube), Chord (Ring), Pastry (Tree+Ring), Tapestry (Tree+Ring), Viceroy, Kademlia, Skipnet, Symphony (Ring), Koorde, Apocrypha, Land, Bamboo, ORDI …

Part II – A short introduction to Infinispan

Where do we store data? One size does not fit all...

Infinispan – History

• 2002 – JBoss App Server needed a clustered solution for HTTP and EJB session state replication in HA clusters. JGroups (an open source group communication suite) had a replicated map demo, which was expanded into a tree data structure, with eviction and JTA transactions added.

• 2003 – this was moved to JBoss AS code base

• 2005 – JBoss Cache was extracted and became a standalone project

… JBoss Cache evolved into Infinispan, core parts redesigned

• 2009 – JBoss Cache 3.2 and Infinispan 4.0.0.ALPHA1 were released

• 2015 - 7.2.0.Alpha1

• Check the Infinispan RoadMap for more details

Code?

<dependency>
  <groupId>org.infinispan</groupId>
  <artifactId>infinispan-embedded</artifactId>
  <version>7.1.0.Final</version>
</dependency>

EmbeddedCacheManager cacheManager = new DefaultCacheManager();
Cache<String, String> cache = cacheManager.getCache();
cache.put("Hello", "World!");

Usage Modes

• Embedded / library mode

• clustering for apps and frameworks (e.g. JBoss session replication)

• Local mode single cache

• JSR 107: JCACHE - Java Temporary Caching API

• Transactional local cache

• Eviction, expiration, write through, write behind, preloading, notifications, statistics

• Cluster of caches

• Invalidation, Hibernate 2nd level cache

• Server mode – remote data store (see the Hot Rod client sketch below)

• REST, MemCached, HotRod, WebSocket (*)
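In server mode a remote application talks to the grid through one of these protocols; a minimal sketch using the Java Hot Rod client (the host/port and the infinispan-client-hotrod dependency are assumptions, matching the 7.x line used elsewhere in this deck):

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

// Requires the infinispan-client-hotrod artifact and a running Infinispan server.
public class HotRodClientExample {
    public static void main(String[] args) {
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.addServer().host("127.0.0.1").port(11222);   // default Hot Rod port

        RemoteCacheManager remoteCacheManager = new RemoteCacheManager(builder.build());
        RemoteCache<String, String> cache = remoteCacheManager.getCache();

        cache.put("Hello", "World!");
        System.out.println(cache.get("Hello"));

        remoteCacheManager.stop();
    }
}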

Code?

Configuration config = new ConfigurationBuilder()
    .clustering()
        .cacheMode(CacheMode.DIST_SYNC)
        .sync()
        .l1().lifespan(25000L)
        .hash().numSegments(100).numOwners(3)
    .build();

Configuration config = new ConfigurationBuilder()
    .eviction()
        .maxEntries(20000).strategy(EvictionStrategy.LRU)
    .expiration()
        .wakeUpInterval(5000L)
        .maxIdle(120000L)
    .build();
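A sketch of how such a Configuration is put to work: register it with a clustered cache manager and obtain a named cache (the cluster and cache names here are made up):

import org.infinispan.Cache;
import org.infinispan.configuration.global.GlobalConfiguration;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

// 'config' is one of the Configuration objects built above.
GlobalConfiguration global = GlobalConfigurationBuilder.defaultClusteredBuilder()
    .transport().clusterName("demo-cluster")
    .build();

DefaultCacheManager cacheManager = new DefaultCacheManager(global);
cacheManager.defineConfiguration("distributed-cache", config);

Cache<String, String> cache = cacheManager.getCache("distributed-cache");
cache.put("Hello", "World!");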

Infinispan – Core Architecture

[Architecture diagram: remote apps (C++, Java, .NET) connect over TCP to the server endpoints – MemCached, HotRod, REST, WebSocket (*). Each node is a JVM hosting the embedded app (Java), the JGroups transport, notifications, transactions/XA, query, map/reduce, monitoring and the storage engine (RAM + overflow). Nodes communicate with each other over TCP/UDP.]

Infinispan Clustering and Consistent Hashing

• JGroups Views

• Each node has a unique address

• View changes when nodes join, leave

• Keys are hashed using the MurmurHash3 algorithm

• The hash space is divided into segments

• Key → Segment → Owners (see the sketch below)

• Primary and Backup Owners
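Conceptually the routing looks like the sketch below. This is an illustration of the idea only, not Infinispan's internal API: Infinispan derives segment ownership from the consistent hash and uses MurmurHash3 rather than hashCode():

import java.util.List;

// Illustrative only: key -> segment -> owners.
public class SegmentRouting {

    static final int NUM_SEGMENTS = 256;

    // One ordered owner list per segment: index 0 is the primary owner,
    // the rest are backups (numOwners entries in total).
    private final List<List<String>> segmentOwners;

    SegmentRouting(List<List<String>> segmentOwners) {
        this.segmentOwners = segmentOwners;   // filled from the current topology
    }

    static int segmentOf(Object key) {
        return Math.floorMod(key.hashCode(), NUM_SEGMENTS);
    }

    List<String> ownersOf(Object key) {
        return segmentOwners.get(segmentOf(key));
    }
}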

Does it scale?

• 320 nodes, 3000 caches, 20 TB RAM

• Largest cluster formed: 1000 nodes

[Sequence of cluster diagrams: Empty Cluster → Add 1 Entry → Primary and Backup → Add another one → Primary and Backup → A cluster with more keys → A node dies… → The cluster heals. Entries K1–K5 are added one by one; each key is stored on a primary owner and on a backup owner on a different node. When a node dies, the cluster heals by rebuilding the missing copies from the surviving owners on the remaining nodes.]

If multiple nodes fail…

• CAP Theorem to the rescue:

• Formulated by Eric Brewer in 1998

• C - Consistency

• A - High Availability

• P - Tolerance to Network Partitions

• Can only satisfy 2 at the same time:

• Consistency + Availability: The Ideal World where network partitions do not exist

• Partitioning + Availability: Data might be different between partitions

• Partitioning + Consistency: Do not corrupt data!

Infinispan Partition Handling Strategies

• In the presence of network partitions

• Prefer availability (partition handling DISABLED)

• Prefer consistency (partition handling ENABLED; config sketch below)

• Split Detection with partition handling ENABLED:

• Ensure stable topology

• LOST > numOwners OR no simple majority

• Check segment ownership

• Mark partition as Available / Degraded

• Send PartitionStatusChangedEvent to listeners
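Enabling the consistency-preferring behaviour is a single switch in the cache configuration. A sketch against the Infinispan 7.x API (later versions replaced the boolean with whenSplit() strategies):

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

// Prefer consistency: when owners become unreachable the affected entries are
// reported as unavailable instead of risking conflicting writes in two partitions.
Configuration config = new ConfigurationBuilder()
    .clustering()
        .cacheMode(CacheMode.DIST_SYNC)
        .hash().numOwners(3)
        .partitionHandling().enabled(true)
    .build();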

Cluster Partitioning – No data lost

[Diagram: keys K1–K5 spread across Partition1 and Partition2 after a split; every key still has at least one surviving owner.]

Cluster Partitioning – Lost data

[Diagram: the same keys after a less fortunate split; some keys have all of their owners in the other partition, so they can no longer be served.]

Merging Split Clusters

• Split Clusters see each other again

• Step 1: Ensure a stable topology

• Step 2: Automatic, based on partition state

• 1 Available -> attempt merge

• All Degraded -> attempt merge

• Step 3: Manual

• Data was lost

• Custom listener on Merge (listener sketch below)

• Application decides
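A sketch of the listener hooks mentioned above, using Infinispan's notification annotations; the reconciliation logic itself is application-specific and only hinted at in the comments:

import org.infinispan.notifications.Listener;
import org.infinispan.notifications.cachelistener.annotation.PartitionStatusChanged;
import org.infinispan.notifications.cachelistener.event.PartitionStatusChangedEvent;
import org.infinispan.notifications.cachemanagerlistener.annotation.Merged;
import org.infinispan.notifications.cachemanagerlistener.event.MergeEvent;

@Listener
public class SplitBrainListener {

    // Cache-level event: fired when the partition switches between Available and Degraded.
    @PartitionStatusChanged
    public void onPartitionStatusChanged(PartitionStatusChangedEvent<?, ?> event) {
        System.out.println("Availability mode is now " + event.getAvailabilityMode());
    }

    // Cache-manager-level event: fired when previously split sub-clusters see each other again.
    @Merged
    public void onMerge(MergeEvent event) {
        // Application-specific reconciliation goes here, e.g. re-read and repair suspect data.
        System.out.println("Cluster merged, members: " + event.getNewMembers());
    }
}

Register the same instance with cache.addListener(...) for the cache-level event and with cacheManager.addListener(...) for the cache-manager-level one.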

Querying Infinispan

• Apache Lucene Index

• Native Query API (Query DSL)

• Hibernate Search and Apache Lucene to index and search

• Native Map/Reduce

• Index-less

• Distributed Execution Framework

• Hadoop Integration (WIP)

• Run existing map/reduce jobs on Infinispan data

Map Reduce:

MapReduceTask<String, String, String, Integer> mapReduceTask = new MapReduceTask<>(wordCache);

mapReduceTask
    .mappedWith(new WordCountMapper())
    .reducedWith(new WordCountReducer());

Map<String, Integer> wordCountMap = mapReduceTask.execute();
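WordCountMapper and WordCountReducer are not shown on the slide; a sketch of what they might look like against Infinispan 7's Map/Reduce interfaces (the API was later deprecated in favour of distributed streams):

import java.util.Iterator;
import org.infinispan.distexec.mapreduce.Collector;
import org.infinispan.distexec.mapreduce.Mapper;
import org.infinispan.distexec.mapreduce.Reducer;

// Emits every word of a stored sentence with a count of 1.
public class WordCountMapper implements Mapper<String, String, String, Integer> {
    @Override
    public void map(String key, String value, Collector<String, Integer> collector) {
        for (String word : value.split("\\s+")) {
            collector.emit(word.toLowerCase(), 1);
        }
    }
}

// Sums the counts emitted for each word.
class WordCountReducer implements Reducer<String, Integer> {
    @Override
    public Integer reduce(String word, Iterator<Integer> counts) {
        int sum = 0;
        while (counts.hasNext()) {
            sum += counts.next();
        }
        return sum;
    }
}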

Query DSL:

QueryParser qp = new QueryParser("default", new StandardAnalyzer());

Query luceneQ = qp.parse("+station.name:airport +year:2014 +month:12 +(avgTemp < 0)");

CacheQuery cq = Search.getSearchManager(cache).getQuery(luceneQ, DaySummary.class);

List<Object> results = cq.list();

Other features

• JMX Management

• RHQ (JBoss Enterprise Management Solution)

• CDI Support

• JSR 107 (JCACHE) integration

• Custom interceptors

• Runs on Amazon Web Services Platform

• Command line client

• JTA with JBoss TM, Bitronix, Atomikos

• GridFS (experimental API), CloudTM, Cross Site Replication

DEMO

Q & A

Thank you!

Resources:

http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
http://pdos.csail.mit.edu/papers/ton:chord/paper-ton.pdf
http://www.martinbroadhurst.com/Consistent-Hash-Ring.html
http://infinispan.org/docs/7.2.x/user_guide/user_guide.html
https://github.com/infinispan/infinispan/wiki