Post on 18-Jan-2017
Apache Geode (incubating), and Pivotal's leadership role in open sourcing (GemFire)
Nitin Lamba
Pivotal’s Open Source strategy
What is Apache Geode?
History
Differentiators
Basic Concepts
Resources
Q & A
Agenda
In 2015, Pivotal granted the components of its Big Data Suite to open source
6 Million Lines of Code
4 new open source communities
[Timeline: May 2015, Sept 2015, Sept 2015, Oct 2015]
From GEMFIRE to GEODE…
A distributed, memory-based data management platform for data-oriented apps that need:
• high performance, scalability, resiliency and continuous availability
• fast access to critical data sets
• location-aware distributed data processing
• event-driven data architecture
What is GEODE?
• 1000+ systems in production (real customers)
• Cutting edge use cases
Incubating but ROCK solid…
[Timeline: <2000, 2004, 2008, 2012, 2016]
Early drivers
• Data Volumes
• Margins/transactions
• IT maintenance costs
• Elasticity needs
Real-time needs
• Real-time response
• Time to market needs
• Flexible Data Models
• Persistent + In-memory
Global Data
• Visibility across DC
• Fast Ingest
• Device to enterprise
• Uptime (always on)
Open Source!
• Apache Incubation
• GemFire > Geode
• Geode M1 release
• 1st Geode Summit
Financial Services
US DoD
Trade Clearing
Travel Portal
Online Gambling
Telcos
Manufacturing
Auto Insurance
Payroll processing
Rail systems
…with both SCALE and SPEED, …
• 40K transactions per second
• 3 TB data in-memory
• 17B records in-memory
• 120K concurrent users
… and impacting a LOT of people!
China Railway Corporation and Indian Railways: 17% + 19% = 36% of the world population
High-level Architecture
Powerful app development kit
• APIs: Java & REST
• Adapters: Redis, Lucene*, Spark*, …
Multiple persistence options
• Filesystem, RDBMS or HDFS*
• Sync: read-through, write-through
• Async: write-behind
Durable <K,V> cache/store
• Data replicated or partitioned
• Redundant storage in-memory/disk
• Flexible data retention policies
[Diagram: a Locator and four Servers, with REST access]
A Peer-2-Peer in-memory Distributed System
* Experimental and awaiting community feedback
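The sync and async persistence options on this slide (read-through, write-through, write-behind) can be illustrated with a small sketch. This is plain Python, not the Geode API: `BackingStore`, `CachedRegion` and `flush` are hypothetical names chosen for illustration, and the backing store is just a dictionary standing in for a filesystem, RDBMS or HDFS.

```python
from collections import deque


class BackingStore:
    """Stand-in for a filesystem, RDBMS or HDFS backend (hypothetical)."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})
        self.reads = 0
        self.writes = 0

    def load(self, key):
        self.reads += 1
        return self.rows.get(key)

    def save(self, key, value):
        self.writes += 1
        self.rows[key] = value


class CachedRegion:
    """In-memory map that delegates misses and writes to a backing store."""

    def __init__(self, store, write_behind=False):
        self.store = store
        self.write_behind = write_behind
        self.data = {}
        self.pending = deque()  # queued writes, used only in write-behind mode

    def get(self, key):
        # Read-through: on a miss, load from the store and keep the value.
        if key not in self.data:
            value = self.store.load(key)
            if value is not None:
                self.data[key] = value
        return self.data.get(key)

    def put(self, key, value):
        self.data[key] = value
        if self.write_behind:
            # Async: queue the write and return immediately.
            self.pending.append((key, value))
        else:
            # Sync write-through: the store is updated before put() returns.
            self.store.save(key, value)

    def flush(self):
        # Drain the write-behind queue. A real system would do this on a
        # background thread; here it is called explicitly for clarity.
        while self.pending:
            key, value = self.pending.popleft()
            self.store.save(key, value)


store = BackingStore({"k1": "v1"})
sync_region = CachedRegion(store)
first = sync_region.get("k1")    # miss: read-through from the store
second = sync_region.get("k1")   # hit: served from memory
sync_region.put("k2", "v2")      # write-through

async_region = CachedRegion(store, write_behind=True)
async_region.put("k3", "v3")     # queued, not yet in the store
before_flush = "k3" in store.rows
async_region.flush()
after_flush = "k3" in store.rows
```

The trade-off the sketch tries to show: write-through keeps the store current at the cost of write latency, while write-behind returns quickly but leaves a window in which the store lags the cache.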
• Minimize copying
• Minimize contention points
• Run user code in-process
• Partitioning & parallelism
• Avoid disk seeks
• Automated benchmarks
What makes it go FAST?
• Cache
• Region
• Member
• Client Cache
• Persistence
• Functions
Let’s talk about a few BASIC CONCEPTS…
• In-memory storage and management for your data
• Configurable through XML, Java API or CLI
• A collection of Regions
What is a CACHE?
• Distributed java.util.Map on steroids (Key/Value)
• Consistent API regardless of where or how data is stored
• Observable (reactive)
• Highly available, redundant across cache Member(s)
What is a REGION?
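To make the "observable map" idea concrete, here is a minimal sketch of a key/value region that notifies listeners on every change. It is plain Python rather than the Geode API; the `Region` class, its method names and the event names ("create", "update", "destroy") are assumptions made for illustration only.

```python
class Region:
    """Dict-like key/value store that notifies listeners of changes."""

    def __init__(self, name):
        self.name = name
        self._data = {}
        self._listeners = []

    def add_listener(self, callback):
        # callback(event, key, old_value, new_value)
        self._listeners.append(callback)

    def _notify(self, event, key, old, new):
        for callback in self._listeners:
            callback(event, key, old, new)

    def put(self, key, value):
        old = self._data.get(key)
        event = "update" if key in self._data else "create"
        self._data[key] = value
        self._notify(event, key, old, value)

    def get(self, key):
        return self._data.get(key)

    def remove(self, key):
        old = self._data.pop(key)
        self._notify("destroy", key, old, None)


events = []
orders = Region("orders")
orders.add_listener(lambda event, key, old, new: events.append((event, key, old, new)))
orders.put("o1", "pending")
orders.put("o1", "shipped")
orders.remove("o1")
```

The caller uses the same `put`/`get` calls whether or not anyone is listening, which is the point of the "consistent API" bullet: where and how the data is stored, and who observes it, stays behind the map interface.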
• Local, Replicated or Partitioned
• In-memory or persistent
• Redundant
• LRU
• Overflow
Region: Types & Options
LOCAL
LOCAL_HEAP_LRU
LOCAL_OVERFLOW
LOCAL_PERSISTENT
LOCAL_PERSISTENT_OVERFLOW
PARTITION
PARTITION_HEAP_LRU
PARTITION_OVERFLOW
PARTITION_PERSISTENT
PARTITION_PERSISTENT_OVERFLOW
PARTITION_PROXY
PARTITION_PROXY_REDUNDANT
PARTITION_REDUNDANT
PARTITION_REDUNDANT_HEAP_LRU
PARTITION_REDUNDANT_OVERFLOW
PARTITION_REDUNDANT_PERSISTENT
PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
REPLICATE
REPLICATE_HEAP_LRU
REPLICATE_OVERFLOW
REPLICATE_PERSISTENT
REPLICATE_PERSISTENT_OVERFLOW
REPLICATE_PROXY
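The LRU and Overflow options can be pictured with a short sketch: once the in-memory entry limit is reached, the least recently used entry is moved to a secondary store instead of being lost. This is a conceptual Python model, not Geode's implementation; `LruOverflowRegion` and `max_in_memory` are hypothetical names, and the "disk" is a dictionary.

```python
from collections import OrderedDict


class LruOverflowRegion:
    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory
        self.memory = OrderedDict()  # least recently used entry first
        self.overflow = {}           # stand-in for an overflow file on disk

    def put(self, key, value):
        self.overflow.pop(key, None)  # a fresh write supersedes any overflowed copy
        self.memory[key] = value
        self.memory.move_to_end(key)
        self._evict()

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)  # mark as most recently used
            return self.memory[key]
        if key in self.overflow:
            # Fault the value back into memory; this may evict another entry.
            value = self.overflow.pop(key)
            self.memory[key] = value
            self._evict()
            return value
        return None

    def _evict(self):
        while len(self.memory) > self.max_in_memory:
            key, value = self.memory.popitem(last=False)
            self.overflow[key] = value


region = LruOverflowRegion(max_in_memory=2)
region.put("a", 1)
region.put("b", 2)
region.get("a")       # "a" is now the most recently used entry
region.put("c", 3)    # evicts "b", the least recently used
in_memory_after_put = list(region.memory)
overflow_after_put = dict(region.overflow)
b_value = region.get("b")   # faulted back in, which pushes "a" out
```

A plain LRU policy without overflow would simply drop the evicted entry in `_evict` rather than copying it to the secondary store.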
• Durability
• Write-ahead log (WAL) for efficient writing
• Consistent recovery
• Compaction
Persistent Regions
[Diagram: Server 1 … Server N]
• A process that has a connection to the system
• A process that has created a cache
• Embeddable within your application
What is a MEMBER?
[Diagram: Client, Locator, Server]
• A process connected to the Geode server(s)
• Can have a local copy of the data
• Can run OQL queries on local data
• Can be notified about events on the servers
What is a CLIENT CACHE?
Persistence - Shared Nothing
[Diagram sequence: Server 1, Server 2 and Server 3 hold Primary and Secondary copies of buckets B1, B2 and B3]
Server 1 waits for others when it starts
Fetches missed operations on restart
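The restart behaviour described on these slides (each server persists to its own disk, and a restarted server catches up on what it missed) can be sketched as follows. This is a simplified Python model under stated assumptions, not Geode's recovery protocol: the `Server` class, `write` and `restart` are hypothetical, logs are compared by length only, and "waiting" for peers is represented by raising an error.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.disk_log = []   # this server's own persisted operations
        self.online = True

    def apply(self, op):
        self.disk_log.append(op)

    def state(self):
        # Rebuild the key/value view by replaying the local log.
        data = {}
        for key, value in self.disk_log:
            data[key] = value
        return data


def write(servers, key, value):
    # Shared nothing: every online copy persists the operation to its own disk.
    for server in servers:
        if server.online:
            server.apply((key, value))


def restart(server, peers):
    # Assumption: a restarting member compares its log with an online peer
    # and fetches only the operations it missed while it was down.
    online = [p for p in peers if p.online]
    if not online:
        raise RuntimeError(f"{server.name} waits: no peer is online yet")
    source = max(online, key=lambda p: len(p.disk_log))
    missed = source.disk_log[len(server.disk_log):]
    for op in missed:
        server.apply(op)
    server.online = True
    return missed


s1, s2 = Server("Server 1"), Server("Server 2")
write([s1, s2], "k1", "v1")
s1.online = False                 # Server 1 goes down
write([s1, s2], "k1", "v2")       # only Server 2 persists these
write([s1, s2], "k2", "v3")
missed = restart(s1, [s2])
```

Because each server owns its disk files, recovery is a matter of exchanging operations between members rather than reattaching shared storage.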
Persistence - Operational Logs
Create k1->v1
Create k2->v2
Modify k1->v3
Create k4->v4
Modify k1->v5
Create k6->v6
Member 1: Put k6->v6
Oplog2.crf
Oplog1.crf
Append to operation log
Persistence - Operational Logs: Compaction
Copy live data forward
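Appending to an operation log and then compacting it by copying live data forward can be sketched in a few lines. This is a conceptual Python model, not Geode's on-disk format: the helper names are hypothetical, and the split of the six operations between `oplog1` and `oplog2` is an assumption, since the slide does not say which records live in which file.

```python
def append(oplog, op, key, value):
    """Append one operation record to the end of an operation log."""
    oplog.append((op, key, value))


def live_entries(oplogs):
    """Replay logs oldest-first; the last record for each key wins."""
    live = {}
    for oplog in oplogs:
        for op, key, value in oplog:
            live[key] = value
    return live


def compact(old_oplog, newer_oplogs):
    """Return the records from old_oplog that are still live.

    A record is live if no later record, in this log or a newer one,
    overwrites the same key.
    """
    superseded = set(live_entries(newer_oplogs))
    latest = {}
    for op, key, value in old_oplog:
        latest[key] = (op, key, value)
    return [rec for key, rec in latest.items() if key not in superseded]


oplog1, oplog2 = [], []
append(oplog1, "Create", "k1", "v1")
append(oplog1, "Create", "k2", "v2")
append(oplog1, "Modify", "k1", "v3")
append(oplog2, "Create", "k4", "v4")
append(oplog2, "Modify", "k1", "v5")
append(oplog2, "Create", "k6", "v6")

before = live_entries([oplog1, oplog2])
copied = compact(oplog1, [oplog2])   # only k2 is still live in oplog1
oplog2.extend(copied)                # copy live data forward
after = live_entries([oplog2])       # oplog1 is no longer needed
```

Writes stay append-only, which avoids disk seeks, and compaction reclaims space from records such as `k1->v1` and `k1->v3` that later operations have made obsolete.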
• Used for distributed concurrent processing (Map/Reduce, stored procedure)
• Highly available
• Data oriented
• Member oriented
Functions
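The difference between data-oriented and member-oriented execution can be illustrated with a small sketch in which a function runs next to each partition and the partial results are combined. This is plain Python, not the Geode function API: the partition layout, `execute_on_region` and `execute_on_member` are hypothetical names used only to show the idea.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Three hypothetical members, each holding one partition of a region.
partitions = {
    "server1": {"k1": 10, "k4": 40},
    "server2": {"k2": 20, "k5": 50},
    "server3": {"k3": 30},
}


def local_sum(local_data):
    """The 'function': runs next to the data and returns a partial result."""
    return sum(local_data.values())


def execute_on_region(function, partitions):
    """Data-oriented execution: run the function once per partition."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(function, partitions.values()))
    # Collect and combine the partial results (the 'reduce' step).
    return reduce(lambda a, b: a + b, partials, 0)


def execute_on_member(function, partitions, member):
    """Member-oriented execution: run the function on one named member."""
    return function(partitions[member])


total = execute_on_region(local_sum, partitions)
server2_only = execute_on_member(local_sum, partitions, "server2")
```

Only the small partial sums travel between members, not the data itself, which is what makes this style attractive for Map/Reduce-like processing.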
• Check out: http://geode.incubator.apache.org
• Subscribe: user-subscribe@geode.incubator.apache.org
• Download: http://geode.incubator.apache.org/releases/
Join the Community!
Thank you!
Additional Slides
Built for PERFORMANCE…
[Chart: operations per second (0 to 1,000,000) for YCSB Workloads A Reads, A Updates, B Reads, B Updates, C Reads, D Inserts, D Reads, F Reads and F Updates; Cassandra vs. Geode]
…and horizontal, consistent SCALABILITY!
Horizontal scaling for reads, consistent latency and CPU
[Chart: speedup, latency (ms) and CPU % versus number of server hosts (2, 4, 6, 8, 10)]
• Scaled from 256 clients and 2 servers to 1280 clients and 10 servers
• Partitioned region with redundancy and 1K data size
High Availability