Predictable Big Data Performance in Real-time
Srini V. Srinivasan
17th Big Data London Meetup, April 22, 2013
Database Landscape
STRUCTURED DATA
TRANSACTIONS (OLTP)
   Response time: Seconds
   Gigabytes of data
   Balanced Reads/Writes
ANALYTICS (OLAP)
   Response time: Seconds
   Terabytes of data
   Read Intensive
UNSTRUCTURED DATA
BIG DATA ANALYTICS
   Response time: Hours, Weeks
   TB to PB
   Read Intensive
REAL-TIME BIG DATA
   Real-time Transactions
   Response time: < 10 ms
   1-20 TB
   Balanced Reads/Writes
   24x7x365 Availability
© 2013 Aerospike. All rights reserved. Confidential Pg. 2
Requirements for Internet Enterprises
1. Know who the interaction is with
   Monitor 200+ Million US Consumers, 5+ Billion mobile devices and sensors
2. Determine intent based on current context
   Page views, search terms, game state, last purchase, friends list, ads served, location
3. Respond now, use big data for more accurate decisions
   Display the most relevant Ad
   Recommend the best product
   Deliver the richest gaming experience
   Eliminate fraud…
4. Service can NEVER go down!
Challenges
1. Handle extremely high rates of persistent read/write transactions
2. Avoid hot spots to maintain tight latency SLAs
3. Provide immediate consistency with replication
4. Allow long running tasks with transactions
5. Scale linearly as data sizes increase
6. Add capacity with no service interruption
Native Flash Performance
➤ Low Latency at High Throughput
“Only Aerospike was able to function in synchronous mode with a replication factor of two... it is a significant advantage that Aerospike is able to function reliably on a smaller amount of hardware while still maintaining true consistency.”
Shared-Nothing Architecture
OHIO Data Center
➤ Every node in a cluster is identical and handles both transactions and long-running tasks
➤ Data is replicated synchronously with immediate consistency within the cluster
➤ Data is replicated asynchronously across data centers
Distributed Hash Table
How Data Is Distributed (Replication Factor 2)
➤ Every key is hashed into a 20-byte (fixed length) string using the RIPEMD160 hash function
➤ This hash + additional data (fixed 64 bytes) are stored in RAM in the index
➤ 4 bytes of this hash are used to compute the partition id
➤ There are 4096 partitions
➤ Partition id maps to node id based on cluster membership
Example key: cookie-abcdefg-12345678
Example hash: 182023kh15hh3kahdjsh
PartitionID   Master node   Replica node
…             1             4
1820          2             3
1821          3             2
4096          4             1
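The key-to-partition steps on this slide can be sketched as follows. This is a minimal illustration, not Aerospike's implementation: which 4 bytes of the hash are used, their byte order, and the reduction to 4096 partitions by modulo are assumptions, and SHA-1 (also 20 bytes) is swapped in as a stand-in when the local Python build does not provide RIPEMD-160.

```python
import hashlib

N_PARTITIONS = 4096  # from the slide


def key_digest(key: bytes) -> bytes:
    """Return a fixed-length 20-byte digest of the key."""
    try:
        return hashlib.new("ripemd160", key).digest()
    except ValueError:
        # Stand-in only: some OpenSSL builds omit RIPEMD-160.
        # SHA-1 is used here purely because it is also 20 bytes long.
        return hashlib.sha1(key).digest()


def partition_id(digest: bytes) -> int:
    """Map 4 bytes of the digest to a partition id.

    Assumed for illustration: the first 4 bytes, read little-endian,
    reduced modulo the partition count.
    """
    return int.from_bytes(digest[:4], "little") % N_PARTITIONS


digest = key_digest(b"cookie-abcdefg-12345678")
pid = partition_id(digest)
print(len(digest), pid)
```

Because the partition id depends only on the key, any client or node can compute it locally without a lookup.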
Organizing the cluster
➤ Automatic multicast gossip protocol for node discovery
➤ Paxos consensus algorithm determines nodes in cluster
➤ Ordered list of nodes determines data location
➤ Data partitions balanced for minimal data motion
➤ Vote initiated and terminated in 100 milliseconds
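One way to get "ordered list of nodes determines data location" together with "minimal data motion" is rendezvous (highest-random-weight) hashing. The sketch below uses that technique as an assumption; the slides do not say which ordering function Aerospike uses, and the node names and the `succession` helper are hypothetical.

```python
import hashlib

N_PARTITIONS = 4096


def succession(partition: int, nodes: list) -> list:
    """Order the nodes for one partition by a per-(node, partition) hash."""
    def weight(node: str) -> bytes:
        return hashlib.sha256(f"{node}:{partition}".encode()).digest()
    return sorted(nodes, key=weight)


def partition_map(nodes: list, replication_factor: int = 2) -> dict:
    """First node in each ordered list is the master, the next is the replica."""
    return {p: succession(p, nodes)[:replication_factor]
            for p in range(N_PARTITIONS)}


before = partition_map(["A", "B", "C", "D"])
after = partition_map(["A", "B", "C", "D", "E"])

# Partitions whose master changed when node E joined.
moved = [p for p in before if before[p][0] != after[p][0]]
print(f"{len(moved)} of {N_PARTITIONS} masters moved")
```

Adding a node does not change the relative order of the existing nodes, so a partition's master changes only when the new node ranks first for it. Every moved master therefore lands on the new node, and roughly one fifth of the partitions move in this five-node example.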
How it Works
Writing with Immediate Consistency (master, replica)
1. Write sent to row master
2. Latch against simultaneous writes
3. Apply write to master memory and replica memory synchronously
4. Queue operations to disk
5. Signal completed transaction (optional storage commit wait)
6. Master applies conflict resolution policy (rollback / rollforward)
Adding a Node (transactions continue)
1. Cluster discovers new node via gossip protocol
2. Paxos vote determines new data organization
3. Partition migrations scheduled
4. When a partition migration starts, write journal starts on destination
5. Partition moves atomically
6. Journal is applied and source data deleted
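The write steps above can be sketched in a few lines. This is a toy, single-process model under stated assumptions: `Node`, `latch_for`, and `write` are hypothetical names, the replica is an in-process object rather than a remote server, and step 6 (conflict resolution) is not modeled.

```python
import threading
from collections import deque


class Node:
    """Toy node: in-memory records plus a queue of pending disk writes."""

    def __init__(self, name: str):
        self.name = name
        self.memory = {}            # in-memory copy of the records
        self.disk_queue = deque()   # operations waiting to be written to storage
        self._latches = {}          # per-row locks (hypothetical)
        self._guard = threading.Lock()

    def latch_for(self, key):
        with self._guard:
            return self._latches.setdefault(key, threading.Lock())

    def apply(self, key, value):
        self.memory[key] = value               # step 3: apply to memory
        self.disk_queue.append((key, value))   # step 4: queue the disk write


def write(master: Node, replica: Node, key, value) -> bool:
    # Step 1: the caller has already routed the write to the row master.
    with master.latch_for(key):      # step 2: latch against simultaneous writes
        master.apply(key, value)     # steps 3-4 on the master
        replica.apply(key, value)    # steps 3-4 on the replica, before the ack
    return True                      # step 5: signal completed transaction


master, replica = Node("node-2"), Node("node-3")
ok = write(master, replica, "cookie-abcdefg-12345678", {"segment": 7})
print(ok, master.memory == replica.memory)
```

The acknowledgement is returned only after both in-memory copies are updated, which is what makes a read from either copy see the write immediately.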
Intelligent Client Shields Applications from the Complexity of the Cluster
➤ Implements Aerospike API
➤ Optimistic row locking
➤ Optimized binary protocol
➤ Cluster tracking
   Learns about cluster changes, partition map
   Gossip protocol
➤ Transaction semantics
   Global transaction ID
   Retransmit and timeout
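Optimistic row locking is commonly built on a per-row generation counter: a write succeeds only if the row has not changed since it was read. The sketch below shows that general pattern against a hypothetical in-memory `Store`; it is not the Aerospike client API, and the slides do not describe the mechanism in this detail.

```python
class GenerationMismatch(Exception):
    """Raised when the row changed between the read and the write."""


class Store:
    """Hypothetical in-memory store: key -> (generation, value)."""

    def __init__(self):
        self._rows = {}

    def read(self, key):
        return self._rows.get(key, (0, None))

    def write(self, key, value, expected_generation: int) -> int:
        current, _ = self._rows.get(key, (0, None))
        if current != expected_generation:
            raise GenerationMismatch(f"expected {expected_generation}, found {current}")
        self._rows[key] = (current + 1, value)
        return current + 1


def increment(store: Store, key, retries: int = 3) -> bool:
    """Read-modify-write with retry instead of holding a lock across the round trip."""
    for _ in range(retries):
        generation, value = store.read(key)
        try:
            store.write(key, (value or 0) + 1, generation)
            return True
        except GenerationMismatch:
            continue  # someone else wrote first: re-read and try again
    return False


store = Store()
increment(store, "ads_served")
increment(store, "ads_served")
print(store.read("ads_served"))
```

No lock is held while the client computes the new value, so a slow client cannot block other writers; it simply loses the race and retries.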
Cross Data Center Replication (XDR)
➤ Asynchronous replication for long link delays and outages
➤ Namespace is configured to replicate to a destination cluster – master / slave, including star and ring
➤ Replication process
   Transaction journal on partition master and replica
   XDR process writes batches to destination
   Transmission state shared with source replica
   Retransmission in case of network fault
   When data arrives back at originating cluster, transaction ID matching prevents subsequent application and forwarding
➤ In master / master replication, conflict resolution via multiple versions, or timestamp
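The transaction-ID matching that stops a write from circulating forever in a ring can be illustrated with a toy model. Everything here is an assumption made for the sketch: the `Cluster` class, the shape of the transaction ID, and the in-process "shipping" stand in for real journals and network links.

```python
import itertools


class Cluster:
    """Toy cluster with a journal that is shipped to one destination."""

    _counter = itertools.count(1)

    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.seen = set()       # transaction IDs already applied here
        self.journal = []       # writes waiting to be shipped
        self.destination = None

    def local_write(self, key, value):
        txn_id = (self.name, next(Cluster._counter))  # hypothetical global ID
        self.apply(txn_id, key, value)

    def apply(self, txn_id, key, value):
        if txn_id in self.seen:
            return              # already applied: do not re-apply or forward
        self.seen.add(txn_id)
        self.data[key] = value
        self.journal.append((txn_id, key, value))

    def ship(self) -> int:
        """Send the current journal as one batch to the destination."""
        batch, self.journal = self.journal, []
        for txn_id, key, value in batch:
            self.destination.apply(txn_id, key, value)
        return len(batch)


a, b, c = Cluster("a"), Cluster("b"), Cluster("c")
a.destination, b.destination, c.destination = b, c, a   # ring: a -> b -> c -> a

a.local_write("user:1", "v1")
shipped = 0
while any(cl.journal for cl in (a, b, c)):
    shipped += sum(cl.ship() for cl in (a, b, c))
print(shipped, a.data == b.data == c.data)
```

The write travels once around the ring and is dropped when it arrives back at cluster `a`, so the loop terminates with all three clusters holding the same value.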
Multi-core Optimization
Russ’s 10 Ingredient Recipe for Making 1 Million TPS on $5K Hardware
Right Architecture
   Shared nothing
   In-memory (or multiple SSDs)
   Tight code loop
   Lock free isolation
OS, Programming Language, Libraries
   Modern Linux kernel
   C language
   Use epoll
Tweaks
   Pin threads to processor cores
   IRQ affinity settings for NIC
   CPU Socket Isolation via pairing of CPU to NIC
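As an illustration of the "pin threads to processor cores" tweak, the sketch below uses Python's Linux-only `os.sched_setaffinity`. The deck describes a C server, so this only demonstrates the operating-system facility, and the helper name `pin_to_one_core` is hypothetical.

```python
import os


def pin_to_one_core():
    """Restrict the caller to a single CPU from its currently allowed set.

    Returns the chosen CPU number, or None where the affinity calls are
    unavailable (non-Linux platforms) or refused by the OS.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = sorted(os.sched_getaffinity(0))   # pid 0 means the caller
    cpu = allowed[0]
    try:
        os.sched_setaffinity(0, {cpu})
    except OSError:
        return None
    return cpu
```

Calling `pin_to_one_core()` at the start of a worker keeps it on one core, so its working set stays in that core's caches instead of migrating with the scheduler.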
Flash-optimized Storage Layer
➤ Direct device access
   Direct attach performance
   Data written in flash-optimal large block patterns
   All indexes in RAM for low wear
   Constant background defragmentation
   Log structured file system, “copy on write”
   Clean restart through shared memory
➤ Random distribution using hash does not require RAID hardware
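The combination of a log-structured, copy-on-write layout with an in-RAM index and background defragmentation can be shown with a toy model. This is a sketch only: `LogStore` is hypothetical, a `bytearray` stands in for the flash device, and the large-block write pattern is not modeled.

```python
class LogStore:
    """Append-only record log with an in-memory index (toy model)."""

    def __init__(self):
        self.log = bytearray()   # stands in for the flash device
        self.index = {}          # key -> (offset, length), kept in RAM

    def put(self, key, value: bytes):
        offset = len(self.log)
        self.log += value                      # copy on write: never overwrite in place
        self.index[key] = (offset, len(value))

    def get(self, key) -> bytes:
        offset, length = self.index[key]
        return bytes(self.log[offset:offset + length])

    def live_fraction(self) -> float:
        live = sum(length for _, length in self.index.values())
        return live / len(self.log) if self.log else 1.0

    def defragment(self):
        """Copy only live records into a fresh log and drop stale versions."""
        new_log, new_index = bytearray(), {}
        for key, (offset, length) in self.index.items():
            new_index[key] = (len(new_log), length)
            new_log += self.log[offset:offset + length]
        self.log, self.index = new_log, new_index


store = LogStore()
store.put("k", b"version-1")
store.put("k", b"version-2")   # the old version becomes garbage in the log
store.put("j", b"x")
print(round(store.live_fraction(), 2))
store.defragment()
print(store.live_fraction())
```

Updates only append, so the device sees sequential writes, and index lookups never touch the device because the index lives in RAM.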
…
SSD performance varies widely
• Aerospike has a certified hardware list
• Free SSD certification tool, CIO, is also available
Native Flash 17x better TCO
“…data-in-DRAM implementations like SAP HANA… should be bypassed… current leading data-in-flash database for transactional analytic apps is Aerospike.” - David Floyer, CTO, Wikibon
http://wikibon.org/wiki/v/Data_in_DRAM_is_a_Flash_in_the_Pan
Case studies
Proven in Production
➤ AppNexus - #2 RTB after Google
   27 Billion auctions per day
   600+ QPS
   Aerospike servers in 6 clusters in 3 data centers
➤ Chango – #2 Search after Google
   Sees more Searches than Yahoo! + Bing
   Data on 300 Million users
➤ TradeDesk – first Ad Exchange
   Facebook Exchange partner
   FBX serves 25% of Ads on the Internet
   1200% growth in 2012
“Aerospike has operated without interruptions and easily scaled to meet our performance demands.” – Mike Nolet, CTO, AppNexus
Proven in Production
➤ eXelate – Data on 500 Million users
   Online data plus Nielsen, Mastercard, Autobytel, Bizo data..
   Data on 400 million users
   20 Billion Transactions per month
   4x2 TB data per cluster
   4 clusters across 4 data centers
“Scale. Real-time performance. Real-time replication at 4 datacenters. Aerospike delivered.” - Elad Efraim, eXelate CTO
➤ BlueKai – Serves half the Fortune 30
   #1 Data Exchange
   2 Trillion Transactions per month
What makes Aerospike…
Fast?
➤ Cluster-aware Client Layer
➤ Per Node Optimizations
   Thread-core-pinning
   Real-time prioritization
➤ Extremely efficient primary index scheme
   Index in DRAM
   64 byte index entry size
   Kernel quality C code; no degradation due to Java garbage collection
➤ Flash-optimized Data Layer
Scale & Never Fail?
➤ Shared-nothing Distribution Layer
   Intelligent data migration and re-balancing
   Smart data expiration and eviction
   Rolling upgrades and background backups
➤ Cross Datacenter Replication (XDR)
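The 64-byte index entry size makes it easy to estimate how much DRAM the primary index needs. The arithmetic below is a back-of-the-envelope sketch: the record count is a made-up example, and it assumes every replica of a record carries its own index entry.

```python
INDEX_ENTRY_BYTES = 64  # from the slide


def index_ram_bytes(n_records: int, replication_factor: int = 2) -> int:
    """DRAM needed for index entries across the whole cluster.

    Assumption: each copy of a record (master and replicas) has its own entry.
    """
    return n_records * INDEX_ENTRY_BYTES * replication_factor


total = index_ram_bytes(1_000_000_000)   # hypothetical: one billion records
print(total / 10**9, "GB across the cluster")   # prints 128.0 GB across the cluster
```

Dividing the result by the node count gives the per-node DRAM budget for the index, independent of how large the records stored on flash are.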
Mission
➤ Build the Modern Real-time Data Platform
1. Scaling the Internet of Everything
2. Pushing the limits of modern hardware
3. No data loss and No downtime
AEROSPIKE REAL-TIME DATA PLATFORM
Publish & Subscribe
• ASQL & NoSQL
• Powerful Aggregations (MapReduce++)
• Secondary Index Queries
Transactions
• User Defined Functions (UDF)
Security, Encryption, Compression
• Distribution – Shared Nothing, ACID, Scale-out, Multiple datacenters
• Data Types – Int, Str, Blob, List, Map, Large Stack, Large Set, Large List
• Storage – DRAM, SSD, HDD
How to get Aerospike?
Free Community Edition
➤ For developers looking for speed, stability, and transparent scaling as they grow
   All features for 2 nodes, 100GB, 1 cluster, 1 datacenter
   Community support
Enterprise Edition
➤ For mission critical apps needing to scale right from the start
   Unlimited number of nodes, clusters, data centers
   Cross data center replication
   Premium 24x7 support
   Priced by TBs of unique data (not replicas)
Questions