ScyllaDB deck, July 2017
Dor Laor, CEO, co-founder
ScyllaDB - Achieve No-Compromise Performance and Availability
Today we will cover:
+ Intro: Who we are, what we do, who uses it
+ Why ScyllaDB
+ Why should you care
+ How we made design decisions to achieve no-compromise performance and availability
Introduction
+ Founded by KVM hypervisor creators
+ Q2 2014 - Pivot to the database world
+ Q3 2015 - Decloak during Cassandra Summit 2015, Beta
+ Q1 2016 - General Availability
+ Q3 2016 - First Scylla Summit: 100+ attendees
+ Q1 2017 - Completed B round; $25MM in funding
+ HQs: Palo Alto, CA; Herzliya, Israel
+ 42+ employees, hiring!
What we do: Scylla, towards the best NoSQL
+ > 1 million OPS per node
+ < 1ms 99% latency
+ Auto tuned
+ Scale up and out
+ Open source
+ Large community (piggyback on Cassandra)
+ Blends into the ecosystem: Spark, Presto, time series, search, ...
Where is Scylla deployed?
Today we will cover:
+ Intro: Who we are, what we do, who uses it
+ Why we started ScyllaDB
+ Why should you care
+ How we made design decisions to achieve no-compromise performance and availability
Why ?@#$%$$%^?
Scylla benchmark by Samsung
[Chart: throughput in op/s]
Why we started Scylla (cont.)
+ Originally it was about performance/efficiency only
+ Over time, we understood we can deliver more:
  + SLA between background and foreground tasks
  + Work well on any given hardware (back pressure)
  + Deliver consistent, low 99th-percentile latency
  + Reduction in admin effort
  + Low latency in the face of failures (hot cache load balancing)
  + High observability
             Cassandra                                Scylla
Throughput:  Cannot utilize multi-core efficiently    Scales linearly - shard-per-core
Latency:     High due to Java and the JVM's GC        Low and consistent - own cache
Complexity:  Intricate tuning and configuration       Auto-tuned, dynamic scheduling
Admin:       Maintenance impacts performance          SLA guarantee for admin vs. serving
+ Outbrain is the world’s largest content discovery platform.
+ Over 557 million unique visitors from across the globe.
+ 250 billion personalized content recommendations every month.
Outbrain: Cassandra plus Memcached
Case study: Document column family
+ First read from memcached, go to Cassandra on misses.
+ Pain:
  + Stale data from cache
  + Complexity
  + Cold cache -> C* gets full volume
[Diagram: Read Microservice / Write Process]
Scylla/Cassandra side-by-side deployment
+ Writes are written in parallel to C* and Scylla
+ Reads are done in parallel:
  + Memcached + Cassandra
  + Scylla (no cache at all)
[Diagram: Read Microservice / Write Process]
Scylla (w/o cache) vs Cassandra + Memcached

                  Scylla        Cassandra                      Diff
Requests/Minute   12,000,000    500,000 (memcached handles     24X
                                the other 11,500,000)
AVG Latency       4 ms          8 ms                           2X
Max Latency       8 ms          35 ms                          3.5X
Hardware          9 machines    30+9 machines                  4.3X
[Charts: Cassandra's latency vs Scylla's latency]
What does it mean for a non-Cassandra user?
+ Throughput, latency, and scale benefits
+ Wide range of big data integration: KairosDB, Spark, JanusGraph, Presto, Kafka, Elastic
+ Best HA/DR in the industry
+ Stop using caches in front of the database
+ Consolidate HBase, Redis, MySQL, Mongo and others
Assorted Quotes
Today we will cover:
+ Intro: Who we are, what we do, who uses it
+ Why we started ScyllaDB
+ Why should you care
+ How we made design decisions to achieve no-compromise performance and availability
Design decisions: #1 The trivials
+ SSTable file format
+ Configuration file format
+ CQL language
+ CQL native protocol
+ JMX management protocol
+ Management command line
Design decisions: #2 Compatibility
Double cluster - migration w/o downtime
[Diagram: App writes through a CQL proxy to both Cassandra and Scylla]
Design decisions: #3 All things async
[Diagram: Threads vs Shards]
Design decisions: #4 Shard per core
SCYLLA DB: Network comparison

Traditional stack (Cassandra):
+ Kernel TCP/IP with a kernel scheduler feeding per-thread queues into the NIC queues
+ Lock contention, cache contention, NUMA unfriendly

Seastar's sharded stack (Scylla):
+ Userspace TCP/IP on DPDK; the kernel isn't involved
+ Per-core task scheduler with SMP queues straight to the NIC queues
+ No contention, linear scaling, NUMA friendly
Scylla has its own task scheduler

Traditional stack:
+ A thread is a function pointer
+ A stack is a byte array from 64k to megabytes
+ Context switch cost is high; large stacks pollute the caches

Scylla's stack:
+ A promise is a pointer to an eventually computed value
+ A task is a pointer to a lambda function
+ No sharing; millions of parallel events
+ One scheduler per CPU running promise/task chains
SCYLLA IS DIFFERENT
❏ DMA
❏ Log-structured merge tree
❏ DB-aware cache
❏ Userspace I/O scheduler
❏ NUMA friendly
❏ Log-structured allocator
❏ Zero copy
❏ Thread per core
❏ Lock-free
❏ Task scheduler
❏ Reactor programming
❏ C++14
❏ Multi-queue
❏ Poll mode
❏ Userspace TCP/IP
[Chart: Scylla vs C* latency]
Design decision: #5 Unified cache vs C* cache plethora

Cassandra: Key cache, Row cache, On-heap/Off-heap, Linux page cache, SSTables - complex tuning
Scylla: Unified cache over SSTables

+ Parasitic rows: the Linux page cache works in 4k SSTable pages, so caching 300b of your data still costs a full 4kb page (300b/4kb useful)
+ Page-fault path through the kernel: page fault -> suspend thread -> initiate I/O -> context switch -> I/O completes -> interrupt -> context switch -> map page -> resume thread
Cassandra Streaming configuration
Design decisions: #6 I/O scheduler

Scylla I/O scheduling: Query, Commitlog, and Compaction each get their own queue in front of a userspace I/O scheduler, which feeds the disk.
+ Maintains the max useful disk concurrency
+ Excess I/O waits in Scylla's queues, not queued in the FS/device
I/O scheduler result
Seastar scheduler
[Diagram: Memtable, Compaction, Query, Repair, and Commitlog feed the Seastar scheduler; a Compaction Backlog Monitor and a Memory Monitor adjust their priorities; resources: SSD, WAN, CPU]
Design Decision: #8..#10 Workload conditioning
Workload conditioning in practice
+ Disk can't keep up: workload conditioning will figure out the right request rate
Today we will cover:
+ Intro: Who we are, what we do, who uses it
+ Why we started ScyllaDB
+ How we made design decisions to achieve no-compromise performance and availability
Upcoming releases
+ Enterprise release, based on 1.6
+ 1.7 - May 2017
  + Counters
  + New intra-node sharding algorithm
  + SStableloader from 2.2/3.x
  + Debian
+ 2.0 - August 2017
  + Materialized views
  + Execution blocks (CPU cache optimization which boosts performance)
  + Partial row cache (for wide-row streaming)
Scylla Beyond Cassandra
[Diagram: core database scaling vertically and horizontally]
Q&A
Resources
slideshare.net/ScyllaDB
[email protected] (@DorLaor)
[email protected] (@AviKivity)
@scylladb
http://bit.ly/2oHAfok
youtube.com/c/scylladb
scylladb.com/blog
NoSQL GOES NATIVE
Thank you.