Scylla db deck, july 2017


Transcript of Scylla db deck, july 2017

Page 1: Scylla db deck, july 2017

Dor Laor, CEO, co-founder

ScyllaDB - Achieve No-Compromise Performance and Availability

Page 2: Scylla db deck, july 2017

Today we will cover:

+ Intro: Who we are, what we do, who uses it
+ Why ScyllaDB
+ Why should you care
+ How we made design decisions to achieve no-compromise performance and availability

Page 3: Scylla db deck, july 2017

Introduction

+ Founded by KVM hypervisor creators
+ Q2 2014 - Pivot to the database world
+ Q3 2015 - Decloak during Cassandra Summit 2015, Beta
+ Q1 2016 - General Availability
+ Q3 2016 - First Scylla Summit: 100+ attendees
+ Q1 2017 - Completed B round
+ $25MM in funding
+ HQs: Palo Alto, CA; Herzliya, Israel
+ 42+ employees, hiring!

Page 4: Scylla db deck, july 2017

What we do: Scylla, towards the best NoSQL

+ > 1 million OPS per node

+ < 1ms 99% latency

+ Auto tuned

+ Scale up and out

+ Open source

+ Large community (piggyback on Cassandra)

+ Blends into the ecosystem: Spark, Presto, time series, search, ..

Page 5: Scylla db deck, july 2017

Where is Scylla deployed?

Page 6: Scylla db deck, july 2017

Today we will cover:

+ Intro: Who we are, what we do, who uses it
+ Why we started ScyllaDB
+ Why should you care
+ How we made design decisions to achieve no-compromise performance and availability

Page 7: Scylla db deck, july 2017

Why ?@#$%$$%^?

Page 8: Scylla db deck, july 2017

Scylla benchmark by Samsung

[Chart: op/s]

Page 9: Scylla db deck, july 2017

Why did we start Scylla? (cont.)

+ Originally it was about performance/efficiency only
+ Over time, we understood we can deliver more:
  + SLA between background and foreground tasks
  + Work well on any given hardware (back pressure)
  + Deliver consistent, low 99th percentile latency
  + Reduction in admin effort
  + Low latency in the face of failures (hot cache load balancing)
  + High observability

Page 10: Scylla db deck, july 2017

             Cassandra                               Scylla
Throughput:  Cannot utilize multi-core efficiently   Scales linearly - shard-per-core
Latency:     High due to Java and the JVM's GC       Low and consistent - own cache
Complexity:  Intricate tuning and configuration      Auto tuned, dynamic scheduling
Admin:       Maintenance impacts performance         SLA guarantee for admin vs serving

Page 11: Scylla db deck, july 2017

Case study: Document column family

+ Outbrain is the world’s largest content discovery platform.
+ Over 557 million unique visitors from across the globe.
+ 250 billion personalized content recommendations every month.

Page 12: Scylla db deck, july 2017

Outbrain: Cassandra plus Memcached

+ First read from memcached, go to Cassandra on misses.
+ Pain:
  + Stale data from cache
  + Complexity
  + Cold cache -> C* gets full volume

[Diagram: Read Microservice / Write Process]

Page 13: Scylla db deck, july 2017

Scylla/Cassandra side by side deployment

+ Writes are written in parallel to C* and Scylla
+ Reads are done in parallel:
  + Memcached + Cassandra
  + Scylla (no cache at all)

[Diagram: Read Microservice / Write Process]

Page 14: Scylla db deck, july 2017

Scylla (w/o cache) vs Cassandra + Memcached

                  Scylla        Cassandra                                Diff%
Requests/Minute   12,000,000    500,000 (memcached handles 11,500,000)   24X
AVG Latency       4 ms          8 ms                                     2X
Max Latency       8 ms          35 ms                                    3.5X
Hardware          9 machines    30+9 machines                            4.3X

[Charts: Cassandra’s latency, Scylla’s latency]

Page 15: Scylla db deck, july 2017

What does it mean for a non-Cassandra user?

+ Throughput, latency and scale benefits
+ Wide range of big data integrations: KairosDB, Spark, JanusGraph, Presto, Kafka, Elastic
+ Best HA/DR in the industry
+ Stop using caches in front of the database
+ Consolidate HBase, Redis, MySQL, Mongo and others

Page 16: Scylla db deck, july 2017

Assorted Quotes

Page 17: Scylla db deck, july 2017

Today we will cover:

+ Intro: Who we are, what we do, who uses it
+ Why we started ScyllaDB
+ Why should you care
+ How we made design decisions to achieve no-compromise performance and availability

Page 18: Scylla db deck, july 2017

Design decisions: #1 The trivials

Page 19: Scylla db deck, july 2017

Design decisions: #2 Compatibility

+ SSTable file format
+ Configuration file format
+ CQL language
+ CQL native protocol
+ JMX management protocol
+ Management command line

Page 20: Scylla db deck, july 2017

Double cluster - Migration w/o downtime

[Diagram: App, CQL proxy, Cassandra, Scylla]
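The double-cluster migration on this slide can be sketched as a dual-write proxy: every write is mirrored to both the old and the new store while reads keep coming from the old one until cutover. The class and dict-backed "stores" below are illustrative stand-ins, not the actual CQL proxy.

```python
# Minimal sketch of double-cluster migration, assuming dict-backed stores
# in place of real CQL sessions.

class DualWriteProxy:
    def __init__(self, old_store, new_store):
        self.old = old_store
        self.new = new_store

    def write(self, key, value):
        self.old[key] = value  # source of truth during migration
        self.new[key] = value  # mirrored so the new cluster catches up

    def read(self, key):
        return self.old.get(key)  # switch to self.new at cutover


cassandra, scylla = {}, {}
proxy = DualWriteProxy(cassandra, scylla)
proxy.write("doc:1", "hello")
```

Once the new cluster is backfilled and validated (Outbrain's side-by-side reads on the next slides), `read` is flipped to the new store and the old cluster can be retired.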

Page 21: Scylla db deck, july 2017

Design decisions: #3 All things async

Page 22: Scylla db deck, july 2017

Design decisions: #4 Shard per core

[Diagram: Threads vs Shards]
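Shard-per-core can be sketched as a pure routing function: each partition key deterministically owns one core, so cores never contend on shared data. The hash and shard count below are illustrative only; Scylla actually derives the owning shard from the partition's Murmur3 token.

```python
import hashlib

N_SHARDS = 8  # stand-in for the machine's core count

def shard_of(partition_key: bytes) -> int:
    # Deterministic stand-in for the token function (Scylla uses
    # Murmur3 tokens, not MD5).
    token = int.from_bytes(hashlib.md5(partition_key).digest()[:8], "little")
    return token % N_SHARDS

# Every request for the same key lands on the same shard, so per-shard
# data structures need no locks at all.
```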

Page 23: Scylla db deck, july 2017

SCYLLA DB: Network Comparison

Traditional stack (Cassandra): the kernel owns TCP/IP and the scheduler; application threads feed per-thread queues over the kernel's NIC queues. Result: lock contention, cache contention, NUMA-unfriendly memory access.

SeaStar’s sharded stack (one instance per core): the application runs userspace TCP/IP and its own task scheduler over smp queues and a dedicated NIC queue via DPDK; the kernel isn’t involved. Result: no contention, linear scaling, NUMA friendly.

Page 24: Scylla db deck, july 2017

Scylla has its own task scheduler

Traditional stack: each CPU's scheduler runs threads, where a thread is a function pointer and its stack is a byte array from 64k to megabytes. Context switch cost is high, and the large stacks pollute the caches.

Scylla’s stack: each CPU's scheduler runs promises and tasks, where a promise is a pointer to an eventually computed value and a task is a pointer to a lambda function. No sharing, and millions of parallel events.
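The promise/task model above can be illustrated with Python's asyncio, used here purely as an analogy for Seastar's C++ futures: the Future is the eventually computed value, and the awaiting coroutine is the cheap continuation that replaces a full thread stack.

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()          # promise: no value computed yet
    loop.call_soon(fut.set_result, 42)  # an I/O completion fulfills it later
    return await fut                    # continuation resumes when ready

# The pending operation costs a small heap object, not a 64k+ thread stack.
result = asyncio.run(main())
```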

Page 25: Scylla db deck, july 2017

SCYLLA IS DIFFERENT

❏ DMA
❏ Log structured merge tree
❏ DB-aware cache
❏ Userspace I/O scheduler
❏ NUMA friendly
❏ Log structured allocator
❏ Zero copy
❏ Thread per core
❏ Lock-free
❏ Task scheduler
❏ Reactor programming
❏ C++14
❏ Multi queue
❏ Poll mode
❏ Userspace TCP/IP

Page 26: Scylla db deck, july 2017

Scylla vs C* latency by

Page 27: Scylla db deck, july 2017

Design Decision: #5 Unified cache vs C* cache plethora

Cassandra: key cache, row cache, on-heap/off-heap, Linux page cache, SSTables - complex tuning.

Scylla: unified cache over SSTables.

Page 28: Scylla db deck, july 2017

Unified cache vs Cassandra cache plethora

Cassandra: key cache, row cache, on-heap/off-heap, Linux page cache, SSTables.

Scylla: unified cache over SSTables.

Parasitic rows: the Linux page cache caches a whole SSTable page (4k) just to hold your data (300b) - 300b/4kb.
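The 300b/4kb figure works out as follows; this is just the slide's arithmetic restated:

```python
page_size = 4096  # Linux page cache granularity (4k)
row_size = 300    # the row you actually wanted (300b)

useful_fraction = row_size / page_size  # ~7.3% of each cached page is data
wasted_fraction = 1 - useful_fraction   # ~92.7% of cache memory is "parasitic"
```

A DB-aware unified cache stores rows, not pages, so the wasted fraction goes away.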

Page 29: Scylla db deck, july 2017

Unified cache vs Cassandra cache plethora

Cassandra: key cache, row cache, on-heap/off-heap, Linux page cache, SSTables.

Scylla: unified cache over SSTables.

Cassandra’s page-cache read path: the app thread hits a page fault and is suspended; the kernel initiates I/O to the SSD (context switch); when the I/O completes (interrupt, another context switch) the kernel maps the page and resumes the thread.

Page 30: Scylla db deck, july 2017

Design decisions: #6 I/O scheduler

[Screenshot: Cassandra streaming configuration]

Page 31: Scylla db deck, july 2017

Scylla I/O Scheduling

Query, Commitlog and Compaction each submit to their own queue; a userspace I/O scheduler dispatches them to the disk, keeping it at the max useful disk concurrency. No I/O is queued in the FS/device.
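The per-class queues can be sketched as a toy dispatcher; the share values below are made up, and a real scheduler tracks accumulated cost per class rather than simply picking the highest share.

```python
from collections import deque

class IoScheduler:
    # One queue per request class, dispatched by (illustrative) shares so
    # that background work like compaction cannot starve foreground queries.
    def __init__(self, shares):
        self.shares = shares
        self.queues = {cls: deque() for cls in shares}

    def submit(self, cls, request):
        self.queues[cls].append(request)

    def dispatch_one(self):
        # Pick the non-empty class with the highest share.
        ready = [cls for cls, q in self.queues.items() if q]
        if not ready:
            return None
        best = max(ready, key=lambda cls: self.shares[cls])
        return self.queues[best].popleft()

sched = IoScheduler({"query": 100, "commitlog": 50, "compaction": 10})
sched.submit("compaction", "compact-sstable-1")
sched.submit("query", "read-key-7")
```

Even though the compaction request arrived first, the query is dispatched first; compaction proceeds only when higher-share queues are empty.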

Page 32: Scylla db deck, july 2017

I/O scheduler result

Page 33: Scylla db deck, july 2017

Design Decision: #8..#10 Workload conditioning

The Seastar scheduler arbitrates Memtable, Compaction, Query, Repair and Commitlog work across the CPU, SSD and WAN; a compaction backlog monitor and a memory monitor feed back into the scheduler to adjust priorities.

Page 34: Scylla db deck, july 2017

Workload Conditioning in practice

Disk can’t keep up: workload conditioning will figure out the right request rate

Page 35: Scylla db deck, july 2017

Today we will cover:

+ Intro: Who we are, what we do, who uses it
+ Why we started ScyllaDB
+ How we made design decisions to achieve no-compromise performance and availability

Page 36: Scylla db deck, july 2017

Upcoming releases

+ Enterprise release, based on 1.6
+ 1.7 - May 2017
  + Counters
  + New intra-node sharding algorithm
  + sstableloader from 2.2/3.x
  + Debian
+ 2.0 - August 2017
  + Materialized views
  + Execution blocks (CPU cache optimization which boosts performance)
  + Partial row cache (for wide row streaming)

Page 37: Scylla db deck, july 2017

Scylla Beyond Cassandra

[Diagram: core database, growing vertically and horizontally]

Page 38: Scylla db deck, july 2017

Q&A

Resources

slideshare.net/ScyllaDB

[email protected] (@DorLaor)

[email protected] (@AviKivity)

@scylladb

http://bit.ly/2oHAfok

youtube.com/c/scylladb

scylladb.com/blog

Page 39: Scylla db deck, july 2017

NoSQL GOES NATIVE

Thank you.