ScyllaDB @ Apache BigData, May 2016

Cloudius Systems presents: NoSQL goes NATIVE
Don Marti (@dmarti), Tzach Livyatan (@TzachL)


Scylla: A new Open Source NoSQL Database
● Capable of 1,800,000 operations per second PER NODE
● With predictable, low latencies
● Compatible with Apache Cassandra

BACKGROUND
● SQL: Structured, no scale
● Document store: No structure, some scale
● Column store: Some structure, scale out, awesome HA/DR
● Key-value: Simple, scale, not a real DB

THE POWER OF CASSANDRA AT THE SPEED OF REDIS

SOLUTION: SCYLLA DB

AWESOME REDUNDANCY & HA

+ Multi DC
+ Spark
+ CQL
+ Auto sharding
+ Wide rows

RESULTS: THROUGHPUT/SCALE UP

Benchmark configuration
● Server type: Rackspace Bare Metal IO Class v1
● CPU: Dual 2.8 GHz, 10 core Intel® Xeon® E5-2680 v2
● RAM: 128 GB
● Networking: Redundant 10 Gb/s connections in a high availability bond
● Data Disks: 2 * 1.6 TB PCIe flash cards
● OS: CentOS 7.2.1511, Kernel version: 3.10.0-327.10.1.el7.x86_64
● Java
  ○ Cassandra - Oracle jdk-8u65
  ○ Scylla - Open JDK 1.8 (used only for scylla-jmx)

Source: http://www.scylladb.com/technology/ycsb-cassandra-scylla/

LATENCY - AVG

LATENCY - P99

LATENCY - P99

FULLY COMPATIBLE

WHAT WOULD YOU DO WITH 1 MILLION TPS?
● Shrink your cluster by a factor of 10
● Handle 10X traffic spikes on Black Friday
● Faster repairs, faster scale out
● Get the most out of your data: run more queries
● Run administration operations while serving
● Stop using caches in front of the database

TECHNOLOGY: HOW IT WORKS

SCYLLA IS QUITE DIFFERENT
● Shard-per-core, no locks, no threads, zero-copy
● Reactor programming with C++14
● Our own efficient, DB-aware cache, not using Linux page cache
● Better storage engine
● Max out all HW resources: NUMA friendly, multiqueue NICs, etc.
● Userspace I/O scheduler
● Based on Seastar project
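The shard-per-core, no-locks idea from the list above can be illustrated with a minimal Python sketch. This is not Scylla or Seastar code: the `Shard`, `Node`, and `shard_for` names and the CRC32-based hash are hypothetical choices made only for illustration. The point is that each key has exactly one owning shard, so a shard's data has a single owner and needs no lock.

```python
import zlib


class Shard:
    """One shard per core: owns a private slice of the data (hypothetical)."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.rows = {}  # touched only by this shard, so no lock is needed

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)


def shard_for(key, shard_count):
    """Map a key to its owning shard with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


class Node:
    """Routes every request to the single shard that owns the key."""

    def __init__(self, shard_count):
        self.shards = [Shard(i) for i in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))].put(key, value)

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


node = Node(shard_count=4)
for i in range(100):
    node.put(f"user:{i}", i)
```

In a real shard-per-core design each shard would run on its own core and requests for a foreign key would be forwarded to the owner as a message; here the routing step alone is shown.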

SCYLLA ARCHITECTURE COMPARISON

● KVM was invented by Avi in 2006; development was managed by Dor
● It was a new hypervisor, arriving after VMware and Xen had dominated the market
● Through smart design choices and by leveraging Linux and the hardware, it became the best-performing hypervisor
  ○ KVM holds the SPECvirt performance record
  ○ KVM holds the max IOPS record
● The Open Virtualization Alliance includes hundreds of companies, including HP, IBM, Intel, AMD, Red Hat, etc.
● KVM is the engine behind many clouds such as OpenStack, IBM, NTT, Fujitsu, HP, Google, DigitalOcean, etc.

[Figure: Traditional stack vs. Seastar's sharded stack]

Traditional stack (Cassandra): threads and queues over the kernel's TCP/IP stack, scheduler, and NIC queues, with shared memory
● Lock contention
● Cache contention
● NUMA unfriendly

Seastar's sharded stack: per core, an application, TCP/IP, and a task scheduler with queues and an smp queue in userspace, over a NIC queue and DPDK; the kernel isn't involved
● No contention
● Linear scaling
● NUMA friendly

[Figure: a core running the database and its task scheduler (queues, smp queue) in userspace, above the kernel]

Scylla has its own task scheduler

[Figure: Traditional stack vs. Scylla's stack. Diagram labels: Promise/Task pairs grouped per CPU; Scheduler/CPU pairs; Thread/Stack pairs]

Traditional stack:
● Thread is a function pointer
● Stack is a byte array from 64k to megabytes
● Context switch cost is high. Large stacks pollute the caches

Scylla's stack:
● Promise is a pointer to an eventually computed value
● Task is a pointer to a lambda function
● No sharing, millions of parallel events
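The promise and task definitions on this slide can be made concrete with a small Python sketch. It is an illustration only, not Seastar's API: the `Scheduler` and `Promise` classes and their method names are hypothetical. A task is a plain callable placed on a per-core run queue, and a promise is a handle to a value that arrives later; continuations chained with `then` become tasks once the value is set.

```python
from collections import deque


class Scheduler:
    """Per-core run queue: tasks run to completion, one at a time."""

    def __init__(self):
        self.tasks = deque()

    def schedule(self, task):
        self.tasks.append(task)  # a task is just a callable (a lambda)

    def run(self):
        while self.tasks:
            self.tasks.popleft()()


class Promise:
    """Handle to a value that will be computed later."""

    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.ready = False
        self.value = None
        self.callbacks = []

    def set_value(self, value):
        self.ready = True
        self.value = value
        for cb in self.callbacks:
            # Each waiting continuation becomes a task on the run queue.
            self.scheduler.schedule(lambda cb=cb: cb(value))
        self.callbacks.clear()

    def then(self, fn):
        """Chain a continuation; returns a promise for fn's result."""
        nxt = Promise(self.scheduler)

        def run(value):
            nxt.set_value(fn(value))

        if self.ready:
            self.scheduler.schedule(lambda: run(self.value))
        else:
            self.callbacks.append(run)
        return nxt


sched = Scheduler()
read = Promise(sched)
result = read.then(lambda row: row.upper()).then(lambda s: s + "!")
sched.schedule(lambda: read.set_value("row"))  # simulate an I/O completion
sched.run()
```

No thread or per-request stack is involved: the only state kept per pending operation is the promise object and the small closures waiting on it.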

Unified cache

Cassandra:
● Key cache
● Row cache
● On-heap / Off-heap
● Linux page cache
● SSTables

Scylla:
● Unified cache
● SSTables

Annotations on the diagram: page faults, parasitic rows, tuning

Scylla has an I/O scheduler

[Figure: Traditional stack vs. Scylla stack. Annotations: "Max useful disk concurrency"; "I/O scheduled by priority here"]

Source: http://www.scylladb.com/2016/04/14/io-scheduler-1/
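The two annotations on this slide, a cap at the maximum useful disk concurrency and priority ordering for whatever waits above that cap, can be sketched in a few lines of Python. This is a simplified illustration, not Scylla's scheduler: the `IOScheduler` class, the numeric priorities, and the request names are all hypothetical.

```python
import heapq
import itertools


class IOScheduler:
    """Keeps at most `max_concurrency` requests on the disk; the rest wait
    in a userspace priority queue (lower number = higher priority)."""

    def __init__(self, max_concurrency):
        self.max_concurrency = max_concurrency
        self.in_flight = 0
        self.pending = []              # heap of (priority, seq, name)
        self.seq = itertools.count()   # tie-breaker keeps FIFO order per priority
        self.dispatched = []           # order in which requests reached the disk

    def submit(self, priority, name):
        heapq.heappush(self.pending, (priority, next(self.seq), name))
        self._dispatch()

    def complete(self):
        """Called when the disk finishes one request."""
        self.in_flight -= 1
        self._dispatch()

    def _dispatch(self):
        while self.pending and self.in_flight < self.max_concurrency:
            _, _, name = heapq.heappop(self.pending)
            self.in_flight += 1
            self.dispatched.append(name)


io = IOScheduler(max_concurrency=2)
io.submit(5, "compaction-1")
io.submit(5, "compaction-2")
io.submit(5, "compaction-3")   # queued: the disk is already at its limit
io.submit(1, "query-read")     # queued, but ahead of compaction-3
io.complete()                  # a slot frees up; the query goes first
io.complete()
```

Because the excess requests wait in userspace rather than in the disk's own queue, a late-arriving high-priority read can still jump ahead of background work that was submitted earlier.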

Scylla has an I/O scheduler

total, 14825839, 25002, 25002, 25002, 0.5, 0.3, 0.5, 5.0, 12.8, 22.2, 592.6, 0.00076
total, 14851605, 24980, 24980, 24980, 0.5, 0.3, 0.6, 6.7, 12.9, 21.6, 593.6, 0.00076
total, 14877443, 25004, 25004, 25004, 0.5, 0.3, 0.5, 6.5, 17.5, 38.8, 594.7, 0.00076
total, 14903361, 25017, 25017, 25017, 0.5, 0.3, 0.5, 6.9, 29.9, 39.9, 595.7, 0.00076
total, 14927655, 23553, 23553, 23553, 4.6, 0.3, 34.3, 66.8, 203.0, 255.9, 596.7, 0.00076
total, 14956055, 26384, 26384, 26384, 5.0, 0.4, 27.2, 53.9, 81.5, 99.9, 597.8, 0.00077
total, 14981910, 24987, 24987, 24987, 0.5, 0.3, 0.7, 6.2, 13.5, 25.0, 598.8, 0.00077
total, 15007673, 25003, 25003, 25003, 0.4, 0.3, 0.5, 3.7, 12.5, 24.5, 599.9, 0.00077
total, 15033484, 25006, 25006, 25006, 0.4, 0.3, 0.5, 3.8, 12.4, 32.8, 600.9, 0.00077
total, 15059256, 25004, 25004, 25004, 0.4, 0.3, 0.5, 2.2, 14.9, 33.0, 601.9, 0.00076
total, 15085126, 24994, 24994, 24994, 0.4, 0.3, 0.5, 2.4, 10.4, 19.4, 603.0, 0.00076
total, 15110948, 24988, 24988, 24988, 0.5, 0.3, 0.6, 4.1, 10.3, 19.9, 604.0, 0.00076
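The slide does not name the tool that produced these rows. They resemble the per-interval output of cassandra-stress, so the sketch below assumes that tool's usual column order (type, total ops, op/s, pk/s, row/s, mean, median, 95th, 99th, 99.9th, max, elapsed time, stderr) with latencies in milliseconds. Both the column names and the units are assumptions, as is the 50 ms threshold. Under those assumptions, a short parser can pick out the intervals where tail latency spikes.

```python
# Assumed column names, in the order cassandra-stress normally prints them.
FIELDS = ["type", "total_ops", "ops_s", "pk_s", "row_s", "mean_ms", "median_ms",
          "p95_ms", "p99_ms", "p999_ms", "max_ms", "time_s", "stderr"]


def parse_line(line):
    """Split one comma-separated row into a dict keyed by the assumed names."""
    parts = [p.strip() for p in line.split(",")]
    row = {"type": parts[0]}
    for name, raw in zip(FIELDS[1:], parts[1:]):
        row[name] = float(raw)
    return row


# Four consecutive rows copied from the slide.
sample = """\
total, 14903361, 25017, 25017, 25017, 0.5, 0.3, 0.5, 6.9, 29.9, 39.9, 595.7, 0.00076
total, 14927655, 23553, 23553, 23553, 4.6, 0.3, 34.3, 66.8, 203.0, 255.9, 596.7, 0.00076
total, 14956055, 26384, 26384, 26384, 5.0, 0.4, 27.2, 53.9, 81.5, 99.9, 597.8, 0.00077
total, 14981910, 24987, 24987, 24987, 0.5, 0.3, 0.7, 6.2, 13.5, 25.0, 598.8, 0.00077
"""

rows = [parse_line(line) for line in sample.splitlines()]
# Flag intervals whose (assumed) 99th-percentile latency exceeds 50 ms.
spikes = [r["time_s"] for r in rows if r["p99_ms"] > 50.0]
```

With this reading, the throughput column stays close to 25,000 operations per second throughout, while two adjacent intervals show a brief latency excursion before the values settle again.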

Compatibility (and speed): Repair

Jan 13 17:33:44 server-02.localdomain scylla[7730]: [shard 27] stream_session - [Stream #cbbf8dc1-ba1b-11e5-a51a-00000000000b ID#0] Creating new streaming plan for repair-in
Jan 13 17:33:44 server-02.localdomain scylla[7730]: [shard 27] stream_session - [Stream #cbbf8dc1-ba1b-11e5-a51a-00000000000b ID#0] Received streaming plan for repair-in
Jan 13 17:33:45 server-02.localdomain scylla[7730]: [shard 27] stream_session - [Stream #cbc02a01-ba1b-11e5-a51a-00000000000b ID#0] Creating new streaming plan for repair-in
Jan 13 17:33:45 server-02.localdomain scylla[7730]: [shard 27] stream_session - [Stream #cbc02a01-ba1b-11e5-a51a-00000000000b ID#0] Received streaming plan for repair-in

SCYLLA as an INFRASTRUCTURE
● Scale up with the number of cores
● Kernel bypass for direct networking and block I/O
● Good match for upcoming Non Volatile Memory technology
● High availability with gossip and flexible replication
● Runs everywhere: Physical, virtual, containers
● Can be integrated with microservices with its own httpd

Monitoring Scylla

Connections to Apache Ecosystem: today

Connections to Apache Ecosystem: Soon

WHAT’S NEXT?
❏ Build a community
❏ Core database improvements
❏ VERTICAL: Spark, Solr, distributed SQL engines
❏ HORIZONTAL: Microservice integration, more protocols
❏ Upcoming releases: Scylla 1.0.3, Scylla 1.1

SCYLLA, NoSQL GOES NATIVE
Thank you.