Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity...

39
Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

Transcript of Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity...

Page 1: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

Cloudius Systems presents:

NoSQL goes NATIVEAvi Kivity (@AviKivity)

Percona Live 2016

Page 2: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

Capable of 1,000,000 operations per secondPER NODE

With predictable, low latenciesCompatible with Apache Cassandra

Scylla: A new NoSQL Database

Page 3: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016
Page 4: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

BACKGROUND

SQL: Structured, no scale

Document store: No structureSome scale

Column store: Some structureScale outAwesome HA/DR

Key-value: SimpleScaleNot a real DB

Page 5: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

YESSCALE UP OR SCALE OUT?

Page 6: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

CPUs AND DISKS ARE NETWORK DEVICES

Page 7: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

THE POWER OFCASSANDRA AT THE SPEED OF REDIS

SOLUTION: SCYLLA DB

Page 8: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

AWESOME REDUNDANCY & HA

+ Multi DC

+ Spark

+ CQL

+ Auto sharding

+ Wide rows

Page 9: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

RESULTS: THROUGHPUT/SCALE UP

Page 10: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

LATENCY - AVG

Page 11: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

LATENCY - P99

Page 12: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

FULLY COMPATIBLE

Page 13: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

SELF TUNING

No need to guess thread pool sizes

No need to guess compaction/streaming bandwidth

No fiddling with heap sizes, on/off heap

No endless tweaking of Garbage Collector parameters

Page 14: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

WHAT WOULD YOU DO WITH 1 MILLION TPS?

Shrink your cluster by a factor of X10

Handle 10X traffic spikes on Black Friday

Model your data instead of use Cassandra as K/V

Get the most out of your data - Run more queries

Maintain your clusters while serving

Stop using caches in front of the database

Page 15: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

TECHNOLOGY:HOW IT WORKS

Page 16: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

SeaStar

Before: Thread model After: SeaStar shards

Page 17: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

SCYLLA IS QUITE DIFFERENT

Shard-per-core, no locks, no threads, zero-copy

Reactor programing with C++14

Our own efficient,DB-aware cache, not using Linux page cache

Better storage engine than cassandra

Max out all HW resources - NUMA friendly, multiqueue NICs, etc

Page 18: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

SCYLLA DB: NETWORK COMPARISON

Kernel

Cassandra

TCP/IPScheduler

queuequeuequeuequeuequeue

threads

NIC

Queues

Kernel

Traditional stack SeaStar’s sharded stack

Memory

Application

TCP/I

P

Task Scheduler

queuequeuequeuequeuequeue

smp queue

NIC

Queue

DPDK

Kernel

(isn’t

involved)

Userspace

Application

TCP/I

P

Task Scheduler

queuequeuequeuequeuequeue

smp queue

NIC

Queue

DPDK

Kernel

(isn’t

involved)

Userspace

Application

TCP/I

P

Task Scheduler

queuequeuequeuequeuequeue

smp queue

NIC

Queue

DPDK

Kernel

(isn’t

involved)

Userspace

Core

Database

TCP/I

P

Task Scheduler

queuequeuequeuequeuequeue

smp queue

NIC

Queue

DPDK

Kernel

(isn’t

involved)

Userspace

Page 19: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

Scylla has its own task schedulerTraditional stack Scylla’s stack

Promise

Task

Promise

Task

Promise

Task

Promise

Task

CPU

Promise

Task

Promise

Task

Promise

Task

Promise

Task

CPU

Promise

Task

Promise

Task

Promise

Task

Promise

Task

CPU

Promise

Task

Promise

Task

Promise

Task

Promise

Task

CPU

Promise

Task

Promise

Task

Promise

Task

Promise

Task

CPU

Promise is a

pointer to

eventually

computed value

Task is a

pointer to a

lambda function

Scheduler

CPU

Scheduler

CPU

Scheduler

CPU

Scheduler

CPU

Scheduler

CPU

Threa

d

Stack

Threa

d

Stack

Threa

d

Stack

Threa

d

Stack

Threa

d

Stack

Threa

d

Stack

Threa

d

Stack

Threa

d

Stack

Thread is a

function pointer

Stack is a byte

array from 64k

to megabytes

Page 20: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

Unified cacheCassandra Scylla

Key cache

Row cache

On-heap /Off-heap

Linux page cache

SSTables

Unified cache

SSTables

TuningParasitic rowsPage faults

Page 21: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

total, 14825839, 25002, 25002, 25002, 0.5, 0.3, 0.5, 5.0, 12.8, 22.2, 592.6, 0.00076

total, 14851605, 24980, 24980, 24980, 0.5, 0.3, 0.6, 6.7, 12.9, 21.6, 593.6, 0.00076

total, 14877443, 25004, 25004, 25004, 0.5, 0.3, 0.5, 6.5, 17.5, 38.8, 594.7, 0.00076

total, 14903361, 25017, 25017, 25017, 0.5, 0.3, 0.5, 6.9, 29.9, 39.9, 595.7, 0.00076

total, 14927655, 23553, 23553, 23553, 4.6, 0.3, 34.3, 66.8, 203.0, 255.9, 596.7, 0.00076

total, 14956055, 26384, 26384, 26384, 5.0, 0.4, 27.2, 53.9, 81.5, 99.9, 597.8, 0.00077

total, 14981910, 24987, 24987, 24987, 0.5, 0.3, 0.7, 6.2, 13.5, 25.0, 598.8, 0.00077

total, 15007673, 25003, 25003, 25003, 0.4, 0.3, 0.5, 3.7, 12.5, 24.5, 599.9, 0.00077

total, 15033484, 25006, 25006, 25006, 0.4, 0.3, 0.5, 3.8, 12.4, 32.8, 600.9, 0.00077

total, 15059256, 25004, 25004, 25004, 0.4, 0.3, 0.5, 2.2, 14.9, 33.0, 601.9, 0.00076

total, 15085126, 24994, 24994, 24994, 0.4, 0.3, 0.5, 2.4, 10.4, 19.4, 603.0, 0.00076

total, 15110948, 24988, 24988, 24988, 0.5, 0.3, 0.6, 4.1, 10.3, 19.9, 604.0, 0.00076

Compatibility (and speed): Repair

Jan 13 17:33:44 server-02.localdomain scylla[7730]: [shard 27] stream_session - [Stream #cbbf8dc1-ba1b-11e5-a51a-00000000000b ID#0]

Creating new streaming plan for repair-in

Jan 13 17:33:44 server-02.localdomain scylla[7730]: [shard 27] stream_session - [Stream #cbbf8dc1-ba1b-11e5-a51a-00000000000b ID#0]

Received streaming plan for repair-in

Jan 13 17:33:45 server-02.localdomain scylla[7730]: [shard 27] stream_session - [Stream #cbc02a01-ba1b-11e5-a51a-00000000000b ID#0]

Creating new streaming plan for repair-in

Jan 13 17:33:45 server-02.localdomain scylla[7730]: [shard 27] stream_session - [Stream #cbc02a01-ba1b-11e5-a51a-00000000000b ID#0]

Received streaming plan for repair-in

Page 22: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

I/O Scheduling

Query

Commitlog

Compaction

Queue

Queue

Queue

I/O

SchedulerDisk

Page 23: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

STATUS

1.0 Released

Major CQL feature support

Good match for upcoming Non Volatile Memory technology

High availability with gossip and flexible replication

Runs everywhere: Physical, virtual, containers

Can be integrated with microservices with its own httpd

Page 24: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

❏Build a community❏Core database improvements❏VERTICAL: Spark, Solr, distributed SQL engines❏HORIZONTAL: Microservice integration, more

protocols

WHAT’S NEXT?

Page 25: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

SCYLLA, NoSQL GOES NATIVEThank you.

Page 26: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016
Page 27: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016
Page 28: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016
Page 29: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

Basic model

■ Futures

■ Promises

■ Continuations

Page 30: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

F-P-C defined: Future

A future is a result of a computation

that may not be available yet. ■ Data buffer from the network

■ Timer expiration

■ Completion of a disk write

■ Result computation that requires the values from one or

more other futures.

Page 31: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

F-P-C defined: Promise

A promise is an object or function

that provides you with a future, with

the expectation that it will fulfil the

future.

Page 32: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

F-P-C defined: Continuation

A continuation is a computation that

is executed when a future becomes

ready (yielding a new future).

Page 33: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

Basic future/promise

future<int> get(); // promises an int will be produced eventuallyfuture<> put(int) // promises to store an int

void f() {get().then([] (int value) {

put(value + 1).then([] {std::cout << "value stored successfully\n";

});});

}

Page 34: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

Chaining

future<int> get(); // promises an int will be produced eventuallyfuture<> put(int) // promises to store an int

void f() {get().then([] (int value) {

return put(value + 1);}).then([] {

std::cout << "value stored successfully\n";});

}

Page 35: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

Parallelism

void f() {

std::cout << "Sleeping... " << std::flush;

using namespace std::chrono_literals;

sleep(200ms).then([] { std::cout << "200ms " << std::flush; });

sleep(100ms).then([] { std::cout << "100ms " << std::flush; });

sleep(1s).then([] { std::cout << "Done.\n"; engine_exit(); });

}

Page 36: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

Zero copy friendly

future<temporary_buffer> connected_socket::read(size_t n);

■ temporary_buffer points at driver-provided pages if possible

■ discarded after use

Page 37: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

Zero copy friendly (2)

future<size_t>connected_socket::write(temporary_buffer);

■ Future becomes ready when TCP window allows sending more data (usually immediately)

■ temporary_buffer discarded after data is ACKed■ can call delete[] or decrement a reference count

Page 38: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

Dual Networking Stack

Networking API

Seastar (native) Stack POSIX (hosted) stack

Linux kernel (sockets)

User-space TCP/IP

Interface layer

DPDK

Virtio Xen

igb ixgb

Page 39: Cloudius Systems presents - Percona · Cloudius Systems presents: NoSQL goes NATIVE Avi Kivity (@AviKivity) Percona Live 2016

Disk I/O

■ Zero copy using Linux AIO and O_DIRECT■ Some operations using worker threads (open()

etc.)■ Plans for direct NVMe support