Иван Глушков (Echo)

NewSQL Overview

Ivan Glushkov ivan.glushkov@gmail.com

@gliush

About myself• MIPT • MCST, Elbrus compiler project • Echo, real-time social platform

(PaaS)

History of SQL• Relational Model in 1970:

disk-oriented, row, sql • “One size fits all” doesn’t

work: – Column-oriented for OLAP. – Key-Value storages,

Document storages

Startups lifecycle• Start: no money, no users,

open source • Middle: more users, storage

optimization • Final: plenty of users, storage

failure

Sharding in application• Additional logic in application • Difficulties with cross-sharding

transactions • More servers to maintain • More components — higher

prob for breakdown

New requirements• Huge and growing data sets

9M msg per hour in Facebook 50M msg per day in Twitter

• Generated by devices • High concurrency • Data model with some relations • Transactional integrity

Architecture change

Client Side

Server Side

Cloud Storage

Client Side

Server Side

Database

Architecture change

• ‘P’ in CAP is not discrete • ACID vs BASE • Managing partitions: detection,

limitations in operations, recovery

NoSQL• CAP: first ‘A’, then ‘C’ • Horizontal scaling • Custom API • Schemaless • Types: Key-Value, Document,

Graph, …

NewSQL: definition

“A DBMS that delivers the scalability !and flexibility promised by NoSQL!while retaining the support for SQL !queries and/or ACID, or to improve !performance for appropriate workloads.”!

451 Group

NewSQL: definition

❖ SQL as the primary interface!❖ ACID support for transactions!❖ Non-locking concurrency control!❖ High per-node performance!❖ Parallel, shared nothing architecture!

Michael Stonebraker

Shared nothing• Each node is independent and

self-sufficient • No shared memory or disk • Scale infinitely • Data partitioning • Slow multi-shards requests

Column-oriented DBMS• Store content by column • Efficient in hard disk access • Good for sparse data • Higher data compression • More reads/writes for large

records with a lot of fields

• Useful work is only 12%

• By Stonebraker and research group

Traditional DBMS

18% 20%

In-memory storage• High throughput • Low latency • No Buffer Management • If serialized, no Locking or

Latching

In-memory storage• Amazon price reduction !!!!

• Current price for 1TB: $14K

NewSQL: categories• New approaches: VoltDB,

Clustrix, NuoDB • New storage engines: TokuDB,

ScaleDB • Transparent clustering:

ScaleBase, dbShards

NuoDB• Multi-tier architecture: – Administrative: managing,

stats, cli, web-ui – Transactional: ACID except

‘D’, cache – Storage: key-value store

NuoDB• Everything is an ‘Atom’ • P2P, encrypted sessions • MVCC + Append-only storage

YCSB• Yahoo Cloud Serving

Benchmark • Key-value:

insert/read/update/scan • Measures – Performance – Scaling

NuoDB: YCSBThroughput, tps/nodes

275 000

550 000

825 000

1 100 000

1 2 4 8 16 24

Update latency, s

1 2 4 8 16 24

Read latency, s

1 2 4 8 16 24

Hosts: 32GB, Xeon 8 cores, 1TB HDD, 1Gb LAN5% updates, 95% reads

VoltDB• In-memory storage • Stored procedure interface • Serializing all data access • Horizontal partitioning • Replication (“K-safety”) • Snapshots + Command Logging

VoltDB• Open-source • Partitioning and Replication

control • Java + C++

VoltDB: kv bench

90%reads, 10%writes 3 nodes: 64GB, dual 2.93GHz intel 6 core processors

VoltDB: sql bench

26 SQL statements "per transaction

• Shared data • No single point

of failure

ScaleDBApplication

MySQL MySQL MySQL…

Mirrored"Storage! Mirrored"

Storage!

… Mirrored"Storage!

ClustrixDB• “Query fragment”: • read/write/execute funs • perform synchronisation • send rows to query

fragments • Partition, replication: “slices”

ClustrixDB• “Move query to the data” • Dynamic and transparent data

layout • Linear scale

• OLTP benchmark • 9 types of tables • “New-order transaction”

ClustrixDB• ~ 400GB • linear

ClustrixDB: examples• 30M users, 10M logins per day • 4.4B transactions per day • 1.08/4.69 Petabytes per month

writes/reads • 42 nodes, 336 cores, 2TB

memory, 46TB SSD

Conclusions• NewSQL is an established trend

with a number of options • Hard to pick one because

they're not on a common scale • No silver bullet • Need more efficient ways to

store and process it

Questions?Ivan Glushkov ivan.glushkov@gmail.com @gliush

Иван Глушков (Echo)

Internet

Transcript of Иван Глушков (Echo)

Иван Ильин

степан глушков исследование потребностей и интересов интернет-аудитории

Глушков - 68edu.ru · Глушков Виктор Андреевич По воспоминаниям педагогов прошлых лет, Виктор был очень

Роман Глушков "Холодная кровь"

Иван Пасечник

Иван Шестаков, Megogo

Виктор Глушков

Symantec, Швыдченко Иван

Иван Фролков

Иван Сусанин

Иван Федоров первопечатник

чашкин иван

Пятков Иван

иван ведър

Иван Иноземцев — Fantom

прозоров иван

Бунин Иван Алексеевич

богомолов иван

Скачков Иван Емельянович

Иван Стоянов - Възраждане