Power of the Log: LSM & Append Only Data Structures

71
The Power of the Log LSM & Append Only Data Structures Ben Stopford Confluent Inc

Transcript of Power of the Log: LSM & Append Only Data Structures

Page 1: Power of the Log: LSM & Append Only Data Structures

The Power of the Log LSM & Append Only Data Structures

Ben Stopford Confluent Inc

Page 2: Power of the Log: LSM & Append Only Data Structures

@benstopford

Page 3: Power of the Log: LSM & Append Only Data Structures

The Log Connectors Connectors

Producer Consumer

Streaming Engine

Kafka: a Streaming Platform

Page 4: Power of the Log: LSM & Append Only Data Structures

KAFKA’s Distributed Log

Linear Scans Append Only

Page 5: Power of the Log: LSM & Append Only Data Structures

Messaging is a Log-Shaped Problem

Linear Scans Append Only

Page 6: Power of the Log: LSM & Append Only Data Structures

Not all problems are Log-Shaped

Page 7: Power of the Log: LSM & Append Only Data Structures

Many problems benefit from being addressed in a “log-shaped” way

Page 8: Power of the Log: LSM & Append Only Data Structures

Supporting Lookups

Page 9: Power of the Log: LSM & Append Only Data Structures

Lookups in a log

Head Tail

Page 10: Power of the Log: LSM & Append Only Data Structures

Trees provide Selectivity

bob dave fred hary mike steve vince

Index

Page 11: Power of the Log: LSM & Append Only Data Structures

But the overarching structure implies Dispersed Writes

bob dave fred hary mike steve vince

Random IO

Page 12: Power of the Log: LSM & Append Only Data Structures

Log Structured Merge Trees 1996

Page 13: Power of the Log: LSM & Append Only Data Structures

Used in a range of modern databases

•  BigTable

•  HBase

•  LevelDB

•  SQLite4

•  RocksDB

•  MongoDB

•  WiredTiger

•  Cassandra

•  MySQL

•  InfluxDB ...

Page 14: Power of the Log: LSM & Append Only Data Structures

If a systems have a natural grain, it is one formed of sequential

operations which favour locality

Page 15: Power of the Log: LSM & Append Only Data Structures

Caching & Prefetching

L3 cache L2 cache

L1 cache

Pre-fetch is your friend

CPU Caches

Page Cache

Application-level caching

Disk Controller

Page 16: Power of the Log: LSM & Append Only Data Structures

Write efficiency comes from amortising writes into sequential

operations

Page 17: Power of the Log: LSM & Append Only Data Structures

Taken from ACMQueue: The Pathologies of Big Data

Page 18: Power of the Log: LSM & Append Only Data Structures

So if we go against the grain of the system, RAM can actually be

slower than disk

Page 19: Power of the Log: LSM & Append Only Data Structures

Going against the grain means dispersed operations that break locality

Poor Locality Good Locality

Page 20: Power of the Log: LSM & Append Only Data Structures

The beauty of the log lies in its sequentially

Linear Scans Append Only

Page 21: Power of the Log: LSM & Append Only Data Structures

LSM is about re-imagining search as as a “log-shaped” problem

Page 22: Power of the Log: LSM & Append Only Data Structures

Arrange writes to be Append Only

Append Only Journal

(Sequential IO)

Update in Place Ordered File (Random IO)

Bob = Carpenter

Bob = Carpenter

Bob = Cabinet Maker

Bob = Cabinet Maker

Page 23: Power of the Log: LSM & Append Only Data Structures

Avoid dispersed writes

Page 24: Power of the Log: LSM & Append Only Data Structures

Simple LSM

Page 25: Power of the Log: LSM & Append Only Data Structures

Writes are collected in memory Writes

sort

write to disk

older files

small index file

RAM

Page 26: Power of the Log: LSM & Append Only Data Structures

When enough have buffered, sort. Writes

write to disk

older files

small index file

Batched sorted

RAM

Page 27: Power of the Log: LSM & Append Only Data Structures

Write the sorted file to disk Writes

write to disk

older files

Small, sorted immutable file

Batched sorted

Page 28: Power of the Log: LSM & Append Only Data Structures

Repeat... Writes

write to disk

Older files New files

Batched sorted

Page 29: Power of the Log: LSM & Append Only Data Structures

Batching -> Fast Sequential IO Writes

write to disk

Older files New files

Batched

Sorted memtable

Page 30: Power of the Log: LSM & Append Only Data Structures

That’s the core write path

Page 31: Power of the Log: LSM & Append Only Data Structures

What about reads?

Page 32: Power of the Log: LSM & Append Only Data Structures

Search reverse-chronologically

older files

newer files

(1) Is “bob” here?

(2) Is “bob” here?

(3) Is “bob” here?

(4) Is “bob” here?

Page 33: Power of the Log: LSM & Append Only Data Structures

Worst Case We consult every file

Page 34: Power of the Log: LSM & Append Only Data Structures

We might have a lot of files!

Page 35: Power of the Log: LSM & Append Only Data Structures

LSM naturally optimises for writes, over reads

This is a reasonable tradeoff to make

Page 36: Power of the Log: LSM & Append Only Data Structures

Optimizing reads is easier than optimising writes

Page 37: Power of the Log: LSM & Append Only Data Structures

Optimisation 1 Bound the number of files

Page 38: Power of the Log: LSM & Append Only Data Structures

Create levels

Level-0

Level-1

Page 39: Power of the Log: LSM & Append Only Data Structures

Separate thread merges old files, de-duplicating them.

Level-0

Level-1

Page 40: Power of the Log: LSM & Append Only Data Structures

Separate thread merges old files, de-duplicating them.

Level-0

Level-1

Page 41: Power of the Log: LSM & Append Only Data Structures

Merging process is reminiscent of merge sort

Page 42: Power of the Log: LSM & Append Only Data Structures

Take this further with levels

Level-0

Level-1

Level-2

Level-3 Memtable

Page 43: Power of the Log: LSM & Append Only Data Structures

But single reads still require many individual lookups:

•  Number of searches: –  1 per base level –  1 per level above

Page 44: Power of the Log: LSM & Append Only Data Structures

Optimisation 2 Caching & Friends

Page 45: Power of the Log: LSM & Append Only Data Structures

Add Memory i.e. More Caching / Pre-fetch

Page 46: Power of the Log: LSM & Append Only Data Structures

Read Ahead & Prefetch

L3 cache L2 cache

L1 cache

Pre-fetch is your friend

Page Cache

Disk Controller

Page 47: Power of the Log: LSM & Append Only Data Structures

If only there was a more efficient way to avoid searching each file!

Page 48: Power of the Log: LSM & Append Only Data Structures

Elven Magic?

Page 49: Power of the Log: LSM & Append Only Data Structures

Bloom Filters Answers the question:

Do I need to look in this file to find the value for this key?

Size -> probability of false positive Key

Hash Function

Bit Set

Page 50: Power of the Log: LSM & Append Only Data Structures

Bloom Filters •  Space efficient, probabilistic

data structure

•  As keyspace grows: –  p(collision) increases

–  Index size is fixed

Page 51: Power of the Log: LSM & Append Only Data Structures

Many more degrees of freedom for optimising reads

RAM

Disk

file metadata & bloom filter

Page 52: Power of the Log: LSM & Append Only Data Structures

Log Structured Merge Trees •  A collection of small, immutable indexes

•  All sequential operations, de-duplicate by merging files

•  Index/Bloom in RAM to increase read performance

Page 53: Power of the Log: LSM & Append Only Data Structures

Subtleties •  Writes are 1 x IO (blind writes) , rather than 2 x IO’s

(read + modify)

•  Batching writes decreases write amplification. In trees leaf pages must be updated.

Page 54: Power of the Log: LSM & Append Only Data Structures

Immutability => Simpler locking semantics

Only memtable is mutable

Page 55: Power of the Log: LSM & Append Only Data Structures

Does it work? Lots of real world examples

Page 56: Power of the Log: LSM & Append Only Data Structures

Measureable in the real world

•  Innodb vs MyRocks results, taken from Mark Callaghan’s blog: http://bit.ly/2mhWT7p

•  There are many subtleties. Take all benchmarks with a pinch of salt.

Page 57: Power of the Log: LSM & Append Only Data Structures

Elements of Beauty •  Reframing the problem to be Log-Centric. To go with

the grain of the system.

•  Optimise for the harder problem

•  Compartmentalises writes (coordination) to a single point. Reads -> immutable structures.

Page 58: Power of the Log: LSM & Append Only Data Structures

Applies in many other areas •  Sequentiality –  Databases: write ahead logs

–  Columnar databases: Merge Joins

–  Kafka •  Immutability –  Snapshot isolation over explicit locking.

–  Replication (state machines replication)

Page 59: Power of the Log: LSM & Append Only Data Structures

Log-Centric Approaches Work in Applications too

Page 60: Power of the Log: LSM & Append Only Data Structures

Event Sourcing •  Journaling of

state changes

•  No “update in place”

Object

Journal

+ 10.36 - 12.12 + 23.70 + 13.33

Page 61: Power of the Log: LSM & Append Only Data Structures

CQRS Client

Command Query

Write Optimised

Read Optimised

log

Page 62: Power of the Log: LSM & Append Only Data Structures

How Applications or Services share state

Page 63: Power of the Log: LSM & Append Only Data Structures

Log-Centric Services

Writer Read-Replica

Read-Replica

Read-Replica

Writes are localised to a single service

Page 64: Power of the Log: LSM & Append Only Data Structures

Log-Centric Services

Writer Read-Replica

Read-Replica

Read-Replica Immutable log

Page 65: Power of the Log: LSM & Append Only Data Structures

Log-Centric Services

Writer Read-Replica

Read-Replica

Read-Replica Many, independent read replicas

Page 66: Power of the Log: LSM & Append Only Data Structures

Elements of Beauty •  Reframing the problem to be Log-Centric. To go with

the grain of the system.

•  Optimise for the harder problem

•  Compartmentalises writes (coordination) to a single point. Reads -> immutable structures.

Page 67: Power of the Log: LSM & Append Only Data Structures

Decentralised Design In both database design as well as in

application development

Page 68: Power of the Log: LSM & Append Only Data Structures

The Log is the central building block

Pushes us towards the natural grain of the system

Page 69: Power of the Log: LSM & Append Only Data Structures

The Log A single unifying abstraction

Page 70: Power of the Log: LSM & Append Only Data Structures

References LSM:

•  benstopford.com/2015/02/14/log-structured-merge-trees/

•  smalldatum.blogspot.co.uk/2017/02/using-modern-sysbench-to-compare.html

•  www.quora.com/How-does-the-Log-Structured-Merge-Tree-work

•  bLSM paper: http://bit.ly/2mT7Vje

Other

•  Pat Helland (Immutability) cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf

•  Peter Ballis (Coordination Avoidance): http://bit.ly/2m7XxnI

•  Jay Kreps: I Heart Logs (O’Reilly 2014)

•  The Data Dichotomy: http://bit.ly/2hk9c2K

Page 71: Power of the Log: LSM & Append Only Data Structures

Thank you

@benstopford http://benstopford.com

[email protected]