Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards...

105
Inside the InfluxDB Storage Engine Paul Dix paul@influxdb.com @pauldix

Transcript of Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards...

Page 1: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Inside the InfluxDB Storage Engine

Paul Dix [email protected]

@pauldix

Page 2: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

TSDB

Page 3: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Inverted Index

Page 4: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

preliminary intro materials…

Page 5: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Everything is indexed by time and series

Page 6: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Shards

10/11/2015 10/12/2015

Data organized into Shards of time, each is an underlying DBefficient to drop old data

10/13/201510/10/2015

Page 7: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

InfluxDB data

temperature,device=dev1,building=b1 internal=80,external=18 1443782126

Page 8: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

InfluxDB data

temperature,device=dev1,building=b1 internal=80,external=18 1443782126

Measurement

Page 9: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

InfluxDB data

temperature,device=dev1,building=b1 internal=80,external=18 1443782126

Measurement Tags

Page 10: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

InfluxDB data

temperature,device=dev1,building=b1 internal=80,external=18 1443782126

Measurement Tags Fields

Page 11: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

InfluxDB data

temperature,device=dev1,building=b1 internal=80,external=18 1443782126

Measurement Tags Fields Timestamp

Page 12: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

InfluxDB data

temperature,device=dev1,building=b1 internal=80,external=18 1443782126

Measurement Tags Fields Timestamp

We actually store up to ns scale timestamps but I couldn’t fit on the slide

Page 13: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Each series and field to a unique IDtemperature,device=dev1,building=b1#internal

temperature,device=dev1,building=b1#external

1

2

Page 14: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Data per ID is tuples ordered by timetemperature,device=dev1,building=b1#internal

temperature,device=dev1,building=b1#external

1

2

1 (1443782126,80)

2 (1443782126,18)

Page 15: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Arranging in Key/Value Stores

1,1443782126

Key Value

80

ID Time

Page 16: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Arranging in Key/Value Stores

1,1443782126

Key Value

802,1443782126 18

Page 17: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Arranging in Key/Value Stores

1,1443782126

Key Value

80

2,1443782126 181,1443782127 81 new data

Page 18: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Arranging in Key/Value Stores

1,1443782126

Key Value

80

2,1443782126 181,1443782127 81key space

is ordered

Page 19: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Arranging in Key/Value Stores

1,1443782126

Key Value

80

2,1443782126 181,1443782127 81

2,1443782256 152,1443782130 17

3,1443700126 18

Page 20: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Many existing storage engines have this model

Page 21: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

New Storage Engine?!

Page 22: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

First we used LSM Trees

Page 23: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

deletes expensive

Page 24: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

too many open file handles

Page 25: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Then mmap COW B+Trees

Page 26: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

write throughput

Page 27: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

compression

Page 28: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

met our requirements

Page 29: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

High write throughput

Page 30: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Awesome read performance

Page 31: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Better Compression

Page 32: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Writes can’t block reads

Page 33: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Reads can’t block writes

Page 34: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Write multiple ranges simultaneously

Page 35: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Hot backups

Page 36: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Many databases open in a single process

Page 37: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Enter InfluxDB’s Time Structured Merge Tree

(TSM Tree)

Page 38: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Enter InfluxDB’s Time Structured Merge Tree

(TSM Tree) like LSM, but different

Page 39: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Components

WALIn

memory cache

Index Files

Page 40: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Components

WALIn

memory cache

Index Files

Similar to LSM Trees

Page 41: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Components

WALIn

memory cache

Index Files

Similar to LSM Trees

Same

Page 42: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Components

WALIn

memory cache

Index Files

Similar to LSM Trees

Same like MemTables

Page 43: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Components

WALIn

memory cache

Index Files

Similar to LSM Trees

Same like MemTables like SSTables

Page 44: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

awesome time series data

WAL (an append only file)

Page 45: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

awesome time series data

WAL (an append only file)

in memory index

Page 46: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

awesome time series data

WAL (an append only file)

in memory index

on disk index

(periodic flushes)

Page 47: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

awesome time series data

WAL (an append only file)

in memory index

on disk index

(periodic flushes)

Memory mapped!

Page 48: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

TSM File

Page 49: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

TSM File

Page 50: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

TSM File

Page 51: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

TSM File

Page 52: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

TSM File

Page 53: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

TSM File

Page 54: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Compression

Page 55: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Timestamps: encoding based on precision and deltas

Page 56: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Timestamps (best case): Run length encoding

Deltas are all the same for a block

Page 57: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Timestamps (good case): Simple8B

Ann and Moffat in "Index compression using 64-bit words"

Page 58: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Timestamps (worst case): raw values

nano-second timestamps with large deltas

Page 59: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

float64: double deltaFacebook’s Gorilla - google: gorilla time series facebook

https://github.com/dgryski/go-tsz

Page 60: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

booleans are bits!

Page 61: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

int64 uses double delta, zig-zagzig-zag same as from Protobufs

Page 62: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

string uses Snappysame compression LevelDB uses

(might add dictionary compression)

Page 63: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

UpdatesWrite, resolve at query

Page 64: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Deletestombstone, resolve at query & compaction

Page 65: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Compactions• Combine multiple TSM files

• Put all series points into same file

• Series points in 1k blocks

• Multiple levels

• Full compaction when cold for writes

Page 66: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Example Query

select percentile(90, value) from cpu where time > now() - 12h and “region” = ‘west’ group by time(10m), host

Page 67: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Example Query

select percentile(90, value) from cpu where time > now() - 12h and “region” = ‘west’ group by time(10m), host

How to map to series?

Page 68: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Inverted Index!

Page 69: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Inverted Indexcpu,host=A,region=west#idle -> 1 cpu,host=B,region=west#idle -> 2 series to ID

Page 70: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Inverted Indexcpu,host=A,region=west#idle -> 1 cpu,host=B,region=west#idle -> 2

cpu -> [idle] measurement to fields

series to ID

Page 71: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Inverted Indexcpu,host=A,region=west#idle -> 1 cpu,host=B,region=west#idle -> 2

cpu -> [idle]

host -> [A, B]

measurement to fields

host to values

series to ID

Page 72: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Inverted Indexcpu,host=A,region=west#idle -> 1 cpu,host=B,region=west#idle -> 2

cpu -> [idle]

host -> [A, B]

region -> [west]

measurement to fields

host to values

region to values

series to ID

Page 73: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Inverted Indexcpu,host=A,region=west#idle -> 1 cpu,host=B,region=west#idle -> 2

cpu -> [idle]

host -> [A, B]

region -> [west]

cpu -> [1, 2] host=A -> [1] host=B -> [1] region=west -> [1, 2]

measurement to fields

host to values

region to values

series to ID

postings lists

Page 74: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Index V1

• In-memory

• Load on boot

• Memory constrained

• Slower boot times with high cardinality

Page 75: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Index V2

Page 76: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

in memory index on disk index (do we already have?)

time series meta data

Page 77: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

in memory index on disk index (do we already have?)

time series meta data

nope

WAL (an append only file)

Page 78: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

in memory index on disk index (do we already have?)

time series meta data

nope

WAL (an append only file)

on disk indices

(periodic flushes)

Page 79: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

in memory index on disk index (do we already have?)

time series meta data

nope

WAL (an append only file)

on disk indices

(periodic flushes)

(compactions)on disk index

Page 80: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015
Page 81: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Index File Layout

Page 82: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015
Page 83: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015
Page 84: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Robin Hood Hashing

• Can fully load table

• No linked lists for lookup

• Perfect for read-only hashes

Page 85: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

[ , , , , ]

[ 0, 1, 2, 3, 4 ] Positions

[ 0, 0, 0, 0, 0 ]

Keys

Probe Lengths

Page 86: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

[ , , , , ]

[ 0, 1, 2, 3, 4 ] Positions

[ 0, 0, 0, 0, 0 ]

Keys

Probe Lengths

A -> 0

Page 87: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

[ A, , , , ]

[ 0, 1, 2, 3, 4 ] Positions

[ 0, 0, 0, 0, 0 ]

Keys

Probe Lengths

A -> 0

Page 88: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

[ A, , , , ]

[ 0, 1, 2, 3, 4 ] Positions

[ 0, 0, 0, 0, 0 ]

Keys

Probe Lengths

B -> 1

Page 89: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

[ A, B, , , ]

[ 0, 1, 2, 3, 4 ] Positions

[ 0, 0, 0, 0, 0 ]

Keys

Probe Lengths

B -> 1

Page 90: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

[ A, B, , , ]

[ 0, 1, 2, 3, 4 ] Positions

[ 0, 0, 0, 0, 0 ]

Keys

Probe Lengths

C -> 1

Page 91: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

[ A, B, C, , ]

[ 0, 1, 2, 3, 4 ] Positions

[ 0, 0, 0, 0, 0 ]

Keys

Probe Lengths

C -> 2

Page 92: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

[ A, B, C, , ]

[ 0, 1, 2, 3, 4 ] Positions

[ 0, 0, 1, 0, 0 ]

Keys

Probe Lengths

C -> probe 1

Page 93: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

[ A, B, C, , ]

[ 0, 1, 2, 3, 4 ] Positions

[ 0, 0, 1, 0, 0 ]

Keys

Probe Lengths

D -> 0

Page 94: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

[ A, B, C, , ]

[ 0, 1, 2, 3, 4 ] Positions

[ 0, 0, 1, 0, 0 ]

Keys

Probe Lengths

D -> probe 1

Page 95: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

[ A, D, C, , ]

[ 0, 1, 2, 3, 4 ] Positions

[ 0, 1, 1, 0, 0 ]

Keys

Probe Lengths

B -> probe 1

Page 96: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

[ A, D, C, , ]

[ 0, 1, 2, 3, 4 ] Positions

[ 0, 1, 1, 0, 0 ]

Keys

Probe Lengths

B -> probe 2

Page 97: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

[ A, D, C, B, ]

[ 0, 1, 2, 3, 4 ] Positions

[ 0, 1, 1, 2, 0 ]

Keys

Probe Lengths

B -> probe 2

Page 98: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Steal from probe rich, give to probe poor

Page 99: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Refinement: average probe

Page 100: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

[ A, D, C, B, ]

[ 0, 1, 2, 3, 4 ] Positions

[ 0, 1, 1, 2, 0 ]

Keys

Probe LengthsAverage: 1

Page 101: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

[ A, D, C, B, ]

[ 0, 1, 2, 3, 4 ] Positions

[ 0, 1, 1, 2, 0 ]

Keys

Probe LengthsAverage: 1

D -> hashes to 0 + 1

Page 102: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Cardinality Estimation

Page 103: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

HyperLogLog++

Page 104: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

More Details & Write Up to Come!

Page 105: Inside the InfluxDB Storage Engine.ppt · Shards 10/11/2015 10/12/2015 Data organized into Shards of time, each is an underlying DB efficient to drop old data 10/10/2015 10/13/2015

Thank you.Paul Dix

[email protected] @pauldix