Getting 20x Performance Improvement in Data Routing

Post on 26-Jan-2017

SignalFx

Getting to 20x Performance Improvement on our Data Routing Layer

Rajiv Kurian, Software Engineer
rajiv@signalfx.com

Agenda

1. Introduction
2. Properties of modern memory systems
3. Evolution of our data router
4. Results
5. Q&A (hopefully)

What does SignalFx do?

• High resolution:
  • Any mix of resolutions up to 1 sec

• Streaming analytics:
  • Custom analytics pipelines at any scale
  • Streaming dashboards update within seconds

• Multidimensional metrics:
  • Add dimensions to model metrics however you like
  • Use them to aggregate & filter (e.g. 99th-percentile-of-latency-by-service-by-customer) interactively on streaming data

SignalFx is an advanced monitoring platform for modern applications

What is the data routing layer

SignalFx data router: raw data in, processed data out

[Diagram: PUBLISHER 0, PUBLISHER 1 and PUBLISHER 2 on one side, SUBSCRIBER 0, SUBSCRIBER 1 and SUBSCRIBER 2 on the other; an example data point carries Time Series ID: 1212450 and Payload: 0b1000100010]

SignalFx data router - subscribers

[Diagram: subscriptions between PUBLISHER 0-2 and SUBSCRIBER 0-2; an example subscription carries Subscriber ID: 1224525566 and Time Series ID: 1212450]

Routing table

[Diagram: the routing table maps a key (e.g. Key: 128759) to a Set<Subscriber>; columns: Key, Subscribers; label: Routing data]

Properties of modern memory systems

[Diagram: CORE 1 and CORE 2, each with its own L1 D and L1 I caches and an L2, sharing an L3 in front of main memory; data moves between the levels in cache lines]

Cache Lines

• The memory subsystem makes a few bets to help us:
  • Temporal locality
  • Spatial locality
  • Prefetching
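To make the spatial-locality and prefetching bets concrete, here is a minimal sketch that sums the same array twice: once in index order and once in large hops. The class name, array size and stride are illustrative assumptions, not values from the talk. Both methods return the same sum; any timing difference depends on the machine and the JIT, so treat the printed numbers as indicative only.

```java
import java.util.Arrays;

final class LocalityDemo {
    // Visits elements in index order, so neighbouring longs share a cache line.
    static long sequentialSum(long[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Visits every element exactly once, but hops `stride` elements per step.
    static long stridedSum(long[] data, int stride) {
        long sum = 0;
        for (int start = 0; start < stride; start++) {
            for (int i = start; i < data.length; i += stride) {
                sum += data[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] data = new long[1 << 22]; // 32 MB of longs (assumed size)
        Arrays.fill(data, 1L);

        long t0 = System.nanoTime();
        long a = sequentialSum(data);
        long t1 = System.nanoTime();
        long b = stridedSum(data, 4096); // 32 KB hops (assumed stride)
        long t2 = System.nanoTime();

        System.out.println("sequential: " + a + " in " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("strided:    " + b + " in " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```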

[Diagrams: animation frames showing numbered cache lines (1 to 8) moving between main memory, L3, L2 and L1 for CORE 1 and CORE 2, followed by three frames pairing a core with L1, L2 and main memory]

The evolution of our data routing layer

Routing table v1

HashMap<Long, HashSet<Subscriber>>

Data Key    Set<Subscriber>
1212450     {1228, 4412}
3989        {12244}
8921224     {3244}
245819      {3244, 12244, 1228}

Subscriber Objects

Subscriber ID    Host    Port
1228             ….      ….
12244            ….      ….
4412             ….      ….
3244             ….      ….
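The v1 layout can be sketched in a few lines of Java. Only the type HashMap<Long, HashSet<Subscriber>> comes from the slides; the Subscriber fields, the class name RoutingTableV1 and the method names are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical subscriber record; the slides only show ID, Host and Port columns.
final class Subscriber {
    final long id;
    final String host;
    final int port;

    Subscriber(long id, String host, int port) {
        this.id = id;
        this.host = host;
        this.port = port;
    }
}

// Sketch of v1: boxed Long keys, each pointing at its own HashSet of subscribers.
final class RoutingTableV1 {
    private final Map<Long, HashSet<Subscriber>> table = new HashMap<>();

    void subscribe(long timeSeriesId, Subscriber subscriber) {
        table.computeIfAbsent(timeSeriesId, k -> new HashSet<>()).add(subscriber);
    }

    Set<Subscriber> getSubscribers(long timeSeriesId) {
        Set<Subscriber> subscribers = table.get(timeSeriesId);
        return subscribers == null ? Collections.<Subscriber>emptySet() : subscribers;
    }

    public static void main(String[] args) {
        RoutingTableV1 table = new RoutingTableV1();
        table.subscribe(1212450L, new Subscriber(1228, "host-a", 9000));
        table.subscribe(1212450L, new Subscriber(4412, "host-b", 9000));
        System.out.println(table.getSubscribers(1212450L).size()); // number of subscribers for the key
    }
}
```

Every key is a boxed Long and every value is a separate HashSet object, so a single lookup already touches several unrelated heap locations.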

But …

We want to be able to support millions of subscriptions per publisher, while doing more than 2 million queries per second

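HashMap<Long, HashSet<Subscriber>>

[Diagram: a lookup follows pointers from the hash map's entries (key*, value*) to the boxed long key and to the Set<Subscriber>, which is itself built from further lists; the hops are numbered 1, 2, 3, 4, ????]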

So why did we need a better data router?

• Lookups are O(1) ….
• Cache misses
• High memory overhead

Routing table v2 - bloom filters

A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set.

False positive matches are possible, but false negatives are not; thus a Bloom filter has a 100% recall rate.

Routing table v2 - write

Subscriber bloom filter (before):  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Key 127829:                        Hash 1 = 3, Hash 2 = 9, Hash 3 = 12
Subscriber bloom filter (after):   0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0

Routing table v2 - read hit

Subscriber bloom filter:           0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0
Key 127829:                        Hash 1 = 3, Hash 2 = 9, Hash 3 = 12
Bits found set:                    0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0

Routing table v2 - read miss

Subscriber bloom filter:           0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0
Key 120422:                        Hash 1 = 3, Hash 2 = 9, Hash 3 = 14
Bits found set (bit 14 is not):    0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0
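The write and read steps shown in these slides can be sketched as a small Bloom filter over long keys. This is an illustrative sketch, not the implementation from the talk: the class name, the mixing function and the double-hashing scheme are all assumptions, since the slides do not say which hash functions were used.

```java
// Minimal Bloom filter for long keys, backed by a long[] used as a bit array.
final class LongBloomFilter {
    private final long[] bits;
    private final int numBits;
    private final int numHashes;

    LongBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bits = new long[(numBits + 63) / 64];
    }

    // Assumed 64-bit mixing function (a MurmurHash3-style finalizer).
    private static long mix(long x) {
        x ^= x >>> 33;
        x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33;
        x *= 0xc4ceb9fe1a85ec53L;
        x ^= x >>> 33;
        return x;
    }

    // Bit position for the i-th hash of a key, derived by double hashing.
    private int position(long key, int i) {
        long h1 = mix(key);
        long h2 = mix(h1) | 1L;
        return (int) Long.remainderUnsigned(h1 + i * h2, numBits);
    }

    // Write: set one bit per hash function.
    void add(long key) {
        for (int i = 0; i < numHashes; i++) {
            int p = position(key, i);
            bits[p >>> 6] |= 1L << (p & 63);
        }
    }

    // Read: a single unset bit proves the key was never added.
    boolean mightContain(long key) {
        for (int i = 0; i < numHashes; i++) {
            int p = position(key, i);
            if ((bits[p >>> 6] & (1L << (p & 63))) == 0) {
                return false;
            }
        }
        return true;
    }
}
```

A key that was added always reports true; a key that was never added usually reports false, but may occasionally report true when its bits were set by other keys.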

[Diagram: a Bloom filter stored as an array of longs (long 0 to long 39); three numbered accesses (1, 2, 3) land on different parts of the array]

Typical bloom filter get

lookupKey: Hash 1 = 43, Hash 2 = 168, Hash 3 = 312

[Diagram: Bloom Filter 1 drawn as an array of longs, with the probed longs highlighted (labelled long 4, long 5, long 6)]

Routing table v2

Key: Hash 1 = 43, Hash 2 = 168, Hash 3 = 312

[Diagram: the same three probes (1, 2, 3) are repeated against each subscriber's Bloom filter in turn, every filter being its own array of longs]

num_sub * 3 cache misses
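The cost described here comes from the shape of the lookup: every query has to ask every subscriber's filter. A minimal sketch of that scan follows, with each filter abstracted as a java.util.function.LongPredicate so that any Bloom filter's membership test can be plugged in. The class and method names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Sketch of v2: one membership filter per subscriber, scanned on every lookup.
final class RoutingTableV2 {
    // filters.get(i) answers "might subscriber i be subscribed to this key?"
    private final List<LongPredicate> filters = new ArrayList<>();

    // Registers a subscriber's filter and returns the subscriber's index.
    int addSubscriber(LongPredicate mightContain) {
        filters.add(mightContain);
        return filters.size() - 1;
    }

    // O(num_subscribers): each filter is probed, whether or not it matches.
    List<Integer> getSubscribers(long key) {
        List<Integer> matches = new ArrayList<>();
        for (int i = 0; i < filters.size(); i++) {
            if (filters.get(i).test(key)) {
                matches.add(i);
            }
        }
        return matches;
    }
}
```

With three hash probes per filter, each iteration of the loop can touch up to three cache lines in a different array, which is where the num_sub * 3 figure comes from.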

Progress so far

                  GetSubscribers()      Memory
Naive hash map    O(1)                  high
Bloom filter      O(num_subscribers)    low

So why did we need a better data router?

• CPU intensive
  • What did the profiler say? Data router -> 32%

• Scaled poorly
  • CPU performance got worse with the number of subscribers

So how can we do better?

Specialize: we have a limited number of subscribers present at any time, fewer than 128.

ID transformation

[Diagram: subscriber IDs (1228, 4412, 12244, 3244) are transformed into subscriber IDs in the range 0 to 127; labels: subscriber coordination, publisher assignment]

Producer routing table

Data Key (8 bytes)    Set<Subscriber>

Subscribe message

Subscriber ID (0 - 127)    Key (64 bit)
0                          3890

Routing table V3

Key     Value
3890    0000000000…..0001 (16 bytes bit set)
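Once subscriber IDs fit in 0 to 127, a set of subscribers needs only 128 bits, that is, two longs. The sketch below shows one way to do the ID transformation and the 16-byte bit set. The slides do not describe how dense IDs are assigned, so DenseIdAllocator (which simply hands out the next unused number) is a hypothetical stand-in, and all names are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical allocator: maps arbitrary subscriber IDs to dense IDs 0-127.
final class DenseIdAllocator {
    static final int MAX_SUBSCRIBERS = 128;
    private final Map<Long, Integer> denseIds = new HashMap<>();

    // Returns the existing dense ID, or assigns the next unused one.
    int denseIdFor(long subscriberId) {
        Integer existing = denseIds.get(subscriberId);
        if (existing != null) {
            return existing;
        }
        if (denseIds.size() >= MAX_SUBSCRIBERS) {
            throw new IllegalStateException("more than 128 subscribers");
        }
        int next = denseIds.size();
        denseIds.put(subscriberId, next);
        return next;
    }
}

// 128 subscribers fit in two longs: 16 bytes per routing-table value.
final class SubscriberBits {
    private long lo; // dense IDs 0-63
    private long hi; // dense IDs 64-127

    void add(int denseId) {
        check(denseId);
        if (denseId < 64) {
            lo |= 1L << denseId;
        } else {
            hi |= 1L << (denseId - 64);
        }
    }

    boolean contains(int denseId) {
        check(denseId);
        long word = denseId < 64 ? lo : hi;
        return (word & (1L << (denseId & 63))) != 0;
    }

    int count() {
        return Long.bitCount(lo) + Long.bitCount(hi);
    }

    private static void check(int denseId) {
        if (denseId < 0 || denseId > 127) {
            throw new IllegalArgumentException("dense ID out of range: " + denseId);
        }
    }
}
```

Adding or testing a subscriber is a shift and a mask on one of the two longs, with no per-subscriber objects involved.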

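Routing table V3 - regular hash map

[Diagram: a regular hash map still follows pointers from its entries (key*, value*) to the boxed long key and to the value, which is now a bit set of two longs (long 1, long 2); the hops are numbered 1, 2, 3, 4]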

Routing table V4 - single array of longs

Each entry occupies three consecutive longs in a single array: Key, Value 0-63 and Value 64-127. Unused longs are marked Empty.

[Diagram frames, in order:]
1. The array starts out entirely Empty (room for four entries).
2. Key 0, hash 0: written to entry 0.
3. Key 1, hash 0: entry 0 is already taken by Key 0, so Key 1 is written to entry 1.
4. Key 2, hash 3: written to entry 3 (labelled Key 3 in the frame); entry 2 stays Empty.
5. Lookup of Key 1, hash 0 (annotated "1").
6. The two value longs of the matching entry form a BitSet (0, 2, 4, … 127 shown) that selects entries from the Subscribers Array (Subscriber 0, Subscriber 1, … Subscriber 127).
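The frames above suggest an open-addressing hash map stored in one long[], with three longs per entry and linear probing on collisions. The following is a minimal sketch under that reading, not the production code: the class name, the hash function, the Long.MIN_VALUE empty marker and the fixed power-of-two capacity (no resizing, no removal) are all assumptions.

```java
import java.util.function.IntConsumer;

// Sketch of v4: keys and 128-bit subscriber sets packed into one long[].
final class RoutingTableV4 {
    private static final long EMPTY = Long.MIN_VALUE; // assumed marker for a free entry
    private static final int ENTRY_LONGS = 3;         // key, bits 0-63, bits 64-127

    private final long[] slots;
    private final int capacity; // number of entries, a power of two
    private int size;

    RoutingTableV4(int capacity) {
        if (capacity < 2 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two >= 2");
        }
        this.capacity = capacity;
        this.slots = new long[capacity * ENTRY_LONGS];
        for (int i = 0; i < slots.length; i += ENTRY_LONGS) {
            slots[i] = EMPTY;
        }
    }

    // Assumed hash: multiply by a large odd constant and keep the high bits.
    private static int hash(long key) {
        return (int) ((key * 0x9E3779B97F4A7C15L) >>> 32);
    }

    // Linear probing: returns the base index of the key's entry, or of the
    // first empty entry on its probe path. At least one entry is always empty.
    private int find(long key) {
        int entry = hash(key) & (capacity - 1);
        while (true) {
            int base = entry * ENTRY_LONGS;
            if (slots[base] == key || slots[base] == EMPTY) {
                return base;
            }
            entry = (entry + 1) & (capacity - 1);
        }
    }

    void subscribe(long key, int denseId) {
        if (key == EMPTY) throw new IllegalArgumentException("reserved key");
        if (denseId < 0 || denseId > 127) throw new IllegalArgumentException("dense ID out of range");
        int base = find(key);
        if (slots[base] == EMPTY) {
            if (size + 1 >= capacity) {
                throw new IllegalStateException("table full; resizing is omitted in this sketch");
            }
            slots[base] = key;
            size++;
        }
        if (denseId < 64) {
            slots[base + 1] |= 1L << denseId;
        } else {
            slots[base + 2] |= 1L << (denseId - 64);
        }
    }

    // Calls action with each subscribed dense ID, lowest first.
    void forEachSubscriber(long key, IntConsumer action) {
        int base = find(key);
        if (slots[base] == EMPTY) {
            return;
        }
        for (long w = slots[base + 1]; w != 0; w &= w - 1) {
            action.accept(Long.numberOfTrailingZeros(w));
        }
        for (long w = slots[base + 2]; w != 0; w &= w - 1) {
            action.accept(64 + Long.numberOfTrailingZeros(w));
        }
    }
}
```

Because the key and both value longs sit next to each other, a lookup that finds its key on the first probe reads 24 contiguous bytes. The dense IDs it yields can then index directly into an array of subscriber objects.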

Progress so far

                      GetSubscribers()      Memory
Naive hash map        O(1)                  high
Bloom filter          O(num_subscribers)    low
Optimized hash map    O(1)                  medium

Results (library)

Microbenchmark
• Method:
  • Heap: 3G
  • Number of subscribers: 128
  • Number of time series: 1048576
  • All time series have a random number of subscribers: [1, 128]
  • 2 million random queries

                      Writes            Reads             Memory
Naive hash map        34469 ms (42x)    11900 ms (21x)    2.6 GB (27x)
Bloom filter          31710 ms (39x)    54995 ms (97x)    80 MB (0.83x)
Optimized hash map    805 ms (1x)       565 ms (1x)       96 MB (1x)

Results (Application)

[Chart: CPU %, annotated "6 subscribers: 45 %"]

[Chart: Garbage collection, annotated "6 subscribers: 63 %"]

Closing remarks / rant

• “Write code first, optimize later”….

• Analyze your data
• Metrics
• Logging

Thank You!

Rajiv Kurian
rajiv@signalfx.com
@rzidane360

WE’RE HIRING
jobs@signalfx.com

@SignalFx - signalfx.com

Q & A