Gnocchi v3 brownbag

HUAWEI CANADA Gnocchi v3 Monitoring the next million time-series Gordon Chung, Engineer

Transcript of Gnocchi v3 brownbag

Page 1: Gnocchi v3 brownbag

HUAWEI CANADA

Gnocchi v3

Monitoring the next million time-series

Gordon Chung, Engineer

Page 2: Gnocchi v3 brownbag

HISTORY

do you remember the time…

Page 3: Gnocchi v3 brownbag

built to address storage performance issues encountered in Ceilometer

Page 4: Gnocchi v3 brownbag

designed to be used to store time series and their associated resource metadata

[Architecture diagram]

Load balancer → API, API, API

Indexer (SQL): stores metadata

Metric storage (Ceph): stores aggregated measurement data

metricd computation workers: background workers which aggregate data to minimise query computations

Page 5: Gnocchi v3 brownbag

MY USE CASE

tired of you tellin' the story your way…

Page 6: Gnocchi v3 brownbag

collect usage information for hundreds of thousands of metrics* over many months for use in capacity planning recommendations and scheduling

* data is received in batches every x minutes, not streamed

Page 7: Gnocchi v3 brownbag

GETTING STARTED

wanna be startin’ somethin’…

Page 8: Gnocchi v3 brownbag

HARDWARE

▪ 3 physical hosts

▪ 24 physical cores

▪ 256GB memory

▪ a bunch of 10K 1TB disks

▪ 1Gb network

Page 9: Gnocchi v3 brownbag

SOFTWARE

▪ Gnocchi 2.1.x (June 3rd 2016)

▪ 32 API processes, 1 thread

▪ 3 metricd agents (24 workers each)

▪ PostgreSQL 9.2.15 – single node

▪ Redis 3.0.6 (for coordination) – single node

▪ Ceph 10.2.1 – 3 nodes (20 OSDs, 1 replica)

Page 10: Gnocchi v3 brownbag

POST ~1000 generic resources with 20 metrics each (20K metrics)

60 measures per metric.

policy rolls up to minute, hour, and day.

8 different aggregations each*.

* min, max, sum, average, median, 95th percentile, count, stdev
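The size of this test workload follows from the figures on the slide. A rough sizing sketch, where the variable names are illustrative only and "aggregate series" is taken to mean one (granularity, aggregation) pair per metric:

```python
# Figures taken from the slide; variable names are illustrative only.
resources = 1000
metrics_per_resource = 20
measures_per_metric = 60
granularities = ["minute", "hour", "day"]
aggregations = ["min", "max", "sum", "average", "median",
                "95th percentile", "count", "stdev"]

metrics = resources * metrics_per_resource
measures_per_batch = metrics * measures_per_metric
# One aggregate series per (granularity, aggregation) pair: an assumption
# about how the work is counted, not something the slide states.
aggregates_per_metric = len(granularities) * len(aggregations)
aggregate_series = metrics * aggregates_per_metric

print(metrics, measures_per_batch, aggregates_per_metric, aggregate_series)
```

Under that reading each metric carries 24 aggregate series, which lines up with the "24 aggregates" quoted on the computation time slide.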

Page 11: Gnocchi v3 brownbag

METRIC PROCESSING RATE

• rate drops significantly after initial push

• high variance in processing rate

Page 12: Gnocchi v3 brownbag

uhhh… wtf?

this doesn’t happen with the NFS backend.

Page 13: Gnocchi v3 brownbag

“LEARNING” HOW TO USE CEPH

everybody's somebody's fool…

Page 14: Gnocchi v3 brownbag

give it more power!

add another node… and 10 more OSDs… and more placement groups… and some SSDs for journals

Page 15: Gnocchi v3 brownbag

~65% better POST rate

Page 16: Gnocchi v3 brownbag

~27% better aggregation rate

Page 17: Gnocchi v3 brownbag

METRIC PROCESSING RATE (with more power)

• same drop in performance

Page 18: Gnocchi v3 brownbag

““LEARNING”” HOW TO USE CEPH

this time around…

Page 19: Gnocchi v3 brownbag

CEPH CONFIGURATIONS

original conf

[osd]
osd journal size = 10000
osd pool default size = 3
osd pool default min size = 2
osd crush chooseleaf type = 1

good enough conf

[osd]
osd journal size = 10000
osd pool default size = 3
osd pool default min size = 2
osd crush chooseleaf type = 1
osd op threads = 36
filestore op threads = 36
filestore queue max ops = 50000
filestore queue committing max ops = 50000
journal max write entries = 50000
journal queue max ops = 50000

use http://ceph.com/pgcalc/ to calculate the required number of placement groups
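For a quick sanity check before opening the calculator, the rule of thumb commonly given in the Ceph documentation is (OSDs × 100) / pool size, rounded up to a power of two. A minimal sketch of that rule only; the function name and the default of 100 PGs per OSD are assumptions here, and pgcalc itself takes more inputs (such as each pool's share of the data), so its answer may differ:

```python
def suggested_pg_count(osds, pool_size, target_pgs_per_osd=100):
    """Rule-of-thumb placement group count, rounded up to a power of two.

    Hypothetical helper: an approximation, not a reimplementation of pgcalc.
    """
    raw = osds * target_pgs_per_osd / pool_size
    power = 1
    while power < raw:
        power *= 2
    return power


# 20 OSDs as in the original setup, 30 after adding 10 more; pool size 3
# is taken from "osd pool default size = 3" in the conf above.
print(suggested_pg_count(20, 3), suggested_pg_count(30, 3))
```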

Page 20: Gnocchi v3 brownbag

METRIC PROCESSING RATE (varying configurations)

The shorter the horizontal length, the better the performance.

The higher the spikes, the quicker the rate.

Page 21: Gnocchi v3 brownbag

IMPROVING GNOCCHI

take a look at yourself, and then make a change…

Page 22: Gnocchi v3 brownbag

computing and storing ~29 aggregates/worker per second is not bad

Page 23: Gnocchi v3 brownbag

we can minimise IO

Page 24: Gnocchi v3 brownbag

MINIMISING IO

- each aggregation requires:

1. read object

2. update object

3. write object

- with Ceph, we can just write to save.
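The difference between the two approaches can be sketched with a local file standing in for a Ceph object. This is an illustration of the idea only, not Gnocchi's storage driver: the fixed 16-byte record layout and all function names are assumptions.

```python
import os
import struct
import tempfile

# Assumed fixed-width record: 8-byte timestamp + 8-byte float value.
POINT = struct.Struct("<Qd")


def read_update_write(path, index, timestamp, value):
    """Read the whole object, patch one point in memory, write it all back."""
    with open(path, "rb") as f:
        blob = bytearray(f.read())
    bytes_read = len(blob)
    start = index * POINT.size
    blob[start:start + POINT.size] = POINT.pack(timestamp, value)
    with open(path, "wb") as f:
        f.write(blob)
    return bytes_read + len(blob)  # total bytes moved


def write_at_offset(path, index, timestamp, value):
    """Overwrite one point in place without reading the object first."""
    with open(path, "r+b") as f:
        f.seek(index * POINT.size)
        f.write(POINT.pack(timestamp, value))
    return POINT.size  # total bytes moved


def make_object(points):
    """Create a temporary file holding the packed points; return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for timestamp, value in points:
            f.write(POINT.pack(timestamp, value))
    return path


points = [(1464912000 + 60 * i, float(i)) for i in range(1000)]
a, b = make_object(points), make_object(points)
moved_rmw = read_update_write(a, 500, points[500][0], 42.0)
moved_offset = write_at_offset(b, 500, points[500][0], 42.0)
with open(a, "rb") as fa, open(b, "rb") as fb:
    same = fa.read() == fb.read()
os.remove(a)
os.remove(b)
print(same, moved_rmw, moved_offset)
```

Both paths leave identical bytes behind, but the offset write touches one record instead of moving the whole object twice.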

Page 25: Gnocchi v3 brownbag

NEW STORAGE FORMAT

V2.x

{‘values’: {<timestamp>: float, <timestamp>: float, ... <timestamp>: float}}

msgpack serialised

V3.x

<time><float><time><float>…<time><float>

binary serialised and lz4 compressed
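A minimal sketch of a V3-style layout, to make the `<time><float>` pairs concrete. The slide does not give the exact encoding, so the 8-byte timestamp and 8-byte float are assumptions, and `zlib` stands in for lz4 purely so the example runs on the standard library:

```python
import struct
import zlib

# Assumed record: unsigned 64-bit timestamp followed by a 64-bit float.
POINT = struct.Struct("<Qd")


def serialise(points):
    """Pack (timestamp, value) pairs back to back with no keys or framing."""
    return b"".join(POINT.pack(ts, value) for ts, value in points)


def deserialise(blob):
    """Unpack a blob produced by serialise() into (timestamp, value) tuples."""
    return [POINT.unpack_from(blob, offset)
            for offset in range(0, len(blob), POINT.size)]


# One day of per-minute points with made-up values.
points = [(1464912000 + 60 * i, 20.0 + (i % 5)) for i in range(1440)]
raw = serialise(points)
packed = zlib.compress(raw)  # zlib is a stand-in for lz4 here

assert deserialise(zlib.decompress(packed)) == points
print(len(raw) / len(points), len(packed) / len(points))
```

The compressed bytes per point depend heavily on the data and the compressor, so the second number printed here says nothing about what lz4 achieves on real series.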

Page 26: Gnocchi v3 brownbag

asking questions about code

Page 27: Gnocchi v3 brownbag

why is this so long?

update existing aggregates

retrieve existing aggregates

why do we call this so much?

writing aggregates

Page 28: Gnocchi v3 brownbag

BENCHMARK RESULTS

showin' how funky strong is your fight…

Page 29: Gnocchi v3 brownbag

WRITE THROUGHPUT

- ~970K measures/s with batches of 5K measures

- ~13K measures/s with batches of 10 measures

- 50% gains at higher end

Page 30: Gnocchi v3 brownbag

READ PERFORMANCE

- Negligible change in response time.

- Majority of time is client rendering

Page 31: Gnocchi v3 brownbag

COMPUTATION TIME

- ~0.12s to compute 24 aggregates from 1 point

- ~4.2s to compute 24 aggregates from 11.5K points

- 40%-60% quicker

Page 32: Gnocchi v3 brownbag

DISK USAGE

- 16B/point vs ~6.25B/point (depending on series length and compression schedule)

Page 33: Gnocchi v3 brownbag

OUR USE CASE

- Consistent performance between batches

- 30% to 60% better performance

- more performance gain for larger series.

Page 34: Gnocchi v3 brownbag

OUR USE CASE

- 30% to 40% fewer operations required

Page 35: Gnocchi v3 brownbag

now computing and storing ~53 aggregates/worker per second.

Page 36: Gnocchi v3 brownbag

USAGE HINTS

what more can i give…

Page 37: Gnocchi v3 brownbag

EFFECTS OF AGGREGATES

- 15%-25% overhead to compute each additional level of granularity

- percentile aggregations require more CPU time

Page 38: Gnocchi v3 brownbag

THREADING

- set `aggregation_workers_number` to the number of aggregates computed per series
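The sizing idea behind this hint is one worker per aggregate, so no aggregate waits for a free thread. A sketch of that idea with a plain thread pool; this is not Gnocchi's implementation, the function names are hypothetical, and the pool is sized to the 8 aggregation methods from the test setup (the slide does not say whether the count should also be multiplied by the number of granularities):

```python
from concurrent.futures import ThreadPoolExecutor
import math
import statistics


def percentile(values, pct):
    """Linear-interpolation percentile over a non-empty sequence."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    low = math.floor(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


AGGREGATIONS = {
    "min": min,
    "max": max,
    "sum": sum,
    "mean": statistics.mean,
    "median": statistics.median,
    "95pct": lambda values: percentile(values, 95),
    "count": len,
    "std": statistics.stdev,
}


def aggregate_series(values, workers=len(AGGREGATIONS)):
    """Compute every aggregation of one series, one pool task per method."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(fn, values)
                   for name, fn in AGGREGATIONS.items()}
        return {name: future.result() for name, future in futures.items()}


result = aggregate_series([1.0, 2.0, 3.0, 4.0, 5.0])
print(sorted(result))
```

In pure Python the threads mostly take turns, so this shows the pool sizing rather than a speed-up; any real gain depends on the aggregation code releasing the interpreter lock.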

Page 39: Gnocchi v3 brownbag

metricd agents and Ceph OSDs are CPU-intensive services

Page 40: Gnocchi v3 brownbag

EXTRAS

they don’t care about us…

Page 41: Gnocchi v3 brownbag

ADDITIONAL FUNCTIONALITY

▪ aggregate of aggregates

  ▪ get max of means, stdev of maxs, etc…

▪ dynamic resources

  ▪ create and modify resource definitions

▪ aggregate on demand

  ▪ avoid/minimise background aggregation tasks and defer until request
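An "aggregate of aggregates" such as max of means can be pictured as applying a second method across already-aggregated series at each shared timestamp. A sketch under that reading; the function name, metric names, and values are all made up for illustration and this is not the Gnocchi API:

```python
def reaggregate(series_by_metric, method):
    """Apply `method` across metrics at every timestamp they all share."""
    shared = set.intersection(*(set(s) for s in series_by_metric.values()))
    return {ts: method([s[ts] for s in series_by_metric.values()])
            for ts in sorted(shared)}


# Per-minute means for two hypothetical metrics, keyed by timestamp.
means = {
    "cpu-a": {0: 10.0, 60: 20.0},
    "cpu-b": {0: 30.0, 60: 5.0, 120: 7.0},
}

max_of_means = reaggregate(means, max)
print(max_of_means)
```

Timestamp 120 is dropped because only one metric has a point there; how to treat such gaps is a design choice this sketch does not settle.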

Page 42: Gnocchi v3 brownbag

GRAFANA V3

Page 43: Gnocchi v3 brownbag

ROADMAP

don’t stop ‘til you get enough…

Page 44: Gnocchi v3 brownbag

FUTURE FUNCTIONALITY

▪ derived granularity aggregates

  ▪ compute annual aggregates using monthly/daily/hourly aggregates

▪ rolling upgrades

▪ fair scheduling
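The derived granularity item can be sketched as building coarse aggregates from finer ones instead of from raw points. The slides do not describe a design, so this is only one possible shape, with hypothetical names, shown for hourly to daily:

```python
SECONDS_PER_DAY = 86400


def derive_daily(hourly):
    """Build daily sum, count, max and mean from hourly sum, count and max.

    `hourly` maps an hour's start timestamp to its stored aggregates.
    """
    daily = {}
    for ts, agg in sorted(hourly.items()):
        day = ts - ts % SECONDS_PER_DAY
        bucket = daily.setdefault(
            day, {"sum": 0.0, "count": 0, "max": float("-inf")})
        bucket["sum"] += agg["sum"]
        bucket["count"] += agg["count"]
        bucket["max"] = max(bucket["max"], agg["max"])
    for bucket in daily.values():
        # Mean comes from sum and count; averaging hourly means would
        # weight a sparse hour the same as a busy one.
        bucket["mean"] = bucket["sum"] / bucket["count"]
    return daily


hourly = {
    0: {"sum": 10.0, "count": 2, "max": 7.0},
    3600: {"sum": 30.0, "count": 3, "max": 12.0},
    86400: {"sum": 4.0, "count": 1, "max": 4.0},
}
daily = derive_daily(hourly)
print(daily[0]["mean"], daily[86400]["mean"])
```

Sum, count, min and max combine exactly this way. Median and percentiles cannot be rebuilt exactly from coarser aggregates, so they would need raw points or an approximation.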

Page 45: Gnocchi v3 brownbag

thank you