Metrics 2.0 @ Monitorama PDX 2014


Transcript of Metrics 2.0 @ Monitorama PDX 2014

Page 2: Metrics 2.0 @ Monitorama PDX 2014

   

by niteroi @ panoramio.com

Page 3: Metrics 2.0 @ Monitorama PDX 2014

   

vimeo.com/43800150

Page 7: Metrics 2.0 @ Monitorama PDX 2014

   

problems

Metrics 2.0 concepts

implementations & examples

Page 8: Metrics 2.0 @ Monitorama PDX 2014

   

Mostly graphite

Page 9: Metrics 2.0 @ Monitorama PDX 2014

   

terminology sync

Page 10: Metrics 2.0 @ Monitorama PDX 2014

   

(1234567890, 82)

(1234567900, 123)

(1234567910, 109)

(1234567920, 77)

db15.mysql.queries_running

host=db15 mysql.queries_running
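A minimal sketch of the terminology on this slide, assuming a series is simply a list of (timestamp, value) points and that the tagged form splits on whitespace. The `Point` type, the `parse_tagged_id` helper and the `name` key are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Point(NamedTuple):
    ts: int       # Unix timestamp in seconds
    value: float  # measured value at that timestamp


# The four datapoints shown on the slide.
series = [
    Point(1234567890, 82),
    Point(1234567900, 123),
    Point(1234567910, 109),
    Point(1234567920, 77),
]


def parse_tagged_id(metric_id: str) -> dict:
    """Split 'host=db15 mysql.queries_running' into tags plus a leftover name."""
    tags = {}
    rest = []
    for token in metric_id.split():
        if "=" in token:
            key, value = token.split("=", 1)
            tags[key] = value
        else:
            rest.append(token)
    tags["name"] = " ".join(rest)  # 'name' is a hypothetical key for the untagged part
    return tags


# Spacing between consecutive points, in seconds.
interval = series[1].ts - series[0].ts
```

With the slide's data, the points are 10 seconds apart, and the tagged identifier separates the host from the rest of the metric name instead of burying it in a dotted path.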

Page 11: Metrics 2.0 @ Monitorama PDX 2014

   

Problems

Page 12: Metrics 2.0 @ Monitorama PDX 2014

   

Vimeo.com page requests/s?

server X write perf?

Page 13: Metrics 2.0 @ Monitorama PDX 2014

   

Finding metrics

Browse hierarchies

Dashboard search .. which keywords?

Search in source code/documentation?

Ask around

...

Page 14: Metrics 2.0 @ Monitorama PDX 2014

   

stats.hits.vimeo_com

stats_counts.hits.vimeo_com

stats.*.requesthostport.vimeo_com_80

Page 15: Metrics 2.0 @ Monitorama PDX 2014

   

Understanding metrics

Meaning, difference

Unit?

Where and how.. hard

Prefixes

Page 16: Metrics 2.0 @ Monitorama PDX 2014

   

collectd.db.disk.sda1.disk_time.write

Page 17: Metrics 2.0 @ Monitorama PDX 2014

   

Understanding metrics

Terminology? Which field is where?

Total so far? From zero per datapoint?

Aggregate? Which?

Point at t=x describes which timeframe?

Page 18: Metrics 2.0 @ Monitorama PDX 2014

   

Change agent?

Page 19: Metrics 2.0 @ Monitorama PDX 2014

   

Unclear, inconsistent terminology, format

tightly coupled

lack information

Page 20: Metrics 2.0 @ Monitorama PDX 2014

   

O(S*P*A)

S = # Sources

P = # People

A = # Aggregators

Page 23: Metrics 2.0 @ Monitorama PDX 2014

   

times N

Page 24: Metrics 2.0 @ Monitorama PDX 2014

   

graph definitions are redundant and a time sink.

Page 26: Metrics 2.0 @ Monitorama PDX 2014

   

http://litlquest.com/forest-trees/see-forest-trees-2

Page 27: Metrics 2.0 @ Monitorama PDX 2014

   

Metrics 2.0 concepts

Page 28: Metrics 2.0 @ Monitorama PDX 2014

   

Self-describing

Standardized

Orthogonal dimensions

Page 29: Metrics 2.0 @ Monitorama PDX 2014

   

stats.timers.dfs5.proxy-server.object.GET.200.timing.upper_90

Page 30: Metrics 2.0 @ Monitorama PDX 2014

   

{
    server: dfvimeodfsproxy5,
    http_method: GET,
    http_code: 200,
    unit: ms,
    metric_type: gauge,
    stat: upper_90,
    swift_type: object
}
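A rough sketch of how a dotted name like the one on the previous slide could be mapped onto such a tag set. The regular expression and the `to_tags` helper are illustrative assumptions, not the rules any particular tool ships with. Note that `unit` and `metric_type` cannot be read from the name at all; they have to be supplied by whoever knows the metric.

```python
import re

# Hypothetical pattern for one family of statsd timer names from a Swift proxy.
PATTERN = re.compile(
    r"^stats\.timers\.(?P<server>[^.]+)\.proxy-server\."
    r"(?P<swift_type>[^.]+)\.(?P<http_method>[^.]+)\.(?P<http_code>\d+)\."
    r"timing\.(?P<stat>[^.]+)$"
)


def to_tags(name):
    """Return a tag dict for a matching dotted name, or None if it does not match."""
    match = PATTERN.match(name)
    if match is None:
        return None
    tags = match.groupdict()
    # Knowledge that is not present in the dotted name (assumed here, as on the slide).
    tags["unit"] = "ms"
    tags["metric_type"] = "gauge"
    return tags


tags = to_tags("stats.timers.dfs5.proxy-server.object.GET.200.timing.upper_90")
```

Each field becomes an independent dimension, so a query no longer depends on knowing which position in the path holds the HTTP method or the status code.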

Page 31: Metrics 2.0 @ Monitorama PDX 2014

   

allow more characters

unit: Req/s, site: vimeo.com, ...

Page 32: Metrics 2.0 @ Monitorama PDX 2014

   

Metadata

meta: {
    src: proxy.py:458,
    from: diamond
}

Page 33: Metrics 2.0 @ Monitorama PDX 2014

   

Conceptual model vs wire protocol vs storage

Page 34: Metrics 2.0 @ Monitorama PDX 2014

   

metrics20.org

Page 35: Metrics 2.0 @ Monitorama PDX 2014

SI + IEC

B Err Warn Conn Job File Req ...

MB/s Err/d Req/h ...

Page 36: Metrics 2.0 @ Monitorama PDX 2014

   

Immediate understanding of metrics

Minimize time to graphs, alerting rules, debugging

compatibility & flexibility in tooling

Page 37: Metrics 2.0 @ Monitorama PDX 2014

   

Implementations & examples

Page 39: Metrics 2.0 @ Monitorama PDX 2014

   

Carbon-tagger

…stats.gauges.host.foo 125 1234567890

service=foo instance=host target_type=gauge unit=B 123 1234567890
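A small sketch of reading the tagged line exactly as it is printed on this slide: space-separated key=value pairs, then the value, then the timestamp. The real carbon-tagger wire format may differ in its details, and `parse_line` is a hypothetical helper.

```python
def parse_line(line):
    """Split a tagged metric line into (tags, value, timestamp)."""
    # The last two tokens are value and timestamp; everything before them is tags.
    *tag_tokens, value, ts = line.split()
    tags = dict(token.split("=", 1) for token in tag_tokens)
    return tags, float(value), int(ts)


tags, value, ts = parse_line(
    "service=foo instance=host target_type=gauge unit=B 123 1234567890"
)
```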

Page 41: Metrics 2.0 @ Monitorama PDX 2014

   

Statsdaemon

unit=B

unit=B

...

unit=ms

unit=ms

...

unit=B/s

unit=ms stat=mean

unit=ms stat=upper_90

...

Page 42: Metrics 2.0 @ Monitorama PDX 2014

   

Keep metric tags in sync with data
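The idea can be sketched as follows: whenever an aggregator changes what the numbers mean, it updates the tags to match. This is an illustration under assumptions, not the statsdaemon source. The function names are hypothetical, and `upper_90` follows the usual statsd convention of taking the largest value among the lowest 90% of samples.

```python
def flush_counter(tags, total, interval_s):
    """Turn a per-interval total (e.g. unit=B) into a per-second rate (unit=B/s)."""
    out = dict(tags, unit=tags["unit"] + "/s")
    return out, total / interval_s


def flush_timer(tags, samples):
    """Emit mean and upper_90 for a non-empty list of timer samples."""
    ordered = sorted(samples)
    keep = max(1, round(len(ordered) * 0.9))  # how many samples fall in the lowest 90%
    return [
        (dict(tags, stat="mean"), sum(ordered) / len(ordered)),
        (dict(tags, stat="upper_90"), ordered[keep - 1]),
    ]


rate_tags, rate = flush_counter({"unit": "B"}, 600, 10)
timer_out = flush_timer({"unit": "ms"}, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```

The counter goes in as unit=B and comes out as unit=B/s; the timer keeps unit=ms but gains a stat tag for each derived series.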

Page 43: Metrics 2.0 @ Monitorama PDX 2014

   

GraphExplorer

Page 45: Metrics 2.0 @ Monitorama PDX 2014

   

GraphExplorer queries 101

proxy-server swift server:regex unit=ms

(AND)
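A simplified sketch of how such a query could be evaluated against one metric's tags, with all terms ANDed. It assumes that `key=value` means an exact match, `key:pattern` a regular-expression match on that tag, and a bare word a substring match anywhere in the tags. The real GraphExplorer grammar is richer, and `matches` is a hypothetical helper.

```python
import re


def matches(query, tags):
    """Return True if every whitespace-separated term in the query matches the tags."""
    for term in query.split():
        if "=" in term:
            key, value = term.split("=", 1)
            if tags.get(key) != value:
                return False
        elif ":" in term:
            key, pattern = term.split(":", 1)
            if key not in tags or not re.search(pattern, tags[key]):
                return False
        else:
            haystack = " ".join(f"{k}={v}" for k, v in tags.items())
            if term not in haystack:
                return False
    return True


# Made-up tag set for illustration.
example = {"service": "proxy-server", "what": "swift", "server": "dfs5", "unit": "ms"}
hit = matches(r"proxy-server swift server:dfs\d unit=ms", example)
miss = matches("proxy-server unit=B", example)
```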

Page 53: Metrics 2.0 @ Monitorama PDX 2014

   

upper_90 (or stat=upper_90)

from <datetime> to <datetime>

avg over <timespec> (5M, 1h, 3d, ...)

Page 54: Metrics 2.0 @ Monitorama PDX 2014

   

Compare object put/get

stack …

http_method:(PUT|GET)

swift_type=object

avg by http_code,server

Page 56: Metrics 2.0 @ Monitorama PDX 2014

   

Comparing servers

http_method:(PUT|GET)

group by unit,target_type

avg by http_code,swift_type,http_method

Page 58: Metrics 2.0 @ Monitorama PDX 2014

   

transcode unit=Job/s avg over <time>

from <datetime> to <datetime>

Page 59: Metrics 2.0 @ Monitorama PDX 2014

   

Note: data is obfuscated

Page 60: Metrics 2.0 @ Monitorama PDX 2014

   

Bucketing

sum by zone:eu-west|us-east|ap-southeast|us-west|sa-east|vimeo-df|vimeo-lv

group by state
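A sketch of what bucketing could look like: each zone value is folded into the first bucket it matches, values are summed per bucket, and the results are kept apart per state. Substring matching and the sample data are assumptions made for illustration; `bucket_for` and `sum_by_zone_bucket` are hypothetical names.

```python
from collections import defaultdict

# Bucket names taken from the query on the slide.
BUCKETS = ["eu-west", "us-east", "ap-southeast", "us-west", "sa-east", "vimeo-df", "vimeo-lv"]


def bucket_for(zone):
    """Return the first bucket contained in the zone name, or the zone itself."""
    for bucket in BUCKETS:
        if bucket in zone:
            return bucket
    return zone


def sum_by_zone_bucket(metrics):
    """Sum values per (state, zone bucket); metrics is a list of (tags, value)."""
    totals = defaultdict(float)
    for tags, value in metrics:
        totals[(tags["state"], bucket_for(tags["zone"]))] += value
    return dict(totals)


# Made-up input data.
sample = [
    ({"zone": "eu-west-1a", "state": "running"}, 3),
    ({"zone": "eu-west-1b", "state": "running"}, 4),
    ({"zone": "us-east-1a", "state": "queued"}, 2),
]
totals = sum_by_zone_bucket(sample)
```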

Page 61: Metrics 2.0 @ Monitorama PDX 2014

   

Note: data is obfuscated

Page 62: Metrics 2.0 @ Monitorama PDX 2014

   

Compare job states per region (zones bucket)

group by zone

Page 63: Metrics 2.0 @ Monitorama PDX 2014

   

Note: data is obfuscated

Page 64: Metrics 2.0 @ Monitorama PDX 2014

   

Unit conversion

unit=Mb/s network server:regex sum by server

Page 67: Metrics 2.0 @ Monitorama PDX 2014

   

Integration

Metric unit=B/s Query unit=TB
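Because the unit is known, a tool can work out that going from B/s to TB means integrating over time and then scaling. A minimal sketch, assuming evenly spaced points and SI terabytes (10^12 bytes); the function names are hypothetical.

```python
def integrate(points, interval_s):
    """Turn (ts, rate in B/s) points into (ts, running total in B)."""
    total = 0.0
    out = []
    for ts, rate in points:
        total += rate * interval_s  # bytes transferred during this interval
        out.append((ts, total))
    return out


def bytes_to_tb(num_bytes):
    """Convert bytes to SI terabytes."""
    return num_bytes / 1e12


# Made-up data: a constant 100 GB/s for two 10-second intervals.
totals = integrate([(0, 1e11), (10, 1e11)], 10)
final_tb = bytes_to_tb(totals[-1][1])
```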

Page 69: Metrics 2.0 @ Monitorama PDX 2014

   

Deriving

Metric unit=B Query unit=GB/d
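The reverse direction works the same way: a metric stored in B can be derived into a rate and rescaled to GB/d. A minimal sketch, assuming a monotonically growing total and SI gigabytes (10^9 bytes); the function names are hypothetical.

```python
def derive(points):
    """Turn (ts, total in B) points into (ts, rate in B/s) between neighbours."""
    out = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        out.append((t1, (v1 - v0) / (t1 - t0)))
    return out


def bps_to_gb_per_day(rate):
    """Convert bytes per second to SI gigabytes per day."""
    return rate * 86400 / 1e9


# Made-up data: 1 MB written in 10 seconds.
rates = derive([(0, 0), (10, 1e6)])
gb_per_day = bps_to_gb_per_day(rates[0][1])
```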

Page 71: Metrics 2.0 @ Monitorama PDX 2014

   

Bonus round

Page 80: Metrics 2.0 @ Monitorama PDX 2014

   

Dashboard definition

queries = [
    'cpu usage sum by core',
    'mem unit=B !total group by type:swap',
    'stack network unit=Mb/s',
    'unit=B (free|used) group by =mountpoint'
]

Page 84: Metrics 2.0 @ Monitorama PDX 2014

   

Future Work

Page 85: Metrics 2.0 @ Monitorama PDX 2014

   

● Storage aggregation rules

● graphite API functions such as cumulative, summarize and smartSummarize

● consolidateBy & Graph renderers

Page 87: Metrics 2.0 @ Monitorama PDX 2014

   

Self-describing & standardized

stat=upper/lower/mean/...

target_type=counter..

Page 88: Metrics 2.0 @ Monitorama PDX 2014

   

Select your view

Page 89: Metrics 2.0 @ Monitorama PDX 2014

From: dygraphs.com

Page 90: Metrics 2.0 @ Monitorama PDX 2014

   

Facet based suggestions

Page 91: Metrics 2.0 @ Monitorama PDX 2014

   

unit=Err/s

Page 92: Metrics 2.0 @ Monitorama PDX 2014

   

Conclusion

structured, self-describing, standardized metrics = enabler

Page 93: Metrics 2.0 @ Monitorama PDX 2014

   

Conclusion

Manual composing should be last resort, not default

Page 94: Metrics 2.0 @ Monitorama PDX 2014

   

Conclusion

This sucks

– Tell me why

– What should we do instead?

This is neat!

– Help me make it better

– Adopt native metrics 2.0, structured_metrics

Page 95: Metrics 2.0 @ Monitorama PDX 2014

   

Seen in this presentation:

metrics20.org

vimeo.github.io/graph-explorer

github.com/vimeo/timeserieswidget

github.com/vimeo/carbon-tagger

github.com/vimeo/statsdaemon

github.com/Dieterbe/anthracite

github.com/graphite-ng

github.com/vimeo/graphite-influxdb

github.com/vimeo/smoketcp

github.com/vimeo/tailgate

twitter.com/Dieter_be

dieter.plaetinck.be
