Loki: An Opensource Zipkin/Prometheus Mashup written in Go.

Post on 11-Apr-2017



Loki: a Zipkin/Prometheus Mashup

@tom_wilkie, CNCFCon Berlin, April 2017

Zipkin + Prometheus = Loki

Why did I write my own tracer?

Debugging a latency performance issue with Cortex…

Distributor

Ingester Ingester…

Well, that’s my rationalisation…

In reality, this is attempt #2

• Prototype “Weave Tracer” circa 2015

• Concept didn’t require application instrumentation

• Used ptrace to intercept syscalls and infer application behaviour

• Kinda worked, for a very limited definition of “worked”

Prometheus = Greek god. Loki = Norse equivalent?

So what makes Loki different?

Prometheus is to Graphite as

Loki is to Zipkin

Push vs Pull

https://prometheus.io/docs/introduction/faq/#why-do-you-pull-rather-than-push?

https://prometheus.io/blog/2016/07/23/pull-does-not-scale-or-does-it/

Graphite (push): your jobs push samples to Graphite

• Jobs must know where monitoring is

• Can overwhelm Graphite with too many samples

Prometheus (pull): Prometheus scrapes your jobs

• Tell Prometheus where jobs are (via service discovery)

• Prometheus can back off when overwhelmed

• Prometheus knows the identity of each job

Zipkin (push): your jobs push spans to Zipkin

Loki (pull): Loki scrapes your jobs

Loki scrapes spans from the Loki client library inside your job, at http://job/traces

• Client library keeps pending spans in an in-memory ring buffer.

• /traces HTTP handler grabs all the in-memory spans and serialises them using Thrift.

• Spans will be dropped if not collected frequently enough.

• Retrieval library ‘knows’ identity of scraped endpoints, adds that to received spans

• … jobs don’t need to know their own identity

• … can be consistent with identity used in Prometheus

• Naive in-memory storage implementation

• … makes queries slow, as it’s just a loop.

• Zipkin-compatible API endpoints

• UI _is_ the Zipkin UI

Loki = Prometheus retrieval library + in-memory storage + Zipkin API + Zipkin UI

It’s all open source:

https://github.com/weaveworks-experiments/loki

…and it’s written in Go

This all sounds great! Where’s the catch?

❌ Client library doesn’t actually support multiple scrapers (yet)

❌ Loki query performance sucks (for now)

❌ Loki single-process architecture limits scalability

❌ Can drop spans; gets worse with jitter

… that Cortex performance issue

Debugging a latency performance issue with Cortex…

Distributor

Ingester Ingester…

It was garbage collection…

100ms ➡ 25ms

Demo

TODO

Client Library

• Make it support multiple scrapers

• Move away from Thrift to protos

• More languages

• Useful HTML /traces

Loki Server

• Local storage with BoltDB

• Make queries faster

• Make it distributed, use cloud storage

Why did I write my own tracer?

Because with OpenTracing, I can.

Thank you! Questions?

We’re hiring! London, Berlin, San Francisco

jobs@weave.works