Lessons Learned from Building and Operating Scuba

Posted on 14-Feb-2017

Ciprian Gerea, Facebook

[Diagram: ODS, Scuba, and Presto & Hive compared along two axes: events vs. metrics, live vs. historical]

Demo time

• Getting started
  – Writing to Scuba

What is Scuba

• Database
  – Real-time ingestion & queries
  – Simple query model: rollups, no joins
  – Simple data model, flexible schema
• UI platform
• Service
  – Runs its own ETL
  – Demand control
    • Retention
    • Queries
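The data and query model above can be illustrated with a toy in-memory table. This is a minimal sketch, not Scuba's implementation: the `Table` class, its `insert` and `rollup` methods, and the column names are hypothetical. New columns are registered the first time they appear, an existing column must keep its type, and the only query is a single-column GROUP BY returning count/sum/avg, with no joins.

```python
from collections import defaultdict
from typing import Dict, List, Union

# Hypothetical value types mirroring the deck's int / string / vector<string> columns.
Value = Union[int, str, List[str]]
Row = Dict[str, Value]


class Table:
    """Toy in-memory table whose schema grows as new column names appear."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.columns: Dict[str, type] = {}  # column name -> Python type

    def insert(self, row: Row) -> None:
        kinds = {name: (list if isinstance(v, list) else type(v)) for name, v in row.items()}
        # Validate first so a rejected row leaves the schema untouched.
        for name, kind in kinds.items():
            expected = self.columns.get(name)
            if expected is not None and expected is not kind:
                raise TypeError(f"column {name!r} is {expected.__name__}, got {kind.__name__}")
        # Unknown columns are added on the fly; there is no migration step.
        for name, kind in kinds.items():
            self.columns.setdefault(name, kind)
        self.rows.append(row)

    def rollup(self, group_by: str, metric: str) -> Dict[str, Dict[str, float]]:
        """GROUP BY one string column; return count, sum and avg of an int column.

        In this toy, that is the only query shape: no joins, no raw-row output.
        """
        acc: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # key -> [count, sum]
        for row in self.rows:
            if group_by not in row or metric not in row:
                continue  # rows written before a column existed simply lack it
            slot = acc[row[group_by]]
            slot[0] += 1
            slot[1] += row[metric]
        return {k: {"count": c, "sum": s, "avg": s / c} for k, (c, s) in acc.items()}


table = Table()
table.insert({"endpoint": "/home", "latency_ms": 40})
table.insert({"endpoint": "/home", "latency_ms": 60, "tags": ["canary"]})  # adds a column
table.insert({"endpoint": "/feed", "latency_ms": 30})
print(table.rollup("endpoint", "latency_ms"))
# {'/home': {'count': 2, 'sum': 100, 'avg': 50.0}, '/feed': {'count': 1, 'sum': 30, 'avg': 30.0}}
```

Skipping rows that lack a column is one cheap way to let the schema change without rewriting old data.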

[Figure: Scuba system architecture. Labels: Scribe logs from servers, combined logs for each Scribe category, Tupperware, Ptail, perfpipe tailer management, Tailer, data storage (rockfortexpress.wildcard), SMC tier in PRN1, Scuba backend with root aggregator and leaf servers, "add directly to leaf servers", queries, results, Sparkle (table insertion counts), validation, dataswarm, HiveToScuba; clients: Scuba GUI, `scuba` CLI, Scuba gauge, scripts, alerts.]
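The architecture shows data being added directly to leaf servers, and the Scuba DB slide notes that data is sharded at random. A minimal sketch of that ingestion path, assuming a tailer that batches log lines and sends each batch to a uniformly random leaf; `Leaf`, `Tailer`, `batch_size`, and the seed are hypothetical names for illustration only.

```python
import random
from typing import Dict, List

Row = Dict[str, object]


class Leaf:
    """Hypothetical leaf server holding an in-memory slice of a table."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: List[Row] = []

    def add(self, batch: List[Row]) -> None:
        self.rows.extend(batch)


class Tailer:
    """Hypothetical tailer: reads a log category and ships batches straight to leaves."""

    def __init__(self, leaves: List[Leaf], batch_size: int, seed: int = 0) -> None:
        self.leaves = leaves
        self.batch_size = batch_size
        self.rng = random.Random(seed)  # seeded only to make runs repeatable
        self.pending: List[Row] = []

    def on_log_line(self, row: Row) -> None:
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            # Random placement: there is no shard key, so any leaf can take any batch.
            self.rng.choice(self.leaves).add(self.pending)
            self.pending = []


leaves = [Leaf(f"leaf{i}") for i in range(4)]
tailer = Tailer(leaves, batch_size=100)
for i in range(10_000):
    tailer.on_log_line({"seq": i})
tailer.flush()
print([len(leaf.rows) for leaf in leaves])  # roughly balanced; always sums to 10000
```

Random placement needs no shard key and keeps leaves roughly balanced, at the cost that every query has to fan out to every leaf that holds part of the table.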

Scuba DB

• Data lives in tables
• Columns can be int, string, or vector<string>
• Schema can change on the fly
• Shared-nothing storage in memory & flash
• Data sharded at random
• Only rollup queries are supported: sum/avg/percentile
• Best-effort queries: skip bad nodes
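With random sharding and best-effort queries, a root aggregator has to combine whatever partial results come back. A minimal sketch, assuming each leaf returns per-group (count, sum) pairs and the root merges them; `leaf_partial` and `root_query` are hypothetical names, and a leaf that raises is skipped instead of failing the whole query. Averages are computed from the merged sums and counts, because averaging per-leaf averages is wrong when leaves hold different numbers of rows.

```python
from typing import Callable, Dict, List, Tuple

# Partial aggregate per group key: (count, sum).
Partial = Dict[str, Tuple[int, int]]


def leaf_partial(rows: List[dict], group_by: str, metric: str) -> Partial:
    """What a leaf would compute over its own slice of the table."""
    out: Partial = {}
    for row in rows:
        c, s = out.get(row[group_by], (0, 0))
        out[row[group_by]] = (c + 1, s + row[metric])
    return out


def root_query(leaves: List[Callable[[], Partial]]) -> Tuple[Dict[str, float], int, int]:
    """Fan out to every leaf, skip failures, merge the rest.

    Returns (avg per group, leaves answered, leaves asked) so the caller can
    tell how complete the best-effort answer is.
    """
    merged: Partial = {}
    answered = 0
    for ask in leaves:
        try:
            partial = ask()
        except Exception:
            continue  # best effort: a bad node drops out instead of failing the query
        answered += 1
        for key, (c, s) in partial.items():
            mc, ms = merged.get(key, (0, 0))
            merged[key] = (mc + c, ms + s)
    avgs = {key: s / c for key, (c, s) in merged.items()}
    return avgs, answered, len(leaves)


shards = [
    [{"host": "a", "ms": 10}, {"host": "a", "ms": 20}],
    [{"host": "a", "ms": 60}],
]


def broken_leaf() -> Partial:
    raise ConnectionError("leaf down")


calls = [lambda rows=rows: leaf_partial(rows, "host", "ms") for rows in shards]
print(root_query(calls + [broken_leaf]))  # ({'a': 30.0}, 2, 3); averaging averages would give 37.5
```

Percentiles do not merge this simply: an exact percentile needs the leaves' values (or a mergeable approximation) rather than just a count and a sum.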

Demo time

• Let’s run some queries
• Customize the UI
• ETL control

How we keep it running

• ODS metrics for everything
• Scuba data sets for queries & subsystems we’re actively debugging
• Dashboards
  – Cubism is king!
  – Unidash for niche cases
• Active management of demand
  – Table size quotas
  – CPU load -> push to stream processing
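Table size quotas, together with retention under demand control, can be pictured as a table that drops its oldest rows once it exceeds a byte budget. This is an assumption-laden illustration: `QuotaTable`, `quota_bytes`, and evict-oldest-first are hypothetical choices, not a description of how Scuba actually enforces quotas.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class QuotaTable:
    """Hypothetical table that enforces a byte quota by dropping its oldest rows."""

    def __init__(self, quota_bytes: int) -> None:
        self.quota_bytes = quota_bytes
        self.used_bytes = 0
        self.rows: Deque[Tuple[int, bytes]] = deque()  # (timestamp, payload), oldest first

    def insert(self, ts: int, payload: bytes) -> int:
        """Append a row, then evict from the front until back under quota.

        Returns how many rows were evicted. Assumes rows arrive roughly in
        timestamp order, so the front of the deque holds the oldest data.
        A single payload larger than the quota evicts everything, itself included.
        """
        self.rows.append((ts, payload))
        self.used_bytes += len(payload)
        evicted = 0
        while self.used_bytes > self.quota_bytes and self.rows:
            _, old = self.rows.popleft()
            self.used_bytes -= len(old)
            evicted += 1
        return evicted

    def oldest_ts(self) -> Optional[int]:
        """Timestamp of the oldest retained row, i.e. the effective retention edge."""
        return self.rows[0][0] if self.rows else None


table = QuotaTable(quota_bytes=10)
for ts in range(5):
    table.insert(ts, b"abcd")  # 4 bytes per row, so at most 2 rows fit
print(table.oldest_ts(), table.used_bytes)  # 3 8
```

With a size quota, a table's effective retention window shrinks as its write rate grows, so heavy writers pay in history rather than in other tables' space.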

Root cause for outages

• Other systems
  – Scribe
  – Hosting layer
  – Deployment mechanism
• Media failures: high rate on disk, low on flash
• Queries of doom
• High load
  – DoS workloads
  – Load-shedding bugs
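High load and load-shedding bugs both come down to admission control. A minimal sketch of one common form, assuming a per-server cap on in-flight queries; `LoadShedder`, `Overloaded`, and `max_in_flight` are hypothetical, and the deck does not say how Scuba sheds load. Rejected queries fail fast instead of queueing, which suits users who are OK with retrying queries.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class Overloaded(Exception):
    """Raised when a query is shed instead of admitted."""


class LoadShedder:
    """Hypothetical admission control: reject queries once too many are in flight."""

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.lock = threading.Lock()

    @contextmanager
    def admit(self) -> Iterator[None]:
        with self.lock:
            if self.in_flight >= self.max_in_flight:
                raise Overloaded("query shed, retry later")
            self.in_flight += 1
        try:
            yield
        finally:
            # Release the slot even if the query raises; skipping this would
            # leak slots until every new query is shed.
            with self.lock:
                self.in_flight -= 1


shedder = LoadShedder(max_in_flight=1)
with shedder.admit():
    try:
        with shedder.admit():
            pass
    except Overloaded:
        print("second query shed")
print(shedder.in_flight)  # 0
```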

Why it is successful

• Scuba’s niche:
  – Easy to get started
  – Fast: <50 ms P50 wall time
  – Smooth learning curve
  – Customizable UI (~1k custom presenters!)
  – Its flaws are acceptable
    • Not everyone needs transactions from the beginning
    • Users are OK with retrying queries

• Other tools don’t serve this niche well

What could be better

• Customers ask for
  – More space
  – More consistent results
  – More expressive queries
• Sharding
• Better persistent storage
• Better support for time series

Q & A

Cubism Intro

• Horizon charts
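Horizon charts, the chart type Cubism draws, fit a time series into a thin strip by slicing its value range into equal bands and overlaying them from a common baseline, with darker shades for higher bands. A minimal sketch of the per-point band computation, assuming a fixed band size and the "mirror" treatment of negatives (folded upward and drawn in a second hue); `horizon_bands` is a hypothetical helper, not Cubism's API.

```python
from typing import List, Tuple


def horizon_bands(value: float, band_size: float, n_bands: int) -> Tuple[int, List[float]]:
    """Split one data point into the stacked band heights of a horizon chart.

    Returns (sign, heights): heights[i] is how much of band i is filled, in
    [0, band_size]. All bands are drawn from the same baseline and overlaid,
    darker for higher i; sign picks the positive or negative color scheme.
    """
    sign = -1 if value < 0 else 1
    magnitude = min(abs(value), band_size * n_bands)  # clip anything above the top band
    heights = [min(max(magnitude - i * band_size, 0.0), band_size) for i in range(n_bands)]
    return sign, heights


print(horizon_bands(25.0, 10.0, 3))  # (1, [10.0, 10.0, 5.0])
print(horizon_bands(-7.0, 10.0, 3))  # (-1, [7.0, 0.0, 0.0])
```

Because the bands are overlaid rather than stacked, the chart needs only one band's height of vertical space, which is why many series fit on one dashboard.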