Cassandra in production

Cassandra in Production 2012.0π

The Presenter

● Kjetil Valstadsve
● Developer at openadex (Open AdExchange)
● Various experience

The Task

● Handle 500 requests/sec
  – For now
● Handle ~100 updates per request
  – For now

The Agenda

● Cassandra Essentials and Data Model
● What We Do and How
● Scaling and Operations
● Advice and Admonitions

Cassandra Essentials and Data Model

Cassandra Essentials

● Inspired by BigTable (Google) and Dynamo (Amazon)
● Eventually consistent
● Multi-level map-like
● Column store

● Released by Facebook, adopted by Apache
● Supported by DataStax
  – EC2 AMI
  – Commercial product on top: Brisk

Data Model in Brief

● Atomic unit of storage: The Column
  – Possibly stored in a Super Column
● Collections of columns: The Row
  – Or Super Columns
● Collections of rows: The Column Family
  – Or the Super Column Family

● Collections of column families: The Keyspace

The Column

● Key, value and timestamp:
  – Key: Age
  – Value: 29
  – Timestamp: 1330945017654
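The timestamp is what Cassandra uses to settle conflicting writes to the same column: the version with the newest timestamp wins ("last write wins"). A minimal sketch of that rule, assuming a hypothetical `Column` class with string values; Cassandra's own tie-breaking on equal timestamps and deletion markers (tombstones) are deliberately left out.

```java
// Hypothetical model of a column as a (key, value, timestamp) triple.
final class Column {
    final String key;
    final String value;
    final long timestamp; // milliseconds since the epoch, like 1330945017654 above

    Column(String key, String value, long timestamp) {
        this.key = key;
        this.value = value;
        this.timestamp = timestamp;
    }

    /** Last-write-wins: of two versions of the same column, keep the newer one.
     *  Simplification: equal timestamps keep the first argument. */
    static Column reconcile(Column a, Column b) {
        if (!a.key.equals(b.key)) {
            throw new IllegalArgumentException("different column keys");
        }
        return a.timestamp >= b.timestamp ? a : b;
    }
}
```

Because reconciliation only compares timestamps, clients that write with skewed clocks can see an "older" write win; that is part of the price of eventual consistency.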

The Row

● Many (many, many) columns:
  – Columns are sorted on key, good for range queries
  – Scales wildly – just keep on adding columns
  – In practice, a persistent sorted map
● Rows can be stored sorted, or hashed (decided by the partitioner)

Example: row Kjetil holds column Age = 29 (timestamp 1330945017654)
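Since columns within a row are kept sorted by key, a range query is just a slice between two column keys. The sketch below models one row as an in-memory `TreeMap`; the `Row` class and its `slice` method are hypothetical illustrations of the ordering property, not Cassandra's storage engine or client API.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for one row: column key -> value, kept in key order.
final class Row {
    private final TreeMap<String, String> columns = new TreeMap<>();

    void put(String columnKey, String value) {
        columns.put(columnKey, value);
    }

    /** Columns with start <= key < end, in sorted order (a live view, no copying). */
    SortedMap<String, String> slice(String start, String end) {
        return columns.subMap(start, end);
    }
}
```

Adding a column never reorganizes the others, which is why a row can keep growing; the cost of a slice depends on how many columns fall inside the range, not on the row's total size.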

The Column Family

● Consists of many (many) rows:
  – Example: column family YOUNG_AND_PROMISING holds row Kjetil, which holds column Age = 29 (timestamp 1330945017654)

The Keyspace

● Consists of (many) column families:
  – Usually a statically known set
  – Example: YOUNG_AND_PROMISING, JUST_YOUNG
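Taken together, the levels behave like nested maps: keyspace, then column family, then row key, then sorted column key. The sketch below spells that out with the slide's example names; the `Keyspace` class is a hypothetical illustration only. Rows sit in a `HashMap` to mirror hashed row placement, while columns sit in a `TreeMap` because they are sorted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: column family -> row key -> (sorted) column key -> value.
final class Keyspace {
    private final Map<String, Map<String, TreeMap<String, String>>> columnFamilies = new HashMap<>();

    void insert(String columnFamily, String rowKey, String columnKey, String value) {
        columnFamilies
            .computeIfAbsent(columnFamily, cf -> new HashMap<>()) // rows: hashed, unordered
            .computeIfAbsent(rowKey, rk -> new TreeMap<>())        // columns: sorted by key
            .put(columnKey, value);
    }

    /** Returns the value, or null if the column family, row or column is missing. */
    String get(String columnFamily, String rowKey, String columnKey) {
        Map<String, TreeMap<String, String>> rows = columnFamilies.get(columnFamily);
        if (rows == null) return null;
        TreeMap<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(columnKey);
    }
}
```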

WTF a Super Column is

● Columns holding (a few) other columns:
  – Serialized as a single value. Do NOT scale wildly.
  – Example: super column Kjetil, wrapping timestamped columns (timestamp 1330945017654)

Can You Relate?

● Concepts mapped to RDB data model levels:
  – Keyspace => Schema
  – Column family => Table
  – Row => Row, but without known columns
  – Column => Column name and value found in a row

● RDB: rows and column values are dynamic/data; column names are static/structure

● NoSQL: Column keys are dynamic/data, too.

The Column Revisited

● Columns are dynamic
  – Columns are data, not structure
● Column keys don't have to be strings
  – Column keys can be any supported, sortable primitive type, e.g. timestamps (Long)
  – Don't say column name, say column key
● Columns are sorted
  – Some RDB unlearning required

What's in a Keyspace Schema?

● Keyspace settings
  – Partitioning: decides which node(s) will store rows
  – Replication factor
  – Custom strategies for partitioning, placement etc.
● The set of Column Families
  – For each Column Family, the type of its keys
● Optional meta-data:
  – Pre-defined columns
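To make "decides which node(s) will store rows" concrete, here is a minimal sketch of a token ring in the style of RandomPartitioner (the token is the MD5 hash of the row key) combined with SimpleStrategy-like placement (the owning node plus the next nodes clockwise). The `Ring` class, node names and tokens are hypothetical; real placement also depends on the configured strategy and snitch.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: hash partitioning plus simple clockwise replica placement.
final class Ring {
    private final TreeMap<BigInteger, String> nodesByToken = new TreeMap<>();

    void addNode(BigInteger token, String node) {
        nodesByToken.put(token, node);
    }

    /** MD5 of the row key as a non-negative integer, as a RandomPartitioner-style token. */
    static BigInteger token(String rowKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(rowKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** First node at or after the key's token owns the row; the next nodes hold replicas. */
    List<String> replicasFor(String rowKey, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        Map.Entry<BigInteger, String> e = nodesByToken.ceilingEntry(token(rowKey));
        if (e == null) e = nodesByToken.firstEntry(); // wrap around the ring
        int n = Math.min(replicationFactor, nodesByToken.size());
        for (int i = 0; i < n; i++) {
            replicas.add(e.getValue());
            Map.Entry<BigInteger, String> next = nodesByToken.higherEntry(e.getKey());
            e = next != null ? next : nodesByToken.firstEntry();
        }
        return replicas;
    }
}
```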

Data Model Notes/(Anti-)Patterns

● Super columns are losing favor
  – Prefer “synthetic” columns (e.g. columns grouped by prefix)
  – Columns in super columns are schema, NOT data!
  – Cassandra devs hate them
● Partitioning a logical row into several physical rows is common
  – E.g. for x partitions, compute a hash value from the column key and take it modulo x, obtaining i. If “Age” hashes to 2 (modulo x), write to the row named Kjetil[2]
  – Helps to distribute r/w traffic among nodes, for column families with busy/crowded rows
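The row-partitioning recipe above fits in a few lines. This sketch assumes `String.hashCode` as the hash and the `Kjetil[i]` naming from the slide; the `RowPartitioning` class is hypothetical. `Math.floorMod` keeps the index non-negative even when the hash is negative.

```java
// Hypothetical helper: spread the columns of one logical row over x physical rows.
final class RowPartitioning {
    /** Partition index i in [0, partitions) for a column key. */
    static int partitionOf(String columnKey, int partitions) {
        // String.hashCode is fixed by the Java spec, so every client computes the same index.
        return Math.floorMod(columnKey.hashCode(), partitions);
    }

    /** Physical row key, e.g. "Kjetil[2]". */
    static String physicalRowKey(String logicalRowKey, String columnKey, int partitions) {
        return logicalRowKey + "[" + partitionOf(columnKey, partitions) + "]";
    }
}
```

With x = 3, `physicalRowKey("Kjetil", "Age", 3)` returns `Kjetil[2]`, because `"Age".hashCode()` is 65759 and 65759 mod 3 = 2. The hash must be stable across processes and releases; a per-JVM randomized hash would scatter writes and reads for the same column into different rows.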

What We Do and How

What We Do

● Count displays of, and clicks on, ads
● Use Cassandra to track # of hits, in time intervals, for:
  – Ads
  – Groups of ads
  – Advertiser campaigns
  – Display boxes
  – Publisher channels
  – Publisher sites
  – Other
  – ... and combinations thereof

One Hit, Two Boxes

Example List of Updates

● Count +1 for:
  – 6 ads, 6 ad groups, 6 campaigns (no overlap)
  – 2 display boxes, 1 channel (in this case, the same channel for both), 1 site
  – 2 channel/ad combinations
  – Various secret sauce, e.g. another 4
● 28 updates
● If click: 11 updates, count +1 for:
  – 1 ad, ad group, campaign, box, channel, etc.

But wait, there's more!

● Spec says “in time intervals” => +1 for each of:
  – The current hour
  – Today
  – This week
  – This month
  – This year
  – Total
● Total: 6 × 28 = 168 updates
● For an average of 500 requests/sec at ~100 updates/request:
  – ~50,000 writes/second
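The multiplication comes from fanning each counter out to every interval: 28 counters × 6 intervals = 168 updates for a display (11 × 6 = 66 for a click), and 500 requests/sec × ~100 updates/request ≈ 50,000 writes/sec. A minimal sketch of that fan-out, where only the daily `yyyyMMdd` bucket (as in 20120121) comes from the slides; the `IntervalFanOut` class, the H/D/W/M/Y/T prefixes and the other bucket formats are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.IsoFields;
import java.util.ArrayList;
import java.util.List;

// Hypothetical fan-out: one +1 per (time bucket, counter) pair.
final class IntervalFanOut {
    /** The six buckets a hit at time t counts towards. Formats other than the day are assumed. */
    static List<String> bucketsFor(LocalDateTime t) {
        List<String> buckets = new ArrayList<>();
        buckets.add("H/" + t.format(DateTimeFormatter.ofPattern("yyyyMMddHH"))); // current hour
        buckets.add("D/" + t.format(DateTimeFormatter.ofPattern("yyyyMMdd")));   // today
        buckets.add("W/" + t.get(IsoFields.WEEK_BASED_YEAR)
                + "W" + t.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR));               // this ISO week
        buckets.add("M/" + t.format(DateTimeFormatter.ofPattern("yyyyMM")));     // this month
        buckets.add("Y/" + t.getYear());                                         // this year
        buckets.add("T/total");                                                  // all time
        return buckets;
    }

    /** Every counter touched by one request, repeated once per bucket. */
    static List<String> updatesFor(List<String> counters, LocalDateTime t) {
        List<String> updates = new ArrayList<>();
        for (String bucket : bucketsFor(t)) {
            for (String counter : counters) {
                updates.add(bucket + "/" + counter);
            }
        }
        return updates;
    }
}
```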

Cassandra 1.0 Applied

● New feature/godsend: counter columns! (added in Cassandra 0.8)
  – Like Long values, but accept updates that are increments to the current value
● Combined with batched updates
  – Phew!
● Scale out for write traffic and workable read speed
  – Done!
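Batching means collecting many counter increments and sending them in one round trip. The sketch below goes one step further and merges repeated increments of the same (row, column) into a single +N before sending; that merging step is an assumption added for illustration, and the `CounterBatch` class is hypothetical. A real client such as Hector would turn each drained entry into a counter mutation and execute the batch in one call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side batch: row key -> column key -> pending increment.
final class CounterBatch {
    private final Map<String, Map<String, Long>> pending = new LinkedHashMap<>();

    /** Queue an increment; repeats of the same (row, column) are summed. */
    void increment(String rowKey, String columnKey, long delta) {
        pending.computeIfAbsent(rowKey, k -> new LinkedHashMap<>())
               .merge(columnKey, delta, Long::sum);
    }

    /** Number of distinct counter mutations the batch would send. */
    int mutationCount() {
        int n = 0;
        for (Map<String, Long> row : pending.values()) n += row.size();
        return n;
    }

    /** Hands over the merged increments and starts a fresh batch. */
    Map<String, Map<String, Long>> drain() {
        Map<String, Map<String, Long>> out = new LinkedHashMap<>(pending);
        pending.clear();
        return out;
    }
}
```

Counter increments are not idempotent, so a batch that times out and is blindly retried may count some hits twice; for statistics like these that is usually an acceptable trade-off.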

Real data: Row and columns

● D[0]
  – D: Daily interval, partition 0 (hashed from key)
● 20120121
  – The day: January 21, 2012
● channel_ad/Channel:b29-Ad:e13083
  – 1 click, 7 hits for ad 13083 in channel 29 on that day

Stupid Pet Tricks for Sorting

● Funny-looking values in the column key? The leading letter gives the number of digits:
  – a1
  – b29
  – c432
  – d2345
  – e34345
● Sortable, more compact and scalable than:
  – 00000000029
  – 00000000432
  – ...
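The trick works because a shorter number always gets an earlier letter, so plain string order agrees with numeric order without padding every value to a fixed width. A minimal encoder/decoder for non-negative numbers; the `SortableNumber` class is a hypothetical helper.

```java
// Hypothetical helper: letter for digit count ('a' = 1 digit, 'b' = 2, ...) followed by the digits.
final class SortableNumber {
    static String encode(long n) {
        if (n < 0) throw new IllegalArgumentException("negative numbers not supported");
        String digits = Long.toString(n);
        return (char) ('a' + digits.length() - 1) + digits;
    }

    static long decode(String s) {
        int expectedDigits = s.charAt(0) - 'a' + 1;
        String digits = s.substring(1);
        if (digits.length() != expectedDigits) throw new IllegalArgumentException(s);
        return Long.parseLong(digits);
    }
}
```

For example, `encode(29)` gives `b29` and `encode(13083)` gives `e13083`, matching the column key in the real-data example; `a9` sorts before `b10`, unlike the plain strings "9" and "10".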

Given a hit in channel 29 ...

● Read from an application-configured set of rows
  – Example config: last 4 hours, 3 days, 2 weeks
● 9 logical rows to read from (4 + 3 + 2)
  – Assume 3 partitions for each logical row
  – Read from 27 physical rows (9 × 3), all (or a minimum count of) columns beginning with: channel_ad/Channel:b29-Ad:
● Obtain synthetic clicks/hits ratio for each ad
  – And channel_ad is just one of the ratios to use
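A prefix read on sorted columns is a range slice from the prefix up to the prefix followed by the highest character. The sketch sums clicks and hits per ad over the physical rows and divides. Storing each column value as a hypothetical `{clicks, hits}` pair is an assumption made to keep it short (separate counter columns are just as plausible), and the `RatioReader` class is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical reader: rows are column key -> {clicks, hits} (assumed value layout).
final class RatioReader {
    /** Columns whose keys start with prefix (assumes keys never contain U+FFFF). */
    static SortedMap<String, long[]> prefixSlice(TreeMap<String, long[]> row, String prefix) {
        return row.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    /** Clicks/hits ratio per ad id, summed over all physical rows. */
    static Map<String, Double> clickRatios(List<TreeMap<String, long[]>> rows, String prefix) {
        Map<String, long[]> totals = new TreeMap<>();
        for (TreeMap<String, long[]> row : rows) {
            for (Map.Entry<String, long[]> e : prefixSlice(row, prefix).entrySet()) {
                String ad = e.getKey().substring(prefix.length()); // e.g. "e13083"
                long[] sum = totals.computeIfAbsent(ad, k -> new long[2]);
                sum[0] += e.getValue()[0]; // clicks
                sum[1] += e.getValue()[1]; // hits
            }
        }
        Map<String, Double> ratios = new TreeMap<>();
        for (Map.Entry<String, long[]> e : totals.entrySet()) {
            long[] s = e.getValue();
            if (s[1] > 0) ratios.put(e.getKey(), (double) s[0] / s[1]);
        }
        return ratios;
    }
}
```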

Caching of Synthetic Ratios

● Use ehcache
  – In-memory, fast
  – In-memory, clutters heap, provokes stop-the-world GC
● Cache in Cassandra
  – Store synthetic reads back in Cassandra (on-demand “denormalization”)
  – Still sensitive to high Cassandra loads
● Instance-local Redis on each box
  – Stand-alone: isolated from high Cassandra loads
  – Off-heap: reduces stop-the-world GC
  – Fast: configured for in-memory caching behavior
  – Typical time to retrieve a Java object: 200µs to 2ms
  – Good trade-off
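All three options share the same read-through pattern: look up the ratio, and on a miss or an expired entry compute it from Cassandra and keep it for a while. The sketch below shows only that pattern, as an in-process map with a time-to-live and an injectable clock; `TtlCache` is hypothetical and is not an ehcache or Redis client. Being on-heap, it has exactly the GC drawback noted for ehcache; moving the same pattern into a local Redis is what takes it off-heap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical read-through cache with a fixed time-to-live (not thread-safe).
final class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAt;
        Entry(T value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, or loads (e.g. reads from Cassandra) and caches it. */
    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> e = entries.get(key);
        if (e == null || e.expiresAt <= now) {
            e = new Entry<>(loader.apply(key), now + ttlMillis);
            entries.put(key, e);
        }
        return e.value;
    }
}
```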

Client Libraries

● Out-of-the-box: Thrift
  – Usable, but should not be mixed up with business logic
● Java recommendation: Hector
  – https://github.com/rantav/hector
  – Connection pooling
  – Just-above-Thrift-level
  – Type-safe(r) r/w

Scaling and Operations

Operations: Quickstart on EC2

● DataStax AMI:
  – http://datastax.com/docs/1.0/install/install_ami
  – Readymade cluster of N nodes
  – Free OpsCenter

Operations: Scaling

● Scaling Strategy:
  – Doubling/halving capacity is very convenient
  – => New nodes redistribute load naturally
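Doubling is convenient because, with evenly spaced tokens, the new ring contains every old token plus one new token halfway between each neighboring pair. Each new node then takes over half of one existing node's primary range, and no other primary range moves. A minimal sketch, assuming a RandomPartitioner-style token range of 0 to 2^127 and manually assigned initial tokens; the `Tokens` class is hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: evenly spaced initial tokens on a 0 .. 2^127 ring.
final class Tokens {
    static final BigInteger RANGE = BigInteger.ONE.shiftLeft(127);

    /** Token i is i * 2^127 / nodes (multiply before dividing to avoid rounding drift). */
    static List<BigInteger> balanced(int nodes) {
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < nodes; i++) {
            tokens.add(RANGE.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(nodes)));
        }
        return tokens;
    }
}
```

Going from 4 to 8 nodes keeps all four existing tokens, so only the new nodes need tokens assigned; halving works the same way in reverse by removing every other node.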

Operations: Backup

● System-wide backups
  – Nodes can be asked to dump snapshots
  – Recovery: new nodes started from snapshots
● Selective backups
  – Selected data can be dumped to/read from JSON
  – sstable2json/json2sstable
● Incremental backups

Advice and Admonitions

Introducing Cassandra

● Look for data that:
  – Grows fast
  – Holds useful information, given time to analyze it
  – Can be reproduced from source data (e.g. log files)
● Avoid business-critical data
  – Let RDBMS handle all that

Living with Cassandra

● Columns are data that live in a context:
  – Sorted in pre-defined ways, determining query efficiency
  – Queried for by application in other ways
● Columns are data coupled to your logic
  – Typical: encoding and parsing column names
  – Queries will change in development/maintenance
    – Persisted formats should change
    – Code must change

Cost of Change

● Your NoSQL data are, relative to your RDB data:
  – Bigger
  – More loosely-defined
  – More closely-coupled to application code
  – Harder to query (and easier queries => bigger data)
  – Less supported by mature tools
● Affects cost of change
  – Rebuild-from-source-data is a better option than migrate-existing-data, if it's practical
