A Guide to the Post Relational Revolution

Presentation held at Scandinavian Developer Conference, April 2012

Transcript of A Guide to the Post Relational Revolution

Page 1: A Guide to the Post Relational Revolution

A GUIDE TO THE POST RELATIONAL REVOLUTION

@iconara

Page 2: A Guide to the Post Relational Revolution

speakerdeck.com/u/iconara (real time!)

Page 3: A Guide to the Post Relational Revolution

Theo / @iconara

Page 4: A Guide to the Post Relational Revolution

Chief Architect at

Co-organizer of the local Ruby, Scala and JavaScript user groups

More rep on StackOverflow than both Jeff & Joel

Page 5: A Guide to the Post Relational Revolution

THE WORLD ISN’T FLAT

Page 6: A Guide to the Post Relational Revolution

OUT IS THE NEW UP

when scaling up you’re constrained by Moore’s Law

Page 7: A Guide to the Post Relational Revolution

DISTRIBUTED SYSTEMS ARE ABOUT TRADEOFFS

Page 8: A Guide to the Post Relational Revolution

WHO NEEDS ACID, ANYWAY?

banks, perhaps

Page 9: A Guide to the Post Relational Revolution

JOINS ARE A CRUTCH

why split up your data, if all you’re going to do is assemble it over and over again?

Page 10: A Guide to the Post Relational Revolution

OBJECTS DON’T FIT IN TABLES

can you say “impedance mismatch”?

Page 11: A Guide to the Post Relational Revolution

40 YEARS IS A LONG TIME

you didn’t have 256 gigabytes of RAM in 1970

Page 12: A Guide to the Post Relational Revolution

THE RELATIONAL MODEL ISN’T A GOLDEN HAMMER

the existence of object relational mappers should be proof enough

Page 13: A Guide to the Post Relational Revolution

WELCOME TO THE POST RELATIONAL REVOLUTION

Page 14: A Guide to the Post Relational Revolution

POST RELATIONAL STORAGE

Page 15: A Guide to the Post Relational Revolution

KEY/VALUE STORES

the simplest possible database, not exactly a new idea

Page 16: A Guide to the Post Relational Revolution

KEY -> VALUE (OPAQUE)

Riak, Voldemort, LevelDB, Tokyo Cabinet, Berkeley DB

Page 17: A Guide to the Post Relational Revolution

STRUCTURED KEY/VALUE STORES

sometimes you need just a little bit more

Page 18: A Guide to the Post Relational Revolution

the Bigtable model, “column oriented”, “sparse tables”, found in Cassandra and HBase

ROW KEY -> COLUMN KEY -> VALUE + TIMESTAMP (column keys SORTED)
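The layout on this slide can be sketched as a small in-memory structure. This is a toy under stated assumptions, not how Cassandra or HBase store data: the class name `SparseTable` and its methods are made up for illustration, and "newest timestamp wins" is assumed as the conflict rule.

```python
import bisect
import time


class SparseTable:
    """Toy model: row key -> sorted column keys -> (value, timestamp)."""

    def __init__(self):
        # row key -> (sorted list of column keys, {column key: (value, timestamp)})
        self.rows = {}

    def put(self, row_key, column_key, value, timestamp=None):
        columns, cells = self.rows.setdefault(row_key, ([], {}))
        if column_key not in cells:
            bisect.insort(columns, column_key)  # keep column keys sorted
        ts = time.time() if timestamp is None else timestamp
        current = cells.get(column_key)
        # Assumed conflict rule: the cell with the newest timestamp wins.
        if current is None or ts >= current[1]:
            cells[column_key] = (value, ts)

    def slice(self, row_key, start, end):
        """Return (column key, value) pairs with start <= column key < end."""
        columns, cells = self.rows.get(row_key, ([], {}))
        lo = bisect.bisect_left(columns, start)
        hi = bisect.bisect_left(columns, end)
        return [(c, cells[c][0]) for c in columns[lo:hi]]
```

Because column keys are kept sorted within a row, a range of columns can be read with two binary searches, and rows only hold the columns that were actually written (hence "sparse").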

Page 19: A Guide to the Post Relational Revolution

“datastructure server”, e.g. Redis

KEY -> VALUE
KEY -> VALUE, VALUE, VALUE (LIST OR SET)
KEY -> KEY: VALUE, KEY: VALUE, KEY: VALUE (SORTED SET OR HASH)

INCREMENT, APPEND, SLICE, CAS
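The four operations named on this slide can be sketched against a plain dictionary. This is not the Redis API; `ToyStructureServer` and its method names are hypothetical and only show what each operation does to the value stored under a key.

```python
class ToyStructureServer:
    """In-memory stand-in for a data structure server (illustration only)."""

    def __init__(self):
        self.data = {}

    def increment(self, key, by=1):
        # Counter: a missing key starts at zero.
        self.data[key] = self.data.get(key, 0) + by
        return self.data[key]

    def append(self, key, value):
        # List: create it on first use, return the new length.
        self.data.setdefault(key, []).append(value)
        return len(self.data[key])

    def slice(self, key, start, stop):
        # Read part of a list without fetching all of it.
        return self.data.get(key, [])[start:stop]

    def compare_and_set(self, key, expected, new):
        # CAS: only write if the current value is what the caller last saw.
        if self.data.get(key) != expected:
            return False
        self.data[key] = new
        return True
```

The point of such operations is that the server changes the value in place, so clients do not have to read, modify and write back the whole value.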

Page 20: A Guide to the Post Relational Revolution

DOCUMENT DATABASES

object databases, but for hipsters

Page 21: A Guide to the Post Relational Revolution
Page 22: A Guide to the Post Relational Revolution

complex objects with lists, numbers, strings, secondary indexes* and partial updates

MongoDB, CouchDB, RavenDB, Lotus Notes

* subject to availability

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "cell", "number": "646 555-4567" }
  ]
}
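Partial updates and secondary indexes over documents like the one above can be sketched with nested dictionaries. The helper names `partial_update` and `build_index`, and the dotted-path notation, are assumptions made for this sketch and do not belong to any particular database.

```python
import copy


def partial_update(document, path, value):
    """Return a copy of document with the field at the dotted path set to value."""
    updated = copy.deepcopy(document)
    target = updated
    *parents, leaf = path.split(".")
    for name in parents:
        target = target.setdefault(name, {})
    target[leaf] = value
    return updated


def build_index(documents, path):
    """Map each value found at the dotted path to the ids of documents holding it."""
    index = {}
    for doc_id, doc in documents.items():
        value = doc
        for name in path.split("."):
            value = value.get(name) if isinstance(value, dict) else None
        if value is not None:
            index.setdefault(value, set()).add(doc_id)
    return index
```

With an index on `address.city`, finding everyone in a given city is a dictionary lookup instead of a scan over every document.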

Page 23: A Guide to the Post Relational Revolution

GRAPH DATABASES

relational, for real

Page 24: A Guide to the Post Relational Revolution

traversal algorithms, extreme data complexity, Neo4j, AllegroGraph, FlockDB

[diagram: five NODEs joined by edges; labels: NAME + PROPERTIES, NAME]
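A traversal is the core operation of this model. As a minimal sketch, assuming the graph is given as a plain adjacency dictionary (the function name `traverse` and that representation are choices made here, not any database's API), a breadth-first traversal limited to a number of hops looks like this:

```python
from collections import deque


def traverse(edges, start, max_depth):
    """Breadth-first traversal; returns {node: depth} for nodes within max_depth hops."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand beyond the hop limit
        for neighbour in edges.get(node, ()):
            if neighbour not in depths:
                depths[neighbour] = depths[node] + 1
                queue.append(neighbour)
    return depths
```

A query such as "friends of friends" is this traversal with `max_depth=2`; in a relational schema the same question would need one self-join per hop.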

Page 25: A Guide to the Post Relational Revolution

DIVERSITY

I haven’t even mentioned search & indexing systems like Solr and Elastic Search, or distributed filesystems

Page 26: A Guide to the Post Relational Revolution

SOMETIMES TABLES ARE GREAT, TOO

but mostly when you rely heavily on GROUP BY, SUM, AVG, etc. and can’t precompute

Page 27: A Guide to the Post Relational Revolution

POST RELATIONAL SCALING

Page 28: A Guide to the Post Relational Revolution

CAP

Page 29: A Guide to the Post Relational Revolution

CONSISTENCY, AVAILABILITY, PARTITION TOLERANCE

(choose any two)

Page 30: A Guide to the Post Relational Revolution

OK?

Page 31: A Guide to the Post Relational Revolution

PARTITION TOLERANCE ISN’T OPTIONAL

Page 32: A Guide to the Post Relational Revolution

CONSISTENCY VS. AVAILABILITY

(but in reality, it’s not even that simple)

Page 33: A Guide to the Post Relational Revolution

CONSISTENCY

you can always read what you just wrote, but keys may become unavailable

Page 34: A Guide to the Post Relational Revolution

AVAILABILITY

you can always read and write, but you may not always get the latest value

Page 35: A Guide to the Post Relational Revolution

NOT EITHER OR

most databases let you choose on a query-by-query basis
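One common way to make this choice per query, used in Dynamo-style systems, is to pick how many replicas must acknowledge a write (W) and how many must answer a read (R) out of N copies. The slides do not spell this out, so the sketch below is an added illustration; the function name is hypothetical.

```python
def read_sees_latest_write(n, w, r):
    """True when any r replicas read must overlap any w replicas written, out of n."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n
```

With three replicas, `w=2, r=2` gives reads that always include the latest acknowledged write, while `w=1, r=1` stays available as long as one replica answers but may return a stale value.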

Page 36: A Guide to the Post Relational Revolution

SHARDING

scaling writes in a consistent system

Page 37: A Guide to the Post Relational Revolution

divide the keyspace into shards, or regions (and store each one redundantly)

[diagram: KEYSPACE from A to Z, DIVIDED BY DATA SIZE into three SHARDs, each stored on three REPLICAs]

Page 38: A Guide to the Post Relational Revolution

split a shard when it grows too big, move one of the new shards onto a new node

[diagram: KEYSPACE from A to Z; one of the three SHARDs is SPLIT, producing a fourth SHARD with its own three REPLICAs]
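Range sharding with splits can be sketched in a few lines. This is a single-process toy under stated assumptions: the class `RangeShardedStore` is hypothetical, keys are strings, a shard splits at its median key, and replication and the actual move to a new node are left out.

```python
import bisect


class RangeShardedStore:
    def __init__(self, max_keys_per_shard=4):
        self.max_keys = max_keys_per_shard
        # lower_bounds[i] is the inclusive lower bound of shards[i];
        # "" sorts before every other string, so the first shard catches everything.
        self.lower_bounds = [""]
        self.shards = [{}]

    def _shard_index(self, key):
        # The owning shard is the last one whose lower bound is <= key.
        return bisect.bisect_right(self.lower_bounds, key) - 1

    def put(self, key, value):
        i = self._shard_index(key)
        self.shards[i][key] = value
        if len(self.shards[i]) > self.max_keys:
            self._split(i)

    def get(self, key):
        return self.shards[self._shard_index(key)].get(key)

    def _split(self, i):
        # Split at the median key; the upper half becomes a new shard,
        # which a real system would then move to another node.
        keys = sorted(self.shards[i])
        middle = keys[len(keys) // 2]
        upper = {k: v for k, v in self.shards[i].items() if k >= middle}
        self.shards[i] = {k: v for k, v in self.shards[i].items() if k < middle}
        self.lower_bounds.insert(i + 1, middle)
        self.shards.insert(i + 1, upper)
```

Because shards are divided by data size rather than by a fixed key range, a hot or dense part of the keyspace ends up with more, smaller shards.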

Page 39: A Guide to the Post Relational Revolution

in reality there are chunks, tablets or “virtual shards” that are distributed over physical shards

[diagram: KEYSPACE from A to Z spread over four SHARDs, each with three REPLICAs]

Page 40: A Guide to the Post Relational Revolution

HBASE, MONGODB

sharding is easy in theory, hard in practice, lots of data needs to be moved when adding nodes

Page 41: A Guide to the Post Relational Revolution

CONSISTENT HASHING

scaling writes in an available system

Page 42: A Guide to the Post Relational Revolution

each node is responsible for a range of the keyspace, keys are hashed and mapped to the first following node, (optionally) replicated to subsequent nodes

[diagram: KEYSPACE ring from 0 to 2^n with four NODEs; hash(key) lands on the ring, replication continues to the following nodes]

Page 43: A Guide to the Post Relational Revolution

when a new node is added, only part of the keyspace needs to be moved

[diagram: KEYSPACE ring from 0 to 2^n with five NODEs and one NEW NODE]

Page 44: A Guide to the Post Relational Revolution

in practice, “virtual nodes” are evenly distributed over the keyspace, and then mapped onto physical nodes

[diagram: KEYSPACE ring from 0 to 2^n with five NODEs]
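The ring described on the last three slides can be sketched as follows. This is a minimal illustration, not the implementation used by Cassandra or Riak: the class `HashRing`, the use of MD5 truncated to 32 bits, and the default of 64 virtual nodes per physical node are all assumptions made here. Virtual nodes are placed by hashing rather than spaced evenly, which is a simplification.

```python
import bisect
import hashlib


def ring_position(name):
    """Hash a string to a position on the ring (0 .. 2**32 - 1)."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:4], "big")


class HashRing:
    def __init__(self, nodes=(), virtual_nodes=64):
        self.virtual_nodes = virtual_nodes
        self.positions = []  # sorted ring positions of all virtual nodes
        self.owners = {}     # ring position -> physical node
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.virtual_nodes):
            position = ring_position(f"{node}#{i}")
            if position not in self.owners:  # skip the (unlikely) collision
                bisect.insort(self.positions, position)
                self.owners[position] = node

    def nodes_for(self, key, replicas=1):
        """First `replicas` distinct nodes at or after hash(key), wrapping around."""
        start = bisect.bisect_left(self.positions, ring_position(key))
        found = []
        for offset in range(len(self.positions)):
            position = self.positions[(start + offset) % len(self.positions)]
            node = self.owners[position]
            if node not in found:
                found.append(node)
            if len(found) == replicas:
                break
        return found
```

Adding a node only inserts new positions on the ring, so the only keys that change owner are the ones that now fall just before one of the new node's positions; every other key stays where it was.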

Page 45: A Guide to the Post Relational Revolution

CASSANDRA, RIAK

perfect balance, in theory, but rings may still need rebalancing

Page 46: A Guide to the Post Relational Revolution

GOSSIP, HINTED HANDOFF, LOG STRUCTURED STORAGE, COMPACTION, VECTOR CLOCKS, READ REPAIR, JOURNALING, QUORUMS, EVENTUAL CONSISTENCY, DYNAMO, MAP/REDUCE, 2PC

a few of the things I haven’t mentioned, look them up

Page 47: A Guide to the Post Relational Revolution

LESSONS LEARNED

Page 48: A Guide to the Post Relational Revolution

EVERYTHING THEY TAUGHT YOU ABOUT DATABASES AT UNIVERSITY IS WRONG

almost

Page 49: A Guide to the Post Relational Revolution
Page 50: A Guide to the Post Relational Revolution

THINK ABOUT YOUR QUERIES FIRST

don’t optimize for insertion, denormalize heavily, disk is cheap, this ain’t 1970

Page 51: A Guide to the Post Relational Revolution

GIVE A LOT OF THOUGHT TO YOUR PRIMARY KEYS

range queries over cleverly designed primary keys can be very powerful, good keys are required for efficient sharding

Page 52: A Guide to the Post Relational Revolution

M04L7NOC5NQS
M04L7O05MIU2
M04NX42YFUCR
M04NYR7VWKJC
M04NZA8MJOOA
M04NZB88CT14
M04NZPOCE8DM
M04NZQ9G2T0S
M04NZQE7E5VX
M04NZSK4V3JN
M04NZTRG661R
M04NZTSUITJ7
M04NZUAILUS5
M04NZUG4DTXN
M04NZWB9VV0C
M04NZWW52T8N
M04NZX2JEVO9
M04NZX7WD77W
M04NZXGOLDEX
M04NZXKNQWB3
M04NZXLGJ3M6
M04NZY7GO39G
M04NZZ2SQF1I
M04O013HN9L9
M04O014DASE6
M04O02PE8AD3
M04O02PGJBR1
M04O03UPTRWG
M04O04833ZTL
M04O04GH21JF
M04O04JQ8B57
M04O04UHK3U4
M04O056QBNBH
M04O05E8XO8N
M04O069O8CDK
M04O06MG47WK
M04O07BHELVD
M04O07F30WYX
M04O0B39DGEA

Page 53: A Guide to the Post Relational Revolution

M04NZW B9VV0C

M04NZW: timestamp, 2012-02-28 23:59:56 UTC
B9VV0C: random number, 681 731 004

Page 54: A Guide to the Post Relational Revolution

B9VV0C M04NZW

B9VV0C: random number, 681 731 004
M04NZW: timestamp, 2012-02-28 23:59:56 UTC
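The example key decodes as base 36: `M04NZW` is the Unix timestamp 1330473596 and `B9VV0C` is 681731004. A sketch of a generator for such keys follows; the function names, the fixed width of six characters per part, and the `timestamp_first` switch are assumptions made here, inferred from the example rather than taken from the author's code.

```python
import random
import string
import time

ALPHABET = string.digits + string.ascii_uppercase  # 0-9 then A-Z: base 36


def base36(number, width=6):
    """Encode a non-negative integer in base 36, left-padded with zeros."""
    digits = ""
    while number:
        number, remainder = divmod(number, 36)
        digits = ALPHABET[remainder] + digits
    return digits.rjust(width, "0")


def make_key(timestamp=None, random_number=None, timestamp_first=True):
    """Build a 12-character key from a Unix timestamp and a random number."""
    if timestamp is None:
        timestamp = int(time.time())
    if random_number is None:
        random_number = random.randrange(36 ** 6)  # fits in six base-36 digits
    parts = [base36(timestamp), base36(random_number)]
    if not timestamp_first:
        parts.reverse()
    return "".join(parts)
```

A plausible reading of the two slides, not stated on them, is that timestamp-first keys sort chronologically and so support range queries over time, while random-first keys spread consecutive writes across the keyspace and therefore across shards.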

Page 55: A Guide to the Post Relational Revolution

CONSISTENCY IS OVERRATED

when you need it you need it, but most of the time you don’t

Page 56: A Guide to the Post Relational Revolution

DELETING DATA IS NOT TRIVIAL

sometimes delete operations can be more costly than inserts, design your cleaning process early

Page 57: A Guide to the Post Relational Revolution

REDIS, MONGODB, CASSANDRA

our current toolbox

Page 58: A Guide to the Post Relational Revolution

REDIS

swiss army knife, we use it for “virtual memory”, counters and even messaging

Page 59: A Guide to the Post Relational Revolution

REDIS

not distributed (yet), no automatic failover

Page 60: A Guide to the Post Relational Revolution

MONGODB

a very good replacement for MySQL, replication and automatic failover are fantastic

Page 61: A Guide to the Post Relational Revolution

MONGODB

global write lock kills performance, easily fragmented, sharding is complex and (has been) very buggy

Page 62: A Guide to the Post Relational Revolution

MONGODB

we use it for precomputing and storing metrics for our reporting app

Page 63: A Guide to the Post Relational Revolution

MONGODB

we’re currently pushing around 5K updates/s over three replica sets, each update incrementing up to 20 numbers
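Precomputing metrics with increments can be sketched without a database. The class `MetricsStore` and the document id and field names in the usage note are hypothetical; in MongoDB the same effect is roughly an upsert using the `$inc` update operator, but whether the author's system is built that way is an assumption.

```python
from collections import defaultdict


class MetricsStore:
    """Toy store of counter documents that are created on first increment."""

    def __init__(self):
        # document id -> {field name: running total}
        self.documents = defaultdict(lambda: defaultdict(int))

    def increment(self, document_id, increments):
        """Apply several counter increments to one document in a single update."""
        document = self.documents[document_id]
        for field, amount in increments.items():
            document[field] += amount
```

For example, `store.increment("site-1:2012-02-28", {"visits": 1, "clicks": 2})` bumps two counters in one update. The report then reads a finished total instead of running GROUP BY and SUM over raw events at query time.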

Page 64: A Guide to the Post Relational Revolution

CASSANDRA

low level building blocks, no single point of failure, great horizontal scalability, TTL on values

Page 65: A Guide to the Post Relational Revolution

CASSANDRA

we use it to store data about website visits, indexing it to support complex queries

Page 66: A Guide to the Post Relational Revolution

CASSANDRA

millions of rows, some with millions of columns, adding ~1K new every second

Page 67: A Guide to the Post Relational Revolution

one million writes per second

Page 68: A Guide to the Post Relational Revolution

LEARN SOMETHING NEW TODAY

nosql.mypopescu.com
highscalability.com
nosqltapes.com

Page 69: A Guide to the Post Relational Revolution

KTHXBAI

twitter.com/iconara
speakerdeck.com/u/iconara
architecturalatrocities.com
burtcorp.com