Cassandra - An Introduction

An introduction to the Cassandra database and some reports on experiences we had at TWIMPACT applying it to real-time analysis of social media data.

Transcript of Cassandra - An Introduction

Page 1: Cassandra - An Introduction

LinuxTag Berlin, 13. 5. 2011 (c) 2011 by Mikio L. Braun @mikiobraun, blog.mikiobraun.de

Cassandra – An Introduction

Mikio L. Braun, Leo Jugel

TU Berlin, twimpact

LinuxTag Berlin, 13 May 2011

Page 2: Cassandra - An Introduction

What is NoSQL

● For many web applications, “classical databases” are not the right choice:
  ● Database is just used for storing objects.
  ● Consistency not essential.
  ● A lot of concurrent access.

Page 3: Cassandra - An Introduction

NoSQL in comparison

Classical Databases                                 | NoSQL
Powerful query language                             | Very simple query language
Scales by using larger servers (“scaling up”)       | Scales through clustering (“scaling out”)
Changes of database schema very costly              | No fixed database schema
ACID: Atomicity, Consistency, Isolation, Durability | Typically only “eventually consistent”
Transactions, locking, etc.                         | Typically no support for transactions etc.

Page 4: Cassandra - An Introduction

Brewer's CAP Theorem

● CAP: Consistency, Availability, Partition Tolerance
  ● Consistency: You never get old data.
  ● Availability: Read/write operations are always possible.
  ● Partition Tolerance: The other guarantees hold even if the network between the servers breaks.

● You can only have two of these!

S. Gilbert and N. Lynch, “Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services”, ACM SIGACT News, Volume 33, Issue 2, June 2002

Page 5: Cassandra - An Introduction

Homepage: http://cassandra.apache.org

Language: Java

History:
● Developed at Facebook for inbox search, released as open source in July 2008
● Apache Incubator since March 2009
● Apache top-level project since February 2010

Main properties:
● Structured key-value store
● “Eventually consistent”
● Fully equivalent nodes
● Cluster can be modified without restarting

Support: DataStax (http://datastax.com)

Licence: Apache 2.0

Page 6: Cassandra - An Introduction

Version 0.6.x and 0.7.x

● Most important changes in 0.7.x:
  ● Config file format changed from XML to YAML
  ● Schema modification (ColumnFamilies) without restart
  ● Initial support for secondary indices

● However, 0.7.x initially also had stability problems.

Page 7: Cassandra - An Introduction

Inspirations for Cassandra

● Amazon Dynamo
  ● Clustering without a dedicated master node
  ● Peer-to-peer discovery of nodes, Hinted Handoff, etc.

● Google BigTable
  ● Data model
  ● Requires a central master node
  ● Provides much more fine-grained control:
    – which data should be stored together
    – on-the-fly compression, etc.

Page 8: Cassandra - An Introduction

Installation

● Download tar.gz from http://cassandra.apache.org/download/

● Unpack
  ● ./conf contains config files
  ● ./bin/cassandra -f to start Cassandra, Ctrl-C to stop

Page 9: Cassandra - An Introduction

Configuration

● Database
  ● Version 0.6.x: conf/storage-conf.xml
  ● Version 0.7.x: conf/cassandra.yaml

● JVM Parameters
  ● Version 0.6.x: bin/cassandra.in.sh
  ● Version 0.7.x: conf/cassandra-env.sh

Page 10: Cassandra - An Introduction

Cassandra's Data Model

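[Diagram of the data model]

● Keyspace (= database)
● Column Family (= table): rows of the form key → {name1: value1, name2: value2, name3: value3, ...}
  ● Keys are strings; column names and values are byte arrays.
  ● Rows are sorted according to the partitioner.
  ● Columns within a row are sorted by name!
● Super Column Family: rows of the form key → {key: {name1: value1, ...}, ...}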
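One way to picture this nesting is as sorted maps inside sorted maps. The following is a minimal in-memory sketch under stated assumptions, not Cassandra's implementation or client API: the class `DataModelSketch` and its methods `insert` and `slice` are hypothetical, strings stand in for byte arrays, and rows are simply sorted by key rather than by a partitioner.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of a keyspace: column family -> row key -> columns.
final class DataModelSketch {
    // TreeMap keeps the columns of every row sorted by column name.
    private final SortedMap<String, SortedMap<String, SortedMap<String, String>>> columnFamilies =
            new TreeMap<>();

    // Store a single column, creating the column family and the row on demand.
    void insert(String columnFamily, String rowKey, String name, String value) {
        columnFamilies
                .computeIfAbsent(columnFamily, cf -> new TreeMap<>())
                .computeIfAbsent(rowKey, key -> new TreeMap<>())
                .put(name, value);
    }

    // Return the columns with start <= name < end, in name order.
    // (The half-open range is a simplification chosen for this sketch.)
    SortedMap<String, String> slice(String columnFamily, String rowKey, String start, String end) {
        SortedMap<String, SortedMap<String, String>> rows = columnFamilies.get(columnFamily);
        if (rows == null || !rows.containsKey(rowKey)) {
            return Collections.emptySortedMap();
        }
        return rows.get(rowKey).subMap(start, end);
    }

    public static void main(String[] args) {
        DataModelSketch keyspace = new DataModelSketch();
        keyspace.insert("Person", "1", "name", "Mikio Braun");
        keyspace.insert("Person", "1", "affiliation", "TU Berlin");
        keyspace.insert("Person", "1", "id", "1");
        // Columns come back sorted by name, regardless of insertion order.
        System.out.println(keyspace.slice("Person", "1", "a", "z"));
    }
}
```

Because the columns are kept sorted, a range query over column names is just a sub-map view, which is the property the later index example relies on.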

Page 11: Cassandra - An Introduction

Example: Simple Object Store

class Person {
    long id;
    String name;
    String affiliation;
}

Convert fields to byte arrays

Keyspace “MyDatabase”:
ColumnFamily “Person”:
    “1”: {“id”: “1”, “name”: “Mikio Braun”, “affiliation”: “TU Berlin”}
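The conversion of fields to byte arrays could look roughly as follows. This is a sketch under assumptions: the class `PersonColumns` and its helper methods are hypothetical, and the encodings (8 big-endian bytes for the long, UTF-8 for strings) are one possible choice, not something the slides prescribe.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that turns the fields of a Person into column values.
final class PersonColumns {
    // Encode a long as 8 big-endian bytes (assumed encoding).
    static byte[] bytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Encode a string as UTF-8 (assumed encoding).
    static byte[] bytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Decode 8 big-endian bytes back into a long.
    static long toLong(byte[] encoded) {
        return ByteBuffer.wrap(encoded).getLong();
    }

    // One column per field: column name -> encoded value.
    static Map<String, byte[]> toColumns(long id, String name, String affiliation) {
        Map<String, byte[]> columns = new LinkedHashMap<>();
        columns.put("id", bytes(id));
        columns.put("name", bytes(name));
        columns.put("affiliation", bytes(affiliation));
        return columns;
    }

    public static void main(String[] args) {
        Map<String, byte[]> columns = toColumns(1L, "Mikio Braun", "TU Berlin");
        System.out.println("id = " + toLong(columns.get("id")));
        System.out.println("name = " + new String(columns.get("name"), StandardCharsets.UTF_8));
    }
}
```

The resulting map would then be written as the columns of row “1” in the column family “Person” with whatever client library is in use.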

Page 12: Cassandra - An Introduction

Example: Index

class Page {
    long id;
    …
    List<Link> links;
}

class Link {
    long id;
    ...
    int numberOfHits;
}

Keyspace “MyDatabase”

ColumnFamily “Pages” (object data fields)
    “3”: {“id”: 3, …}
    “4”: {“id”: 4, …}

ColumnFamily “Links” (object data fields)
    “1”: {“id”: 1, “url”: …}
    “17”: {“id”: 17, “url”: …}

ColumnFamily “LinksPerPageByNumberOfHits” (used for both linking and indexing!)
    “3”: {“00000132:00000001”: “t”, “00000025:00000017”: …}
    “4”: {“00000044:00000024”: “t”, …}

Here we exploit that columns are sorted by their names.

Of course, everything is encoded in byte arrays, not ASCII.
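The composite column names in “LinksPerPageByNumberOfHits” can be sketched as follows. The class `HitIndexSketch` and its methods are hypothetical, and the sketch uses zero-padded decimal strings like the slide does for readability, whereas the talk notes that the real encoding uses byte arrays. Padding to a fixed width is what makes the sort order of the names match the numeric order of the hit counts.

```java
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical helper for column names of the form "<hits>:<linkId>".
final class HitIndexSketch {
    // Zero-pad both parts to 8 digits so that string order equals numeric order.
    static String columnName(int numberOfHits, long linkId) {
        return String.format(Locale.ROOT, "%08d:%08d", numberOfHits, linkId);
    }

    // Recover the link id from the part after the colon.
    static long linkId(String columnName) {
        return Long.parseLong(columnName.substring(columnName.indexOf(':') + 1));
    }

    public static void main(String[] args) {
        // One row of the index: the columns of page "3", sorted by name.
        TreeMap<String, String> row = new TreeMap<>();
        row.put(columnName(25, 17), "t");
        row.put(columnName(132, 1), "t");
        row.put(columnName(7, 42), "t");
        // Walking the row backwards yields the links from most to fewest hits.
        for (String name : row.descendingKeySet()) {
            System.out.println("link " + linkId(name) + " via column " + name);
        }
    }
}
```

The column value (“t” on the slide) carries no information; the column name alone links the page to the link and orders the links by hits.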

Page 13: Cassandra - An Introduction

Are SuperColumnFamilies necessary?

● Usually, you can replace a SuperColumnFamily by several ColumnFamilies.

● Since SuperColumnFamilies make the implementation and the protocol more complex, there are also people advocating the removal of SuperCFs.

Page 14: Cassandra - An Introduction

Cassandra's Architecture

[Diagram: components in memory (MemTable) and on disk (Commit Log, several SSTables), with arrows for write operation, read operation, flush, and compaction!]
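The interplay of commit log, MemTable, and SSTables can be illustrated with a toy model. This is a heavily simplified sketch, not Cassandra's implementation: the class `WritePathSketch`, its methods, and the flush threshold counted in entries are all assumptions, everything lives in memory, and newer tables simply win over older ones, where Cassandra compares column timestamps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of the write path: commit log + MemTable, flushed to SSTables.
final class WritePathSketch {
    private final List<String> commitLog = new ArrayList<>();   // stands in for the on-disk log
    private SortedMap<String, String> memTable = new TreeMap<>(); // in memory
    private final List<SortedMap<String, String>> ssTables = new ArrayList<>(); // oldest first
    private final int flushThreshold;

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // A write is appended to the commit log and applied to the MemTable.
    void write(String key, String value) {
        commitLog.add(key + "=" + value);
        memTable.put(key, value);
        if (memTable.size() >= flushThreshold) {
            flush();
        }
    }

    // Flush: the MemTable becomes a new SSTable and a fresh MemTable is started.
    void flush() {
        if (memTable.isEmpty()) {
            return;
        }
        ssTables.add(memTable);
        memTable = new TreeMap<>();
        commitLog.clear(); // flushed data no longer needs the log
    }

    // A read checks the MemTable first, then the SSTables from newest to oldest.
    String read(String key) {
        if (memTable.containsKey(key)) {
            return memTable.get(key);
        }
        for (int i = ssTables.size() - 1; i >= 0; i--) {
            String value = ssTables.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Compaction: merge all SSTables into one; later tables overwrite earlier ones.
    void compact() {
        SortedMap<String, String> merged = new TreeMap<>();
        for (SortedMap<String, String> table : ssTables) {
            merged.putAll(table);
        }
        ssTables.clear();
        if (!merged.isEmpty()) {
            ssTables.add(merged);
        }
    }

    int ssTableCount() {
        return ssTables.size();
    }
}
```

The sketch also hints at why reads can be comparatively expensive: a write touches only the log and the MemTable, while a read may have to consult several SSTables until compaction has merged them.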

Page 15: Cassandra - An Introduction

Cassandra's API

● Thrift-based API

Read operations
    get: single column
    get_slice: range of columns
    multiget_slice: range of columns in several rows
    get_count: column count
    get_range_slice: several columns from range of rows
    get_indexed_slices: range of columns from index

Write operations
    insert: single column
    batch_mutate: several columns in several rows
    remove: single column
    truncate: whole ColumnFamily

Other
    login, describe_*, add/drop column family/keyspace since 0.7.x

Page 16: Cassandra - An Introduction

Cassandra Clustering

● Fully equivalent nodes, no master node.
● Bootstrapping requires a seed node.

[Diagram: a query arrives at one of several nodes; that node's “Storage Proxy” reads/writes according to the consistency level.]

Page 17: Cassandra - An Introduction

Consistency Level and Replication Factor

● Replication factor: On how many nodes is a piece of data stored?

● Consistency level:
    ANY: A node has received the operation, even if it is only a Hinted Handoff node.
    ONE: One node has completed the request.
    QUORUM: Operation has completed on a majority of nodes / the newest result is returned.
    LOCAL_QUORUM: QUORUM in the local data center.
    EACH_QUORUM: QUORUM in each data center.
    ALL: Wait until all nodes have completed the request.
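How replication factor and consistency level interact comes down to simple arithmetic: a quorum is a majority of the replicas, and a read is guaranteed to see the latest write when the nodes read and the nodes written must overlap. A small sketch of that arithmetic follows; the class `QuorumSketch` and its method names are hypothetical.

```java
// Hypothetical helper for quorum arithmetic.
final class QuorumSketch {
    // A quorum is a majority of the replicas: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Reads overlap with the latest write if readNodes + writeNodes > replicationFactor.
    static boolean readsSeeLatestWrite(int replicationFactor, int writeNodes, int readNodes) {
        return readNodes + writeNodes > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("Quorum for replication factor " + rf + ": " + q);
        System.out.println("QUORUM writes + QUORUM reads overlap: " + readsSeeLatestWrite(rf, q, q));
        System.out.println("ONE writes + ONE reads overlap: " + readsSeeLatestWrite(rf, 1, 1));
    }
}
```

With a replication factor of 3, QUORUM means 2 nodes, so QUORUM writes combined with QUORUM reads always overlap in at least one node, while ONE combined with ONE does not.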

Page 18: Cassandra - An Introduction

How to deal with failure

● As long as the requirements of the consistency level can be met, everything is fine.

● Hinted Handoff:
  ● A write operation for a faulty node is stored on another node and pushed to the faulty node once it is available again.
  ● The data won't be readable directly after the write!

● Read Repair:
  ● After a read operation has completed, the data is compared and updated on all nodes in the background.
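The idea behind read repair can be sketched in a few lines: compare the copies held by the replicas, return the newest one, and overwrite the stale copies. This is an illustration under assumptions, not Cassandra's code: the classes `ReadRepairSketch` and `Versioned` are hypothetical, each array slot stands for one replica's copy of a single column, and the repair runs synchronously here rather than in the background.

```java
// Toy model of read repair for a single column held by several replicas.
final class ReadRepairSketch {
    // A column value together with the timestamp of the write that produced it.
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Return the newest copy and overwrite every older copy with it.
    // Assumes a non-empty array without null entries.
    static Versioned readWithRepair(Versioned[] replicas) {
        Versioned newest = replicas[0];
        for (Versioned copy : replicas) {
            if (copy.timestamp > newest.timestamp) {
                newest = copy;
            }
        }
        for (int i = 0; i < replicas.length; i++) {
            if (replicas[i].timestamp < newest.timestamp) {
                replicas[i] = newest; // repair the stale replica
            }
        }
        return newest;
    }

    public static void main(String[] args) {
        Versioned[] replicas = {
            new Versioned("old", 1L), new Versioned("new", 2L), new Versioned("old", 1L)
        };
        System.out.println("Read returned: " + readWithRepair(replicas).value);
        System.out.println("Replica 0 now holds: " + replicas[0].value);
    }
}
```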

Page 19: Cassandra - An Introduction

Libraries

Python
    Pycassa: http://github.com/pycassa/pycass
    Telephus: http://github.com/driftx/Telephus

Java
    Datanucleus JDO: http://github.com/tnine/Datanucleus-Cassandra-Plugin
    Hector: http://github.com/rantav/hector
    Kundera: http://code.google.com/p/kundera/
    Pelops: http://github.com/s7/scale7-pelops

Grails
    grails-cassandra: https://github.com/wolpert/grails-cassandra

.NET
    Aquiles: http://aquiles.codeplex.com/
    FluentCassandra: http://github.com/managedfusion/fluentcassandra

Ruby
    Cassandra: http://github.com/fauna/cassandra

PHP
    phpcassa: http://github.com/thobbs/phpcassa
    SimpleCassie: http://code.google.com/p/simpletools-php/wiki/SimpleCassie

Or roll your own based on Thrift: http://thrift.apache.org/ :)

Page 20: Cassandra - An Introduction

TWIMPACT: An Application

● Real-time analysis of Twitter
● Trend analysis based on retweets
● Very high data rate (several million tweets per day, about 50 per second)

Page 21: Cassandra - An Introduction

TWIMPACT: twimpact.jp

Page 22: Cassandra - An Introduction

TWIMPACT: twimpact.com

Page 23: Cassandra - An Introduction

Application Profile

● Information about tweets, users, and retweets
● Text matching for non-API retweets
● Retweet frequency and user impact
● Operation profile:

Operation              | Fraction | Duration
get_slice (all)        | 50.1%    | 1.1ms
get                    | 6.0%     | 1.7ms
get_slice (range)      | 0.1%     | 0.8ms
batch_mutate (one row) | 14.9%    | 0.9ms
insert                 | 21.5%    | 1.1ms
batch_mutate           | 6.8%     | 0.8ms
remove                 | 0.8%     | 1.2ms

Page 24: Cassandra - An Introduction

Practical Experiences with Cassandra

● Very stable
● Read operations relatively expensive
● Multithreading leads to a huge performance increase
● Requires quite extensive tuning
● Clustering doesn't automatically lead to better performance
● Compaction leads to performance decrease of up to 50%

Page 25: Cassandra - An Introduction

Performance through Multithreading

● Multithreading leads to much higher throughput
● How to achieve multithreading without locking support?

[Plot: performance with 1, 2, 4, 8, 16, 32, and 64 threads on a Core i7 with 4 cores (2 + 2 HT)]

Page 27: Cassandra - An Introduction

Cassandra Tuning

● Tuning opportunities:
  ● Size of memtables, thresholds for flushes
  ● Size of JVM heap
  ● Frequency and depth of compaction

● Where?
  ● MemTableThresholds etc. in conf/cassandra.yaml
  ● JVM parameters in conf/cassandra-env.sh

Page 28: Cassandra - An Introduction

Overview of JVM GC

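[Diagram of the JVM heap]

● Young Generation (“Eden”, “Survivors”): up to a few hundred MB
● Old Generation: dozens of GBs
● CMSInitiatingOccupancyFraction: occupancy of the old generation at which the CMS collector starts a collection
● Additional memory usage while GC is running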

Page 29: Cassandra - An Introduction

Cassandra's Memory Usage

[Plot of memory usage, annotated with: flush; compaction; memtables, indexes, etc.]

Settings: size of memtable 128 MB, JVM heap 3 GB, 12 column families

Page 30: Cassandra - An Introduction

Cassandra's Memory Usage

● Memtables may survive for a very long time (up to several hours)
  ● They are placed in the old generation
  ● GC has to process several dozen GBs
  ● Heap too small, GC triggered too late: “GC storm”

● Trade-off:
  ● I/O load vs. memory usage

● Do not neglect compaction!

Page 31: Cassandra - An Introduction

The Effects of GC and Compactions

[Plot annotated with: compaction, major GC]

Page 32: Cassandra - An Introduction

Cluster vs Single Node

● Our set-up:
  ● Single node: six-core CPU and RAID 5 with 6 hard disks
  ● Cluster of 4 nodes: six-core CPUs and RAID 0 with 2 hard disks each

● The single node consistently performs 1.5 to 3 times better.

● Possible causes:
  ● Overhead through network communication, consistency levels, etc.
  ● Hard disk performance is significant
  ● Cluster still too small

● Effectively available disk space:
  ● Single node: 6 * 500 GB = 3 TB, with RAID 5 = 2.5 TB (83%)
  ● Cluster of 4 nodes: 4 * 1 TB = 4 TB, with replication factor 2 = 2 TB (50%)
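The disk space figures follow from two rules of thumb: RAID 5 gives up one disk's worth of capacity for parity, and a replication factor of n stores every piece of data n times. A short sketch of the arithmetic, with a hypothetical class `DiskSpaceSketch` and hypothetical method names:

```java
// Hypothetical helper reproducing the disk space arithmetic.
final class DiskSpaceSketch {
    // RAID 5 spends the capacity of one disk on parity.
    static double raid5UsableGb(int disks, double gbPerDisk) {
        return (disks - 1) * gbPerDisk;
    }

    // With replication, every piece of data is stored replicationFactor times.
    static double replicatedUsableGb(double rawGb, int replicationFactor) {
        return rawGb / replicationFactor;
    }

    public static void main(String[] args) {
        double singleNode = raid5UsableGb(6, 500);           // 6 * 500 GB with RAID 5
        double cluster = replicatedUsableGb(4 * 1000.0, 2);  // 4 * 1 TB, replication factor 2
        System.out.println("Single node usable GB: " + singleNode);
        System.out.println("Cluster usable GB: " + cluster);
    }
}
```

The single node keeps 5 of its 6 disks for data (about 83%), while the cluster with replication factor 2 keeps half of its raw capacity.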

Page 33: Cassandra - An Introduction

Alternatives

● MongoDB, CouchDB, redis, even memcached...

● Persistence: Disk or RAM?
● Replication: Master/Slave or Peer-to-Peer?
● Sharding?
● Upcoming trend towards more complex query languages (JavaScript), map-reduce operations, etc.

Page 34: Cassandra - An Introduction

Summary: Cassandra

● Platform which scales well
● Active user and developer community
● Read operations quite expensive
● For optimal performance, extensive tuning necessary
● Depending on your application, eventual consistency and the lack of transactions/locking might be problematic.

Page 35: Cassandra - An Introduction

Links

● Apache Cassandra: http://cassandra.apache.org
● Apache Cassandra Wiki: http://wiki.apache.org/cassandra/FrontPage
● DataStax documentation for Cassandra: http://www.datastax.com/docs/0.7/index
● My Blog: http://blog.mikiobraun.de
● Twimpact: http://beta.twimpact.com