A NOSQL Study: Apache Cassandra

Post on 23-Feb-2016

Shujaat Hussain

Data Model

A single column

A single row
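The slides showed the column and the row as diagrams that are not reproduced here. As a minimal sketch, assuming the classic Cassandra model in which a column is a name/value/timestamp triple and a row is a key with a set of columns ordered by name, the two structures might look like this in Python (the class and method names are hypothetical, not part of any Cassandra API):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    """A single column: a name, a value, and a write timestamp."""
    name: str
    value: str
    timestamp: int


@dataclass
class Row:
    """A single row: a key plus any number of columns (rows may be sparse)."""
    key: str
    columns: dict = field(default_factory=dict)  # column name -> Column

    def put(self, column):
        # Store (or overwrite) the column under its name.
        self.columns[column.name] = column

    def sorted_columns(self):
        # Columns are presented ordered by name, which is what makes
        # column slices possible.
        return [self.columns[name] for name in sorted(self.columns)]


row = Row(key="a")
row.put(Column("cat", "meow", timestamp=1))
row.put(Column("bar", "none", timestamp=2))
names = [c.name for c in row.sorted_columns()]
```

Two rows in the same store do not need to share the same set of columns, which is the sense in which the data is sparse and semi-structured.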

CAP Theorem

Consistency – the system is in a consistent state after an operation

Availability – the system is “always on”, no downtime

Partition tolerance – the system continues to function even when split into disconnected subsets (by a network disruption)

Performance vs MySQL w/ 50GB

MySQL: 300 ms write, 350 ms read

Cassandra: 0.12 ms write, 15 ms read

Querying: Overview

You need a key or keys:

Single: key=‘a’

Range: key=‘a’ through ‘f’

And columns to retrieve:

Slice: cols={bar through kite}

By name: key=‘b’ cols={bar, cat, llama}

Nothing like SQL “WHERE col=‘faz’”
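The four access patterns above can be sketched with a toy in-memory store. This is an illustration only: the `STORE` contents and the function names `get_rows`, `slice_columns` and `columns_by_name` are hypothetical and are not a Cassandra client API. The key range assumes row keys are kept in sorted order.

```python
from bisect import bisect_left, bisect_right

# Hypothetical toy store: row key -> {column name -> value}
STORE = {
    "a": {"bar": 1, "cat": 2, "kite": 3, "llama": 4},
    "b": {"bar": 5, "cat": 6, "llama": 7, "zebra": 8},
    "g": {"bar": 9},
}


def get_rows(start_key, end_key=None):
    """Single key when end_key is None, otherwise an inclusive key range."""
    if end_key is None:
        return {start_key: STORE[start_key]} if start_key in STORE else {}
    return {k: STORE[k] for k in sorted(STORE) if start_key <= k <= end_key}


def slice_columns(row, start, end):
    """Column slice: every column whose name falls in [start, end]."""
    names = sorted(row)
    lo, hi = bisect_left(names, start), bisect_right(names, end)
    return {n: row[n] for n in names[lo:hi]}


def columns_by_name(row, names):
    """Columns by name: only the explicitly listed columns that exist."""
    return {n: row[n] for n in names if n in row}


single = get_rows("a")
key_range = get_rows("a", "f")
sliced = slice_columns(STORE["a"], "bar", "kite")
named = columns_by_name(STORE["b"], ["bar", "cat", "llama"])
```

Every lookup starts from a row key and a column name or name range. None of these functions filters on a column value, which is why there is no equivalent of `WHERE col='faz'`.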

Digg is a social news site that allows people to discover and share content from anywhere on the Internet by submitting stories and links, and voting and commenting on submitted stories and links.

Problems

Terabytes of data; high transaction rate (read-dominated)

Multiple clusters

Management nightmare (high effort, error prone)

Unsatisfied availability requirements (geographic isolation)

Solution

Cassandra as primary data store

Datacenter and rack-aware replication
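To make "datacenter and rack-aware replication" concrete, here is a simplified sketch of replica placement on a token ring. It is not Cassandra's actual placement strategy: the `RING` layout, the node names and the `replicas` function are assumptions made for illustration. The idea shown is to walk the ring from the owning node and prefer nodes in a datacenter/rack combination that does not hold a copy yet.

```python
from bisect import bisect_left

# Hypothetical ring: (token, node, datacenter, rack), sorted by token.
RING = sorted([
    (10, "n1", "dc1", "r1"),
    (30, "n2", "dc1", "r1"),
    (50, "n3", "dc1", "r2"),
    (70, "n4", "dc2", "r1"),
    (90, "n5", "dc2", "r2"),
])


def replicas(token, replication_factor=3):
    """Pick replica nodes for a token, spreading copies across racks."""
    tokens = [entry[0] for entry in RING]
    # The owner is the first node whose token is >= the key's token,
    # wrapping around to the start of the ring.
    start = bisect_left(tokens, token) % len(RING)
    order = RING[start:] + RING[:start]

    chosen, used_racks = [], set()
    # First pass: only take nodes from (datacenter, rack) pairs not used yet.
    for _, node, dc, rack in order:
        if len(chosen) < replication_factor and (dc, rack) not in used_racks:
            chosen.append(node)
            used_racks.add((dc, rack))
    # Second pass: if distinct racks ran out, fill up in ring order.
    for _, node, _, _ in order:
        if len(chosen) < replication_factor and node not in chosen:
            chosen.append(node)
    return chosen


placement = replicas(5)
```

With this layout a key with token 5 is owned by `n1`. The walk skips `n2` because it shares a rack with `n1`, so the copies land on `n1`, `n3` and `n4`, covering two racks in one datacenter and one rack in the other.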

Twitter is a social networking and microblogging service that enables its users to send and read tweets, text-based posts of up to 140 characters.

Terabytes of data, ~1,000,000 ops/s

Inbox Search: 100 TB, 160 nodes, 1/2 billion writes per day (2yr old number?)

Pros (Advantages)

Massive scalability

High availability

Lower cost (than competitive solutions at that scale)

(usually) predictable elasticity

Schema flexibility, sparse & semi-structured data

Cons (Disadvantages)

Limited query capabilities (so far)

Eventual consistency is not intuitive to program for

Makes client applications more complicated

No standardization

Portability might be an issue

Insufficient access control
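The point about eventual consistency can be illustrated with a small sketch. It assumes two replicas that reconcile by keeping the value with the newest timestamp (last write wins). The `Replica` class and the `sync` function are hypothetical and stand in for whatever background repair the real system performs.

```python
class Replica:
    """A toy replica that keeps (timestamp, value) per key."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, timestamp):
        # Keep the write only if it is newer than what is already stored.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Exchange all entries in both directions; newest timestamp wins."""
    for src, dst in ((a, b), (b, a)):
        for key, (timestamp, value) in list(src.data.items()):
            dst.write(key, value, timestamp)


r1, r2 = Replica(), Replica()
r1.write("user:1", "old", timestamp=1)
sync(r1, r2)

r1.write("user:1", "new", timestamp=2)  # has reached only one replica so far
stale = r2.read("user:1")               # the other replica still answers "old"

sync(r1, r2)
fresh = r2.read("user:1")               # after reconciliation both agree
```

A client that reads from `r2` between the write and the sync sees the previous value. Handling that window is the extra complexity the slide attributes to client applications.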