Cassandra at NoSql Matters 2012

Apache Cassandra: Real-world scalability, today Jonathan Ellis CTO

Transcript of Cassandra at NoSql Matters 2012

Page 1: Cassandra at NoSql Matters 2012

Apache Cassandra: Real-world scalability, today!

Jonathan Ellis, CTO

Page 2: Cassandra at NoSql Matters 2012

©2012 DataStax

Cassandra Job Trends

Page 3: Cassandra at NoSql Matters 2012

“Big Data” trend

Page 4: Cassandra at NoSql Matters 2012

Why Big Data Matters

Research done by McKinsey & Company shows the eye-opening, 10-year category growth rate differences between businesses that smartly use their big data and those that do not.

Page 5: Cassandra at NoSql Matters 2012

Big data

Analytics (Hadoop)

Realtime (“NoSQL”)

?

Page 6: Cassandra at NoSql Matters 2012

Some Cassandra users

Page 7: Cassandra at NoSql Matters 2012

Industries & use cases

Industries:

• Financial

• Social Media

• Advertising

• Entertainment

• Energy

• E-tail

• Health care

• Government

Use cases:

• Time series data

• Messaging

• Ad tracking

• Data mining

• User activity streams

• User sessions

• Anything requiring: scalable, performant + highly available

Page 8: Cassandra at NoSql Matters 2012

Why Cassandra?

• Fully distributed, no SPOF

• Multi-master, multi-DC

• Linearly scalable

• Larger-than-memory datasets

• Best-in-class performance (not just writes!)

• Fully durable

• Integrated caching

• Tuneable consistency

Page 9: Cassandra at NoSql Matters 2012

Availability

• “There is no such thing as standby infrastructure: there is stuff you always use and stuff that won’t work when you need it.” -- Ben Black: founder, Boundary; ex-AWS

• “The biggest problem with failover is that you're almost never using it until it really hurts. It's like backups that you never test.” -- Rick Branson: Instagram; ex-DataStax

Page 10: Cassandra at NoSql Matters 2012

Classic partitioning with SPOF

[Diagram: client; router; partition 1, partition 2, partition 3, partition 4]

Page 11: Cassandra at NoSql Matters 2012

Fully distributed, no SPOF

[Diagram: client and partitions labeled p1, p1, p1, p3, p6]

Page 12: Cassandra at NoSql Matters 2012

Page 13: Cassandra at NoSql Matters 2012

Partitioning

jim       age: 36   car: camaro   gender: M
carol     age: 37   car: subaru   gender: F
johnny    age: 12                 gender: M
suzy      age: 10                 gender: F

Page 14: Cassandra at NoSql Matters 2012

Partitioning

Primary key determines placement*

jim       age: 36   car: camaro   gender: M
carol     age: 37   car: subaru   gender: F
johnny    age: 12                 gender: M
suzy      age: 10                 gender: F

Page 15: Cassandra at NoSql Matters 2012

PK        MD5 Hash
jim       5e02739678...
carol     a9a0198010...
johnny    f4eb27cea7...
suzy      78b421309e...

The MD5 hash operation yields a 128-bit number for keys of any size.
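The hashing step on this slide can be sketched in a few lines of Python. This is a minimal illustration using the standard `hashlib` module; the helper name `md5_token` is hypothetical, and Cassandra's own partitioner may post-process the digest differently before using it as a token.

```python
import hashlib


def md5_token(key: str) -> int:
    """Return the MD5 digest of a key as a 128-bit unsigned integer."""
    digest = hashlib.md5(key.encode("utf-8")).digest()  # always 16 bytes
    return int.from_bytes(digest, byteorder="big")


# Keys of any length map to a fixed-width 128-bit number.
for name in ["jim", "carol", "johnny", "suzy"]:
    print(f"{name:<8} {md5_token(name):032x}")
```

Because the digest width is fixed, every key lands somewhere in the same 0 to 2^128 - 1 token space regardless of its size.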

Page 16: Cassandra at NoSql Matters 2012

The “token ring”

[Diagram: Node A, Node B, Node C and Node D arranged in a ring]

Page 17: Cassandra at NoSql Matters 2012

Key       MD5 hash
jim       5e02739678...
carol     a9a0198010...
johnny    f4eb27cea7...
suzy      78b421309e...

Node   Start               End
A      0xc000000000..1     0x0000000000..0
B      0x0000000000..1     0x4000000000..0
C      0x4000000000..1     0x8000000000..0
D      0x8000000000..1     0xc000000000..0
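The range table above amounts to a sorted lookup with wrap-around. The following is a rough sketch, not Cassandra's implementation: the names `RANGE_ENDS`, `owner_of` and `token_from_prefix` are hypothetical, and the truncated hashes from the slide are padded with zeros purely for illustration, since the slide elides the remaining digits.

```python
from bisect import bisect_left

# Inclusive upper bound of each node's token range, taken from the table:
# A ends at 0x00..0, B at 0x40..0, C at 0x80..0, D at 0xc0..0.
RANGE_ENDS = [0x00 << 120, 0x40 << 120, 0x80 << 120, 0xC0 << 120]
OWNERS = ["A", "B", "C", "D"]


def owner_of(token: int) -> str:
    """Return the node whose (start, end] range contains a 128-bit token."""
    index = bisect_left(RANGE_ENDS, token)
    # Tokens above the last range end wrap around to the first node (A).
    return OWNERS[index % len(OWNERS)]


def token_from_prefix(hex_prefix: str) -> int:
    """Pad a truncated hex digest with zeros (an assumption for the demo)."""
    return int(hex_prefix.ljust(32, "0"), 16)


for name, prefix in [("jim", "5e02739678"), ("carol", "a9a0198010"),
                     ("johnny", "f4eb27cea7"), ("suzy", "78b421309e")]:
    print(name, "->", owner_of(token_from_prefix(prefix)))
```

Under these assumptions jim and suzy fall in C's range, carol in D's, and johnny wraps around into A's.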

Page 18: Cassandra at NoSql Matters 2012

Page 19: Cassandra at NoSql Matters 2012

Page 20: Cassandra at NoSql Matters 2012

Page 21: Cassandra at NoSql Matters 2012

Page 22: Cassandra at NoSql Matters 2012

Replication

[Diagram: ring of Node A, Node B, Node C and Node D; key carol (a9a0198010...)]

Page 23: Cassandra at NoSql Matters 2012

[Diagram: ring of Node A, Node B, Node C and Node D; key carol (a9a0198010...)]

Page 24: Cassandra at NoSql Matters 2012

[Diagram: ring of Node A, Node B, Node C and Node D; key carol (a9a0198010...)]
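The replication slides only show the ring and the key, so the sketch below fills in one common placement rule as an assumption: copy the row to the owning node and then to the next nodes clockwise on the ring. The replication factor of 3 and the helper name `replicas_for` are hypothetical, and rack- or data-center-aware placement would choose nodes differently.

```python
from typing import List

RING = ["A", "B", "C", "D"]  # nodes in ascending token order


def replicas_for(owner: str, replication_factor: int = 3) -> List[str]:
    """Walk the ring clockwise from the owning node, collecting replicas."""
    start = RING.index(owner)
    count = min(replication_factor, len(RING))  # cannot exceed cluster size
    return [RING[(start + step) % len(RING)] for step in range(count)]


# carol's token (a9a0198010...) falls in D's range, so D is the first replica.
print(replicas_for("D"))
```

With four nodes and three copies, every row survives the loss of any two nodes, and no single node holds a special "primary" role.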

Page 25: Cassandra at NoSql Matters 2012

Highlights

• Adding capacity is application-transparent and requires no downtime

• No SPOF, not even temporarily

• No “primary” replica

• Configurable synchronous/asynchronous

• Tolerates node failure; never have to restart replication “from scratch”

• “Smart” replication avoids correlated failures
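The "configurable synchronous/asynchronous" point comes down to how many replicas must acknowledge a read or write before the client gets an answer. A minimal sketch of the usual quorum arithmetic follows; the function names are hypothetical, and the rule shown (reads and writes overlap when R + W > N) is the general replicated-systems rule rather than anything specific to these slides.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


n = 3
print("quorum for N=3:", quorum(n))
print("W=QUORUM, R=QUORUM overlaps:", reads_see_latest_write(n, quorum(n), quorum(n)))
print("W=1, R=1 overlaps:", reads_see_latest_write(n, 1, 1))
```

Waiting for fewer acknowledgements lowers latency and keeps the cluster writable during failures, at the cost of reads that may briefly miss the newest value.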

Page 26: Cassandra at NoSql Matters 2012

What about performance?

• Log-structured storage engine avoids random I/O

• Excellent performance on both reads and writes

• Row-level isolation via concurrent algorithms
  • no locking

• Built-in compression improves cache hotness

• “Row cache” can replace memcached
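To make the "log-structured storage engine avoids random I/O" bullet concrete, here is a toy sketch of the general idea: writes are appended to a log and buffered in memory, then flushed as sorted, immutable runs. The class `TinyLogStructuredStore` and its `memtable_limit` parameter are hypothetical, and the sketch omits everything a real engine needs (disk files, compaction, deletes, recovery).

```python
from bisect import bisect_left


class TinyLogStructuredStore:
    """Toy log-structured store: append-only log, memtable, sorted runs."""

    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # recent writes, held in memory
        self.sstables = []     # immutable sorted runs, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.commit_log.append((key, value))  # sequential append, no seek
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Write the memtable out once, in sorted order, and never modify it.
        items = sorted(self.memtable.items())
        keys = [k for k, _ in items]
        values = [v for _, v in items]
        self.sstables.append((keys, values))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newer runs shadow older ones, so search newest first.
        for keys, values in reversed(self.sstables):
            i = bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


store = TinyLogStructuredStore(memtable_limit=2)
store.put("jim", 36)
store.put("carol", 37)   # second write triggers a flush
store.put("jim", 37)     # newer value stays in the memtable
print(store.get("jim"), store.get("carol"), store.get("suzy"))
```

An update never rewrites an existing run in place; the newer value simply shadows the older one at read time, which is what keeps the write path sequential.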

Page 27: Cassandra at NoSql Matters 2012

[Chart: reads/s and writes/s for Cassandra 0.6 and Cassandra 1.0; axis from 0 to 35000]

Page 28: Cassandra at NoSql Matters 2012

Page 29: Cassandra at NoSql Matters 2012

Netflix

“I can create a Cassandra cluster in any region of the world in 10 minutes. When marketing guys decide we want to move into a certain part of the world, we’re ready.”

Application/Use Case

• Manage subscriber interactions with downloaded movies

• Need to handle distributed databases all over the world (40 countries)

• Need better TCO than Oracle

Why Cassandra?

• Easy scale and multi-data center support for geographical data distribution

• Data model perfect fit for customer interaction data

• Much better TCO than Oracle or SimpleDB

Page 30: Cassandra at NoSql Matters 2012

Constant Contact

“Whenever we need new capacity, we just add new nodes online and we’re able to meet whatever demand we have. Cassandra is great for that.”

Application/Use Case

• Manage marketing/email campaigns for small businesses

• Needed a database to handle social media data that is very large in volume and must be maintained for a long time

• Data is unstructured in nature

Why Cassandra?

• Cassandra is built for big data scale and able to persist, manage, and quickly query big data

• Deployed application on Cassandra in 1/3rd the time and 1/10th the cost of Oracle

Page 31: Cassandra at NoSql Matters 2012

ReachLocal

Application/Use Case

• ReachLocal provides end-to-end Internet advertising services to small and medium-sized businesses in eight countries

• Must track most or all user interaction with marketing campaigns on web sites

Why Cassandra?

• The amount of information was beyond the scalability limits of traditional RDBMSs

• Has to replicate data to six data centers around the world

• Needed integration with real-time data and analytics/search

Page 32: Cassandra at NoSql Matters 2012

Backupify

“Cassandra was just a better design all around – more truly horizontally scalable and with less management overhead – and there’s no single point of failure. I looked at Cassandra’s architecture and thought, ‘Yeah, that’s how you do it.’”

Application/Use Case

• Cloud-based utility that enables backups and searches of Google Apps, Gmail, Facebook, Twitter, Blogger and other content

• Must write lots of data very quickly

Why Cassandra?

• Big data requirements necessitated an easy scale-out and continuously available database architecture

• Strong community support for Cassandra

• TCO was much better than others

Page 33: Cassandra at NoSql Matters 2012

Openwave

“Here are the big ‘checkbox’ items for us with Apache Cassandra: There is no single point of failure, it offers high read-and-write performance, and it has the ability to work on commodity hardware.”

Application/Use Case

• Openwave Messaging delivers a next-generation converged messaging platform with cloud and social integration capabilities

Why Cassandra?

• Needed a new database that would support geographic redundancy, continuous availability, and big data scale

• Required high IOPS database speed

• Better TCO than prior Oracle database

Page 34: Cassandra at NoSql Matters 2012

Healthx

“We really like the integration with Solr. We get the full redundancy that you’d expect out of Cassandra as well as the full text indexing of Solr. The two things together make a win.”

Application/Use Case

• Develops and manages online portals for the healthcare market

• Delivered via cloud platform

• Manages provider, patient, and other related data

Why DataStax Enterprise?

• Needed to scale, perform, and search data faster than previous Microsoft SQL Server database farm

• Integrated big data platform that provides one database cluster for all real-time and search data

Page 35: Cassandra at NoSql Matters 2012

Big data

Analytics (Hadoop)

Realtime (“NoSQL”)

?

Page 36: Cassandra at NoSql Matters 2012

The evolution of Analytics

Analytics + Realtime

Page 37: Cassandra at NoSql Matters 2012

The evolution of Analytics

Analytics Realtime

replication

Page 38: Cassandra at NoSql Matters 2012

The evolution of Analytics

ETL

Page 39: Cassandra at NoSql Matters 2012

Big data

Analytics (Hadoop)

Realtime (Cassandra)

DataStax Enterprise

Page 40: Cassandra at NoSql Matters 2012

Reunification of realtime + analytics

Page 41: Cassandra at NoSql Matters 2012

Page 42: Cassandra at NoSql Matters 2012

Portfolio Demo dataflow

[Diagram, first flow: Portfolios; Historical Prices; Intermediate Results; Largest loss]

[Diagram, second flow: Portfolios; Live Prices for today; Largest loss]

Page 43: Cassandra at NoSql Matters 2012

Better Hadoop than Hadoop

• “Vanilla” Hadoop
  • 8+ services to set up, monitor, back up, and recover (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, Zookeeper, Region Server,...)
  • Single points of failure
  • Can't separate online and offline processing

• DataStax Enterprise
  • Single, simplified component
  • Self-organizes based on workload
  • Peer to peer
  • JobTracker failover

Page 44: Cassandra at NoSql Matters 2012

Enterprise search with Solr

SELECT title FROM solr WHERE solr_query='title:natio*';

 title
--------------------------------------------------------------------------
 Bolivia national football team 2002
 List of French born footballers who have played for other national teams
 Lithuania national basketball team at Eurobasket 2009
 Bolivia national football team 2000
 Kenya national under-20 football team
 Bolivia national football team 1999
 Israel men's national inline hockey team
 Bolivia national football team 2001

Page 45: Cassandra at NoSql Matters 2012

Managing & Monitoring Big Data

DataStax OpsCenter manages and monitors all Cassandra and Hadoop operations.