Introduction to Apache Cassandra

Post on 15-Aug-2015

97 views 2 download

Tags:

Transcript of Introduction to Apache Cassandra

An Introduction to Cassandra

Sydney Tech Day

instaclustr.com @Instaclustr

Who am I and what do I do?• Adam Zegelin

• Co-founder and Chief Architect of Instaclustr -> www.instaclustr.com

• Instaclustr provides Cassandra-as-a-Service in the cloud.

• Currently in AWS, Azure and Google Cloud in private beta with more to come.

• We currently manage 50+ nodes for various customers, who do various things with it.

Objectives

• A quick history of Databases

• Introducing Cassandra

1980 - Stand Alone and Mainframes

1990 - 2005 Networked Computing

2005+ Real Time Web and Big Data

1980 - Stand Alone and Mainframes

1990 - 2005 Networked Computing

Early 2000’s

• What happens when you have more data than could fit on a single server?

Throw money away at the problem

Lets try a little computer science instead

• BigTable (2006) - 1 Key: Lots of values, Fast sequential access

• Dynamo (2007) - Reliable, Performant, Always On,

• Cassandra (2008) - Dynamo Architecture, BigTable data model and storage

One database, many servers• All servers (nodes) participate

in the cluster

• Shared nothing

• Need more capacity add more servers

• Multiple servers == built in redundancy

1

3

24

What are the benefits to this approach• Linear scalability

Linear scalability

What are the benefits to this approach• Linear scalability

• High Availability (No single point of failure)

What are the benefits to this approach

“During Hurricane Sandy, we lost an entire data center. Completely. Lost. It. Our

application fail-over resulted in us losing just a few moments of serving requests for a

particular region of the country, but our data in Cassandra never went offline.”

Nathan Milford, Outbrain’s head of U.S. IT operations management

What are the benefits to this approach

What are the benefits to this approach• Linear scalability

• High Availability

• Use commodity hardware

What are the benefits to this approach

How does it work ?0

4

28

PartitioningName Age Postcode Gender

Alice 34 2000 F

Bob 26 2000 M

Eve 25 2004 F

Frank 41 2902 M

How does it work ?

client

consistentHash(“Alice”)

0

4

28

Replication Factor = 3

How do we keep data consistent ?

client

consistentHash(“Alice”)

0

4

28

CL.ONE

Write

Ack

How do we keep data consistent ?

client

consistentHash(“Alice”)

0

4

28

CL.ALL

Write

AckAck

Ack

How do we keep data consistent ?

client

consistentHash(“Alice”)

0

4

28

CL.QUORUM

Write

Ack

Ack

X

Also supports multi-dc replication

client

0

4

28

0

4

28

Add capacity1

5

37

client

consistentHash(“Alice”)

0

4

2

6

Thank you!