Introduction to Apache Cassandra


Transcript of Introduction to Apache Cassandra

Page 1: Introduction to Apache Cassandra

An Introduction to Cassandra

Sydney Tech Day

instaclustr.com @Instaclustr

Page 2: Introduction to Apache Cassandra

Who am I and what do I do?

• Adam Zegelin

• Co-founder and Chief Architect of Instaclustr -> www.instaclustr.com

• Instaclustr provides Cassandra-as-a-Service in the cloud.

• Currently on AWS, with Azure and Google Cloud in private beta and more to come.

• We currently manage 50+ nodes for a range of customers with a variety of use cases.

Page 3: Introduction to Apache Cassandra

Objectives

• A quick history of databases

• Introducing Cassandra

Page 4: Introduction to Apache Cassandra

1980: Standalone and mainframes

1990-2005: Networked computing

2005+: Real-time web and big data

Page 5: Introduction to Apache Cassandra

1980: Standalone and mainframes

1990-2005: Networked computing

Page 6: Introduction to Apache Cassandra

Early 2000s

• What happens when you have more data than can fit on a single server?

Page 7: Introduction to Apache Cassandra

Throw money at the problem

Page 8: Introduction to Apache Cassandra

Let's try a little computer science instead

• Bigtable (2006): one key, lots of values; fast sequential access

• Dynamo (2007): reliable, performant, always on

• Cassandra (2008): Dynamo architecture, Bigtable data model and storage

Page 9: Introduction to Apache Cassandra

One database, many servers

• All servers (nodes) participate in the cluster

• Shared nothing

• Need more capacity? Add more servers.

• Multiple servers == built-in redundancy


Page 10: Introduction to Apache Cassandra

What are the benefits of this approach?

• Linear scalability

Page 11: Introduction to Apache Cassandra

Linear scalability

Page 12: Introduction to Apache Cassandra

What are the benefits of this approach?

• Linear scalability

• High Availability (No single point of failure)

Page 13: Introduction to Apache Cassandra

What are the benefits of this approach?

“During Hurricane Sandy, we lost an entire data center. Completely. Lost. It. Our application fail-over resulted in us losing just a few moments of serving requests for a particular region of the country, but our data in Cassandra never went offline.”

Nathan Milford, Outbrain’s head of U.S. IT operations management

Page 14: Introduction to Apache Cassandra

What are the benefits of this approach?

Page 15: Introduction to Apache Cassandra

What are the benefits of this approach?

• Linear scalability

• High Availability

• Use commodity hardware

Page 16: Introduction to Apache Cassandra

What are the benefits of this approach?

Page 17: Introduction to Apache Cassandra

How does it work?

[Diagram: a cluster with nodes labelled 0, 4 and 28]

Page 18: Introduction to Apache Cassandra

Partitioning

Name    Age  Postcode  Gender
Alice   34   2000      F
Bob     26   2000      M
Eve     25   2004      F
Frank   41   2902      M
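To make the partitioning idea concrete, here is a minimal Python sketch that spreads the table's rows across nodes by hashing the Name column. It is a toy model under stated assumptions, not Cassandra's partitioner: the 0-31 token space, the MD5-based `token_for` helper and the node tokens 0, 4 and 28 (borrowed from the diagram labels) are all illustrative. Cassandra's default Murmur3Partitioner uses 64-bit tokens. As in Cassandra, each node owns the tokens from just after its predecessor's token up to and including its own.

```python
import bisect
import hashlib

# Toy token space, assumed for illustration (real Cassandra tokens are 64-bit).
TOKEN_SPACE = 32
# Hypothetical node tokens, borrowed from the diagram labels 0, 4 and 28.
NODE_TOKENS = [0, 4, 28]

def token_for(partition_key: str) -> int:
    """Map a partition key to a token (an MD5 stand-in for a real partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE

def owner_of(token: int, node_tokens=NODE_TOKENS) -> int:
    """Return the owning node: the first node token >= token,
    wrapping round to the lowest node token past the end of the ring."""
    ring = sorted(node_tokens)
    i = bisect.bisect_left(ring, token)
    return ring[i % len(ring)]

rows = [
    ("Alice", 34, "2000", "F"),
    ("Bob", 26, "2000", "M"),
    ("Eve", 25, "2004", "F"),
    ("Frank", 41, "2902", "M"),
]

# Partition the table by Name: each row lands on exactly one owning node.
placement = {}
for row in rows:
    name = row[0]
    placement.setdefault(owner_of(token_for(name)), []).append(name)

for node, names in sorted(placement.items()):
    print(f"node {node}: {names}")
```

Because the owner depends only on the key's token, any client can work out where "Alice" lives without asking a central directory.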

Page 19: Introduction to Apache Cassandra

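How does it work?

[Diagram: the client computes consistentHash("Alice") to find which of the nodes labelled 0, 4 and 28 holds Alice's row. Replication Factor = 3, so the row is stored on three nodes.]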
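A sketch of how a replication factor of 3 might place copies around the ring, modelled loosely on Cassandra's SimpleStrategy: the first replica goes to the node that owns the key's token, and the remaining copies go to the next nodes clockwise. The six node tokens and the key tokens used below are hypothetical, and real placement (rack awareness in NetworkTopologyStrategy, vnodes) is more involved.

```python
import bisect

# Hypothetical ring: six nodes identified by toy tokens (not real Cassandra tokens).
NODE_TOKENS = [0, 4, 10, 16, 22, 28]

def replicas_for(token: int, rf: int = 3, node_tokens=NODE_TOKENS) -> list[int]:
    """SimpleStrategy-style placement: the owning node, then the next
    rf - 1 nodes clockwise around the ring."""
    ring = sorted(node_tokens)
    if rf > len(ring):
        raise ValueError("replication factor exceeds number of nodes")
    start = bisect.bisect_left(ring, token)  # index of the owning node (may wrap)
    return [ring[(start + i) % len(ring)] for i in range(rf)]

# A key with (assumed) token 3 is owned by node 4 and also copied to 10 and 16.
print(replicas_for(3))
# A key with token 25 is owned by node 28; its copies wrap round to 0 and 4.
print(replicas_for(25))
```

Walking clockwise keeps placement deterministic, so every node computes the same replica set for a key.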

Page 20: Introduction to Apache Cassandra

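How do we keep data consistent?

[Diagram: the client writes Alice's row (located with consistentHash("Alice")) to nodes 0, 4 and 28 at CL.ONE; the write is acknowledged (Ack) as soon as one replica confirms it.]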
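The consistency level only changes how many replica acknowledgements the coordinator waits for; the write is still sent to every replica. The simplified Python model below uses hypothetical `required_acks` and `write_succeeds` helpers and ignores timeouts, hinted handoff and other real-world details. It counts acknowledgements for ONE, QUORUM and ALL with three replicas, two of which are reachable:

```python
from enum import Enum

class ConsistencyLevel(Enum):
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"

def required_acks(cl: ConsistencyLevel, rf: int) -> int:
    """Replica acknowledgements the coordinator waits for before replying."""
    if cl is ConsistencyLevel.ONE:
        return 1
    if cl is ConsistencyLevel.QUORUM:
        return rf // 2 + 1  # a strict majority of replicas
    return rf               # ALL: every replica

def write_succeeds(cl: ConsistencyLevel, replicas_up: int, rf: int = 3) -> bool:
    """Simplified model: succeed if enough replicas can acknowledge."""
    return replicas_up >= required_acks(cl, rf)

for cl in ConsistencyLevel:
    print(cl.value, required_acks(cl, 3), write_succeeds(cl, replicas_up=2))
```

With two of three replicas reachable, ONE and QUORUM succeed while ALL fails: stronger guarantees cost availability.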

Page 21: Introduction to Apache Cassandra

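How do we keep data consistent?

[Diagram: the client writes Alice's row (located with consistentHash("Alice")) to nodes 0, 4 and 28 at CL.ALL; the write is acknowledged only after all three replicas confirm it (three Acks).]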

Page 22: Introduction to Apache Cassandra

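How do we keep data consistent?

[Diagram: the client writes Alice's row (located with consistentHash("Alice")) to nodes 0, 4 and 28 at CL.QUORUM; two replicas Ack and the third fails (X), but two of three is a quorum, so the write still succeeds.]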
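The arithmetic behind the QUORUM diagram, assuming Replication Factor = 3 as on the earlier slide. A quorum $Q$ is a strict majority of the replicas, so the write needs two Acks and tolerates the one failed replica (X). If reads also use QUORUM, a read quorum $R$ and a write quorum $W$ must share at least one replica because $R + W > RF$. This overlap argument is a general property of majority quorums rather than something stated on the slide.

```latex
\[
Q = \left\lfloor \frac{RF}{2} \right\rfloor + 1 = \left\lfloor \frac{3}{2} \right\rfloor + 1 = 2
\]
\[
\text{failed replicas tolerated} = RF - Q = 3 - 2 = 1
\]
\[
R + W = 2 + 2 = 4 > RF = 3
\]
```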

Page 23: Introduction to Apache Cassandra

Also supports multi-data-center (multi-DC) replication

[Diagram: a client and two data centers, each with its own nodes labelled 0, 4 and 28.]
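A rough Python sketch of multi-DC placement in the spirit of Cassandra's NetworkTopologyStrategy, where each data center is given its own replication factor and holds its own full set of replicas. The DC names `dc1` and `dc2`, the per-DC factor of 3 and the reuse of the diagram labels 0, 4 and 28 as tokens are assumptions for illustration; rack awareness is deliberately ignored.

```python
import bisect

# Hypothetical topology: two data centers, each with nodes labelled 0, 4 and 28.
TOPOLOGY = {"dc1": [0, 4, 28], "dc2": [0, 4, 28]}
# Assumed per-DC replication factors (NetworkTopologyStrategy-style).
REPLICATION = {"dc1": 3, "dc2": 3}

def replicas_per_dc(token: int) -> dict[str, list[int]]:
    """For each DC, walk that DC's own ring clockwise from the owning node."""
    placement = {}
    for dc, tokens in TOPOLOGY.items():
        ring = sorted(tokens)
        rf = REPLICATION[dc]
        if rf > len(ring):
            raise ValueError(f"replication factor for {dc} exceeds its node count")
        start = bisect.bisect_left(ring, token)
        placement[dc] = [ring[(start + i) % len(ring)] for i in range(rf)]
    return placement

placement = replicas_per_dc(3)
print(placement)
print(sum(len(nodes) for nodes in placement.values()), "copies in total")
```

Because each data center keeps a complete set of replicas, losing one entire DC, as in the Hurricane Sandy quote, still leaves every row available in the other.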

Page 24: Introduction to Apache Cassandra

Add capacity

[Diagram: additional nodes join the cluster; the client still locates Alice's row with consistentHash("Alice").]
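Consistent hashing is what makes adding capacity cheap: a new node takes over only part of one existing node's token range, so only the keys in that range move. The Python sketch below checks this on a hypothetical four-node ring. It assumes a toy 32-bit token space and MD5-derived tokens; real Cassandra uses Murmur3 tokens, often with vnodes, and streams the affected data to a new node when it bootstraps.

```python
import bisect
import hashlib

TOKEN_SPACE = 2**32  # toy token space (assumption; real tokens are 64-bit Murmur3)

def token_for(key: str) -> int:
    """Hypothetical token function: the first 32 bits of the key's MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def owner_of(token: int, ring: list[int]) -> int:
    """First node token >= token, wrapping round the ring."""
    ring = sorted(ring)
    return ring[bisect.bisect_left(ring, token) % len(ring)]

# Hypothetical ring of four evenly spaced nodes, then a fifth node joins.
before = [i * TOKEN_SPACE // 4 for i in range(4)]
new_node = TOKEN_SPACE // 8  # lands between the first two nodes
after = before + [new_node]

keys = [f"user-{i}" for i in range(10_000)]
moved = [k for k in keys
         if owner_of(token_for(k), before) != owner_of(token_for(k), after)]

# Every key that moved now belongs to the new node; nothing else is reshuffled.
assert all(owner_of(token_for(k), after) == new_node for k in moved)
print(f"{len(moved)} of {len(keys)} keys moved ({len(moved) / len(keys):.1%})")
```

By contrast, a naive `hash(key) % N` scheme would reassign roughly four out of five keys when growing from four to five nodes.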

Page 25: Introduction to Apache Cassandra

Thank you!