Introduction to Apache Cassandra

Transcript of Introduction to Apache Cassandra

Page 1: Introduction to Apache Cassandra

Introduction to Apache Cassandra

Page 2: Introduction to Apache Cassandra

Me

Robert Stupp

Freelancer, Coder, Architect

@snazy [email protected]

Contributor to Apache Cassandra, 3.0 UDFs (CASSANDRA-7395 + related)

Databases, Network, Backend

Page 3: Introduction to Apache Cassandra

Agenda

Apache Cassandra History

Design Principles

Outstanding differences

CQL Intro

Access C*

Clusters

Cassandra Future

Page 4: Introduction to Apache Cassandra

Apache Cassandra History

Page 5: Introduction to Apache Cassandra

Apache Cassandra started at Facebook

inspired by

Note: Facebook initially had two data centers.

Page 6: Introduction to Apache Cassandra

2.1 released in Sep 2014

Page 7: Introduction to Apache Cassandra

Apache Cassandra Design Principles

Page 8: Introduction to Apache Cassandra

Hardware failures can and will occur!

Cassandra handles failures. From single node to whole data center.

From client to server.

Page 9: Introduction to Apache Cassandra

The complicated part of learning Cassandra is understanding Cassandra’s simplicity

Page 10: Introduction to Apache Cassandra

Keep it simple

all nodes are equal

master-less architecture

no name nodes

no SPOF (single point of failure)

no read before modify (prevents race conditions)

Page 11: Introduction to Apache Cassandra

Keep it running

No need to take cluster down … e.g.

during maintenance

during software update

Rolling restart is your friend

Page 12: Introduction to Apache Cassandra

Outstanding Differences

Page 13: Introduction to Apache Cassandra

Cassandra

Highly scalable

runs with a few nodes up to clusters of 1000+ nodes!

Linear scalability (proven!)

Multi datacenter aware (world-wide!)

No SPOF

Page 14: Introduction to Apache Cassandra

Cassandra @ Apple

Page 15: Introduction to Apache Cassandra

Linear Scalability

Page 16: Introduction to Apache Cassandra

Scaling Cassandra

More data? -> add more nodes

Faster access? -> add more nodes

Page 17: Introduction to Apache Cassandra

Read / Write performance

Reads are fast

Writes are even faster

Page 18: Introduction to Apache Cassandra

Durability

Writes are durable - period.

Page 19: Introduction to Apache Cassandra

Availability @ Netflix

Chaos Monkey

kills nodes randomly

Page 20: Introduction to Apache Cassandra

Availability @ Netflix

Chaos Gorilla

kills regions randomly

Page 21: Introduction to Apache Cassandra

Availability @ Netflix

Chaos Kong

kills whole data centers

Page 22: Introduction to Apache Cassandra

Availability @ Netflix

http://de.slideshare.net/planetcassandra/active-active-c-behind-the-scenes-at-netflix

Page 23: Introduction to Apache Cassandra

32-node cluster (Raspberry Pis) @ DataStax

Page 24: Introduction to Apache Cassandra

Most outstanding

Great documentation

Many blog posts

Many presentations

Many videos

Regular webinars

Huge, active and healthy community

Page 25: Introduction to Apache Cassandra

Data Distribution

Page 26: Introduction to Apache Cassandra

DHT

Data is organized in a "Distributed Hash Table" (hash over row key)

Page 27: Introduction to Apache Cassandra

DHT

[Figure: eight nodes, numbered 0 to 7]
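
The "hash over row key" idea can be sketched roughly as follows. This is only an illustration for eight nodes numbered 0 to 7: the choice of MD5, the small token space, the equal-sized ranges and the names TOKEN_SPACE, token_for and node_for are all assumptions made for this example, not Cassandra's actual partitioner.

```python
import hashlib

NUM_NODES = 8            # nodes 0..7, as in the figure
TOKEN_SPACE = 2 ** 32    # assumption: a small token space, chosen for readability


def token_for(row_key: str) -> int:
    """Hash the row key to a position (token) in the token space."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % TOKEN_SPACE


def node_for(row_key: str) -> int:
    """Return the node whose (equal-sized) token range contains the key's token."""
    range_size = TOKEN_SPACE // NUM_NODES
    return token_for(row_key) // range_size


if __name__ == "__main__":
    # Hypothetical row keys; the same key always lands on the same node.
    for key in ("alice", "bob", "carol"):
        print(key, "-> token", token_for(key), "-> node", node_for(key))
```

Because the node is derived from the key alone, any node can work out where a row lives without asking a master.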

Page 28: Introduction to Apache Cassandra

Replication

Page 29: Introduction to Apache Cassandra

Replication Factor 2

[Figure: eight nodes, numbered 0 to 7, showing the replicas of Row A and Row B]

Page 30: Introduction to Apache Cassandra

Replication Factor 3

[Figure: eight nodes, numbered 0 to 7, showing the replicas of Row A and Row B]
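
A minimal sketch of replica placement, assuming the simplest strategy: the node that owns the row plus the next nodes clockwise hold the copies. The function name replicas_for and the owner nodes used below are hypothetical; the transcript does not say which nodes hold Row A and Row B.

```python
from typing import List


def replicas_for(owner: int, replication_factor: int, num_nodes: int = 8) -> List[int]:
    """Return the owner node followed by the next nodes clockwise on the ring."""
    if not 1 <= replication_factor <= num_nodes:
        raise ValueError("replication factor must be between 1 and the node count")
    return [(owner + i) % num_nodes for i in range(replication_factor)]


if __name__ == "__main__":
    print(replicas_for(owner=1, replication_factor=2))  # [1, 2]
    print(replicas_for(owner=6, replication_factor=3))  # [6, 7, 0] (wraps around)
```

Raising the replication factor only lengthens the list of nodes per row; the way the owner is found does not change.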

Page 31: Introduction to Apache Cassandra

Consistency

Consistency defined per request

Several consistency levels (CLs) for different needs

Page 32: Introduction to Apache Cassandra

Eventual consistency is not "hopefully consistent"

EC means there’s a time gap until updates are consistently readable

Page 33: Introduction to Apache Cassandra

Consistency Levels

ANY (only for writes)

ONE, LOCAL_ONE

TWO, THREE (not recommended)

ALL (not recommended)

QUORUM, LOCAL_QUORUM, EACH_QUORUM

SERIAL, LOCAL_SERIAL

Page 34: Introduction to Apache Cassandra

Consistency

Data is always replicated

CL defines how many replicas must fulfill the request
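
The arithmetic behind this can be sketched as below, for a single data center only. The helper names are made up for this example, and ANY, SERIAL and the LOCAL_/EACH_ variants are left out on purpose. The check relies on the usual rule that a read is guaranteed to overlap the latest write when read replicas + write replicas > replication factor.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1


def replicas_needed(level: str, replication_factor: int) -> int:
    """Number of replicas that must answer for the given consistency level."""
    needed = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return needed[level]


def read_overlaps_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True if every read must touch at least one replica that took the write."""
    total = replicas_needed(read_level, replication_factor) + replicas_needed(
        write_level, replication_factor
    )
    return total > replication_factor


if __name__ == "__main__":
    print(replicas_needed("QUORUM", 3))               # 2
    print(read_overlaps_write("QUORUM", "QUORUM", 3))  # True
    print(read_overlaps_write("ONE", "ONE", 3))        # False
```

With replication factor 3, QUORUM reads and writes tolerate one replica being down, while ALL tolerates none.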

Page 35: Introduction to Apache Cassandra

Write

[Figure: a write sent to a cluster of eight nodes, numbered 0 to 7]

Page 36: Introduction to Apache Cassandra

Write

[Figure: a write sent to a cluster of eight nodes, numbered 0 to 7]

Page 37: Introduction to Apache Cassandra

Multi DC setup

[Figure: two data centers, DC 1 and DC 2]

Page 38: Introduction to Apache Cassandra

Multi DC replication

[Figure: a write in a cluster spanning DC 1 and DC 2]

Page 39: Introduction to Apache Cassandra

Multi DC replication

[Figure: a write in a cluster spanning DC 1 and DC 2]

Page 40: Introduction to Apache Cassandra

Multi DC replication

[Figure: a write in a cluster spanning DC 1 and DC 2]

Page 41: Introduction to Apache Cassandra

Replication & Consistency

Define # of replicas using replication factor

Define required consistency per request
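
For a multi-DC cluster, the two settings combine roughly as sketched below. The data center names, the replication factors of 3 per DC and the function names are assumptions for this example. The counts follow the usual definitions: LOCAL_QUORUM is a majority in the coordinator's DC, EACH_QUORUM is a majority in every DC, and QUORUM is a majority of all replicas across DCs.

```python
from typing import Dict


def quorum(replica_count: int) -> int:
    """Majority of the given number of replicas."""
    return replica_count // 2 + 1


def acks_required(level: str, rf_per_dc: Dict[str, int], local_dc: str) -> Dict[str, int]:
    """Return how many replica acknowledgements the level needs, and where."""
    if level == "LOCAL_QUORUM":
        return {local_dc: quorum(rf_per_dc[local_dc])}
    if level == "EACH_QUORUM":
        return {dc: quorum(rf) for dc, rf in rf_per_dc.items()}
    if level == "QUORUM":
        # Majority over all replicas, regardless of which DC they are in.
        return {"all DCs combined": quorum(sum(rf_per_dc.values()))}
    raise ValueError("level not covered by this sketch: " + level)


if __name__ == "__main__":
    rf = {"DC1": 3, "DC2": 3}  # hypothetical per-DC replication factors
    for level in ("LOCAL_QUORUM", "EACH_QUORUM", "QUORUM"):
        print(level, acks_required(level, rf, local_dc="DC1"))
```

LOCAL_QUORUM avoids waiting on the remote data center for each request, which is why it is a common choice in multi-DC setups.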

Page 42: Introduction to Apache Cassandra

CQL Introduction

CQL = Cassandra Query Language

Page 43: Introduction to Apache Cassandra

“CQL is SQL minus joins, minus subqueries, plus collections”

(plus user types, plus tuple types)

Page 44: Introduction to Apache Cassandra

Why CQL?

Introduces a schema to Cassandra

Familiar syntax

Easy to understand

DML operations are atomic

Page 45: Introduction to Apache Cassandra

Data model (hierarchical view)

Keyspace (schema)

Table (column family)

Row

partition key (part of primary key)

static columns

clustering key (part of primary key)

columns

Page 46: Introduction to Apache Cassandra

CQL / DDL

Similar to SQL

CREATE TABLE …

ALTER TABLE …

DROP TABLE …

Page 47: Introduction to Apache Cassandra

CQL / DML

Similar to SQL

INSERT …

UPDATE …

DELETE …

SELECT …

Page 48: Introduction to Apache Cassandra

CQL / BATCH

Group related modifications (INSERT, UPDATE, DELETE)

Atomic operation

Page 49: Introduction to Apache Cassandra

CQL types

boolean, int (32bit), bigint (64bit),

float, double,

decimal ("BigDecimal"), varint ("BigInteger"),

ascii, text (= varchar), blob,

inet, timestamp, uuid, timeuuid

Page 50: Introduction to Apache Cassandra

CQL collection types

list<foo>

set<foo>

map<foo, bar>

Since C* 2.1, collections can contain any type, even other collections.

Page 51: Introduction to Apache Cassandra

CQL composite types

user types (C* 2.1) are composite types with named fields

tuple types (C* 2.1) are unstructured lists of values

Page 52: Introduction to Apache Cassandra

CQL / user types

CREATE TYPE address (
    street text,
    zip int,
    city text
);

CREATE TABLE users (
    username text,
    addresses map<text, frozen<address>>,
    ...

Page 53: Introduction to Apache Cassandra

Cassandra Data Modeling

Access by key

no access by arbitrary WHERE clause

Duplicate data (it’s ok!)

Aggregate data

Build application maintained indexes
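
As a rough in-memory analogy (plain Python dictionaries, not Cassandra code), the points above could look like the sketch below: one "table" per access pattern, with the same data written to both. The class name UserStore and its fields are hypothetical.

```python
from typing import Dict, List


class UserStore:
    """Two lookup structures, each keyed for exactly one query."""

    def __init__(self) -> None:
        self.users_by_name: Dict[str, dict] = {}
        # Application-maintained index: the row is duplicated under a second key.
        self.users_by_city: Dict[str, List[dict]] = {}

    def add_user(self, username: str, city: str) -> None:
        row = {"username": username, "city": city}
        self.users_by_name[username] = row
        # Duplicate the data on purpose so that "by city" is also a key lookup.
        self.users_by_city.setdefault(city, []).append(dict(row))

    def find_by_name(self, username: str) -> dict:
        return self.users_by_name[username]

    def find_by_city(self, city: str) -> List[dict]:
        return self.users_by_city.get(city, [])


if __name__ == "__main__":
    store = UserStore()
    store.add_user("alice", "Cologne")
    store.add_user("bob", "Cologne")
    print(store.find_by_name("alice"))
    print([u["username"] for u in store.find_by_city("Cologne")])
```

Every read is a lookup by key; the price is that the application writes the data twice and keeps both copies in step.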

Page 54: Introduction to Apache Cassandra

RDBMS modeling

Page 55: Introduction to Apache Cassandra

C* modeling

Page 56: Introduction to Apache Cassandra

Data Modeling with RDBMS

Driven by

"How can I store something right?"

"What answers do I have?"

Page 57: Introduction to Apache Cassandra

Data Modeling with NoSQL

Driven by

"How can I access something right?"

"What questions do I have?"

Page 58: Introduction to Apache Cassandra

Data Modeling Basics

Work top-down. Think about:

What does the application do?

What are the access patterns?

Now design data model

Page 59: Introduction to Apache Cassandra

Data Modeling

http://de.slideshare.net/planetcassandra/cassandra-day-sv-2014-fundamentals-of-apache-cassandra-data-modeling

http://de.slideshare.net/planetcassandra/data-modeling-with-travis-price

Page 60: Introduction to Apache Cassandra

Accessing Cassandra

Page 61: Introduction to Apache Cassandra

Command Line

cqlsh: CQL shell

nodetool: node/cluster administration

Page 62: Introduction to Apache Cassandra

GUI: DevCenter

Visual query tool

Page 63: Introduction to Apache Cassandra

Stress test?

Cassandra 2.1 comes with an improved stress tool

Simulate read+write workload

Uses configurable data

Works against older C* versions, too

Page 64: Introduction to Apache Cassandra

DataStax APLv2 Open Source Drivers

for Java

for Python

for C#

for Scala / Spark

https://github.com/datastax/ or http://www.datastax.com/download

Page 65: Introduction to Apache Cassandra

Native protocol

C*’s own net protocol for clients

Request multiplexing

Schema change notifications

Cluster change notifications

Page 66: Introduction to Apache Cassandra

Third Party Drivers

for a huge number of languages

Page 67: Introduction to Apache Cassandra

Mappers

High level mappers exist at least for Java

Special case: Scala, due to its strong and complex type model (DataStax OSS Spark driver)

Page 68: Introduction to Apache Cassandra

Spark + Hadoop

Yes - works really well

Note: Spark is about 100x faster

Page 69: Introduction to Apache Cassandra

Clusters

Page 70: Introduction to Apache Cassandra

Cluster sizes

C* works with a few nodes

C* works with several hundred / thousand nodes

Page 71: Introduction to Apache Cassandra

Cluster setup

Configure for multiple data centers

Plan for multi-DC setup :)

Page 72: Introduction to Apache Cassandra

Cluster experience

Remember: A single Cassandra cluster works over multiple data centers all over the world

"Disaster proven"

Hurricanes

Amazon DC outages

Page 73: Introduction to Apache Cassandra

Apache Cassandra Future

Page 74: Introduction to Apache Cassandra

Cassandra 3.0 (in development)

User Defined Functions

Aggregate functions

Functional indexes

Workload recording + playback

Better SSTables, Fully off-heap row cache, Better serial consistency

Indexes w/ high cardinality

Subject to change!!!

Page 75: Introduction to Apache Cassandra

Get active!

Page 76: Introduction to Apache Cassandra

Cassandra Community

http://cassandra.apache.org/

http://planetcassandra.org/ - Blog

http://www.slideshare.net/planetcassandra/presentations

http://de.slideshare.net/DataStax/presentations

Page 77: Introduction to Apache Cassandra

Cassandra Community

https://www.youtube.com/user/PlanetCassandra

https://www.youtube.com/user/DataStax

http://www.datastax.com/dev/blog/

http://www.datastax.com/docs/

Users Mailing List [email protected]

Page 78: Introduction to Apache Cassandra

Free C* Training!

http://planetcassandra.org/cassandra-training/

Page 79: Introduction to Apache Cassandra

Get involved!

Ask questions, submit RFEs or experiences to the user mailing list [email protected]

Answers arrive quickly!

Page 80: Introduction to Apache Cassandra

Live Demo

User Defined Functions

Page 81: Introduction to Apache Cassandra

C* 3.0 UDFs

Users create functions using CREATE FUNCTION … LANGUAGE … AS …

Java, JavaScript, Scala, Groovy, JRuby, Jython

Functions work on all nodes

Page 82: Introduction to Apache Cassandra

C* 3.0 UDFs

Example

CREATE FUNCTION sin(input double)
    RETURNS double
    LANGUAGE javascript
    AS 'Math.sin(input)';

The function body is JavaScript!

Page 83: Introduction to Apache Cassandra

UDFs for what?

Own aggregation code - e.g. SELECT sum(value) FROM table WHERE …;

Functional indexes - e.g. CREATE INDEX idx ON table ( myFunction(colname) );

Targeted for C* 3.0

Page 84: Introduction to Apache Cassandra

Thanks for your attention

Robert Stupp

@[email protected]/RobertStupp

Download Apache Cassandra at http://cassandra.apache.org/

Page 85: Introduction to Apache Cassandra

Q & A

Page 87: Introduction to Apache Cassandra

BACKUP SLIDES

User-Defined-Functions Demo
