NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

30
Real-Time Big Data in practice with Cassandra Michaël Figuière @mfiguiere

Transcript of NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

Page 1: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

Real-Time Big Data in practice with Cassandra

Michaël Figuière@mfiguiere

Page 2: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Speaker

Michaël Figuière

@mfiguiere

2

Page 3: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Ring Architecture

CassandraNode

Node

NodeNode

Node

Node

3

Page 4: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Ring Architecture

4

Node

Node Replica

Replica

NodeReplica

Page 5: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Linear Scalability

5

Client Writes/s by Node Count - Replication Factor = 3

Page 6: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Client / Server Communication

Client

Client

Client

Client

Node?

Node

6

Replica

Replica

Replica

Node

Page 7: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Client / Server Communication

Client

Client

Client

Client

Node

Node

Coordinator node:Forwards all R/W requeststo corresponding replicas

7

Replica

Replica

ReplicaNode

Page 8: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

3 replicas

A A A

Time

8

Tunable Consistency

Page 9: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Write and wait for acknowledge from one node

Write ‘B’

B A A

9

Time

A A A

Tunable Consistency

Page 10: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

R + W < N

Read waiting for one node to answer

B A A

10

B A A

A A A

Write and wait for acknowledge from one node

Time

Tunable Consistency

Page 11: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

R + W = N

11

B B A

B A

A A A

B

Write and wait for acknowledges from two nodes

Read waiting for one node to answer

Tunable ConsistencyTime

Page 12: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Tunable Consistency

R + W > N

12

B A

B A

A A A

B

B

Write and wait for acknowledges from two nodes

Read waiting for two nodes to answer

Time

Page 13: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Tunable Consistency

R = W = QUORUM

13

B A

B A

A A A

B

B

Time

QUORUM = (N / 2) + 1

Page 14: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Request Path

1

2

2

2

3

3

3

4

14

Client

Client

Client

Client

Node

Node

Node

Replica

Replica

Replica

Coordinator node

Page 15: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Column Family Data Model

15

Jonathan

name

[email protected]

email

123 main

address

TX

statejbellis

Daria

name

[email protected]

email

45 2nd st

address

CA

statedhutch

Eric

name

[email protected]

emailegilmore

Row Key Columns

Page 16: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Column Family Data Model

16

dhutch egilmore datastax mzcassiejbellis

egilmoredhutch

datastax mzcassieegilmore

Row Key Columns

Page 17: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

CQL3 Data Model

17

gmason

user_id

1765

tweet_id

phenry

author

Give me liberty or give me death

body

PartitionKey

gmason 1742 gwashington I chopped down the cherry tree

ahamilton 1797 jadams A government of laws, not men

ahamilton 1742 gwashington I chopped down the cherry tree

RemainingKey

Timeline Table

Page 18: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

CQL3 Data Model

18

gmason

user_id

1765

tweet_id

phenry

author

Give me liberty or give me death

body

gmason 1742 gwashington I chopped down the cherry tree

ahamilton 1797 jadams A government of laws, not men

ahamilton 1742 gwashington I chopped down the cherry tree

Timeline Table

CREATE TABLE timeline ( user_id varchar, tweet_id uuid, author varchar, body varchar, PRIMARY KEY (user_id, tweet_id));

CQL

Page 19: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

CQL3 Data Model

19

gmason

user_id

1765

tweet_id

phenry

author

Give me liberty or give me death

body

gmason 1742 gwashington I chopped down the cherry tree

ahamilton 1797 jadams A government of laws, not men

ahamilton 1742 gwashington I chopped down the cherry tree

gwashington

[1742, author]

I chopped down the...

[1742, body]

phenry

[1765, author]

Give me liberty or give...

[1765, body]gmason

gwashington

[1742, author]

I chopped down the...

[1742, body]

jadams

[1797, author]

A government of laws...

[1797, body]ahamilton

Timeline Table

Timeline Physical Layout

Page 20: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Real-Time Analytics

Google Analytics gives you immediate statistics about

your website traffic

20

Page 21: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Web Analytics Data Model

21

/index.html

url

12:00

time

354

views

/index.html 12:01 402

/contacts.html 12:00 23

/contacts.html 12:01 20

Analytics Table

CREATE TABLE analytics ( url varchar, time timestamp, views counter, from_search counter, direct counter, from_referrer counter, PRIMARY KEY (url, time));

300

from_search

333

3

4

20

direct

25

0

1

34

from_referrer

44

20

15

CQL

Page 22: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Web Analytics Data Model

22

/index.html

url time

354

views

/index.html 402

/contacts.html 23

/contacts.html 20

Analytics Table

UPDATE analyticsSET views = views + 1, from_search = from_search + 1WHERE url = '/index.html'AND time = '2012-10-06 12:00';

300

from_search

333

3

4

20

direct

25

0

1

34

from_referrer

44

20

15

CQL

12:00

12:01

12:00

12:01

Page 23: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Web Analytics Data Model

23

/index.html

url time

354

views

/index.html 402

/contacts.html 23

/contacts.html 20

Analytics Table

SELECT * FROM analyticsWHERE url = '/index.html'

300

from_search

333

3

4

20

direct

25

0

1

34

from_referrer

44

20

15

CQL

12:00

12:01

12:00

12:01

Page 24: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Connect and Write

24

Cluster cluster = Cluster.builder() .addContactPoints("127.0.0.1", "127.0.0.2") .build();

Session session = cluster.connect();

session.execute( "INSERT INTO user (user_id, name, email) VALUES (12345, 'johndoe', '[email protected]')");

Page 25: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Read

25

ResultSet rs = session.execute("SELECT * FROM user");

List<CQLRow> rows = rs.fetchAll(); for (CQLRow row : rows) {

String userId = row.getString("user_id"); String name = row.getString("name"); String email = row.getString("email");}

Page 26: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Object Mapping

26

public enum Gender {

@EnumValue("m") MALE, @EnumValue("f") FEMALE;}

@Table("user_and_messages")public class User { @Column("user_id") private String userId; private String name; private String email; private Gender gender;}

Page 27: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Aggregation

27

@Table("user_and_messages")public class User { @Column("user_id") private String userId; private String name; private String email; @GroupBy("user_id") private List<Message> messages;}

public class Message {

private String title; private String body; }

Page 28: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Inheritance

28

@InheritanceValue("tv")public class TV extends Product {

private float size;}

@Table("catalog")@Inheritance({Phone.class, TV.class})@InheritanceColumn("product_type")public abstract class Product {

@Column("product_id") private String productId; private float price; private String vendor; private String model;

}

Page 29: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

©2012 DataStax

Online Business Intelligence

Application Cassandra Hadoop

Storage for application in production

Using results in production

Distributed batch processing

Storage forresults

29

Page 30: NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra

@mfiguiere

blog.datastax.com

Stay Tuned!