Paris Cassandra Meetup - Cassandra for Developers

Cassandra for Developers DataStax Drivers in Practice Michaël Figuière Drivers & Developer Tools Architect @mfiguiere

Transcript of Paris Cassandra Meetup - Cassandra for Developers

Page 1: Paris Cassandra Meetup - Cassandra for Developers

Cassandra for Developers
DataStax Drivers in Practice

Michaël Figuière
Drivers & Developer Tools Architect

@mfiguiere

Page 2: Paris Cassandra Meetup - Cassandra for Developers

© 2014 DataStax, All Rights Reserved.

Cassandra Peer to Peer Architecture

[Diagram: a cluster of six nodes]

Each node contains a replica of some partitions of each table.

Every node has the same role; there is no master or slave.

Page 3: Paris Cassandra Meetup - Cassandra for Developers

Cassandra Peer to Peer Architecture

[Diagram: the six-node cluster, with three of the nodes marked as replicas]

Each partition is stored on several replicas to ensure durability and high availability.

Page 4: Paris Cassandra Meetup - Cassandra for Developers

Client / Server Communication

[Diagram: four clients connected to the six-node cluster, three of whose nodes are replicas]

Coordinator node: forwards all read/write requests to the corresponding replicas.

Page 5: Paris Cassandra Meetup - Cassandra for Developers

Tunable Consistency

[Diagram: timeline of 3 replicas, each holding the value A]

Page 6: Paris Cassandra Meetup - Cassandra for Developers

Tunable Consistency

Write ‘B’ and wait for an acknowledgement from one node.

[Diagram: timeline of the 3 replicas, going from A A A to B A A]


Page 8: Paris Cassandra Meetup - Cassandra for Developers

Tunable Consistency

R + W < N

Write and wait for an acknowledgement from one node.

Read waiting for one node to answer.

[Diagram: timeline of the 3 replicas, going from A A A to B A A before the read]

Page 9: Paris Cassandra Meetup - Cassandra for Developers

Tunable Consistency

R + W = N

Write and wait for acknowledgements from two nodes.

Read waiting for one node to answer.

[Diagram: timeline of the 3 replicas, going from A A A to B B A before the read]

Page 10: Paris Cassandra Meetup - Cassandra for Developers

Tunable Consistency

R + W > N

Write and wait for acknowledgements from two nodes.

Read waiting for two nodes to answer.

[Diagram: timeline of the 3 replicas, initially A A A, as ‘B’ is written and then read]
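The rule behind the three cases above can be checked mechanically. The following is a minimal Python sketch, not driver code: the helper names `overlap_guaranteed` and `always_overlaps` are hypothetical. It compares the R + W > N rule with a brute-force enumeration of every set of replicas that could acknowledge a write and every set that could answer a read.

```python
from itertools import combinations


def overlap_guaranteed(r, w, n):
    """Rule from the slides: reads and writes share a replica when R + W > N."""
    return r + w > n


def always_overlaps(r, w, n):
    """Brute force: does every choice of W written replicas share at least
    one replica with every choice of R read replicas?"""
    replicas = range(n)
    return all(
        set(written) & set(read)
        for written in combinations(replicas, w)
        for read in combinations(replicas, r)
    )


if __name__ == "__main__":
    n = 3
    for r, w in [(1, 1), (1, 2), (2, 2)]:
        print(f"R={r} W={w} N={n}: rule={overlap_guaranteed(r, w, n)}, "
              f"brute force={always_overlaps(r, w, n)}")
```

Only when the two sets are forced to overlap is a read certain to reach at least one replica that acknowledged the latest write.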

Page 11: Paris Cassandra Meetup - Cassandra for Developers

Tunable Consistency

R = W = QUORUM

QUORUM = (N / 2) + 1, using integer division

[Diagram: timeline of the 3 replicas, initially A A A, as ‘B’ is written and then read at QUORUM]
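As a worked example of the QUORUM formula, here is a small Python sketch (the function name `quorum` is only illustrative). It assumes the division is an integer division, so 3 replicas give a quorum of 2.

```python
def quorum(replication_factor):
    """QUORUM = (N / 2) + 1, using integer division."""
    return replication_factor // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5, 6):
        q = quorum(n)
        # With R = W = QUORUM, R + W always exceeds N.
        print(f"N={n}: QUORUM={q}, R + W = {2 * q}, "
              f"replicas that may be down: {n - q}")
```

Because two quorums always add up to more than N, reading and writing at QUORUM falls into the R + W > N case.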

Page 12: Paris Cassandra Meetup - Cassandra for Developers

Cassandra Query Language (CQL)

• Similar to SQL, mostly a subset
• Without joins, sub-queries, and aggregations
• Primary Key contains:
    • A Partition Key, used to select the partition that will store the Row
    • Some Clustering Columns, used to define how Rows should be grouped and sorted on disk
• Supports Collections
• Supports User Defined Types (UDT)

Page 13: Paris Cassandra Meetup - Cassandra for Developers

CQL: Create Table

CREATE TABLE users (
    login text,
    name text,
    age int,
    …
    PRIMARY KEY (login)
);

login is the partition key: it will be hashed, and rows will be spread over the cluster on different partitions.

Just like in SQL!
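To illustrate how hashing a partition key spreads rows over a cluster, here is a simplified Python sketch. It is not how Cassandra is implemented: Cassandra's default partitioner uses Murmur3 and assigns token ranges to nodes, whereas this sketch uses MD5 as a stand-in hash and a plain modulo. The node names and the helpers `token_for` and `node_for` are hypothetical.

```python
import hashlib

# Hypothetical six-node cluster, for illustration only.
NODES = ["node1", "node2", "node3", "node4", "node5", "node6"]


def token_for(partition_key):
    """Hash the partition key to an integer token (MD5 used as a stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def node_for(partition_key, nodes=NODES):
    """Simplified placement: token modulo node count, not real token ranges."""
    return nodes[token_for(partition_key) % len(nodes)]


if __name__ == "__main__":
    for login in ("jdoe", "asmith", "mfiguiere"):
        print(login, "->", node_for(login))
```

The same login always maps to the same node, which is what lets any coordinator locate the partition without a central directory.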

Page 14: Paris Cassandra Meetup - Cassandra for Developers

CQL: Clustered Table

CREATE TABLE mailbox (
    login text,
    message_id timeuuid,
    interlocutor text,
    message text,
    PRIMARY KEY ((login), message_id)
);

message_id is a clustering column: all the rows with the same login will be grouped together and sorted by message_id on disk.

A TimeUUID is a UUID that can be sorted chronologically.
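A TimeUUID is a version 1 UUID, which embeds a timestamp counted in 100 ns ticks since 1582-10-15. The Python sketch below uses only the standard `uuid` module, not a Cassandra driver, and builds such UUIDs from explicit dates to show that ordering by the embedded timestamp is chronological. The helper `timeuuid_from_datetime` is hypothetical, and the clock sequence and node fields are simply set to zero.

```python
import datetime
import uuid

# 100 ns ticks between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def timeuuid_from_datetime(dt, clock_seq=0, node=0):
    """Build a version 1 UUID whose timestamp field encodes the aware datetime `dt`."""
    delta = dt - UNIX_EPOCH
    ticks = ((delta.days * 86400 + delta.seconds) * 10_000_000
             + delta.microseconds * 10 + UUID_EPOCH_OFFSET)
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi = (ticks >> 48) & 0x0FFF
    fields = (time_low, time_mid, time_hi,
              (clock_seq >> 8) & 0x3F, clock_seq & 0xFF, node)
    # version=1 sets the version and variant bits for us.
    return uuid.UUID(fields=fields, version=1)


if __name__ == "__main__":
    utc = datetime.timezone.utc
    older = timeuuid_from_datetime(datetime.datetime(2014, 9, 20, 16, 0, tzinfo=utc))
    newer = timeuuid_from_datetime(datetime.datetime(2014, 9, 25, 16, 0, tzinfo=utc))
    # Sorting on the embedded timestamp puts the older message first.
    for u in sorted([newer, older], key=lambda u: u.time):
        print(u, u.time)
```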

Page 15: Paris Cassandra Meetup - Cassandra for Developers

CQL: Queries

Get a message by user and message_id (date):

SELECT * FROM mailbox
WHERE login = 'jdoe'
AND message_id = '2014-09-25 16:00:00';

Get messages by user and date interval:

SELECT * FROM mailbox
WHERE login = 'jdoe'
AND message_id <= '2014-09-25 16:00:00'
AND message_id >= '2014-09-20 16:00:00';

WHERE clauses can only constrain primary key columns, and range queries are not possible on the partition key.

Page 16: Paris Cassandra Meetup - Cassandra for Developers

CQL: Collections

CREATE TABLE users (
    login text,
    name text,
    age int,
    friends set<text>,
    hobbies list<text>,
    languages map<int, text>,
    …
    PRIMARY KEY (login)
);

It’s not possible to use nested collections… yet.

set and list have semantics similar to those in Java.

Page 17: Paris Cassandra Meetup - Cassandra for Developers

Cassandra 2.1: User Defined Type (UDT)

Without a UDT, the address fields sit directly in the table:

CREATE TABLE users (
    login text,
    …
    street_number int,
    street_name text,
    postcode int,
    country text,
    …
    PRIMARY KEY (login)
);

With a UDT, they are grouped into a type:

CREATE TYPE address (
    street_number int,
    street_name text,
    postcode int,
    country text
);

CREATE TABLE users (
    login text,
    …
    location frozen<address>,
    …
    PRIMARY KEY (login)
);

Page 18: Paris Cassandra Meetup - Cassandra for Developers

Cassandra 2.1: UDT Insert / Update

INSERT INTO users (login, name, location)
VALUES ('jdoe', 'John DOE', {
    street_number: 124,
    street_name: 'Congress Avenue',
    postcode: 95054,
    country: 'USA'
});

UPDATE users
SET location = {
    street_number: 125,
    street_name: 'Congress Avenue',
    postcode: 95054,
    country: 'USA'
}
WHERE login = 'jdoe';

Page 19: Paris Cassandra Meetup - Cassandra for Developers

Client / Server Communication

[Diagram: four clients connected to the six-node cluster, three of whose nodes are replicas]

Coordinator node: forwards all read/write requests to the corresponding replicas.

Page 20: Paris Cassandra Meetup - Cassandra for Developers

Request Pipelining

[Diagram: exchanges between Client and Cassandra, without request pipelining and with request pipelining]

Page 21: Paris Cassandra Meetup - Cassandra for Developers

Notifications

[Diagram: a client and three nodes, without notifications and with notifications]

Page 22: Paris Cassandra Meetup - Cassandra for Developers

Asynchronous Driver Architecture

[Diagram: three client threads sharing one driver, which is connected to four nodes]

Page 23: Paris Cassandra Meetup - Cassandra for Developers

Asynchronous Driver Architecture

[Diagram: three client threads sharing one driver, which is connected to four nodes; the steps of a request are numbered 1 to 6]

Page 24: Paris Cassandra Meetup - Cassandra for Developers

Failover

[Diagram: three client threads sharing one driver, which is connected to four nodes; the steps of a request are numbered 1 to 7]

Page 25: Paris Cassandra Meetup - Cassandra for Developers

DataStax Drivers Highlights

• Asynchronous architecture using non-blocking I/O
• Prepared Statements support
• Automatic failover
• Node discovery
• Tunable load balancing
    • Round Robin, Latency Awareness, Multi Data Centers, Replica Awareness
• Cassandra tracing support
• Compression & SSL

Page 26: Paris Cassandra Meetup - Cassandra for Developers

DataCenter Aware Balancing

[Diagram: two datacenters, A and B, each with three nodes and its own clients]

Local nodes are queried first; if none are available, the request can be sent to a remote node.
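The "local first, remote as fallback" ordering can be sketched in a few lines of Python. This is an illustration of the idea only, not the driver's load balancing policy: the function `dc_aware_plan`, the node names, and the `is_up` map are all hypothetical.

```python
def dc_aware_plan(nodes, local_dc, is_up):
    """Return the nodes to try, local datacenter first, remote ones as fallback.

    `nodes` is a list of (name, datacenter) pairs; `is_up` maps name -> bool.
    Nodes that are down are left out of the plan.
    """
    alive = [(name, dc) for name, dc in nodes if is_up.get(name, False)]
    local = [name for name, dc in alive if dc == local_dc]
    remote = [name for name, dc in alive if dc != local_dc]
    return local + remote


if __name__ == "__main__":
    nodes = [("a1", "A"), ("a2", "A"), ("a3", "A"),
             ("b1", "B"), ("b2", "B"), ("b3", "B")]
    all_up = {name: True for name, _ in nodes}
    print(dc_aware_plan(nodes, "A", all_up))

    # If every node of datacenter A is down, only remote nodes remain.
    a_down = dict(all_up, a1=False, a2=False, a3=False)
    print(dc_aware_plan(nodes, "A", a_down))
```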

Page 27: Paris Cassandra Meetup - Cassandra for Developers

Token Aware Balancing

[Diagram: a client and a six-node cluster, three of whose nodes are replicas]

Nodes that own a replica of the partition key being read or written by the query will be contacted first.

The partition key will be inferred from Prepared Statements metadata.
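A rough Python sketch of the replica-first ordering follows. It assumes the set of replicas for the partition key is already known; when the key cannot be inferred, the sketch passes no replicas and the order is left unchanged. The function `token_aware_plan` and the node names are hypothetical, not part of any driver API.

```python
def token_aware_plan(all_nodes, replicas=()):
    """Order nodes so that replicas of the partition are contacted first.

    With no known replicas (partition key not inferred), the original
    order of `all_nodes` is kept.
    """
    replica_set = set(replicas)
    first = [node for node in all_nodes if node in replica_set]
    rest = [node for node in all_nodes if node not in replica_set]
    return first + rest


if __name__ == "__main__":
    cluster = ["n1", "n2", "n3", "n4", "n5", "n6"]
    print(token_aware_plan(cluster, replicas=["n3", "n4", "n5"]))
    print(token_aware_plan(cluster))
```

Contacting a replica directly means the coordinator already holds the data, which saves one network hop between nodes.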

Page 28: Paris Cassandra Meetup - Cassandra for Developers

State of DataStax Drivers

Driver     Cassandra 1.2    Cassandra 2.0    Cassandra 2.1
Java       1.0 - 2.1        2.0 - 2.1        2.1
Python     1.0 - 2.1        2.0 - 2.1        2.1
C#         1.0 - 2.1        2.0 - 2.1        2.1
Node.js    1.0              1.0              Later
C++        1.0-beta4        1.0-beta4        Later
Ruby       1.0-beta3        1.0-beta3        Later

Later versions of Cassandra can use earlier drivers, but some features won’t be supported.

Page 29: Paris Cassandra Meetup - Cassandra for Developers

DataStax Driver in Practice

Java:

<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-core</artifactId>
  <version>2.1.0</version>
</dependency>

Python:

$ pip install cassandra-driver

C#:

PM> Install-Package CassandraCSharpDriver

Ruby:

$ gem install cassandra-driver --pre

Node.js:

$ npm install cassandra-driver

Page 30: Paris Cassandra Meetup - Cassandra for Developers

Connect and Write

Cluster cluster = Cluster.builder()
    .addContactPoints("10.1.2.5", "cassandra_node3")
    .build();

Session session = cluster.connect("my_keyspace");

session.execute(
    "INSERT INTO user (user_id, name, email) VALUES (12345, 'johndoe', '[email protected]')");

The rest of the nodes will be discovered by the driver.

A keyspace is just like a schema in the SQL world.

Page 31: Paris Cassandra Meetup - Cassandra for Developers

Read

ResultSet resultSet = session.execute(
    "SELECT * FROM user WHERE user_id IN (1,8,13)");

List<Row> rows = resultSet.all();
for (Row row : rows) {
    // user_id is an integer column, so it is read with getInt()
    int userId = row.getInt("user_id");
    String name = row.getString("name");
    String email = row.getString("email");
}

Actually, ResultSet also implements Iterable<Row>.

Session is a thread-safe object. A singleton should be instantiated at startup.

Page 32: Paris Cassandra Meetup - Cassandra for Developers

Write with Prepared Statements

PreparedStatement insertUser = session.prepare(
    "INSERT INTO user (user_id, name, email) VALUES (?, ?, ?)");

BoundStatement statement = insertUser.bind(12345, "johndoe", "[email protected]");
statement.setConsistencyLevel(ConsistencyLevel.QUORUM);

session.execute(statement);

Parameters can be named as well.

PreparedStatement objects are also thread-safe; just create a singleton at startup.

BoundStatement is a stateful, NON thread-safe object.

The consistency level can be set for each statement.

Page 33: Paris Cassandra Meetup - Cassandra for Developers

Asynchronous Read

ResultSetFuture future = session.executeAsync(
    "SELECT * FROM user WHERE user_id IN (1,2,3)");

ResultSet resultSet = future.get();

List<Row> rows = resultSet.all();
for (Row row : rows) {
    // user_id is an integer column, so it is read with getInt()
    int userId = row.getInt("user_id");
    String name = row.getString("name");
    String email = row.getString("email");
}

executeAsync() will not block: it returns immediately, unless all the connections are busy.

future.get() will block until the result is available.

Page 34: Paris Cassandra Meetup - Cassandra for Developers

Asynchronous Read with Callbacks

ResultSetFuture future = session.executeAsync(
    "SELECT * FROM user WHERE user_id IN (1,2,3)");

future.addListener(new Runnable() {
    public void run() {
        // Process the results here
    }
}, executor);

ResultSetFuture implements Guava’s ListenableFuture.

executor = MoreExecutors.sameThreadExecutor();
Only if your listener code is trivial and non-blocking, as it will be executed in the I/O thread.

executor = Executors.newCachedThreadPool();
…or any thread pool that you prefer.
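The two ways of consuming a future, blocking on it or registering a callback, can be tried without a cluster. The Python sketch below uses the standard library's `concurrent.futures` as an analogy; it is neither Guava nor the DataStax driver, and `fake_query` is a hypothetical stand-in for a database round trip.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_query(user_ids):
    """Stand-in for a database round trip; returns one row per id."""
    return [{"user_id": uid, "name": f"user{uid}"} for uid in user_ids]


def run_demo():
    seen = []

    def on_done(future):
        # Called once the future completes; keep this short and non-blocking.
        seen.extend(row["user_id"] for row in future.result())

    with ThreadPoolExecutor(max_workers=2) as pool:
        # Style 1: submit, then block on result() when the rows are needed.
        blocking = pool.submit(fake_query, [1, 2, 3])
        rows = blocking.result()

        # Style 2: register a callback instead of blocking.
        pool.submit(fake_query, [4, 5]).add_done_callback(on_done)

    # Leaving the `with` block waits for the workers, so the callback has run.
    return rows, seen


if __name__ == "__main__":
    rows, seen = run_demo()
    print(rows)
    print(seen)
```

As with the listener on the slide, a callback may run on a worker thread, so slow or blocking work inside it would delay other requests.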

Page 35: Paris Cassandra Meetup - Cassandra for Developers

Query Builder

import static com.datastax.driver.core.querybuilder.QueryBuilder.*;

Statement selectAll = select().all()
    .from("user")
    .where(eq("user_id", userId));

session.execute(selectAll);

Statement insert = insertInto("user")
    .value("user_id", 2)
    .value("name", "johndoe")
    .value("email", "[email protected]");

session.execute(insert);

A static import of QueryBuilder is required in order to use the DSL.

Page 36: Paris Cassandra Meetup - Cassandra for Developers

Python

import logging
from cassandra.cluster import Cluster

log = logging.getLogger(__name__)

cluster = Cluster(['10.1.1.3', '10.1.1.4', '10.1.1.5'])
session = cluster.connect('mykeyspace')

def handle_success(rows):
    user = rows[0]
    try:
        process_user(user.name, user.age, user.id)
    except Exception:
        log.error("Failed to process user %s", user.id)
        # don't re-raise errors in the callback

def handle_error(exception):
    log.error("Failed to fetch user info: %s", exception)

future = session.execute_async("SELECT * FROM users WHERE user_id=3")
future.add_callbacks(handle_success, handle_error)

It’s also possible to retrieve the result from the future object synchronously.

Page 37: Paris Cassandra Meetup - Cassandra for Developers

C#

var cluster = Cluster.Builder()
    .AddContactPoints("host1", "host2", "host3")
    .Build();
var session = cluster.Connect("sample_keyspace");

var task = session.ExecuteAsync(statement);
task.ContinueWith((t) =>
{
    var rs = t.Result;
    foreach (var row in rs)
    {
        // Get the values from each row
    }
}, TaskContinuationOptions.OnlyOnRanToCompletion);

Asynchronously execute a query using the TPL (Task Parallel Library).

Page 38: Paris Cassandra Meetup - Cassandra for Developers

C / C++

CassString query = cass_string_init("SELECT keyspace_name FROM system.schema_keyspaces;");
CassStatement* statement = cass_statement_new(query, 0);

CassFuture* result_future = cass_session_execute(session, statement);

if (cass_future_error_code(result_future) == CASS_OK) {
    const CassResult* result = cass_future_get_result(result_future);
    CassIterator* rows = cass_iterator_from_result(result);

    while (cass_iterator_next(rows)) {
        // Process results
    }

    cass_result_free(result);
    cass_iterator_free(rows);
}

cass_future_free(result_future);

Each structure must be freed with the appropriate function.

Page 39: Paris Cassandra Meetup - Cassandra for Developers

Node.js

var assert = require('assert');
var cassandra = require('cassandra-driver');

var client = new cassandra.Client({
    contactPoints: ['host1', 'h2'],
    keyspace: 'ks1'
});
var query = 'SELECT email, last_name FROM user_profiles WHERE key=?';

client.execute(query, ['guy'], function(err, result) {
    assert.ifError(err);
    console.log('got user profile with email ' + result.rows[0].email);
});

Here we’re using a Parameterized Statement, which is not prepared but still allows parameters.

Page 40: Paris Cassandra Meetup - Cassandra for Developers

Ruby

require 'cassandra'

cluster = Cassandra.cluster

session = cluster.connect('system')

future = session.execute_async('SELECT * FROM schema_columnfamilies')

future.on_success do |rows|
  rows.each do |row|
    puts "The keyspace #{row['keyspace_name']} has a table called #{row['columnfamily_name']}"
  end
end

future.join

Register a listener on the future, which will be called when results are available.

Page 41: Paris Cassandra Meetup - Cassandra for Developers


Object Mapper

• Avoid boilerplate for common use cases

• Map Objects to Statements and ResultSets to Objects

• Do NOT hide Cassandra from the developer

• No “clever tricks” à la Hibernate

• Not JPA compatible, but JPA-ish API


Page 42: Paris Cassandra Meetup - Cassandra for Developers

Object Mapper in Practice

<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-mapping</artifactId>
  <version>2.1.0</version>
</dependency>

Additional artifact for object mapping.

Available from Driver 2.1.0.

Page 43: Paris Cassandra Meetup - Cassandra for Developers

Basic Object Mapping

CREATE TYPE address (
    street text,
    city text,
    zip int
);

CREATE TABLE users (
    email text PRIMARY KEY,
    address frozen<address>
);

@UDT(keyspace = "ks", name = "address")
public class Address {
    private String street;
    private String city;
    private int zip;

    // getters and setters omitted...
}

@Table(keyspace = "ks", name = "users")
public class User {
    @PartitionKey
    private String email;
    private Address address;

    // getters and setters omitted...
}

Page 44: Paris Cassandra Meetup - Cassandra for Developers

Basic Object Mapping

MappingManager manager = new MappingManager(session);

Mapper<User> mapper = manager.mapper(User.class);

User myProfile = mapper.get("[email protected]");

ListenableFuture<Void> saveFuture = mapper.saveAsync(anotherProfile);

mapper.delete("[email protected]");

Mapper, just like Session, is a thread-safe object. Create a singleton at startup.

get() returns a mapped row for the given Primary Key.

saveAsync() returns a ListenableFuture from Guava, completed when the write is acknowledged.

Page 45: Paris Cassandra Meetup - Cassandra for Developers

Accessors

@Accessor
interface UserAccessor {
    @Query("SELECT * FROM user_profiles LIMIT :max")
    Result<User> firstN(@Param("max") int limit);
}

UserAccessor accessor = manager.createAccessor(UserAccessor.class);
Result<User> users = accessor.firstN(10);

for (User user : users) {
    System.out.println(user.getAddress().getZip());
}

Result is like ResultSet but specialized for a mapped class…

…so we iterate over it just like we would with a ResultSet.

Page 46: Paris Cassandra Meetup - Cassandra for Developers

We’re Hiring!

@mfiguiere

Cassandra Tech Day - Paris, November 4th

Cassandra Summit Europe - London, December 3-4th