Introduction to Cassandrafiles.meetup.com/14849742/London Meetup - Introduction to Cassan… ·...

Introduction to Cassandra, and how it can co-exist with SQL solutions. Carlos Rolo, Cassandra Consultant, March 2015



Introduction to Cassandra: and how it can co-exist with SQL solutions

Carlos Rolo

March 2015

Cassandra Consultant


About me

● Pythian Cassandra Consultant since January 2015
● Working with Cassandra since 2011
● Working with distributed systems since 2010

● History:
  ● Pythian
  ● Leaseweb CDN
  ● Portugal Telecom
  ● DRI

● Twitter: @cjrolo


AGENDA

● Introduction
● Design Patterns
● Cassandra Basics
● C* and SQL


Introduction

● Cassandra is a highly scalable, distributed, masterless NoSQL database

● All nodes are the same, which makes the cluster highly resilient


Design Patterns

● CAP Theorem
● Consistent Hashing
● Column Store Architecture
● Log Structured Data
● CRUD, ACID and C*


CAP Theorem

● The CAP theorem states that you have to pick two of Consistency, Availability and Partition tolerance: you can't have all three at the same time and get an acceptable latency at any given moment.
● Cassandra values Availability and Partition tolerance (AP). Tradeoffs between consistency and latency are tunable in Cassandra (per request!).
● Requests offer a tunable level of consistency, all the way from "writes never fail" to "block for all replicas to be readable".
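A rough way to reason about the per-request tradeoff is to count how many replicas must acknowledge a write and a read. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the usual rule of thumb that a read is guaranteed to see the latest write when the read and write replica sets must overlap (acknowledged writes plus acknowledged reads exceed the replication factor). ONE, QUORUM and ALL are Cassandra consistency level names; everything else is an assumption made for the example.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_acks: int,
                           read_acks: int) -> bool:
    """True when any read set must overlap any write set."""
    return write_acks + read_acks > replication_factor


RF = 3  # assumed replication factor for the example
LEVELS = {"ONE": 1, "QUORUM": quorum(RF), "ALL": RF}

# Print which write/read combinations force an overlap between replica sets.
for write_level, w in LEVELS.items():
    for read_level, r in LEVELS.items():
        overlap = reads_see_latest_write(RF, w, r)
        print(f"write={write_level:<6} read={read_level:<6} overlap={overlap}")
```

Lower levels answer faster and tolerate more failed nodes; higher levels wait for more replicas, which is the latency cost the slide refers to.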


Consistent Hashing

• A hash consists of one or more arithmetic operations on a piece of data – e.g. MD5, Murmur3

• We hash keys in an attempt to spread key hashes in a uniform manner for any given set of keys.

• In consistent hashing, the full hash range is divided into sub-ranges; this division is called a map.

• Once the map is defined a given key will always end up in the same map range.
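The idea above can be sketched as a small token ring. This is a minimal illustration, not Cassandra's partitioner: the `Ring` class and node names are hypothetical, each node gets a single token derived from its name, and MD5 is used only because it ships with the Python standard library (Murmur3 does not).

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key to an integer token using MD5."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Maps keys to nodes; each node owns the range that ends at its token."""

    def __init__(self, nodes):
        self._entries = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def owner(self, key: str) -> str:
        # First node token at or after the key's token, wrapping around the ring.
        i = bisect.bisect_left(self._tokens, token(key))
        return self._entries[i % len(self._entries)][1]


ring = Ring(["node-a", "node-b", "node-c"])
bigger = Ring(["node-a", "node-b", "node-c", "node-d"])

keys = [f"user:{i}" for i in range(1000)]
# Keys whose owner changed after adding one node.
moved = [k for k in keys if ring.owner(k) != bigger.owner(k)]
```

A given key always lands in the same range, and adding `node-d` only moves keys onto `node-d`; keys owned by the other nodes stay where they were.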


Log Structured Data

• Instead of rewriting records in place or storing records near each other based on key (clustering), simply append new records, updates to records, and deletes to the end of the file that holds the table.

• Add an index so you can read the table randomly without loading the whole thing into memory

UPSERT people (1,"Jonathan","Ellis");

UPSERT people (2,"Billy","Bosworth");

UPSERT people (2,"William","Bosworth");

DELETE people (1);
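The four operations above can be replayed against a toy append-only table. This is a sketch under simplifying assumptions: the `AppendOnlyTable` class is hypothetical, the "file" is an in-memory list, and the index simply remembers the position of the newest record for each key.

```python
TOMBSTONE = object()  # marker appended in place of a real delete


class AppendOnlyTable:
    def __init__(self):
        self.log = []    # every write ever made, in arrival order
        self.index = {}  # key -> position of the newest record in the log

    def upsert(self, key, *values):
        # Inserts and updates are the same operation: append a new record.
        self.log.append((key, values))
        self.index[key] = len(self.log) - 1

    def delete(self, key):
        # A delete is also an append; the old records stay in the log.
        self.log.append((key, TOMBSTONE))
        self.index[key] = len(self.log) - 1

    def read(self, key):
        pos = self.index.get(key)
        if pos is None:
            return None
        _, values = self.log[pos]
        return None if values is TOMBSTONE else values


people = AppendOnlyTable()
people.upsert(1, "Jonathan", "Ellis")
people.upsert(2, "Billy", "Bosworth")
people.upsert(2, "William", "Bosworth")
people.delete(1)
```

After the replay the log still holds all four records, but reads go through the index and only see the newest one per key.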


CRUD, ACID and Cassandra

● C* doesn’t really have CRUD. Update is a special case of Create, and Delete is not a real Delete.

● C* is not ACID. C* doesn’t support transactions.
● C* is BASE: Basically Available, Soft state, Eventual consistency.
a. Different versions may live in the cluster at the same time. Eventually all the nodes will see the newest data.
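To make "different versions at the same time" concrete, the sketch below models three replicas as dictionaries holding `(write_timestamp, value)` pairs and brings them into agreement by keeping the version with the highest timestamp (last write wins). The function names and the data are hypothetical, and the real mechanisms that propagate data between nodes are not modelled.

```python
def newest(versions):
    """Pick the version with the highest write timestamp."""
    return max(versions, key=lambda version: version[0])


def converge(replicas, key):
    """Copy the newest known version of `key` to every replica."""
    versions = [replica[key] for replica in replicas if key in replica]
    if not versions:
        return
    winner = newest(versions)
    for replica in replicas:
        replica[key] = winner


# Three replicas that disagree: one is stale, one missed the write entirely.
node_a = {"user:2": (100, "Billy")}
node_b = {"user:2": (200, "William")}
node_c = {}

before = [replica.get("user:2") for replica in (node_a, node_b, node_c)]
converge([node_a, node_b, node_c], "user:2")
after = [replica.get("user:2") for replica in (node_a, node_b, node_c)]
```

Before convergence a read could return "Billy", "William" or nothing depending on which node answers; afterwards every node holds the newest version.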


Cassandra Basics

● Write Path
● Read Path
● Compaction and Repair


Write Path


Read Path


Compaction and Repair

● Compaction is a process that Cassandra uses to keep local data in check.
a. Tables are append only, so obsolete data will live with current data
b. Compaction “cleans the house”
● Repair is a process that Cassandra uses to keep cluster data in check:
a. Nodes can get out of sync (hardware failure, network issues, etc.)
b. Repair makes sure every node has the latest data
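The compaction half of this slide can be sketched as a merge over append-only segments. This is a simplified illustration with a hypothetical `compact` function: segments are plain lists ordered oldest first, and delete markers are dropped immediately, whereas Cassandra keeps them for a configurable grace period before discarding them.

```python
TOMBSTONE = None  # marker written in place of a real delete


def compact(segments):
    """Merge append-only segments (oldest first) into one clean segment."""
    latest = {}
    for segment in segments:
        for key, value in segment:
            latest[key] = value  # later records replace earlier ones
    # Keep one record per key and drop keys whose newest record is a delete.
    return [(key, value) for key, value in sorted(latest.items())
            if value is not TOMBSTONE]


old_segment = [(1, ("Jonathan", "Ellis")), (2, ("Billy", "Bosworth"))]
new_segment = [(2, ("William", "Bosworth")), (1, TOMBSTONE), (3, ("Ada", "Doe"))]

compacted = compact([old_segment, new_segment])
```

Five records across two segments shrink to two live records, which is the "cleaning the house" effect: obsolete versions and deleted rows no longer take up space or slow down reads.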


CQL - Cassandra Query Language

● CQL is not SQL
● Very similar:

cqlsh> CREATE KEYSPACE sandbox WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 1 };

cqlsh> USE sandbox;

cqlsh:sandbox> CREATE TABLE data (id uuid, data text, PRIMARY KEY (id));

cqlsh:sandbox> INSERT INTO data (id, data) values (c37d661d-7e61-49ea-96a5-68c34e83db3a, 'testing');

cqlsh:sandbox> SELECT * FROM data;

• Abstracts the internal structure away from the user (can be dangerous!)

• Provides several benefits over the older model (Thrift)


Cassandra and DBMS

● Scale up vs Scale out
● High availability vs Continuous availability
● Highly structured vs Semi-structured
● Replication
a. No need for Copy/Backup/Restore processes
b. In Cassandra it can range from completely free (starting the node) to issuing a CQL command (new datacenter)


Cassandra Co-existence with DBMS

● New applications
a. Loose data model avoids app re-writing, data migration, etc.
● Augmentation
a. Scale out better.
b. Restructure the data model so Cassandra can absorb high-velocity data
c. Absorb traffic from several locations


Q&A

● Questions