NOSQL Database: Apache Cassandra

Page 1: NOSQL Database: Apache Cassandra

NoSQL Database: Apache Cassandra

www.folio3.com @folio_3

Page 2: NOSQL Database: Apache Cassandra

Folio3 – Overview

Page 3: NOSQL Database: Apache Cassandra

Who We Are

We are a Development Partner for our customers

Design software solutions, not just implement them

Focus on the solution – Platform and technology agnostic

Expertise in building applications that are:

Mobile Social Cloud-based Gamified

Page 4: NOSQL Database: Apache Cassandra

What We Do

Areas of Focus

Enterprise

Custom enterprise applications

Product development targeting the enterprise

Mobile

Custom mobile apps for iOS, Android, Windows Phone, BB OS

Mobile platform (server-to-server) development

Social Media

CMS-based websites for consumers and enterprise (corporate, consumer, community & social networking)

Social media platform development (enterprise & consumer)

Page 5: NOSQL Database: Apache Cassandra

Folio3 At a Glance

Founded in 2005

Over 200 full time employees

Offices in the US, Canada, Bulgaria & Pakistan

Palo Alto, CA

Sofia, Bulgaria

Karachi, Pakistan

Toronto, Canada

Page 6: NOSQL Database: Apache Cassandra

Areas of Focus: Enterprise

Automating workflows

Cloud based solutions

Application integration

Platform development

Healthcare

Mobile Enterprise

Digital Media

Supply Chain

Page 7: NOSQL Database: Apache Cassandra

Some of Our Enterprise Clients

Page 8: NOSQL Database: Apache Cassandra

Areas of Focus: Mobile

Serious enterprise applications for banks and businesses

Fun consumer apps for app discovery, interaction, exercise gamification and play

Educational apps

Augmented Reality apps

Mobile Platforms

Page 9: NOSQL Database: Apache Cassandra

Some of Our Mobile Clients

Page 10: NOSQL Database: Apache Cassandra

Areas of Focus: Web & Social Media

Community Sites based on Content Management Systems

Enterprise Social Networking

Social Games for Facebook & Mobile

Companion Apps for games

Page 11: NOSQL Database: Apache Cassandra

Some of Our Web Clients

Page 12: NOSQL Database: Apache Cassandra

NoSQL Database: Apache Cassandra

Page 13: NOSQL Database: Apache Cassandra

Agenda

What is NOSQL?

Motivations for NOSQL?

Brewer’s CAP Theorem

Taxonomy of NOSQL databases

Apache Cassandra

Features

Data Model

Consistency

Operations

Cluster Membership

What Does NOSQL Mean for RDBMS?

Page 14: NOSQL Database: Apache Cassandra

What is NOSQL?

Refers to databases that differ from traditional relational database management systems (RDBMS)

Distributed, flexible, horizontally scalable data stores

Confusion with the term NOSQL

NOSQL != No SQL (or Anti-SQL)

NOSQL = Not Only SQL

NOSQL is an inaccurate term, since it is commonly used to refer to "non-relational" databases, but the term has stuck

Page 15: NOSQL Database: Apache Cassandra

Motivations for NOSQL

Classical RDBMSs are a poor fit for many of today's web applications because of:

Performance (Latency): Variable

Flexibility: Low

Scalability: Variable

Functionality

Page 16: NOSQL Database: Apache Cassandra

Brewer's CAP Theorem

Consistency (C)

Availability (A)

Partition Tolerance (P)

Pick any two

Most NOSQL databases sacrifice Consistency in favor of high Availability and Performance

Page 17: NOSQL Database: Apache Cassandra

Taxonomy of NOSQL

Key/Value Stores - Distributed Hash Tables (DHT)

Memcached, Amazon’s Dynamo, Redis, PStore

Document Stores

Semi structured data (stores entire documents)

CouchDB, MongoDB, RDDB, Riak

Graph Databases *

Based on graph theory

ActiveRDF, AllegroGraph, Neo4J

Object Database *

Versant, Objectivity

Column-oriented Stores

* These are considered soft NOSQL databases and are usually placed in the NOSQL category because they are "non-relational".

Page 18: NOSQL Database: Apache Cassandra

Column-Oriented Data Stores

Semi-structured column-based data stores

Stores each column separately so that aggregate operations for one column of the entire table are significantly quicker than the traditional row storage model

Popular examples

Hadoop/HBASE

Apache Cassandra

Google's BigTable

HyperTable

Amazon's SimpleDB
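The aggregate-speed claim for column storage can be illustrated with a minimal Python sketch. The table contents and the `to_columns` helper are hypothetical and exist only for this illustration; real column stores work with on-disk formats, not in-memory lists.

```python
# Hypothetical three-row table, used only for illustration.
rows = [
    {"user": "alice", "age": 30, "city": "Karachi"},
    {"user": "bob", "age": 25, "city": "Toronto"},
    {"user": "carol", "age": 35, "city": "Sofia"},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every row has to be visited to reach a single field.
row_total = sum(row["age"] for row in rows)

# Column layout: the aggregate reads one contiguous list and nothing else.
column_total = sum(columns["age"])

print(row_total, column_total)
```

Both layouts give the same answer; the difference is how much unrelated data has to be read to compute it.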

Page 19: NOSQL Database: Apache Cassandra

Apache Cassandra

Fully distributed column oriented data store

Also supports MapReduce through Hadoop integration (increased performance)

Based on Google's BigTable (Data Model) and Amazon's Dynamo (Consistency & Partition Tolerance)

Cassandra values Availability and Partition tolerance (AP) while providing tunable consistency levels.

Page 20: NOSQL Database: Apache Cassandra

History

Developed at Facebook

Released as an open source project on Google Code in July 2008

Became an Apache Incubator Project in March 2009

Became a top level Apache project in February 2010

Rumors of Facebook having started working on its own separate version of Cassandra

Page 21: NOSQL Database: Apache Cassandra

Features

Fully Distributed

Highly Scalable

Fault Tolerant (No single point of failure)

Tunable Consistency (Eventually Consistent)

Semi-structured key-value store

High Availability

No Referential Integrity

No Joins

Page 22: NOSQL Database: Apache Cassandra

Data Model

KeySpace (Uppermost namespace)

Column Family / Super Column Family (analogous to table)

Super Column

Column (Name, Value, Timestamp)

Rows are referenced through keys

Each column family is stored in separate physical files
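The hierarchy above can be sketched with nested Python dictionaries. This is an illustrative model only, not Cassandra's storage format: the `insert` and `get` helpers and the sample names are hypothetical, and resolving conflicting writes by keeping the newest timestamp is an assumption of this sketch. A super column family would add one more level of nesting between the row key and the column.

```python
import time
from collections import namedtuple

# A column is a (name, value, timestamp) triple, as on the slide.
Column = namedtuple("Column", ["name", "value", "timestamp"])


def insert(keyspace, column_family, row_key, name, value, timestamp=None):
    """Store a column under keyspace -> column family -> row key -> name."""
    ts = time.time() if timestamp is None else timestamp
    rows = keyspace.setdefault(column_family, {})
    columns = rows.setdefault(row_key, {})
    existing = columns.get(name)
    # Assumption of this sketch: the write with the newest timestamp wins.
    if existing is None or ts >= existing.timestamp:
        columns[name] = Column(name, value, ts)


def get(keyspace, column_family, row_key, name):
    """Rows are referenced through their key; columns through their name."""
    return keyspace[column_family][row_key][name].value


blog = {}  # the keyspace: the uppermost namespace
insert(blog, "Users", "user42", "email", "old@example.com", timestamp=1)
insert(blog, "Users", "user42", "email", "new@example.com", timestamp=2)
insert(blog, "Users", "user42", "name", "Ayesha", timestamp=1)
print(get(blog, "Users", "user42", "email"))
```

Rows in the same column family do not need to share the same set of columns, which is what makes the model semi-structured.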

Page 23: NOSQL Database: Apache Cassandra

Standard Column Family

Page 24: NOSQL Database: Apache Cassandra

Super Column Family

Page 25: NOSQL Database: Apache Cassandra

Super Column Family: Static/Static

Page 26: NOSQL Database: Apache Cassandra

Super Column Family: Static/Static

Page 27: NOSQL Database: Apache Cassandra

Super Column Family: Static/Dynamic

Page 28: NOSQL Database: Apache Cassandra

Super Column Family: Static/Dynamic

Page 29: NOSQL Database: Apache Cassandra

Super Column Family: Dynamic/Static

Page 30: NOSQL Database: Apache Cassandra

Super Column Family: Dynamic/Static

Page 31: NOSQL Database: Apache Cassandra

Super Column Family: Dynamic/Dynamic

Page 32: NOSQL Database: Apache Cassandra

Super Column Family: Dynamic/Dynamic

Page 33: NOSQL Database: Apache Cassandra

Apache Cassandra: Consistency

Consistency refers to whether a system is left in a consistent state after an operation. In distributed data systems like Cassandra, this usually means that once a writer has written, all readers will see that write.

If W + R > N, you will have strongly consistent behavior; that is, readers will always see the most recent write

W is the number of replicas to block for on writes

R is the number of replicas to block for on reads

N is the replication factor (number of replicas)
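The W + R > N rule works because any set of W replicas and any set of R replicas must then share at least one member, so every read touches a replica that holds the latest write. A small Python sketch can check this by brute force; the function names are hypothetical and the enumeration is only practical for small N.

```python
from itertools import combinations


def is_strongly_consistent(w, r, n):
    """Apply the rule from the slide: W + R > N."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must be between 1 and N")
    return w + r > n


def always_overlap(w, r, n):
    """Check every possible write set against every possible read set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


n = 3
for w, r in [(1, 1), (2, 2), (3, 1), (1, 3)]:
    print(w, r, is_strongly_consistent(w, r, n), always_overlap(w, r, n))
```

For every combination the two functions agree: the arithmetic rule holds exactly when the read and write sets cannot miss each other.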

Page 34: NOSQL Database: Apache Cassandra

Apache Cassandra: Consistency

Relational databases provide strong consistency (ACID)

Cassandra provides eventual consistency (BASE), meaning the database will eventually reach a consistent state

QUORUM reads and writes give consistency while still allowing availability

Q = floor(N / 2) + 1 (simple majority)

If latency is more important than consistency, you can lower the values of W, R, or both.
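As a quick check of the quorum formula, the following Python sketch computes Q for a few replication factors, taking N / 2 as integer division. The `quorum` function name is hypothetical.

```python
def quorum(n):
    """Simple majority of n replicas: floor(n / 2) + 1."""
    if n < 1:
        raise ValueError("replication factor must be at least 1")
    return n // 2 + 1


for n in range(1, 8):
    q = quorum(n)
    # With W = R = Q, the W + R > N condition always holds,
    # and up to n - q replicas may be down without blocking requests.
    print(f"N={n} Q={q} W+R={2 * q} tolerated_failures={n - q}")
```

Because 2 * Q is always greater than N, quorum reads combined with quorum writes satisfy W + R > N while still tolerating the loss of a minority of replicas.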

Page 35: NOSQL Database: Apache Cassandra

Apache Cassandra: Consistency Levels

Write: ZERO, ANY, ONE, QUORUM, ALL

Read: ONE, QUORUM, ALL (ZERO and ANY are not supported for reads)

Page 36: NOSQL Database: Apache Cassandra

Write Operation

Client sends a write request to a random node; the random node forwards the request to the proper node (1st replica responsible for the partition - coordinator)

Coordinator sends requests to N replicas

If W replicas confirm the write operation then OK

Always writable through hinted handoff: if a replica node for the key is down, Cassandra will write a hint to a live replica node indicating that the write needs to be replayed to the unavailable node.
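The write path and hinted handoff can be modelled with a toy in-memory simulation. This is a sketch under stated assumptions, not Cassandra's implementation: the `Replica` class, `coordinate_write`, and `replay_hints` are hypothetical names, and storing the hint on the first live replica is a simplification.

```python
class Replica:
    """Toy replica: an in-memory store plus hints held for other nodes."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}
        self.hints = []  # (target_name, key, value, timestamp)

    def write(self, key, value, timestamp):
        if not self.up:
            return False
        self.data[key] = (value, timestamp)
        return True


def coordinate_write(replicas, key, value, timestamp, w):
    """Send the write to all N replicas; succeed if at least W confirm."""
    live = [replica for replica in replicas if replica.up]
    acks = 0
    for replica in replicas:
        if replica.write(key, value, timestamp):
            acks += 1
        elif live:
            # Hinted handoff: a live replica remembers the missed write.
            live[0].hints.append((replica.name, key, value, timestamp))
    return acks >= w


def replay_hints(holder, replicas_by_name):
    """Replay stored hints to targets that are reachable again."""
    remaining = []
    for target_name, key, value, timestamp in holder.hints:
        if not replicas_by_name[target_name].write(key, value, timestamp):
            remaining.append((target_name, key, value, timestamp))
    holder.hints = remaining


a, b, c = Replica("a"), Replica("b"), Replica("c", up=False)
nodes = [a, b, c]
print(coordinate_write(nodes, "user42", "Ayesha", 1, w=2))
c.up = True
replay_hints(a, {replica.name: replica for replica in nodes})
print(c.data)
```

The write succeeds with W = 2 even though one replica is down, and the missed write reaches that replica once it comes back and the hint is replayed.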

Page 37: NOSQL Database: Apache Cassandra

Read Operation

Coordinator sends requests to N replicas; if R replicas respond then OK

If different versions are returned then reconcile and write back the reconciled version (Read Repair)
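Read repair can be sketched in the same toy style. Here each replica is a plain dict mapping a key to a (value, timestamp) pair, and an unreachable replica is represented by `None`. Reconciling by picking the newest timestamp is an assumption of this sketch, and `read_with_repair` is a hypothetical name.

```python
def read_with_repair(replicas, key, r):
    """Read from reachable replicas, reconcile, and write back the winner."""
    responses = [
        (replica, replica.get(key))
        for replica in replicas
        if replica is not None
    ]
    if len(responses) < r:
        raise RuntimeError("fewer than R replicas responded")

    versions = [version for _, version in responses if version is not None]
    if not versions:
        return None

    # Assumption: the version with the highest timestamp is the reconciled one.
    newest = max(versions, key=lambda version: version[1])

    # Read repair: bring stale or missing copies up to date.
    for replica, version in responses:
        if version != newest:
            replica[key] = newest
    return newest[0]


replicas = [
    {"user42": ("old@example.com", 1)},
    {"user42": ("new@example.com", 2)},
    {},
]
print(read_with_repair(replicas, "user42", r=2))
print(replicas)
```

After the read, all three replicas hold the newest version, so later reads at any consistency level return the same value.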

Page 38: NOSQL Database: Apache Cassandra

Cluster Membership

Gossip Protocol

Every T seconds each node increments its heartbeat counter and gossips to another node about the state of the cluster; the receiving node merges the cluster info with its own copy

Cluster state (node in/out, failure) propagated quickly: O(log N) where N is the number of nodes in the cluster
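The heartbeat-and-merge cycle can be sketched as a small simulation. This is a simplified push-only model with hypothetical names (`merge`, `gossip_round`, `rounds_until_converged`); each node's view is a dict from node name to the highest heartbeat it has seen, and peers are chosen uniformly at random.

```python
import random


def merge(local, remote):
    """Keep the highest heartbeat seen for every node."""
    for node, heartbeat in remote.items():
        if heartbeat > local.get(node, -1):
            local[node] = heartbeat


def gossip_round(views, rng):
    """Each node bumps its own heartbeat and gossips to one random peer."""
    names = list(views)
    for name in names:
        views[name][name] += 1
        peer = rng.choice([other for other in names if other != name])
        merge(views[peer], views[name])


def rounds_until_converged(n, seed=0):
    """Count rounds until every node has heard of every other node."""
    rng = random.Random(seed)
    views = {f"node{i}": {f"node{i}": 0} for i in range(n)}
    rounds = 0
    while any(len(view) < n for view in views.values()):
        gossip_round(views, rng)
        rounds += 1
    return rounds


for size in (4, 16, 64, 256):
    print(size, rounds_until_converged(size))
```

Running this for growing cluster sizes gives a rough feel for how slowly the round count grows compared with the number of nodes.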

Page 39: NOSQL Database: Apache Cassandra

Storage Ring

Cassandra cluster nodes are organized in a virtual ring.

Each node has a single unique token that defines its place in the ring and which keys it is responsible for

Key ranges are adjusted when the nodes join or leave
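A token ring lookup can be sketched with a sorted list and binary search. The `Ring` class and `token_for` helper are hypothetical, the 32-bit token space is chosen only to keep numbers small, and hashing keys with MD5 is an illustrative assumption. In this sketch a node owns the range from the previous node's token (exclusive) up to its own token (inclusive).

```python
import bisect
import hashlib


def token_for(key, ring_size=2**32):
    """Map a key to a position on the ring (illustrative hash choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ring_size


class Ring:
    def __init__(self):
        self._tokens = []   # sorted node tokens
        self._owners = {}   # token -> node name

    def add_node(self, name, token):
        if token in self._owners:
            raise ValueError("token already taken")
        bisect.insort(self._tokens, token)
        self._owners[token] = name

    def remove_node(self, name):
        for token, owner in list(self._owners.items()):
            if owner == name:
                self._tokens.remove(token)
                del self._owners[token]

    def owner_of_token(self, token):
        """First node whose token is >= the given token, wrapping around."""
        if not self._tokens:
            raise LookupError("ring is empty")
        index = bisect.bisect_left(self._tokens, token)
        return self._owners[self._tokens[index % len(self._tokens)]]

    def owner_of(self, key):
        return self.owner_of_token(token_for(key))


ring = Ring()
ring.add_node("node-a", 100)
ring.add_node("node-b", 200)
ring.add_node("node-c", 300)
print(ring.owner_of_token(150))   # falls in (100, 200]
ring.add_node("node-d", 150)      # a joining node takes over part of that range
print(ring.owner_of_token(150))
```

Adding or removing a node only changes ownership of the range next to its token, which is why key ranges can be adjusted without reshuffling the whole cluster.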

Page 40: NOSQL Database: Apache Cassandra

Apache Cassandra: MySQL Comparison

MySQL (> 50 GB data)

Read Average: ~ 350 ms

Write Average: ~ 300 ms

Cassandra (> 50 GB data)

Read Average: 15 ms

Write Average: 0.12 ms

Page 41: NOSQL Database: Apache Cassandra

Apache Cassandra: Client API

Low level API

Thrift

High Level API

Java

Hector, Pelops, Kundera

.NET

FluentCassandra, Aquiles

Python

Telephus, Pycassa

PHP

phpcassa, SimpleCassie

Page 42: NOSQL Database: Apache Cassandra

Apache Cassandra: Where to Use?

Use Cassandra if you need:

High write throughput

Near-linear scalability

Automated replication/fault tolerance

and if you can tolerate:

Low consistency

Missing RDBMS features

Page 43: NOSQL Database: Apache Cassandra

Apache Cassandra: Users

Facebook (of course)

To power inbox search (previously)

Twitter

To handle user relationships, analytics (but not for tweets)

Digg & Reddit

Both use Cassandra to handle user comments and votes

Rackspace

IBM

To build scalable email system

Cisco's WebEx

To store user feed and activity in near real time

Page 44: NOSQL Database: Apache Cassandra

What does NOSQL mean for the future of RDBMS?

No worries! RDBMSs are here to stay for the foreseeable future

NOSQL data stores can be used in combination with RDBMS in some situations

NOSQL still has a long way to go in order to reach the widespread (mainstream) use and support of the RDBMS

Page 45: NOSQL Database: Apache Cassandra

Weakness of NOSQL

No or limited support for complex queries

No multi-operation transactions available (only individual operations are atomic)

No standard interface for NOSQL databases (like SQL in relational databases)

No or limited administrative features available for NOSQL databases

Not suitable (yet) for mainstream use

Page 46: NOSQL Database: Apache Cassandra

Why Still Use RDBMS?

All the weaknesses of NOSQL

Relational databases are widely used and understood

RDBMS DBAs and developers are easily available in the market

For big businesses, relational databases are a safe choice because these businesses have already invested heavily in relational technology

Many database design and development tools available

Page 47: NOSQL Database: Apache Cassandra

References

http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

http://wiki.apache.org/cassandra/FrontPage

http://en.wikipedia.org/wiki/Apache_Cassandra

http://www.slideshare.net/gdusbabek/cassandra-presentation-for-san-antonio-jug

http://www.slideshare.net/Eweaver/cassandra-presentation-at-nosql

http://nosql-database.org/

http://nosqlpedia.com/

Page 48: NOSQL Database: Apache Cassandra

Contact

For more details about our services, please get in touch with us.

US Office: (408) 365-4638

www.folio3.com