Presentation of Apache Cassandra


Transcript of Presentation of Apache Cassandra

Page 1: Presentation of Apache Cassandra

Apache Cassandra

Page 2: Presentation of Apache Cassandra

1. Introduction to NoSQL systems, Extensible Record Stores and Amazon’s Dynamo + Google Bigtable

2. What Cassandra is and how it compares with other similar systems

3. Which applications are best supported - examples, case studies

4. Technical Description, architecture, internals

5. How it is used and installed, its requirements and the platforms it runs on

6. Demo

7. References

Contents

Page 3: Presentation of Apache Cassandra

Background

NoSQL, Extensible Record Stores, Cassandra’s Parents

1.

Page 4: Presentation of Apache Cassandra

NoSQL or Not-Only-SQL systems: Next Generation Databases. The initial movement started in 2009 with the goal of creating modern, web-scale DBs. Currently, there are more than 225 NoSQL systems.

In general, they share the following features:

• Schema-free databases

• Easy replication support

• Simple API

• Distributed

• Open Source

NoSQL Systems

• BASE (instead of ACID)

• Huge amount of data

• Horizontally scalable

Page 5: Presentation of Apache Cassandra

• Motivated by Google’s Big Table.

• Basic Data Model: Rows and Columns

• Basic Scalability Model: Rows and Columns are split across nodes.

• Rows: split across nodes through sharding on the primary key.

• Columns: distributed over multiple nodes by using ‘column groups’.

• Other systems that use this technology: Hypertable, HBase.

Extensible Record Stores (or Wide Column Stores)

Page 6: Presentation of Apache Cassandra

What is it?

A highly available and scalable storage system used by Amazon for user shopping carts and other core services. It pioneered the idea of eventual consistency. Key-Value Store.

How does it work?

Allows read and write operations to continue even during network partitions and resolves update conflicts using different conflict resolution mechanisms.

Sacrifices consistency for availability.

Allows customization to meet desired preference.

Consistent Hashing, Vector Clocks (not in Cassandra), Gossip Protocol, Hinted Handoff, Read Repair

Cassandra’s Parents - Amazon Dynamo

Page 7: Presentation of Apache Cassandra

Cassandra’s Parents - Google Bigtable

What is it?

A high-performance data storage system built on Google File System and other Google technologies.

How does it work?

Provides both structure and data distribution but relies on a distributed file system for durability.

Richer data model than Dynamo. One key, many values. Fast sequential access.

Columnar, SSTable Storage, Append-only, Memtable, Compaction

Page 8: Presentation of Apache Cassandra

What features does Cassandra use from Google’s BigTable?

1. Column Families

2. Memtables

3. SSTables

What features does Cassandra use from Amazon Dynamo?

4. Consistent hashing

5. Partitioning

6. Replication

Cassandra’s Parents

Page 9: Presentation of Apache Cassandra

Cassandra and Parents

Page 10: Presentation of Apache Cassandra

Description and Comparisons

What Cassandra is and how it compares with other similar systems

2.

Page 11: Presentation of Apache Cassandra

Avinash Lakshman

• Inventor, Apache Cassandra

• Co-inventor, Amazon Dynamo

Page 12: Presentation of Apache Cassandra

Prashant Malik

• Inventor, Apache Cassandra

• Technical Leader, Facebook

Page 13: Presentation of Apache Cassandra
Page 14: Presentation of Apache Cassandra

What is Cassandra?

Page 15: Presentation of Apache Cassandra

Definition

• A distributed NoSQL database system for managing large amounts of structured data across many commodity servers, while providing highly available service and no single point of failure.

Page 16: Presentation of Apache Cassandra

Timeline with activities

• July 2008: Facebook released Cassandra as an open-source project

• March 2009: Cassandra became an Apache Incubator project

• 17th February 2010: Cassandra graduated to a top-level project

• 2012: University of Toronto researchers studying NoSQL systems concluded that “In terms of scalability, there is a clear winner throughout our experiments”

• 2010-2015: New releases of Cassandra

Page 17: Presentation of Apache Cassandra

Strengths

• Linear scale performance: The ability to add nodes without failures leads to predictable increases in performance

• Supports multiple languages: Python, C#/.NET, C++, Ruby, Java, Go, and many more…

• Operational and developmental simplicity: There are no complex software tiers to be managed, so administration duties are greatly simplified.

• Ability to deploy across data centres: Cassandra can be deployed across multiple, geographically dispersed data centres

Page 18: Presentation of Apache Cassandra

• Cloud availability: Installations in cloud environments

• Peer-to-peer architecture: Cassandra follows a peer-to-peer architecture, instead of a master-slave architecture

• Flexible data model: Supports modern data types with fast writes and reads

• Fault tolerance: Nodes that fail can easily be restored or replaced

• High performance: Cassandra has demonstrated brilliant performance under large sets of data

Strengths (1)

Page 19: Presentation of Apache Cassandra

• ColumnFamily Store: Cassandra stores columns based on the column names, leading to very quick slicing

• Tunable consistency: Support for strong or eventual data consistency across a widely distributed cluster

• Schema-free/Schema-less: In Cassandra, columns can be created at will within the rows. Cassandra’s data model is also known as a schema-optional data model

• AP-CAP: Cassandra is typically classified as an AP system, meaning that availability and partition tolerance are generally considered to be more important than consistency in Cassandra

Strengths (2)

Page 20: Presentation of Apache Cassandra

CAP and Cassandra

Page 21: Presentation of Apache Cassandra

Variable number of columns per row

Page 22: Presentation of Apache Cassandra

Weaknesses

Use cases where it is better to avoid using Cassandra:

• If too many joins are required to retrieve the data

• To store configuration data

• During compaction, things slow down and throughput degrades

• Basic things like aggregation operators are not supported

• Range queries on the partition key are not supported

• If there is transactional data which requires 100% consistency

• Cassandra can update and delete data but it is not designed to do so

Page 23: Presentation of Apache Cassandra

Business Insider

“The basic problem Cassandra solved is that when you have a lot of data sitting on a lot of servers, as Facebook does, you end up with a house of cards. A single server going down can collapse the whole stack.”

Page 24: Presentation of Apache Cassandra

Cassandra compared to other NoSQL Systems

Page 25: Presentation of Apache Cassandra

Read & Write latency for workload Read/Write

Page 26: Presentation of Apache Cassandra

Throughput for workload Read/Write & Read/Scan/Write

Page 27: Presentation of Apache Cassandra

Insert-mostly Workload

Page 28: Presentation of Apache Cassandra

Mixed Operational & Analytical Workload

Page 29: Presentation of Apache Cassandra

Read-Modify-Write Workload

Page 30: Presentation of Apache Cassandra

Balanced Read/Write Mix

Page 31: Presentation of Apache Cassandra

Read-mostly Workload

Page 32: Presentation of Apache Cassandra

Load Process

Page 33: Presentation of Apache Cassandra

VLDB Benchmark (RWS)

Page 34: Presentation of Apache Cassandra

Differences between Cassandra and RDBMS

RDBMS | Cassandra

relational database | keyspace

B-trees | log-structured merge-trees

rows which do not include a particular column value → NULL (in that position) | for each row, only the columns with a value are stored

supports ACID transactions | only supports AID

Page 35: Presentation of Apache Cassandra
Page 36: Presentation of Apache Cassandra

Supported Applications - Customers - Case Studies

3.

Page 37: Presentation of Apache Cassandra

What kind of applications are supported by Cassandra?

More than 80% of the clients fit into one of the following categories:

I. Product Catalog/Playlist

II. Recommendation/Personalization Engine

III. Sensor Data/Internet of Things

IV. Messaging (and generally time-series data)

V. Fraud Detection

Page 38: Presentation of Apache Cassandra

In other words, applications that need to...

• store and handle time-series data (most common use case)

• store and handle large volumes of data

• scale predictably

• be continuously available

• protect their data

Page 39: Presentation of Apache Cassandra

DataStax

• A software company that develops and provides support for a commercial edition of Cassandra.

• Massively scalable NoSQL platform able to run online applications for innovative and data-intensive companies (e.g. Netflix).

• Faster to deploy and less expensive to maintain than other database platforms.

• Powered by Cassandra and contains only selected releases of it, chosen by its expert staff.

Page 40: Presentation of Apache Cassandra

Datastax (1)

• Supports businesses that need progressive data management.

• Can serve as a real-time datastore for online production.

• Delivers a unique, smart data platform, suitable for the cloud.

Page 41: Presentation of Apache Cassandra

Customers

• Over 3,000 companies around the world use (or have used) Apache Cassandra in production.

• Most famous:

Page 42: Presentation of Apache Cassandra

Cassandra Summit

• Organized by DataStax for 7 consecutive years (in both US and Europe).

• New product releases are announced.

• Customers describe their usage of Cassandra

Page 43: Presentation of Apache Cassandra

Key Terms

• Cluster

• Distributed Location

• Node

Page 44: Presentation of Apache Cassandra

CASE STUDIES

Page 45: Presentation of Apache Cassandra

Category: Messaging

Page 46: Presentation of Apache Cassandra

Facebook Inbox Search - Requirements

“The system was required to handle a very high write throughput, billions of writes per day, and also scale with the number of users”

“Since users are served from data centres that are geographically distributed, being able to replicate data across data centres was key to keep search latencies down”

• Lakshman, Malik

Page 47: Presentation of Apache Cassandra

Facebook Inbox Search

The reason why Cassandra was initially built.

Facebook maintains a per-user index of all messages that have been exchanged between the senders and the recipients of the message.

Two kinds of search features enabled in 2008:

I. term search

II. interactions - given a person’s name, returns all the messages that the user has sent to or received from that person

Page 48: Presentation of Apache Cassandra

Facebook Inbox Search (1)

How did they do that?

The schema consists of two column families. Exploits the “time sorting” feature of Cassandra.

For the term search:

• UserID → key

• Words that make up the message → super columns

• Columns within the super column → individual message identifiers (MessageID) of the messages that contain the word.

Page 49: Presentation of Apache Cassandra

Facebook Inbox Search (2)

For the interactions:

• UserID → key

• Recipient IDs → super columns

• Columns within the super columns → MessageIDs

• Cassandra provides certain hooks for intelligent caching of data
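The two column families described on these slides can be pictured as nested maps. The following is only an in-memory sketch of that layout, not Facebook's implementation: the names `term_index`, `interaction_index`, `index_message` and the whitespace tokenizer are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical stand-ins for the two column families:
# row key (UserID) -> super column name -> {column name (MessageID): timestamp}
term_index = defaultdict(lambda: defaultdict(dict))
interaction_index = defaultdict(lambda: defaultdict(dict))


def index_message(user_id, message_id, other_party_id, text, timestamp):
    """Record one message in both per-user indexes."""
    for word in set(text.lower().split()):          # naive tokenizer (assumption)
        term_index[user_id][word][message_id] = timestamp
    interaction_index[user_id][other_party_id][message_id] = timestamp


def newest_first(columns):
    """Return message IDs ordered by timestamp, most recent first."""
    ordered = sorted(columns.items(), key=lambda kv: kv[1], reverse=True)
    return [message_id for message_id, _ in ordered]


def term_search(user_id, word):
    return newest_first(term_index[user_id][word.lower()])


def interactions(user_id, other_party_id):
    return newest_first(interaction_index[user_id][other_party_id])


index_message("alice", "m1", "bob", "Lunch tomorrow", timestamp=100)
index_message("alice", "m2", "bob", "Lunch moved to Friday", timestamp=200)
index_message("alice", "m3", "carol", "Project update", timestamp=300)
```

Here `term_search("alice", "lunch")` walks one super column of Alice's row, and `interactions("alice", "bob")` does the same in the second column family; both return results in time-sorted order.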

Page 50: Presentation of Apache Cassandra

Inbox Search Schema

Page 51: Presentation of Apache Cassandra

Facebook Inbox Search (3)

In 2008:

• system was storing 50+ TB of data

• on a 150 node cluster

• spread out between east and west coast data centres

Performance:

Page 52: Presentation of Apache Cassandra

Facebook abandoned Cassandra for the Inbox in late 2010

Cassandra has been deployed as the backend storage system for multiple services within Facebook.

Page 53: Presentation of Apache Cassandra

Categories: Fraud Detection and Time-series data

Page 54: Presentation of Apache Cassandra

Instagram Fraud Detection

Initially, Instagram was using Redis for auditing information related to security and site integrity purposes (e.g. fighting spam, finding abusive users).

But…

• data size was growing too quickly

• high write and low read rate

• keeping the data in memory was too costly

So… Cassandra

Page 55: Presentation of Apache Cassandra

Instagram Fraud Detection (1)

• Started with 3 nodes and very soon grew to a 12-node cluster.

• No need to store very large instances in memory → put everything on disks.

“Implementing Cassandra cut our costs to the point where we were paying around a quarter of what we were paying before. Not only that, but it also freed us to just throw data at the cluster because it was much more scalable and we could add nodes whenever needed.”

- Rick Branson, Software Engineer at Instagram

Page 56: Presentation of Apache Cassandra

Instagram “Inbox”

Newsfeed or inbox part of Instagram: a feed of all the activity that would be associated with a given user’s account.

Previously in Redis, with the same (memory) limitations as in the Fraud Detection case.

Instagram’s Cassandra Cluster:

• 12 nodes on EC2 (AWS)

• 1.2 TB of data stored

• 20,000 writes/sec

• 15,000 reads/sec

Page 57: Presentation of Apache Cassandra

Category: Sensors and IoT

Page 58: Presentation of Apache Cassandra

i2O Water

Description: i2O Water helps utility companies operate more efficiently through the use of IoT, aiming at solving the water crisis.

Challenges:

• Massive volumes of time-series data (>1.5 TB and growing)

• Need for search and analysis of high velocity, streaming data in near real-time

• SQL Server (previously used) has limitations on scalability and performance

Page 59: Presentation of Apache Cassandra

i2O Water (1)

Solution: After evaluating many common NoSQL technologies, they chose Cassandra.

Why?

I. performance (50,000-60,000 writes/sec and 20,000-40,000 reads/sec instead of 0.5 writes/sec and 5 reads/sec with SQL Server)

II. easy to maintain

III. easy to upgrade

IV. ability to handle structured and unstructured real-time streaming data

V. continuous availability and reliability

VI. operationally simple to manage

Page 60: Presentation of Apache Cassandra

i2O Water (2)

Results:

I. 235 million litres of water saved per day

II. successfully handling massive volumes of data from 15,000 devices without latency or downtime

III. fault tolerance even during upgrades (99.9% availability)

Page 61: Presentation of Apache Cassandra

Category: Product Catalogs and Playlists

Page 62: Presentation of Apache Cassandra

Spotify

Description: Spotify delivers streaming music in real time to over 40 million active users (the number is growing), without interruption.

Challenges:

• PostgreSQL (previously used) and RDBMSs in general cannot deliver 100% availability

• limited scalability across data centers

• difficult to analyze massive volumes of data

Page 63: Presentation of Apache Cassandra

Spotify (1)

Solution: Cassandra.

Why?

I. high availability (due to masterless architecture)

II. stores data for the entire product catalog and key customer experience capabilities

III. multi data centre application and no single point of failure

IV. integration with Apache Spark for real time processing and analytics

Page 64: Presentation of Apache Cassandra

Spotify (2)

Results:

I. 40,000 requests/sec handled successfully and on time

II. >500 nodes across 4,000 servers in 4 data centres

III. >1.5 billion playlists created by 40 million active users and managed in real time

Page 65: Presentation of Apache Cassandra

Spotify - Data Centres (2 in the US - 2 in Europe)

Page 66: Presentation of Apache Cassandra

Category: Recommendation/Personalization Engine

Page 67: Presentation of Apache Cassandra

Netflix

Description: Netflix is the world’s leading internet television network with more than 48 million users in 40 countries.

Challenges:

• Oracle database (used until 2010) was approaching its limits on traffic and capacity

• single data centre → single point of failure

• system downtime every two weeks for schema changes

• need for reliability and flexibility for international expansion

Page 68: Presentation of Apache Cassandra

Netflix (1)

Solution: Cassandra (on the Cloud, AWS) was the clear winner of the extensive evaluation of NoSQL DB options. (Later on, Netflix migrated to DataStax Enterprise for security and production.)

Why?

I. persistent datastore, 100% uptime and cost-effective scalability

II. ability to create a cluster in any region in 10 minutes

III. expert support

Page 69: Presentation of Apache Cassandra

Netflix (2)

Results:

I. throughput of >10 million transactions/sec

II. processes >2.1 billion reads and 4.3 billion writes per day

III. delivers >76,000 genre types and captures every detail of customers’ habits for tailoring the customer experience

Page 70: Presentation of Apache Cassandra

Category: Product Catalogs and Playlists

Page 71: Presentation of Apache Cassandra

Coursera

Description: Coursera is an education platform which partners with top universities and organizations worldwide, to offer courses online for anyone to take, for free.

Challenges:

• MySQL (previously used for class interaction) was insufficient:

• unstable performance,

• unexpected downtime,

• limitation in introducing new features

Page 72: Presentation of Apache Cassandra

Coursera (1)

Solution: After evaluating emerging database technologies, it chose Cassandra (DataStax).

Why?

• 100% application uptime needed (customers from all over the world)

• Scalability (enabling storage of growing user data)

Page 73: Presentation of Apache Cassandra

Coursera (2)

Results:

I. 3 nodes on AWS in the US East region and plans to expand to multiple data centers across different regions

II. 24x7 availability to the users

III. Helps innovation

IV. Reduced time to market on new features

“High availability with reliable performance is a big win for us. With Datastax Enterprise, our customers around the world are able to take any course, anytime through our on-demand model.”

• Daniel Chia, Software Engineer at Coursera

Page 74: Presentation of Apache Cassandra

Coursera (3)

Page 75: Presentation of Apache Cassandra

Coursera (4)

Page 76: Presentation of Apache Cassandra

Coursera (5)

Page 77: Presentation of Apache Cassandra

Coursera (6)

Page 78: Presentation of Apache Cassandra

Coursera (7)

Page 79: Presentation of Apache Cassandra

Category: Messaging

Page 80: Presentation of Apache Cassandra

The Weather Channel

Description: The Weather Channel delivers breaking news to countless viewers and users from web, desktop and mobile applications.

Challenges:

• Customer experience at the center of attention (continuous availability, global and diverse users)

• New capabilities including statistics from unstructured data, CGS for customer engagement etc.

Page 81: Presentation of Apache Cassandra

The Weather Channel (1)

Solution: Cassandra

Why?

I. linear scalability

II. 100% uptime

III. supports almost all possible types of content (e.g. observations, forecasts, marine data, ads)

Page 82: Presentation of Apache Cassandra

The Weather Channel (2)

Results:

I. billions of requests per month are processed - no fear of downtime

II. node count grew from 3 to 36 on AWS in 1 year, across 3 data centers (US East, US West and Western Europe).

III. capability for new offerings (e.g. social weather)

Page 83: Presentation of Apache Cassandra

The Weather Channel (3)

Page 84: Presentation of Apache Cassandra

Technical Description, Architecture, Internals

4.

Page 85: Presentation of Apache Cassandra

Key Terms – Data Structures

• Commit Log

• Memtable

• Sorted String Table (SST)

• Bloom Filter

• Index File

Page 86: Presentation of Apache Cassandra

Key Terms

• Gossip protocol: helps each node learn about the topology of the cluster (communication and detection of faulty nodes).

• Snitch: indicates which node is closest to the current location.

Page 87: Presentation of Apache Cassandra

Log-Structured Merge-Tree (LSM-Tree)

What is it?

• A disk-based data structure designed to provide low-cost indexing for a file experiencing a high rate of record inserts.

• A simple LSM-Tree consists of two tree-like structures: C0 (in memory) and C1 (on disk).

• Maintains key-value pairs.

• In Cassandra, each value represents a row.

Used in:

• BigTable, HBase, MongoDB, SQLite, RocksDB, InfluxDB

Page 88: Presentation of Apache Cassandra

Data Model

• Each Row → Identified by a Unique Key (Primary Key)

• Keyspace → Outermost container for data (one or more column families)

• Column Family → Contains Supercolumns or Columns (but not both)

• Column → Basic data structures with: key, value, timestamp

• Supercolumn → Special column, stores a map of sub-columns. Columns that you are likely to query together should be placed in the same column family.

• Columns could be of variable number per key. For instance, key K1 could have 1024 columns/supercolumns while K2 could have 64 columns/supercolumns.

Page 89: Presentation of Apache Cassandra

Data Model (1)

• Partition key: The first column declared in the primary key. Determines which node stores the data.

• Clustering Columns: The remaining fields of the primary key, which determine the ordering of the data on disk.

• Any column within a column family is accessed using the convention: column_family: column

• For Supercolumns: column_family: super_column: column

• Values → Addressed by the triple (row-key, column-key, timestamp)

• The system allows columns to be sorted either by time or by name.

• Time sorting: exploited by applications such as FB Inbox Search where the results are always displayed in time sorted order.
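How the partition key and the clustering columns divide the work can be sketched with an in-memory map. This is an illustration under simplifying assumptions (one clustering column, no hashing of the partition key, hypothetical names `partitions`, `insert_row`, `slice_partition`), not Cassandra's storage code.

```python
import bisect
from collections import defaultdict

# partition key -> list of (clustering key, row), kept sorted by clustering key
partitions = defaultdict(list)


def insert_row(partition_key, clustering_key, row):
    """Place a row in its partition, keeping rows ordered by clustering key."""
    rows = partitions[partition_key]
    keys = [ck for ck, _ in rows]
    pos = bisect.bisect_left(keys, clustering_key)
    if pos < len(rows) and rows[pos][0] == clustering_key:
        rows[pos] = (clustering_key, row)        # same primary key: overwrite
    else:
        rows.insert(pos, (clustering_key, row))


def slice_partition(partition_key, start, end):
    """Return rows whose clustering key lies in [start, end], in stored order."""
    return [row for ck, row in partitions[partition_key] if start <= ck <= end]


# Sensor readings: partition key = sensor id, clustering key = reading time.
insert_row("sensor-1", 30, {"temp": 21})
insert_row("sensor-1", 10, {"temp": 19})
insert_row("sensor-1", 20, {"temp": 20})
```

All readings of `sensor-1` live together and stay ordered by time regardless of insertion order, which is why time-ordered slices such as `slice_partition("sensor-1", 15, 30)` are cheap.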

Page 90: Presentation of Apache Cassandra

Data Model (2)

Page 91: Presentation of Apache Cassandra

Data Model (3)

Page 92: Presentation of Apache Cassandra

Relational Schema vs Cassandra

Page 93: Presentation of Apache Cassandra

SYSTEM ARCHITECTURE

Page 94: Presentation of Apache Cassandra

Introduction

The architecture of a storage system that needs to operate in a production setting is complex.

We will focus on the core distributed systems techniques used in Cassandra:

I. Partitioning

II. Replication

III. Membership

IV. Failure Handling

V. Scaling

All these modules work in synchrony to handle read/write requests.

Page 95: Presentation of Apache Cassandra

Partitioning

Offers the ability to scale incrementally.

How?

• Dynamically partition the data over the set of nodes in the cluster.

• Consistent hashing (order preserving hash function).

• Output range: a ring.

• Each node: is assigned a random value which determines its place on the ring.

Page 96: Presentation of Apache Cassandra

Partitioning (1)

• Each data item: is assigned to a node by hashing its key to yield its position on the ring and then walking the ring clockwise to find the first node with a position larger than the item’s position.

• Each node becomes responsible for the region in the ring between it and its predecessor node on the ring.

• Departure or arrival of a node only affects the immediate neighbours.
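The ring walk described above can be sketched in a few lines. This is a minimal illustration, not Cassandra's partitioner: the class name `Ring`, the 32-bit ring size and the use of MD5 are assumptions, and node positions are passed in explicitly so the behaviour is easy to follow.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # assumed size of the hash output range


def ring_position(key):
    """Hash a string key to a position on the ring (MD5 is an assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


class Ring:
    def __init__(self):
        self._positions = []   # sorted node positions
        self._nodes = {}       # position -> node name

    def add_node(self, name, position):
        bisect.insort(self._positions, position)
        self._nodes[position] = name

    def owner_of_position(self, position):
        """Walk clockwise to the first node positioned after `position`."""
        idx = bisect.bisect_right(self._positions, position)
        if idx == len(self._positions):
            idx = 0            # wrap around the ring
        return self._nodes[self._positions[idx]]

    def owner(self, key):
        return self.owner_of_position(ring_position(key))


ring = Ring()
ring.add_node("A", 100)
ring.add_node("B", 200)
ring.add_node("C", 300)
```

With these positions, an item at position 150 belongs to B and an item at 350 wraps around to A. Adding a node D at 250 only takes over part of C's range (positions 201 to 250); items owned by A and B are unaffected.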

Page 97: Presentation of Apache Cassandra

Partitioning (2)

Challenges:

I. Random positioning of each node leads to non-uniform data and load distribution.

II. The basic algorithm is oblivious to the heterogeneity in the performance of nodes.

Addressed by:

Analysing load information on the ring and having lightly loaded nodes move on the ring to alleviate heavily loaded ones.

Page 98: Presentation of Apache Cassandra

Partitioning (3)

Page 99: Presentation of Apache Cassandra

Node: Storage layer within a server

Before:

● 1 server/machine (machine: physical server or EC2 instance-AWS)

● 1 node/server (server: an installation of Cassandra)

Now:

● 256 vnodes/server (virtual nodes)

Vnodes or Virtual Nodes or Tokens:

Define the sections of the ring (token ranges) the node will become responsible for

Why?

Recovery is much easier and faster in case of a node failure
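One way to see why many tokens per server help after a failure: with a single token, the whole range of a failed server falls to one neighbour, while with many tokens its ranges are spread over the other servers. The sketch below is only a simulation under assumptions (random token assignment, a 32-bit ring, hypothetical helper names `build_ring` and `inheritors_of`, at least two servers).

```python
import random
from collections import Counter

RING_SIZE = 2 ** 32  # assumed token space


def build_ring(servers, tokens_per_server, seed=0):
    """Give each server `tokens_per_server` random tokens; return sorted (token, server) pairs."""
    rng = random.Random(seed)
    tokens = rng.sample(range(RING_SIZE), len(servers) * tokens_per_server)
    return sorted((token, servers[i % len(servers)]) for i, token in enumerate(tokens))


def inheritors_of(ring, failed):
    """Count, per surviving server, how many ranges of the failed server it takes over."""
    inherits = Counter()
    n = len(ring)
    for i, (_, server) in enumerate(ring):
        if server != failed:
            continue
        j = (i + 1) % n
        while ring[j][1] == failed:     # skip neighbouring tokens of the failed server
            j = (j + 1) % n
        inherits[ring[j][1]] += 1
    return inherits


servers = ["s1", "s2", "s3", "s4"]
single_token = inheritors_of(build_ring(servers, 1), "s1")
with_vnodes = inheritors_of(build_ring(servers, 256), "s1")
```

`single_token` contains exactly one server, which has to absorb everything, whereas `with_vnodes` spreads the 256 ranges of `s1` over the surviving servers.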

Page 100: Presentation of Apache Cassandra

Virtual Nodes (version >=1.2)

Page 101: Presentation of Apache Cassandra

Replication

Used to achieve high availability and durability.

How?

• Replication factor: determines how many copies of your data exist.

• Each data item: is replicated at N hosts (N=replication factor).

• Coordinator node: in charge of the replication of the data items that fall within its range.

• Consistency level: refers to how up-to-date and synchronized a row is across all of its replicas, e.g. quorum → floor(replication_factor / 2) + 1.

• Various replication policies: Rack Unaware, Rack Aware and Datacentre Aware.

• Each row is replicated across multiple datacentres which are connected through high speed network links.
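The quorum arithmetic and the simplest placement policy can be written down directly. This is a sketch with hypothetical function names; `replicas_for` models only the Rack Unaware policy (the coordinator plus its N-1 successors on the ring), and `reads_see_latest_write` encodes the usual overlap rule R + W > N.

```python
def quorum(replication_factor):
    """Number of replicas that must respond for a quorum."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


def replicas_for(ring_nodes, coordinator_index, replication_factor):
    """Rack Unaware placement: the coordinator plus its next N-1 successors on the ring."""
    n = len(ring_nodes)
    count = min(replication_factor, n)
    return [ring_nodes[(coordinator_index + i) % n] for i in range(count)]
```

For a replication factor of 3, `quorum(3)` is 2, so quorum reads combined with quorum writes overlap in at least one replica; reading and writing at a single replica each does not give that guarantee.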

Page 102: Presentation of Apache Cassandra

Replication - Rack Unaware

Page 104: Presentation of Apache Cassandra

Replication - Zookeeper

• Cassandra elects a leader amongst its nodes using Zookeeper.

• All nodes on joining the cluster contact the leader, who tells them which ranges they are replicas for.

• Leader tries to maintain the invariant that no node is responsible for more than N-1 ranges in the ring.

• Metadata about the ranges a node is responsible for is 1) cached locally at each node and 2) stored in a fault-tolerant manner inside Zookeeper.

• This way, a node that crashes and comes back knows what ranges it was responsible for.

Page 105: Presentation of Apache Cassandra

Replication - Zookeeper (1)

Page 106: Presentation of Apache Cassandra

Membership

Based on Scuttlebutt, a very efficient anti-entropy Gossip-based mechanism.

Benefits:

I. Efficient CPU utilization.

II. Efficient utilization of the Gossip Channel.

Gossip: a P2P communication protocol to discover and share location and state information about the other nodes in a Cassandra cluster. Gossip information is also persisted locally by each node to use immediately when a node restarts.

Page 107: Presentation of Apache Cassandra

Gossip

Page 108: Presentation of Apache Cassandra

Gossip (1)

Page 109: Presentation of Apache Cassandra

Gossip (2)

Page 110: Presentation of Apache Cassandra

Gossip (3)

Page 111: Presentation of Apache Cassandra

Membership - Failure Detection

Every node can locally determine if any other node in the system is up or down.

Used to avoid attempts to communicate with unreachable nodes.

How?

• Makes use of the Φ Accrual Failure Detector (emits a value which represents a suspicion level for each of the monitored nodes)

• With Φ=1, likelihood of mistake: 10%

• With Φ=2, likelihood of mistake: 1%

• and so on…
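The 10% and 1% figures follow from reading Φ as a negative base-10 logarithm: suspecting a node at level Φ is a mistake with likelihood 10^(-Φ). The sketch below shows that relation, plus one possible way to derive Φ from heartbeat silence; the exponential model of heartbeat inter-arrival times and the function names are assumptions for illustration.

```python
import math


def mistake_probability(phi):
    """Likelihood that suspecting a node at this phi level is a mistake."""
    return 10.0 ** (-phi)


def phi_from_silence(seconds_since_last_heartbeat, mean_interval):
    """Phi under an assumed exponential distribution of heartbeat inter-arrival times."""
    # Probability that the next heartbeat still arrives later than the silence so far.
    p_later = math.exp(-seconds_since_last_heartbeat / mean_interval)
    return -math.log10(p_later)
```

Because Φ grows continuously with the length of the silence, each application can pick its own threshold instead of relying on a fixed up/down timeout.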

Page 112: Presentation of Apache Cassandra

Bootstrapping (adding a new node to the cluster)

Process of getting data from other nodes in the ring for a new node that starts for the first time.

How?

• When the new node enters the cluster, it chooses a random token for its position in the ring.

• It also reads its configuration file which contains the seeds (initial contact points) of the cluster.

• Token information is then gossiped around the cluster enabling any node to route a request for a key to the correct node.

Page 113: Presentation of Apache Cassandra

Bootstrapping (adding a new node to the cluster) (1)

In Facebook’s environment…

• Node outages are often transient but may last for extended intervals.

• Failures can be of various forms such as disk failures, bad CPU, etc.

• A node failure rarely signifies a permanent departure and therefore should not result in re-balancing of the partition assignment.

• Manual error could result in the unintentional startup of new nodes.

• To that effect, every message contains the cluster name of each Cassandra instance.

• An admin uses a command-line tool or a browser to connect to a Cassandra node and issue a membership change to join or leave a cluster.

Page 114: Presentation of Apache Cassandra

Scaling the Cluster

Adding a new node to the system in order to alleviate another heavily loaded node.

How?

• Gets assigned a token.

• Splits the responsibility range of the other node.

• Data are streamed between the nodes using kernel-kernel copy techniques.

• Data are transferred at approximately the rate of 40 MB/sec.

Page 115: Presentation of Apache Cassandra

Local Persistence

• Cassandra relies on the local file system for data persistence.

• The data is represented on disk using a format that lends itself to efficient data retrieval.

Page 116: Presentation of Apache Cassandra

Implementation Details

The Cassandra process on a single machine primarily consists of:

I. A partitioning module,

II. The cluster membership and failure detection module,

III. The storage engine module.

Each of these modules has been implemented from the ground up using Java.

Module II is built on top of a network layer which uses non-blocking I/O.

Application-related messages for replication and request routing rely on TCP.

Page 117: Presentation of Apache Cassandra

Implementation Details (1)

The request routing modules are implemented using a state machine.

When a read/write request arrives at any node in the cluster, the state machine…

I. Identifies the node(s) that own the data for the key

II. Routes the request to the nodes and waits for the responses to arrive

III. If the replies do not arrive within a configured timeout value, fails the request

IV. Figures out the latest response based on a timestamp

V. Schedules a repair of the data at any replica that does not have the latest piece of data.
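For a read, the five steps can be sketched as a single function. This is an in-memory illustration under assumptions: `Replica`, `Versioned` and `coordinate_read` are hypothetical names, an unreachable replica stands in for a reply that misses the timeout, and the repair is applied immediately rather than scheduled in the background.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Versioned:
    value: str
    timestamp: int


class Replica:
    """Hypothetical in-memory replica; up=False simulates a missed timeout."""

    def __init__(self, name: str, up: bool = True):
        self.name = name
        self.up = up
        self.data: Dict[str, Versioned] = {}

    def read(self, key: str) -> Optional[Versioned]:
        return self.data.get(key)


def coordinate_read(key: str, replicas: List[Replica], required: int) -> str:
    # Step I is assumed done: `replicas` are the nodes that own the key.
    # Steps II-III: collect replies; too few replies fails the request.
    replies = [(r, r.read(key)) for r in replicas if r.up]
    if len(replies) < required:
        raise TimeoutError("not enough replicas answered in time")
    # Step IV: pick the latest response by timestamp.
    found = [version for _, version in replies if version is not None]
    if not found:
        raise KeyError(key)
    latest = max(found, key=lambda version: version.timestamp)
    # Step V: repair replicas that returned stale or missing data.
    for replica, version in replies:
        if version is None or version.timestamp < latest.timestamp:
            replica.data[key] = latest
    return latest.value


a, b, c = Replica("a"), Replica("b"), Replica("c", up=False)
a.data["k"] = Versioned("old", 1)
b.data["k"] = Versioned("new", 2)
```

Calling `coordinate_read("k", [a, b, c], required=2)` returns the newer value and brings replica `a` up to date, while requiring all three replicas fails because `c` never answers.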

Page 118: Presentation of Apache Cassandra

No coordination at all?

“We have learnt that having some amount of coordination is essential to making the implementation of some distributed features tractable”

-Lakshman & Malik

• Integration with Zookeeper → can be used for various tasks in large scale distributed systems.

Page 119: Presentation of Apache Cassandra

WRITE/READ REQUESTS

Page 120: Presentation of Apache Cassandra

Write Request Flow

Page 121: Presentation of Apache Cassandra

Write Request Flow (1)

Page 122: Presentation of Apache Cassandra

Inside the Node (1)

Page 123: Presentation of Apache Cassandra

Inside the Node (2)

Page 124: Presentation of Apache Cassandra

Inside the Node (3)

Page 125: Presentation of Apache Cassandra

Inside the Node (4)

Page 126: Presentation of Apache Cassandra

Write Request Flow in short

Page 127: Presentation of Apache Cassandra

In case of a Node Failure...

I. The coordinator stores a hint locally, with a specified time to live

II. When the failed node is available again, the hinted write operation is sent to it
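The two steps of this mechanism (hinted handoff) can be sketched as follows. The classes `Node` and `Coordinator`, the explicit `now` argument and the integer time-to-live are illustrative assumptions, not Cassandra's implementation.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class Coordinator:
    def __init__(self, hint_ttl):
        self.hint_ttl = hint_ttl   # how long a hint stays valid (same unit as `now`)
        self.hints = []            # (target node, key, value, time the hint was stored)

    def write(self, target, key, value, now):
        if target.up:
            target.data[key] = value
        else:
            # Step I: keep the write locally as a hint.
            self.hints.append((target, key, value, now))

    def replay_hints(self, now):
        remaining = []
        for target, key, value, stored_at in self.hints:
            if now - stored_at > self.hint_ttl:
                continue                       # hint expired: drop it
            if target.up:
                target.data[key] = value       # Step II: deliver the hinted write
            else:
                remaining.append((target, key, value, stored_at))
        self.hints = remaining


node = Node("n1")
coordinator = Coordinator(hint_ttl=100)
node.up = False
coordinator.write(node, "k", "v", now=0)
```

While `n1` is down, the write lives only in `coordinator.hints`; once the node is back, `coordinator.replay_hints(now=50)` delivers it. A hint older than the time to live is discarded instead.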

Page 128: Presentation of Apache Cassandra

How is a Memtable flushed to disk?

• A background thread keeps checking the size of all the Memtables while the clients keep writing to the cluster

• If one of the following conditions is met, a new Memtable is created and the previous one is marked for flushing:

i. the node’s global memory thresholds have been reached,

ii. the commit log is full,

iii. a table-level interval has been reached

Page 129: Presentation of Apache Cassandra

How is a Memtable flushed to disk? (1)

• Another thread (or multiple threads) flushes all the marked Memtables to disk.

• The commit log segments corresponding to the entries of the flushed Memtable are marked for recycling.

• A bloom filter and an index file are created.
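The write path and the flush can be mimicked with a toy table. This sketch makes several simplifying assumptions: the flush is triggered by a hypothetical key-count limit rather than the three conditions listed earlier, it runs inline instead of on a background thread, and the bloom filter and index file are left out.

```python
class Table:
    """Toy write path: commit log -> Memtable -> immutable, sorted SSTables."""

    def __init__(self, memtable_limit):
        self.memtable_limit = memtable_limit   # hypothetical threshold (number of keys)
        self.commit_log = []
        self.memtable = {}
        self.sstables = []   # oldest first; each is a list of (key, value) sorted by key

    def write(self, key, value):
        self.commit_log.append((key, value))   # logged first for durability
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if not self.memtable:
            return
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}       # a fresh Memtable takes new writes
        self.commit_log.clear()  # log entries covered by the flush can be recycled

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):   # newest SSTable first
            for k, v in sstable:
                if k == key:
                    return v
        return None


table = Table(memtable_limit=2)
table.write("b", 1)
table.write("a", 2)   # reaches the limit and triggers a flush
table.write("a", 3)   # lands in the new Memtable
```

After these writes there is one SSTable holding `a` and `b` in key order, and the newer value of `a` is served from the Memtable, which is why reads check the Memtable before the SSTables.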

Page 130: Presentation of Apache Cassandra

Compaction

When the number of SSTables grows, Cassandra automatically merges multiple SSTables, based on the algorithm specified in the compaction strategy.

• Optimizes read requests.
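The core of a merge can be sketched as keeping, for every key, the version with the newest timestamp. This is not any of Cassandra's actual compaction strategies; the tuple layout and the function name `compact` are assumptions, and deletions (tombstones) are ignored.

```python
def compact(sstables):
    """Merge SSTables into one, keeping the newest version of each key.

    Each SSTable is a list of (key, value, timestamp) tuples sorted by key.
    """
    newest = {}
    for sstable in sstables:
        for key, value, timestamp in sstable:
            if key not in newest or timestamp > newest[key][1]:
                newest[key] = (value, timestamp)
    # Emit one sorted table again, so the result is itself a valid SSTable.
    return [(key, value, ts) for key, (value, ts) in sorted(newest.items())]


older = [("a", "1", 1), ("b", "1", 1)]
newer = [("a", "2", 2), ("c", "1", 2)]
merged = compact([older, newer])
```

A read that previously had to look in both tables for `a` now finds a single, up-to-date entry in `merged`, which is how compaction helps read requests.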

Page 131: Presentation of Apache Cassandra

Compaction

Page 132: Presentation of Apache Cassandra

Read Request Flow

Page 133: Presentation of Apache Cassandra

Read Request Flow (1)

Page 134: Presentation of Apache Cassandra

Installation, Usage, Requirements, Platforms

5.

Page 135: Presentation of Apache Cassandra

Client Interfaces and Language Support

• CQL (Cassandra Query Language) and Thrift

• Internal API: StorageProxy API available to JVM-based clients (internal use, highly specialized use-cases)

• Spark

• Hadoop (Map/Reduce jobs)

• Client Libraries for: Python, Java, .Net, Ruby, PHP, Perl, C++ etc.

Page 136: Presentation of Apache Cassandra

APIs

The Cassandra API consists of the following three simple methods:

• insert(table, key, rowMutation)

• get(table, key, columnName)

• delete(table, key, columnName)
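
To make the shape of these three calls concrete, here is a toy in-memory stand-in written in Python. It is not the Cassandra client API: the class name `ToyStore` is hypothetical, and `row_mutation` is assumed to be a dict mapping column names to values.

```python
class ToyStore:
    def __init__(self):
        # table name -> row key -> column name -> value
        self.tables = {}

    def insert(self, table, key, row_mutation):
        # Apply all column changes in row_mutation to the row under `key`.
        row = self.tables.setdefault(table, {}).setdefault(key, {})
        row.update(row_mutation)

    def get(self, table, key, column_name):
        # Returns None when the table, row or column does not exist.
        return self.tables.get(table, {}).get(key, {}).get(column_name)

    def delete(self, table, key, column_name):
        # Removing a missing column is a no-op.
        self.tables.get(table, {}).get(key, {}).pop(column_name, None)
```

For example, `insert("users", "u1", {"name": "Ann"})` followed by `get("users", "u1", "name")` returns the stored name, and after `delete("users", "u1", "name")` the same `get` returns `None`.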

Page 137: Presentation of Apache Cassandra

Thrift to CQL - An ongoing transition since 2012

Where the legacy Thrift API exposes the internal storage structure of Cassandra pretty much directly, CQL provides a thin abstraction layer over this internal structure.

Page 138: Presentation of Apache Cassandra

CQL

• The primary language for communicating with the Cassandra database.

• Most basic way to interact with Cassandra is using the CQL shell, cqlsh.

• Syntax very similar to SQL.

• Does not support creation of supercolumns.

Page 139: Presentation of Apache Cassandra

Cassandra as a Cloud Database

Meets all the requirements of a Cloud Database:

• Transparent elasticity

• Transparent scalability

• High availability

• Security

• Easy data distribution

• Data redundancy

• Support for all data formats

• Low cost

• Simple manageability

Page 140: Presentation of Apache Cassandra

Integration with other tools

BI Tools:

• MS Excel

• Pentaho

• Tableau

• Jaspersoft

• Talend

Page 141: Presentation of Apache Cassandra

Monitoring Cassandra

• Integration with Ganglia (distributed performance tool).

• Several system-level metrics have been exposed to Ganglia.

• Helps in understanding the system’s behavior in production conditions.

Page 143: Presentation of Apache Cassandra

Steps

Page 144: Presentation of Apache Cassandra

Creating a Virtual Machine

Requirements:

• Install Ubuntu Server 12.04 LTS 64-bit OS (or any Linux system with Linux kernel 2.6.x or later)

• Update the OS

• sudo apt-get update (an Internet connection is mandatory)

Page 145: Presentation of Apache Cassandra

Installing Virtual Machine

• Download and install VirtualBox

• Steps:

Page 147: Presentation of Apache Cassandra

Steps

• Open a terminal window

• Navigate to the Cassandra folder (via the cd command) and then to the bin directory, e.g. robinsmac:dev robin$ cd dsc-cassandra-1.2.2/bin

• Start the CQL shell in the terminal, e.g. robinsmac:bin robin$ ./cqlsh

• Your terminal window should look like this

Page 149: Presentation of Apache Cassandra

Steps

• Start Cassandra in foreground mode

• cd home/virtualmachine_name/cassandra/apache-cassandra-2.0.14-bin

• bin/cassandra -f

• Test Cassandra

• bin/cqlsh

Page 150: Presentation of Apache Cassandra

Steps (1)

• Untar Cassandra

• cd cassandra

• tar -xvf apache-cassandra-2.0.14-bin.tar

• Create necessary directories and change ownership

• sudo mkdir /var/lib/cassandra

• sudo mkdir /var/log/cassandra

• sudo chown -R $USER:$GROUP /var/lib/cassandra

• sudo chown -R $USER:$GROUP /var/log/cassandra

Page 151: Presentation of Apache Cassandra

Ubuntu commands

Terminal commands

Result

Page 152: Presentation of Apache Cassandra

Demo

6.

Page 153: Presentation of Apache Cassandra
Page 154: Presentation of Apache Cassandra
Page 155: Presentation of Apache Cassandra
Page 156: Presentation of Apache Cassandra

OpsCenter

Page 157: Presentation of Apache Cassandra

OpsCenter (1)

Page 158: Presentation of Apache Cassandra

OpsCenter (2)

Page 159: Presentation of Apache Cassandra

References

7.

Page 160: Presentation of Apache Cassandra

Main Reference

Page 161: Presentation of Apache Cassandra

References

1. Lakshman, A. and Malik, P. (2010). Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, 44(2), pp.35-40.

2. Cassandra.apache.org. (2016). Apache Cassandra. [online] Available at: http://cassandra.apache.org/

3. Cattell, R. (2011). Scalable SQL and NoSQL data stores. ACM SIGMOD Record, 39(4), p.12.

4. Cockcroft, A. (2011). Benchmarking Cassandra Scalability on AWS - Over a million writes per second. [online] Techblog.netflix.com. Available at: http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

5. Cs.uwaterloo.ca. (2016). [online] Available at: https://cs.uwaterloo.ca/~tozsu/courses/CS848/W15/presentations/Cassandra.pdf

6. Chang, F., Dean, J., Ghemawat, S., Hsieh, W., Wallach, D., Burrows, M., Chandra, T., Fikes, A. and Gruber, R. (2008). Bigtable. ACM Transactions on Computer Systems, 26(2), pp.1-26.

7. DataStax. (2016). Case Studies. [online] Available at: http://www.datastax.com/resources/casestudies

Page 162: Presentation of Apache Cassandra

References (1)

8. Docs.datastax.com. (2016). About hinted handoff writes. [online] Available at: https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_about_hh_c.html

9. DataStax. (2016). Customers. [online] Available at: http://www.datastax.com/customers

10. Docs.datastax.com. (2016). Introduction to Cassandra Query Language. [online] Available at: https://docs.datastax.com/en/cql/3.1/cql/cql_intro_c.html

11. DataStax. (2014). What on earth are people using Cassandra for anyway?. [online] Available at: http://www.datastax.com/2014/06/what-are-people-using-cassandra-for

12. DataStax. (2012). A thrift to CQL3 upgrade guide. [online] Available at: http://www.datastax.com/dev/blog/thrift-to-cql3

13. DataStax. (2012). Virtual nodes in Cassandra 1.2. [online] Available at: http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2

14. DataStax. (2012). Schema in Cassandra 1.1. [online] Available at: http://www.datastax.com/dev/blog/schema-in-cassandra-1-1

Page 163: Presentation of Apache Cassandra

References (2)

15. DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P. and Vogels, W. (2007). Dynamo. ACM SIGOPS Operating Systems Review, 41(6), p.205.

16. Docs.datastax.com. (2016). Architecture in brief. [online] Available at: https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureIntro_c.html

17. Docs.datastax.com. (2016). How data is distributed across a cluster (using virtual nodes). [online] Available at: http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureDataDistributeDistribute_c.html

18. Docs.datastax.com. (2016). Internode communications (gossip). [online] Available at: https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureGossipAbout_c.html

19. D0.awsstatic.com. (2016). [online] Available at: https://d0.awsstatic.com/whitepapers/Cassandra_on_AWS.pdf

20. Edlich, P. (2016). NOSQL Databases. [online] Nosql-database.org. Available at: http://nosql-database.org/

Page 164: Presentation of Apache Cassandra

References (3)

21. Edu.dmst.aueb.gr. (2016). Πύλη Τηλεκπαίδευσης Τμήματος Διοικητικής Επιστήμης & Τεχνολογίας: Είσοδος στο δικτυακό τόπο [E-learning Portal of the Department of Management Science & Technology: Site login]. [online] Available at: https://edu.dmst.aueb.gr/pluginfile.php/3614/mod_resource/content/0/BigDataSystems.pdf

22. En.wikipedia.org. (2016). Apache Cassandra. [online] Available at: https://en.wikipedia.org/wiki/Apache_Cassandra

23. En.wikipedia.org. (2016). DataStax. [online] Available at: https://en.wikipedia.org/wiki/DataStax

24. En.wikipedia.org. (2016). Log-structured merge-tree. [online] Available at: https://en.wikipedia.org/wiki/Log-structured_merge-tree

25. Exponential.io. (2016). Cassandra terminology - Exponential.io. [online] Available at: http://exponential.io/blog/2015/01/08/cassandra-terminology/

Page 165: Presentation of Apache Cassandra

References (4)

26. Facebook.com. (2016). Cassandra – A structured storage system on a P2P Network. [online] Available at: https://www.facebook.com/notes/facebook-engineering/cassandra-a-structured-storage-system-on-a-p2p-network/24413138919/

27. O'Neil, P., Cheng, E., Gawlick, D. and O'Neil, E. (2016). The Log-Structured Merge-Tree (LSM-Tree). [online] Citeseerx.ist.psu.edu. Available at: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.44.2782

28. YouTube. (2016). Getting Started with Cassandra CQL on a Mac. [online] Available at: https://www.youtube.com/watch?v=9zQc959w6Ho

29. YouTube. (2016). Installing Apache Cassandra In Windows. [online] Available at: https://www.youtube.com/watch?v=fspXzjwfii0

30. YouTube. (2016). Part 1 - Apache Cassandra Installation From Scratch - Ubuntu. [online] Available at: https://www.youtube.com/watch?v=ToztU48UxYE

Page 166: Presentation of Apache Cassandra

References (5)

31. Weinberger, M. (2016). The Facebook engineer who taught its data how to dance is solving a new complicated problem. [online] Business Insider. Available at: http://www.businessinsider.com/hedvig-avinash-lakshman-facebook-cassandra-data-storage-2015-3

32. Wiki.apache.org. (2016). FrontPage - Cassandra Wiki. [online] Available at: https://wiki.apache.org/cassandra/

33. www.tutorialspoint.com. (2016). Cassandra Introduction. [online] Available at: https://www.tutorialspoint.com/cassandra/cassandra_introduction.htm