About "Apache Cassandra"


  • APACHE CASSANDRA: Scalability, Performance and Fault Tolerance in Distributed Databases

    Jihyun.An (jihyun.an@kt.com)

    18 June 2013

  • TABLE OF CONTENTS

    Preface

    Basic Concepts

    P2P Architecture

    Primitive Data Model & Architecture

    Basic Operations

    Fault Management

    Consistency

    Performance

    Problem handling

  • TABLE OF CONTENTS (NEXT TIME)

    Maintaining

    Cluster Management

    Node Management

    Problem Handling

    Tuning

    Playing (for development, from the client's point of view)

    Designing

    Client

    Thrift

    Native

    CQL

    3rd party

    Hector

    OCM

    Extension

    Baas.io

    Hadoop

  • PREFACE

  • OUR WORLD

    Traditional DBMSs are very valuable

    Storage (and memory) and computational resources are cheaper than before

    But we face new challenges:

    Big data

    (Near) real-time processing

    Complex and varied requirements

    Recommendation

    Finding FOAF (friend-of-a-friend) relations

    Event-driven triggering

    User sessions

  • OUR WORLD (CONT)

    Complex applications combine different types of problems

    Using a different language for each problem is more productive

    ex: functional languages, languages optimized for multiprocessing

    Polyglot persistence layer

    Performance vs Durability?

    Reliability?

  • TRADITIONAL DBMS

    Relational Model

    Well-defined Schema

    Access with Selection/Projection

    Derived data from joining / grouping / aggregating (counting, ...)

    Suited to small, refined data

    But

    Painful data model changes

    Hard to scale out

    Ineffective in handling large volumes of data

    Not designed with the underlying hardware in mind

  • TRADITIONAL DBMS (CONT)

    Enforces many constraints for ACID

    PK/FK constraints & checking

    Domain type checking

    ... and more checking

    Lots of IO / processing

    OODBMS, ORDBMS

    Good, but even more checking / processing

    Do not work well with disk IO

  • NOSQL

    Key-value store

    Column: Cassandra, HBase, Bigtable

    Others: Redis, Dynamo, Voldemort, Hazelcast

    Document oriented

    MongoDB, CouchDB

    Graph store

    Neo4j, OrientDB, BigOWL, FlockDB ...

  • NOSQL (CONT)

    Benefits

    Higher performance

    Higher scalability

    Flexible Datamodel

    More effective for some use cases

    Less administrative overhead

    Drawbacks

    Limited transactions

    Relaxed consistency

    Unconstrained data

    Limited ad-hoc query capabilities

    Limited administrative tools

  • CAP

    Brewer's theorem

    A distributed system can provide at most two of:

    Consistency

    Availability

    Partition tolerance

    [Diagram: CAP triangle with vertices A, C and P, grouping systems as follows]

    Amazon Dynamo derivatives: Cassandra, Voldemort, CouchDB, Riak

    Neo4j, Bigtable

    Bigtable derivatives: MongoDB, HBase, Hypertable, Redis

    Relational: MySQL, MSSQL, Postgres

  • CASSANDRA

    Cassandra = Dynamo (architecture) + BigTable (data model)

    (Apache) Cassandra is a free, open-source, highly scalable, distributed database system for managing large amounts of data

    Written in Java

    Runs on the JVM

    References :

    BigTable (http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/bigtable-osdi06.pdf)

    Dynamo (http://web.archive.org/web/20120129154946/http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf)


  • DESIGN GOALS

    Simple Key/Value(Column) store

    Limited to storage

    No support for anything (aggregating, grouping) beyond basic operations (CRUD, range access)

    But extensible

    Hadoop (MR, HDFS, Pig, Hive ..)

    ESP

    Distributed Processing Interface (ex: BSP, MR)

    Baas.io

  • DESIGN GOALS (CONT)

    High availability

    Decentralized

    Any node can serve client requests

    Replication and access to replicas

    Multi-DC support

    Eventual consistency

    Lower write complexity

    Audit and repair on read

    Tunable: trade-offs between consistency, durability and latency
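The tunable trade-off above is usually reasoned about with replica counts: with N replicas, a write acknowledged by W of them and a read that consults R of them are guaranteed to share at least one replica when R + W > N. The minimal Python sketch below illustrates that rule together with the "repair on read" idea: the coordinator returns the reply with the newest timestamp and lists the replicas that hold older data. The names `Cell`, `quorum`, `is_strongly_consistent` and `coordinate_read` are hypothetical helpers, not Cassandra APIs, and ties between equal timestamps are not handled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # client-supplied write timestamp


def quorum(n: int) -> int:
    """Majority of n replicas."""
    return n // 2 + 1


def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Every read set of size r overlaps every write set of size w iff r + w > n."""
    return r + w > n


def coordinate_read(replies: dict[str, Cell]) -> tuple[Cell, list[str]]:
    """Return the newest cell among the replica replies, plus the replicas
    that would receive a read repair because they hold an older timestamp."""
    newest = max(replies.values(), key=lambda cell: cell.timestamp)
    stale = sorted(node for node, cell in replies.items()
                   if cell.timestamp < newest.timestamp)
    return newest, stale
```

For example, with N = 3 and QUORUM (2) for both reads and writes, 2 + 2 > 3, so a read sees the latest acknowledged write; ONE for both (1 + 1) gives up that guarantee in exchange for lower latency.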

  • DESIGN GOALS (CONT)

    Incremental scalability

    Equal members

    Linear scalability

    Unlimited space

    Write / read throughput increases linearly as nodes (members) are added

    Low total cost

    Minimize administrative work

    Automatic partitioning

    Flush / compaction

    Data balancing / moving

    Virtual nodes (since v1.2)

    Mid-range nodes deliver good performance

    Working together, they provide high aggregate performance and huge storage space
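Automatic partitioning and virtual nodes rest on a token ring: each node owns several tokens, a key is hashed to a token, and its replicas are the next distinct nodes found walking clockwise. The Python sketch below is a simplified model under stated assumptions: MD5 stands in for the partitioner's hash (Cassandra 1.2 defaults to Murmur3), the `Ring` class and the `node#i` token naming are hypothetical, and replica placement ignores racks and data centers.

```python
from __future__ import annotations

import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key onto a 128-bit ring (MD5 used here purely for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    def __init__(self, nodes: list[str], vnodes: int = 8) -> None:
        # Each physical node owns `vnodes` tokens scattered around the ring.
        self._ring = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, rf: int) -> list[str]:
        """First token at or after the key's token, then the next distinct
        nodes clockwise until `rf` replicas are found (SimpleStrategy-like)."""
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        chosen: list[str] = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == rf:
                    break
        return chosen
```

Because adding a node only inserts new tokens, each key either keeps its primary owner or moves to the new node; with many virtual nodes per member, the moved ranges come from all existing nodes rather than from a single neighbour.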

  • FOUNDER & HISTORY

    Founder

    Avinash Lakshman (one of the authors of Amazon's Dynamo)

    Prashant Malik (Facebook engineer)

    Developer

    About 50

    History

    Open sourced by Facebook in July 2008

    Became an Apache Incubator project in March 2009

    Graduated to a top-level project in Feb 2010

    0.6 released (added support for integrated caching, and Apache Hadoop MapReduce) in Apr 2010

    0.7 released (added secondary indexes and online schema change) in Jan 2011

    0.8 released (added the Cassandra Query Language (CQL), self-tuning memtables, and support for zero-downtime upgrades) in Jun 2011

    1.0 released (added integrated compression, leveled compaction, and improved read performance) in Oct 2011

    1.1 released (added self-tuning caches, row-level isolation, and support for mixed SSD/spinning-disk deployments) in Apr 2012

    1.2 released (added clustering across virtual nodes, inter-node communication, atomic batches, and request tracing) in Jan 2013

  • PROMINENT USERS

    User | Cluster size | Node count | Usage | Now
    Facebook | >200 | ? | Inbox search | Abandoned, moved to HBase
    Cisco WebEx | ? | ? | User feed, activity | OK
    Netflix | ? | ? | Backend | OK
    Formspring | ? | ? | Social-graph data (26 million accounts, 10 million responses per day) | OK

    Others: Urban Airship, Rackspace, OpenX, Twitter (preparing to move to Cassandra)

  • BASIC CONCEPTS

  • P2P ARCHITECTURE

    All nodes are equal (same role)

    No single point of failure / decentralized

    Compare with

    MongoDB

    Broker structure (CUBRID)

    Master / slave

  • P2P ARCHITECTURE (CONT)

    Drives linear scalability

    References :

    http://dev.kthcorp.com/2011/12/07/cassandra-on-aws-100-million-writ/

    http://readwrite.com/2011/11/24/netflix-benchmarks-cassandra-o

    http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

  • PRIMITIVE DATA MODEL & ARCHITECTURE

  • COLUMN

    Basic and primitive type (the smallest increment of data)

    A tuple containing a name, a value and a timestamp

    Timestamp is important

    Provided by client

    Determines which version is the most recent

    On a conflict, the database keeps the version with the latest timestamp

    [Diagram: Column = Name | Value | Timestamp]
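Since every column carries a client-supplied timestamp, reconciling two versions of the same column (from different replicas, or from a memtable and several SSTables during a read or compaction) reduces to "latest timestamp wins". A minimal Python sketch follows; `Column`, `reconcile` and `merge_rows` are illustrative names, and breaking exact timestamp ties by comparing values is an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: bytes
    timestamp: int  # supplied by the client, e.g. microseconds since the epoch


def reconcile(a: Column, b: Column) -> Column:
    """Pick the winning version of one column: the latest timestamp wins."""
    if a.name != b.name:
        raise ValueError("can only reconcile versions of the same column")
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Assumed tie-break for this sketch: the larger value wins, so every
    # replica deterministically converges on the same result.
    return a if a.value >= b.value else b


def merge_rows(*fragments: dict[str, Column]) -> dict[str, Column]:
    """Merge fragments of one row (e.g. memtable + SSTables) column by column."""
    merged: dict[str, Column] = {}
    for fragment in fragments:
        for name, column in fragment.items():
            merged[name] = reconcile(merged[name], column) if name in merged else column
    return merged
```

Because the rule only compares timestamps, fragments can be merged in any order and still produce the same row, which is what lets replicas and compaction converge without coordination.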

  • COLUMN (CONT)

    Types

    Standard: A column has a name (UUID or UTF8)

    Composite: A column has a composite name (UUID+UTF8)

    Expiring: Marked with a TTL (time to live)

    Counter: Has only a name and a value; the timestamp is managed by the server

    Super: Used to manage wide rows, inferior to using composite columns (DO NOT USE: all sub-columns are serialized together)

    [Diagram: Counter column = Name | Value; Super column = Name containing sub-columns, each Name | Value | Timestamp]
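An expiring column behaves like a standard one until its TTL elapses, after which reads treat it as deleted; the space is reclaimed later by compaction. The sketch below assumes the expiry time is the server's clock at write time (`local_write_time`, in seconds) plus the TTL, and keeps the client timestamp only for conflict resolution; the class name and its fields are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExpiringColumn:
    name: str
    value: str
    timestamp: int         # client-supplied; still used for conflict resolution
    local_write_time: int  # server clock in seconds when the write arrived (assumed)
    ttl: int               # time to live in seconds

    def expires_at(self) -> int:
        return self.local_write_time + self.ttl

    def read(self, now: int) -> Optional[str]:
        """Return the value while it is live, or None once the TTL has elapsed.
        The expired column is only physically dropped later, by compaction."""
        return self.value if now < self.expires_at() else None
```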

  • COLUMN (CONT)

    Types (CQL3 based)

    Standard: Has one primary key column.

    Composite: Has more than one primary key column; recommended for managing wide rows.

    Expiring: Deleted once its TTL expires; physically removed during compaction.

    Counter: Counts occurrences of an event.

    Super: Used to manage wide rows, inferior to using composite columns (DO NOT USE: all sub-columns are serialized together).

    DDL:

    CREATE TABLE test (
      user_id varchar,
      article_id uuid,
      content varchar,
      PRIMARY KEY (user_id, article_id)
    );
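In the DDL above, the first component of PRIMARY KEY (`user_id`) is the partition key, which decides where the data lives, and the remaining component (`article_id`) is a clustering column, which orders the cells inside that partition; this is how one user's articles form a single wide row. The toy Python model below illustrates that layout and a range slice within one partition. `WideRowTable` is hypothetical, and `article_id` is treated as a plain sortable string rather than a `uuid`.

```python
from __future__ import annotations

import bisect
from collections import defaultdict


class WideRowTable:
    """Toy model of PRIMARY KEY (user_id, article_id):
    user_id is the partition key, article_id the clustering column."""

    def __init__(self) -> None:
        # partition key -> list of (clustering key, content), kept sorted
        self._partitions: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def insert(self, user_id: str, article_id: str, content: str) -> None:
        cells = self._partitions[user_id]
        i = bisect.bisect_left(cells, (article_id,))
        if i < len(cells) and cells[i][0] == article_id:
            cells[i] = (article_id, content)  # same primary key: overwrite (upsert)
        else:
            cells.insert(i, (article_id, content))

    def select(self, user_id: str, start: str = "", end: str | None = None) -> list[tuple[str, str]]:
        """Range slice inside one partition: start <= article_id < end."""
        cells = self._partitions.get(user_id, [])
        lo = bisect.bisect_left(cells, (start,))
        hi = len(cells) if end is None else bisect.bisect_left(cells, (end,))
        return cells[lo:hi]
```

Keeping cells sorted by the clustering column is what makes a slice such as "articles of one user between two ids" a contiguous read instead of a scan across partitions.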