NOSQL Database: Apache Cassandra

  • Date posted: 15-Jul-2015
  • Category: Software

Transcript of NOSQL Database: Apache Cassandra

  • NoSQL Database: Apache Cassandra

    www.folio3.com
    @folio_3

    AG, Folio3 (Pvt) Ltd.

  • Folio3 Overview

  • Who We Are

    • We are a Development Partner for our customers
    • Design software solutions, not just implement them
    • Focus on the solution
    • Platform and technology agnostic

    Expertise in building applications that are:

    • Mobile
    • Social
    • Cloud-based
    • Gamified

  • What We Do: Areas of Focus

    • Enterprise
      • Custom enterprise applications
      • Product development targeting the enterprise
    • Mobile
      • Custom mobile apps for iOS, Android, Windows Phone, BB OS
      • Mobile platform (server-to-server) development
    • Social Media
      • CMS-based websites for consumers and enterprise (corporate, consumer, community & social networking)
      • Social media platform development (enterprise & consumer)

  • Folio3 At a Glance

    • Founded in 2005
    • Over 200 full-time employees
    • Offices in the US, Canada, Bulgaria & Pakistan

    Palo Alto, CA; Sofia, Bulgaria; Karachi, Pakistan; Toronto, Canada

  • Areas of Focus: Enterprise

    • Automating workflows
    • Cloud-based solutions
    • Application integration
    • Platform development
    • Healthcare
    • Mobile Enterprise
    • Digital Media Supply Chain

  • Some of Our Enterprise Clients

  • Areas of Focus: Mobile

    • Serious enterprise applications for Banks, Businesses
    • Fun consumer apps for app discovery, interaction, exercise gamification and play
    • Educational apps
    • Augmented Reality apps
    • Mobile Platforms

  • Some of Our Mobile Clients

  • Areas of Focus: Web & Social Media

    • Community Sites based on Content Management Systems
    • Enterprise Social Networking
    • Social Games for Facebook & Mobile
    • Companion Apps for games

  • Some of Our Web Clients

  • NoSQL Database: Apache Cassandra

  • Agenda

    • What is NOSQL?
    • Motivations for NOSQL
    • Brewer's CAP Theorem
    • Taxonomy of NOSQL databases
    • Apache Cassandra
      • Features
      • Data Model
      • Consistency
      • Operations
      • Cluster Membership
    • What does NOSQL mean for RDBMS?

  • What is NOSQL?

    • Refers to databases that differ from traditional relational database management systems (RDBMS)
    • Distributed, flexible, horizontally scalable data stores
    • Confusion with the term NOSQL
      • NOSQL != No SQL (or Anti-SQL)
      • NOSQL = Not Only SQL
    • NOSQL is an inaccurate term, since it is commonly used to refer to "non-relational" databases, but the term has stuck

  • Motivations for NOSQL

    Classical RDBMS unsuitable for today's web applications because:

    • Performance (Latency): Variable
    • Flexibility: Low
    • Scalability: Variable
    • Functionality

  • Brewer's CAP Theorem

    • Consistency (C)
    • Availability (A)
    • Partition Tolerance (P)
    • Pick any two
    • Most NOSQL databases sacrifice Consistency in favor of high Availability and Performance

  • Taxonomy of NOSQL

    • Key/Value Stores - Distributed Hash Tables (DHT)
      • Memcached, Amazon's Dynamo, Redis, PStore
    • Document Stores
      • Semi-structured data (stores entire documents)
      • CouchDB, MongoDB, RDDB, Riak
    • Graph Databases *
      • Based on graph theory
      • ActiveRDF, AllegroGraph, Neo4J
    • Object Databases *
      • Versant, Objectivity
    • Column-oriented Stores

    * These are considered soft NOSQL databases and are usually placed in the NOSQL category because they are "non-relational".

  • Column-Oriented Data Stores

    • Semi-structured column-based data stores
    • Stores each column separately, so that aggregate operations over one column of the entire table are significantly quicker than with the traditional row storage model
    • Popular examples
      • Hadoop/HBase
      • Apache Cassandra
      • Google's BigTable
      • Hypertable
      • Amazon's SimpleDB
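
    The aggregate-speed argument on this slide can be illustrated with a toy in-memory layout. This is a minimal sketch, not how any of the listed stores lay out data on disk; the sample records and the helper names `to_columns`, `average_age_rows`, and `average_age_columns` are hypothetical.

```python
# Toy comparison of row-oriented and column-oriented layouts (illustrative only).

# Row-oriented: every field of a record is stored together.
rows = [
    {"user": "alice", "age": 30, "city": "Karachi"},
    {"user": "bob", "age": 25, "city": "Sofia"},
    {"user": "carol", "age": 35, "city": "Toronto"},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def average_age_rows(records):
    """Row layout: every whole record is visited just to read one field."""
    return sum(record["age"] for record in records) / len(records)


def average_age_columns(columns):
    """Column layout: only the 'age' column is touched."""
    ages = columns["age"]
    return sum(ages) / len(ages)


columns = to_columns(rows)
print(average_age_rows(rows), average_age_columns(columns))
```

    Both functions return the same answer; the difference is how much unrelated data has to be read to compute it.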

  • Apache Cassandra

    • Fully distributed column-oriented data store
    • Also provides a MapReduce implementation using Hadoop (increased performance)
    • Based on Google's BigTable (data model) and Amazon's Dynamo (consistency & partition tolerance)
    • Cassandra values Availability and Partition tolerance (AP) while providing tunable consistency levels

  • History

    • Developed at Facebook
    • Released as an open source project on Google Code in July 2008
    • Became an Apache Incubator project in March 2009
    • Became a top-level Apache project in February 2010
    • Performance
    • Rumors of Facebook having started working on its own separate version of Cassandra

  • Features

    • Fully Distributed
    • Highly Scalable
    • Fault Tolerant (no single point of failure)
    • Tunable Consistency (Eventually Consistent)
    • Semi-structured key-value store
    • High Availability
    • No Referential Integrity
    • No Joins

  • Data Model

    • KeySpace (uppermost namespace)
    • Column Family / Super Column Family (analogous to a table)
    • Super Column
    • Column (Name, Value, Timestamp)
    • Rows are referenced through keys
    • Each column family is stored in its own set of physical files
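
    One way to picture the hierarchy on this slide is as nested maps. The sketch below is an in-memory illustration under that assumption, not Cassandra's storage engine or client API; the `Users` column family and the `insert`/`get` helpers are hypothetical. It also assumes that, when two writes hit the same column, the one with the newer timestamp wins.

```python
import time

# keyspace -> column family -> row key -> column name -> (value, timestamp)
keyspace = {"Users": {}}  # "Users" is a hypothetical column family


def insert(column_family, row_key, name, value, timestamp=None):
    """Store one column; on conflict the newer timestamp wins."""
    ts = time.time() if timestamp is None else timestamp
    row = keyspace[column_family].setdefault(row_key, {})
    current = row.get(name)
    if current is None or ts >= current[1]:
        row[name] = (value, ts)


def get(column_family, row_key, name):
    """Return the value of one column, or None if the row or column is absent."""
    column = keyspace[column_family].get(row_key, {}).get(name)
    return None if column is None else column[0]


insert("Users", "alice", "email", "alice@example.com", timestamp=1)
insert("Users", "alice", "email", "alice@new.example.com", timestamp=2)
insert("Users", "alice", "email", "stale@example.com", timestamp=1)  # older, ignored
insert("Users", "bob", "city", "Karachi", timestamp=1)  # rows need not share columns
print(get("Users", "alice", "email"))
```

    Note that the two rows carry different column names, which is what "semi-structured" means here.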

  • Standard Column Family

  • Super Column Family

  • Super Column Family: Static/Static

  • Super Column Family: Static/Dynamic

  • Super Column Family: Dynamic/Static

  • Super Column Family: Dynamic/Dynamic

  • Apache Cassandra: Consistency

    • Consistency refers to whether a system is left in a consistent state after an operation. In distributed data systems like Cassandra, this usually means that once a writer has written, all readers will see that write.
    • If W + R > N, you will have strongly consistent behavior; that is, readers will always see the most recent write
      • W is the number of nodes to block for on writes
      • R is the number of nodes to block for on reads
      • N is the replication factor (number of replicas)
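
    The W + R > N rule can be checked with a few lines of arithmetic. The helper name `is_strongly_consistent` is hypothetical; the sketch only encodes the inequality from the slide, which holds because any R replicas must then overlap any W replicas in at least one node.

```python
def is_strongly_consistent(w, r, n):
    """True when every read set must overlap every write set (W + R > N)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return w + r > n


# Replication factor N = 3 with a few W/R combinations.
for w, r in [(1, 1), (2, 2), (3, 1), (1, 3)]:
    print(f"N=3 W={w} R={r} -> strong: {is_strongly_consistent(w, r, 3)}")
```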

  • Apache Cassandra: Consistency

    • Relational databases provide strong consistency (ACID)
    • Cassandra provides eventual consistency (BASE), meaning the database will eventually reach a consistent state
    • QUORUM reads and writes give consistency while still allowing availability
      • Q = floor(N / 2) + 1 (simple majority)
    • If latency is more important than consistency, you can lower the values of W, R, or both
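
    As a worked example of the quorum formula, the sketch below computes Q for a few replication factors and how many replicas can be down while a quorum is still reachable. The function names are hypothetical, and the division is assumed to be integer division.

```python
def quorum(n):
    """Simple majority of n replicas: floor(n / 2) + 1."""
    if n < 1:
        raise ValueError("replication factor must be at least 1")
    return n // 2 + 1


def tolerated_failures(n):
    """Replicas that can be unavailable while a quorum can still answer."""
    return n - quorum(n)


for n in (1, 3, 4, 5):
    print(f"N={n} Q={quorum(n)} tolerates {tolerated_failures(n)} down")
```

    Since two quorums always add up to more than N, using QUORUM for both reads and writes satisfies W + R > N.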

  • Apache Cassandra: Consistency Levels

    • Write: ZERO, ANY, ONE, QUORUM, ALL
    • Read: ONE, QUORUM, ALL (ZERO and ANY are not supported for reads)
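
    Each level name can be read as "how many acknowledgements the coordinator waits for". The mapping below is a sketch of that reading for the level names on this slide; `ConsistencyLevel` and `acks_required` are hypothetical names for illustration, not the Thrift API.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    ZERO = "ZERO"
    ANY = "ANY"
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def acks_required(level, n):
    """Acknowledgements the coordinator blocks for, given n replicas."""
    if level is ConsistencyLevel.ZERO:
        return 0  # fire and forget
    if level in (ConsistencyLevel.ANY, ConsistencyLevel.ONE):
        return 1  # for ANY, a stored hint counts as the single acknowledgement
    if level is ConsistencyLevel.QUORUM:
        return n // 2 + 1
    return n  # ALL


for level in ConsistencyLevel:
    print(level.name, acks_required(level, 3))
```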

  • Write Operation

    • Client sends a write request to a random node; the random node forwards the request to the proper node (1st replica responsible for the partition - coordinator)
    • Coordinator sends requests to N replicas
    • If W replicas confirm the write operation, then OK
    • Always writable, hinted handoff (if a replica node for the key is down, Cassandra will write a hint to the live replica node indicating that the write needs to be replayed to the unavailable node)
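
    The write path and hinted handoff described above can be sketched as a small in-memory simulation. Everything here is a simplification for illustration: the `Node` class, `write`, and `replay_hints` are hypothetical names, hints are simply kept on the first live replica, and there is no networking or persistence.

```python
class Node:
    """One replica: a key/value map plus hints held for unreachable peers."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}   # key -> (value, timestamp)
        self.hints = []  # (target node name, key, value, timestamp)


def write(replicas, key, value, timestamp, w):
    """Coordinator view of a write: send to all N replicas, succeed on W acks."""
    live = [node for node in replicas if node.up]
    down = [node for node in replicas if not node.up]
    for node in live:
        node.data[key] = (value, timestamp)
    if live:
        # Hinted handoff: a live replica remembers the write for each down replica.
        for target in down:
            live[0].hints.append((target.name, key, value, timestamp))
    # Live replicas keep the write even when fewer than W of them acknowledged.
    return len(live) >= w


def replay_hints(nodes):
    """Deliver stored hints to any target that has come back up."""
    by_name = {node.name: node for node in nodes}
    for node in nodes:
        pending = []
        for target_name, key, value, timestamp in node.hints:
            target = by_name[target_name]
            if target.up:
                target.data[key] = (value, timestamp)
            else:
                pending.append((target_name, key, value, timestamp))
        node.hints = pending


a, b, c = Node("a"), Node("b"), Node("c")
c.up = False  # one replica is unavailable during the write
accepted = write([a, b, c], "user:1", "alice", timestamp=1, w=2)
c.up = True   # the replica recovers and receives the hinted write
replay_hints([a, b, c])
print(accepted, c.data)
```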

  • Read Operation

    • Coordinator sends requests to N replicas; if R replicas respond, then OK
    • If different versions are returned, then reconcile and write back the reconciled version (Read Repair)
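
    Read repair can be sketched in the same toy style. This assumes reconciliation means "the column with the newest timestamp wins", and models each responding replica as a plain dict; the `read` function is a hypothetical illustration, not a client call.

```python
def read(responses, key, r):
    """Coordinator view of a read.

    responses: one dict per replica that answered, mapping key -> (value, timestamp).
    """
    if len(responses) < r:
        raise RuntimeError("fewer than R replicas responded")
    versions = [replica[key] for replica in responses if key in replica]
    if not versions:
        return None
    newest = max(versions, key=lambda column: column[1])
    # Read repair: write the reconciled version back to stale replicas.
    for replica in responses:
        if replica.get(key) != newest:
            replica[key] = newest
    return newest[0]


r1 = {"user:1": ("old@example.com", 1)}
r2 = {"user:1": ("new@example.com", 2)}
r3 = {}  # this replica missed the write entirely
print(read([r1, r2, r3], "user:1", r=2))
```

    After the call, all three replicas hold the newest version, so later reads at a lower consistency level see it too.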

  • Cluster Membership

    • Gossip Protocol
      • Every T seconds each node increments its heartbeat counter and gossips to another node about the state of the cluster; the receiving node merges the cluster info with its own copy
      • Cluster state (node in/out, failure) is propagated quickly: O(log N), where N is the number of nodes in the cluster
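
    A push-style gossip round can be simulated in a few lines. This is a simplified sketch, not Cassandra's gossiper: `GossipNode`, `gossip_round`, and `rounds_until_converged` are hypothetical names, each node gossips to exactly one random peer per round, and failure detection is left out.

```python
import random


class GossipNode:
    def __init__(self, name):
        self.name = name
        self.heartbeats = {name: 0}  # node name -> highest heartbeat seen

    def tick(self):
        """Increment this node's own heartbeat counter."""
        self.heartbeats[self.name] += 1

    def merge(self, remote_view):
        """Keep the highest heartbeat known for every node."""
        for name, beat in remote_view.items():
            if beat > self.heartbeats.get(name, -1):
                self.heartbeats[name] = beat


def gossip_round(nodes, rng):
    """Every node bumps its heartbeat and pushes its view to one random peer."""
    for node in nodes:
        node.tick()
        peer = rng.choice([other for other in nodes if other is not node])
        peer.merge(node.heartbeats)


def rounds_until_converged(count, seed=0, max_rounds=10_000):
    """Rounds needed until every node has heard of every other node."""
    rng = random.Random(seed)
    nodes = [GossipNode(f"n{i}") for i in range(count)]
    rounds = 0
    while any(len(node.heartbeats) < count for node in nodes):
        if rounds >= max_rounds:
            raise RuntimeError("gossip did not converge")
        gossip_round(nodes, rng)
        rounds += 1
    return rounds


for size in (4, 16, 64):
    print(size, rounds_until_converged(size))
```

    Running it for growing cluster sizes gives a feel for how slowly the round count grows compared with the node count.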

  • Storage Ring

    • Cassandra cluster nodes are organized in a virtual ring
    • Each node has a single unique token that defines its place in the ring and which keys it is responsible for
    • Key ranges are adjusted when nodes join or leave
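
    The ring lookup can be sketched with a sorted list of tokens. The assumptions here are that a key is hashed with MD5 onto the token space and that a node owns the range ending at its own token, wrapping around at the end of the ring; the `Ring` class, `token_for`, and the node names and token values are hypothetical.

```python
import bisect
import hashlib


def token_for(key, ring_size=2**32):
    """Map a row key onto the ring (assumption: MD5-based hashing)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ring_size


class Ring:
    def __init__(self):
        self.tokens = []  # sorted node tokens
        self.owners = {}  # token -> node name

    def add_node(self, name, token):
        bisect.insort(self.tokens, token)
        self.owners[token] = name

    def remove_node(self, token):
        self.tokens.remove(token)
        del self.owners[token]

    def owner_of_token(self, key_token):
        """First node whose token is >= key_token, wrapping around the ring."""
        index = bisect.bisect_left(self.tokens, key_token)
        if index == len(self.tokens):
            index = 0
        return self.owners[self.tokens[index]]

    def owner(self, key):
        return self.owner_of_token(token_for(key))


ring = Ring()
ring.add_node("A", 100)
ring.add_node("B", 200)
ring.add_node("C", 300)
print(ring.owner_of_token(150), ring.owner_of_token(301))
```

    Adding or removing a node only changes ownership of the range next to its token, which is the "key ranges are adjusted" behavior from the slide.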

  • Apache Cassandra: MySQL Comparison

    • MySQL (> 50 GB data)
      • Read Average: ~350 ms
      • Write Average: ~300 ms
    • Cassandra (> 50 GB data)
      • Read Average: 15 ms
      • Write Average: 0.12 ms

  • Apache Cassandra: Client API

    • Low-level API
      • Thrift
    • High-level API
      • Java: Hector, Pelops, Kundera
      • .NET: FluentCassandra, Aquiles
      • Python: Telephus, Pycassa
      • PHP: phpcassa, SimpleCassie

  • Apache Cassandra: Where to Use?

    Use Cassandra if you need:

    • High write throughput
    • Near-linear scalability
    • Automated replication/fault tolerance

    and if your application:

    • Can tolerate low consistency
    • Can tolerate missing RDBMS features

  • Apache Cassandra: Users

    • Facebook (of course)
      • To power inbox search (previously)
    • Twitter
      • To handle user relationships, analytics (but not for tweets)
    • Digg & Reddit
      • Both use Cassandra to handle user comments and votes
    • Rackspace
    • IBM
      • To build a scalable email system
    • Cisco's WebEx
      • To store user feed and activity in near real time

  • What does NOSQL mean for the future of RDBMS?

    • No worries! RDBMSs are here to stay for the foreseeable future
    • NOSQL data stores can be used in combination with RDBMS in some situations
    • NOSQL still has a long way to go to reach the widespread (mainstream) use and support of the RDBMS

  • Weaknesses of NOSQL

    • No or limited support for complex queries
    • No transactions available (operations are atomic)
    • No standard interface for