NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH)...

125
NoSQL Databases Amir H. Payberah [email protected] KTH Royal Institute of Technology Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 1 / 96

Transcript of NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH)...

Page 1: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

NoSQL Databases

Amir H. [email protected]

KTH Royal Institute of Technology

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 1 / 96

Page 2: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Database and Database Management System

I Database: an organized collection of data.

I Database Management System (DBMS): a software that interactswith users, other applications, and the database itself to captureand analyze data.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 2 / 96

Page 3: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Relational Databases Management Systems (RDMBSs)

I RDMBSs: the dominant technology for storing structured data inweb and business applications.

I SQL is good• Rich language and toolset• Easy to use and integrate• Many vendors

I They promise: ACID

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 3 / 96

Page 4: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

ACID Properties

I Atomicity• All included statements in a transaction are either executed or the

whole transaction is aborted without affecting the database.

I Consistency• A database is in a consistent state before and after a transaction.

I Isolation• Transactions can not see uncommitted changes in the database.

I Durability• Changes are written to a disk before a database commits a transaction

so that committed data cannot be lost through a power failure.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 4 / 96

Page 5: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

ACID Properties

I Atomicity• All included statements in a transaction are either executed or the

whole transaction is aborted without affecting the database.

I Consistency• A database is in a consistent state before and after a transaction.

I Isolation• Transactions can not see uncommitted changes in the database.

I Durability• Changes are written to a disk before a database commits a transaction

so that committed data cannot be lost through a power failure.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 4 / 96

Page 6: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

ACID Properties

I Atomicity• All included statements in a transaction are either executed or the

whole transaction is aborted without affecting the database.

I Consistency• A database is in a consistent state before and after a transaction.

I Isolation• Transactions can not see uncommitted changes in the database.

I Durability• Changes are written to a disk before a database commits a transaction

so that committed data cannot be lost through a power failure.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 4 / 96

Page 7: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

ACID Properties

I Atomicity• All included statements in a transaction are either executed or the

whole transaction is aborted without affecting the database.

I Consistency• A database is in a consistent state before and after a transaction.

I Isolation• Transactions can not see uncommitted changes in the database.

I Durability• Changes are written to a disk before a database commits a transaction

so that committed data cannot be lost through a power failure.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 4 / 96

Page 8: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

RDBMS Challenges

I Web-based applications caused spikes.• Internet-scale data size• High read-write rates• Frequent schema changes

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 5 / 96

Page 9: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Let’s Scale RDBMSs

I RDBMS were not designed to be distributed.

I Possible solutions:• Replication• Sharding

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 6 / 96

Page 10: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Let’s Scale RDBMSs - Replication

I Master/Slave architecture

I Scales read operations

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 7 / 96

Page 11: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Let’s Scale RDBMSs - Sharding

I Dividing the database across many machines.

I It scales read and write operations.

I Cannot execute transactions across shards (partitions).

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 8 / 96

Page 12: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Scaling RDBMSs is Expensive and Inefficient

[http://www.couchbase.com/sites/default/files/uploads/all/whitepapers/NoSQLWhitepaper.pdf]

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 9 / 96

Page 13: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 10 / 96

Page 14: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

NoSQL

I Avoidance of unneeded complexity

I High throughput

I Horizontal scalability and running on commodity hardware

I Compromising reliability for better performance

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 11 / 96

Page 15: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

NoSQL Cost and Performance

[http://www.couchbase.com/sites/default/files/uploads/all/whitepapers/NoSQLWhitepaper.pdf]

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 12 / 96

Page 16: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

RDBMS vs. NoSQL

[http://www.couchbase.com/sites/default/files/uploads/all/whitepapers/NoSQLWhitepaper.pdf]

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 13 / 96

Page 17: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

NoSQL Data Models

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 14 / 96

Page 18: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

NoSQL Data Models

[http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques]

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 15 / 96

Page 19: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Key-Value Data Model

I Collection of key/value pairs.

I Ordered Key-Value: processing over key ranges.

I Dynamo, Scalaris, Voldemort, Riak, ...

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 16 / 96

Page 20: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Column-Oriented Data Model

I Similar to a key/value store, but the value can have multiple at-tributes (Columns).

I Column: a set of data values of a particular type.

I Store and process data by column instead of row.

I BigTable, Hbase, Cassandra, ...

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96

Page 21: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Document Data Model

I Similar to a column-oriented store, but values can have complexdocuments, instead of fixed format.

I Flexible schema.

I XML, YAML, JSON, and BSON.

I CouchDB, MongoDB, ...

{

FirstName: "Bob",

Address: "5 Oak St.",

Hobby: "sailing"

}

{

FirstName: "Jonathan",

Address: "15 Wanamassa Point Road",

Children: [

{Name: "Michael", Age: 10},

{Name: "Jennifer", Age: 8},

]

}

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 18 / 96

Page 22: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Graph Data Model

I Uses graph structures with nodes, edges, and properties to representand store data.

I Neo4J, InfoGrid, ...

[http://en.wikipedia.org/wiki/Graph database]

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 19 / 96

Page 23: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Consistency

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 20 / 96

Page 24: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Consistency

I Strong consistency• After an update completes, any subsequent access will return the

updated value.

I Eventual consistency• Does not guarantee that subsequent accesses will return the

updated value.• Inconsistency window.• If no new updates are made to the object, eventually all accesses

will return the last updated value.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 21 / 96

Page 25: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Consistency

I Strong consistency• After an update completes, any subsequent access will return the

updated value.

I Eventual consistency• Does not guarantee that subsequent accesses will return the

updated value.• Inconsistency window.• If no new updates are made to the object, eventually all accesses

will return the last updated value.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 21 / 96

Page 26: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Quorum Model

I N: the number of nodes to which a data item is replicated.

I R: the number of nodes a value has to be read from to be accepted.

I W: the number of nodes a new value has to be written to beforethe write operation is finished.

I To enforce strong consistency: R+W > N

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 22 / 96

Page 27: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Quorum Model

I N: the number of nodes to which a data item is replicated.

I R: the number of nodes a value has to be read from to be accepted.

I W: the number of nodes a new value has to be written to beforethe write operation is finished.

I To enforce strong consistency: R+W > N

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 22 / 96

Page 28: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Consistency vs. Availability

I The large-scale applications have to be reliable: availability + re-dundancy

I These properties are difficult to achieve with ACID properties.

I The BASE approach forfeits the ACID properties of consistency andisolation in favor of availability, graceful degradation, and perfor-mance.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 23 / 96

Page 29: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

BASE Properties

I Basic Availability• Possibilities of faults but not a fault of the whole system.

I Soft-state• Copies of a data item may be inconsistent

I Eventually consistent• Copies becomes consistent at some later time if there are no more

updates to that data item

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 24 / 96

Page 30: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

CAP Theorem

I Consistency• Consistent state of data after the execution of an operation.

I Availability• Clients can always read and write data.

I Partition Tolerance• Continue the operation in the presence of network partitions.

I You can choose only two!

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 25 / 96

Page 31: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 26 / 96

Page 32: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Dyanmo

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 27 / 96

Page 33: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Dynamo

I Distributed key/value storage system

I Scalable and Highly available

I CAP: it sacrifices strong consistency for availability: always writable

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 28 / 96

Page 34: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Data Model

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 29 / 96

Page 35: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Data Model

[http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques]

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 30 / 96

Page 36: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Partitioning

I Key/value, where values are stored as objects.

I If size of data exceeds the capacity of a single machine: partitioning

I Consistent hashing is one form of sharding (partitioning).

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 31 / 96

Page 37: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Partitioning

I Key/value, where values are stored as objects.

I If size of data exceeds the capacity of a single machine: partitioning

I Consistent hashing is one form of sharding (partitioning).

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 31 / 96

Page 38: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Consistent Hashing

I Hash both data and nodes using the same hash function in a sameid space.

I partition = hash(d) mod n, d: data, n: number of nodes

hash("Fatemeh") = 12

hash("Ahmad") = 2

hash("Seif") = 9

hash("Jim") = 14

hash("Sverker") = 4

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 32 / 96

Page 39: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Consistent Hashing

I Hash both data and nodes using the same hash function in a sameid space.

I partition = hash(d) mod n, d: data, n: number of nodes

hash("Fatemeh") = 12

hash("Ahmad") = 2

hash("Seif") = 9

hash("Jim") = 14

hash("Sverker") = 4

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 32 / 96

Page 40: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Load Imbalance (1/4)

I Consistent hashing may lead to imbalance.

I Node identifiers may not be balanced.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 33 / 96

Page 41: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Load Imbalance (2/4)

I Consistent hashing may lead to imbalance.

I Data identifiers may not be balanced.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 34 / 96

Page 42: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Load Imbalance (3/4)

I Consistent hashing may lead to imbalance.

I Hot spots.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 35 / 96

Page 43: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Load Imbalance (4/4)

I Consistent hashing may lead to imbalance.

I Heterogeneous nodes.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 36 / 96

Page 44: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Load Balancing via Virtual Nodes

I Each physical node picks multiple random identifiers.

I Each identifier represents a virtual node.

I Each node runs multiple virtual nodes.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 37 / 96

Page 45: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Replication

I To achieve high availability and durability, data should be replicateson multiple nodes.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 38 / 96

Page 46: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Data Consistency

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 39 / 96

Page 47: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Data Consistency

I Eventual consistency: updates are propagated asynchronously.

I Each update/modification of an item results in a new and immutableversion of the data.

• Multiple versions of an object may exist.

I Replicas eventually become consistent.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 40 / 96

Page 48: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Data Versioning (1/2)

I Use vector clocks for capturing causality, in the form of (node,counter)

• If causal: older version can be forgotten• If concurrent: conflict exists, requiring reconciliation

I Version branching can happen due to node/network failures.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 41 / 96

Page 49: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Data Versioning (1/2)

I Use vector clocks for capturing causality, in the form of (node,counter)

• If causal: older version can be forgotten• If concurrent: conflict exists, requiring reconciliation

I Version branching can happen due to node/network failures.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 41 / 96

Page 50: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Data Versioning (2/2)

I Client C1 writes new object via Sx.

I C1 updates the object via Sx.

I C1 updates the object via Sy.

I C2 reads D2 and updates theobject via Sz.

I C3 reads D3 and D4 via Sx.• The read context is a

summary of the clocks of D3and D4: [(Sx, 2), (Sy, 1), (Sz, 1)].

I Reconciliation

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 42 / 96

Page 51: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Data Versioning (2/2)

I Client C1 writes new object via Sx.

I C1 updates the object via Sx.

I C1 updates the object via Sy.

I C2 reads D2 and updates theobject via Sz.

I C3 reads D3 and D4 via Sx.• The read context is a

summary of the clocks of D3and D4: [(Sx, 2), (Sy, 1), (Sz, 1)].

I Reconciliation

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 42 / 96

Page 52: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Data Versioning (2/2)

I Client C1 writes new object via Sx.

I C1 updates the object via Sx.

I C1 updates the object via Sy.

I C2 reads D2 and updates theobject via Sz.

I C3 reads D3 and D4 via Sx.• The read context is a

summary of the clocks of D3and D4: [(Sx, 2), (Sy, 1), (Sz, 1)].

I Reconciliation

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 42 / 96

Page 53: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Data Versioning (2/2)

I Client C1 writes new object via Sx.

I C1 updates the object via Sx.

I C1 updates the object via Sy.

I C2 reads D2 and updates theobject via Sz.

I C3 reads D3 and D4 via Sx.• The read context is a

summary of the clocks of D3and D4: [(Sx, 2), (Sy, 1), (Sz, 1)].

I Reconciliation

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 42 / 96

Page 54: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Data Versioning (2/2)

I Client C1 writes new object via Sx.

I C1 updates the object via Sx.

I C1 updates the object via Sy.

I C2 reads D2 and updates theobject via Sz.

I C3 reads D3 and D4 via Sx.• The read context is a

summary of the clocks of D3and D4: [(Sx, 2), (Sy, 1), (Sz, 1)].

I Reconciliation

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 42 / 96

Page 55: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Data Versioning (2/2)

I Client C1 writes new object via Sx.

I C1 updates the object via Sx.

I C1 updates the object via Sy.

I C2 reads D2 and updates theobject via Sz.

I C3 reads D3 and D4 via Sx.• The read context is a

summary of the clocks of D3and D4: [(Sx, 2), (Sy, 1), (Sz, 1)].

I Reconciliation

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 42 / 96

Page 56: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Dynamo API

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 43 / 96

Page 57: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Dynamo API

I get(key)• Return single object or list of objects with conflicting version and

context.

I put(key, context, object)• Store object and context under key.• Context encodes system metadata, e.g., version number.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 44 / 96

Page 58: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

put Operation

I Coordinator generates new vector clock and writes the new versionlocally.

I Send to N nodes.

I Wait for response from W nodes.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 45 / 96

Page 59: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

get Operation

I Coordinator requests existing versions from N.• Wait for response from R nodes.

I If multiple versions, return all versions that are causally unrelated.

I Divergent versions are then reconciled.

I Reconciled version written back.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 46 / 96

Page 60: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Sloppy Quorum

I Due to partitions, quorums might not exist.• Sloppy quorum.• Create transient replicas: N healthy

nodes from the preference list.• Reconcile after partition heals.

I Say A is unreachable.

I put will use D.

I Later, D detects A is alive.• Sends the replica to A• Removes the replica.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 47 / 96

Page 61: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Sloppy Quorum

I Due to partitions, quorums might not exist.• Sloppy quorum.• Create transient replicas: N healthy

nodes from the preference list.• Reconcile after partition heals.

I Say A is unreachable.

I put will use D.

I Later, D detects A is alive.• Sends the replica to A• Removes the replica.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 47 / 96

Page 62: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Membership Management

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 48 / 96

Page 63: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Membership Management

I Administrator explicitly adds and removes nodes.

I Gossiping to propagate membership changes.• Eventually consistent view.• O(1) hop overlay.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 49 / 96

Page 64: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Adding and Removing Nodes

I A new node X added to system.• X is assigned key ranges w.r.t. its virtual servers.• For each key range, it transfers the data items.

I Removing a node: reallocation of keys is a reverse process of addingnodes.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 50 / 96

Page 65: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Adding and Removing Nodes

I A new node X added to system.• X is assigned key ranges w.r.t. its virtual servers.• For each key range, it transfers the data items.

I Removing a node: reallocation of keys is a reverse process of addingnodes.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 50 / 96

Page 66: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Failure Detection (1/2)

I Passive failure detection.• Use pings only for detection from failed to alive.

I In the absence of client requests, node A doesn’t need to know ifnode B is alive.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 51 / 96

Page 67: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Failure Detection (2/2)

I Anti-entropy for replica synchronization.

I Use Merkle trees for fast inconsistency detection and minimumtransfer of data.

• Nodes maintain Merkle tree of each key range.• Exchange root of Merkle tree to check if the key ranges are updated.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 52 / 96

Page 68: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Failure Detection (2/2)

I Anti-entropy for replica synchronization.

I Use Merkle trees for fast inconsistency detection and minimumtransfer of data.

• Nodes maintain Merkle tree of each key range.• Exchange root of Merkle tree to check if the key ranges are updated.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 52 / 96

Page 69: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

BigTable

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 53 / 96

Page 70: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Motivation

I Lots of (semi-)structured data at Google.• URLs, TextGreenper-user data, geographical locations, ...

I Big data• Billions of URLs, hundreds of millions of users, 100+TB of satellite

image data, ...

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 54 / 96

Page 71: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

BigTable

I Distributed multi-level map

I Fault-tolerant

I Scalable and self-managing

I CAP: strong consistency and partition tolerance

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 55 / 96

Page 72: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Data Model

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 56 / 96

Page 73: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Data Model

[http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques]

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 57 / 96

Page 74: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Column-Oriented Data Model (1/2)

I Similar to a key/value store, but the value can have multiple at-tributes (Columns).

I Column: a set of data values of a particular type.

I Store and process data by column instead of row.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 58 / 96

Page 75: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Columns-Oriented Data Model (2/2)

I In many analytical databases queries, few attributes are needed.

I Column values are stored contiguously on disk: reduces I/O.

[Lars George, “Hbase: The Definitive Guide”, O’Reilly, 2011]

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 59 / 96

Page 76: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

BigTable Data Model (1/5)

I Table

I Distributed multi-dimensional sparse map

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 60 / 96

Page 77: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

BigTable Data Model (2/5)

I Rows

I Every read or write in a row is atomic.

I Rows sorted in lexicographical order.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 61 / 96

Page 78: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

BigTable Data Model (3/5)

I Column

I The basic unit of data access.

I Column families: group of (the same type) column keys.

I Column key naming: family:qualifier

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 62 / 96

Page 79: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

BigTable Data Model (4/5)

I Timestamp

I Each column value may contain multiple versions.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 63 / 96

Page 80: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

BigTable Data Model (5/5)

I Tablet: contiguous ranges of rows stored together.

I Tables are split by the system when they become too large.

I Auto-Sharding

I Each tablet is served by exactly one tablet server.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 64 / 96

Page 81: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Bigtable API

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 65 / 96

Page 82: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

The Bigtable API

I Metadata operations• Create/delete tables, column families, change metadata

I Writes: single-row, atomic• write/delete cells in a row, delete all cells in a row

I Reads: read arbitrary cells in a Bigtable table• Each row read is atomic.• One row, all or specific columns, certain timestamps, and ...

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 66 / 96

Page 83: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

The Bigtable API

I Metadata operations• Create/delete tables, column families, change metadata

I Writes: single-row, atomic• write/delete cells in a row, delete all cells in a row

I Reads: read arbitrary cells in a Bigtable table• Each row read is atomic.• One row, all or specific columns, certain timestamps, and ...

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 66 / 96

Page 84: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

The Bigtable API

I Metadata operations• Create/delete tables, column families, change metadata

I Writes: single-row, atomic• write/delete cells in a row, delete all cells in a row

I Reads: read arbitrary cells in a Bigtable table• Each row read is atomic.• One row, all or specific columns, certain timestamps, and ...

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 66 / 96

Page 85: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Writing Example

// Open the table

Table *T = OpenOrDie("/bigtable/web/webtable");

// Write a new anchor and delete an old anchor

RowMutation r1(T, "com.cnn.www");

r1.Set("anchor:www.c-span.org", "CNN");

r1.Delete("anchor:www.abc.com");

Operation op;

Apply(&op, &r1);

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 67 / 96

Page 86: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Reading Example

Scanner scanner(T);

scanner.Lookup("com.cnn.www");

ScanStream *stream;

stream = scanner.FetchColumnFamily("anchor");

stream->SetReturnAllVersions();

for (; !stream->Done(); stream->Next()) {

printf("%s %s %lld %s\n",

scanner.RowName(),

stream->ColumnName(),

stream->MicroTimestamp(),

stream->Value());

}

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 68 / 96

Page 87: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

BigTable Architecture

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 69 / 96

Page 88: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

BigTable Cell

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 70 / 96

Page 89: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Main Components

I Master server

I Tablet server

I Client library

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 71 / 96

Page 90: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Master Server

I One master server.

I Assigns tablets to tablet server.

I Balances tablet server load.

I Garbage collection of unneeded files in GFS.

I Handles schema changes, e.g., table and column family creations

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 72 / 96

Page 91: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Tablet Server

I Many tablet servers.

I Can be added or removed dynamically.

I Each manages a set of tablets (typically 10-1000 tablets/server).

I Handles read/write requests to tablets.

I Splits tablets when too large.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 73 / 96

Page 92: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Client Library

I Library that is linked into every client.

I Client data does not move though the master.

I Clients communicate directly with tablet servers for reads/writes.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 74 / 96

Page 93: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Building Blocks

I The building blocks for the BigTable are:• Google File System (GFS): raw storage• Chubby: distributed lock manager• Scheduler: schedules jobs onto machines

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 75 / 96

Page 94: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Google File System (GFS)

I Large-scale distributed file system.

I Store log and data files.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 76 / 96

Page 95: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Chubby Lock Service

I Ensure there is only one active master.

I Store bootstrap location of BigTable data.

I Discover tablet servers.

I Store BigTable schema information.

I Store access control lists.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 77 / 96

Page 96: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Master Startup

I The master executes the following steps at startup:

• Grabs a unique master lock in Chubby, which prevents concurrentmaster instantiations.

• Scans the servers directory in Chubby to find the live servers.

• Communicates with every live tablet server to discover what tabletsare already assigned to each server.

• Scans the METADATA table to learn the set of tablets.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 78 / 96

Page 97: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Tablet Assignment

I 1 tablet → 1 tablet server.

I Master uses Chubby to keep tracks of set of live tablet serves andunassigned tablets.

• When a tablet server starts, it creates and acquires an exclusive lockin Chubby.

I Master detects the status of the lock of each tablet server by check-ing periodically.

I Master is responsible for finding when tablet server is no longerserving its tablets and reassigning those tablets as soon as possible.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 79 / 96

Page 98: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Tablet Assignment

I 1 tablet → 1 tablet server.

I Master uses Chubby to keep tracks of set of live tablet serves andunassigned tablets.

• When a tablet server starts, it creates and acquires an exclusive lockin Chubby.

I Master detects the status of the lock of each tablet server by check-ing periodically.

I Master is responsible for finding when tablet server is no longerserving its tablets and reassigning those tablets as soon as possible.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 79 / 96

Page 99: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Tablet Assignment

I 1 tablet → 1 tablet server.

I Master uses Chubby to keep tracks of set of live tablet serves andunassigned tablets.

• When a tablet server starts, it creates and acquires an exclusive lockin Chubby.

I Master detects the status of the lock of each tablet server by check-ing periodically.

I Master is responsible for finding when tablet server is no longerserving its tablets and reassigning those tablets as soon as possible.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 79 / 96

Page 100: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Tablet Assignment

I 1 tablet → 1 tablet server.

I Master uses Chubby to keep tracks of set of live tablet serves andunassigned tablets.

• When a tablet server starts, it creates and acquires an exclusive lockin Chubby.

I Master detects the status of the lock of each tablet server by check-ing periodically.

I Master is responsible for finding when tablet server is no longerserving its tablets and reassigning those tablets as soon as possible.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 79 / 96

Page 101: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Table Serving

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 80 / 96

Page 102: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Finding a Tablet

I Three-level hierarchy.

I Root tablet contains location of all tablets in a special METADATAtable.

I METADATA table contains location of each tablet under a row.

I The client library caches tablet locations.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 81 / 96

Page 103: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Finding a Tablet

I Three-level hierarchy.

I Root tablet contains location of all tablets in a special METADATAtable.

I METADATA table contains location of each tablet under a row.

I The client library caches tablet locations.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 81 / 96

Page 104: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Finding a Tablet

I Three-level hierarchy.

I Root tablet contains location of all tablets in a special METADATAtable.

I METADATA table contains location of each tablet under a row.

I The client library caches tablet locations.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 81 / 96

Page 105: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Finding a Tablet

I Three-level hierarchy.

I Root tablet contains location of all tablets in a special METADATAtable.

I METADATA table contains location of each tablet under a row.

I The client library caches tablet locations.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 81 / 96

Page 106: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

SSTable (1/2)

I SSTable file format used internally to store Bigtable data.

I Immutable, sorted file of key-value pairs.

I Each SSTable is stored in a GFS file.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 82 / 96

Page 107: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

SSTable (2/2)

I Chunks of data plus a block index.• A block index is used to locate blocks.• The index is loaded into memory when the SSTable is opened.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 83 / 96

Page 108: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Tablet Serving (1/2)

I Updates committed to a commit log.

I Recently committed updates are stored in memory - memtable

I Older updates are stored in a sequence of SSTables.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 84 / 96

Page 109: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Tablet Serving (1/2)

I Updates committed to a commit log.

I Recently committed updates are stored in memory - memtable

I Older updates are stored in a sequence of SSTables.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 84 / 96

Page 110: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Tablet Serving (1/2)

I Updates committed to a commit log.

I Recently committed updates are stored in memory - memtable

I Older updates are stored in a sequence of SSTables.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 84 / 96

Page 111: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Tablet Serving (2/2)

I Strong consistency• Only one tablet server is responsible for a given piece of data.• Replication is handled on the GFS layer.

I Tradeoff with availability• If a tablet server fails, its portion of data is temporarily unavailable

until a new server is assigned.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 85 / 96

Page 112: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Tablet Serving (2/2)

I Strong consistency• Only one tablet server is responsible for a given piece of data.• Replication is handled on the GFS layer.

I Tradeoff with availability• If a tablet server fails, its portion of data is temporarily unavailable

until a new server is assigned.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 85 / 96

Page 113: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Loading Tablets

I To load a tablet, a tablet server does the following:

I Finds locaton of tablet through its METADATA.• Metadata for a tablet includes list of SSTables and set of redo

points.

I Read SSTables index blocks into memory.

I Read the commit log since the redo point and reconstructs thememtable.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 86 / 96

Page 114: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Compaction

I Minor compaction• Convert the memtable into an SSTable.

I Merging compaction• Reads the contents of a few SSTables and the memtable, and writes

out a new SSTable.

I Major compaction• Merging compaction that results in only one SSTable.• No deleted records, only sensitive live data.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 87 / 96

Page 115: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Compaction

I Minor compaction• Convert the memtable into an SSTable.

I Merging compaction• Reads the contents of a few SSTables and the memtable, and writes

out a new SSTable.

I Major compaction• Merging compaction that results in only one SSTable.• No deleted records, only sensitive live data.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 87 / 96

Page 116: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Compaction

I Minor compaction• Convert the memtable into an SSTable.

I Merging compaction• Reads the contents of a few SSTables and the memtable, and writes

out a new SSTable.

I Major compaction• Merging compaction that results in only one SSTable.• No deleted records, only sensitive live data.

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 87 / 96

Page 117: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Cassandra

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 88 / 96

Page 118: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Cassandra

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 89 / 96

Page 119: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

From Dynamo

I Symmetric P2P architecture

I Gossip based discovery and error detection

I Distributed key-value store: partitioning and topology discovery

I Eventual consistency

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 90 / 96

Page 120: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

From BigTable

I Sparse Column oriented sparse array

I SSTable disk storage• Append-only commit log• Memtable (buffering and sorting)• Immutable sstable files• Compaction

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 91 / 96

Page 121: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Summary

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 92 / 96

Page 122: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Summary

I NoSQL data models: key-value, column-oriented, document-oriented, graph-based

I Sharding and consistent hashing

I ACID vs. BASE

I CAP (Consistency vs. Availability)

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 93 / 96

Page 123: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Summary

I Dynamo: key/value storage: put and get

I Data partitioning: consistent hashing

I Load balancing: virtual server

I Replication: several nodes, preference list

I Data versioning: vector clock, resolve conflict at read time by theapplication

I Membership management: join/leave by admin, gossip-based to up-date the nodes’ views, ping to detect failure

I Handling transient failure: sloppy quorum

I Handling permanent failure: Merkle tree

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 94 / 96

Page 124: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Summary

I BigTable

I Column-oriented

I Main components: master, tablet server, client library

I Basic components: GFS, chubby, SSTable

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 95 / 96

Page 125: NoSQL Databases - Amir H. Payberah · I BigTable, Hbase, Cassandra, ... Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 17 / 96. Document Data Model I Similar to acolumn-orientedstore,

Questions?

Amir H. Payberah (KTH) NoSQL Databases 2016/09/05 96 / 96