NoSQL Databases - CouchDB By Tom Sausner. Agenda Introduction Review of NoSQL storage options CAP...


NoSQL Databases - CouchDB

By Tom Sausner

Agenda

• Introduction
• Review of NoSQL storage options
  • CAP Theorem
  • Review categories of storage options
• CouchDB
  • Overview
  • Interacting with data
  • Examples
• Technologies applying CouchDB

What does it mean?

• "Not Only SQL" or "NO! SQL"
• A more general definition: a datastore that does not follow the relational model, including the use of SQL to interact with the data.
• Why?
  • One size does not fit all
  • The relational model has scaling issues
  • Freedom from the tyranny of the DBA?

CAP Theorem

• Conjectured by Eric Brewer of U.C. Berkeley; formalized and proved by Seth Gilbert and Nancy Lynch of MIT
• Relates to distributed systems
• Consistency, Availability, Partition Tolerance… pick 2
• A distributed system is built of “nodes” (computers), which can (attempt to) send messages to each other over a network…

Consistency

• “is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time.”
  • Not the same as the “C” in “ACID”

• Linearizability ~ operations behave as if there were no concurrency.

• Does not mention transactions

Available

• “every request received by a non-failing node in the system must result in a response.”

• Says nothing about the content of the response. It could be anything; it need not be “successful” or “correct”.

Partition Tolerant

• Any guarantee of consistency or availability still holds even if there is a partition.

• If a system is not partition-tolerant, then as soon as the network can lose messages or any node can fail, any guarantee of atomicity or consistency is voided.

Implications of CAP

• How to best scale your application? The world falls broadly into two ideological camps: the database crowd and the non-database crowd.

• The database crowd, unsurprisingly, likes database technology and will tend to address scale by talking of things like optimistic locking and sharding.

• The non-database crowd will tend to address scale by managing data outside of the database environment (avoiding the relational world) for as long as possible.

Types of NoSQL datastores

• Key-value stores
• Column stores
• Document stores
• Object stores

Key Value stores

• Memcached / Membase (Membase had just merged with CouchOne, the company behind CouchDB)
• Redis
• Riak

Column Stores

• Bigtable (Google)
• Dynamo
• Cassandra
• Hadoop/HBase

Document Stores

• CouchDB
• MongoDB

Graph, Object Stores

• Neo4j
• db4o

CouchDB - relax (taken from website)

• An Apache project created by Damien Katz
• A document database server, accessible via a RESTful JSON API.
• Ad-hoc and schema-free with a flat address space.
• Distributed, featuring robust, incremental replication with bi-directional conflict detection and management.
• CouchOne, the company behind CouchDB, recently merged with Membase

More on CouchDB

• The CouchDB file layout and commitment system features all Atomic, Consistent, Isolated, Durable (ACID) properties.

• Document updates (add, edit, delete) are serialized, except for binary blobs which are written concurrently.

• CouchDB read operations use a Multi-Version Concurrency Control (MVCC) model where each client sees a consistent snapshot of the database from the beginning to the end of the read operation.

• Eventually Consistent

Couch DB Access via CURL

• curl http://127.0.0.1:5984/

• curl -X GET http://127.0.0.1:5984/_all_dbs

• curl -X PUT http://127.0.0.1:5984/baseball

• curl -X PUT http://127.0.0.1:5984/baseball
  // error: the database already exists

• curl -X DELETE http://127.0.0.1:5984/baseball

Adding Docs via curl

• curl -X PUT http://127.0.0.1:5984/albums

• curl -X PUT http://127.0.0.1:5984/albums/1000 -d '{"title":"Abbey Road","artist":"The Beatles"}'

• UUIDs: curl -X GET http://127.0.0.1:5984/_uuids

• curl -X GET http://127.0.0.1:5984/albums/1000

• _rev - If you want to update or delete a document, CouchDB expects you to include the _rev field of the revision you wish to change

• curl -X PUT http://127.0.0.1:5984/albums/1000 -d '{"_rev":"1-42c7396a84eaf1728cdbf08415a09a41","title":"Abbey Road", "artist":"The Beatles","year":"1969"}'
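
The _rev rule above can be sketched without a running server. The following is a minimal in-memory stand-in, not CouchDB itself: createStore, the fake revision ids, and the exact response objects are assumptions made for illustration. It shows that an update without the current _rev is rejected with a 409, while one that carries it succeeds and produces a new revision.

```javascript
// Hypothetical in-memory stand-in for one CouchDB database (not the real server).
function createStore() {
  const docs = new Map();
  let counter = 0; // used only to make the fake revision ids unique

  return {
    // Mimics PUT /db/id: an update must carry the current _rev or it is rejected.
    put(id, body) {
      const current = docs.get(id);
      if (current && body._rev !== current._rev) {
        return { status: 409, error: "conflict" };
      }
      const generation = current ? parseInt(current._rev, 10) + 1 : 1;
      const rev = generation + "-" + (++counter).toString(16).padStart(8, "0");
      docs.set(id, Object.assign({}, body, { _id: id, _rev: rev }));
      return { status: 201, ok: true, id: id, rev: rev };
    },
    // Mimics GET /db/id.
    get(id) {
      return docs.get(id) || null;
    },
  };
}

const albums = createStore();
const created = albums.put("1000", { title: "Abbey Road", artist: "The Beatles" });
// Update attempt without _rev: rejected.
const stale = albums.put("1000", { title: "Abbey Road", year: "1969" });
// Update attempt with the current _rev: accepted.
const updated = albums.put("1000", {
  _rev: created.rev,
  title: "Abbey Road",
  artist: "The Beatles",
  year: "1969",
});
console.log(created.status, stale.status, updated.status);
```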

Futon… CouchDB Maintenance

• http://127.0.0.1:5984/_utils/index.html
• Albums database review
  • Add another document
• Tools
• Database, Document, View Creation
• Security, Compact & Cleanup
• Create and Delete

Demo Setup

• Examples implemented in Groovy
• Use HTTPBuilder to interact with the database
  • Groovy RESTClient
• Use Google Gson to move objects between JSON and Java/Groovy
• Use Federal Contribution database for our dataset.
• Eclipse

Data Loading Review

• Limited input to NY candidates, and only year 2010
• contributions.fec.2010.csv
• Groovy bean for input data
• Readfile.groovy
• contribDB.put(path: "fed_contrib_test/${contrib.transactionId}", contentType: JSON, requestContentType: JSON, body: json)

Couch DB Design Documents

• CouchDB is designed to work best when there is a one-to-one correspondence between applications and design documents.

• _design/"design_doc_name"
• Design documents are applications
  • I.e., a CouchDB database can be an application.

Design Documents contents

• Update handlers: updates: {"hello" : function(doc, req) {…}}
• Views (more on this later)
• Validation
• Shows
• Lists
• Filters
• libs

Updates

• If you have multiple design documents, each with a validate_doc_update function, all of those functions are called upon each incoming write request

• If any of the validate functions fails, the document is not written to the database

Validation

• Validation functions are a powerful tool to ensure that only documents you expect end up in your databases.

• validate_doc_update section of the design document

• function(newDoc, oldDoc, userCtx) {}
  • throw({forbidden : message});
  • throw({unauthorized : message});
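
A validation function along these lines might look like the sketch below. The function signature and the forbidden/unauthorized error shapes come from the slide; the specific rules (a logged-in user, a recipientType field, a numeric amount) are assumptions chosen to match the contribution dataset, and tryWrite is only a local harness, not a CouchDB API.

```javascript
// Sketch of a validate_doc_update function; the required fields are assumptions.
function validateDocUpdate(newDoc, oldDoc, userCtx) {
  if (newDoc._deleted) {
    return; // allow deletions without further checks
  }
  if (!userCtx.name) {
    throw { unauthorized: "You must be logged in to write documents." };
  }
  if (!newDoc.recipientType) {
    throw { forbidden: "Document must have a recipientType field." };
  }
  if (isNaN(parseFloat(newDoc.amount))) {
    throw { forbidden: "Document must have a numeric amount field." };
  }
}

// Local harness: returns null when the write is accepted, else the thrown error.
function tryWrite(newDoc, userCtx) {
  try {
    validateDocUpdate(newDoc, null, userCtx);
    return null;
  } catch (err) {
    return err;
  }
}

const user = { name: "tom", roles: [] };
const anonymous = { name: null, roles: [] };
const accepted = tryWrite({ recipientType: "Candidate A", amount: "250.00" }, user);
const missingAmount = tryWrite({ recipientType: "Candidate A" }, user);
const notLoggedIn = tryWrite({ recipientType: "Candidate A", amount: "10" }, anonymous);
console.log(accepted, missingAmount, notLoggedIn);
```

In a design document the function body would be stored as a string under the validate_doc_update key; CouchDB then calls it on every incoming write.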

Ok, how can I see my data?

• CouchDB design documents can contain a “views” section

• Views contain Map/Reduce functions
• Map/Reduce functions are implemented in JavaScript
  • However, there are different Query Servers available using different languages

Views

• Filtering the documents in your database to find those relevant to a particular process.

• Building efficient indexes to find documents by any value or structure that resides in them

• Extracting data from your documents and presenting it in a specific order.

• Use these indexes to represent relationships among documents.

Map/Reduce dialog

• Bob: So, how do I query the database?
• IT guy: It’s not a database. It’s a key-value store.
• Bob: OK, it’s not a database. How do I query it?
• IT guy: You write a distributed map-reduce function in Erlang.
• Bob: Did you just tell me to go screw myself?
• IT guy: I believe I did, Bob.

Map/Reduce in CouchDB

• Map functions have a single parameter, a document, and emit a list of key/value pairs of JSON values
  • CouchDB allows arbitrary JSON structures to be used as keys

• Map is called for every document in the database
  • Efficiency?

• emit() function can be called multiple times in the map function

• View results are stored in B-Trees
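
To make the map step concrete, here is a small local simulation. It is a sketch, not CouchDB's view engine: buildView, the explicit emit argument, the tags field, and the plain string sort (standing in for CouchDB's collation and B-tree storage) are all assumptions for illustration.

```javascript
// Minimal local stand-in for how a view index is built (not CouchDB itself).
function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    // CouchDB provides a global emit(); here it is passed in explicitly.
    const emit = (key, value) => rows.push({ id: doc._id, key: key, value: value });
    mapFn(doc, emit);
  }
  // Rows are kept sorted by key; plain string comparison is a simplification
  // of CouchDB's collation rules.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

// One document can emit several rows, or none at all.
function byTag(doc, emit) {
  for (const tag of doc.tags || []) {
    emit(tag, doc.title);
  }
}

const sampleDocs = [
  { _id: "1000", title: "Abbey Road", tags: ["rock", "1969"] },
  { _id: "1010", title: "Let It Be", tags: ["rock"] },
  { _id: "1050", title: "RJUG Roundup" }, // no tags, so no rows
];
const tagRows = buildView(sampleDocs, byTag);
console.log(tagRows.map((r) => r.key + " -> " + r.value));
```

Because the map function depends only on the document it is given, an index like this can be updated one changed document at a time instead of being rebuilt.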

Reduce/Rereduce

• The reduce function is optional
• Used to produce aggregate results for that view
• Reduce functions must accept, as input, results emitted by the corresponding map function as well as results returned by the reduce function itself (rereduce).
• On rereduce, keys = null
• On a large database, objects to be reduced will be sent to your reduce function in batches. These batches will be broken up on B-tree boundaries, which may occur in arbitrary places.
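
A reduce function that is safe under rereduce could be sketched as follows. The batches and document ids are made up; the point is that the function only reads values, so it gives the right answer whether it receives map output or its own earlier results.

```javascript
// A sum reduce that works both on map output and on its own output.
function sumReduce(keys, values, rereduce) {
  // On the first pass, keys is an array of [key, docId] pairs and values holds
  // the emitted values. On rereduce, keys is null and values holds earlier
  // results of this function. Summing values is correct in both cases.
  let total = 0;
  for (const v of values) {
    total += parseFloat(v);
  }
  return total;
}

// Simulate two batches reduced separately, then combined by a rereduce call.
const batchA = sumReduce(
  [["Candidate A", "d1"], ["Candidate A", "d2"]],
  ["100.00", "50.50"],
  false
);
const batchB = sumReduce([["Candidate A", "d3"]], ["25"], false);
const combined = sumReduce(null, [batchA, batchB], true);
console.log(combined);
```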

More on Map/Reduce

• Linked Documents - If you emit an object value which has {'_id': XXX} then include_docs=true will fetch the document with id XXX rather than the document which was processed to emit the key/value pair.

• Complex Keys
  • emit([lastName, firstName, zipcode], doc)

• Grouping
• Grouping Levels

Restrictions on Map/Reduce

• Map functions must be referentially transparent.
  • Given the same doc, a map function will always emit the same key/value pairs
  • Allows for incremental update

• Reduce functions must be able to reduce their own output
  • This requirement allows CouchDB to store intermediate reductions directly in the inner nodes of B-tree indexes, so view index updates and retrievals have logarithmic cost

List Donors

• Map:
  function(doc) {
    if (doc.recipientName) {
      emit(doc.recipientName, doc);
    } else if (doc.recipientType) {
      emit(doc.recipientType, doc);
    }
  }

• No reduce function

List of Query Parameters

• key
• startkey, endkey
• startkey_docid, endkey_docid
• limit, skip, stale, descending
• group, group_level
• reduce
• include_docs, inclusive_end
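
How a few of these parameters narrow a result set can be sketched locally. This is a simplified stand-in, not the CouchDB query engine: queryView and the sample rows are hypothetical, only string keys are handled, and descending, stale, and the docid parameters are left out.

```javascript
// Simplified stand-in for a view query over rows already sorted by key.
// Only string keys are handled; CouchDB's real collation is richer.
function queryView(rows, params) {
  const p = Object.assign({ inclusive_end: true, skip: 0 }, params);
  let result = rows.filter((row) => {
    if (p.key !== undefined && row.key !== p.key) return false;
    if (p.startkey !== undefined && row.key < p.startkey) return false;
    if (p.endkey !== undefined) {
      if (row.key > p.endkey) return false;
      if (!p.inclusive_end && row.key === p.endkey) return false;
    }
    return true;
  });
  result = result.slice(p.skip); // skip is applied before limit
  if (p.limit !== undefined) result = result.slice(0, p.limit);
  return result;
}

const sortedRows = [
  { key: "Adams", value: 1 },
  { key: "Baker", value: 2 },
  { key: "Clark", value: 3 },
  { key: "Davis", value: 4 },
];
const range = queryView(sortedRows, {
  startkey: "Baker",
  endkey: "Davis",
  inclusive_end: false,
});
const page = queryView(sortedRows, { skip: 1, limit: 2 });
const exact = queryView(sortedRows, { key: "Clark" });
console.log(range.map((r) => r.key), page.map((r) => r.key), exact.length);
```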

List all NY candidates

• Want a list of all of the unique candidates in the database

• Map: emit(doc.recipientType, null);

• Reduce: return true

• Must set group = true

Total Candidate Donations

• List the total campaign contributions for each candidate

• Map: emit(doc.recipientType, doc.amount)

• Reduce:
  function(keys, values) {
    // Iterate over values, not keys: on rereduce keys is null.
    var sum = 0;
    for (var idx in values) {
      sum = sum + parseFloat(values[idx]);
    }
    return sum;
  }

• Must set group=true
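
The effect of group=true can be sketched with a local simulation. groupReduce is a hypothetical helper, not part of CouchDB, and the sample amounts are made up; it reduces the emitted rows separately for each distinct key, which is what yields one total per candidate instead of a single grand total.

```javascript
// Local simulation of querying a map/reduce view with group=true.
function groupReduce(rows, reduceFn) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row.key)) groups.set(row.key, { keys: [], values: [] });
    groups.get(row.key).keys.push([row.key, row.id]);
    groups.get(row.key).values.push(row.value);
  }
  const out = [];
  for (const [key, g] of groups) {
    out.push({ key: key, value: reduceFn(g.keys, g.values, false) });
  }
  return out;
}

function sumAmounts(keys, values) {
  let sum = 0;
  for (const v of values) sum += parseFloat(v);
  return sum;
}

// Rows as emitted by: emit(doc.recipientType, doc.amount). Sample data is made up.
const emitted = [
  { id: "t1", key: "Candidate A", value: "100" },
  { id: "t2", key: "Candidate B", value: "40" },
  { id: "t3", key: "Candidate A", value: "60" },
];
const totals = groupReduce(emitted, sumAmounts);
console.log(totals);
```

Swapping sumAmounts for a reduce that simply returns true gives one row per distinct key, which is the unique-candidate listing.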

Donation Totals by Zip

• Complex Keys
• In the map function:
  emit([doc.recipientType, doc.contributorZipCode], doc.amount);

• Reduce:
  function(keys, values) {
    // Iterate over values, not keys: on rereduce keys is null.
    var sum = 0;
    for (var idx in values) {
      sum = sum + parseFloat(values[idx]);
    }
    return sum;
  }
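
With array keys, the grouping level decides how much of the key is used to form groups. The sketch below simulates that locally; reduceAtLevel is a hypothetical helper, the zip codes and amounts are made up, and the reduce is called with null keys because this particular reduce never reads them.

```javascript
// Local simulation of group_level on array keys (not the CouchDB engine).
function reduceAtLevel(rows, groupLevel, reduceFn) {
  const groups = new Map();
  for (const row of rows) {
    const prefix = row.key.slice(0, groupLevel);
    const id = JSON.stringify(prefix); // arrays need a string form to act as Map keys
    if (!groups.has(id)) groups.set(id, { key: prefix, values: [] });
    groups.get(id).values.push(row.value);
  }
  return Array.from(groups.values()).map((g) => ({
    key: g.key,
    value: reduceFn(null, g.values),
  }));
}

function sumValues(keys, values) {
  let sum = 0;
  for (const v of values) sum += parseFloat(v);
  return sum;
}

// Rows as emitted by: emit([doc.recipientType, doc.contributorZipCode], doc.amount).
const zipRows = [
  { key: ["Candidate A", "14607"], value: "100" },
  { key: ["Candidate A", "14607"], value: "25" },
  { key: ["Candidate A", "14618"], value: "50" },
  { key: ["Candidate B", "14607"], value: "10" },
];
const byCandidate = reduceAtLevel(zipRows, 1, sumValues);
const byCandidateAndZip = reduceAtLevel(zipRows, 2, sumValues);
console.log(byCandidate, byCandidateAndZip);
```

Level 1 groups on the candidate alone; level 2 groups on candidate plus zip code, so the same view answers both questions.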

Referencing other documents

Conflict Management

• Multi-Version Concurrency Control (MVCC)
• CouchDB does not attempt to merge the conflicting revisions; this is left to the application
• If there is a conflict in revisions between nodes
  • App is ultimately responsible for resolving the conflict
  • All revisions are saved
  • One revision is selected as the most recent
  • _conflicts property set
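
What application-level resolution might look like is sketched below. Everything here is an assumption for illustration: the revision ids are made up, pickWinner only approximates how a deterministic winner could be chosen (higher revision number first, then the revision string), and the merge rule (keep the latest year) is an arbitrary example of a policy the application would have to define.

```javascript
// Sketch: choose a winner among conflicting revisions of one document.
// Assumption: the higher revision number wins, ties are broken by comparing
// the revision strings. This only approximates CouchDB's deterministic choice.
function pickWinner(revisions) {
  const sorted = revisions.slice().sort((a, b) => {
    const genA = parseInt(a._rev, 10);
    const genB = parseInt(b._rev, 10);
    if (genA !== genB) return genB - genA;
    return a._rev < b._rev ? 1 : a._rev > b._rev ? -1 : 0;
  });
  return { winner: sorted[0], losers: sorted.slice(1) };
}

// Hypothetical application-level resolution: keep the winner's fields but
// take the latest year found in any revision, and list the losing revisions
// so the application can delete them.
function resolve(revisions) {
  const { winner, losers } = pickWinner(revisions);
  const years = revisions.map((r) => r.year).sort();
  return {
    merged: Object.assign({}, winner, { year: years[years.length - 1] }),
    revsToDelete: losers.map((r) => r._rev),
  };
}

const conflicting = [
  { _id: "1050", _rev: "1-aaa", title: "RJUG Roundup", artist: "Rob", year: "2010" },
  { _id: "1050", _rev: "1-bbb", title: "RJUG Roundup", artist: "Rob", year: "2011" },
];
const resolution = resolve(conflicting);
console.log(resolution.merged._rev, resolution.merged.year, resolution.revsToDelete);
```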

Database Replication

• “CouchDB has built-in conflict detection and management and the replication process is incremental and fast, copying only documents and individual fields changed since the previous replication.”

• Replication is a unidirectional process.
• Databases in CouchDB have a sequence number that gets incremented every time the database is changed.
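
The role of the sequence number in incremental replication can be illustrated with a toy model. This is a sketch under simplifying assumptions, not CouchDB's replicator: createDb, write, and replicate are hypothetical helpers, whole documents are copied, and revisions and conflicts are ignored.

```javascript
// Toy model of a database with an update sequence (not CouchDB's storage format).
function createDb() {
  return { seq: 0, docs: new Map() };
}

function write(db, id, body) {
  db.seq += 1; // every change bumps the database sequence number
  db.docs.set(id, { body: body, seq: db.seq });
}

// One-way replication: copy only documents changed after the last checkpoint.
function replicate(source, target, checkpoint) {
  let copied = 0;
  for (const [id, entry] of source.docs) {
    if (entry.seq > checkpoint) {
      write(target, id, entry.body);
      copied += 1;
    }
  }
  return { copied: copied, checkpoint: source.seq };
}

const source = createDb();
const backup = createDb();
write(source, "1000", { title: "Abbey Road" });
write(source, "1010", { title: "Let It Be" });
const first = replicate(source, backup, 0);
write(source, "1050", { title: "RJUG Roundup" });
// The second run starts from the stored checkpoint, so only 1050 is copied.
const second = replicate(source, backup, first.checkpoint);
console.log(first, second, backup.docs.size);
```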

Replication Continued

• "continuous": true… automatically replicate over any new docs as they come into the source to the target… there’s a complex algorithm determining the ideal moment to replicate for maximum performance.

• Create albums_backup using the Futon replicator

• curl -X PUT http://127.0.0.1:5984/albums/1010 -d '{"title":"Let It Be","artist":"The Beatles"}'

Replication & Conflict

• Replicate albums db via Futon

• curl -X PUT http://127.0.0.1:5984/albums/1050 -d '{"title":"RJUG Roundup","artist":"Rob","year":"2010"}'

• Replicate again

• curl -X PUT http://127.0.0.1:5984/albums_backup/1050 -d '{"title":"RJUG Roundup","artist":"Rob","year":"2011"}'

• Replicate, review

Notifications

• Polling, long polling: _changes

• If not executing from a browser, can request continuous changes

• Filters can be applied to changes
  • E.g. only notify when level = error

• filterName: function(doc, req)
  • req contains query parameters
  • Also contains userCtx
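
A filter for the "only notify when level = error" case might be sketched as follows. The (doc, req) signature is from the slide; the level field, the level query parameter, and the filterChanges harness that stands in for a filtered _changes feed are assumptions for illustration.

```javascript
// Sketch of a filter function as it might appear in a design document's
// "filters" section. The "level" field and query parameter are assumptions.
function byLevel(doc, req) {
  const wanted = (req.query && req.query.level) || "error";
  return doc.level === wanted;
}

// Local stand-in for a filtered _changes feed: keep only the changes whose
// document passes the filter.
function filterChanges(changes, filterFn, req) {
  return changes.filter((change) => filterFn(change.doc, req));
}

const changes = [
  { seq: 1, id: "log-1", doc: { _id: "log-1", level: "info" } },
  { seq: 2, id: "log-2", doc: { _id: "log-2", level: "error" } },
  { seq: 3, id: "log-3", doc: { _id: "log-3", level: "warn" } },
];
const req = { query: { level: "error" }, userCtx: { name: null, roles: [] } };
const notified = filterChanges(changes, byLevel, req);
console.log(notified.map((c) => c.id));
```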

Security

• Ships with OAuth and cookie auth handlers; the default is standard HTTP authentication

• Authorizations
  • Reader - read/write document
  • Database Admin - compact, add/edit views
  • Server Admin - create and remove databases

CouchDB Applied

• CouchOne
  • Hosting services
  • CouchDB on Android

• CouchApp
  • HTML5 applications

• jCouchDB
  • Java layer for CouchDB access

• CouchDB Lounge
  • Clustering support

Links

• http://couchdb.apache.org/
• http://wiki.apache.org/couchdb/FrontPage
• http://guide.couchdb.org/editions/1/en/index.html

Questions?

• Thanks!