NoSQL and CouchDB
Who am I?
-> My Name: João Cerdeira
-> Team Leader
-> An Agile enthusiast: Scrum / Kanban / Lean
-> A true believer in Open Source
http://twitter.com/jacerdeira [email protected]
Disclaimer
-> I understand your questions, but sometimes I don't have answers
-> I'm not a NoSQL dogmatist, just an enthusiast about the new ways of storing information
-> I have worked with RDBMS for 12 years
Everyone has their preferences
I don't care whether you or I use SQL or NoSQL. I just want to deliver better services/applications to the clients/users.
Concepts & Theory
Scale Up vs Scale Out
Performance VS Scalability
Latency VS Throughput
Availability VS Consistency
Brewer's CAP Theorem
Choose only 2:
Consistency
Availability
Partition Tolerance
At a given time, in a given environment
Consistency
Availability
Partition Tolerance
RDBMS
NoSQL
Centralized System
In a centralized system (RDBMS) there are no network partitions (the P in CAP)
So we get:
Availability
Consistency
ACID:
-> Atomicity
-> Consistency
-> Isolation
-> Durability
Distributed System
In a distributed system we (might) have network partitions (the P in CAP)
So you can only pick one:
Availability
Consistency
CAP in practice
We have only two types of systems:
-> CP == CA (very similar)
-> AP
So during a network partition we have only one choice:
Consistency
Availability
BASE:
-> Basically Available
-> Soft state
-> Eventually consistent
Eventual Consistency
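A toy illustration of eventual consistency, as a minimal sketch only: two in-memory replicas, where a write lands on one of them and the other catches up later. The names `Replica` and `sync` are made up for this example and are not part of any database API.

```javascript
// Toy model: each replica keeps key -> { value, version }.
class Replica {
  constructor() {
    this.data = new Map();
  }
  write(key, value) {
    const current = this.data.get(key);
    const version = current ? current.version + 1 : 1;
    this.data.set(key, { value, version });
  }
  read(key) {
    const entry = this.data.get(key);
    return entry ? entry.value : undefined;
  }
}

// One-way catch-up: copy every entry that is newer on `from` over to `to`.
function sync(from, to) {
  for (const [key, entry] of from.data) {
    const other = to.data.get(key);
    if (!other || other.version < entry.version) {
      to.data.set(key, { ...entry });
    }
  }
}

const a = new Replica();
const b = new Replica();
a.write("color", "red");
const staleRead = b.read("color"); // b has not seen the write yet
sync(a, b);
const freshRead = b.read("color"); // after syncing, both replicas agree
```

Between the write and the sync, a reader of `b` gets an old answer. That window is the price paid for staying available.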
How to Scale Out an RDBMS?
Image source: http://capellaniaprimaria.blogspot.com/2011/02/concurso-deportivo-4-pregunta.html
Partition
Partition + Replication
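A rough sketch of what "partition + replication" means in code, assuming simple hash-based partitioning. The helpers `hashKey`, `shardFor` and `nodesFor` and the node names are hypothetical. Real systems usually use more careful schemes such as consistent hashing.

```javascript
// Hypothetical string hash, kept as an unsigned 32-bit integer.
function hashKey(key) {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0;
  }
  return h;
}

// Partition: every key belongs to exactly one shard.
function shardFor(key, shardCount) {
  return hashKey(key) % shardCount;
}

// Replication: the shard is stored on `replicas` consecutive nodes,
// wrapping around the end of the node list.
function nodesFor(key, nodes, replicas) {
  const first = shardFor(key, nodes.length);
  const result = [];
  for (let i = 0; i < replicas; i++) {
    result.push(nodes[(first + i) % nodes.length]);
  }
  return result;
}

const nodes = ["node0", "node1", "node2", "node3"];
const owners = nodesFor("joaocerdeira", nodes, 2); // two nodes hold this key
```

Writes for a key go to its owner nodes only, so adding nodes spreads the write load. The replicas give extra read capacity and a copy to fall back on.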
ORM Problems
What do you want?
Find/read a record/object
What do you get?
A huge underground complexity
Let's Validate Our Thoughts
Do we need ACID for all solutions?
When is eventual consistency enough?
Different needs require different solutions
Why Did NoSQL Appear?
Because new drivers appeared
(business or technical demands)
New Drivers Behind NoSQL
-> Large amounts of data
-> Commodity hardware
-> Scale fast and cheap
-> Constantly changing requirements (data)
Why aren't RDBMS good enough?
Scaling reads in an RDBMS is hard
Scaling writes is close to impossible without partitioning
Think again
Do we really need an RDBMS?
Sometimes!
But a lot of the time we don't!
NoSQL
How did NoSQL start?
-> Google: Bigtable
-> Amazon: Dynamo
-> Facebook: Cassandra
-> LinkedIn: Voldemort
-> Yahoo: HBase (Hadoop)
Origins
Google: “How can we build a DB on top of Google File System?”
-> Paper: Bigtable: A Distributed Storage System for Structured Data, 2006
Amazon: “How can we build a distributed hash table for the data center?”
-> Paper: Dynamo: Amazon's Highly Available Key-value Store
Different Types of NoSQL
Key-Value Stores
Document Databases
Column Databases
Graph Databases
Key-Value Stores
Origin: Amazon's Dynamo paper
Data model: Collections of KV pairs
Implementations: Dynamo, Voldemort, Membase, Riak, Redis
Good For:
- Large amounts of data
- Scaling writes and reads
- Fast
- Programmer friendly
Document Databases
Origin: Lotus Notes
Data model: Collections of documents
Implementations: CouchDB, MongoDB, Amazon SimpleDB
Good For:
- Human data structures
- Programmer friendly
- Rapid development
- Web friendly
- CRUD
Column Databases
Origin: Google's Bigtable paper
Data model: Column family; each row (at least in theory) can have a different configuration
Implementations: Bigtable, HBase, Cassandra
Good For:
- Large amounts of data
- Scales writes like no other
- High availability
Graph Databases
Origin: Graph theory
Data model: Nodes and relations, both can have KV pairs
Implementations: Neo4j, FlockDB
Good For:
- Solving graph problems
- Fast
Why would I choose CouchDB?
-> Easy-to-understand documents
-> Uses standard web technologies
-> Simple to install and configure
-> Small footprint (works on mobile platforms)
-> Scales well (not for huge amounts of data)
-> Replication in the core
CouchDB Main Principles
Document Oriented Database
No rows or columns
Collection of JSON Documents
Schema-Free
In CouchDB, HTTP Rules
-> Everything is an HTTP request
-> We are used to GET and POST
-> But there are others:
   -> PUT
   -> DELETE
   -> COPY
RESTful HTTP API
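To make the verb list concrete, here is a minimal sketch that maps document operations onto HTTP requests. `docRequest` is a hypothetical helper that only builds a plain description of the request and sends nothing. The base address is the local one used in this deck's curl sessions.

```javascript
// Assumed local CouchDB address, as in the curl sessions of this deck.
const BASE = "http://127.0.0.1:5984";

// Hypothetical helper: describe the HTTP request for one document operation.
function docRequest(op, db, id, options = {}) {
  const url = `${BASE}/${encodeURIComponent(db)}/${encodeURIComponent(id)}`;
  switch (op) {
    case "read":
      return { method: "GET", url };
    case "save":
      return { method: "PUT", url, body: JSON.stringify(options.doc || {}) };
    case "delete":
      // Deleting needs the current revision of the document.
      return { method: "DELETE", url: `${url}?rev=${encodeURIComponent(options.rev)}` };
    case "copy":
      // COPY names the new document id in a Destination header.
      return { method: "COPY", url, headers: { Destination: options.destination } };
    default:
      throw new Error(`unknown operation: ${op}`);
  }
}

const saveReq = docRequest("save", "contacts", "joaocerdeira", { doc: { firstName: "Joao" } });
const copyReq = docRequest("copy", "contacts", "joaocerdeira", { destination: "batatinha" });
```

Each operation is just a verb plus a URL, which is why generic HTTP tools work against CouchDB unchanged.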
Why JSON?
-> Light, text-based data format
-> Simple to parse
-> Not verbose (compared to XML)
-> Suitable for JavaScript frameworks (jQuery)
-> Parsers available in almost all programming languages
JSON Example
{
  "make": "Ford",
  "model": "Mustang",
  "year": 2009,
  "body": "Coupe",
  "color": "Red",
  "engine": {
    "gas_type": "Petrol",
    "cubic_capacity": 4600
  },
  "previous_owners": [
    { "name": "John Smith", "mileage": 1000 },
    { "name": "Jane Hunt", "mileage": 2500 }
  ]
}
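A short sketch of how little code it takes to consume such a document. `JSON.parse` accepts strict JSON only, so the keys are double-quoted in the sample text below. The variable names are just for illustration.

```javascript
// The car document from the slide, written as strict JSON text.
const text = `{
  "make": "Ford",
  "model": "Mustang",
  "year": 2009,
  "engine": { "gas_type": "Petrol", "cubic_capacity": 4600 },
  "previous_owners": [
    { "name": "John Smith", "mileage": 1000 },
    { "name": "Jane Hunt", "mileage": 2500 }
  ]
}`;

const car = JSON.parse(text);

// Nested objects and arrays become plain JavaScript values.
const capacity = car.engine.cubic_capacity;
const ownerNames = car.previous_owners.map((owner) => owner.name);
const totalMileage = car.previous_owners.reduce((sum, owner) => sum + owner.mileage, 0);
```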
Example
Create / Delete Database
$ curl http://127.0.0.1:5984
{"couchdb":"Welcome","version":"1.0.1"}
$ curl -X PUT http://127.0.0.1:5984/contacts
{"ok":true}
$ curl -X GET http://127.0.0.1:5984/_all_dbs
["contacts","_users"]
$ curl -X DELETE http://127.0.0.1:5984/contacts
{"ok":true}
Manage Documents
$ curl -X PUT http://127.0.0.1:5984/contacts/joaocerdeira -d '{}'
{"ok":true,"id":"joaocerdeira","rev":"1-967a00dff5e02add41819138abb3284d"}
$ curl -X GET http://127.0.0.1:5984/contacts/joaocerdeira
{"_id":"joaocerdeira","_rev":"1-967a00dff5e02add41819138abb3284d"}
$ curl -X DELETE "http://127.0.0.1:5984/contacts/joaocerdeira?rev=1-967a00dff5e02add41819138abb3284d"
{"ok":true,"id":"joaocerdeira","rev":"2-eec205a9d413992850a6e32678485900"}
Manage Documents
$ curl -X PUT http://127.0.0.1:5984/contacts/joaocerdeira -d '{"firstName":"Joao","lastName":"Cerdeira","email":"[email protected]"}'
{"ok":true,"id":"joaocerdeira","rev":"1-186fe12b748c40559e8f234d8e566c18"}
$ curl -X GET http://127.0.0.1:5984/contacts/joaocerdeira
{"_id":"joaocerdeira","_rev":"1-186fe12b748c40559e8f234d8e566c18","firstName":"Joao","lastName":"Cerdeira","email":"[email protected]"}
Copy Documents
$ curl -X COPY http://127.0.0.1:5984/contacts/joaocerdeira -H "Destination: batatinha"
{"id":"batatinha","rev":"1-186fe12b748c40559e8f234d8e566c18"}
$ curl -X GET http://127.0.0.1:5984/contacts/batatinha
{"_id":"batatinha","_rev":"1-186fe12b748c40559e8f234d8e566c18","firstName":"Joao","lastName":"Cerdeira","email":"[email protected]"}
Changing Documents
$ curl -X PUT http://127.0.0.1:5984/contacts/batatinha -d '{"_rev":"1-186fe12b748c40559e8f234d8e566c18","firstName":"Clown","lastName":"Batatinha","email":["[email protected]","[email protected]@rtp.pt"], "phone":"93 1234567"}'
{"ok":true,"id":"batatinha","rev":"2-b7079a6d71179b1571652059355d84c3"}
$ curl -X GET http://127.0.0.1:5984/contacts/batatinha
{"_id":"batatinha","_rev":"2-b7079a6d71179b1571652059355d84c3","firstName":"Clown","lastName":"Batatinha","email":["[email protected]","[email protected]@rtp.pt"], "phone":"93 1234567"}
MVCC (Multi-Version Concurrency Control)
CouchDB never blocks
Append-only storage
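A minimal in-memory sketch of the idea behind MVCC as CouchDB exposes it: writers never take locks, and an update must name the revision it was based on. `TinyStore` is a made-up class, and its revision strings ("1-x", "2-x") are simplified stand-ins for CouchDB's "N-hash" format. A real server answers a stale update with HTTP 409 Conflict.

```javascript
// Toy store illustrating optimistic concurrency with revisions.
class TinyStore {
  constructor() {
    this.docs = new Map();
  }

  put(id, doc) {
    const current = this.docs.get(id);
    // An update is accepted only if it is based on the latest revision.
    if (current && current._rev !== doc._rev) {
      return { error: "conflict" };
    }
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const stored = { ...doc, _id: id, _rev: `${generation}-x` };
    this.docs.set(id, stored);
    return { ok: true, id, rev: stored._rev };
  }
}

const store = new TinyStore();
const created = store.put("batatinha", { firstName: "Joao" });
const stale = store.put("batatinha", { firstName: "Other" }); // no _rev given
const updated = store.put("batatinha", { _rev: created.rev, firstName: "Clown" });
```

The loser of a race is told about it and can re-read and retry. Nobody waits on a lock.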
Designing Documents
{
  "_id": "joaocerdeira",
  "_rev": "1-186fe12b748c40559e8f234d8e566c18",
  "doctype": "contact",
  "firstName": "Joao",
  "lastName": "Cerdeira",
  "company": "MULTICERT",
  "emails": [
    { "type": "personal", "email": "[email protected]" },
    { "type": "business", "email": "[email protected]" }
  ],
  "phones": [
    { "type": "personal", "phone": "93 1234567" },
    { "type": "business", "phone": "93 7654321" }
  ]
}
Futon Web Interface
Views
Querying CouchDB
Queries are written in JavaScript
Use Map/Reduce for querying
For simple queries Map/Reduce isn't needed
No joins (but you can get something similar)
Simple Views
List All Documents
function(doc) {
  emit(doc._id, doc);
}
List All Documents of Type 'vip'
function(doc) {
  if (doc.type == 'vip') {
    emit(doc._id, doc);
  }
}
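To see what a map function does without a server, the following sketch runs one over an in-memory array. `runMap` is a hypothetical harness: CouchDB provides `emit` as a global, while here it is passed in as a second argument, and rows are sorted with a plain string comparison rather than CouchDB's own collation rules. The sample documents are invented.

```javascript
// Hypothetical mini-harness: apply a map function to every document
// and collect the emitted rows, sorted by key.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Invented sample documents.
const docs = [
  { _id: "b", type: "vip" },
  { _id: "a", type: "regular" },
  { _id: "c", type: "vip" },
];

// Same logic as the two views above.
const allDocs = runMap((doc, emit) => emit(doc._id, doc), docs);
const vipOnly = runMap((doc, emit) => {
  if (doc.type === "vip") {
    emit(doc._id, doc);
  }
}, docs);
```

A view is therefore just the sorted list of whatever the map function chose to emit. Documents that emit nothing do not appear.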
Temp Views
$ curl -X POST -H "Content-type: application/json" http://127.0.0.1:5984/contacts/_temp_view -d '{"map":"function(doc){emit(doc._id,doc);}"}'
{"total_rows":2,"offset":0,"rows":[
{"id":"batatinha","key":"batatinha","value":{"_id":"batatinha","_rev":"2-b7079a6d71179b1571652059355d84c3","firstName":"Palhaco","lastName":"Batatinha","email":["[email protected]","[email protected]@rtp.pt"],"phone":"93 1234567"}},
{"id":"joaocerdeira","key":"joaocerdeira","value":{"_id":"joaocerdeira","_rev":"1-186fe12b748c40559e8f234d8e566c18","firstName":"Joao","lastName":"Cerdeira","email":"[email protected]","_deleted_conflicts":["2-eec205a9d413992850a6e32678485900"]}}
]}
Normal Views
design_simple1.json:
{
  "_id": "_design/example",
  "views": {
    "foo": {
      "map": "function(doc){emit(doc._id,doc);}"
    }
  }
}
$ curl -X PUT -H "Content-type: application/json" http://127.0.0.1:5984/contacts/_design/example -d @design_simple1.json
Normal Views
$ curl -X GET http://127.0.0.1:5984/contacts/_design/example/_view/foo
{"total_rows":2,"offset":0,"rows":[
{"id":"batatinha","key":"batatinha","value":{"_id":"batatinha","_rev":"2-b7079a6d71179b1571652059355d84c3","firstName":"Palhaco","lastName":"Batatinha","email":["[email protected]","[email protected]@rtp.pt"],"phone":"93 1234567"}},{"id":"joaocerdeira","key":"joaocerdeira","value":{"_id":"joaocerdeira","_rev":"1-186fe12b748c40559e8f234d8e566c18","firstName":"Jo\u00e3o","lastName":"Cerdeira","email":"[email protected]","_deleted_conflicts":["2-eec205a9d413992850a6e32678485900"]}}]}
Map/Reduce
Google patent, from the paper: http://labs.google.com/papers/mapreduce.html
Image source: http://map-reduce.wikispaces.asu.edu/
Map/Reduce Views
{
  "_id": "_design/example",
  "views": {
    …...................................
    "bar": {
      "map": "function(doc){emit(doc,1);}",
      "reduce": "function(keys, values, rereduce) { return sum(values); }"
    }
  }
}
$ curl -X GET http://127.0.0.1:5984/contacts/_design/example/_view/bar
{"rows":[{"key":null,"value":7}]}
Map/Reduce Views
{
  "_id": "_design/example",
  "views": {
    …...................................
    "aggreg": {
      "map": "function(doc){if(doc.country){emit(doc.country,1);}}",
      "reduce": "function(keys, values, rereduce) {return sum(values);}"
    }
  }
}
$ curl -X GET 'http://127.0.0.1:5984/contacts/_design/example/_view/aggreg?group=true'
{"rows":[{"key":"England","value":1},{"key":"Portugal","value":2},{"key":"US","value":2}]}
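What `group=true` does can be imitated in a few lines. This is only a sketch: `groupReduce` is a hypothetical helper, the rows stand for what a map function like `emit(doc.country, 1)` would produce, and the re-reduce step that CouchDB performs on large views is left out.

```javascript
// Hypothetical helper: group map rows by key, then reduce each group.
function groupReduce(rows, reduceFn) {
  const groups = new Map();
  for (const { key, value } of rows) {
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(value);
  }
  return [...groups.keys()].sort().map((key) => ({
    key,
    value: reduceFn(null, groups.get(key), false),
  }));
}

// Stand-in for the sum() helper available inside CouchDB reduce functions.
const sum = (values) => values.reduce((a, b) => a + b, 0);

// Rows as the map step would emit them: one row per contact, keyed by country.
const mapRows = ["Portugal", "US", "England", "Portugal", "US"].map((country) => ({
  key: country,
  value: 1,
}));

const grouped = groupReduce(mapRows, (keys, values, rereduce) => sum(values));
```

With these five rows the result has one entry per country, matching the shape of the response shown above. Without grouping, everything collapses into a single total.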
Replication
[Diagram labels: a single Write node feeding one, then two, then three Read nodes]
One Time Replication
$ curl -H "Content-type: application/json" -X POST http://127.0.0.1:5984/_replicate -d '{"source":"contacts","target":"contacts-replica"}'
{"ok":true,"session_id":"00872a440fdda973d6a9a18f2f571bb8","source_last_seq":19,"history": [{"session_id":"00872a440fdda973d6a9a18f2f571bb8","start_time":"Tue, 05 Jul 2011 23:03:32 GMT","end_time":"Tue, 05 Jul 2011 23:03:32 GMT","start_last_seq":0,"end_last_seq":19,"recorded_seq":19,"missing_checked":0,"missing_found":8,"docs_read":12,"docs_written":12,"doc_write_failures":0}]}
[Diagram labels: Write, Write]
Continuous Replication
$ curl -vX POST http://127.0.0.1:5984/_replicate -d '{
  "source":"http://127.0.0.1:5984/contacts",
  "target":"http://127.0.0.1:5984/contacts-replica",
  "continuous":true
}'
[Diagram labels: Write / Write, then Read / Write, then Write / Write / Read]
Load Balancing / Caching
It's HTTP, so use the tools you know:
-> NGINX
-> Squid
-> Apache mod_proxy
-> ...
Library
Conflict Resolution
http://thetowersofjacksonville.com/photogallery/photo12411/real.htm
Conflict Resolution
function(doc) {
  if (doc._conflicts) {
    emit(doc._conflicts, null);
  }
}
{"total_rows":1,"offset":0,"rows":[{"id":"identifier","key":["2-7c971bb974251ae8541b8fe045964219"],"value":null}]}
$ curl -X DELETE "$HOST/db-replica/identifier?rev=2-de0ea16f8621cbac506d23a0fbbde08a"
{"ok":true,"id":"identifier","rev":"3-bfe83a296b0445c4d526ef35ef62ac14"}
$ curl -X PUT $HOST/db-replica/identifier -d '{"count":3,"_rev":"2-7c971bb974251ae8541b8fe045964219"}'
{"ok":true,"id":"identifier","rev":"3-5d0319b075a21b095719bc561def7122"}
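The session above resolves the conflict by hand: delete the losing revision, then write the merged document. The decision itself is application logic, and a sketch of it might look like this. `planResolution` is a hypothetical helper, and the revision ids and counts in the sample are invented; they are not the ones from the session.

```javascript
// Hypothetical helper: given every competing version of one document,
// keep the version the application prefers and list the revisions to delete.
function planResolution(versions, prefer) {
  const keep = versions.reduce((best, v) => (prefer(v, best) ? v : best));
  const remove = versions.filter((v) => v !== keep).map((v) => v._rev);
  return { keep, remove };
}

// Invented sample: two conflicting versions of the same counter document.
const versions = [
  { _id: "identifier", _rev: "2-aaa", count: 2 },
  { _id: "identifier", _rev: "2-bbb", count: 3 },
];

// Example rule (an assumption, not CouchDB behavior): the higher count wins.
const plan = planResolution(versions, (a, b) => a.count > b.count);
```

Each revision in `plan.remove` would then be deleted with a DELETE request carrying that `rev`, as in the curl call above. Any other rule (newest timestamp, field-by-field merge) fits the same shape.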
Library
http://thetowersofjacksonville.com/photogallery/photo12411/real.htm
Clients
JavaScript: jQuery CouchDB Library
.NET: Relax
Java: CouchDB4J
Perl: CouchDB::Client, Net::CouchDb
Ruby: CouchRest
Python: couchdb-python
Scala: scouchdb
And so many more ...
CouchDB in Mobile
http://www.digitaljournal.com/article/261153
Mobile Platforms Supported
Simply Works
PhoneGap, Lawnchair
Own Your Data
I like services like Google, but what about my privacy?!
I think CouchDB is the way to own my data
http://thetowersofjacksonville.com/photogallery/photo12411/real.htm
Partition with Cluster Solutions
“CouchDB is built of the Web”
– Jacob Kaplan-Moss
We need a Mindset Change
Stop seeing all the data in the world as relational data
Don't trust me . . . or others
Try it!
And the Future…
It will probably be polyglot:
using an RDBMS and more than one NoSQL database per solution
Success Stories