Advanced CouchDB


http://joind.in/2495 PHPBenelux conference January 2011


CouchDBrelax


Sander van de Graaf, @svdgraaf

Focus -> practical usage examples

http://joind.in/talk/view/2495

second talk ever, please provide feedback

CONTENTS

• Introduction

• PHP Usage

• Replication/Scalability

• Backend usage

• Couchapps

• Other stuff

NOSQL

IT’S A MOVEMENT

Movement, definitions vary

1998

Back in the day...

Lame movie 1

Another one

And then some more...

XML was introduced

Some game was published

McDonald’s Happy Meal

Carlo Strozzi

Released NOSQL open source DB

NOSQL == Not Only SQL

“[The NoSQL movement] departs from the relational model altogether, it should therefore have been called more appropriately ‘NoREL’, or something to that effect.”

- Carlo Strozzi


Ubuntu One, contacts sync

NUTSHELL

SPEED

Speed, not disk space (see cleanup)

APPEND ONLY

Append only storage, happy cup of coffee!

NO REPAIR NEEDED

COMPACTING

HTTP SERVER

caching, loadbalancing, without extra costs :D

CAP


EVENTUALLY CONSISTENT


CouchDB focus is on Availability + Reliability, and will be consistent after replication.

FULL REST API

REST

HTTP verbs:

• GET

• PUT

• POST

• DELETE

• COPY

SQL statements:

• SELECT

• UPDATE

• INSERT

• DELETE

• ...
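To make the verb list concrete, here is a minimal sketch of how document operations could line up with HTTP requests. The helper name `couchRequest`, the operation names, and the database and document names are illustrative assumptions; the function only builds a description of the request and sends nothing.

```javascript
// Hypothetical helper: describe the HTTP request for a document operation.
// It only builds a plain object; no network call is made.
function couchRequest(op, db, id, extra) {
  const path = "/" + encodeURIComponent(db) + "/" + encodeURIComponent(id);
  switch (op) {
    case "read":
      return { method: "GET", path: path };
    case "save":
      return { method: "PUT", path: path, body: JSON.stringify(extra) };
    case "delete":
      // CouchDB needs the current revision to delete a document.
      return { method: "DELETE", path: path + "?rev=" + encodeURIComponent(extra) };
    case "copy":
      // COPY is a non-standard verb; the target ID goes in a header.
      return { method: "COPY", path: path, headers: { Destination: extra } };
    default:
      throw new Error("unknown operation: " + op);
  }
}

console.log(couchRequest("read", "rainbows", "awesome_pony"));
console.log(couchRequest("save", "rainbows", "awesome_pony", { type: "awesome" }));
```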

JSON

{
  "total_rows": 2,
  "offset": 0,
  "rows": [
    {
      "id": "_design/foobar",
      "key": "_design/foobar",
      "value": { "rev": "5-982b2fc36835715b2aae54609b5d5f1e" }
    },
    {
      "id": "f0e1fd96eb6e094f74dda8d949000a6a",
      "key": "f0e1fd96eb6e094f74dda8d949000a6a",
      "value": { "rev": "1-86bca407fce8234a63c90ff549b56b10" }
    }
  ]
}

Javascript == awesome! :D

REPLICATION

Key feature, relaxed about replication issues, and version conflicts

Welcome to Futon, I prefer a UI

http-console rocks the socks out of telnet

Berkeley


PHP USAGE

PHP LIBRARIES

• PHPillow (LGPL)

• PHP Object Freezer (BSD)

• PHP On Couch (GPL 2 / 3)

• PHP CouchDB Extension (PHP license)

• Sag for CouchDB (Apache)

• Doctrine 2 CouchDB ODM

All are quite nice; Doctrine has some rough edges. I use PHP On Couch with a custom patch for easier Zend autoloading.

<?php

// set up the connection to CouchDB
$client = new Couchdb_Client('http://ponies.couchone.com:5984', 'rainbows');

// fetch a document
$doc = $client->getDoc('awesome_pony');

// update the document
$doc->newproperty = array("type", "awesome");

try {
    $client->storeDoc($doc);
} catch (Exception $e) {
    echo "Document storage failed: " . $e->getMessage();
}

PHP On Couch with small ZF autoloader fix


REPLICATION

DEFINITION

“Replication is the process of sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.”

Source: wikipedia


MySQL can do this


Master, Master replication
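CouchDB replication runs in one direction, so a master-master setup comes down to two replications, one each way. A minimal sketch of the two JSON bodies that could be POSTed to `/_replicate` follows; the host names and the helper name `masterMasterBodies` are placeholders, not taken from the talk.

```javascript
// Build the JSON bodies for two POST /_replicate calls, one per direction.
// The URLs used below are placeholders, not real servers.
function masterMasterBodies(urlA, urlB, continuous) {
  const make = (source, target) => ({
    source: source,
    target: target,
    continuous: Boolean(continuous), // keep replicating as new changes arrive
  });
  return [make(urlA, urlB), make(urlB, urlA)];
}

const bodies = masterMasterBodies(
  "http://nl.example.com:5984/rainbows",
  "http://us.example.com:5984/rainbows",
  true
);
console.log(JSON.stringify(bodies, null, 2));
```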


Not only locally: replication between locations (US, NL, BE)

P2P WEB

“World Domination”

CLUSTERING

“The fun stuff”

CouchDB doesn’t support partitioning (sharding) itself, but because CouchDB is HTTP based, there are lots of possibilities

The basics are all the same, and easy: CouchDB instances 1..n behind a loadbalancer

CHALLENGES

• Large amounts of data

• Large views (with big/long map/reduce queries)

• LOTS of traffic

• Location based partitions

• For fun and profit

MAP/REDUCE

INPUT

IP Bytes

212.122.174.13 18271

212.122.174.13 191726

212.122.174.13 198

74.119.8.111 91272

74.119.8.111 8371

212.122.174.13 43

Map/Reduce example

MAPPER => REDUCER

IP Bytes

212.122.174.13 18271

212.122.174.13 191726

212.122.174.13 198

212.122.174.13 43

74.119.8.111 91272

74.119.8.111 8371

AFTER REDUCE

IP Bytes

212.122.174.13 210238

74.119.8.111 99643
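The example above can be sketched with CouchDB-style map and reduce functions. This is a local illustration only: in CouchDB `emit` is a global provided by the view server, while here it is passed in by a small hypothetical harness (`mapReduce`), and the field names `ip` and `bytes` are assumptions.

```javascript
// Input rows from the example above.
const rows = [
  { ip: "212.122.174.13", bytes: 18271 },
  { ip: "212.122.174.13", bytes: 191726 },
  { ip: "212.122.174.13", bytes: 198 },
  { ip: "74.119.8.111", bytes: 91272 },
  { ip: "74.119.8.111", bytes: 8371 },
  { ip: "212.122.174.13", bytes: 43 },
];

// Map: emit one (key, value) pair per document.
function map(doc, emit) {
  emit(doc.ip, doc.bytes);
}

// Reduce: sum all values that share a key.
function reduce(keys, values) {
  return values.reduce((a, b) => a + b, 0);
}

// Tiny local harness: group emitted values by key, then reduce each group.
function mapReduce(docs) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const result = {};
  for (const [key, values] of groups) result[key] = reduce(null, values);
  return result;
}

console.log(mapReduce(rows));
// { '212.122.174.13': 210238, '74.119.8.111': 99643 }
```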

PARTITION INPUT

Partition IP Bytes

0 212.122.174.13 18271

0 212.122.174.13 191726

0 212.122.174.13 198

1 74.119.8.111 91272

1 74.119.8.111 8371

0 212.122.174.13 43

Map/Reduce example

MAPPER => REDUCER

Partition IP Bytes

0 212.122.174.13 18271

0 212.122.174.13 191726

0 212.122.174.13 198

0 212.122.174.13 43

1 74.119.8.111 91272

1 74.119.8.111 8371

If data is big enough, you could even need a re-re-re-reducer

AFTER REDUCE

IP Bytes

212.122.174.13 210238

74.119.8.111 99643
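With partitions, each node reduces its own rows first and a re-reducer then merges the partial results. A minimal sketch of those two steps on the partitioned input above; the function names `reducePartition` and `rereduce` are illustrative and not part of any of the clustering tools.

```javascript
// Partitioned input from the example above: [partition, IP, bytes].
const partitioned = [
  [0, "212.122.174.13", 18271],
  [0, "212.122.174.13", 191726],
  [0, "212.122.174.13", 198],
  [1, "74.119.8.111", 91272],
  [1, "74.119.8.111", 8371],
  [0, "212.122.174.13", 43],
];

// Summing works for both passes: raw values first, partial sums afterwards.
function sum(values) {
  return values.reduce((a, b) => a + b, 0);
}

// Step 1: one partition reduces its own rows to a partial total per IP.
function reducePartition(partitionRows) {
  const byIp = {};
  for (const [, ip, bytes] of partitionRows) (byIp[ip] = byIp[ip] || []).push(bytes);
  const partial = {};
  for (const ip of Object.keys(byIp)) partial[ip] = sum(byIp[ip]);
  return partial;
}

// Step 2: the re-reducer merges the partial totals of all partitions.
function rereduce(partials) {
  const byIp = {};
  for (const partial of partials) {
    for (const ip of Object.keys(partial)) (byIp[ip] = byIp[ip] || []).push(partial[ip]);
  }
  const total = {};
  for (const ip of Object.keys(byIp)) total[ip] = sum(byIp[ip]);
  return total;
}

const partitions = [0, 1].map((p) => partitioned.filter((row) => row[0] === p));
const totals = rereduce(partitions.map(reducePartition));
console.log(totals);
```

If the partial results are themselves too large for one node, the same `rereduce` step can be applied again to the outputs of several re-reducers.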

CLUSTERING OPTIONS

• CouchDB Lounge

• Pillow

• BigCouch

LOUNGE

• Partitioning/clustering

• Nginx module

• meebo.com

• ‘easy’

• http://tilgovi.github.com/couchdb-lounge/

LOUNGE

• dumb_proxy => proxy for simple PUT/GET’s

• smart_proxy => proxy for map/reduce over shards

• replicator => updates all copies, redundantly

it can make sure that there are N copies of a document at every moment
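The idea behind routing simple GET/PUT requests to a shard can be sketched as hashing the document ID and picking a node. The hash function and shard URLs below are assumptions chosen for illustration; how Lounge actually maps documents to shards is not shown in the talk.

```javascript
// Simple deterministic string hash (illustrative only).
function hashId(id) {
  let h = 0;
  for (let i = 0; i < id.length; i++) {
    h = (h * 31 + id.charCodeAt(i)) >>> 0; // keep an unsigned 32-bit integer
  }
  return h;
}

// Pick the shard that owns a document: the same ID always maps to the same shard.
function pickShard(docId, shards) {
  return shards[hashId(docId) % shards.length];
}

// Placeholder shard URLs.
const shards = [
  "http://couch1.example.com:5984/rainbows",
  "http://couch2.example.com:5984/rainbows",
  "http://couch3.example.com:5984/rainbows",
];

console.log(pickShard("awesome_pony", shards));
```

A plain modulo like this reshuffles most documents when the number of shards changes, which is one reason real setups tend to use a fixed shard map or consistent hashing.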

[Diagram: nginx with dumb_proxy in front of CouchDB instances 1..n]

dumb_proxy == ONLY GET/PUT

[Diagram: nginx with smart_proxy in front of CouchDB instances 1..n]

smart_proxy takes care of the map/reduce and re-reducers over multiple nodes

Bonus:

other nginx modules work too

mod_cache, mod_expire, etc.

PILLOW

• Erlang based

• router/rereducer (map/reduce over multiple systems)

• In development (but promising!)

• https://github.com/khellan/Pillow

BIGCOUCH

• Fork

• 100% API compatible

• Open Source/Commercial

• https://cloudant.com/#!/solutions/bigcouch


BACKEND USAGE

PROXIED

proxied via middleware, or via mod_proxy or similar

DIRECT

or direct: because CouchDB is HTTP based, content is directly available in JavaScript

NOSQL && SQL HYBRID

• onSave, onCommit hooks available in every major framework

• onSave -> make a JSON representation of your object, and PUT it to couchdb (#protip: only ‘public’ data)

• the SQL database is leading, so you don’t care about versioning in CouchDB

• you can use your data directly from couchdb within your frontend javascript

MODEL

<?php
class Pony extends Application_models
{
    public function toArray()
    {
        $data = $this->_getData();
        unset($data['created_on']);
        unset($data['created_by']);
        unset($data['access_level']);
        unset($data['private_data']);
        $data['tags'] = $this->getTags();
        $data['categories'] = $this->getCategories();
        $data['rainbows'] = 'double';
        return $data;
    }
}

AFTER_SAVE

<?php
class article_module extends admin_module
{
    public function after_save()
    {
        parent::after_save();
        $data = $this->toJson();
        $res = CouchDB::put($data);
        $this->_id = $res->_id;
        $this->_rev = $res->_rev;
    }
}
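The same "only public data" idea can be sketched outside the framework: strip the private fields, then describe the PUT an after-save hook would send. The names `toPublicDoc` and `buildPut` and the sample record are hypothetical; the private field list mirrors the fields removed in the model above.

```javascript
// Fields that should never be copied to CouchDB.
const PRIVATE_FIELDS = ["created_on", "created_by", "access_level", "private_data"];

// Return a copy of the record without the private fields.
function toPublicDoc(record) {
  const doc = {};
  for (const key of Object.keys(record)) {
    if (!PRIVATE_FIELDS.includes(key)) doc[key] = record[key];
  }
  return doc;
}

// Describe the PUT request for the public document (nothing is sent here).
function buildPut(db, id, record, rev) {
  const doc = toPublicDoc(record);
  if (rev) doc._rev = rev; // CouchDB requires the current revision on updates
  return {
    method: "PUT",
    path: "/" + db + "/" + encodeURIComponent(id),
    body: JSON.stringify(doc),
  };
}

const pony = { name: "Sparkle", access_level: 3, private_data: "secret", rainbows: "double" };
console.log(buildPut("rainbows", "sparkle", pony));
```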

PROXY

RewriteEngine On
RewriteRule /data/(.*) http://127.0.0.1:5984/db/$1 [P,L]

Proxy the calls (to work around the sandbox/other-domain error), or use JSONP

JAVASCRIPT

<script type="text/javascript">
$.getJSON("/db/ponies/_design/ponies/_view/best-ponies?include_docs=true", function (res) {
    // loop over the rows by index and keep the variables local
    for (var i = 0; i < res.rows.length; i++) {
        var doc = res.rows[i].doc;
        // do stuff
    }
});
</script>


COUCHAPP

CouchDB has its own structure for “distributed, scalable web applications” called couchapps

“Distributed, scalable, web applications you say?

omgwtfbbq!?!1!!!11!1!eleven”

_attachments

the magic is in _attachments


distribution via replication

INSTALLATION

Couchapp 0.7.0

installation is easy

$ couchapp init

init a project

LAYOUT

creates a default folder

$ couchapp push http://ponies.couchone.com:5984/rainbows

https://github.com/brandon-beacher/couchapp-tmbundle

couchapp push on save -> textmate


OTHER STUFF

REWRITES

_REWRITE

such urls make us a sad panda

{
  ....
  "rewrites": [
    {
      "from": "/best-5-ponies",
      "to": "ponies/_view/best-ponies",
      "method": "GET",
      "query": {
        "descending": true,
        "limit": 5,
        "key": "foobar"
      }
    }
  ]
}

$ curl "http://ponies.couchone.com/rainbows/_design/ponies/_rewrite/best-5-ponies"

to this

[vhosts]
awesomeponies.com = /rainbows/_design/ponies/_rewrite

$ curl "http://ponies.couchone.com/rainbows/_design/ponies/_rewrite/best-5-ponies"

rewrite this
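What a rewrite rule does can be sketched as a lookup from the requested path to a target plus query string. This is a deliberately simplified, hypothetical matcher (exact path and method only, no wildcards or variables); CouchDB's own rewriter does more than this.

```javascript
// The rule from the design document above.
const rules = [
  {
    from: "/best-5-ponies",
    to: "ponies/_view/best-ponies",
    method: "GET",
    query: { descending: true, limit: 5, key: "foobar" },
  },
];

// Hypothetical matcher: return the rewritten target, or null if no rule matches.
function applyRewrite(ruleList, method, path) {
  for (const rule of ruleList) {
    if (rule.from === path && (!rule.method || rule.method === method)) {
      // View parameters are JSON encoded, so strings keep their quotes.
      const qs = Object.keys(rule.query || {})
        .map((k) => encodeURIComponent(k) + "=" + encodeURIComponent(JSON.stringify(rule.query[k])))
        .join("&");
      return qs ? rule.to + "?" + qs : rule.to;
    }
  }
  return null;
}

console.log(applyRewrite(rules, "GET", "/best-5-ponies"));
// ponies/_view/best-ponies?descending=true&limit=5&key=%22foobar%22
```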

_CHANGES

$ curl -X GET "http://ponies.couchone.com/rainbows/_changes"

{"results":[

],"last_seq":0}

$ curl -X PUT "http://ponies.couchone.com/rainbows/foobar" -d '{"type":"awesome"}'

{"results":[{"seq":1,"id":"foobar","changes":[{"rev":"1-aaa8e2a031bca334f50b48b6682fb486"}]}],"last_seq":1}

{"results":[{"seq":1,"id":"foobar","changes":[{"rev":"1-aaa8e2a031bca334f50b48b6682fb486"}]},{"seq":2,"id":"foobar2","changes":[{"rev":"1-e18422e6a82d0f2157d74b5dcf457997"}]}],"last_seq":2}

_CHANGES OPTIONS

• ?since

• Longpolling

• Continuous

$ curl -X GET "http://ponies.couchone.com/rainbows/_changes?since=20"

curl -X GET "http://ponies.couchone.com/rainbows/_changes?feed=longpoll&since=2"

Longpolling: good for infrequent updates. The connection stays open until a change arrives, then it is closed and you need to reconnect, so lots of updates mean lots of reconnects

curl -X GET "http://ponies.couchone.com/rainbows/_changes?feed=continuous&since=2"

The connection stays open, and you get updates on the fly!
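On a continuous feed each change arrives as one JSON object per line, and empty lines act as heartbeats. A minimal sketch of turning a received chunk into change objects follows; the function name `parseChanges` is hypothetical, the sample text reuses the changes shown earlier, and a real client would also need to buffer lines that are split across chunks.

```javascript
// Parse a chunk of text from a continuous _changes feed.
// Each non-empty line is one JSON object; empty lines are heartbeats.
function parseChanges(chunk) {
  return chunk
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

const sample =
  '{"seq":1,"id":"foobar","changes":[{"rev":"1-aaa8e2a031bca334f50b48b6682fb486"}]}\n' +
  "\n" +
  '{"seq":2,"id":"foobar2","changes":[{"rev":"1-e18422e6a82d0f2157d74b5dcf457997"}]}\n';

const changes = parseChanges(sample);

// Remember the last sequence number so a reconnect can resume with ?since=<seq>.
const lastSeq = changes.length ? changes[changes.length - 1].seq : null;
console.log(changes.map((c) => c.id), lastSeq);
```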

FILTERS

filters can be used to filter documents from output

function(doc, req)
{
    if (doc.priority == 'high') {
        return true;
    }
    return false;
}

we only want high priority documents

function(doc, req)
{
    if (doc.name == req.query.name) {
        return true;
    }
    return false;
}

you can use req for request based filters

curl -X GET "http://ponies.couchone.com/rainbows/_changes?feed=continuous&filter=app/name&name=foobar"
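To see what a request-based filter does, the function can be run locally against a few documents. The harness `applyFilter` and the sample documents are hypothetical; only the `(doc, req)` signature and `req.query` come from the filter above.

```javascript
// The request-based filter from above, given a name so it can be called locally.
function byName(doc, req) {
  return doc.name === req.query.name;
}

// Hypothetical harness: keep only the documents the filter accepts.
function applyFilter(filter, docs, query) {
  const req = { query: query };
  return docs.filter((doc) => filter(doc, req));
}

const docs = [
  { _id: "a", name: "foobar", priority: "high" },
  { _id: "b", name: "other", priority: "low" },
];

// Equivalent in spirit to ...&filter=app/name&name=foobar
const kept = applyFilter(byName, docs, { name: "foobar" });
console.log(kept.map((doc) => doc._id));
```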

SHOWS

function(doc, req) {
    return {
        body: "Hello World"
    }
}

curl -X GET "http://ponies.couchone.com/rainbows/_design/foobar/_show/showfunction/docid"

function(doc) {
    return {
        "code": 302,
        "body": "See other",
        "headers": {
            "Location": doc.target
        }
    };
}

You can also define HTTP headers. We used this to translate public IDs into private storage IDs. This way CouchDB took care of all the headers and HTTP handling, and we could use a regular nginx proxy module
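The redirect show function can be tried locally by calling it with a document. The sketch below adds one assumption that is not in the slides: a fallback response for the case where the requested ID does not exist and `doc` is null. The function name `redirectShow` and the sample target URL are placeholders.

```javascript
// Show function in the style above: redirect to the document's target.
function redirectShow(doc, req) {
  if (!doc) {
    // Assumed fallback: no document with the requested ID.
    return { code: 404, body: "Not found" };
  }
  return {
    code: 302,
    body: "See other",
    headers: { Location: doc.target },
  };
}

const response = redirectShow({ _id: "public-id", target: "/storage/private-id" }, {});
console.log(response.code, response.headers.Location);
```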

LUCENE

[external]
fti=/path/to/python /path/to/couchdb-lucene/tools/couchdb-external-hook.py

[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"fti">>}

function(doc) {
    var ret = new Document();
    ret.add(doc.message);
    ret.add(new Date(doc.datetime));
    return ret;
}

GEOCOUCH

https://github.com/vmx/couchdb

See Derick’s talk yesterday

GEOCOUCH

• Supports bbox

• fork

• outputs via lists, georss possible

• directly useable by google maps

• can read GIS data

• combined with _changes makes interesting usecase

- bbox => all items within a certain bounding box; polygon is in the works
- currently a fork of CouchDB; in the works as an external module
- output can be set up separately
- Google Maps can use GeoRSS
- GIS: Geographic Information System (used worldwide?)

SPATIAL INDEX

in spatial/points.js

function(doc)
{
    if (doc.geo && doc.geo.latitude != '' && doc.geo.longitude != '') {
        emit(
            {
                type: "Point",
                coordinates: [parseFloat(doc.geo.latitude), parseFloat(doc.geo.longitude)]
            },
            [doc._id, doc]
        );
    }
}

http://ponies.couchone.com/rainbows/_design/unicorns/_spatial/points?bbox=0,0,180,90

Worldwide search

{
  "update_seq": 3,
  "rows": [
    {
      "id": "augsburg",
      "bbox": [10.898333, 48.371667, 10.898333, 48.371667],
      "value": ["augsburg", [10.898333, 48.371667]]
    }
  ]
}
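A bounding-box query keeps the points that fall inside the rectangle given by the four `bbox` numbers. A minimal sketch of that test follows; the helper names `parseBbox` and `inBbox` are hypothetical, and the axis order is simply whatever order the coordinates were emitted in.

```javascript
// Turn the bbox query parameter ("x1,y1,x2,y2") into four numbers.
function parseBbox(param) {
  return param.split(",").map(Number);
}

// True when the point [x, y] lies inside (or on the edge of) the box.
function inBbox(point, bbox) {
  const [x, y] = point;
  const [x1, y1, x2, y2] = bbox;
  return x >= x1 && x <= x2 && y >= y1 && y <= y2;
}

// The point from the response above against the bbox from the query above.
const augsburg = [10.898333, 48.371667];
console.log(inBbox(augsburg, parseBbox("0,0,180,90"))); // true
```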

GEORSS && GOOGLE MAPS

if (GBrowserIsCompatible()) {
    map = new GMap2(document.getElementById('map'));
    var geoXML = new GGeoXml('http://ponies.couchone.com/rainbows/url-to-georss-view');
    map.addOverlay(geoXML);
}

curl -X GET "http://ponies.couchone.com/rainbows/_design/alarmeringen/_spatial/points?bbox=51.711369,4.218407,52.136520,4.745740";

Q?

http://www.couchone.com/get
