Introduction to Riak
Python hack-a-thon 2011.10.15
Yosuke Hara {twitter_id, yosukehara}
Software Developer
1Saturday, October 15, 11
Who am I ?
RIA
twitter-id: yosukehara
Distributed Storage
What’s ‘Riak’ ?
basho
KVS + Extra
Basho Bitcask or LevelDB (Google, from 1.0)
Architecture
Erlang/OTP Runtime
Riak KV
Client APIs: HTTP, Protocol Buffers, Erlang local client
Request Coordination: get, put, delete, map-reduce
Riak Core: membership, consistent hashing, handoff, node-liveness, gossip, buckets
vnode master
vnodes
storage backend
Workers
The horizontal scaling bits
Inspired by Amazon’s Dynamo (2007)
KVS (Key-Value Storage) + Extra
Masterless, P2P-Replication
Consistent Hashing
Failover - Quorums, Hinted handoff
Eventual Consistency
Inspired by Amazon’s Dynamo
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
“The reliability and scalability of a system is dependent on how its application state is managed.” - Werner Vogels
KVS (Key-Value Store)
Simple operations - get, put, delete
Value is mostly opaque (some metadata)
Extras
  MapReduce (depends on Javascript)
  Links
  Full-text search (Optional - Riak Search)
  Secondary Indexes (from ver 1.0)
P2P - Masterless
No SPOF (single point of failure)
Consistent Hashing
“Hash Ring” - 2^160 / 32
Virtual Node - Logical subdivision of the cluster
Node-0 Node-1 Node-2 Node-3
Virtual Node ID = SHA1( $KEY )
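The mapping from a key to a virtual node can be sketched in a few lines of Python. This is a simplified illustration, not Riak's actual code. It assumes the ring is cut into 32 equal partitions as on the slide, that partitions are assigned to the four nodes round-robin, and that the replicas of a key go to the next partitions on the ring. The names partition_for, owner_of and preference_list are hypothetical.

```python
import hashlib

RING_SIZE = 2 ** 160                      # size of the SHA-1 output space
NUM_PARTITIONS = 32                       # number of virtual nodes, as on the slide
PARTITION_SIZE = RING_SIZE // NUM_PARTITIONS
NODES = ["Node-0", "Node-1", "Node-2", "Node-3"]


def partition_for(key: str) -> int:
    """Hash the key with SHA-1 and return the index of the partition it falls in."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")   # a point on the 2^160 ring
    return position // PARTITION_SIZE


def owner_of(partition: int) -> str:
    """Assumption: partitions are handed out to physical nodes round-robin."""
    return NODES[partition % len(NODES)]


def preference_list(key: str, n: int = 3) -> list:
    """Assumption: the n replicas live on n consecutive partitions of the ring."""
    first = partition_for(key)
    return [(first + i) % NUM_PARTITIONS for i in range(n)]


if __name__ == "__main__":
    for p in preference_list("log_001"):
        print(p, owner_of(p))
```

Because consecutive partitions belong to different physical nodes under the round-robin assumption, the three replicas of a key end up on three different machines.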
For Parallelism: # of VNodes == Maximum Concurrent Reqs
For Rebalancing the Cluster: Smallest block that can be shifted to a new node
For Resilience: The system restarts failed VNodes
Quorum
Every request contacts all replicas of the key
N : number of replicas (default 3)
R : read quorum
W : write quorum
Able to adjust the “Consistency Level”
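How N, R and W interact can be shown with a small Python sketch. It is an illustration under the usual Dynamo-style assumptions, not Riak code: a write succeeds once W replicas acknowledge it, a read succeeds once R replicas answer, and a read is guaranteed to overlap the latest successful write only when R + W > N. The function names are hypothetical.

```python
def write_succeeds(acks: int, w: int) -> bool:
    """A write is reported as successful once at least w replicas acknowledged it."""
    return acks >= w


def read_succeeds(replies: int, r: int) -> bool:
    """A read is reported as successful once at least r replicas replied."""
    return replies >= r


def read_overlaps_write(n: int, r: int, w: int) -> bool:
    """True when every read quorum shares at least one replica with every write quorum."""
    return r + w > n


if __name__ == "__main__":
    n = 3
    for r, w in [(1, 1), (2, 2), (1, 3), (3, 1)]:
        print(f"N={n} R={r} W={w} overlap={read_overlaps_write(n, r, w)}")
```

Lower R and W values favor latency and availability, higher values favor consistency, which is the sense in which the consistency level is adjustable.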
Distributed Data
# of replicas = 3
# of successful WRITEs = 2
(1) PUT File
(2) Replicate - Router on each node
(3) Check that the # of successful replica writes has reached the write quorum (2)
(4) return “ok”
Node-0 Node-1 Node-2 Node-3
Hinted Handoff
# of replicas = 3
# of successful WRITEs = 2
(1) PUT File
(2) Replicate
(3) Check that the # of successful replica writes has reached the write quorum (2)
(4) return “ok”
(5) Recover
Node-0 Node-1 Node-2 Node-3 (X: failed node)
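The five steps above can be simulated with a toy in-memory model in Python. This is a sketch under stated assumptions, not how Riak implements it: a write aimed at a failed node goes to a spare node together with a hint naming the intended owner, that fallback write counts toward the write quorum, and the hint is replayed once the owner is back. Node, put and handoff are hypothetical names.

```python
class Node:
    """A toy storage node holding its own data plus hints kept for other nodes."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []          # list of (intended_node, key, value)


def put(key, value, primaries, fallbacks, w=2):
    """Write to every primary; route writes for failed primaries to a fallback."""
    acks = 0
    spare = iter([node for node in fallbacks if node.up])
    for node in primaries:
        if node.up:
            node.data[key] = value
            acks += 1
        else:
            fallback = next(spare, None)
            if fallback is not None:
                # Assumption: a hinted write on a fallback counts toward w.
                fallback.hints.append((node, key, value))
                acks += 1
    return "ok" if acks >= w else "error"


def handoff(fallback):
    """Replay hints to owners that are up again; keep the rest for later."""
    remaining = []
    for intended, key, value in fallback.hints:
        if intended.up:
            intended.data[key] = value
        else:
            remaining.append((intended, key, value))
    fallback.hints = remaining


if __name__ == "__main__":
    nodes = [Node(f"Node-{i}") for i in range(4)]
    nodes[1].up = False                                    # the failed node
    print(put("file", "bytes", nodes[:3], nodes[3:]))      # write still succeeds
    nodes[1].up = True                                     # (5) Recover
    handoff(nodes[3])
    print(nodes[1].data)
```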
Eventual Consistency
CAP Theorem
Consistency
Availability
Partition Tolerance
The doc-database bits
Store objects as JSON or any format
Links between objects
No SQL - Query with Javascript Map-Reduce
The operations-friendly bits
Web-shaped storage
Store data in its original format
Using HTTP
Load balancing
Caches
No special-nodes - grow horizontally
basho
DEMO
PUT
curl -X PUT \
  -H "x-riak-index-service_bin: news" \
  -H "x-riak-index-range_int: 201110151300" \
  -H "Content-Type: application/json" \
  -d '{"operation":"W", "key":"news/photo/20111015_small.jpg", "service":"news","range":"201110151300"}' \
  http://127.0.0.1:8098/riak/news/log_001
For Secondary Index: the x-riak-index-* headers
VALUE: the JSON body passed with -d
KEY: log_001
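Since this is a Python hack-a-thon, the same PUT can be expressed with the Python standard library. The sketch below only builds the request object and does not send it. The URL path, header names and values are copied from the curl command on this slide, while the helper name build_put_request and its parameters are hypothetical.

```python
import json
import urllib.request


def build_put_request(base_url, bucket, key, value, indexes):
    """Build (but do not send) an HTTP PUT carrying secondary-index headers."""
    headers = {"Content-Type": "application/json"}
    for index_name, index_value in indexes.items():
        # One x-riak-index-<name> header per secondary index entry.
        headers["x-riak-index-" + index_name] = str(index_value)
    body = json.dumps(value).encode("utf-8")
    url = f"{base_url}/riak/{bucket}/{key}"
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


req = build_put_request(
    "http://127.0.0.1:8098",
    "news",
    "log_001",
    {
        "operation": "W",
        "key": "news/photo/20111015_small.jpg",
        "service": "news",
        "range": "201110151300",
    },
    {"service_bin": "news", "range_int": 201110151300},
)

if __name__ == "__main__":
    print(req.get_method(), req.full_url)
    for name, val in req.header_items():
        print(f"{name}: {val}")
```

Passing the request to urllib.request.urlopen would send it, which needs a node listening on 127.0.0.1:8098.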
GET
curl http://127.0.0.1:8098/demo/news/log_001 | python -mjson.tool
{
    "key": "news/photo/20111015_small.jpg",
    "operation": "W",
    "range": "201110151300",
    "service": "news"
}
Secondary Index (2i)
curl http://127.0.0.1:8098/buckets/demo/index/service_bin/news | python -mjson.tool
{
    "keys": [
        "log_001",
        "log_002"
    ]
}
Map Reduce
curl -H "Content-Type: application/json" http://127.0.0.1:8098/mapred \
  --data @- <<\EOF
{"inputs":{"bucket":"demo","index":"range_int","key":"201110151300"},
 "query":[
  {"map":{"name":"Riak.mapValuesJson","language":"javascript","keep":false}},
  {"reduce":{"language":"javascript","source":"function(v) { var json_object = eval(v); var result = {}; for (i = 0; i < json_object.length; i++) { var key = json_object[i].service + '_' + json_object[i].operation; var cur = result[key] == undefined ? 0 : result[key]; result[key] = cur + 1; } return result;}"}}]}
EOF
Result: {"news_W":2,"social_W":1}
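What the reduce phase computes is easier to see in plain Python. The sketch below mirrors the JavaScript reduce function locally: it counts records per service and operation. It does not talk to Riak, and the sample records are made up for illustration, chosen so that they give the same counts as the result shown on the slide.

```python
from collections import Counter


def count_by_service_and_operation(records):
    """Count records per '<service>_<operation>' key, like the reduce function."""
    counts = Counter(f"{rec['service']}_{rec['operation']}" for rec in records)
    return dict(counts)


# Hypothetical sample of mapped values (not taken from the demo cluster).
sample_records = [
    {"service": "news", "operation": "W", "range": "201110151300"},
    {"service": "news", "operation": "W", "range": "201110151300"},
    {"service": "social", "operation": "W", "range": "201110151300"},
]

if __name__ == "__main__":
    print(count_by_service_and_operation(sample_records))
```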
basho_bench
Depends on Erlang
Easy to operate
Creates ‘pretty charts’
For KVS and others
Benchmark Result
basho_bench config
{mode, max}.
{duration, 15}.
{concurrent, 5}.
{driver, basho_bench_driver_riakclient}.
{code_paths, ["deps/stats",
              "/Users/jmeredith/basho/riak/apps/riak_kv",
              "/Users/jmeredith/basho/riak/apps/riak_core"]}.
{key_generator, {int_to_bin, {uniform_int, 35000}}}.
{value_generator, {fixed_bin, 10000}}.
{riakclient_nodes, ['[email protected]']}.
{riakclient_mynode, ['[email protected]', longnames]}.
{riakclient_replies, 1}.
Thank you for listening