A Clever Way to Scale-out a Web Application
Cybozu Labs, Inc. Kazuho Oku
RDB sharding
denormalization is inevitable
Sep 11 2009 A Clever Way to Scale-out a Web Application 2
[Figure: three shards (uid:1-2000, uid:2001-4000, uid:4001-6000, ...), each holding the tables tweet, following, timeline, and followed_by]
when uid:123 tweets, write his tweet, read uids of his followers, and update the timeline table of his followers
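The fan-out described above is what the application has to code by hand when nothing below the SQL layer helps. A minimal sketch in Python, assuming an in-memory stand-in for the three shards; the names `shards`, `shard_for`, `follow`, and `post_tweet` are hypothetical and only illustrate the three steps (write the tweet, read the follower uids, update each follower's timeline):

```python
# Hypothetical in-memory model of the three shards on the slide.
# Each shard holds its own tweet, followed_by, and timeline tables.
RANGES = [(1, 2000), (2001, 4000), (4001, 6000)]
shards = [{"tweet": [], "followed_by": {}, "timeline": {}} for _ in RANGES]


def shard_for(uid):
    """Return the shard dict whose uid range contains uid."""
    for (low, high), shard in zip(RANGES, shards):
        if low <= uid <= high:
            return shard
    raise KeyError(f"no shard for uid {uid}")


def follow(follower, followee):
    """Record on the followee's shard that follower follows followee."""
    shard_for(followee)["followed_by"].setdefault(followee, []).append(follower)


def post_tweet(uid, tweet_id, text):
    """Hand-written fan-out: one local write, then one write per follower."""
    home = shard_for(uid)
    home["tweet"].append((uid, tweet_id, text))        # 1. write the tweet
    followers = home["followed_by"].get(uid, [])       # 2. read follower uids
    for follower in followers:                         # 3. update each timeline
        timeline = shard_for(follower)["timeline"]
        timeline.setdefault(follower, []).append((uid, tweet_id))
    return len(followers)


follow(2431, 123)   # uid:2431 lives on the second shard
follow(5000, 123)   # uid:5000 lives on the third shard
touched = post_tweet(123, 1, "hello")
```

Step 3 is the part that crosses shard boundaries, which is why it has to be either queued (eventual consistency) or wrapped in a 2-phase commit.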
Two methods to update the shards
- eventual consistency
  - asynchronous updates using worker processes
  - pros: fast response, high scalability
  - cons: hard to maintain
- 2-phase commit
  - synchronous updates
  - pros: synchronous, doesn't require an external daemon
  - cons: slow response
The problems
- complex queries
  - reading from / writing to multiple DB nodes
  - cannot use secondary indexes
  - need to maintain per-user views (denormalized tables)
  - maintain consistency between the nodes when using eventual consistency model
- dynamic scaling
  - adding new nodes without stopping the service
Incline
- solution for the two problems of eventual consistency:
  - complex update queries
  - maintenance of the denormalized tables
- basic idea
  - do not let app. developers write denormalization logic
  - handle denormalization below the SQL layer, by using triggers and queue tables
Incline – illustrated
[Figure: three shards (uid:1-2000, uid:2001-4000, uid:4001-6000, ...), each holding the tables tweet, following, timeline, followed_by, and queue; Incline inserts / updates / deletes rows of related tables automatically]
when uid:123 tweets, write only to his tweet table. Incline updates other tables automatically
Incline – illustrated (cont'd)
[Figure: the same three shards, each holding the tables tweet, following, timeline, followed_by, and queue; Incline inserts / updates / deletes rows of related tables automatically]
when uid:2431 starts following uid:940, only write to his following table
Incline – details
- triggers generated from def. files
  - sync. updates within each node
  - async. updates between the nodes
- each DB node has a queue table
  - helper program (C++) applies the queued events to other nodes
  - uses a fault tolerant algorithm
- application only needs to write to the user's shard
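The split between synchronous updates inside a node and queued updates between nodes can be tried out on a small scale. The following is a sketch only, using Python's sqlite3 module with two in-memory databases standing in for two MySQL shards. The trigger is hand-written in SQLite syntax and modelled on the tweet trigger Incline generates; the `forward` function is a hypothetical, much simplified stand-in for the C++ helper program and does not reproduce its fault tolerant algorithm.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tweet (user_id INTEGER, tweet_id INTEGER, ctime INTEGER);
CREATE TABLE followed_by (user_id INTEGER, follower_id INTEGER);
CREATE TABLE timeline (user_id INTEGER, ctime INTEGER,
                       tweet_id INTEGER, tweet_user_id INTEGER);
CREATE TABLE _iq_timeline (user_id INTEGER, ctime INTEGER, tweet_id INTEGER,
                           tweet_user_id INTEGER, _iq_action TEXT);
"""

# Followers on this node get their timeline row at once (sync);
# followers on other nodes get a row in the queue table instead (async).
TRIGGER = """
CREATE TRIGGER tweet_insert AFTER INSERT ON tweet FOR EACH ROW BEGIN
  INSERT INTO timeline (user_id, ctime, tweet_id, tweet_user_id)
    SELECT follower_id, NEW.ctime, NEW.tweet_id, NEW.user_id FROM followed_by
    WHERE followed_by.user_id = NEW.user_id
      AND follower_id >= {low} AND follower_id < {high};
  INSERT INTO _iq_timeline (user_id, ctime, tweet_id, tweet_user_id, _iq_action)
    SELECT follower_id, NEW.ctime, NEW.tweet_id, NEW.user_id, 'I' FROM followed_by
    WHERE followed_by.user_id = NEW.user_id
      AND NOT (follower_id >= {low} AND follower_id < {high});
END;
"""


def make_node(low, high):
    """Create an in-memory node responsible for low <= uid < high."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA + TRIGGER.format(low=low, high=high))
    return db


def forward(source, pick_target):
    """Apply queued inserts to their target node, then remove them from the queue."""
    rows = source.execute(
        "SELECT rowid, user_id, ctime, tweet_id, tweet_user_id "
        "FROM _iq_timeline WHERE _iq_action = 'I'").fetchall()
    for rowid, user_id, ctime, tweet_id, tweet_user_id in rows:
        target = pick_target(user_id)
        target.execute(
            "INSERT INTO timeline (user_id, ctime, tweet_id, tweet_user_id) "
            "VALUES (?, ?, ?, ?)", (user_id, ctime, tweet_id, tweet_user_id))
        target.commit()
        source.execute("DELETE FROM _iq_timeline WHERE rowid = ?", (rowid,))
        source.commit()
    return len(rows)


node_a = make_node(1, 2001)      # uid:1-2000
node_b = make_node(2001, 4001)   # uid:2001-4000
# uid:123 is followed by uid:45 (same node) and uid:2431 (other node)
node_a.executemany("INSERT INTO followed_by VALUES (?, ?)", [(123, 45), (123, 2431)])
node_a.execute("INSERT INTO tweet VALUES (123, 1, 1000)")  # the only app-level write
node_a.commit()

local = node_a.execute("SELECT user_id FROM timeline").fetchall()
queued = node_a.execute("SELECT user_id, _iq_action FROM _iq_timeline").fetchall()
forwarded = forward(node_a, lambda uid: node_b)
remote = node_b.execute(
    "SELECT user_id, tweet_id, tweet_user_id FROM timeline").fetchall()
```

The application issues a single INSERT into tweet; the local timeline row appears within the same transaction, and the remote one only after the forwarder has run.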
Incline – the commands
# create queue tables
% incline --mode=shard --rdbms=mysql --database=microblog \
    --host=10.0.200.10 --source=microblog.json --shard-source=shard.json \
    create-queue

# create triggers
% incline --mode=shard --rdbms=mysql --database=microblog \
    --host=10.0.200.10 --source=microblog.json --shard-source=shard.json \
    create-trigger

# run forwarder (transfers data from specified host to other shards)
% incline --mode=shard --rdbms=mysql --database=microblog \
    --host=10.0.200.10 --source=microblog.json --shard-source=shard.json \
    forward
Incline – the definition files
# view def. file
[
  {
    "source" : [ "tweet", "followed_by" ],
    "destination" : "timeline",
    "pk_columns" : {
      "followed_by.follower_id" : "user_id",
      "tweet.user_id" : "tweet_user_id",
      "tweet.tweet_id" : "tweet_id"
    },
    "npk_columns" : {
      "tweet.ctime" : "ctime"
    },
    "merge" : {
      "tweet.user_id" : "followed_by.user_id"
    },
    "shard-key" : "user_id"
  },
  {
    "source" : "following",
    "destination" : "followed_by",
    "pk_columns" : {
      "following.following_id" : "user_id",
      "following.user_id" : "follower_id"
    },
    "shard-key" : "user_id"
  }
]
# shard def. file
{
  "algorithm" : "range-int",
  "map" : {
    "1" : {
      "host" : "10.0.200.10",
      "username" : "pac1251781019"
    },
    "2001" : {
      "host" : "10.0.200.11",
      "username" : "pac1251781332"
    },
    "4001" : {
      "host" : "10.0.200.12",
      "username" : "pac1251781408"
    }
  }
}
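One plausible reading of the "range-int" map is that each key is the inclusive lower bound of a uid range, which matches the `1<=... AND ...<2001` conditions in the generated triggers. Under that assumption, a lookup can be sketched in Python with a binary search; the function name `lookup` is hypothetical and is not part of Incline or DBIx::ShardManager.

```python
import bisect
import json

SHARD_DEF = json.loads("""
{
  "algorithm" : "range-int",
  "map" : {
    "1"    : { "host" : "10.0.200.10", "username" : "pac1251781019" },
    "2001" : { "host" : "10.0.200.11", "username" : "pac1251781332" },
    "4001" : { "host" : "10.0.200.12", "username" : "pac1251781408" }
  }
}
""")


def lookup(shard_def, user_id):
    """Return the map entry whose lower bound is the largest one <= user_id."""
    # Assumption: keys are inclusive lower bounds; the last range has no
    # explicit upper bound in the file, so it is treated as open-ended here.
    bounds = sorted(int(key) for key in shard_def["map"])
    index = bisect.bisect_right(bounds, user_id) - 1
    if index < 0:
        raise KeyError(f"uid {user_id} is below the first range")
    return shard_def["map"][str(bounds[index])]


host_for_123 = lookup(SHARD_DEF, 123)["host"]
host_for_2431 = lookup(SHARD_DEF, 2431)["host"]
```

Because only lower bounds are stored, splitting a range later means adding one key to the map rather than rewriting the existing entries.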
Incline – FYI the generated triggers
CREATE TRIGGER _INCLINE_followed_by_INSERT AFTER INSERT ON followed_by FOR EACH ROW BEGIN
  IF (((1<=NEW.follower_id AND NEW.follower_id<2001))) THEN
    INSERT INTO timeline (user_id,ctime,tweet_id,tweet_user_id)
      SELECT NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id
      FROM tweet WHERE tweet.user_id=NEW.user_id;
  ELSE
    INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action)
      SELECT NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id,'I'
      FROM tweet WHERE tweet.user_id=NEW.user_id;
  END IF;
END

CREATE TRIGGER _INCLINE_followed_by_UPDATE AFTER UPDATE ON followed_by FOR EACH ROW BEGIN
  IF (((1<=NEW.follower_id AND NEW.follower_id<2001))) THEN
    REPLACE INTO timeline (user_id,ctime,tweet_id,tweet_user_id)
      SELECT NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id
      FROM tweet WHERE tweet.user_id=NEW.user_id;
  ELSE
    INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action)
      SELECT NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id,'U'
      FROM tweet WHERE tweet.user_id=NEW.user_id;
  END IF;
END

CREATE TRIGGER _INCLINE_followed_by_DELETE AFTER DELETE ON followed_by FOR EACH ROW BEGIN
  IF (((1<=OLD.follower_id AND OLD.follower_id<2001))) THEN
    DELETE FROM timeline
      WHERE timeline.user_id=OLD.follower_id AND tweet_user_id=OLD.user_id;
  ELSE
    INSERT INTO _iq_timeline (user_id,tweet_id,tweet_user_id,_iq_action)
      SELECT OLD.follower_id,tweet.tweet_id,tweet.user_id,'D'
      FROM tweet WHERE tweet.user_id=OLD.user_id;
  END IF;
END

CREATE TRIGGER _INCLINE_following_INSERT AFTER INSERT ON following FOR EACH ROW BEGIN
  IF (((1<=NEW.following_id AND NEW.following_id<2001))) THEN
    INSERT INTO followed_by (user_id,follower_id)
      SELECT NEW.following_id,NEW.user_id;
  ELSE
    INSERT INTO _iq_followed_by (user_id,follower_id,_iq_action)
      SELECT NEW.following_id,NEW.user_id,'I';
  END IF;
END

CREATE TRIGGER _INCLINE_following_DELETE AFTER DELETE ON following FOR EACH ROW BEGIN
  IF (((1<=OLD.following_id AND OLD.following_id<2001))) THEN
    DELETE FROM followed_by
      WHERE followed_by.user_id=OLD.following_id AND followed_by.follower_id=OLD.user_id;
  ELSE
    INSERT INTO _iq_followed_by (user_id,follower_id,_iq_action)
      SELECT OLD.following_id,OLD.user_id,'D';
  END IF;
END

CREATE TRIGGER _INCLINE_tweet_INSERT AFTER INSERT ON tweet FOR EACH ROW BEGIN
  INSERT INTO timeline (user_id,ctime,tweet_id,tweet_user_id)
    SELECT followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id
    FROM followed_by
    WHERE ((1<=followed_by.follower_id AND followed_by.follower_id<2001))
      AND NEW.user_id=followed_by.user_id;
  INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action)
    SELECT followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id,'I'
    FROM followed_by
    WHERE NOT (((1<=followed_by.follower_id AND followed_by.follower_id<2001)))
      AND NEW.user_id=followed_by.user_id;
END

CREATE TRIGGER _INCLINE_tweet_UPDATE AFTER UPDATE ON tweet FOR EACH ROW BEGIN
  REPLACE INTO timeline (user_id,ctime,tweet_id,tweet_user_id)
    SELECT followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id
    FROM followed_by
    WHERE ((1<=followed_by.follower_id AND followed_by.follower_id<2001))
      AND NEW.user_id=followed_by.user_id;
  INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action)
    SELECT followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id,'U'
    FROM followed_by
    WHERE NOT (((1<=followed_by.follower_id AND followed_by.follower_id<2001)))
      AND NEW.user_id=followed_by.user_id;
END

CREATE TRIGGER _INCLINE_tweet_DELETE AFTER DELETE ON tweet FOR EACH ROW BEGIN
  DELETE FROM timeline
    WHERE timeline.tweet_id=OLD.tweet_id AND timeline.tweet_user_id=OLD.user_id;
  INSERT INTO _iq_timeline (tweet_id,tweet_user_id,user_id,_iq_action)
    SELECT OLD.tweet_id,OLD.user_id,followed_by.follower_id,'D'
    FROM followed_by
    WHERE OLD.user_id=followed_by.user_id
      AND NOT (((1<=followed_by.follower_id AND followed_by.follower_id<2001)));
END
Pacific
Range-based sharding vs. hash-based
- Range-based sharding is better
  - range queries are sometimes necessary
  - manual tuning is easy
  - the number of nodes increases continuously
    - with hash-based sharding, you have to add 1,2,4,8,16,32,64,... servers at once
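The doubling remark can be checked with a few lines of Python. This is a sketch under one assumption the slide does not state: that "hash-based" means placing a uid on node `uid % number_of_nodes`. Other schemes, such as consistent hashing, behave differently and are not covered here. The function name `destinations` is made up for the illustration.

```python
def destinations(old_nodes, new_nodes, uids):
    """For each old modulo shard, collect the new shards its uids land on."""
    moved = {}
    for uid in uids:
        moved.setdefault(uid % old_nodes, set()).add(uid % new_nodes)
    return moved


uids = range(1, 6001)
doubling = destinations(4, 8, uids)   # 4 -> 8 nodes
plus_one = destinations(4, 5, uids)   # 4 -> 5 nodes
```

When the node count doubles, every old shard splits cleanly into two new ones (shard i feeds only i and i+4), so a split can be prepared from a single replica. Adding just one node scatters each old shard across all five new ones, which is what makes one-at-a-time growth awkward under this placement. With range-based sharding, a split touches only the range being divided.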
Pacific
- utility programs for dynamic scaling
  - mysqld_jumpstart
  - pacific_divide
mysqld_jumpstart – summary
- create a mysqld instance in a single command
  - service automatically started by daemontools
  - setup of primary nodes and slaves
- auto-generated backup script: install_dir/etc/backup.sh
  - uses XtraBackup for hot-backup
mysqld_jumpstart – the commands
# create and start a master database
% mysqld_jumpstart --mysql-install-db=/usr/local/mysql/bin/mysql_install_db \
    --mysqld=/usr/local/mysql/libexec/mysqld --base-dir=/var/servicedb \
    --server-id=1252619462 --socket=/tmp/mysql-servicedb.sock \
    --service-dir=/service/mysql-servicedb \
    --replication-network='10.0.200.0/255.255.255.0'

# backup
% /var/servicedb/etc/backup.sh /var/backup/servicedb.backup.20090911

# create and start a slave database
% mysqld_jumpstart --mysql-install-db=/usr/local/mysql/bin/mysql_install_db \
    --mysqld=/usr/local/mysql/libexec/mysqld --base-dir=/var/servicedb \
    --server-id=1252619493 --socket=/tmp/mysql-servicedb.sock \
    --service-dir=/service/mysql-servicedb \
    --replication-network='10.0.200.0/255.255.255.0' \
    --master-host=10.0.200.1 --from-innobackupex
Splitting a MySQL shard
[Figure: Before: shards 1~2,000, 2,001~4,000, and 4,001~6,000, with a replication slave attached to the 2,001~4,000 shard. After: shards 1~2,000, 2,001~3,000, 3,001~4,000, and 4,001~6,000]
use replication to prepare, then upgrade a slave to master
Problems in splitting a shard
- speed vs. safety
  - downtime should be minimal
  - guarantee that all the application servers write to the new node
  - reads may switch to the new node eventually
Pacific_divide – the blurbs
- fail-safe
  - application servers using the old sharding definition cannot access the split nodes
  - app. servers reload the definition upon such case
- minimum impact on users
  - no read-locks during division in eventual-consistency mode
  - acquires write lock only against the dividing node
  - write lock time < 10 seconds if no delay in replication
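The fail-safe behaviour on the application side (reload the definition once the old username is rejected) might look roughly like the following Python sketch. Everything here is hypothetical: `AccessDenied` stands in for whatever error the database driver raises, and `run_with_reload` is not a function of Pacific or DBIx::ShardManager.

```python
class AccessDenied(Exception):
    """Stand-in for the database error seen when an old username is rejected."""


def run_with_reload(load_definition, operation, max_reloads=1):
    """Run operation(definition); on AccessDenied, reload the definition and retry."""
    definition = load_definition()
    for attempt in range(max_reloads + 1):
        try:
            return operation(definition)
        except AccessDenied:
            if attempt == max_reloads:
                raise  # still rejected after reloading: surface the error
            definition = load_definition()


# Simulated split: the first load returns the old username, the second the new one.
versions = iter([{"username": "pac1251781332"}, {"username": "pac1252624011"}])


def write_tweet(definition):
    """Pretend write that only succeeds with the post-split username."""
    if definition["username"] != "pac1252624011":
        raise AccessDenied("old username no longer has privileges")
    return "written as " + definition["username"]


result = run_with_reload(lambda: next(versions), write_tweet)
```

The point of revoking privileges from the old username is that a server holding a stale definition fails loudly instead of writing to the wrong node.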
Pacific_divide – the split algorithm
1. create a new slave node
2. drop write privileges of existing username on the dividing node
3. wait until the new node becomes in sync.
4. update incline triggers
5. create new user and give read / write privileges
6. update shard def.
7. drop read privileges granted to the old username
Pacific_divide – the command
# upgrade 10.0.200.18 to a master with range uid:3,000-
#
# when instructed by pacific_divide, transmit shard.json to all
# application servers and mysql shards (or you may use nfs, etc.)
% pacific_divide --shard-def=shard.json --database=microblog \
    --new-host=10.0.200.18 --from-id=3000 --incline-source=microblog.json
Pacific_divide – how the shard def. changes
[Figure: Before: shards 1~2,000, 2,001~4,000, and 4,001~6,000, with a replication slave attached to the 2,001~4,000 shard. After: shards 1~2,000, 2,001~3,000, 3,001~4,000, and 4,001~6,000]
# before
{
  "algorithm" : "range-int",
  "map" : {
    "1" : {
      "host" : "10.0.200.10",
      "username" : "pac1251781019"
    },
    "2001" : {
      "host" : "10.0.200.11",
      "username" : "pac1251781332"
    },
    "4001" : {
      "host" : "10.0.200.12",
      "username" : "pac1251781408"
    }
  }
}
# after
{
  "algorithm" : "range-int",
  "map" : {
    "1" : {
      "host" : "10.0.200.10",
      "username" : "pac1251781019"
    },
    "2001" : {
      "host" : "10.0.200.11",
      "username" : "pac1252624011"
    },
    "3001" : {
      "host" : "10.0.200.18",
      "username" : "pac1252624011"
    },
    "4001" : {
      "host" : "10.0.200.12",
      "username" : "pac1251781408"
    }
  }
}
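The change to the definition file can be expressed as a small pure function. This Python sketch is not how pacific_divide is implemented; it only reproduces the before/after difference shown here, assuming map keys are inclusive lower bounds. The function name `split_shard` and its parameters are hypothetical.

```python
import copy

BEFORE = {
    "algorithm": "range-int",
    "map": {
        "1": {"host": "10.0.200.10", "username": "pac1251781019"},
        "2001": {"host": "10.0.200.11", "username": "pac1251781332"},
        "4001": {"host": "10.0.200.12", "username": "pac1251781408"},
    },
}


def split_shard(shard_def, from_id, new_host, new_username):
    """Return a new definition in which uids >= from_id of one range move to new_host."""
    bounds = sorted(int(key) for key in shard_def["map"])
    if from_id in bounds:
        raise ValueError(f"{from_id} is already a range boundary")
    owners = [bound for bound in bounds if bound < from_id]
    if not owners:
        raise ValueError(f"{from_id} is below the first range")
    old_key = str(owners[-1])            # the range being divided
    result = copy.deepcopy(shard_def)    # leave the input untouched
    # Both halves switch to the new username, as in the "after" file,
    # so that holders of the old definition lose access.
    result["map"][old_key]["username"] = new_username
    result["map"][str(from_id)] = {"host": new_host, "username": new_username}
    return result


after = split_shard(BEFORE, 3001, "10.0.200.18", "pac1252624011")
```

Only the divided range and the new entry differ from the original map; the entries for "1" and "4001" are carried over unchanged.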
DBIx::ShardManager
DBIx::ShardManager – the code
# create manager object
my $mgr = DBIx::ShardManager->new(
    definition => DBIx::ShardManager::Definition::JSON->new(
        file        => 'etc/user_shard_def.json',
        auto_reload => 1,
    ),
    connector => DBIx::ShardManager::Connector::DBI->new(
        driver => 'mysql',
        dbname => 'microblog',
        attr   => {
            mysql_enable_utf8 => 1,
            RaiseError        => 1,
        },
    ),
);
DBIx::ShardManager – the code (cont'd)
# read user's timeline

# first, read my timeline table
my $timeline = $mgr->rw_handle($user_id)->selectall_arrayref(
    'SELECT * FROM timeline WHERE user_id=? ORDER BY ctime DESC LIMIT 20',
    { Slice => {} },
    $user_id,
);
# fetch the tweets using (tweet_user_id,tweet_id) from other shards
$mgr->shard_inner_join(
    $timeline,
    tweet_user_id => {
        'tweet.tweet_id' => 'tweet_id',
    },
);
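What a join across shards has to do at the application level can be sketched independently of the Perl API. The Python below is an illustration, not a port of `shard_inner_join`: the function name, its parameters, and the in-memory `TWEETS` data are all assumptions. It groups the timeline rows by the shard of the tweet's author, fetches the tweets shard by shard, and drops rows whose tweet is not found (inner-join semantics).

```python
from collections import defaultdict

# Hypothetical stand-in for the tweet tables of two shards.
TWEETS = {
    0: {(123, 1): {"body": "hello"}},
    1: {(2500, 7): {"body": "hi"}},
}


def shard_of(uid):
    """Toy range lookup: uid below 2001 is on shard 0, otherwise on shard 1."""
    return 0 if uid < 2001 else 1


def fetch_tweets(shard, keys):
    """Stand-in for one SELECT per shard; returns {(user_id, tweet_id): row}."""
    return {key: TWEETS[shard][key] for key in keys if key in TWEETS[shard]}


def inner_join_across_shards(rows, shard_of, fetch_tweets):
    """Attach tweet columns to timeline rows, querying each shard only once."""
    wanted = defaultdict(list)
    for row in rows:
        key = (row["tweet_user_id"], row["tweet_id"])
        wanted[shard_of(row["tweet_user_id"])].append(key)
    found = {}
    for shard, keys in wanted.items():
        found.update(fetch_tweets(shard, keys))
    joined = []
    for row in rows:                       # keep the timeline's original order
        key = (row["tweet_user_id"], row["tweet_id"])
        if key in found:                   # inner join: skip missing tweets
            joined.append({**row, **found[key]})
    return joined


timeline = [
    {"user_id": 45, "tweet_user_id": 2500, "tweet_id": 7},
    {"user_id": 45, "tweet_user_id": 123, "tweet_id": 1},
    {"user_id": 45, "tweet_user_id": 123, "tweet_id": 99},  # no such tweet
]
joined = inner_join_across_shards(timeline, shard_of, fetch_tweets)
```

Grouping by shard first keeps the number of round trips equal to the number of shards involved rather than the number of timeline rows.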
DBIx::ShardManager – blurbs
- access to raw DBI handles
  - easy to use ORM above DBIx::ShardManager
- detects changes and reloads shard def.
  - but may throw exceptions on writes during node divisions by pacific_divide
  - display maintenance error, and let the user retry
- shard_join to be optimized
  - with Net::Drizzle, or mycached
Conclusion
- RDB sharding is not difficult when using Incline, Pacific, DBIx::ShardManager
  - IMO it is as easy as writing code for a standalone database system
- app. developers can use 2-phase commit if necessary
  - or rely on Incline for async. updates
Current Status & ToDo
- Incline - early beta
  - ToDo: add support for multiple shard keys, add recovery support on data-loss
- Pacific - early beta
  - ToDo: make it a distribution
- DBIx::ShardManager - still alpha
  - ToDo: write more join functions, concurrent access, etc.
Miscellaneous
- Mycached
  - currently in alpha status
  - access MySQL tables using memcached protocol
  - higher concurrency (thousands of connections)
  - higher throughput (2x SQL)
For more information
- see my blog
  - http://developer.cybozu.co.jp/kazuho/
- DBIx::ShardManager is in coderepos.org/share/lang/perl
- come to BPStudy #25 on 9/25
  - 2h30m talk on Incline, Pacific, DBIx::ShardManager (hopefully including demos)