A Clever Way to Scale-out a Web Application

A Clever Way to Scale-out a Web Application Cybozu Labs, Inc. Kazuho Oku

description

Introduces middlewares that ease the development of web applications on database shards.

Transcript of A Clever Way to Scale-out a Web Application

Page 1: A Clever Way to Scale-out a Web Application

A Clever Way to Scale-out a Web Application

Cybozu Labs, Inc. Kazuho Oku

Page 2: A Clever Way to Scale-out a Web Application

RDB sharding

- denormalization is inevitable

Sep 11 2009 A Clever Way to Scale-out a Web Application 2

[Figure: three shards (uid:1-2000, uid:2001-4000, uid:4001-6000, ...), each holding its own tweet, following, timeline, and followed_by tables]

When uid:123 tweets, the application must write his tweet, read the uids of his followers, and update the timeline table of each follower.
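The three steps in the caption can be sketched as follows. This is only an illustration of hand-written denormalization, not code from the talk: the `Shard` class, `shard_of`, and `post_tweet` are hypothetical names, and in-memory dictionaries stand in for the MySQL tables.

```python
from collections import defaultdict

SHARD_SIZE = 2000  # uid:1-2000, uid:2001-4000, ... as in the figure


def shard_of(uid):
    """Return the index of the shard that owns this uid."""
    return (uid - 1) // SHARD_SIZE


class Shard:
    """One DB node holding per-user tables (in-memory stand-ins)."""

    def __init__(self):
        self.tweet = []                      # (user_id, tweet_id, body)
        self.followed_by = defaultdict(set)  # user_id -> follower uids
        self.timeline = defaultdict(list)    # user_id -> [(tweet_user_id, tweet_id)]


def post_tweet(shards, user_id, tweet_id, body):
    """Hand-written denormalization: the app touches every affected shard."""
    home = shards[shard_of(user_id)]
    home.tweet.append((user_id, tweet_id, body))   # 1. write the tweet
    followers = home.followed_by[user_id]          # 2. read the follower uids
    touched = {shard_of(user_id)}
    for follower in followers:                     # 3. update each timeline
        idx = shard_of(follower)
        shards[idx].timeline[follower].append((user_id, tweet_id))
        touched.add(idx)
    return touched


shards = [Shard() for _ in range(3)]
shards[shard_of(123)].followed_by[123].update({77, 2431, 4500})
touched = post_tweet(shards, 123, 1, "hello")
```

With followers on all three nodes, a single tweet makes the application write to every shard, which is the complexity the rest of the talk tries to remove.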

Page 3: A Clever Way to Scale-out a Web Application

Two methods to update the shards

- eventual consistency
  - asynchronous updates using worker processes
  - pros: fast response, high scalability
  - cons: hard to maintain

- 2-phase commit
  - synchronous updates
  - pros: synchronous, doesn't require external daemon
  - cons: slow response


Page 4: A Clever Way to Scale-out a Web Application

The problems

- complex queries
  - reading from / writing to multiple DB nodes
  - cannot use secondary indexes
  - need to maintain per-user views (denormalized tables)

- maintain consistency between the nodes
  - when using eventual consistency model

- dynamic scaling
  - adding new nodes without stopping the service


Page 5: A Clever Way to Scale-out a Web Application

Incline


Page 6: A Clever Way to Scale-out a Web Application

Incline

- solution for the two problems of eventual consistency:
  - complex update queries
  - maintenance of the denormalized tables

- basic idea
  - do not let app. developers write denormalization logic
  - handle denormalization below the SQL layer
    - by using triggers and queue tables


Page 7: A Clever Way to Scale-out a Web Application

Incline – illustrated

- insert / update / delete rows of related tables automatically

[Figure: three shards (uid:1-2000, uid:2001-4000, uid:4001-6000, ...), each holding tweet, following, timeline, and followed_by tables plus a queue table]

When uid:123 tweets, write only to his tweet table. Incline updates the other tables automatically.
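A minimal sketch of the trigger-plus-queue idea, using SQLite from Python in place of MySQL so that it runs without a server. The trigger name, the simplified schema, and the hard-coded range 1-2000 are assumptions made for the example; the triggers that Incline generates for MySQL differ in detail.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stands in for the node that owns uid:1-2000
db.executescript("""
CREATE TABLE tweet (user_id INT, tweet_id INT, ctime INT,
                    PRIMARY KEY (user_id, tweet_id));
CREATE TABLE followed_by (user_id INT, follower_id INT,
                          PRIMARY KEY (user_id, follower_id));
CREATE TABLE timeline (user_id INT, ctime INT, tweet_id INT, tweet_user_id INT,
                       PRIMARY KEY (user_id, tweet_user_id, tweet_id));
CREATE TABLE _iq_timeline (user_id INT, ctime INT, tweet_id INT,
                           tweet_user_id INT, _iq_action TEXT);

CREATE TRIGGER tweet_insert AFTER INSERT ON tweet FOR EACH ROW BEGIN
  -- followers stored on this node: update their timeline synchronously
  INSERT INTO timeline (user_id, ctime, tweet_id, tweet_user_id)
    SELECT follower_id, NEW.ctime, NEW.tweet_id, NEW.user_id FROM followed_by
    WHERE follower_id BETWEEN 1 AND 2000 AND user_id = NEW.user_id;
  -- followers on other nodes: queue the event for asynchronous delivery
  INSERT INTO _iq_timeline (user_id, ctime, tweet_id, tweet_user_id, _iq_action)
    SELECT follower_id, NEW.ctime, NEW.tweet_id, NEW.user_id, 'I' FROM followed_by
    WHERE NOT (follower_id BETWEEN 1 AND 2000) AND user_id = NEW.user_id;
END;
""")

db.executemany("INSERT INTO followed_by VALUES (?, ?)", [(123, 77), (123, 2431)])
# the only write the application makes
db.execute("INSERT INTO tweet VALUES (123, 1, 1252619462)")

local_rows = db.execute("SELECT user_id, tweet_id FROM timeline").fetchall()
queued_rows = db.execute(
    "SELECT user_id, tweet_id, _iq_action FROM _iq_timeline").fetchall()
```

The follower on the same node (uid:77) gets a timeline row immediately, while the row for uid:2431 waits in the queue table until something forwards it to the node that owns that user.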

Page 8: A Clever Way to Scale-out a Web Application

Incline – illustrated (cont'd)

- insert / update / delete rows of related tables automatically

[Figure: three shards (uid:1-2000, uid:2001-4000, uid:4001-6000, ...), each holding tweet, following, timeline, and followed_by tables plus a queue table]

When uid:2431 starts following uid:940, write only to his following table.

Page 9: A Clever Way to Scale-out a Web Application

Incline – details

- triggers generated from def. files
  - sync. updates within each node
  - async. updates between the nodes

- each DB node has a queue table
  - helper program (C++) applies the queued events to other nodes
  - uses a fault tolerant algorithm

- application only needs to write to the user's shard
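The slides do not describe the forwarder's fault tolerant algorithm, so the following is only one plausible shape for it, written in Python with SQLite rather than as the actual C++ helper. It assumes that events are applied idempotently and removed from the queue only after the destination has committed them, so a crash in between leads to a harmless replay. `make_node`, `forward`, and `destination_for` are hypothetical names.

```python
import sqlite3


def make_node():
    """Create an in-memory node with a timeline table and its queue table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE timeline (user_id INT, ctime INT, tweet_id INT, tweet_user_id INT,
                           PRIMARY KEY (user_id, tweet_user_id, tweet_id));
    CREATE TABLE _iq_timeline (user_id INT, ctime INT, tweet_id INT,
                               tweet_user_id INT, _iq_action TEXT);
    """)
    return db


def forward(source, destination_for):
    """Apply every queued event on `source` to the node that owns the row."""
    rows = source.execute(
        "SELECT rowid, user_id, ctime, tweet_id, tweet_user_id, _iq_action "
        "FROM _iq_timeline ORDER BY rowid").fetchall()
    for rowid, user_id, ctime, tweet_id, tweet_user_id, action in rows:
        dest = destination_for(user_id)
        if action in ("I", "U"):
            # INSERT OR REPLACE is idempotent, so a replay after a crash is harmless
            dest.execute("INSERT OR REPLACE INTO timeline VALUES (?, ?, ?, ?)",
                         (user_id, ctime, tweet_id, tweet_user_id))
        else:  # 'D'
            dest.execute("DELETE FROM timeline WHERE user_id=? "
                         "AND tweet_user_id=? AND tweet_id=?",
                         (user_id, tweet_user_id, tweet_id))
        dest.commit()
        # remove the event only after the destination has committed it
        source.execute("DELETE FROM _iq_timeline WHERE rowid=?", (rowid,))
        source.commit()
    return len(rows)


node_a, node_b = make_node(), make_node()
node_a.execute("INSERT INTO _iq_timeline VALUES (2431, 1252619462, 1, 123, 'I')")
forwarded = forward(node_a, lambda uid: node_b)
```

Ordering the two commits this way trades a possible duplicate delivery for never losing an event, which suits the eventual-consistency mode described above.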

Page 10: A Clever Way to Scale-out a Web Application

Incline – the commands

# create queue tables
% incline --mode=shard --rdbms=mysql --database=microblog \
    --host=10.0.200.10 --source=microblog.json --shard-source=shard.json \
    create-queue

# create triggers
% incline --mode=shard --rdbms=mysql --database=microblog \
    --host=10.0.200.10 --source=microblog.json --shard-source=shard.json \
    create-trigger

# run forwarder (transfers data from specified host to other shards)
% incline --mode=shard --rdbms=mysql --database=microblog \
    --host=10.0.200.10 --source=microblog.json --shard-source=shard.json \
    forward


Page 11: A Clever Way to Scale-out a Web Application

Incline – the definition files

# view def. file
[
  {
    "source" : [ "tweet", "followed_by" ],
    "destination" : "timeline",
    "pk_columns" : {
      "followed_by.follower_id" : "user_id",
      "tweet.user_id" : "tweet_user_id",
      "tweet.tweet_id" : "tweet_id"
    },
    "npk_columns" : {
      "tweet.ctime" : "ctime"
    },
    "merge" : {
      "tweet.user_id" : "followed_by.user_id"
    },
    "shard-key" : "user_id"
  },
  {
    "source" : "following",
    "destination" : "followed_by",
    "pk_columns" : {
      "following.following_id" : "user_id",
      "following.user_id" : "follower_id"
    },
    "shard-key" : "user_id"
  }
]

# shard def. file
{
  "algorithm" : "range-int",
  "map" : {
    "1" : {
      "host" : "10.0.200.10",
      "username" : "pac1251781019"
    },
    "2001" : {
      "host" : "10.0.200.11",
      "username" : "pac1251781332"
    },
    "4001" : {
      "host" : "10.0.200.12",
      "username" : "pac1251781408"
    }
  }
}
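How a "range-int" map might be resolved to a node can be sketched with a binary search over the sorted keys. This assumes each key is the inclusive lower bound of its range, which matches the `1<=... AND ...<2001` conditions in the generated triggers; `build_index` and `lookup` are hypothetical names, not part of Incline.

```python
import bisect
import json

SHARD_DEF = json.loads("""
{ "algorithm" : "range-int",
  "map" : {
    "1"    : { "host" : "10.0.200.10", "username" : "pac1251781019" },
    "2001" : { "host" : "10.0.200.11", "username" : "pac1251781332" },
    "4001" : { "host" : "10.0.200.12", "username" : "pac1251781408" }
  } }
""")


def build_index(shard_def):
    """Return the sorted lower bounds and the node entry for each bound."""
    bounds = sorted(int(k) for k in shard_def["map"])
    nodes = [shard_def["map"][str(b)] for b in bounds]
    return bounds, nodes


def lookup(bounds, nodes, uid):
    """Find the node whose range [bound, next_bound) contains uid."""
    i = bisect.bisect_right(bounds, uid) - 1
    if i < 0:
        raise KeyError(f"uid {uid} is below the first shard boundary")
    return nodes[i]


bounds, nodes = build_index(SHARD_DEF)
host_for_2431 = lookup(bounds, nodes, 2431)["host"]
```

Because only the lower bounds are stored, adding a node later means adding one key to the map and leaving the other entries untouched.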

Page 12: A Clever Way to Scale-out a Web Application

Incline – FYI the generated triggers

CREATE TRIGGER _INCLINE_followed_by_INSERT AFTER INSERT ON followed_by FOR EACH ROW BEGIN
  IF (((1<=NEW.follower_id AND NEW.follower_id<2001))) THEN
    INSERT INTO timeline (user_id,ctime,tweet_id,tweet_user_id)
      SELECT NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id
      FROM tweet WHERE tweet.user_id=NEW.user_id;
  ELSE
    INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action)
      SELECT NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id,'I'
      FROM tweet WHERE tweet.user_id=NEW.user_id;
  END IF;
END

CREATE TRIGGER _INCLINE_followed_by_UPDATE AFTER UPDATE ON followed_by FOR EACH ROW BEGIN
  IF (((1<=NEW.follower_id AND NEW.follower_id<2001))) THEN
    REPLACE INTO timeline (user_id,ctime,tweet_id,tweet_user_id)
      SELECT NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id
      FROM tweet WHERE tweet.user_id=NEW.user_id;
  ELSE
    INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action)
      SELECT NEW.follower_id,tweet.ctime,tweet.tweet_id,tweet.user_id,'U'
      FROM tweet WHERE tweet.user_id=NEW.user_id;
  END IF;
END

CREATE TRIGGER _INCLINE_followed_by_DELETE AFTER DELETE ON followed_by FOR EACH ROW BEGIN
  IF (((1<=OLD.follower_id AND OLD.follower_id<2001))) THEN
    DELETE FROM timeline
      WHERE timeline.user_id=OLD.follower_id AND tweet_user_id=OLD.user_id;
  ELSE
    INSERT INTO _iq_timeline (user_id,tweet_id,tweet_user_id,_iq_action)
      SELECT OLD.follower_id,tweet.tweet_id,tweet.user_id,'D'
      FROM tweet WHERE tweet.user_id=OLD.user_id;
  END IF;
END

CREATE TRIGGER _INCLINE_following_INSERT AFTER INSERT ON following FOR EACH ROW BEGIN
  IF (((1<=NEW.following_id AND NEW.following_id<2001))) THEN
    INSERT INTO followed_by (user_id,follower_id)
      SELECT NEW.following_id,NEW.user_id;
  ELSE
    INSERT INTO _iq_followed_by (user_id,follower_id,_iq_action)
      SELECT NEW.following_id,NEW.user_id,'I';
  END IF;
END

CREATE TRIGGER _INCLINE_following_DELETE AFTER DELETE ON following FOR EACH ROW BEGIN
  IF (((1<=OLD.following_id AND OLD.following_id<2001))) THEN
    DELETE FROM followed_by
      WHERE followed_by.user_id=OLD.following_id AND followed_by.follower_id=OLD.user_id;
  ELSE
    INSERT INTO _iq_followed_by (user_id,follower_id,_iq_action)
      SELECT OLD.following_id,OLD.user_id,'D';
  END IF;
END

CREATE TRIGGER _INCLINE_tweet_INSERT AFTER INSERT ON tweet FOR EACH ROW BEGIN
  INSERT INTO timeline (user_id,ctime,tweet_id,tweet_user_id)
    SELECT followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id
    FROM followed_by
    WHERE ((1<=followed_by.follower_id AND followed_by.follower_id<2001))
      AND NEW.user_id=followed_by.user_id;
  INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action)
    SELECT followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id,'I'
    FROM followed_by
    WHERE NOT (((1<=followed_by.follower_id AND followed_by.follower_id<2001)))
      AND NEW.user_id=followed_by.user_id;
END

CREATE TRIGGER _INCLINE_tweet_UPDATE AFTER UPDATE ON tweet FOR EACH ROW BEGIN
  REPLACE INTO timeline (user_id,ctime,tweet_id,tweet_user_id)
    SELECT followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id
    FROM followed_by
    WHERE ((1<=followed_by.follower_id AND followed_by.follower_id<2001))
      AND NEW.user_id=followed_by.user_id;
  INSERT INTO _iq_timeline (user_id,ctime,tweet_id,tweet_user_id,_iq_action)
    SELECT followed_by.follower_id,NEW.ctime,NEW.tweet_id,NEW.user_id,'U'
    FROM followed_by
    WHERE NOT (((1<=followed_by.follower_id AND followed_by.follower_id<2001)))
      AND NEW.user_id=followed_by.user_id;
END

CREATE TRIGGER _INCLINE_tweet_DELETE AFTER DELETE ON tweet FOR EACH ROW BEGIN
  DELETE FROM timeline
    WHERE timeline.tweet_id=OLD.tweet_id AND timeline.tweet_user_id=OLD.user_id;
  INSERT INTO _iq_timeline (tweet_id,tweet_user_id,user_id,_iq_action)
    SELECT OLD.tweet_id,OLD.user_id,followed_by.follower_id,'D'
    FROM followed_by
    WHERE OLD.user_id=followed_by.user_id
      AND NOT (((1<=followed_by.follower_id AND followed_by.follower_id<2001)));
END


Page 13: A Clever Way to Scale-out a Web Application

Pacific


Page 14: A Clever Way to Scale-out a Web Application

Range-based sharding vs. hash-based

- Range-based sharding is better
  - range queries are sometimes necessary
  - manual tuning is easy
  - number of nodes increases continuously
    - with hash-based sharding, you have to add 1,2,4,8,16,32,64,... servers at once
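The difference can be made concrete by counting how many uids change nodes when the cluster grows. The slide does not say which hash function is meant; this sketch assumes plain `uid % n`, and both function names are made up for the illustration.

```python
import bisect


def moved_by_modulo(uids, old_n, new_n):
    """Count uids whose shard changes when `uid % n` sharding is resized."""
    return sum(1 for u in uids if u % old_n != u % new_n)


def moved_by_range_split(uids, bounds, new_bound):
    """Count uids whose shard changes when one range is split at new_bound."""
    new_bounds = sorted(bounds + [new_bound])

    def owner(bs, u):
        # lower bound of the range that contains u
        return bs[bisect.bisect_right(bs, u) - 1]

    return sum(1 for u in uids if owner(bounds, u) != owner(new_bounds, u))


uids = range(1, 6001)
plus_one = moved_by_modulo(uids, 3, 4)   # grow from 3 to 4 servers
doubled = moved_by_modulo(uids, 3, 6)    # grow from 3 to 6 servers
split = moved_by_range_split(uids, [1, 2001, 4001], 3001)
```

Under this assumption, adding a single server to three modulo shards relocates 4,500 of the 6,000 uids across all nodes. Doubling relocates 3,000, but each old shard only hands rows to one new shard, which is why doubling is the practical step size. Splitting one range moves 1,000 uids and touches a single source node.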

Page 15: A Clever Way to Scale-out a Web Application

Pacific

- utility programs for dynamic scaling
  - mysqld_jumpstart
  - pacific_divide


Page 16: A Clever Way to Scale-out a Web Application

mysqld_jumpstart – summary

- create a mysqld instance in a single command
  - service automatically started by daemontools
  - setup of primary nodes and slaves
  - auto-generated backup script: install_dir/etc/backup.sh
    - uses XtraBackup for hot-backup


Page 17: A Clever Way to Scale-out a Web Application

mysqld_jumpstart – the commands

# create and start a master database
% mysqld_jumpstart --mysql-install-db=/usr/local/mysql/bin/mysql_install_db \
    --mysqld=/usr/local/mysql/libexec/mysqld --base-dir=/var/servicedb \
    --server-id=1252619462 --socket=/tmp/mysql-servicedb.sock \
    --service-dir=/service/mysql-servicedb \
    --replication-network='10.0.200.0/255.255.255.0'

# backup
% /var/servicedb/etc/backup.sh /var/backup/servicedb.backup.20090911

# create and start a slave database
% mysqld_jumpstart --mysql-install-db=/usr/local/mysql/bin/mysql_install_db \
    --mysqld=/usr/local/mysql/libexec/mysqld --base-dir=/var/servicedb \
    --server-id=1252619493 --socket=/tmp/mysql-servicedb.sock \
    --service-dir=/service/mysql-servicedb \
    --replication-network='10.0.200.0/255.255.255.0' \
    --master-host=10.0.200.1 --from-innobackupex


Page 18: A Clever Way to Scale-out a Web Application

Splitting a MySQL shard

- use replication to prepare, then upgrade a slave to master

[Figure: before the split, three masters hold 1~2,000, 2,001~4,000, and 4,001~6,000, and a slave replicates the 2,001~4,000 node. After the split, four masters hold 1~2,000, 2,001~3,000, 3,001~4,000, and 4,001~6,000.]

Page 19: A Clever Way to Scale-out a Web Application

Problems in splitting a shard

- speed vs. safety
  - downtime should be minimum
  - guarantee that all the application servers write to the new node
  - reads may switch to the new node eventually


Page 20: A Clever Way to Scale-out a Web Application

Pacific_divide – the blurbs

- fail-safe
  - application servers using the old sharding definition cannot access the split nodes
  - app. servers reload the definition upon such case

- minimum impact on users
  - no read-locks during division
    - in eventual-consistency mode
  - acquires write lock only against the dividing node
  - write lock time < 10 seconds
    - if no delay in replication

Page 21: A Clever Way to Scale-out a Web Application

Pacific_divide – the split algorithm

1. create a new slave node
2. drop write privileges of existing username on the dividing node
3. wait until the new node becomes in sync.
4. update incline triggers
5. create new user and give read / write privileges
6. update shard def.
7. drop read privileges granted to the old username


Page 22: A Clever Way to Scale-out a Web Application

Pacific_divide – the command

# upgrade 10.0.200.18 to a master with range uid:3,000-
#
# when instructed by pacific_divide, transmit shard.json to all
# application servers and mysql shards (or you may use nfs, etc.)

% pacific_divide --shard-def=shard.json --database=microblog \
    --new-host=10.0.200.18 --from-id=3000 --incline-source=microblog.json

[Figure: the split of the 2,001~4,000 shard into 2,001~3,000 and 3,001~4,000, as on page 18]

Page 23: A Clever Way to Scale-out a Web Application

Pacific_divide – how the shard def. changes

# before
{
  "algorithm" : "range-int",
  "map" : {
    "1" : {
      "host" : "10.0.200.10",
      "username" : "pac1251781019"
    },
    "2001" : {
      "host" : "10.0.200.11",
      "username" : "pac1251781332"
    },
    "4001" : {
      "host" : "10.0.200.12",
      "username" : "pac1251781408"
    }
  }
}

# after
{
  "algorithm" : "range-int",
  "map" : {
    "1" : {
      "host" : "10.0.200.10",
      "username" : "pac1251781019"
    },
    "2001" : {
      "host" : "10.0.200.11",
      "username" : "pac1252624011"
    },
    "3001" : {
      "host" : "10.0.200.18",
      "username" : "pac1252624011"
    },
    "4001" : {
      "host" : "10.0.200.12",
      "username" : "pac1251781408"
    }
  }
}
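The change to the definition can be expressed as a small pure function. This is a sketch of the data transformation only, not of pacific_divide itself; `split_shard` is a hypothetical name, and the boundary 3001 is taken from the "after" definition.

```python
import copy


def split_shard(shard_def, from_id, new_host, new_username):
    """Return a new shard def in which the range containing from_id is split there."""
    bounds = sorted(int(k) for k in shard_def["map"])
    owners = [b for b in bounds if b <= from_id]
    if not owners or from_id in bounds:
        raise ValueError(f"cannot split at {from_id}")
    old_bound = str(owners[-1])
    result = copy.deepcopy(shard_def)
    # the dividing node keeps its host but switches to the new username
    result["map"][old_bound]["username"] = new_username
    # the promoted slave serves [from_id, next bound) under the same username
    result["map"][str(from_id)] = {"host": new_host, "username": new_username}
    return result


before = {
    "algorithm": "range-int",
    "map": {
        "1": {"host": "10.0.200.10", "username": "pac1251781019"},
        "2001": {"host": "10.0.200.11", "username": "pac1251781332"},
        "4001": {"host": "10.0.200.12", "username": "pac1251781408"},
    },
}
after = split_shard(before, 3001, "10.0.200.18", "pac1252624011")
```

Both halves of the divided range get the new username. A server still holding the old definition keeps presenting the old username, which appears to be how stale servers are kept off the split nodes once that username loses its privileges.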

Page 24: A Clever Way to Scale-out a Web Application

DBIx::ShardManager


Page 25: A Clever Way to Scale-out a Web Application

DBIx::ShardManager – the code

# create manager object
my $mgr = DBIx::ShardManager->new(
    definition => DBIx::ShardManager::Definition::JSON->new(
        file        => 'etc/user_shard_def.json',
        auto_reload => 1,
    ),
    connector => DBIx::ShardManager::Connector::DBI->new(
        driver => 'mysql',
        dbname => 'microblog',
        attr   => {
            mysql_enable_utf8 => 1,
            RaiseError        => 1,
        },
    ),
);


Page 26: A Clever Way to Scale-out a Web Application

DBIx::ShardManager – the code (cont'd)

# read user's timeline

# first, read my timeline table
my $timeline = $mgr->rw_handle($user_id)->selectall_arrayref(
    'SELECT * FROM timeline WHERE user_id=? ORDER BY ctime DESC LIMIT 20',
    { Slice => {} },
    $user_id,
);
# fetch the tweets using (tweet_user_id,tweet_id) from other shards
$mgr->shard_inner_join(
    $timeline,
    tweet_user_id => {
        'tweet.tweet_id' => 'tweet_id',
    },
);
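The slides do not show how shard_inner_join works internally. One way such a cross-shard join could be done is sketched below in Python: group the keys by owning shard, fetch each group once, and keep only the rows that found a match. The function signature, `fetch`, and the in-memory `TWEETS` data are assumptions made for the illustration and do not mirror the Perl API.

```python
from collections import defaultdict


def shard_inner_join(rows, shard_of, fetch, shard_key, join_key):
    """Join rows with tweets stored on other shards, one fetch per shard."""
    # group the wanted (user_id, tweet_id) pairs by the shard that owns them
    wanted = defaultdict(set)
    for row in rows:
        wanted[shard_of(row[shard_key])].add((row[shard_key], row[join_key]))
    # fetch each group once and index the results by the pair
    found = {}
    for shard, pairs in wanted.items():
        for tweet in fetch(shard, pairs):
            found[(tweet["user_id"], tweet["tweet_id"])] = tweet
    # inner join: keep only the rows whose tweet was found
    joined = []
    for row in rows:
        tweet = found.get((row[shard_key], row[join_key]))
        if tweet is not None:
            joined.append({**row, "tweet": tweet})
    return joined


# in-memory stand-in for the tweet tables on shards 0 and 2
TWEETS = {
    0: [{"user_id": 123, "tweet_id": 1, "body": "hello"}],
    2: [{"user_id": 4500, "tweet_id": 7, "body": "hi"}],
}


def fetch(shard, pairs):
    """Return the tweets on `shard` whose (user_id, tweet_id) is in pairs."""
    return [t for t in TWEETS.get(shard, [])
            if (t["user_id"], t["tweet_id"]) in pairs]


timeline = [
    {"user_id": 2431, "tweet_user_id": 123, "tweet_id": 1},
    {"user_id": 2431, "tweet_user_id": 4500, "tweet_id": 7},
    {"user_id": 2431, "tweet_user_id": 940, "tweet_id": 3},  # no such tweet
]
joined = shard_inner_join(timeline, lambda uid: (uid - 1) // 2000, fetch,
                          "tweet_user_id", "tweet_id")
```

Grouping first keeps the number of round trips proportional to the number of shards involved rather than to the number of timeline rows.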


Page 27: A Clever Way to Scale-out a Web Application

DBIx::ShardManager – blurbs

- access to raw DBI handles
  - easy to use ORM above DBIx::ShardManager

- detects changes and reloads shard def.
  - but may throw exceptions on writes during node divisions by pacific_divide
    - display maintenance error, and let the user retry

- shard_join to be optimized
  - with Net::Drizzle, or mycached


Page 28: A Clever Way to Scale-out a Web Application

Conclusion


Page 29: A Clever Way to Scale-out a Web Application

Conclusion

- RDB sharding is not difficult when using Incline, Pacific, DBIx::ShardManager
  - IMO it is as easy as writing code for a standalone database system

- app. developers can use 2-phase commit if necessary
  - or rely on Incline for async. updates


Page 30: A Clever Way to Scale-out a Web Application

Current Status & ToDo

- Incline - early beta
  - ToDo: add support for multiple shard keys, add recovery support on data-loss

- Pacific - early beta
  - ToDo: make it a distribution

- DBIx::ShardManager - still alpha
  - ToDo: write more join functions, concurrent access, etc.


Page 31: A Clever Way to Scale-out a Web Application

Miscellaneous

- Mycached
  - currently in alpha status
  - access MySQL tables using memcached protocol
  - higher concurrency (thousands of connections)
  - higher throughput (2x SQL)


Page 32: A Clever Way to Scale-out a Web Application

For more information

- see my blog http://developer.cybozu.co.jp/kazuho/
- DBIx::ShardManager is in coderepos.org/share/lang/perl

- come to BPStudy #25 on 9/25
  - 2h30m talk on Incline, Pacific, DBIx::ShardManager (hopefully including demos)
