Michał Gruchała - Data sharding
8/6/2019 Michał Gruchała - Data sharding
http://slidepdf.com/reader/full/michal-gruchala-data-sharding 1/32
Data Sharding
Michał Gruchała
[email protected] 2011
Agenda
● Background
● Theory
● Practice
● Summary
Background
Microblogging site
● user messages (blog)
● cockpit/wall

Classic architecture
● database
● web server(s)
● load balancer(s)
Background
Web servers, load balancers
● one server
● ...
● 1000 servers
● not a problem

Database
● one database
● two databases (master -> slave)
● two databases (master <-> master)
● n databases (slave(s) <- master <-> master -> slave(s))
● a lot of replication ;)
Background
Replication
● increases read performance (like RAID1)
● increases data safety (like RAID1)
● does not increase the system's capacity (GBs)
Background
Scalability
● stateless elements scale well
● stateful elements
  ○ quite easy to scale
    ■ if we want more reads (cache, replication)
  ○ hard to scale
    ■ if we want more writes
    ■ if we want more capacity
Background
Sharding ;)
[diagram: one table holding rows A through L, split row-wise into shards that each hold a subset of the rows]
Theory
Theory
Scaling
● Scale Back
  ○ delete or archive unused data
● Scale Up (vertical)
  ○ more power, more disks
● Scale Out (horizontal)
  ○ add machines
    ■ functional partitioning
    ■ replication
    ■ sharding
Theory
Sharding
● split one big database into many smaller databases
  ○ spread rows across many servers
● shared-nothing partitioning
● not replication
Theory
Sharding key
● shard by a key
● all data with that key will be on the same shard
● e.g. shard by user - all information connected to a user is on one shard (user info, messages, friends list)

user 1 -> shard 1
user 2 -> shard 2
user 3 -> shard 1
user 4 -> shard 2

● choosing the right key is very important!
Theory
Sharding function
● maps keys to shards
● where to find the data
● where to store the data

shard_number = sf(key)
Theory
Sharding function
● Dynamic
  ○ mapping kept in a database table
● Fixed
  ○ Modulo
    shard_number = id % shards_count
  ○ Hash + Modulo
    shard_number = md5(email) % shards_count
  ○ Consistent hashing
    http://en.wikipedia.org/wiki/Consistent_hashing
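The two fixed strategies can be sketched in a few lines of Python (a sketch for illustration; the shard count and function names are hypothetical, not from the slides):

```python
import hashlib

SHARDS_COUNT = 2  # hypothetical shard count for illustration

def shard_by_id(user_id: int) -> int:
    # Modulo: shard_number = id % shards_count
    return user_id % SHARDS_COUNT

def shard_by_email(email: str) -> int:
    # Hash + Modulo: shard_number = md5(email) % shards_count
    digest = hashlib.md5(email.encode("utf-8")).hexdigest()
    return int(digest, 16) % SHARDS_COUNT

print(shard_by_id(3))   # -> 1, so user 3 lives on shard 1
```

Modulo is trivially simple, but note that changing shards_count remaps almost every key; consistent hashing exists precisely to avoid that.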
Theory
Advantages
● linear write/read performance scalability (like RAID0)
● capacity increase (like RAID0)
● smaller databases are easier to manage
  ○ alter
  ○ backup/restore
  ○ truncate ;)
● smaller databases are faster
  ○ as they may fit into memory
● cost effective
  ○ 1 x (80 cores, 20 HDs, 80 GB RAM) vs.
  ○ 10 x (8 cores, 2 HDs, 8 GB RAM)
Theory
Challenges
● globally unique IDs
  ○ unique across all shards
    ■ auto_increment_increment, auto_increment_offset
    ■ global IDs table
  ○ not unique across shards
    ■ IDs in the databases - not unique
    ■ shard_number - unique
    ■ globally unique ID = shard_number + db ID
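The last scheme - composing a globally unique ID from the shard number plus the local auto-increment ID - can be sketched as bit-packing (the 10-bit layout and function names are assumptions, not from the slides):

```python
SHARD_BITS = 10  # assumed layout: room for 1024 shards in the low bits

def make_global_id(shard_number: int, local_id: int) -> int:
    # globally unique ID = shard_number + db ID, packed into one integer
    return (local_id << SHARD_BITS) | shard_number

def split_global_id(global_id: int) -> tuple:
    # recover (local_id, shard_number): the ID itself says where the row lives
    return global_id >> SHARD_BITS, global_id & ((1 << SHARD_BITS) - 1)

gid = make_global_id(1, 42)   # local row 42 on shard 1
print(split_global_id(gid))   # -> (42, 1)
```

A nice side effect: given only the global ID, the application can route the query to the right shard without any lookup table.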
Challenges
Re-sharding
● consistent hashing
or
● more shards than machines/nodes (e.g. 100 shards on 10 machines)

9 shards on 3 machines: [1,4,7] [2,5,8] [3,6,9]
9 shards on 5 machines: [1,6] [2,7] [3,8] [4,9] [5]
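The layout above - whole logical shards moving when machines are added - can be reproduced with a simple round-robin assignment (a sketch; the function name is hypothetical):

```python
def assign_shards(shard_ids, machine_count):
    # round-robin whole logical shards onto physical machines;
    # re-sharding then moves whole shards, never individual keys
    mapping = {m: [] for m in range(machine_count)}
    for i, shard in enumerate(shard_ids):
        mapping[i % machine_count].append(shard)
    return mapping

shards = list(range(1, 10))       # 9 logical shards
print(assign_shards(shards, 3))   # {0: [1, 4, 7], 1: [2, 5, 8], 2: [3, 6, 9]}
print(assign_shards(shards, 5))   # {0: [1, 6], 1: [2, 7], 2: [3, 8], 3: [4, 9], 4: [5]}
```

Because the key-to-shard mapping never changes, growing from 3 to 5 machines only copies whole shard databases to the new nodes.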
Challenges
Cross-shard
● queries
  ○ sent to many shards
  ○ collect results from each one
  ○ avoidable (better sharding key, more sharding keys)
● joins
  ○ send the query to many shards
  ○ join the results in the application
  ○ sometimes unavoidable
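A minimal sketch of joining in the application, with in-memory dicts standing in for real tables (the data mirrors the Practice slides later in the deck; names are hypothetical):

```python
# a User lookup table and per-shard Follow tables, sharded by follower id
users = {1: "John", 2: "Bob", 3: "Andy", 4: "Claire", 5: "Megan"}
follow_shards = [
    [(2, 1), (4, 3), (4, 1)],                          # shard0: (who, whose)
    [(1, 2), (3, 4), (3, 2), (1, 3), (5, 2), (1, 5)],  # shard1
]

def followed_logins(who):
    # step 1: fetch the "whose" ids (a fan-out query in the general case)
    whose_ids = [w for shard in follow_shards for (f, w) in shard if f == who]
    # step 2: join with the User table in the application, not in SQL
    return sorted(users[w] for w in whose_ids)

print(followed_logins(1))   # John follows users 2, 3, 5 -> ['Andy', 'Bob', 'Megan']
```

The join the database used to do in one statement becomes two round trips plus a merge in the web tier.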
Challenges
Network
● more machines, more (but smaller) streams
● full mesh between web servers and shards
● pconnect vs. connect

Complexity
● usually sharding is done in the application logic
Practice
Microblogging site
● see users' messages
● see stream/wall

Classic architecture
● database
● web server(s)
● load balancer(s)
Practice
Data

User
id  login
1   John
2   Bob
3   Andy
4   Claire
5   Megan

Messages
id  owner  message
1   2      M1
2   1      M2
3   2      M3
4   3      M4
5   2      M5

Follow
who  whose
1    2
3    4
3    2
1    3
5    2
2    1
1    5
4    3
4    1

John's messages? John's follows?
Practice
User
id  login
1   John
2   Bob
3   Andy
4   Claire
5   Megan

shard0
Messages
id  owner  message
1   2      M1
3   2      M3
5   2      M5

Follow
who  whose
2    1
4    3
4    1

shard1
Messages
id  owner  message
2   1      M2
4   3      M4

Follow
who  whose
1    2
3    4
3    2
1    3
5    2
1    5

mapping?
Practice
Bob's blog
● Bob's messages
  ○ find Bob's id in the User table (id = 2)
  ○ find Bob's shard (2 % 2 = 0, shard0)
  ○ fetch Messages (shard0) where owner = 2
● People Bob follows
  ○ find Bob's id in the User table (id = 2)
  ○ find Bob's shard (2 % 2 = 0, shard0)
  ○ fetch whose ids from the Follow table (shard0)
  ○ fetch people info from the User table
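The lookup steps above can be sketched with in-memory stand-ins for the User table and the per-shard Messages tables (a sketch; names are hypothetical):

```python
users = {"John": 1, "Bob": 2, "Andy": 3, "Claire": 4, "Megan": 5}
messages = {
    0: [(1, 2, "M1"), (3, 2, "M3"), (5, 2, "M5")],  # shard0: (id, owner, message)
    1: [(2, 1, "M2"), (4, 3, "M4")],                # shard1
}
SHARDS_COUNT = 2

def user_messages(login):
    uid = users[login]              # find the id in the User table
    shard = uid % SHARDS_COUNT      # sharding function: id % shards_count
    return [m for (_id, owner, m) in messages[shard] if owner == uid]

print(user_messages("Bob"))   # -> ['M1', 'M3', 'M5'], all from shard0
```

Because user is the sharding key, the whole request touches exactly one shard - the happy path sharding is designed for.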
Practice
Who follows Andy?
● find Andy's id in the User table (id = 3)
● find Andy's shard (3 % 2 = 1, shard1)
● hmmm
Practice

(shard0 and shard1 data as above)

Cross-shard query!
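Why it must be cross-shard: the Follow table is sharded by the follower (who), so Andy's followers are scattered across shards and every shard has to be asked (a sketch over the same data as the tables above):

```python
follow_shards = {
    0: [(2, 1), (4, 3), (4, 1)],                          # (who, whose)
    1: [(1, 2), (3, 4), (3, 2), (1, 3), (5, 2), (1, 5)],
}

def followers_of(user_id):
    result = []
    for rows in follow_shards.values():   # fan out to every shard
        result.extend(who for (who, whose) in rows if whose == user_id)
    return sorted(result)

print(followers_of(3))   # -> [1, 4]: John and Claire follow Andy
```

One fan-out is cheap; thousands per second are not, which is why the sharding key has to match the dominant query.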
Practice

(shard0 and shard1 data as above)
Ideas?
Summary
Summary
To shard or not to shard
● many reads, few writes? - don't
● many writes and no capacity problems? - don't (use SSDs)
● capacity problems? - do
● many writes and capacity problems? - do
● scale-up is affordable? - don't shard

As you see... it depends!
Summary
If you have to shard
● always use sharding + replication (= RAID10)
  ○ sharding alone reduces high availability (like RAID0)
● more shards than you need
  ○ e.g. 4 machines, 100 shards
  ○ or dynamic allocation
● think of network capacity (full mesh)
  ○ load sharding (google it ;))
● sharding key - important!
  ○ cross-shard queries
Wake Up!