Post on 09-Nov-2014
Scale In after Scaling Out
WHY, WHEN AND HOW
Who am I
Been using MySQL since 1999
Worked for FriendFinder, Friendster, Flickr, Rockyou, SchoolFeed, Weebly
Presented on: Flickr Architecture: Doing Billions of Queries Per Day; Record Every Referral For Flickr Real-time; Scaling to 200K TPS per second with Open Source; Scaling a Widget Company, MySpace vs. Facebook API Load Patterns; a University of Utah presentation; and various others.
Patterns from Start to Scale
Start a project with a single MySQL DB.
Get some users: add more disks to the MySQL DB
Get some more users: add a slave
Add more Slaves
Then need to split up the master
Patterns: Continued
Master is not strong enough
Put tables on other servers with slaves
Constantly battle slave lag
Big Table
All Tables Minus the Big Table
Still not scaling Horizontally
Let’s Shard
Federation
User 1’s Data
User 2’s Data
User 3’s Data
User N’s Data
Assign a section of the Database as a whole to a Server
Have a Layer to tell the connector what to connect to
Add slaves for redundancy
Go Master-Master
Do this N times
Finally provided stable service
Federation
This increases write throughput
Now you have the ability to scale Horizontally
What problem was solved at its lowest levels?
Lack of IOPS: solved
Handling of concurrency: solved
Problems introduced
Lots of power used
Lots of servers to manage
Lots of rack space used
Some less-than-optimal hardware usage
SSD
SSDs use NAND flash chips; each chip holds millions of cells.
SLC can hold a single data bit per cell; MLC can hold multiple data bits per cell, yielding a higher density or more disk space.
Typically MLC provides slower throughput than SLC due to more complicated error correction algorithms and false positive reads.
MLC is not all bad: major leaps in firmware improved it
More enterprises are using MLC
It's cheaper
Fast Enough
Endurance improved
Write amplification improved (erasure and data wad resends)
Stay on top of Firmware changes
TRIM improvements, which solve the progressively slower writes when the same blocks are written over and over.
We use Intel SSDSA2CW160 320 Series MLC SSD
It’s FAST
It’s Reliable
It has advanced power protection features: really big capacitors to flush buffered data
Low power usage
We consider it the best
It’s no longer made
Everyone wants it
Speed of a single SSD versus a single spinning metal drive
20K IOPS writes reported - SSD
35K IOPS reads reported – SSD
200 IOPS for SEAGATE ST9146852SS HT043TB0584C
Now that we have more IOPS, we need space
RAID-5 gives the best space performance we can use.
8 160GB SSDs give 1TB
613GB of usable space
Raw size is 149 GB per disk, with the rest reserved for wear
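The 1TB figure follows from the usual RAID-5 rule that one disk's worth of capacity goes to parity. A minimal sketch of that arithmetic, assuming the 149 GB raw size per disk quoted above (the function name is made up for illustration):

```python
def raid5_capacity_gb(disks: int, per_disk_gb: int) -> int:
    """Capacity of a RAID-5 array: one disk's worth of space holds parity."""
    if disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    return (disks - 1) * per_disk_gb


# 8 disks at 149 GB raw each: 7 * 149 GB, roughly 1 TB
print(raid5_capacity_gb(8, 149))
```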
Now that we have space we can combine Shards
First get the IOPS usage of the current shards: it is 12K IOPS
Next get the disk space requirements. Do not use more than 50%, so you have room for growth. I now use 56% of space
Depending on the Replication Traffic per shard you may need another plan
How to combine Data
Code Steps
Have a program that keeps a hashmap of tablename to federated column
Lock the federated entity by throwing an error in the application that says this federated entity is not available
SELECT ALL Federated data (in chunks) and add it to the new combined table.
Update pointers
Error if any step fails and keep the data locked; otherwise unlock
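The code steps above can be sketched roughly as follows. This is an in-memory illustration only, not the production program: the table names, the `site_id` column, and the dict-based "tables" are assumptions standing in for real MySQL queries.

```python
class EntityLockedError(Exception):
    """Raised to the application while a federated entity is being moved."""


# Hypothetical map of table name -> federated column.
FEDERATED_COLUMN = {"page_views": "site_id", "referrers": "site_id"}

locked = set()   # federated entity ids that are currently unavailable
pointers = {}    # federated entity id -> name of the shard that owns it


def check_available(entity_id):
    """Called by the application before touching an entity's data."""
    if entity_id in locked:
        raise EntityLockedError(f"federated entity {entity_id} is not available")


def migrate_entity(entity_id, source, target, new_shard, chunk_size=1000):
    """Lock, copy in chunks, update the pointer, unlock.

    If any step raises, the exception propagates and the entity stays
    locked, so a half-copied entity is never served.
    """
    locked.add(entity_id)
    for table, column in FEDERATED_COLUMN.items():
        rows = [r for r in source.get(table, []) if r[column] == entity_id]
        for start in range(0, len(rows), chunk_size):
            target.setdefault(table, []).extend(rows[start:start + chunk_size])
    pointers[entity_id] = new_shard
    locked.discard(entity_id)  # reached only when every step succeeded
```

The unlock is deliberately not placed in a `finally` block: a failure should leave the entity locked until someone looks at it.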
Another way to combine, more Operational
Take a copy of the shard.
Configure multi-instance MySQL
Run that shard off a different port
I chose to do both methods; here is how
Let’s take a case.
Support 20 million websites
90% of all sites get 1 or more hits but less than 1000 hits per day
Less than 10% of sites get more than 1000 hits per day
8 shards to handle 12K IOPS
64 CPU threads
288 GB of memory
64 2.5” drives
Roughly $40K of hardware
Multiply by 2 for redundancy
Replication was lagging
Simply combining the data onto one server will not work
Master needs 10K IOPS; replication, with some tricks, can use 2.5K IOPS
Innodb_fake_changes did not work
Facebook's faker helped, but the CPU was underpowered, so it could not really saturate IOPS and keep replication in sync
Multi_Mysql is the answer
Set up 4 MySQL instances
Instead of 1 Replication thread I now have 4
Instead of being limited to 2.5K IOPS on SSD by a single replication thread, I now have 10K IOPS
Master produces 10K IOPS from ETL that runs on all Front End
I don’t need much memory; in fact, each instance only has a 4GB buffer pool
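The choice of 4 instances falls out of the numbers above: one replication thread per instance, each good for about 2.5K IOPS, against a master producing 10K IOPS. A small sketch of that sizing (the function name is hypothetical, and real throughput per thread will vary with workload):

```python
import math


def replication_streams_needed(master_iops: int, iops_per_stream: int) -> int:
    """How many independent replication threads are needed to keep up."""
    return math.ceil(master_iops / iops_per_stream)


# 10K IOPS on the master, about 2.5K IOPS per replication thread
print(replication_streams_needed(10_000, 2_500))  # 4
```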
Let’s look at the details
All tables are compressed
KEY_BLOCK_SIZE=8 (8 KB) with InnoDB
Consistent Hash for Hostname to bigint
Remove Lookup in exchange for a small CPU computation
MD5 based: the first 16 hex digits of the digest produce an 8-byte bigint
Primary KEY is HashId + Hostname(10)
HashId maps to ShardId with range blocks
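A minimal sketch of this hashing scheme, under a few assumptions: the 8-byte value is taken from the first 16 hex digits of the MD5 digest and treated as unsigned, hostnames are lowercased before hashing, and the range blocks are 32 equal slices of the 64-bit space. The function names are made up for illustration.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 32                   # assumed: 32 equal range blocks
BLOCK_SIZE = 2**64 // NUM_SHARDS  # width of one HashId range block


def host_hash_id(hostname: str) -> int:
    """Derive a 64-bit HashId from the hostname, with no lookup table."""
    digest = hashlib.md5(hostname.lower().encode("utf-8")).hexdigest()
    return int(digest[:16], 16)   # 16 hex digits = 8 bytes


def shard_id(hash_id: int) -> int:
    """Map a HashId to a ShardId by range block."""
    return hash_id // BLOCK_SIZE


def shard_counts(hostnames) -> Counter:
    """Count hostnames per shard, to check that the hashing is even."""
    return Counter(shard_id(host_hash_id(h)) for h in hostnames)
```

Running `shard_counts` over the full hostname list and comparing the largest and smallest buckets is one simple way to check the distribution before committing to the range blocks.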
Run through all hostnames to assign each to a shard
Test that hashing is even
[Chart: X = # shards, Y = PV/site]
Shard on a Single Server
There are 8 Databases per Server Instance
A database represents a shard
There are 4 MySQL Server Instances
32 Shards Total
Can isolate a single DB to a single Server
Can isolate a single Host to a single Shard
Based on a range, go to the correct server, port and database
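The layout above (4 instances, 8 databases each, 32 shards) can be routed with a small function like the one below. The host name, port numbers, and database naming are placeholders, not the real configuration; the override table illustrates isolating one shard onto its own server.

```python
from typing import NamedTuple


class Target(NamedTuple):
    host: str
    port: int
    database: str


INSTANCES = 4
DBS_PER_INSTANCE = 8
TOTAL_SHARDS = INSTANCES * DBS_PER_INSTANCE  # 32
BLOCK_SIZE = 2**64 // TOTAL_SHARDS           # one range block per shard

DEFAULT_HOST = "db1.example.internal"        # placeholder host name
BASE_PORT = 3306                             # assumed: instances on 3306..3309

# Shards moved to a dedicated server, e.g. {5: Target("db2.example.internal", 3306, "shard_05")}
OVERRIDES = {}


def route(hash_id: int) -> Target:
    """Pick server, port and database from the range block a HashId falls in."""
    shard = hash_id // BLOCK_SIZE
    if shard in OVERRIDES:
        return OVERRIDES[shard]
    instance = shard // DBS_PER_INSTANCE
    return Target(DEFAULT_HOST, BASE_PORT + instance, f"shard_{shard:02d}")
```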
How to switch
Write in both locations
Log if write fails in a single location (none happened)
Backfill old data to new format
Switch reads over to the new format once data is verified as correct
In Staging switch Reads to new format
Verify that Production and Staging render the same graph
Verify that Production and Staging have the same referrers
Sample random Pro accounts and make sure numbers match
Roll out with a switch to rollback
There was a bug where some website names were passed to the lookup method exactly as the user typed them, yet I had stored all names in lowercase
Turn off new reads with Application config switch
Fix issue and turn on new reads
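The switch-over pattern above (write to both locations, log single-location failures, flip reads with a config switch) might look like this in outline. The dict-backed stores and the `CONFIG` key are stand-ins for the real databases and application config; the lowercase normalization reflects the bug described above.

```python
import logging

log = logging.getLogger("migration")

CONFIG = {"read_new_format": False}  # hypothetical application config switch

old_store = {}  # stands in for the old-format shards
new_store = {}  # stands in for the new combined format


def normalize(hostname: str) -> str:
    """Names are stored lowercase, so lookups must lowercase user input too."""
    return hostname.strip().lower()


def record_hits(hostname: str, hits: int) -> None:
    """Write to both locations; log (do not abort) if one location fails."""
    key = normalize(hostname)
    for name, store in (("old", old_store), ("new", new_store)):
        try:
            store[key] = store.get(key, 0) + hits
        except Exception:
            log.exception("write to %s format failed for %s", name, key)


def read_hits(hostname: str) -> int:
    """Read from whichever format the config switch selects."""
    store = new_store if CONFIG["read_new_format"] else old_store
    return store.get(normalize(hostname), 0)
```

Rolling back is then a matter of setting `CONFIG["read_new_format"]` to `False` again; writes keep going to both locations the whole time.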
Clean Up
Once fully over on new format
Kill old format
Repurpose servers
Profit
Some Stats
$80K potential in server cost reduced to $7K
Utilize all the CPU
Less memory per server but more IOPS
All replicas stay in sync because there is now more than 1 replication thread per physical server. (There are 4)
Next Generation
Fusion I/O PCIe SSD Card
1U Form factor
Less power 40W-50W
No need to RAID
Questions
Twitter @dathanvp
http://mysqldba.blogspot.com
http://facebook.com/dathan
http://linkedIn.com/in/dathan
http://about.me/dathan
mailto:dathanvp@gmail.com