SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
![Page 1: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/1.jpg)
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
DAT204 - SmugMug: From MySQL to
Amazon DynamoDB (and some of the tools we used to get there)
Brad Clawsie, SmugMug.com
November 14, 2013
![Page 2: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/2.jpg)
Welcome!
• I'm Brad Clawsie, an engineer at SmugMug.com
• SmugMug.com is a platform for hosting, sharing,
and selling photos
• Our target market is professional and enthusiast
photographers
![Page 3: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/3.jpg)
This Talk…
• Isn't an exhaustive Amazon DynamoDB tutorial
• Is about some observations made while
migrating some features from MySQL to
Amazon DynamoDB
• Is an introduction to some of the tools that have
helped us migrate to Amazon DynamoDB
![Page 4: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/4.jpg)
Background
• SmugMug.com started in 2003
• LAMP code base
• A few machines/colocation
→ a lot of machines/colocations
→ Amazon Web Services
• Hundreds of thousands of paying customers
• Millions of viewers, billions of photos, petabytes
of storage
![Page 5: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/5.jpg)
![Page 6: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/6.jpg)
![Page 7: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/7.jpg)
![Page 8: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/8.jpg)
![Page 9: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/9.jpg)
Amazon DynamoDB in a Nutshell
• Tables → [Keys → Items]
• Items → [AttributeName → Attribute]
• Attribute → {Type:Value}
• Provisioned throughput
• NoSQL-database-as-a-service
• Create, Get, Put, Update, Delete, Query, Scan
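The nesting above can be sketched as Go types. This is an illustrative model only, not GoDynamo's actual types: the names `AttributeValue`, `Item`, `Table`, and `encodeItem` are assumptions, and only the string (`S`) and number (`N`) type tags are shown.

```go
package main

import "encoding/json"

// AttributeValue mirrors the {Type:Value} shape: one type tag, one value.
// Only the S (string) and N (number) tags are modeled in this sketch.
type AttributeValue struct {
	S string `json:"S,omitempty"`
	N string `json:"N,omitempty"` // numbers travel as strings on the wire
}

// Item maps attribute names to attributes.
type Item map[string]AttributeValue

// Table is a toy in-memory stand-in: an encoded key maps to an Item.
type Table map[string]Item

// encodeItem renders an Item as the JSON shape used in requests.
// encoding/json emits map keys in sorted order.
func encodeItem(it Item) (string, error) {
	b, err := json.Marshal(it)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

For example, `encodeItem(Item{"UserID": {N: "1"}})` yields `{"UserID":{"N":"1"}}`.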
![Page 10: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/10.jpg)
MySQL at SmugMug
![Page 11: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/11.jpg)
MySQL on Our Terms...
• “SQL”, but not relational
  • We avoid joins, foreign keys, complex queries, views,
    etc.
• Simplified our model so that caching was easier
to implement
• Used like a key (id) → values (row) system
![Page 12: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/12.jpg)
MySQL on Our Terms...
• Aggressive denormalization is common in many
online applications
• Upside – easy to migrate some of these tables
and supporting code to “NoSQL” style database
• Downside(?) – database does less, code does
more
![Page 13: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/13.jpg)
So Why Change?
We're hitting roadblocks that can't be addressed
by:
• More/better hardware
• More ops staff
• Best practices
![Page 14: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/14.jpg)
Notable Issue #1: “OFFLINE OPS”
like ALTER TABLE
• We used to have a fair number of read-only/site-maintenance
downtimes to ALTER tables
• As the number of users grows, this always
inconveniences someone
• Introduces risk into the code
• Other RDBMSs are better about this
![Page 15: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/15.jpg)
Temporary Relief...
• Introduced the concept of treating a column as a
JSON-like BLOB type for embedding
varying/new data
• Bought us some time and flexibility, and reduced
the need for ALTER TABLE-related downtime
• But MySQL wasn't intended to be an ID →
BLOB system, and other issues remained
![Page 16: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/16.jpg)
Notable Issue #2: Concurrency
• MySQL can manifest some non-graceful
degradation under heavy write load
• We're already isolating non-essential tables to
their own databases and denormalizing where
we can...the problem persists
![Page 17: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/17.jpg)
Notable Issue #3: Replication
• A necessary headache, but in fairness MySQL is
pretty good at it
• Performance issues (single threaded etc.)
• Makes it harder to reason about consistency in
code
• Big ops headache
![Page 18: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/18.jpg)
Notable Issue #4: Ops
Keeping all of this going requires an ops team...
• People
• Colocation
• “Space” concerns – storage, network
capacity, and all the hardware to meet
anticipated capacity needs
![Page 19: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/19.jpg)
Intangibles
• We have the resources to try out some new things
• We were already AWS fan boys
• Big users of Amazon S3
• Recently moved out of colocations and into Amazon
EC2
• Our ops staff has become AWS experts
• So we would give an AWS database consideration
![Page 20: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/20.jpg)
Immediate Observations
• Limited key structure
• Limited data types
• ACID-like on Amazon DynamoDB's terms
• Query/Scan operations not that interesting
• But, freedom from most space constraints
• Leaving the developer with primarily time
constraints
![Page 21: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/21.jpg)
First Steps
• Start with a solved problem – stats/analytics
• SmugMug's stats stack is a relatively simple
data model:
{"u":"1","i":"123","a":"321"...}
• We measure hits on the frontend and create
lines of JSON with user, image, album, time, etc.
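A minimal sketch of decoding one such line in Go. The struct `Hit` and the function `parseHit` are hypothetical names, and the mapping of `u`, `i`, and `a` to user, image, and album is an assumption based on the field list above; fields the struct does not name are ignored by `encoding/json`.

```go
package main

import "encoding/json"

// Hit is a hypothetical record for one line of the stats feed.
// The key-to-field mapping is assumed, not taken from SmugMug's code.
type Hit struct {
	User  string `json:"u"`
	Image string `json:"i"`
	Album string `json:"a"`
}

// parseHit decodes a single JSON line into a Hit.
func parseHit(line []byte) (Hit, error) {
	var h Hit
	err := json.Unmarshal(line, &h)
	return h, err
}
```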
![Page 22: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/22.jpg)
First Steps
• Analytics needs reliable throughput – new data is
always being generated
• Space concerns (hardware, storage, replication)
It was obvious that Amazon DynamoDB would free
us from some space constraints. However, we
were naive about Amazon DynamoDB's special
time constraints.
![Page 23: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/23.jpg)
Very Simple Tables
• A site key (user, image, album id) as HashKey
• A date as RangeKey
• The rest of the data
• Just a few tables
  • We'll have to manage removing data from them over
    time
• Obvious: fewer tables → lower bill
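The key layout above might be built like this. It is a sketch under assumptions: the attribute names `SiteKey` and `Date`, the `YYYYMMDD` numeric date encoding, and the helpers `statsKey` and `day` are all illustrative, not SmugMug's schema.

```go
package main

import "time"

// AttributeValue carries one typed value; S is a string, N a number.
type AttributeValue struct{ S, N string }

// day is a small helper that builds a UTC date.
func day(year int, month time.Month, d int) time.Time {
	return time.Date(year, month, d, 0, 0, 0, 0, time.UTC)
}

// statsKey pairs a site key (HashKey) with a date (RangeKey).
// Encoding the date as YYYYMMDD keeps numeric order equal to date order.
func statsKey(siteKey string, when time.Time) map[string]AttributeValue {
	return map[string]AttributeValue{
		"SiteKey": {S: siteKey},
		"Date":    {N: when.UTC().Format("20060102")},
	}
}
```

A call such as `statsKey("user:1", day(2013, 10, 17))` produces a range key of `20131017`.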
![Page 24: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/24.jpg)
Need for Tools
• Even with our simple initial test bed, we saw the
need for more tooling
• We are huge users of memcache multi*
functions
• So we wanted to be able to have arbitrary-sized
“batch” input requests
• PHP doesn't do concurrency
![Page 25: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/25.jpg)
So...a Proxy
• A long-running proxy to take requests and
manage concurrency for PHP
• A proxy to allow us to cheat a little with sizing
our requests*
• Needed a tool that was geared toward building
tools like proxies
• Go fit the bill
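One plausible reading of "arbitrary-sized batch requests" is sketched below: split a large key list into chunks that fit a per-call limit, then run the chunks concurrently with a bounded number in flight. This is not BBPD's code. The names `chunk` and `fetchAll` and the `fetch` callback are assumptions; a chunk size of 100 would match DynamoDB's BatchGetItem limit of 100 keys per call.

```go
package main

import "sync"

// chunk splits keys into slices of at most size elements.
func chunk(keys []string, size int) [][]string {
	var out [][]string
	for size > 0 && len(keys) > 0 {
		n := size
		if len(keys) < n {
			n = len(keys)
		}
		out = append(out, keys[:n])
		keys = keys[n:]
	}
	return out
}

// fetchAll runs fetch on every chunk, with at most workers calls in
// flight at once, and merges the partial results into one map.
func fetchAll(keys []string, size, workers int,
	fetch func([]string) map[string]string) map[string]string {
	if workers < 1 {
		workers = 1
	}
	results := make(map[string]string)
	var mu sync.Mutex
	var wg sync.WaitGroup
	sem := make(chan struct{}, workers) // counting semaphore
	for _, c := range chunk(keys, size) {
		wg.Add(1)
		sem <- struct{}{} // blocks while workers calls are running
		go func(c []string) {
			defer wg.Done()
			defer func() { <-sem }()
			part := fetch(c)
			mu.Lock()
			for k, v := range part {
				results[k] = v
			}
			mu.Unlock()
		}(c)
	}
	wg.Wait()
	return results
}
```

The caller sees one large request; the `workers` bound is the knob to turn when experimenting with concurrency.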
![Page 26: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/26.jpg)
A Little Risk
• Writing tools for a young database in a young
programming language
• Resulted in two codebases we will share:
• GoDynamo: equivalent of the AWS SDKs
• BBPD: an HTTP proxy based on GoDynamo
![Page 27: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/27.jpg)
Observation #1:
On Amazon DynamoDB's Terms
• Sensible key ↔ application mapping
• Denormalization
• No reliance on complex queries or other
relational features
• Many at-scale MySQL users are already using it
in this way anyway
![Page 28: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/28.jpg)
Observation #1:
On Amazon DynamoDB's Terms
• Avoid esoteric features
• Don't force it
  • Amazon DynamoDB is not the only AWS database
• Nice to have a “control” to use as a yardstick of
success
![Page 29: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/29.jpg)
Observation #2:
Respect Throttling
• Coming from MySQL, graceful degradation is an
expected artifact of system analysis
• But Amazon DynamoDB is a shared service
using a simple WAN protocol
• You either get a 2xx (success), 4xx, or 5xx
(some kind of failure)
  • A binary distinction
![Page 30: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/30.jpg)
Observation #2:
Respect Throttling
• Throttling is the failure state of a properly
formatted request
• Throttling happens when the rate of growth of
requests changes quickly (my observation)
• Correlate your throttling to your provisioning
![Page 31: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/31.jpg)
Observation #2:
Respect Throttling
• Typically, throttling happens well below the
provisioning level
• Don't reflexively increase your provisioning
• Amazon DynamoDB behaves best when you
optimize requests for space and time
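A common way to respect throttling without raising provisioning is to retry with exponential backoff. The sketch below is a generic illustration, not GoDynamo's retry logic: `errThrottled`, `backoffDelays`, and `withRetry` are hypothetical names, and the caller is assumed to map a throttled response (DynamoDB reports it as HTTP 400 with ProvisionedThroughputExceededException) to `errThrottled`.

```go
package main

import (
	"errors"
	"time"
)

// errThrottled is a stand-in for a throttled response from the service.
var errThrottled = errors.New("throttled")

// backoffDelays returns the wait before each retry:
// base, 2*base, 4*base, ... with each value capped at max.
func backoffDelays(base, max time.Duration, retries int) []time.Duration {
	delays := make([]time.Duration, 0, retries)
	d := base
	for i := 0; i < retries; i++ {
		if d > max {
			d = max
		}
		delays = append(delays, d)
		d *= 2
	}
	return delays
}

// withRetry calls op until it succeeds, fails with a non-throttling
// error, or the delays are used up. It returns the number of attempts.
// A nil sleep falls back to time.Sleep.
func withRetry(op func() error, delays []time.Duration,
	sleep func(time.Duration)) (int, error) {
	if sleep == nil {
		sleep = time.Sleep
	}
	attempts := 0
	for {
		attempts++
		err := op()
		if !errors.Is(err, errThrottled) {
			return attempts, err // success or a different failure
		}
		if attempts > len(delays) {
			return attempts, err // retries exhausted
		}
		sleep(delays[attempts-1])
	}
}
```

Adding random jitter to each delay is a common refinement that keeps retrying clients from herding; it is omitted here for brevity.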
![Page 32: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/32.jpg)
Space Optimizations
• Compress data (reduce requests)
• Cache data (read locally when possible)
• Avoid clustering requests to tables/keys
• Use key/table structures if possible (often the
application dictates this)
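As an illustration of the first bullet, a value can be gzip-compressed before it is stored, since provisioned throughput is consumed in proportion to item size. This is a sketch using Go's standard library; `compress` and `decompress` are hypothetical helper names, and whether compression pays off depends on how repetitive the data is.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"io"
)

// compress gzips data so that a stored attribute takes fewer bytes.
func compress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes the remaining compressed bytes and the gzip footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompress reverses compress.
func decompress(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}
```

The compressed bytes would be stored as a binary attribute and decompressed after each read.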
![Page 33: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/33.jpg)
Time Optimizations
• Reduce herding/spikes if possible
• Queue requests to be processed at a controlled
rate of flow elsewhere
• Experiment with concurrency to achieve
optimum reqs/sec
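The queueing idea can be sketched with a channel and a ticker: requests pile up in the queue and are released at most one per interval. This is a minimal illustration, not SmugMug's implementation; `drain`, `handle`, and the string request type are assumptions.

```go
package main

import "time"

// drain processes queued requests at most one per interval and
// returns how many it handled. It stops when the queue is closed.
// interval must be positive, as time.NewTicker requires.
func drain(queue <-chan string, interval time.Duration,
	handle func(string)) int {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	n := 0
	for req := range queue {
		<-ticker.C // wait for the next slot before sending
		handle(req)
		n++
	}
	return n
}
```

Running several `drain` goroutines against the same queue, each with its own interval, is one way to experiment with concurrency while keeping the overall rate bounded.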
![Page 34: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/34.jpg)
Don't Obsess Over Throttling
• Some throttling is unavoidable
• “Hot keys” are unavoidable
• The service will get better about adapting to
typical use
• Experiment: flow, distribution, mix of requests,
types of requests, etc.
• Throttling is a strong warning
![Page 35: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/35.jpg)
Observation #3:
Develop with Real(ish) Data
• “Test” data and “test” volume will fail you when
you launch
• Again, no graceful degradation
• Your real data has its own flow and distribution
  • You must optimize for that
• Once again, set up a control to validate
observations
![Page 36: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/36.jpg)
Observation #4:
Live with the Limits
• Don't try to recreate relational features in
Amazon DynamoDB
• Query/Scan are limited, be realistic
• You can't really see behind the curtain
• Feedback from the console is limited
• Expect to iterate
![Page 37: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/37.jpg)
Success?
Recall our original MySQL gripes:
(1) ALTER TABLE: kind of solved
Amazon DynamoDB doesn't have full table
schemas, so to speak. We can add Attributes to
an Item at will, but once a table is created, its
provisioning is the only thing we can change.
![Page 38: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/38.jpg)
Success?
(2) Replication: solved
But it is opaque to us as Amazon DynamoDB users.
(3) Concurrency: kind of solved
Throttling introduces a new kind of
concurrency issue, but at least it is limited to a
single table.
![Page 39: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/39.jpg)
Success?
(4) Ops: mostly solved
Ops doesn't have to babysit servers anymore,
but they need to learn the peculiarities of
Amazon DynamoDB and accept the limited
value of the console and available body of
knowledge.
![Page 40: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/40.jpg)
Recap: What We Wrote
• GoDynamo: like the AWS SDK, but in Go
• BBPD: a proxy written on GoDynamo
• See github.com/smugmug
![Page 41: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/41.jpg)
Recap: Why a Proxy?
• Allows us to integrate Amazon DynamoDB with
PHP so concurrency can be put to use
• Moves operations to an efficient runtime
• Provides for simple debugging via curl and can
check for well-formedness of requests locally
• Hides details like renewing IAM credentials
![Page 42: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/42.jpg)
Trivial Examples
# Convenience endpoints available directly:
$ curl -X POST \
  -d '{"TableName":"user","Key":{"UserID":{"N":"1"},"Date":{"N":"20131017"}}}' \
  http://localhost:12333/GetItem
# Or specify the endpoint in a header:
$ curl -H 'X-Amz-Target: DynamoDB_20120810.GetItem' \
  -X POST \
  -d '{"TableName":"user","Key":{"UserID":{"N":"1"},"Date":{"N":"20131017"}}}' \
  http://localhost:12333/
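The same two request shapes can be built from Go with the standard library. This is a sketch, not part of BBPD or GoDynamo: `proxyGetItem`, `getItemBody`, and `AttributeValue` are hypothetical names, and the function only constructs the request so that it can be inspected before sending.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
)

// AttributeValue models the S (string) and N (number) type tags only.
type AttributeValue struct {
	S string `json:"S,omitempty"`
	N string `json:"N,omitempty"`
}

// getItemBody is the JSON body of a GetItem request.
type getItemBody struct {
	TableName string                    `json:"TableName"`
	Key       map[string]AttributeValue `json:"Key"`
}

// proxyGetItem builds, but does not send, a GetItem request for the proxy.
// With useHeader false the operation is named in the URL path; with
// useHeader true it is named in the X-Amz-Target header instead.
func proxyGetItem(base, table string, key map[string]AttributeValue,
	useHeader bool) (*http.Request, error) {
	body, err := json.Marshal(getItemBody{TableName: table, Key: key})
	if err != nil {
		return nil, err
	}
	target := base + "/GetItem"
	if useHeader {
		target = base + "/"
	}
	req, err := http.NewRequest(http.MethodPost, target, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	if useHeader {
		req.Header.Set("X-Amz-Target", "DynamoDB_20120810.GetItem")
	}
	return req, nil
}
```

The returned request can be passed to `http.DefaultClient.Do` when a proxy is listening at `base`.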
![Page 43: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/43.jpg)
BBPD is Just a Layer
• GoDynamo is where the heavy lifting is done
• Libraries for all endpoints
• AWS Signature Version 4 support
• IAM support (transparent and thread-safe)*
• Other nonstandard goodies
• Pro-concurrency, high performance
• Enables some cool hacks
![Page 44: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/44.jpg)
GoDynamo: Why Go?
• Strong types, concurrency, Unicode, separate
compilation, fast startup, low(ish) memory use,
static binary as output of compiler (deploy →
scp my_program)
• Types ↔ JSON is easy, flexible, and idiomatic
• Easy to learn and sell to your boss
![Page 45: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/45.jpg)
Trivial Example
// control our concurrent access to IAM creds in the background
iam_ready_chan := make(chan bool)
go conf_iam.GoIAM(iam_ready_chan)
// try to get an Item from a table
var get1 get_item.Request
get1.TableName = "my-table"
get1.Key = make(endpoint.Item)
// S and N are the DynamoDB string and number type tags
get1.Key["myhashkey"] = endpoint.AttributeValue{S: "thishashkey"}
get1.Key["myrangekey"] = endpoint.AttributeValue{N: "1"}
// a non-nil error or a non-200 status both count as failure
body, code, err := get1.EndpointReq()
if err != nil || code != http.StatusOK {
	panic("uh oh")
} else {
	fmt.Printf("%v\n", body)
}
![Page 46: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/46.jpg)
AWS Identity and Access
Management (IAM)
• Included as a dependency is another package
worth mentioning: goawsroles
• An interface that describes how to handle IAM
credentials
• An implementation for text files
• Suspends threads as credentials are being
updated
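The "suspend while updating" behavior can be sketched with a read/write lock: many goroutines read the current credentials at once, and an update holds them all off until the new values are in place. This is not the goawsroles API; `Credentials`, `Store`, `Get`, and `Update` are hypothetical names for illustration.

```go
package main

import "sync"

// Credentials is a hypothetical bundle of temporary IAM credentials.
type Credentials struct {
	AccessKey, Secret, Token string
}

// Store guards the current credentials. Readers share access;
// an update blocks readers until the new values are installed.
type Store struct {
	mu   sync.RWMutex
	cred Credentials
}

// Get returns a copy of the current credentials.
func (s *Store) Get() Credentials {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.cred
}

// Update replaces the credentials using load, for example by reading
// a text file. On error the previous credentials are kept.
func (s *Store) Update(load func() (Credentials, error)) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, err := load()
	if err != nil {
		return err
	}
	s.cred = c
	return nil
}
```

Calling `load` while holding the write lock is what suspends readers for the duration of the refresh; loading first and locking only for the swap would shorten that pause at the cost of readers briefly seeing the old values.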
![Page 47: SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013](https://reader033.fdocuments.net/reader033/viewer/2022042613/54b778d94a7959db2c8b4947/html5/thumbnails/47.jpg)
Just the Beginning
• Available at github.com/smugmug
• Standard disclaimer – works for us, but YMMV!
• Would love for you to use it and help create a
community of contributors
Thanks! :)