PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. PetaMongo: A Petabyte Database for as Little as $200 Chris Biow, MongoDB Miles Ward, AWS November 13, 2013

description

1,000,000,000,000,000 bytes. On demand. Online. Live. Big doesn't quite describe this data. Amazon Web Services makes it possible to construct highly elastic computing systems, and you can further increase cost efficiency by leveraging the Spot Pricing model for Amazon EC2. We showcase elasticity by demonstrating the creation and teardown of a petabyte-scale, multiregion MongoDB NoSQL database cluster, using Amazon EC2 Spot Instances, for as little as $200 in total AWS costs. Oh, and it offers up four million IOPS to storage via the power of PIOPS EBS. Christopher Biow, Principal Technologist at 10gen | MongoDB, covers MongoDB best practices on AWS, so you can implement this NoSQL system (perhaps at a more pedestrian hundred-terabyte scale?) confidently in the cloud. You could build a massive enterprise warehouse, process a million human genomes, or collect a staggering number of cat GIFs. The possibilities are huMONGOus.

Transcript of PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Page 1: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

PetaMongo: A Petabyte Database for as Little as $200 Chris Biow, MongoDB Miles Ward, AWS

November 13, 2013

Page 2: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Agenda • MongoDB on AWS review

– Guidance, Storage, Architecture

• MongoDB at PetaScale on AWS

Page 3: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Tools to simplify your design

• Whitepaper
• AWS Marketplace
• AWS CloudFormation

http://media.amazonwebservices.com/AWS_NoSQL_MongoDB.pdf

Page 4: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

• Easy to start a single node

• Correctly configured PIOPS EBS Storage

• No extra cost

https://aws.amazon.com/marketplace/pp/B00COAAEH8/ref=srh_res_product_title?ie=UTF8&sr=0-6&qid=1383897659043

Page 5: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

mongodb.org/display/DOCS/Automating+Deployment+with+CloudFormation

• Nested Templates

• Nodes and Storage

• Configurable Scale

• AWS CloudFormation: Your Infrastructure belongs in your source control

AWS CloudFormation

Page 6: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

AWS Storage Options

• Amazon EBS – Provisioned IOPS volumes
  – Deliver predictable, high performance for I/O-intensive workloads
  – Specify the IOPS required up front, and EBS provisions them for the lifetime of the volume
  – 4,000 IOPS per volume; stripe volumes to get thousands of IOPS to an EC2 instance

• High I/O Instances – hi1.4xlarge
  – For some applications that require tens of thousands of IOPS
  – Eliminates network latency and bandwidth as a performance constraint on storage
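The striping arithmetic from the EBS bullets can be sketched in a few lines. This is a minimal illustration, assuming the 4,000 IOPS per-volume figure quoted on the slide; the function names are hypothetical, and instance-side network limits are ignored.

```python
import math

# Per-volume ceiling quoted on the slide for Provisioned IOPS volumes.
PIOPS_PER_VOLUME = 4000


def volumes_needed(target_iops: int, per_volume: int = PIOPS_PER_VOLUME) -> int:
    """Smallest number of striped volumes whose combined IOPS meets the target."""
    if target_iops <= 0:
        raise ValueError("target_iops must be positive")
    return math.ceil(target_iops / per_volume)


def striped_iops(volume_count: int, per_volume: int = PIOPS_PER_VOLUME) -> int:
    """Aggregate provisioned IOPS of a stripe set, ignoring instance-side limits."""
    return volume_count * per_volume


if __name__ == "__main__":
    # 16,000 is the "Loaded Instance" figure from the test results in this deck.
    print(volumes_needed(16_000))
    print(striped_iops(4))
```

Under these assumptions, four striped volumes are enough to provision 16,000 IOPS for one instance.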

EBS PIOPS

SSD

Page 7: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

AWS Storage Options Testing: random 4k reads

EBS

SSD

PIOPS

+

One Volume: ~200 MongoOPS with some variability, <1 MB/s
Loaded instance: ~1,000 MongoOPS with some variability, <10 MB/s
One Volume: 2,000 MongoOPS with <1% variability, 16 MB/s
Loaded Instance: 16,000 MongoOPS with <1% variability, 64 MB/s
Loaded Cluster Instance: MongoOPS, 320 MB/s
Hi1.4xlarge ephemeral: ~64,000 MongoOPS with low variability, ~245 MB/s

Page 8: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Testing: random 4k reads

EBS

SSD

PIOPS

+

Stable

Page 9: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Stability Tips

• Ext4 or XFS, nodiratime, noatime
• Raise file descriptor limits
• Set disk read-ahead
• No large virtual memory pages
• SNAPSHOT SNAPSHOT SNAPSHOT
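Some of these tips can be checked from a script on a Linux host. The sketch below is only an illustration: it assumes the standard `/proc/mounts` line layout and the bracketed format of `/sys/kernel/mm/transparent_hugepage/enabled`, the function names are hypothetical, and it does not cover read-ahead or snapshots.

```python
import resource


def has_noatime(mounts_text: str, mount_point: str) -> bool:
    """Return True if mount_point carries the noatime option in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, comma-separated options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            return "noatime" in fields[3].split(",")
    return False


def thp_disabled(thp_text: str) -> bool:
    """The kernel brackets the active mode, e.g. 'always madvise [never]'."""
    return "[never]" in thp_text


def open_file_limit() -> int:
    """Soft limit on open file descriptors for the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    # /data is a hypothetical mount point for the MongoDB data volume.
    with open("/proc/mounts") as mounts:
        print("noatime on /data:", has_noatime(mounts.read(), "/data"))
    print("open file soft limit:", open_file_limit())
```

On a real host, the text passed to `thp_disabled` would come from reading `/sys/kernel/mm/transparent_hugepage/enabled`.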

Page 10: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

• Retain a PIOPS EBS node for snapshot backups

• Snapshots allow cross-AZ and cross-region recovery

• SSD hosts as primary

• Shard for scale

Page 11: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Another option… cr1.8xlarge, 244 GB

Page 12: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

So, about that Petabyte

v.cheap
• Spot Market
• m1.small
• 1024 shards
• 1TB EBS from snapshot
• PowerBench reader
• Aggregation queries

v.fast
• Auto Scaling On-Demand
• m2.4xlarge
• 50 shards
• 20TB PIOPS indexed
• PowerBench loader
• Aggregation queries
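Both configurations land at roughly a petabyte, which a short calculation makes explicit. The sketch below is an illustration under stated assumptions: the class and method names are hypothetical, and the IOPS estimate assumes each 20TB PIOPS shard is built from 1 TB volumes at 4,000 IOPS each, which the slides do not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterPlan:
    name: str
    shards: int
    tb_per_shard: int

    @property
    def total_tb(self) -> int:
        """Raw capacity across all shards, in terabytes."""
        return self.shards * self.tb_per_shard

    def piops_total(self, volume_tb: int = 1, iops_per_volume: int = 4000) -> int:
        """Aggregate provisioned IOPS, assuming fixed-size volumes per shard."""
        volumes_per_shard = -(-self.tb_per_shard // volume_tb)  # ceiling division
        return self.shards * volumes_per_shard * iops_per_volume


# Shard counts and per-shard sizes are taken from the slide.
V_CHEAP = ClusterPlan("v.cheap", shards=1024, tb_per_shard=1)
V_FAST = ClusterPlan("v.fast", shards=50, tb_per_shard=20)

if __name__ == "__main__":
    for plan in (V_CHEAP, V_FAST):
        print(plan.name, plan.total_tb, "TB")
    print("v.fast provisioned IOPS (assumed volume layout):", V_FAST.piops_total())
```

With the assumed volume layout, the v.fast plan works out to four million provisioned IOPS, which lines up with the figure in the session description.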

Page 13: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

The naming of parts (Amazon term: nickname)
• Provisioned IOPS: PIOPS
• Elastic Compute Cloud: EC2
• EC2 Spot Instances: Here, Spot!
• Auto Scaling groups: ASG

Page 14: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Players

Page 15: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

MongoDB
• Document-model, NoSQL database
• Dev adoption is STRONG
• MongoDB Inc. trending toward zero h/w
• Scale-up with commodity h/w
• Scale-out with sharding
• Scale-around with replication

Page 16: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

AWS
• PIOPS for an IO-hungry client
• 40% of MongoDB customer usage
• 90% of MongoDB internal usage
• More ports :2701[79] than :[15]521

Page 17: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

PB & Chocolate
Differentiators for mutual customers

• Fast time-to-solution
• Easy global distribution
• Secondary index
• Geo, text, security
• Fast analytic aggregation

Page 18: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Challenge

Page 19: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Motivation: IWBCI…

• Test scale-out of MongoDB beyond typical
• Learn massive scale-out on AWS
• Do it as cheaply as possible
• Apply customer data
• Break the petabarrier

Page 20: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

m1.small us-east-1 Spot Market

Page 21: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

m1.small us-east-1d Spot Market

Page 22: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Proposal

Item            Units        Time    Unit Cost     Net Cost
m1.small Spot   1050         3hr     $0.007/hr     $22.05
m1.large        3            48hrs   $0.056/hr     $8.07
S3              1TB          1wk     $95/TB/mo     $23.75
EBS             1024 x 1TB   1hr     $100/TB/mo    $142.22
S3 EBS          1PB          ??      $0/TB         $0.00
Total                                              $196.09
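The proposal figures can be re-derived with a short script. This is a sketch, not the presenters' calculation: it assumes a 720-hour month for the EBS line and a week billed as a quarter of a month for the S3 line, since those conventions reproduce the slide's numbers. The m1.large line computes to $8.064, which the slide shows as $8.07; the total still rounds to $196.09.

```python
HOURS_PER_MONTH = 720  # assumption: 30-day month, reproduces the EBS line
WEEKS_PER_MONTH = 4    # assumption: one week billed as a quarter of a month


def proposal_costs() -> dict:
    """Net cost per line item, in dollars, from the units and rates on the slide."""
    return {
        "m1.small Spot": 1050 * 3 * 0.007,          # instances * hours * $/hr
        "m1.large": 3 * 48 * 0.056,                 # instances * hours * $/hr
        "S3": 1 * 95 / WEEKS_PER_MONTH,             # TB * $/TB/mo for one week
        "EBS": 1024 * 1 * 100 / HOURS_PER_MONTH,    # volumes * TB * $/TB/mo for one hour
        "S3 EBS": 0.0,                              # listed at $0/TB on the slide
    }


def total_cost(costs: dict) -> float:
    """Sum of all line items."""
    return sum(costs.values())


if __name__ == "__main__":
    costs = proposal_costs()
    for item, cost in costs.items():
        print(f"{item:<14} ${cost:8.2f}")
    print(f"{'Total':<14} ${total_cost(costs):8.2f}")
```

The one-hour EBS line dominates the budget, so the time the petabyte of volumes stays attached is the main cost lever in this proposal.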

http://calculator.s3.amazonaws.com/G77798SS77SH72

Page 23: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Initial Directions

• Spot Instance requests
  – m1.small market, mostly us-east-1 (my zone “d”)
  – Net: $0.007 / hour = $7 / hr / K-shard

• Perl
  – use Net::Amazon::EC2;
  – gaps: parse EC2 command-line API

• Defer Chef, Puppet, AWS CloudFormation
• YCSB
• userdata.sh
• t1.micro / m1.small / cr1.8xlarge

Page 24: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

MongoDB Architecture

• 3x Config Servers
  – mongod --configsvr
• Routing
  – mongos --configdb a,b,c
• Replica sets (not used)
• Shards
  – mongod
• Client load
  – java -cp [] com.yahoo.ycsb.Client
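The process layout above can be sketched as a small helper that builds the command line for each role. This is an illustration only: the function names are hypothetical, the flags are the ones shown on the slide, and the comma-separated `--configdb` form reflects the 2013-era three-config-server setup (later MongoDB releases expect the config servers to be a named replica set).

```python
from typing import List


def config_server_cmd() -> List[str]:
    """Command for one of the three config servers."""
    return ["mongod", "--configsvr"]


def mongos_cmd(config_hosts: List[str]) -> List[str]:
    """Command for a router pointed at the three config servers."""
    if len(config_hosts) != 3:
        raise ValueError("this layout expects exactly three config servers")
    return ["mongos", "--configdb", ",".join(config_hosts)]


def shard_cmd() -> List[str]:
    """Command for a standalone shard (no replica set, as in the talk)."""
    return ["mongod"]


if __name__ == "__main__":
    # "a", "b", "c" stand in for config server host names, as on the slide.
    for argv in (config_server_cmd(), mongos_cmd(["a", "b", "c"]), shard_cmd()):
        print(" ".join(argv))
```

Returning argument lists rather than strings keeps the commands ready to hand to `subprocess.run` without shell quoting.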

Page 25: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013
Page 26: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Range-based sharding

Page 27: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Hash-based sharding
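The difference between the two sharding styles shows up clearly with monotonically increasing keys. The sketch below is a simplified model, not MongoDB's implementation: the function names and split points are hypothetical, and SHA-256 stands in for MongoDB's own hashing of the shard key.

```python
import bisect
import hashlib
from typing import List


def range_shard(key: int, split_points: List[int]) -> int:
    """Shard index under range partitioning; split_points must be sorted."""
    return bisect.bisect_right(split_points, key)


def hash_shard(key: int, shard_count: int) -> int:
    """Shard index under hash partitioning (illustrative hash, not MongoDB's)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    split_points = [250, 500, 750]   # four key ranges, one per shard
    new_keys = range(1000, 1100)     # ever-increasing keys, e.g. counters
    print("range:", sorted({range_shard(k, split_points) for k in new_keys}))
    print("hash: ", sorted({hash_shard(k, 4) for k in new_keys}))
```

In this model, range partitioning sends every new key to the last shard, while hashing spreads the same keys across shards at the cost of losing key-order locality.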

Page 28: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Process Flow

Spot Instance Request (sir-)
• Rejected
• Awaiting evaluation
• Awaiting fulfillment
  – Partial
  – Launch intervals
• Fulfilled

Instances (i-)
• Requested
• Initializing (i)
• Config running (C)
• MongoS starting (s)
• MongoS running (S)
• MongoD starting (D)
• Failed/slow response (X)
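The one-letter codes next to the instance states suggest a compact status display. The sketch below shows one way such a display could work; it is an assumption about usage rather than the presenters' tooling, the names are hypothetical, and the "." code for Requested is a placeholder because the slide gives none.

```python
from enum import Enum
from typing import Iterable


class InstanceState(Enum):
    REQUESTED = "."        # placeholder: the slide lists no code for this state
    INITIALIZING = "i"
    CONFIG_RUNNING = "C"
    MONGOS_STARTING = "s"
    MONGOS_RUNNING = "S"
    MONGOD_STARTING = "D"
    FAILED = "X"           # failed or slow response


def status_line(states: Iterable[InstanceState]) -> str:
    """One character per instance, in the order given."""
    return "".join(state.value for state in states)


def failed_count(states: Iterable[InstanceState]) -> int:
    """Number of instances that failed or responded too slowly."""
    return sum(1 for state in states if state is InstanceState.FAILED)


if __name__ == "__main__":
    fleet = [
        InstanceState.INITIALIZING,
        InstanceState.CONFIG_RUNNING,
        InstanceState.MONGOS_RUNNING,
        InstanceState.FAILED,
    ]
    print(status_line(fleet), "failed:", failed_count(fleet))
```

A single line of characters scales to a thousand instances far better than one log line per instance.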

Page 29: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Spot Instance Lifecycle

sir-

Config Sharded

MongoD Shard

MongoS

Page 30: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Progress

Page 31: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Scaling Experience
• 4, 16, 64, 256, 1024
• 4: minimum magnitude for 3x Config
• 16: startup variation, process flow
• 64: full speed ahead!
• 256: chunk distribution time
• 1024: market dependence, client wire saturation

Page 32: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Lessons Learned
• Code defensively
• Monitor: MongoDB Management Service, top, iftop, mongostat
• Avoid sentimental attachment
• Prototype / refactor
• Make the instances do the work
• Mitigate chunk migration

Page 33: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Refactor
• BenchPress YCSB
• Auto Scaling groups request-spot-instances
• use VM::EC2; Net::Amazon::EC2
• gsh monolithic Perl
• serf polling

Page 34: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Dee-Luxe

Page 35: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

[Chart: Docs Loaded, 512 shards. Y axis: documents loaded, 0 to 1.8E+09. X axis: time, 5:16:48 to 7:40:48. Annotation: ^ 1X RAM]

Page 36: PetaMongo: A Petabyte Database for as Little as $200 (BDT307) | AWS re:Invent 2013

Further Work
• Replication
• Self-healing
• MongoDB-appropriate benchmarks
• Customer data
• Self-hosting cluster
