S3 & Glacier - The only backup solution you'll ever need

31
S3 & Glacier Amazon S3 & Glacier - the only backup solution you’ll ever need Matthew Boeckman, VP - DevOps, Craftsy @matthewboeckman

description

Amazon provides an opportunity to scale backups infinitely, and to manage your backup and restore entirely in simple code or scripts. At Craftsy, we have a backup problem solved entirely with S3 & Glacier.

Transcript of S3 & Glacier - The only backup solution you'll ever need

Page 1: S3 & Glacier - The only backup solution you'll ever need

S3 & Glacier

Amazon S3 & Glacier - the only backup solution you’ll ever need

Matthew Boeckman, VP - DevOps, Craftsy@matthewboeckman

http://enginerds.craftsy.com http://devops.com/author/matthewboeckman

Page 2: S3 & Glacier - The only backup solution you'll ever need

Who is Craftsy

● Instructor led training videos for passionate hobbyists● #19 on Forbes’ Most Promising Companies 2014

Page 3: S3 & Glacier - The only backup solution you'll ever need

Why do backups matter?

… they don’t, until they do.

At small scale, backup can and is as easy as icloud, or google drive, or github.

As you grow, the situation gets more complex.

In a content creation business, backups are vital.

Page 4: S3 & Glacier - The only backup solution you'll ever need

RESTORE

Page 5: S3 & Glacier - The only backup solution you'll ever need

Define your exposure

daily backups - protect against accidental deletion (users) and system corruption/crash

DR - your building catches on fire. an employee goes rogue. the SAN suffers a total existence failure.

permanent archive - data at rest, legal/business permanence

Page 6: S3 & Glacier - The only backup solution you'll ever need

what are my options

LTO - hope you like lifecycle hardware refreshes! hope your data rests on a 1.5 or 3tb boundry. hope you enjoy swapping tapes… 4-17year durability

~$.008/GB** does not include operational costs **

DISK - good luck offsiting, hope you enjoy hardware maintenance, hope you’ve got a big checkbook

~$5/GB** does not include operational costs **

Page 7: S3 & Glacier - The only backup solution you'll ever need

hobo as a service

● not scalable● expensive● not addressable in code● requires software

systems

welcome to system adminsitrationville, population: you

Page 8: S3 & Glacier - The only backup solution you'll ever need

why did we look for alternatives?

Page 9: S3 & Glacier - The only backup solution you'll ever need

What are the alternatives?

s3

Page 10: S3 & Glacier - The only backup solution you'll ever need

why s3?

● $0.03/GB● 99.999999999% durable● easy interface with glacier● addressable with code

“If you store 10,000 objects with us, on average we may lose one of them every 10 million years or so. This storage is designed in such a way that we can sustain the concurrent loss of data in two separate storage facilities.”

Page 11: S3 & Glacier - The only backup solution you'll ever need

except...

Page 12: S3 & Glacier - The only backup solution you'll ever need

this is a busy space

Page 13: S3 & Glacier - The only backup solution you'll ever need

free tools, one thread, small files

http://s3tools.org/s3cmd ● great sync support● supports multipart● Does not support multithread● tons of other features (du, cp, rm, setpolicy, cloudfront…)

https://github.com/boto ● the aws python library● includes basic cli s3put, great reference point

great, but not fast

Page 14: S3 & Glacier - The only backup solution you'll ever need

free tools, multipass, big files

https://github.com/mumrah/s3-multipart ● zomg multithreaded, multipart up and down● saturated 100M pipe for a single file transfer● does not handle small files

https://github.com/aws/aws-cli ● zomg multithreaded, multipart up and down● saturated 500M pipe for a single file transfer● perfect in every imaginable way

Page 15: S3 & Glacier - The only backup solution you'll ever need

problems

approach single archive (tar)

lots o files, some big some small

le good ● fast transfer● simple● no filename

pain

● restore & navigation granularity

le bad ● restore all● can’t nav your

backup

● processing overhead

● users name things...

Page 16: S3 & Glacier - The only backup solution you'll ever need

our first backup script ... testy.sh

*shitty speed, restore costs, let’s just get to the next slide

Page 17: S3 & Glacier - The only backup solution you'll ever need

to mp or not to mp...

size=$(stat -f %z "$x")

if [ "$size" -gt "100000000" ]; then

s3-mp-upload.py ${S3MPT_OW} -np 10 -s 100 "$x" "s3://$

{BUCKET}/$CID/${x}" >> $LOG 2>&1

else

s3put ${S3PUT_OW} -s $AWS_ACCESS_SECRET -a

$AWS_ACCESS_KEY -b $BUCKET -p "$CPATH" "$x" >> $LOG

2>&1

fi

Page 18: S3 & Glacier - The only backup solution you'll ever need

aws s3 cli

aws s3 sync "$CID" "s3://${BUCKET}/${CID}" --delete

aws s3 cp "$SRC" "s3://${BUCKET}/${CID}" --recursive

simple, multipart, multithreaded and saturates 500mbps both ways

Page 19: S3 & Glacier - The only backup solution you'll ever need

hooray! now what

Page 20: S3 & Glacier - The only backup solution you'll ever need

why glacier?

● $0.01/GB● durable like s3● data sent to glacier via s3 policies cannot be

accidentally deleted

Page 21: S3 & Glacier - The only backup solution you'll ever need

s3->glacier

Page 22: S3 & Glacier - The only backup solution you'll ever need

prefix, delete

Page 23: S3 & Glacier - The only backup solution you'll ever need

restore from glacier

Page 24: S3 & Glacier - The only backup solution you'll ever need

restore, our UI

Page 25: S3 & Glacier - The only backup solution you'll ever need

restore from glacier as code

foreach ($objects as $flerbs) { $key = $flerbs->Key; $restore = $s3->restore_archived_object ($bucket, $key, $days ); //var_dump($restore); $http_code = $restore->status; $xml_resp = $restore->body; $message = $xml_resp->Message; $header = $restore->header; $req_id = $header['x-amz-request-id']; $amz_id2 = $header['x-amz-id-2']; $req_date = $header['date'];}

Page 26: S3 & Glacier - The only backup solution you'll ever need

oooh, a progress gui

Page 27: S3 & Glacier - The only backup solution you'll ever need

restore status as code

$header = $s3->get_object_headers($bucket, $key)->header; $restore = explode(',', $header['x-amz-restore']);$progress = strpos($restore[0], 'true');…if ($sclass == 'GLACIER') { if ($header['x-amz-restore'] == '') { $style = '"background-color: #0ce3ff;"'; // not restored! $note = ''; } elseif ($progress != false) { $style = '"background-color: #ffa391;"'; // restore in progress! $note = 'Restoration in progress'; } else { $style = '"background-color: #95fed0;"'; // restored! $note = "Restored until $expirydate"; }

Page 28: S3 & Glacier - The only backup solution you'll ever need

versioning, another approach

● versioning operates on the bucket level● each version of an object counts as storage ● versioning happens automatically with any PUT● deletion of an object does not explicitly delete versions (!!)

GET /my-image.jpg?versionId=L4kqtJlcpXroDTDmpUMLUo

GET /my-image.jpg?versions HTTP/1.1

DELETE /my-image.jpg?versionId=UIORUnfnd89493jJFJ

Page 29: S3 & Glacier - The only backup solution you'll ever need

restoring a version

Restore to the most recent version?

1. delete the current version!

Restore to an older version?

2. GET the desired version

3. PUT it back on top of the object you’re restoring

Page 30: S3 & Glacier - The only backup solution you'll ever need

where does Craftsy live?

1. Bi-weeklies of ACTIVE content to a non-versioned, non-glaciered bucket (~200TB total, +2TB/wk)

2. Weekly new finished content to glacier-backed s3 bucket (~500TB, +45TB/mo)

3. All code artifacts (dev, staging, prod) to a versioned s3 bucket

4. Postgres backup segments to an non-glaciered bucket

most important, we do all of it in code. we don’t shuffle tapes, upgrade hardware, or system administrate!

Page 31: S3 & Glacier - The only backup solution you'll ever need

Thank you

QUESTIONS!

Matthew Boeckman

@matthewboeckman

http://enginerds.craftsy.com

(deck will be there & slideshare)http://devops.com/author/matthewboeckman