Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Media Content Ingest, Storage, and Archiving with AWS MED301 John Downey, Amazon Web Services, Business Development Manager - Storage November 13, 2013

description

The first step in a successful cloud-based media workflow is getting the content transferred and stored. From there you can achieve massive efficiencies for downstream processing and delivery via content access instead of content transfer. In this session you'll learn about best practices for ingesting content to the cloud; relevant AWS partners within the media ecosystem; how to use storage tiers based on the business value of your assets; and how to eliminate tape, tape museums, and tech refresh within your long term archive strategy; and ultimately how to remonetize archived assets.

Transcript of Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Page 1: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013


Media Content Ingest, Storage, and Archiving with AWS – MED301

John Downey, Amazon Web Services, Business Development Manager - Storage

November 13, 2013

Page 2: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Agenda – Content Ingest, Storage, and Archiving

• AWS components
  – Ingest
  – Storage
  – Archive
• Partner components
  – Ingest
  – Storage
  – Archive

• TCO / ROI considerations

Page 3: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Global Infrastructure

9 Regions

25 Availability Zones

46 Edge locations

Page 4: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Regions and Availability Zones

Customer decides where applications and data reside

• US East (N. Virginia)
• US West (Oregon)
• US West (N. California)
• AWS GovCloud (US)
• South America (Sao Paulo)
• EU (Ireland)
• Asia Pacific (Tokyo)
• Asia Pacific (Singapore)
• Asia Pacific (Sydney)

[Figure: each region shown with its Availability Zones]

Page 5: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Ingest Options

• AWS Direct Connect: dedicated bandwidth between your site and AWS
• AWS Storage Gateway: on-premises storage federation with Amazon S3 and Amazon Glacier
• AWS Import/Export: physical transfer of media into and out of AWS

Page 6: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Ingest Options – One Common Theme: Parallel Uploads

1. Multipart upload

2. Request rate optimization

3. TCP window scaling

4. TCP selective acknowledgement

AWS has customers that ingest roughly 1 PB per day
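The first item on this slide, multipart upload, starts with splitting an object into parts that can be sent in parallel. The following is a minimal sketch of that planning step only, not an upload client. It assumes the published Amazon S3 limits of a 5 MiB minimum part size (for every part except the last) and at most 10,000 parts per upload; the function name `plan_parts` and the 64 MiB default are hypothetical choices for illustration.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum size for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts in one multipart upload


def plan_parts(object_size, part_size=64 * MIB):
    """Return a list of (part_number, offset, length) tuples covering the object.

    part_size is a hypothetical default; it is raised automatically when the
    object would otherwise need more than MAX_PARTS parts.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")

    part_size = max(part_size, MIN_PART_SIZE)
    # Ceiling division: smallest part size that fits the object in MAX_PARTS parts.
    smallest_allowed = -(-object_size // MAX_PARTS)
    part_size = max(part_size, smallest_allowed)

    parts = []
    offset = 0
    number = 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

Each tuple describes an independent byte range, so the parts can be handed to separate threads or processes and uploaded concurrently, which is the common theme of the slide.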

Page 7: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Ingest Options AWS Direct Connect

• Reduces costs for bandwidth-heavy workloads
• Private connectivity to AWS
  – Physical connection: 1 Gbps or 10 Gbps port
  – Logical connections (802.1Q VLANs)
    Public: To AWS cloud (Amazon EC2, Amazon S3, etc.)
    Private: To VPCs

• Consistent network performance

• Compatible with all AWS services

Page 8: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Ingest Options AWS Direct Connect

Cost

• 1 Gbps port = $0.30/hour

• 10 Gbps port = $2.25/hour

• Data transfer IN = $0

• Data transfer OUT = $0.02 – 0.11 per GB depending upon location

– Can be a significant savings vs. Internet bandwidth out

Locations

• CoreSite 32 Avenue of the Americas, NY

• CoreSite One Wilshire & 900 North Alameda, LA

• Equinix DC1 – DC6 & DC10 - DC11, Ashburn, VA

• Equinix SV1 & SV5, San Jose, CA

• Equinix SE2 & SE3, Seattle, WA

• Equinix SG2, Singapore

• Equinix SY3, Sydney

• Equinix TY2, Tokyo

• Eircom, Clonshaugh

• TelecityGroup Docklands, London

• Terremark NAP do Brasil, Sao Paulo
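The port and data-transfer prices on this slide can be combined into a rough monthly estimate. This is a sketch under stated assumptions: a month is taken as 730 hours, the outbound rate is passed in by the caller because the slide gives only a range ($0.02 to $0.11 per GB), and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month

# Port prices per hour, as listed on the slide.
PORT_RATE_PER_HOUR = {"1G": 0.30, "10G": 2.25}


def monthly_port_cost(port, hours=HOURS_PER_MONTH):
    """Cost of keeping one Direct Connect port provisioned for a month."""
    return PORT_RATE_PER_HOUR[port] * hours


def monthly_direct_connect_cost(port, gb_out, out_rate_per_gb):
    """Port cost plus outbound data transfer; inbound transfer is $0 per the slide."""
    return monthly_port_cost(port) + gb_out * out_rate_per_gb
```

With these figures a 1 Gbps port comes to about $219 per month before any outbound transfer, and ingest itself adds nothing because data transfer in is free.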

Page 9: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Ingest Options AWS Import/Export

• Rapidly move data into and out of AWS
• Portable storage device shipment to AWS
  – eSATA
  – USB 2.0 and 3.0 (including USB flash drives)
  – 2.5 and 3.5 inch internal SATA hard drives
• Supports
  – Amazon Elastic Block Store (EBS)
  – Amazon Simple Storage Service (S3)
  – Amazon Glacier
• Use cases
  – Initial content migration
  – Content distribution via portable devices
  – Disaster recovery

Page 10: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Ingest Options AWS Import/Export

• Cost
  – $80 per storage device handled
  – $2.49 per data loading hour
  – Standard pricing for
    • Amazon S3
    • Amazon EBS
    • Amazon Glacier
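The two Import/Export fees above can be turned into a per-shipment estimate. The sketch below is illustrative only: the device read rate is an assumption the caller must supply, partial loading hours are not rounded up, and standard storage charges for the destination service are left out.

```python
DEVICE_FEE = 80.00           # per storage device handled (from the slide)
LOADING_FEE_PER_HOUR = 2.49  # per data loading hour (from the slide)


def loading_hours(data_gb, device_mb_per_s):
    """Hours needed to read data_gb from a device at an assumed sustained rate."""
    seconds = data_gb * 1000 / device_mb_per_s  # 1 GB taken as 1000 MB
    return seconds / 3600


def import_export_cost(devices, data_gb_per_device, device_mb_per_s):
    """Handling plus loading fees for a shipment of identical devices."""
    hours = loading_hours(data_gb_per_device, device_mb_per_s)
    return devices * (DEVICE_FEE + LOADING_FEE_PER_HOUR * hours)
```

For example, a single device holding 3,600 GB that reads at an assumed 100 MB/s takes 10 loading hours, for $80 + $24.90 = $104.90 in service fees.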

Page 11: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Ingest Options AWS Storage Gateway

• On-premises, virtual iSCSI storage appliance
• Local cache enables low-latency access to data
  – Gateway-stored volumes
  – Gateway-cached volumes
• Copies data in the form of Amazon EBS snapshots to Amazon S3
• Leverage Amazon S3 server-side encryption
• Recent patch results in up to 5 TB of throughput per day
• Recover to Amazon EBS / Amazon EC2

Page 12: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Ingest Options AWS Storage Gateway

• Cost (N. Virginia – varies per region)
  – Gateway pricing
    • $125 per activated gateway per month
  – Volume pricing
    • $0.095 per GB per month of data stored
  – Snapshot pricing
    • $0.095 per GB per month of data stored
  – Tiered data transfer pricing model
    • Free inbound
    • $0.12 - $0.05 per GB outbound, depending on tier

Page 13: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Ingest Options Gateway-Virtual Tape Library (Gateway VTL)

• On-premises, virtual tape library storage appliance
• 10 virtual tape drives / 1500 virtual tape slots
• 150 TB local cache
  – VTL – virtual tape library
    • Restore in seconds from VTL
  – VTS – virtual tape shelf
    • 24 hour retrieval from VTS
• Encryption in transit and at rest
• Gateway VTL-AMI
• In lab we achieved 55 MB/s upload throughput and 90 MB/s iSCSI ingest rate per gateway

Page 14: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Ingest Options Gateway-Virtual Tape Library (Gateway VTL)

• Cost (N. Virginia – varies per region)
  – Gateway pricing
    • $125 per activated gateway per month
  – Virtual tape shelf storage
    • $0.01 per GB per month of data stored
  – Virtual tape library storage
    • $0.095 per GB per month of data stored
  – Retrieval from virtual tape shelf
    • $0.30 per GB
  – Virtual tape deletes
    • Free
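The Gateway-VTL prices above add up as follows. This is a rough sketch using only the N. Virginia rates from the slide; the function name and the example volumes are hypothetical, and data transfer charges are not included.

```python
GATEWAY_FEE = 125.00          # per activated gateway per month
LIBRARY_RATE = 0.095          # per GB-month in the virtual tape library
SHELF_RATE = 0.01             # per GB-month on the virtual tape shelf
SHELF_RETRIEVAL_RATE = 0.30   # per GB retrieved from the shelf


def vtl_monthly_cost(library_gb, shelf_gb, retrieved_gb=0, gateways=1):
    """Monthly Gateway-VTL charge for the given storage and retrieval volumes."""
    return (
        gateways * GATEWAY_FEE
        + library_gb * LIBRARY_RATE
        + shelf_gb * SHELF_RATE
        + retrieved_gb * SHELF_RETRIEVAL_RATE
    )
```

The gap between library and shelf rates is the point of the design: tapes that are rarely restored cost roughly a tenth as much on the shelf, at the price of a per-GB retrieval fee and a 24 hour wait.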

Page 15: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Storage and Archive Options

• Amazon Simple Storage Service (S3): highly scalable object storage
  – 1 byte to 5 TB in size
  – 99.999999999% durability
• Amazon Elastic Block Store (EBS): high-performance block storage device
  – 1 GB to 1 TB in size
  – Mount as drives to instances with snapshot/cloning functionalities
• Amazon Glacier: long-term object archive
  – Extremely low cost per gigabyte
  – 99.999999999% durability

Page 16: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Storage Options

Amazon Elastic Block Store (EBS)

• High I/O block storage for Amazon EC2

• Predictably scale to 1000s of IOPS per Amazon EC2 instance
• Automatic replication within the Availability Zone
• 10x more reliable than commodity disk drives
• Point-in-time snapshots
  – Amazon S3 durability (11-9s)
• Point-in-time snapshots across regions
• Amazon CloudWatch
  – Exposes Amazon EBS performance metrics

Page 17: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Storage Options Amazon Elastic Block Store (EBS)

Costs (US East)
• Amazon EBS standard volumes
  – $0.10 per GB-month of provisioned storage
  – $0.10 per 1 million I/O requests
• Amazon EBS provisioned IOPS volumes
  – $0.125 per GB-month of provisioned storage
  – $0.10 per provisioned IOPS-month
• Amazon EBS snapshots to Amazon S3
  – $0.095 per GB-month of data stored

Page 18: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Storage Options Amazon Simple Storage Service (S3)

• Synchronous in and synchronous out object storage
• Designed for 99.999999999% durability
• Authentication mechanisms ensure data is kept secure
• Multiple encryption options
  – Amazon server-side encryption

• Standard storage

• Reduced redundancy storage (RRS)

Page 19: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Storage Options Amazon S3: Over 2 Trillion Total Objects

Page 20: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Storage Options Amazon Simple Storage Service (S3)

Costs (US East)

Tier                    Standard Storage    Reduced Redundancy Storage
First 1 TB / month      $0.095 per GB       $0.076 per GB
Next 49 TB / month      $0.080 per GB       $0.064 per GB
Next 450 TB / month     $0.070 per GB       $0.056 per GB
Next 500 TB / month     $0.065 per GB       $0.052 per GB
Next 4000 TB / month    $0.060 per GB       $0.048 per GB
Over 5000 TB / month    $0.055 per GB       $0.037 per GB
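Tiered pricing means each slice of storage is billed at its own rate, not the whole volume at the lowest rate reached. The following sketch walks the tiers from the table above. It assumes 1 TB = 1024 GB for the tier boundaries, and the names are hypothetical.

```python
GB_PER_TB = 1024  # assumption: tier boundaries counted in binary terabytes

# Tier sizes in TB, in order; None marks the open-ended final tier.
TIER_SIZES_TB = [1, 49, 450, 500, 4000, None]
STANDARD_RATES = [0.095, 0.080, 0.070, 0.065, 0.060, 0.055]
RRS_RATES = [0.076, 0.064, 0.056, 0.052, 0.048, 0.037]


def monthly_storage_cost(total_gb, rates=STANDARD_RATES):
    """Bill each tier's slice of total_gb at that tier's per-GB rate."""
    remaining = total_gb
    cost = 0.0
    for size_tb, rate in zip(TIER_SIZES_TB, rates):
        if remaining <= 0:
            break
        if size_tb is None:
            tier_gb = remaining
        else:
            tier_gb = min(remaining, size_tb * GB_PER_TB)
        cost += tier_gb * rate
        remaining -= tier_gb
    return cost
```

Under these assumptions, 50 TB of Standard storage costs 1024 × $0.095 + 50,176 × $0.080 = $4,111.36 per month. Passing `rates=RRS_RATES` gives the Reduced Redundancy figure for the same volume.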

Page 21: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Archive Options

Amazon Glacier

• $0.01 per GB per month

• Retrievals:

– 5% of monthly average storage (pro-rated daily) free

• Synchronous in

• 3–5 hour asynchronous retrieval

• Designed for 99.999999999% durability

• AES 256 encryption at rest

• Highly scalable

• Reliable

• Authentication mechanisms ensure data is kept secure
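The free retrieval allowance on this slide (5% of monthly average storage, pro-rated daily) is easier to reason about as a daily figure. A minimal sketch, assuming a 30-day month by default; the function name is hypothetical and retrieval fees beyond the allowance are not modeled.

```python
FREE_RETRIEVAL_FRACTION = 0.05  # 5% of monthly average storage (from the slide)


def free_retrieval_gb_per_day(average_stored_gb, days_in_month=30):
    """Daily share of the monthly free retrieval allowance, in GB."""
    return FREE_RETRIEVAL_FRACTION * average_stored_gb / days_in_month
```

With 300 TB stored (300,000 GB), the monthly allowance is 15,000 GB, or about 500 GB per day in a 30-day month.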

Page 22: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

AWS Archive Options Object Lifecycle Management: Amazon S3 → Amazon Glacier

• Seamlessly move data from Amazon S3 → Amazon Glacier

• 3-5 hour asynchronous retrieval

• Data lifecycle policies

• Amazon Glacier storage costs $0.01 per GB per month
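A data lifecycle policy is expressed as a set of rules attached to a bucket. The sketch below builds one such rule in the dictionary shape accepted by the present-day boto3 call `put_bucket_lifecycle_configuration`. The prefix `masters/`, the 30-day threshold, the rule ID, and the bucket name in the comment are all hypothetical values chosen for illustration, not taken from the talk.

```python
def glacier_transition_rule(prefix, days, rule_id="archive-to-glacier"):
    """Build one lifecycle rule that moves objects under prefix to Glacier."""
    return {
        "ID": rule_id,                      # hypothetical rule name
        "Filter": {"Prefix": prefix},       # which objects the rule applies to
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


# Hypothetical policy: archive everything under masters/ after 30 days.
lifecycle_configuration = {"Rules": [glacier_transition_rule("masters/", 30)]}

# Applying it requires AWS credentials and an existing bucket, so the call
# is shown as a comment only:
#
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-media-archive",
#       LifecycleConfiguration=lifecycle_configuration,
#   )
```

Once a rule like this is in place, objects keep their S3 keys after the transition, but reading them back requires a restore request and the 3-5 hour retrieval wait noted above.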

Page 23: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Partner Ingest Options

• Aspera: up to 1 Gb/s per instance to AWS
• Signiant: high-speed, network-efficient file transfer – up to 200X faster than FTP with 95+% network efficiency
• CloudBeam: SaaS-based file transfer into and out of AWS

Page 24: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Partner Ingest Options Aspera On-Demand

• Achieve file transfer speeds that are 1000s of times faster than FTP

• In, out, and across the cloud with enterprise-grade security

• End-to-end security

• Speeds of up to 1 Gbps per AWS instance

• 10 TB per 24 hours

• Scale-out architecture

• Web, mobile, embedded clients
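The "1 Gbps per instance" and "10 TB per 24 hours" figures on this slide are two views of the same throughput. The sketch below does the conversion, assuming decimal units (1 TB = 10^12 bytes) and an optional efficiency factor that is a caller-supplied assumption; the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def tb_per_day(gbps, efficiency=1.0):
    """Terabytes moved in 24 hours at a sustained line rate in Gbps."""
    bytes_per_second = gbps * 1e9 / 8 * efficiency
    return bytes_per_second * SECONDS_PER_DAY / 1e12


def days_to_transfer(total_tb, gbps, efficiency=1.0):
    """Days needed to move total_tb at the given sustained rate."""
    return total_tb / tb_per_day(gbps, efficiency)
```

At a full 1 Gbps the ceiling is 10.8 TB per day, so the quoted 10 TB per 24 hours corresponds to roughly 93% of line rate. Scaling out across instances multiplies the daily figure accordingly.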

Page 25: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Partner Ingest Options Attunity CloudBeam

• Simplifies, automates, and accelerates the loading and replication of files from on-premises, heterogeneous sources to and from Amazon S3
• Common use cases:
  – Content availability and distribution
  – Data analysis (Amazon EMR Hadoop)
  – Backup, disaster recovery, and archiving
  – Region-to-region replication

Page 26: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Partner Ingest Options Signiant Media Shuttle

• AWS-based fast file transfer as a service
• 200X faster than FTP
• Separates control layer from the data layer
• Multiple sources and targets including Amazon S3
• Firewall-friendly transfers with autoselecting UDP, TCP, and HTTP transport options

[Diagram labels: NAS (CIFS, NFS); DAS / SAN]

Page 27: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Partner Ingest Options Cycle Computing DataManager

• Move data from any NAS / file system to Amazon S3 and/or Amazon Glacier
• Clean up expensive, on-premises disk
• Maintain full access to all content
• Reduce or eliminate future data migrations upon hardware refresh

Page 28: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Partner Storage and Archive Solutions

• Avere Systems: record performance, scale-out, single file system NAS
• Panzura: cloud-integrated local NAS capabilities for the globally distributed enterprise

Page 29: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Partner Storage and Archive Solutions Cloud Storage Gateway Solutions

Page 30: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Partner Storage Options Example: Cloud Storage Gateway – Global Namespace NAS

Page 31: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Partner Storage Options Avere Systems – Comparing 1,000,000 IOPS Solutions

• Add high-performance, scale-out clustering with any NAS

• Automated tiering

• Separates performance scaling from capacity

• Avere offers the leading $ per IOPS for NAS – $2.3/IOPS

• 80% less total equipment than traditional NAS systems

• Fastest scale-out, single file system (NAS) available

• Linear scaling to millions of operations/sec

• Tens of GB/sec of throughput

[Figure labels: Avere, $2.3 / IOPS; Cloud Latency, 150 ms]

Page 32: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Partner Storage Options Avere Systems

• Amazon S3 integration by EOY 2013
  – 3-step process:
    1. Leverage Avere to accelerate current NAS system
    2. Nondisruptive migration to Amazon S3 / Amazon Glacier – FlashMove
    3. Switch mode in Avere to enable primary NAS operations
  – Retire older NAS gear

Avere FXT Edge Filer
• Purpose-built for cloud
• Enterprise-class scaling
• Lowest TCO

[Diagram labels: Compute Farm; Client Workstations; Legacy NAS (shown as complex, with RAID, volume limits, low utilization, mirror schedules, etc.); Core Filers; WAN; On Premise; AWS; Amazon S3; Amazon Glacier]

Page 33: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Partner Storage Options Panzura

• Panzura enables:
  – Global sharing: on-premises, hybrid, across AWS regions
  – Panzura Amazon Machine Image (AMI)
  – Small physical footprint
  – Separation of data and metadata
  – Data protection
  – NAS centralization
  – Shift ratio of Opex vs. Capex

Page 34: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

TCO: On-Premises Cost Considerations

1. Primary storage hardware (primary / remote site)
2. DR / Remote site storage hardware
3. Raw to utilized storage (both primary and DR)
4. Storage growth (cost of upgrades)
5. Storage management software and 3rd party tools
6. Professional services
7. Hardware maintenance
8. Software maintenance
9. Backup software
10. Backup hardware (primary / remote site)
11. Offsite tape storage / vault
12. Archive software
13. Archive hardware
14. Power
15. Cooling
16. Space
17. Labor
18. Cost of capital
19. Training
20. Asset depreciation
21. Migration
22. Decommission / remove
23. Recycle

Page 35: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Summary

AWS ingest, storage, and archive solutions:

• AWS Import/Export + Amazon S3, Amazon EBS, Amazon Glacier

• AWS Storage Gateway + Amazon S3

• AWS VTL + Amazon S3 + Amazon Glacier

Partner-based ingest solutions:

• Aspera on-demand solution + Amazon S3

• Attunity + Amazon S3

• Signiant Media Shuttle + Amazon S3

• Cycle Computing’s DataManager + Amazon S3 + Amazon Glacier

Partner-based storage / archive solutions:

• Avere Systems + Amazon S3 and Amazon Glacier

• Panzura + Amazon S3 and Amazon Glacier

Page 36: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Thank you!

John Downey

[email protected]

646.276.1635

Page 37: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013


Glacier at iN DEMAND

Michael Raposa, iN DEMAND

[email protected]

Page 38: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

iN DEMAND Intro

• World leader in providing transactional entertainment delivered through television
• Joint venture – owned by Comcast, Time Warner, & Cox
• Pay-per-view programming – MLB, NBA, NHL, boxing, MMA, & Howard Stern

Page 39: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

The Problem

• 1.5 PB video archive
  – World War Z
  – Ice Pirates
  – Titanic II … Tsunami AND an Iceberg
• Tape storage
  – Tape corruption and bit rot
  – Lost tapes
  – Physical storage – 1.5 PB is a lot of tapes
  – Legacy tape formats – LTO-1, 2, 3, 4, 5, etc. etc.

Page 40: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

The Problem (cont.)

• Manual asset tracking
  – Typical backup system stores file name, date, size
  – Important metadata is tracked separately, e.g. bit rate, aspect ratio, closed captioning, dual language, codec
  – Inventory issues – What bit rates do we have for Spider Man?
  – Multiple storage – “Put it on tape just to be sure”
• Manual archive and restore
  – Wait for operator to handle restores – not 24x7

Page 41: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

The Problem (cont.)

• Expensive
  – Tape operator
  – Tape storage
  – Yearly tape library maintenance
• Limited scale
  – Limited by tape library capacity
  – Limited by physical space

Page 42: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

The Solution – Mini-DAM

• Limited digital asset management system for Glacier
  – Web UI
  – Glacier storage
  – $50 K
  – Hosted at AWS – EC2, Amazon RDS, Amazon SNS, Amazon SES
  – Over 300 TB in Glacier to date
  – Adding about 2 TB / day
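The figures on this slide and the earlier problem slide (1.5 PB archive, over 300 TB uploaded, about 2 TB per day) imply a migration timeline and a storage bill. The sketch below works both out, assuming decimal units (1 PB = 1000 TB, 1 TB = 1000 GB), a constant upload rate, and the $0.01 per GB per month Glacier price quoted earlier in the deck. The function names are hypothetical.

```python
GLACIER_RATE_PER_GB_MONTH = 0.01  # Glacier price quoted earlier in the deck
GB_PER_TB = 1000                  # assumption: decimal units


def days_to_finish(archive_tb, uploaded_tb, tb_per_day):
    """Days left to upload the rest of the archive at a constant daily rate."""
    return (archive_tb - uploaded_tb) / tb_per_day


def glacier_monthly_cost(stored_tb):
    """Monthly Glacier storage charge for stored_tb terabytes."""
    return stored_tb * GB_PER_TB * GLACIER_RATE_PER_GB_MONTH
```

Under these assumptions the remaining 1,200 TB takes about 600 days at 2 TB per day. Storage runs about $3,000 per month for the 300 TB already in Glacier and about $15,000 per month for the full 1.5 PB.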

Page 43: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013
Page 44: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013
Page 45: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013
Page 46: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Tips & Tricks

• Concurrent downloader required
  – Users want files FAST!!!
  – .NET and Java AWS SDKs have only a single-threaded downloader – max download ca. 160 Mbps
  – iN DEMAND wrote a multithreaded downloader
  – Added to AWS SDK for Python (boto) – max download 600 Mbps
• Per-archive Glacier overhead
  – Every Glacier archive has a 32 KB overhead for metadata
  – You are charged for this overhead
  – For small files that 32 KB starts to add up
  – Zip up small files before uploading
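The concurrent downloader tip comes down to fetching byte ranges of one archive in parallel and stitching them back together in order. The following is a generic sketch of that idea, not the downloader iN DEMAND contributed to boto. `fetch_range` is a hypothetical callable supplied by the caller (for example, a wrapper around a ranged read of a completed retrieval job), and the whole result is held in memory, which a real tool for large video files would avoid by writing each chunk to disk.

```python
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(total_size, chunk_size):
    """Yield inclusive (start, end) byte ranges that cover total_size bytes."""
    for start in range(0, total_size, chunk_size):
        yield start, min(start + chunk_size, total_size) - 1


def download_concurrently(fetch_range, total_size, chunk_size, workers=4):
    """Fetch every range on a thread pool and return the reassembled bytes.

    fetch_range(start, end) is a hypothetical callable that returns the bytes
    for one inclusive range; it must be safe to call from several threads.
    """
    ranges = list(byte_ranges(total_size, chunk_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so chunks come back in sequence
        # even when the underlying requests finish out of order.
        chunks = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(chunks)
```

Threads suit this workload because each worker spends most of its time waiting on the network, which is why several parallel range requests can exceed the single-stream rate the slide reports.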

Page 47: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Tips & Tricks

• Download request timeouts
  – 24 hours to download archive
  – Queue up requests to ensure that files are downloaded within the 24-hour timeout
• Add the extra encryption to make management happy
  – The MPAA loves encryption
  – Management loves encryption
  – AWS automatically encrypts files at rest in Glacier

Page 48: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Tips & Tricks

• Checksum files before you upload
  – Save MD5 checksum
  – Check that file hasn’t already been uploaded to Glacier
  – Avoid file duplication
• Track who requests downloads to manage costs
  – Fee associated with each download
  – Keep employees honest
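The checksum tip can be sketched with the standard library alone. The example below computes an MD5 digest in blocks, so large video files are never read into memory at once, and checks it against a set of digests already recorded. The in-memory set stands in for whatever asset database is actually used, which the slides do not describe, and the function names are hypothetical.

```python
import hashlib


def md5_of_file(path, block_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it one block at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def needs_upload(path, uploaded_checksums):
    """Return (should_upload, checksum) for a candidate file.

    uploaded_checksums is a hypothetical set of digests for files already
    in the archive; the caller adds the new digest after a successful upload.
    """
    checksum = md5_of_file(path)
    return checksum not in uploaded_checksums, checksum
```

Recording the digest only after the upload succeeds keeps a failed transfer from being mistaken for a stored file, and the saved checksum can later be compared against a restored copy.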

Page 49: Media Content Ingest, Storage, and Archiving with AWS (MED301) | AWS re:Invent 2013

Please give us your feedback on this presentation

As a thank you, we will select prize winners daily for completed surveys!

MED301