AWS re:Invent 2016: Workshop: AWS S3 Deep-Dive Hands-On Workshop: Deploying and Managing a Global,...
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Omair Gillani, S3 Product Manager, AWS
Lee Kear, Storage Specialist Solutions Architect, AWS
Jason Gluckman, Lead Software Engineer, Ring
Peter Levett, Storage Specialist Solutions Architect, AWS
Angela Wang, Solutions Architect, AWS
November 30, 2016
STG313
Amazon S3 Deep-Dive Hands-On Workshop:
Deploying and Managing a Global, Petabyte
Scale Storage Infrastructure
What to Expect from the Session
• How does a workshop differ from other sessions?
• S3 new features
• How we think about storage management for S3
• Storage Management Portfolio for S3
• Understand your data
• Discover your data
• Manage your data
• Pulling it all together
• Key naming schemes
• Group activity
How does a workshop differ from other sessions?
Learn from AWS
45 minutes of lecture
Learn from each other
Group learning activity
New S3 Features
Amazon storage usage, 2012–2014:
Trillions of objects
Millions of requests per second
Choice of storage classes on S3
Standard: active data
Standard - Infrequent Access: infrequently accessed data
Amazon Glacier: archive data
Use cases for Standard - Infrequent Access
File sync and share + consumer file storage
Backup and archive + disaster recovery
Long-retained data
Standard - Infrequent Access storage
• Durable: designed for 11 9s of durability
• Available: designed for 99.9% availability
Same as Standard storage:
• High performance
• Secure: bucket policies, AWS Identity and Access Management (IAM) policies, many encryption options
• Integrated: lifecycle management, versioning, event notifications, metrics
• Easy to use: no impact on user experience, simple REST API
Standard - Infrequent Access storage
Integrated: Lifecycle management
- Directly PUT to Standard - IA
- Transition Standard to Standard - IA
- Transition Standard - IA to Amazon Glacier storage
- Expiration lifecycle policy
- Versioning support
A comprehensive storage management
portfolio for S3
Storage Management for S3
Cross-Region Replication
Lifecycle Policy
Data Classification & Management
Event Notifications
S3 CloudWatch Metrics
S3 Inventory
S3 Analytics
Audit with object-level AWS CloudTrail Data Events
Standard Standard - Infrequent Access Amazon Glacier
Understand your storage usage
S3 Inventory
S3 Analytics
Analyze Logs with Amazon EMR
S3 Inventory
Use case: trigger business workflows and applications such as secondary index garbage
collection, data auditing, and offline analytics
• More information about your objects than provided by LIST API such as replication
status, multipart upload flag, and delete marker
Save time
Daily or weekly delivery
Delivery to S3 bucket
CSV file output
S3 Inventory
Eventually consistent rolling snapshot
• New objects may not be listed
• Removed objects may still be included
Name Value Type Description
Bucket String Bucket name. UTF-8 encoded.
Key String Object key name. UTF-8 encoded.
Version Id String Version Id of the object
Is Latest boolean true if object is the latest version (current version) of a versioned object, otherwise false
Delete Marker boolean true if object is a delete marker of a versioned object, otherwise false
Size long Object size in bytes
Last Modified String Last modified timestamp. Format in ISO: YYYY-MM-DDTHH:mm:ss.SSSZ
ETag String eTag in HEX encoded format
StorageClass String Valid values: STANDARD, REDUCED_REDUNDANCY, GLACIER, STANDARD_IA. UTF-8 encoded.
Multipart Uploaded boolean true if object is uploaded by using multipart, otherwise false
Replication Status String Valid values: REPLICA, COMPLETED, PENDING, FAILED. UTF-8 encoded.
Validate before you act!
• Use HEAD Object
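Since the inventory is an eventually consistent rolling snapshot, a consumer would typically filter the report and then re-check each key with HEAD Object before acting on it. A minimal sketch of the filtering step, assuming the CSV column order listed above; the sample rows are made up for illustration:

```python
import csv
import io

# Column order of an S3 Inventory CSV report (all optional fields enabled),
# matching the field table above.
FIELDS = ["Bucket", "Key", "VersionId", "IsLatest", "DeleteMarker", "Size",
          "LastModified", "ETag", "StorageClass", "MultipartUploaded",
          "ReplicationStatus"]

# Illustrative sample rows, not real data.
SAMPLE = '''"my-bucket","photos/a.jpg","v1","true","false","1024","2016-11-30T02:01:01.000Z","abc123","STANDARD","false",""
"my-bucket","photos/b.jpg","v7","true","true","0","2016-11-30T02:05:00.000Z","def456","STANDARD_IA","false",""
'''

def keys_to_validate(report_text):
    """Return keys of current, non-delete-marker objects.

    Each returned key should still be confirmed with a HEAD Object call
    before a workflow acts on it, because the snapshot may be stale.
    """
    keys = []
    for row in csv.reader(io.StringIO(report_text)):
        record = dict(zip(FIELDS, row))
        if record["IsLatest"] == "true" and record["DeleteMarker"] != "true":
            keys.append(record["Key"])
    return keys

print(keys_to_validate(SAMPLE))  # → ['photos/a.jpg']
```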
S3 Analytics – Storage Class Analysis
Analyze buckets, prefixes, or tags
Storage Class Analysis & lifecycle recommendation
$0.10 per million objects analyzed
Data-driven storage management for S3
Export analysis to your S3 bucket
S3 Analytics – Storage Class Analysis
Demo
S3 Analytics – Storage Class Analysis
Monitor your storage
Monitor and Alert with
Amazon CloudWatch
Audit your storage with
AWS CloudTrail
Server Access Logs
CloudWatch metrics for S3
Operational & Performance monitoring
• Generate metrics for data of your choice
• Entire bucket, prefixes, and tags
• Up to 1,000 object groups
• 1-minute CloudWatch metrics
• Alert and alarm on metrics
• Pay for what you use
CloudWatch metrics for S3
Price per metric: $0.30 per metric per month
Metric Name Unit
AllRequests Count
PutRequests Count
GetRequests Count
ListRequests Count
DeleteRequests Count
HeadRequests Count
PostRequests Count
BytesDownloaded MB
BytesUploaded MB
4xxErrors Count
5xxErrors Count
FirstByteLatency ms
TotalRequestLatency ms
CloudTrail data events for S3
Use case: Perform security analysis, meet your IT auditing and
compliance needs
API logs for bucket and object-level requests
• Creation/deletion of buckets
• Changes to bucket configuration (bucket policy, lifecycle policies,
replication policies, etc.)
• SNS notification for log file delivery (optional)
Manage your data
Cross-Region
Replication
Lifecycle Policies Event
Notifications
S3 Object Tags
Manage your data
S3 Object Tags
Manage storage based on object tags
• Classify your data
• Tag your objects with key-value pairs
• Write policies once based on the type of data
Access Control
Lifecycle Policy
Analyze
Deep dive on tags
• Tags are key-value pairs
• Maximum 10 tags per object
• Maximum key length—127 Unicode characters
• Maximum value length—255 Unicode characters
• Tag keys and values are case sensitive
Two ways to put tags via the API:
• PUT the object with a tagging parameter, or
• Call the tagging API after the object is created
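The first option can be sketched by building the URL-encoded tag string that a PUT Object request carries in its x-amz-tagging header. The tag names below are illustrative:

```python
from urllib.parse import urlencode

def tagging_header(tags):
    """Encode a dict of tags for the x-amz-tagging request header.

    S3 allows at most 10 tags per object, so reject anything larger.
    """
    if len(tags) > 10:
        raise ValueError("S3 allows at most 10 tags per object")
    return urlencode(tags)

# Illustrative tags, echoing the policy examples that follow.
print(tagging_header({"Project": "Delta", "HIPAA": "True"}))
# → Project=Delta&HIPAA=True
```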
What can I do with tags?
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject"
],
"Resource": "arn:aws:s3:::EXAMPLE-BUCKET-NAME/*"
"Condition": {"StringEquals": {"S3:ResourceTag/HIPAA":"True"}}
}
]
}
Manage permissions with tags
Lifecycle policies based on tags
<LifecycleConfiguration>
<Rule>
<ID>sample-rule</ID>
<Filter>
<And>
<Prefix>documents/</Prefix>
<Tag>
<Key>Project</Key>
<Value>Delta</Value>
</Tag>
<Tag>
<Key>Data type</Key>
<Value>HPI</Value>
</Tag>
</And>
</Filter>
<Status>Enabled</Status>
<Transition>
<Days>365</Days>
<StorageClass>GLACIER</StorageClass>
</Transition>
<Expiration>
<Days>3650</Days>
</Expiration>
</Rule>
</LifecycleConfiguration>
Putting it all together
Storage Management for S3
Cross-Region Replication
Lifecycle Policy
S3 Object Tags
Event Notifications
Monitor and Alert with
CloudWatch
S3 Inventory Audit with
CloudTrail Data
Events
S3 Analytics
S3 Performance at Scale
Getting high throughput performance with S3
• S3 can scale to many thousands of requests per second
• Need a good key naming scheme
• Only at scale do you need to consider your key naming
scheme
• What are Partitions?
• Why?
• Spread keys lexicographically
• The goal of partitioning is to spread the heat
• Prevent hot spots
my-bucket/2013_11_13-164533125.jpg
my-bucket/2013_11_13-164533126.jpg
my-bucket/2013_11_13-164533127.jpg
my-bucket/2013_11_13-164533128.jpg
my-bucket/2013_11_12-164533129.jpg
my-bucket/2013_11_12-164533130.jpg
my-bucket/2013_11_12-164533131.jpg
my-bucket/2013_11_12-164533132.jpg
my-bucket/2013_11_11-164533133.jpg
my-bucket/2013_11_11-164533134.jpg
my-bucket/2013_11_11-164533135.jpg
my-bucket/2013_11_11-164533136.jpg
How to design for high Request Rates to S3
Don’t do this… the keys above all fall into a single partition:
Partition:
my-bucket/2013_11_1
Use a key-naming scheme with randomness at the beginning for high TPS
• Most important if you will regularly exceed 100 TPS on a bucket
• Avoid starting with a date or monotonically increasing numbers
• Consider adding a hash or reversed timestamp (ssmmhhddmmyy)
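The reversed-timestamp suggestion (ssmmhhddmmyy) can be sketched as follows: write the timestamp smallest unit first, so consecutive uploads differ at the start of the key instead of the end. The key suffix is illustrative:

```python
from datetime import datetime

def reversed_timestamp_key(ts, suffix):
    """Build a key prefix as ssmmhhddmmyy (seconds first, year last)."""
    return ts.strftime("%S%M%H%d%m%y") + "-" + suffix

# 2013-11-13 16:45:33 reversed becomes 33 45 16 13 11 13.
ts = datetime(2013, 11, 13, 16, 45, 33)
print(reversed_timestamp_key(ts, "164533125.jpg"))
# → 334516131113-164533125.jpg
```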
Distributing key names
Add randomness to the beginning of the key name…
my-bucket/6213-2013_11_13.jpg
my-bucket/4653-2013_11_13.jpg
my-bucket/9873-2013_11_13.jpg
my-bucket/4657-2013_11_13.jpg
my-bucket/1256-2013_11_13.jpg
my-bucket/8345-2013_11_13.jpg
my-bucket/0321-2013_11_13.jpg
my-bucket/5654-2013_11_13.jpg
my-bucket/2345-2013_11_13.jpg
my-bucket/7567-2013_11_13.jpg
my-bucket/3455-2013_11_13.jpg
my-bucket/4313-2013_11_13.jpg
Partitions:
my-bucket/0
my-bucket/1
my-bucket/2
my-bucket/3
my-bucket/4
my-bucket/5
my-bucket/6
my-bucket/7
my-bucket/8
my-bucket/9
Monotonically Increasing Customer ID
mycustdata/2134857/app_data_1/2016-11-30-02:01:01:24/log.txt
mycustdata/2134857/app_data_1/2016-11-30-02:01:01:32/wrk_user
mycustdata/2134858/app_data_1/2016-11-30-02:01:01:29/product_usage.csv
mycustdata/2134858/app_data_1/2016-11-30-02:01:01:24/log.txt
mycustdata/2134858/app_data_1/2016-11-30-02:01:01:14/wrk_user
mycustdata/2134859/app_data_1/2016-11-30-02:01:01:28/product_usage.csv
mycustdata/2134859/app_data_1/2016-11-30-02:01:01:45/log.txt
mycustdata/2134859/app_data_1/2016-11-30-02:01:01:34/wrk_user
mycustdata/7584312/app_data_1/2016-11-30-02:01:01:23/product_usage.csv
mycustdata/7584312/app_data_1/2016-11-30-02:01:01:24/log.txt
mycustdata/7584312/app_data_1/2016-11-30-02:01:01:32/wrk_user
mycustdata/8584312/app_data_1/2016-11-30-02:01:01:29/product_usage.csv
mycustdata/8584312/app_data_1/2016-11-30-02:01:01:24/log.txt
mycustdata/8584312/app_data_1/2016-11-30-02:01:01:14/wrk_user
mycustdata/9584312/app_data_1/2016-11-30-02:01:01:28/product_usage.csv
mycustdata/9584312/app_data_1/2016-11-30-02:01:01:45/log.txt
mycustdata/9584312/app_data_1/2016-11-30-02:01:01:34/wrk_user
Partition:
mycustdata/213485
Partitions:
mycustdata/7
mycustdata/8
mycustdata/9
Reverse Monotonically Increasing prefix
If a single customer can push a higher workload, they can cause a hot spot.
Add a Hash to the Beginning of the Key – Best
mycustdata/2134857/app_data_1/2016-11-30-02:01:01:24/log.txt
mycustdata/2134857/app_data_1/2016-11-30-02:01:01:32/wrk_user
mycustdata/2134858/app_data_1/2016-11-30-02:01:01:29/product_usage.csv
mycustdata/2134858/app_data_1/2016-11-30-02:01:01:24/log.txt
mycustdata/2134858/app_data_1/2016-11-30-02:01:01:14/wrk_user
mycustdata/2134859/app_data_1/2016-11-30-02:01:01:28/product_usage.csv
mycustdata/2134859/app_data_1/2016-11-30-02:01:01:45/log.txt
mycustdata/2134859/app_data_1/2016-11-30-02:01:01:34/wrk_user
mycustdata/1a/2134857/app_data_1/2016-11-30-02:01:01:24/log.txt
mycustdata/34/2134857/app_data_1/2016-11-30-02:01:01:32/wrk_user
mycustdata/a7/2134858/app_data_1/2016-11-30-02:01:01:29/product_usage.csv
mycustdata/58/2134858/app_data_1/2016-11-30-02:01:01:24/log.txt
mycustdata/70/2134858/app_data_1/2016-11-30-02:01:01:14/wrk_user
mycustdata/02/2134859/app_data_1/2016-11-30-02:01:01:28/product_usage.csv
mycustdata/2b/2134859/app_data_1/2016-11-30-02:01:01:45/log.txt
mycustdata/63/2134859/app_data_1/2016-11-30-02:01:01:34/wrk_user
Partition:
mycustdata/213485
Partitions:
mycustdata/0
mycustdata/1
mycustdata/2
mycustdata/3
mycustdata/4
mycustdata/5
mycustdata/6
mycustdata/7
Add a hash to evenly distribute the keys for all requests
mycustdata/8
mycustdata/9
mycustdata/a
mycustdata/b
mycustdata/c
mycustdata/d
mycustdata/e
mycustdata/f
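A minimal sketch of the hash-prefix scheme above, assuming a two-hex-character MD5 prefix (any stable hash works; two hex characters give 256 possible prefixes across the 0–f partitions):

```python
import hashlib

def hashed_key(customer_id, rest):
    """Prefix a key with a short, stable hash of the customer ID.

    The same customer always maps to the same prefix, so related keys
    stay listable under one prefix while the fleet of customers fans
    out across partitions.
    """
    prefix = hashlib.md5(customer_id.encode()).hexdigest()[:2]
    return "mycustdata/{}/{}/{}".format(prefix, customer_id, rest)

print(hashed_key("2134857", "app_data_1/log.txt"))
```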
Challenges of using a hash to create entropy
• Listing challenges/opportunities:
• A Secondary Index can be used to avoid listing
• Can be accomplished with Event Notification to AWS Lambda and
Amazon DynamoDB
• Blog Post - Building and Maintaining an Amazon S3 Metadata Index
without Servers
• Hash can be used to split work of LISTing objects
• Lifecycle constraints
• Max number of lifecycle rules – 1,000
• Tagging can make this easier
Faster upload of large objects
Parallelize PUTs with Multipart Uploads
• Increase aggregate throughput by
parallelizing PUTs on high-bandwidth
networks
• Move the bottleneck to the network,
where it belongs
• Increase resiliency to network errors;
fewer large restarts on error-prone
networks
Best Practice
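The part-splitting arithmetic behind a parallel multipart upload can be sketched as below. The 16 MB part size is an illustrative choice; S3's documented minimum for every part except the last is 5 MB:

```python
MIN_PART = 5 * 1024 * 1024  # S3 minimum size for non-final parts

def part_ranges(total_size, part_size=16 * 1024 * 1024):
    """Return (part_number, start, end) byte ranges, 1-indexed like the API.

    Each range can be uploaded by an independent worker, then the parts
    are stitched together with Complete Multipart Upload.
    """
    if part_size < MIN_PART:
        raise ValueError("part size below the 5 MB S3 minimum")
    ranges = []
    start = 0
    part_number = 1
    while start < total_size:
        end = min(start + part_size, total_size) - 1
        ranges.append((part_number, start, end))
        start = end + 1
        part_number += 1
    return ranges

print(part_ranges(40 * 1024 * 1024))  # three parts: 16 MB, 16 MB, 8 MB
```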
Faster download
You can parallelize GETs too
For large objects, use range-based GETs
For content distribution, enable Amazon CloudFront
• Caches objects at the edge
• 59 global edge locations
GET /example-object HTTP/1.1
Host: example-bucket.s3.amazonaws.com
x-amz-date: Fri, 28 Jan 2011 21:32:02 GMT
Range: bytes=0-9
Authorization: AWS AKIAIOSFODNN7EXAMPLE:Yxg83MZaEgh3OZ3l0rLo5RTX11o=
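The Range header above generalizes to parallel downloads: a client computes non-overlapping byte ranges and fetches them concurrently. A minimal sketch, with an illustrative 8 MB chunk size:

```python
def range_headers(object_size, chunk=8 * 1024 * 1024):
    """Return the Range header values for non-overlapping parallel GETs."""
    headers = []
    for start in range(0, object_size, chunk):
        end = min(start + chunk, object_size) - 1
        headers.append("bytes={}-{}".format(start, end))
    return headers

print(range_headers(20 * 1024 * 1024))
# three ranges: two 8 MB chunks and one 4 MB tail
```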
Q & A
Case Study
Ring Products: Practical Uses for the IoT
Ring Neighborhoods: Network Effects in Practice
Wilshire Park study with LAPD:
Ring installed on 10% of homes
Burglaries down 55% for the
entire community in 6 months
Burglars want an easy hit, and go
elsewhere if you’re home
Alarms are reactive, not proactive
Traditional systems don’t link up,
so protection ends at your door
Devices installed in nearly every country on Earth
Millions of connected apps and devices
Over 1 billion videos and rapidly increasing
High growth brings challenges, even month to month
Ring Urban Activity Index
2016-10-20, USA-only,
low-cut rural areas
Global Concerns
Intelligently Determining Class
Ring Requirements
• Live video is ingested from devices and apps via our application servers
• Videos are uploaded to our S3 buckets
• The videos are transcoded and made available for customers to stream
• Customers need low latency in delivering video streams around the world
• Customers get a 30-day free trial of video backups.
• If they decide to continue to store videos, they can store videos for up to 6
months after the activity.
• When users share videos, we expect them to be watched a lot, and
sometimes they go viral
Present Your Design
• How did you address the use case?
• What was your key naming scheme?
• How did you address scale?
• How did you manage object metadata?
• Did you minimize cost?
• How do you monitor your requests?
• How did you address security considerations?
Ring Video Pipeline
Raw
Buckets
Final
(Standard)
S3 Logs
Amazon
CloudFront
Ring App(s)
AWS
Lambda
Viewers
Amazon
SQS
Owner(s)
Visitor
Application
Servers
Ring Device
GPU
Farm
Final
(IA)
Lifecycle
Transitions
Event
Triggers
Live Video
Extreme Performance is Easy
S3 will automatically partition if you use good keys – or just add more buckets
CloudFront as a CDN for GET heavy loads and faster downloads
Faster uploads with Transfer Acceleration
TCP window scaling: without it, a 64 KB window kneecaps long fat networks
TCP SACK helps on fast but lossy connections, such as mobile networks
examplebucket/2134857/data/start.png
examplebucket/2134857/data/resource.rsrc
examplebucket/2134857/data/results.txt
examplebucket/2134858/data/start.png
examplebucket/2134858/data/resource.rsrc
examplebucket/2134858/data/results.txt
examplebucket/2134859/data/start.png
examplebucket/2134859/data/resource.rsrc
examplebucket/2134859/data/results.txt
examplebucket/7584312/data/start.png
examplebucket/7584312/data/resource.rsrc
examplebucket/7584312/data/results.txt
examplebucket/8584312/data/start.png
examplebucket/8584312/data/resource.rsrc
examplebucket/8584312/data/results.txt
examplebucket/9584312/data/start.png
examplebucket/9584312/data/resource.rsrc
examplebucket/9584312/data/results.txt
S3 Scaling on H-Day
Thank you!
Remember to complete your evaluations!