Evaluating Caching and Storage Options on the Amazon Web Services
Cloud
Gagan Agrawal, Ohio State University - Columbus, OH
David Chiu, Washington State University - Vancouver, WA
Presented by
Smita Vijayakumar, Juniper Networks
2
Outline
‣ Introduction to Cloud Computing
‣Background on AWS and Motivation
‣Cost and Performance Evaluation
‣Conclusion
6
Cloud Computing Paradigm
Algorithms & Data
Cloud “Utility” Providers: Amazon AWS, Azure, Cloudera,
Google App Engine
Consumers: Companies, labs, schools, et al.
Processed Results
7
Promises of Cloud Computing
Allows us to consolidate
machines and outsource
computation and storage
Pay-as-you-go Computing
“Infinite” compute resources and storage
9
A Motivating Example
‣A service-oriented system that answers queries from a similar domain
‣Intermediate and final results can be cached and reused for future queries
‣Often present in workflow applications
12
Leveraging the Cloud for Storage
‣Store and Cache Intermediate and Final Results in the
Cloud
‣The Cloud has many options for data storage
• Memory
• Disks
• Network Disks
• Highly Available Persistent Storage
‣There are several tradeoffs in each option
13
Amazon Web Services (AWS)
‣A case study: AWS has emerged as one of the most
widely used Cloud platforms
‣We consider caching and storage performance in three
AWS Services:
• Elastic Compute Cloud (EC2) Machine instances
• Simple Storage Service (S3)
• Elastic Block Store (EBS)
14
AWS Services: EC2
‣Elastic Compute Cloud (EC2)
• Access to virtualized machines with varying capabilities
(e.g., CPU cores, memory, disk space) depending on
price.
Instance Type   CPU                                        Memory   Disk    I/O
Small           1 virtual core                             1.7GB    160GB   medium
XLarge          4 virtual cores (x 2 compute units each)   15.0GB   1.7TB   high
15
AWS Services: EBS
‣Elastic Block Store (EBS)
• Persisted network disks.
• Must be mounted onto EC2 machine before use.
• Users must initially specify a fixed size and format to
appropriate file system.
16
AWS Services: S3
‣Simple Storage Service (S3)
• Simple FTP-style API: GET, PUT, etc.
• Highly available, reliable, and durable storage (but
slower)
• “Infinite capacity”
• Not required to be used with EC2 machines.
• Very inexpensive.
17
Costs of AWS Services
18
Tradeoffs Per Application and Service
‣Caching in-core (EC2-Memory)
• Fast, but expensive
• Small, may need extra logic to coordinate set of EC2
nodes
• Data is volatile
19
Tradeoffs Per Application and Service
‣Caching on local disk (EC2-Disk)
• Much slower than memory
• Much more space
• Data is still volatile
20
Tradeoffs Per Application and Service
‣Caching on Elastic Block Store (EC2-EBS)
• Possibly slower than disk
• Volume size is initially configured by application users
• Data is persisted
21
Tradeoffs Per Application and Service
‣Caching on S3
• Slowest option, but most reliable
• No bound on size
• Data is persisted
24
Experimental Application
‣Geospatial Application: Land Elevation Change
• In general, 2 large matrices (DEM files) are retrieved, and their
difference is returned
‣500 unique requests
‣Requests are issued randomly
‣Eviction is not considered (we assume the cache/storage configuration
is large enough to store all results)
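The request flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system that was evaluated: the names `elevation_change`, `ResultCache`, and `load_dem` are hypothetical, DEM files are stood in for by small nested lists, and the cache is a plain dictionary that never evicts, matching the no-eviction assumption on this slide.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]


def elevation_change(dem_a: Matrix, dem_b: Matrix) -> Matrix:
    """Element-wise difference (b - a) of two equally sized elevation grids."""
    return [
        [b - a for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(dem_a, dem_b)
    ]


class ResultCache:
    """Unbounded result cache: entries are never evicted (hypothetical helper)."""

    def __init__(self, load_dem: Callable[[str], Matrix]) -> None:
        self._load_dem = load_dem  # assumed loader: DEM id -> matrix
        self._store: Dict[Tuple[str, str], Matrix] = {}
        self.hits = 0
        self.misses = 0

    def query(self, dem_id_a: str, dem_id_b: str) -> Matrix:
        key = (dem_id_a, dem_id_b)
        if key in self._store:
            # Cache hit: reuse the stored result, no DEM retrieval needed.
            self.hits += 1
            return self._store[key]
        # Cache miss: retrieve both DEMs, compute, and store the result.
        self.misses += 1
        result = elevation_change(self._load_dem(dem_id_a), self._load_dem(dem_id_b))
        self._store[key] = result
        return result


# Toy in-memory DEM source, purely for illustration.
_DEMS = {
    "t0": [[1.0, 2.0], [3.0, 4.0]],
    "t1": [[1.5, 2.0], [2.0, 6.0]],
}

cache = ResultCache(lambda dem_id: _DEMS[dem_id])
first = cache.query("t0", "t1")   # miss: computed and stored
second = cache.query("t0", "t1")  # hit: served from the cache
```

Swapping the dictionary for local disk, an EBS volume, or S3 would change only where `_store` lives, which is the choice the evaluation compares.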
25
Performance
‣We use 4 different DEM data sizes to test performance:
• 1KB, 1MB, 5MB, 50MB
‣This means a full cache would hold
• 500KB, 500MB, 2.5GB, 25GB
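The full-cache sizes above are the 500 unique requests multiplied by each result size. The short Python check below reproduces that arithmetic and, as an extra illustration, compares each footprint with the instance memory listed on the EC2 slide (1.7GB small, 15.0GB xlarge). The helper names are hypothetical, decimal units are assumed (1MB = 1000KB, as the slide figures imply), and the memory comparison ignores operating-system and application overhead.

```python
UNIQUE_REQUESTS = 500  # number of unique requests in the experiment

# Result sizes in KB, using decimal units (1MB = 1000KB).
RESULT_SIZES_KB = {"1KB": 1, "1MB": 1_000, "5MB": 5_000, "50MB": 50_000}

# Instance memory from the EC2 slide, in KB (decimal).
INSTANCE_MEMORY_KB = {"small": 1_700_000, "xlarge": 15_000_000}


def full_cache_kb(result_kb: int, unique_requests: int = UNIQUE_REQUESTS) -> int:
    """Size of a cache that holds one result per unique request."""
    return result_kb * unique_requests


def human(kb: float) -> str:
    """Format a size in KB using the largest fitting decimal unit."""
    if kb >= 1_000_000:
        return f"{kb / 1_000_000:g}GB"
    if kb >= 1_000:
        return f"{kb / 1_000:g}MB"
    return f"{kb:g}KB"


def fits_in_memory(result_kb: int, instance: str) -> bool:
    """Rough check only: ignores OS and application memory overhead."""
    return full_cache_kb(result_kb) <= INSTANCE_MEMORY_KB[instance]


footprints = {
    label: human(full_cache_kb(kb)) for label, kb in RESULT_SIZES_KB.items()
}
```

Under these assumptions, a full 5MB cache (2.5GB) already exceeds a small instance's memory, and a full 50MB cache (25GB) exceeds even an xlarge instance's memory, so those cases would need disk, EBS, S3, or multiple coordinated nodes.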
26
1KB DEM Size
27
1MB DEM Size
28
5MB DEM Size
29
50MB DEM Size
30
Cost Analysis
‣We next assess costs against performance
‣Performance is measured as relative speedup over
the baseline DEM process execution, shown in Table 2
‣We project costs and speedup over 2,000 and 200,000
requests
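One way to read the cost-versus-performance comparison is as a cost per unit of speedup. The sketch below assumes relative speedup is the baseline execution time divided by the cached response time, and divides monthly cost by that speedup. The formula is an interpretation of the slide wording, and every number and configuration name in the example is made up for illustration; none are measurements from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    name: str                # hypothetical configuration label
    monthly_cost_usd: float  # hypothetical projected monthly cost
    mean_response_s: float   # hypothetical mean response time with caching


def relative_speedup(baseline_s: float, cached_s: float) -> float:
    """Speedup over the uncached baseline (assumed definition)."""
    return baseline_s / cached_s


def cost_per_unit_speedup(config: CacheConfig, baseline_s: float) -> float:
    """Monthly cost divided by the speedup that the configuration delivers."""
    return config.monthly_cost_usd / relative_speedup(baseline_s, config.mean_response_s)


# Illustrative values only.
BASELINE_S = 20.0
configs = [
    CacheConfig("config-A", monthly_cost_usd=60.0, mean_response_s=5.0),
    CacheConfig("config-B", monthly_cost_usd=480.0, mean_response_s=2.0),
]

ranked = sorted(configs, key=lambda c: cost_per_unit_speedup(c, BASELINE_S))
cheapest_per_speedup = ranked[0].name
```

In this toy example config-B is faster (10x versus 4x), but config-A costs less per unit of speedup (15 versus 48), which is the kind of tradeoff the following cost slides examine.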
31
Monthly Costs for Volatile Cache (1MB)
[Chart panels: 200000 I/O requests outside of AWS; 2000 I/O requests outside of AWS]
[Speedup labels, in extracted order: 3.5, 3.26, 3.6, 3.6, 267, 28, 347, 180.5]
Cost per unit speedup is low when requests are high.
I/O costs are still low because of small data size.
32
Monthly Costs for Volatile Cache (50MB)
[Chart panels: 200000 I/O requests outside of AWS; 2000 I/O requests outside of AWS]
[Speedup labels, in extracted order: 2.9, 3.3, 16.05, 31.66]
Costs are now dominated by I/O due to large data size.
In terms of performance, it makes more sense to use xlarge for large data size.
A small instance makes better economic sense for a small number of requests.
33
Monthly Costs for Persistent Cache (1MB)
[Chart panels: 200000 I/O requests outside of AWS; 2000 I/O requests outside of AWS]
[Speedup labels, in extracted order: 3.4, 3.62, 3.58, 30, 13.6, 134]
S3 makes better economic sense than EBS-based instances.
S3 performance is comparable for a cache with small I/O requests.
34
Monthly Costs for Persistent Cache (50MB)
[Chart panels: 200000 I/O requests outside of AWS; 2000 I/O requests outside of AWS]
[Speedup labels, in extracted order: 2.59, 2.74, 3.19, 6.4, 11.09, 22.66]
Interesting: even with the low cost of S3, it still makes sense to use xlarge when I/O requests are high.
S3 is still comparable, and makes better economic sense than EBS-based instances.
36
Summary (1)
‣For smaller data (<= 5MB)
• If request rate is low: Use small instance on-disk
• If request rate is high: Use small instance in-memory
• Although I/O is slow, the cost of using a small instance is
very low
‣ If persistence is needed,
• Use S3, and avoid EBS
37
Summary (2)
‣For larger data (>= 50MB and large cache sizes)
• Use xlarge instances
• Higher I/O rates
• Larger memory and disk capacity
‣EBS may be considered in conjunction with xlarge
instances for persistence
‣ If performance is not an issue, but persistence and
costs are, use S3
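The two summary slides can be read as a small decision procedure. The Python sketch below encodes them as one hypothetical function, `recommend`. It is an interpretation of the bullets, not an algorithm from the study. Two assumptions are worth flagging: result sizes strictly between 5MB and 50MB are not covered by the summary, so the function refuses to guess, and "EBS may be considered" is rendered as a firm choice when persistence and performance are both required.

```python
def recommend(
    data_mb: float,
    high_request_rate: bool,
    need_persistence: bool,
    performance_critical: bool = True,
) -> str:
    """Map the summary bullets to a storage choice (illustrative only)."""
    if data_mb <= 5:
        # Summary (1): smaller data.
        if need_persistence:
            return "S3"  # use S3, and avoid EBS
        if high_request_rate:
            return "small instance, in-memory cache"
        return "small instance, on-disk cache"
    if data_mb >= 50:
        # Summary (2): larger data and large cache sizes.
        if need_persistence:
            if performance_critical:
                return "xlarge instance with EBS"
            return "S3"  # persistence and cost matter, performance does not
        return "xlarge instance"
    # Assumption: the slides give no guidance for sizes between 5MB and 50MB.
    raise ValueError("sizes between 5MB and 50MB are not covered by the summary")


choices = [
    recommend(1, high_request_rate=False, need_persistence=False),
    recommend(1, high_request_rate=True, need_persistence=False),
    recommend(5, high_request_rate=True, need_persistence=True),
    recommend(50, high_request_rate=True, need_persistence=False),
    recommend(50, high_request_rate=True, need_persistence=True),
    recommend(50, high_request_rate=False, need_persistence=True,
              performance_critical=False),
]
```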
38
Conclusion
‣The Cloud offers many viable options for data storage and
caching
‣We evaluated the cost-performance tradeoffs of these
various options, and determined a roadmap for making
clear decisions on resource usage
39
Thank you
‣Questions and Comments?
• David Chiu - [email protected]
• Gagan Agrawal – [email protected]