(DAT202) Managed Database Options on AWS
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Pranav Nambiar, Sr. Manager of Product Management, AWS Database Services
Jeongsang Baek, VP of Engineering, IGAWorks
October 2015 | Las Vegas, NV
DAT 202
Managed Database Options on AWS
One size fits all … doesn’t quite work
How can we optimize for scale, performance, cost?
Scale
Cost
Performance
How we wish …
This is a worry-free zone
What to expect from the session
• Why managed database services?
• SQL vs NoSQL
• AWS database options
• Amazon DynamoDB—A nonrelational managed database
• Amazon RDS—A relational managed database
• Amazon ElastiCache—A managed in-memory cache
• Amazon Redshift—A managed data warehouse
• Useful insights from IGAWorks
• Wrap-up
Why managed database services?
If you host your databases on-premises
Power, HVAC, net
Rack and stack
Server maintenance
OS patches
DB s/w patches
Database backups
Scaling
High availability
DB s/w installs
OS installation
you
App optimization
If you host your databases in Amazon EC2
You handle:
• OS patches
• DB s/w patches
• Database backups
• Scaling
• High availability
• DB s/w installs
• App optimization
AWS handles:
• Power, HVAC, net
• Rack and stack
• Server maintenance
• OS installation
If you choose a managed DB service
Power, HVAC, net
Rack and stack
Server maintenance
OS patches
DB s/w patches
Database backups
App optimization
High availability
DB s/w installs
OS installation
you
Scaling
Quick summary of the options
• Self-Managed—You are responsible for the hardware,
OS, security, updates, backups, replication etc., but have
full control over it.
• EC2 Instances—You only need to focus on the database
level updates, patches, replication, backups etc. and
don’t have to worry about the hardware or the OS
installation.
• Fully Managed—You get features such as backup and
replication etc. as a package service and don’t have to
bother with patching and updates.
What are the AWS managed DB options?
A managed service for each major DB type
• Amazon DynamoDB: Document and Key-Value Store
• Amazon RDS: SQL Database Engines
• Amazon ElastiCache: In-Memory Key-Value Store
• Amazon Redshift: Data Warehouse
Pick the best tool for the job
Decisions
• NoSQL vs. SQL
• Aurora vs. MySQL
• DynamoDB vs. Mongo
NoSQL vs. SQL for a new app: how to choose?
NoSQL
• Schema-less, easy reads and writes, simple data model
• Scaling is easy
• Focus on performance and availability at any scale
SQL
• Strong schema, complex relationships, transactions and joins
• Scaling is difficult
• Focus on consistency over scale and availability
What is Amazon DynamoDB?
Amazon DynamoDB
NoSQL database
Fully managed
Single-digit millisecond latency
Massive and seamless scalability
Low cost
Popular use cases
• Ad Tech: Ad serving, retargeting, ID lookup, user profile management, session-tracking, RTB
• IoT: Tracking state, metadata and readings from millions of devices, real-time notifications
• Gaming: Recording game details, leaderboards, session information, usage history, and logs
• Mobile & Web: Storing user profiles, session details, personalization settings, entity-specific metadata
Predictable, low latency performance
Consistent single-digit millisecond latency even at massive scales
Writes
• Replicated continuously to 3 AZs
• Persisted to disk (custom SSD)
Reads
• Strongly or eventually consistent
• No latency trade-off
Automatic replication for rock-solid durability and availability
Amazon DynamoDB is a schemaless database
Schema-less: schema is defined per item
[Diagram labels: Table, Items, Attributes, Item Key]
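To make "schema is defined per item" concrete, here is a minimal sketch that models a table as a plain Python dict keyed by item key. It only mimics the data model and does not call the DynamoDB API; the helper `put_item` and the attribute names (`user_id`, `name`, `high_score`, `devices`) are invented for illustration.

```python
# A toy in-memory "table": item key -> item (a dict of attributes).
# This mimics the data model only; it is NOT the DynamoDB API.
table = {}


def put_item(item, key_name="user_id"):
    """Store an item under its key; the key is the only required attribute."""
    if key_name not in item:
        raise ValueError(f"item is missing the key attribute {key_name!r}")
    table[item[key_name]] = item


# Two items in the same table carry different attribute sets.
put_item({"user_id": "u1", "name": "Ana", "high_score": 1200})
put_item({"user_id": "u2", "devices": ["phone", "tablet"]})

for key, item in table.items():
    print(key, sorted(item))
```

Both items share only the key attribute; everything else varies per item, which is the point of the slide.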
Define the desired performance using provisioned throughput
• Read capacity units
• Write capacity units
1 request per second (RPS) adds up to more than 2.5 M requests in a month
You pay for the resources that you use
Monthly bill = storage consumed (GB) + write capacity units (WCUs) + read capacity units (RCUs)
Pricing varies by region. Further details at http://aws.amazon.com/dynamodb/pricing/
Free tier:
• Generous free tier of 25 GB, 25 WCUs, and 25 RCUs
• That is, you get over 60M read requests and 60M write requests for free in a month
• The free tier is indefinite: you benefit from this every month
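The request counts quoted on the throughput and free-tier slides can be checked with simple arithmetic. The sketch below assumes a 30-day month and that one capacity unit sustains one request per second, as the "1 RPS" figure implies; the function name `monthly_requests` is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def monthly_requests(capacity_units, days=DAYS_PER_MONTH):
    """Requests per month if each unit sustains one request per second."""
    return capacity_units * SECONDS_PER_DAY * days


one_unit = monthly_requests(1)    # one request per second, all month
free_tier = monthly_requests(25)  # 25 RCUs (or 25 WCUs) in the free tier

print(one_unit)   # 2592000, i.e. more than 2.5 M
print(free_tier)  # 64800000, i.e. more than 60 M
```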
Selected DynamoDB customers
What is Amazon RDS?
Relational databases
Fully managed
Fast, predictable performance
Simple and fast to scale
Low cost, pay for what you use
Amazon Aurora
Use cases
Applicable wherever you need relational databases
• eCommerce
• Gaming
• Websites
• IT Solutions
• Apps
• Reporting
RDS feature matrix
Feature           Aurora   MySQL    PostgreSQL  Oracle   SQL Server
Max storage       64 TB    6 TB     6 TB        6 TB     4 TB
Provisioned IOPS  NA       30,000   30,000      30,000   20,000
Largest instance  R3.8XL   R3.8XL   R3.8XL      R3.8XL   R3.8XL
VPC
High availability
Instance scaling
Encryption: Coming soon
Read replicas: Oracle GoldenGate; Cross region
Scale storage: Auto Scaling
Amazon Aurora: Fast, available, and MySQL-compatible
[Diagram labels: SQL, Transactions, Caching; AZ 1, AZ 2, AZ 3; Amazon S3]
• 5x faster than MySQL on same hardware
• Sysbench: 100K writes/sec and 500K reads/sec
• Designed for 99.99% availability
• 6-way replicated storage across 3 AZs
• Scale to 64 TB and 15 read replicas
Amazon RDS is simple and fast to scale
• Database instance types offer a range of CPU and memory selections
• Scale up or down among instance types on demand
• Database storage is scalable on demand
Amazon RDS offers fast, predictable storage
• General Purpose (SSD) for most workloads
• Provisioned IOPS (SSD) for OLTP workloads up to 30,000 IOPS
• Magnetic for small workloads with infrequent access
High-availability Multi-AZ deployments
Enterprise-grade fault tolerance solution for production databases
Choose cross-region replication for better data locality and easier migration
• Even faster recovery in the event of disaster
• Bring data close to your customers
• Promote to a master for easy migration
You pay for the resources that you use
Monthly bill = N × duration for which DB instances were used + GB of storage consumed
• Instance price depends on type of DB instance
• Storage price depends on type of storage
Further details at http://aws.amazon.com/rds/pricing/
Free tier (for first 12 months)
• 750 micro DB instance hours
• 20 GB of DB storage
• 20 GB for backups
• 10 million I/O operations
Selected Amazon RDS customers
What is Amazon ElastiCache?
In-memory key-value store
High-performance
Memcached and Redis
Fully managed; zero admin
Popular use cases
• Caching layer for performance or cost optimization of an underlying database
• Storage of ephemeral key-value data
• High-performance application patterns such as leaderboards (for gaming users), session management, event counters, in-memory lists
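The first use case, a caching layer in front of an underlying database, is often implemented as a cache-aside read. The sketch below is one way to illustrate the idea under simplifying assumptions: plain dicts stand in for the cache and the database, there is no expiry or invalidation, and the names `get`, `read_from_database`, and the key `ad:1` are hypothetical rather than part of any ElastiCache API.

```python
# Stand-ins for the real systems: one dict plays the cache, another the database.
database = {"ad:1": {"title": "Sample ad"}}
cache = {}
db_reads = 0  # counts how often the "database" is actually hit


def read_from_database(key):
    """Simulate a (comparatively slow) database lookup."""
    global db_reads
    db_reads += 1
    return database.get(key)


def get(key):
    """Cache-aside read: try the cache first, fall back to the database."""
    if key in cache:
        return cache[key]
    value = read_from_database(key)
    if value is not None:
        cache[key] = value  # populate the cache for later readers
    return value


get("ad:1")  # miss: goes to the database and fills the cache
get("ad:1")  # hit: served from the cache
print(db_reads)  # 1
```

Only the first read reaches the database; repeated reads of the same key are absorbed by the cache, which is where the performance or cost saving comes from.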
Key ElastiCache features
Memcached
• Fully managed
• Cache node auto-discovery
• Multi-AZ node placement
Redis
• Fully managed
• Multi-AZ with auto-failover
• Persistence
• Read replicas
How ElastiCache billing works
Monthly bill = N × duration for which the nodes were used
• N = number of nodes
• Price depends on type of node
Further details at http://aws.amazon.com/elasticache/pricing/
Free tier (for first 12 months): 750 micro cache node hours
Selected ElastiCache customers
What is Amazon Redshift?
Amazon
Redshift
a lot faster
a lot cheaper
a whole lot simpler
Relational data warehouse
Massively parallel; petabyte scale
Fully managed
HDD and SSD platforms
$1,000/TB/year; starts at $0.25/hour
Popular use cases
• Traditional enterprises: 10x cheaper; easy to provision; higher DBA productivity
• Companies with big data: 10x faster; no programming; easily leverage BI tools, Hadoop, machine learning, streaming
• SaaS companies: analysis in-line with process flows; pay as you go, grow as you need; managed availability and disaster recovery
Amazon Redshift architecture
Leader node
• Simple SQL endpoint
• Stores metadata
• Optimizes query plan
• Coordinates query execution
Compute nodes
• Local columnar storage
• Parallel/distributed execution of all queries, loads, backups, restores, resizes
Start at just $0.25/hour, grow to 2 PB (compressed)
• DC1: SSD; scale 160 GB–326 TB
• DS2: HDD; scale 2 TB–2 PB
[Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion, Backup, Restore]
Amazon Redshift is fast
Dramatically less I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
Zone map example (sorted block contents, then the block's min and max):
10 | 13 | 14 | 26 | … | 100 | 245 | 324     min 10, max 324
375 | 393 | 417 | … | 512 | 549 | 623       min 375, max 623
637 | 712 | 809 | … | 834 | 921 | 959       min 637, max 959
ID Age State Amount
123 20 CA 500
345 25 WA 250
678 40 FL 125
957 37 WA 375
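The zone-map idea from this slide can be sketched in a few lines: keep the minimum and maximum value of each block, and skip any block whose range cannot overlap the query. This is a simplified illustration, not Redshift's internal implementation; the block contents reuse the numbers shown on the slide with the elided middle values left out, and `scan_range` is a hypothetical helper.

```python
# Each block holds sorted values. The numbers are the ones visible on the
# slide; the values hidden behind "…" are simply omitted here.
blocks = [
    [10, 13, 14, 26, 100, 245, 324],
    [375, 393, 417, 512, 549, 623],
    [637, 712, 809, 834, 921, 959],
]

# The zone map keeps only (min, max) per block.
zone_map = [(min(block), max(block)) for block in blocks]


def scan_range(low, high):
    """Return values in [low, high] and the number of blocks actually read."""
    matches, blocks_read = [], 0
    for (block_min, block_max), block in zip(zone_map, blocks):
        if block_max < low or block_min > high:
            continue  # the block cannot contain a match, so skip its I/O
        blocks_read += 1
        matches.extend(v for v in block if low <= v <= high)
    return matches, blocks_read


print(zone_map)              # [(10, 324), (375, 623), (637, 959)]
print(scan_range(400, 600))  # ([417, 512, 549], 1)
```

A query for values between 400 and 600 reads one block out of three, which is the "dramatically less I/O" effect in miniature.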
Fully managed, continuous/incremental backups
• Multiple copies within cluster
• Continuous and incremental backups to Amazon S3
• Continuous and incremental backups across regions
• Streaming restore
[Diagram labels: Amazon S3, Region 1, Region 2]
Amazon Redshift offers rock-solid fault tolerance
• Disk failures
• Node failures
• Network failure
• AZ/region level disasters
[Diagram labels: Amazon S3, Region 1, Region 2]
You pay for what you use
Monthly bill = N × duration for which the nodes were used
• N = number of nodes
• Price depends on type of node
Further details at https://aws.amazon.com/redshift/pricing/
• 2-month free trial
• Leader node is free
• No upfront costs, pay as you go
• Price includes three data copies
• Backup storage is free up to 100% of provisioned storage
• 3x data compression on average
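As a rough worked example of the "N × duration" billing formula, the sketch below multiplies node count, hours used, and an hourly node price. It uses the $0.25/hour starting price quoted earlier in the deck and assumes a 730-hour month (8,760 hours per year divided by 12); actual prices vary by node type and region, and the function name `monthly_bill` is illustrative. The free leader node and the free backup allowance are not modeled.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12


def monthly_bill(nodes, hours, price_per_node_hour):
    """Monthly bill = N x duration x hourly price of the node type."""
    return nodes * hours * price_per_node_hour


# $0.25/hour is the entry price quoted in the deck; other node types cost more.
print(monthly_bill(1, HOURS_PER_MONTH, 0.25))  # 182.5
print(monthly_bill(2, HOURS_PER_MONTH, 0.25))  # 365.0
```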
Redshift has a large ecosystem
Data Integration | Business Intelligence | Systems Integrators
Selected Amazon Redshift customers
Jeongsang Baek, VP of Engineering, IGAWorks
October 2015
IGAWorks
Re-architecting Your Application at the Speed of AWS Innovation
No.1 mobile business platform in Korea
IGAWorks provides
• Adbrix: App analytics and marketing attribution
• Adpopcorn: Monetization
• Live Operation: Operating tools for in-app campaigns
• Nanoo, Jiver: In-app engagement
All services are offered at no cost
Architecture of legacy service
Components (AWS Tokyo region):
• Adbrix User
• Mobile Device
• Amazon Route 53
• EC2: Analytics
• MSSQL Databases: Analytics
• EC2: Tracking API
• MSSQL Databases: Activity Storage
Hundreds of EC2 instances
Dozens of MSSQL instances
Over 1 PB of EBS storage
Challenges
• Cost burden
• Operational burden
• Performance improvements
Use case: Adpopcorn
• Any app can be media for incentivized ads
• Reward a user in exchange for completing an action such as
installing or running the advertiser’s app
• Types
• Offerwall
• Lock screen ads
Participating in incentivized ads
Components: Mobile Device, Ad serve API, Ad inventory, Participation logs
1. Open offerwall
2. Request available ads
3. Read available ads
4. Check participation logs and de-duplicate ads
5. Respond with available offers
6. Install and run advertiser’s app
7. Send the first-run activity
8. Put a participation log
9. Give promised reward
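The de-duplication in step 4 and the reward logic in steps 7 to 9 can be sketched as follows. This is a simplified in-memory model, not the IGAWorks implementation: a dict stands in for the ad inventory, a set of (user, ad) pairs stands in for the participation log, and the ad IDs, user ID, and function names are invented for illustration.

```python
# Hypothetical stand-ins: a dict for the ad inventory and a set of
# (user_id, ad_id) pairs for the participation log.
ad_inventory = {
    "ad-1": "Install Game A",
    "ad-2": "Install App B",
    "ad-3": "Run App C",
}
participation_log = set()


def available_offers(user_id):
    """Steps 3-5: read the ads and drop those the user already completed."""
    return [
        ad_id
        for ad_id in sorted(ad_inventory)
        if (user_id, ad_id) not in participation_log
    ]


def record_first_run(user_id, ad_id):
    """Steps 7-9: log participation once; return True if a reward is due."""
    if (user_id, ad_id) in participation_log:
        return False  # already rewarded, do not pay twice
    participation_log.add((user_id, ad_id))
    return True


print(available_offers("u1"))          # ['ad-1', 'ad-2', 'ad-3']
print(record_first_run("u1", "ad-2"))  # True
print(record_first_run("u1", "ad-2"))  # False
print(available_offers("u1"))          # ['ad-1', 'ad-3']
```

Every offerwall request reads the participation log and every completed action writes to it, which is why the deck singles out the log as needing high read/write throughput and low latency.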
Points to improve performance
• Ad inventory
• Store complex relational data
• Boost DB read requests
• Participation log
• High read/write throughput
• Low latency
Re-Architecting Adpopcorn
Components (AWS Tokyo region):
• Route 53
• Mobile Device
• AWS Elastic Beanstalk: Ad Serve API
• ElastiCache: Ad Inventory
• Amazon RDS: Ad Inventory
• DynamoDB: Participation Log
• Amazon Kinesis: Participation Stream
• Elastic Beanstalk: ETL Worker
• Amazon RDS: Monetization Report
Use case: Adbrix
• Legacy
• Stored all activities in MSSQL on EC2 instances
• Expensive to store raw data in Amazon Elastic Block Store
• Hard to scale out and distribute data
• If one EC2 instance went down, then the whole service failed
• Storage size limitation
• Need to constantly monitor whether the storage is full
Re-architecting Adbrix
Components:
• Adbrix User
• Mobile Device
• Route 53
• Elastic Beanstalk: Activity Tracker
• Amazon Kinesis
• Elastic Beanstalk: Activity Process
• Amazon S3: Activity Storages
• AWS Lambda: Micro-batch loading
• Amazon Redshift: BI Analysis
• EMR-Spark: Daily Batch Analysis
• EC2: Adbrix Analytics
• Database: Adbrix Analytics
Regions: AWS Tokyo region, AWS N. Virginia region (Cross Region Replication)
DB heroes!
Fully-managed! Low cost! High performance!
• Amazon RDS:
- For ad inventory with strong schema, complex relationships, queryable data
- High-availability Multi-AZ deployments
• Amazon DynamoDB:
- For participation log with heavy read/write load
- Single-digit millisecond latency
• Amazon ElastiCache:
- Redis/Memcached for fast, complex caching of ad inventory
- Offloads massive read traffic from RDS
• Amazon Redshift:
- For petabyte-scale big data analysis
- Export business insights easily using reporting tools
Monthly cost report
[Chart: IGAWorks Cost Trend in 2015, January through August, broken down by Amazon ElastiCache, Amazon RDS, Amazon DynamoDB, Amazon Redshift, and Others]
Result
• Reduced the cost of analysis by 40%
• Scaled out more easily to support 130 million devices
• Guaranteed two-digit latency for ad serve API responses
+ Recruitment policy has changed
Lesson learned
Start your business today.
You may face a difficult problem.
However, AWS already has the solutions.
To sum up…
Review: AWS managed DB services
• Amazon DynamoDB: Document and Key-Value Store
• Amazon RDS: SQL Database Engines
• Amazon ElastiCache: In-Memory Key-Value Store
• Amazon Redshift: Data Warehouse
Benefits of AWS managed database services
• Pay only for what you use: No up-front cost
• Fully managed services: AWS handles installs, patching, restarts
• Easy to scale: Grow as you need
• Designed for use with other AWS services: AWS Data Pipeline, Amazon EC2, Amazon S3, Amazon CloudWatch, Amazon SNS, Amazon VPC
Related Sessions
• DAT201 - Introduction to Amazon Redshift
Oct 7 – 1:30pm – 2:30pm
• DAT204 - NoSQL? No Worries: Building Scalable
Applications on AWS NoSQL Services
Oct 7 – 1:30pm – 2:30pm
• DAT301 - Amazon Aurora Deep Dive
Oct 7 – 2:40pm – 3:45pm
• DAT407 - Amazon ElastiCache: Deep Dive
Oct 8 – 11am – 12pm
Thank you!
Remember to complete your evaluations!