AWS January 2016 Webinar Series - Amazon Aurora for Enterprise Database Applications
Transcript of AWS January 2016 Webinar Series - Amazon Aurora for Enterprise Database Applications
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Abdul Sathar Sait
Principal Product Manager
Amazon Aurora
for Enterprise
Database Applications
Enterprise database requirements
1. Database engine with enterprise-class availability, performance, scalability, and security.
2. Managed service: instant provisioning, push-button scaling, automated backups, patching, monitoring, and migration.
Goal: Provide a fully managed, enterprise-class database service without the cost and complexity of commercial database software.
Traditional relational databases
• Gradual improvements on a decades-old design
• Must accommodate different server and storage hardware
• Too complex to tune for optimal performance
• Layers of software added to mitigate potential points of failure
• "Cloudification" achieved only by adding further layers
• High cost, with complex and punitive licensing terms
The result: multiple layers of functionality (SQL, transactions, caching, logging) in a monolithic stack.
Relational database re:Imagined
We started with a blank sheet of paper and reimagined the
relational database for the cloud
Amazon Aurora is purpose built for the cloud
Designed from the ground up using AWS technology
Distributed component architecture with built-in redundancy
High availability and scale-out are part of the core database design
Self-healing components designed for resilience
Architected for security and performance
Security
• Isolates your data within an Amazon VPC
• Encryption at rest using keys you create and manage with AWS KMS
• Data, automated backups, snapshots, and replicas in the same cluster are all automatically encrypted
• Seamless encryption and decryption, requiring no changes to your application
• Automatic encryption in transit
Encryption at rest and in transit
Enterprise-class
performance
Write performance (console screenshot)
• MySQL Sysbench
• R3.8XL with 32 cores and 244 GB RAM
• 4 client machines with 1,000 threads each

Read performance (console screenshot)
• MySQL Sysbench
• R3.8XL with 32 cores and 244 GB RAM
• Single client with 1,000 threads
Writes scale with table count

Tables   Amazon Aurora   MySQL I2.8XL (local SSD)   MySQL I2.8XL (RAM disk)   RDS MySQL 30K IOPS (single AZ)
10       60,000          18,000                     22,000                    25,000
100      66,000          19,000                     24,000                    23,000
1,000    64,000          7,000                      18,000                    8,000
10,000   54,000          4,000                      8,000                     5,000

Write-only workload, 1,000 connections; query cache default on for Amazon Aurora, off for MySQL.
Write scales with number of connections

Connections   Amazon Aurora   RDS MySQL 30K IOPS (single AZ)
50            40,000          10,000
500           71,000          21,000
5,000         110,000         13,000

OLTP workload, variable connection count, 250 tables; query cache default on for Amazon Aurora, off for MySQL.
Do less work
• Fewer I/Os to the backend
• Effective query caching
• Replica management

Do it efficiently
• Latch-free lock management
• Adaptive thread pools
• Asynchronous commits

Consistent, low-latency writes
MySQL with standby: a primary instance in AZ 1 and a standby instance in AZ 2, each on Amazon Elastic Block Store (EBS) volumes with EBS mirrors, backed up to Amazon S3.
Amazon Aurora: a primary instance in AZ 1 and a replica instance in AZ 2, with storage spanning AZ 1, AZ 2, and AZ 3 and continuous backup to Amazon S3.

Improvements
• Consistency: tolerance to outliers
• Latency: synchronous vs. asynchronous replication
• Efficiency: significantly more efficient use of network I/O
Type of writes
• MySQL with standby writes log records, the binlog (for PiTR), data pages, the double-write buffer, and FRM metadata files, as sequential writes to EBS and its mirror, replicated asynchronously to the standby.
• Amazon Aurora writes only log records, as distributed writes acknowledged on a 4/6 quorum.
Limitation of the MySQL lock manager
The lifetime of a lock in legacy systems: insert a lock into the lock table, where a single global latch allows only one active thread in the lock manager at a time.

Latch-free lock manager
In the new lock manager a lock exists for its full lifetime, but it is non-blocking:
• Atomic lock inserts
• Read-After-Write (RAW) with memory barriers for fast synchronization
• Staged allocation and de-allocation of locks in a lock hash table
• Identical semantics to MySQL locks, with concurrent latch-free operation
• Uses a specialized resource manager and implements lock compression
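The difference can be sketched in Python. This is a simplification: Python has no compare-and-swap, so per-bucket locks stand in for the atomic inserts described above, and all class and method names are illustrative, not Aurora internals.

```python
import threading
from collections import defaultdict

class GlobalLatchLockManager:
    """Legacy design: one global latch serializes every lock-table operation."""
    def __init__(self):
        self.latch = threading.Lock()
        self.table = defaultdict(list)   # resource -> list of (txn, mode)

    def acquire(self, resource, txn, mode):
        with self.latch:                 # only one thread at a time, system-wide
            self.table[resource].append((txn, mode))

class PartitionedLockManager:
    """Sketch of the latch-free idea: operations on different hash buckets
    proceed concurrently. (Aurora uses atomic CAS inserts with memory
    barriers; Python cannot express CAS, so per-bucket locks stand in.)"""
    def __init__(self, n_buckets=64):
        self.buckets = [defaultdict(list) for _ in range(n_buckets)]
        self.bucket_latches = [threading.Lock() for _ in range(n_buckets)]

    def _index(self, resource):
        return hash(resource) % len(self.buckets)

    def acquire(self, resource, txn, mode):
        i = self._index(resource)
        with self.bucket_latches[i]:     # contention only within one bucket
            self.buckets[i][resource].append((txn, mode))

    def holders(self, resource):
        i = self._index(resource)
        return list(self.buckets[i][resource])
```

Threads locking unrelated resources almost never touch the same bucket, so the single system-wide bottleneck of the legacy design disappears.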
Asynchronous group commits
Each transaction issues reads and writes and then commits; commits from concurrent transactions (T1 … Tn) arrive interleaved and are tagged with log sequence numbers (LSNs, e.g. 10, 12, 20, 22, …) as the log grows. The head node tracks the durable LSN, and pending commits wait in a commit queue in LSN order.
• Pending commits are queued in LSN order for asynchronous execution
• Commit threads scan the queue and execute multiple commits at a time
• Eliminates wait time for writes to become durable at the storage nodes
• Group execution of multiple commits at a time improves efficiency
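The queueing behavior above can be sketched as follows. This is an illustrative model, not Aurora internals; the class name and the LSN values are assumptions.

```python
import heapq

class GroupCommitQueue:
    """Pending commits wait in LSN order; once storage reports a durable
    LSN, every queued commit at or below it is acknowledged as one group."""
    def __init__(self):
        self.pending = []           # min-heap of (lsn, txn_id)
        self.acked = []             # transactions acknowledged to clients

    def commit(self, lsn, txn_id):
        # The caller does not block; the commit is merely queued.
        heapq.heappush(self.pending, (lsn, txn_id))

    def on_durable(self, durable_lsn):
        # Storage nodes reported durability up to durable_lsn: acknowledge
        # every queued commit at or below it, as one group.
        group = []
        while self.pending and self.pending[0][0] <= durable_lsn:
            group.append(heapq.heappop(self.pending)[1])
        self.acked.extend(group)
        return group

q = GroupCommitQueue()
for lsn, txn in [(10, "T1"), (22, "T2"), (34, "T3"), (49, "T4")]:
    q.commit(lsn, txn)
print(q.on_durable(40))   # acknowledges T1, T2, T3 together; T4 waits
```

No individual transaction waits for its own write to become durable; the group is acknowledged as a batch when the durable LSN advances past it.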
Designed for
high-availability
Aurora storage

Highly available by default
• 6-way replication across 3 AZs
• 4 of 6 write quorum
• Automatic fallback to 3 of 4 if an Availability Zone (AZ) is unavailable
• 3 of 6 read quorum

SSD, scale-out, multi-tenant storage
• Seamless storage scalability
• Up to 64 TB database size
• Only pay for what you use

Log-structured storage
• Many small segments, each with its own redo log
• Log pages used to generate data pages
• Eliminates chatter between database and storage
Self-healing, fault-tolerant
With six copies of the data across three AZs:
• Lose two copies, or a whole AZ, without read or write availability impact
• Lose three copies without read availability impact
• Automatic detection, replication, and repair
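The quorum rules above amount to simple arithmetic, sketched here; the function names are ours, not an AWS API.

```python
# Aurora's quorum arithmetic: 6 copies across 3 AZs, 4/6 write quorum,
# 3/6 read quorum, falling back to 3/4 after a whole AZ is lost.
TOTAL_COPIES = 6
WRITE_QUORUM = 4
READ_QUORUM = 3

def can_write(available_copies):
    return available_copies >= WRITE_QUORUM

def can_read(available_copies):
    return available_copies >= READ_QUORUM

def can_write_after_az_loss(available_copies):
    # With one AZ gone only 4 copies remain; Aurora falls back to a
    # 3-of-4 quorum on them, preserving write availability.
    return available_copies >= 3

# Lose two copies (e.g. one whole AZ): writes still succeed.
assert can_write(TOTAL_COPIES - 2)
# Lose three copies: writes stop, but reads still succeed.
assert not can_write(TOTAL_COPIES - 3) and can_read(TOTAL_COPIES - 3)
# After an AZ loss, losing no further copy keeps the 3-of-4 write quorum.
assert can_write_after_az_loss(4)
```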
Continuous backup
Periodic snapshots of each segment are taken in parallel, and the redo logs are streamed to Amazon S3, so backup happens continuously without performance or availability impact. At restore time, the appropriate segment snapshots and log streams are retrieved to the storage nodes, and the log streams are applied to the segment snapshots in parallel and asynchronously, up to the chosen recovery point.
Instant crash recovery

Traditional databases
• Must replay logs since the last checkpoint
• Single-threaded in MySQL; requires a large number of disk accesses
• A crash at T0 requires re-application of all SQL in the redo log since the last checkpoint

Amazon Aurora
• Underlying storage replays redo records on demand as part of a disk read
• Parallel, distributed, asynchronous
• A crash at T0 results in redo logs being applied to each segment on demand, in parallel, asynchronously
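The on-demand replay idea can be sketched like this. It is an illustrative model; the segment and redo-record format are assumptions, not Aurora's actual storage layout.

```python
class Segment:
    """Each storage segment keeps its last materialized page image plus a
    tail of redo records, and applies the redo lazily when the page is read."""
    def __init__(self, page):
        self.page = dict(page)      # last materialized data page
        self.redo = []              # (lsn, key, value) records not yet applied

    def append_redo(self, lsn, key, value):
        self.redo.append((lsn, key, value))

    def read(self):
        # A read triggers replay of pending redo for *this* segment only;
        # other segments replay independently and in parallel, so there is
        # no single-threaded recovery pass after a crash.
        for _, key, value in sorted(self.redo):
            self.page[key] = value
        self.redo.clear()
        return self.page

seg = Segment({"a": 1})
seg.append_redo(11, "a", 2)
seg.append_redo(12, "b", 7)
# No replay has happened yet; a crash here loses nothing durable.
assert seg.read() == {"a": 2, "b": 7}
```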
Faster, more predictable failover
With MySQL, a DB failure is followed by failure detection, DNS propagation, and two recovery phases before the app is running again, roughly 15-20 sec in total. With Aurora and the MariaDB driver, the same sequence needs only a single recovery phase, roughly 3-20 sec.
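On the client side, a failover-aware retry loop along these lines is what lets a driver reconnect quickly. This is a generic sketch, not the MariaDB driver's actual logic; the endpoint names and the `connect` callable are illustrative.

```python
import time

def connect_with_failover(connect, endpoints, retries=10, delay=0.5):
    """Cycle through cluster endpoints until one accepts a connection.
    `connect` is any callable that raises ConnectionError on failure."""
    last_err = None
    for attempt in range(retries):
        endpoint = endpoints[attempt % len(endpoints)]
        try:
            return connect(endpoint)
        except ConnectionError as err:
            last_err = err
            time.sleep(delay)       # wait out DNS propagation / promotion
    raise last_err

# Usage with a stub: the first endpoint is "down", the second succeeds.
def fake_connect(endpoint):
    if endpoint == "writer.db.example":
        raise ConnectionError("primary unreachable")
    return f"connected:{endpoint}"

conn = connect_with_failover(fake_connect,
                             ["writer.db.example", "replica1.db.example"],
                             delay=0)
assert conn == "connected:replica1.db.example"
```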
Simulate failures using SQL

To cause the failure of a component at the database node:
ALTER SYSTEM CRASH [{INSTANCE | DISPATCHER | NODE}]

To simulate the failure of disks:
ALTER SYSTEM SIMULATE percent_failure DISK failure_type IN [DISK index | NODE index] FOR INTERVAL interval

To simulate the failure of networking:
ALTER SYSTEM SIMULATE percent_failure NETWORK failure_type [TO {ALL | read_replica | availability_zone}] FOR INTERVAL interval

To simulate the failure of an Aurora Replica:
ALTER SYSTEM SIMULATE percentage_of_failure PERCENT READ REPLICA FAILURE [TO ALL | TO "replica name"] FOR INTERVAL interval
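For example, a small helper can assemble the replica-failure statement before it is sent through any MySQL-compatible client. The helper name is ours; only string construction happens here, and executing the result requires a live connection to an Aurora cluster.

```python
def simulate_read_replica_failure(percent, target="ALL", interval="1 MINUTE"):
    """Build the fault-injection statement shown above.

    percent  -- percentage of requests to fail
    target   -- "ALL" or a specific replica name
    interval -- duration clause, e.g. "1 MINUTE"
    """
    to_clause = "TO ALL" if target == "ALL" else f'TO "{target}"'
    return (f"ALTER SYSTEM SIMULATE {percent} PERCENT "
            f"READ REPLICA FAILURE {to_clause} FOR INTERVAL {interval}")

stmt = simulate_read_replica_failure(100, interval="1 MINUTE")
assert stmt == ("ALTER SYSTEM SIMULATE 100 PERCENT "
                "READ REPLICA FAILURE TO ALL FOR INTERVAL 1 MINUTE")
```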
Delivered as a
managed service
Databases are hard to manage
RDS platform: managing databases made easy

You focus on: schema design, query construction, and query optimization.
Amazon RDS handles: backup & recovery, isolation & security, industry compliance, push-button scaling, automated patching, advanced monitoring, and routine maintenance.

Amazon RDS takes care of your time-consuming database management tasks, freeing you to focus on your applications and business.
Advanced monitoring
• Single-page dashboard for OS and process diagnostics in the AWS console
• Customize the dashboard with your choice of metrics and layout
• Add alarms on specific metrics
• Metrics egress via CloudWatch Logs into third-party monitoring tools such as Graphite
• Support for metrics crossover into CloudWatch

Metrics such as load average, detailed CPU utilization, detailed disk I/O, and per-process statistics are provided at granularities ranging from 60 seconds down to 1 second.
Applications becoming more complex
A typical stack now spans on-premises databases and .NET middleware, Web 2.0 front ends (browser logic, AJAX, web frameworks), big data systems (Hadoop and Cassandra on Amazon EC2), and cloud services (Amazon EC2, Amazon RDS, Amazon ElastiCache).

Monitoring across the stack is key to minimizing downtime:
• Access to information from every potential point of failure
• Alarm and notification system for pre-emptive action
• Rich visualization of aggregated data at the user's convenience
• Integrations with tools and dashboards
AWS Database Migration Service
• Move data to the same or a different database engine
• Keep your apps running during the migration
• Start your first migration in 10 minutes or less
• Replicate within, to, or from Amazon EC2 or RDS
http://aws.amazon.com/dms

Replication runs from the customer premises to AWS over the Internet or a VPN, while application users keep working.
• Start a replication instance
• Connect to source and target databases
• Select tables, schemas, or databases
• Let the AWS Database Migration Service create tables, load data, and keep them in sync
• Switch applications over to the target at your convenience
Keep your apps running during the migration
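The steps above map onto the AWS DMS API roughly as follows. This sketch only assembles the parameters that boto3's `create_replication_task` expects, with placeholder ARNs and no AWS call made; the task identifier and schema default are our own choices.

```python
import json

def dms_task_params(source_arn, target_arn, instance_arn, schema="%"):
    """Assemble parameters for a full-load-plus-CDC migration task, shaped
    like the keyword arguments of boto3's dms client create_replication_task.
    The ARNs are placeholders supplied by the caller."""
    table_mappings = {
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": schema, "table-name": "%"},
            "rule-action": "include",
        }]
    }
    return {
        "ReplicationTaskIdentifier": "first-migration",
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": "full-load-and-cdc",   # load once, then keep in sync
        "TableMappings": json.dumps(table_mappings),
    }

params = dms_task_params("arn:src", "arn:tgt", "arn:inst")
assert params["MigrationType"] == "full-load-and-cdc"
```

With real endpoint and instance ARNs, these parameters would be passed as `client.create_replication_task(**params)`; "full-load-and-cdc" is what keeps the source and target in sync until cutover.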
Migrate off Oracle and SQL Server
Move your tables, views, stored procedures, and DML to MySQL, MariaDB, and Amazon Aurora. The AWS Schema Conversion Tool highlights where manual edits are needed.
http://aws.amazon.com/sct

The AWS Database Migration Service is in open preview now. Try it out yourself.
Perfect fit for the enterprise

Performance and scale
• Up to 500K/sec read and 100K/sec write throughput
• 15 low-latency (~10 ms) Read Replicas
• Up to 64 TB optimized storage volume

Enterprise-class availability
• 6-way replication across 3 AZs
• Failover in less than 30 seconds
• Near-instant crash recovery

Fully managed service
• Instant provisioning and deployment
• Automated patching and software upgrades
• Backup and point-in-time recovery
• Compute and storage scaling

Comparing features: many are unique to Amazon Aurora. In traditional commercial databases like Oracle:
• Comparable capabilities are available only in the most expensive edition (Enterprise Edition)
• Failover and replicas: Oracle Active Data Guard, extra $$$ per core
• Backup to S3: Oracle Secure Backup Cloud Module, extra $$$ per channel
• Encryption: Oracle Advanced Security, extra $$$ per core
Don't be constrained by licenses, cost, or capacity.

Simple pricing: no licenses, no lock-in, pay only for what you use.
Discounts: 44% with a 1-year RI; 63% with a 3-year RI.

Instance        vCPU   Mem (GiB)   Hourly price
db.r3.large      2      15.25       $0.29
db.r3.xlarge     4      30.5        $0.58
db.r3.2xlarge    8      61          $1.16
db.r3.4xlarge   16     122          $2.32
db.r3.8xlarge   32     244          $4.64

• Storage consumed, up to 64 TB, is billed at $0.10/GB-month
• I/Os consumed are billed at $0.20 per million I/Os
• Prices are for the Virginia region

Enterprise grade, open source pricing
Cost of ownership: Aurora vs. commercial databases

Oracle on EC2 configuration (hourly cost): a primary, a standby, and two replicas, each an r3.8XL instance at $2.93/hr with an Oracle Enterprise license at $15.78/hr and a 6 TB / 30K PIOPS storage volume at $3.75/hr.
• Instance cost: $11.72/hr
• License cost: $63.12/hr
• Storage cost: $15.00/hr
• Total cost: $89.84/hr
Cost of ownership: Aurora vs. Oracle

Aurora configuration (hourly cost): a primary and two replicas, each a db.r3.8xlarge at $4.64/hr, sharing a single 6 TB storage volume at $5.15/hr.
• Instance cost: $13.92/hr
• Storage cost: $5.15/hr
• Total cost: $19.07/hr

Storage IOPS assumptions:
1. Average IOPS is 50% of max IOPS
2. 50% savings from shipping logs instead of full pages

78.7% savings:
• No idle standby instance
• Single shared storage volume
• No PIOPS; pay-per-use I/O
• Reduction in overall I/O
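The arithmetic behind the two cost slides can be checked directly; the instance counts and hourly prices below are taken from the slides.

```python
# Oracle on EC2: primary + standby + 2 replicas, all r3.8XL.
ORACLE_INSTANCES = 4
oracle_total = (ORACLE_INSTANCES * 2.93      # EC2 instance hours
                + ORACLE_INSTANCES * 15.78   # Enterprise licenses
                + ORACLE_INSTANCES * 3.75)   # 6 TB / 30K PIOPS volumes
assert round(oracle_total, 2) == 89.84

# Aurora: primary + 2 replicas sharing one storage volume.
AURORA_INSTANCES = 3
aurora_total = AURORA_INSTANCES * 4.64 + 5.15
assert round(aurora_total, 2) == 19.07

savings = 1 - aurora_total / oracle_total
print(f"{savings:.1%}")   # the slide states 78.7%
```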
Enterprise use cases
Fastest growing service in AWS history, with 1,000+ customers after 10 days of launch.

Common customer use cases
• Web and mobile
• Content management
• E-commerce, retail
• Internet of Things
• Search, advertising
• BI and analytics
• Games, media
Expedia: online travel marketplace
World's leading online travel company, with a portfolio that includes 150+ travel sites in 70 countries.

The goal: real-time business intelligence and analytics on a growing corpus of online travel marketplace data. The current SQL Server-based architecture is too expensive, and performance degrades as data volume grows; Cassandra with a Solr index requires a large memory footprint and hundreds of nodes, adding cost.

Aurora benefits: Aurora meets the scale and performance requirements at much lower cost: 25,000 inserts/sec with peaks up to 70,000, and 30 ms average response time for writes and 17 ms for reads, with 1 month of data.
Alfresco: Enterprise Content Management
Provides enterprise content management software built on open standards. Needed a database that could scale without any degradation in performance.

Benefits: Alfresco on Amazon Aurora scaled to 1 billion documents with a throughput of 3 million per hour, which is 10 times faster than their MySQL environment.

Alfresco One architecture: Alfresco Share, the Alfresco Repository, the Activiti workflow engine, and Alfresco SOLR run on EC2; storage tiers include RDS for the database, S3 (or Glacier) for the file-system content store, and EBS or ephemeral / PIOPS EBS volumes for the indexes.
Benchmark environment: 1.2B docs
• UI test: 20 × m3.2xlarge simulating 500 users (Selenium/Firefox, 1 hour constant load, 10 sec think time), behind an ELB
• Alfresco: 10 × c3.2xlarge running Alfresco with Share and Repo
• Solr: 20 × m3.2xlarge as a sharded Solr Cloud
• Aurora: 1 × db.r3.xlarge
• Data loaded in place, simulating AWS Import/Export

Sites     Folders     Files            Transactions   DB size (GB)
10,804    1,168,206   1,168,206,000    15,475,064     3,185
Benchmark results
• Document load rate of 1,000 documents per second (with 10 nodes)
• Load rate remained consistent even past the 1B-document mark
• Sub-second login times and good response times for other actions:
  • Open library: 4.5 s
  • Page results: 1 s
  • Navigate to site: 2.3 s
• Aurora indexes used efficiently at 3.2 TB
• No indication of any size-related bottlenecks with 1.1 billion documents
• CPU loads: database 8-10%; Alfresco (each of 10 nodes) 25-30%
Insurance claims processing
• ISCS provides fully integrated policy management, claims, and billing solutions for property/casualty insurance organizations
• For the last 12 years, ISCS has used commercial SQL Server and Oracle databases for operational and warehouse data
• The cost and maintenance of traditional commercial databases has increasingly become the biggest expenditure and maintenance headache
• Maintaining its customer SLAs requires complex, difficult-to-manage replication and redundancy across multiple geographic locations
• As customer data grows, backup/restore times for its largest data sets have grown to unacceptable levels
Aurora benefits
SQL Server backups that once took 5-6 hours daily now happen continuously on Aurora. Snapshots from one customer database (~ 5TB in size) take 5 minutes to make and less than an hour to restore. ISCS can actually test disaster recovery daily if it wanted to.
Data that was once only available “daily, batch” into Redshift can now be migrated continuously using Aurora read-replicas and Change Data Capture (CDC).
Performance at scale is linear since ISCS’s application, like Aurora, is optimized for multiple, concurrent read requests to the database.
Multi-AZ Aurora read-replicas also eliminate the need for additional licenses/deployments of SQL Server.
The cost of a “more capable” deployment on Aurora has proven to be about 70% less than ISCS’s SQL Server deployments.
Amazon Aurora: Earth Networks
Operates the world's largest weather and lightning sensor networks and technology. Earth Networks processes over 25 terabytes of real-time data daily, so it needs a scalable database that can grow rapidly with its expanding data analysis.

Benefits: Aurora's performance and scalability work well with their rapid data growth, and moving from SQL Server to Aurora was very easy.
Thank you!