AWS Webcast - Introducing Amazon Redshift
Transcript of AWS Webcast - Introducing Amazon Redshift
© 2011 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.
Introducing Amazon Redshift
Amazon’s Data Warehouse as a Service
Ben Butler, Solutions Architect
Worldwide Public Sector
What is Amazon Web Services?
AWS Global Infrastructure
Application Services
Networking
Deployment & Administration
Database Storage Compute
AWS Database Services
• Amazon RDS: fully managed SQL database service for OLTP workloads
• Amazon DynamoDB: fully managed NoSQL service for massively scalable, high-throughput, low-latency workloads
• Amazon Redshift: fully managed, fast and powerful, petabyte-scale data warehouse service
• Amazon ElastiCache: fully managed, Memcached-compliant in-memory caching service
Traditional data warehousing is expensive and complicated
Expensive Hardware and Software
Complex Tuning and Admin
Enterprises average between 3 and 4 DBAs per data warehouse
Source: Oracle technology global price list 11/1/2012
Gartner: Critical factors in calculating the data warehouse TCO, July 2009
Customers Aren’t Happy with Today’s Solutions
Large companies: expensive, hard to scale
Small companies: can’t afford to have a data warehouse
Data warehousing done the AWS way
• Pay as you go, no upfront costs
• Fast, cheap, easy to use
• SQL
• Provision in minutes
Introducing Amazon Redshift
Data Warehousing the AWS Way
Easily and rapidly analyze petabytes of data
1/10 the cost of traditional data warehouses
Automated deployment & administration
Compatible with popular BI tools
Most data never makes it to a data warehouse
Chart: The Data Analysis Gap (Enterprise Data vs. Data in Warehouse, 1990 to 2020)
Enterprise data is growing at over 50% a year
Data warehousing is growing at less than 10% a year
Most data is left on the floor
Sources:
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
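The gap compounds: if both volumes grew at constant rates, the share of enterprise data held in a warehouse would shrink by a factor of (1 + warehouse growth) / (1 + data growth) every year. The short Python sketch below assumes, purely for illustration, that the slide's 50% and 10% figures hold constant and that all enterprise data starts out warehoused; `warehouse_share` is a hypothetical helper, not part of any AWS tool.

```python
def warehouse_share(years: int, data_growth: float = 0.50, dw_growth: float = 0.10) -> float:
    """Share of enterprise data held in the warehouse after `years`.

    Assumes (hypothetically) a starting share of 1.0, i.e. all data warehoused,
    and constant annual growth rates for both volumes.
    """
    return ((1 + dw_growth) / (1 + data_growth)) ** years


for years in (1, 5, 10):
    print(f"after {years:2d} years: {warehouse_share(years):.1%} of enterprise data is warehoused")
```

Under these assumptions the warehoused share falls to about 4.5% after ten years.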
We set out to build… A fast and powerful, petabyte-scale data warehouse that is:
A Lot Faster
A Lot Cheaper
A Lot Simpler
Amazon Redshift, delivered as a managed service
Data warehousing performance is all about I/O
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
 ID  | Age | State | Amount
-----+-----+-------+--------
 123 | 20  | CA    | 500
 345 | 25  | WA    | 250
 678 | 40  | FL    | 125
 957 | 37  | WA    | 375
• With row storage, you do unnecessary I/O
• To get the total amount, you have to read everything
• With column storage, you only read the data you need
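To make the row-versus-column contrast concrete, the Python sketch below totals the Amount column of the four-row example table both ways and counts how many stored values each layout has to touch. The helper names are hypothetical and the sketch says nothing about Redshift's actual storage internals.

```python
# The four-row example table from the slides: (ID, Age, State, Amount).
ROWS = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]
COLUMNS = ["ID", "Age", "State", "Amount"]


def total_amount_row_store(rows):
    """Row layout: each whole row is read just to reach its Amount field."""
    amount_idx = COLUMNS.index("Amount")
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # every field of the row comes off disk
        total += row[amount_idx]
    return total, values_read


def to_column_store(rows):
    """Pivot the rows into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def total_amount_column_store(columns):
    """Column layout: only the Amount column is read."""
    amounts = columns["Amount"]
    return sum(amounts), len(amounts)


print(total_amount_row_store(ROWS))
print(total_amount_column_store(to_column_store(ROWS)))
```

Both return a total of 1,250, but the row layout touches 16 values while the column layout touches only 4.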
Data compression
• Columnar compression saves space & reduces I/O
• Amazon Redshift analyzes and compresses your data
analyze compression listing;
 Table   |     Column     | Encoding
---------+----------------+----------
 listing | listid         | delta
 listing | sellerid       | delta32k
 listing | eventid        | delta32k
 listing | dateid         | bytedict
 listing | numtickets     | bytedict
 listing | priceperticket | delta32k
 listing | totalprice     | mostly32
 listing | listtime       | raw
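Encodings such as `delta` in the output above take advantage of neighboring values in a column being close together. A minimal Python sketch of the general idea follows, assuming a hypothetical sorted list of `listid` values; it illustrates delta encoding in general and is not Redshift's on-disk format.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each value as the difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(encoded):
    """A running sum of the deltas restores the original values."""
    return list(accumulate(encoded))


def fits_in_one_byte(deltas):
    """True if every delta fits in a signed 8-bit integer."""
    return all(-128 <= d <= 127 for d in deltas)


listids = [1000, 1001, 1003, 1006, 1010]  # hypothetical sorted listid values
encoded = delta_encode(listids)
print(encoded)                        # first value, then small differences
print(fits_in_one_byte(encoded[1:]))  # small deltas need far fewer bytes than raw values
print(delta_decode(encoded) == listids)
```

The deltas here are 1, 2, 3, and 4, each small enough for a single byte, while the raw values would need wider integers.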
Zone maps
• Keep track of the minimum and maximum value for each block
• Skip over blocks that don’t contain the data needed for a given query
• Minimize unnecessary I/O
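A zone map is just a (min, max) pair per block. The Python sketch below uses a hypothetical four-value block size to show how a range predicate can skip whole blocks without reading them; pruning works best when the column is sorted, as the example column is.

```python
from typing import List, Tuple

BLOCK_SIZE = 4  # hypothetical and tiny so the example stays readable


def build_zone_map(values: List[int], block_size: int = BLOCK_SIZE):
    """Split a column into blocks and record each block's (min, max)."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    zone_map = [(min(b), max(b)) for b in blocks]
    return blocks, zone_map


def scan_range(blocks, zone_map, lo: int, hi: int) -> Tuple[List[int], int]:
    """Return the values in [lo, hi] and how many blocks actually had to be read."""
    matches: List[int] = []
    blocks_read = 0
    for block, (block_min, block_max) in zip(blocks, zone_map):
        if block_max < lo or block_min > hi:
            continue  # the zone map proves no value in this block can match
        blocks_read += 1
        matches.extend(v for v in block if lo <= v <= hi)
    return matches, blocks_read


blocks, zone_map = build_zone_map(list(range(1, 17)))
print(scan_range(blocks, zone_map, 6, 10))  # reads 2 of the 4 blocks
```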
Direct-attached storage and large data block sizes
• Use direct-attached storage to maximize throughput
• Hardware optimized for high-performance data processing
• Large block sizes to make the most of each read
• Amazon Redshift manages durability for you
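Larger blocks mean fewer separate reads to scan the same amount of data. This Python sketch only does the arithmetic; the 8 KiB and 1 MiB block sizes are assumptions chosen for illustration, not figures from the slide.

```python
import math

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def reads_needed(column_bytes: int, block_bytes: int) -> int:
    """Number of block reads needed to scan a column of the given size."""
    return math.ceil(column_bytes / block_bytes)


# Illustrative block sizes (assumptions): a small page vs. a large block.
for label, block in (("8 KiB blocks", 8 * KIB), ("1 MiB blocks", MIB)):
    print(label, reads_needed(GIB, block))
```

Scanning 1 GiB takes 131,072 reads with 8 KiB blocks and 1,024 reads with 1 MiB blocks.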
Amazon Redshift architecture
Leader node
• SQL endpoint
• Stores metadata
• Coordinates query execution
Compute nodes
• Local, columnar storage
• Execute queries in parallel
• Load, backup, and restore via Amazon S3
• Parallel load from Amazon DynamoDB
Single-node version available
Diagram labels: JDBC/ODBC, 10 GigE (HPC), Ingestion, Backup, Restore
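The leader/compute split follows a scatter-gather pattern: the leader hands each compute node the same step, each node aggregates its own local data, and the leader combines the partial results. The Python sketch below simulates that with threads standing in for compute nodes; the function names and row shape are hypothetical, and this is not how Redshift is implemented internally.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Row = Tuple[int, int]  # (id, amount); hypothetical row shape


def compute_node_sum(local_rows: List[Row]) -> int:
    """Work done on one simulated compute node: a partial SUM over its own rows."""
    return sum(amount for _, amount in local_rows)


def leader_total(node_slices: List[List[Row]]) -> int:
    """Simulated leader node: run the step on every compute node in parallel, then combine."""
    with ThreadPoolExecutor(max_workers=max(1, len(node_slices))) as pool:
        partial_sums = list(pool.map(compute_node_sum, node_slices))
    return sum(partial_sums)


# The sample rows split across two simulated compute nodes.
slices = [
    [(123, 500), (345, 250)],
    [(678, 125), (957, 375)],
]
print(leader_total(slices))
```

Aggregations like SUM fit this pattern because partial sums can be combined in any order.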
Amazon Redshift runs on optimized hardware
HS1.8XL: 128 GB RAM, 16 cores, 24 spindles, 16 TB compressed user storage, 2 GB/sec scan rate
HS1.XL: 16 GB RAM, 2 cores, 3 spindles, 2 TB compressed user storage
Optimized for I/O-intensive workloads
High disk density
Runs in HPC: fast network
HS1.8XL available on Amazon EC2
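Given a per-node scan rate, the idealized time to scan a data set is its size divided by (number of nodes × scan rate). This Python sketch uses the slide's 2 GB/sec HS1.8XL figure and assumes decimal units (1 TB = 1,000 GB), data spread evenly across nodes, and perfectly linear scaling; real queries will not reach this ideal, and `full_scan_seconds` is a hypothetical helper.

```python
SCAN_RATE_GB_PER_SEC = 2.0  # per HS1.8XL node, from the slide


def full_scan_seconds(data_gb: float, nodes: int, rate_gb_per_sec: float = SCAN_RATE_GB_PER_SEC) -> float:
    """Idealized scan time when every node scans an equal share in parallel."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return data_gb / (nodes * rate_gb_per_sec)


print(full_scan_seconds(16_000, 1))  # one node's full 16 TB
print(full_scan_seconds(16_000, 8))  # the same data spread over 8 nodes
```

Under those assumptions one node scans its full 16 TB in 8,000 seconds (about 2.2 hours), and the same data spread over eight nodes takes 1,000 seconds.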
Amazon Redshift parallelizes and distributes everything
Query
Load
Backup/Restore
Resize
Load
• Load in parallel from Amazon S3 or Amazon DynamoDB
• Data is automatically distributed and sorted
• Loading scales linearly with the number of nodes
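One common way to distribute rows automatically is to hash a distribution key, take the result modulo the number of nodes, and then sort each node's rows on a sort key. The Python sketch below does exactly that; using `zlib.crc32` as the hash is an assumption for illustration (Redshift's internal hash function is not described here), `distribute_and_sort` is hypothetical, and the rows come from the earlier sample table.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def distribute_and_sort(rows: List[Dict], dist_key: str, sort_key: str, nodes: int) -> Dict[int, List[Dict]]:
    """Place each row on a node by hashing its distribution key (CRC32 here,
    an illustrative choice), then sort every node's rows by the sort key."""
    placement: Dict[int, List[Dict]] = defaultdict(list)
    for row in rows:
        node = zlib.crc32(str(row[dist_key]).encode("utf-8")) % nodes
        placement[node].append(row)
    return {node: sorted(node_rows, key=lambda r: r[sort_key])
            for node, node_rows in placement.items()}


rows = [
    {"id": 123, "state": "CA", "amount": 500},
    {"id": 345, "state": "WA", "amount": 250},
    {"id": 678, "state": "FL", "amount": 125},
    {"id": 957, "state": "WA", "amount": 375},
]
for node, node_rows in sorted(distribute_and_sort(rows, "state", "amount", nodes=2).items()):
    print(node, [r["id"] for r in node_rows])
```

Because the hash is deterministic, rows that share a distribution key (here, both WA rows) always land on the same node.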
Backup/Restore
• Backups to Amazon S3 are automatic, continuous, and incremental
• Configurable system snapshot retention period
• Take user snapshots on demand
• Streaming restores enable you to resume querying faster
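Incremental backups only need to copy what changed since the previous snapshot. A minimal Python sketch of that idea, assuming blocks are compared by SHA-256 fingerprint; the manifest format and function names are hypothetical and are not the format Redshift uses in Amazon S3.

```python
import hashlib
from typing import Dict, List, Tuple


def block_fingerprints(blocks: List[bytes]) -> Dict[int, str]:
    """SHA-256 fingerprint of every block, keyed by block number."""
    return {i: hashlib.sha256(block).hexdigest() for i, block in enumerate(blocks)}


def incremental_snapshot(blocks: List[bytes], previous: Dict[int, str]) -> Tuple[List[int], Dict[int, str]]:
    """Return (block numbers to upload, manifest for this snapshot).
    Blocks whose fingerprint is unchanged are not uploaded again."""
    current = block_fingerprints(blocks)
    to_upload = [i for i, digest in current.items() if previous.get(i) != digest]
    return to_upload, current


blocks = [b"block-0", b"block-1", b"block-2"]
first_upload, manifest = incremental_snapshot(blocks, previous={})
blocks[1] = b"block-1, updated"
second_upload, manifest = incremental_snapshot(blocks, previous=manifest)
print(first_upload, second_upload)
```

The first snapshot uploads every block; the second uploads only block 1, the one that changed.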
Resize
• Resize while remaining online
• A new cluster is provisioned in the background
• Data is copied in parallel from node to node
• You are charged only for the source cluster
• Automatic SQL endpoint switchover via DNS
• Decommission the source cluster
• Simple operation via the AWS Console or API
Amazon Redshift lets you start small and grow big
Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores
• Single node (2 TB)
• Cluster of 2-32 nodes (4 TB – 64 TB)
Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
• Cluster of 2-100 nodes (32 TB – 1.6 PB)
Note: nodes not to scale
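The capacity ranges on this slide are simply nodes × per-node storage. The Python sketch below encodes the slide's node types and cluster sizes in a hypothetical `NODE_TYPES` table and checks requests against them; it is a calculator for the slide's numbers, not an AWS API.

```python
from typing import Dict

# Per-node compressed storage and cluster sizes as listed on the slide.
NODE_TYPES: Dict[str, Dict[str, int]] = {
    "HS1.XL": {"tb_per_node": 2, "min_nodes": 1, "max_nodes": 32},
    "HS1.8XL": {"tb_per_node": 16, "min_nodes": 2, "max_nodes": 100},
}


def cluster_capacity_tb(node_type: str, nodes: int) -> int:
    """Compressed user storage for a cluster: nodes x per-node storage."""
    spec = NODE_TYPES[node_type]
    if not spec["min_nodes"] <= nodes <= spec["max_nodes"]:
        raise ValueError(
            f"{node_type} clusters have {spec['min_nodes']}-{spec['max_nodes']} nodes, got {nodes}"
        )
    return nodes * spec["tb_per_node"]


print(cluster_capacity_tb("HS1.XL", 32))    # 64 TB
print(cluster_capacity_tb("HS1.8XL", 100))  # 1,600 TB = 1.6 PB
```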
Amazon Redshift is priced to let you analyze all your data
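 Pricing option     | Price per hour, HS1.XL single node | Effective hourly price per TB | Effective annual price per TB
--------------------+------------------------------------+-------------------------------+------------------------------
 On-Demand          | $0.850                             | $0.425                        | $3,723
 1-Year Reservation | $0.500                             | $0.250                        | $2,190
 3-Year Reservation | $0.228                             | $0.114                        | $999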
Simple Pricing
Number of nodes × cost per hour
No charge for the leader node
No upfront costs
Pay as you go
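The per-TB figures in the table follow from the node price. An HS1.XL node holds 2 TB, so the effective hourly price per TB is half the node price, and the annual figure multiplies that by 8,760 hours. The Python sketch below reproduces the table under that reading; the helper names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760
TB_PER_HS1_XL_NODE = 2


def effective_hourly_per_tb(node_hourly_price: float, tb_per_node: int = TB_PER_HS1_XL_NODE) -> float:
    """Node price spread over the node's compressed storage."""
    return node_hourly_price / tb_per_node


def effective_annual_per_tb(node_hourly_price: float, tb_per_node: int = TB_PER_HS1_XL_NODE) -> float:
    """Effective hourly price per TB over a full year."""
    return effective_hourly_per_tb(node_hourly_price, tb_per_node) * HOURS_PER_YEAR


def cluster_hourly_price(nodes: int, node_hourly_price: float) -> float:
    """Simple pricing: number of nodes x cost per hour (no charge for the leader node)."""
    return nodes * node_hourly_price


for plan, price in (("On-Demand", 0.850), ("1-Year Reservation", 0.500), ("3-Year Reservation", 0.228)):
    print(f"{plan}: ${effective_hourly_per_tb(price):.3f}/TB/hour, ${effective_annual_per_tb(price):,.0f}/TB/year")
```

It prints $0.425, $0.250, and $0.114 per TB per hour and $3,723, $2,190, and $999 per TB per year, matching the table.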
Amazon Redshift is easy to use
Provision in minutes
Monitor query performance
Point-and-click resize
Built-in security
Automatic backups
Provision a data warehouse in minutes
Monitor query performance
Deep dive analysis
Point-and-click resize
Amazon Redshift has security built in
SSL to secure data in transit
Encryption to secure data at rest
• AES-256, hardware accelerated
• All blocks on disk and in Amazon S3 are encrypted
No direct access to compute nodes
Amazon VPC support
Diagram labels: JDBC/ODBC, Customer VPC, Internal VPC, 10 GigE (HPC), Ingestion, Backup, Restore
Amazon Redshift continuously backs up your data and recovers from failures
Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times
Backups to Amazon S3 are continuous, automatic, and incremental
• Designed for eleven nines of durability
Continuous monitoring and automated recovery from failures of drives and nodes
Able to restore snapshots to any Availability Zone within a region
Amazon Redshift integrates with multiple data sources
• Amazon DynamoDB
• Amazon Elastic MapReduce
• Amazon Simple Storage Service (S3)
• Amazon Elastic Compute Cloud (EC2)
• AWS Storage Gateway Service
• Corporate data center
• Amazon Relational Database Service (RDS)
Amazon Redshift provides multiple data loading options
Upload to Amazon S3
AWS Import/Export
AWS Direct Connect
Work with a partner: data integration, systems integrators
Amazon Redshift works with your existing analysis tools
Diagram labels: JDBC/ODBC, Amazon Redshift
Pilot results have been dramatic
Tested a 2-billion-row data set with 6 representative queries on a 2-node Amazon Redshift cluster
Queries ran between 12x and 150x faster
Current environment: 32 nodes, 128 CPUs, 4.2 TB RAM, 1.6 PB disk
Reporting Warehouse
Accelerated operational reporting
Support for short-time use cases
Data compression, index redundancy
Diagram labels: OLTP, ERP, RDBMS, Redshift, Reporting and BI
Data Integration Partners*
On-Premises Integration
Diagram labels: OLTP, ERP, RDBMS, Redshift, Reporting and BI
* as of 3/14/2013
Live Archive for (Structured) Big Data
Direct integration with the COPY command
High-velocity data ages into Redshift
Low-cost, high-scale option for new apps
Diagram labels: OLTP, Web Apps, DynamoDB, Redshift, Reporting and BI
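Aging high-velocity data into the warehouse amounts to periodically separating recent records from older ones and loading the older ones into Redshift (for example with the COPY command). The Python sketch below shows only the split step; the record layout with a `ts` field, the fixed example date, and the `age_out` helper are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def age_out(records: List[Dict], now: datetime, max_age: timedelta) -> Tuple[List[Dict], List[Dict]]:
    """Split records into (still hot, ready to archive) by their `ts` timestamp."""
    cutoff = now - max_age
    hot = [r for r in records if r["ts"] >= cutoff]
    to_archive = [r for r in records if r["ts"] < cutoff]
    return hot, to_archive


now = datetime(2013, 3, 14, 12, 0)  # fixed example date
records = [
    {"id": 1, "ts": now - timedelta(days=40)},
    {"id": 2, "ts": now - timedelta(days=3)},
    {"id": 3, "ts": now - timedelta(hours=1)},
]
hot, to_archive = age_out(records, now, timedelta(days=30))
print([r["id"] for r in hot], [r["id"] for r in to_archive])
```

Only the 40-day-old record crosses the 30-day cutoff and is ready to move into the warehouse.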
Cloud ETL for Big Data
Maintain online SQL access to historical logs
Transformation and enrichment with EMR
Longer history ensures better insight
Diagram labels: S3, EMR, Redshift, Reporting and BI
Resources & Questions
Ben Butler | [email protected]
Amazon Redshift on AWS - http://aws.amazon.com/redshift
Marketplace - https://aws.amazon.com/marketplace/redshift/
Documentation/User Guide - http://aws.amazon.com/documentation/redshift/
Best Practices
• http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html
• http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html
http://aws.amazon.com/resources/databaseservices/webinars