AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
Leveraging Amazon Redshift for your Data Warehouse
![Page 1: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/1.jpg)
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved
Leveraging Amazon Redshift for Your Data Warehouse
John Loughlin, Solutions Architect @ AWS
Kyle Hubert, Principal Data Architect @ Simulmedia
![Page 2: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/2.jpg)
Petabyte scale
Massively parallel
Relational data warehouse
Fully managed; zero admin
Amazon
Redshift
a lot faster
a lot cheaper
a whole lot simpler
![Page 3: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/3.jpg)
Diagram (Collect, Store, Analyze): AWS Direct Connect, Amazon Kinesis, Amazon S3, Amazon DynamoDB, Amazon Glacier, AWS Data Pipeline, Amazon Redshift, Amazon EMR, Amazon EC2
![Page 4: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/4.jpg)
Common customer use cases
Traditional enterprise DW
• Reduce costs by extending DW rather than adding HW
• Migrate completely from existing DW systems
• Respond faster to business
Companies with big data
• Improve performance by an order of magnitude
• Make more data available for analysis
• Access business data via standard reporting tools
SaaS companies
• Add analytic functionality to applications
• Scale DW capacity as demand grows
• Reduce HW and SW costs by an order of magnitude
![Page 5: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/5.jpg)
Amazon.com enterprise data warehouse
• Generates weblogs at 2 terabytes/day, growing 67% YoY
• Oracle RAC legacy system
– Scan rate: 1 week of data/hour
– Hit RAC node limit of 32 nodes
– More data => Slower queries
• Migrated to Amazon Redshift
– Scan rate: 15 months of data (2.25 trillion rows) in 14 minutes
– More than 10x performance with 100-node cluster
– 21 billion rows joined with 10 billion rows in under 2 hours, down from days
![Page 6: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/6.jpg)
Amazon Redshift architecture
• Leader node
– SQL endpoint, JDBC/ODBC
– Stores metadata
– Coordinates query execution
• Compute nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3
– Load from Amazon DynamoDB or SSH
• Two hardware platforms
– Optimized for data processing
– DS2: HDD; scale from 2 TB to 2 PB
– DC1: SSD; scale from 160 GB to 326 TB
Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion, Backup, Restore
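The leader/compute split on this slide can be pictured as a scatter-gather pattern. The sketch below is only an illustration of that idea, not how Redshift is implemented: the function names and the sample values are hypothetical, and threads stand in for compute nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_partial_sum(local_rows):
    """Work done on one compute node: aggregate only its local slice."""
    return sum(local_rows)


def leader_total(slices):
    """Leader-node role: fan the query out, then combine partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(compute_node_partial_sum, slices))
    return sum(partials)


# Hypothetical data: four values spread over two compute nodes.
node_slices = [[500, 250], [125, 375]]
total = leader_total(node_slices)
print(total)
```

Each node only touches its own data, and the leader combines small partial results, which is why adding nodes can raise aggregate scan throughput.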
![Page 7: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/7.jpg)
Amazon Redshift node types
• Optimized for I/O intensive workloads
• High disk density
• On demand at $0.85/hour
• As low as $1,000/TB/year
• Scale from 2 TB to 2 PB
DS2.XL: 31 GB RAM, 2 cores, 2 TB compressed storage, 0.5 GB/sec scan
DS2.8XL: 244 GB RAM, 16 cores, 16 TB compressed storage, 4 GB/sec scan
• High performance at smaller storage size
• High compute and memory density
• On demand at $0.25/hour
• As low as $5,500/TB/year
• Scale from 160 GB to 326 TB
DC1.L: 16 GB RAM, 2 cores, 160 GB of compressed SSD storage
DC1.8XL: 256 GB RAM, 32 cores, 2.56 TB of compressed SSD storage
![Page 8: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/8.jpg)
Amazon Redshift lets you analyze all your data
Price is nodes times hourly cost
No charge for leader node
3x data compression on average
Price includes 3 copies of data

| DS2 (HDD) | Price per hour for smallest single node | Effective annual price per TB compressed |
|---|---|---|
| On-Demand | $0.850 | $3,725 |
| 1 Year Reservation | $0.500 | $2,190 |
| 3 Year Reservation | $0.228 | $999 |

| DC1 (SSD) | Price per hour for smallest single node | Effective annual price per TB compressed |
|---|---|---|
| On-Demand | $0.250 | $13,690 |
| 1 Year Reservation | $0.161 | $8,795 |
| 3 Year Reservation | $0.100 | $5,500 |
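The effective annual prices above appear to follow from the hourly prices: hourly price times hours per year, divided by the compressed storage of the smallest node (2 TB for DS2.XL and 160 GB for DC1.L, per the node-type slide). That formula is an inference from the numbers rather than something the deck states, and the function name below is hypothetical. A minimal sketch:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; leap years ignored


def effective_annual_price_per_tb(hourly_price, node_storage_tb):
    """Annual cost of one node divided by its compressed storage in TB."""
    return hourly_price * HOURS_PER_YEAR / node_storage_tb


# Smallest DS2 node: 2 TB compressed, 1-year reservation rate from the slide.
ds2_one_year = effective_annual_price_per_tb(0.500, 2.0)

# Smallest DC1 node: 160 GB (0.16 TB) compressed, on-demand rate from the slide.
dc1_on_demand = effective_annual_price_per_tb(0.250, 0.16)

print(f"DS2 1-year: ${ds2_one_year:,.0f} per TB per year")
print(f"DC1 on-demand: ${dc1_on_demand:,.0f} per TB per year")
```

The results land on or near the slide's figures ($2,190 and $13,690); the remaining rows differ by small amounts that are presumably rounding on the slide.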
![Page 9: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/9.jpg)
Amazon Redshift works with your analysis tools
JDBC/ODBC
Amazon Redshift
![Page 10: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/10.jpg)
Amazon Redshift is easy to use
• Provision in minutes
• Monitor query performance
• Point and click resize
• Automatic backup
• Built-in security
![Page 11: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/11.jpg)
Amazon Redshift continuously backs up your data and recovers from failures
• Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times
• Backups to Amazon S3 are continuous, automatic, and incremental
– Designed for eleven nines of durability
• Continuous monitoring and automated recovery from failures of drives and nodes
• Able to restore snapshots to any Availability Zone within a region
• Easily enable backups to a second region for disaster recovery
![Page 12: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/12.jpg)
Amazon Redshift has security built-in
• Load encrypted from S3
• SSL to secure data in transit; ECDHE perfect forward secrecy
• Encryption to secure data at rest
– All blocks on disks and in S3 encrypted
– Block key, cluster key, master key (AES-256)
– On-premises HSM and AWS CloudHSM support
• Audit logging and AWS CloudTrail integration
• Amazon VPC support
• SOC 1/2/3, PCI-DSS Level 1, FedRAMP
Diagram labels: Customer VPC; Internal VPC; JDBC/ODBC; 10 GigE (HPC); Ingestion, Backup, Restore
![Page 13: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/13.jpg)
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage

• With row storage you do unnecessary I/O
• To get the total amount, you have to read everything

| ID | Age | State | Amount |
|---|---|---|---|
| 123 | 20 | CA | 500 |
| 345 | 25 | WA | 250 |
| 678 | 40 | FL | 125 |
| 957 | 37 | WA | 375 |
![Page 14: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/14.jpg)
• With column storage, you only read the data you need

| ID | Age | State | Amount |
|---|---|---|---|
| 123 | 20 | CA | 500 |
| 345 | 25 | WA | 250 |
| 678 | 40 | FL | 125 |
| 957 | 37 | WA | 375 |
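The difference between the two layouts can be made concrete with the four-row example table. The sketch below is a toy model in plain Python, not Redshift's storage engine: it counts how many stored values each layout must touch to total the Amount column.

```python
# The four rows from the slide's example table.
rows = [
    {"id": 123, "age": 20, "state": "CA", "amount": 500},
    {"id": 345, "age": 25, "state": "WA", "amount": 250},
    {"id": 678, "age": 40, "state": "FL", "amount": 125},
    {"id": 957, "age": 37, "state": "WA", "amount": 375},
]

# Row storage: a row's values sit together, so totalling one column
# still reads every field of every row.
row_values_read = sum(len(row) for row in rows)
row_total = sum(row["amount"] for row in rows)

# Column storage: each column is stored contiguously, so the query
# reads only the Amount column.
columns = {name: [row[name] for row in rows] for name in rows[0]}
column_values_read = len(columns["amount"])
column_total = sum(columns["amount"])

print(row_total, row_values_read)        # 1250 16
print(column_total, column_values_read)  # 1250 4
```

Both layouts give the same total; the columnar one reads a quarter of the values here, and the gap widens as tables get wider.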
![Page 15: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/15.jpg)
analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
• COPY compresses automatically
• You can analyze and override
• More performance, less cost
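To give a feel for why an encoding such as `delta` suits a steadily increasing column like `listid`, here is a minimal sketch of the general delta-encoding idea. It is not Redshift's on-disk format, and the sample IDs are hypothetical.

```python
def delta_encode(values):
    """Keep the first value, then each value's difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(encoded):
    """Rebuild the original values with a running sum over the deltas."""
    decoded = []
    running = 0
    for delta in encoded:
        running += delta
        decoded.append(running)
    return decoded


# Hypothetical, mostly sequential IDs.
list_ids = [100001, 100002, 100003, 100007, 100008]
encoded = delta_encode(list_ids)
print(encoded)  # [100001, 1, 1, 4, 1]
assert delta_decode(encoded) == list_ids
```

After the first entry the stored numbers are small, so they fit in fewer bytes than the original values, which is the source of the I/O saving.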
![Page 16: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/16.jpg)
| Block values | Min | Max |
|---|---|---|
| 10, 13, 14, 26, …, 100, 245, 324 | 10 | 324 |
| 375, 393, 417, …, 512, 549, 623 | 375 | 623 |
| 637, 712, 809, …, 834, 921, 959 | 637 | 959 |

• Track the minimum and maximum value for each block
• Skip over blocks that don’t contain relevant data
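The block-skipping idea can be sketched in a few lines. This is a toy model using only the values visible on the slide (the elided values are left out), and `scan_range` is a hypothetical helper, not a Redshift API.

```python
# Sorted column values split into blocks, as on the slide.
blocks = [
    [10, 13, 14, 26, 100, 245, 324],
    [375, 393, 417, 512, 549, 623],
    [637, 712, 809, 834, 921, 959],
]

# Zone map: the minimum and maximum value of each block.
zone_map = [(min(block), max(block)) for block in blocks]


def scan_range(low, high):
    """Return values in [low, high] and the number of blocks actually read."""
    matches = []
    blocks_read = 0
    for block, (block_min, block_max) in zip(blocks, zone_map):
        if block_max < low or block_min > high:
            continue  # the zone map shows this block cannot contain a match
        blocks_read += 1
        matches.extend(value for value in block if low <= value <= high)
    return matches, blocks_read


matches, blocks_read = scan_range(400, 600)
print(matches, blocks_read)  # [417, 512, 549] 1
```

Only one of the three blocks is read for this range. The benefit depends on the data being roughly sorted on the filtered column, so that block ranges overlap as little as possible.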
![Page 17: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/17.jpg)
• Use local storage for performance
• Maximize scan rates
• Automatic replication and continuous backup
• HDD and SSD platforms
![Page 18: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/18.jpg)
Amazon Redshift @ Simulmedia
![Page 19: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/19.jpg)
—John Wanamaker
“Half the money I spend on advertising is wasted; the
trouble is I don't know which half.”
![Page 20: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/20.jpg)
A data-centric approach to TV advertising
![Page 21: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/21.jpg)
Targeted TV advertising that reaches 110 million households
![Page 22: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/22.jpg)
Anonymous viewing data from millions of set-top boxes and smart TVs overlaid with third-party viewing data
![Page 23: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/23.jpg)
Reinvested in our platform with Amazon Redshift
![Page 24: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/24.jpg)
10–100x improvement in performance
Decreased time to release
Proliferation of experiments on the data
![Page 25: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/25.jpg)
Business opportunity/capacity has increased exponentially; headcount for the team has remained stable
![Page 26: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/26.jpg)
On-premises Hadoop/Hive cluster with >80 nodes storing 150 TB of data
![Page 27: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/27.jpg)
HDFS -> S3
Freedom from replication factor
Separate archives and active data set
Scalable performance
![Page 28: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/28.jpg)
Production data was optimal for MPP
![Page 29: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/29.jpg)
Chart: MPP cost per TB per year (y-axis from $0 to $175,000), comparing Amazon Redshift (HDD and SSD) with solutions A, B, C, and D
![Page 30: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/30.jpg)
Managed service
Continual upgrades
Automatic snapshotting
![Page 31: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/31.jpg)
<1 sec to query 2 years of historical viewing data
N.B.: skinny fact table
![Page 32: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/32.jpg)
Flexible data discovery period
Better understanding of data
Tuned facts and distributed dimensions
![Page 33: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/33.jpg)
Production Amazon Redshift cluster with 3 nodes storing ~1.4 TB
Non-production Amazon Redshift cluster with 2 nodes storing ~8 TB
![Page 34: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/34.jpg)
S3 data lake
Minor transformations during ingestion
Idempotent audit tables in Amazon Redshift
Star schema design
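The deck does not say how the audit tables work. One plausible reading is that each ingested file is recorded in an audit table so that re-running a load is a no-op. The sketch below illustrates that reading only: it uses Python's built-in `sqlite3` as a stand-in for Redshift, and every table name, column name, and S3 key is hypothetical.

```python
import sqlite3


def load_file_once(conn, source_key, rows):
    """Load rows from one source file unless the audit table already lists it."""
    already_loaded = conn.execute(
        "SELECT 1 FROM load_audit WHERE source_key = ?", (source_key,)
    ).fetchone()
    if already_loaded:
        return False
    with conn:  # one transaction: fact rows and audit row commit together
        conn.executemany(
            "INSERT INTO viewing_fact (household_id, seconds_viewed) VALUES (?, ?)",
            rows,
        )
        conn.execute(
            "INSERT INTO load_audit (source_key, row_count) VALUES (?, ?)",
            (source_key, len(rows)),
        )
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE viewing_fact (household_id INTEGER, seconds_viewed INTEGER)")
conn.execute("CREATE TABLE load_audit (source_key TEXT PRIMARY KEY, row_count INTEGER)")

batch = [(1, 30), (2, 45)]
key = "s3://example-bucket/viewing/part-0001.gz"  # hypothetical S3 key
first = load_file_once(conn, key, batch)
second = load_file_once(conn, key, batch)  # a retry of the same file
fact_rows = conn.execute("SELECT COUNT(*) FROM viewing_fact").fetchone()[0]
print(first, second, fact_rows)  # True False 2
```

Writing the audit row in the same transaction as the data means a failed load leaves no trace and can simply be retried.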
![Page 35: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/35.jpg)
Decreased our infrastructure costs
Cleaned up our architecture
Operational complexity removed
Capacity planning eased
![Page 36: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/36.jpg)
Demographics/Targeting/Forecasting
From ~1 hour to ~10 seconds
![Page 37: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/37.jpg)
Measurement
from ~7–10 hours to ~5 minutes
![Page 38: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/38.jpg)
SQL everywhere
![Page 39: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/39.jpg)
Data science:
Improve forecasting
Improve optimizations
Improve measurement
![Page 40: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/40.jpg)
Analytics:
Build new reports
Discover more about effective spots
![Page 41: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/41.jpg)
Best practices
![Page 42: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/42.jpg)
Learn the Amazon Redshift Management Console:
Set up queueing
Set up alerts
Track CPU utilization when debugging
![Page 43: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/43.jpg)
Low concurrency (1–3 queries)
Alerts on disk usage
Query execution details
![Page 44: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/44.jpg)
COPY/UNLOAD
Remember to ANALYZE tables so the query planner has current statistics
Take advantage of compression analysis
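As a rough sketch of the load routine these points suggest, the helper below composes the SQL statements as strings: a COPY from Amazon S3, an ANALYZE to refresh planner statistics, and an ANALYZE COMPRESSION to review suggested encodings. The function name, bucket, and role ARN are hypothetical, and the COPY options (`IAM_ROLE`, `DELIMITER`, `GZIP`) are assumptions about the input files and authorization rather than anything shown in the deck.

```python
def build_load_statements(table, s3_prefix, iam_role_arn):
    """Return the SQL statements for one load cycle, in execution order.

    Values are interpolated without escaping, so pass trusted identifiers only.
    """
    copy_sql = (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "DELIMITER '|' GZIP;"
    )
    return [
        copy_sql,                         # bulk load from S3
        f"ANALYZE {table};",              # refresh planner statistics
        f"ANALYZE COMPRESSION {table};",  # report suggested column encodings
    ]


statements = build_load_statements(
    "listing",
    "s3://example-bucket/listing/",                        # hypothetical bucket
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",  # hypothetical role
)
for statement in statements:
    print(statement)
```

The strings would be run through whatever JDBC/ODBC or PostgreSQL-compatible client is already in use; the compression report is advisory and does not change the table.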
![Page 45: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/45.jpg)
Use timestamp/date data types
(Add timezone to column name)
Use varchar
![Page 46: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/46.jpg)
Your Feedback is Important to AWS
Please complete the session evaluation. Tell us what you think!
![Page 47: Leveraging Amazon Redshift for your Data Warehouse](https://reader030.fdocuments.net/reader030/viewer/2022032506/55cd81dfbb61ebfe758b468b/html5/thumbnails/47.jpg)
NEW YORK