Amazon Redshift in Action: Enterprise, Big Data, and SaaS Use Cases (DAT205) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. DAT 205 - Amazon Redshift in Action Enterprise, Big Data, and SaaS Use Cases November 15, 2013

description

Since Amazon Redshift launched last year, it has been adopted by a wide variety of companies for data warehousing. In this session, learn how customers NASDAQ, HauteLook, and Roundarch Isobar are taking advantage of Amazon Redshift for three unique use cases: enterprise, big data, and SaaS. Learn about their implementations and how they made data analysis faster, cheaper, and easier with Amazon Redshift.

Transcript of Amazon Redshift in Action: Enterprise, Big Data, and SaaS Use Cases (DAT205) | AWS re:Invent 2013

Page 1

DAT 205 - Amazon Redshift in Action

Enterprise, Big Data, and SaaS Use Cases

November 15, 2013

Page 2

Amazon Redshift

Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year

Page 3

Amazon Redshift architecture

• Leader Node – SQL endpoint

– Stores metadata

– Coordinates query execution

• Compute Nodes – Local, columnar storage

– Execute queries in parallel

– Load, backup, restore via Amazon S3

– Parallel load from Amazon DynamoDB

• Single node version available

[Architecture diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion; Backup; Restore]
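To make the leader/compute split above concrete, here is a toy Python model of how a leader can fan one query step out to compute slices and then merge their partial results. It is a conceptual sketch and not Redshift's actual implementation. The function names, the CRC32-based row distribution, and the use of threads to stand in for compute nodes are all illustrative assumptions.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, num_slices, dist_key):
    """Hash-distribute rows across slices on a distribution key (deterministic CRC32)."""
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        slot = zlib.crc32(str(row[dist_key]).encode("utf-8")) % num_slices
        slices[slot].append(row)
    return slices


def local_group_sum(slice_rows, group_col, value_col):
    """Work done independently on one slice: a local GROUP BY ... SUM(...)."""
    partial = defaultdict(int)
    for row in slice_rows:
        partial[row[group_col]] += row[value_col]
    return partial


def leader_group_sum(rows, num_slices, dist_key, group_col, value_col):
    """Leader: run the step on every slice concurrently, then merge the partial sums."""
    slices = distribute(rows, num_slices, dist_key)
    merged = defaultdict(int)
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        partials = pool.map(lambda s: local_group_sum(s, group_col, value_col), slices)
        for partial in partials:
            for group, total in partial.items():
                merged[group] += total
    return dict(merged)


# Hypothetical sample rows
trades = [
    {"trade_id": 1, "symbol": "AAPL", "shares": 100},
    {"trade_id": 2, "symbol": "MSFT", "shares": 50},
    {"trade_id": 3, "symbol": "AAPL", "shares": 25},
    {"trade_id": 4, "symbol": "NDAQ", "shares": 10},
    {"trade_id": 5, "symbol": "MSFT", "shares": 75},
]
result = leader_group_sum(trades, num_slices=4, dist_key="trade_id",
                          group_col="symbol", value_col="shares")
print(result)
```

Because the rows are distributed on `trade_id` rather than `symbol`, rows for the same symbol can land on several slices, so the leader has to merge partial sums. Distributing on the grouping column would instead let each slice produce final totals for its own groups.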

Page 4

Amazon Redshift is priced to let you analyze all your data

Term | Price per hour, HS1.XL single node | Effective hourly price per TB | Effective annual price per TB
On-Demand | $0.850 | $0.425 | $3,723
1 Year Reservation | $0.500 | $0.250 | $2,190
3 Year Reservation | $0.228 | $0.114 | $999

Simple Pricing

Number of Nodes x Cost per Hour

No charge for Leader Node

No upfront costs

Pay as you go
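The per-TB figures in the table follow from simple arithmetic. The sketch below assumes 2 TB of storage per HS1.XL node and 8,760 hours per year. The 2 TB figure is implied by the table itself, since $0.850/hour per node becomes $0.425/hour per TB. The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours
TB_PER_XL_NODE = 2         # implied by the table: $0.850/hour per node vs $0.425/hour per TB


def effective_per_tb(node_price_per_hour, tb_per_node=TB_PER_XL_NODE):
    """Return (hourly, annual) price per TB for a node at the given effective hourly rate."""
    hourly = node_price_per_hour / tb_per_node
    return hourly, hourly * HOURS_PER_YEAR


def cluster_price_per_hour(compute_nodes, node_price_per_hour):
    """'Number of Nodes x Cost per Hour'; the leader node is not billed."""
    return compute_nodes * node_price_per_hour


for term, price in [("On-Demand", 0.850), ("1 Year Reservation", 0.500),
                    ("3 Year Reservation", 0.228)]:
    hourly, annual = effective_per_tb(price)
    print(f"{term}: ${hourly:.3f}/TB/hour, ${annual:,.0f}/TB/year")
```

Run as-is, it reproduces the slide's $3,723, $2,190, and $999 per TB per year. The last figure is $998.64 before rounding.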

Page 5

Data Warehousing for Capital Markets

Jason Timmes, AVP of Software Development, NASDAQ OMX

November 15, 2013

Page 6

Where innovation meets action

• We list ~3,300 global companies worth $6 trillion in market cap, representing diverse industries and many of the world’s most well-known and innovative brands

• We own and operate 26 markets, 3 clearinghouses, and 5 central securities depositories

• More than 5,500 structured products are tied to our global indexes, with a notional value of at least $1 trillion

• Our technology is used to power more than 70 marketplaces in 50 countries

• Our global platform can handle more than 1 million messages/second at a median speed of sub-55 microseconds

• We power 1 in 10 of the world’s securities transactions

Page 7

What I do

New data and analytics platforms to store and serve data to internal and external customers.

Page 8

The Challenge

• Archiving Market Data – classic “Big Data” problem

• Power Surveillance and Business Intelligence/Analytics

• Minimize cost – Not only infrastructure, but development/IT labor costs too

• Empower the business for self-service

Page 9

Financial Information Forum. Redistribution without permission from FIF prohibited.

SIP Total Monthly Message Volumes: OPRA, UQDF and CQS

OPRA annual increase: 69% | CQS annual increase: 10% | UQDF annual decrease: 6%

Date | OPRA total monthly message volume | OPRA average daily volume
Aug-12 | 80,600,107,361 | 3,504,352,494
Sep-12 | 77,303,404,427 | 4,068,600,233
Oct-12 | 98,407,788,187 | 4,686,085,152
Nov-12 | 104,739,265,089 | 4,987,584,052
Dec-12 | 81,363,853,339 | 4,068,192,667
Jan-13 | 82,227,243,377 | 3,915,583,018
Feb-13 | 87,207,025,489 | 4,589,843,447
Mar-13 | 93,573,969,245 | 4,678,698,462
Apr-13 | 123,865,614,055 | 5,630,255,184
May-13 | 134,587,099,561 | 6,117,595,435
Jun-13 | 162,771,803,250 | 8,138,590,163
Jul-13 | 120,920,111,089 | 5,496,368,686
Aug-13 | 136,237,441,349 | 6,192,610,970

Date | UQDF total monthly message volume | CQS total monthly message volume | Combined average daily volume
Aug-12 | 2,317,804,321 | 8,241,554,280 | 459,102,548
Sep-12 | 1,948,330,199 | 7,452,279,225 | 494,768,917
Oct-12 | 1,016,336,632 | 7,452,279,225 | 403,267,422
Nov-12 | 2,148,867,295 | 9,552,313,807 | 557,199,100
Dec-12 | 2,017,355,401 | 8,052,399,165 | 503,487,728
Jan-13 | 2,099,233,536 | 7,474,101,082 | 455,873,077
Feb-13 | 1,969,123,978 | 7,531,093,813 | 500,011,463
Mar-13 | 2,010,832,630 | 7,896,498,260 | 495,366,545
Apr-13 | 2,447,109,450 | 9,805,224,566 | 556,924,273
May-13 | 2,400,946,680 | 9,430,865,048 | 537,809,624
Jun-13 | 2,601,863,331 | 11,062,086,463 | 683,197,490
Jul-13 | 2,142,134,920 | 8,266,215,553 | 473,106,840
Aug-13 | 2,188,338,764 | 9,079,813,726 | 512,188,750

[Chart: NASDAQ Exchange Daily Peak Messages, Jan-13 to Sep-13; y-axis 0 to 600,000,000 messages]

Market Data Is Big Data

Charts courtesy of the Financial Information Forum
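The annual changes quoted in the FIF header can be checked against the two tables by comparing the Aug-12 and Aug-13 total monthly volumes. The dictionary names in this small Python sketch are illustrative. The figures are copied from the tables.

```python
# Total monthly message volumes copied from the FIF tables above
AUG_12 = {"OPRA": 80_600_107_361, "CQS": 8_241_554_280, "UQDF": 2_317_804_321}
AUG_13 = {"OPRA": 136_237_441_349, "CQS": 9_079_813_726, "UQDF": 2_188_338_764}


def annual_change_pct(before, after):
    """Year-over-year change as a percentage (negative means a decrease)."""
    return (after - before) / before * 100


changes = {feed: annual_change_pct(AUG_12[feed], AUG_13[feed]) for feed in AUG_12}
for feed, pct in changes.items():
    print(f"{feed}: {pct:+.1f}%")
```

The sketch prints roughly +69.0% for OPRA, +10.2% for CQS, and -5.6% for UQDF. These round to the 69%, 10%, and 6% on the slide.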

Page 10

Our legacy solution

• On-premises MPP DB – Relatively expensive, finite storage

– Required periodic additional expenses to add more storage

– Ongoing IT (administrative) human costs

• Legacy BI tool – Requires developer involvement for new data sources, reports, dashboards, etc.

Page 11

New Solution: Amazon Redshift

• Cost effective – Amazon Redshift costs 43% of our legacy solution

– Assuming equal storage capacities

– Doesn’t include ongoing IT costs!

• Performance – Easily outperforms our legacy BI/DB solution

– Inserts 550K rows/second on a 2-node 8XL cluster

• Elastic – Add capacity on demand; easy to grow our cluster
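To give a rough sense of scale, the reported 550K rows/second can be compared with the OPRA volumes on the market-data slide. The sketch assumes one row per message and a constant sustained insert rate. The deck states neither assumption.

```python
ROWS_PER_SECOND = 550_000              # reported insert rate on a 2-node 8XL cluster
OPRA_AVG_DAILY_AUG_13 = 6_192_610_970  # OPRA average daily messages, Aug-13 (FIF table)


def load_hours(rows, rows_per_second=ROWS_PER_SECOND):
    """Hours to insert `rows` at a constant rate (assumes one row per message)."""
    return rows / rows_per_second / 3600


print(f"{load_hours(OPRA_AVG_DAILY_AUG_13):.2f} hours")
```

Under those assumptions, loading an average Aug-13 OPRA trading day would take a little over 3 hours.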

Page 12

New Solution: Pentaho BI/ETL

• Amazon Redshift partner

– http://aws.amazon.com/redshift/partners/pentaho/

• Self-service – Tools empower BI users to integrate new data sources and create their own analytics, dashboards, and reports without requiring development involvement

• Cost effective

Page 13

Net Result

• New solution is cheaper, faster, and offers capabilities that our business didn’t have before

– Empowers our business users to explore data like they never could before

– Reduces IT and development bottlenecks

– Margin improvement (expense reduction, and support for business decisions to grow revenue)

Page 14

HauteLook + Amazon Redshift

A Case Study

Kevin Diamond, HauteLook

November 15, 2013

Page 15

Who am I? Kevin Diamond

• CTO of HauteLook, a Nordstrom Company

• Oversee all technology, infrastructure, data, engineering, etc.

• Major focus on great customer experience and the analytics to provide it

Page 16

What is HauteLook?

• Private sale, members-only limited-time sale events

• Premium fashion and lifestyle brands at exclusive prices of 50-75% off

• Over 20 new sale events begin each morning at 8am PST

• Over 14 million members

• Acquired by Nordstrom in 2011

Page 17

Why a Data Warehouse?

• Centralized storage of multiple data sources

• Singular reporting consistency for all departments

• Data model that supports analytics not transactions

• Operational reports vs. analytical reports – Real-time vs. previous day

Page 18

Why Amazon Redshift?

• Looked at some competitors:

– Ranged from $ to $$$

– All required software, implementation, and BIG hardware

• Skipped the RFP

• Jumped into the Public Beta of Amazon Redshift and never looked back

Page 19

How We Implemented Amazon Redshift

• ETL from MySQL and Microsoft SQL Server into AWS over an AWS Direct Connect line, staging the data on S3

• Also used S3 to dump flat files (iTunes Connect data, web analytics dumps, log files, etc.)

• Used AWS Data Pipeline to execute Sqoop and Hadoop running on EC2 to load data into Amazon Redshift

• Amazon Redshift data model based on a star schema, which looks something like …

Page 20

Example of Star Schema
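The star-schema diagram itself did not survive extraction. As a stand-in, here is a minimal star schema for a flash-sale retailer. It has one fact table of sales keyed to date, product, and sale-event dimensions, and it runs against SQLite so it is self-contained. Every table and column name is a hypothetical illustration, not HauteLook's actual model.

```python
import sqlite3

# SQLite stands in for Amazon Redshift; all names and rows are hypothetical
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, brand TEXT, category TEXT);
CREATE TABLE dim_event   (event_key INTEGER PRIMARY KEY, event_name TEXT);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    event_key   INTEGER REFERENCES dim_event(event_key),
    units       INTEGER,
    revenue     REAL
);
INSERT INTO dim_date VALUES (20131101, '2013-11-01', '2013-11'),
                            (20131102, '2013-11-02', '2013-11');
INSERT INTO dim_product VALUES (1, 'Brand A', 'Shoes'), (2, 'Brand B', 'Handbags');
INSERT INTO dim_event VALUES (10, 'Weekend Boutique'), (11, 'Holiday Preview');
INSERT INTO fact_sales VALUES
    (20131101, 1, 10, 3, 150.0),
    (20131101, 2, 10, 1, 200.0),
    (20131102, 1, 11, 2, 100.0);
""")

# A typical star join: group by dimension attributes, aggregate fact measures
rows = conn.execute("""
    SELECT d.month, p.category, SUM(f.units) AS units, SUM(f.revenue) AS revenue
    FROM fact_sales f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.month, p.category
    ORDER BY p.category
""").fetchall()
print(rows)
```

The fact table holds only keys and additive measures. Descriptive attributes such as brand, category, and month live in the dimension tables, which keeps analytical queries to a single level of joins.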

Page 21

Usage with Business Intelligence

• Already selected a BI Tool

• Had difficulty deploying in the cloud

• But worked great on-premises

• Easily tied into Amazon Redshift using ODBC Drivers

• BUT, metadata for reports had to live in MSSQL

• Ported many SSIS/SSRS reports over

– But only the analytical reports!

Page 22

And it all looks like this

Page 23

Amazon Redshift Instances

• We use a little under 2TB

• We initially planned to use two BIG 8XL nodes to get great performance (in passive failover mode)

• Cost us $$$

• Then we tested a cluster of six XL nodes

• It performed better and allowed more query concurrency in all but a handful of cases that really needed the 8XL power

• Cost us $

• Duh! That’s why we distribute everything else!!
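The "$$$" versus "$" comparison can be put into numbers with the pricing slide's formula (number of nodes x cost per hour). The XL on-demand price of $0.85/hour comes from that slide. The 8XL on-demand price of $6.80/hour is an assumption taken from 2013 list pricing and is not stated in this deck. The sketch also assumes both 8XL nodes are billed, even in passive failover mode.

```python
HOURS_PER_YEAR = 24 * 365
XL_PRICE = 0.85    # on-demand $/node-hour for an XL node (pricing slide)
XL8_PRICE = 6.80   # ASSUMPTION: on-demand $/node-hour for an 8XL node, not in the deck


def annual_cost(nodes, price_per_node_hour):
    """Number of nodes x cost per hour, kept on all year (leader node not billed)."""
    return nodes * price_per_node_hour * HOURS_PER_YEAR


annual = {
    "2 x 8XL": annual_cost(2, XL8_PRICE),  # both nodes billed, even if one is passive
    "6 x XL": annual_cost(6, XL_PRICE),
}
for option, cost in annual.items():
    print(f"{option}: ${cost:,.0f}/year on demand")
```

Under these assumptions, the six-XL cluster costs a little over a third as much as the two-8XL plan.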

Page 24

Some First Hand Experience

• ETL was the hardest part

• Amazon Redshift performance is awesome

• Someone needs to make a great SQL client tool

• MicroStrategy works great on it (I just wish it loved running in EC2)

• Saving a ton, thanks to:

– No hardware costs

– No maintenance/overhead (rack + power)

– Annual costs are equivalent to just the annual maintenance of some of the cheaper DW on-premises options

Page 25

Conclusion/Last Advice

• Only use 8XL nodes if you need >2TB of space

– Otherwise, distribute across a bunch of XL nodes

• Buy reserved instances (we still need to do this!), since you will likely have the cluster always on

• Although we haven’t yet, the idea of a flexible scale-up/down DW is crazy awesome – maybe we will during the holiday season

• Probably could have used Elastic MapReduce instead of Hadoop on EC2 – we weren’t sure how it would play with Sqoop

• Almost all BI tools work with Amazon Redshift now, so choose what is right for your business, and make sure it works in EC2 before putting it there

• Communication between AWS and your data center is easy and fast, but I recommend a Direct Connect

• Passed our rigorous information security standards, as deployed in a VPC

Page 26

Amazon Redshift in Action: Enterprise, Big Data, and SaaS Use Cases

Parag Thakker – VP, Roundarch Isobar

Colin McGuigan – Architect, Roundarch Isobar

November 15th, 2013

Page 27

OUR SERVICES ACROSS BOUGHT, OWNED AND EARNED MEDIA

• Strategies – We digitally transform business processes and disrupt industries

– Business planning: competitive & industry analysis, business cases, maturity models, roadmaps

– Strategies: brand, interactive, multi-channel, social, content

• Campaigns – We create, measure and optimize digitally-focused campaigns

– Audience insight

– Communications planning

– Creative: advertising, visual design, content creation, studio production

– Optimization: analytics, monitoring, SEO, MVT, media ROI analysis

• Experiences – We produce joyful experiences that inspire consumer interaction

– Research: competitive, segmentation, persona development, heuristics

– Requirements and specifications: content analysis and specs, functional requirements, functional specifications

– User experience design: information architecture, taxonomy and meta data, interaction design, mobile

• Platforms – We design and build flexible and scalable technology solutions

– Platforms: content management, search, portals, mobile, front-end technology, internet-enabled devices/wearables, social apps, web services, security, big data, hosting

• Products – We invent digital products that generate new revenue streams

– Digital products, digital product extensions, brand as a service

roundarch isobar

Page 28

U.S. Air Force

We have served the U.S. Air Force since 2001, building their enterprise portal and many mission-critical applications.

Key metrics for our USAF work include:

• 4-5+ million pages daily (40-70 Mbit/sec)

• Portal availability over 99.9% of the time

• 28 production enterprise services

• Over 300 applications available

• Public-facing and secure private instances (NIPR & SIPR)

• Portal support for over 5,000 “Communities of Interest”

• 900,000+ registered users

• 700,000+ PK-E users

• Response time worldwide: 3 seconds for 80% of all pages

• Over 1.2 million logins/week

• 124,000 unique daily users

Page 29

New York Jets

Transforming in-stadium operations through a touch-screen command center

Our executive touch-screen environment provides real-time stadium and game data, allowing the Jets owner, Woody Johnson, to monitor the fan experience during game time and make operational decisions that help maximize sales. The command center provides summary-level and drill-down views of stadium operations such as tickets, parking and concessions. It also creates predictive algorithms that help identify pinch points and open revenue opportunities.

“We brought the big picture close enough to identify new, better ways to do business.”

Page 30

William Blair | Investment Research Management System

Through a joint venture with Copia Capital, we created a new product offering for William Blair.

• Facilitates collaboration between portfolio managers and analysts

• Provides a holistic view of a company/stock

– What does our organization know about AAPL?

• Digitizes PDF/Excel tools and reports to enable rich, dynamic interactions

• Simplifies content creation, e.g., comments, recommendation reports, document upload

• Rich charting and visualization of analytics

• User comment: “We love how fast it is!”

Technology:

• JavaScript, HTML5, CSS3

• Uses jQuery, JavaScriptMVC, Less

• JSON web services

• Java, Spring, JPA, MongoDB

Page 31

What is the focus of your CMO today?

Optimize marketing spend across all channels (Bought, Earned and Owned)

Page 32

Domain

• Billions in marketing spend

• Dozens of media channels

• Hundreds of data sources

• Multiple terabytes of data

• Multiple clients

Channels: Search, Display Ads, Email, Affiliate, Social, Print, Mobile, Sales, TV, Radio, Web

Page 33

Marketing effectiveness stages: Analyze, Learn, Optimize

• Centralized cross-channel Big Data platform

• Standardized cross-channel reporting tools

• Discovery tools to identify channel optimization opportunities

• Modeling solutions

• Channel experience enhancements

• Improved media buying, planning & reporting functions

• Real-time integration into DSP

• A/B-testing-based micro-segment adjustments

[Diagram labels: DLP AMNET; Scorecard; Compass; Sonar; Real-Time and Non-Real-Time]

Page 34

So what have we accomplished?

We built Radar, a marketing analytics platform that enables in-time analytics, reporting, and optimization for multiple clients. It supports customized metrics across 200+ feeds (1 TB/week) of varying frequency, granularity, and classification. Radar runs as a scalable, multi-tenant SaaS platform on Amazon Web Services, and its first launch took 3 months.

Page 35

scorecard dashboard

Page 36

Scorecard logical architecture

Data sources by channel:

• TV – DDS

• Paid Search – Google, Bing, Marin

• Organic Search – Google, Bing

• Sales – TBD

• Digital Video – Custom

• Site Metrics – Google, Omniture

• Display – Google DFA

• Radio – DDS

• Paid Social – Facebook, Twitter

• Print, OOH – DDS

• Earned Social – Facebook, Twitter

• Competitive – Custom

Outputs and audiences: Detailed Analytic Reports; Scorecard App; Media Team; Client Stakeholders; Media Team Planners; Client Team

Page 37

Voluminous Data

Data sources: Digital, CRM, Research

• Surveys, Demographics, Campaigns

• Search, Mobile, Attribution, Site, Social, Display

• Cookie Level, UGC, Geospatial, Weather, Sales, Competitive

[Chart axes: data volume vs. variety and granularity]

Page 38

Tech architecture

[Diagram labels: Feeds (Radio, Display Ads, Search, Social); Hadoop EMR, S3, Redshift, MySQL RDS, EC2, Beanstalk; SaaS Reporting Platform (WWW) for Clients; BI Tools for Analysts]

Page 39

ETL

• Extract – Files loaded on Amazon S3/Amazon Glacier

• Transform – Use Pig on Amazon EMR to cleanse, standardize, and validate the data

• Load – Use COPY to load the Pig output into Amazon Redshift

[Diagram labels: Feeds (Radio, Display Ads, Search, Social); S3; Glacier; Hadoop EMR; Redshift]
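The Load step can be sketched as a helper that builds a Redshift COPY statement for tab-delimited Pig output. Tab is the default delimiter of Pig's PigStorage. The bucket, prefix, table, and role ARN in the example are hypothetical. The sketch uses the current IAM_ROLE authorization clause, whereas clusters in 2013 would have used a CREDENTIALS string instead.

```python
def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_copy_sql(table, s3_prefix, iam_role_arn, delimiter=r"\t", gzip=False):
    """Build a Redshift COPY for delimited files under an S3 key prefix.

    `table` is interpolated as-is, so it must come from trusted configuration.
    The default delimiter is the two-character escape \\t, which Redshift reads as tab.
    """
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must be an s3:// URI")
    parts = [
        f"COPY {table}",
        f"FROM {sql_literal(s3_prefix)}",  # loads every object whose key starts with the prefix
        f"IAM_ROLE {sql_literal(iam_role_arn)}",
        f"DELIMITER {sql_literal(delimiter)}",
    ]
    if gzip:
        parts.append("GZIP")
    return "\n".join(parts) + ";"


sql = build_copy_sql(
    "staging.display_ads",
    "s3://example-bucket/pig-output/display/2013-11-14/part-",
    "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(sql)
```

Pointing the prefix at the `part-` files, rather than at the whole output directory, keeps COPY from picking up other objects such as Hadoop's `_SUCCESS` marker. The resulting statement would then run over an ordinary JDBC/ODBC connection to the cluster.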

Page 40

Redshift: performance data warehouse

• Handles humongous aggregation quickly

• Cheap, fast, easily scalable

• For BI / ad hoc analysis

• ODBC and JDBC access

[Diagram labels: Redshift; Tableau, BI Tools; Analysts]

Page 41

Aggregation

• In Amazon Redshift using SQL

– Multi-step aggregation

– Join performance data (Views, Clicks, CTR, CPC, etc.) with metadata mapping (Product, Campaign)

• Load aggregates in MySQL for sub-second web response

[Diagram labels: Feeds (Radio, Display Ads, Search, Social); Redshift; SQL; Aggregates; MySQL RDS]
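The two-stage flow on this slide can be sketched as follows: multi-step SQL aggregation in Redshift, then loading the aggregates into MySQL for fast web reads. Two in-memory SQLite databases stand in for Amazon Redshift and for MySQL on RDS. The table names, columns, and sample numbers are hypothetical. CTR and CPC are derived only after the additive measures (views, clicks, cost) have been summed.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")  # stands in for Amazon Redshift
warehouse.executescript("""
CREATE TABLE performance (day TEXT, channel TEXT, placement_id TEXT,
                          views INTEGER, clicks INTEGER, cost REAL);
CREATE TABLE placement_map (placement_id TEXT, product TEXT, campaign TEXT);
INSERT INTO performance VALUES
  ('2013-11-14', 'Display', 'p1', 10000, 50, 25.0),
  ('2013-11-14', 'Display', 'p1',  3000, 15,  7.5),
  ('2013-11-14', 'Display', 'p2',  5000, 10, 10.0),
  ('2013-11-14', 'Search',  'k1',  2000, 80, 40.0);
INSERT INTO placement_map VALUES
  ('p1', 'Sneakers', 'Holiday'), ('p2', 'Sneakers', 'Holiday'), ('k1', 'Boots', 'Holiday');
""")

# Step 1 (CTE): roll raw feed rows up to day/channel/placement, additive measures only.
# Step 2: join with the mapping metadata, roll up to campaign/product, then derive ratios.
aggregates = warehouse.execute("""
WITH daily AS (
    SELECT day, channel, placement_id,
           SUM(views) AS views, SUM(clicks) AS clicks, SUM(cost) AS cost
    FROM performance
    GROUP BY day, channel, placement_id
)
SELECT d.day, m.campaign, m.product,
       SUM(d.views), SUM(d.clicks), SUM(d.cost),
       1.0 * SUM(d.clicks) / SUM(d.views) AS ctr,
       SUM(d.cost) / SUM(d.clicks)        AS cpc
FROM daily d
JOIN placement_map m ON m.placement_id = d.placement_id
GROUP BY d.day, m.campaign, m.product
ORDER BY m.product
""").fetchall()

web_db = sqlite3.connect(":memory:")  # stands in for MySQL on Amazon RDS
web_db.execute("""CREATE TABLE campaign_daily (
    day TEXT, campaign TEXT, product TEXT, views INTEGER, clicks INTEGER,
    cost REAL, ctr REAL, cpc REAL, PRIMARY KEY (day, campaign, product))""")
# Keyed upsert so re-running a day's load replaces rather than duplicates rows
web_db.executemany("INSERT OR REPLACE INTO campaign_daily VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                   aggregates)
web_db.commit()
print(web_db.execute("SELECT product, views, clicks, ctr FROM campaign_daily").fetchall())
```

`INSERT OR REPLACE` is SQLite syntax. Against MySQL, the equivalent idempotent load would typically be `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`.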

Page 42

Data workflow

• Jenkins for client+channel ETL, with a job control dashboard

• Amazon DynamoDB for state management of data intake/extract

• Ruby for provisioning and job flow

• Amazon EMR clusters: on demand, job-initiated

[Diagram labels: Hadoop EMR; Redshift; MySQL RDS; S3; DynamoDB; Jenkins; Ruby]
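The slide names DynamoDB for ETL state management but does not show the schema. One common pattern keeps one state item per (client, channel, date) job and moves it forward only through conditional writes, so two workers cannot both claim the same step. The sketch below uses an in-memory class in place of DynamoDB and Python in place of the team's Ruby. The states, key format, and class names are all hypothetical.

```python
# Allowed forward transitions for one ETL job (hypothetical states)
TRANSITIONS = {
    "PENDING": {"EXTRACTING"},
    "EXTRACTING": {"TRANSFORMING", "FAILED"},
    "TRANSFORMING": {"LOADING", "FAILED"},
    "LOADING": {"DONE", "FAILED"},
    "FAILED": {"PENDING"},  # allow a retry
    "DONE": set(),
}


class ConditionFailed(Exception):
    """Raised when a conditional write's expectation does not hold."""


class JobStateTable:
    """In-memory stand-in for a DynamoDB table keyed by job id."""

    def __init__(self):
        self._items = {}

    def put_if_absent(self, job_id):
        # Mirrors a put with an 'item must not already exist' condition
        if job_id in self._items:
            raise ConditionFailed(job_id)
        self._items[job_id] = "PENDING"

    def transition(self, job_id, expected, new):
        # Mirrors a conditional update: succeeds only if the current state == expected
        if new not in TRANSITIONS[expected]:
            raise ValueError(f"illegal transition {expected} -> {new}")
        if self._items.get(job_id) != expected:
            raise ConditionFailed(job_id)
        self._items[job_id] = new

    def state(self, job_id):
        return self._items[job_id]


def job_id(client, channel, day):
    return f"{client}#{channel}#{day}"


table = JobStateTable()
jid = job_id("client1", "display", "2013-11-14")
table.put_if_absent(jid)
for expected, new in [("PENDING", "EXTRACTING"), ("EXTRACTING", "TRANSFORMING"),
                      ("TRANSFORMING", "LOADING"), ("LOADING", "DONE")]:
    table.transition(jid, expected, new)
print(table.state(jid))
```

With boto3, the same guards would be expressed as a `put_item` with `ConditionExpression="attribute_not_exists(job_id)"` and an `update_item` whose condition checks the expected current state. A failed condition surfaces as a `ConditionalCheckFailedException` error.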

Page 43

Multi-tenant SaaS dashboard

• Hardware and location: designed for redundancy

• Managed services

• Multi-tenant for clients

• Automated stack provisioning

[Diagram labels: DNS (Client1.com, Client2.com); Load Balancing; EC2 Beanstalk; ElastiCache; MySQL RDS]
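The diagram shows several client domains (Client1.com, Client2.com) resolving to one shared stack. One way a multi-tenant app can tell tenants apart is to map the request's Host header to a tenant record. The sketch below assumes that approach and a schema-per-tenant layout, neither of which the deck specifies. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tenant:
    tenant_id: str
    schema: str  # hypothetical: one reporting schema per client
    theme: str   # hypothetical: per-client branding


# Hypothetical tenant registry; every client's DNS name points at the same stack
TENANTS = {
    "client1.com": Tenant("client1", "client1_reporting", "blue"),
    "client2.com": Tenant("client2", "client2_reporting", "green"),
}


def resolve_tenant(host_header: str) -> Tenant:
    """Map an HTTP Host header (optionally with a port or www. prefix) to a tenant."""
    host = host_header.split(":", 1)[0].strip().lower()
    if host.startswith("www."):
        host = host[len("www."):]
    try:
        return TENANTS[host]
    except KeyError:
        raise LookupError(f"unknown tenant host: {host_header!r}") from None


def scorecard_query(tenant: Tenant) -> str:
    # The schema name comes from the registry, never directly from request input
    return f"SELECT * FROM {tenant.schema}.scorecard"


print(scorecard_query(resolve_tenant("Client1.com:443")))
```

Resolving the tenant once per request, from a registry the application controls, keeps tenant isolation in one place instead of scattering client checks through every query.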

Page 44

AWS advantages

• Highly scalable

• Innovate quickly with reduced risk

• Time to market

• Lower operational overhead

[Diagram labels: DevOps; Developers (Ruby, Python, Java); AWS Ops; US; AMAZON]

Page 45

Learnings

• Metadata is more important than the data

• Design for scalability upfront

• Always explore better ways to aggregate

• Cost management is very important

• Build Agile: perform early end-to-end validation on a smaller dataset

• Separate data visualization, data cleansing, storage & data aggregation

• Be smart about implementing data aggregation routines across multiple granularities
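The last learning, about aggregation routines across multiple granularities, can be illustrated with a small rollup. Additive measures stay at the finest grain (here, daily), each coarser grain (ISO week, calendar month) is derived from that grain, and ratios such as CTR are computed only after summing. The sample rows and function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Daily rows: (day, campaign, views, clicks), hypothetical sample data
daily = [
    (date(2013, 10, 31), "Holiday", 1000, 10),
    (date(2013, 11, 1), "Holiday", 3000, 60),
    (date(2013, 11, 4), "Holiday", 2000, 20),
]


def rollup(rows, bucket):
    """Re-aggregate additive measures at a coarser grain, then derive CTR from the sums."""
    totals = defaultdict(lambda: [0, 0])
    for day, campaign, views, clicks in rows:
        key = (bucket(day), campaign)
        totals[key][0] += views
        totals[key][1] += clicks
    return {key: {"views": v, "clicks": c, "ctr": c / v if v else 0.0}
            for key, (v, c) in totals.items()}


def iso_week(d):
    year, week, _ = d.isocalendar()
    return f"{year}-W{week:02d}"


def month(d):
    return f"{d.year}-{d.month:02d}"


weekly = rollup(daily, iso_week)
monthly = rollup(daily, month)
print(weekly)
print(monthly)
```

Weeks straddle month boundaries (the first two rows fall in the same ISO week but in different months), so neither coarse grain can be derived from the other, and both come from the daily grain. Averaging November's two daily CTRs would give 0.015, while the correctly summed monthly CTR is 0.016.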

Page 46

Please give us your feedback on this presentation

As a thank you, we will select prize winners daily for completed surveys!

DAT205