(BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent...

November 12, 2014 | Las Vegas, NV | BDT206 | See How Amazon Redshift is Powering Business Intelligence in the Enterprise | Rahul Pathak, Amazon Redshift; Jason Timmes, Nasdaq; Kevin Diamond, HauteLook

Take a look at how NordstromRack.com | HauteLook and Nasdaq OMX are using Amazon Redshift for data warehousing and to support business intelligence workloads, one year after they made the move to Amazon Redshift. We will cover why HauteLook chose Redshift, how they built the architecture, what data is being stored and accessed, and how that data is powering the HauteLook business. We will also discuss how Nasdaq migrated from an on-premises data warehouse to Amazon Redshift, and how they've been able to take advantage of Redshift's array of security features such as hardware security modules (HSM), encryption, and audit logging.

Transcript of (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent...

Page 1: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

November 12, 2014 | Las Vegas, NV

BDT206

See How Amazon Redshift is Powering Business Intelligence in the Enterprise

Rahul Pathak, Amazon Redshift

Jason Timmes, Nasdaq

Kevin Diamond, HauteLook

Page 3: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

[Diagram: AWS services across the Collect, Store, and Analyze stages. Services shown: AWS Direct Connect, Amazon Kinesis, Amazon S3, Amazon DynamoDB, Amazon Glacier, AWS Data Pipeline, Amazon Redshift, Amazon Elastic MapReduce, Amazon EC2]

Page 4: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

[Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion, Backup, Restore]

Page 5: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

[Diagram labels: JDBC/ODBC; Customer VPC; Internal VPC; 10 GigE (HPC); Ingestion, Backup, Restore]

Page 8: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

[Diagram labels: Data Source, ET, Direct Connect, Client, Forwarder, Loader, State Management, Sandbox, Amazon Redshift, S3]

Page 11: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014


• Leading index provider with 41,000+ indexes across asset classes and geographies
• Over 10,000 corporate clients in 60 countries
• Our technology powers over 70 marketplaces, regulators, CSDs and clearinghouses in over 50 countries
• 100+ data product offerings supporting 2.5+ million investment professionals and users in 98 countries
• 26 markets, 3 clearing houses, 5 central securities depositories
• Lists more than 3,500 companies in 35 countries, representing more than $8.8 trillion in total market value

Page 13: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

Our warehouse can be used to analyze market share and client activity, support surveillance, power our billing, and more…

Page 18: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

• Pay close attention to the manifest mandatory flag!
– Amazon Redshift UNLOAD always sets this to false!
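
To illustrate the manifest point above, here is a minimal sketch (not taken from the slides) of building a COPY manifest in which every entry is explicitly marked mandatory, and of rewriting an existing manifest so the flag is set. The bucket and key names are made up, and the helper names `build_copy_manifest` and `force_mandatory` are hypothetical; only the `entries` / `url` / `mandatory` JSON layout comes from the Redshift manifest format.

```python
import json


def build_copy_manifest(s3_urls, mandatory=True):
    """Return a COPY manifest (JSON text) listing each S3 URL.

    With mandatory=True, COPY is expected to fail if a listed file is
    missing, instead of silently loading a partial data set.
    """
    entries = [{"url": url, "mandatory": mandatory} for url in s3_urls]
    return json.dumps({"entries": entries}, indent=2)


def force_mandatory(manifest_json):
    """Rewrite an existing manifest so every entry has mandatory set to true.

    Intended for manifests that were produced elsewhere (for example by
    UNLOAD) and are about to be reused as COPY input.
    """
    manifest = json.loads(manifest_json)
    for entry in manifest["entries"]:
        entry["mandatory"] = True
    return json.dumps(manifest, indent=2)


if __name__ == "__main__":
    # Hypothetical bucket and keys, for illustration only.
    print(build_copy_manifest([
        "s3://example-bucket/trades/part-0000.gz",
        "s3://example-bucket/trades/part-0001.gz",
    ]))
```

Passing any reused manifest through a step like `force_mandatory` before COPY is one way to avoid relying on the flag's default.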

Page 19: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

• TableIngestStatus
– We originally put this table in Amazon Redshift itself
– Turns out Amazon Redshift is not efficient on really small data sets
– Significantly impacted performance, and increased concurrency contention
• Solution: Moved TableIngestStatus to a separate transactional RDBMS (MySQL)
– We were already using a MySQL instance to persist workflow states
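
The idea of keeping small, frequently updated ingest state in a transactional store can be sketched as follows. This is an illustration only: the slides do not show the real TableIngestStatus schema, so the columns (`table_name`, `business_date`, `status`) are assumptions, and SQLite from the Python standard library stands in for MySQL so the example runs without a server.

```python
import sqlite3


def open_status_store(path=":memory:"):
    """Open the status store and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS TableIngestStatus ("
        " table_name TEXT NOT NULL,"
        " business_date TEXT NOT NULL,"
        " status TEXT NOT NULL,"
        " PRIMARY KEY (table_name, business_date))"
    )
    return conn


def set_status(conn, table_name, business_date, status):
    """Insert or overwrite the status row inside a single transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT OR REPLACE INTO TableIngestStatus VALUES (?, ?, ?)",
            (table_name, business_date, status),
        )


def get_status(conn, table_name, business_date):
    """Return the stored status, or None if the table/date was never recorded."""
    row = conn.execute(
        "SELECT status FROM TableIngestStatus"
        " WHERE table_name = ? AND business_date = ?",
        (table_name, business_date),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    store = open_status_store()
    set_status(store, "trades", "2014-11-11", "LOADING")
    set_status(store, "trades", "2014-11-11", "LOADED")
    print(get_status(store, "trades", "2014-11-11"))
```

Single-row reads and writes like these suit a row-oriented transactional database, which is the workload the slide reports as a poor fit for Amazon Redshift.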

Page 20: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

• Direct Connect (private lines)
• VPC
• Encryption in flight (HTTPS/SSL/TLS on API, JDBC)
– Parameter Group: require_ssl = true
– Use Amazon Redshift cluster SSL certificate to verify cluster identity
• Encryption at rest
– AES-256 encrypt files prior to loading to S3 (not using S3 SSE)
– Amazon Redshift encryption
  • Specified at cluster creation, applies to backups/snapshots too
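
The two in-flight encryption settings above can be sketched roughly as below. This is an assumption-laden illustration rather than Nasdaq's configuration: the host name, database, user, and certificate path are placeholders, and the helper names are hypothetical. The parameter list is in the shape accepted by the `Parameters` argument of boto3's Redshift `modify_cluster_parameter_group` call, and the connection string uses the standard libpq options `sslmode` and `sslrootcert`.

```python
def require_ssl_parameters():
    """Parameter-group payload that turns on require_ssl for a cluster.

    It would be passed as Parameters=... to the Redshift
    modify_cluster_parameter_group API; no AWS call is made here.
    """
    return [{"ParameterName": "require_ssl", "ParameterValue": "true"}]


def build_verified_dsn(host, dbname, user, ca_cert_path, port=5439):
    """Build a libpq-style connection string that verifies the cluster identity.

    sslmode=verify-full asks the client to validate the server certificate
    against ca_cert_path and to check that it matches the host name.
    Assumes none of the values contain spaces (libpq would need quoting).
    """
    parts = {
        "host": host,
        "port": str(port),
        "dbname": dbname,
        "user": user,
        "sslmode": "verify-full",
        "sslrootcert": ca_cert_path,
    }
    return " ".join(f"{key}={value}" for key, value in parts.items())


if __name__ == "__main__":
    # All values below are placeholders.
    print(require_ssl_parameters())
    print(build_verified_dsn(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        dbname="warehouse",
        user="loader",
        ca_cert_path="/etc/ssl/redshift-ca.pem",
    ))
```

Setting `require_ssl` on the server side rejects unencrypted clients, while `verify-full` on the client side guards against connecting to an impostor endpoint; the slide lists both.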

Page 21: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

• Amazon Redshift will store the cluster key in a single customer-premises HSM (or CloudHSM)
– SafeNet Luna SA HSM, firmware version should match CloudHSM
– Requires certificate exchange between cluster and HSM
– Requires the cluster to have an EIP
  • On our side, required static 1-to-1 NAT of HSM private IP
  • VPC Security Groups still apply; can still isolate cluster from others
– Encrypted database key decrypted in HSM, passed over encrypted channel to cluster on startup, stored in memory to decrypt data encryption (block) keys
– If running an HSM HA group, must synchronize keys after creation

Page 22: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

• HSM integration was critical to Nasdaq adoption
• Monitor cluster access, react to any unauthorized connections
– STL_CONNECTION_LOG
  • Query system table on a timed basis, alert on any unexpected access
– CloudTrail to Splunk Amazon Redshift connection & user logs
  • Captures all API calls, not activity inside Amazon Redshift
– STL_DDLTEXT
  • Audits all schema changes in the cluster
• In response to an alert, Amazon Redshift/HSM connectivity is severed, and cluster is immediately shut down
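
The timed check against STL_CONNECTION_LOG described above might look something like the sketch below. It is not the monitoring code from the talk: the allow-lists, the `unexpected_connections` helper, and the sample rows are hypothetical, and fetching the rows (with whatever database driver is in use) is left out so the filtering logic can run on its own. The column names `event`, `recordtime`, `remotehost`, and `username` are columns of the system table.

```python
from collections import namedtuple

# Query to run on a schedule; %s is a placeholder for the last check time.
CONNECTION_LOG_SQL = """
SELECT event, recordtime, remotehost, username
FROM stl_connection_log
WHERE recordtime > %s
ORDER BY recordtime
"""

ConnectionEvent = namedtuple(
    "ConnectionEvent", "event recordtime remotehost username"
)


def unexpected_connections(rows, allowed_hosts, allowed_users):
    """Return the log rows whose host or user is not on the allow-lists.

    System-table character columns may be space padded, so values are
    stripped before comparison. Rows with an empty user name (for example
    a session that has not authenticated yet) are checked by host only.
    """
    flagged = []
    for row in rows:
        evt = ConnectionEvent(*row)
        host = evt.remotehost.strip()
        user = evt.username.strip()
        bad_host = host not in allowed_hosts
        bad_user = bool(user) and user not in allowed_users
        if bad_host or bad_user:
            flagged.append(evt)
    return flagged


if __name__ == "__main__":
    sample = [
        ("authenticated", "2014-11-12 10:00:00", "10.0.0.5   ", "loader  "),
        ("authenticated", "2014-11-12 10:05:00", "203.0.113.9", "loader  "),
    ]
    for evt in unexpected_connections(sample, {"10.0.0.5"}, {"loader"}):
        print("ALERT:", evt.remotehost.strip(), evt.username.strip())
```

Anything returned by the filter would feed the alerting path; the response described on the slide (severing HSM connectivity and shutting the cluster down) is outside this sketch.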

Page 23: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

• With validation, data integrity, and security requirements met, the challenge remains to optimize ingest
• Why?
– Concurrency is a huge performance factor; can’t afford to be loading yesterday’s data when clients are running queries

Page 25: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

[Chart: S3 (over HTTPS) Multithreaded Throughput. X-axis: Concurrent Threads (1 to 18). Y-axis: Throughput (MB/sec), 0 to 140. Plotted values not recoverable.]
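
A measurement like the one charted above, throughput as a function of concurrent upload threads, could be gathered with a harness along these lines. This is only a sketch: `fake_upload` is a stand-in that sleeps instead of talking to S3, the payload sizes are arbitrary, and the numbers it prints say nothing about real S3 performance. A real test would pass in an upload function that sends each payload over HTTPS.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(upload, payloads, threads):
    """Upload every payload using `threads` workers and return MB/sec."""
    total_bytes = sum(len(p) for p in payloads)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() drains the iterator, so we wait for all uploads and
        # re-raise any exception thrown inside a worker.
        list(pool.map(upload, payloads))
    elapsed = time.perf_counter() - start
    return total_bytes / (1024 * 1024) / elapsed


def sweep(upload, payloads, thread_counts):
    """Measure throughput once per thread count; returns {threads: MB/sec}."""
    return {n: measure_throughput(upload, payloads, n) for n in thread_counts}


def fake_upload(payload):
    """Hypothetical stand-in for an S3 PUT: just simulates latency."""
    time.sleep(0.01)


if __name__ == "__main__":
    files = [b"x" * (256 * 1024)] * 16  # sixteen 256 KiB dummy payloads
    for n, mb_per_sec in sweep(fake_upload, files, [1, 2, 4, 8]).items():
        print(f"{n:2d} threads: {mb_per_sec:.1f} MB/sec")
```

Threads suit this workload because the uploads are I/O bound; sweeping the thread count shows where adding workers stops helping.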

Page 27: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

[Architecture diagram. Zones: On premises; AWS Regional (Multi-AZ) Scope; AWS (US-East, primary AZ/VPC). Components: RMS Input Sources (multiple systems); Data Ingest Process; MySQL; HSM Key Appliance Cluster; S3; Redshift Load files/Manifests; Redshift Snapshots/Backups; Amazon SNS; Data Loaded Topic; Redshift Database Cluster]

Page 35: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

November 12, 2014 | Las Vegas, NV

BDT206

See How Amazon Redshift is Powering Business Intelligence in the Enterprise

Kevin Diamond, Nordstromrack.com | HauteLook

Page 46: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

Amazon Redshift

Page 49: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

[Diagram labels: Staging, Prod, EMR, Data Pipeline (shown twice)]

Page 50: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

Staging Prod

Page 53: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

[Comparison slide; the names of the three options are not recoverable]
• medium speed, medium storage, $3.7k/month, awesome support
• small storage, $3.7k/month, awesome support
• medium concurrency, $10k/month, awesome support

Page 57: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

Total Storage

Daily Transfer

Monthly Growth

Monthly Spend

Estimated 3yr Savings

Page 59: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

http://bit.ly/awsevals