(BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

Transcript
Page 1: (BDT206) See How Amazon Redshift is Powering Business Intelligence in the Enterprise | AWS re:Invent 2014

November 12, 2014 | Las Vegas, NV

BDT206

See How Amazon Redshift is Powering Business Intelligence in the Enterprise

Rahul Pathak, Amazon Redshift

Jason Timmes, Nasdaq

Kevin Diamond, HauteLook


[Diagram: AWS big data services across the Collect, Store, and Analyze stages. Labels: AWS Direct Connect, Amazon Kinesis, Amazon S3, Amazon DynamoDB, Amazon Glacier, AWS Data Pipeline, Amazon Redshift, Amazon Elastic MapReduce, Amazon EC2]


[Diagram: Amazon Redshift architecture. Labels: JDBC/ODBC, 10 GigE (HPC), Ingestion, Backup, Restore]


[Diagram: Amazon Redshift architecture with VPC boundaries. Labels: JDBC/ODBC, Customer VPC, Internal VPC, 10 GigE (HPC), Ingestion, Backup, Restore]


[Diagram labels: Data Source, ET, Direct Connect, Client, Forwarder, Loader, State Management, Sandbox, Amazon Redshift, S3]



LEADING INDEX PROVIDER WITH 41,000+ INDEXES ACROSS ASSET CLASSES AND GEOGRAPHIES

Over 10,000 Corporate Clients in 60 countries

Our technology powers over 70 MARKETPLACES, regulators, CSDs and clearinghouses in over 50 COUNTRIES

100+ DATA PRODUCT OFFERINGS supporting 2.5+ million investment professionals and users IN 98 COUNTRIES

26 Markets

3 Clearing Houses

5 Central Securities Depositories

Lists more than 3,500 companies in 35 countries, representing more than $8.8 trillion in total market value


Our warehouse can be used to analyze market share and client activity, support surveillance, power our billing, and more…


• Pay close attention to manifest mandatory flag!
  – Amazon Redshift UNLOAD always sets this to false!!!
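One way to act on this warning, sketched in Python: when the loader builds its own COPY manifest, it can mark every file as mandatory so that a missing file should fail the load rather than be skipped silently. The `entries`, `url`, and `mandatory` keys follow the Amazon Redshift COPY manifest format; the bucket name, file keys, and function names are hypothetical and are not taken from the talk.

```python
import json


def build_copy_manifest(s3_urls, mandatory=True):
    """Return COPY manifest JSON text with one entry per file to load."""
    entries = [{"url": url, "mandatory": mandatory} for url in s3_urls]
    return json.dumps({"entries": entries}, indent=2)


def non_mandatory_entries(manifest_text):
    """Return the URLs whose 'mandatory' flag is missing or false."""
    manifest = json.loads(manifest_text)
    return [e["url"] for e in manifest["entries"] if not e.get("mandatory", False)]


# Hypothetical bucket and keys, for illustration only.
urls = [
    "s3://example-bucket/trades/part-0000.gz",
    "s3://example-bucket/trades/part-0001.gz",
]
manifest_text = build_copy_manifest(urls)
print(manifest_text)
```

A check like `non_mandatory_entries` could also be run against a manifest written by UNLOAD before that manifest is reused for a COPY.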


• TableIngestStatus
  – We originally put this table in Amazon Redshift itself
  – Turns out Amazon Redshift is not efficient on really small data sets
  – Significantly impacted performance, and increased concurrency contention
• Solution: Moved TableIngestStatus to a separate transactional RDBMS (MySQL)
  – We were already using a MySQL instance to persist workflow states
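A minimal sketch of keeping ingest state in a small transactional database rather than in the warehouse. The talk does not show the TableIngestStatus schema, so the columns here (`table_name`, `load_date`, `status`) are assumptions, and Python's built-in `sqlite3` stands in for MySQL so the example runs anywhere.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL instance mentioned above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE TableIngestStatus (
        table_name TEXT NOT NULL,  -- hypothetical column
        load_date  TEXT NOT NULL,  -- hypothetical column
        status     TEXT NOT NULL,  -- hypothetical column
        PRIMARY KEY (table_name, load_date)
    )
    """
)


def set_status(conn, table_name, load_date, status):
    """Insert or overwrite the status row for one table and load date."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO TableIngestStatus "
            "(table_name, load_date, status) VALUES (?, ?, ?)",
            (table_name, load_date, status),
        )


def get_status(conn, table_name, load_date):
    """Return the stored status, or None if no row exists."""
    row = conn.execute(
        "SELECT status FROM TableIngestStatus "
        "WHERE table_name = ? AND load_date = ?",
        (table_name, load_date),
    ).fetchone()
    return row[0] if row else None


set_status(conn, "trades", "2014-11-11", "LOADING")
set_status(conn, "trades", "2014-11-11", "LOADED")
print(get_status(conn, "trades", "2014-11-11"))
```

`INSERT OR REPLACE` is SQLite syntax; on MySQL the equivalent upsert would be `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`.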


• Direct Connect (private lines)
• VPC
• Encryption in flight (HTTPS/SSL/TLS on API, JDBC)
  – Parameter Group: require_ssl = true
  – Use Amazon Redshift cluster SSL certificate to verify cluster identity
• Encryption at rest
  – AES-256 encrypt files prior to loading to S3 (not using S3 SSE)
  – Amazon Redshift encryption
    • Specified at cluster creation, applies to backups/snapshots too
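The client side of the in-flight encryption bullets can be sketched as follows. The talk refers to JDBC; this example instead assumes a PostgreSQL-compatible, libpq-based client, where `sslmode=verify-full` with `sslrootcert` makes the client require TLS and check the server certificate against the supplied CA file. The host name, database, user, and certificate path are hypothetical.

```python
def build_redshift_dsn(host, dbname, user, ca_cert_path, port=5439):
    """Build a libpq-style connection string that requires verified TLS.

    Values containing spaces would need quoting; this sketch assumes none do.
    """
    params = {
        "host": host,
        "port": port,  # 5439 is the default Amazon Redshift port
        "dbname": dbname,
        "user": user,
        "sslmode": "verify-full",     # require TLS and verify the server host name
        "sslrootcert": ca_cert_path,  # CA bundle used to verify the cluster certificate
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


# Hypothetical endpoint and certificate path, for illustration only.
dsn = build_redshift_dsn(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    dbname="warehouse",
    user="loader",
    ca_cert_path="/etc/ssl/redshift-ca-bundle.crt",
)
print(dsn)
```

With `require_ssl = true` set in the cluster parameter group, unencrypted connections should be rejected on the server side as well, so the two settings complement each other.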


• Amazon Redshift will store the cluster key in a single customer premise HSM (or CloudHSM)
  – SafeNet Luna SA HSM, firmware version should match CloudHSM
  – Requires certificate exchange between cluster and HSM
  – Requires cluster have an EIP
    • On our side, required static 1-to-1 NAT of HSM private IP
    • VPC Security Groups still apply; can still isolate cluster from others
  – Encrypted database key decrypted in HSM, passed over encrypted channel to cluster on startup, stored in memory to decrypt data encryption (block) keys
  – If running an HSM HA group, must synchronize keys after creation


• HSM integration was critical to Nasdaq adoption
• Monitor cluster access, react to any unauthorized connections
  – STL_CONNECTION_LOG
    • Query system table on a timed basis, alert to any unexpected access
  – CloudTrail to Splunk Amazon Redshift connection & user logs
    • Captures all API calls, not activity inside Amazon Redshift
  – STL_DDLTEXT
    • Audits all schema changes in the cluster
• In response to an alert, Amazon Redshift/HSM connectivity is severed, and cluster is immediately shut down
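The timed check against STL_CONNECTION_LOG might look something like the sketch below. The query uses columns that exist in that system table (`recordtime`, `event`, `remotehost`, `username`, `dbname`); the allowlists, sample rows, and function name are hypothetical, and the scheduling and alerting pieces are left out.

```python
# Query a scheduler could run periodically; %s is a placeholder for the
# timestamp of the previous check (parameter style depends on the driver).
CONNECTION_LOG_QUERY = """
SELECT recordtime, event, remotehost, username, dbname
FROM stl_connection_log
WHERE recordtime > %s
ORDER BY recordtime
"""


def unexpected_connections(rows, allowed_hosts, allowed_users):
    """Return the rows whose remote host or user is not on the allowlists."""
    flagged = []
    for row in rows:
        # STL character columns are space-padded, so trim before comparing.
        host = row["remotehost"].strip()
        user = row["username"].strip()
        if host not in allowed_hosts or user not in allowed_users:
            flagged.append(row)
    return flagged


# Hypothetical rows shaped like the query result, for illustration only.
sample_rows = [
    {"remotehost": "10.0.1.15   ", "username": "loader   ", "event": "authenticated"},
    {"remotehost": "203.0.113.9 ", "username": "loader   ", "event": "authenticated"},
]
alerts = unexpected_connections(sample_rows, {"10.0.1.15"}, {"loader"})
print(len(alerts))
```

Any flagged row would then feed whatever alerting path triggers the shutdown response described above.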


• With validation, data integrity, and security requirements met, the challenge remains to optimize ingest
• Why?
  – Concurrency is a huge performance factor; can’t afford to be loading yesterday’s data when clients are running queries


[Chart: S3 (over HTTPS) Multithreaded Throughput. X-axis: Concurrent Threads (1 to 18). Y-axis: Throughput (MB/sec), scale up to 140]
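A rough sketch of the multithreaded upload pattern the chart measures, using a thread pool from the Python standard library. The actual S3 client call is not shown in the talk, so `upload_one` is a hypothetical callable supplied by the caller, and the default thread count is an arbitrary placeholder that would need to be tuned by measurement.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_all(paths, upload_one, max_threads=8):
    """Call upload_one(path) for each path on a thread pool.

    Results are returned in the same order as the input paths. Any
    exception raised by upload_one propagates to the caller.
    """
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(upload_one, paths))


def fake_upload(path):
    """Stand-in for a real S3 upload call (hypothetical)."""
    return (path, "ok")


results = upload_all(["part-0000.gz", "part-0001.gz"], fake_upload, max_threads=2)
print(results)
```

Threads suit this workload because each upload spends most of its time waiting on the network rather than on the CPU.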


[Diagram: data ingest architecture across three zones: On premises; AWS Regional (Multi-AZ) Scope; AWS (US-East, primary AZ/VPC). Labels: RMS Input Sources (multiple systems), Data Ingest Process, MySQL, HSM Key Appliance Cluster, S3, Redshift Load files/Manifests, Redshift Snapshots/Backups, Amazon SNS, Data Loaded Topic, Redshift Database Cluster]


November 12, 2014 | Las Vegas, NV

BDT206

See How Amazon Redshift is Powering Business Intelligence in the Enterprise

Kevin Diamond, Nordstromrack.com | HauteLook


Amazon Redshift


[Diagram labels: Staging, Prod, EMR, Data Pipeline, Data Pipeline]


Staging Prod


medium speed, medium storage, $3.7k/month, awesome support

small storage, $3.7k/month, awesome support

medium concurrency, $10k/month, awesome support


Total Storage

Daily Transfer

Monthly Growth

Monthly Spend

Estimated 3yr Savings


http://bit.ly/awsevals