(FIN401) Seismic Shift: Nasdaq's Migration to Amazon Redshift | AWS re:Invent 2014
We make the
world’s capital markets
move faster, more efficient, more transparent
Public company
in S&P 500
Develop and run markets globally in
all asset classes
We provide technology, trading,
intelligence and listing services
Intense Operational Focus
on Efficiency and Competitiveness
We provide the infrastructure, tools and strategic
insight to help our customers navigate the complexity of global capital markets and realize their capital ambitions.
Get to know us
We have uniquely transformed our business from predominantly a U.S. equities exchange to a
global provider of corporate, trading, technology and information solutions.
Leading index provider with 41,000+ indexes across asset classes and geographies
Over 10,000 corporate clients in 60 countries
Our technology powers over 70 marketplaces, regulators, CSDs and clearinghouses in over 50 countries
100+ data product offerings supporting 2.5+ million investment professionals and users in 98 countries
26 markets, 3 clearing houses, 5 central securities depositories
Lists more than 3,500 companies in 35 countries, representing more than $8.8 trillion in total market value
Our warehouse can be used to
analyze market share, client
activity, surveillance, power our
billing, and more…
• Idempotence: a quality of an action such that repetitions of the
action have no further effect on the outcome
– In other words, f(x) = f(f(x)) = f(f(f(x))), etc.
• Ingest process is designed as a workflow engine
with each step in each workflow being idempotent.
• Failures are easily recovered by repeating the failed
step after resolving the root cause of any failure.
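The recovery model above can be illustrated with a minimal sketch. The step names, the `done` set, and the simulated failure are all hypothetical; the slides do not show Nasdaq's actual workflow engine.

```python
from typing import Callable, Dict, List, Set, Tuple

Step = Tuple[str, Callable[[Dict[str, int]], None]]


def run_workflow(steps: List[Step], state: Dict[str, int], done: Set[str]) -> None:
    """Run steps in order, skipping any step already recorded in `done`."""
    for name, action in steps:
        if name in done:
            continue  # repeating a completed step has no further effect
        action(state)  # may raise; the step is then not marked as done
        done.add(name)


attempts = {"load_table": 0}


def stage_files(state: Dict[str, int]) -> None:
    state["staged_files"] = 3  # assignment, not increment: safe to repeat


def load_table(state: Dict[str, int]) -> None:
    attempts["load_table"] += 1
    if attempts["load_table"] == 1:
        raise RuntimeError("simulated transient failure")
    state["loaded_rows"] = 1000


steps: List[Step] = [("stage_files", stage_files), ("load_table", load_table)]
state: Dict[str, int] = {}
done: Set[str] = set()

try:
    run_workflow(steps, state, done)
except RuntimeError:
    pass  # resolve the root cause, then repeat the workflow

run_workflow(steps, state, done)  # resumes at the failed step
run_workflow(steps, state, done)  # a further repeat changes nothing
```

Because completed steps are skipped and each step assigns rather than accumulates, rerunning the whole workflow after a failure is equivalent to rerunning only the failed step.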
• Use a manifest file inside a transaction with a table
lock, and keep a record of completed ingests
• If the S3 COPY (insert) fails, rollback the transaction
• If the insert succeeds, write a record of the
completed ingest, and commit the transaction
• Idempotence: start transaction, lock destination
table, check for prior successful ingest, and only
start insert if data hasn’t already been loaded today
• Pay close attention to the mandatory flag!
• Redshift UNLOAD always sets this to false!!!
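As an illustration of the mandatory flag, the sketch below builds a COPY manifest with every entry marked mandatory, and rewrites an existing manifest (for example one produced by UNLOAD) so that the flag is set. The function names and the part-file names are hypothetical; only the `entries`/`url`/`mandatory` layout follows the Redshift manifest format.

```python
import json
from typing import Iterable


def build_manifest(urls: Iterable[str], mandatory: bool = True) -> str:
    """Return manifest JSON listing each load file with an explicit flag."""
    entries = [{"url": url, "mandatory": mandatory} for url in urls]
    return json.dumps({"entries": entries}, indent=2)


def force_mandatory(manifest_text: str) -> str:
    """Set mandatory=true on every entry of an existing manifest."""
    manifest = json.loads(manifest_text)
    for entry in manifest["entries"]:
        entry["mandatory"] = True
    return json.dumps(manifest, indent=2)


# Hypothetical part files under the bucket used elsewhere in these slides.
manifest_text = build_manifest([
    "s3://my_ingest/2014-09-17/nbbo.part0.gz",
    "s3://my_ingest/2014-09-17/nbbo.part1.gz",
])
```

With the flag set to true, COPY fails if a listed file is missing instead of silently loading a partial data set.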
• TableIngestStatus
– We originally put this table in Redshift itself
– Turns out Redshift is not efficient on really small data sets
– Significantly impacted performance, and increased concurrency
contention
• Solution: Moved TableIngestStatus to a separate
transactional RDBMS (MySQL)
– We were already using a MySQL instance to persist workflow
states
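A rough sketch of an ingest-status check kept in a separate transactional database follows. It uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table layout, column names, and `load` callable are assumptions for illustration. It does not address atomicity across the status database and Redshift.

```python
import sqlite3
from typing import Callable


def open_status_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the status database."""
    db = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
    db.execute(
        "CREATE TABLE table_ingest_status ("
        "table_name TEXT, ingest_date TEXT, "
        "PRIMARY KEY (table_name, ingest_date))"
    )
    return db


def ingest_once(db: sqlite3.Connection, table: str, day: str,
                load: Callable[[], None]) -> bool:
    """Run `load` only if no completed ingest is recorded for (table, day)."""
    db.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        row = db.execute(
            "SELECT 1 FROM table_ingest_status "
            "WHERE table_name = ? AND ingest_date = ?",
            (table, day),
        ).fetchone()
        if row is not None:
            db.execute("ROLLBACK")
            return False  # already loaded: repeating has no effect
        load()  # e.g. issue the Redshift COPY; raises on failure
        db.execute(
            "INSERT INTO table_ingest_status (table_name, ingest_date) "
            "VALUES (?, ?)",
            (table, day),
        )
        db.execute("COMMIT")
        return True
    except Exception:
        db.execute("ROLLBACK")  # no completion record is kept for a failed load
        raise
```

Calling `ingest_once` twice for the same table and date runs the load once; a failed load leaves no record, so the step can simply be retried.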
• Multiple layers of security
– Direct Connect (private lines)
– VPC
– HTTPS/SSL/TLS (Encryption in flight)
– AES-256 (Encryption at rest in S3)
– Redshift encryption (Encryption at rest in Redshift)
– HSM integration (Redshift master key managed on premise)
– CloudTrail/STL_CONNECTION_LOG to monitor for unauthorized
DB connections
• Direct Connect
– No company data travels over internet circuits
• VPC
– Isolate our Redshift servers from other tenants/internet connectivity
– Security Groups restrict inbound/outbound connectivity
• All AWS API calls are made over HTTPS
• All Redshift JDBC connections must use SSL/TLS
– Parameter Group: require_ssl = true
– Use Redshift cluster SSL certificate to verify cluster identity
• See http://docs.aws.amazon.com/redshift/latest/mgmt/connecting-ssl-support.html for details
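The slides describe JDBC clients. As a loosely analogous sketch for a Python client, the function below assembles a libpq-style connection string that requires TLS and verifies the server certificate. This is a swapped-in technique, not the JDBC configuration from the talk. The function name, host, user, and certificate path are placeholders; `sslmode` and `sslrootcert` are standard libpq keywords.

```python
def redshift_dsn(host: str, dbname: str, user: str,
                 ca_bundle_path: str, port: int = 5439) -> str:
    """Build a libpq-style DSN that enforces TLS with certificate verification."""
    params = {
        "host": host,
        "port": str(port),
        "dbname": dbname,
        "user": user,
        "sslmode": "verify-full",       # encrypt and verify the server identity
        "sslrootcert": ca_bundle_path,  # CA certificate used for verification
    }
    for key, value in params.items():
        if any(ch.isspace() for ch in value) or not value:
            # Quoting rules are out of scope for this sketch.
            raise ValueError(f"unsupported value for {key!r}")
    return " ".join(f"{key}={value}" for key, value in params.items())


dsn = redshift_dsn("example-cluster.example.com", "dev", "loader",
                   "/path/to/redshift-ca.pem")
```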
• All Redshift load files staged in S3 are AES-256
encrypted (client side, not S3 SSE)
– Key is provided to Redshift in the S3 COPY command:
copy nbbo from 's3://my_ingest/2014-09-17/nbbo.manifest'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>;master_symmetric_key=<master_key>'
manifest encrypted gzip;
• Enable cluster encryption on Redshift
– Only specified during cluster creation, cannot be changed
– Applies to backups/snapshots as well
– Performance penalty, but not optional for Nasdaq
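The encrypted COPY statement in this section could be generated per table and per day along the following lines. This is a sketch: the function name and the `<bucket>/<date>/<table>.manifest` layout are inferred from the single example in the slides, and real code should take credentials from a secure store rather than from literals.

```python
import re


def build_copy_sql(table: str, bucket: str, day: str, access_key_id: str,
                   secret_access_key: str, master_key: str) -> str:
    """Compose a COPY statement for a client-side encrypted, gzipped manifest load."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError("unexpected table name")  # guard against SQL injection
    manifest_url = f"s3://{bucket}/{day}/{table}.manifest"
    # The credentials string must stay on one line inside the quotes.
    credentials = (
        f"aws_access_key_id={access_key_id};"
        f"aws_secret_access_key={secret_access_key};"
        f"master_symmetric_key={master_key}"
    )
    return (
        f"copy {table} from '{manifest_url}'\n"
        f"credentials '{credentials}'\n"
        "manifest encrypted gzip;"
    )


sql = build_copy_sql("nbbo", "my_ingest", "2014-09-17",
                     "<access-key-id>", "<secret-access-key>", "<master_key>")
```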
• Redshift will store the cluster key in a single
customer premise HSM (or CloudHSM)
– SafeNet Luna SA HSM, firmware version should match CloudHSM
– Requires certificate exchange between cluster and HSM
– Requires cluster have an EIP
• On our side, required static 1-to-1 NAT of HSM private IP
• VPC Security Groups still apply; can still isolate cluster from others
– Encrypted database key decrypted in HSM, passed over encrypted
channel to cluster on startup, stored in memory to decrypt data
encryption (block) keys
– If running an HSM HA group, must synchronize keys after creation
• HSM integration was critical to Nasdaq adoption
• Monitor cluster access, react to any unauthorized
connections
– STL_CONNECTION_LOG
• Query system table on a timed basis, alert to any unexpected access
– CloudTrail to Splunk Redshift connection & user logs
• Captures all API calls, not activity inside Redshift
– STL_DDLTEXT
• Audits all schema changes in the cluster
• In response to an alert, Redshift/HSM connectivity is
severed, and cluster is immediately shut down
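A timed check of STL_CONNECTION_LOG might look like the sketch below. The allow-lists, the row type, and the function name are hypothetical, and the `%s` placeholder style depends on the database driver in use. The query selects only columns that exist in STL_CONNECTION_LOG.

```python
from typing import Iterable, List, NamedTuple, Set

# Parameter placeholder style (%s) is driver-specific; adjust as needed.
CONNECTION_LOG_QUERY = """
select recordtime, remotehost, username
from stl_connection_log
where recordtime > %s
order by recordtime
"""


class ConnectionRow(NamedTuple):
    recordtime: str
    remotehost: str
    username: str


def unexpected_connections(rows: Iterable[ConnectionRow],
                           allowed_hosts: Set[str],
                           allowed_users: Set[str]) -> List[ConnectionRow]:
    """Return rows whose host or user is not on the allow-lists."""
    flagged = []
    for row in rows:
        # Strip defensively in case char columns come back space-padded.
        host = row.remotehost.strip()
        user = row.username.strip()
        if host not in allowed_hosts or user not in allowed_users:
            flagged.append(row)
    return flagged


sample = [
    ConnectionRow("2014-09-17 06:00:01", "10.0.0.5   ", "loader   "),
    ConnectionRow("2014-09-17 06:05:42", "203.0.113.9", "loader"),
]
alerts = unexpected_connections(sample, {"10.0.0.5"}, {"loader"})
```

Any non-empty result would feed the alerting path described above.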
• With validation, data integrity, and security
requirements met, the challenge remains to
optimize ingest
• Why?
– Concurrency is a huge performance factor; can’t afford to be
loading yesterday’s data when clients are running queries
[Chart: S3 (over HTTPS) Multithreaded Throughput. X-axis: Concurrent Threads (1 to 18); Y-axis: Throughput (MB/sec), 0 to 140]
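The chart plots S3 throughput against the number of concurrent threads. A minimal sketch of fanning uploads out across a thread pool follows; the `upload` callable is a hypothetical stand-in for whatever S3 client call is used, and the default thread count is arbitrary rather than a value taken from the chart.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def upload_all(paths: List[str], upload: Callable[[str], int],
               threads: int = 8) -> Dict[str, int]:
    """Upload files concurrently; return bytes sent per path.

    `upload` takes a local path and returns the number of bytes it sent.
    Any exception raised by an upload propagates to the caller.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        sizes = list(pool.map(upload, paths))  # results keep input order
    return dict(zip(paths, sizes))


def fake_upload(path: str) -> int:
    """Stand-in uploader for demonstration: pretends to send len(path) bytes."""
    return len(path)


result = upload_all(["nbbo.part0.gz", "nbbo.part1.gz"], fake_upload, threads=2)
```

The thread count would be tuned by measurement, since the best value depends on file sizes, network capacity, and the host.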
[Diagram: ingest architecture. Scopes: On premise; AWS Regional (Multi-AZ) Scope; AWS (US-East, primary AZ/VPC). Components: RMS Input Sources (multiple systems); Data Ingest Process; MySQL; HSM Key Appliance Cluster; S3 (Redshift Load files/Manifests, Redshift Snapshots/Backups); SNS (Data Loaded Topic); Redshift Database Cluster]