HauteLook + Redshift

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

HauteLook + Redshift: A Case Study
Kevin Diamond, HauteLook
November 15, 2013

Description

AWS re:Invent 2013 presentation about HauteLook's initial use of the Amazon Redshift data warehouse platform.

Transcript of HauteLook + Redshift

HauteLook + Redshift: A Case Study

Who Am I?
- Kevin Diamond
- CTO of HauteLook, a Nordstrom Company
- Oversee all technology, infrastructure, data, engineering, etc.
- Major focus on a great customer experience and the analytics to provide it

What Is HauteLook?
- Private-sale, members-only, limited-time sale events
- Premium fashion and lifestyle brands at exclusive prices of 50-75% off
- Over 20 new sale events begin each morning at 8 a.m. PST
- Over 14 million members
- Acquired by Nordstrom in 2011

Why a Data Warehouse?
- Centralized storage of multiple data sources
- A single, consistent source of reporting for all departments
- A data model that supports analytics, not transactions
- Operational reports vs. analytical reports
- Real-time vs. previous day

Why Redshift?
- Looked at some competitors:
  - Ranged from $ to $$$
  - All required software, implementation, and BIG hardware
- Skipped the RFP
- Jumped into the public beta of Redshift and never looked back

How We Implemented Redshift
- ETL from MySQL and MSSQL into AWS across a Direct Connect line, storing on S3
- Also used S3 to dump flat files (iTunes Connect data, web analytics dumps, log files, etc.)
- Used Data Pipeline to execute Sqoop and Hadoop running on EC2 to load data into Redshift
- Redshift data model based on a star schema, which looks something like this:

[Example of a star schema]
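The star schema mentioned above can be sketched as follows. This is a minimal illustration using SQLite as a local stand-in for Redshift, and the table and column names (fact_orders, dim_member, dim_event) are hypothetical, not HauteLook's actual model:

```python
import sqlite3

# Minimal star-schema sketch: one fact table joined to two dimension tables.
# SQLite stands in for Redshift here; table/column names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE dim_member (member_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_event  (event_id  INTEGER PRIMARY KEY, brand  TEXT);
CREATE TABLE fact_orders (
    order_id  INTEGER PRIMARY KEY,
    member_id INTEGER REFERENCES dim_member(member_id),
    event_id  INTEGER REFERENCES dim_event(event_id),
    amount    REAL
);

INSERT INTO dim_member VALUES (1, 'West'), (2, 'East');
INSERT INTO dim_event  VALUES (10, 'BrandA'), (11, 'BrandB');
INSERT INTO fact_orders VALUES
    (100, 1, 10, 50.0),
    (101, 1, 11, 25.0),
    (102, 2, 10, 75.0);
""")

# A typical analytical query: revenue by region and brand,
# the kind of report a star schema is built for.
rows = cur.execute("""
    SELECT m.region, e.brand, SUM(f.amount) AS revenue
    FROM fact_orders f
    JOIN dim_member m ON f.member_id = m.member_id
    JOIN dim_event  e ON f.event_id  = e.event_id
    GROUP BY m.region, e.brand
    ORDER BY m.region, e.brand
""").fetchall()
print(rows)
# → [('East', 'BrandA', 75.0), ('West', 'BrandA', 50.0), ('West', 'BrandB', 25.0)]
```

The point of the layout is that every analytical question becomes a join from the central fact table out to the dimensions, which is the access pattern Redshift's columnar storage handles well.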

Usage with Business Intelligence
- Already selected a BI tool
- Had difficulty deploying it in the cloud
  - But it worked great on-premises
- Easily tied into Redshift using ODBC drivers
  - BUT metadata for reports had to live in MSSQL
- Ported many SSIS/SSRS reports over
  - But only the analytical reports!

And It All Looks Like This

[Architecture diagram]

Redshift Instances
- We use a little under 2 TB
- Thought to use 2 BIG 8XL instances to get great performance (in passive failover mode)
  - Cost us $$$
- Then we tested using 6 XL instances in a cluster
  - Performed better, allowing for more query concurrency in all but a handful of cases that really needed the 8XL power
  - Cost us $
- Duh! That's why we distribute everything else!

Some First-Hand Experience
- ETL was the hardest part
- Redshift performs awesomely
- Someone needs to make a great client SQL tool
- MicroStrategy works great on it (just wish it loved running in EC2)
- Saving a ton thanks to:
  - No hardware costs
  - No maintenance/overhead (rack + power)
  - Annual costs are equivalent to just the annual maintenance of some of the cheaper on-premise DW options

Conclusion / Last Advice
- Only use 8XL instances if you need more than 2 TB of space; otherwise distribute across a bunch of XL nodes
- Buy reserved instances (we still need to do this!), since you will likely have this always on
- Although we haven't yet, the idea of a flexible scale-up/scale-down DW is crazy awesome; maybe during the holidays we will
- We probably could have used Elastic MapReduce instead of self-managed Hadoop; we just didn't, since we weren't sure how it would play with Sqoop
- Almost all BI tools play with Redshift now, so choose what is right for your business, and make sure it works in EC2 before just putting it there
- Communication between AWS and your data center is easy and fast, but I recommend a Direct Connect
- Passed our rigorous information security standards, but we used it in a VPC

DAT205
Please give us your feedback on this presentation. As a thank you, we will select prize winners daily for completed surveys!
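The sizing advice above (many XL nodes for concurrency, 8XL only when the data outgrows them) can be sketched as a back-of-the-envelope helper. The per-node capacities and node ceiling below are assumptions for illustration (roughly matching the original dw1-era node types), not official figures; check current Redshift node specs before sizing a real cluster:

```python
import math

# Back-of-the-envelope cluster sizing, following the talk's advice:
# prefer many small XL nodes for query concurrency, and move to 8XL
# only when the data set is too big for an XL cluster.
# All capacities below are ASSUMED values for illustration.
XL_CAPACITY_TB = 2.0    # assumed storage per XL node
XL8_CAPACITY_TB = 16.0  # assumed storage per 8XL node
MAX_XL_NODES = 32       # assumed practical ceiling for an XL cluster

def suggest_cluster(data_tb: float) -> tuple[str, int]:
    """Return a (node_type, node_count) suggestion for a data size in TB."""
    xl_nodes = max(1, math.ceil(data_tb / XL_CAPACITY_TB))
    if xl_nodes <= MAX_XL_NODES:
        return ("xl", xl_nodes)
    # Multi-node 8XL cluster once XLs can no longer hold the data.
    return ("8xl", max(2, math.ceil(data_tb / XL8_CAPACITY_TB)))

# HauteLook's "a little under 2 TB" fits on a single XL by capacity alone;
# they ran 6 XL nodes for query concurrency, not for storage.
print(suggest_cluster(1.8))  # → ('xl', 1)
```

Capacity is only the floor: as the talk notes, HauteLook deliberately ran more XL nodes than storage required because the extra nodes improved query concurrency.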