Strata+Hadoop World NY 2016 - Avinash Ramineni

Strata Hadoop World | New York City | September 29th, 2016 Choice Hotels’ journey to better understand its customers through self-service analytics Narasimhan Sampath & Avinash Ramineni

Transcript of Strata+Hadoop World NY 2016 - Avinash Ramineni


Agenda

• Who is Choice Hotels

• Platform Architecture

• Implementation

• Value Add


Who is Choice Hotels?


Region                      Hotels open   Hotels under development   Rooms open & under dev.
United States & Caribbean         5,276                        606                   446,813
Canada                              323                         45                    30,135
South America                        64                          7                     9,737
Asia Pacific                        315                         25                    23,289
Europe                              402                         31                    50,388
Mexico                               28                          4                     3,219
Central America                      14                          0                     1,468
Middle East                           1                          2                       564

How About a Technology Company?


Evolution of Guest Experience


Project Goals


• Business Drivers
  − Self-service reporting and analytics
  − Requirements for near real-time analytics
  − Simplify governance, compliance and auditing
  − Better support for new applications

• Technical Drivers
  − Unable to handle volume, velocity, and veracity
  − Retire legacy systems
  − Difficult to find skill set (Informix 4GL)
  − Simplify technology stack

Key Design Tenets

• Separation of Compute and Storage
  − Independently scale compute and storage

• Data Democratization and Governance

• Bring your own Compute (BYOC)

• Lift and Shift between cloud provider(s) and On-premise

• HA / DR

• Open Source Stack


Separation of Compute and Storage

• Scale storage and compute independently (up or down)

• Shifts bottleneck from Disk IO to Network

• Centralized Data Storage
  − Write once & read everywhere
  − Data Democratization

• Easier Hardware upgrade paths

• Flexible Architecture

[Diagram labels: Storage, Servers]

BYOC (Bring Your Own Cluster)

• Eliminates the need for very large clusters

• Easier to administer and maintain

• Reduces multi-tenancy issues

• Clusters can be upgraded independently

• Enables on-demand computing

• Lower costs

[Diagram labels: Marketing Cluster, Personalization Cluster, Main Cluster, Centralized Storage]

Platform Architecture


Platform Architecture – Data Ingestion Layer

• DB Ingestor

• Stream Ingestor
  − Kafka and Spark Streaming

• File Ingestor

• FTP / SFTP / Logs

• Ingestion using Service API

Platform Architecture – Data Processing Layer

• Storage layer carved into logical buckets
  − Landing, Raw, Derived and Delivery
  − Schema stored with data (no guesswork)

• Platform Jobs for
  − Converting text to Parquet
  − Saving streaming data as Parquet
  − Derivatives
  − Compaction
  − Standardization
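The four logical buckets named above can be pictured as prefixes under one storage root. The following is a minimal sketch, not the presenters' implementation: the layer names come from the slide, while the root URI, the dataset name, the function names, and the assumption that data is promoted in the order landing, raw, derived, delivery are all hypothetical.

```python
# Layer names are taken from the slide; treating their order as a
# promotion pipeline is an assumption made for this sketch.
LAYERS = ("landing", "raw", "derived", "delivery")


def layer_path(root, layer, dataset):
    """Return the storage prefix for `dataset` inside one logical bucket."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    # Plain string joining keeps the '//' in URIs such as 's3a://...'.
    return f"{root.rstrip('/')}/{layer}/{dataset}"


def next_layer(layer):
    """Return the layer that follows `layer`, or None for the last one."""
    index = LAYERS.index(layer)
    return LAYERS[index + 1] if index + 1 < len(LAYERS) else None


if __name__ == "__main__":
    # Hypothetical bucket and dataset names.
    print(layer_path("s3a://example-bucket", "landing", "reservations"))
    print(next_layer("landing"))
```

A platform job such as text-to-Parquet conversion would then read from one prefix and write to the next, so every job agrees on where a dataset lives at each stage.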


Platform Architecture – Data Delivery Layer

• Data Delivery
  − SQL - Spark Thrift Server / Impala
  − Tableau, SQL IDE, Applications
  − SparkR

• Self Service
  − Derivatives
  − Represented via SQL on Delivery Layer
  − Stored in Derived Storage Layer
  − Metadata driven

• Derived Layer Generators
  − Long running Spark Job
  − Derivative Refresh
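One way to read "metadata driven" derivatives with a refresh step is a small registry that a long-running job polls. The sketch below is an illustration under assumptions, not the talk's actual design: the `Derivative` fields, the refresh-interval idea, and the injected `run_sql` callable (which in a Spark job might be `spark.sql`) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Derivative:
    """Metadata for one self-service derivative (hypothetical fields)."""
    name: str
    sql: str                      # query over the delivery layer
    refresh_every: timedelta      # how stale the result may become
    last_refreshed: Optional[datetime] = None


def due_for_refresh(derivatives, now):
    """Return derivatives never refreshed or older than their interval."""
    return [
        d for d in derivatives
        if d.last_refreshed is None or now - d.last_refreshed >= d.refresh_every
    ]


def refresh(derivative, run_sql, now):
    """Run the derivative's SQL through `run_sql` and record the time."""
    run_sql(derivative.sql)
    derivative.last_refreshed = now


if __name__ == "__main__":
    now = datetime(2016, 9, 29, 12, 0)
    registry = [Derivative("daily_bookings", "SELECT 1", timedelta(hours=1))]
    for item in due_for_refresh(registry, now):
        refresh(item, print, now)  # print stands in for a SQL engine
```

Keeping the SQL and the schedule in metadata means a new derivative is a new registry row rather than a new job.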


Implementation

• CDH Cloud readiness
  − Cloudera Director Limitations
  − Multi-Availability zone, regions

• Spark Thrift Server
  − Support
  − Performance Tuning
  − Concurrency, partition strategy
  − Cache Tables

• Security
  − Sentry Integration
  − Kerberos Ticket Renewal
  − Navigator Integration


Implementation

• Rapidly Changing Technology
  − Feature addition
  − Documentation
  − Bugs
  − Jar hell

• Compression Codec for Parquet

• S3 Eventual Consistency

• Small files
  − Performance Issues
  − Compaction
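The small-files problem and its compaction remedy can be sketched as a planning step that groups small files into batches to be rewritten as one larger file each. This is a simple greedy illustration, not the platform's actual compaction job; the function name, the file names, and the target size are assumptions.

```python
def plan_compaction(file_sizes, target_bytes):
    """Group small files into batches of at most `target_bytes` in total.

    `file_sizes` maps a file path to its size in bytes. Files already at
    or above the target are left alone, and batches that would contain a
    single file are dropped because there is nothing to merge.
    """
    small = sorted(
        (size, path) for path, size in file_sizes.items() if size < target_bytes
    )
    batches, current, current_size = [], [], 0
    for size, path in small:
        # Start a new batch when adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return [batch for batch in batches if len(batch) > 1]


if __name__ == "__main__":
    # Hypothetical sizes in MB with a 64 MB target.
    sizes = {"a": 10, "b": 20, "c": 30, "d": 200, "e": 45}
    print(plan_compaction(sizes, 64))
```

A greedy pass like this is not an optimal packing, but it is cheap to run on every cycle; each returned batch would be read and rewritten as a single file by the compaction job.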


Implementation

• Partition Strategy
  − Parquet Files
  − Balancing parallelism and throughput
  − Table Partitions

• Cluster sizing, optimization and tuning

• Integrating with Corporate infrastructure
  − Deployment practices
  − Monitoring and Alerting
  − Information Security Policies
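The partition-strategy trade-off on this slide (enough output files for parallelism, but not so many that each is tiny) can be made concrete with a little arithmetic. The sketch below is illustrative only: the target file size, the function names, and the example paths are assumptions, and the `key=value` directory naming is the common Hive-style convention rather than something the slides specify.

```python
import math


def output_partitions(total_bytes, target_file_bytes, min_partitions=1):
    """Estimate how many output files to write for a dataset.

    Fewer, larger files favor scan throughput; `min_partitions` keeps a
    floor so small datasets can still be processed in parallel.
    """
    if total_bytes < 0 or target_file_bytes <= 0:
        raise ValueError("sizes must be non-negative and target positive")
    return max(min_partitions, math.ceil(total_bytes / target_file_bytes))


def partition_dir(table_root, keys):
    """Build a Hive-style partition directory from ordered (key, value) pairs."""
    suffix = "/".join(f"{key}={value}" for key, value in keys)
    return f"{table_root.rstrip('/')}/{suffix}"


if __name__ == "__main__":
    gib, mib = 1024 ** 3, 1024 ** 2
    # Hypothetical: 10 GiB of data, aiming for files of about 256 MiB.
    print(output_partitions(10 * gib, 256 * mib))
    print(partition_dir("s3a://example-bucket/raw/stays",
                        [("year", "2016"), ("month", "09")]))
```

In a Spark job the computed count could be passed to `DataFrame.repartition` before writing, though the right target size depends on the cluster and the query engine.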


Value Add

Enabling predictive analytics and real-time decisions

Integrated Scorecards – Daily / Weekly / Monthly Insights

Near Real Time / Hourly / Daily Insights

Multivariate Testing, APT (Test vs. Control Analysis), and Text Analytics

Testing for Both Hotel and Customer / Research For Guest Insights

Personalized Display Ad Serving Real-time Actions (Machine Learning) Across Guest Touch Points

Hotel Lifecycle Data Real-time Alerts for Hotel Related Actions


CLAIRVOYANT (clairvoyantsoft.com)

Background

• One of the fastest growing big data companies

• Extensive experience in providing strategic and architectural consulting on Big Data platforms and implementations

• Global delivery experience across multiple locations in US, Asia and Latin America

• 100+ big data experts worldwide - US, Latin America and Asia

Awards & Recognition

Questions


Principal @ Clairvoyant
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/avinashramineni