
Page 1: AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and an Amazon Simple Storage Service Data Lake for Strategic Advantage in Real Estate (MAC302)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Daryan Dehghanpisheh, SVP Digital Strategy, The Howard Hughes Corporation

Mick Bass, CEO, 47Lining

November 2016

MAC302

Leveraging Amazon Machine Learning, Amazon Redshift, and an Amazon S3 Data Lake for Strategic Advantage in Real Estate

Page 2:

Speakers

Daryan Dehghanpisheh, SVP, Digital Strategy, The Howard Hughes Corporation

Mick Bass, CEO, 47Lining

Page 3:

What to Expect from the Session

• How to use machine learning to improve business results

• How to architect a data lake atop S3 that fuses on-premises, 3rd party and public data sets

• How to commission Lakeshore Analytics in Amazon Redshift

• Strategies for development of summaries and aggregates for Amazon Machine Learning

• Training and running Amazon Machine Learning to attain predictive accuracy

Page 4:

Project Alamo

Trump’s Data Lake

220M+ Voter Profiles

100K Hyper Targeted Ads

$200M+ Donations

< 120 Days

Page 5:

Challenge

Page 6:

About

• Seaport District: New York, NY

• Ward Village: Honolulu, HI

• The Woodlands: The Woodlands, TX

• Downtown Summerlin: Summerlin, NV

• Summerlin: Summerlin, NV

• Downtown Columbia: Columbia, MD

Page 7:

About Real Estate

• Capital intensive

• Lots of human touch

• Micro/macro exposure

• Long product cycles

• Commodity offering

• Fragmented market

Page 8:

“Big Data” problems we want to solve.

• Can we more accurately predict trends?

• Can we better forecast product demand?

• Can we speed up our sales cycles & time to money?

• Can we more accurately assess value & price?

• Can we use non-traditional data to find causality & correlation?

Page 9:

The Team’s Task: Design a scalable solution that’s cost effective.

1. Combine large public & private data sources.

2. Simply perform lots of complex joins.

3. Improve the company’s data hygiene.

4. Enrich pricing & valuation models.

5. Build proprietary models & sources.

Page 10:

And… do all of this without adding labor costs or exploding our infrastructure & license costs.

Page 11:

The big mental shifts…

Talent in the group is a profit center, not a cost center.

Our data is a key asset, worth a true dollar figure.

func(digital != “IT”);

Page 12:

Case study: Propensity to Buy Luxury Property

Predictive Analytics Example

Page 13:

Luxury Leads Business Requirements:

The test:

Can we accurately identify potential buyers using data?

The conditions:

• New luxury product in an untested market.

• Need new leads beyond in-bound requests.

• Must drive down our cost per lead.

• Build a machine to provide continuous insight.

Page 14:

Luxury Leads High-Level Process

[Process diagram. Recoverable labels: Sourced Data (Transactions A, Transactions B); Combined Data Sources; “Union View”; Generate Clustering & ML Features; Data Augmented Features / Signatures; Clusters, Segments, Personas; Refined Engagement Mechanisms (US); Refined Engagement Mechanisms (Target); Machine Learning Propensity Predictor; Lead Scoring (Historic TAM); Leads Database; Proactive Call List; Target Market; Whole US Market]
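The “Union View” label on this slide suggests that records from several transaction sources are normalized into one shape before features are generated. The slides do not show how this was done (in the architecture described it would more likely be a view in Amazon Redshift), so the following is only a minimal Python sketch of the idea. The field names (`purchaser_name`, `price_usd`, `closed_on`, and so on) and the two mappings are hypothetical.

```python
# Hypothetical column mappings for two transaction sources.
# The real source schemas are not given in the slides.
SOURCE_A_MAP = {"buyer": "buyer", "amount": "amount", "date": "date"}
SOURCE_B_MAP = {"purchaser_name": "buyer", "price_usd": "amount", "closed_on": "date"}


def normalize(rows, mapping, source):
    """Rename each row's fields to the shared schema and tag its origin."""
    out = []
    for row in rows:
        record = {target: row[src] for src, target in mapping.items()}
        record["source"] = source
        out.append(record)
    return out


def union_view(transactions_a, transactions_b):
    """Combine both sources into one list, ordered by date and then buyer."""
    combined = (normalize(transactions_a, SOURCE_A_MAP, "A")
                + normalize(transactions_b, SOURCE_B_MAP, "B"))
    return sorted(combined, key=lambda r: (r["date"], r["buyer"]))
```

Tagging each record with its source keeps lineage visible after the union, which helps when one source later turns out to need cleaning.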

Page 15:

Solution

Page 16:

What is a Data Lake?

A “Data Lake” is a repository that holds raw data in its native format until it is needed by downstream analytics processes.

Page 17:

Why is S3 a Natural Fit for Data Lakes?

No need to build a complicated stack

Simplicity = Freedom

• Inherent redundancy at low cost

• Massively parallel IO

• Separate storage from compute

• Integration with Amazon Redshift, EMR, etc.

Page 18:

Can I Just Put Data in S3 and Call it a Lake?

Unmanaged Lake = Swamp

• How do you find things?

• Does everyone just have access to everything?

• Does all data stay there forever?

Page 19:

What Separates a Data Lake from a Data Swamp?

• Intelligent use of storage conventions in S3

• Fine-grained permissions to contribute, discover, transform and consume data

• Data ingest standards

• Defining and enforcing data governance processes

• Metadata to enable search, discovery
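To make the storage-convention and metadata points above concrete, here is a minimal Python sketch of one possible S3 key layout. The talk does not specify the conventions actually used, so the zone names, the `year=/month=/day=` partition style, and the metadata fields are all assumptions chosen for illustration.

```python
from datetime import date

# Hypothetical zone prefixes; the actual bucket layout is not given in the slides.
ZONES = ("submissions", "content", "work-in-progress")


def build_key(zone: str, dataset: str, day: date, filename: str) -> str:
    """Build a predictable object key: zone/dataset/date partition/file."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return (f"{zone}/{dataset}/year={day.year:04d}/"
            f"month={day.month:02d}/day={day.day:02d}/{filename}")


def build_metadata(dataset: str, owner: str, day: date) -> dict:
    """Minimal descriptive record that a search index could store per object."""
    return {"dataset": dataset, "owner": owner, "submitted": day.isoformat()}
```

For example, `build_key("submissions", "transactions_a", date(2016, 11, 30), "part-0001.csv")` yields `submissions/transactions_a/year=2016/month=11/day=30/part-0001.csv`. A fixed prefix order like this lets permissions and lifecycle rules be attached per zone or per dataset.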

Page 20:

Data Lake Reference Architecture

[Architecture diagram. Recoverable labels, grouped approximately:

• Data Contributors: Owned | On-prem; 3rd Party; Partners | Vendors; Customers

• Data Lake Governors (Governance, Entitlements)

• Data Consumers: B2E | B2B | B2C; Direct Users; Business Processes; External Systems; BI Tools; Data Ecosystem API Users

• Managed Enterprise Data Lake: Raw Submissions (untransformed; batch | stream); Managed Datasets (data managed by lake, supporting schema-on-read; usage data | metadata); Published Data (indexed, consumable via HA Data Lake API); Indices, History

• Functions: Contribute; Manage; Consume; Search

• Rules, Policies & Entitlements: Contribute | Manage | Transform | Access

• Rule-driven incremental loads, transforms, cataloging/indexing, publishing

• Identity & Security: AWS Directory Service; AWS IAM (roles, perms); AWS KMS; single sign-on; unified policy-based entitlements

• Indexing & Search: Amazon DynamoDB (discovery views); Amazon CloudSearch (facets | indices | views)

• Ingest Workers, Loaders: S3 | Submissions; Amazon Kinesis | Submissions; Amazon S3 | Content; S3 | Work in Progress; SQS queue; Lambda; worker tier

• Data Lake API and Data Lake UI: Data Lake Web UIs; Elastic Beanstalk; RDS (UI, app & API state); Amazon DynamoDB (HA published results)

• Agile Lakeshore Analytics: RStudio; Amazon Machine Learning; Hadoop/Spark (on-demand Elastic MapReduce, Qubole); Amazon Redshift (on-demand warehouses); BI/Visualization (Tableau Server, Amazon QuickSight)

• Data Mgmt & Orchestration: AWS Data Pipeline; Airflow

• Monitoring: Amazon CloudWatch; CloudCheckr; AWS CloudTrail; DataDog]

Page 21:

Luxury Leads High-Level Process

High Level Data Flow & Ops Model

Data lake governors

• Define datasets managed within data lake

• Define submission mechanisms for each dataset

• Manage submission & access entitlements

• Govern costs associated with datasets & Lakeshore Analytics

• Work with business owners to define required Lakeshore Analytics

Data contributors

• Submit data using defined submission mechanisms

Lakeshore Analytics

• Consume datasets from data lake

• Use analysis WIP

• Manufacture published results

[Data flow diagram with numbered callouts 1 to 4. Recoverable labels: Data contributors; Defined submission mechanisms; Submission; Dataset; Sourced data; Owned data; “Union View”; Analysis WIP; Amazon Kinesis; S3; EMR; Amazon Redshift; Generate clustering & ML features]

Page 22:

Luxury Leads High-Level Process

Generate Clustering & ML Features

Leads Cluster Analysis

[Process diagram. Recoverable labels: “Union View” (Sourced Data, Owned Data); Feature Definitions; Extracted Features; Clustering Dimensions; Distance Heuristic; R Cluster Analysis (hierarchical | model-based); Profiling Dimensions; Segment Analysis]

N distinct buyer personas emerged
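The slide names hierarchical cluster analysis in R as the technique behind the buyer personas. The following is a small Python sketch of the same general idea (agglomerative clustering with average linkage and Euclidean distance), not the team's actual code. The distance heuristic they used is not described, and the feature vectors in the usage note are made up.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clusters(points, k):
    """Repeatedly merge the two closest clusters until only k remain.

    Returns a list of clusters, each a sorted list of point indices.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]
```

With `points = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]`, calling `hierarchical_clusters(points, 3)` groups the two nearby pairs and leaves the outlier alone. This brute-force version is only suitable for small inputs, and in practice the features would be scaled first so that no single dimension dominates the distance.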

Page 23:

Luxury Leads High-Level Process

Buyer Personas for Marketing

Descriptive analytics on buyer personas = ability to refine engagement models

[Chart label: Cluster 9]

Page 24:

Luxury Leads High-Level Process

Lead Scoring

1. Train the Model

Sources for Qualified Candidates

• US Real Estate Activity: all buyers & sellers, all transaction types, past 30 years

• Per-Candidate Statistics: number & size of purchases/sales, locations, …

ML Training Inputs (per candidate, “rewound” history)

• Transaction History

• Buy/Sell Quantities

• Property Types

• Time

• 3rd Party Data

For each candidate, model predicts: Total Amount of Future Real Estate Purchases

Percentile Rank scale (marks at 0%, 15%, 40%, 100%): Bought Nothing, Bought Little, Bought Most

Model predicts rank of candidate to +/- 20% of actual rank 70% of the time.

Training process detects complex patterns in training inputs that the model uses to make predictions. Patterns are not available externally.

Steps: Train Model, Generate Predictions
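The accuracy statement on this slide (predicted rank within +/- 20% of actual rank, 70% of the time) can be checked with a simple calculation. The sketch below is one reading of that metric, assuming "+/- 20%" means 20 percentile points; the slide does not say how the figure was computed, and the function names are hypothetical.

```python
def percentile_ranks(values):
    """Map each value to a rank from 0.0 (lowest) to 1.0 (highest).

    Tied values share the average of their positions.
    """
    n = len(values)
    if n == 0:
        return []
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 / (n - 1)
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def fraction_within(predicted, actual, tolerance=0.20):
    """Share of candidates whose predicted rank is within tolerance of the actual rank."""
    pred_ranks = percentile_ranks(predicted)
    actual_ranks = percentile_ranks(actual)
    hits = sum(1 for p, a in zip(pred_ranks, actual_ranks) if abs(p - a) <= tolerance)
    return hits / len(actual)
```

Here `predicted` would be the model's predicted purchase totals and `actual` the totals observed after the "rewound" cutoff; a result of 0.70 would correspond to the figure quoted on the slide.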

Page 25:

Luxury Leads High-Level Process (Generic)

Lead Scoring

2. Use the Model

Qualified Candidates

Current Data (per candidate)

• Transaction History

• Buy/Sell Quantities

• Property Types

• Time

• 3rd Party Data

Predicted Rank of Candidates

Percentile Rank scale (0% to 100%, Sales Focus marked at 60%): Will Buy Nothing, Will Buy Some, Will Buy Most

Current Data and Rank Predictions Re-Generated Each Night
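As an illustration of how the nightly rank predictions might feed a call list, here is a short Python sketch. It assumes that "60% Sales Focus" means the sales team concentrates on candidates at or above the 60th percentile, which the slide does not state explicitly; the function name and input shape are hypothetical.

```python
def sales_focus(predicted_ranks, cutoff=0.60):
    """Return (candidate, rank) pairs at or above the cutoff, highest rank first.

    predicted_ranks maps a candidate identifier to a predicted
    percentile rank between 0.0 and 1.0.
    """
    picked = [(name, rank) for name, rank in predicted_ranks.items() if rank >= cutoff]
    return sorted(picked, key=lambda item: item[1], reverse=True)
```

Rerunning this after each nightly refresh would keep the list aligned with the latest data.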

Page 26:

Scored Leads for Sales Team

Scored call list of real people who have bought high-end real estate – tied to 9 personas

Luxury Leads High-Level Process

Leads Cluster Analysis

Page 27:

Luxury Leads High-Level Process

Lead Scoring

3. Review / Refine Model Performance

Page 28:

Improving ML predictive accuracy… use Amazon Redshift to extract use-case-specific features

Examples:

• Aggregate computation, e.g., average consumption per month / year

• Periodic behavior frequency extraction

• Volatility analysis & extraction

• Time-series difference analysis (e.g., average time between A and B, time-adjusted values)
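The talk performs this kind of feature extraction in Amazon Redshift. To show what the aggregate and volatility examples above might compute, here is a plain-Python sketch over hypothetical `(candidate_id, month, amount)` records. The feature names are made up, and using the population standard deviation as the volatility measure is an assumption.

```python
import statistics
from collections import defaultdict


def candidate_features(transactions):
    """Compute simple per-candidate features from (candidate_id, 'YYYY-MM', amount) rows."""
    # First roll transactions up to one total per candidate per month.
    monthly = defaultdict(lambda: defaultdict(float))
    for candidate_id, month, amount in transactions:
        monthly[candidate_id][month] += amount

    features = {}
    for candidate_id, by_month in monthly.items():
        totals = list(by_month.values())
        features[candidate_id] = {
            "active_months": len(totals),
            # Average is taken over months with activity only.
            "avg_monthly_amount": statistics.mean(totals),
            # Population standard deviation as a simple volatility measure.
            "volatility": statistics.pstdev(totals),
        }
    return features
```

Each resulting dictionary would become one row of model input, so the learning step sees summarized behavior rather than raw transactions.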

Page 29:

Amazon Redshift + Amazon Machine Learning …better together

Time-series difference analysis example

[Diagram: two timelines of events A, B, C, D. Behaviors 1 spans the recent past to today; Behaviors 2 spans a long time ago to today. Chart labels comparing Behaviors 1 and Behaviors 2: Net Value; Frequency; Sell to Buy; Hold; Time. Flow labels: Amazon Redshift, Amazon ML, Amazon Redshift]
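One way to read the "Sell to Buy" and "Hold" labels is as time differences between paired events in a candidate's history. The sketch below computes the average gap in days from each event of one kind to the next event of another kind. The event names and this interpretation of the labels are assumptions, and in the talk's architecture such a calculation would run in Amazon Redshift rather than in Python.

```python
from datetime import date


def average_gap_days(events, first="sell", then="buy"):
    """Average days from each `first` event to the next `then` event.

    events is a list of (date, kind) tuples for one candidate.
    Returns None when no complete pair exists.
    """
    gaps = []
    pending = None  # date of the most recent unmatched `first` event
    for day, kind in sorted(events):
        if kind == first and pending is None:
            pending = day
        elif kind == then and pending is not None:
            gaps.append((day - pending).days)
            pending = None
    return sum(gaps) / len(gaps) if gaps else None
```

Calling it with the defaults gives a sell-to-buy gap, while `first="buy", then="sell"` gives a hold time. Computing the same value over a recent window and over the full history would produce the two behavior profiles that the diagram contrasts.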

Page 30:

Technical Benefits of Approach:

Managed services that “just work”, providing speed, agility and scale

Amazon ML delivered higher predictive accuracy for propensity to buy

Page 31:

Payoff

Page 32:

Business benefits of approach:

• Extensible.

• Adaptive.

• Open standards. Can work with lots of partners.

• On demand.

• Ever growing talent pool.

Page 33:

Robots Rock!

Page 34:

Thank you!

Page 35:

Remember to complete your evaluations!