Post on 10-Jan-2017
Daniel Aragao & Simon Hope
Daniel Aragao (@dear_dr_dan), Simon Hope (@mapbutcher)
REALESTATE.COM.AU
6B Market Cap
11M Australian Properties
55M Visits in September
4.7M App Downloads …and counting
3,500 PEOPLE
13 COUNTRIES
34 OFFICES
TECHNOLOGY &
SOCIAL JUSTICE
• In the beginning…
• Organising our Data
• Implementation approaches
• Hipster Batches
• Reactify
• Bring Your Own Data
• Finding the Data
• What we have learned so far
THIS IS WHAT THE STORY IS ABOUT
SORRY… IT’S OK TO LEAVE NOW
• Nope, we didn’t create a new Hadoop
• No hardcore Data Science
• There are some implementation details
• REA embraced the Cloud. AWS everywhere
• Under construction
IN THE BEGINNING…
ORGANISING OUR DATA
Increasingly, content is being distributed through search and social platforms... There’s less visiting of publishers as destinations.
Jeff Weiner, CEO, LinkedIn
Data sources
Data warehouse
PROBLEM…
STRATEGY…
Data Warehouse
Staging, SSIS, Dim, Fact
PROBLEM…
Star schema leaky details
No Data Warehouse
STRATEGY…
Data Warehouse Facade
???
WHAT’S IN THE BOX?
Good things come in small (packages) services
THE HIPSTER BATCH
• Small and short lived
• Decoupled via flat files on S3
• Single purpose
• Idempotent
• Polyglot
• Minimal runtime dependencies
• Discoverable
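A minimal sketch of what "single purpose" and "idempotent" could look like for one batch. Everything here is an assumption made for illustration: `InMemoryStore` is a hypothetical stand-in for an S3 bucket, and `run_batch`, the key layout, and the `suburb` column are invented names, not taken from the talk.

```python
import csv
import hashlib
import io


class InMemoryStore:
    """Hypothetical stand-in for an S3 bucket: maps object keys to bytes."""

    def __init__(self):
        self.objects = {}

    def get(self, key):
        return self.objects[key]

    def put(self, key, body):
        self.objects[key] = body

    def exists(self, key):
        return key in self.objects


def run_batch(store, input_key):
    """Single-purpose batch: count rows per suburb in one flat CSV file.

    The output key is derived from a hash of the input bytes, so running
    the batch again on the same input produces the same key and no new data.
    """
    raw = store.get(input_key)
    digest = hashlib.sha256(raw).hexdigest()[:12]
    output_key = f"suburb-counts/{digest}.csv"
    if store.exists(output_key):
        return output_key  # already processed: the rerun is a no-op

    counts = {}
    for row in csv.DictReader(io.StringIO(raw.decode("utf-8"))):
        counts[row["suburb"]] = counts.get(row["suburb"], 0) + 1

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["suburb", "listings"])
    for suburb in sorted(counts):
        writer.writerow([suburb, counts[suburb]])
    store.put(output_key, out.getvalue().encode("utf-8"))
    return output_key
```

Because the only contract between batches is the flat file and its key, the next batch in the chain could be written in any language, which is one way to read "polyglot" and "decoupled".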
A ‘TYPICAL’ IMPLEMENTATION
Hipster Batch
• SNS, SQS
• ASG, ECS, Lambda
• KMS
• Logs
• CloudWatch
• S3 buckets
• Data
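The slides name the building blocks but not how they are wired together. One common wiring, assumed here purely for illustration, is S3 publishing an object-created notification to SNS, SNS delivering to SQS, and the queue triggering a Lambda function. The sketch below only unwraps that nested message; `s3_objects_from_sqs_event` and `handler` are hypothetical names.

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_sqs_event(event):
    """Return (bucket, key) pairs from an SQS-triggered Lambda event.

    Assumes each SQS body is an SNS envelope whose "Message" field is an
    S3 event notification serialised as a JSON string.
    """
    found = []
    for record in event.get("Records", []):
        envelope = json.loads(record["body"])       # SNS envelope
        s3_event = json.loads(envelope["Message"])  # S3 notification
        for item in s3_event.get("Records", []):
            bucket = item["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 notifications.
            key = unquote_plus(item["s3"]["object"]["key"])
            found.append((bucket, key))
    return found


def handler(event, context):
    """Lambda entry point: report which flat files would be processed."""
    for bucket, key in s3_objects_from_sqs_event(event):
        print(f"would process s3://{bucket}/{key}")
```

Keeping the parsing separate from the handler means the same function could be reused if the batch ran on ECS or an ASG and polled the queue itself.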
HIPSTER BATCH DOES SCIENCE
• Behavioural models for targeted marketing
• Recommendation engine
• External channels
SCIENCE!
• Hipster Batch x 20
• Stats models
• API
• Google Now API
From legacy to reactive
REACTIFY
http://www.reactivemanifesto.org
• Manage Data flow with messages
• Protect consumers and care about isolation
• Resilience is important and Data replication is just fine
• Demand is elastic - and your components should be too
PROBLEM…
Reactify
Listings
• Data coupling
• No resilience or elasticity
• Coupling
SOLUTION…
Reactify
Listings, Reactify, Hipster Batch
• Shielded consumers
• Isolation
• Decoupled
IMPLEMENTATION…
Reactify
• Listings
• REST API
• Dynamo
• Event Maker
• Event Differ
• Kinesis
REACTIFY REST API
• Exposes current state only
• Stream of change notifications
• Hypertext Application Language - HAL
• Clear entity types
• Linking over embedding
• Cacheable and discoverable
REST API
https://feeds.listings.realestate.com.au/combined-listings/120449689
REST API
Event Maker
https://feeds.listings.realestate.com.au/combined-listings/-/changes
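To illustrate "linking over embedding", here is a small sketch of a HAL consumer that follows a link relation instead of reading an embedded copy. The `_links` and `href` names come from the HAL format itself. The payload shapes in `FAKE_API`, the use of the `item` relation, and the helper names are assumptions for the example; the slides do not show the real responses.

```python
def link_href(resource, rel):
    """Return the href of a link relation in a HAL resource."""
    link = resource["_links"][rel]
    if isinstance(link, list):  # HAL allows an array of link objects
        link = link[0]
    return link["href"]


def follow(resource, rel, fetch):
    """Fetch the resource that a link relation points at.

    `fetch` is any callable mapping an href to a parsed JSON document,
    for example a thin wrapper around an HTTP client.
    """
    return fetch(link_href(resource, rel))


# Hypothetical responses, keyed by path, standing in for the real feed.
FAKE_API = {
    "/combined-listings/-/changes": {
        "_links": {
            "self": {"href": "/combined-listings/-/changes"},
            "item": [{"href": "/combined-listings/120449689"}],
        },
    },
    "/combined-listings/120449689": {
        "_links": {"self": {"href": "/combined-listings/120449689"}},
        "id": "120449689",
    },
}

changes = FAKE_API["/combined-listings/-/changes"]
listing = follow(changes, "item", FAKE_API.__getitem__)
```

Because the change feed carries only links, each listing stays individually cacheable and the consumer always reads its current state.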
Reactify
Event Differ
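The slides do not describe how the Event Differ works. As a rough sketch, assuming it compares the last stored snapshot of a listing with its current state and emits only the differences, it might look like this. A plain dict stands in for the Dynamo table, a list stands in for the Kinesis stream, and the field names and values are made up.

```python
def diff_listing(previous, current):
    """Return change events between two snapshots of one listing."""
    if previous is None and current is None:
        return []
    if previous is None:
        return [{"type": "created", "fields": dict(current)}]
    if current is None:
        return [{"type": "deleted"}]
    changed = {
        field: {"old": previous.get(field), "new": current.get(field)}
        for field in sorted(set(previous) | set(current))
        if previous.get(field) != current.get(field)
    }
    return [{"type": "updated", "changes": changed}] if changed else []


def process(listing_id, current, snapshots, stream):
    """Diff against the stored snapshot, publish events, store new state."""
    events = diff_listing(snapshots.get(listing_id), current)
    for event in events:
        stream.append({"listing_id": listing_id, **event})
    if current is None:
        snapshots.pop(listing_id, None)
    else:
        snapshots[listing_id] = dict(current)
    return events


snapshots, stream = {}, []
process("120449689", {"price": 500000, "status": "active"}, snapshots, stream)
process("120449689", {"price": 500000, "status": "active"}, snapshots, stream)
process("120449689", {"price": 480000, "status": "active"}, snapshots, stream)
```

An unchanged listing produces no event, so downstream consumers only see messages when something actually moved.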
The octopus in the box
— Did you use that data set? — Errr… No, we have another one
BRING YOUR OWN DATA
BRING YOUR OWN DATA - BYOD
• Allow data to flow freely
• Help the business to get what they need when they need it
• Self-service
BYOD
• CSV x 5
• Smarts on datatypes
• Tableau Server
• Audit, auth, share…
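"Smarts on datatypes" is not spelled out in the slides. One plausible reading, sketched below as an assumption, is inferring a type for each CSV column before the data is loaded for self-service use. The function name, the candidate types, and the sample columns are all hypothetical.

```python
import csv
import io
from datetime import date


def _parses(value, parser):
    """True if `parser` accepts `value` without raising ValueError."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


# Tried in order, most specific first; anything else falls back to text.
CANDIDATES = [("integer", int), ("float", float), ("date", date.fromisoformat)]


def infer_column_types(csv_text):
    """Guess a type name for each column of a CSV document with a header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    types = {}
    for column in reader.fieldnames or []:
        # Ignore empty cells so a missing value does not force "text".
        values = [row[column] for row in rows if row[column] not in ("", None)]
        types[column] = "text"
        for name, parser in CANDIDATES:
            if values and all(_parses(v, parser) for v in values):
                types[column] = name
                break
    return types
```

A column is only promoted from text when every non-empty value parses, which keeps the guess conservative for messy uploads.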
These were the implementation approaches, now to…
FIND THE DATA
Meaningful, automated, and easy-to-search metadata
WE TRIED
MORE THAN DATA
Hipster Batch
• SNS, SQS
• ASG, ECS, Lambda
• KMS
• CloudWatch
• Logs
• Dataz
• Ancestry
• Metadata
METADATA PIPELINE
• Producers
• Ancestry
• REST API
• Scrapy
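The slides do not define what an "Ancestry" record contains. Assuming it captures lineage, that is, which inputs a producer read to create a given output, a sketch of recording and walking that lineage could look like the following. The record fields and function names are hypothetical.

```python
def ancestry_record(output_key, input_keys, producer):
    """Describe one data set: what produced it and from which inputs."""
    return {
        "output": output_key,
        "inputs": sorted(input_keys),
        "producer": producer,
    }


def ancestors(key, records):
    """Walk ancestry records to find every upstream data set of `key`."""
    by_output = {record["output"]: record for record in records}
    seen, stack = [], [key]
    while stack:
        record = by_output.get(stack.pop())
        if record is None:
            continue  # a source data set with no recorded ancestry
        for parent in record["inputs"]:
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen


records = [
    ancestry_record("clean/listings.csv", ["raw/listings.csv"], "cleaner"),
    ancestry_record(
        "models/scores.csv",
        ["clean/listings.csv", "raw/visits.csv"],
        "scorer",
    ),
]
upstream = ancestors("models/scores.csv", records)
```

If each producer emitted such a record alongside its output, the metadata could be collected automatically rather than documented by hand.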
WHAT WE HAVE LEARNED SO FAR
• Consumers create the last-mile data as needed
• We must work with external, independent delivery channels
• Push quality back to source/producer systems
• Data belongs to the entire organisation, not to a single team
I’ll give you my Data Warehouse when you can pry it from my cold dead hands.
THANK YOU
Daniel Aragao (@dear_dr_dan), Simon Hope (@mapbutcher)
REALESTATE.COM.AU