A/B Testing In The Real World
How to run experiments in an "offline" setting
Elena Grewal 2014-04-09
Big Data Innovation Summit
The Plan
1. "Offline" experiments: what and why?
2. Some experiment pitfalls and advice
3. Conclusions
But first, you might ask: What is Airbnb?
Airbnb is an online marketplace for accommodations
Part of the “sharing economy”
Search in San Francisco
Come Stay In My Home!
That looks like a website. What do we mean by “offline”?
Guest Journey
Host Journey
Offline Operations Departments
+ Customer Support
+ Local Operations
+ Professional Photography
+ Many others…
Customer Support
Local Ops Teams
Photography
+ 3,000 Photographers worldwide
+ Over 100k listings photographed
+ Almost 2 million professional photos
Stepping back
+ Many companies have offline operations
+ Can optimize these using experiments
Online Experiments:
We run these all the time too.
If you are curious about our online experimentation, see Jan Overgoor's tech talk:
http://nerds.airbnb.com/tech-talks/
Why Do We Need Experiments?
Before and after won’t work
• Often very little data before professional photos are added
• Seasonality and other confounding factors bias results
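To make the seasonality point concrete, here is a minimal simulation sketch (hypothetical numbers, not Airbnb data): bookings are assumed to rise purely with the season, photos are added halfway through, and the naive before/after comparison still shows a large "effect" even though the true photo effect is zero.

```python
import random

random.seed(0)

# Hypothetical setup: weekly bookings drift upward with the season,
# photos are added at week 10, and the true photo effect is zero.
TRUE_PHOTO_EFFECT = 0.0
weekly_bookings = []
for week in range(20):
    seasonal_demand = 1.0 + 0.05 * week      # demand grows into the season
    has_photos = week >= 10                  # photos added halfway through
    rate = seasonal_demand + TRUE_PHOTO_EFFECT * has_photos
    weekly_bookings.append(random.gauss(rate, 0.1))

before = sum(weekly_bookings[:10]) / 10
after = sum(weekly_bookings[10:]) / 10
# Prints roughly +0.5: seasonality, not photography, drives the "improvement".
print(f"naive before/after delta: {after - before:+.2f}")
```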
Selection bias often impacts analysis
• Listings that opt to get professional photography are not the same as listings that do not get photography
Without an experiment, we don’t know the causal effect
This is the same reason we need online experiments
[Chart: timeline from 01-01 to 03-15 marking a Product Launch and a Product Rollback]
Launch initiative: e.g. offered free professional photography
Traditional A/B Testing Online
[Diagram: traffic randomly split between Control and Treatment]
Great sources: http://mcfunley.com/design-for-continuous-experimentation and http://www.evanmiller.org/how-not-to-run-an-ab-test.html
Initial Results Look Good
[Chart: Treatment Effect for Price Filter Experiment, delta vs. days since start of experiment; Δ > 0 : "positive"]
[Chart: p-value vs. days since start of experiment; p < 0.05 : "significant"]
Actually, Neutral: statistical significance by itself does not tell the whole story
[Chart: Treatment Effect for Price Filter Experiment, delta vs. days since start of experiment; Δ = 0 : "neutral"]
[Chart: p-value vs. days since start of experiment; p = 0.4 : "noise"]
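As an illustration of the delta-plus-p-value readout in the charts above, here is a minimal two-proportion z-test sketch; the counts and the conversion-rate metric are hypothetical, not figures from the price filter experiment. A p-value around 0.4, as on this slide, means the observed delta is indistinguishable from noise. (And, as the evanmiller.org link above explains, checking the p-value every day and stopping the first time it dips below 0.05 inflates the false-positive rate.)

```python
from math import erf, sqrt

def two_proportion_ztest(conv_t, n_t, conv_c, n_c):
    """Two-sided z-test for the difference between two conversion rates."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    p_pool = (conv_t + conv_c) / (n_t + n_c)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_t - p_c, p_value

# Hypothetical snapshot: 10,000 users per group, a small lift in the treatment.
delta, p = two_proportion_ztest(conv_t=525, n_t=10_000, conv_c=500, n_c=10_000)
print(f"delta = {delta:+.2%}, p = {p:.2f}")  # about +0.25pp with p around 0.42: noise
```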
Offline Experiment Examples
Professional Photography: Let's run an experiment!
More bookings?
Beware of Cannibalization: The unit of randomization depends on the effect we want to estimate
Local Operations: Market Level Experiment
+ Smaller "long tail" markets (< 100 reviewed listings)
+ Randomize markets: 93 treatment / 92 control (see the assignment sketch below)
+ Assess impact of operational strategy on market growth
+ Statistically measure the lift due to local ops teams
+ Measuring active listings, hosts, reviewed listings, and bookings
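A minimal sketch of how such a market-level assignment could be produced; the market names, the total of 185 markets, and the fixed seed are illustrative assumptions (the deck only states 93 treatment and 92 control markets).

```python
import random

# Hypothetical list of 185 "long tail" markets (< 100 reviewed listings each).
markets = [f"market_{i:03d}" for i in range(185)]

random.seed(2014)            # fixed seed so the assignment is reproducible
random.shuffle(markets)

treatment = markets[:93]     # markets that get local ops teams
control = markets[93:]       # markets with no intervention

assert len(treatment) == 93 and len(control) == 92
```

In practice one might also stratify by region or market size before randomizing so the two groups are balanced, but that refinement is not described in the deck.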
Market Distribution U.S. & Europe
Finding: Local Ops Efforts Have Positive Impact on Growth
[Chart: active listings over time, with the Local Ops Kickoff marked; Treatment: 31% growth, Control: 17% growth]
Case Study: Campos do Jordão, BR
+ Market grew 9x
+ Over 90% of the new listings are from new users
+ Low CPA
+ Primary approach is phone sales
+ Other approaches were less successful
[Chart: Active Listing Growth, Treatment (+862%) vs. Control (+7%)]
Use qualitative research to understand what happened
Host Education: Improving listings through outreach
+ Initially not launched as an experiment and found positive impact
+ Launched as an experiment and found neutral impact
+ Don't need a market-level approach here!
Some takeaways: Use context to improve operations
+ Can investigate heterogeneity in treatment effects with higher N
+ Word of caution: can't just compare those who were reached by a call or email to the control (selection bias strikes again)
Compare entire treatment to entire control
[Diagram: compare the entire Treatment group to the entire Control group, not just the subset that was called]
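A minimal sketch of the comparison this slide warns about, using pandas with hypothetical data and column names (the deck does not specify a metric): the naive estimate compares only the hosts who were actually reached against the control group and is biased, while the intention-to-treat estimate compares the entire treatment group to the entire control group.

```python
import pandas as pd

# Hypothetical experiment log: one row per host in the experiment.
df = pd.DataFrame({
    "group":    ["treatment"] * 6 + ["control"] * 6,
    "reached":  [1, 1, 1, 0, 0, 0] + [0] * 6,           # only some treatment hosts answered
    "bookings": [5, 4, 6, 1, 2, 1] + [2, 3, 2, 1, 2, 2],
})

control_mean = df.loc[df["group"] == "control", "bookings"].mean()

# Biased: hosts who answer a call differ systematically from those who don't.
naive = df.loc[df["reached"] == 1, "bookings"].mean() - control_mean

# Intention-to-treat: entire treatment group vs. entire control group.
itt = df.loc[df["group"] == "treatment", "bookings"].mean() - control_mean

print(f"naive (reached vs. control): {naive:+.2f} bookings")
print(f"intention-to-treat:          {itt:+.2f} bookings")
```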
Additional Offline vs. Online Considerations
+ Opt-in biases
+ You know you are in an experiment (Hawthorne/John Henry effects)
+ Monetary incentives impact external validity; trade this off against take-up rate
+ Takes time to adjust to a change
+ Sample size may be limited by ops capacity
+ Stakeholders may be less data-savvy
+ Real people delivering the experiment!
+ Ethical considerations
Always partner with customer support.
Takeaways
+ Controlled experiments are the way to go if you want to make causal inference
+ Use them to optimize operations!
but:
+ Level of randomization: what impact do you want to measure?
+ Cannibalization
+ Compare the right groups: no selection bias
+ Break down results to get the most from the analysis
+ Be practical/ethical: you are dealing with real people here
Questions?
@elenatej elena.grewal@airbnb.com
we’re hiring: www.airbnb.com/jobs