Airbnb offline experiments

Post on 04-Dec-2014

description

Elena Grewal presented these slides on A/B testing in the real world (offline experiments, not online ones) at the Big Data Innovation Summit on April 9, 2014.

Transcript of Airbnb offline experiments

A/B Testing In The Real World

How to run experiments in an “offline” setting

Elena Grewal 2014-04-09

Big Data Innovation Summit

The Plan

1 “Offline” experiments: what and why?

2 Some experiment pitfalls and advice

3 Conclusions

But first, you might ask: What is Airbnb?

Airbnb is an online marketplace for accommodations

Part of the “sharing economy”

Search in San Francisco

Come Stay In My Home!

That looks like a website. What do we mean by “offline”?

Guest Journey

Host Journey

Offline Operations Departments

+ Customer Support

+ Local Operations

+ Professional Photography

+ Many others…

Customer Support

Local Ops Teams

Photography

+ 3,000 Photographers worldwide
+ Over 100k listings photographed
+ Almost 2 million professional photos

Stepping back

+ Many companies have offline operations
+ Can optimize these using experiments

Online Experiments:

We run these all the time too.

If you are curious about our online experimentation, see Jan Overgoor’s tech talk:

http://nerds.airbnb.com/tech-talks/

Why Do We Need Experiments?

Before and after won’t work

• Often very little data before professional photos are added
• Seasonality and other confounding factors bias results

Selection bias often impacts analysis

• Listings that opt to get professional photography are not the same as listings that do not get photography
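
The selection-bias point can be illustrated with a small simulation. This is only a sketch on synthetic numbers: the hidden `quality` trait, the opt-in probabilities, and the true effect of 2.0 are all assumptions made for illustration, not Airbnb data.

```python
import random


def simulate(n=20000, true_effect=2.0, seed=0):
    """Compare a naive opt-in comparison with a randomized one.

    All numbers are synthetic. `quality` is a hypothetical hidden trait
    that drives both opting in to photography and bookings.
    """
    rng = random.Random(seed)
    naive_t, naive_c, rand_t, rand_c = [], [], [], []
    for _ in range(n):
        quality = rng.gauss(0, 1)
        baseline = 10 + 3 * quality + rng.gauss(0, 1)

        # Opt-in world: higher-quality listings are more likely to opt in.
        opted_in = rng.random() < (0.8 if quality > 0 else 0.2)
        if opted_in:
            naive_t.append(baseline + true_effect)
        else:
            naive_c.append(baseline)

        # Randomized world: a coin flip decides, independent of quality.
        assigned = rng.random() < 0.5
        if assigned:
            rand_t.append(baseline + true_effect)
        else:
            rand_c.append(baseline)

    def mean(xs):
        return sum(xs) / len(xs)

    return mean(naive_t) - mean(naive_c), mean(rand_t) - mean(rand_c)


naive_estimate, randomized_estimate = simulate()
print(f"naive opt-in estimate:  {naive_estimate:.2f}")
print(f"randomized estimate:    {randomized_estimate:.2f}")
```

Under these assumptions the opt-in comparison overstates the effect, because it mixes the effect of photography with the pre-existing difference between listings that opt in and listings that do not. The randomized comparison lands close to the true effect.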

Without an experiment, we don’t know the causal effect

This is the same reason we need online experiments

[Figure: time series from 01-01 to 03-15, with “Product Launch” and “Product Rollback” marked.]

Launch initiative: e.g. Offered Free Professional Photography

Traditional A/B Testing Online

Great sources:

http://mcfunley.com/design-for-continuous-experimentation

http://www.evanmiller.org/how-not-to-run-an-ab-test.html

Control Treatment

Initial Results Look Good

[Figure: Treatment Effect for Price Filter Experiment. Upper panel: delta (-5% to 5%) by days since start of experiment (0 to 36); Δ > 0 : “positive”. Lower panel: p-value (0.00 to 0.40) by days since start of experiment; p < 0.05 : “significant”.]

Actually, Neutral

Statistical significance by itself does not tell the whole story

[Figure: Treatment Effect for Price Filter Experiment. Upper panel: delta (-5% to 5%) by days since start of experiment (0 to 36); Δ = 0 : “neutral”. Lower panel: p-value (0.00 to 0.40) by days since start of experiment; p = 0.4 : “noise”; p < 0.05 : “significant”.]
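
Why an early “significant” result can later turn neutral can be sketched with an A/A simulation: both arms share the same true conversion rate, and the p-value is checked every day. This is an illustration under assumed parameters (36 days, 100 users per arm per day, a 10% conversion rate, a two-proportion z-test); it is not the analysis behind the price filter experiment.

```python
import math
import random


def two_sided_p(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test p-value using the normal approximation."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def peeking_rates(runs=300, days=36, users_per_day=100, rate=0.1, seed=1):
    """Simulate A/A tests (no true effect) and count false positives.

    Returns the share of runs that were "significant" on at least one
    day, and the share that were "significant" on the final day.
    """
    rng = random.Random(seed)
    ever_significant = 0
    final_significant = 0
    for _ in range(runs):
        conv = [0, 0]
        n = [0, 0]
        hit = False
        p = 1.0
        for _ in range(days):
            for arm in (0, 1):
                conv[arm] += sum(rng.random() < rate for _ in range(users_per_day))
                n[arm] += users_per_day
            p = two_sided_p(conv[0], n[0], conv[1], n[1])
            hit = hit or p < 0.05
        ever_significant += hit
        final_significant += p < 0.05
    return ever_significant / runs, final_significant / runs


ever_rate, final_rate = peeking_rates()
print(f"significant on some day:   {ever_rate:.0%}")
print(f"significant on final day:  {final_rate:.0%}")
```

Even with no true difference, stopping at the first day with p < 0.05 declares a winner far more often than the nominal 5%. Deciding the run length in advance and reading the result only at the end keeps the false positive rate near the stated level.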

Offline Experiment Examples

Professional Photography

Let’s run an experiment!

More bookings?

Beware of Cannibalization

The unit of randomization depends on the effect we want to estimate

Local Operations: Market Level Experiment

+ Smaller “long tail” markets < 100 reviewed listings

Randomize Markets: 93 Treatment / 92 Control

Assess impact of operational strategy on market growth

+ Statistically measure the lift due to local ops teams
+ Measuring active listings, hosts, reviewed listings, and bookings
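
Market-level randomization can be sketched as a shuffle over whole markets, so that every listing in a market shares one arm. The market IDs, the seed, and the `assign_markets` helper below are hypothetical; only the 93 / 92 split comes from the slide.

```python
import random


def assign_markets(market_ids, n_treatment, seed=2014):
    """Randomly split whole markets into treatment and control sets.

    Randomizing at the market level (not the listing level) keeps
    spillover between listings in the same market inside one arm.
    """
    rng = random.Random(seed)
    shuffled = list(market_ids)
    rng.shuffle(shuffled)
    return set(shuffled[:n_treatment]), set(shuffled[n_treatment:])


# Hypothetical market identifiers; 185 markets in total.
markets = [f"market_{i:03d}" for i in range(185)]
treatment, control = assign_markets(markets, n_treatment=93)
print(len(treatment), "treatment markets,", len(control), "control markets")
```

Fixing the seed makes the assignment reproducible, which helps when ops teams and analysts need to agree on which markets are in which arm.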

Market Distribution U.S. & Europe

Finding: Local Ops Efforts Have Positive Impact on Growth

[Figure: Active Listings over time, with “Local Ops Kickoff” marked. Control: 17% growth. Treatment: 31% growth.]
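
One way to check whether a gap like this is more than noise is a permutation test on per-market growth rates. The sketch below uses synthetic growth rates, not Airbnb data; the 0.30 spread and the `permutation_p_value` helper are assumptions, and the slides do not say which test was actually used.

```python
import random


def permutation_p_value(treated, control, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference and the share of label shuffles
    whose absolute difference is at least as large as the observed one.
    """
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    k = len(treated)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:k]) - mean(pooled[k:])
        extreme += abs(diff) >= abs(observed)
    # The +1 terms count the observed labeling as one permutation.
    return observed, (extreme + 1) / (n_perm + 1)


# Synthetic per-market growth rates (assumed, for illustration only).
data_rng = random.Random(42)
treated_growth = [data_rng.gauss(0.31, 0.30) for _ in range(93)]
control_growth = [data_rng.gauss(0.17, 0.30) for _ in range(92)]

lift, p_value = permutation_p_value(treated_growth, control_growth)
print(f"lift in mean growth: {lift:.3f}, permutation p-value: {p_value:.4f}")
```

A permutation test is a reasonable fit when the unit is the market and there are only a couple of hundred units, since it makes no assumption about the shape of the growth distribution.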

Case Study: Campos do Jordão, BR

+ Market grew 9x
+ Over 90% of the new listings are from new users
+ Low CPA
+ Primary approach is phone sales
+ Other approaches were less successful

[Figure: Active Listing Growth, Treatment vs. Control: +862% vs. +7%.]

Use qualitative research to understand what happened

Host Education

Improving listings through outreach

+ Initially not launched as an experiment and found positive impact
+ Launched as an experiment and found neutral impact
+ Don’t need market level approach here!

Some takeaways

Use context to improve operations

+ Can investigate heterogeneity in treatment effects with higher N
+ Word of caution: can’t just compare those who were reached by a call or email to the control (selection bias strikes again)

Compare entire treatment to entire control

[Figure: Treatment group (including the “Called” subset) vs. Control group.]
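
The caution about comparing only reached hosts can be sketched with a simulation in which the call has no effect at all. Everything here is synthetic: the hidden `engagement` trait, the pick-up probabilities, and the outcome scale are assumptions for illustration.

```python
import random


def simulate_outreach(n=40000, call_effect=0.0, seed=3):
    """Compare 'entire treatment vs. control' with 'reached vs. control'.

    `engagement` is a hypothetical hidden trait: engaged hosts do better
    anyway and are also more likely to pick up the phone.
    """
    rng = random.Random(seed)
    treatment, control, reached = [], [], []
    for _ in range(n):
        engagement = rng.gauss(0, 1)
        outcome = 5 + 2 * engagement + rng.gauss(0, 1)
        if rng.random() < 0.5:  # randomized into treatment
            was_reached = rng.random() < (0.7 if engagement > 0 else 0.2)
            if was_reached:
                outcome += call_effect
                reached.append(outcome)
            treatment.append(outcome)  # everyone assigned, reached or not
        else:
            control.append(outcome)

    def mean(xs):
        return sum(xs) / len(xs)

    entire_vs_control = mean(treatment) - mean(control)
    reached_vs_control = mean(reached) - mean(control)
    return entire_vs_control, reached_vs_control


entire_estimate, reached_estimate = simulate_outreach()
print(f"entire treatment vs. control: {entire_estimate:+.2f}")
print(f"reached only vs. control:     {reached_estimate:+.2f}")
```

With a true effect of zero, the reached-only comparison still shows a sizable positive difference, because reached hosts were different to begin with. Comparing the entire treatment group to the entire control group stays close to zero.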

Additional Offline vs. Online Considerations

+ Opt-in biases
+ You know you are in an experiment (Hawthorne/John Henry effects)
+ Monetary incentives impact external validity, trade-off take-up rate
+ Takes time to adjust to a change
+ Sample size may be limited by ops capacity
+ Stakeholders may be less data-savvy
+ Real people delivering the experiment!
+ Ethical considerations

Always partner with customer support.

Takeaways

+ Controlled experiments are the way to go if you want to make causal inference
+ Use them to optimize operations!

but:

+ Level of randomization - what impact do you want to measure?
+ Cannibalization
+ Compare the right groups - no selection bias
+ Break down results to get the most from the analysis
+ Be practical/ethical - you are dealing with real people here

Questions?

@elenatej

elena.grewal@airbnb.com

we’re hiring: www.airbnb.com/jobs