Assumptions: Check yo'self before you wreck yourself
![Page 1: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/1.jpg)
Assumptions: Check yo self, before
you wreck yo self.
Erin Shellman @erinshellman Seattle Software Craftsmanship
August 28, 2014
![Page 2: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/2.jpg)
Assumptions: Making an ass out of you
and me.
![Page 3: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/3.jpg)
I’m Erin, and I’m a data scientist.
![Page 4: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/4.jpg)
How much should this cost?
![Page 5: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/5.jpg)
What about these?
![Page 6: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/6.jpg)
What about these?
…and when?
![Page 7: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/7.jpg)
Price optimization
![Page 8: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/8.jpg)
Price optimization
1. Git yer Big Data!
![Page 9: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/9.jpg)
Price optimization
1. Git yer Big Data!
2. Forecast demand
![Page 10: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/10.jpg)
Price optimization
1. Git yer Big Data!
2. Forecast demand
3. Optimize price
![Page 11: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/11.jpg)
Price optimization
1. Big Data!
2. demand
3. price
4. Profit!!!!!
![Page 12: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/12.jpg)
Price optimization
1. Git yer Big Data!
2. Forecast demand
3. Optimize price
$\max \sum \text{revenue}$
$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$
![Page 13: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/13.jpg)
The key is a good forecast.
![Page 14: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/14.jpg)
![Page 15: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/15.jpg)
Do the easiest thing
• Subset the data and focus on one category of product.
• e.g. Alpine ski bindings.
• Prototype & validate in R.
$\text{Units Sold}_i = \alpha + \beta_1(\text{price}_i) + \epsilon_i$
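A minimal sketch of that prototype in R. The data frame `slr_data`, its columns `price` and `units_sold`, and every number in it are simulated assumptions standing in for the real ski-binding sales data, which the slides do not show.

```r
# Simulated stand-in for one product category; the numbers are made up.
set.seed(42)
n <- 200
slr_data <- data.frame(price = runif(n, min = 100, max = 500))
slr_data$units_sold <- 2000 - 3 * slr_data$price + rnorm(n, sd = 150)

# Simple linear regression: units_sold_i = alpha + beta_1 * price_i + epsilon_i
slr_fit <- lm(units_sold ~ price, data = slr_data)

alpha_hat <- unname(coef(slr_fit)["(Intercept)"])
beta_hat  <- unname(coef(slr_fit)["price"])
print(c(alpha = alpha_hat, beta_1 = beta_hat))
```

The fitted slope should come out negative here because the simulated demand falls as price rises.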
![Page 16: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/16.jpg)
Do the easiest thing
• Subset the data and focus on one category of product.
• e.g. Alpine ski bindings.
• Prototype & validate in R.
$\text{Units Sold}_i = \alpha + \beta_1(\text{price}_i) + \epsilon_i$
Residual: $\epsilon_i$
![Page 17: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/17.jpg)
Assumptions of SLR
• We assume that the residuals:
1. Are normally distributed, with mean zero.
2. Are not autocorrelated.
3. Are unrelated to the predictors.
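Each of the three assumptions can be checked numerically. The sketch below uses only base R and simulated data; it swaps in the Ljung-Box test from `stats` for the autocorrelation check, which is an assumption made here to avoid extra packages and not necessarily the test used in the talk.

```r
# Simulated data; all names and numbers are illustrative.
set.seed(1)
n <- 200
price <- runif(n, 100, 500)
units_sold <- 2000 - 3 * price + rnorm(n, sd = 150)
fit <- lm(units_sold ~ price)
res <- residuals(fit)

# 1. Normality: Shapiro-Wilk test (null hypothesis: residuals are normal).
p_normal <- shapiro.test(res)$p.value

# 2. Autocorrelation: Ljung-Box test at lag 1 (null: no autocorrelation).
p_autocorr <- Box.test(res, lag = 1, type = "Ljung-Box")$p.value

# 3. Relation to the predictor: correlation between residuals and price.
r_price <- cor(res, price)

print(c(p_normal = p_normal, p_autocorr = p_autocorr, r_price = r_price))
```

For a least-squares fit with an intercept, the correlation between the residuals and a predictor that is in the model is zero by construction, up to rounding error, so the third check is only informative against variables or transformations the model did not use.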
![Page 18: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/18.jpg)
Checking assumptions is hard
• …and boring!
• For statistical methods, assumption testing traditionally relies on visually inspecting plots (and let’s be real, most people don’t even do that).
![Page 19: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/19.jpg)
[Figure: the four standard R diagnostic plots for the fitted model: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours at 0.5 and 1). Observations 194, 171 and 156 are labeled in the first three panels; 194, 171 and 109 in the leverage panel.]
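Diagnostic panels of this kind are what base R draws when `plot()` is called on a fitted `lm` object. A minimal sketch, assuming simulated data and a temporary output file (both hypothetical):

```r
# Simulated data; not the data behind the slide.
set.seed(7)
price <- runif(200, 100, 500)
units_sold <- 2000 - 3 * price + rnorm(200, sd = 150)
fit <- lm(units_sold ~ price)

# Draw the default plot.lm panels into a 2 x 2 grid in a PDF file.
out_file <- file.path(tempdir(), "slr_diagnostics.pdf")
pdf(out_file)
par(mfrow = c(2, 2))
plot(fit)
dev.off()
```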
![Page 20: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/20.jpg)
Of all the practices you can leverage to assist your craftsmanship, you will get the most benefit from testing.
Stephen Vance
![Page 21: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/21.jpg)
test_that assumption!

    context("Check assumptions of SLR")

    test_that("The residuals are normally distributed", {
      expect_that(shapiro.test(model_object$residuals)$p.value, is_more_than(0.05))
    })

    test_that("There is no autocorrelation", {
      expect_that(lmtest::bgtest(model_object)$p.value, is_more_than(0.05))
    })

    test_that("The residuals are unrelated to the predictor", {
      expect_that(cor(model_object$residuals, data$covariates), equals(0))
    })
![Page 22: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/22.jpg)
Tests pass!

    > test_file("./tests/test_slr.R")
    Check assumptions of SLR : [1] "units_sold ~ price" ...
![Page 23: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/23.jpg)
Psych.

    > test_file("./tests/test_slr.R")
    Check assumptions of SLR : [1] "units_sold ~ price" 1..

    1. Failure(@test_slr.R#12): The residuals are normally distributed ------------------------
    shapiro.test(model_object$residuals)$p.value not more than 0.05. Difference: 0.05
![Page 24: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/24.jpg)
Linear? Eh.
• We assumed the functional form was linear, but there are several common forms that might better fit the data.
[Figure: units sold (0 to 2500) plotted against price ($100 to $500).]
![Page 25: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/25.jpg)
[Figure: four panels of units sold against price ($), one per functional form: Linear, Log-log, Linear-log, Log-linear.]
![Page 26: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/26.jpg)
[Figure: four panels of units sold against price ($), captioned in panel order:]
• Linear response to change in price.
• Much more sensitive to change in price.
• More gradual response to changes in price.
• Sensitive initially, then gradual.
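Written out as equations, and assuming the +1 offsets that appear in the `candidate_models` formulas later in the talk, the four functional forms are:

```latex
\begin{aligned}
\text{Linear:}     \quad & \text{Units Sold}_i = \alpha + \beta_1\,\text{price}_i + \epsilon_i \\
\text{Log-log:}    \quad & \log(\text{Units Sold}_i + 1) = \alpha + \beta_1 \log(\text{price}_i + 1) + \epsilon_i \\
\text{Linear-log:} \quad & \text{Units Sold}_i = \alpha + \beta_1 \log(\text{price}_i + 1) + \epsilon_i \\
\text{Log-linear:} \quad & \log(\text{Units Sold}_i + 1) = \alpha + \beta_1\,\text{price}_i + \epsilon_i
\end{aligned}
```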
![Page 27: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/27.jpg)
![Page 28: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/28.jpg)
![Page 29: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/29.jpg)
![Page 30: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/30.jpg)
    library(testthat)

    # Automagically explore SLR with common functional forms
    candidate_models = list(linear = 'units_sold ~ price',
                            loglog = 'log(units_sold + 1) ~ log(price + 1)',
                            linearlog = 'units_sold ~ log(price + 1)',
                            loglinear = 'log(units_sold + 1) ~ price')

    # generate_forecast(), optimizer() and plot_price() are defined elsewhere
    run = function(candidate_models, input_data) {
      forecasts = list()
      test_input = data.frame(price = 0:1000)

      # Forecast
      for (model in candidate_models) {
        test_environment = new.env()

        # Generate the forecast
        forecasts[[model]] = generate_forecast(model, input_data)

        # Save off current value of things for testing
        assign("model", forecasts[[model]], envir = test_environment)
        assign("errors", forecasts[[model]]$residuals, envir = test_environment)
        assign("covariate", input_data$price, envir = test_environment)
        assign("label", model, envir = test_environment)

        save(test_environment, file = 'env_to_test.Rda')

        # Run assumption tests
        test_file("./tests/test_slr.R")

        #### OPTIMIZE PRICE!!! ####
        opt_results = optimizer(forecasts[[model]], test_input)

        # Multiply the predicted demand by the price for expected revenue
        # (test_input holds the price grid that was passed to the optimizer)
        opt_results$expected_revenue = test_input$price * opt_results$predicted_units_sold

        pdf(paste(model, ".pdf", sep = ''))
        plot_price(opt_results)
        dev.off()  # close the PDF device so the file is written
      }

      return(forecasts)
    }
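The slides call `generate_forecast()` and `optimizer()` without showing them. The bodies below are hypothetical guesses at what such helpers could look like, written only so the loop above can be followed end to end; the simulated data and the naive back-transform of log responses are likewise assumptions.

```r
# HYPOTHETICAL helpers: illustrative guesses, not the talk's implementation.

# Fit one candidate model given its formula as a string.
generate_forecast <- function(model, input_data) {
  lm(as.formula(model), data = input_data)
}

# Predict demand over a grid of prices; undo log(units_sold + 1) if needed.
optimizer <- function(fit, test_input) {
  predicted <- predict(fit, newdata = test_input)
  response <- deparse(formula(fit)[[2]])
  if (startsWith(response, "log(")) {
    predicted <- exp(predicted) - 1   # naive back-transform, ignores bias
  }
  data.frame(price = test_input$price,
             predicted_units_sold = pmax(predicted, 0))  # no negative demand
}

# Simulated data standing in for the real sales data.
set.seed(3)
sim <- data.frame(price = runif(300, 100, 500))
sim$units_sold <- pmax(round(2000 - 3 * sim$price + rnorm(300, sd = 150)), 0)

fit <- generate_forecast("units_sold ~ price", sim)
opt <- optimizer(fit, data.frame(price = 0:1000))
opt$expected_revenue <- opt$price * opt$predicted_units_sold
best_price <- opt$price[which.max(opt$expected_revenue)]
print(best_price)
```

Searching a fixed grid of prices keeps the sketch simple; it also means the reported optimum can never exceed the largest price in the grid.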
![Page 31: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/31.jpg)
rut roh…

    > run(candidate_models, slr_data)
    Check assumptions of SLR : [1] "units_sold ~ price" 1..

    1. Failure(@test_slr.R#12): The residuals are normally distributed ---------------------------------
    shapiro.test(linear$residuals)$p.value not more than 0.05. Difference: 0.05

    Check assumptions of SLR : [1] "log(units_sold + 1) ~ log(price + 1)" 1.2

    1. Failure(@test_slr.R#12): The residuals are normally distributed ---------------------------------
    shapiro.test(linear$residuals)$p.value not more than 0.05. Difference: 0.05

    2. Failure(@test_slr.R#24): The residuals are unrelated to the predictor ---------------------------
    cor(test_environment$errors, test_environment$covariate) not equal to 0 Mean absolute difference: 0.05545615

    Check assumptions of SLR : [1] "units_sold ~ log(price + 1)" 1.2

    1. Failure(@test_slr.R#12): The residuals are normally distributed ---------------------------------
    shapiro.test(linear$residuals)$p.value not more than 0.05. Difference: 0.05

    2. Failure(@test_slr.R#24): The residuals are unrelated to the predictor ---------------------------
    cor(test_environment$errors, test_environment$covariate) not equal to 0 Mean absolute difference: 0.04201906

    Check assumptions of SLR : [1] "log(units_sold + 1) ~ price" 1..

    1. Failure(@test_slr.R#12): The residuals are normally distributed ---------------------------------
    shapiro.test(linear$residuals)$p.value not more than 0.05. Difference: 0.05
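One possible reading of the "unrelated to the predictor" failures: they appear only for the two models whose predictor is `log(price + 1)`, while the saved `covariate` is the raw price. Least-squares residuals are uncorrelated with the predictor the model actually used, but not necessarily with an untransformed version of it. A small sketch on simulated data (an assumption, not the talk's data) shows the effect:

```r
# Simulated data; the true relationship here is linear in price.
set.seed(5)
n <- 200
price <- runif(n, 100, 500)
units_sold <- 2000 - 3 * price + rnorm(n, sd = 150)

# Linear-log form: the model's predictor is log(price + 1), not price.
fit <- lm(units_sold ~ log(price + 1))
res <- residuals(fit)

r_transformed <- cor(res, log(price + 1))  # zero up to rounding error
r_raw <- cor(res, price)                   # generally not zero

print(c(r_transformed = r_transformed, r_raw = r_raw))
```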
![Page 32: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/32.jpg)
[Figure: expected revenue against price ($0 to $1000) in four panels, one per functional form: Linear, Log-log, Linear-log, Log-linear.]
![Page 33: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/33.jpg)
[Figure: expected revenue against price ($0 to $1000) in four panels (Linear, Log-log, Linear-log, Log-linear), each annotated with an optimal price:]
Optimal Price = $322
Optimal Price > $1000
Optimal Price = $∞
Optimal Price = $779
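A short derivation shows how both finite and unbounded optima can arise. It is a sketch that ignores the noise term and the +1 offsets, and it does not claim which panel produced which number on the slide.

```latex
\begin{aligned}
\text{Linear demand:} \quad
  & R(p) = p\,(\alpha + \beta_1 p), \qquad
    R'(p) = \alpha + 2\beta_1 p = 0
    \;\Rightarrow\; p^{*} = -\frac{\alpha}{2\beta_1}
    \quad (\text{a maximum when } \beta_1 < 0) \\
\text{Log-log demand:} \quad
  & \text{Units Sold} = e^{\alpha} p^{\beta_1}
    \;\Rightarrow\; R(p) = e^{\alpha} p^{\,1 + \beta_1}
\end{aligned}
```

In the log-log case revenue keeps increasing with price whenever $\beta_1 > -1$, so there is no finite maximizer and a grid search simply returns the largest price it was given.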
![Page 34: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/34.jpg)
![Page 35: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/35.jpg)
[Figure: histogram of price ($100 to $400) against counts (0 to 40), with the mean marked at 185.]
![Page 36: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/36.jpg)
We are just getting warmed up!
In conclusion, these forecasts suck.
![Page 37: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/37.jpg)
[Figure: units sold (0 to 2000) against price ($0 to $500), one panel per ability level: Beginner-Intermediate, Intermediate-Advanced, Advanced-Expert.]
![Page 38: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/38.jpg)
[Figure: time series of units sold by date, 2011-06-01 to 2014-02-01.]
![Page 39: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/39.jpg)
[Figure: time series of units sold by date, 2011-06-01 to 2014-02-01.]
TIME?!
![Page 40: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/40.jpg)
Try something a little smarter
$\text{Units Sold}_i = \alpha + \beta_1(\text{price}_i) + \beta_2(\text{ability}_i) + \beta_3(\text{month}_i) + \epsilon_i$
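A minimal sketch of fitting this richer model in R, assuming `ability` and `month` are treated as categorical factors. The data, the winter bump, and all coefficients below are simulated assumptions for illustration.

```r
# Simulated data; every number here is made up.
set.seed(11)
n <- 600
sim <- data.frame(
  price   = runif(n, 100, 500),
  ability = factor(sample(c("Beginner-Intermediate",
                            "Intermediate-Advanced",
                            "Advanced-Expert"), n, replace = TRUE)),
  month   = factor(sample(1:12, n, replace = TRUE))
)

# Made-up seasonal bump for the winter months (11, 12, 1, 2).
winter <- sim$month %in% c("11", "12", "1", "2")
sim$units_sold <- 1500 - 2.5 * sim$price + 400 * winter + rnorm(n, sd = 120)

# Factors are expanded into dummy variables automatically by lm().
fit <- lm(units_sold ~ price + ability + month, data = sim)
n_coef <- length(coef(fit))
price_coef <- unname(coef(fit)["price"])
print(c(n_coef = n_coef, price = price_coef))
```

With factors, each level beyond the first gets its own coefficient, so the single $\beta_2$ and $\beta_3$ in the slide's equation stand for groups of coefficients in a fit like this one.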
![Page 41: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/41.jpg)
[Figure: expected revenue (0 to 15000) against price ($0 to $1000), in a grid with one column per ability level (Beginner-Intermediate, Intermediate-Advanced, Advanced-Expert) and one row per month (1 to 12).]
![Page 42: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/42.jpg)
Yeah, but who cares?
•Do we need to throw everything out just because some assumptions are invalidated?
•What is our goal?
•Is it still better than what we did previously?
![Page 43: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/43.jpg)
Wrap it up.
1. Do the easiest thing first, and do it well. It’s how you’re going to learn the domain, and it’s your benchmark for improvement.
2. Test your assumptions, and invest time in building the tools needed to do that effectively.
3. Be cool, stay in school.
![Page 44: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/44.jpg)
Thanks bros!!
Nathan Decker, Brian Pratt & the Evo crew 🎿
Jason Gowans & Bryan Mayer 👬
Elissa “Downtown” Brown, forecasting genius 💁
John Foreman, MailChimp 🐵
#nordstromdatalab 📈
![Page 45: Assumptions: Check yo'self before you wreck yourself](https://reader034.fdocuments.net/reader034/viewer/2022042713/549852afb47959384d8b53b0/html5/thumbnails/45.jpg)
Click-bait!
1. Data Carpentry: http://mimno.infosci.cornell.edu/b/articles/carpentry/
2. Getting started with testthat: http://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf
3. Clean Code: http://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882/
4. Quality Code: http://www.amazon.com/Quality-Code-Software-Principles-Practices/dp/0321832981
5. Revenue Management: http://www.amazon.com/Practice-Management-International-Operations-Research/dp/0387243763/
6. Pricing and Revenue Optimization: http://www.amazon.com/Pricing-Revenue-Optimization-Robert-Phillips-ebook/dp/B005JTDOVE/
7. Original G, Rob Hyndman: https://www.otexts.org/fpp and http://robjhyndman.com/hyndsight/