Walmart sales
-
Upload
yifan-li -
Category
Technology
-
view
125 -
download
3
description
Transcript of Walmart sales
![Page 1: Walmart sales](https://reader033.fdocuments.net/reader033/viewer/2022050804/54c64e3f4a7959ad7b8b4582/html5/thumbnails/1.jpg)
Kaggle Walmart sales prediction
Data Science General Assembly in DC 19 May 2014
Yifan Li
![Page 2: Walmart sales](https://reader033.fdocuments.net/reader033/viewer/2022050804/54c64e3f4a7959ad7b8b4582/html5/thumbnails/2.jpg)
Contents
The problem
The data
The model
The result
Conclusion
![Page 3: Walmart sales](https://reader033.fdocuments.net/reader033/viewer/2022050804/54c64e3f4a7959ad7b8b4582/html5/thumbnails/3.jpg)
The problemTo predict the Walmart sales number
Regression problem
1. Given past sales, markdown(sales) events
2. Given associated CPI, temperature, unemployment, fuel_price, store type, store size
![Page 4: Walmart sales](https://reader033.fdocuments.net/reader033/viewer/2022050804/54c64e3f4a7959ad7b8b4582/html5/thumbnails/4.jpg)
The data• The weekly sales data corresponding to store line plot histogram
![Page 5: Walmart sales](https://reader033.fdocuments.net/reader033/viewer/2022050804/54c64e3f4a7959ad7b8b4582/html5/thumbnails/5.jpg)
The data(cont.)Which feature to choose?
type vs. size
Training data Test data
![Page 6: Walmart sales](https://reader033.fdocuments.net/reader033/viewer/2022050804/54c64e3f4a7959ad7b8b4582/html5/thumbnails/6.jpg)
The data(cont.)Handling missing Markdown data
1. Fill it with zero
Handling missing CPI, temperature data
1. (CPI)Fill it with linear prediction of Temperature and Fuel Price
2. (Unemployment)Fill it with linear prediction of CPI and Temperature
![Page 7: Walmart sales](https://reader033.fdocuments.net/reader033/viewer/2022050804/54c64e3f4a7959ad7b8b4582/html5/thumbnails/7.jpg)
The model
linear regression: lm{stats}
regularization: glmnet{glmnet}
least absolute deviation: rq{quantreg}
![Page 8: Walmart sales](https://reader033.fdocuments.net/reader033/viewer/2022050804/54c64e3f4a7959ad7b8b4582/html5/thumbnails/8.jpg)
linear regression
The result
![Page 9: Walmart sales](https://reader033.fdocuments.net/reader033/viewer/2022050804/54c64e3f4a7959ad7b8b4582/html5/thumbnails/9.jpg)
regularization(result is not improved)
The result(cont.)
![Page 10: Walmart sales](https://reader033.fdocuments.net/reader033/viewer/2022050804/54c64e3f4a7959ad7b8b4582/html5/thumbnails/10.jpg)
The result(cont.)!
linear regression with weight(1 for normal week, 5 for holiday week)
![Page 11: Walmart sales](https://reader033.fdocuments.net/reader033/viewer/2022050804/54c64e3f4a7959ad7b8b4582/html5/thumbnails/11.jpg)
The result(cont.)!
least absolute deviation with weight
![Page 12: Walmart sales](https://reader033.fdocuments.net/reader033/viewer/2022050804/54c64e3f4a7959ad7b8b4582/html5/thumbnails/12.jpg)
The result(cont.)
first Kaggle experience
44 submissions(maximum 5 submissions per day)
592nd/694(18416.22852 points)
beat the all zero benchmark(647th,22265.71813 points)
![Page 13: Walmart sales](https://reader033.fdocuments.net/reader033/viewer/2022050804/54c64e3f4a7959ad7b8b4582/html5/thumbnails/13.jpg)
ConclusionLinear regression is a good starting point for feature selection(which takes a lot of time)
Using model that corresponding to the evaluation method may improve score
The best five on leaderboard uses Autoregression, Random Forest:
With limited data, more sophisticated algorithm would be beneficial