Bike Sharing Demand - Oregon State (transcript)
Features:
• Temperature
• Humidity
• Wind speed
• and so on
Target:
• Predict the total number of rentals every hour
Obviously, this is a regression problem.
Background
Website: https://www.kaggle.com
Details:
datetime - hourly date + timestamp
season - 1 = spring, 2 = summer, 3 = fall, 4 = winter
holiday - whether the day is considered a holiday
workingday - whether the day is neither a weekend nor a holiday
weather - 1: Clear, Few clouds, Partly cloudy
          2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
          3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
          4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
temp - temperature in Celsius
atemp - "feels like" temperature in Celsius
humidity - relative humidity
windspeed - wind speed
casual - number of non-registered user rentals initiated
registered - number of registered user rentals initiated
count - number of total rentals [1]
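As a quick illustration of the record layout, here is a small parsing sketch over two rows shaped like the training data (the sample rows are illustrative; note that casual + registered always equals count):

```python
# Parse two sample hourly records (column names follow the data dictionary
# above; the row values are for illustration only).
import csv
import io
from datetime import datetime

csv_text = """datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count
2011-01-01 00:00:00,1,0,0,1,9.84,14.395,81,0.0,3,13,16
2011-01-01 01:00:00,1,0,0,1,9.02,13.635,80,0.0,8,32,40
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    ts = datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M:%S")
    # sanity check: casual + registered should equal count
    assert int(row["casual"]) + int(row["registered"]) == int(row["count"])
    print(ts.hour, row["count"])
```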
Our Approaches
Step 1: Deal with the data
Convert the CSV file to an ARFF file, because our tool requires this file type.
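A minimal sketch of the CSV-to-ARFF conversion in Python (the attribute types here are an assumption: datetime is written as a string attribute, everything else as numeric):

```python
# Convert CSV text to ARFF text. ARFF is the format Weka-style tools read:
# an @relation line, one @attribute line per column, then @data rows.
import csv
import io

def csv_to_arff(csv_text: str, relation: str = "bike_sharing") -> str:
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    for name in header:
        # assumption: datetime is a string, all other columns are numeric
        atype = "string" if name == "datetime" else "numeric"
        lines.append(f"@attribute {name} {atype}")
    lines += ["", "@data"]
    for row in data:
        # quote values containing spaces (e.g. the timestamp)
        lines.append(",".join(f"'{v}'" if " " in v else v for v in row))
    return "\n".join(lines)

sample = "datetime,temp,count\n2011-01-01 00:00:00,9.84,16\n"
print(csv_to_arff(sample))
```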
Step 2: Try several algorithms

Algorithm              Score
Linear Regression      1.09191 (ranks 1200th of around 1500 teams)
SMOreg                 0.99771
Multilayer Perceptron  0.94332
Comments:
SMOreg: SVM for regression
Multilayer Perceptron: neural network
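The slides use Weka's implementations; as a rough sketch, the three baselines have scikit-learn analogues (LinearRegression, SVR standing in for SMOreg, MLPRegressor for the Multilayer Perceptron). The data below is synthetic, only to show the shape of the pipeline:

```python
# Rough scikit-learn analogues of the three Weka baselines, fit on
# synthetic data (three features standing in for temp/humidity/windspeed).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 50 * X[:, 0] - 20 * X[:, 1] + rng.normal(0, 2, 200)

models = {
    "Linear Regression": LinearRegression(),
    "SVR (SMOreg analogue)": SVR(),
    "MLP (Multilayer Perceptron)": MLPRegressor(hidden_layer_sizes=(16,),
                                                max_iter=2000, random_state=0),
}
for name, model in models.items():
    model.fit(X, y)
    print(f"{name}: train R^2 = {model.score(X, y):.3f}")
```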
Improvement
The first improvement: review the data. Negative predictions are unreasonable, since rental counts cannot be negative.
Improved with this transformation:
For the training set: y' = log(y + 1)
For the test set: y = exp(y') - 1
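The transformation above is exactly NumPy's log1p/expm1 pair. A plausible reason it helps (an assumption, not stated in the slides) is that the leaderboard metric is logarithmic, so training on log(y + 1) aligns the loss with the score, and large counts no longer dominate the error:

```python
# Train on y' = log(y + 1), invert predictions with y = exp(y') - 1.
import numpy as np

y = np.array([0, 1, 16, 40, 977])   # example hourly counts
y_log = np.log1p(y)                 # y' = log(y + 1)
y_back = np.expm1(y_log)            # y  = exp(y') - 1

assert np.allclose(y_back, y)       # the transform round-trips exactly
print(y_log)                        # large counts are compressed on this scale
```

Note that expm1 is bounded below by -1, so after inverting, any remaining small negatives can simply be clipped to 0.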
Algorithm                    Score
Linear Regression            1.09191
SMOreg                       0.99771
Multilayer Perceptron        0.94332
SMOreg (log)                 0.67238
Linear Regression (log)      0.64689
Multilayer Perceptron (log)  0.50732 (ranks 700th of 1500 teams)
After the first improvement
That’s not good enough for us!
The Second Improvement:
After observing the training set, we found that few people rent bikes at midnight or in bad weather. So we divide the data into subsets:
For classification: decision tree
For regression: regression tree
Meanwhile, we use bagging or boosting.
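A minimal sketch of the tree-plus-bagging idea, using scikit-learn's BaggingRegressor (whose default base estimator is a decision tree) as a stand-in for Weka's Bagging + REPTree. The data is synthetic and only mimics the midnight/bad-weather pattern noted above:

```python
# Bagging of regression trees on log-transformed counts (synthetic data).
import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(0)
hour = rng.integers(0, 24, size=1000)
weather = rng.integers(1, 5, size=1000)
# few rentals at midnight hours or in the worst weather, many otherwise
y = np.where((hour < 5) | (weather == 4), 2.0, 100.0) + rng.normal(0, 5, 1000)
y = np.clip(y, 0, None)             # counts cannot be negative
X = np.column_stack([hour, weather])

bag = BaggingRegressor(n_estimators=50, random_state=0)  # default base: a tree
bag.fit(X, np.log1p(y))             # train on y' = log(y + 1)
pred = np.expm1(bag.predict(X))     # invert back to counts
print(f"train R^2 on log targets: {bag.score(X, np.log1p(y)):.3f}")
```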
After the second improvement
Algorithm                      Score
Linear Regression              1.09191
SMOreg                         0.99771
Multilayer Perceptron          0.94332
SMOreg (log)                   0.67238
Linear Regression (log)        0.64689
Multilayer Perceptron (log)    0.50732
Bagging + REPTree              0.45441
AdditiveRegression + REPTree   0.44512 (ranks 200th of 1500 teams)
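Weka's AdditiveRegression fits each new tree to the residuals of the previous ones, a form of gradient boosting; scikit-learn's GradientBoostingRegressor is a close analogue. This is a sketch on synthetic data, not the team's actual setup:

```python
# Gradient boosting on log-transformed counts as an analogue of
# AdditiveRegression + REPTree (synthetic hour-of-day feature).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 24, size=(800, 1))                  # stand-in for hour of day
y = np.where(X[:, 0] < 5, 2.0, 100.0) + rng.normal(0, 5, 800)
y = np.clip(y, 0, None)

gbr = GradientBoostingRegressor(n_estimators=200, max_depth=3,
                                learning_rate=0.1, random_state=0)
gbr.fit(X, np.log1p(y))                                # stagewise residual fitting
print(f"train R^2 on log targets: {gbr.score(X, np.log1p(y)):.3f}")
```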
Okay for us!
END
References
[1] https://www.kaggle.com/c/bike-sharing-demand