Bike Sharing Demand - Oregon State

Speaker: Hanzhong Xu, Meng Meng [1]

Features:
• Temperature
• Humidity
• Wind speed, and so on

Target:
• Predict the total number of rentals every hour

Obviously, this is a regression problem.

Background

Website: https://www.kaggle.com

Evaluation

• The labels of the test set are hidden
• Submissions are evaluated by this formula (the test error), the RMSLE:

RMSLE = sqrt( (1/n) * Σ_i ( log(p_i + 1) - log(a_i + 1) )^2 )

where p_i is the predicted count and a_i the actual count for hour i.
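The competition scores submissions with the root mean squared logarithmic error (RMSLE), which is why the log(y + 1) transform used later helps so much. A minimal implementation:

```python
import math

def rmsle(predicted, actual):
    """Root Mean Squared Logarithmic Error: squared error in log(1 + y) space."""
    n = len(predicted)
    return math.sqrt(
        sum((math.log(p + 1) - math.log(a + 1)) ** 2
            for p, a in zip(predicted, actual)) / n
    )

# Perfect predictions score 0; errors on small counts are penalized
# relatively more than the same absolute error on large counts.
print(rmsle([10, 20, 30], [10, 20, 30]))  # 0.0
```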

Details:
datetime - hourly date + timestamp
season - 1 = spring, 2 = summer, 3 = fall, 4 = winter
holiday - whether the day is considered a holiday
workingday - whether the day is neither a weekend nor a holiday
weather - 1: Clear, Few clouds, Partly cloudy
          2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
          3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
          4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
temp - temperature in Celsius
atemp - "feels like" temperature in Celsius
humidity - relative humidity
windspeed - wind speed
casual - number of non-registered user rentals initiated
registered - number of registered user rentals initiated
count - number of total rentals [1]
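To make the field list concrete, here is a small standard-library sketch that parses rows in this layout and pulls the hour out of the datetime field (the two sample rows are illustrative, not quoted from the real training file):

```python
import csv
import io
from datetime import datetime

# Two illustrative rows in the competition's CSV layout.
sample_csv = """datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count
2011-01-01 00:00:00,1,0,0,1,9.84,14.395,81,0.0,3,13,16
2011-01-01 01:00:00,1,0,0,1,9.02,13.635,80,0.0,8,32,40
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
for row in rows:
    ts = datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M:%S")
    row["hour"] = ts.hour  # the hourly pattern matters a lot for demand
    # by definition, casual + registered = count
    assert int(row["casual"]) + int(row["registered"]) == int(row["count"])

print(rows[0]["hour"], rows[1]["hour"])  # 0 1
```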

Training data

Before we start: our tool, Weka

Our Approaches

Step 1: Prepare the data

Convert the CSV files to ARFF files, because our tool requires that format.
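In practice the conversion can be done with the tool's own loader; as a hedged sketch of what the format change involves, here is a minimal converter that declares the datetime column as a string attribute and everything else as numeric (a real conversion would also declare nominal attributes such as season and weather):

```python
import csv
import io

def csv_to_arff(csv_text, relation="bike"):
    """Minimal CSV -> ARFF converter: 'datetime' becomes a string attribute,
    every other column is treated as numeric."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    lines = ["@relation " + relation, ""]
    for name in header:
        atype = "string" if name == "datetime" else "numeric"
        lines.append("@attribute %s %s" % (name, atype))
    lines += ["", "@data"]
    for row in reader:
        # string values must be quoted in ARFF data rows
        lines.append(",".join("'%s'" % v if header[i] == "datetime" else v
                              for i, v in enumerate(row)))
    return "\n".join(lines)

print(csv_to_arff("datetime,temp,count\n2011-01-01 00:00:00,9.84,16\n"))
```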

Step 2: Baseline results

Algorithm               Score
Linear Regression       1.09191 (ranks ~1200th of ~1500 teams)
SMOreg                  0.99771
Multilayer Perceptron   0.94332

Comments:
SMOreg: SVM for regression
Multilayer Perceptron: a neural network

Improvement

The first improvement: reviewing the data, we saw that negative predictions are unreasonable (rental counts can never be negative).

We improved this with the following transform:
For the training set: y' = log(y + 1)
For the test set: y = exp(y') - 1
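The transform can be sketched in a few lines. Training on y' means a squared-error learner is effectively optimizing the RMSLE directly, and inverting with exp(y') - 1 keeps predictions from going meaningfully negative:

```python
import math

def to_log(y):
    """Forward transform on training targets: y' = log(y + 1)."""
    return math.log(y + 1)

def from_log(y_log):
    """Inverse transform on predictions: y = exp(y') - 1."""
    return math.exp(y_log) - 1

counts = [0, 3, 16, 40]
transformed = [to_log(c) for c in counts]
recovered = [from_log(t) for t in transformed]
print(recovered)  # round-trips back to the original counts
```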

Algorithm                     Score
Linear Regression             1.09191
SMOreg                        0.99771
Multilayer Perceptron         0.94332
SMOreg (log)                  0.67238
Linear Regression (log)       0.64689
Multilayer Perceptron (log)   0.50732 (ranks ~700th of 1500 teams)

After the first improvement

That’s not good enough for us !

The Second Improvement:

After observing the training set, we found that few people rent bikes at midnight or in bad weather. So we divide the data into subsets:
• For classification: Decision Tree
• For regression: Regression Tree
Meanwhile, we apply bagging or boosting.
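The slides use Weka's Bagging and AdditiveRegression meta-learners over REPTree. As a toy illustration of the bagging idea only (not the actual Weka pipeline), here is an ensemble of depth-1 regression trees ("stumps") on an hour-of-day feature, with made-up rental counts standing in for the real data:

```python
import random

def fit_stump(X, y):
    """Fit a depth-1 regression tree on one feature: choose the split that
    minimizes squared error, predict the mean on each side."""
    best = None
    for split in sorted(set(X))[:-1]:
        left = [yi for xi, yi in zip(X, y) if xi <= split]
        right = [yi for xi, yi in zip(X, y) if xi > split]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = (sum((yi - ml) ** 2 for yi in left)
               + sum((yi - mr) ** 2 for yi in right))
        if best is None or err < best[0]:
            best = (err, split, ml, mr)
    if best is None:  # degenerate sample (all X equal): fall back to the mean
        mean = sum(y) / len(y)
        return lambda x: mean
    _, split, ml, mr = best
    return lambda x: ml if x <= split else mr

def bagged_stumps(X, y, n_trees=25, seed=0):
    """Bagging: train each stump on a bootstrap resample, average predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)

# Toy data: hour of day -> rentals, with the midnight-vs-daytime split
# the slide describes.
hours = [0, 1, 2, 3, 8, 9, 17, 18]
rents = [5, 3, 2, 4, 120, 150, 200, 180]
model = bagged_stumps(hours, rents)
print(model(2), model(17))  # low at night, high at rush hour
```

Averaging over bootstrap resamples smooths out the high variance of individual trees, which is why the bagged and boosted REPTree entries score best in the table below.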

After the second improvement

Algorithm                     Score
Linear Regression             1.09191
SMOreg                        0.99771
Multilayer Perceptron         0.94332
SMOreg (log)                  0.67238
Linear Regression (log)       0.64689
Multilayer Perceptron (log)   0.50732
Bagging + REPTree             0.45441
AdditiveRegression + REPTree  0.44512 (ranks ~200th of 1500 teams)

Okay for us!

END

Thank you !