Project co prediction Regression analysis | MTH 426 IITK
Multiple Regression Analysis
Speculating Daily Maximum Carbon Monoxide (CO) Level
Team Member Roll No
Bhanu Yadav 13198
Nakul Surana 13418
Instructor: Dr. Sharmishtha Mitra
◦ Increasing pollution levels in urban areas are harmful
◦ In this study we wish to predict CO levels a week in advance
◦ The forecast is meant to help plan outdoor activities in the upcoming week
◦ A CO level between 3 ppm and 6 ppm is considered safe
Objective
◦ Use hourly data from March 2004 to February 2005 to forecast the daily maximum CO level for
5th April 2005 to 11th April 2005
◦ The dataset contains 9358 instances of hourly averaged responses for several pollutants in an Italian city
◦ Source: UCI Machine Learning Repository, Air Quality data set
DATA
Variable Y: CO
Possible X variables: PT08.S1(CO), NMHC(GT), C6H6(GT), PT08.S2(NMHC), NOx(GT), PT08.S3(NOx),
NO2(GT), PT08.S4(NO2), PT08.S5(O3), T, RH and AH
NMHC had more than 90% missing values and was excluded from the set of possible X variables
All other variables had less than 10% missing values. A missing value was replaced by the previous hour's value;
consecutive missing values were replaced by the values from the same hour one week earlier
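The imputation rule described above could be sketched as follows. This is a minimal illustration and not the authors' code: it assumes the hourly series is a Python list with `None` marking a missing reading, treats a gap of length one as isolated (filled from the previous hour), and fills runs of two or more gaps from the same hour 168 hours earlier. The helper name `impute_hourly` and the choice to leave a gap as `None` when no earlier value exists are assumptions.

```python
HOURS_PER_WEEK = 24 * 7  # same weekday and hour, one week earlier


def impute_hourly(values):
    """Fill None entries in an hourly series (hypothetical helper).

    Isolated gaps take the previous hour's value; runs of two or more
    consecutive gaps take the value from the same hour one week earlier.
    Gaps with no usable source value are left as None.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        # Find the end of this run of missing values in the original data.
        j = i
        while j < n and values[j] is None:
            j += 1
        run_length = j - i
        for k in range(i, j):
            if run_length == 1 and k >= 1:
                filled[k] = filled[k - 1]
            elif run_length > 1 and k >= HOURS_PER_WEEK:
                filled[k] = filled[k - HOURS_PER_WEEK]
        i = j
    return filled
```

Applied once per variable, this leaves the series the same length, so the later daily aggregation still sees 24 values per day.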
Transformation of Data
This suggests a seasonality of CO with respect to the day of the year. To account for it we introduce a dummy variable:
X4 = 1 if the day of the year is between 200 and 300
= 0 otherwise
There is also a seasonality of CO with respect to the day of the week:
X5 = 1 if the day is Monday, Tuesday, Saturday or Sunday
= 0 otherwise
Dummy Variable
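The two dummy variables can be computed directly from a calendar date. The sketch below is illustrative only: the function names are hypothetical, and the day-of-year bounds are assumed to be inclusive, since the slides say "between 200 to 300" without specifying.

```python
from datetime import date

# date.weekday() codes: Monday = 0, Tuesday = 1, Saturday = 5, Sunday = 6
FLAGGED_WEEKDAYS = {0, 1, 5, 6}


def seasonal_dummy(day):
    """X4: 1 if the day of the year lies in 200..300 (bounds assumed inclusive)."""
    day_of_year = day.timetuple().tm_yday
    return 1 if 200 <= day_of_year <= 300 else 0


def weekday_dummy(day):
    """X5: 1 for Monday, Tuesday, Saturday or Sunday, else 0."""
    return 1 if day.weekday() in FLAGGED_WEEKDAYS else 0
```

For example, 5th April 2005 is a Tuesday and day 95 of the year, so it gets X4 = 0 and X5 = 1.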
Input Variables
• Daily maximum C6H6 (lag 8)
• Daily maximum T (lag 7)
• Daily maximum AH (lag 7)
• Monthly dummy variables
• Weekly dummy variables
Output Variable
• Daily maximum CO concentration
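One way to assemble these inputs is to collapse each hourly series to daily maxima and then pair each day's maximum CO with predictors observed several days earlier. The sketch below assumes each hourly list starts at midnight and has no missing values left; the function names and the row layout are hypothetical, and only the lags (8 days for C6H6, 7 days for T and AH) come from the slides.

```python
def daily_maxima(hourly, hours_per_day=24):
    """Collapse an hourly series into one maximum per complete day."""
    n_days = len(hourly) // hours_per_day
    return [max(hourly[d * hours_per_day:(d + 1) * hours_per_day])
            for d in range(n_days)]


def lagged_rows(co_max, c6h6_max, t_max, ah_max,
                lag_c6h6=8, lag_weather=7):
    """Pair each day's maximum CO with predictors observed days earlier.

    Returns (day_index, [X1, X2, X3], y) tuples. The first
    max(lag_c6h6, lag_weather) days are skipped because their lagged
    predictors do not exist.
    """
    start = max(lag_c6h6, lag_weather)
    rows = []
    for day in range(start, len(co_max)):
        features = [c6h6_max[day - lag_c6h6],
                    t_max[day - lag_weather],
                    ah_max[day - lag_weather]]
        rows.append((day, features, co_max[day]))
    return rows
```

Because every predictor is at least 7 days old, each row uses only information that would be available a week before the day being forecast.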
Best Model
Coefficient Table
            Estimate   SE      T-Stat   P-value
Intercept   2.2        0.22    9.67     Rejected
X1          0.14       0.006   21.54    Rejected
X2          -0.05      0.01    -4.99    Rejected
X3          -0.019     0.21    -0.09    Rejected
X4          0.30       0.16    1.83     Rejected
X5          0.15       0.13    1.18     Rejected
ANOVA
              Sumsq    DF    Meansq   F        P-value
Total         1416.8   364   3.89     -        -
Model         936.09   5     187.21   139.81   Rejected
Residual      480.72   359   1.33     -        -
Lack of Fit   458.21   352   1.30     0.40     Rejected
Pure Error    22.51    7     3.21     -        -
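As a reading aid for the t-statistics in the coefficient table, each one can be converted into an approximate two-sided p-value. The sketch below is an approximation, not the authors' calculation: it uses the standard normal distribution in place of the t distribution, which is close when there are 359 residual degrees of freedom, and the 0.05 threshold mentioned afterwards is an assumed convention that does not appear in the slides.

```python
import math

# t-statistics copied from the coefficient table
T_STATS = {"Intercept": 9.67, "X1": 21.54, "X2": -4.99,
           "X3": -0.09, "X4": 1.83, "X5": 1.18}


def approx_two_sided_p(t_stat):
    """Two-sided p-value using the standard normal as a stand-in for
    the t distribution (reasonable here: 359 residual degrees of freedom)."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


for name, t_stat in T_STATS.items():
    p = approx_two_sided_p(t_stat)
    print(f"{name:9s} t = {t_stat:6.2f}  approx. p = {p:.4f}")
```

Under this approximation the intercept, X1 and X2 come out far below 0.05, whereas X3, X4 and X5 come out above it, which is worth checking against the decisions listed in the table.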
R2 = 0.66, R2_adjusted = 0.65
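The F-statistics and both R2 values follow from the sums of squares and degrees of freedom in the ANOVA table. The sketch below recomputes them as a consistency check; the variable names are illustrative, and the inputs are the rounded figures from the slides, so small differences in the last digit are expected.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table
SS_TOTAL, DF_TOTAL = 1416.8, 364
SS_MODEL, DF_MODEL = 936.09, 5
SS_RESID, DF_RESID = 480.72, 359
SS_LOF, DF_LOF = 458.21, 352
SS_PURE, DF_PURE = 22.51, 7


def mean_square(sum_sq, df):
    """Mean square = sum of squares divided by its degrees of freedom."""
    return sum_sq / df


# Overall regression F: model mean square over residual mean square
f_model = mean_square(SS_MODEL, DF_MODEL) / mean_square(SS_RESID, DF_RESID)
# Lack-of-fit F: lack-of-fit mean square over pure-error mean square
f_lack_of_fit = mean_square(SS_LOF, DF_LOF) / mean_square(SS_PURE, DF_PURE)
# R2 = 1 - SS_resid / SS_total; the adjusted version uses mean squares
r_squared = 1.0 - SS_RESID / SS_TOTAL
r_squared_adj = 1.0 - (mean_square(SS_RESID, DF_RESID)
                       / mean_square(SS_TOTAL, DF_TOTAL))

print(f"F (model)       = {f_model:.2f}")
print(f"F (lack of fit) = {f_lack_of_fit:.2f}")
print(f"R2              = {r_squared:.3f}")
print(f"R2 adjusted     = {r_squared_adj:.3f}")
```

The recomputed values agree with the reported F = 139.81, F = 0.40, R2 = 0.66 and R2_adjusted = 0.656 to the precision shown in the slides.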
Y = 2.2 + 0.15 (Max C6H6) – 0.05 (Max T) – 0.02 (Max AH) + 0.31 (Monthly dummy) + 0.16 (Weekly dummy)
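To show how the fitted equation would be used for a forecast, the sketch below evaluates it for one day. It uses the coefficients exactly as written in the equation above; the function name and the input values in the usage example are hypothetical and are not taken from the dataset.

```python
def predict_max_co(max_c6h6, max_t, max_ah, seasonal_dummy, weekday_dummy):
    """Evaluate the fitted equation for one day.

    max_c6h6 is the daily maximum C6H6 from 8 days earlier; max_t and
    max_ah are the daily maximum T and AH from 7 days earlier.
    """
    return (2.2
            + 0.15 * max_c6h6
            - 0.05 * max_t
            - 0.02 * max_ah
            + 0.31 * seasonal_dummy
            + 0.16 * weekday_dummy)


# Hypothetical inputs for a Tuesday outside the day 200 to 300 window
forecast = predict_max_co(max_c6h6=10.0, max_t=20.0, max_ah=1.0,
                          seasonal_dummy=0, weekday_dummy=1)
print(f"Forecast daily maximum CO: {forecast:.2f}")
```

With these made-up inputs the forecast is 2.2 + 1.5 - 1.0 - 0.02 + 0 + 0.16 = 2.84.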
R2_adjusted = 0.656, so the model explains about 65% of the variability in the data
The normal probability plot of the residuals behaves properly
The plot of the residuals against the fitted values ŷ_i behaves properly too
Conclusions