Project co prediction Regression analysis | MTH 426 IITK


Multiple Regression Analysis

Speculating Daily Maximum Carbon Monoxide (CO) Level

Team Member Roll No

Bhanu Yadav 13198

Nakul Surana 13418

Instructor: Dr. Sharmishtha Mitra

◦ Increasing pollution levels in urban areas are harmful

◦ In this study we wish to predict CO levels a week in advance

◦ This helps in planning outdoor activities for the upcoming week

◦ A CO level between 3 ppm and 6 ppm is considered safe

Objective

◦ Use hourly data from March 2004 to February 2005 to forecast the daily maximum level of CO for 5th April 2005 to 11th April 2005

◦ The dataset contains 9358 instances of hourly averaged responses of several pollutants in an Italian city

◦ Taken from the UCI Machine Learning Repository (Air Quality data set)

DATA

Y variable: CO

Possible X variables: PT08.S1(CO), NMHC(GT), C6H6(GT), PT08.S2(NMHC), NOx(GT), PT08.S3(NOx), NO2(GT), PT08.S4(NO2), PT08.S5(O3), T, RH and AH

The X variable NMHC had more than 90% missing values and was excluded from the set of possible X variables

All other variables had less than 10% missing values. A missing value was replaced by the previous hour's value; consecutive missing values were replaced by the values from the same hour of the previous week
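The replacement rule above can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `impute_hourly` is hypothetical, missing values are assumed to be marked as `None`, and "last week-hour" is read as the same hour seven days earlier (168 hours back).

```python
from typing import List, Optional

HOURS_PER_WEEK = 7 * 24  # assumed meaning of "last week-hour": same hour, 7 days earlier


def impute_hourly(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill gaps (None) in an hourly series.

    An isolated gap takes the previous hour's value.
    A run of two or more gaps takes the values from the same hours one week earlier.
    Lookups use the original series, so a gap stays None if its source is unavailable.
    """
    filled = list(values)
    n = len(values)
    i = 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        # Find the end of this run of consecutive missing values.
        j = i
        while j < n and values[j] is None:
            j += 1
        run_length = j - i
        for k in range(i, j):
            source = k - 1 if run_length == 1 else k - HOURS_PER_WEEK
            if source >= 0:
                filled[k] = values[source]
        i = j
    return filled
```

Gaps whose source value is itself missing (or lies before the start of the series) are left unfilled here; how the original analysis handled such cases is not stated.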

Transformation of Data

Seasonality


This suggests a seasonality of CO w.r.t. the day of the year. To compensate for it, we introduce a dummy variable:

X4 = 1 if the day of the year is between 200 and 300

   = 0 otherwise

There is also a seasonality of CO w.r.t. the day of the week:

X5 = 1 if the day is Monday, Tuesday, Saturday or Sunday

   = 0 otherwise
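As a minimal sketch, the two dummy variables can be computed from a calendar date as follows. The function name `seasonal_dummies` is hypothetical, and the interval 200 to 300 is assumed to include both endpoints, which the slides do not specify.

```python
from datetime import date


def seasonal_dummies(day: date) -> tuple:
    """Return (X4, X5) for a given calendar day."""
    day_of_year = day.timetuple().tm_yday
    # X4: day of the year between 200 and 300 (endpoints assumed inclusive).
    x4 = 1 if 200 <= day_of_year <= 300 else 0
    # X5: Monday (0), Tuesday (1), Saturday (5) or Sunday (6).
    x5 = 1 if day.weekday() in (0, 1, 5, 6) else 0
    return x4, x5


print(seasonal_dummies(date(2004, 12, 25)))  # (0, 1): day 360 of the year, a Saturday
```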

Dummy Variable

Input Variables

• X1: Daily maximum C6H6 (lag 8)

• X2: Daily maximum T (lag 7)

• X3: Daily maximum AH (lag 7)

• X4: Monthly dummy variable (day of the year)

• X5: Weekly dummy variable (day of the week)

Output Variable

• Y: Daily maximum CO concentration
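One way to assemble these inputs from hourly readings is sketched below in plain Python. The helper names `daily_max` and `build_rows` are hypothetical, and a lag of k is assumed to mean the daily maximum observed k days before the day being predicted.

```python
from datetime import timedelta


def daily_max(hourly):
    """Reduce (datetime, value) pairs to {date: maximum value}, skipping None."""
    result = {}
    for timestamp, value in hourly:
        if value is None:
            continue
        day = timestamp.date()
        result[day] = value if day not in result else max(result[day], value)
    return result


def build_rows(co, c6h6, temp, ah):
    """Pair each day's maximum CO with the lagged daily maxima.

    Each argument is a {date: daily maximum} dict.
    Returns a list of (day, x1, x2, x3, y) tuples, where
    x1 = C6H6 eight days earlier, x2 = T and x3 = AH seven days earlier.
    Days whose lagged values are unavailable are dropped.
    """
    rows = []
    for day in sorted(co):
        lag8 = day - timedelta(days=8)
        lag7 = day - timedelta(days=7)
        if lag8 in c6h6 and lag7 in temp and lag7 in ah:
            rows.append((day, c6h6[lag8], temp[lag7], ah[lag7], co[day]))
    return rows
```

Because every input is at least seven days old, a row for a given day can be built a week ahead, which matches the stated goal of forecasting one week in advance.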

Best Model

Coefficient Table

Term         Estimate   SE      T-Stat   P-value
Intercept     2.2       0.22     9.67    Rejected
X1            0.14      0.006   21.54    Rejected
X2           -0.05      0.01    -4.99    Rejected
X3           -0.019     0.21    -0.09    Rejected
X4            0.30      0.16     1.83    Rejected
X5            0.15      0.13     1.18    Rejected

ANOVA

Source        Sumsq     DF    Meansq   F        P-value
Total         1416.8    364     3.89   -        -
Model          936.09     5   187.21   139.81   Rejected
Residual       480.72   359     1.33   -        -
Lack of Fit    458.21   352     1.30     0.40   Rejected
Pure Error      22.51     7     3.21   -        -

R2 = 0.66 ||| R2_adjusted = 0.65
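The summary figures can be re-derived from the sums of squares and degrees of freedom reported in the ANOVA table. The sketch below does that arithmetic and also converts the reported t-statistics into approximate two-sided p-values. Two assumptions are made that the slides do not state: the t distribution with 359 degrees of freedom is approximated by the standard normal, and the null hypothesis for each coefficient is that it equals zero.

```python
from statistics import NormalDist

# Values copied from the ANOVA table.
ss_total, df_total = 1416.8, 364
ss_model, df_model = 936.09, 5
ss_resid, df_resid = 480.72, 359

ms_model = ss_model / df_model          # mean square for the model
ms_resid = ss_resid / df_resid          # mean square for the residuals
f_stat = ms_model / ms_resid            # overall F statistic
r2 = ss_model / ss_total                # coefficient of determination
r2_adj = 1 - ms_resid / (ss_total / df_total)  # adjusted for degrees of freedom


def two_sided_p(t_stat: float) -> float:
    """Approximate two-sided p-value (normal approximation, assumed adequate for 359 df)."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


# t-statistics copied from the coefficient table.
t_stats = {"Intercept": 9.67, "X1": 21.54, "X2": -4.99,
           "X3": -0.09, "X4": 1.83, "X5": 1.18}
p_values = {name: two_sided_p(t) for name, t in t_stats.items()}

print(round(f_stat, 2), round(r2, 2), round(r2_adj, 3))
for name, p in p_values.items():
    print(name, round(p, 3))
```

Under this approximation the t-statistics of X3, X4 and X5 correspond to p-values above 0.05, so whether their null hypotheses are rejected depends on the significance level used, which the slides do not give.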

Residual Analysis

[Figure: normal probability plot of the residuals]

[Figure: plot of the residuals against the fitted values ŷ_i]

Y = 2.2 + 0.15 (Max C6H6) – 0.05 (Max T) – 0.02 (Max AH) + 0.31 (Monthly dummy) + 0.16 (Weekly dummy)

R2_adjusted = 0.656, so our model explains about 65% of the variability in the data

The normal probability plot of the residuals behaves properly

The plot of the residuals against the fitted values ŷ_i behaves properly too

Conclusions
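For illustration, the fitted equation can be wrapped in a small function. The name `predict_max_co` and the input values in the usage line are hypothetical; the coefficients are the ones in the equation above.

```python
def predict_max_co(max_c6h6, max_t, max_ah, monthly_dummy, weekly_dummy):
    """Predicted daily maximum CO from the lagged inputs and the two dummies."""
    return (2.2
            + 0.15 * max_c6h6      # daily maximum C6H6 (lag 8)
            - 0.05 * max_t         # daily maximum T (lag 7)
            - 0.02 * max_ah        # daily maximum AH (lag 7)
            + 0.31 * monthly_dummy
            + 0.16 * weekly_dummy)


# Made-up inputs, only to show the call.
example = predict_max_co(max_c6h6=10.0, max_t=20.0, max_ah=1.0,
                         monthly_dummy=0, weekly_dummy=1)
```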