USE OF GENERALIZED LINEAR MODEL IN FORECASTING OF AIR PASSENGERS CONVEYANCES FROM EU COUNTRIES

Catherine Zhukovskaya
Faculty of Transport and Mechanical Engineering
Riga Technical University

The 8th Tartu Conference on Multivariate Statistics

Outline

1. Introduction
2. Informative base
3. Used models for analyzing and forecasting of the air passengers’ conveyances
4. Elaboration of linear models
5. Elaboration of generalized linear models
6. Conclusion
7. References


1. Introduction

Most of the literature devoted to the forecasting of transport flows contains only simple forecasting models, based either on time series methods [Hünt (2003)] or on linear regression with a small number of explanatory variables [Butkevičius, Vyskupaitis (2005), Šliupas (2006)].

Two approaches to forecasting air passenger conveyances from EU countries were considered in this investigation: the classical method of linear regression and the generalized linear model (GLM).

The aim of this investigation is to illustrate the advantage of the GLM over simple linear regression models.

The verification of the models and the estimation of the unknown parameters are included as well.

All calculations were done with Statistica 6.0 and with software developed in MathCad 12.


2. Informative base

The forecasted variable was the number of air passengers carried, expressed in millions of passengers.

Factors

t1 - total population of the country (TP), millions of inhabitants;
t2 - area of the country (AREA), thousands of km²;
t3 - density of the country population (PD), number of inhabitants per km²;
t4 - monthly labour costs (MLC), thousands of euros;
t5 - gross domestic product (GDP) per capita in Purchasing Power Standards (PPS) (GDP_PPS);
t6 - gross domestic product (GDP), billions of euros;
t7 - comparative price level (CPL);
t8 - inflation rate (IR);
t9 - unemployment rate (UR);
t10 - labour productivity per hour worked (LPHW).


The following 25 EU countries were selected: Belgium, Czech Republic, Denmark, Germany, Estonia, Greece, Spain, France, Ireland, Italy, Cyprus, Latvia, Lithuania, Luxembourg, Hungary, Malta, Netherlands, Austria, Poland, Portugal, Slovenia, Slovakia, Finland, Sweden and United Kingdom.

The considered period was from 1996 to 2005.

All data for this investigation were obtained from the electronic database of the Statistical Office of the European Communities (EUROSTAT):

http://epp.eurostat.ec.europa.eu

The final number of observations was 161:
data for the period from 1996 to 2004 were used for estimation and forecasting - 140 observations;
data for 2005 were used to check the quality of forecasting, the so-called cross-validation (CV) - 21 observations.
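The split described here (1996 to 2004 for estimation, 2005 held out for checking the forecasts) can be sketched as follows. This is a minimal illustration only: the `Observation` record, the `split_by_year` helper and the toy numbers are assumptions made for the example and are not taken from the EUROSTAT data.

```python
from collections import namedtuple

# Hypothetical record type: one observation = one country in one year.
Observation = namedtuple("Observation", ["country", "year", "passengers"])


def split_by_year(observations, holdout_year=2005):
    """Return (estimation, validation) lists split on the hold-out year."""
    estimation = [o for o in observations if o.year < holdout_year]
    validation = [o for o in observations if o.year == holdout_year]
    return estimation, validation


# Toy records with made-up values (not EUROSTAT figures).
records = [
    Observation("A", 2003, 1.2),
    Observation("A", 2004, 1.4),
    Observation("A", 2005, 1.5),
    Observation("B", 2004, 7.9),
    Observation("B", 2005, 8.3),
]

estimation, validation = split_by_year(records)
```

Splitting on the calendar year keeps every validation observation strictly later than the estimation data, which is what a forecasting check needs.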


3. Used models for analyzing and forecasting of the air passengers’ conveyances

Main notions

Each observation consists of the data for a particular country in a particular year.

The main object of consideration was the air passengers’ conveyances from EU countries.

All the considered models were group models [Andronov (1983)].

The regression models were classified according to their mathematical form: linear regression models; generalized linear regression models (GLM).


The linear regression model [Hardle (2004)]:

$$E\left(Y^{(k)}(x)\right) = x^{T}\beta, \qquad (1)$$

where:
$Y^{(k)}$ is the dependent variable for the k-th considered model;
$x = (x_1, x_2, \ldots, x_d)^{T}$ is a d-dimensional vector of explanatory variables;
$\beta = (\beta_0, \beta_1, \beta_2, \ldots, \beta_d)^{T}$ is a coefficient vector that has to be estimated from observations of $Y^{(k)}$ and $x$.

The generalized linear regression model:

$$E\left(Y^{(k)}(x)\right) = G\{x^{T}\beta\}, \qquad (2)$$

where $G(\cdot)$ is a known function of a one-dimensional variable.


4. Elaboration of linear models

The basic criteria for choosing the best model:
1. Multiple coefficient of determination (R²);
2. Fisher criterion (F);
3. Sum of the squares of the residuals (SSRes);
4. Sum of the squares of residuals for the cross-validation (CV SSRes).

For the checking of statistical hypotheses the significance level α = 0.05 was always used.
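The four criteria can be computed for an ordinary least squares fit roughly as follows. This is a sketch under stated assumptions: it uses NumPy rather than Statistica or MathCad, the helper names `fit_ols`, `predict` and `criteria` are hypothetical, the F statistic is the usual overall-regression F for a model with an intercept, and the data are toy values.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (b0, b1, ..., bp)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    """Fitted values b0 + b1*x1 + ... + bp*xp for each row of X."""
    return beta[0] + np.asarray(X) @ beta[1:]


def criteria(beta, X, y, X_cv, y_cv):
    """R^2, overall F, SSRes on the estimation data, SSRes on held-out data."""
    n, p = np.asarray(X).shape
    ss_res = float(np.sum((y - predict(beta, X)) ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    f = (r2 / p) / ((1.0 - r2) / (n - p - 1))
    cv_ss_res = float(np.sum((y_cv - predict(beta, X_cv)) ** 2))
    return {"R2": r2, "F": f, "SSRes": ss_res, "CV SSRes": cv_ss_res}


# Toy data: one regressor, four estimation points, one held-out point.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 2.0, 4.0])
X_cv = np.array([[4.0]])
y_cv = np.array([5.0])

beta = fit_ols(X, y)
result = criteria(beta, X, y, X_cv, y_cv)
```

R², F and SSRes measure the fit on the estimation data, while CV SSRes is the only criterion evaluated on observations the model has not seen.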

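MODEL #1

$$Y^{(1)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8 + \beta_9 x_9 + \beta_{10} x_{10},$$

where $Y^{(1)}$ is the total number of air passengers carried; $x_1 = t_1$, $x_2 = t_2$, $x_3 = t_3$, $x_4 = t_4$, $x_5 = t_5$, $x_6 = t_6$, $x_7 = t_7$, $x_8 = t_8$, $x_9 = t_9$, $x_{10} = t_{10}$.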


Table 1. Results for the MODEL #1

$$\hat{E}\left(Y^{(1)}(x)\right) = 14 - 0.77x_1 + 0.16x_2 + 185.8x_3 - 2.44x_4 + 0.53x_5 + 0.07x_6 + 0.05x_7 + 0.32x_8 - 1.2x_9 - 1.03x_{10}$$

R² = 0.831
Fisher criterion F = 63.49

| Variable | Factor | b | t(129) | p-level |
|---|---|---|---|---|
| Intercept | | 14.00 | 0.84 | 0.405 |
| x1 | TP | -0.77 | -1.56 | 0.121 |
| x2 | AREA | 0.16 | 5.60 | 0.000 |
| x3 | PD | 185.80 | 4.67 | 0.000 |
| x4 | MLC | -2.44 | -0.44 | 0.660 |
| x5 | GDP_PPS | 0.53 | 1.68 | 0.096 |
| x6 | GDP | 0.07 | 3.81 | 0.000 |
| x7 | CPL | 0.05 | 0.37 | 0.710 |
| x8 | IR | 0.32 | 0.29 | 0.771 |
| x9 | UR | -1.20 | -1.59 | 0.114 |
| x10 | LPHW | -1.03 | -3.75 | 0.000 |


MODEL #2

$$Y^{(2)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5,$$

where $Y^{(2)} = Y^{(1)}$; $x_1 = t_2$, $x_2 = t_3$, $x_3 = t_6$, $x_4 = t_{10}$, $x_5 = t_{11}$.

New factor: t11 (ON) = 0, if the considered country is an old member of the EU; 1, if the considered country is a new one.

Table 2. Results for the MODEL #2

$$\hat{E}\left(Y^{(2)}(x)\right) = 13.56 + 0.09x_1 + 134.01x_2 + 0.05x_3 - 0.68x_4 + 29.36x_5.$$

R² = 0.829
Fisher criterion F = 129.85

| Variable | Factor | b | t(134) | p-level |
|---|---|---|---|---|
| Intercept | | 13.56 | 2.45 | 0.016 |
| x1 | AREA | 0.09 | 4.45 | 0.000 |
| x2 | PD | 134.01 | 4.32 | 0.000 |
| x3 | GDP | 0.05 | 10.34 | 0.000 |
| x4 | LPHW | -0.68 | -5.12 | 0.000 |
| x5 | ON | 29.36 | 4.21 | 0.000 |


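MODEL #3

$$Y^{(3)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5,$$

where $Y^{(3)} = Y^{(1)}$; $x_1 = t_3$, $x_2 = t_6$, $x_3 = t_{10}$, $x_4 = t_1^2$, $x_5 = \sqrt{t_2}$.

Modifications of factors: squares, square roots and ratios of the original factors, such as $t_1^2$, $\sqrt{t_1}$, $\sqrt{t_2}$, $t_2/t_1$, $\sqrt{t_2}/t_1$, $t_6/t_1$.

Table 3. Results for the MODEL #3

$$\hat{E}\left(Y^{(3)}(x)\right) = -6.34 + 113.26x_1 + 0.14x_2 - 0.52x_3 - 0.03x_4 + 3.03x_5$$

R² = 0.867
Fisher criterion F = 174.08

| Variable | Factor | b | t(134) | p-level |
|---|---|---|---|---|
| Intercept | | -6.34 | -1.05 | 0.296 |
| x1 | PD | 113.26 | 4.00 | 0.000 |
| x2 | GDP | 0.14 | 10.66 | 0.000 |
| x3 | LPHW | -0.52 | -5.80 | 0.000 |
| x4 | sq(TP) | -0.03 | -7.56 | 0.000 |
| x5 | sqrt(AREA) | 3.03 | 5.74 | 0.000 |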
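The way Model #3 builds its regressors from modified factors can be sketched as below. The coefficients are the rounded values b from Table 3; the function names and the input values are hypothetical and chosen only to make the transformation explicit, so the number produced is not a real forecast.

```python
import math

# Rounded coefficients b from Table 3: intercept, PD, GDP, LPHW, sq(TP), sqrt(AREA).
MODEL3_B = [-6.34, 113.26, 0.14, -0.52, -0.03, 3.03]


def model3_regressors(t):
    """Map raw factors t[1], t[2], ... to the Model #3 regressors x1..x5."""
    return [
        t[3],             # x1 = PD
        t[6],             # x2 = GDP
        t[10],            # x3 = LPHW
        t[1] ** 2,        # x4 = sq(TP), square of total population
        math.sqrt(t[2]),  # x5 = sqrt(AREA)
    ]


def predict_model3(t, b=MODEL3_B):
    """Linear predictor b0 + b1*x1 + ... + b5*x5 on the transformed factors."""
    x = model3_regressors(t)
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


# Made-up factor values keyed by factor index (illustration only).
toy_factors = {1: 4.0, 2: 9.0, 3: 0.1, 6: 50.0, 10: 20.0}
x_toy = model3_regressors(toy_factors)
y_toy = predict_model3(toy_factors)
```

The model stays linear in its coefficients; only the inputs are transformed, so the standard least squares machinery still applies.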


Analysis of observed and predicted values for the MODEL #3

Figure 1. Plot of observed and predicted values (series: Observed, Predicted).

Figure 2. Plot of observed and predicted values for the CV (series: CVObserved, CVPredicted).


MODEL #4

$$Y^{(4)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8 + \beta_9 x_9,$$

where $Y^{(4)} = Y^{(1)}/t_1$ is the ratio between the total number of air passengers carried and the number of inhabitants of the country; $x_1 = t_2$, $x_2 = t_3$, $x_3 = t_4$, $x_4 = t_{11}$, $x_5 = \sqrt{t_1}$, $x_6 = \sqrt{t_2}$, $x_7 = t_2/t_1$, $x_8 = \sqrt{t_2}/t_1$, $x_9 = t_6/t_1$.

Table 4. Results for the MODEL #4

$$\hat{E}\left(Y^{(4)}(x)\right) = 0.56 + 2.33x_1 - 1.04x_2 - 0.02x_3 + 0.001x_4 + 1.76x_5 - 0.0004x_6 + 0.04x_7 + 0.17x_8.$$

R² = 0.760
Fisher criterion F = 45.81

| Variable | Factor | b | t(131) | p-level |
|---|---|---|---|---|
| Intercept | | -5.67 | -6.25 | 0.000 |
| x1 | AREA | -0.02 | -6.73 | 0.000 |
| x2 | PD | 10.37 | 6.19 | 0.000 |
| x3 | MLC | -0.73 | -4.19 | 0.000 |
| x4 | ON | 0.83 | 8.30 | 0.000 |
| x5 | sqrt(TP) | -1.02 | -7.32 | 0.000 |
| x6 | sqrt(AREA) | 1.06 | 7.10 | 0.000 |
| x7 | AREA/TP | -0.12 | -6.98 | 0.000 |
| x8 | sqrt(AREA)/TP | 0.94 | 5.84 | 0.000 |
| x9 | GDP/TP | 0.15 | 6.28 | 0.000 |


MODEL #5

$$Y^{(5)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8,$$

where $Y^{(5)} = Y^{(4)}$; $x_1 = t_4$, $x_2 = t_5$, $x_3 = t_8$, $x_4 = t_9$, $x_5 = t_{10}$, $x_6 = t_{11}$, $x_7 = t_{12}$, $x_8 = t_6/t_1$.

New factor: t12 (HL) = 0, if the value y/t1 for the considered country is small (less than 2); 1, if the value y/t1 is larger than 2.

Table 5. Results for the MODEL #5

$$\hat{E}\left(Y^{(5)}(x)\right) = 0.99 - 0.46x_1 - 0.02x_2 - 0.02x_3 - 0.02x_4 + 0.01x_5 + 1.27x_6 + 1.15x_7 + 0.07x_8$$

R² = 0.864
Fisher criterion F = 104.174

| Variable | Factor | b | t(131) | p-level |
|---|---|---|---|---|
| Intercept | | 0.99 | 3.93 | 0.000 |
| x1 | MLC | -0.46 | -3.41 | 0.001 |
| x2 | GDP_PPS | -0.02 | -3.81 | 0.000 |
| x3 | IR | -0.02 | -1.33 | 0.187 |
| x4 | UR | -0.02 | -1.90 | 0.056 |
| x5 | LPHW | 0.01 | 3.72 | 0.000 |
| x6 | ON | 1.27 | 9.21 | 0.000 |
| x7 | HL | 1.15 | 15.30 | 0.000 |
| x8 | GDP/TP | 0.07 | 3.41 | 0.001 |


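Pivot results for the linear regression models

Table 6

| Model | R² | R1 | F | R2 | SSRes | R3 | CV SSRes | R4 | Sum R | Total R |
|---|---|---|---|---|---|---|---|---|---|---|
| #1 | 0.831 | 3 | 63.49 | 4 | 52 651 | 5 | 114 885 | 5 | 17 | 5 |
| #2 | 0.829 | 4 | 129.85 | 2 | 53 344 | 5 | 109 723 | 4 | 15 | 3 |
| #3 | 0.867 | 1 | 174.10 | 1 | 41 599 | 2 | 49 450 | 1 | 5 | 1 |
| #4 | 0.760 | 5 | 45.81 | 5 | 35 064 | 3 | 57 310 | 3 | 16 | 4 |
| #5 | 0.864 | 2 | 104.20 | 3 | 12 775 | 1 | 51 448 | 2 | 8 | 2 |

R1, R2, R3 and R4 are the ranks of the model by R², F, SSRes and CV SSRes respectively; Sum R is the sum of the four ranks and Total R is the resulting overall rank.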
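The rank aggregation behind the last two columns of Table 6 can be reproduced with a few lines. The per-criterion ranks below are copied as printed in the table (they are inputs, not recomputed from the criterion values); the variable names are hypothetical, and ties in the rank sum are not handled in this sketch.

```python
# Per-criterion ranks as printed in Table 6: by R^2, F, SSRes, CV SSRes.
ranks = {
    "#1": [3, 4, 5, 5],
    "#2": [4, 2, 5, 4],
    "#3": [1, 1, 2, 1],
    "#4": [5, 5, 3, 3],
    "#5": [2, 3, 1, 2],
}

# Sum of ranks per model (the "Sum R" column).
rank_sums = {model: sum(r) for model, r in ranks.items()}

# Overall rank: the smallest rank sum gets rank 1 (the "Total R" column).
ordered = sorted(rank_sums, key=rank_sums.get)
total_rank = {model: position + 1 for position, model in enumerate(ordered)}
```

A rank sum weights all four criteria equally, so a model that is best on one criterion can still be placed lower overall if it does poorly on the others.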


Analysis of observed and predicted values for the MODEL #5

Figure 3. Plot of recalculated observed and predicted values (series: RObserved, RPredicted).

Figure 4. Plot of recalculated observed and predicted values for the CV (series: RObserved, RCVPredicted).


5. Elaboration of generalized linear models

For the further investigation the best linear regression model (Model #5) was chosen.

Two different GLMs were considered. In both of them the regressand Y(GLM) = Y(5) / t1 and the set of regressors are the same as for Model #5.

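GLM1

$$E\left(Y_i^{\mathrm{GLM1}}(x_i)\right) = h_i \, \frac{\exp\left(\sum_j \beta_j x_{i,j}\right)}{1 + \exp\left(\sum_j \beta_j x_{i,j}\right)}, \qquad (3)$$

GLM2

$$E\left(Y_i^{\mathrm{GLM2}}(x_i)\right) = h_i \, \frac{1}{a + \exp\left(\sum_j \beta_j x_{i,j}\right)}, \qquad (4)$$

where $h_i$ is the total population number, $x_i$ is the column vector of the independent variables, $i$ is the observation number, $i = 1, 2, \ldots, n$, and $a$ is an additional parameter (constant).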
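The two mean functions, a logistic curve scaled by the population h for GLM1 and h / (a + exp(η)) for GLM2 with η = Σ β_j x_j, can be written directly as code. This is a minimal sketch; the function names and the numbers in the usage lines are illustrative assumptions.

```python
import math


def linear_predictor(beta, x):
    """eta = sum_j beta_j * x_j for one observation."""
    return sum(b * v for b, v in zip(beta, x))


def glm1_mean(h, beta, x):
    """GLM1: h * exp(eta) / (1 + exp(eta))."""
    eta = linear_predictor(beta, x)
    return h * math.exp(eta) / (1.0 + math.exp(eta))


def glm2_mean(h, beta, x, a):
    """GLM2: h / (a + exp(eta)), with the additional constant a."""
    eta = linear_predictor(beta, x)
    return h / (a + math.exp(eta))


# With all coefficients zero, eta = 0 and exp(eta) = 1.
glm1_at_zero = glm1_mean(10.0, [0.0, 0.0], [1.0, 2.0])
glm2_at_zero = glm2_mean(10.0, [0.0, 0.0], [1.0, 2.0], a=4.0)
```

GLM1 bounds the expected value between 0 and h, whereas GLM2 bounds it between 0 and h / a, so the constant a controls the ceiling of passengers per inhabitant.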


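1. Linearization

For the estimation of the unknown parameter vector $\beta$ we used the least squares criterion

$$R_0(\beta) = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 \to \min_{\beta}, \qquad (5)$$

where $Y_i$ and $\hat{Y}_i$ are the observed and calculated values of Y.

LM1

$$\ln\frac{Y^*}{1 - Y^*} = \sum_j \beta_j x_{i,j}, \qquad (6)$$

LM2

$$\ln\left(\frac{1}{Y^*} - a\right) = \sum_j \beta_j x_{i,j}, \qquad (7)$$

where $Y^* = Y/h$.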
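The linearization step amounts to transforming Y* = Y/h so that the right-hand side becomes linear in β, then running ordinary least squares on the transformed values. The sketch below assumes NumPy, a design matrix whose first column is a constant 1 (intercept), and noise-free synthetic data; the names `fit_lm1` and `fit_lm2` and all numbers are hypothetical.

```python
import numpy as np


def fit_lm1(X, y, h):
    """Least squares on ln(Y* / (1 - Y*)); requires 0 < Y* < 1."""
    y_star = y / h
    z = np.log(y_star / (1.0 - y_star))
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta


def fit_lm2(X, y, h, a):
    """Least squares on ln(1 / Y* - a); requires 1 / Y* > a."""
    y_star = y / h
    z = np.log(1.0 / y_star - a)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta


# Synthetic, noise-free data generated from the two mean functions.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(0.0, 1.0, 50)])
h = rng.uniform(1.0, 80.0, 50)
beta_true = np.array([-1.0, 2.0])
eta = X @ beta_true

y1 = h * np.exp(eta) / (1.0 + np.exp(eta))  # GLM1-type response
y2 = h / (0.5 + np.exp(eta))                # GLM2-type response, a = 0.5

beta_lm1 = fit_lm1(X, y1, h)
beta_lm2 = fit_lm2(X, y2, h, a=0.5)
```

On noise-free data the transform recovers the coefficients exactly. With real data the least squares fit minimizes the error on the transformed scale rather than criterion (5) on the original scale, and the logarithms are undefined whenever Y* falls outside the admissible range.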


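The models LM1 and LM2 give the following estimates for E(Y):

$$\hat{E}\left(Y^{\mathrm{LM1}}(x)\right) = h\,\frac{\exp(\hat{\eta})}{1 + \exp(\hat{\eta})}, \qquad \hat{E}\left(Y^{\mathrm{LM2}}(x)\right) = \frac{h}{a + \exp(\hat{\eta})}, \qquad \hat{\eta} = \hat{\beta}_0 + \sum_{j=1}^{9} \hat{\beta}_j x_j.$$

The numerical coefficients of these two equations are garbled in this transcript and their signs and order cannot be recovered; the surviving digit strings are "0.647.810.290.4448.80.70.026.680.00113.78" for LM1 and "0.110.410.21.6717.960.810.041.71.6311.65" (preceded by "0.3") for LM2.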

We can see that linearization gives bad results. In an attempt to improve them, a two-stage estimation procedure was developed.

The first stage corresponds to the considered linearization. In the second stage, calibration, the obtained estimates are refined using the well-known gradient method.

Table 7. The values of SSRes and CV SSRes for the Model #5 and LM (row R0/n)

| Model | SSRes | CV SSRes |
|---|---|---|
| Model #5 | 12 775 | 51 448 |
| LM1 | 27 447 | 676 576 |
| LM2 | 21 834 | 229 554 |


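2. Calibration

Gradients for the least squares criterion

GLM1

$$\frac{\partial R_0(\beta)}{\partial \beta_j} = -2 \sum_{i=1}^{n} \left(Y_i - h_i \, \frac{\exp\left(\sum_j \beta_j x_{i,j}\right)}{1 + \exp\left(\sum_j \beta_j x_{i,j}\right)}\right) \frac{h_i \exp\left(\sum_j \beta_j x_{i,j}\right) x_{i,j}}{\left(1 + \exp\left(\sum_j \beta_j x_{i,j}\right)\right)^2}, \qquad (8)$$

GLM2

$$\frac{\partial R_0(\beta)}{\partial \beta_j} = 2 \sum_{i=1}^{n} \left(Y_i - h_i \, \frac{1}{a + \exp\left(\sum_j \beta_j x_{i,j}\right)}\right) \frac{h_i \exp\left(\sum_j \beta_j x_{i,j}\right) x_{i,j}}{\left(a + \exp\left(\sum_j \beta_j x_{i,j}\right)\right)^2}. \qquad (9)$$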
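A calibration step of this kind for GLM1 can be sketched as gradient descent on the least squares criterion, using the analytic gradient of the logistic mean function. This is an illustration under assumptions, not the MathCad program used in the study: the backtracking step-size rule, the function names, the starting point and the synthetic data are all choices made for the example.

```python
import numpy as np


def sigmoid(eta):
    # Equals exp(eta) / (1 + exp(eta)); the tanh form avoids overflow.
    return 0.5 * (1.0 + np.tanh(0.5 * eta))


def glm1_loss(beta, X, y, h):
    """Least squares criterion: sum of squared residuals for GLM1."""
    return float(np.sum((y - h * sigmoid(X @ beta)) ** 2))


def glm1_gradient(beta, X, y, h):
    """Analytic gradient of the criterion with respect to beta."""
    s = sigmoid(X @ beta)
    resid = y - h * s
    weight = h * s * (1.0 - s)  # h * exp(eta) / (1 + exp(eta))**2
    return -2.0 * X.T @ (resid * weight)


def calibrate(beta_start, X, y, h, n_iter=200):
    """Gradient descent; each step is halved until the loss decreases."""
    beta = np.array(beta_start, dtype=float)
    loss = glm1_loss(beta, X, y, h)
    for _ in range(n_iter):
        g = glm1_gradient(beta, X, y, h)
        step = 1.0
        while step > 1e-12:
            candidate = beta - step * g
            cand_loss = glm1_loss(candidate, X, y, h)
            if cand_loss < loss:
                beta, loss = candidate, cand_loss
                break
            step *= 0.5
        else:
            break  # no improving step found: stop
    return beta


# Synthetic data (assumed values, for illustration only).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.uniform(-1.0, 1.0, 30)])
h = rng.uniform(1.0, 10.0, 30)
y = h * sigmoid(X @ np.array([0.5, -1.0])) + rng.normal(0.0, 0.1, 30)

beta0 = np.zeros(2)  # stand-in for the first-stage (linearized) estimate
beta_hat = calibrate(beta0, X, y, h)
```

Because each accepted step lowers the criterion, the calibrated estimate is never worse on the estimation data than the starting point supplied by the linearization.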


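The GLM1 and GLM2 have the following estimates for E(Y):

$$\hat{E}\left(Y^{\mathrm{GLM1}}(x)\right) = h\,\frac{\exp(\hat{\eta})}{1 + \exp(\hat{\eta})}, \qquad \hat{E}\left(Y^{\mathrm{GLM2}}(x)\right) = \frac{h}{a + \exp(\hat{\eta})}, \qquad \hat{\eta} = \hat{\beta}_0 + \sum_{j=1}^{9} \hat{\beta}_j x_j.$$

The numerical coefficients of these two equations are garbled in this transcript and their signs and order cannot be recovered; the surviving digit strings are "0.150.680.111.265.770.760.021.221.057.05" for GLM1 and "0.060.130.11.127.810.820.020.781.097.26" (preceded by "6.3") for GLM2.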

Table 8. CV SSRes for the Model #5 and GLM (row R0/n)

| Model | CV SSRes |
|---|---|
| Model #5 | 51 447 |
| GLM1 | 47 807 |
| GLM2 | 34 567 |

For the GLM2 the optimum value of R0 was sought not only over the values of β but over the parameter a as well.


Analysis of observed and predicted values for the GLM

Figure 5. Plot of observed and predicted values (series: Robserved, GLM1, GLM2).

Figure 6. Plot of observed and predicted values for the CV (series: CV Observed, CV GLM1, CV GLM2).


Dependence of the values of SSRes and CV SSRes on the value of the parameter a for GLM2

Figure 7. The values of SSRes and CV SSRes as a function of the parameter a for GLM2 (horizontal axis from 1 to 10; series: SSRes, CV SSRes).

The optimal value of SSRes was obtained when a = 2. The best value of CV SSRes was obtained when a = 6.
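The search over the additional parameter of GLM2 can be sketched as a grid scan: for each candidate value, fit β, then record SSRes on the estimation data and CV SSRes on the held-out data. In this sketch the fit for each value uses only the linearized least squares step as a stand-in for the full two-stage procedure, and the data are synthetic, generated with the parameter equal to 3; all names and numbers are assumptions made for the example.

```python
import numpy as np


def glm2_mean(beta, X, h, a):
    """GLM2 mean function: h / (a + exp(eta))."""
    return h / (a + np.exp(X @ beta))


def fit_glm2_linearized(X, y, h, a):
    """Least squares on ln(h / y - a); requires h / y > a for every row."""
    z = np.log(h / y - a)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta


def ssres(beta, X, y, h, a):
    """Sum of squared residuals on the original scale."""
    return float(np.sum((y - glm2_mean(beta, X, h, a)) ** 2))


def profile_a(a_grid, fit_data, cv_data):
    """Return one (a, SSRes, CV SSRes) row per grid value."""
    rows = []
    for a in a_grid:
        beta = fit_glm2_linearized(*fit_data, a)
        rows.append((a, ssres(beta, *fit_data, a), ssres(beta, *cv_data, a)))
    return rows


rng = np.random.default_rng(1)


def make_data(n, a_true=3.0):
    """Noise-free synthetic data from GLM2 (illustration only)."""
    X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])
    h = rng.uniform(1.0, 80.0, n)
    y = h / (a_true + np.exp(X @ np.array([3.0, 1.0])))
    return X, y, h


fit_data = make_data(40)
cv_data = make_data(10)

rows = profile_a(range(1, 11), fit_data, cv_data)
best_a_fit = min(rows, key=lambda r: r[1])[0]  # minimizes SSRes
best_a_cv = min(rows, key=lambda r: r[2])[0]   # minimizes CV SSRes
```

On real data the two minima need not coincide, as the values 2 and 6 reported above show, so the choice depends on whether fit or forecast quality is the target.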


6. Conclusion

Linear and generalized linear regression models for the forecasting of air passenger conveyances from EU countries were considered. These models contain a large number of explanatory factors and their combinations.

For the estimation of the unknown parameters of the linear regression models the standard procedures were used. For the estimation of the unknown parameters of the GLM a special two-stage procedure was developed.

The cross-validation approach was taken as the main procedure for checking the adequacy of all considered models and for choosing the best model for forecasting.

The advantage of applying the GLM has been shown: on the 2005 cross-validation data, CV SSRes fell from 51 447 for Model #5 to 47 807 for GLM1 and 34 567 for GLM2.


7. References

1. Andronov A.M. et al. Forecasting of air passengers conveyances on the transport. // Transport, Moscow, 1983. (In Russian).

2. Butkevičius J., Vyskupaitis A. Development of passenger transportation by Lithuanian sea transport. // In Proceedings of International Conference RelStat’04, Transport and Telecommunication, Vol. 6, N 2, 2005.

3. Hardle W., Muller M., Sperlich S., Werwatz A. Nonparametric and Semiparametric Models. Springer, Berlin, 2004.

4. Hünt U. Forecasting of railway freight volume: approach of Estonian railway to arise efficiency. // In TRANSPORT – 2003, Vol. XXVIII, No 6, pp. 255-258.

5. Šliupas T. Annual average daily traffic forecasting using different techniques. // In TRANSPORT – 2006, Vol. XXI, No 1, pp. 38-43.

6. EUROSTAT YEARBOOK 2005. The statistical guide to Europe. Data 1993–2004. EU, EuroSTAT, 2005. URL: http://epp.eurostat.ec.europa.eu


THANK YOU FOR YOUR ATTENTION