The econometric discrete dependent variable multinomial Logit model


Eleftherios Giovanis

This paper examines consumers’ preferences in the local furniture market of the Province of Serres. We apply a multinomial logit model to estimate the probability of buying furniture in the following four-month period. We also analyse the demographic characteristics and conclude that they play a major role among the other factors. The questionnaire analysed here is a subset of the original one, because the initial questionnaire included too many questions and the analysis would have been too long. We therefore concentrate on the most important factors that influence consumers’ choice decisions.

Introduction

According to the findings of the sector study carried out by ICAP (COMCENTER, 2007), the majority of furniture production units in Greece are small, usually family-run, and lack automated production. Medium and large units are estimated to hold about 30% of the market. The study concludes that Greek enterprises show declining export activity, while market share has shifted towards supermarkets and towards importing enterprises operating through franchising. Another conclusion of the study is that furniture purchases and consumption are directly connected with disposable income. The problem that emerges is that an important part of the disposable income of Greek households is absorbed by loan repayments, which alters the timing of replacing existing furniture.

Domestic furniture consumption followed a rising course during the period 1998-2006, with an average annual growth rate of 4.6%. In 2006, living room furniture is estimated to have accounted for 47.0% of the total domestic market, bedroom furniture for 27.0% and dining room furniture for 26.0% (COMCENTER, 2007).

Jonkers (2006), in a report prepared in collaboration with the CBI Market Survey, finds that one of the major threats in the Greek domestic furniture market is that the Greek economy is quite dependent on furniture imports, which compete mainly on low prices. This creates opportunities for developing-country exporters, because imports from these countries are increasing at a faster rate than imports from developed countries. According to Jonkers, the best opportunities are in living and dining room furniture, where domestic production is declining. In the same survey, imports increased by 80% in value between 2001 and 2005, while exports increased by only 14%. The major developing-country exporters are China with € 56 million, Turkey with € 28.8 million, Indonesia with € 16.4 million, Vietnam with € 10.2 million, India with € 4.9 million and Malaysia with € 4.2 million, followed by smaller suppliers such as Albania, Egypt and South Africa. As for furniture exports, the largest destination country is Cyprus, followed by Bulgaria, Germany and Romania.

The most important firm in Serres, and one of the most important in Greece, is “DROMEAS” ABEEA, which was established in 1979 and is located in the Industrial Area outside Serres, about 80 km northeast of Thessaloniki. Among the firm’s achievements are the supply of 10,000 seats for the waiting area of Manila’s airport and 4,000 seats for two airports in Egypt. Smaller projects bear its stamp in the UK, Saudi Arabia and Australia. The firm also undertook 40.0% of the furniture production needed by the Olympic Committee (Interwood, 2007). Other furniture firms and shops operating in the Prefecture of Serres include “BLACK RED WHITE”, “Fratzana”, “Kioutsoukis” and “ARREDO”, as well as shops such as “NEOSET”, “SATO” and others.

The main aim of this project is to present some of the most important Logit models that can be used in marketing survey research and to choose the best possible model. This choice is not unique: it depends on the kind of product or service, the questionnaire and sample design, the kind of market, the city or country, and the demographic characteristics of the place where a specific survey is carried out.


Data

The data were obtained from a marketing survey carried out by telephone interview on 12-15 February 2008 by the firm “Analysis Center”. The sample consists of 387 households in the Prefecture of Serres, in the region of Macedonia, Greece. In the first stage the sample design was random; in the second stage the data were weighted by age and sex. Although the survey refers to households, we are also interested in sex, because we would like to test hypotheses about differences in opinions and preferences between the two sexes. The weights are based on the demographic data provided by the National Statistical Service of Greece. No urban weighting is necessary, because the survey covers the city of Serres and the capitals of the regional Municipalities, so it concerns the urban population only. Note that if the first-stage sample had not been random but stratified, for example by the industries of a specific sector, by a particular age category of a particular sex or by a specific geographical region, the weighted models would create problems such as understated standard errors and, consequently, erroneous interpretation of significance tests. The name of the firm that commissioned this survey cannot be disclosed for confidentiality reasons; our aim is simply to offer a guide to different approaches to estimating Logit models and to interpreting the results.

Methodology

We first explain why we use the Logit rather than the Probit model. In most applications the two models give quite similar results; the main difference is that the logistic distribution has slightly fatter tails, as can be seen in figure 2.1. There is no compelling reason to choose one model over the other, and in practice many researchers prefer the Logit model because of its mathematical simplicity (Gujarati, 2004).
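The tail comparison between the two distributions can be checked numerically. The following is a minimal sketch, not part of the original analysis; the function names are illustrative, and the logistic distribution is rescaled to unit variance (an assumption made here so that the comparison reflects tail shape rather than scale).

```python
import math


def logistic_cdf(z: float) -> float:
    """Standard logistic CDF (its variance is pi^2 / 3)."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Standard deviation of the standard logistic distribution.
LOGISTIC_SD = math.pi / math.sqrt(3.0)


def upper_tail_logistic(z: float) -> float:
    """P(Z > z) for a logistic variable rescaled to unit variance."""
    return 1.0 - logistic_cdf(z * LOGISTIC_SD)


def upper_tail_normal(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 1.0 - normal_cdf(z)


for z in (2.0, 3.0, 4.0):
    print(z, upper_tail_logistic(z), upper_tail_normal(z))
```

Far from the centre, the logistic tail probability exceeds the normal one, which is what "fatter tails" means in practice.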


Figure 1 Probit and logit cumulative distributions

With our model we would like to estimate the probability of buying furniture in the next four-month period, based on the kind of furniture that consumers would generally prefer to buy, the criteria by which they choose a shop, how much money they intend to spend, and demographic data such as sex, age, income and profession. The multinomial logit model in its general theoretical form is:

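$$
\begin{aligned}
L_i = \alpha &+ b_1 C_1 + b_2 C_2 + b_3 C_3 + b_4 C_4 + b_5 C_5 + b_6 \mathrm{Crit}_1 + b_7 \mathrm{Crit}_2 \\
&+ b_8 \mathrm{Crit}_3 + b_9 \mathrm{Crit}_4 + b_{10} \mathrm{Mon}_2 + b_{11} \mathrm{Mon}_3 + b_{12} \mathrm{Price} + b_{13} \mathrm{Variety} \\
&+ b_{14} \mathrm{loy} + b_{15} \mathrm{Inf}_1 + b_{16} \mathrm{Inf}_2 + b_{17} \mathrm{Inf}_3 + b_{18} \mathrm{Inf}_4 + b_{19} \mathrm{Inf}_6 + b_{20} \mathrm{Sex} \\
&+ b_{21} \mathrm{age} + b_{22} \mathrm{id} + b_{23} \mathrm{Inc}_1 + b_{24} \mathrm{Inc}_2 + b_{25} \mathrm{Inc}_3 + b_{26} \mathrm{Inc}_4 + b_{27} \mathrm{Pf}_1 \\
&+ b_{28} \mathrm{Pf}_2 + b_{29} \mathrm{Pf}_3 + b_{30} \mathrm{Pf}_4 + b_{31} \mathrm{Pf}_5 + b_{32} \mathrm{Pf}_6 + b_{33} \mathrm{Pf}_8
\end{aligned}
$$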

where α is a constant. C1 is a dummy variable referring to question 1 (table 3.1), with C1=1 for living rooms and C1=0 otherwise; likewise C2, C3, C4 and C5 equal 1 for dining rooms, children’s furniture, garden furniture and bedrooms respectively and zero otherwise, which leaves office furniture as the omitted category. Crit1 is a dummy variable referring to question 2 (table 3.2), with Crit1=1 for price and Crit1=0 otherwise; Crit2, Crit3 and Crit4 equal 1 for quality, variety and trade name respectively and zero otherwise. Mon2 is a dummy variable referring to question 3 (table 3.3), with Mon2=1 for 250-600 € and Mon2=0 otherwise, and Mon3=1 for ≥600 € and Mon3=0 otherwise. The variables “Price” and “Variety” are quantitative and refer to question 4 (table 3.4). Loy is a dummy variable referring to question 5 (table 3.5), with loy=1 for Serres and loy=0 otherwise. Inf1 is a dummy variable referring to question 6 (table 3.6), with Inf1=1 for TV and Inf1=0 otherwise, and similarly for the other Inf variables. Sex is a dummy variable, with Sex=1 for male and Sex=0 for female. Age is a quantitative variable and is presented in table 3.10. The variable “id” is a dummy that equals 1 when the consumer lives in the Municipality of Serres and 0 when the consumer lives in one of the regional Municipalities of the Serres Prefecture. We include this variable to examine whether consumers have homogeneous preferences across locations or whether there is heterogeneity among them. We could of course make the analysis more complicated by clustering the main geographic regions into groups, but we assume that preferences are homogeneous, because in question 2 the “location” criterion accounts for only 0.9% of the answers and so does not play a crucial role in consumer choices. The Inc variables are income dummies, with Inc1=1 for income <500 € and Inc1=0 otherwise; the same procedure is followed for the other income variables, Inc2, Inc3 and Inc4, which are presented in table 3.8. Finally, the Pf variables (table 3.8) are profession dummies, with Pf1=1 for those employed in the rural sector and Pf1=0 otherwise; the same procedure is followed for the other employment variables, Pf2, Pf3, Pf4, Pf5, Pf6 and Pf8. The dependent polytomous variable is Li, which refers to question 7 (table 3.7), with Li=1 for those who answered YES, Li=2 for those who answered NO and Li=3 for those who answered MAY BE.
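The dummy coding described above can be sketched in a few lines. This is only an illustration: the helper `dummy_code` and the bracket labels are hypothetical names introduced here, and treating the highest income bracket as the omitted category is an assumption inferred from the fact that only Inc1 to Inc4 enter the model.

```python
def dummy_code(answer, categories, reference):
    """Return {category: 0 or 1} for every category except the reference one."""
    if answer not in categories:
        raise ValueError(f"unknown answer: {answer!r}")
    return {c: int(answer == c) for c in categories if c != reference}


# Income brackets as listed in the appendix; ">2000" is assumed to be
# the omitted (reference) category.
INCOME_BRACKETS = ["<500", "501-1000", "1001-1500", "1501-2000", ">2000"]

row = dummy_code("1001-1500", INCOME_BRACKETS, reference=">2000")
print(row)  # exactly one dummy equals 1 for a non-reference answer

reference_row = dummy_code(">2000", INCOME_BRACKETS, reference=">2000")
print(reference_row)  # all dummies equal 0 for the reference answer
```

A respondent in the reference category is represented by all dummies being zero, which is why S categories need only S-1 dummy variables.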

For a dependent variable with S categories, this requires the calculation of S-1 equations, one for each category relative to the reference category. In multinomial logistic regression, one category of the dependent variable is chosen as the comparison category; here it is Li=3. The probability is defined as

$$
\Pr(y_i = j) = \frac{\exp(X_i \beta_j)}{1 + \sum_{k=1}^{J} \exp(X_i \beta_k)} \tag{1}
$$

and the log-likelihood contribution of individual i can be written as

$$
\ell_i(\beta) = \sum_{j=1}^{J} \mathbf{1}(y_i = j)\, X_i \beta_j - \log\Bigl(1 + \sum_{k=1}^{J} \exp(X_i \beta_k)\Bigr) \tag{2}
$$

where, for the ith individual, yi is the observed outcome (dependent variable) and Xi is a vector of explanatory variables, categorical or not, while j is a particular outcome and J refers to all outcomes except the base category. The unknown parameters βj are estimated by maximum likelihood (Bartels, Boztug & Muller, 1999). The explanatory variables in relation (1) do not carry an outcome subscript, because the same characteristics of individual i enter every choice j. With this model we intend to explain which of an unordered set of outcomes applies to the different individuals in our sample, which means that the probabilities of all these outcomes depend on the same characteristics (Davidson & MacKinnon, 1999). In the results section we show a simple estimation example. The multinomial Logit relies on the assumption of independence from irrelevant alternatives (IIA), which requires that the disturbances are independent and homoscedastic (Greene, 2002). Because the dependent variable has 3 outcomes, we take outcome 3 (MAY BE) as the base reference category and estimate equations for the other two outcomes. The probability for outcome 1 (YES) is then

$$
\Pr(y_i = 1) = \frac{\exp(X_i \beta_1)}{1 + \sum_{k=1}^{J} \exp(X_i \beta_k)} \tag{3}
$$

for outcome 2 (NO)

$$
\Pr(y_i = 2) = \frac{\exp(X_i \beta_2)}{1 + \sum_{k=1}^{J} \exp(X_i \beta_k)} \tag{4}
$$

and finally the probability for outcome 3 (MAY BE) is

$$
\Pr(y_i = 3) = \frac{1}{1 + \sum_{k=1}^{J} \exp(X_i \beta_k)} \tag{5}
$$

(Davidson & MacKinnon, 1999).

A final matter is that from question 4 we use only the variables price and variety. The reason is that consumers seem to respond to several characteristics in the same way, which means that, for example, price and quality can be treated as a single variable. We therefore reduce the number of variables in order to avoid multicollinearity. We group the variables of question 4 by cluster analysis, using an agglomerative hierarchical method that begins with all variables separate, six in our case, each forming its own cluster. In the first step, the two variables closest together are joined. In the next step, either a third variable joins the first two, or two other variables join together into a different cluster. This process continues until all clusters are joined into one, but we decide to keep two groups, as this is more logical for our data. First we must find the similarity measures between the variables, which can be done with the commonly used correlation coefficient distance measure

$$
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\bigl[n\sum x^2 - (\sum x)^2\bigr]\bigl[n\sum y^2 - (\sum y)^2\bigr]}} \tag{6}
$$
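Relation (6) can be computed directly from the raw sums. The sketch below is illustrative only: the function names and the sample marks are hypothetical, and the distance 1 − |r| is an assumption about how an "absolute correlation coefficient distance" is usually defined, not something stated in the text.

```python
import math


def pearson_r(x, y):
    """Correlation coefficient computed from raw sums, as in relation (6)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    denominator = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return (n * sxy - sx * sy) / denominator


def abs_correlation_distance(x, y):
    """Assumed distance: 0 for perfectly (anti)correlated variables."""
    return 1.0 - abs(pearson_r(x, y))


# Hypothetical marks (1 to 5) given by five respondents to two characteristics.
price_marks = [5, 4, 3, 5, 2]
quality_marks = [4, 4, 3, 5, 1]
print(pearson_r(price_marks, quality_marks))
print(abs_correlation_distance(price_marks, quality_marks))
```

Variables with a small distance are the first candidates to be merged by the agglomerative procedure.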

The objective of Ward’s clustering method is to minimize the sum of squared deviations from the mean value (Žiberna et al, 2004)

$$
ESS = \sum_i \sum_j \sum_k \bigl( X_{ijk} - \bar{x}_{ik} \bigr)^2 \tag{7}
$$

The results of Ward’s clustering method are presented in figure 2.2, from which we conclude that the first group consists of price, quality, service and service after shopping, and the second group consists of variety and delivery. The next step is to take the average of each group in order to obtain the new variables.

Figure 2 Ward’s clustering method

[Dendrogram with Ward linkage and absolute correlation coefficient distance. Horizontal axis: variables (price, quality, service, service after shopping, variety, delivery). Vertical axis: similarity, with tick marks at -3.70, 30.86, 65.43 and 100.00.]

The second method is principal components. First we compute the covariance matrix of the six variables above. We then find the eigenvalues of the covariance matrix, shown in table 2.1. There are two components with eigenvalues greater than one. Table 2.2 presents the eigenvector of the first principal component, and we conclude again that price, service, quality and service after shopping can be treated as one variable, and variety and delivery as another.
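The eigenvalue criterion can be illustrated with the six eigenvalues reported in the appendix (TABLE 1 EIGENVALUES). This is a small sketch with illustrative variable names; it only recomputes the proportions and applies the "eigenvalue greater than one" rule.

```python
from itertools import accumulate

# Eigenvalues as reported in the appendix table of eigenvalues.
EIGENVALUES = [2.2600, 1.1140, 0.9610, 0.6817, 0.5238, 0.4595]

total = sum(EIGENVALUES)
proportions = [ev / total for ev in EIGENVALUES]
cumulative = list(accumulate(proportions))

# Components retained under the "eigenvalue greater than one" rule.
retained = [i + 1 for i, ev in enumerate(EIGENVALUES) if ev > 1.0]

for i, (ev, p, c) in enumerate(zip(EIGENVALUES, proportions, cumulative), start=1):
    print(f"component {i}: eigenvalue={ev:.4f} proportion={p:.3f} cumulative={c:.3f}")
print("retained components:", retained)
```

Each proportion is simply the eigenvalue divided by the sum of all eigenvalues, so the cumulative column shows how much of the total variance the first components explain.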

The first estimation method is the frequency-weighted multinomial logistic regression, with weights based on age. The survey was conducted on households, but age is an important weighting variable, because there is usually no great age difference within couples and age carries important information. The 30-50 age category is the largest and most frequent one, especially in the city. This category therefore receives a greater weight than the 18-24 or 65-and-over categories, because couples aged 30-50 are more likely to buy furniture for various reasons: marriage, replacement because of deterioration or renovation, or purchases for children who will live in another house or another city for education, work or marriage. The probability is:

$$
\Pr(y_i = j) = \frac{\exp(W_i X_i \beta_j)}{1 + \sum_{k=1}^{J} \exp(W_i X_i \beta_k)} \tag{8.a}
$$

, while 8.a. can be written as

∑ −

−+

==J

j Ji

ji

i

XmJ

Jn

mj

jnX

jy

))exp((1

))(exp(

)Pr(1

1

β

β

(8.b.)

where n is the number of observations, j is the specific outcome, J denotes all outcomes except the base category, and m is the number of cases (Langholz & Goldstein, 2001). For example, if there are three persons aged 30 who choose outcome 2 (NO), so that the number of cases m equals three, what is the probability based on the questions and the demographic data?

The second model is the weighted robust multinomial Logit, which uses the same weights as the weighted multinomial Logit model. The problem with the previous model is that the MLE method and Rao’s score test can be misleading under model misspecification caused by misclassification errors or by extreme data points, the well-known outliers, in the sample (Pia & Feser, 2000). Pregibon (1982) suggests some tools that remove data from the sample, but this procedure is iterative and leaves the analyst with a considerably reduced sample. The robust version uses the well-known Huber-White sandwich variance estimator, with probabilities defined as in 8.a. The Huber-White variance estimator is

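$$
V_E = \frac{1}{n}\,\bigl[\hat{H}(\hat{\beta}_E)\bigr]^{-1}\,\hat{\Phi}(\hat{\beta}_E)\,\bigl[\hat{H}(\hat{\beta}_E)\bigr]^{-1} \tag{9}
$$

where

$$
\hat{H}(\hat{\beta}_E) = \frac{1}{n}\sum_{i=1}^{n} \frac{\partial^2 \log g(y_i \mid x_i, \hat{\beta}_E)}{\partial \hat{\beta}_E \, \partial \hat{\beta}_E'} \tag{9.a}
$$

H is the Hessian matrix and

$$
\hat{\Phi}(\hat{\beta}_E) = \frac{1}{n}\sum_{i=1}^{n} \left[\frac{\partial \log g(y_i \mid x_i, \hat{\beta}_E)}{\partial \hat{\beta}_E}\right] \left[\frac{\partial \log g(y_i \mid x_i, \hat{\beta}_E)}{\partial \hat{\beta}_E}\right]' \tag{9.b}
$$

while if $\hat{\beta}_E$ is the true MLE estimator then $V_E$ simplifies to $\{-[H(\hat{\beta}_E)]\}^{-1}$ (Greene, 2002).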

Note that in our case these standard errors are robust to certain misspecifications of the distribution of the dependent variable, not to heteroscedasticity. We claim this because the assumption that disturbances are independent and homoscedastic is confirmed by Hausman’s test, which we analyse in the next part of the project.

The third method is the replication method with Jackknife standard errors. The Jackknife is a nonparametric technique for estimating the standard error of a statistic. The procedure systematically recomputes the estimate, leaving out one observation at a time from the sample. Each subsample thus consists of n − 1 observations, formed by deleting a different observation from the sample. The jackknife estimator and its standard error are then calculated from these truncated subsamples (Greene, 2002). For example, suppose θ is the parameter of interest and let $\hat{\theta}_{(1)}, \hat{\theta}_{(2)}, \ldots, \hat{\theta}_{(n)}$ be estimates of θ based on the n subsamples, each of size n − 1. The jackknife estimator of θ is given by (Wolter, 2007)

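$$
\hat{\theta}_J = \frac{1}{n}\sum_{i=1}^{n} \hat{\theta}_{(i)} \tag{10}
$$

and the jackknife estimate of the standard error of $\hat{\theta}_J$ is

$$
\hat{\sigma}_{\hat{\theta}_J} = \left[\frac{n-1}{n}\sum_{i=1}^{n} \bigl(\hat{\theta}_{(i)} - \hat{\theta}_J\bigr)^2\right]^{1/2} \tag{11}
$$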
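Relations (10) and (11) translate almost line by line into code. The sketch below is a generic leave-one-out jackknife for any statistic of a list of numbers; the function name and the sample of ages are hypothetical, and it is applied to the sample mean rather than to the Logit coefficients of the paper.

```python
import math
from statistics import mean, stdev


def jackknife(sample, statistic):
    """Leave-one-out jackknife estimate and standard error of a statistic."""
    n = len(sample)
    if n < 2:
        raise ValueError("the jackknife needs at least two observations")
    # One replicate per deleted observation, each based on n - 1 values.
    replicates = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    theta_j = sum(replicates) / n                                  # relation (10)
    squared_deviations = sum((r - theta_j) ** 2 for r in replicates)
    std_error = math.sqrt((n - 1) / n * squared_deviations)       # relation (11)
    return theta_j, std_error


# Hypothetical ages of seven respondents.
ages = [23, 30, 35, 41, 47, 52, 66]
estimate, std_error = jackknife(ages, mean)
print(estimate, std_error)
```

For the sample mean, the jackknife standard error coincides with the usual s/√n, which gives a convenient check that the implementation follows the formulas.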


The t-statistic can be defined as

2/12

1

)(

)(

])(1

1[

)(

J

n

i

i

Ji

n

nt

=

∧∧∧

∑ −−

−=

θθ

θθ

(12)

Results

To our knowledge there is nothing equivalent available in the literature with which to compare our results. Marketing research firms deal with these matters, but their results are not publicly available. From the results presented in tables 1-3 in the appendix, we reject the simple weighted Logit model because of the large number of statistically insignificant variables, even though table 4 and the Hausman test indicate that the independence from irrelevant alternatives (IIA) hypothesis holds. We also reject the weighted Logit model with robust White-Huber standard errors because of the presence of heteroscedasticity and the resulting violation of the IIA assumption. We therefore accept as the best estimation the weighted multinomial Logit with Jackknife standard errors, which also satisfies the IIA assumption. To predict the probability that a consumer will buy, will not buy or is not sure about buying in the next four-month period, we use the following probabilities.

$$
\Pr(y = 1) = \frac{\exp(L_1)}{1 + \sum_{i}^{T} \exp(L_i)}, \qquad
\Pr(y = 2) = \frac{\exp(L_2)}{1 + \sum_{i}^{T} \exp(L_i)}
$$

and

$$
\Pr(y = 3) = \frac{1}{1 + \sum_{i}^{T} \exp(L_i)}
$$

For example, suppose a consumer chose the answer living rooms in question 1, her main criterion for buying from a furniture shop is price, she is female, she intends to spend 250-600 €, she gives a mark of 5 to all the characteristics of her previous shopping (price, quality and the others), she is 30 years old, she prefers Serres as the region for shopping, she prefers to be informed by leaflets, her income is 1001-1500 €, her profession is businessman and she lives in the Municipality of Serres. Then, from Table 3 in the appendix, the linear predictors for the multinomial Logit with Jackknife standard errors are

L1 = -24.646 + 1.977 – 2.446 + 5*0.837 + 5* 0.415 – 30*0.042 + 0.783 –3.635 – 2.05 +

26.610 -1.998 + 8.491 = 8.093

for outcome 1 and

L2 = -22.896 - 18.582 – 1.496 + 5*0.541 + 5* 0.344 + 30*0.048 +0.269 +4.091 1.604 +

32.296 -1.266 + 16.870 = 8.559

for outcome 2

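$$
\Pr(y = 1) = \frac{\exp(L_1)}{1 + \sum_{i}^{T} \exp(L_i)} = \frac{\exp(8.093)}{1 + \exp(8.093) + \exp(8.559)} = \frac{3271.48}{8485.94} = 38.55\%
$$

$$
\Pr(y = 2) = \frac{\exp(L_2)}{1 + \sum_{i}^{T} \exp(L_i)} = \frac{\exp(8.559)}{1 + \exp(8.093) + \exp(8.559)} = \frac{5213.46}{8485.94} = 61.44\%
$$

and

$$
\Pr(y = 3) = \frac{1}{1 + \sum_{i}^{T} \exp(L_i)} = \frac{1}{8485.94} = 0.01\%
$$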
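The arithmetic of this example can be reproduced with a few lines of code. This is a minimal sketch; the function name is illustrative, and the inputs are the two linear predictors L1 = 8.093 and L2 = 8.559 quoted above, with outcome 3 (MAY BE) as the base category.

```python
import math


def outcome_probabilities(linear_predictors):
    """Probabilities of the non-base outcomes, followed by the base outcome."""
    exps = [math.exp(value) for value in linear_predictors]
    denominator = 1.0 + sum(exps)
    return [e / denominator for e in exps] + [1.0 / denominator]


# Linear predictors for outcome 1 (YES) and outcome 2 (NO) from the text.
p_yes, p_no, p_maybe = outcome_probabilities([8.093, 8.559])
print(f"YES: {p_yes:.2%}  NO: {p_no:.2%}  MAY BE: {p_maybe:.2%}")
```

Because the base category contributes the constant 1 to the denominator, its probability becomes very small whenever both linear predictors are large, as they are for this consumer.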

Performance test of the proposed model

The next step is to apply a Monte-Carlo simulation to evaluate the performance and capability of the model presented here. The expected coefficient value can be defined as (Janke, 2002)

$$
\bar{X} = \frac{1}{N}\sum_{i=1}^{N} f(X_i) \tag{13}
$$

where $\langle X \rangle$ is the expectation value and the estimator $\bar{X}$ is a random number fluctuating around the theoretical expected value. The variance is

$$
\sigma_X^2 = \langle X^2 \rangle - \langle X \rangle^2 \tag{14}
$$

from which we obtain the standard error $\sigma / \sqrt{N}$. This formula is important because it shows that the standard error of a Monte-Carlo simulation decreases with the square root of the sample size. If, for example, we would like a 50% reduction in the error, we must quadruple the number of random draws. As we already know from relation (1),

$$
\pi_j \equiv \Pr(y_i = j) = \frac{\exp(X_i \beta_j)}{1 + \sum_{k=1}^{J} \exp(X_i \beta_k)} \tag{15}
$$

We can therefore draw a predicted value y from a multinomial distribution with parameters πj and n=1. We simulated the model with 500 sets of parameters and then used relations (13) and (14) to find the mean estimated parameters and their standard errors. We decided to simulate our estimates because our sample is finite, so the parameter estimates are never certain (Tomz et al, 2000) and may be neither reliable nor efficient. More specifically, the program draws simulations of the parameters from their asymptotic sampling distribution, with mean equal to the vector of estimated parameters and variance equal to the variance-covariance matrix of the estimates (Tomz et al, 2000). From the results in table 7 we conclude that our model performs fairly well, because the coefficients estimated by Monte-Carlo simulation are very close to the estimated coefficients of the weighted multinomial Logit model with Jackknife standard errors.
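The two building blocks of this procedure, drawing an outcome from a multinomial distribution with n=1 and summarising the draws with relations (13) and (14), can be sketched as follows. This does not reproduce the parameter simulation of the paper: the function names and the seed are illustrative, and the sample shares of question 7 (19.5% YES, 66.0% NO, 14.5% MAY BE) stand in for model-based probabilities.

```python
import math
import random


def simulate_outcomes(probabilities, n_draws, seed=0):
    """Draw outcomes 1..K, each from a multinomial distribution with n=1."""
    rng = random.Random(seed)
    outcomes = list(range(1, len(probabilities) + 1))
    return rng.choices(outcomes, weights=probabilities, k=n_draws)


def mc_mean_and_se(values):
    """Monte-Carlo mean (13) and standard error sigma / sqrt(N) from (14)."""
    n = len(values)
    mean_value = sum(values) / n
    mean_square = sum(v * v for v in values) / n
    variance = max(mean_square - mean_value * mean_value, 0.0)
    return mean_value, math.sqrt(variance / n)


# Shares of YES / NO / MAY BE taken from the answers to question 7.
SHARES = [0.195, 0.660, 0.145]
draws = simulate_outcomes(SHARES, n_draws=500, seed=1)

# Indicator of outcome 2 (NO); its mean estimates the probability of NO.
no_indicator = [1.0 if d == 2 else 0.0 for d in draws]
share_no, se_no = mc_mean_and_se(no_indicator)
print(share_no, se_no)
```

Since the standard error is proportional to 1/√N, repeating the exercise with four times as many draws would roughly halve `se_no`.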


Conclusions

We applied three different multinomial Logit models to the marketing survey conducted in the Prefecture of Serres for the furniture market. The aim of the research was to estimate the probability of buying furniture in the next four-month period, based on the questionnaire and the demographic characteristics of potential consumers. We found that the simple weighted multinomial Logit suffers from many statistically insignificant variables, which points to a likely multicollinearity problem. The weighted multinomial Logit with Huber-White robust standard errors, on the other hand, presents heteroscedasticity and violates the IIA hypothesis. We therefore preferred the weighted multinomial Logit with jackknife standard errors. We applied a simple Monte-Carlo simulation and concluded that the proposed model is quite a good option in our case. Other estimation approaches, such as the Principal Components (PC) logit or the bootstrap, are also good, but their estimates are quite similar to those of the model we propose here, so it is not necessary to present their results. These methods are nonetheless worth mentioning, because in other cases their estimates might be considerably better.

References

COMCENTER (2007), “The highly-fragmented furniture market in Greece”, I.C.A.P.

Bartels K., Boztug Y. & Muller M. (1999), “Testing the multinomial logit model”, working paper, University Potsdam, Humboldt-University at Berlin, Germany.

Davidson R. & MacKinnon G.J. (1999), “Econometric theory and methods”, Oxford University Press, New York, pp. 460-462.

Greene H.W. (2003), “Econometric Analysis”, Fifth edition, Prentice Hall, New Jersey, U.S.A., pp. 518-521, 724, 924.

Gujarati D. (2004), “Basic Econometrics”, Fourth edition, McGraw-Hill, U.S.A., pp. 614-615.

Interwood magazine (2007), “Dromeas presentation”, pp. 12-21.

Janke W. (2002), “Statistical Analysis of Simulations: Data Correlations and Error Estimation”, John von Neumann Institute for Computing, Julich, NIC Series, Vol. 10, pp. 423-445.

Jonkers J. (2006), “The domestic furniture market in Greece”, CBI MARKET SURVEY, Centre for the promotion of imports from developing countries, The Netherlands.

Langholz B. & Goldstein L. (2001), “Conditional logistic analysis of case-control studies with complex sampling”, Biostatistics, 2(1), 63-84.

Pia M. & Feser V. (2000), “Robust Logistic Regression for Binomial Responses”, working paper, University of Geneva.

Pregibon D. (1982), “Resistant fits for some commonly used logistic models with medical applications”, Biometrics 38, 485-498.

Tomz M., Wittenberg J. & King G. (2000), “Making the Most of Statistical Analyses: Improving Interpretation and Presentation”, American Journal of Political Science, Vol. 44, No. pp. 341–355.

Wolter M. K. (2007), “Introduction to Variance Estimation”, Statistics for Social and Behavioural Sciences, Second Edition, Springer, 151-153.

Žiberna A., Kejžar N. & Golob P. (2004), “A Comparison of Different Approaches to Hierarchical Clustering of Ordinal Data”, Metodološki zvezki, 1(1), 57-73.


TABLE 1 EIGENVALUES

Component     1        2        3        4        5        6
Eigenvalue    2.2600   1.1140   0.9610   0.6817   0.5238   0.4595
Proportion    0.377    0.186    0.160    0.114    0.087    0.077
Cumulative    0.377    0.562    0.722    0.836    0.923    1.000

TABLE 2. 1st PC factor

Variable                  PC1
price                     0.441
service                   0.489
quality                   0.527
variety                   0.205
delivery                  0.177
service after shopping    0.463

TABLE 3 1. From which furniture category do you generally intend to buy?

Percent

Living rooms 54.2

Dining rooms 11.2

Children furniture 9.3

Garden furniture 2.8

Bedrooms 19.2

Office furniture 3.3

TABLE 4 2. Which are the main criteria for buying from a furniture shop?

Percent

Price 56.2

Quality 31.2

Variety 8.7

Trade name 3.0

Location 0.9


TABLE 5 3. How much money do you intend to give?

Percent

≤ 250 € 19.2

250-600 € 26.8

≥ 600 € 54.0

TABLE 6 4. Mark between 1 and 5 (5 is the best and 1 is the worst) the following characteristics you faced in your previous furniture shopping.

Percent of respondents giving each mark

Characteristic            1      2      3      4      5      Mean   St. deviation
Service                   0.7    3.2    19.4   41.2   35.5   4.07   0.86
Price                     4.8    5.5    19.0   31.5   39.2   3.95   1.11
Quality                   7.8    6.5    22.6   30.4   32.7   3.74   1.20
Variety                   2.4    6.7    14.1   24.7   52.1   4.17   1.06
Delivery                  35.3   11.7   6.7    9.5    36.8   3.0    1.76
Service after shopping    4.7    5.9    23.9   22.7   42.8   3.93   1.15

TABLE 7 5. For the specific shopping do you prefer the Serres shops or other regions?

Percent

Serres 74.5

Thessaloniki 17.4

Drama 4.2

Bulgaria 0.3

Other region 3.6

TABLE 8 6. How would you like to be informed about the furniture products?

Percent

TV 24.8

Radio 1.4

Newspapers-magazines 11.0

Leaflets 53.2

Phone contact 1.7

Internet 7.9


TABLE 9 7. Will you buy furniture in the following four-monthly period?

Percent

YES 19.5

NO 66.0

MAY BE 14.5

TABLE 10 Income distribution and profession activity

Income Percent

<500 € 11.2

501-1000 € 33.2

1001-1500 € 29.7

1501-2000 € 12.5

>2000 € 13.4

Profession

Rural Sector 6.1

Public Sector Employee 16.9

Private Sector Employee 16.3

Businessman 11.8

Student 3.9

Household 20.8

Unemployed 6.1

Pensioner 18.1

TABLE 11 Sex

Percent

MALE 46.5

FEMALE 53.5

TABLE 12 Age

Years

Mean 47.0

St. Deviation 14.4

Std. Error of Mean 0.78
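The standard error of the mean in Table 12 is the standard deviation divided by the square root of the sample size, so the two reported figures imply an approximate number of respondents. A rough sketch follows. Both inputs are rounded, so the result is only indicative and is not a figure stated in the survey itself.

```python
# Values reported in Table 12 (age in years).
sd_age = 14.4    # St. Deviation
se_mean = 0.78   # Std. Error of Mean

# se = sd / sqrt(n)  =>  n = (sd / se) ** 2
implied_n = (sd_age / se_mean) ** 2
```

With the rounded inputs this gives a value of roughly 340 respondents.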

TABLE 1 Weighted multinomial Logit model

Market = 1  Coef. (st. error)  z | Market = 1  Coef. (st. error)  z | Market = 2  Coef. (st. error)  z | Market = 2  Coef. (st. error)  z
C1 -24.19117 (2258.291) -0.01 | INF4 -2.743014* (.3507851) -7.82 | C1 -22.16686 (2258.291) -0.01 | INF4 -3.368385* (.3205891) -10.51
C2 -24.3191 (2258.291) -0.01 | INF5 22.83438 . | C2 -19.07691 (2258.291) -0.01 | INF5 16.97649* (.3756762) 45.19
C3 -25.38924 (2258.291) -0.01 | Sex -2.679168* (.1854444) -14.45 | C3 -21.85798 (2258.291) -0.01 | Sex -1.333601* (.1519156) -8.78
C4 -22.60833 (1907391) -0.00 | Age -.0599014* (.0087728) -6.83 | C4 10.60145 (1369884) 0.00 | Age .0313198* (.0069501) 4.51
C5 -24.28615 (2258.291) -0.01 | id -1.906601* (.2293723) -8.31 | C5 -21.49782 (2258.291) -0.01 | id -1.112417* (.1915824) -5.81
CRIT1 2.309676* (.184604) 12.51 | INC1 -5.837068* (.4140417) -14.10 | CRIT1 -17.626 (2258.291) -0.01 | INC1 -.8446702* (.3298488) -2.56
CRIT2 -1.402899 (.) . | INC2 -4.76893* (.2847018) -16.75 | CRIT2 -19.67238 (2258.291) -0.01 | INC2 -.1101128 (.2469728) -0.45
CRIT3 -5.21230* (.3421758) -15.23 | INC3 -2.029722* (.2858869) -7.10 | CRIT3 -24.18795 (2258.291) -0.01 | INC3 1.624504* (.2553209) 6.36
CRIT4 22.21736 (2258.291) 0.01 | INC4 -7.152252* (.3930177) -18.20 | CRIT4 2.673512 (.) . | INC4 -1.967171* (.2919626) -6.74
MON2 -1.20423* (.3642494) -3.31 | PF1 28.05087 (2258.291) 0.01 | MON2 -1.06954* (.3539236) -3.02 | PF1 29.74646* (.2330322) 127.65
MON3 -1.69661* (.348464) -4.87 | PF2 24.8759 (2258.291) 0.01 | MON3 -.6975685* (.347466) -2.01 | PF2 29.64446* (.2612) 113.49
Price 1.564757* (.1200148) 13.04 | PF3 28.19031 (2258.291) 0.01 | Price 1.16457* (.1035494) 11.25 | PF3 29.98161* (.2875892) 104.25
Variety .2101363* (.0860052) 2.44 | PF4 28.53592 (2258.291) 0.01 | Variety .3212617* (.0649514) 4.95 | PF4 33.55159* (.5056188) 66.36
LOY .8190109* (.1819961) 4.50 | PF5 21.98089 (5227809) 0.00 | LOY .2840219 (.1461295) 1.94 | PF5 63.10384 (4506271) 0.00
INF1 -.0253447 (.3857449) -0.07 | PF6 26.89052 (2258.291) 0.01 | INF1 -1.556478* (.3376938) -4.61 | PF6 29.36098* (.2271631) 129.25
INF2 20.11875 (.) . | PF8 30.49286 (2258.291) 0.01 | INF2 20.32117* (.3704731) 54.85 | PF8 31.04017 (.) .
INF3 -3.28183* (.3923674) -8.36 | constant 3.03079 . | INF3 -3.631752* (.3446816) -10.54 | constant 11.004 (.) .

Log likelihood -3154.662 Pseudo R2 = 0.4113

Note: market = 3 is the base outcome; standard errors in parentheses; * denotes significance at the 5% level; z denotes the z-statistic.
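Each z-statistic in the table is the coefficient divided by its standard error, and the asterisk corresponds to a two-sided test at the 5% level. A minimal sketch of that calculation is given below, assuming the usual standard normal reference distribution; the helper name `z_test` is ours.

```python
import math

def z_test(coef, se, alpha=0.05):
    """Return (z, two-sided p-value, significant?) for a coefficient and its st. error."""
    z = coef / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p, p < alpha

# INF4 and C1 for Market = 1, copied from Table 1.
z_inf4, p_inf4, sig_inf4 = z_test(-2.743014, 0.3507851)
z_c1, p_c1, sig_c1 = z_test(-24.19117, 2258.291)
```

Rounded to two decimals, these reproduce the reported z values for INF4 and C1. They also show why coefficients with standard errors in the thousands carry no asterisk.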

TABLE 2 Weighted multinomial Logit model with Huber-White robust standard errors

Market = 1  Coef. (st. error)  z | Market = 1  Coef. (st. error)  z | Market = 2  Coef. (st. error)  z | Market = 2  Coef. (st. error)  z
C1 -24.1911* (1.311869) -18.44 | INF4 -2.743014* (.2599947) -10.55 | C1 -22.16686 (.) . | INF4 -3.368385* (.1704796) -19.76
C2 -24.3191* (1.079975) -22.52 | INF5 22.83438 (.) . | C2 -19.07691 (.) . | INF5 16.97649* (.3390611) 50.07
C3 -25.3892* (1.43201) -17.73 | Sex -2.679168* (.1868817) -14.34 | C3 -21.85798* (1.377216) -15.87 | Sex -1.333601* (.1386592) -9.62
C4 -22.6083* (1.588602) -14.23 | Age -.0599014* (.009422) -6.36 | C4 10.60145 (.) . | Age .0313198* (.007253) 4.32
C5 -24.28615 (.) . | id -1.906601* (.2064382) -9.24 | C5 -21.49782 (.) . | id -1.112417* (.1512474) -7.35
CRIT1 2.309676* (.2087961) 11.06 | INC1 -5.837068* (.3919177) -14.89 | CRIT1 -17.626* (1.776402) -9.92 | INC1 -.8446702* (.336651) -2.51
CRIT2 -1.402899 (.) . | INC2 -4.76893* (.2479698) -19.23 | CRIT2 -19.67238 (.) . | INC2 -.1101128 (.2396833) -0.46
CRIT3 -5.21230* (.3268497) -15.95 | INC3 -2.029722* (.2660761) -7.63 | CRIT3 -24.18795 (.) . | INC3 1.624504* (.2580996) 6.29
CRIT4 22.21736 (.) . | INC4 -7.152252* (.3635541) -19.67 | CRIT4 2.673512 (.) . | INC4 -1.967171* (.3080401) -6.39
MON2 -1.20423* (.2631651) -4.58 | PF1 28.05087* (2.772297) 10.12 | MON2 -1.06954* (.2122592) -5.04 | PF1 29.74646* (.297904) 99.85
MON3 -1.69661* (.247516) -6.85 | PF2 24.8759 (.) . | MON3 -.6975685* (.2015474) -3.46 | PF2 29.64446* (.2209823) 134.15
Price 1.564757* (.1194442) 13.10 | PF3 28.19031* (.7815903) 36.07 | Price 1.16457* (.1153233) 10.10 | PF3 29.98161* (.2531853) 118.42
Variety .2101363* (.0859671) 2.44 | PF4 28.53592* (3.422268) 8.34 | Variety .3212617* (.0571129) 5.63 | PF4 33.55159* (.3761956) 89.19
LOY .8190109* (.1962168) 4.17 | PF5 21.98089 (.) . | LOY .2840219 (.1456701) 1.95 | PF5 63.10384* (.4796068) 131.57
INF1 -.0253447 (.3071136) -0.08 | PF6 26.89052* (2.056111) 13.08 | INF1 -1.556478* (.1797354) -8.66 | PF6 29.36098* (.2161207) 135.85
INF2 20.11875 (.) . | PF8 30.49286 (.) . | INF2 20.32117* (.3147879) 64.56 | PF8 31.04017 (.) .
INF3 -3.28183* (.3378955) -9.71 | constant 3.03079 . | INF3 -3.631752* (.2628416) -13.82 | constant 11.004 .

Log likelihood -3154.662 Pseudo R2= 0.4113

Note: market = 3 is the base outcome; standard errors in parentheses; * denotes significance at the 5% level; z denotes the z-statistic.
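Because market = 3 is the base outcome, its coefficients are normalised to zero and the two reported coefficient sets are read relative to it. The sketch below shows how choice probabilities follow from two linear indices in a three-outcome multinomial logit. The index values are hypothetical and are not computed from the estimates above.

```python
import math

def mnl_probabilities(index_1, index_2):
    """Return P(market = 1), P(market = 2), P(market = 3) with market 3 as base.

    index_j is the linear index x'beta_j; the base outcome has index 0.
    """
    e1, e2 = math.exp(index_1), math.exp(index_2)
    denom = 1.0 + e1 + e2
    return e1 / denom, e2 / denom, 1.0 / denom

# Hypothetical linear indices for a single respondent (illustration only).
p1, p2, p3 = mnl_probabilities(0.4, -0.3)
```

A positive index raises the probability of that market relative to the base outcome and a negative index lowers it, which is how the signs in the table are interpreted.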

TABLE 3 Weighted multinomial Logit model with jackknife robust standard errors

Market = 1  Coef. (st. error)  t | Market = 1  Coef. (st. error)  t | Market = 2  Coef. (st. error)  t | Market = 2  Coef. (st. error)  t
C1 -24.646 (.4233) -58.22 | INF4 -3.635 (.2786) -13.05 | C1 -22.986 (.3469) -66.25 | INF4 -4.091 (.1582) -25.85
C2 -23.977 (.6381) -37.57 | INF5 21.162 (.4137) 51.15 | C2 -18.925 (.4808) -39.35 | INF5 15.380 (.2446) 62.87
C3 -25.565 (.4561) -56.05 | Sex -2.870 (.1950) -14.71 | C3 -22.645 (.3828) -59.16 | Sex -1.583 (.1492) -10.61
C4 -24.050* (298.313) -0.08 | Age -.042 (.0090) -4.68 | C4 16.071* (200.5572) 0.08 | Age .048 (.0067) 7.14
C5 -25.671 (.4825) -53.19 | id -1.998 (.1749) -11.42 | C5 -23.192 (.3508) -56.10 | id -1.266 (.1358) -9.32
CRIT1 1.977 (.3763) 5.25 | INC1 -8.056 (.4807) -16.76 | CRIT1 -18.582 (.3514) -52.87 | INC1 -3.207 (.4115) -7.80
CRIT2 -1.745 (.3833) -4.55 | INC2 -4.573 (.2258) -20.25 | CRIT2 -20.684 (.2961) -69.84 | INC2 -.0002* (.218) 0.00
CRIT3 -5.741 (.3854) -14.90 | INC3 -2.050 (.2235) -9.17 | CRIT3 -25.852 (.3686) -70.12 | INC3 1.604 (.2173) 7.38
CRIT4 21.855 (.4987) 43.82 | INC4 -6.932 (.3836) -18.07 | CRIT4 1.453 (.4287) 3.39 | INC4 -1.630 (.3110) -5.24
MON2 -2.110 (.3318) -6.36 | PF1 25.621 (.3814) 67.16 | MON2 -2.121 (.2937) -7.22 | PF1 28.177 (.3529) 79.84
MON3 -2.446 (.2924) -8.37 | PF2 22.681 (.3734) 60.74 | MON3 -1.496 (.2394) -6.25 | PF2 28.477 (.2871) 99.16
Price 0.837 (.0640) 13.07 | PF3 26.076 (.3689) 70.67 | Price 0.541 (.0510) 10.59 | PF3 28.921 (.3169) 91.24
Variety .415 (.0867) 4.79 | PF4 26.610 (.5446) 48.86 | Variety .344 (.0393) 8.75 | PF4 32.296 (.4236) 76.24
LOY .783 (.2189) 3.58 | PF5 16.517* (307.3844) 0.05 | LOY .269* (.1756) 1.53 | PF5 66.619* (120.3684) 0.55
INF1 0.107* (.3587) 0.30 | PF6 25.329 (.3955) 64.04 | INF1 -0.916 (.2278) -4.02 | PF6 28.703 (.3357) 85.50
INF2 17.416 (.4322) 40.29 | PF8 28.457 (.4560) 62.40 | INF2 17.625 (.3227) 54.61 | PF8 29.865 (.4010) 74.46
INF3 -3.645 (.3517) -10.36 | constant 8.491 (.8963) 9.47 | INF3 -3.846 (.2502) -15.37 | constant 16.870 (.6426) 26.25

Log likelihood -3154.662 Pseudo R2 = 0.4113

Note: market = 3 is the base outcome; standard errors in parentheses; * denotes insignificance at the 5% level; t denotes the t-statistic.
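The jackknife standard errors above come from re-estimating the model with one observation left out at a time. The idea can be illustrated on a much simpler statistic. The sketch below applies the leave-one-out jackknife to a sample mean, with hypothetical ages; it is not the estimation routine used for the table.

```python
import math

def jackknife_se(values, statistic):
    """Leave-one-out jackknife standard error of `statistic` over `values`."""
    n = len(values)
    # Recompute the statistic n times, each time dropping one observation.
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out))

def mean(xs):
    return sum(xs) / len(xs)

ages = [23, 35, 41, 47, 52, 58, 66]  # hypothetical data, for illustration only
se_jack = jackknife_se(ages, mean)

# For the sample mean, the jackknife reproduces the textbook s / sqrt(n).
m = mean(ages)
classic_se = math.sqrt(sum((a - m) ** 2 for a in ages) / (len(ages) - 1) / len(ages))
```

For the mean the two standard errors coincide. For a nonlinear estimator such as the multinomial logit they need not, which is one reason the jackknife results can differ from the conventional ones.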

TABLE 4 Hausman's specification test for the weighted multinomial logit model

Variable  Coefficients (b) partial  Coefficients (B) all  (b-B) Difference  W | Variable  Coefficients (b) partial  Coefficients (B) all  (b-B) Difference  W
C1 27.582 25.257 2.325 0.098 | INF3 -43.365 -29.976 -13.388 3.10E+09
C2 27.274 24.948 2.325 0.202 | INF4 1.258 1.333 -0.075 0.064212
C3 -19.141 -9.513 -9.628 1.25E+04 | INF5 -0.027 -0.031 0.004 0.003141
C4 26.550 24.588 1.962 0.103 | Sex 1.860 1.112 0.747 0.120582
C5 19.831 20.511 -0.680 2.676 | Age -0.763 0.844 -1.608 0.154244
CRIT1 21.965 22.558 -0.593 2.674 | Id -1.173 0.110 -1.283 0.18215
CRIT2 26.933 27.073 -0.140 2.667 | INC1 -2.742 -1.624 -1.118 0.214111
CRIT3 -27.040 -12.659 -14.381 2.05E+04 | INC2 1.481 1.967 -0.485 0.185999
CRIT4 -0.811 1.069 -1.880 2.690 | INC3 -34.677 -36.725 2.044 0.026322
MON2 -1.474 0.697 -2.172 2.687 | INC4 -34.536 -36.620 2.084 0.068746
MON3 -1.2410 -1.164 -0.076 0.031 | PF1 -34.874 -36.957 2.082 .
Price -0.334 -0.321 -0.013 0.028 | PF2 -34.330 -36.337 2.006 .
Variety 0.064 -0.284 0.348 0.091 | PF3 -36.476 -38.0162 1.539 0.037052
LOY 2.971 1.556 1.414 0.205 | PF4 27.582 25.257 2.325 0.098427
INF1 4.271 3.631 0.639 0.160 | PF6 27.274 24.948 2.325 0.20215
INF2 4.284 3.368 0.916 0.192 | INF2 4.284 3.368 0.916 0.192653

Test: H0: difference in coefficients not systematic, Pr = 1.0000, *H0 is not rejected
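The Hausman statistic compares the coefficients from the partial set (b) with those from the full set (B) through the quadratic form (b-B)'[V_b - V_B]^{-1}(b-B), which is chi-square distributed under H0. The following is a minimal two-coefficient sketch with hypothetical coefficients and covariance matrices; the numbers are not taken from the table.

```python
import math

def hausman_2x2(b, B, V_b, V_B):
    """Hausman statistic and p-value for two coefficients (2 degrees of freedom)."""
    d = [b[0] - B[0], b[1] - B[1]]
    # Elements of the symmetric difference matrix V_b - V_B.
    a = V_b[0][0] - V_B[0][0]
    c = V_b[0][1] - V_B[0][1]
    e = V_b[1][1] - V_B[1][1]
    det = a * e - c * c
    # d' (V_b - V_B)^{-1} d using the closed-form inverse of a 2x2 matrix.
    h = (e * d[0] ** 2 - 2 * c * d[0] * d[1] + a * d[1] ** 2) / det
    # Survival function of the chi-square distribution with 2 d.f.
    p_value = math.exp(-h / 2)
    return h, p_value

# Hypothetical inputs, for illustration only.
b_partial = [1.0, 0.5]
B_all = [0.8, 0.6]
V_partial = [[0.05, 0.01], [0.01, 0.04]]
V_all = [[0.03, 0.005], [0.005, 0.02]]

h_stat, p_val = hausman_2x2(b_partial, B_all, V_partial, V_all)
```

A large p-value, as in this toy case, means the difference between the two coefficient vectors is not systematic and H0 is not rejected.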

TABLE 5

Hausman's specification test for the weighted multinomial logit model with Huber-White robust standard errors

Variable  Coefficients (b) partial  Coefficients (B) all  (b-B) Difference  W | Variable  Coefficients (b) partial  Coefficients (B) all  (b-B) Difference  W
C1 27.582 25.254 2.325 .145 | INF3 -43.364 -29.976 -13.388 .317
C2 27.274 24.948 2.325 .200 | INF4 1.258 1.333 -.075 .047
C3 -19.141 -9.513 -9.628 . | INF5 -.027 -.031 .004 .
C4 26.550 24.588 1.962 .066 | Sex 1.860 1.112 .747 .113
C5 19.831 20.511 -.680 .263 | Age -.763 .844 -1.608 .103
CRIT1 21.965 22.558 -.593 .159 | Id -1.173 .110 -1.283 .190
CRIT2 26.933 27.073 -.140 . | INC1 -2.742 -1.624 -1.118 .254
CRIT3 -27.040 -12.659 -14.381 . | INC2 1.481 1.967 -.485 .145
CRIT4 -.811 1.069 -1.880 .113 | INC3 -34.676 -36.724 2.044 .150
MON2 -1.474 .697 -2.172 .108 | INC4 -34.536 -36.620 2.084 .162
MON3 -1.241 -1.164 -.076 . | PF1 -34.874 -36.957 2.082 .
Price -.334 -.321 -.013 .022 | PF2 -34.330 -36.337 2.006 .173
Variety .064 -.284 .348 .062 | PF3 -36.476 -38.016 1.539 .
LOY 2.971 1.556 1.414 .164 | PF4 27.582 25.257 2.325 .145
INF1 4.271 3.631 .639 .184 | PF6 27.274 24.948 2.325 .200
INF2 4.284 3.368 .9160 .160 | INF2 -43.364 -29.976 -13.388 .317

Test: H0: difference in coefficients not systematic, Pr = 0.0000, *Reject H0

TABLE 6

Hausman's specification test for the weighted multinomial Logit model with Jackknife standard errors

Variable  Coefficients (b) partial  Coefficients (B) all  (b-B) Difference  W | Variable  Coefficients (b) partial  Coefficients (B) all  (b-B) Difference
C1 27.58222 25.25704 2.325183 .2701009 | INF3 4.271423 3.631752 .6396704
C2 24.15519 22.16708 1.988104 . | INF4 4.284463 3.368385 .9160783
C3 27.27403 24.94816 2.32587 .2703368 | INF5 -43.36495 -29.97649 -13.38846
C4 -19.14167 -9.513089 -9.628579 . | Sex 1.258049 1.333601 -.0755513
C5 26.55099 24.588 1.962987 .1360996 | Age -.0270006 -.0313198 .0043192
CRIT1 19.83164 20.51188 -.680234 .2246666 | id 1.86003 1.112417 .747613
CRIT2 21.96522 22.55825 -.593033 .1694605 | INC1 -.7636035 .8446702 -1.608274
CRIT3 26.93368 27.07383 -.1401496 .1340384 | INC2 -1.173496 .1101128 -1.283609
CRIT4 -27.04027 -12.65906 -14.38121 . | INC3 -2.742845 -1.624504 -1.11834
MON2 -.8112801 1.06954 -1.88082 .1148009 | INC4 1.481673 1.967171 -.485498
MON3 -1.474704 .6975685 -2.172272 .1102908 | PF1 -34.67769 -36.72252 2.044834
Price -1.241008 -1.16457 -.0764379 . | PF2 -34.53645 -36.62053 2.084071
Variety -.3347701 -.3212617 -.0135083 .0229542 | PF3 -34.8748 -36.95767 2.082875
LOY .0642479 -.2840219 .3482699 .0649966 | PF4 -38.96175 -40.52765 1.565903
INF1 2.971465 1.556478 1.414987 .1675352 | PF6 -82.42541 -72.06926 -10.35615
INF2 -48.46552 -33.32116 -15.14435 . | INF2 INF3 4.271423 3.631752 .6396704

Test: H0: difference in coefficients not systematic, Pr = 0.1126, *H0 is not rejected

TABLE 7

MONTE-CARLO SIMULATION

Market = 1  Coef. (st. error) | Market = 1  Coef. (st. error) | Market = 2  Coef. (st. error) | Market = 2  Coef. (st. error)
C1 -24.635 (.3857) | INF4 -3.6452 (.2741) | C1 -22.9653 (.330) | INF4 -4.085 (.1412)
C2 -23.9114 (.6307) | INF5 21.1507 (.4331) | C2 -18.8615 (.4767) | INF5 15.39 (.2485)
C3 -25.5435 (.4383) | Sex -2.86 (.1940) | C3 -22.6257 (.369) | Sex -1.5663 (.1462)
C4 -28.4515* (284.75) | Age -.04276 (.0090) | C4 3.852* (201.2064) | Age .04814 (.0067)
C5 -25.6378 (.4543) | id -2.00327 (.1818) | C5 -23.1686 (.3477) | id -1.2552 (.1385)
CRIT1 1.9965 (.3759) | INC1 -8.0812 (.4886) | CRIT1 -18.5965 (.3454) | INC1 -3.2441 (.4225)
CRIT2 -1.726 (.3801) | INC2 -4.5711 (.2262) | CRIT2 -20.6993 (.2911) | INC2 -.0062* (.2252)
CRIT3 -5.7675 (.3623) | INC3 -2.033 (.2134) | CRIT3 -25.8888 (.3595) | INC3 1.6044 (.2132)
CRIT4 21.8952 (.5212) | INC4 -6.9244 (.3731) | CRIT4 1.4347 (.4523) | INC4 -1.6485 (.3170)
MON2 -2.1460 (.3175) | PF1 25.6271 (.3897) | MON2 -2.1625 (.2904) | PF1 28.1815 (.36078)
MON3 -2.46375 (.2921) | PF2 22.656 (.3735) | MON3 -1.5261 (.2363) | PF2 28.4753 (.2908)
Price .8361 (.0650) | PF3 26.06 (.3732) | Price .544 (.053) | PF3 28.9281 (.3250)
Variety .4138 (.009) | PF4 26.6119 (.5485) | Variety .3448 (.0397) | PF4 32.3052 (.4336)
LOY .7924 (.2149) | PF5 26.4515* (296.0381) | LOY .2823* (.1715) | PF5 69.009* (120.3516)
INF1 0.1179* (.360) | PF6 25.3177 (.4018) | INF1 -.8961 (.2233) | PF6 28.7143 (.3418)
INF2 17.3932 (.4343) | PF8 28.4231 (.4448) | INF2 17.602 (.3371) | PF8 29.8485 (.4049)
INF3 -3.6455 (.3493) | constant 8.5198 (.8868) | INF3 -3.8446 (.2320) | constant 16.8616 (.645)

Note: market = 3 is the base outcome; standard errors in parentheses; * denotes insignificance at the 5% level.