Pls & Marketing 10 Juillet 3

download Pls & Marketing 10 Juillet 3

of 16

Transcript of Pls & Marketing 10 Juillet 3

  • 8/6/2019 Pls & Marketing 10 Juillet 3

    1/16

    Use of ULS-SEM and PLS-SEM to

    measure a group eect in a regression

    model relating two blocks of binary

    variables

    Michel Tenenhaus1, Emmanuelle Mauger2

    & Christiane Guinot2;3

    (1) HEC School of Management (GRECHEC), 1 rue de la Libration,

    78351 Jouy-en-Josas, France

    (2) CERIES, Biometrics and Epidemiology unit, 20 rue Victor Noir,

    92521 Neuilly sur Seine, France

    (3) Computer Science Laboratory, Ecole Polytechnique, University of Tours, France

    Abstract

    The objective of this paper is to describe the use of unweighted leastsquares (ULS) structural equation modeling (SEM) and partial least squa-res (PLS) path modeling in a regression model relating two blocks ofbinary variables, when a group eect can inuence the relationship. Twosets of binary variables are available. The rst set is dened by one block

    X of predictors and the second set by one block Y of responses. PLSregression could be used to relate the responses Y to the predictors X,taking into account the block structure. However, for multigroup data,this model cannot be used because the path coecients can be dierentfrom one group to another. The relationship between Y and X is studiedin the context of structural equation modeling. A group eect A can aectthe measurement model (relating the manifest variables (MVs) to theirlatent variables (LVs)) and the structural equation model (relating theY-LV to the X-LV). In this paper, we wish to study the impact of thegroup eect on the structural model only, supposing that there is no groupeect on the measurement model. This approach has the main advantageof allowing a description of the group eect (main and interaction eects)at the LV level instead of the MV level. Then, an application of thismethodology on the data of a questionnaire investigating sun exposure

    behavior is presented.Key-words: Group eect, Interaction, PLS path modeling, Struc-

    tural Equation Modeling (SEM), Two-block regression, Unweighted LeastSquares (ULS)

    1

  • 8/6/2019 Pls & Marketing 10 Juillet 3

    2/16

    Introduction

    The objective of this paper is to describe the use of unweighted least squaresstructural equation modeling (ULS-SEM) and partial least squares path model-ing (PLS-SEM) in a regression model relating a response block Y to a predictorblock X, when a group eect A can aect the relationship. A structural equationrelates the response latent variable (LV) associated with block Y to the pre-dictor latent variable associated with block X, taking into account the groupeect A. In usual applications, the group eect acts on the measurement modelas well as on the structural model. In this paper, we wish to study the impact ofthe group eect on the structural model only, supposing that there is no groupeect on the measurement model. This constraint is easy to implement in ULS-SEM, but not in PLS-SEM. This approach has the main advantage of allowinga description of the group eect (main and interaction eects) at the LV levelinstead of the manifest variable level. We propose a four-step methodology:

    (1) Use of ULS-SEM with constraints on the measurement model, (2) LV es-timates are computed in the framework of PLS: the outer LV estimatesb andb are computed using mode A and, as inner LV estimates, the ULS-SEM LVs,

    (3) Analysis of covariance relating the dependent LVb to the independent termsb, A (main eect) and A b (interaction eect), and (4) Tests on the structural

    model, using bootstrapping.These methods were applied on the data of a questionnaire investigating

    sun exposure behavior addressed to a cohort of French adults in the context ofthe SU.VI.MAX epidemiological study. Sun protection behavior was describedaccording to gender and class of age (less than 50 at inclusion in the studyversus more or equal to 50). This paper illustrates the various stages in theconstruction of latent variables, also called scores, based on qualitative data.

    1 Theory

    Chin, Marcolin and Newsted (2003) proposed to use the PLS approach to relatethe response block Y to the predictor block X with a main eect A and aninteraction term A X added to the model as described in gure 1. In thisexample, the group variable A has two values, and A1 and A2 are two dummyvariables describing these values. Ping (1995) has studied the same model inthe LISREL context. An equivalent path model is described in gure 2, wherethe redundant manifest variables have been removed. This model in gure 2seems easier to estimate using ULS procedure than the model shown in gure 1,after removal of the redundant MVs: a negative variance estimate has beenencountered in the presented application for the gure 1 model, and not forthe gure 2 model. The study of the path coecients related to the arrowsconnecting XA1, XA2 and A to Y in gure 2 gives some insight on the maingroup eect A and on the interaction eect A X. However, this model canbe misleading because the blocks X Ah do not represent the product of thegroup eect A with the latent variable related to the X block. In this model the

    2

  • 8/6/2019 Pls & Marketing 10 Juillet 3

    3/16

    inuence of the group eect A on the measurement and the structural modelsare confounded.

    Figure 1: Two-block regression model with a group eect

    [Ping (1995), Chin, Marcolin and Newsted (2003) ]

    Figure 2: Two-block regression model with main eectand interaction

    [Group eect for measurement and structural models]

    3

  • 8/6/2019 Pls & Marketing 10 Juillet 3

    4/16

    In this paper, we suppose that there is no group eect on the measurementmodel. The regression coecients wjh in the regression equations relating the

    MVs to their LVs are all equal among the XAh blocks. This model is describedin gure 3. These equality constraints cannot be obtained with PLS-Graph(Chin, 2005) nor with other PLS softwares. But, a SEM software like AMOS 6.0(Arbuckle, 2005) could be used to estimate the path coecients subject to theseequality constraints with the ULS method.

    Figure 3: Two-block regression model with main eectand interaction

    [Group eect for the structural model only]

    1.1 Outer estimate of the latent variables in the PLS con-

    text

    Using the model described in gure 3, it is possible to compute the LV estimatesin a PLS way using the ULS-SEM weights wj . For each block, the weight wj isequal to the regression coecient ofh, LV for the block XAh, in the regressionof the manifest variable XjAh on the latent variable h:

    Cov(XjAh; h)

    V ar(h)=

    Cov(wjh + "jh; h)

    V ar(h)= wj (1)

    Therefore, in each block, these weights are proportional to the covariances be-tween the manifest variables and their LVs. With mode A, using the ULS-SEMlatent variables as LV inner estimates, the LV outer estimatebh for block XAhis given by the variable

    bh /Xj

    wj(XjAh XjAh) (2)

    4

  • 8/6/2019 Pls & Marketing 10 Juillet 3

    5/16

    where/means that the left term is equal to the right term up to a normalizationto unit variance. This approach is described in Tenenhaus et al. (2005).

    When all the X variables have the same units and all the weights wj arepositive, Fornell et al. (1996) suggest computing the LV estimate as a weightedaverage of the original MVs:

    bbh =Xj

    bwjXjAh =bAh (3)

    where bwj = wj=Pk wk andb = Pj bwjXj = Xbw: The LV estimate has valuesbetween 0 and 1 when the X variables are binary.

    In the same way, the LV outer estimate for block Y is given by

    bh /Xk

    ck(Yk Yk) (4)

    When all the weights ck are positive, they are normalized so that they sum upto 1. We obtain, keeping the same notation for the Fornell LV estimate,

    bh =Xk

    bckYk = Ybc (5)

    wherebck = ck=P` c`: This LV has also values between 0 and 1 when the Yvariables are binary.

    1.2 Use of multiple regression on the latent variables

    The structural equation of gure 3, relating to and taking into accountthe group eect A, is now estimated in the PLS framework by using the OLS

    multiple regression:

    b = 0

    + 1

    A1 + 0

    2bA1 + 03bA2 + "

    = 0

    + 1

    A1 + 0

    2bA1 + 03b(1A1) + " (6)

    = 0

    + 1

    A1 + 0

    3b+ (0

    2 0

    3)bA1 + "

    The regression equation ofb onb, taking into account the group eect A, isnally written as follows:

    b = 0

    + 1

    A1 + 2b+

    3bA1 + " (7)

    Consequently, there is a main group eect if the regression coecient of A1is signicantly dierent from zero and an interaction eect if the regressioncoecient ofbA1 is signicantly dierent from zero.

    This approach can be generalized without diculties if the group eect hasmore than two categories. In this approach ULS-SEM is only used to produceweights w and c that yield to the latent variablesb andb. The regression coe-cients of model (7) are estimated by ordinary least squares (OLS), independentlyof the ULS-SEM parameters.

    5

  • 8/6/2019 Pls & Marketing 10 Juillet 3

    6/16

    1.3 Use of bootstrap on the ULS-SEM regression coe-

    cients

    Denoting the latent variables for the model in gure 3 as follows:

    is the LV related to block Y

    1

    is the LV related to block XA1

    2

    is the LV related to block XA2

    3

    is the LV related to block A

    the theoretical model related to the model shown in gure 3 can be describedby equation (8):

    = 11 + 22 + 33 + (8)

    The test for a main eect A is equivalent to the test H0 : 3 = 0: The test for

    an interaction eect X A is equivalent to the test H0 : 1 = 2: Condenceintervals of the regression coecients of model (8) can be constructed by boot-strapping using AMOS 6.0. These intervals can be used to test the main groupeect and the interaction eect.

    2 Application

    2.1 Introduction

    Ultraviolet radiations are known to play a major role in the development of skin

    cancers in humans. Nevertheless, in developed countries an increase in sun expo-sure has been observed over the last fty years due to several sociological factors:longer holidays duration, traveling facilities and tanning being fashionable. Toestimate the risk of skin cancer occurrence and of skin photoageing related tosun exposure behavior, a self-administered questionnaire was specically de-veloped in the context of the SU.VI.MAX cohort (Guinot et al. 2001). TheSU.VI.MAX study (SUpplments en VItamines et Minraux Anti-oXydants) isa longitudinal cohort study conducted in France, which studies the relationshipbetween nutrition and health through the main chronic disorders prevalent in in-dustrialized countries. It involves a large sample of middle-age men and womenright across the country recruited in a free-living adult population (Hercberget al., 1998). The study objectives, design and population characteristics havebeen described elsewhere (Hercberg et al., 1998b). The information collected

    on this cohort oers the opportunity to conduct cross-sectional surveys usingself-reported health behavior and habits questionnaires, such as those used tostudy the sun exposure behavior of French adults (Guinot et al. 2001).

    6

  • 8/6/2019 Pls & Marketing 10 Juillet 3

    7/16

    2.2 Material and Methods

    Dermatologists and epidemiologists contributed to the denition of the question-naire, which was in two parts, the rst relating to sun exposure behavior overthe past year and the second to sun exposure behavior evaluated globally overthe subjects lifetime. The questionnaire was addressed in 1997 to the 12,741volunteers who were included in the cohort. Over 64% of the questionnaireswere returned and analyzed (8,084 individuals: 4,825 women and 3,259 men).

    In order to characterize the sun exposure of men and women, various syn-thetic variables characterizing sun exposure behavior were previously generated(Guinot et al. 2001). Homogeneous groups of variables related to sun exposurebehavior were obtained using a variable clustering method. Then, a principalcomponent analysis was performed on these groups to obtain synthetic variablescalled "scores". A rst group of binary variables was produced to characterize

    sun protection behavior over the past year (block Y with 6 variables). A secondgroup of binary variables was produced to characterize lifetime sun exposure be-havior (block X: 11 variables): intensity of lifetime sun exposure (4 variables),sun exposure during mountain sports (2 variables), sun exposure during nauti-cal sports (2 variables), sun exposure during hobbies (2 variables), and practiceof naturism (1 variable).

    The objective of this research was to study the relationship between sunprotection behavior over the past year of the individuals and their lifetime sunexposure behavior taking into account the group eects gender and class of age.

    The methodology used was the following.

    Firstly, the possible eect of gender has been studied. This analysis wascarried out in four parts:

    1st part. Because of the presence of dummy variables, the data are notmultinormal. Therefore, ULS-SEM was carried out using AMOS 6.0 with theoption Method=ULS. So, two weight vectors were obtained: a weight vector cfor the sun protection behavior over the past year and a weight vector w forlifetime sun exposure behavior.

    2nd part. Using these weights, two scores were calculated: one for the sunprotection behavior and one for lifetime sun exposure behavior.

    3rd part. Then, to study the possible gender eect on sun protection be-havior over the past year, an analysis of covariance was conducted using PROCGLM (SAS R software release 8.2 (SAS Institute Inc, 1999)) on lifetime sunexposure behavior score, gender and the interaction term between gender andlifetime sun exposure behavior score.

    4th part. Finally, the results of the last testing procedure were compared

    with those obtained using the regression coecient condence intervals formodel (8) calculated by bootstrapping (ULS-option) with AMOS 6.0.

    Secondly, the possible eect of age was studied for each gender using thesame methodology.

    7

  • 8/6/2019 Pls & Marketing 10 Juillet 3

    8/16

    2.3 Results

    The results are presented as follows. The relationship between sun protectionbehavior over the past year and lifetime sun exposure behavior has been studied,rstly with the gender eect (step 1), and secondly with the age eect for eachgender (step 2a and step 2b). Finally, three dierent lifetime sun exposurescores were obtained, as well as three sun protection over the past year scores.

    2.3.1 Step 1. Eect of gender

    ULS-SEM allowed to obtained weights c for the sun protection behavior overthe past year and weights w for the lifetime sun exposure behavior. The AMOSresults are shown in gure 4.

    Figure 4: Two block regression model for relating sun protection behaviorover the past year to the lifetime sun exposure behaviors with a gender eect

    acting on the structural model and not on the measurement model

    8

  • 8/6/2019 Pls & Marketing 10 Juillet 3

    9/16

    Then, the scores were calculated using the normalized weights on the origi-nal binary variables. The sun protection behavior over the past year was called

    Sun protection over the past year score 1 (normalized weight vector c1 shownin table 1). For example, the value c1 1 = 0:24 was obtained by dividingthe original weight 1.00 (shown in gure 4) by the sum of all the c1 weights(4:22 = 1:00 + :84 + : : : + :46). The lifetime sun exposure behavior score wascalled Lifetime sun exposure score 1 (normalized weight vector w1 shown intable 2).

    To study the possible eect of gender on sun protection behavior, an analysisof covariance was then conducted relating the Sun protection over the past yearscore 1 to the Lifetime sun exposure score 1, Gender and the interactionterm Gender*Lifetime sun exposure score 1. The results of this analysis aregiven in table 3.

    The LV Sun protection over the past year score 1 is signicantly relatedto the Lifetime sun exposure score 1 (t-test = 9.61, p

  • 8/6/2019 Pls & Marketing 10 Juillet 3

    10/16

    (t-test = 8.15, p

  • 8/6/2019 Pls & Marketing 10 Juillet 3

    11/16

  • 8/6/2019 Pls & Marketing 10 Juillet 3

    12/16

    and the interaction term Age50*Lifetime sun exposure score 2. The resultsare given in table 7. The LV Sun protection over the past year score 2 is

    signicantly related to Lifetime sun exposure score 2 (t-test = 15.3, p

  • 8/6/2019 Pls & Marketing 10 Juillet 3

    13/16

    The sun protection behavior of men less than 50 tends to increase more rapidlywith the level of lifetime sun exposure than the one of men over or equal to 50.

    Figure 6: Two block regression model for relating sun protection behaviorover the past year to the lifetime sun exposure behaviors for men with an age

    eect acting on the structural model and not on the measurement model

    These results are partially conrmed by the bootstrap analysis of model (8)given in table 10.

    13

  • 8/6/2019 Pls & Marketing 10 Juillet 3

    14/16

    The 95% CI for the regression coecient 3 is [-.104, .075]. Therefore thereis no signicant Age50 eect. The 95% CI for the regression coecients 1and 2 do overlap. We may conclude that 1 = 2 and, therefore, that thereis no signicant interaction eect Age50*Sun Exposure on Sun Protectionfor men. But this procedure does not control the risk .

    On the other hand, the previous procedure tends to give too small p-valuesbecause the LV estimates have been constructed to optimize the relationship.A bootstrap procedure on the rst approach will probably give more reliable

    results.

    Discussion

    A major issue is the stability of the scores.

    The lifetime sun exposure weights (table 6), yield to scores highly correlatedfor women and men:

    For women: Cor(Lifetime sun exposure score 1, Lifetime sun exposurescore 2) = 0.99,

    For men: Cor(Lifetime sun exposure score 1,Lifetime sun exposure score3) = 0.99.

    The weights obtained for the sun protection over the past year scores aresummarized in table 5. The correlations between the scores are all above 0.99:

    For women: Cor(Sun protection over the past year score 1,Sun protectionover the past year score 2) = 0.99,

    For men: Cor(Sun protection over the past year score 1,Sun protectionover the past year score 3) = 0.99.

    3 Conclusion

    A software like AMOS is oriented toward the estimation of the path coecientsof a structural equation model, when a software like PLS-Graph is orientedtoward the production of latent variables or scores. It is possible to use theresults of AMOS to construct scores using the PLS methodology. In this paper,we propose a way to take into account interactions in the structural equations,independently from the measurement model. This procedure follows four parts:

    (1) Use of AMOS to compute the weights of the manifest variables subject tothe constraint that the weights related to the same manifest variable are equalin the various groups.

    14

  • 8/6/2019 Pls & Marketing 10 Juillet 3

    15/16

  • 8/6/2019 Pls & Marketing 10 Juillet 3

    16/16

    [6] Hercberg, S., Galan, P., Preziosi, P., Roussel, A.-M., Arnaud, J., Richard,M.-J., Malvy, D., Paul-Dauphin, A., Brianon, S., Favier, A. (1998), Back-

    ground and rationale behind the SU.VI.MAX. study, a prevention trialusing nutritional doses of a combination of antioxidant vitamins and min-erals to reduce cardiovascular diseases and cancers. International Journalfor Vitamin and Nutrition Research, 68(1), 3-20.

    [7] Hercberg, S., Preziosi, P., Brianon, S., Galan, P., Triol, I., Malvy, D.,Roussel, A.-M., Favier, A. (1998), A primary prevention trial using nutri-tional doses of antioxidant vitamins and minerals in cardio-vascular diseasesand cancers in a general population: The SU.VI.MAX. study - Design,methods and participants characteristic. Controlled Clinical Trials, 19(4),336-351.

    [8] Hercberg, S., Galan, P., Preziosi, P., Bertrais, S., Mennen, L., Malvy, D.,

    Roussel, A.-M., Favier, A., Brianon, S. (2004), The SU.VI.MAX study:a randomized, placebo-controlled trial of the health eects of antioxidantvitamins and minerals. Archives of Internal Medicine 164(21), 2325-2342.

    [9] Ping, R.A. (1995), A Parsimonious Estimating Technique for Interactionand Quadratic Latent Variables. Journal of Marketing Research, 32 (Au-gust), 336-347.

    [10] SAS Institute Inc. (1999), SAS/STAT R Users Guide, Version 8. Cary, NC:SAS Institute Inc.

    [11] Tenenhaus, M., Esposito Vinzi V., Chatelin, Y. M., Lauro, C. (2005), PLSpath modeling, Computational Statistics and Data Analysis, 2005, 48(1),159-205.

    16