Complex sampling in latent variable models


Page 1: Complex sampling in latent variable models

Complex surveys Latent variable models (LVM) Estimation of LVM under complex sampling Effect on LVM Conclusion

Complex sampling in latent variable models

Daniel Oberski

Department of methodology and statistics

Complex sampling in latent variable models Daniel Oberski

Page 2: Complex sampling in latent variable models

• When doing latent class analysis, factor analysis, IRT, or structural equation modeling, should you use sampling weights, stratification, and clustering variables?

• What is complex about surveys?
• What is "pseudo" about pseudo-maximum likelihood?
• What are design effects and what makes them so deft?

Page 3: Complex sampling in latent variable models

Outline

1. Complex surveys
2. Latent variable models (LVM)
3. Estimation of LVM under complex sampling
4. Effect on LVM
5. Conclusion

Page 4: Complex sampling in latent variable models

Does it make a difference?

Page 5: Complex sampling in latent variable models

[Figure: two panels, "Unweighted regression" and "Weighted regression".]

Source: 1988 National Maternal and Infant Health Survey (Korn and Graubard, 1995).

Page 7: Complex sampling in latent variable models

Latent class analysis of eating vegetables

Unweighted LCA

                 Low    High
Latent class     33%    77%
Recall 1 high    60%    80%
Recall 2 high    51%    82%
Recall 3 high    40%    81%
Recall 4 high    46%    79%

LCA using weights

                 Low    High
Latent class     18%    82%
Recall 1 high    46%    78%
Recall 2 high    39%    76%
Recall 3 high    28%    77%
Recall 4 high    39%    73%

Source: The Continuing Survey of Food Intakes by Individuals (Patterson et al., 2002).

Page 9: Complex sampling in latent variable models

Sample surveys, "linear estimators"

Page 10: Complex sampling in latent variable models

Sample surveys

Purposes:
• Descriptive;
• Analytic.

Assessment of Health Status and Social Determinants of Health (Padgol village, Gujarat, India). Source: Boston U. India Research and Outreach Initiative.

Page 11: Complex sampling in latent variable models

Sample surveys

Idea of a sample survey: we can generalize from a sample to a population if the sample is "like" the population (the "representative method").

Page 12: Complex sampling in latent variable models

Sample of people "like" the population?

• Neyman (1934) figured this would be true on average if you draw a random sample;
• This is the theory we still use today.

"Linear estimator": on average over random samples,

\[ n^{-1} \sum_{i \in \text{sample}} y_i = N^{-1} \sum_{i \in \text{population}} y_i, \]

and generally

\[ m_n \xrightarrow{d} N[\mu, \operatorname{var}(m_n)]. \]

"Design-consistent"
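The "true on average" claim can be checked exactly on a toy case. The sketch below is illustrative only: the five population values and the sample size are made up, and it assumes simple random sampling without replacement, so that every subset of size n is equally likely.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite population of N = 5 values (made up for illustration).
population = [3, 7, 8, 12, 20]
N = len(population)
n = 2  # sample size

# Population mean: N^{-1} times the sum over the population.
pop_mean = Fraction(sum(population), N)

# Under simple random sampling every subset of size n is equally likely,
# so the expectation is a plain average over all possible samples.
samples = list(combinations(population, n))
sample_means = [Fraction(sum(s), n) for s in samples]
expected_sample_mean = sum(sample_means) / len(sample_means)

print(pop_mean, expected_sample_mean)
```

Any single sample mean can be far from the population mean; only the average over all possible samples matches it.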

Page 16: Complex sampling in latent variable models

"Linear estimator"

• Most of the time when people talk about "linear estimators", they are thinking about means and totals.
• But a proportion is a linear estimator too;
• for example, the proportions observed for response patterns:

Pattern  Prop.    Pattern  Prop.
1111     0.226    0111     0.090
1110     0.087    0110     0.047
1101     0.092    0101     0.046
1100     0.049    0100     0.030
1011     0.085    0011     0.045
1010     0.048    0010     0.028
1001     0.049    0001     0.029
1000     0.029    0000     0.022

LCA estimates:

      Latent class
      1       2
y1    0.77    0.56
y2    0.78    0.55
y3    0.76    0.55
y4    0.78    0.54

• Even the (co)variance is a linear estimator, if you redefine $d := (y - E(Y))(y - E(Y))^T$: then $\operatorname{var}(y) = (n - 1)^{-1} \sum d$.
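To make the "proportions and covariances are linear estimators" point concrete, here is a minimal sketch. The six response rows are invented for illustration, and the sample mean stands in for E(Y) when forming d, which is an assumption of this sketch rather than something stated on the slide.

```python
# Hypothetical responses of 6 people to 2 binary items (made up for illustration).
data = [(1, 1), (1, 0), (0, 1), (1, 1), (0, 0), (1, 1)]
n = len(data)


def mean(values):
    """Plain arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Proportion of the response pattern (1, 1): the mean of a 0/1 indicator.
prop_11 = mean(1 if row == (1, 1) else 0 for row in data)

# Covariance of the two items: form d_i as the product of deviations from
# the means (sample means used in place of E(Y)), then sum and divide by n - 1.
m1 = mean(row[0] for row in data)
m2 = mean(row[1] for row in data)
d = [(row[0] - m1) * (row[1] - m2) for row in data]
cov_12 = sum(d) / (n - 1)

print(prop_11, cov_12)
```

Both quantities are sums over sampled units, which is why the survey machinery for means and totals carries over to them.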

Page 24: Complex sampling in latent variable models

Complications → "complex surveys":
• Clustering
• Stratification
• Selection with unequal probabilities $\pi_i$

Equivalent: not independently and identically distributed (iid)

Page 25: Complex sampling in latent variable models

Clustering

Page 26: Complex sampling in latent variable models

Simple random sampling: a lot of driving

A simple random sample of voter locations in the US. Source: Lumley (2010).

Page 27: Complex sampling in latent variable models

Source: Heeringa et al. (2010)

Page 28: Complex sampling in latent variable models

Sample clustering for several reasons:

• Geographic clustering of elements for household surveys reduces interviewing costs by amortizing travel and related expenditures over a group of observations. E.g.: NCS-R, National Health and Nutrition Examination Survey (NHANES), Health and Retirement Study (HRS)

• Sample elements may not be individually identified on the available sampling frames but can be linked to aggregate cluster units (e.g., voters at precinct polling stations, students in colleges and universities). The available sampling frame often identifies only the cluster groupings.

• One or more stages of the sample are deliberately clustered to enable the estimation of multilevel models and components of variance in variables of interest (e.g., students in classes, classes within schools).

(Heeringa et al., 2010, p. 28)

Page 31: Complex sampling in latent variable models

Stratification

Page 32: Complex sampling in latent variable models

Sample stratified by region

Page 33: Complex sampling in latent variable models

Stratified sampling serves several purposes:
• Relative to an SRS of equal size, smaller standard errors
• Disproportionately allocate the sample to subpopulations, that is, to oversample specific subpopulations to ensure sufficient sample sizes for analysis.

(Heeringa et al., 2010, p. 32)
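The first purpose (smaller standard errors than an SRS of equal size) can be illustrated with the textbook variance formulas for a sample mean. This is a sketch under stated assumptions: the two strata and their values are made up, sampling is without replacement, and the stratified design uses proportional allocation.

```python
from statistics import variance

# Hypothetical population split into two strata (values made up for illustration).
strata = {
    "north": [10, 11, 12, 13],
    "south": [30, 31, 32, 33],
}
population = [y for values in strata.values() for y in values]
N = len(population)
n = 4  # total sample size

# Simple random sampling without replacement: var = (1 - n/N) * S^2 / n.
var_srs = (1 - n / N) * variance(population) / n

# Stratified sampling with proportional allocation (n_h = n * N_h / N):
# var = sum over strata of W_h^2 * (1 - n_h/N_h) * S_h^2 / n_h, with W_h = N_h / N.
var_strat = 0.0
for values in strata.values():
    N_h = len(values)
    W_h = N_h / N
    n_h = n * N_h / N
    var_strat += W_h**2 * (1 - n_h / N_h) * variance(values) / n_h

print(var_srs, var_strat)
```

The gain is large here because the strata are internally homogeneous and differ strongly from each other; with strata that resemble one another the two variances would be much closer.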

Page 34: Complex sampling in latent variable models

Unequal probabilities of selection

Page 36: Complex sampling in latent variable models

Common reasons for varying probabilities of case selection in sample surveys include (Heeringa et al., 2010, pp. 38-43):

• Disproportionate sampling within strata to
  • achieve an optimally allocated sample
  • deliberately increase precision for subpopulations
• Differentially sample subpopulations, e.g. NHANES oversampling of people with disabilities.
• Subsampling of observational units within sample clusters, e.g. selecting a single random respondent from the eligible members of sample households.
• Sampling probability that can be obtained only in the process of the survey data collection, e.g. in a random digit dialing (RDD) telephone survey, number of distinct landline telephone numbers
• Nonresponse

Page 43: Complex sampling in latent variable models

Linear estimators in complex samples

Page 44: Complex sampling in latent variable models

Problems:
• Bias: If some (types of) people have a differing chance $\pi_i$ of being in the sample, usual sample statistics will not (on average) equal the population quantities anymore.
• Variance: Affected by clustering/stratification.

If $\hat{\mu}_n := n^{-1} \sum_{i \in \text{sample}} \frac{1}{\pi_i} y_i$, notice that, on average,

\[ n^{-1} \sum_{i \in \text{sample}} \frac{1}{\pi_i} y_i = N^{-1} \sum_{i \in \text{population}} \frac{\pi_i}{\pi_i} y_i = N^{-1} \sum_{i \in \text{population}} y_i. \]

Solutions:
• The weighted estimator $\hat{\mu}_n$ is unbiased (Horvitz and Thompson, 1952);
• Can obtain the variance of the weighted estimate, $\operatorname{var}(\hat{\mu}_n)$, under clustering and stratification.
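The bias problem and the weighting fix can be verified exactly on a toy population. The following is a sketch under stated assumptions, not the slide's own computation: the values and inclusion probabilities are made up, units are assumed to enter the sample independently (Poisson sampling), and the weighted estimate divides by the population size N, assumed known, rather than by n.

```python
from itertools import product

# Hypothetical population of N = 4 values with unequal inclusion
# probabilities pi_i (made up); units with large y are oversampled.
y = [1.0, 2.0, 10.0, 20.0]
pi = [0.2, 0.2, 0.8, 0.8]
N = len(y)
pop_mean = sum(y) / N

expected_weighted = 0.0  # expectation of the inverse-probability weighted mean
expected_naive = 0.0     # expectation of the unweighted mean (non-empty samples)
prob_nonempty = 0.0

# Enumerate all 2^N possible samples together with their probabilities.
for included in product([0, 1], repeat=N):
    prob = 1.0
    for flag, p in zip(included, pi):
        prob *= p if flag else 1 - p
    sample = [i for i in range(N) if included[i]]
    # Each sampled unit is weighted by 1 / pi_i.
    expected_weighted += prob * sum(y[i] / pi[i] for i in sample) / N
    if sample:
        expected_naive += prob * sum(y[i] for i in sample) / len(sample)
        prob_nonempty += prob

# Condition the unweighted mean on having drawn at least one unit.
expected_naive /= prob_nonempty

print(pop_mean, expected_weighted, expected_naive)
```

The weighted estimate matches the population mean in expectation, while the unweighted mean is pulled toward the oversampled high values.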

Page 47: Complex sampling in latent variable models

Latent variable modeling

Page 48: Complex sampling in latent variable models

Latent variable modeling (LVM)

• (Confirmatory) factor analysis (CFA);
• Structural Equation Modeling (SEM);
• Latent Class Analysis/Modeling (LCA/LCM);
• Latent trait modeling;
• Item Response Theory (IRT) models;
• Mixture models;
• Random effects/hierarchical/multilevel models;
• "Anchoring vignettes" models;
• ... etc.

Page 49: Complex sampling in latent variable models

• Proportions can be turned into an LC or IRT analysis;
• Covariances can be turned into a SEM analysis.

Definition. Latent variable model estimation: a way of turning observed covariances/proportions ("moments") into LVM parameter estimates.

\[ \text{LVM} : m_n \to \hat{\theta}_n \]

Page 51: Complex sampling in latent variable models

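\[ \text{LVM} : m_n \to \hat{\theta}_n \]

Example: confirmatory factor analysis (CFA) with 1 factor, 3 indicators:

\[ \hat{\lambda}_{11} = \sqrt{\operatorname{cor}(y_1, y_2)\operatorname{cor}(y_1, y_3)/\operatorname{cor}(y_2, y_3)} \]

\[ \hat{\lambda}_{21} = \sqrt{\operatorname{cor}(y_1, y_2)\operatorname{cor}(y_2, y_3)/\operatorname{cor}(y_1, y_3)} \]

\[ \hat{\lambda}_{31} = \sqrt{\operatorname{cor}(y_1, y_3)\operatorname{cor}(y_2, y_3)/\operatorname{cor}(y_1, y_2)} \]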
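The three formulas translate directly into a small function that maps moments to parameter estimates. This is a minimal sketch: the function name `one_factor_loadings` and the input correlations are hypothetical, and it assumes all three correlations are positive so that the square roots are defined.

```python
from math import sqrt


def one_factor_loadings(r12, r13, r23):
    """Standardized loadings of a 1-factor, 3-indicator CFA from correlations.

    Assumes all three correlations are positive.
    """
    lam1 = sqrt(r12 * r13 / r23)
    lam2 = sqrt(r12 * r23 / r13)
    lam3 = sqrt(r13 * r23 / r12)
    return lam1, lam2, lam3


# Hypothetical input: all three correlations equal to 0.5.
loadings = one_factor_loadings(0.5, 0.5, 0.5)

# Under the model, cor(y_i, y_j) = lambda_i * lambda_j, so the product of
# two estimated loadings should reproduce the input correlation.
implied_r12 = loadings[0] * loadings[1]

print(loadings, implied_r12)
```

Because the loadings are a function of the correlations alone, any bias or extra variance in the estimated correlations carries through to the loadings.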

Page 52: Complex sampling in latent variable models

Inference in latent variable models under simple random sampling

Page 59: Complex sampling in latent variable models

Data generating process:
Model/superpopulation → Finite population → Sample

Inference:
Model ← Finite population ← Sample

(Fuller, 2009).


Superpopulation → Finite population of 100 subjects

(Superpopulation loadings: 0.707)

[Figure: scatterplot matrix of y1, y2, y3 in the finite population. Corr(y1, y2) = 0.442; Corr(y1, y3) = 0.475; Corr(y2, y3) = 0.321.]

Finite-population loadings: y1: 0.810, y2: 0.546, y3: 0.587


Simple random sample (SRS) of 20 from finite population

[Figure: scatterplot matrix of y1, y2, y3, points grouped by the indicator simple.random (1, 2). Correlations, overall (group 1, group 2): Cor(y1, y2) = 0.442 (0.425, 0.568); Cor(y1, y3) = 0.475 (0.361, 0.668); Cor(y2, y3) = 0.321 (0.258, 0.543).]

(Superpopulation loadings: 0.707)

SRS factor loading estimates: y1: 0.836, y2: 0.679, y3: 0.800


Superpopulation inference from SRS to superpopulation

Superpopulation ← ← Sample

[Figure: scatterplot matrix of y1, y2, y3 for the simple random sample, repeated from the previous slide.]

Superpopulation loadings: λ11: 0.707, λ21: 0.707, λ31: 0.707

Avg. (sd) loading over 10,000 samples:
λ̂11: 0.707 (0.125), λ̂21: 0.722 (0.127), λ̂31: 0.711 (0.122)


Complex sampling affects latent variable modeling


LVM : $m_n \to \hat{\theta}_n$

This means that:
• bias in covariances/proportions (moments) leads to bias in LVM parameter estimates;
• any across-sample variation in latent variable parameter estimates is entirely due to variation in the sample moments used to estimate them.
• With more observed variables (moments), use Maximum Likelihood (ML) to get estimates, but above is still true.
• MLE: $\hat{\theta}_n = \arg\max_{\theta} L(\theta; \hat{\mu}_n)$


LVM: $m_n \to \hat{\theta}_n$, so bias in $m_n$ means bias in $\hat{\theta}_n$

• One solution: modeling correctly all aspects of the sampling design.
• Another solution: replacing the observed moments with design-consistent moments will provide design-consistent estimates = ``pseudo-maximum likelihood'' (PML).

$\hat{\mu}_n \to \hat{\theta}_n$

• (A third solution: weighted least squares - less than satisfactory results)

(Skinner et al., 1989, chapter 3)
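One common way to obtain design-consistent moments is to weight each unit by the inverse of its inclusion probability. The Python sketch below shows weighted correlations computed that way; it is an illustration rather than the method of any particular package. The function name `weighted_corr` and the toy data are hypothetical.

```python
import numpy as np

def weighted_corr(y, w):
    """Weighted correlation matrix; w are sampling weights (1 / inclusion prob.)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                             # normalise weights to sum to 1
    centred = y - w @ y                         # subtract weighted means
    cov = (centred * w[:, None]).T @ centred    # weighted covariance matrix
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

# Hypothetical toy data: 5 sampled units, 3 indicators, unequal weights.
y = np.array([[0.1, 0.3, -0.2],
              [1.2, 0.8, 0.9],
              [-0.7, -0.1, -0.5],
              [0.4, 0.9, 0.2],
              [-1.1, -0.6, -0.9]])
w = np.array([1.0, 2.0, 1.0, 4.0, 2.0])
print(weighted_corr(y, w))
```

With equal weights the function reduces to the ordinary correlation matrix, which is the moment input used under simple random sampling.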


Variance of PMLE is obtained by sandwich (linearization) estimate. In turn depends on variance of design-consistent moment estimates (the ``meat'').

$\mathrm{var}(\hat{\theta}_n) = (\Delta^T V \Delta)^{-1} \Delta^T V \cdot \mathrm{var}(\hat{\mu}_n) \cdot V \Delta (\Delta^T V \Delta)^{-1}$

V: Depends on distributional assumptions (=ML)
∆: Depends on the specific model (=LVM)
var(µ̂n): Depends on variance of means/prop's/covar's under complex sampling
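The sandwich formula is a product of three matrices and can be sketched in a few lines of Python. The function name `sandwich_variance` and all numeric inputs below are hypothetical; V is assumed symmetric, so the right-hand factor is the transpose of the left-hand one.

```python
import numpy as np

def sandwich_variance(delta, v, var_mu):
    """(D'VD)^-1 D'V  *  var(mu_hat)  *  V D (D'VD)^-1, with V symmetric."""
    # Left-hand factor: solve (D'VD) X = D'V instead of inverting explicitly.
    left = np.linalg.solve(delta.T @ v @ delta, delta.T @ v)
    return left @ var_mu @ left.T

# Hypothetical inputs: 3 moments, 2 model parameters.
delta = np.array([[1.0, 0.0],
                  [0.5, 1.0],
                  [0.0, 2.0]])
var_mu = np.array([[0.04, 0.01, 0.00],
                   [0.01, 0.09, 0.02],
                   [0.00, 0.02, 0.16]])
v = np.linalg.inv(var_mu)  # special case: V matches the true moment variance
var_theta = sandwich_variance(delta, v, var_mu)
print(var_theta)
```

In the special case used here, where V is the inverse of var(µ̂n), the sandwich collapses to (ΔᵀVΔ)⁻¹. Under complex sampling var(µ̂n) generally differs from what V assumes, which is why the full three-part product is needed.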


pseudo-

1. supposed or purporting to be but not really so; false; not genuine: pseudonym | pseudoscience.
2. resembling or imitating: pseudohallucination | pseudo-French.

ORIGIN from Greek pseudēs ‘false,’ pseudos ‘falsehood.’

Source: New Oxford American Dictionary


Pseudo-ML

Why ML?
• Consistently estimate parameters aggregated over clusters and strata;
• Estimates ``MLE that would be obtained by fitting the model to the population data''.

Why pseudo?
• Not exactly equal to the MLE obtained by correctly modeling all aspects of the sampling design;
• Not asymptotically optimal.

Why PML?
• Aggregate parameters may be of interest;
• No assumptions/modeling on design necessary.


The effect of clustering


Cluster sample of 20 from finite population

[Figure: scatterplot matrix of y1, y2, y3, points grouped by the indicator sample.cluster (1, 2). Correlations, overall (group 1, group 2): Cor(y1, y2) = 0.442 (0.412, 0.49); Cor(y1, y3) = 0.475 (0.504, 0.455); Cor(y2, y3) = 0.321 (0.352, 0.205).]

(Superpopulation loadings: 0.707)

Cluster sample loading estimates: y1: 0.997, y2: 0.491, y3: 0.456


Superpopulation inference from cluster sample to superpopulation

Superpopulation ← ← Sample

[Figure: scatterplot matrix of y1, y2, y3.]

Superpopulation loadings: λ11: 0.707, λ21: 0.707, λ31: 0.707

Avg. (sd) loading over 10,000 cluster samples:
λ̂11: 0.665 (0.157), λ̂21: 0.699 (0.140), λ̂31: 0.703 (0.145)


The effect of cluster sampling on factor analysis

                 λ̂11              λ̂21              λ̂31
                 avg    (sd)      avg    (sd)      avg    (sd)
Population:      0.707            0.707            0.707
SRS:             0.707  (0.125)   0.722  (0.127)   0.711  (0.122)
Cluster smp:     0.665  (0.157)   0.699  (0.140)   0.703  (0.145)
deft             1.26             1.10             1.19
deff             1.58             1.22             1.41
% Var. incr.     58%              22%              41%
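The deft, deff, and variance-increase rows can be reproduced from the standard deviations in the table. This Python sketch assumes deft is the ratio of the cluster-sample sd to the SRS sd and deff is its square; the variable and function names are hypothetical.

```python
# Standard deviations of the loading estimates, copied from the table.
sd_srs = {"lambda11": 0.125, "lambda21": 0.127, "lambda31": 0.122}
sd_cluster = {"lambda11": 0.157, "lambda21": 0.140, "lambda31": 0.145}

def design_effects(sd_design, sd_reference):
    """Return (deft, deff): ratio of standard errors and ratio of variances."""
    deft = sd_design / sd_reference
    return deft, deft ** 2

results = {name: design_effects(sd_cluster[name], sd_srs[name]) for name in sd_srs}
for name, (deft, deff) in results.items():
    increase = 100 * (deff - 1)  # percentage increase in variance
    print(f"{name}: deft={deft:.2f} deff={deff:.2f} var. incr.={increase:.0f}%")
```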


Design effects' deftness

• ``Design effect'' or deff = var_clus(θ̂)/var_srs(θ̂) (Kish, 1965);
• deff is the variance under the complex design relative to the variance under a simple random sampling design;
• deft = √deff is the corresponding relative increase in standard errors;
• In practice deff/deft have to be estimated and we use the sandwich estimator of variance.

Useful for:
• Seeing to what extent it makes a difference to take complex sampling into account;
• Identifying parameters that are more or less affected;
• Sample size and power calculations.
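For sample size calculations, a design effect is often translated into an effective sample size, n / deff. The Python sketch below applies that standard conversion to the deff of 1.58 found for λ̂11 and the sample size of 20 used in these slides; the function names are hypothetical.

```python
import math

def effective_sample_size(n, deff):
    """Size of the simple random sample with the same variance as the complex sample."""
    return n / deff

def required_sample_size(n_srs, deff):
    """Complex-sample size needed to match the precision of an SRS of size n_srs."""
    return math.ceil(n_srs * deff)

n_eff = effective_sample_size(20, 1.58)
n_needed = required_sample_size(20, 1.58)
print(round(n_eff, 1), n_needed)
```

Because deff differs per parameter, the same design gives a different effective sample size for each loading.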


The effect of unequal probabilities of selection


Sampling with probability correlated with factor x


Sampling with probability correlated with x²


                 λ̂11              λ̂21              λ̂31
                 avg    (sd)      avg    (sd)      avg    (sd)
Population:      0.707            0.707            0.707
SRS:             0.707  (0.125)   0.722  (0.127)   0.711  (0.122)

Selection probability proportional to latent factor x:
Unweighted:      0.679  (0.137)   0.683  (0.138)   0.692  (0.137)
Bias / deft:     -4%    1.13      -3%    1.12      -2%    1.12
Weighted:        0.687  (0.143)   0.698  (0.143)   0.703  (0.143)
Bias / deft:     -3%    1.18      -1%    1.16      -1%    1.17

Selection probability proportional to x²:
Unweighted:      0.845  (0.060)   0.842  (0.061)   0.843  (0.061)
Bias / deft:     20%    0.495     19%    0.492     19%    0.497
Weighted:        0.750  (0.139)   0.739  (0.141)   0.737  (0.137)
Bias / deft:     6%     1.149     5%     1.145     4%     1.123
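The bias percentages in the table are consistent with relative bias measured against the population loading of 0.707. The Python sketch below recomputes them from the table's average loadings; the dictionary keys and the function name `relative_bias_pct` are hypothetical labels.

```python
POPULATION_LOADING = 0.707

# Average loading estimates (lambda11, lambda21, lambda31) copied from the table.
avg_loadings = {
    "x, unweighted": [0.679, 0.683, 0.692],
    "x, weighted": [0.687, 0.698, 0.703],
    "x squared, unweighted": [0.845, 0.842, 0.843],
    "x squared, weighted": [0.750, 0.739, 0.737],
}

def relative_bias_pct(estimate, truth=POPULATION_LOADING):
    """Relative bias of an average estimate, in percent of the true value."""
    return 100 * (estimate - truth) / truth

bias = {design: [round(relative_bias_pct(a)) for a in avgs]
        for design, avgs in avg_loadings.items()}
for design, values in bias.items():
    print(design, values)
```

The pattern to read off is that weighting removes most, though not all, of the bias in the second selection scheme, while the standard deviations in the table show the price paid in variance.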


When does weighting make a difference for point estimates of latent variable models?

• (Usually) when weights represent omitted variable(s) that interact with observed or latent variables;
• (Sometimes, e.g. IRT, LCA) when selection is correlated with a dependent variable.
• When the model is strongly misspecified:

[Figure: y1 plotted against x (x-axis from 0.5 to 2.0). True curve (black line), overall linear reg. line (green), and reg. from unequal selection/weights.]
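The misspecification case can be illustrated with a small Python sketch: a straight line is fitted to a curved relation, once with equal weights and once with unequal weights. The curve (log x), the weights (x²), and the function name `wls_line` are assumptions made for illustration, not the setup behind the figure.

```python
import numpy as np

def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns [a, b]."""
    design = np.column_stack([np.ones_like(x), x])
    root_w = np.sqrt(w)  # scaling rows by sqrt(w) weights squared residuals by w
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return coef

x = np.linspace(0.5, 2.0, 31)
y = np.log(x)                  # hypothetical curved "true" relation
equal = np.ones_like(x)
unequal = x ** 2               # hypothetical weights favouring large x

slope_equal = wls_line(x, y, equal)[1]
slope_unequal = wls_line(x, y, unequal)[1]
print(slope_equal, slope_unequal)
```

The two slopes differ because the linear model is wrong for the curve. When the same function is applied to an exactly linear relation, the weights leave the slope unchanged.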


Should you weight?

1. Purpose of the analysis: analytical versus descriptive;
2. Anticipated bias from an unweighted analysis;
3. If unweighted analysis is unbiased, relative magnitude of inefficiency resulting from a weighted analysis;
4. Whether variables are available and known to model the sample design instead of weighting the analysis.

(Patterson et al., 2002, p. 727)


Conclusions
• Surveys are not usually simple random samples (or iid);
• Sample design may bias the results of latent variable modeling (confidence intervals, significance tests, fit measures, parameter estimates);
• Pseudo-maximum likelihood can take the design into account without additional assumptions;
• Implemented in software. SEM: lavaan.survey in R
• Nonparametric correction for the design;
• ``Aggregate modeling'';
• Payment is in variance (efficiency);
• Alternative is modeling the effects of strata, clusters, covariates behind; ``disaggregate modeling''.


Thank you for your attention!

Daniel Oberski

http://daob.org/


References

Fuller, W. A. (2009). Sampling statistics. Wiley, New York.
Heeringa, S., West, B., and Berglund, P. (2010). Applied survey data analysis.
Horvitz, D. and Thompson, D. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663–685.
Kish, L. (1965). Survey sampling. Wiley, New York.
Korn, E. and Graubard, B. (1995). Examples of differing weighted and unweighted estimates from a sample survey. The American Statistician, 49(3):291–295.
Lumley, T. (2010). Complex surveys: a guide to analysis using R. Wiley.
Neyman, J. (1934). On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97(4):558–625.
Patterson, B., Dayton, C., and Graubard, B. (2002). Latent class analysis of complex sample survey data. Journal of the American Statistical Association, 97(459):721–741.
Skinner, C., Holt, D., and Smith, T. (1989). Analysis of complex surveys. John Wiley & Sons.
