1 Class 8 Measurement Issues in Diverse Populations Including Health Disparities Research November...
-
Upload
amice-barnett -
Category
Documents
-
view
219 -
download
2
Transcript of 1 Class 8 Measurement Issues in Diverse Populations Including Health Disparities Research November...
1
Class 8
Measurement Issues in Diverse Populations Including Health Disparities Research
November 17, 2005
Anita L. Stewart Institute for Health & Aging
University of California, San Francisco
2
Background
U.S. population becoming more diverse More minority groups are being included
in research due to:– NIH mandate
– Recent health disparities initiatives
3
Types of Diverse Groups
Health disparities research focuses on differences in health between the following groups:– Minority vs. non-minority– Low income vs. others– Low education vs. others– Limited English skills vs. others– …. and others
4
Health Disparities Research
Increasing research to:– Describe health disparities
» Differences in health across various diverse groups
– Identify determinants of health disparities» Individual level
» Environmental level
– Intervene to reduce health disparities
5
The Measurement Problem
Measurement goal - identify measures that can be used across all groups, and– are sensitive to diversity– have minimal bias between groups
Most self-reported measures were developed and tested in mainstream, well-educated groups– little research on measurement characteristics
in diverse groups
6
Issues Concerning Group Comparisons Observed mean differences in a measure can
be due to – culturally- or group-mediated differences in true
score (true differences) -- OR -- – bias - systematic differences between group
observed scores not attributable to true scores
7
Bias - A Special Concern
Measurement bias in any one group may make group comparisons invalid
Bias can be due to group differences in:– the meaning of concepts or items – the extent to which measures represent a concept – cognitive processes of responding– use of response scales– appropriateness of data collection methods
8
Effects of Bias on Depression: Chinese and White Respondents In Chinese respondents - 3 sources of bias that lower
observed score:– tendency to not express negative feelings– exacerbated by face-to-face interview– meaning of word “depression” is more severe than for
Whites – less likely to endorse it Comparing groups – assume true level of depression
is the same in both groups –– Observed scores would be lower in Chinese group– But lower level is due to these biases
9
Typical Sequence of Developing New Self-Report Measures
Develop concept
Create item pool
Pretest/revise
Field survey
Psychometric analyses
Final measures
10
Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
Develop concept
Create item pool
Pretest/revise
Field survey
Psychometric analyses
Final measures
Obtain perspectives of diverse groups
11
Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
Develop concept
Create item pool
Pretest/revise
Field survey
Psychometric analyses
Final measures
Obtain perspectives of diverse groups
.. to reflect these perspectives
12
Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
Develop concept
Create item pool
Pretest/revise
Field survey
Psychometric analyses
Final measures
Obtain perspectives of diverse groups
.. to reflect these perspectives
.. in all diverse groups
13
Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
Develop concept
Create item pool
Pretest/revise
Field survey
Psychometric analyses
Final measures
Obtain perspectives of diverse groups
.. to reflect these perspectives
.. in all diverse groups
.. in all diverse groups
14
Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
Develop concept
Create item pool
Pretest/revise
Field survey
Psychometric analyses
Final measures
Obtain perspectives of diverse groups
.. to reflect these perspectives
.. in all diverse groups
.. in all diverse groupsMeasurement studies across groups
15
Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
Develop concept
Create item pool
Pretest/revise
Field survey
Psychometric analyses
Final measures
Obtain perspectives of diverse groups
.. to reflect these perspectives
.. in all diverse groups
.. in all diverse groups
If results are non-equivalent
16
Measurement Adequacy vs. Measurement Equivalence
Making group comparisons requires conceptual and psychometric adequacy and equivalence
Adequacy - within a group– concepts are appropriate– psychometric properties meet minimal criteria
Equivalence - between groups– conceptual and psychometric properties are
comparable
17
Conceptual and Psychometric Adequacy and Equivalence
Conceptual
Psychometric
Adequacyin 1 Group
EquivalenceAcross Groups
Concept equivalentacross groups
Psychometric propertiesmeet minimal standards
within one group
Psychometric propertiesinvariant (equivalent)
across groups
Concept meaningfulwithin one group
18
Left Side of Matrix: Issues in a Single Group
Conceptual
Psychometric
Adequacyin 1 Group
EquivalenceAcross Groups
Concept equivalentacross groups
Psychometric propertiesmeet minimal standards
within one group
Psychometric propertiesinvariant (equivalent)
across groups
Concept meaningfulwithin one group
19
Ride Side of Matrix: Issues in More Than One Group
Conceptual
Psychometric
Adequacyin 1 Group
EquivalenceAcross Groups
Concept equivalentacross groups
Psychometric propertiesmeet minimal standards
within one group
Psychometric propertiesinvariant (equivalent)
across groups
Concept meaningfulwithin one group
20
Conceptual Adequacy in One Group
Conceptual
Psychometric
Adequacyin 1 Group
EquivalenceAcross Groups
Concept equivalentacross groups
Psychometric propertiesmeet minimal standards
within one group
Psychometric propertiesinvariant (equivalent)
across groups
Concept meaningfulwithin one group
21
Conceptual Adequacy in One Group Is concept relevant, meaningful, and acceptable
in that group? Traditional research
– Conceptual adequacy = simply defining a concept– Mainstream population “assumed”
Minority and cross cultural research– Mainstream concepts may be inadequate– Concept should correspond to how a particular
group thinks about it
22
(Old) Example of Inadequate Concept
Patient satisfaction typically conceptualized in mainstream populations in terms of, e.g.,– access, technical care, communication,
continuity, interpersonal style In minority and low income groups,
additional relevant domains include, e.g., – discrimination by health professionals
– sensitivity to language barriers
23
Psychometric Adequacy in One Group
Conceptual
Psychometric
Adequacyin 1 Group
EquivalenceAcross Groups
Concept equivalentacross groups
Psychometric propertiesmeet minimal standards
within one group
Psychometric propertiesinvariant (equivalent)
across groups
Concept meaningfulwithin one group
24
Psychometric Adequacy in any Group
Minimal standards:– Sufficient variability– Minimal missing data– Adequate reliability/reproducibility– Evidence of construct validity– Evidence of responsiveness to change
Basic classical test theory approach
25
Evidence of Psychometric Inadequacy of SF-36 Scale in Three Diverse Groups
SF-36 social functioning scale - internal consistency reliability < .70 in three different samples:– Chinese language, adults aged 55-96 years
– Japanese language, Japanese elders
– English, Pima Indians
Stewart AL & Nápoles-Springer A, 2000 (see readings)
26
Conceptual Equivalence Across Groups
Conceptual
Psychometric
Adequacyin 1 Group
EquivalenceAcross Groups
Concept equivalentacross groups
Psychometric propertiesmeet minimal standards
within one group
Psychometric propertiesinvariant (equivalent)
across groups
Concept meaningfulwithin one group
27
Conceptual Equivalence
Is the concept relevant, familiar, acceptable to all diverse groups being studied?
Is the concept defined the same way in all groups? – all relevant “domains” included (none missing)
– interpreted similarly Is the concept appropriate for all diverse
groups?
28
Example: Subjective Test of Conceptual Equivalence of Spanish FACT-G
Bilingual/bicultural expert panel reviewed all 28 items – One item had low cultural relevance to quality of life
– One concept was missing – spirituality
Developed new spirituality scale (FACIT-Sp) with input from cancer patients, psychotherapists, and religious experts– Sample item “I worry about dying”
Cella D et al. Med Care 1998: 36;1407
29
Examples of Nonequivalent Concept: Depression Mainstream concept expressed and reported via
affect, somatic symptoms, behavior, thought patterns
In Asian Americans– public expression of self-reflection is discouraged– saving face and self-sacrifice are powerful forces in
molding behavior and expression
30
Generic/Universal vs Group-Specific(Etic versus Emic)
Concepts unlikely to be defined exactly the same way across diverse ethnic groups
Generic/universal (etic)– features of a concept that are appropriate across
groups Group-Specific (emic)
– idiosyncratic portions of a concept
31
Etic versus Emic (cont.)
Goal in health disparities research– identify generic/universal portion of a concept (could
be entire concept) that can be applied across all groups
For within-group analyses or studies– the culture-specific portion is also relevant
32
Qualitative Approaches to Explore Conceptual Equivalence in Diverse Groups
Literature reviews– ethnographic and anthropological
In-depth interviews and focus groups – discuss concepts, obtain their views
Expert consultation from diverse groups– review concept definitions– rate relevance of items
33
Psychometric Equivalence
Conceptual
Psychometric
Adequacyin 1 Group
EquivalenceAcross Groups
Concept equivalentacross groups
Psychometric propertiesmeet minimal standards
within one group
Psychometric propertiesinvariant (equivalent)
across groups
Concept meaningfulwithin one group
34
Equivalence of Reliability?? No!
Difficult to compare reliability because it depends on the distribution of the construct in a sample– Thus lower reliability in one group may simply
reflect poorer variability More important is the adequacy of the
reliability in both groups– Reliability meets minimal criteria within each group
35
Equivalence of Criterion Validity
Determine if hypothesized patterns of associations with specified criteria are confirmed in both groups, e.g.– a measure predicts utilization in both groups
– a cutpoint on a screening measure has the same specificity and sensitivity in both groups
36
Equivalence of Construct Validity Are hypothesized patterns of associations
confirmed in both groups?– Example: Scores on the Spanish version of the
FACT had similar relationships with other health measures as scores on the English version
Primarily tested through subjectively examining pattern of correlations– Can test differences using confirmatory factor
analysis (e.g., through Structural Equation Modeling)
37
Equivalence of Factor Structure
Factor structure is similar in new group to structure in original groups in which measure was tested– In other words, the measurement model is the
same across groups Methods
– Specify the number of factors you are looking for
– Determine if the hypothesized model fits the data
38
Exploratory Factor Analysis (EFA)
Factor analysis methods that do not constrain the number of factors or the magnitude of the loadings
Identifies an underlying structure of a set of items with no particular hypotheses
Goal - identify as few explanatory variables (i.e., factors) as possible that account for covariation among the items
39
Confirmatory Factor Analysis (CFA)
Methods that specify a hypothesized structure a priori (before looking at the results)
Can test mean and covariance structures– to estimate bias
40
Equivalence of Factor Structure: Assuring Psychometric Invariance
Psychometric invariance (technical term for psychometric equivalence)
Invariance means that important properties of a theoretically-based factor structure (measurement model) do not differ or vary across groups (are invariant)– In other words, the measurement model is the same
across groups Empirical comparison of factor structure
41
Criteria for Psychometric Invariance: Non-technical Language
Across two or more groups, determine whether each criterion is true – a sequential process:
1. Same number of factors (dimensions) 2. Same items load on (correlate with) same factors 3. Each item has same factor loadings4. No bias on any item or scale across groups5. Same residuals on items 6. No item or scale bias AND same residuals
42
Dimensional Invariance: Same number of dimensions
Configural Invariance: Same items load on same dimensions
Metric Invariance, Factor Pattern Invariance:Items have same loadings on same dimensions
Strong Factorial Invariance,Scalar Invariance:
Observed scores are unbiased
Residual Invariance:Observed item and factor variances
can be compared across groups
Strict Factorial InvarianceBoth scalar invariance and residual invariance criteria are met
Criteria for Evaluating Invariance AcrossGroups: Technical Terms
43
Dimensional Invariance
Definition: Factor structure is the same, i.e., the same number of factors are observed in both groups
CES-D Example:– Four factors found in men and 3 factors in women
(n=1000), 18-92 years of age– Failed the dimensional invariance criterion
» a different number of factors was found in both groups
JM Golding et al., J Clin Psychol 1991:47;61-75
44
Example: Dimensional Invariance of CES-D in Hispanic EPESE Original 4 factors
– Somatic symptoms– Depressive affect– Interpersonal behavior– Positive affect
Hispanic EPESE - only 2 factors– Depression (included somatic symptoms,
depressive affect, and interpersonal behavior)– Well-being
Miller TQ et al., The factor structure of the CES-D in two surveys ofelderly Mexican Americans, J Gerontol: Soc Sci, 1997;520:S259-69.
45
Configural Invariance
Assumes: dimensional invariance is found– that there were the same number of factors
Definition: Item-factor patterns are the same, i.e., the same items load on the same factors in both groups
CES-D Example– 4 factors found in Anglos, Blacks, and Chicanos– Same items loaded on each factor in all groups
RE Roberts et al., Psychiatry Research, 1980;2:125-134
46
Metric Invariance or Factor Pattern Invariance
Assumes: dimensional and configural invariance are found
Definition: Item loadings are the same across groups, i.e., the correlation of each item with its factor is the same in both groups
47
Strong Factorial Invariance or Scalar Invariance
Assumes: dimensional, configural, and metric (factor pattern) invariance are found
Definition: Observed scores are unbiased, i.e., means can be compared across groups
Requires test of equivalence of mean scores across groups using confirmatory factor analysis
48
Residual Invariance
Assumes: dimensional, configural, and metric (factor pattern) invariance are found
Definition: Observed item and factor residual variances can be compared across groups, i.e., similar error associated with each item across groups
49
Strict Factorial Invariance
Assumes: dimensional, configural, metric (factor pattern), and strong factorial (scalar) invariance are found
50
Integrating Qualitative and Quantitative Methods in Assessing Cultural Equivalence
Optimal approach - use qualitative and quantitative methods in tandem to address issues of cultural equivalence
International studies do this routinely– typically develop a translated version of existing
measure for a new country and language»Swedish version of SF-36»Japanese version of Mental Health Inventory
51
Distinction Between International and U.S. Studies
International studies assume conceptual non-equivalence to begin with– Different nations, languages
Usually dealing with translated measures During translation, items can be added or
modified to improve conceptual and semantic equivalence– Product is an “adapted” instrument
52
International Versus U.S. Approach
AssessConceptual Equivalence(Qualitative)
AssessPsychometricEquivalence
(Quantitative)
53
Typical International Approach
AssessConceptual Equivalence(Qualitative)
AssessPsychometricEquivalence
(Quantitative)
Begin here(assumesconceptualdifferencesacross countries)
• If new domains or definitions are found, can revise and add items• Translated “adapted” version is the goal• May achieve conceptual equivalence before testing psychometric adequacy
54
Typical U.S. Approach in Studies of English Speaking Diverse Groups
Select existing well-tested measures (developed in mainstream) and assume they will work (universality)
Assumes perspectives of diverse group are similar to mainstream– “Cultural hegemony” (Guyatt)
– “Middle-class ethnocentrism (Rogler)
55
Typical U.S. Approach When No Translation is Done
Most studiesbegin here(assumes universality of constructs)
AssessConceptualEquivalence(Qualitative)
AssessPsychometricEquivalence
(Quantitative)
56
Typical U.S. Approach When No Translation is Done
Most studiesbegin here(assumes universality of constructs)
Ifequiv.
AssessConceptualEquivalence(Qualitative)
AssessPsychometricEquivalence
(Quantitative)
Proceed with analysis. May miss important domains
and definitions
57
Typical U.S. Approach When No Translation is Done
Most studiesbegin here(assumes universality of constructs)
If problems
Ifequiv.
AssessConceptualEquivalence(Qualitative)
AssessPsychometricEquivalence
(Quantitative)
No Guidelines!If refine items based
on qualitative studies, no longer have comparable
instrument
Proceed with analysis. May miss important domains
and definitions
58
Typical U.S. Subgroup Approach When No Translation is Done
No Guidelines!If refine items
based onqualitative
studies, no longerhave comparable
instrument
AssessConceptual Equivalence(Qualitative)
AssessPsychometricEquivalence(Quantitative
)
Most studiesbegin here(assumes universality of constructs)
If problems
Proceed with analysis. May missimportant domainsand definitions
Ifequiv.
59
What to do if Measures Are Not Equivalent in a Specific Study Comparing Groups
Need guidelines for how to handle data when substantial non-comparability is found in a study– Drop bad or “biased” items from scores
»Compare results with and without biased items
– Analyze study by stratifying diverse groups The current challenge for measurement in minority
health studies
60
Example: 20-item Spanish CES-D in Older Latinos
2 items had very low item-scale correlations, high rates of missing data in two studies– I felt hopeful about the future– I felt I was just as good as other people
20-item version Study 1 Study 2– Item-scale correlations -.20 to .73 .05 to .78– Cronbach’s alpha
18-item version – Item-scale correlations .45 to .76 .33 to .79
61
Example: Measure Can be Modified
GHAA Consumer Satisfaction Survey Adapted to be appropriate for African
American patients– Focus groups conducted to obtain perspectives of
African Americans– New domains added (e.g., discrimination/
stereotyping)– New items added to existing domains
Fongwa M et al. Characteristics of a patient satisfaction instrument tailored to concerns of African Americans. Ethnicity and Disease, in press.
62
Approaches to Conducting Studies When You Are Not Sure
Use a combination of “universal” and group- specific items– use universal items to compare across groups – use specific items (added onto universal items)
when conducting analyses within one group»To find a variable that correlates with a health
measure within one group
63
Conclusions
Measurement in health disparities research is a relatively new field– Few guidelines
Encourage first steps– Test and report adequacy and equivalence
As evidence grows, concepts and measures that work better across diverse groups will be identified
64
Homework: Optional
IF you are interested in conducting research in diverse populations,– Complete rows 40-44 in the matrix
(equivalence across diverse groups)