1 Class 8 Measurement Issues in Diverse Populations Including Health Disparities Research November...

1

Class 8

Measurement Issues in Diverse Populations Including Health Disparities Research

November 17, 2005

Anita L. Stewart Institute for Health & Aging

University of California, San Francisco

2

Background

U.S. population becoming more diverse More minority groups are being included

in research due to:– NIH mandate

– Recent health disparities initiatives

3

Types of Diverse Groups

Health disparities research focuses on differences in health between the following groups:– Minority vs. non-minority– Low income vs. others– Low education vs. others– Limited English skills vs. others– …. and others

4

Health Disparities Research

Increasing research to:– Describe health disparities

» Differences in health across various diverse groups

– Identify determinants of health disparities» Individual level

» Environmental level

– Intervene to reduce health disparities

5

The Measurement Problem

Measurement goal - identify measures that can be used across all groups, and– are sensitive to diversity– have minimal bias between groups

Most self-reported measures were developed and tested in mainstream, well-educated groups– little research on measurement characteristics

in diverse groups

6

Issues Concerning Group Comparisons Observed mean differences in a measure can

be due to – culturally- or group-mediated differences in true

score (true differences) -- OR -- – bias - systematic differences between group

observed scores not attributable to true scores

7

Bias - A Special Concern

Measurement bias in any one group may make group comparisons invalid

Bias can be due to group differences in:– the meaning of concepts or items – the extent to which measures represent a concept – cognitive processes of responding– use of response scales– appropriateness of data collection methods

8

Effects of Bias on Depression: Chinese and White Respondents In Chinese respondents - 3 sources of bias that lower

observed score:– tendency to not express negative feelings– exacerbated by face-to-face interview– meaning of word “depression” is more severe than for

Whites – less likely to endorse it Comparing groups – assume true level of depression

is the same in both groups –– Observed scores would be lower in Chinese group– But lower level is due to these biases

9

Typical Sequence of Developing New Self-Report Measures

Develop concept

Create item pool

Pretest/revise

Field survey

Psychometric analyses

Final measures

10

Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups

Develop concept

Create item pool

Pretest/revise

Field survey


Final measures

Obtain perspectives of diverse groups

11


Develop concept

Create item pool

Pretest/revise

Field survey


Final measures


.. to reflect these perspectives

12


Develop concept

Create item pool

Pretest/revise

Field survey


Final measures



.. in all diverse groups

13


Develop concept

Create item pool

Pretest/revise

Field survey


Final measures





14


Develop concept

Create item pool

Pretest/revise

Field survey


Final measures




.. in all diverse groupsMeasurement studies across groups

15


Develop concept

Create item pool

Pretest/revise

Field survey


Final measures





If results are non-equivalent

16

Measurement Adequacy vs. Measurement Equivalence

Making group comparisons requires conceptual and psychometric adequacy and equivalence

Adequacy - within a group– concepts are appropriate– psychometric properties meet minimal criteria

Equivalence - between groups– conceptual and psychometric properties are

comparable

17

Conceptual and Psychometric Adequacy and Equivalence

Conceptual

Psychometric

Adequacyin 1 Group

EquivalenceAcross Groups

Concept equivalentacross groups

Psychometric propertiesmeet minimal standards

within one group

Psychometric propertiesinvariant (equivalent)

across groups

Concept meaningfulwithin one group

18

Left Side of Matrix: Issues in a Single Group

Conceptual

Psychometric

Adequacyin 1 Group




within one group


across groups


19

Ride Side of Matrix: Issues in More Than One Group

Conceptual

Psychometric

Adequacyin 1 Group




within one group


across groups


20

Conceptual Adequacy in One Group

Conceptual

Psychometric

Adequacyin 1 Group




within one group


across groups


21

Conceptual Adequacy in One Group Is concept relevant, meaningful, and acceptable

in that group? Traditional research

– Conceptual adequacy = simply defining a concept– Mainstream population “assumed”

Minority and cross cultural research– Mainstream concepts may be inadequate– Concept should correspond to how a particular

group thinks about it

22

(Old) Example of Inadequate Concept

Patient satisfaction typically conceptualized in mainstream populations in terms of, e.g.,– access, technical care, communication,

continuity, interpersonal style In minority and low income groups,

additional relevant domains include, e.g., – discrimination by health professionals

– sensitivity to language barriers

23

Psychometric Adequacy in One Group

Conceptual

Psychometric

Adequacyin 1 Group




within one group


across groups


24

Psychometric Adequacy in any Group

Minimal standards:– Sufficient variability– Minimal missing data– Adequate reliability/reproducibility– Evidence of construct validity– Evidence of responsiveness to change

Basic classical test theory approach

25

Evidence of Psychometric Inadequacy of SF-36 Scale in Three Diverse Groups

SF-36 social functioning scale - internal consistency reliability < .70 in three different samples:– Chinese language, adults aged 55-96 years

– Japanese language, Japanese elders

– English, Pima Indians

Stewart AL & Nápoles-Springer A, 2000 (see readings)

26

Conceptual Equivalence Across Groups

Conceptual

Psychometric

Adequacyin 1 Group




within one group


across groups


27

Conceptual Equivalence

Is the concept relevant, familiar, acceptable to all diverse groups being studied?

Is the concept defined the same way in all groups? – all relevant “domains” included (none missing)

– interpreted similarly Is the concept appropriate for all diverse

groups?

28

Example: Subjective Test of Conceptual Equivalence of Spanish FACT-G

Bilingual/bicultural expert panel reviewed all 28 items – One item had low cultural relevance to quality of life

– One concept was missing – spirituality

Developed new spirituality scale (FACIT-Sp) with input from cancer patients, psychotherapists, and religious experts– Sample item “I worry about dying”

Cella D et al. Med Care 1998: 36;1407

29

Examples of Nonequivalent Concept: Depression Mainstream concept expressed and reported via

affect, somatic symptoms, behavior, thought patterns

In Asian Americans– public expression of self-reflection is discouraged– saving face and self-sacrifice are powerful forces in

molding behavior and expression

30

Generic/Universal vs Group-Specific(Etic versus Emic)

Concepts unlikely to be defined exactly the same way across diverse ethnic groups

Generic/universal (etic)– features of a concept that are appropriate across

groups Group-Specific (emic)

– idiosyncratic portions of a concept

31

Etic versus Emic (cont.)

Goal in health disparities research– identify generic/universal portion of a concept (could

be entire concept) that can be applied across all groups

For within-group analyses or studies– the culture-specific portion is also relevant

32

Qualitative Approaches to Explore Conceptual Equivalence in Diverse Groups

Literature reviews– ethnographic and anthropological

In-depth interviews and focus groups – discuss concepts, obtain their views

Expert consultation from diverse groups– review concept definitions– rate relevance of items

33

Psychometric Equivalence

Conceptual

Psychometric

Adequacyin 1 Group




within one group


across groups


34

Equivalence of Reliability?? No!

Difficult to compare reliability because it depends on the distribution of the construct in a sample– Thus lower reliability in one group may simply

reflect poorer variability More important is the adequacy of the

reliability in both groups– Reliability meets minimal criteria within each group

35

Equivalence of Criterion Validity

Determine if hypothesized patterns of associations with specified criteria are confirmed in both groups, e.g.– a measure predicts utilization in both groups

– a cutpoint on a screening measure has the same specificity and sensitivity in both groups

36

Equivalence of Construct Validity Are hypothesized patterns of associations

confirmed in both groups?– Example: Scores on the Spanish version of the

FACT had similar relationships with other health measures as scores on the English version

Primarily tested through subjectively examining pattern of correlations– Can test differences using confirmatory factor

analysis (e.g., through Structural Equation Modeling)

37

Equivalence of Factor Structure

Factor structure is similar in new group to structure in original groups in which measure was tested– In other words, the measurement model is the

same across groups Methods

– Specify the number of factors you are looking for

– Determine if the hypothesized model fits the data

38

Exploratory Factor Analysis (EFA)

Factor analysis methods that do not constrain the number of factors or the magnitude of the loadings

Identifies an underlying structure of a set of items with no particular hypotheses

Goal - identify as few explanatory variables (i.e., factors) as possible that account for covariation among the items

39

Confirmatory Factor Analysis (CFA)

Methods that specify a hypothesized structure a priori (before looking at the results)

Can test mean and covariance structures– to estimate bias

40

Equivalence of Factor Structure: Assuring Psychometric Invariance

Psychometric invariance (technical term for psychometric equivalence)

Invariance means that important properties of a theoretically-based factor structure (measurement model) do not differ or vary across groups (are invariant)– In other words, the measurement model is the same

across groups Empirical comparison of factor structure

41

Criteria for Psychometric Invariance: Non-technical Language

Across two or more groups, determine whether each criterion is true – a sequential process:

1. Same number of factors (dimensions) 2. Same items load on (correlate with) same factors 3. Each item has same factor loadings4. No bias on any item or scale across groups5. Same residuals on items 6. No item or scale bias AND same residuals

42

Dimensional Invariance: Same number of dimensions

Configural Invariance: Same items load on same dimensions

Metric Invariance, Factor Pattern Invariance:Items have same loadings on same dimensions

Strong Factorial Invariance,Scalar Invariance:

Observed scores are unbiased

Residual Invariance:Observed item and factor variances

can be compared across groups

Strict Factorial InvarianceBoth scalar invariance and residual invariance criteria are met

Criteria for Evaluating Invariance AcrossGroups: Technical Terms

43

Dimensional Invariance

Definition: Factor structure is the same, i.e., the same number of factors are observed in both groups

CES-D Example:– Four factors found in men and 3 factors in women

(n=1000), 18-92 years of age– Failed the dimensional invariance criterion

» a different number of factors was found in both groups

JM Golding et al., J Clin Psychol 1991:47;61-75

44

Example: Dimensional Invariance of CES-D in Hispanic EPESE Original 4 factors

– Somatic symptoms– Depressive affect– Interpersonal behavior– Positive affect

Hispanic EPESE - only 2 factors– Depression (included somatic symptoms,

depressive affect, and interpersonal behavior)– Well-being

Miller TQ et al., The factor structure of the CES-D in two surveys ofelderly Mexican Americans, J Gerontol: Soc Sci, 1997;520:S259-69.

45

Configural Invariance

Assumes: dimensional invariance is found– that there were the same number of factors

Definition: Item-factor patterns are the same, i.e., the same items load on the same factors in both groups

CES-D Example– 4 factors found in Anglos, Blacks, and Chicanos– Same items loaded on each factor in all groups

RE Roberts et al., Psychiatry Research, 1980;2:125-134

46

Metric Invariance or Factor Pattern Invariance

Assumes: dimensional and configural invariance are found

Definition: Item loadings are the same across groups, i.e., the correlation of each item with its factor is the same in both groups

47

Strong Factorial Invariance or Scalar Invariance

Assumes: dimensional, configural, and metric (factor pattern) invariance are found

Definition: Observed scores are unbiased, i.e., means can be compared across groups

Requires test of equivalence of mean scores across groups using confirmatory factor analysis

48

Residual Invariance

Assumes: dimensional, configural, and metric (factor pattern) invariance are found

Definition: Observed item and factor residual variances can be compared across groups, i.e., similar error associated with each item across groups

49

Strict Factorial Invariance

Assumes: dimensional, configural, metric (factor pattern), and strong factorial (scalar) invariance are found

50

Integrating Qualitative and Quantitative Methods in Assessing Cultural Equivalence

Optimal approach - use qualitative and quantitative methods in tandem to address issues of cultural equivalence

International studies do this routinely– typically develop a translated version of existing

measure for a new country and language»Swedish version of SF-36»Japanese version of Mental Health Inventory

51

Distinction Between International and U.S. Studies

International studies assume conceptual non-equivalence to begin with– Different nations, languages

Usually dealing with translated measures During translation, items can be added or

modified to improve conceptual and semantic equivalence– Product is an “adapted” instrument

52

International Versus U.S. Approach

AssessConceptual Equivalence(Qualitative)

AssessPsychometricEquivalence

(Quantitative)

53

Typical International Approach



(Quantitative)

Begin here(assumesconceptualdifferencesacross countries)

• If new domains or definitions are found, can revise and add items• Translated “adapted” version is the goal• May achieve conceptual equivalence before testing psychometric adequacy

54

Typical U.S. Approach in Studies of English Speaking Diverse Groups

Select existing well-tested measures (developed in mainstream) and assume they will work (universality)

Assumes perspectives of diverse group are similar to mainstream– “Cultural hegemony” (Guyatt)

– “Middle-class ethnocentrism (Rogler)

55

Typical U.S. Approach When No Translation is Done

Most studiesbegin here(assumes universality of constructs)

AssessConceptualEquivalence(Qualitative)


(Quantitative)

56



Ifequiv.



(Quantitative)

Proceed with analysis. May miss important domains

and definitions

57



If problems

Ifequiv.



(Quantitative)

No Guidelines!If refine items based

on qualitative studies, no longer have comparable

instrument

Proceed with analysis. May miss important domains

and definitions

58

Typical U.S. Subgroup Approach When No Translation is Done

No Guidelines!If refine items

based onqualitative

studies, no longerhave comparable

instrument


AssessPsychometricEquivalence(Quantitative

)


If problems

Proceed with analysis. May missimportant domainsand definitions

Ifequiv.

59

What to do if Measures Are Not Equivalent in a Specific Study Comparing Groups

Need guidelines for how to handle data when substantial non-comparability is found in a study– Drop bad or “biased” items from scores

»Compare results with and without biased items

– Analyze study by stratifying diverse groups The current challenge for measurement in minority

health studies

60

Example: 20-item Spanish CES-D in Older Latinos

2 items had very low item-scale correlations, high rates of missing data in two studies– I felt hopeful about the future– I felt I was just as good as other people

20-item version Study 1 Study 2– Item-scale correlations -.20 to .73 .05 to .78– Cronbach’s alpha

18-item version – Item-scale correlations .45 to .76 .33 to .79

61

Example: Measure Can be Modified

GHAA Consumer Satisfaction Survey Adapted to be appropriate for African

American patients– Focus groups conducted to obtain perspectives of

African Americans– New domains added (e.g., discrimination/

stereotyping)– New items added to existing domains

Fongwa M et al. Characteristics of a patient satisfaction instrument tailored to concerns of African Americans. Ethnicity and Disease, in press.

62

Approaches to Conducting Studies When You Are Not Sure

Use a combination of “universal” and group- specific items– use universal items to compare across groups – use specific items (added onto universal items)

when conducting analyses within one group»To find a variable that correlates with a health

measure within one group

63

Conclusions

Measurement in health disparities research is a relatively new field– Few guidelines

Encourage first steps– Test and report adequacy and equivalence

As evidence grows, concepts and measures that work better across diverse groups will be identified

64

Homework: Optional

IF you are interested in conducting research in diverse populations,– Complete rows 40-44 in the matrix

(equivalence across diverse groups)

1 Class 8 Measurement Issues in Diverse Populations Including Health Disparities Research November...

Documents

Transcript of 1 Class 8 Measurement Issues in Diverse Populations Including Health Disparities Research November...