MEASUREMENT: RELIABILITY Lu Ann Aday, Ph.D. The University of Texas School of Public Health.

RELIABILITY: Definition

Extent of random variation in answers to questions as a function of when they are asked (test-retest), who asked them (inter-rater), and the fact that a given question is one of a number of questions that could have been asked to measure the concept of interest (internal consistency).

RELIABILITY: Types

Test-retest reliability
Inter-rater reliability
Internal consistency reliability

RELIABILITY: Computation

Requires repeated measures to estimate stability over time (test-retest), equivalence across data gatherers (inter-rater), or equivalence across questions/items intended to measure the same underlying concept (internal consistency).

RELIABILITY: Test-retest

Definition: correlation between answers to same question by same respondent at two different points in time

RELIABILITY: Test-retest

Factors affecting:
Vague question wording
Transient personal states, e.g., physical or mental
Situational factors, e.g., presence of other people

RELIABILITY: Test-retest

Computation: Compute correlation coefficient between answers to same question by same respondent at two different points in time:

Respondent   Q1, Time 1   Q1, Time 2
1            Agree        Agree
2            Agree        Agree
3            Agree        Agree
4            Agree        Disagree
5            Agree        Agree

RELIABILITY: Test-retest

Correlation coefficients:
Interval: Pearson r
Ordinal: Spearman rho
Nominal: Chi-square-based measures of association

Correlation desired: .70+
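
As an illustration of the interval and ordinal cases, the sketch below computes Pearson r and Spearman rho between two administrations of the same question. The scores in `time1` and `time2` are hypothetical 1-5 ratings invented for this example, and the helper names (`pearson_r`, `average_ranks`, `spearman_rho`) are not from the slides. Spearman rho is computed here as Pearson r on average ranks, which handles tied answers.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either list has no variation.
    return cov / sqrt(var_x * var_y)

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical answers by six respondents to the same question at two times.
time1 = [4, 2, 5, 3, 1, 4]
time2 = [4, 3, 5, 2, 1, 5]

r = pearson_r(time1, time2)
rho = spearman_rho(time1, time2)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
print("Meets .70 criterion:", r >= 0.70)
```

Note that when every respondent gives the same answer at one time point, the correlation is undefined because that measurement has no variance.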

RELIABILITY: Test-retest

Comparisons of means:
Interval: paired t-test, repeated measures analysis of variance

Advantages:
More accurately take into account that the first and second measurements are not independent
More directly compare the actual answers at the two points in time
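
A minimal sketch of the paired t-test for the interval case follows. It works on the within-respondent differences, which is how the test accounts for the two measurements not being independent. The scores are hypothetical and the function name `paired_t` is an assumption for this example; the p-value lookup against the t distribution is left out.

```python
from math import sqrt

def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired measurements."""
    # Work on within-respondent differences rather than the two raw columns.
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    # Undefined (division by zero) if all differences are identical.
    t = mean_d / sqrt(var_d / n)
    return t, n - 1

# Hypothetical scale scores for four respondents at Time 1 and Time 2.
time1 = [1, 2, 3, 4]
time2 = [2, 4, 3, 6]

t, df = paired_t(time1, time2)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

A t statistic near zero suggests the mean answer did not shift between the two points in time; a large absolute value suggests a systematic change.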

RELIABILITY: Inter-rater

Definition: correlation between answers to same question by same respondent obtained by different data gatherers at (approximately) the same point in time

RELIABILITY: Inter-rater

Factors affecting:
Lack of adequate interviewer training
Lack of standardization of data collection protocols and procedures

RELIABILITY: Inter-rater

Computation: Compute correlation coefficient between answers to same question by same respondent obtained by different data gatherers:

Respondent   Q1, Int. A   Q1, Int. B
1            BP=140/90    BP=140/90
2            BP=150/80    BP=150/80
3            BP=145/95    BP=145/95
4            BP=145/95    BP=120/80
5            BP=140/90    BP=140/90

RELIABILITY: Inter-rater

Correlation coefficients (coefficients for 3+ data gatherers noted in parentheses):
Interval: Pearson r (eta)
Ordinal: Spearman rho (chi-square)
Nominal: Kappa (chi-square)

Correlation desired: .80+
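
For the nominal case with two data gatherers, kappa can be sketched as below. This is Cohen's kappa: observed agreement corrected for the agreement expected by chance from each rater's marginal frequencies. The classifications ("high" or "normal" blood pressure) are hypothetical, as is the function name `cohen_kappa`.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same respondents."""
    n = len(rater_a)
    # Proportion of respondents on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the product of each rater's category proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    # Undefined (division by zero) if chance agreement is already 1.
    return (observed - expected) / (1 - expected)

# Hypothetical classifications of eight respondents by two interviewers.
int_a = ["high", "high", "normal", "high", "normal", "normal", "high", "normal"]
int_b = ["high", "high", "normal", "normal", "normal", "normal", "high", "normal"]

kappa = cohen_kappa(int_a, int_b)
print(f"kappa = {kappa:.2f}")
print("Meets .80 criterion:", kappa >= 0.80)
```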

RELIABILITY: Internal Consistency

Definition: correlation between answers by same respondent to different questions about the same underlying concept (usually summarized in scales)

RELIABILITY: Internal Consistency

Factors affecting:
Number of different questions asked to capture the underlying concept
Level of association (correlation) between answers the same respondents give to different questions about the concept

RELIABILITY: Internal Consistency

Computation: Compute internal consistency (underlying correlation) coefficients between answers by same respondent to different questions about the same concept:

Respondent   Q1      Q2         Q3
1            Agree   Disagree   Agree
2            Agree   Disagree   Agree
3            Agree   Disagree   Agree
4            Agree   Agree      Agree
5            Agree   Disagree   Agree

RELIABILITY: Internal Consistency

Internal consistency coefficients:
Corrected item-total correlation*
Split-half reliability coefficient
Cronbach alpha coefficient

Coefficient desired:
.70+ (group)
.90+ (individual)
.40+ (corrected item-total)*

RELIABILITY: Internal Consistency

Computation: Corrected item-total correlation
Add up the scores for answers to different questions about the same concept to create a total score
Subtract the score for answer to a given question from the total score to create item-specific “corrected” total scores
Compute Pearson correlation coefficients between score for each of the items and corresponding “corrected” total score
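
The three steps above can be sketched as follows. The data layout (one list of item scores per respondent), the scores themselves, and the function names are assumptions made for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def corrected_item_total(scores):
    """Corrected item-total correlation for each item in the scale.

    scores: one row per respondent, one column per item.
    """
    # Step 1: total score per respondent across all items.
    totals = [sum(row) for row in scores]
    correlations = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        # Step 2: remove the item's own score from the total.
        corrected = [t - v for t, v in zip(totals, item)]
        # Step 3: correlate the item with its corrected total.
        correlations.append(pearson_r(item, corrected))
    return correlations

# Hypothetical 1-5 answers by five respondents to three items.
scores = [
    [4, 5, 4],
    [2, 1, 2],
    [3, 3, 4],
    [5, 4, 5],
    [1, 2, 1],
]

for number, r in enumerate(corrected_item_total(scores), start=1):
    print(f"Item {number}: r = {r:.2f}, meets .40 criterion: {r >= 0.40}")
```

Subtracting the item from the total matters most for short scales, where an item would otherwise correlate with itself as a large part of the total.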

RELIABILITY: Internal Consistency

Computation: Split-half reliability coefficient
Randomly divide a series of questions about the same concept into halves and add up the scores for answers to the questions in the respective halves
Compute Spearman-Brown prophecy coefficient for correlation between the scores for each half, adjusting for the fact that the respective scores are based on only half the original number of items
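
A minimal sketch of this procedure, assuming an even number of items and hypothetical scores: the items are shuffled with a fixed seed, split in two, and the half-score correlation is stepped up with the Spearman-Brown adjustment for doubling the length, 2r / (1 + r). The function names and the `seed` parameter are illustrative choices, not part of the slides.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_reliability(scores, seed=0):
    """Split-half coefficient with Spearman-Brown adjustment.

    scores: one row per respondent, one column per item.
    """
    n_items = len(scores[0])
    items = list(range(n_items))
    # Randomly assign items to two halves (seeded so the split is repeatable).
    random.Random(seed).shuffle(items)
    half_a, half_b = items[: n_items // 2], items[n_items // 2:]
    sum_a = [sum(row[j] for j in half_a) for row in scores]
    sum_b = [sum(row[j] for j in half_b) for row in scores]
    r_half = pearson_r(sum_a, sum_b)
    # Adjust for each half containing only half the original items (k = 2).
    return 2 * r_half / (1 + r_half)

# Hypothetical 1-5 answers by five respondents to six items.
scores = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 2, 1, 1],
]

print(f"Split-half reliability = {split_half_reliability(scores):.2f}")
```

Because the result depends on which items land in which half, different seeds can give somewhat different coefficients for the same data.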

RELIABILITY: Spearman-Brown prophecy adjustments

Original alpha   Scale length decreased (-) or increased (+)
                 -.75   -.67   -.50   2x     3x     4x
.50              .20    .25    .33    .67    .75    .80
.60              .27    .33    .43    .75    .82    .86
.70              .37    .44    .54    .82    .88    .90
.80              .50    .57    .67    .89    .92    .94
.90              .69    .75    .82    .95    .96    .97

RELIABILITY: Spearman-Brown prophecy formula

Computation: $\frac{k \cdot r_o}{1 + (k - 1) \cdot r_o}$, where

$k$ = factor by which scale is increased or decreased

$r_o$ = alpha based on original length

Example: $\frac{2 \times .70}{1 + (2 - 1) \times .70} = .82$
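
The formula is short enough to check directly. The sketch below wraps it in a function (the name `spearman_brown` is an assumption) and applies it across a grid of original alphas and length factors. It assumes that the table columns labeled -.75, -.67, and -.50 correspond to k = .25, .33, and .50, that is, to the fraction of items retained.

```python
def spearman_brown(r_original, k):
    """Projected reliability when scale length is multiplied by factor k."""
    return k * r_original / (1 + (k - 1) * r_original)

# k < 1 shortens the scale (0.25 keeps a quarter of the items);
# k > 1 lengthens it (2 doubles the number of items).
factors = [0.25, 0.33, 0.50, 2, 3, 4]

for alpha in [0.50, 0.60, 0.70, 0.80, 0.90]:
    row = [f"{spearman_brown(alpha, k):.2f}" for k in factors]
    print(f"{alpha:.2f}", *row)
```

The projection assumes any added items are comparable in quality to the existing ones, so it is best read as an estimate rather than a guarantee.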

RELIABILITY: Cronbach alpha coefficient

Computation: $\frac{k \cdot r_a}{1 + (k - 1) \cdot r_a}$, where

$k$ = number of items in the scale

$r_a$ = average Pearson r between items

Example: $\frac{10 \times .32}{1 + (10 - 1) \times .32} = .82$
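
As a sketch of this computation, the code below averages the Pearson r over every pair of items and then applies the formula. This version, based on the average inter-item correlation, is often called standardized alpha. The function names and the respondent data are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def average_inter_item_r(scores):
    """Mean Pearson r over all pairs of items (columns of scores)."""
    n_items = len(scores[0])
    columns = [[row[j] for row in scores] for j in range(n_items)]
    pairs = list(combinations(columns, 2))
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)

def standardized_alpha(k, r_avg):
    """Alpha from the number of items k and the average inter-item r."""
    return k * r_avg / (1 + (k - 1) * r_avg)

# The worked example: 10 items with an average inter-item r of .32.
print(f"alpha = {standardized_alpha(10, 0.32):.2f}")

# Hypothetical 1-5 answers by five respondents to three items.
scores = [
    [4, 5, 4],
    [2, 1, 2],
    [3, 3, 4],
    [5, 4, 5],
    [1, 2, 1],
]
r_avg = average_inter_item_r(scores)
print(f"average r = {r_avg:.2f}, alpha = {standardized_alpha(3, r_avg):.2f}")
```

The formula shows both factors affecting internal consistency at work: alpha rises with the number of items and with the average correlation between them.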

WHEN TO UNDERTAKE RELIABILITY ANALYSIS

RELIABILITY/DIMENSIONS

QUESTIONS
Test-retest: Concerned about stability of wording
Inter-rater: Concerned about equivalence of data gatherers
Internal consistency: Constructing summary scales of attitudes or other abstract concepts

STUDIES
Test-retest: Esp. important in longitudinal or experimental designs
Inter-rater: Monitored, but not usually measured directly in surveys
Internal consistency: Esp. used in attitudinal surveys

STAGES
Test-retest: Pilot test or pretest
Inter-rater: Pretest plus monitor in final study
Internal consistency: Pretest or final study

REFERENCES

DeVellis, R. F. (2003). Scale Development: Theory and Applications (2nd ed.). Thousand Oaks, CA: Sage.

Ware, J. E., Jr., & Gandek, B., for the IQOLA Project (1998). Methods for testing data quality, scaling assumptions, and reliability: The IQOLA Project approach. Journal of Clinical Epidemiology, 51(11), 945-952.