Reliability Chapter 3. Every observed score is a combination of true score and error Obs. = T + E ...

Post on 31-Dec-2015

Reliability Chapter 3

Every observed score is a combination of true score and error

Obs. = T + E

Reliability = $\frac{s_T^2}{s_O^2} = 1 - \frac{s_E^2}{s_O^2}$

Classical Test Theory
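The variance ratio can be checked numerically. The following is a minimal sketch, not taken from the slides: the true scores and errors are made-up values chosen so that they are uncorrelated, which is the condition under which observed variance splits cleanly into true and error variance. In practice true scores are never observed directly, so this only illustrates the definition.

```python
from statistics import pvariance

# Hypothetical data (not from the source): true scores and errors
# constructed so that they are uncorrelated and the errors average zero.
true_scores = [40, 40, 60, 60]
errors = [-5, 5, -5, 5]

# Obs. = T + E for each person
observed = [t + e for t, e in zip(true_scores, errors)]

var_true = pvariance(true_scores)    # true-score variance
var_error = pvariance(errors)        # error variance
var_observed = pvariance(observed)   # observed-score variance

# Two equivalent ways of writing reliability
reliability_from_true = var_true / var_observed
reliability_from_error = 1 - var_error / var_observed

print(reliability_from_true, reliability_from_error)
```

With these values both expressions come to 0.8: 80% of the observed variance is true-score variance.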

Systematic versus unsystematic error

Reliability only takes unsystematic error into account

Reliability

Reliability & Correlation

Reliability often based on consistency between two sets of scores

Correlation: Statistical technique used to examine consistency

Positive Correlation

Negative Correlation

Correlation coefficient: a numerical indicator of the relationship between two sets of data

Pearson product-moment correlation coefficient is most common

Pearson Product-Moment Correlation Coefficient

$r = \frac{\sum z_1 z_2}{N}$
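The Pearson coefficient can be computed as the average product of paired z-scores. A minimal sketch, assuming z-scores are formed with the population standard deviation (dividing by N) so that the result matches the z-score form of the formula; the function names and data are illustrative only.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Convert raw scores to z-scores using the population SD."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]

def pearson_r(x, y):
    """Mean of the products of paired z-scores."""
    if len(x) != len(y):
        raise ValueError("both score sets must have the same length")
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical scores from two administrations of the same instrument
first = [1, 2, 3, 4, 5]
second = [2, 4, 6, 8, 10]
print(pearson_r(first, second))        # perfect positive relationship
print(pearson_r(first, second[::-1]))  # perfect negative relationship
```

The same function serves for test-retest or alternate-forms estimates: pass the two sets of scores obtained from the same individuals.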

The percentage of shared variance between two sets of data

Coefficient of Determination
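The coefficient of determination is the square of the correlation coefficient. A short sketch with a hypothetical correlation of .80 (the value is an assumption for illustration):

```python
def coefficient_of_determination(r):
    """Proportion of variance shared by two sets of scores."""
    return r ** 2

# Hypothetical correlation between two sets of scores
shared = coefficient_of_determination(0.80)
print(f"{shared:.0%} shared variance")
```

Because the value is squared, a correlation of -.80 implies the same proportion of shared variance as +.80.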

Test-Retest

Alternate/Parallel Forms

Internal Consistency Measures

Types of Reliability

Correlating performance on first administration with performance on the second

Coefficient of stability

Test-Retest

Two forms of instrument, administered to same individuals

Alternate/Parallel Forms

Split-half reliability: Spearman-Brown formula

Kuder-Richardson formulas: KR-20, KR-21

Coefficient Alpha

Internal Consistency Measures
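The slides name these measures without giving formulas. The sketch below uses the standard textbook forms, which are an assumption here: the Spearman-Brown correction 2r / (1 + r) for a test split into two halves, and coefficient alpha as k/(k-1) times one minus the ratio of summed item variances to total-score variance. Function names and the response data are hypothetical.

```python
from statistics import pvariance

def spearman_brown(r_half):
    """Step up the correlation between two half-tests to an
    estimate of reliability for the full-length test."""
    return 2 * r_half / (1 + r_half)

def coefficient_alpha(item_scores):
    """Coefficient alpha for a table of scores:
    one row per person, one column per item."""
    k = len(item_scores[0])
    item_variances = [
        pvariance([row[i] for row in item_scores]) for i in range(k)
    ]
    total_variance = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical half-test correlation of .60
print(spearman_brown(0.60))

# Hypothetical responses: 4 people, 3 items scored 1 to 5
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 4],
]
print(coefficient_alpha(responses))
```

When every item is scored 0 or 1, the same alpha computation corresponds to KR-20, which is why the Kuder-Richardson formulas are usually presented as the dichotomous-item case.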

Typical methods for determining reliability may not be suitable for:

Speed tests

Criterion-referenced tests

Subjectively scored instruments: interrater reliability

Nontypical Situations

Examine purpose for using instrument

Be knowledgeable about reliability coefficients of other instruments in that area

Examine characteristics of particular clients against reliability coefficients

Coefficients may vary based on SES, age, culture/ethnicity, etc.

Evaluating Reliability Coefficients

$SEM = s\sqrt{1 - r}$

Standard Error of Measurement

Provides estimate of range of scores if someone were to take instrument repeatedly

Based on premise that when individuals take a test multiple times, scores fall into normal distribution

Sam’s SAT Verbal = 550; r = .91; s = 100

SEM = 30

68% of the time, Sam’s true score would fall between 520 and 580

95% of the time, Sam’s true score would fall between 490 and 610

99.7% of the time, Sam’s true score would fall between 460 and 640

SEM: Example

$SEM = 100\sqrt{1 - .91} = 100\sqrt{.09} = 100(.3) = 30$

Determining Range of Scores Using SEM
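Sam's example can be reproduced in a few lines. This is a minimal sketch using the values given on the slide (observed score 550, r = .91, s = 100); the function names are illustrative, and the bands simply add and subtract 1, 2, and 3 SEMs from the observed score.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = s * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

def score_band(observed, sem, n_sems):
    """Range of scores n_sems SEMs either side of the observed score."""
    return (observed - n_sems * sem, observed + n_sems * sem)

sem = standard_error_of_measurement(100, 0.91)
print(f"SEM = {sem:.0f}")

for n in (1, 2, 3):
    low, high = score_band(550, sem, n)
    print(f"+/- {n} SEM: {low:.0f} to {high:.0f}")
```

The printed bands are 520 to 580, 490 to 610, and 460 to 640. A less reliable instrument widens them: with the same s, r = .75 gives an SEM of 50.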

Method to determine if difference between two scores is significant

Takes into account SEM of both scores

Standard Error of Difference
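The slide does not give a formula. A commonly used form, assumed here, combines the two SEMs as the square root of the sum of their squares. The sketch below applies it to hypothetical SEMs of 3 and 4 points and expresses a hypothetical 12-point score difference in standard-error-of-difference units.

```python
import math

def standard_error_of_difference(sem_a, sem_b):
    """Assumed form: sqrt(SEM_a^2 + SEM_b^2)."""
    return math.sqrt(sem_a ** 2 + sem_b ** 2)

def difference_in_sed_units(score_a, score_b, sem_a, sem_b):
    """How many standard errors of difference separate two scores."""
    sed = standard_error_of_difference(sem_a, sem_b)
    return abs(score_a - score_b) / sed

# Hypothetical values: two subtest scores and their SEMs
sed = standard_error_of_difference(3.0, 4.0)
units = difference_in_sed_units(62, 50, 3.0, 4.0)
print(sed, units)
```

The larger the difference is in these units, the less plausible it is that measurement error alone produced it. The cutoff to apply depends on the confidence level chosen.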

Generalizability or Domain Sampling Theory

Focus is on estimating the extent to which specific sources of variation under defined conditions are contributing to the score on the instrument

Alternative Theoretical Model