
Test Worthiness

Chapter 3


Test Worthiness

Four cornerstones to test worthiness:
  Validity
  Reliability
  Practicality
  Cross-cultural Fairness

But first, we must learn one statistical concept: Correlation Coefficient


Correlation Coefficient

Correlation: Statistical expression of the relationship between two sets of scores (or variables)

Positive correlation
  Increase in one variable accompanied by increase in the other
  “Direct relationship”: variables move in the same direction

Negative correlation
  Increase in one variable accompanied by decrease in the other
  “Inverse relationship”: variables move in opposite directions


Examples of Correlation Relationships

What is the relationship between:
  Gasoline prices and grocery prices?
  Grocery prices and good weather?
  Stress and depression?
  Depression and job productivity?
  Partying and grades?
  Study time and grades?


Correlation (Cont’d)

Correlation coefficient (r)
  A number between -1 and +1 that indicates direction and strength of the relationship
  As “r” approaches +1, strength increases in a direct relationship (positive)
  As “r” approaches -1, strength increases in an inverse relationship (negative)
  As “r” approaches 0, the relationship is weak or nonexistent (at zero)


Correlation (cont)

Correlation coefficient “r”

Positive (direct):
  0 to +.3 = weak
  +.4 to +.6 = medium
  +.7 to +1.0 = strong

Negative (inverse):
  0 to -.3 = weak
  -.4 to -.6 = medium
  -.7 to -1.0 = strong

[Number line: -1 (strong, inverse) ... 0 (weak) ... +1 (strong, direct)]
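The bands above can be turned into a small lookup. This is only a sketch: the function name `describe_r` is made up for illustration, and the cutoffs at .35 and .65 are an assumption that splits the gaps the slide leaves between bands (for example between .3 and .4).

```python
def describe_r(r: float) -> tuple[str, str]:
    """Return (direction, strength) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    if r > 0:
        direction = "direct"
    elif r < 0:
        direction = "inverse"
    else:
        direction = "none"

    size = abs(r)
    # Assumed cutoffs: the slide's bands leave gaps, so split them at the midpoints.
    if size < 0.35:
        strength = "weak"
    elif size < 0.65:
        strength = "medium"
    else:
        strength = "strong"
    return direction, strength


print(describe_r(0.82))
print(describe_r(-0.5))
print(describe_r(0.2))
```

The sign carries the direction and the absolute value carries the strength, which is why the same cutoffs serve both the positive and the negative bands.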


Correlation Examples

  SAT score   Coll. GPA
  930         3.0
  750         2.9
  1110        3.8
  625         2.1
  885         3.3
  950         2.6
  605         2.8
  810         3.2
  1045        3.0
  910         3.5

  r = .35

  Missed Classes   Coll. GPA
  3                3.0
  5                2.9
  2                3.8
  8                2.1
  1                3.3
  6                2.6
  3                2.8
  1                3.2
  3                3.0
  0                3.5

  r = -.67
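For readers who want to see how such a coefficient is computed, here is a minimal sketch of the Pearson product-moment formula applied to the ten rows listed above. The helper name `pearson_r` is hypothetical. One caution: when recomputed from the rows as they appear in this transcript, the results come out near .65 and -.87 rather than the printed .35 and -.67, so either the printed coefficients or the transcribed rows may be off. The direction of each relationship is the same either way.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from each mean
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Rows as transcribed from the slide (same ten students in both tables)
sat = [930, 750, 1110, 625, 885, 950, 605, 810, 1045, 910]
missed = [3, 5, 2, 8, 1, 6, 3, 1, 3, 0]
gpa = [3.0, 2.9, 3.8, 2.1, 3.3, 2.6, 2.8, 3.2, 3.0, 3.5]

r_sat = pearson_r(sat, gpa)
r_missed = pearson_r(missed, gpa)
print(round(r_sat, 2), round(r_missed, 2))
```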


Correlation Scatterplots

Plot the two sets of scores from the previous examples on a graph:
  Place person A’s SAT score on the x-axis and his/her GPA on the y-axis
  Continue this for persons B, C, D, etc.

This process forms a “scatterplot.”


Examples of Scatterplots

[Scatterplot: “SAT & GPA Correlation”; x-axis: SAT (0 to 1200), y-axis: GPA (0.0 to 4.0)]

[Scatterplot: “Missed Classes & GPA”; x-axis: Missed classes (0 to 9), y-axis: GPA (0.0 to 4.0)]


Scatterplots (cont)

What correlation (r ) do you think this graph has?

How about this correlation?


More Scatterplots

What might this correlation be?

This correlation?


More Scatterplots

This correlation? Last one


Coefficient of Determination (Shared Variance)

The square of the correlation (r = .80, r² = .64)

A statement about factors that underlie the variables and account for their relationship.
  Correlation between depression and anxiety = .85.
  Shared variance = .72.
  What factors might underlie both depression and anxiety?

[Diagram labels: Depression, Anxiety, Shared Trait Variance]
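The arithmetic behind the two figures on this slide is just squaring r. A tiny sketch (the function name `shared_variance` is made up for illustration):

```python
def shared_variance(r: float) -> float:
    """Coefficient of determination: the square of the correlation."""
    return r * r


# The two correlations quoted on the slide
for r in (0.80, 0.85):
    print(f"r = {r:.2f} -> r squared = {shared_variance(r):.2f}")
```

Because r is squared, the sign drops out: r = -.85 implies the same shared variance as r = +.85.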


Validity

What is validity?
  The degree to which all accumulated evidence supports the intended interpretation of test scores for the intended purpose
  Lay definition: Does a test measure what it is supposed to measure?

It is a unitary concept; however, there are 3 general types of validity evidence:
  1. Content Validity
  2. Criterion-Related Validity
  3. Construct Validity


Content Validity

Is the content of the test valid for the kind of test it is?
  Developers must show evidence that the domain was systematically analyzed and concepts are covered in correct proportion

Four-step process:
  Step 1 - Survey the domain
  Step 2 - Content of the test matches the above domain
  Step 3 - Specific test items match the content
  Step 4 - Analyze relative importance of each objective (weight)


Content Validity (cont)

[Diagram labels: Survey of Domain; Content Matches Domain]

Step 1: Survey the Domain
Step 2: Content Matches Domain
Step 3: Test items reflect content
Step 4: Adjusted for relative importance

Item weights shown in the diagram:
  Item 1 x 3
  Item 2 x 2
  Item 3 x 1
  Item 4 x 2
  Item 5 x 2.5
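Step 4 (adjusting for relative importance) can be illustrated with the item weights from the diagram. This is a sketch under assumptions: the 0/1 responses are hypothetical, and scoring by a simple weighted sum is one plausible reading of the diagram, not something the slide states.

```python
# Item weights as shown in the diagram
WEIGHTS = {"Item 1": 3, "Item 2": 2, "Item 3": 1, "Item 4": 2, "Item 5": 2.5}


def weighted_score(responses):
    """responses maps each item name to 1 (correct) or 0 (incorrect)."""
    return sum(WEIGHTS[item] * responses[item] for item in WEIGHTS)


# Hypothetical test taker who misses only Item 2
hypothetical = {"Item 1": 1, "Item 2": 0, "Item 3": 1, "Item 4": 1, "Item 5": 1}

score = weighted_score(hypothetical)
max_score = sum(WEIGHTS.values())
print(score, "out of", max_score)
```

Missing a heavily weighted item costs more than missing a lightly weighted one, which is how the test reflects the relative importance of each objective.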


Content Validity (cont)

Face Validity
  Not a real type of content validity
  A quick look at the “face” value of questions
  Sometimes questions may not “seem” to measure the content, but do (e.g., the panic disorder example in the book, p. 48)

How might you show content validity for an instrument that measures depression?


Criterion-Related Validity: Concurrent and Predictive Validity

Criterion-Related Validity
  The relationship between the test and a criterion the test should be related to

Two types:
  Concurrent Validity: Does the instrument relate to another criterion “now” (in the present)?
  Predictive Validity: Does the instrument relate to another criterion in the future?


Criterion-Related Validity: Concurrent Validity

Example 1
  100 clients take the BDI
  Correlate their scores with clinicians’ ratings of depression of the same group of clients.

Example 2
  500 people take a test of alcoholism tendency
  Correlate their scores with how significant others rate the amount of alcohol they drink.


Criterion-Related Validity: Predictive Validity

Examples:
  SAT scores correlated with how well students do in college.
  ASVAB scores correlated with success at jobs.
  GREs correlated with success in graduate school.

(See Table 3.1, p. 50)
Do Exercise 3.2, p. 49


Construct Validity

Construct Validity
  Extent to which the instrument measures a theoretical or hypothetical trait
  Many counseling and psychological constructs are complex, ambiguous, and not easily agreed upon:
    Intelligence
    Self-esteem
    Empathy
    Other personality characteristics


Construct Validity (cont)

Four methods of gathering evidence for construct validity:
  1. Experimental design
  2. Factor analysis
  3. Convergence with other instruments
  4. Discrimination with other measures


Construct Validity: Experimental Design

Creating hypotheses and research studies that show the instrument captures the correct concept

Example:
  Hypothesis: The “Blank” depression test will discriminate between clinically depressed clients and “normals.”
  Method:
    Identify 100 clinically depressed clients
    Identify 100 “normals”
    Show statistical analysis

Second Example: Clinicians measure their depressed clients before, then after, 6 months of treatment
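The slide does not say which statistical analysis to show. One common choice for comparing two independent groups is a t statistic; the sketch below uses Welch's version (which does not assume equal variances) on two tiny, entirely hypothetical sets of depression-test scores. The function name `welch_t` and the scores are illustrative assumptions only.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two independent group means."""
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Standard error built from each group's sample variance and size
    se = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return mean_diff / se


# Hypothetical scores on the "Blank" depression test (higher = more depressed)
depressed = [28, 31, 25, 30, 27, 33]
comparison = [8, 12, 10, 7, 11, 9]

t = welch_t(depressed, comparison)
print(round(t, 2))
```

A large t relative to its degrees of freedom would support the hypothesis that the test separates the two groups. A real study would also report the degrees of freedom and a p-value, which this sketch leaves out.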


Construct Validity: Factor Analysis

Factor analysis: Statistical relationship between subscales of a test
  How similar or different are the subscales?

Example:
  Develop a depression test that has three subscales: self-esteem, suicidal ideation, hopelessness.
  Correlate the subscales:
    Self-esteem and suicidal ideation: .35
    Self-esteem and hopelessness: .25
    Hopelessness and suicidal ideation: .82

What implications might the above scores have for this test?


Construct Validity: Convergent Validity

Convergent Evidence: Comparing test scores to other, well-established tests

Example:
  Correlate new depression test against the BDI
  Is there a good correlation between the two?
  Implications if correlation is extremely high? (e.g., .96)
  Implications if correlation is extremely low? (e.g., .21)


Construct Validity: Discriminant Validity

Discriminant Evidence: Correlate test scores with other tests that are different
  Hope to find a meager correlation

Example:
  Compare new depression test with an anxiety test.
  Implications if correlation is extremely high? (e.g., .96)
  Implications if correlation is extremely low? (e.g., .21)


Validity Recap

Three types of validity
  Content
  Criterion
    Concurrent
    Predictive
  Construct
    Experimental
    Factor Analysis
    Convergent
    Discriminant


Reliability

Accuracy or consistency of test scores.
Would you score the same if you took the test over, and over, and over again?
Reported as a reliability (correlation) coefficient.
The closer to r = 1.0, the less error in the test.


Three Ways of Determining Reliability

1. Test-Retest
2. Alternate, Parallel, or Equivalent Forms
3. Internal Consistency
  a. Coefficient Alpha
  b. Kuder-Richardson
  c. Split-half or Odd-Even


Test-Retest Reliability

Give the test twice to the same group of people.
  E.g., take the first test in this class and, very soon after, take it again. Are the scores about the same?

              person 1   person 2   person 3   person 4   person 5   others...
  1st test:   35         42         43         34         38
  2nd test:   36         44         41         34         37

Graphic: A --(time)--> A

Problem: A person can look up answers between the first and second testing
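As a worked illustration, the test-retest estimate for the five people listed above is simply the correlation between their first and second scores. This is a sketch: `pearson_r` is a hypothetical helper written out here so the example stands alone, and five people is far too few for a real reliability study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Scores for persons 1 to 5 from the slide
first_test = [35, 42, 43, 34, 38]
second_test = [36, 44, 41, 34, 37]

test_retest_r = pearson_r(first_test, second_test)
print(round(test_retest_r, 2))
```

For these five pairs the coefficient works out to about .92, which by the bands given earlier is a strong direct relationship: people who scored high the first time tended to score high the second time.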


Alternate, Parallel, or Equivalent Forms Reliability

Have two forms of the same test
Give students the two forms at the same time
Correlate scores on the first form with scores on the second form.

Graphic: A  B

Problem: Are two “equivalent” forms ever really equivalent?


Internal Consistency Reliability

How do individual items relate to each other and to the test as a whole?
Internal consistency reliability looks “within” the test rather than using multiple administrations
High-speed computers and the need for only one test administration have made internal consistency popular

Three types:
  Split-Half or Odd-Even
  Cronbach’s Coefficient Alpha
  Kuder-Richardson


Split-half or Odd-even Reliability

Correlate one half of the test with the other half for all who took the test

Example:
  Person 1 scores 16 on the first half of the test and 16 on the second half
  Person 2 scores 14 on the first half and 18 on the second half
  Also get scores for persons 3, 4, 5, etc.
  Correlate all persons’ scores on the first half with their scores on the second half
  The correlation = the reliability estimate

Use the Spearman-Brown formula to control for shortness of test
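The Spearman-Brown correction is needed because each half is only half as long as the full test, and shorter tests tend to be less reliable. The standard form for doubling test length is r_full = 2r / (1 + r), where r is the correlation between the two halves. A minimal sketch, with a hypothetical half-test correlation of .70:

```python
def spearman_brown(r_half: float) -> float:
    """Estimate full-length reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)


# Hypothetical correlation between the two halves of a test
r_half = 0.70
r_full = spearman_brown(r_half)
print(round(r_full, 2))
```

With r = .70 between halves, the corrected estimate is about .82. For any positive half-test correlation below 1, the corrected value is higher than the uncorrected one.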


Split-half or Odd-even Reliability Internal Consistency

Example Continued:

  Person   Score on 1st half   Score on 2nd half
  1        16                  16
  2        14                  18
  3        12                  20
  4        15                  17
  And so forth...

Problem: Are any two halves really equivalent?

Graphic: A


Cronbach’s Alpha and Kuder-Richardson Internal Consistency

Other types of Internal Consistency:
  Average correlation of all of the possible split-half reliabilities
  Two popular types:
    Cronbach’s Alpha
    Kuder-Richardson (KR-20, KR-21)

Graphic: A
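In practice, coefficient alpha is usually computed from item variances rather than by averaging split-half correlations. The usual formula is alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below applies it to a small, hypothetical set of responses (five people, three items); the function name and the data are assumptions for illustration.

```python
import statistics


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent, all the same length."""
    k = len(rows[0])                       # number of items
    items = list(zip(*rows))               # transpose: one tuple of scores per item
    sum_item_variances = sum(statistics.variance(item) for item in items)
    total_variance = statistics.variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical responses: each inner list is one person's scores on three items
hypothetical_rows = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
    [5, 4, 5],
]

alpha = cronbach_alpha(hypothetical_rows)
print(round(alpha, 3))
```

When items rise and fall together across people, the variance of the total scores is large relative to the summed item variances, and alpha moves toward 1.0.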


Cross-Cultural Fairness

Issues of bias in testing did not get much attention until the civil rights movement of the 1960s.
A series of court decisions established it was unfair to use tests to track students in schools.
  Black and Hispanic students were being unfairly compared to whites, who were not their norm group.


Cross-Cultural Fairness

Americans with Disabilities Act:
  Accommodations for individuals taking tests for employment must be made
  Tests must be shown to be relevant to the job in question.

Buckley Amendment (FERPA):
  Right to access school records, including test records.
  Parents have the right to their child’s records


Cross-Cultural Fairness

Carl Perkins Act:
  Individuals with a disability have the right to vocational assessment, counseling and placement.

Civil Rights Acts:
  Series of laws concerned with tests used in employment and promotion.

Freedom of Information Act:
  Assures access to federal records, including test records.
  Most states have expanded this law so that it also applies to state records.


Cross-Cultural Fairness

Griggs v. Duke Power Company:
  Tests for hiring and advancement must show ability to predict job performance.
  Example: Can’t give a test to measure intelligence for those who want to get a job as a road worker.

IDEIA and PL 94-142:
  Assures rights of students (age 2 to 21) suspected of having a learning disability to be tested at the school’s expense.
  Child Study Teams and IEP set up when necessary


Cross-Cultural Fairness

Section 504 of the Rehabilitation Act:
  Relative to assessment, any instrument used to measure appropriateness for a program or service must measure the individual’s ability, not be a reflection of his or her disability.

The Use of Intelligence Tests with Minorities: Confusion and Bedlam (See Insert 3.1, p. 58)


Disparities in Ability

Cognitive differences between people exist; however, they are clouded over by issues of SES, prejudice, stereotyping, etc. Are there real differences?
Why do differences exist and what can be done to eliminate these differences?
Often seen as environmental: No Child Left Behind
Exercise 3.4, p. 58: Why might there be differences among cultural groups on their ability scores?


Practicality

Several practical concerns:
  Time
  Cost
  Format (clarity of print, print size, sequencing of questions and types of questions)
  Readability
  Ease of Administration, Scoring, and Interpretation


Selecting & Administering Tests

Five Steps:
1. Determine your client’s goals.
2. Choose instruments to reach client goals.
3. Access information about possible instruments:
  Source books, e.g., Buros Mental Measurements Yearbook and Tests in Print
  Publisher resource catalogs
  Journals in the field
  Books on testing
  Experts
  The internet
4. Examine validity, reliability, cross-cultural fairness, and practicality.
5. Make a wise choice.