1
Validity – Outline
1. Definition
2. Two different views: Traditional
3. Two different views: CSEPT
4. Face Validity
5. Content Validity: CSEPT
6. Content Validity: Borsboom
2
Validity – Outline
7. Criterion Validity: CSEPT
   i. Predictive vs. Concurrent
   ii. Validity Coefficients
8. Criterion Validity: Borsboom
9. Construct Validity: CSEPT
   i. Convergent
   ii. Discriminant
3
Validity – Definition
Validity measures agreement between a test score and the characteristic it is believed to measure
4
Validity: CSEPT view
Validity is a property of test score interpretations
Validity exists when actions based on the interpretation are justified given a theoretical basis and social consequences
5
Validity: Traditional view
Validity is a property of tests
Does the test measure what you think it measures?
6
Note the difference:
Validity exists when actions based on the interpretation are justified given a theoretical basis and social consequences
Does the test measure what you think it measures?
7
A problem with the CSEPT view
Who is to say the ‘social consequences’ of test use are good or bad?
According to CSEPT, validity is a subjective judgment
In my view, this makes the concept useless: “if you like the result the test gives you, you will consider it valid. If you don’t, you won’t.”
That’s not how scientists think.
8
Borsboom et al. (2004)
Borsboom et al. reject CSEPT’s view
“Validity… is a very basic concept and was correctly formulated, for instance, by Kelley (1927, p. 14) when he stated that a test is valid if it measures what it purports to measure.” (p. 1061)
9
Borsboom et al. (2004)
Variations in what you are measuring cause variations in your measurements.
E.g., variations across people in intelligence cause variations in their IQ scores
This is not a correlational model of validity
10
Borsboom et al. (2004)
You don’t create a test and then do the analysis necessary to establish its validity
Rather, you begin by doing the theoretical work necessary to understand your subject and create a valid test in the first place.
On this view, validity is not a big problem.
11
Borsboom et al. vs. CSEPT
Who is right? Each scientist has to make up his or her own mind on that question
I agree with Borsboom et al.’s arguments.
Other psychologists may disagree.
12
The CSEPT view
CSEPT recognizes 3 types of evidence for test validity:
Content-related
Criterion-related
Construct-related
Boundaries not clearly defined
Cronbach (1980): Construct is basic, while Content & Criterion are subtypes.
13
Parenthetical Point – Face Validity
Face validity refers to the appearance that a test measures what it is intended to measure.
Face validity has P.R. value – test-takers may have better motivation if the test appears to be a sensible way to measure what it measures.
14
Content validity: CSEPT
Content-related evidence considers coverage of the conceptual domain tested.
Important in educational settings
Like face validity, it is determined by logic rather than statistics
Typically assessed by expert judges
15
Content validity: CSEPT
Construct-irrelevant variance: arises when irrelevant items are included or when external factors such as illness influence test scores
Requires a judgment about what is truly “external”
Construct under-representation: is the domain adequately covered, or are parts of it left out?
16
Content validity: Borsboom et al.
Borsboom et al. would say that content validity is not something to be established after the test has been created.
Rather, you build it into your test by having a good theory of what you are testing
17
Criterion validity: CSEPT
Criterion-related evidence tells us how well a test score corresponds to a particular criterion measure.
Generally, we want the test score to tell us something about the criterion score.
How well the test does this provides criterion-related evidence
18
Criterion validity: CSEPT
CSEPT: we could compare undergraduate GPAs to SAT scores to produce evidence for the validity of conclusions drawn on the basis of SAT scores.
Two basic types: predictive and concurrent
19
Criterion validity: CSEPT
Predictive validity: test scores are used to predict future performance – how good is the prediction?
E.g., SAT is used to predict final undergraduate GPA
SAT scores and GPA are moderately correlated
20
Criterion validity: CSEPT
Predictive validity
Concurrent validity
Correlation between test scores and the criterion when the two are measured at the same time.
Test illuminates current performance rather than predicting future performance (e.g., why does patient have a temperature? Why can’t student do math?)
21
Criterion validity: Borsboom et al.
“Criterion validity” involves a correlation of test scores with some criterion such as GPA
That does not establish the test’s validity, only its utility.
E.g., height and weight are correlated, but a test of height is not a test of what bathroom scales measure.
22
Criterion validity: Borsboom et al.
SAT is valid because it was developed on the sensible theory that “past academic achievement” is a good guide to “future academic achievement”
Validity is built into the test, not established after the test has been created
23
Criterion validity
Note: no point in developing a test if you already have a criterion – unless impracticality or expense makes use of the criterion difficult.
Criterion measure only available in the future?
Criterion too expensive to use?
24
Criterion validity
Validity coefficient: compute the correlation (r) between test score and criterion.
r = .30 or .40 would be considered normal.
r > .60 is rare
Note: r varies between -1.0 and +1.0
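As a minimal sketch of the computation named on this slide, the Python below calculates a Pearson r between a test score and a criterion. The function name `pearson_r` and both lists are made-up illustrations, not data from any validity study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for five people (illustration only)
test_scores = [1, 2, 3, 4, 5]
criterion = [4, 1, 2, 3, 5]

validity_coefficient = pearson_r(test_scores, criterion)
print(round(validity_coefficient, 2))
```

With these invented numbers the coefficient works out to .40, inside the range the slide describes as normal.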
25
Criterion validity
Validity coefficient: $r^2$ gives the proportion of variance in the criterion explained by the test score.
E.g., if $r_{xy} = .30$, then $r^2 = .09$, so 9% of the variability in Y “can be explained by variation in X”
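The same arithmetic, worked for the coefficient sizes mentioned on the previous slide (.30, .40, and the rare .60). The right-hand column, the share of criterion variance left unexplained, is added here for illustration.

```latex
\begin{align*}
r_{xy} = .30 &\;\Rightarrow\; r_{xy}^2 = .09, & 1 - r_{xy}^2 &= .91 \\
r_{xy} = .40 &\;\Rightarrow\; r_{xy}^2 = .16, & 1 - r_{xy}^2 &= .84 \\
r_{xy} = .60 &\;\Rightarrow\; r_{xy}^2 = .36, & 1 - r_{xy}^2 &= .64
\end{align*}
```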
26
Interpreting validity coefficients
Watch out for:
1. Changes in causal relationships
2. What does criterion mean? Is it valid, reliable?
3. Is subject population for validity study appropriate?
4. Sample size
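One way to see why sample size (item 4) matters is to put an approximate 95% confidence interval around a validity coefficient. The sketch below uses the standard Fisher z approximation, which is an assumption brought in here rather than something the slides specify. The function name and the two sample sizes are illustrative.

```python
from math import atanh, tanh, sqrt

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation via the Fisher z transformation."""
    z = atanh(r)              # move r onto the z scale
    se = 1.0 / sqrt(n - 3)    # standard error of z (needs n > 3)
    # Build the interval on the z scale, then transform back to r
    return tanh(z - z_crit * se), tanh(z + z_crit * se)

# Same observed coefficient, two hypothetical study sizes
small_study = r_confidence_interval(0.30, 30)
large_study = r_confidence_interval(0.30, 300)
print(small_study, large_study)
```

For r = .30, the interval based on 30 people includes zero, while the interval based on 300 people does not.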
27
Interpreting validity coefficients
Watch out for:
5. Criterion/predictor confusion
6. Range restrictions
7. Do validity study results generalize?
8. Differential predictions
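Range restriction (item 6) can be illustrated with a small sketch: the same test-criterion relationship yields a lower r when only high scorers are kept, as happens when the criterion is observed only for people who were selected. All numbers and names here are invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical applicant pool: criterion = test score plus fixed "noise"
test_scores = list(range(1, 11))
noise = [1, -1] * 5
criterion = [t + e for t, e in zip(test_scores, noise)]

r_full = pearson_r(test_scores, criterion)

# Keep only people with a test score of 7 or more (e.g., those admitted)
selected = [(t, c) for t, c in zip(test_scores, criterion) if t >= 7]
r_restricted = pearson_r([t for t, _ in selected], [c for _, c in selected])

print(round(r_full, 2), round(r_restricted, 2))
```

Here r falls from about .94 in the full pool to .60 among the selected cases, even though the rule generating the criterion has not changed.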
28
Construct validity: CSEPT
Problem: for many psychological characteristics of interest there is no agreed-upon “universe” of content and no clear criterion
We cannot assess content or criterion validity for such characteristics
These characteristics involve constructs: something built by mental synthesis.
29
Construct validity: CSEPT
Examples of constructs:
Intelligence Love Curiosity Mental health
CSEPT: We obtain evidence of validity by simultaneously defining the construct and developing instruments to measure it.
This is ‘bootstrapping.’
30
Bootstrapping construct validity
assemble evidence about what a test “means” – in other words, about the characteristic it is testing.
CSEPT: this process is never finished
31
Bootstrapping construct validity
assemble evidence about what a test “means” – in other words, about the characteristic it is testing.
Borsboom: this is part of the process of creating the test in the first place, not something done after the fact
32
Bootstrapping construct validity
assemble evidence
show relationships between a test and other tests
CSEPT: none of the other tests is a criterion but the web of relationships tells us what the test means
33
Bootstrapping construct validity
assemble evidence
show relationships between a test and other tests
Borsboom: these relationships do not tell us what a test score means (e.g., age is correlated with annual income but a measure of age is not a measure of annual income).
34
Bootstrapping construct validity
assemble evidence
show relationships
each new relationship adds meaning to the test
CSEPT: a test’s meaning is gradually clarified over time
35
Bootstrapping construct validity
assemble evidence
show relationships
each new relationship adds meaning to the test
Borsboom would say: why all the mystery? The meaning of many tests (e.g., WAIS, academic exams, Piaget’s tests) is clear right from the start
36
Construct validity
Example from text: Rubin’s work on Love.
Rubin collected a set of items for a Love scale
He read poetry, novels; he asked people for definitions
created a scale of Love and one of Liking
37
CSEPT: Construct validity
Rubin gave scale to many subjects & factor-analyzed results
Love integrates Attachment, Caring, & Intimacy
Liking integrates Adjustment, Maturity, Good Judgment, and Intelligence
The two are independent: you can love someone you don’t like (as song-writers know)
38
Rubin’s study of Love
Borsboom et al.: when creating a test, the researcher specifies “the processes that convey the effect of the measured attribute on the test score.”
Rubin laboriously built a theory about what the construct Love means.
Rubin’s process – reading poetry and novels, asking people for definitions – was a good process, so his test has construct validity.
39
Campbell & Fiske (1959)
Two types of Construct-related Evidence
Convergent evidence
When a test correlates well with other tests believed to measure the same construct
40
Campbell & Fiske (1959)
Two types of Construct-related Evidence
Convergent evidence
Discriminant evidence
When a test does not correlate with other tests believed to measure some other construct.
41
Convergent validity
Example – Health Index Scores correlated with age, number of symptoms, chronic medical conditions, physiological measures
Treatments designed to improve health should increase Health Index scores. They do.
42
Discriminant validity
Low correlations between new test and tests believed to tap unrelated constructs.
Evidence that the new test measures something unique
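The convergent and discriminant patterns can be sketched as two correlations computed for the same new test: one against a test believed to measure the same construct, one against a test believed to measure an unrelated construct. The scores and variable names are invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for five people on three tests
new_test = [1, 2, 3, 4, 5]
same_construct_test = [2, 1, 4, 3, 5]    # believed to tap the same construct
other_construct_test = [3, 2, 5, 1, 4]   # believed to tap an unrelated construct

convergent_r = pearson_r(new_test, same_construct_test)
discriminant_r = pearson_r(new_test, other_construct_test)
print(round(convergent_r, 2), round(discriminant_r, 2))
```

With these numbers the convergent correlation is .80 and the discriminant correlation is .10, which is the pattern Campbell & Fiske's two types of evidence look for.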
![Page 43: 1 Validity – Outline 1. Definition 2. Two different views: Traditional 3. Two different views: CSEPT 4. Face Validity 5. Content Validity: CSEPT 6. Content.](https://reader035.fdocuments.net/reader035/viewer/2022062721/56649f265503460f94c3cbd8/html5/thumbnails/43.jpg)
43
Validity & Reliability: CSEPT
CSEPT: No point in trying to establish validity of an unreliable test.
It’s possible to have a reliable test that is not valid (has no meaning).
Logically impossible to produce evidence of validity for an unreliable test.
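The CSEPT point can be made concrete with a result from classical test theory, brought in here as an assumption since the slides do not state it: a validity coefficient cannot exceed the square root of the product of the reliabilities of the test and the criterion. The function name and the reliability values are illustrative.

```python
from math import sqrt

def max_validity(reliability_test, reliability_criterion=1.0):
    """Upper bound on the test-criterion correlation under classical test theory."""
    return sqrt(reliability_test * reliability_criterion)

# Hypothetical reliabilities, with a perfectly reliable criterion assumed
ceiling_unreliable = max_validity(0.25)   # low-reliability test
ceiling_reliable = max_validity(0.81)     # high-reliability test
ceiling_none = max_validity(0.0)          # completely unreliable test

print(ceiling_unreliable, round(ceiling_reliable, 2), ceiling_none)
```

A test with reliability .25 cannot correlate above .50 with any criterion, and a test with zero reliability cannot correlate with anything at all.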
44
Validity & Reliability: Borsboom
Borsboom et al.: what does it mean to say that a test is reliable but not valid?
What is it a test of? It isn’t a test at all, just a collection of items
45
Blanton & Jaccard – arbitrary metrics
We observe a behavior in order to learn about the underlying psychological characteristic
A person’s test score represents their standing on that underlying dimension
Such scores form an arbitrary metric
That is, we do not know how the observed scores are related to the true scores on the underlying dimension
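A minimal sketch of the idea, assuming two hypothetical tests that map the same underlying standing onto observed scores in different ways. The scoring functions, the 0 to 6 dimension with a neutral point at 3, and the two people's standings are all assumptions chosen for illustration. In practice these functions are exactly what we do not know.

```python
def score_on_test_1(standing):
    # Hypothetical Test 1: reports the underlying standing unchanged
    return standing

def score_on_test_2(standing):
    # Hypothetical Test 2: compresses the dimension and shifts it upward
    return 0.5 * standing + 3.0

NEUTRAL = 3.0                   # assumed neutral point on the underlying dimension
person_a, person_b = 1.0, 5.0   # assumed true standings

a_scores = (score_on_test_1(person_a), score_on_test_2(person_a))
b_scores = (score_on_test_1(person_b), score_on_test_2(person_b))
print(a_scores, b_scores)
```

Person A is below neutral on the underlying dimension, yet scores 3.5 on Test 2. The gap between A and B is 4 points on Test 1 but only 2 points on Test 2. Without knowing the scoring function, neither a score nor a difference between scores can be read as a position on the underlying dimension.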
46
[Figure: Person A and Person B located on the underlying dimension (axis from 0 to 6, with a Neutral point) and on the scales of Test 1 and Test 2. Adapted from Blanton & Jaccard (2006), Figure 1, p. 29.]
47
Arbitrary metrics – the IAT
Implicit Association Test (IAT) – claimed to diagnose implicit attitudinal preferences – or racist attitudes
IAT authors say you may have prejudices you don’t know you have.
Are these claims true?
48
Arbitrary metrics – the IAT
Task: categorize stimuli using 2 pairs of categories
2 buttons to press, 2 assignments of categories to buttons, used in sequence
49
Arbitrary metrics – the IAT
Assignment pattern A
Button 1 – press if stimulus refers to the category White or the category Pleasant
Button 2 – press if stimulus refers to the category Black or the category Unpleasant
Assignment pattern B
Button 1 – press if stimulus refers to the category White or the category Unpleasant
Button 2 – press if stimulus refers to the category Black or the category Pleasant
50
Arbitrary metrics – the IAT
IAT authors claim that if responses are faster to Pattern A than to Pattern B, that indicates a “preference” for Whites over Blacks – in other words, a racist attitude
IAT authors also give test-takers feedback about how strong their preferences are, based on how much faster their responses are to Pattern A than to Pattern B
This is inappropriate
51
Arbitrary metrics – the IAT
Blanton & Jaccard: The IAT does not tell us about racist attitudes
IAT authors take a dimension which is non-arbitrary when used by physicists – time – and use it in an arbitrary way in psychology
52
Arbitrary metrics – the IAT
The function relating the response dimension (time) to the underlying dimension (attitudes) is unknown
Zero on the (Pattern A – Pattern B) difference may not be zero on the underlying attitude preference dimension
There are alternative models of how that (Pattern A – Pattern B) difference could arise
53
Review
CSEPT:
1. Validity is a characteristic of evidence, not of tests.
2. Valid evidence supports conclusions drawn using test results
3. Validity is determined by social consequences of test use
Borsboom et al.
1. Validity is not a methodological issue, but a substantive (theoretical) issue
2. A test of an attribute is valid if (a) the attribute exists, and (b) variation in the attribute causes variation in test scores
54
Review
CSEPT:
4. Validity can be established in three ways, though boundaries between them are fuzzy:
A. Content-related evidence
B. Criterion-related evidence
C. Construct-related evidence
Borsboom et al.:
3. It’s all the same validity: a test is valid if it measures what you think it measures
4. Validity is not mysterious
55
Review
CSEPT
5. Content-related evidence: do test items represent whole domain of interest?
6. Criterion-related evidence: do test scores relate to a criterion either now (concurrent) or in the future (predictive)?
Borsboom et al.
5. These questions are properly part of the process of creating a test
56
Review
CSEPT
7. Construct-related evidence is obtained when we develop a psychological construct and the way to measure it at the same time.
8. A test can be reliable but not valid. A test cannot be valid if not reliable.
Borsboom et al.
6. A test must be valid for a reliability estimate to have any meaning
57
Review
Blanton & Jaccard (2006) warn against over-interpretation of scores which are based on an arbitrary metric
For an arbitrary metric, we have no idea how the test scores are actually related to the underlying dimension