What's wrong with the old approach?
Classical test theory
Sample dependent
Parallel test form issue
Comparing examinee scores
Reliability
No predictability
Error is the same for everybody
Slide 3
So, what is IRT?
A family of mathematical models that describe the interaction between examinees and test items
Examinee performance can be predicted in terms of the underlying trait
Provides a means for estimating scores for people and characteristics of items
Common framework for describing people and items
Slide 4
Some Terminology
Ability
We use this as a generic term to describe the thing that we are trying to measure
The thing can be any old thing and we need not concern ourselves with labeling the thing, but examples of the thing include:
Reading ability
Math performance
Depression
Slide 5
The ogive
Naturally occurring form that describes something about people
Used throughout science, engineering, and the social sciences
Also used in architecture, carpentry, photography, art, and so forth
Slide 6
The ogive
Slide 7
Slide 8
The Item Characteristic Curve (ICC)
This function really does everything:
Scales items & people onto a common metric
Helps in standard setting
Foundation of equating
Some meaning in terms of student ability
Slide 9
The ICC
Any line in a Cartesian system can be defined by a formula
The simplest formula for the ogive is the logistic function:
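One form of the logistic function that is consistent with the worked example on Slides 12 and 13 (an ability of 1.0, b = 0.125, p of about 0.705) is the one-parameter logistic model. The subscripts i (item) and j (person) follow Slide 10. The notation X_{ij} = 1 for a correct response and the absence of a scaling constant are assumptions made here.

```latex
P(X_{ij} = 1 \mid \theta_j, b_i) = \frac{e^{\theta_j - b_i}}{1 + e^{\theta_j - b_i}}
```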
Slide 10
The ICC
Where b is the item parameter, and θ is the person parameter
The equation represents the probability of responding correctly to item i given the ability of person j.
Slide 11
b is the inflection point
Item i: b_i = 0.125
Slide 12
We can now use the item parameter to calculate p
Let's assume we have a student with θ = 1.0, and we have our b = 0.125
Then we can simply plug the numbers into our formula
Slide 13
Using the item parameters to calculate p
θ = 1.00
p = 0.705
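The plug-in step can be checked with a few lines of Python. This is a minimal sketch that assumes the one-parameter logistic form P = 1 / (1 + exp(-(θ - b))) with no scaling constant; the function name `icc_1pl` is made up for illustration.

```python
import math


def icc_1pl(theta, b):
    """Probability of a correct response under an assumed one-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Values taken from the slides: ability of 1.0, item parameter b = 0.125.
p = icc_1pl(theta=1.0, b=0.125)
print(round(p, 3))
```

The result is about 0.706, which agrees with the 0.705 shown on the slide up to rounding.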
Slide 14
Wait a minute...
What do you mean a student with an ability of 1.0??
Does an ability of 0.0 mean that a student has NO ability?
What if my student has a reading ability estimate of -1.2?
Slide 15
The ability scale
Ability is on an arbitrary scale that just so happens to be centered around 0.0
We use arbitrary scales all the time:
Fahrenheit
Celsius
Decibels
DJIA
Slide 16
Scaled Scores
Although ability estimates are centered around zero, reported scores are not
However, scaled scores are typically a linear transformation of ability estimates
Example of a linear transformation: (Ability × Slope) + Intercept
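As an illustration of the linear transformation, the sketch below converts a few ability estimates to scaled scores. The slope of 50 and intercept of 500 are hypothetical values chosen only for the example; a real testing program sets its own.

```python
def scaled_score(ability, slope=50.0, intercept=500.0):
    """Linear transformation: (Ability x Slope) + Intercept.

    The default slope and intercept are hypothetical, not from any real program.
    """
    return ability * slope + intercept


# Ability values mentioned on the earlier slides, including a negative one.
for ability in (-1.2, 0.0, 1.0):
    print(ability, round(scaled_score(ability), 1))
```

With these made-up constants, an ability of -1.2 maps to a scaled score of about 440, so the reported number is positive even though the ability estimate is not.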
Slide 17
The need for scaled scores
About half the kids will have negative ability estimates
Slide 18
The Two Scales of Measurement
Reporting Scale (Scaled Scores)
Student/parent level report
School/district report
Cross year comparisons
Performance level categorization
The Psychometric Scale (θ)
IRT item and person parameters
Equating
Standard setting
Slide 19
Unfortunately, life can get a lot worse
Items vary from one another in a variety of ways:
Difficulty
Discrimination
Guessing
Item type (MC vs. CR)
Slide 20
Items can vary in terms of difficulty
[Figure labels: Ability of a student; Easier item; Harder item]
Slide 21
Items can vary in terms of discrimination
Discrimination is reflected by the pitch (steepness) of the ICC
Thus, we allow the ICCs to vary in terms of their slope
Slide 22
Good item discrimination
[Figure labels: 2 close ability levels; Noticeable difference in p]
Slide 23
Poor item discrimination
[Figure labels: Same 2 ability levels; Smaller difference in p]
Slide 24
Guessing
At low ability, this item's ICC asymptotically approaches 0.25
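Difficulty, discrimination, and guessing can be combined in a single curve. The sketch below uses the common three-parameter logistic form as an assumption; the parameter names `a` (discrimination), `b` (difficulty), and `c` (guessing) and all numeric values are illustrative and do not come from the slides, apart from the 0.25 lower asymptote.

```python
import math


def icc_3pl(theta, a, b, c):
    """Assumed three-parameter logistic ICC: lower asymptote c, slope a, location b."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Guessing: for a very low ability the probability stays near c = 0.25.
print(round(icc_3pl(theta=-10.0, a=1.0, b=0.0, c=0.25), 3))


def p_difference(a, low=-0.25, high=0.25):
    """Difference in p between two close ability levels for a given slope."""
    return icc_3pl(high, a, 0.0, 0.0) - icc_3pl(low, a, 0.0, 0.0)


# Discrimination: the steeper item separates the same two ability levels more clearly.
steep = p_difference(a=2.0)
flat = p_difference(a=0.5)
print(round(steep, 3), round(flat, 3))
```

Setting `c = 0` and `a = 1` reduces this to the one-parameter curve used on the earlier slides.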
Slide 25
Constructed Response Items
Slide 26
Items and people
Interact in a variety of ways
We can use IRT to show that there exists a nice little s-shaped curve that shows this interaction
As ability increases, the probability of a correct response increases
Slide 27
Advantages of IRT
Because of the stochastic nature of IRT, there are many statistical principles we can take advantage of
A test is a sum of its parts
Slide 28
The test characteristic curve
A test is made up of many items
The TCC can be used to summarize across all of our items
The TCC is simply the summation of ICCs along our ability continuum
For any ability level, we can use the TCC to estimate the overall test score for an examinee
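The summation described above can be sketched directly. The example assumes one-parameter logistic items, and the five item difficulties are hypothetical values chosen for illustration.

```python
import math


def icc_1pl(theta, b):
    """Assumed one-parameter logistic ICC."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def tcc(theta, difficulties):
    """Test characteristic curve: the sum of the item ICCs at a given ability."""
    return sum(icc_1pl(theta, b) for b in difficulties)


# Hypothetical five-item test.
difficulties = [-1.0, -0.5, 0.125, 0.5, 1.0]

# Expected total score at a few ability levels.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(tcc(theta, difficulties), 2))
```

The expected score always lies between 0 and the number of items, and it rises as ability rises.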
Slide 29
Several ICCs are on a test
Slide 30
The test characteristic curve
Slide 31
The test characteristic curve
From an observed test score (i.e., a student's total test score) we can estimate ability
The TCC is used in standard setting to establish performance levels
The TCC can also be used to equate tests from one year to the next
Slide 32
Estimating Ability
Total score = 3
Ability ≈ 0.175
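Reading an ability off the TCC amounts to finding the ability at which the expected total score equals the observed one. Below is a minimal sketch using bisection, again assuming one-parameter logistic items. The item difficulties are hypothetical, so the result will not reproduce the 0.175 on the slide, which depends on the actual items in that test.

```python
import math


def icc_1pl(theta, b):
    """Assumed one-parameter logistic ICC."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def tcc(theta, difficulties):
    """Sum of the item ICCs at a given ability."""
    return sum(icc_1pl(theta, b) for b in difficulties)


def estimate_ability(total_score, difficulties, lo=-6.0, hi=6.0, tol=1e-6):
    """Find the ability where the TCC equals the observed total score.

    Bisection works because the TCC is increasing in ability. The search
    assumes the answer lies between lo and hi.
    """
    if not 0 < total_score < len(difficulties):
        raise ValueError("total score must lie strictly between 0 and the number of items")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid, difficulties) < total_score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


difficulties = [-1.0, -0.5, 0.125, 0.5, 1.0]  # hypothetical five-item test
theta_hat = estimate_ability(3, difficulties)
print(round(theta_hat, 3))
```

Scores of zero or a perfect score have no finite solution on this curve, which is why the sketch rejects them.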
Slide 33
Psychometric Information
The amount that an item contributes to estimating ability
Items that are close to a person's ability provide more information than items that are far away
An item is most informative around the point of inflection
Slide 34
Item Information
Item is most informative here because this is where we can discriminate among nearby θ values
Slide 35
Item Information
Item is much less informative at points along θ where there is little slope in the ICC
Slide 36
Test Information
Test information is the sum of item information
Tests are also most informative where the slope of the TCC is the greatest
Information (like everything else in IRT) is a function of ability
Test information really is test precision
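For a concrete view of information as a function of ability, the sketch below assumes one-parameter logistic items, for which item information at a given ability is p × (1 - p). The function names and item difficulties are hypothetical.

```python
import math


def icc_1pl(theta, b):
    """Assumed one-parameter logistic ICC."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Item information for the assumed one-parameter model: p * (1 - p)."""
    p = icc_1pl(theta, b)
    return p * (1.0 - p)


def total_information(theta, difficulties):
    """Test information: the sum of item information at a given ability."""
    return sum(item_information(theta, b) for b in difficulties)


# An item is most informative at its inflection point (ability equal to b).
print(item_information(0.125, 0.125), round(item_information(2.0, 0.125), 3))

# Hypothetical five-item test evaluated along the ability continuum.
difficulties = [-1.0, -0.5, 0.125, 0.5, 1.0]
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(total_information(theta, difficulties), 3))
```

Under this model a single item never contributes more than 0.25, so precision at a given ability comes from having many items located near it.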
Slide 37
Let's start with a TCC
Slide 38
Information Functions
We can evaluate information at a given cutpoint
[Figure label: BP/P]
Slide 39
Information and CTT
CTT has reliability and of course the famous coefficient α
IRT has the test information function
Test quality can be evaluated conditionally along the performance continuum
In IRT, information is, conveniently, reciprocally related to standard error
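The reciprocal relationship is usually written as follows, where I(θ) is the test information at ability θ. The hat notation for the ability estimate is introduced here and is not from the slides.

```latex
SE(\hat{\theta}) = \frac{1}{\sqrt{I(\theta)}}
\qquad\Longleftrightarrow\qquad
I(\theta) = \frac{1}{SE(\hat{\theta})^{2}}
```

For example, a standard error of 0.25 corresponds to information of 1 / 0.25² = 16.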
Slide 40
Standard Error as a function of ability
[Figure labels: ability 0.175; SE = 0.25]
Slide 41
Standard Error of Ability
Total score = 3
Ability ≈ 0.175
Slide 42
Standard Error of Ability
Total score = 3
Ability ≈ 0.175
Confidence region of ability estimate
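One way to put numbers on such a region is a normal-approximation interval around the ability estimate. The slides do not say which confidence level the figure uses, so the 95% level (z = 1.96) below is an assumption; the estimate of 0.175 and SE of 0.25 are the values shown on the slides.

```python
def confidence_region(theta_hat, se, z=1.96):
    """Return (lower, upper) bounds of theta_hat +/- z * se.

    z = 1.96 is an assumed 95% normal-approximation multiplier.
    """
    half_width = z * se
    return theta_hat - half_width, theta_hat + half_width


lower, upper = confidence_region(theta_hat=0.175, se=0.25)
print(f"{lower:.3f} to {upper:.3f}")
```

Under that assumption the region runs from roughly -0.315 to 0.665 on the ability scale.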
Slide 43
Item Response Theory
A vast kingdom of equations, and a dizzying array of complex concepts
Ultimately, we use IRT to explain the interaction between students and test items
The cornerstone of IRT is the ICC, which depicts that as ability increases, the chances of getting an item correct increase
Slide 44
Item Response Theory
Everything in IRT can be studied conditionally along the performance continuum
The CTT concept of reliability is what we call test information, and we can think of this as being a function of test precision
SE is related to information and can also be studied along θ
Slide 45
The Utility of Item Response Theory
Can be used to estimate characteristics of items and people
Can be used in the test development process to maximize information (minimize SE) at critical points along θ
Can even be used for test administration purposes