THE RELATIONSHIP BETWEEN ITEM DIFFICULTY AND
DISCRIMINATION INDICES IN MULTIPLE-CHOICE TESTS IN A
PHYSICAL SCIENCE COURSE
by
Angelica Hotiu
A thesis submitted to the faculty of the Charles Schmidt College of Science in
partial fulfillment of the requirements for the Degree of
Master of Science
Florida Atlantic University
Boca Raton, Florida
December 2006
ABSTRACT
Author: Angelica Hotiu
Title: The relationship between item difficulty and discrimination
indices in multiple-choice tests in a physical science course
Institution: Florida Atlantic University
Thesis advisor: Dr. Robin Jordan
Degree: Master of Science
Year: 2006
We have developed a method of quantifying multiple-choice test items in an
introductory physical science course in terms of the various tasks required to solve
the problem. We assign a numerical level of difficulty to each task so that any
question can be assigned a degree of difficulty, which is the sum of the individual
levels of difficulty associated with each step. Using the questions and results from the
tests we have investigated the relationship between the degree of difficulty of each
question and the corresponding discrimination index. Our results indicate that as the
degree of difficulty increases so does the capability of the item to discriminate
between students with different abilities. There is a maximum degree of difficulty
beyond which the discrimination starts to decrease. At that point, test items become
too difficult. Thus, it should be possible in the future to design items that will provide
optimum discrimination.
ACKNOWLEDGEMENTS
First of all I would like to express my sincere gratitude and appreciation to Dr. Robin
Jordan for his effort, guidance, devotion and advice during the entire study and the
preparation of the thesis.
Also many thanks are extended to all members of the faculty, staff and graduate
students in the Department of Physics at FAU. I am also very grateful to Dr. Warner
Miller, Chair of the Department of Physics, and the members of my thesis committee
for all their advice and assistance.
Thanks also go to Dr. Fernando Medina for giving me the opportunity to study in the
Physics Department at Florida Atlantic University.
Finally, I want to express my gratitude to my lovely family: to my parents, and
especially to my mother for taking care of my son during my studies, and to my husband,
Laurentiu, who supported and encouraged me during these studies. Thank you for all
that you have done for me.
Contents
1. Introduction
2. Theory
   2.1 Anatomy of multiple-choice questions
   2.2 Bloom's taxonomy
   2.3 The cognitive domain
   2.4 The discrimination index
   2.5 The degree of difficulty
3. Results of the Research
   3.1 Description
   3.2 Analysis of the questions for which D > 0.5
4. Concluding remarks
5. References
Chapter 1
Introduction
The classroom test is one of the most important parts of the teaching and learning
process. There are several different types of tests – those with short essay answers,
multiple-choice answers, etc. – and the type used will depend on a number of factors
such as the instructional objectives, the class size, the type of instruction, the type of
subject matter and the type of feedback required by the instructor. However, the two
most important characteristics of any achievement test are its content validity and
reliability. A test's validity is determined by how well it samples the range of
knowledge, skills, and abilities that students were supposed to acquire in the period
covered by the test. The reliability of a test depends upon grading consistency and
discrimination between students of differing performance levels.
There are two major types of multiple-choice tests, criterion-referenced tests (CRTs)
and norm-referenced tests (NRTs). In criterion-referenced testing, the goal is usually
to make a decision about whether or not an individual can demonstrate mastery in an
area of content and competencies; examples include the written part of a driving test,
certification and licensure exams. In norm-referenced testing, the goal is usually to
rank the entire set of individuals in order to make comparisons of their performances
relative to one another. In this study, we will be analyzing students’ performances on
multiple-choice tests administered during a physical science course; such tests are
NRTs.
Although multiple-choice tests are widely used, many instructors do not hold them in
high regard; some believe, for example, that multiple-choice questions are really
“multiple-guess” items, or that multiple-choice questions are only capable of testing
factual information and so are ill suited for testing higher-order cognitive skills.
However, it is now accepted that well-constructed multiple-choice items can test
many of the same cognitive skills that essay tests do. Moreover, they can be used to
diagnose student difficulties if the incorrect options are designed to reveal common
misconceptions, and they can provide a more comprehensive sampling of the subject
material because more questions can be asked. In addition, they are often more valid
and reliable than essay tests because (a) they sample material more broadly; (b)
discrimination between performance levels is easier to determine; and (c) scoring
consistency is virtually guaranteed when carried out by machine.
The validity of multiple-choice tests depends upon a systematic selection of items
with regard to both content and level of learning. Although most teachers try to select
items that sample the range of content covered in class, they often fail to consider the
level or degree of difficulty of the questions they use. Moreover, since it is easy to
develop items that require only recognition or recall of information, instructors tend
to rely heavily on those types of questions. Unfortunately, multiple-choice tests in the
instructor’s manuals that accompany textbooks are often composed exclusively of
recognition or recall items.
Psychologists have elaborate systems for classifying different cognitive levels, but for
most test planning purposes, a simple three-level scheme is sufficient to ensure that
the range of knowledge, skills, and abilities is tested appropriately. The three
categories are recall, application, and evaluation/synthesis, and they are derived from
the six levels of “Bloom’s taxonomy” of cognitive objectives [1.1]. At the lowest
level, recall, students remember specific facts, terminology, principles, or theories,
e.g., stating Newton’s 2nd Law. At the median level, application, students use their
knowledge to solve a problem or analyze a situation, e.g., using Newton’s 2nd Law to
determine the motion of an object. The highest level, evaluation and synthesis,
requires students to derive hypotheses from data, or put the parts of a problem
together, or exercise informed judgment. By analyzing the course material in terms
of these three categories, multiple-choice tests can be constructed that sample both
the range of content and the various cognitive levels at which the students must
operate. Performing this analysis is an essential step in designing multiple-choice
tests that have high validity and reliability.
The purpose of this study is not to provide a comprehensive guide for constructing
multiple-choice items; there are several excellent articles available that provide such
information [1.2, 1.3]. Our main aim is to investigate and quantify two of the most
important factors in creating valid and discriminating multiple-choice tests, namely,
the degree of difficulty and the discrimination index - we define these quantities
below – using the results of actual tests. We have been unable to find any previously
published, quantitative data on such a study, except for a private communication from
Hostetter and Haky who made a similar study of multiple-choice test items in
introductory General Chemistry [1.4]. Accordingly, we have analyzed the results of
six multiple-choice tests (labeled 1A, 1B, 2A, 2B, 3A and 3B) given in a Physical
Science class (PSC2121), at Florida Atlantic University in the Fall 2004 semester.
The number of students who took each test was ~50. The numbers 1, 2 and 3
represent the number of the test during the semester (there were five tests in total),
and A and B represent two different versions given to different groups of students but
covering the same material and designed to be as “similar” as possible. Physical
science is a general science course for non-science majors, covering topics in physics,
chemistry and earth science. However, in this study we restricted ourselves to
questions on topics that were within the physics discipline; the subject material
covered by the tests is shown in Table 1.1 and the number of students taking each test
and the average scores are shown in Table 1.2. The tests were compiled by Dr. Robin
Jordan, Physics Department, Florida Atlantic University.
| Chapter | Topics |
|---|---|
| Physical science and measurement | Why standardization?; the metric system; SI units |
| Description of motion | Vector analysis; resolution of vectors; speed and velocity; accelerated motion; a theory of motion; Galileo and the experimental motion |
| Planetary motion | Ptolemy's system; the Copernican revolution; "Gateway to the skies": Tycho Brahe; how planets move: Johannes Kepler; Galileo's discoveries with the telescope |
| Laws of motion and gravitation | Isaac Newton's "marvelous year"; the Principia; Newton's first law of motion: inertia; Newton's 2nd law of motion: force; applications of Newton's 2nd law; Newton's 3rd law of motion: action and reaction; the "center-seeking" force |
| Heat – a form of energy | Temperature measurement; temperature scales; the lowest temperature; kinetic theory and the molecular interpretation of temperature; temperature and heat; specific heat; calorimetry; change of state; thermal expansion |
| Energy conservation | Mechanical equivalent of heat; the 1st law of thermodynamics; the 2nd law of thermodynamics |
| Wave motion and sound | Transverse waves; longitudinal waves; reflection of waves; refraction of waves; superposition of waves: interference; standing waves; vibrating air columns |
| Light and other electromagnetic waves | The velocity of light; electromagnetic waves; the electromagnetic spectrum: radio, TV, microwaves; simple lenses; the optics of the eye |
| Electricity and magnetism | The amber phenomenon; conductors, semiconductors, insulators; forces between electric charges; electric current; electric circuits; electric power and energy |
| The quantum theory of radiation and matter | Spectroscopy; the electron; X-rays; radioactivity; Planck's quantum hypothesis; Einstein's photoelectric equation |

Table 1.1. The subject material (chapters and topics) covered by the tests.
| Test | Number of questions | Number of respondents | Scoring range (%) | Average score (%) |
|---|---|---|---|---|
| 1A | 30 | 52 | 26.7 – 86.7 | 56.9 |
| 1B | 30 | 53 | 16.7 – 86.7 | 53.7 |
| 2A | 30* | 52 | 16.7 – 90.0 | 55.2 |
| 2B | 31 | 53 | 19.4 – 77.4 | 52.6 |
| 3A | 30 | 48 | 30.0 – 86.7 | 55.6 |
| 3B | 30 | 48 | 26.7 – 83.3 | 57.0 |

Table 1.2. Details of the tests used in this study. The tests were part of the PSC2121 course given in the Fall 2004 semester. *One question was omitted from the analysis due to a technical problem.
Each question on a multiple-choice test has a discrimination index that determines
how well each question discriminates between students in the top 27% of the class on
total test score and those in the lower 27% of the class on total test score. As we
explain in more detail below, the discrimination index can range from +1 to −1; a
value of +1 means that all of the “high scorers” answered the question correctly and
all of the “low scorers” answered the question incorrectly. A value of 0 means that
the same number of high scorers and low scorers obtained the correct answer and so
the question does not discriminate between the two sub-groups of students. In this
study, we analyzed the questions from all tests for which the discrimination index was
>0.5. To determine the degree of difficulty, we identified the various tasks or
operations, such as memorization and identification, application, unit conversion,
algebraic manipulation, use of vectors, etc., required to answer each question [1.5].
We assigned a numerical level of difficulty to each task, based on the range of
knowledge, skill, and ability required, so that any question involving a number of
different steps has an overall degree of difficulty, which is the sum of the individual
levels of difficulty associated with each of the required steps.
The results indicate a definite correlation between the degree of difficulty and the
discrimination index. For example, as the degree of difficulty increases so does the
discrimination index, which is not unexpected. However, there is a maximum degree
of difficulty beyond which the discrimination index starts to fall off. At that point,
the test items become too difficult for both the high scorers and the low scorers to
answer, so that they no longer discriminate effectively. Clearly, there are two
extremes: questions that are too easy, i.e., with a small difficulty value, and those that
are too hard, i.e., with a high difficulty value. Such questions are not effective if the
purpose of a test is to produce a spread of scores, reflecting differences in student
achievement and abilities.
As part of our study, we have been able to identify the common tasks that are
involved in the most discriminating questions. Our results suggest that for optimum
discrimination, i.e., questions resulting in a discrimination index > 0.5, the degree of
difficulty lies within a reasonably well-defined range for all the tests analyzed. So, in
principle, by adopting our assigned levels of difficulty for each task or operation, one
can actually design questions with the required level of difficulty and range of
cognitive levels that will result in multiple-choice tests that truly discriminate
between students of different abilities.
Our study is very similar to the analysis of multiple-choice test items in a General
Chemistry I course, carried out by Hostetter and Haky [1.4]. Indeed, it was their
study that prompted ours. Altogether, they used the results from approximately 300
students; a somewhat larger sampling group compared with our study. Our results –
based on an analysis of physics topics - indicated a similar correlation between the
degree of difficulty and discrimination; namely, as the difficulty increased the
average discrimination increased, but there was a critical level of difficulty beyond
which the discrimination decreased.
Chapter 2
Theory
2.1. Anatomy of multiple-choice questions
A standard multiple-choice item consists of two basic parts:
• A problem (the stem)
• A list of suggested solutions (alternatives)
Typically, multiple-choice items present the stem as a complete question or as an
incomplete statement, and the list of alternatives contains one correct or best
alternative (the answer) and a number of incorrect or inferior alternatives (distractors).
For example:
Stem in complete question form:

What is the weight of an object?
• The force with which it is attracted to the earth
• The amount of matter that it contains
• A measure of its inertia
• The same quantity as its mass but expressed in different units

Stem as an incomplete statement:

The weight of an object is:
• The force with which it is attracted to the earth
• The amount of matter that it contains
• A measure of its inertia
• The same quantity as its mass but expressed in different units
Students are directed to select either the correct answer or the best answer from the
list of options provided. In the correct answer form, the answer is correct beyond
question while the distractors are definitely incorrect. In the best answer version,
more than one option may be appropriate in varying degrees. The purpose of the
distractors is to appear as plausible solutions to the problem for those students who
have not achieved the required learning examined by the question. On the other hand,
the distractors will appear as implausible solutions for those students who achieved
the required learning; only the (required correct) answer is plausible for those
students. As we mentioned in the Introduction, multiple-choice items can be
designed to test not only the lower levels of the learning process, i.e., recall, but also
the higher-level skills of comprehension, application, and analysis, all of which may
be part of the required educational objectives of the class.
2.2 Bloom’s taxonomy
Starting in 1948, a committee of college examiners, led by Benjamin Bloom, began the task of
classifying educational goals and objectives. The intent was to develop a classification
system for three domains: the cognitive, the affective, and the psychomotor:
• Cognitive: mental skills (Knowledge)
• Affective: growth in feelings or emotional areas (Attitude)
• Psychomotor: manual or physical skills (Skills)
They completed their study on the cognitive domain in 1956 and the resulting
classification system is now commonly referred to as Bloom's Taxonomy of the
Cognitive Domain [2.1]. Work on the affective and psychomotor domains was
completed in 1972-3 [2.2, 2.3]. The divisions between different classes of skills or
behavior are not absolute and other systems or hierarchies have been devised in the
educational and training world. However, Bloom's taxonomy is the most easily
understood and is arguably the one most used today.
The major idea of the taxonomy of the cognitive domain is that what educators want
students to “know”, i.e., the educational objectives, can be arranged in a hierarchy,
starting from the simplest behavior or skill to the most complex. As a result, it can also
provide a useful structure within which to categorize and analyze test items.
Instructors characteristically ask questions within particular skill levels; for example,
Bloom found that over 95% of the test questions students encounter require them to
think only at the lowest possible level, i.e., the recall of information. However,
education research shows that students remember more, and can apply their
knowledge more effectively, when they have learned to handle the topic at the higher
levels of the taxonomy, where more complex skills are required [2.4, 2.5]. Clearly,
students can "know" about a topic or subject at different levels. So, it is plain there
must be a close link between the taxonomy and test questions, if the latter are
constructed with the aim of checking the skill level of students, and discriminating
between students of different abilities.
2.3 The cognitive domain
The cognitive domain involves knowledge and the development of intellectual skills.
This includes the recall or recognition of specific facts, procedural patterns, and
concepts that serve in the development of intellectual abilities and skills. There are six
major categories, which are shown in Tables 2.1 to 2.3, starting from the simplest
behavior to the most complex. The categories can be thought of as degrees or
hierarchies of difficulty.
| Competence | Skills demonstrated |
|---|---|
| 1. Knowledge | Observation and recall of information; knowledge of dates, events, places; knowledge of major ideas; mastery of subject matter. Keywords: list, define, tell, describe, identify, show, label, collect, examine, tabulate, quote, name, who, when, where, etc. |
| 2. Comprehension | Understanding information; grasp meaning; translate knowledge into new context; interpret facts, compare, contrast; order, group, infer causes; predict consequences. Keywords: summarize, describe, interpret, contrast, predict, associate, distinguish, estimate, differentiate, discuss, extend |

Table 2.1. The lowest levels of intellectual behaviors within the cognitive domain identified by Bloom.
| Competence | Skills demonstrated |
|---|---|
| 3. Application | Use information; use methods, concepts, theories in new situations; solve problems using required skills or knowledge. Keywords: apply, demonstrate, calculate, complete, illustrate, show, solve, examine, modify, relate, change, classify, experiment, discover |
| 4. Analysis | Seeing patterns; organization of parts; recognition of hidden meanings; identification of components. Keywords: analyze, separate, order, explain, connect, classify, arrange, divide, compare, select, infer |

Table 2.2. The median levels of intellectual behaviors within the cognitive domain identified by Bloom.
| Competence | Skills demonstrated |
|---|---|
| 5. Synthesis | Use old ideas to create new ones; generalize from given facts; relate knowledge from several areas; predict, draw conclusions. Keywords: combine, integrate, modify, rearrange, substitute, plan, create, design, invent, what if?, compose, formulate, prepare, generalize, rewrite |
| 6. Evaluation | Compare and discriminate between ideas; assess value of theories, presentations; make choices based on reasoned argument; verify value of evidence; recognize subjectivity. Keywords: assess, decide, rank, grade, test, measure, recommend, convince, select, judge, explain, discriminate, support, conclude, compare, summarize |

Table 2.3. The highest levels of intellectual behaviors within the cognitive domain identified by Bloom.
2.4 The Discrimination Index
The discrimination index is a useful measure of item quality whenever the purpose of
a test is to produce a spread of scores, reflecting differences in student achievement,
so that distinctions may be made among the performances of respondents. It
measures the extent to which item responses discriminate between individuals who
have a higher overall score on a test and those who get a lower overall score. The
discrimination index is determined automatically by the FAU computer-based test
scoring and analysis system [2.6], in the following way. The distribution of students
is treated as normal and so the students’ scores are arranged into two sub-groups [2.7]:

• the top 27%; the upper group (U), and
• the bottom 27%; the lower group (L).

The discrimination index for a particular question is defined in terms of the proportion
of the students in the top group who got it correct, p_U, and the proportion of the
students in the bottom group who got it correct, p_L:

D = p_U − p_L.
Note that −1 ≤ D ≤ +1. When D = 0, i.e., p_U = p_L, there is no discrimination; when
D = +1, i.e., p_U = 1 and p_L = 0, there is perfect discrimination; and when D = −1,
there is inverse discrimination, which is most likely caused by a mis-keyed item.
Thus, discrimination indices ≈ 0 are found on items so difficult that almost everyone
gets them wrong and on items so easy that almost everyone gets them right.
For instructional purposes it is important to know the content areas and type of items
that most students get right or wrong. As mentioned earlier, when multiple-choice
tests are graded using the FAU computer-based test scoring and analysis system,
values of the discrimination indices are obtained automatically [2.6].
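
For readers who wish to reproduce the calculation outside the FAU system, the index can also be computed directly from the raw responses. The following Python sketch is illustrative only; the function name and the assumption that item responses are stored as 0/1 arrays are ours, not part of the FAU scoring software.

```python
import numpy as np

def discrimination_index(item_correct, total_scores, fraction=0.27):
    """D = p_U - p_L for a single item.

    item_correct : array of 0/1 flags, one per student, for this item
    total_scores : array of each student's total test score
    fraction     : size of the upper and lower groups (27% by convention)
    """
    item_correct = np.asarray(item_correct)
    total_scores = np.asarray(total_scores)
    k = max(1, int(round(fraction * len(total_scores))))  # students per sub-group
    order = np.argsort(total_scores)                      # lowest scores first
    lower, upper = order[:k], order[-k:]                  # bottom 27% and top 27%
    p_lower = item_correct[lower].mean()                  # proportion correct, lower group
    p_upper = item_correct[upper].mean()                  # proportion correct, upper group
    return p_upper - p_lower                              # -1 <= D <= +1
```

An item answered correctly by most of the upper group and by few of the lower group returns a value close to +1, consistent with the definition above.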
2.5 The degree of difficulty
In order to carry out this study, we need a quantitative measure of the “difficulty” of a
question. The difficulty of a question is normally determined from the proportion of
the total group selecting the correct answer to that question. The following formula
may be used to calculate the difficulty factor (sometimes called the p-value):

p = (c / n) × 100,

where c is the number of students who selected the correct answer and n is the total
number of respondents. A value of p = 100% indicates that all the students selected
the correct answer and so that item is very “easy”. A value of 0 indicates that none of
the students selected the correct answer and so that item is very “difficult”. So, this
ratio is one measure of how difficult the question was to answer.
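
Expressed in the same illustrative Python style as above, the difficulty factor is a one-line calculation applied to the 0/1 responses for a single item:

```python
def difficulty_factor(item_correct):
    """Difficulty factor p = (c / n) * 100 for one item, where c is the number
    of correct responses and n is the total number of respondents."""
    c = sum(item_correct)    # number of students who chose the correct answer
    n = len(item_correct)    # total number of respondents
    return 100.0 * c / n     # p = 100 means everyone answered correctly

# For example, if 26 of 52 students answered an item correctly, p = 50.0.
```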
The implication is that if the purpose of the test is to test an individual’s mastery of
the material, i.e., as in a criterion-referenced test (CRT), p values of ~90% may be
expected. However, if the emphasis is to obtain a spread of scores between
individuals, as is the case in a norm-referenced test (NRT), then p values over a
broad range can be expected, with the greatest spread if all test items have a difficulty
of 50%. If we plot the difficulty, p, against the corresponding discrimination index
for each question of a test, we observe a definite correlation between the two
quantities, as shown in Figures 2.1 to 2.6 for tests 1, 2 and 3; similar behavior is
observed for all tests. First, as p increases, the discrimination index also increases,
but at a p value between ~40% and ~60%, the discrimination reaches a maximum.
When p ≳ 60%, the discrimination index decreases. It is generally claimed that
items for which 40% to 60% of the group passes are preferred to those that are easier
(p > 60%) or more difficult (p < 40%) [2.6]. In these particular cases, the numbers of
items falling into the range 40% < p < 60% in tests 1A, 1B, 2A, 2B, 3A and 3B are
8/26, 10/26, 8/31, 10/31, 12/30 and 9/30, respectively.
Figure 2.1. The discrimination index versus the difficulty factor, p, for Test 1A.

Figure 2.2. The discrimination index versus the difficulty factor, p, for Test 1B.
Figure 2.3. The discrimination index versus the difficulty factor, p, for Test 2A.

Figure 2.4. The discrimination index versus the difficulty factor, p, for Test 2B.
Figure 2.5. The discrimination index versus the difficulty factor, p, for Test 3A.

Figure 2.6. The discrimination index versus the difficulty factor, p, for Test 3B.
Note that over the range 40% < p < 60%, the discrimination index is ≳ 0.5.
Therefore, we will take D = 0.5 as the desirable minimum value for a “discriminating
item”.
The difficulty factor, as defined above, is a property of the obtained measurements.
However, we require a definition that depends on the content of the question and
reflects the difficulty and complexity of the tasks required to find a solution. Thus, we
seek a quantitative and independent measurement of difficulty.
We have found that it is possible to assign a degree of difficulty to items on a
multiple-choice test based on the knowledge and tasks required to solve the problem.
Basically, all questions can be analyzed in terms of a combination of letters and
numbers. The letters represent the tasks or actions that students must perform in
order to obtain a complete solution to the problem; the numbers indicate the number
of times each task or action is performed. In general terms, Bloom’s taxonomy,
described above, classifies the various tasks and actions, e.g., simple memorization
(recall), unit conversion, solving a system of equations, etc., into a hierarchy. Using
the classification system as a guide, we are able to assign a numerical level of
difficulty to each of these tasks, as shown in Table 2.4.
| Task | Level of difficulty |
|---|---|
| Knowledge and recall (K); Identification (I) | 1 |
| Application (A) | 2 |
| Unit conversion, simple (C3); Simple equation (E) | 3 |
| Unit conversion (C4); Vector analysis (V) | 4 |
| Solving an equation (S5); Derivation (D) | 5 |
| Solving a system of equations (S6) | 6 |

Table 2.4. Numerical level of difficulty associated with each task.
In this way we are able to assign an overall degree of difficulty to each question on a
test as the sum of the individual levels of difficulty encountered in obtaining the
answer. In more detail, the tasks in Table 2.4 are:

Knowledge (K) or recall: a task that simply implies memorization of a
definition or a quantity that must be known in order to answer the question.

Identification (I): a task that requires identification of the process, law or
equation that must be used in order to solve the problem.

Application (A): a task in which knowledge is applied to a problem.

Unit conversions (C3 and C4): tasks in which a unit conversion is carried out in
completing the problem.

Simple equation (E): a task that involves simply inserting numbers into an
equation to obtain a solution.

Vector analysis (V): a task in which vector addition or manipulation of vectors
is required in order to solve the problem.

Derivation (D): a task that requires the derivation or proof of an algebraic
expression.

Solving equations (S5 and S6): tasks that involve the manipulation of one or
more equations before numbers can be input in order to obtain a result.

We provide three examples below.
1. Example of 1K question (Test 1A, Q9):
Velocity is a rate of change of
a) Speed
b) Energy
c) Distance
d) Displacement
In order to answer this question correctly, the student should know the definition of
velocity. The level of difficulty of this question is 1.
2. Example of KI question (Test 1A, Q19):
A skydiver jumps from an airplane. As her velocity of fall increases, neglecting
air resistance, her acceleration
a) Increases
b) Is constant
c) Decreases
In order to answer this question correctly, the student needs to
• Identify the type of motion for a skydiver (uniformly accelerated motion)
• Know that the acceleration is constant during the motion
The difficulty level of this question is 2.
3. Example of 2IAE question (Test 1B, Q18):

What is the speed of an object after 4 s if it falls from rest with an acceleration
of 32 ft/s²?
a) 32 ft/s
b) 128 ft/s
c) 256 ft/s
d) 384 ft/s

In order to answer this question correctly, the student has to
• Identify the type of motion (uniformly accelerated motion: free fall)
• Apply the formula for the velocity in uniformly accelerated motion, v = v₀ + at
• Identify that the initial speed is v₀ = 0
• Solve the equation for v
The difficulty level of this question is 2 × 1 + 2 + 3 = 7.
The main aim of this study is to investigate any relationship between the level of
difficulty of a particular question and the corresponding discrimination index, using
the results of a total of six multiple-choice tests in a physical science course.
Chapter 3
Results of the Research
3.1 Description
As mentioned previously, the main aim of this study is to investigate the relationship
between the degree of difficulty of a particular question and the corresponding
discrimination index. The degree of difficulty is defined in Chapter 2 and can be
described as a numerical quantity that depends on the content of the question and
reflects the difficulty and complexity of the tasks and operations required to find a
solution. In this study, we use a combination of letters and numbers to quantify a
complete solution to a question; the letters represent the task(s) or action(s) that must
be performed and the numbers represent the number of times each task or action is
performed. As we described above, we have classified the tasks and actions into a
hierarchy, using Bloom’s taxonomy as a guide, and assigned a numerical degree of
difficulty to each of the tasks. For example, the following question:
The speed limit in a school zone is 20 mi/h and it is strictly enforced. If
you are driving at 30 km/h, are you likely to get a ticket?
(a) Yes
(b) No
can be analyzed in the following way. In order to answer this question the student
should …
• convert km/h to mi/h using the relationship 1 mi = 1.61 km, i.e.,
1 km = (1/1.61) mi = 0.621 mi. This task is C3, a simple unit
conversion with a level of difficulty of 3.
• solve the equation for v: v = 30 km/h → 30 × 0.621 = 18.6 mi/h. This
task is E, a simple equation with a level of difficulty of 3.
• identify that v < 20 mi/h. This task is I, identification with a level of
difficulty of 1.
Thus, the level of difficulty of this question is 3 + 3 + 1 = 7.
The discrimination index measures the extent to which the question discriminates
between individuals who fall into the top 27% of scorers on a test and those who fall
into the bottom 27%. The index, as defined in Chapter 2, which has a value
−1 ≤ D ≤ +1, is determined automatically for each question on a test by the FAU
computer-based test scoring service. For the purposes of this study, we claim that
questions with values of D > 0.5 qualify as questions that are “reasonable”
discriminators; hence, we only concentrated on such test items in our study.
Altogether, we analyzed the results of six multiple-choice tests (labeled 1A, 1B, 2A,
2B, 3A and 3B) given in a Physical Science class (PSC2121) at Florida Atlantic
University in the Fall 2004 semester and selected only those items for which D > 0.5.
3.2 Analysis of the questions for which D > 0.5

In Tables 3.1 to 3.6, we list the results of our analysis of the six tests for which D > 0.5.
In Figures 3.1 to 3.9, we show plots of the degree of difficulty and the discrimination
index for the individual tests. We have included a second order polynomial fit to the
data simply to act as a guide to the eye.
Despite the limited statistics, due to a relatively small number of respondents (~50)
on each test, a trend does appear to emerge. The data for each test suggests that there
is a correlation between the degree of difficulty and the discrimination index.
Specifically, as the degree of difficulty initially increases, the discrimination index
also increases. However, there is an optimum degree of difficulty beyond which the
discrimination begins to fall. (Such behavior was noted previously, in Chapter 2,
when the difficulty factor, defined as p = (c / n) × 100, where c is the number of
students who selected the correct answer and n is the total number of respondents,
was plotted against the discrimination index. But, as we argued in Chapter 2, the
difficulty factor is a property of the obtained measurements and is not appropriate in
our analysis, which is why we found it necessary to introduce a quantitative and
independent degree of difficulty for each question, based on the content of a question
and the difficulty and complexity of the tasks required to find a solution.)
We can understand such behavior by identifying the two extremes, namely, (a)
questions that have a “low” degree of difficulty (≲ 8), i.e., questions that are too
easy, and (b) questions with a “high” degree of difficulty (≳ 14), i.e., questions that
are too hard. Questions in these regimes are less effective in discriminating between
students of different abilities because:
in case (a) more of the lower scoring students are likely to answer the
question correctly, so the test item is too easy for both the lower and
higher scorers, resulting in less discrimination, and
in case (b) fewer of the higher scoring students are likely to answer the
question correctly, so the test item is too difficult for both the high
scorers and the low scorers to answer and so it no longer discriminates
effectively.
In spite of the limited size of the data sets, we suggest that, for the tests that we have
analyzed, the optimum discrimination likely occurs when the degree of difficulty lies
in the range from ~9 to ~14. It might be tempting to compare the degrees of
difficulty for optimum discrimination from one test to the next; clearly, if students are
“learning” then we might expect the degree of difficulty for optimum discrimination
to increase! However, the sample set is simply not adequate for reliable comparisons.
These results are very similar to those obtained by Hostetter and Haky who analyzed
the results of a number of multiple-choice tests given in an introductory General
Chemistry course [1.4].
A further outcome of this study is that, in principle, it is now possible to design
multiple choice items with a known degree of difficulty and, hence, discrimination.
Finally, in Figure 3.10 we show the correlation between the measured difficulty
factors (p), as defined in Chapter 2, and our calculated degrees of difficulty for Test
1A. In (a) we have used the complete set of values; in (b), where there is more than
one measured difficulty factor for a particular degree of difficulty, we have plotted
the averaged value. The plots indicate a close relationship between the measured and
calculated values. When the calculated degree of difficulty is very small, most of the
students get the correct answer, so p → 100%; when the calculated degree of
difficulty is very large, most students fail to get the correct answer, so p → 0.
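The linear trend lines and R² values quoted for Figure 3.10 correspond to an ordinary least-squares fit. The sketch below shows one way to obtain the slope, intercept and R² from such data; it is a generic routine, not the actual procedure used to produce the figure.

```python
import numpy as np

def linear_fit_r2(degree_of_difficulty, p_values):
    """Fit p = m*d + b by least squares and return (m, b, R^2)."""
    d = np.asarray(degree_of_difficulty, dtype=float)
    p = np.asarray(p_values, dtype=float)
    m, b = np.polyfit(d, p, 1)                 # first-order (linear) trend line
    ss_res = np.sum((p - (m * d + b)) ** 2)    # residual sum of squares
    ss_tot = np.sum((p - p.mean()) ** 2)       # total sum of squares
    return m, b, 1.0 - ss_res / ss_tot         # R^2 close to 1 => strong linear trend
```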
| Question number | Question type | Degree of difficulty | Discrimination index |
|---|---|---|---|
| 2 | CS5 | 8 | 0.61 |
| 5 | 2CE | 9 | 0.67 |
| 12 | K3AS6 | 13 | 0.58 |
| 13 | K2VS5 | 14 | 0.61 |
| 17 | 2IKAE | 8 | 0.67 |
| 18 | IAE | 6 | 0.58 |
| 25 | KAVS5 | 12 | 0.81 |

Table 3.1. The results for Test 1A.
| Question number | Question type | Degree of difficulty | Discrimination index |
|---|---|---|---|
| 7 | 2C3E | 9 | 0.58 |
| 9 | KAVI | 8 | 0.52 |
| 15 | KAS5E | 11 | 0.65 |
| 17 | IKAS5 | 9 | 0.66 |
| 21 | 3KI3A | 10 | 0.66 |
| 22 | AS5 | 7 | 0.51 |
| 23 | 5KAI | 8 | 0.52 |
| 24 | K2I2AS5 | 12 | 0.64 |
| 25 | KAIVS5 | 13 | 0.58 |

Table 3.2. The results for Test 1B.
| Question number | Question type | Degree of difficulty | Discrimination index |
|---|---|---|---|
| 2 | IKA | 4 | 0.55 |
| 7 | 2IKAS5 | 10 | 0.70 |
| 22 | 5K3AI | 12 | 0.69 |
| 25 | KACE | 9 | 0.60 |
| 27 | 2I2K2AE | 11 | 0.84 |
| 28 | 2I2K2AE | 11 | 0.70 |

Table 3.3. The results for Test 2A.
| Question number | Question type | Degree of difficulty | Discrimination index |
|---|---|---|---|
| 5 | 2IAS5 | 9 | 0.80 |
| 16 | 2KE | 5 | 0.66 |
| 22 | KS52AI | 11 | 0.53 |
| 23 | 2K | 2 | 0.55 |
| 24 | 2KS53AI | 14 | 0.50 |

Table 3.4. The results for Test 2B.
| Question number | Question type | Degree of difficulty | Discrimination index |
|---|---|---|---|
| 1 | IAE | 6 | 0.52 |
| 5 | 2IKA | 5 | 0.50 |
| 8 | 2AS5K | 10 | 0.61 |
| 9 | KAE | 6 | 0.57 |
| 12 | KAS5 | 8 | 0.60 |
| 14 | 2I2AS5E | 14 | 0.86 |
| 18 | 3I3AES5S6 | 23 | 0.60 |
| 20 | K2A2S5 | 15 | 0.84 |
| 22 | IDC3E | 12 | 0.77 |
| 24 | K2AES5C3 | 16 | 0.59 |
| 25 | 2K2AS5 | 11 | 0.70 |
| 26 | 6K3A | 12 | 0.75 |
| 27 | 4KE | 7 | 0.66 |
| 28 | 4KA | 6 | 0.50 |
| 30 | KAE | 6 | 0.57 |

Table 3.5. The results for Test 3A.
| Question number | Question type | Degree of difficulty | Discrimination index |
|---|---|---|---|
| 6 | 4KA | 6 | 0.57 |
| 7 | 2AS5K | 10 | 0.65 |
| 14 | 2I2AS5E | 14 | 0.75 |
| 20 | K2A2S5 | 15 | 0.57 |
| 21 | KAS5I | 9 | 0.65 |
| 22 | IDC3E | 12 | 0.65 |
| 24 | 2K2AS5 | 11 | 0.66 |
| 27 | 4KE | 7 | 0.66 |
| 28 | 4KA | 6 | 0.65 |
| 29 | AE | 5 | 0.50 |

Table 3.6. The results for Test 3B.
Figure 3.1. The discrimination index versus the degree of difficulty for Test 1A.
Figure 3.2. The discrimination index versus the degree of difficulty for Test 1B.
Figure 3.3. The discrimination index versus the degree of difficulty for Tests 1A and
1B.
Figure 3.4. The discrimination index versus the degree of difficulty for Test 2A.
Figure 3.5. The discrimination index versus the degree of difficulty for Test 2B.
Figure 3.6. The discrimination index versus the degree of difficulty for Tests 2A and
2B.
Figure 3.7. The discrimination index versus the degree of difficulty for Test 3A.
Figure 3.8. The discrimination index versus the degree of difficulty for Test 3B.
Figure 3.9. The discrimination index versus the degree of difficulty for Tests 3A and
3B.
Figure 3.10. (a) The difficulty factor (p) and (b) the averaged difficulty factor (p_av)
versus the calculated degree of difficulty for Test 1A. A linear trend line has been
fitted to the data; in (a) R² = 0.73 and in (b) R² = 0.85.
Chapter 4
Concluding remarks
In this study we analyzed the questions and results of a total of six multiple-choice
tests in a physical science course at Florida Atlantic University in the Fall 2004 semester. Our
main aim was to quantify two of the most important factors in creating valid and
discriminating test items, namely, the degree of difficulty of each item and the
corresponding discrimination index, based on the results of actual tests, and to investigate
the relationship between them. Following the analysis of the results of a test, each
item can be assigned a “discrimination index”, which determines how well it
discriminates between the top scoring students of the test and the bottom group of
students. In this study we confined our analysis to the questions from all tests for
which the discrimination index is > 0.5.
In order to associate a degree of difficulty with each item, we identified the various
tasks or operations, such as memorization and identification, application, unit
conversion, algebraic manipulation, use of vectors, etc., required to answer each
question. We assigned a numeric level of difficulty to each task, based on the range
of knowledge, skill, and ability required, so that any question involving a number of
different steps has an overall degree of difficulty, which is the sum of the individual
levels of difficulty associated with each of the required steps.
Our results indicate a definite correlation between the degree of difficulty and the
discrimination index. For example, as the degree of difficulty increases so does the
discrimination index. However, there is an optimum degree of difficulty, in the range
~9 to ~12, beyond which the discrimination index starts to fall. At that point, the test
items become too difficult for both the high scorers and the low scorers to answer, so
the items no longer discriminate effectively. Clearly, there are two extremes:
questions that are too easy, i.e., with a low degree of difficulty, and those that are too
hard, i.e., with a high degree of difficulty. Such questions are not effective in
discriminating between students of different abilities.
By adopting our assigned levels of difficulty for each task or operation, one can
actually design questions with the required level of difficulty and range of cognitive
levels that will result in multiple-choice tests that truly discriminate between students
of different abilities. For example, the results of our study indicate that the most
discriminating questions, i.e., those with D > 0.6, have a degree of difficulty in the
interval 9 to 14. Using this result, we can set up an “inequation”

9 ≤ a·K + b·A + c·E + d·V + e·S5 + f·S6 ≤ 14,

where K, A, E, V, S5 and S6, etc., are the various tasks and operations required to
solve a problem, as defined in Chapter 2, and a, b, c, d, e and f represent the number of
times each action is performed. We found that it was possible to assign a numerical
level of difficulty to each of these tasks, e.g., K = 1, A = 2, E = 3, V = 4, S5 = 5,
S6 = 6, based on a hierarchy of the skills required. Therefore, the inequation becomes:

9 ≤ a + 2b + 3c + 4d + 5e + 6f ≤ 14.
Although there are many possible solutions to this inequation, there are, however,
limits. So, in principle, we can use this inequation to develop items for multiple-
choice tests in a physical science course where the requirement is to obtain optimum
discrimination between students who have mastered the course material and those
who have not. However, the design of items with optimum discrimination and the
verification under test conditions is beyond the scope of this study; we suggest it
might form the basis of further research.
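
As a final illustration of how the inequation might be used, the sketch below enumerates candidate combinations of task counts whose weighted sum falls in the target range. It is a hypothetical design aid with an arbitrary cap on how often any task may be repeated, not a procedure that was used in this study.

```python
from itertools import product

# Task levels from Table 2.4: K = 1, A = 2, E = 3, V = 4, S5 = 5, S6 = 6.
LEVELS = {"K": 1, "A": 2, "E": 3, "V": 4, "S5": 5, "S6": 6}

def candidate_designs(low=9, high=14, max_repeat=3):
    """Yield task-count combinations whose weighted sum a*K + b*A + ... + f*S6
    lies in [low, high]; max_repeat simply keeps the enumeration finite."""
    tasks = list(LEVELS)
    for counts in product(range(max_repeat + 1), repeat=len(tasks)):
        total = sum(c * LEVELS[t] for c, t in zip(counts, tasks))
        if low <= total <= high and any(counts):
            yield dict(zip(tasks, counts)), total

# Print a few candidate "recipes" for a question in the optimum difficulty range:
for recipe, total in list(candidate_designs())[:5]:
    print(recipe, "->", total)
```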
References
[1.1] B.S. Bloom (1956). Taxonomy of Educational Objectives, Handbook I: The Cognitive Domain. New York: David McKay Co Inc.

[1.2] Victoria Clegg and William Cashin (1986). "Improving Multiple-Choice Tests", IDEA Paper No. 16, available from http://www.idea.ksu.edu/papers/

[1.3] "Improving Multiple Choice Questions" (1990), available from http://ctl.unc.edu/fyc8.html

[1.4] Laura Hostetter and J.E. Haky, private communication. Also, "A classification scheme for preparing effective multiple-choice questions based on item response theory", L. Hostetter and J.E. Haky, Florida Academy of Sciences Annual Meeting, University of South Florida, March 2005.

[1.5] Note that our definition of the degree of difficulty is different from that used by the FAU Testing and Evaluation Center.

[2.1] B.S. Bloom (1956). Taxonomy of Educational Objectives, Handbook I: The Cognitive Domain. New York: David McKay Co Inc. There is a considerable amount of information available about Bloom's taxonomy on the internet; see, for example: http://www.nwlink.com/~donclark/hrd/bloom.html, http://www.coun.uvic.ca/learn/program/hndouts/bloom.html, http://www.valdosta.edu/~whuitt/psy702/cogsys/bloom.html

[2.2] D.R. Krathwohl, B.S. Bloom, and B.M. Bertram (1973). Taxonomy of Educational Objectives, the Classification of Educational Goals. Handbook II: Affective Domain. New York: David McKay Co., Inc.

[2.3] E.J. Simpson (1972). The Classification of Educational Objectives in the Psychomotor Domain. Washington, DC: Gryphon House.

[2.4] J.D. Bransford, A.L. Brown and R.R. Cocking (eds) (2000). How People Learn: Expanded Edition. Washington, D.C.: National Academy Press.

[2.5] M. Suzanne Donovan and John D. Bransford (eds) (2005). How Students Learn. Washington, D.C.: The National Academies Press.

[2.6] Handout entitled "Computer based test scoring and analysis", available from the Florida Atlantic University Testing and Evaluation Center.

[2.7] T.L. Kelley (1939). "The selection of upper and lower groups for the validation of test items", J. Ed. Psych., 30, 17-24.