The Concept Inventory as a Probe of Physics Understanding · A concept inventory in physics is a...
Transcript of The Concept Inventory as a Probe of Physics Understanding · A concept inventory in physics is a...
-
20
future career. As far as possible
each question of such a catalogue
tests a single concept and does not
involve algebraic gymnastics or
computations.
Over thirty years of Physics
Education Research (PER) has
revealed that students have ideas on
how physical systems behave prior
to their study and these common
sense perceptions are robust and
hard to eliminate. A car moving at
high speed must have a greater force
acting on it; on a wintry day a metal
bar feels colder to the touch than a
wooden bar since it is at a lower
temperature; action and reaction are
equal and opposite but action
precedes reaction by a fraction of a
I. Introduction
The concept inventory (CI) in
physics education research (PER) is
a catalogue of carefully designed
conceptual questions on a given
topic. The questions may be open
ended and the student asked to
write or verbalize an appropriate
response. But often it takes the form
of a set of short questions each
followed by a number of choices one
of which is the most appropriate.
An essential difference from the
multiple choice questions (MCQs)
administered in a high stakes test is
its perspective: it is designed to
probe alternative conceptions and
elicit ill suited reasoning patterns
rather than to act as a toll-gate to a
second; if the lower half of a lens is
covered the image it produces is
proportionately sliced off, -- all these
are misconceptions we may hold
onto despite a good grounding in
high school physics.
As mentioned in the lead article
concept inventories form an integral
part of PER. Responses to questions
in mechanics related inventories
document in an unambiguous
fashion that there is a disconnect
between what we learn in class and
what we actually believe. To give the
reader an idea we list in Box I
examples taken from two published
concept inventories: one from
electricity and magnetism [1] and
the other from quantum mechanics
Prof. Vijay A. Singh was faculty, physics at IIT- Kanpur for over twenty years (1984 - 2005)
and is currently faculty at the Homi Bhabha Centre for Science Education (TIFR) Mumbai.
He oversees India's Olympiad efforts as the National Coordinator, Science Olympiad. He
coordinates the National Initiative on Undergraduate Science (NIUS) under which UG
students undergo extended nurturing and have carried out research and published in
several international journals. He has worked at several places abroad and has
authored over 135 peer reviewed publications. His current interests are in
semiconductor nanostructures and physics education research. One of his hobbies is
solving and designing challenging problems in Physics at the school and college level.
Sifting the Grain from the Chaff:The Concept Inventory as a Probe of Physics Understanding
Vijay A. SinghHomi Bhabha Centre for Science Education (TIFR)
V. N. Purav Marg, Mankhurd, Mumbai - 400088
A concept inventory in physics is a catalogue of carefully crafted questions on a given topic. We describe the
issues involved in the construction of a good inventory. The administration protocols, the evaluation of the
responses and more importantly the methods of gauging the validity and reliability of the inventory (“testing
the test”) are discussed. We list some well known inventories and discuss a few of them. The iterative and the
scientific nature of the enterprise is stressed. We point out the distinction between a concept inventory and a
high stakes multiple choice questions test. India with its huge and diverse student and teacher population has
an enormous potential to contribute to physics education research via inventory construction and evaluation
exercises.
-
[2]. A word about the nomenclature:
an inventory is sometimes called an
“instrument” and the questions are
called “items”.
Fig. 1 : The Key Elements of a Concept
Inventory
21
the relevant chapters of books are
good starting points. The experienced
teacher has little difficulty in chalking
out the content map. Figure 2
encapsulates some key steps in the
development of an inventory.
The first difficulty faced is identifying
the levels of knowledge being tested.
What are the common sense beliefs
in this domain? If the domain is
current electricity then one may
probe how current“flows”. Is it like
tap water? Do the electrons spill over
when the current carrying wire is
snipped midway? Is it really
electrons? And what are their
speeds? One draws up a taxonomy
of common sense misconceptions. A
good inventory is a test of knowledge
of the domain but more so a test of
The construction of a good concept
inventory broadly consists of
d e v e l o p m e n t ( s e e S e c . I I ) ,
administration, analysis (see Sec. III)
and dissemination. In Sec. IV we list
commonly known inventories and
discuss a few of them. It is an iterative
and scientific enterprise. It may take
several years and scientific though it
is, it is a labour of love. Figure 1
indicates this process.
An experienced teacher preferably
in a group or along with a graduate
student whose thesis entails PER
selects the domain of study. The
major concepts associated with the
domain are identified. The school or
university syllabi or the sections of
II. The Making of a Concept
Inventory
Box I: Items from inventories on electricity and magnetism and on
quantum mechanics. The correct alternatives are (c) and (d)
respectively.
1. The four separate figures to the left involve a cylindrical magnet
and a tiny light bulb connected to the ends of a loop of copper wire.
The plane of the wire loop is perpendicular to the reference axis. The
states of motion of the magnet and of the loop of wire are indicated in
the diagrams. Speed is v and CCW means counter clockwise. In
which of the figures will the light bulb glow?
(a) I, III, IV
(b)I, IV
(c) I, I I, IV
(d)IV
2. The figure on right shows a slanted potential energy function U(x), where U(x) is infinite if xL. Which
plot of the probability density |Y (x)|2 is most likely to
correspond to a stationary state in U(x)
-
22
belief systems. It may be useful to
read the pioneering work on the Force
Concept Inventory (FCI) and see how
it was developed [3]. An appreciation
o f t h e p r o c e s s o f c o g n i t i ve
transformation is useful [4,5]. This
process is elaborated in some of the
other articles of this issue.
Thus even an experienced teacher
cannot churn out an inventory on the
fly. She must consult other teachers
and paradoxical though it appears
take help from beginning students.
Typically a list of questions is drawn
up to which the students may
write their answers and thoughts.
They may voice their thinking
process and this is recorded. The
former is called “free response” and
the later the “think aloud” protocol.
This exercise is followed by an in-
depth interview of selected students.
This exercise helps in developing
items but more so in designing
alternative choices which in testing
terminology are call distractors.
Should the number of choices be
two (e.g. true/false?), three, or more?
The incorrect answers, in other words
the distractors should be sufficient in
number to demand thought.
However designing good distractors
is not easy. Moreover few adults can
recall more than seven unrelated
numbers in sequence [6] and
distractors are more complex entities.
The candidate must not spend too
long a time on each question, so
that also puts a cap on their number.
Statistical theories suggest three
choices per item [7]. Well known PER
inventories have four or five.
It goes without saying that one must
use simple words and phrases, be
clear and precise, avoid mathematical
jugglery and include a figure or
illustration where possible. One
must pay attention to caveats. Springs
mathematics) students only or is open
to algebra based(PCB: physics,
chemistry and biology) ones also?
Most inventories are concept based
and should hence be open to both
categories of students.
An inventory must be validated.
Content validation refers to the extent
to which the items cover the
knowledge base being tested and if
these are constructed in a sensible
manner. Content experts are
consulted. They may be asked to
r a n k a p p r o p r i a t e n e s s a n d
reasonableness on a 0-5 scale. At
times the test is also administered to
high ability advanced students. This
peer group must be satisfied that the
statements are unambiguous, the
figures clear and must then agree on
the answers. Content validity thus
means that an adequate sampling of
possible subtopics has been achieved.
It also requires value judgments as to
which subtopics or items to exclude
from the instrument. Other aspects of
validation are concurrent, predictive
and construct. We shall presently not
dwell on these but refer the reader to
specialized literature [7].
After the inventory is made and
validated it is tested on a group of
students. This is called the pilot test
and strings in classical mechanics are
usually light (massless) and strong
(unbreakable). The latter is also
inextensible. An item on elucidating
the trajectory of a charged particle in
an electric or magnetic field may
include the caveat to ignore gravity
and collisions with other particles.
Some important issues to be kept in
mind are: (i) Whether the test is time
bound? A long test will induce
fatigue. Usually a time is suggested,
say an hour (true of most inventories
mentioned in Table 1, but not rigidly
enforced). (ii) Do the students have to
respond to all the items? Since the
objective is to ferret out alternative
conceptions given the student's
present state of knowledge most
inventories are “forced choice” tests.
The students must attempt all items.
It is a good idea to have an additional
column for each item where the
student may indicate the confidence
level of her answer. (iii) Whether the
test is pre- or post-test? The FCI for
instance is administered both prior
to the instruction as well as after. The
CSEM on the other hand is useful
mainly as post test. (iv) Whether
the inventory at the higher secondary
school level is meant for calculus
based (PCM: physics, chemistry and
Fig. 2: The Development of the Concept Inventory (See Sec. II)
Free Response Content Map
Think Aloud
Concept Inventory
Content
validationExpert
Advanced
students
Pilot test
-
23
and there may be more than one. The
answers are analyzed. The next
section describes these methods of
analyses. A small group of students
are then interviewed and asked to
explain their responses. Armed with
these findings one goes back to the
drawing board. Items are added or
dropped and distractors modified.
The feedback loop in Fig.(2) indicates
this iterative process.
The inventories are “forced” choice
tests. But there is no penalty if the
student selects an incorrect choice
since a guiding principle of PER is
to unravel and understand a
student's alternative conceptions. We
describe some useful indices for the
evaluation of the inventory and the
candidates' understanding of the
topic.
If the inventory has N two-choices 2items (e.g. true/false), N four-choices 4items and N five-choices items then 5the lower estimate score L is given by
random guessing
.....(1)
III. Evaluation
while the upper score U is
.....(2)
and the range of scores would be U-
L. For example for 30 items evenly
divided between three- and five-
choices (N = N = 15), U= 30 and L = 8, 3 5and the spread would be = 22. For
comparison one would need 44
true/false questions to get the same
spread.
The difficulty level of the item is
defined as the ratio of the correct
responses to the total responses. An
item in which this is very low ( 1)
must ring an alarm bell and the one
with value unity is uninteresting. An
ideal difficulty level is half between
the chance score and unity, e.g. 0.6
for a five-choices item. Another
useful indicator is the index of
discrimination D for an item. One r takes the top 27% of the scorers in the
inventory (N) and finds the number of
correct responses (C ) to the item and U similarly takes the bottom 27% of
the scorers in the inventory (again
N) and obtains the number of correct
responses (C ) to the item. Then L
D =(C -C )/N.r U L
#0.
I f D i s 1 we h a ve p e r f e c t r
discrimination, if zero, we have an
item which tells us little and if D < 0 we r
definitely need to revise the item. The
average on the inventory and the
standard deviation are obvious
indicators known to practicing
teachers and we do not dwell on them
except to point out that the expected
average is halfway between the lower
estimate L and the upper score U and
the standard deviation should be
around one-sixth of the spread
discussed above. Thus for the 30
items inventory of evenly divided
three- and five-choices items
mentioned above the expected mean
would be 19 and 3.7 would be the
standard deviation [7]. Approximately
68% of the results should lie between
15.3 and 22.7 (19 ±3.7).
The above mentioned indices are
well known. However a detailed
picture of the efficiency of the
teaching-learning process and of the
quality of the test itself can be
obtained by examining item response
curves (IRCs). IRCs, though well
known in psychometry have only 2 4 2
2 4 5=++
N N NL
2 4 5=++U N N N
Box II: Illustration of item response curves. The example is adapted from FCI [3]. The correct alternative is (b).
See Sec. III for a discussion.
A ball is shot at high speed into a horizontal frictionless channel at p and exits at r. Which path in the figure on the left
below would the ball most closely follow after it exits the channel at r and moves across the frictionless table top?
-
24
recently been employed in PER [8-
10]. In IRC we display the percentage
or fraction of students P (q) selecting i
a given answer choice i vis-a-vis their
ability è. Box II illustrates this
exercise with an item from the Force
Concept Inventory (FCI) [3] with the
number of choices pruned from five
to four. A subset of ten FCI questions
focusing on the notion of inertia and
Newton's first law was administered
as a post-test to a group of 70 students
in Patna, Bihar. The students' total
score on the test was taken as a
measure of ability è. We note in
passing that there are other (and
better) measures of ability but often
the total score in the test is a
convenient measure [11]. The correct
choice (i = b) that the ball moving in
the circular channel exits tangentially
was overwhelmingly selected by
students in the ability range 80%
(8/10). It can be modeled by the
logistic response function
.....(3)
where s is the fraction of students
with low ability who will respond
correctly to the item (here s= 0), m is
the ability level at the inflection
point, and w is the discrimination
parameter. Students with ability
level m or higher are likely to pick
the correct choice. A small w
means that the IRC is almost a
step function and the item sharply
segregates students with the
correct conception from those
holding alternative conceptions. In
the present case the values of m, w
and s are approximately 7.8, 1.9 and 0.
Note that for convenience we have
smoothened the curves by a
polynomial fit.
$
Box I could be part of an inventory
where one has an item asking if an
emf is induced in a coil if a current
carrying coil is moved towards it. Or
for Box II one could ask for the
trajectory of stone being whirled
around in a horizontal circle just
after the string snaps. One could see
if the subject's answers to the
questions are consistent. In split
halves reliability one looks for
correlations between scores on two
halves of the same test. However the
implementation of the above
criteria is fraught with pitfalls.
What constitutes similar tests? How
do you account for learning between
tests in the test- retest reliability
criterion? Perhaps the simplest to
i m p l e m e n t i s t h e i n t e r n a l
consistency test. One could have
content experts vouchsafe that the
two items are similar. In addition
there is the Kuder-Richardson [12]
reliability prescription r which we
quote without proof.
.....(4)
where N is the number of items, p is i
the fraction of students obtaining the
correct answer and s the variance.
The formula can be implemented in a
straightforward fashion.
One way to make the test more
reliable is to add more items. If the
test is lengthened by a factor f the
new reliability [7] is
.....(5)
If r = 0.5 and the inventory is doubled
then the new reliability is 0.67. There
are other indicators of reliability
such as the Cronbach alpha
coefficient and principal component
analysis but we refer the reader to an
expert manual [7,13].
It is equally important to study the
distractors. The IRC for the choice ''a''
shows that even medium ability
students are prey to the ''trainer''
misconception. The ball moves in a
circle on exit since it has been
''trained'' to do so in its journey
across the circular channel from p to r.
Students selecting choice ''c'' are
perhaps prey to over-learning, i.e.,
there is a ''centrifugal force'' acting on
a circularly moving object and
continues to operate even when the
circular constraint is removed. The
low, flat IRC for choice “d” suggests
that it is inappropriate as a distractor
and needs to be dropped or replaced.
By displaying the rich texture
associated with P(q) IRCs throw light on the test itself.
Physicists are familiar with the
complement of Eq. (3) namely (1-
P(q)) (with s = 0) which is akin to the
Fermi funct ion in s tat is t ica l
mechanics [10]. Just as we have a
logistic fit for the correct choice, one
also has a model function for the
distractor [9,10]. One could fit the
IRCs to these model functions but the
point is that even a cursory visual
inspection suffices. As the saying
goes, 'a picture is worth a thousand
words'.
Perhaps the most important question
to ask is if the inventory is reliable.
An instrument is reliable if the
measurement error is small. In other
words are the results of the test
repeatable? This issue is addressed in
a number of innovative ways. In
parallel forms reliability the subjects
take two similar tests on the same
topic. Test-retest measures answer
stability on repeated administrations
of the same test. Internal consistency
looks at correlations between answers
to similar items in the test. For
example the induced emf question in
21
11
1 s=
é ù-ê ú= -ê ú-ë ûå
( )Ni i
i
p pNr
N
1=
+-( )new
frr
fr r
1
1
qq
-= +
æ ö-ç ÷+-ç ÷è ø
( )( )
expb
sP s
m
w
-
25
IV. Examples of Concept
Inventories
Table 1 lists some commonly known
concept inventories. We shall briefly
discuss a few of these.
The first and perhaps the most
famous is the Force Concept
Inventory(FCI) [3]. The methodology
employed in its construction and
administration has served as a
benchmark for later works in PER.
The early version of FCI was called
the physics diagnostic instrument
[14]. The authors found that (i)
common sense beliefs usually
conflict with Newtonian mechanics
and (ii)conventional instruction does
little to change these beliefs. The
instrument was administered to
over 1000 students and open ended
responses were sought in the early
versions. These helped in developing
distractors. It was found that the pre-
test and post-test scores were similar.
A group of students were also
administered the instrument midway
through the instruction period. It was
concluded that the conceptual gains,
if any, occur early in the course.
Reliability was established by
calculating statistical indices as well
as by interviewing a smaller group of
students after the test. The students
repeated the same answers as in
the test suggesting that “the student’s
answers reflected stable beliefs
rather than tentative, random or
flippant responses”. Validity studies
were also carried out to the authors'
satisfaction.
The work on the diagnost ic
instrument laid the foundations of the
FCI [3] . The 29 items in the FCI
are divided into six newtonian
categories: Kinematics, First Law,
S e c o n d L a w , T h i r d L a w ,
Superposition Principle and Kinds
of Forces. Four items fall into multiple
The FCI was followed by the
Mechanics Baseline Test (MBT) [15].
It too is largely conceptual but about
one third of the problems require
simple calculations (e.g. calculating
the tension in a rope). It is
recommended as a post-test but could
also be used as a placement test for
advanced courses. The FCI has
attracted a great deal of attention as
well as some criticism. Huffman and
Heller in a study found that for
categories. Very interestingly the
authors divided these 29 items
into six new categories based on
t h e d i s t r a c t o r s a n d s t u d e n t
misconceptions : Kinematics, Impetus,
Active Force, Action/ Reaction Pairs,
Concatenation of Influences and
Other Influences of motion. Detailed
discussion on each item of the
inventory is available. An example
from the first law with analysis was
presented in Box II and Sec. III.
TABLE 1 : List of some Concept Inventories (CI) and similar
instruments with abbreviations and authors
AbbreviationInstrument Authors (year)
Physics diagnostic instrument
Force Concept Inventory
Mechanics Baseline Test
Test of Understanding Graphs
in Kinematics
Force and Motion Conceptual
Evaluation
Lawson's Classroom Test of
Scientific Reasoning
Thermodynamics Concept
Inventory
Conceptual Survey of
Electricity and magnetism
Electronics Concept Inventory
Determining and Interpreting
Resistive Electric Circuit
Concepts Test
Rotational and Rolling Motion
Geosciences Concept Inventory
Brief Electricity and
Magnetism Assessment
Circuits Concept Inventory
Quantum Mechanics
Concept Survey
Fermi Energy
Friction in Rolling Bodies
Halloun and Hestenes (1985)[14]
Hestenes et al. (1992)[3]
Hestenes and Wells (1992)[15]
Beichner (1994)[21]
Thornton and Sokoloff (1998)[23]
Lawson (1978)[24]
Midkiff et. Al. (2001) [25]
Maloney et al.(2001)[1]
Simoni et al. (2004)[26]
Engelhardt and Beichner
(2004)[27]
Rimoldini and Singh (2005)[22]
Libarkin and Anderson
(2005)[28]
Ding et al. (2006)[29]
Helgeland and Rancour
(2008) [30]
McKagan (2010)[2]
Sharma and Ahluwalia
(2011)[31,32]
Singh and Pathak (2007)[9]
- - -
FCI
MBT
TUG-K
FMCE
LCTSR
- - -
CSEM
ECI
DIRECT
RRB
GCI
BEMA
CCI
QMCS
- - -
FRBI
-
26
high school students a significant
factor involved questions on the third
law and on the kinds of forces. A
student may be able to analyze
correctly the forces on a ball after it is
struck by the hockey stick but not the
forces on a rocket which is given a
thrust in space. The responses are
dependent on the context. They also
carried out studies on college
students. Their conclusion was that
questions on the FCI are only loosely
related to each other and do not
necessarily measure the six conceptual
dimensions as proposed by the
FCI authors [16,17]. Rebello and
Zollman administered open ended
versions of the FCI (i.e. without the
choices) and came up with a revised
set of distractors [18]. These criticisms
not with standing FCI has been
used to provide a comprehensive
comparison between interactive
learning (IE) and traditional teaching
by Hake [19]. It was found that all 41
IE courses had a higher gain than 14
traditional courses. In another study
on electricity and magnetism (CSEM)
[1] and the other on rotation and
rolling bodies (RRB) [22]. The CSEM
with 32 five-choice items purports to
cover all areas except capacitors and
current electricity. It is useful only as a
post-test and the average is about 50%.
An example from this was presented
in Box I. This item was attempted
correctly by barely 25% of the
students surveyed. Calculus based
students performed better than
algebra based ones. The RRB with 30
five-choice items spans eight
concepts including moment of
inertia, rotational kinetic energy,
angular velocity and acceleration,
torque, rolling, friction and sliding.
An interesting aspect about its
development was the use of
Piagetian style demonstration based
tasks. Students were asked a set
o f q u e s t i o n s a b o u t l e c t u r e –
demonstration based tasks. After they
had made their predictions they were
asked to perform the demonstrations
and reconcile their predictions with
by Crouch and Mazur it was found
that students involved in small
group discussions registered higher
normalized gains (0.49 to 0.74) as
opposed to those taught traditionally
(gain of 0.25 to 0.40) [20].
A related study involved testing
student's understanding of graphs in
kinematics (TUG-K) [21]. Twenty-one
five-choice items with graphs of
position, velocity and acceleration in
one dimensional kinematics were
constructed. The post-test average on
a survey of 524 students was 8.5
with 4.1 as standard deviation.
Students mistook graphs for
pictures, miscalculated slopes and
misinterpreted areas. We draw
attention to this study for the
meticulous way in which the data was
analyzed and its focus on a narrow
well defined and important domain.
A comfort level with graphs is a sine
qua non to understanding advanced
physics.
In contrast to FCI and TUG-K there
are two broad survey instruments, one
Box III: Examples from Indian inventories. The first is on Fermi energy [31] and the second on friction in rolling
bodies [9,10]. The correct alternatives are (d)and (b) respectively.
1. Which of the following is not a characteristic of a system of non-interacting electron in a metal at absolute zero:
a. the occupation index up to Fermi level is 1.
b. all the energy levels below the Fermi level are filled.
c. Fermi energy level is the highest energy level filled.
d. maximum number of electrons are in the Fermi energy level.
2. The following are the directions of frictional force on the rigid sphere rolling without slipping at a given point
on the curved bowl:
I. Zero
II. Tangential to the surface and opposite to the direction of motion.
III. Tangential to the surface and in the direction of
motion.
The directions of the frictional forces at points P , P and P are 1 2 3respectively
(a) II,II,II
(b) II,I,III
(c) III,II,II
(d) I,I,I
-
27
the observations using a “think aloud”
protocol. The authors found that
greater mathematical sophistication
did not translate into better
understanding. Also some of the
difficulties could be traced back to
similar difficulties in linear motion. In
our opinion both CSEM and RRB
attempt to cover a vast area and
maybe good as quick survey
instruments.
The instrument to gauge students'
u n d e r s t a n d i n g o f c u r r e n t
electricity(DIRECT) yielded a mean
result of 48% [27]. The studies on
quantum mechanics are few [2,33] and
a great deal needs to be done in
this area where we carry an enormous
baggage of misconceptions. The
Geoscience Concept Inventory (GCI)
was administered to over a 1000
students both as a pre-test and post-
test [28]. Instead of reporting item
statist ics (e .g. diff iculty and
discrimination indices) the authors
employed a Rasch model and fit the
scores on an adjusted scale of 0 to 100.
The authors have a small section on
“entrenchment of ideas” which
elsewhere we have referred to as
“persistent alternative conceptions”.
For example prior to instruction 78%
of the students believed that the
earth's age can be determined by
“fossils, rock layers and carbon” as
opposed to uranium and lead content
of rocks; this misconception is held by
72%even after the instruction. We
wonder how many physics students
and even faculty would share the
same misconception.
Finally we draw attention to two
Indian inventories. Two examples are
presented in Box III. Sharma and
Ahluwalia have a small inventory on
the notion of [31,32] Fermi energy
These authors began developing an
inventory for solid state physics but
changed track after preliminary
studies indicated that the difficulty
lay e l sewhere : in a lack of
u n d e r s t a n d i n g o f q u a n t u m
mechanics and statistical mechanics.
Consequently they have been
carrying out misconception studies in
all three areas. The Friction in Rolling
Bodies inventory (FRBI) was
developed to probe a specific narrow
domain in rotational dynamics [9].
The authors felt that the RRB (see
above) has too broad a canvas and
should be split into several focused
studies. An interesting aspect is that
the inventory was translated into
Hindi and Gujarati and administered
to over 1200 students across India.
As mentioned above analysis was
done using item response curves [10].
Both inventories are available from
the authors on request.
We have defined two indices for item
discrimination in Sec. III, D and w. r
The former is a gross index while the
V. Discussion
Box IV: MCQs from high stakes entrance tests to engineering courses in
India. Taken as a whole these would not qualify as inventory items (see
discussion in Sec. IV). The correct alternatives are (d)and (c) respectively.
1. A thin semicircular conducting ring of radius R is falling with its plane
vertical in a horizontal magnetic induction B. At the position MNQ the speed
of the ring is V, and the potential difference developed across the ring is
a. Zero2b. BVpR /2 and M is at a higher potential
c. pRBV and Q is at a higher potential
d. 2RBVand Q is at a higher potential
2. The electric potential between a proton and an electron is given by where r is a constant. 0 Assuming Bohr's model to be applicable, what is the variation of r with the principal quantum number n ?
n
a.
b.
c.
d.
V = V In ( )0 r
r0
rn %1n
rn %1
2n
rn %2n
rn %n
-
28
later, based on the item response
curve, defines the “gray” area
separating those with the correct
response from those with the
incorrect responses. It is analogous to
the temperature in the Fermi
function in statistical mechanics [10].
One can expect that an item with D r
close to unity will have small w. But a
rigorous connection between the two
indices cannot be established. We
prefer w. In the event that an IRC
analysis is not being carried out D r
would serve as an adequate measure.
Physicists can not only aid in the
construction of good inventories but
also in devising novel tools of analysis.
We present a couple of examples here.
IRCs can be used to construct an
entropy associated with learning
efficiency[10]. Assume that we
conduct a large number of surveys
and the IRCs of each are similar. In
other words we have an ensemble of
student groups. We may then be
justified in associating the fraction
P (q) (see Box II) with probability. We i
define an entropy based performance
index akin to Shannon entropy
.....(6)
where the summation is over the four
choices (a,b,c,d). For example in Box II
for q = 5.5, S is p
If we perform this calculation at each
ability level we would obtain over
the entire ability range. This is also
plotted in Box II with a dashed line.
The entropy index ( ) has an
appealing quality. It is normally (but
not always) large for low ability and
small for high ability. For low ability
S (q= 5.0)= - (0.55log 0.55 + 0.10log p 4 4
0.10 + 0.30 log 0.30 + 0.05log 0.05) = 4 4
0.84
S p
Sp
students P(q) values will be close to
0.25, or in other words, (
Hence, will be the maximum
(i.e. 1) for low ability. On the other
hand for the maximum ability level, S pwill go to zero since the correct
selection is made by all the students at
this level e.g. (
The entropy index defined above
seems to suggest a connection with
statistical mechanics. High ability
implies low entropy and vice versa.
An analogy between the Ising model
in statistical mechanics and the
teaching - learning process was
suggested and developed by
Bordogna and Albano [34,35]. How
useful this analogy is remains an open
question.
In the context of their work on TUG-
K and DIRECT (see Table I) Beichner
and coworkers report gender
comparisons [21,27]. They find that
the average male result is better than
female and females tend to have more
misconceptions. They also find that
males display greater “interview
confidence” than females. It appears
that the statistics involved were
sloppy. This is an area where
systematic work could be undertaken.
Most school and college teachers in
India are fully engaged in teaching
classes, grading assignments and
tests and running laboratories. They
may be encouraged to use the physics
inventories as quality assessment
tools. In this connection we note that
Kim and Pak have used the MBT (see
Table I) on students who are preparing
for university entrance exams and find
that solving a copious number of
problems has little bearing on
conceptual understanding [36].
India also has a good pool of expert
teachers at the high and higher
secondary school level. Some of them
P = P = P = Pa b c d =0.25). Sp
P =1, P = P = P =0). a b c d
VI. The Indian Outlook
are engaged in setting multiple choice
questions. This has become necessary
given the shift to objective type
questions in several high stakes
examinations for entry to professional
courses and to leading science
institutes. We caution that most MCQs
may not qualify as inventory items.
Box IV depicts two MCQs taken from
fiercely competitive entrance tests.
These tests are taken by higher
secondary (pre-college) students.
Item 1 tests multiple concepts
namely (i) motional emf is related to
speed; (ii) it is related to 2R and not
ðR, in other words the fact that the
wire is bent in the form of a
semicircular ring is irrelevant; (iii)
relative magnitudes of the potential
at the two ends of the ring. Recall
that Item 1 (Box I) taken from CSEM
also tests a similar area but is centred
around a single theme, namely, flux
change. Item 2 from Box IV asks one
to accept Bohr's model. But which
aspects? Angular momentum
quantization is central to the Bohr
model but a circular orbit for the
electron is not. Presumably we
accept both aspects, apply Newton's
second law, perform an algebraic
manipulation or two and arrive at the
answer. In contrast an item in an
inventory is as far as possible centered
on a single concept. Each item in an
inventory undergoes rigorous
scrutiny as described in Sec. II: content
validation by experts, pilot tests,
experimentation with distractors etc.
While we may find these two MCQs
intellectually satisfying to solve, they
do not qualify as candidates for an
inventory by a long shot.
Nevertheless our expertise should be
harnessed to construct a large
number of inventories on an array of
4q q q=
=-å( ) ( ) log ( )d
p i iS P P
-
29
topics in elementary, intermediate
and quantum physics. As discussed
in Sec. III there have been only
scattered attempts in this
direction. Perhaps a national
program should be launched and a
large number of workshops
conducted where experts in PER
could orient our teachers on the
methodologies , nuances and
pitfalls of making an inventory.
Parallel forms reliability implies
assessing a given inventory by
correlating it with similar tests. This
implies a judgment of what
constitutes other similar tests. Some
of the best inventories suffer from
the lacuna that there are no carefully
crafted similar tests. With a large
scale inventory making exercise one
can redress this issue of reliability.
One must also realize that India has
the advantage of not just a pool of
good teachers but of a far vaster
pool of students interested in science
coming from diverse linguistic,
cu l tura l and soc io -economic
backgrounds. It is imperative to
unders tand the i r a l ternat ive
conceptions and take remedial
steps. An inventory is an ideal tool.
The inventories could be translated
into regional languages. So far there
has been a lone attempt in this
direction [9,10]. The data set would
be large and the statistics would be
far more accurate than in the west.
The feedbacks and analyses would
be most informative making India a
vibrant hub of PER. The important
thing to realize is that we will be
honouring and gaining from extant
talent, both teachers and students.
Few activities are as empowering as
research and the fact that teachers
from remote corners of our land can
to make but more to grade. An
inventory on the other hand takes a
long time to make and less to
grade. In the Indian environment
where a large number of students
have to be evaluated the inventory
should be an indispensable tool.
This work was supported by the
Science Olympiad Programme
and the National Initiative on
Undergraduate Science (NIUS)
undertaken by the Homi Bhabha
Centre for Science Education -
Tata Institute of Fundamental
Research (HBCSE-TIFR), Mumbai,
India. I thank Profs. Arvind Kumar
and J. Ramadas for a critical
reading of the manuscript. I
thank Ms. Sarita Yadav, K. K.
Mashood and in particular
Praveen Pathak for many useful
discussions.
[1] D. P. Maloney, T. L. O'Kuma, C. J.
Hieggelke and A. V. Heuvelen,
“Surveying students' conceptual
knowledge of electricity and
magnetism”, Am. J. Phys., 69, S12-
S23 (2000).
[2] S . M c K a g a n , “ Q u a n t u m
mechanics conceptual survey”,
(2010), http://www.colorado.
e d u / p h y s i c s / e d u c a t i o n
Issues/QMCS/.
[3] D. Hestenes, M. Wells and G.
Swackhamer, “Force concept
inventory”, The Physics Teacher,
30, 141-158 (1992).
[4] B. S . Bloom, Taxonomy o f
Educational Objectives Handbook I:
Cognitive Domain, New York:
McKay, 1956.
[5] J. L. Mintzes, J. H. Wandersee and
J. D. Novan, eds., Assessing
S c i e n c e U n d e r s t a n d i n g : A
Acknowledgement
References
carry out collaborative research is a
prescription to invigorate science in
the nation.
Finally we draw attention to a
massive study by Bao et al. who
compared performances of Chinese
and American students on the FCI,
BEMA and LCTSR [37]. While the
performances on the scientific
reasoning test (LCTSR) were
comparable, in FCI and BEMA the
American students lagged behind
t h e C h i n e s e b y a b o u t f o r t y
percentage points! In FCI for
example the Chinese mean (523
students) was 85% while the
American mean (2681 students) was
only 49%. The authors concluded
that the large number of middle
and high school physics exercises
undertaken by Chinese students
was responsible for their superior
performance. There is a need to carry
out such surveys in our nation.
Each item of a well constructed
inventory thus acts as a sieve filtering
the correct concept from a host of
alternative concepts. To revert to the
title both the “grain” and the “chaff”
are important and often the “chaff'”
(alternative concept) provides richer
insights. On the other hand the
inventory which employs MCQs
maybe criticized as being a fast food
version of the evaluation process. A
free response or subjective question
or an essay allows more flexibility.
Substantive arguments can be made
for both modes. The counter
examples in Box IV are better suited
as subjective questions wherein the
student may at least get partial
credit. A “conservation of labour”
law operates in this domain.
Subjective questions take less time
VII. Conclusion
-
30
Human Constructivist View ,
Massachusetts, USA,: Elsevier
Academic Press, 1999.
[6] G. A. Miller, “The magical
number seven plus or minus two:
Some limits on our capacity for
i n f o r m a t i o n p r o c e s s i n g ” ,
Psychological Rev., 63, 81 (1956).
[7] R. L. Ebel, Essentials of Educational
Measurement, Englewoods Cliffs,
NJ: prentice Hall, 1972.
[8] G. A. Morris, L. Branum-Martin,
N. Harshman, S. D. Baker, E.
Mazur, S. Dutta, T. Mzoughi and
V. McCauley, “Testing the test:
Item response curves and test
quality”, Am. J. Phys., 74, 449-453
(2006).
[9] V. A. Singh and P. Pathak, “The
role of friction in rolling bodies:
Testing students' conceptions,
evaluating educational systems
and testing the test”, Proceedings
of the International Conference
to review Research in Science,
Technology and Mathematics
E d u c a t i o n ( E p i s t e m e 2 )
Conference, Mumbai, pp. 114-
115, McMillan India, 12-15 Feb.,
2007.
[10]V. A. Singh, P. Pathak and P.
Pandey, “An Entropic Measure
for the Teach ing-Learn ing
Process”,Physica-A, 388, 4453-
4458 (2009).
[11] One could choose for ability, the
final grades of the students in the
previous class or an average of
scores in previous physics tests.
[ 1 2 ] G . F. K u d e r a n d M . W.
Richardson, “The theory of the
estimation of test reliability”,
Psychometrica, 2, 151 (1937).
[13]J. C. Nunnally, Psychometric
Theory, New York: McGraw Hill,
1967.
[14] I. A. Halloun and D. Hestenes,
“The initial knowledge of college
[24]A. E. Lawson, “The development
and validation of a class room test
of formal reasoning”, J. Res. Sci.
Teach., 15, 11-24 (1978).
[25] K. C. Midkiff, T. Litzinger and D.
E v a n s , “ D e v e l o p m e n t o f
engineering thermodynamics
concept inventory instruments”,
i n F ro n t i e r s i n E d u c a t i o n stConference, 2001. 31 Annual,
vol. 2 , pp. F2A-F23 , IEEE,
2001.
[26]M. F. Simoni, M. E. Herniter and
B. A. Ferguson, “Concepts to
q u e s t i o n s : C r e a t i n g a n
electronics concept inventory
exam”, in Proceedings, ASEE
Annual Conference, 2004.
[27] P. Engelhardt and R. Beichner,
“Students' understanding of
direct current resistive electrical
circuits”, Am. J. Phys., 72, 98-115
(2004).
[28] J. C. Libarkin and S. W.
Anderson, “Assessment of
l e a r n i n g i n e n t r y - l e v e l
geoscience courses: results
from the geoscience concept
inventory”, Journal of Geoscience
Education, 53(4), 394-401 (2005).
[29] L. Ding, R. Chabay, B. Sherwood
and R. Beichner, “Evaluating an
electr ic i ty and magnetism
assessment tool: Brief electricity
and magnetism assessment”,
Phys. Rev. ST Phys. Educ. Res., 2,
010105 (2006).
[30] B. Helgeland and D. Rancour,
“Circuits concept inventory”,
http://www.foundationcoalition.
org/home/keycomponents/conce
pt/circuits.html, 2008.
[31] S. Sharma and P. K. Ahluwalia,
“ D i a g o n i s i n g a l t e r n a t i v e
conception of Fermi energy
a m o n g u n d e r g r a d u a t e
students”, Lat. Am. J. Phys. Edu.,
to appear, 2011.
physics students”, Am. J. Phys.,
53, 1043-1055 (1985).
[15] D. Hestenes and M. Wells , “A
mechanics baseline test”, The
Physics Teacher, 30, (1992).
[16] D. Huffman and P. Heller, “What
does the force concept inventory
actually measure?”, The Physics
Teacher, 33, 138-143(1995).
[17] P. Heller and D. Huffman,
“Interpreting the force concept
inventory: A reply to Hestenes
and Hallouin”, The Physics
Teacher, 33, 503-511(1995).
[18] N. S. Rebello and D. A. Zollman,
“The effect of distractors on
student performance in the force
concept inventory”, Am. J. Phys.,
72, 116-125 (2004).
[19]R. R. Hake, “Interactive-
engagement versus traditional
methods: A six thousand student
survey of mechanics test data for
introductory physics courses”,
Am. J. Phys., 66, 64-74 (1998).
[20] C. Crouch and E. Mazur, “Peer
i n s t r u c t i o n : Te n ye a r s o f
experience and results”, Am. J.
Phys., 69, 970-977 (2001).
[21] R. J. Beichner, “Testing student
interpretation of kinematics
graphs”, Am. J. Phys., 62, 750-762
(1994).
[22]L. G. Rimoldini and C. Singh,
“Student understanding of
rotational and rolling motion
concepts”, Physical Review Special
Topics – Physics Educat ion
Research, 1, 010102 (2005).
[23]R. K. Thornton and D. J. Sokoloff,
“Assessing student learning of
Newton's laws: The force and
motion conceptual evaluation of
active learning laboratory and
lecture curricula”, Am. J. Phys.,
66, 338-351 (1998).
-
[32] S. Sharma, O. K. S. Sastri and P. K.
A h l u w a l i a , “ D e s i g n o f
instructional object ives of
undergraduate solid state physics
course: a first step to physics
education research”, in AIP Conf.
Proc . , vol . 1263, pp. 171-
174(2010).
[ 3 3 ] C . S i n g h , “ S t u d e n t
understanding of quantum
mechanics”, Am. J. Phys., 69, 885-
895 (2001).
[36]E. Kim and S. Pak, “Students do
n o t o ve r c o m e c o n c e p t u a l
difficulties after solving 1000
traditional problems”, Am. J.
Phys.,70(7), 759-765 (2000).
[37]L. Bao, T. Cai, K. Koenig, K. Fang,
J. Han, J. Wang, Q. Liu, L. ding, L.
Cui, Y. Luo, Y. Wang, L. Li and N.
Wu, “Learning and Scientific
reasoning”, Science, 323, 586
(2009).
[34] C. M. Bordogna and E. V. Albano,
“Stimulation of social processes:
Application to social learning”,
PhysicaA, 329, 281-286 (2003).
[35] C. M. Bordogna and E. V. Albano,
“Theoretical description of
teaching-learning processes: A
multidisciplinary approach”,
P h y s . R e v. L e t t . , v o l . 8 7
(September, 2001).
31