8/18/2019 thesis 7_16
1/142
INFORMATION TO USERS

This was produced from a copy of a document sent to us for microfilming. While the most advanced technological means to photograph and reproduce this document have been used, the quality is heavily dependent upon the quality of the material submitted.

The following explanation of techniques is provided to help you understand markings or notations which may appear on this reproduction.

1. The sign or target for pages apparently lacking from the document photographed is Missing Page(s). If it was possible to obtain the missing page(s) or section, they are spliced into the film along with adjacent pages. This may have necessitated cutting through an image and duplicating adjacent pages to assure you of complete continuity.

2. When an image on the film is obliterated with a round black mark it is an indication that the film inspector noticed either blurred copy because of movement during exposure, or duplicate copy. Unless we meant to delete copyrighted materials that should not have been filmed, you will find a good image of the page in the adjacent frame.

3. When a map, drawing or chart, etc., is part of the material being photographed the photographer has followed a definite method in sectioning the material. It is customary to begin filming at the upper left hand corner of a large sheet and to continue from left to right in equal sections with small overlaps. If necessary, sectioning is continued again, beginning below the first row and continuing on until complete.

4. For any illustrations that cannot be reproduced satisfactorily by xerography, photographic prints can be purchased at additional cost and tipped into your xerographic copy. Requests can be made to our Dissertations Customer Services Department.

5. Some pages in any document may have indistinct print. In all cases we have filmed the best available copy.

University Microfilms International
300 N. Zeeb Road, Ann Arbor, MI 48106
18 Bedford Row, London WC1R 4EJ, England
-
8/18/2019 thesis 7_16
2/142
8012402

OEHMKE, THERESA MARIA

THE DEVELOPMENT AND VALIDATION OF A TESTING INSTRUMENT TO MEASURE PROBLEM SOLVING SKILLS OF CHILDREN IN GRADES FIVE THROUGH EIGHT

The University of Iowa, Ph.D., 1979

University Microfilms International
300 N. Zeeb Road, Ann Arbor, MI 48106
18 Bedford Row, London WC1R 4EJ, England

Copyright 1979 by OEHMKE, THERESA MARIA
All Rights Reserved
THE DEVELOPMENT AND VALIDATION OF A
TESTING INSTRUMENT TO MEASURE PROBLEM SOLVING SKILLS
OF CHILDREN IN GRADES FIVE THROUGH EIGHT
by
Theresa Maria Oehmke
A thesis submitted in partial fulfillment of the
requirements for the degree of Doctor of Philosophy
in Education in the Graduate College of
The University of Iowa

December, 1979

Thesis supervisor: Professor Harold L. Schoen
Graduate College
The University of Iowa
Iowa City, Iowa
CERTIFICATE OF APPROVAL

PH.D. THESIS

This is to certify that the Ph.D. thesis of

Theresa Maria Oehmke

has been approved by the Examining Committee for the thesis requirement for the Doctor of Philosophy degree in Education at the December, 1979 graduation.

Thesis committee: Thesis supervisor
Member
Member
Member
DEDICATION
To Bob and Jim
ii
ACKNOWLEDGEMENT
For their help in the preparation of this thesis, I owe an expression of appreciation to a large number of persons, only a few of whom I shall mention by name.

To Professor Harold Schoen I extend my thanks for all the guidance, motivation, direction and assistance he gave me from the initiation to the completion of this study.

Special thanks are due George Immerzeel, Joan Duea, Earl Ockenga and John Tarr and the personnel at the Malcolm Price Laboratory School for their encouragement and support during several phases of this investigation.

A debt of gratitude to Professors H. D. Hoover and A. N. Hieronymus is acknowledged for their assistance on some of the statistical details of the testing procedure and for providing me with the opportunity to collect some of the pertinent data in this study.

I would like to express my appreciation to William M. Smith for the use of his expertise on the technical aspects of the use of the computer in data processing.

Thanks go to Ada Burns for her amazing ability to type what I thought I had written.

Finally, I would like to thank my husband Bob, and son Jim, and all my friends who were always ready with an encouraging word.
iii
TABLE OF CONTENTS

Page

LIST OF TABLES vi
LIST OF FIGURES vii

CHAPTER

I. INTRODUCTION 1
     Purpose 3
     Overview of the Study 3
     IPSP Test Development 4
     IPSP Test Validation 5
     Reliability 6
     Content Validity 8
     Concurrent Validity 8
     Discriminant Validity 9
     Operational Definitions Used in This Study 10
     Overview 11

II. REVIEW OF THE LITERATURE 12
     Models of Problem Solving 13
     Testing the Ability to Solve Problems 19
     Summary 22

III. DEVELOPMENT AND PROCEDURES 23
     Purpose of the Iowa Problem Solving Project 23
     Development of the IPSP Test 24
     Design and Development of Interview Procedures 28
     The Quantification Scale 30
     Development 30
     Use of the Scale 34
     The Pilot Interview Study 40
     Procedures 40
     Pilot Sample 41
     Interview Problems 42
     The Interviews 43
     The Final Interview Study 44
     ITBS and the IPSP Test 45
     IPSP Subtest Discrimination 49
iv
CHAPTER Page

IV. DATA AND RESULTS 50
     IPSP Test and Interview Data 50
     Pilot Interview Study 51
     Final Interview Study 53
     ITBS and the IPSP Test 55
     IPSP Test Administration 55
     ITBS Subtests 60
     Correlations Between IPSP Subtests and ITBS Subtests 60
     IPSP Subtest Discrimination 62

V. ANALYSES AND IMPLICATIONS 72
     Summary and Conclusions 72
     Phase 1: The Final Interview Study 73
     Phase 2: ITBS and the IPSP Test 73
     Phase 3: IPSP Subtest Discrimination 74
     Limitations 74
     Classroom Implications 76
     Implications for Research 81
     IPSP Test 82

APPENDIX A. DESCRIPTION OF THE IOWA PROBLEM SOLVING PROJECT 85
APPENDIX B. PHASES OF IPSP TEST DEVELOPMENT AND SUMMARY OF TEST FORMS 94
APPENDIX C. INTERVIEW PROBLEMS USED IN THE PILOT STUDY 98
APPENDIX D. IOWA TEST OF BASIC SKILLS RELIABILITY ANALYSIS NATIONAL REPRESENTATIVE SAMPLE 106
APPENDIX E. FORMULA FOR ITBS CORRELATIONS CORRECTED FOR ATTENUATION 108
APPENDIX F. ANALYSIS OF IPSP TEST RESULTS OCTOBER 1978 AND MARCH 1979 ADMINISTRATIONS 111
APPENDIX G. PILOT TEACHING STUDY 120

BIBLIOGRAPHY 125
v
LIST OF TABLES

Table Page

1. Phases of Validation of the IPSP Test 27
2. Analysis of IPSP Test with Interviews, Pilot Study 52
3. Analysis of IPSP Test with Interviews, Final Study 54
4. Reliability Analysis 56
5. Reliability Analysis 57
6. Reliability Analysis 58
7. Reliability Analysis 59
8. Correlations Between IPSP Subtests and Iowa Test of Basic Skills Tests 61
9. Reliability Analysis Sample of Iowa Students October 1978 68
10. Reliability Analysis Sample of Iowa Students March 1979 69
11. Correlations Corrected for Attenuation of October 1978 IPSP Subtests 70
12. Iowa Test of Basic Skills Reliability Analysis National Representative Sample 107
vi
LIST OF FIGURES

Figure Page

1. Steps and Component Skills of the IPSP Test Model 5
2. Interviewing Procedures 31
3. Quantification Scheme for the Components of the IPSP Test Model Step 1: Getting to Know the Problem 35
4. Quantification Scheme for the Components of the IPSP Test Model Step 3: Carrying Out the Plan 36
5. Quantification Scheme for the Components of the IPSP Test Model Step 4: Looking Back 37
6. Schedule of Events for Pilot Interview Study 42
7. Schedule of Events for Final Interview Study 46
8. Square of Correlations Corrected for Attenuation Between IPSP Steps and ITBS Subtests Form 563, Grade 5 63
9. Square of Correlations Corrected for Attenuation Between IPSP Steps and ITBS Subtests Form 563, Grade 6 64
10. Square of Correlations Corrected for Attenuation Between IPSP Steps and ITBS Subtests Form 783, Grade 7 65
11. Square of Correlations Corrected for Attenuation Between IPSP Steps and ITBS Subtests Form 582, Grade 8 66
12. Ann's IPSP Test Profile 77
13. Dave's IPSP Test Profile 78
14. Schedule for Pilot Study 122
vii
1
CHAPTER I
INTRODUCTION
The question of what processes are involved in problem solving, and more particularly in mathematical problem solving, is one that has been investigated for many years. As one surveys the literature, one not only becomes aware of the vast quantity of research that is being done but also of the many and diverse methods used to study the problem solving processes.

Studies range from simply observing individual students as they solve problems to factor analysis of paper-pencil measures of problem solving. A data-gathering method that has come into prominent use today is the structured one-to-one interview. In such cases, the researcher observes the behavior of the subject as he "thinks aloud" while solving selected problems. Typically the interviews are audio- or video-taped and a protocol or checklist of processes is completed for each student-problem pair. Other approaches include simulating the problem solving processes of a human being on a computer and using the results to build a problem solving theory. In addition, results and methods of researchers investigating the cognitive processes involved in general problem solving often have application for mathematical problem solving.

In an attempt to analyze and thereby further understand the problem
solving process, researchers have proposed various multistep models. One of the earliest was proposed by Wallas (1926). Based on his own experiences and his analyses of what others think they are doing when they solve problems, Wallas suggested a four-step model: preparation, incubation, illumination, and verification. Drawing upon many years of experience as a mathematician and a teacher, Polya (1957, 1962) proposed another four-step model: understand the problem, make a plan, carry out the plan, and look back at the complete solution. Restle and Davis (1962) suggested that the problem solver goes through a number of independent but sequential stages. The student solves a subproblem at each stage, thereby allowing him to go on to the next step. These and other multistep models have appeared in the literature with varying degrees of supporting empirical evidence.

If problem solving is a multistep process, the question then arises: would it be possible to measure a person's ability (or skill) at each of the steps? If there were a reliable testing instrument which measured this ability, of what value would it be to the classroom teacher? Would the teacher be able to use such a test to help plan for problem solving instruction? Of course, the one-to-one interview strategy is available to evaluate a child's problem solving processes, but the teacher does not have the time to conduct interviews with all the students in the classroom. In addition, the interview has been criticized for yielding results which are subjective and unreliable.

There is little in the literature concerning paper-pencil testing instruments which purport to measure problem solving processes. Many existing group administered tests contain problem solving or
3

application sections which consist of verbal problems. For example, the Iowa Test of Basic Skills (Houghton Mifflin Co., 19 l b) and the Metropolitan Achievement Tests (Harcourt, Brace and Jovanovich, 1971) contain subtests designed to measure problem solving ability. However, the information these tests provide, namely, the grade-level equivalent or percentile rank of the student and class in a larger group, is derived from the number of correct answers. No attempt is made to measure the process that was used to arrive at the solution or to identify specific skills which may be the source of the child's difficulties.

Purpose

As part of the Iowa Problem Solving Project (IPSP),* Schoen and Oehmke developed a multiple choice paper-pencil test designed to provide individual and class profiles which illustrate the performance of fifth through eighth graders at each of three steps of the problem solving process. A modified version of the four-step problem solving process model proposed by Polya served as the IPSP testing model.

The purpose of this study is the validation of this testing instrument, called the Iowa Problem Solving Project Test (IPSP test).

Overview of the Study

The steps in the validation process are listed below.

1. Compute estimates of the reliability coefficients for the IPSP test and its three subtests.

*A three-year project directed by George Immerzeel of the University of Northern Iowa and funded under ESEA, Title IV, C.
4

2. Judge the content validity of the IPSP test using the judgments of mathematics educators and educational measurement specialists.

3. Judge the concurrent validity by measuring the relationship between the IPSP subtest scores and a) the results of a think aloud measurement procedure, and b) the mathematics concepts, mathematics problem solving, reading comprehension, and graph skills subtest scores of the Iowa Test of Basic Skills.

4. Judge the discriminant validity of the IPSP test by analyzing a matrix of correlation coefficients, corrected for attenuation, between pairs of the three IPSP subtests and the four ITBS subtests.
IPSP Test Development

Several "non-standard" instruments have been developed to test problem solving ability (Wearne, 1976; Zalewski, 1975; Lucas, 1972). These will be described in Chapter two. However, the IPSP test appears to be the first attempt to measure skills in a multistep model. A multiple choice format was chosen in order to maximize the potential for broad, long-range impact on classroom practices. It was reasoned that a) standardized machine-scored tests presently have a great influence on curriculum and instruction, and b) in this format the IPSP test may have some influence on standardized testing. Furthermore, the machine scoring capability is likely to increase the number of potential users. After a good deal of "brainstorming" with members of the IPSP team, a search of the problem solving literature and a detailed analysis of each stage, the testing model as shown in Figure 1 was developed.
5

1. Get to Know the Problem
   A. Determine insufficient information
   B. Identify extraneous information
   C. Write a question for the problem setting

2. Choose What to Do from a List of Strategies

3. Do It
   A. Choose the necessary computation
   B. Estimate from a diagram
   C. Compute from a diagram
   D. Use a table
   E. Compute from an equation

4. Look Back
   A. Identify problems that can be solved in the same way as a given one
   B. Vary conditions in a given problem
   C. Check a solution with the conditions of the problem

Figure 1. Steps and Component Skills of the IPSP Test Model
Like standardized tests, the IPSP test can be efficiently administered to large groups of students and machine scored, with various norm data easily obtainable. In addition, subtest scores corresponding to steps 1, 3, and 4 can be obtained. After nearly two years of effort, no viable way to test skills in step 2 in a multiple-choice format was found.

IPSP Test Validation

As an innovative measurement instrument containing many items with untested structural characteristics and designed to measure constructs never before measured with a paper-pencil instrument, the IPSP test development called for much careful planning and formative evaluation.
6

Major questions concerned the validity of the test, if indeed a reliable test with reliable subtests could be constructed. By utilizing the Iowa Testing Program's tryout facilities it was possible to construct experimental test units, administer them to representative samples of Iowa fifth through eighth graders, and revise the units based on the item analyses and test data. Also, over 100 students were interviewed at various stages in the test development process as a concurrent check on the test validity. Over a three-year period the test evolved into its present form.

For the present study, estimates of reliability and of several types of validity of the final form of the IPSP test were obtained: (1) content validity, (2) concurrent validity, and (3) discriminant validity. A general description of the procedures for each of the above areas follows. A detailed description of the procedures and findings is in later chapters.
Reliability

The concept of reliability refers to an estimation of the degree of consistency of measurement. Theoretically, if a test is reliable, an individual should obtain the same, or almost the same, score on repeated administrations. Many factors may affect a student's observed score to make it different from the theoretically true score. These "errors" of measurement may be caused by the test itself, i.e., a particular sample of items may be a "good" or "bad" sample; by attributes of the person taking the test, i.e., motivation, attitude, health, test-wiseness; or by administrative conditions and procedures at the time of
7

the test, i.e., noise, distractions, poor lighting, or lack of uniformity in giving test instructions. The most difficult factors to control are the subject's attributes.

Reliability of a score for a testing sample is defined as the ratio of the variance of the true score to the variance of the observed score:

    r_tt = s_t^2 / s_x^2

where r_tt = reliability of the test, s_t^2 = variance of the true score, and s_x^2 = variance of the obtained score. Reliability is also referred to in the literature as a correlation between scores on the same test or parallel tests.
One of the most commonly used methods of estimating reliability is the Cronbach alpha. If the data are in dichotomous form, this estimate is equivalent to the Kuder-Richardson 20 coefficient. It may be calculated by the following formula:

    alpha = [k / (k - 1)] * [1 - (k * s_i^2) / s_x^2]

where s_x^2 is the variance of the sum over the k items and s_i^2 is the average item variance. An advantage of this coefficient is that it takes into account all ways a given test might be split and then gives a "best estimate" reliability. In this study, reliability coefficients were estimated using either the Cronbach alpha formula or a modified version of the KR-8, which is a formula in the series of 20 developed by Kuder and Richardson (1937).
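The alpha computation described above can be sketched in a few lines. This is a minimal illustration, not the thesis' actual analysis: the item-response matrix is hypothetical, and numpy is assumed to be available.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for an (examinees x items) score matrix.
    With dichotomous 0/1 items this equals the KR-20 coefficient."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]                         # number of items
    item_vars = x.var(axis=0, ddof=1)      # variance of each item
    total_var = x.sum(axis=1).var(ddof=1)  # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical right/wrong (1/0) responses: 6 examinees by 4 items
scores = [[1, 1, 1, 0],
          [1, 0, 1, 0],
          [0, 0, 1, 0],
          [1, 1, 1, 1],
          [0, 0, 0, 0],
          [1, 1, 0, 1]]
print(round(cronbach_alpha(scores), 3))
```

Because the items here are dichotomous, the same value would be obtained from the KR-20 formula.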
8
Content Validity

The basic content validity question is: Are the items in question a representative sample of the construct or subject matter domain to be measured? A familiar example of a need for high content validity is the typical classroom test. Here the individual's score on a sample of items from a content domain is used to infer the student's achievement level in the entire domain. Thus the assessment of content validity involves careful judgment of the degree to which the selection of items is appropriate and representative of the domain or construct to be tested.

Content validity is usually assessed by the judgment of experts, that is, subject matter and testing specialists. A limitation of this method is that there are no clearly specified techniques or standards for determining content validity.

The reaction of educational testing experts and mathematics educators to the IPSP testing model and sample test units was sought. The testing experts were two faculty members in Educational Measurement at The University of Iowa who also are authors of the Iowa Test of Basic Skills. The mathematics educators were the IPSP project team and advisory board members, both composed of mathematics teachers in grades five through graduate mathematics education.
Concurrent Validity

Concurrent validity refers to the degree to which a test correlates with an independent measure. It is the determining factor in deciding whether a test can replace procedures that are either more
elaborate or require special techniques, as in the case of the IPSP test and the one-to-one interview method. One way of demonstrating a test's concurrent validity is to compare the test scores with some independent measure which is presumed to measure the behavior in question.

In this study data from interviews and from the IPSP test were gathered concurrently. The interview measure consisted of presenting a series of problems to a child and asking him to think aloud as he attempted to solve each problem. The data were then obtained by coding and analyzing the solution strategies from observations, audio-tapes, and the subjects' written work. Thus one estimate of the concurrent validity of the IPSP test is the correlation between the score on each of the three IPSP subtests and corresponding data collected via the think aloud procedure.

Another estimate of concurrent validity was derived from the IPSP test's relationship to several standardized achievement measures. The criteria selected were four Iowa Test of Basic Skills subtests: mathematics concepts, mathematics problem solving, reading comprehension, and graph skills.
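Each of these validity estimates comes down to correlating two sets of paired scores; the standard choice for such data is the Pearson product-moment coefficient. A minimal sketch, with invented numbers standing in for the real subtest and interview data:

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical paired scores: an IPSP subtest vs. coded think-aloud data
ipsp_scores = [12, 9, 15, 7, 11, 14]
interview_scores = [8, 6, 9, 5, 7, 9]
print(round(pearson_r(ipsp_scores, interview_scores), 3))
```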
Discriminant Validity

Campbell and Fiske (1959) state that validation usually proceeds by convergent methods, i.e., independent measurement techniques designed to measure the same trait. However, for purposes of test interpretation, discriminant validity is also required; that is, not only should a test be highly correlated with other tests purporting to measure the same trait, but it should not correlate highly with tests that measure distinctly different traits. In the case of the IPSP test, the
10

discriminant validity issue refers to the degree to which the subtest scores differ from each other and from scores on other similar tests.

Discriminant validity in this study is approached through the use of matrices of correlations corrected for attenuation. Intercorrelated variables include steps 1, 3, and 4 of the IPSP test and the aforementioned subtests of the ITBS. In particular, the three IPSP subtests should not be highly correlated with each other nor with the ITBS subtests.
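The attenuation correction behind these matrices is presumably the classical Spearman formula: the observed correlation is divided by the square root of the product of the two measures' reliabilities. A small sketch with hypothetical values (the thesis' own formula and data appear in Appendix E and Chapter IV):

```python
from math import sqrt

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Classical disattenuation: estimate the correlation between true
    scores from an observed correlation and the two reliabilities."""
    return r_xy / sqrt(r_xx * r_yy)

# Hypothetical: observed correlation .45 between two subtests whose
# reliability estimates are .80 and .75
print(round(correct_for_attenuation(0.45, 0.80, 0.75), 3))
```

A corrected value near 1.0 would suggest two subtests measure essentially the same trait, which is exactly what the discriminant-validity analysis checks against.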
Operational Definitions Used in This Study

1. Problem solving
   To search consciously for some action appropriate to attaining a clearly conceived, but not immediately attainable aim. To solve a problem means to find such action (Polya, 1962).

2. Cognitive processes
   Actions of cognitive performance such as perceiving, remembering, thinking, and desiring that depend on the subject's performance capacities. These processes are not directly observable but are presumed to underlie a person's behavior when faced with a cognitive task.

3. Four-step problem solving test model
   For purposes of this study Figure 1 defines the model. Demonstration of the skills in each of steps 1, 3, and 4 is taken as evidence of a student's ability in that step.

4. Problem solving processes
   a) the cognitive processes presumed to underlie or guide the
11
subject's choices of solution strategies;
   b) the observable actions or operations the subject actually adopts to attempt to solve the problem. Also called problem solving solution strategies. It should be clear from the context which meaning applies.
Overview

Previous approaches to problem solving research and attempts to measure problem solving are discussed in Chapter II, Review of the Literature. Chapter III includes a discussion of the theoretical framework and design of the IPSP test, procedures used to develop the verbal problems, scoring methods for the think aloud interviews, and data gathering procedures. The results are discussed in Chapter IV. A summary of the validation results, implications for future research, and implications for the teaching of mathematical problem solving are discussed in Chapter V.
12
CHAPTER II
REVIEW OF THE LITERATURE
Since the purpose of this paper is the development and validation of an instrument to test mathematical problem solving, first, various models that mathematical educators, mathematicians, psychologists, and other researchers have used to study the problem solving processes will be discussed, and second, testing instruments developed by other researchers to measure problem solving processes will be summarized. The reader who is interested in a more detailed discussion of heuristics, strategies, task analysis, structures and other pertinent factors involved in the problem solving processes is referred to one or more of the following reviews.

A collection of papers edited by Kleinmuntz (1966) includes discussions of particular aspects of problem solving, research and theory. Davis (1966) gives a comprehensive survey of research and theory relative to traditional learning, cognitive and Gestalt approaches to problem solving, and computer and mathematical models of problem solving.

Kilpatrick (1967) presents an extensive discussion of the definition of heuristics. He describes in detail the notion of problem solving as a search process, the thinking aloud technique, and other methods used to study problem solving processes. He also develops a coding system for transcribing the problem solving protocol of subjects
13
from audio-tape to paper as they use the think aloud technique.

Hollander's (1973) review focuses on studies related to word problems for students in grades three through eight. The review includes studies carried out from 1922 to 1969 in seven categories: problem analysis, computation, general reading ability, specific reading skills, specificity of the problem statement, the problem situation, and language factors.

Webb (1975) notes that

   ...Because of the complexity of problem solving processes and the number of variables associated with problem solving, research in this area has been too diverse to have any real consolidation...(p. 1)

His review focuses on studies that involve problem solving tasks and strategies. These studies were conducted from 1967 to 1973 and involved students in grades three through eight.

Lucas (1972) discusses the nature of problem solving, the search mode used by information processors, some formal models of the problem solving process and some techniques used by earlier researchers to externalize thought processes.

Models of Problem Solving

The attempt to describe the thought processes used in mathematical problem solving is not a new quest. For many years mathematicians have sought to determine and understand the thought processes they use in discovering new mathematics. Some accounts discuss in great detail the individual thought processes used in formulating and proving mathematical conjectures.
14

The mathematician Henri Poincare (1914), in attempting "to see what happens in the very soul of the mathematician," gives an explicit account of his personal recollection of the discovery of the Fuchsian functions. After his discussion he states:

   ...As regards my other researches, the account I should give would be exactly similar, and the observations related by other mathematicians in the inquiry of L'Enseignement Mathematique* would only confirm them.
   One is at once struck by these appearances of sudden illumination, obvious indications of a long course of previous unconscious work. The role of this unconscious work in mathematical discovery seems to me indisputable, and we shall find traces of it in other cases where it is less evident...(p. 55).

"The appearances of sudden illumination" recounted by Poincare were also cited by Gauss in referring to a theorem on which he had worked for years. He states, "Like a sudden flash of lightning the riddle happened to be solved."

"Sudden illumination" is the third stage of the psychologist Wallas' (1926) model. In an attempt to analyze and thereby further understand the problem solving process, Wallas observed accounts of thought processes related to him by his students, colleagues and friends. It was the account of the German physicist Helmholz that inspired Wallas to describe three stages in the formation of new thought:

preparation: the first stage during which the problem is investigated and all the facts are gathered,

incubation: the second stage during which one rests from any conscious thought about the problem at hand and/or consciously

*A review which instituted an inquiry into the habits of mind and methods of work of mathematicians during the early 20th century.
15
thinks of another problem,

illumination: the third stage during which the idea and/or solution appears as a 'flash' or 'aha'.

Wallas added a fourth stage, verification, during which the validity of the idea is tested, which Helmholz did not describe but which Poincare vividly describes in his accounts.
Polya (1957, 1962) also developed a four-step model for problem solving. His principal aim in advocating a heuristic approach to problem solving was to enable the student to ask questions which focus on the essence of the problem. Drawing upon many years of experience as a mathematician and a teacher of mathematics, he describes the following four-step model:

(1) Understanding the problem: The student tries to understand the problem by looking at the data and asking the questions, Is it possible to satisfy the conditions of the problem?, Is there redundant or insufficient data?

(2) Devising a plan: The student tries to find a connection between the data and the unknown; the student should eventually choose a plan or strategy for the solution.

(3) Carrying out the plan: The student carries out the plan, checking each step along the way.

(4) Looking back: The student examines the solution by checking the results and/or arguments. The student also attempts to relate the method or result to other problems.

In a discussion of Polya's model, Mayer (1977) points out that some of Polya's ideas (restating the given and the goal) are examples of the
16
Gestalt idea of "restructuring." He concludes that:
...while Polya gives many excellent intuitions
about how the restructuring event occurs and how
to encourage it , the concept is still a vague one
that has not been experimentally we ll studied...
(p. 67)
The earlier mathematicians, as well as Wallas and Polya, based their accounts of the problem solving process on either introspection or retrospection. That is, the problem solvers reported on their thought processes as they worked (introspection) or they recalled these thought processes after they had completed the problem (retrospection). Kilpatrick (1967) finds difficulties with both of these methods. Of introspection he states, "but psychologists soon were raising questions about the nature and magnitude of the distortion introduced by requiring the subject to observe himself thinking." His criticism of retrospection refers to Bloom and Broder (1950), who stated that the difficulties with retrospection lie in remembering all the steps in one's thought processes, including errors and blind alleys, and in reproducing these steps without rearranging them into a more coherent, logical order.
It appears that Claparede (1917; 1934) was the first to use a third approach, the "think aloud" technique (Kilpatrick, 1967). This technique does not require the subjects to think and observe themselves thinking at the same time. The subjects are asked to vocalize their thought processes as they are thinking. Hence the subjects do not have to analyze their thought processes, nor are they required to have special training. There are, however, potential difficulties with the think aloud technique: interference of speech with thinking, the lapse into silence when
the subject is deeply engrossed in thought, and the essential difference between the verbalized solution and the one found silently. Kilpatrick (1967) summarizes the views of several authors (Rota, 1966; Brunk, Collister, Swift and Slayton, 1958; Gagne and Smith, 1962; Dansereau and Gugg, 1966) concerning these difficulties and concludes that:

...The method of thinking aloud has the special virtues of being both productive and easy to use. If the subject understands what is wanted—that he is not only to solve the problem but also to tell how he goes about finding a solution—and if the method is used with an awareness of its limitation, then one can obtain detailed information about thought processes... (p. 8).
One of the first attempts to systematically gather empirical evidence was by Duncker (1945), who studied the problem solving protocols of subjects who were given a problem and asked to "think aloud." Two of the problems that he used were the tumor problem:

...Given a human being with an inoperable stomach tumor, and rays which destroy organic tissues at sufficient intensity, by what procedure can one free him of the tumor by these rays and at the same time avoid destroying the healthy tissue which surrounds it?... (p. 1).
and the 13 problem:

...Why are all six-place numbers of the form 276,276; 591,591; 112,112 divisible by 13?... (p. 31).
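The arithmetic behind the 13 problem is worth making explicit: any six-place number of this form is its three-digit block times 1001, and 1001 = 7 × 11 × 13. A quick check of that observation (illustrative only; not part of Duncker's protocol):

```python
# A six-place number of the form abc,abc equals abc * 1001,
# and 1001 = 7 * 11 * 13, so every such number is divisible by 13.
def repeated_form(block: int) -> int:
    """Build e.g. 276276 from the three-digit block 276."""
    return block * 1000 + block

assert 7 * 11 * 13 == 1001
# Every number of this form is divisible by 13:
assert all(repeated_form(b) % 13 == 0 for b in range(100, 1000))
```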
Duncker illustrated a typical solution protocol for the tumor problem with a flow chart and observed that the problem solving process starts from a general solution, then progresses to a functional solution, and then to a specific solution.

In a more recent attempt to gather data empirically, Restle and Davis (1962) developed a model which describes the subject as going
through sequential stages when solving a problem. Each stage is a subproblem with its own subgoal. Thus, the individual solves a sequence of subproblems which then enables him to continue on to the next stage. The model states that the number of stages, k, for any given problem can be determined by the square of the average time to solution, t, divided by the square of the standard deviation, s, of the time to solution, or k = t²/s². They do not describe the stages, and assume that each is of equal difficulty.
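Under the Restle and Davis model, k can be estimated directly from a sample of solution times. A minimal sketch (the function name and the choice of the sample standard deviation are mine, not theirs):

```python
from statistics import mean, stdev

def estimated_stages(solution_times):
    """Restle and Davis (1962): k = t^2 / s^2, where t is the mean
    time to solution and s is its standard deviation."""
    t = mean(solution_times)
    s = stdev(solution_times)  # sample standard deviation
    return (t * t) / (s * s)
```

With solution times of 10, 12, 8, and 10 minutes, for instance, t = 10 and s² = 8/3, giving an estimate of 37.5 stages.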
Simon (1975) and his associates have also investigated thought processes used in problem solving. He used laboratory conditions to observe human beings working on well-structured problems that the subjects find difficult but not unsolvable. He states the following broad characteristics of the information processing system which he uses to describe human problem solving:

(1) serial processing, one process at a time;

(2) small short term memory capacity for only a few symbols; and

(3) infinite long term memory with fact retrieval but slow storage.

Simon further states that the solver always appears to search sequentially, adding small successive accretions to his store of information about the problem and its solution.
These models and other multistep models (e.g., Dewey, 1933; Johnson, 1955; Gagne, 1962; Guilford, 1967; Kilpatrick, 1969; Post, 1968; Webb, 1974, 1977; Lester, 1975) have appeared in the literature with varying degrees of supporting empirical evidence. Such models suggest a number of questions about teaching and measuring problem solving skills. If problem solving is a multistep
process, would it be possible to measure a person's ability at each step? Would this information be more useful to a teacher than just a single number-right, percentile, or grade level equivalent score? How can this type of evaluation be effected? Can paper-pencil tests be used, or is a one-to-one interview assessment necessary? A paper and pencil test can be a very accurate and efficient evaluation instrument, especially in the case of easily measured skills. However, the complex process of problem solving is more difficult to evaluate.
Testing the Ability to Solve Problems
In contrast to both the wealth and diversity of research in the problem solving process, there is a scarcity of instruments to measure these processes. Johnson (1961) cites the dearth of instruments to measure the problem solving process in the National Council of Teachers of Mathematics yearbook on Evaluation in Mathematics:

...The committee would have liked to include material on the appraisal of mental processes in the learning of mathematics. As teachers of mathematics, we are deeply concerned about developing skill in productive thinking. Too often, many of us find ourselves knowing little about the relations between the solutions given by our students and the thought processes that led to these solutions. However, tests for appraising higher mental processes such as concept formation, problem solving, and creative thinking in mathematics do not exist... (pp. 2-3).
Recently, efforts have been made to develop instruments that measure problem solving processes. Speedie, Treffinger, and Feldhusen (1973) summarized what they and other authors (Ray, 1955; John, 1954; Keisler, 1969) view as characteristics of a good problem solving process test. Three of these characteristics are:
1. The test should yield a variety of continuous measures concerning the outcomes of the problem solving, the processes, and the intellectual skills involved.

2. The test should demonstrate both reliability and validity.

3. The test should be practical for group administration.
Many standardized group-administered tests, such as the Iowa Tests of Basic Skills and the Metropolitan Achievement Tests, contain subtests designed to measure problem solving ability. This measurement is obtained from a single score, i.e., the number of correct answers. No attempt is made to decompose either the verbal problem itself, or the solution, into component skills so that one can identify the specific skill which may be the source of the student's difficulty.
Proudfit (1978) cites other limitations of standardized tests. She lists eight tests that were examined by Charles and Moses (Webb & Moses, 1977), who concluded that most of the items did not measure problem solving processes but emphasized application of either previously learned skills or algorithms. In addition, Proudfit examined other problem solving instruments, e.g., the Purdue Elementary Problem Solving Inventory, and concluded that they did not meet the previously stated criteria on at least one of the three counts.
Post (1968), using a four step model (recognition, analysis, production, verification), designed a problem solving test which is in a multiple-choice format. However, the scores on the test are measures of the end product and not of any particular step in his model. His content validation procedure used the judgment of a panel of experts, and his split-half reliabilities ranged from .60 to .82.
Foster (1972) developed a problem solving test which had a multiple-choice format for some items and an open-ended format for other items. Some of the items are designed to measure process more than product, but the test is not machine scorable.

Hiatt (1970) discusses the need for measuring mathematical problem solving processes. He designed a "creative problem solving" test in which points are awarded on the basis of approaches used, e.g., more points are awarded if the student does some computation mentally. Since this test is in an open-ended format, it would be difficult to administer to large groups.
Wearne (1977) developed a test of problem solving behavior which provides information about the child's mastery of the prerequisite mathematical concepts in the problems. Each item, called a super-item, is composed of three parts: a comprehension question, an application question, and a problem solving question. The comprehension question assesses the child's understanding of the problem setting. The application question assesses the child's understanding of the concept which is presumed to be a prerequisite to solving the problem, and finally, the problem is solved. However, it was difficult to substantiate the assertion that the comprehension and application items are assessing prerequisites of the problem solving items.
Proudfit (1978) has designed a problem solving process test in which the child is given three problems, asked to select one, solve it, and then answer a list of 12 questions before going on to the next problem. The questions refer to the child's solution processes. Two of the questions are open-ended and the other 10 are in a multiple-choice
format. The test was administered to 100 fifth grade students but no formal reliability or validity data were reported.

Zalewski (1974) developed a set of verbal problems which he administered to a group of seventh graders. His purpose was to predict the results of problem solving process assessments, using the think aloud procedure and coding scheme, from his paper and pencil test. For the interview test he used Lucas' point system, which gave 1 point for 'Approach', 2 points for 'Plan', and 2 points for 'Result'. The written test was scored using the number of correct answers. The correlation between the rankings on the written tests and the interviews was .68, below the criterion Zalewski had set prior to his study.
Summary

Many researchers, including psychologists, mathematicians and educators, have investigated the problem solving processes using multistep models. Their consensus is that the multistep model is a valid model for the investigation of the problem solving processes. In addition, not only is there a scarcity of instruments to evaluate these models, but also a scarcity of instruments to measure problem solving processes in a machine scorable format. The IPSP test is an instrument developed to measure problem solving skills in a machine scorable format. The validation of this test is the purpose of this study.
CHAPTER III
DEVELOPMENT AND PROCEDURES
Purpose of the Iowa Problem Solving Project

The Iowa Problem Solving Project (IPSP) is a three year project directed by George Immerzeel of the University of Northern Iowa and funded under ESEA, Title IV-C. Its primary purpose is to develop, evaluate, and disseminate materials to improve the mathematical problem solving abilities of students in grades five through eight.
The approach advocated by the IPSP staff involves both the teaching of specific solution strategies and the solving of many interesting verbal problems with increasing difficulty levels. Eight instructional modules have been developed to help the student build a variety of skills and strategies necessary to successfully solve verbal problems. Each module consists of a booklet and a card deck. The booklet provides instructional activities aimed at developing a particular skill, while the 100-problem card deck provides practice in solving problems of varying difficulty levels which are especially designed for that skill. Two IPSP members (Schoen and Oehmke) were assigned the task of developing a measurement instrument based on the IPSP problem solving model. A more complete description of the IPSP proposal, modules, and teaching strategies is given in Appendix A.
Development of the IPSP Test
After several meetings with the IPSP team and advisory board to discuss the purpose, goals and philosophy of the IPSP test, Schoen and Oehmke began the task of developing a problem solving process test. The testing model which was used, after several revisions, is given in Chapter I (Figure 1). It should be noted that while there were frequent consultations among the staff, the IPSP test and the IPSP modules were designed and developed independently; the modules were developed at the University of Northern Iowa and the testing instrument was developed at The University of Iowa.
Three of the more important constraints that were imposed on the testing instruments were: (1) the format should be multiple-choice so that the test can be machine scored, (2) the items should measure problem solving subskills and not just the ability to get a final answer, and (3) the test should be based on the IPSP testing model as developed by the IPSP team.
A search of the literature was completed to locate instruments which measure problem solving processes, and any items which were found were classified according to the IPSP testing model. Instruments were found which were in an open-ended or multiple-choice format or some combination of both (Kilpatrick, 1967; Post, 1968; Hiatt, 1970; Foster, 1972; Hollander, 1973; James, 1973; Zalewski, 1974; Wearne, 1976; Krutetskii, 1976). In many cases it was difficult to classify the items according to the specific subskills of the IPSP testing model. Those items that seemed to be most amenable to revision were selected and rewritten to conform to a specific step of the model. In addition, many
items were written to test subskills in each category. The objective was to build a large item bank to be used during the formative period. Thus, valid items which satisfied item analysis and reliability criteria in tryouts could be selected for inclusion in the IPSP test.

A first draft of the IPSP test was examined by two authors of the Iowa Tests of Basic Skills (ITBS) at The University of Iowa, by the IPSP team, and by the IPSP Advisory Board. The consensus was that the items did measure the subskills in the IPSP testing model. However, there was also a consensus that the items were "too wordy" and tended to be "cute." These suggestions, as well as those of several doctoral students in mathematics education, were incorporated into several revisions of the first draft.
Each revised, open-ended item was then typed on a 3" x 5" card and presented to twenty-nine students in individual interviews. These students were in grades five through eight and were selected by their teachers as representing a cross section of each of their classes. At the beginning of each interview the student was told that these problems might be different from any they had seen before, that s/he was not being tested, but that his/her thoughts and suggestions as s/he was working the problems were needed to improve the test items. The student read each problem aloud, then talked aloud while solving the problem. Paper and pencil were available if the student chose to use them.

The primary purpose of these interviews was to get the reaction of the target audience to the items and to gather ideas for distractors. It was the interviewers' intent to be passive but still attempt to elicit as much information as possible from the students. Generally,
the students were extremely cooperative. Two of the brightest eighth graders not only solved the "hardest" problems immediately, but analyzed the items in great detail. Their comments were very informative, and some of their suggestions were used as foils for the next revision. The interviews were recorded on audio tape and analyzed according to reading difficulties, concept difficulties, strategies used, computational errors, and answers given. Using information obtained from these interviews, the first draft of the IPSP test was revised.
Experimental units of multiple-choice items were developed and administered at three different times, with revisions following each tryout. The last tryout prior to the project evaluation was administered to representative samples of Iowa students in grades five through eight.
Appendix B contains a more complete description of the phases in the development of the IPSP test along with a summary of formative data. The study reported here focuses on the validation, rather than the development, of the IPSP test. This validation was completed in three main phases: determining the relationship between the IPSP test and data gathered from individual interviews, determining the relationship between the IPSP subtests and several ITBS scales, and determining the relationships between the IPSP subtests. The first phase is an assessment of concurrent validity, and the last two are forms of discriminant validation using the Campbell and Fiske (1959) terminology. During the validation several different forms of the IPSP test were used. These are numbered and described in Table 1.

Phase one of the validation process was considered to be the most important and the least straightforward. Thus, most of the effort, as
Table 1

Phases of Validation of the IPSP Test

Test Date and Sample Tested        Form   Grade Level   Items: Total (Subtest)   Phase of Validation

December, 1977                     561    5,6           40(12,4,12,12)           Pilot Interview Study; Pretest
Malcolm Price Lab School           562    5,6           40(12,4,12,12)           Pilot Interview Study; Posttest
                                   781    7,8           40(12,4,12,12)           Pilot Interview Study; Pretest
                                   782    7,8           40(12,4,12,12)           Pilot Interview Study; Posttest

January, 1978                      563    5,6           20(6,2,6,6)              Final Tryout Series
Representative Sample of           564    5,6           20(6,2,6,5)              Final Tryout Series
Iowa Students                      581    5,6,7,8       20(6,2,6,6)              Final Tryout Series
                                   582    5,6,7,8       20(6,2,5,7)              Final Tryout Series
                                   783    7,8           20(6,2,6,6)              Final Tryout Series
                                   784    7,8           20(4,2,7,7)              Final Tryout Series

May, 1978                          561                  40(12,4,12,12)           Final Interview Study
Two Fifth Grade Classes from an
Iowa City Elementary School

October, 1978                      565    5,6           30(10,0,10,10)           Summative Evaluation of
One Hundred Fifth and Sixth        785    7,8           30(10,0,10,10)           IPSP Project: Pretest
Grade Classes from Iowa Schools
well as the emphasis in this report, was placed on the relationship between the interview data and IPSP test scores.
Design and Development of Interview Procedures
A preliminary list of interviewing procedures was developed using the experiences that were gained during the first round of interviews described in the previous section. The purpose of the later interviews was to discover the strategies the child uses to solve verbal problems. Hence it was decided that the interviewer should not lead the child into selecting a particular heuristic. Any questions asked by the interviewer to elicit more information should not lead the child. There were also instances of nonverbal leading that occurred during the trial interviews. For example, a child unsure of which operation to use would say, "I think I should multiply?" and then look at the interviewer's face to get some sort of reaction. Whether the child in fact multiplied or performed some other operation depended on the reaction of the interviewer. Consequently, a major reason for developing a set of written interview procedures was to minimize the interviewer's influence on the student's problem solving processes.
A first draft of the procedure was brief and contained such instructions to the interviewer as: Encourage the student to vocalize his thinking as much as possible during the sample warm-up time; Try to record something on tape; Don't go any longer than 15 seconds without recording something on tape. After the list was developed it was tested with two volunteer fifth and sixth graders. Revisions were made, and these were tried with two volunteer students, a sixth and a
seventh grader. In addition, two doctoral students in mathematics education tried the procedure with several of their students. Some of the suggestions that were incorporated into a revised edition were: Supply a pencil with the eraser cut off to prevent erasures; Rather than giving the student one or more sheets of paper for all the problems, put one problem on each sheet so that problem and computations are together; Tell the students that you will not let them know whether their solution is correct.
A difficulty that occurred with some frequency was that of the student becoming silent while using paper and pencil, either to do arithmetic computations or while apparently thinking. If the interviewer interrupted and asked the child to tell what s/he was thinking, the reply would be, "I'll tell you as soon as I'm finished," or "Just a minute, I'm thinking." This difficulty was handled in several ways. If the child was in deep thought working on the problem, the interviewer would ask a question about the overt observable behavior of the child, e.g., "Are you doing some multiplication now?" or "Are you adding now?", when it was obvious that the child was doing that particular computation using paper and pencil. The child usually mumbled "yes" or shook his or her head and went on with the arithmetic. This strategy was used as an indicator on the tape to let the coder know what the student was doing during that time.
A second strategy that worked very well was to make a comment similar to the following: "That's fine, you're telling me what you're thinking"; "You're doing fine, Sue, you're telling me what I want to know"; "Can you put into words what you're thinking?" Such statements
seemed to encourage the child to vocalize and give more details, yet did not appear to lead the students into using a specific strategy. The final form of the list of interview procedures was the result of five tryouts and revisions. Figure 2 shows the final form used in this validation study.
The Quantification Scale
Development
Since one of the goals of this study was to assess the relationship between the IPSP test and the results of the think aloud interviews, a quantification code based on the IPSP testing model was needed to process the interview findings. Some related research was found.
Kilpatrick (1967) developed a coding scheme to analyze the protocols used by his subjects in a think aloud interview, but did not attempt to quantify these protocols. Lucas (1972) used a modification of Kilpatrick's coding scheme with calculus students. His five-point scoring code is based on three categories: Approach (one point), Plan (two points), and Result (two points). The "Approach" phase represented the subject's understanding of the problem, the "Plan" phase represented the subject's attempt to find a path to obtain the answer, and the "Result" phase was the subject's final answer.
Zalewski (1974) investigated the relationship between a paper-pencil test and interview results. He essentially followed the procedures established by Kilpatrick and Lucas, with some modifications since his subjects were seventh graders. The process score obtained by using Lucas' scoring procedure provided a basis for ranking the subjects.
Interviewing Procedures

1. Problems should be typed one on a page, preferably placed at the side of the page so that the student can use the rest of the page for any computing, drawing diagrams, tables, or any type of thinking.

2. Start the interview with 2 sample problems, thus allowing the student to become familiar with the routine and with the type of information the interviewer would like to find. At all times make a conscious effort to put the child at ease.

3. Tell the student that no information will be given on whether the answer or strategies are correct, since you want to get the best possible data.

4. Do encourage the students to go on by making comments such as, "You're doing just fine." "That's good, you're telling me what you're thinking." "Go ahead." BUT DO NOT LEAD THE STUDENT INTO USING A STRATEGY.

5. Don't go any longer than about 15-20 seconds without recording something on tape. EXCEPTION: If the student is doing computations or drawing a diagram or making tables, etc., make some sort of statement such as, "You're making a table," etc.

6. Encourage the student to vocalize his thinking as much as possible.

7. If a student falls silent while writing or drawing, prompt him by reading what he has written or ask him what he is doing. However, rule 5 takes precedence over rule 7.

8a. If a student doesn't answer, or doesn't make any comments about his thinking, wait about 15 seconds and ask, "Can you tell me what you are thinking?" Wait another 10 seconds or so and ask again, this time, "Are you trying to figure something out?" If nothing happens, call this IMPASSE. Now ask the question, "Would you like a hint or another problem?"

8b. If the child says yes, this would indicate to you that the rating part of the data gathering is over, but continue to get diagnostic data. This can be done by asking the student to identify the area that presented the trouble and why he had this trouble, e.g., didn't know method, lack of understanding of the problem, read problem incorrectly, etc.
Figure 2
Figure 2 (cont'd.)

8c. If the student says no, then allow more time and ask him if he would tell you what he is thinking or what method he is trying, or whether he would try to do his figuring on paper. Then repeat steps 8a, 8b and 8c again.

9. If the student is not trying to solve the problem, get him on the right track, but only after IMPASSE.

10. For the first half of the problems, observe the student. Does he have a habit of LOOKING BACK? If not, follow step 11.

11. If the student does NOT have the habit of LOOKING BACK, and has already been given the first half of the problems, then lead him on with prompts listed on the LOOKING BACK coding sheet, e.g., "Did you check your answer with the conditions of the problem?" "Did you check your answer?" "How sure are you that your answer is correct?"

DON'T

1. Do not allow the child to erase. Instruct him to make a line through the mistake.

2. Do not give any tutoring or prompting until after the IMPASSE, and then only if the child asks a question. However, do use the procedure listed in step 8a.

3. Do not summarize what the child has done. Try to get him/her to do it.

4. Do not tell the student whether he is on the right track, or whether his answer is correct.

5. Do not tell the student that you are going to use the strategies listed in steps 8a, 8b and 8c.
Written tests were administered to these same subjects, who were then ranked according to the number of correct answers. The correlation coefficient between the written tests and interviews was .68. Zalewski concluded that a higher correlation is necessary before the written test scores can be used as a substitute or predictor for interview results.
Webb (1975) also used an adaptation of the coding system developed by Kilpatrick and Lucas. He used the "Approach," "Plan," and "Result" scoring system and obtained a frequency count from a checklist of problem solving process variables.
From the preceding discussion it appeared that no 3-step quantification scheme was available to investigate the relationships between interviews and IPSP test results. A first attempt at developing the scale was made using Kilpatrick's processing sequence with some modifications in order to follow the IPSP testing model. In trying to quantify these processing sequences the procedures became very cumbersome. A new attempt was made in which flow charts were designed for each step of the model. Again, when it came time to assign a number at the various branches, the instrument became unmanageable. Another attempt was made in which behavior in each step of the testing model was assigned three numbers: 0, 1, and 2, which were to serve as categories. In the 0 category would be those processes which were totally incorrect or a response such as "I don't know what to do." The 2 category would contain responses which were completely correct, and the 1 category would contain the intermediate responses. This new procedure was used with the audio tapes from the first interviews. It became immediately apparent
that at least one more category was needed and that the categories were not explicit enough for each step of the model.

These revisions were made, and the resulting instrument now had four categories: 0, 1, 2, 3, and more explicit descriptors under each category. This new instrument was used to process additional tapes and further revisions were made. At this stage the instrument was examined by the same two mathematics educators who were consulted on the interview form. Each category and its descriptors were thoroughly discussed. Step 4, the looking back step, presented the greatest difficulty. If the subject gives an answer, appears to be mulling it over, goes back to the problem, reads it again, and then gives another answer, is this checking the answer or trying to understand the problem? It was decided to include this process under step 1 and be more explicit with the descriptors under step 4.
After general agreement on the appropriateness of the scale was reached, three raters quantified audio tapes of interviews using this form. After a few minor additions the instrument was considered to be in "final" form. As a final test, each rater analyzed the same three interviews on audio tape.
Use of the Scale
The final form of the quantification scale is given in Figures 3, 4, and 5. Behavior which involved reading, analyzing and understanding the problem was classified as step 1 behavior. Briefly, a score of 0 was assigned to a student who failed completely to understand a problem; 1 was assigned to a student whose analysis of the problem was incorrect
Quantification Scheme for the Components of the IPSP Test Model

Step 1: Getting to Know the Problem

Score 0:
- Says he doesn't understand the problem and makes no attempt at solution.
- Tries to solve the problem unaware that there is insufficient information and never starts correct strategy.
- Fails to use data correctly in attempts at solution, e.g., uses all extraneous data to arrive at solution.
- Immediately tries to do some arithmetic operations using all the numbers in the problem without regard to a correct strategy.

Score 1:
- Makes a false start (recognizes it as such) but can't arrive at a correct strategy.
- Says he doesn't understand problem—rereads it—tries to make a start but is unsuccessful (includes rephrasing, trying to understand what is unknown, what is given, or searching for a path).
- Reads problem, knows there are "too many numbers" but can't organize proper data (extraneous data).
- Makes true statements about extraneous data but does not advance solution.
- Rereads problem—appears to know there is something missing but cannot state what is wrong and makes a false start (missing data).
- Tries to solve the problem without regard to using data correctly. After a brief trial and error realizes he is not using data correctly but cannot correct the situation.
- Tries to summarize data or repeat it in a different form but does not find correct strategy.

Score 2:
- Makes a false start—but eventually arrives at a correct strategy (includes trial and error).
- Tries to solve the problem and unconsciously makes up his own missing data but does not state that there is insufficient data. His solution strategy is correct for the data he provides.
- Tries to summarize data or repeat it in a different form which starts him out on the correct strategy but later he gets off the track.
- States there is no solution because of insufficient data and attempts to modify conditions but is unsuccessful.
- Uses correct strategy but does not use data in its proper form (e.g., neglects units).
- Solves some of the cases involved in the problem but fails to consider all solutions.

Score 3:
- Any correct attempt to understand the problem by reading or rephrasing, i.e., trying to understand what is unknown, what is given, or searching for a path.
- Rereads the problem to assist in drawing figures, tables, equations, performing a check or introducing symbols. (Must be apparent that this is going to aid in understanding the problem and finding a correct strategy.)
- States a plan for an intermediate or final goal which is a correct strategy.
- Carries out exploratory manipulations which lead to correct solution.
- States problem can't be worked and tells what is needed to work it (modifies problem). States reason why it can't be worked (insufficient data).
- States what data is not needed in solution of problem while stating correct strategy (extraneous data).
- States the conditions and constraints of the problem correctly.
- Immediately starts out to work the problem and succeeds.

Figure 3
Step 3: Carrying Out the Plan
Quantification Scheme for the Components of the IPSP Test Model

Score 0:
- Any manipulations or computations that are done are incorrect.
- Strategy is set up correctly but is not able to carry it out, e.g., cannot solve an equation.
- Tries to use a diagram, figure, or table but does the computation incorrectly.
- Suggests a plan but cannot carry it out.

Score 1:
- Does less than half the number of necessary computations correctly.
- Sets up equation but cannot solve it completely. Does simple operations like addition and subtraction.
- Uses successive approximation (systematic trial and error) and does the first step correctly but cannot carry it to the end.

Score 2:
- Does half or more of the necessary computations correctly.
- Sets up equation but cannot solve it completely in that s/he makes errors on the harder operations, e.g., multiplication, division, clearing fractions, etc.
- Makes a mistake in copying correct number but carries out computations correctly.
- Makes an incorrect diagram, figure, or table but uses the numbers in the computation correctly.
- Uses successive approximation and does the first few steps correctly but bombs on the computations involved in the final step.
- Does computation correctly but uses units incorrectly.

Score 3:
- Sets up problem correctly and carries out actual computations correctly.
- Sets up problem incorrectly, but all computation is done correctly.
- Uses the algorithm or equation correctly, e.g., manipulates all parts of the equation correctly.
- When using successive approximations (trial and error), uses information from previous trial correctly, i.e., computes all these values correctly.
- Starts to execute plans, makes a mistake (computationally) but finds errors and corrects them.

Figure 4
Step 4: Looking Back
Quantification Scheme for the Components of the IPSP Test Model

Score 0:
- Makes no attempt to check answer or conditions of problem.
- Says, "It's probably wrong," and makes no attempt to check the answer.
- Says s/he doesn't know and makes no attempt to correct it.

Score 1:
- Expresses uncertainty about answer.
- Says it's probably wrong (or some version) and attempts to give a reason for his/her uncertainty.
- Makes an attempt to check the answer but is not successful enough to be convinced that it is right or wrong.
- Checks computations involved in answer but does not check to see if answer satisfies condition of problem. Errors here should be major.

Score 2:
- Makes some attempt to check answer or decide whether it is correct—but eventually gives up.
- Makes an attempt to check the answer by various methods (i.e., retraces steps, checks condition of problem, substitutes answer but cannot carry out check completely).
- Makes an attempt to check the answer by various methods (i.e., retraces steps, checks condition of problem, substitutes answer but fails to detect the incorrect answer). Errors here should be minor.

Score 3:
- Attempts to check the values of an unknown or the validity of an argument.
- Tries to decide whether the answer makes sense (i.e., realistic, reasonable estimates).
- Checks that all pertinent data has been used.
- Suggests a new problem that can be solved in the same way.
- Successfully attempts to simplify the problem.
- Checks solution by retracing steps or substitution.
- Checks that solution satisfies conditions of problem.

Figure 5
but who understood some of its elements; 2 was assigned to a student
whose analysis was correct except for a minor error such as reading data
incorrectly; 3 was assigned to an entirely correct understanding of the
problem which led to a valid solution strategy. A crucial point is that
the step 1 score was not affected by errors in the application of a
solution strategy, once that strategy was chosen. Errors in the application of a chosen strategy were reflected in the step 3 score.
Step 4 behavior consisted of student moves after a tentative solution was reached. Many students stopped as soon as they had an answer
and were given a score of 0 for step 4. Briefly, a score of 1 was
assigned if some uncertainty was expressed but no systematic check was
made; 2 was assigned if a check was attempted but was either incorrect
or incomplete; 3 was assigned if a valid check of the computation, conditions and/or reasonableness of the solution was carried out. Again,
specific criteria were described for each numerical score, but an important point is that the step 4 score was not affected by any behavior
preceding a tentative solution. An exception was that students were
assigned 0 on step 4 if no tentative solution was reached.
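The 0–3 rubric for each scored step can be summarized as a simple lookup table. The sketch below is purely illustrative; the dictionary keys, function name, and criterion summaries are my own condensation of Figures 3–5 and the surrounding prose, not part of the IPSP instrument itself.

```python
# Illustrative encoding of the 0-3 interview scoring rubric described above.
# The step names and one-line criteria are condensed from Figures 3-5;
# this data structure is hypothetical, not part of the original instrument.
RUBRIC = {
    "step1_understanding": {
        0: "fails completely to understand the problem",
        1: "analysis incorrect, but understands some elements",
        2: "correct except for a minor error (e.g., misread data)",
        3: "entirely correct understanding leading to a valid strategy",
    },
    "step3_execution": {
        0: "all manipulations or computations incorrect",
        1: "less than half of necessary computations correct",
        2: "half or more correct, or only minor slips (e.g., units)",
        3: "sets up and carries out computations correctly",
    },
    "step4_looking_back": {
        0: "no tentative solution reached, or no attempt to check it",
        1: "expresses uncertainty but makes no systematic check",
        2: "attempts a check, but it is incorrect or incomplete",
        3: "valid check of computation, conditions, or reasonableness",
    },
}

def describe(step: str, score: int) -> str:
    """Return the rubric description for a given step and 0-3 score."""
    return RUBRIC[step][score]

print(describe("step1_understanding", 0))
```

A rater working from the audio tapes would, in effect, select one descriptor per step for each problem; note that step 2 (devising a plan) is folded into the step 1 score in this scheme.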
The following two examples will illustrate the scoring scheme.
Ann, a sixth grader, was presented with this problem:
A bag of XL-50 brand marbles contains 25 marbles and costs
19¢. How much will 125 marbles cost?
Ann read the problem aloud and this is the transcribed interview:
A: Uh . . . Oh boy . . . hm . . .
I: What are you doing now?
A: I'm trying to figure out how I'll do this. Either add or multiply
. . . O.K. I'm going to multiply 125 marbles by 19¢. (multiplies)
It comes out $11.25. That's not right.
I: What are you trying to find?
A: I'm trying to get the right answer.
I: But what answer?
A: What I should do with 25, 19, and 125, because I know with those
numbers I have to do something.
(Silence)
(Rereads the problem)
(Silence)
A: I want to see if I multiplied wrong . . .
(Remultiplies but is still stumped)
Ann exhibited a behavior which occurred very frequently in student
interviews. She tried to do some arithmetic operations using all the
numbers in the problem. She lacked, or at least failed to use, analytic
skills, essentially step 1 behavior. However, her computational skills
and ability to use tables and diagrams, as illustrated in other problems, were good. On this problem, she was given a score of 0 for step
1; 3 for step 3; 0 for step 4. If she had made a computational error or
misused an equation, she would have received a 0, 1, or 2 on step 3. A
similar pattern emerged in her solution to other problems.
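For reference, the marble problem reduces to a unit-rate computation. The sketch below is my own illustration of the strategy Ann never found (it assumes the 19¢-per-bag reading of the problem), not part of the interview materials:

```python
# The marble problem: a bag of 25 marbles costs 19 cents;
# find the cost of 125 marbles.
# Correct strategy: 125 marbles is 125 / 25 = 5 bags,
# so the cost is 5 * 19 cents.
bag_size = 25         # marbles per bag
bag_price = 19        # cents per bag
wanted = 125          # marbles needed

bags_needed = wanted // bag_size       # 5 bags
cost_cents = bags_needed * bag_price   # 5 * 19 = 95 cents
print(cost_cents)                      # 95
```

Ann instead multiplied together the numbers she saw, which is exactly the score-0 descriptor "immediately tries to do some arithmetic operations using all the numbers in the problem without regard to a correct strategy."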
Dave, a fifth grader, is an example of a student who was able to
understand most of the problem settings presented to him, but had difficulty carrying out his solution strategies. This is illustrated with
the following example of a single-step problem:
Mr. Price earned $75 in each of 8 weeks. How much did he
earn for all 8 weeks?
D: O.K. 75, 75, 75, . . . (adds eight 75's)
O.K. 1, 2, 3, . . .
I: So you wrote eight 75's down, right?
D: Yes, O.K., that'd be . . . eight 5's would be 40. It'd be 0 and 4
on top. And eight 7's would be . . . O.K. let's see . . . Hmm.
(Writes them down and adds them)
D: It'd be $5.60.
I: O.K. that's your answer then?
D: Yes.
Dave chose to add eight 75's, which was a correct strategy. However, he had difficulty in finding the sum. To make the computation
easier, he correctly noted that 8 sevens is the same as 4 fourteens. He
was scored a 3 on step 1, and 2 on step 3 on this problem. His score on
step 4 was 0 since he did not exhibit any behavior in that category.
Later, with prompting, Dave realized that he had left out the 8 fives,
and he corrected himself.
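Dave's column-wise addition can be checked directly. The sketch below is my own illustration of the computation he attempted:

```python
# Mr. Price's problem: $75 earned in each of 8 weeks.
# Dave's strategy: add eight 75's, column by column.
weeks = 8
weekly = 75            # dollars per week

ones_column = weeks * 5    # eight 5's -> 40: write 0, carry 4
tens_column = weeks * 7    # eight 7's -> 56 (in tens)
total = tens_column * 10 + ones_column   # 560 + 40 = 600
print(total)                             # 600, i.e., $600
```

Dropping the eight 5's, as Dave initially did, leaves 560, consistent with the incorrect total he reported before being prompted.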
The Pilot Interview Study
Procedures
After a year of development, a pilot study was designed in which
data from interviews and the IPSP test in their developing forms were
gathered from the same sample of students in Fall, 1977. The year of
development included a trial run in January, 1977, and one in March,
1977, in which four 20-item forms of the IPSP test for each run were
administered to students in grades five through eight. The form of the
IPSP test that was used in this pilot study was a revised one based on
the data obtained from those trial runs.
Preparations for the pilot study included briefing the classroom
teachers and administrators of the school, setting up a schedule for
interviewing the students, and working out the logistics of the schedule. A final session was held which was attended by the IPSP staff, the
classroom teachers who would be involved in the study, the head of the
school's mathematics department, the principal, and involved counselors.
The form of the IPSP test that would be used was presented, and the team
discussed the purpose of the study and answered questions raised by the
school personnel.
Pilot Sample
All of the students in grades five through eight in the Malcolm
Price Laboratory School, Cedar Falls, Iowa, were involved in this pilot
study. The students were randomly divided into two groups across grade
levels. Group one consisted of 99 students in grades five through
eight. Group two consisted of 103 students from the same grade levels.
However, within the groups, the fifth and sixth graders were administered one form of the test while the seventh and eighth graders completed
another form of the test. Concurrently with the above groupings, each
teacher was asked to divide each of their classes into an upper ability
and lower ability half and to select one "verbal" student from each
half. This resulted in the selection of 32 students, four from each
grade, to be involved in think aloud interviews. All of the interviews
were conducted by the investigator and followed the interview form discussed in the previous section. The interview tapes were then coded
according to the quantification code previously described. Students in
group two completed the IPSP test after the interviews were completed.
Correlation coefficients between interview and IPSP test scores were
then computed. Figure 6 shows the time schedule for the study.
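The correlation coefficients referred to here are, presumably, ordinary Pearson product-moment correlations between each student's interview scores and IPSP test scores. A minimal sketch follows; the score lists are invented placeholders, not data from the study:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical interview totals and IPSP test scores for five students:
interview = [3, 5, 2, 6, 4]
ipsp_test = [10, 14, 8, 15, 12]
print(round(pearson_r(interview, ipsp_test), 3))  # 0.994
```

A high positive coefficient would support using the paper-and-pencil IPSP test as a proxy for the much more labor-intensive think-aloud interviews.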
Schedule of Events for Pilot Interview Study

Date    Activity                                           Responsibility
11/28   paper and pencil test for group 1                  classroom teacher
11/29   interview 4 students from each grade in group 2    the investigator
        (room 4)
11/30   paper and pencil test for group 2                  classroom teacher
        interview 4 students from each grade in group 1    the investigator
        (room 4)

Figure 6
Interview Problems
One hundred open-ended verbal problems were developed for grades
five through eight independently from those on the IPSP test. These
problems were reviewed by members of the IPSP staff. Samples from the
100 problems were administered to six volunteer students in grades five
through eight in think aloud interviews. Information obtained from
these interviews and suggestions from the staff were used in revising
some problems and eliminating others. A pool of 65 problems resulted.
These problems were then classified into seven levels: level one containing simple one step word problems and each succeeding level containing problems that were increasingly difficult in both concepts and
computations to be used. Each problem was typed on a half sheet of
paper so the student could do any needed computations on that paper.
These problems are included in Appendix C.
The Interviews
The investigator conducted all 32 interviews. Because the interviews were taking place during the regular school day, a rather brief
time limit of 20 minutes per student was allotted. The first five minutes were used in talking to the student about the procedure to be used
and in presenting two sample problems. Students were encouraged to talk
but were not given any hints or told whether what they wer