Transcript of thesis 7_16

INFORMATION TO USERS

This was produced from a copy of a document sent to us for microfilming. While the most advanced technological means to photograph and reproduce this document have been used, the quality is heavily dependent upon the quality of the material submitted.

The following explanation of techniques is provided to help you understand markings or notations which may appear on this reproduction.

1. The sign or target for pages apparently lacking from the document photographed is Missing Page(s). If it was possible to obtain the missing page(s) or section, they are spliced into the film along with adjacent pages. This may have necessitated cutting through an image and duplicating adjacent pages to assure you of complete continuity.

2. When an image on the film is obliterated with a round black mark it is an indication that the film inspector noticed either blurred copy because of movement during exposure, or duplicate copy. Unless we meant to delete copyrighted materials that should not have been filmed, you will find a good image of the page in the adjacent frame.

3. When a map, drawing or chart, etc., is part of the material being photographed the photographer has followed a definite method in sectioning the material. It is customary to begin filming at the upper left hand corner of a large sheet and to continue from left to right in equal sections with small overlaps. If necessary, sectioning is continued again, beginning below the first row and continuing on until complete.

4. For any illustrations that cannot be reproduced satisfactorily by xerography, photographic prints can be purchased at additional cost and tipped into your xerographic copy. Requests can be made to our Dissertations Customer Services Department.

5. Some pages in any document may have indistinct print. In all cases we have filmed the best available copy.

University
Microfilms
International

300 N. Zeeb Road, Ann Arbor, MI 48106
18 Bedford Row, London WC1R 4EJ, England


8012402

OEHMKE, THERESA MARIA

THE DEVELOPMENT AND VALIDATION OF A TESTING INSTRUMENT
TO MEASURE PROBLEM SOLVING SKILLS OF CHILDREN IN GRADES
FIVE THROUGH EIGHT

The University of Iowa  PH.D.  1979

University Microfilms International
300 N. Zeeb Road, Ann Arbor, MI 48106
18 Bedford Row, London WC1R 4EJ, England

Copyright 1979
by
OEHMKE, THERESA MARIA
All Rights Reserved


THE DEVELOPMENT AND VALIDATION OF A
TESTING INSTRUMENT TO MEASURE PROBLEM SOLVING SKILLS
OF CHILDREN IN GRADES FIVE THROUGH EIGHT

by

Theresa Maria Oehmke

A thesis submitted in partial fulfillment of the
requirements for the degree of Doctor of Philosophy
in Education in the Graduate College of
The University of Iowa

December, 1979

Thesis supervisor: Professor Harold L. Schoen


Graduate College
The University of Iowa
Iowa City, Iowa

CERTIFICATE OF APPROVAL

PH.D. THESIS

This is to certify that the Ph.D. thesis of

Theresa Maria Oehmke

has been approved by the Examining Committee
for the thesis requirement for the Doctor of
Philosophy degree in Education
at the December, 1979 graduation.

Thesis committee: Thesis supervisor

                  Member

                  Member

                  Member


    DEDICATION

    To Bob and Jim



ACKNOWLEDGEMENT

For their help in the preparation of this thesis, I owe an expression of appreciation to a large number of persons, only a few of whom I shall mention by name.

To Professor Harold Schoen I extend my thanks for all the guidance, motivation, direction and assistance he gave me from the initiation to the completion of this study.

Special thanks are due George Immerzeel, Joan Duea, Earl Ockenga and John Tarr and the personnel at the Malcolm Price Laboratory School for their encouragement and support during several phases of this investigation.

A debt of gratitude to Professors H. D. Hoover and A. N. Hieronymus is acknowledged for their assistance on some of the statistical details of the testing procedure and for providing me with the opportunity to collect some of the pertinent data in this study.

I would like to express my appreciation to William M. Smith for the use of his expertise on the technical aspects of the use of the computer on data processing.

Thanks is given to Ada Burns for her amazing ability to type what I thought I had written.

Finally, I would like to thank my husband Bob, and son Jim, and all my friends who were always ready with an encouraging word.


TABLE OF CONTENTS

                                                                Page

LIST OF TABLES                                                    vi

LIST OF FIGURES                                                  vii

CHAPTER

I. INTRODUCTION                                                    1
   Purpose                                                         3
   Overview of the Study                                           3
   IPSP Test Development                                           4
   IPSP Test Validation                                            5
   Reliability                                                     6
   Content Validity                                                8
   Concurrent Validity                                             8
   Discriminant Validity                                           9
   Operational Definitions Used in This Study                     10
   Overview                                                       11

II. REVIEW OF THE LITERATURE                                      12
   Models of Problem Solving                                      13
   Testing the Ability to Solve Problems                          19
   Summary                                                        22

III. DEVELOPMENT AND PROCEDURES                                   23
   Purpose of the Iowa Problem Solving Project                    23
   Development of the IPSP Test                                   24
   Design and Development of Interview Procedures                 28
   The Quantification Scale                                       30
      Development                                                 30
      Use of the Scale                                            34
   The Pilot Interview Study                                      40
      Procedures                                                  40
      Pilot Sample                                                41
      Interview Problems                                          42
      The Interviews                                              43
   The Final Interview Study                                      44
   ITBS and the IPSP Test                                         45
   IPSP Subtest Discrimination                                    49


CHAPTER                                                         Page

IV. DATA AND RESULTS                                              50
   IPSP Test and Interview Data                                   50
      Pilot Interview Study                                       51
      Final Interview Study                                       53
   ITBS and the IPSP Test                                         55
      IPSP Test Administration                                    55
      ITBS Subtests                                               60
      Correlations Between IPSP Subtests and ITBS Subtests        60
   IPSP Subtest Discrimination                                    62

V. ANALYSES AND IMPLICATIONS                                      72
   Summary and Conclusions                                        72
      Phase 1: The Final Interview Study                          73
      Phase 2: ITBS and the IPSP Test                             73
      Phase 3: IPSP Subtest Discrimination                        74
   Limitations                                                    74
   Classroom Implications                                         76
   Implications for Research                                      81
   IPSP Test                                                      82

APPENDIX A. DESCRIPTION OF THE IOWA PROBLEM SOLVING PROJECT       85

APPENDIX B. PHASES OF IPSP TEST DEVELOPMENT AND SUMMARY OF
   TEST FORMS                                                     94

APPENDIX C. INTERVIEW PROBLEMS USED IN THE PILOT STUDY            98

APPENDIX D. IOWA TEST OF BASIC SKILLS RELIABILITY ANALYSIS
   NATIONAL REPRESENTATIVE SAMPLE                                106

APPENDIX E. FORMULA FOR ITBS CORRELATIONS CORRECTED FOR
   ATTENUATION                                                   108

APPENDIX F. ANALYSIS OF IPSP TEST RESULTS OCTOBER 1978
   AND MARCH 1979 ADMINISTRATIONS                                111

APPENDIX G. PILOT TEACHING STUDY                                 120

BIBLIOGRAPHY                                                     125


LIST OF TABLES

Table                                                           Page

1. Phases of Validation of the IPSP Test                          27
2. Analysis of IPSP Test with Interviews—Pilot Study              52
3. Analysis of IPSP Test with Interviews—Final Study              54
4. Reliability Analysis                                           56
5. Reliability Analysis                                           57
6. Reliability Analysis                                           58
7. Reliability Analysis                                           59
8. Correlations Between IPSP Subtests and Iowa Test of Basic
   Skills Tests                                                   61
9. Reliability Analysis Sample of Iowa Students October 1978      68
10. Reliability Analysis Sample of Iowa Students March 1979       69
11. Correlations Corrected for Attenuation of October 1978 IPSP
    Subtests                                                      70
12. Iowa Test of Basic Skills Reliability Analysis National
    Representative Sample                                        107


LIST OF FIGURES

Figure                                                          Page

1. Steps and Component Skills of the IPSP Test Model               5
2. Interviewing Procedures                                        31
3. Quantification Scheme for the Components of the IPSP Test
   Model Step 1: Getting to Know the Problem                      35
4. Quantification Scheme for the Components of the IPSP Test
   Model Step 3: Carrying Out the Plan                            36
5. Quantification Scheme for the Components of the IPSP Test
   Model Step 4: Looking Back                                     37
6. Schedule of Events for Pilot Interview Study                   42
7. Schedule of Events for Final Interview Study                   46
8. Square of Correlations Corrected for Attenuation Between
   IPSP Steps and ITBS Subtests Form 563, Grade 5                 63
9. Square of Correlations Corrected for Attenuation Between
   IPSP Steps and ITBS Subtests Form 563, Grade 6                 64
10. Square of Correlations Corrected for Attenuation Between
    IPSP Steps and ITBS Subtests Form 783, Grade 7                65
11. Square of Correlations Corrected for Attenuation Between
    IPSP Steps and ITBS Subtests Form 582, Grade 8                66
12. Ann's IPSP Test Profile                                       77
13. Dave's IPSP Test Profile                                      78
14. Schedule for Pilot Study                                     122


    CHAPTER I

    INTRODUCTION

The question of what processes are involved in problem solving and more particularly in mathematical problem solving is one that has been investigated for many years. As one surveys the literature, one not only becomes aware of the vast quantity of research that is being done but also of the many and diverse methods used to study the problem solving processes.

Studies range from simply observing individual students as they solve problems to factor analysis of paper-pencil measures of problem solving. A data-gathering method that has come into prominent use today is the structured one-to-one interview. In such cases, the researcher observes the behavior of the subject as he "thinks aloud" while solving selected problems. Typically the interviews are audio- or video-taped and a protocol or check list of processes is completed for each student-problem pair. Other approaches include simulating the problem solving processes of a human being on a computer and using the results to build a problem solving theory. In addition, results and methods of researchers investigating the cognitive processes involved in general problem solving often have application for mathematical problem solving.

In an attempt to analyze and thereby further understand the problem


solving process, researchers have proposed various multistep models. One of the earliest was proposed by Wallas (1926). Based on his own experiences and his analyses of what others think they are doing when they solve problems, Wallas suggested a four-step model: preparation, incubation, illumination, and verification. Drawing upon many years of experience as a mathematician and a teacher, Polya (1957, 1962) proposed another four-step model: understand the problem, make a plan, carry out the plan, and look back at the complete solution. Restle and Davis (1962) suggested that the problem solver goes through a number of independent but sequential stages. The student solves a subproblem at each stage, thereby allowing him to go on to the next step. These and other multistep models have appeared in the literature with varying degrees of supporting empirical evidence.

If problem solving is a multistep process, the question then arises: would it be possible to measure a person's ability (or skill) at each of the steps? If there were a reliable testing instrument which measured this ability, of what value would it be to the classroom teacher? Would the teacher be able to use such a test to help plan for problem solving instruction? Of course, the one-to-one interview strategy is available to evaluate a child's problem solving processes but the teacher does not have the time to conduct interviews with all the students in the classroom. In addition, the interview has been criticized for yielding results which are subjective and unreliable.

There is little in the literature concerning paper-pencil testing instruments which purport to measure problem solving processes. Many existing group administered tests contain problem solving or


application sections which consist of verbal problems. For example, the Iowa Test of Basic Skills (Houghton Mifflin Co., 1974) and the Metropolitan Achievement Tests (Harcourt, Brace and Jovanovich, 1971) contain subtests designed to measure problem solving ability. However, the information these tests provide, namely, the grade-level equivalent or percentile rank of the student and class in a larger group, is derived from the number of correct answers. No attempt is made to measure the process that was used to arrive at the solution or to identify specific skills which may be the source of the child's difficulties.

Purpose

As part of the Iowa Problem Solving Project (IPSP),¹ Schoen and Oehmke developed a multiple choice paper-pencil test designed to provide individual and class profiles which illustrate the performance of fifth through eighth graders at each of three steps of the problem solving process. A modified version of the four-step problem solving process model proposed by Polya served as the IPSP testing model.

The purpose of this study is the validation of this testing instrument, called the Iowa Problem Solving Project Test (IPSP test).

Overview of the Study

The steps in the validation process are listed below.

1. Compute estimates of the reliability coefficients for the IPSP test and its three subtests.

¹A three-year project directed by George Immerzeel of the University of Northern Iowa and funded under ISEA, Title IV, C.


2. Judge the content validity of the IPSP test using the judgments of mathematics educators and educational measurement specialists.

3. Judge the concurrent validity by measuring the relationship between the IPSP subtest scores and a) the results of a think aloud measurement procedure, and b) the mathematics concepts, mathematics problem solving, reading comprehension, and graph skills subtest scores of the Iowa Test of Basic Skills.

4. Judge the discriminant validity of the IPSP test by analyzing a matrix of correlation coefficients, corrected for attenuation, between pairs of the three IPSP subtests and the four ITBS subtests.

IPSP Test Development

Several "non-standard" instruments have been developed to test problem solving ability (Wearne, 1976; Zalewski, 1975; Lucas, 1972). These will be described in Chapter two. However, the IPSP test appears to be the first attempt to measure skills in a multistep model. A multiple choice format was chosen in order to maximize the potential for broad, long-range impact on classroom practices. It was reasoned that a) standardized machine-scored tests presently have a great influence on curriculum and instruction, and b) in this format the IPSP test may have some influence on standardized testing. Furthermore, the machine scoring capability is likely to increase the number of potential users. After a good deal of "brainstorming" with members of the IPSP team, a search of the problem solving literature and a detailed analysis of each stage, the testing model as shown in Figure 1 was developed.


1. Get to Know the Problem
   A. Determine insufficient information
   B. Identify extraneous information
   C. Write a question for the problem setting

2. Choose What to Do from a List of Strategies

3. Do It
   A. Choose the necessary computation
   B. Estimate from a diagram
   C. Compute from a diagram
   D. Use a table
   E. Compute from an equation

4. Look Back
   A. Identify problems that can be solved in the same way as a given one
   B. Vary conditions in a given problem
   C. Check a solution with the conditions of the problem

Steps and Component Skills of the IPSP Test Model

Figure 1

Like standardized tests, the IPSP test can be efficiently administered to large groups of students and machine scored, with various norm data easily obtainable. In addition, subtest scores corresponding to steps 1, 3, and 4 can be obtained. After nearly two years of effort, no viable way to test skills in step 2 in a multiple-choice format was found.

IPSP Test Validation

Because the IPSP test is an innovative measurement instrument containing many items with untested structural characteristics and designed to measure constructs never before measured with a paper-pencil instrument, its development called for much careful planning and formative evaluation.


Major questions concerned the validity of the test, if indeed a reliable test with reliable subtests could be constructed. By utilizing the Iowa Testing Program's tryout facilities it was possible to construct experimental test units, administer them to representative samples of Iowa fifth through eighth graders, and revise the units based on the item analyses and test data. Also, over 100 students were interviewed at various stages in the test development process as a concurrent check on the test validity. Over a three-year period the test evolved into its present form.

For the present study, estimates of reliability and of several types of validity of the final form of the IPSP test were obtained: (1) content validity, (2) concurrent validity, and (3) discriminant validity. A general description of the procedures for each of the above areas follows. A detailed description of the procedures and findings is given in later chapters.

Reliability

The concept of reliability refers to an estimation of the degree of consistency of measurement. Theoretically, if a test is reliable, an individual should obtain the same, or almost the same, score on repeated administrations. Many factors may affect a student's observed score to make it different than the theoretically true score. These "errors" of measurement may be caused by the test itself, i.e., a particular sample of items may be a "good" or "bad" sample; by attributes of the person taking the test, i.e., motivation, attitude, health, test-wiseness; or by administrative conditions and procedures at the time of


the test, i.e., noise, distractions, poor lighting, or lack of uniformity in giving test instructions. The most difficult factors to control are the subject's attributes.

Reliability of a score for a testing sample is defined as the ratio of the variance of the true score to the variance of the observed score:

    r_tt = s_t^2 / s_x^2

where r_tt = reliability of the test, s_t^2 = variance of the true score, and s_x^2 = variance of the obtained score. Reliability is also referred to in the literature as a correlation between scores on the same test or parallel tests.

One of the most commonly used methods of estimating reliability is the Cronbach alpha. If the data are in dichotomous form, this estimate is equivalent to the Kuder-Richardson 20 coefficient. It may be calculated by the following formula:

    alpha = [k / (k - 1)] [1 - (k s̄^2) / s_x^2]

where s_x^2 is the variance of the sum over the k items and s̄^2 is the average item variance. An advantage of this coefficient is that it computes all ways a given test might be split and then gives a "best estimate" reliability. In this study, reliability coefficients were estimated using either the Cronbach alpha formula or a modified version of the KR-8, which is a formula in the series of 20 developed by Kuder-Richardson (1937).
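As a concrete illustration of the computation just described (a minimal sketch, not part of the thesis; the function and the sample data below are invented), the following Python fragment computes Cronbach's alpha from a matrix of dichotomous item scores. It uses the sum of the item variances, which equals k times the average item variance in the formula above.

    def cronbach_alpha(scores):
        # scores: one list per student, one 0/1 entry per item
        k = len(scores[0])                   # number of items

        def variance(xs):                    # population variance
            m = sum(xs) / len(xs)
            return sum((x - m) ** 2 for x in xs) / len(xs)

        item_vars = [variance([s[i] for s in scores]) for i in range(k)]
        total_var = variance([sum(s) for s in scores])
        # alpha = [k/(k-1)] [1 - (sum of item variances)/(variance of total score)]
        return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

    # Hypothetical data: four students, three dichotomously scored items.
    data = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
    print(round(cronbach_alpha(data), 3))    # prints 0.75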


Content Validity

The basic content validity question is: Are the items in question a representative sample of the construct or subject matter domain to be measured? A familiar example of a need for high content validity is the typical classroom test. Here the individual's score on a sample of items from a content domain is used to infer the student's achievement level in the entire domain. Thus the assessment of content validity involves careful judgment of the degree to which the selection of items is appropriate and representative of the domain or construct to be tested.

Content validity is usually assessed by the judgment of experts, that is, subject matter and testing specialists. A limitation of this method is that there are no clearly specified techniques or standards for determining content validity.

The reaction of educational testing experts and mathematics educators to the IPSP testing model and sample test units was sought. The testing experts were two faculty members in Educational Measurement at The University of Iowa who also are authors of the Iowa Test of Basic Skills. The mathematics educators were the IPSP project team and advisory board members, both composed of mathematics teachers in grade five through graduate mathematics education.

Concurrent Validity

Concurrent validity refers to the degree to which a test correlates with an independent measure. It is the determining factor in deciding whether a test can replace procedures that are either more


elaborate or require special techniques, as in the case of the IPSP test and the one-to-one interview method. One way of demonstrating a test's concurrent validity is to compare the test scores with some independent measure which is presumed to measure the behavior in question.

In this study data from interviews and from the IPSP test were gathered concurrently. The interview measure consisted of presenting a series of problems to a child and asking him to think aloud as he attempted to solve each problem. The data were then obtained by coding and analyzing the solution strategies from observations, audio-tapes, and the subjects' written work. Thus one estimate of the concurrent validity of the IPSP test is the correlation between the score in each of the three IPSP subtests and corresponding data collected via the think aloud procedure.
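Assuming the ordinary Pearson product-moment coefficient is the correlation intended here, a minimal Python sketch follows (illustrative only; the paired scores are invented, and the thesis' actual interview scale is described in Chapter III):

    def pearson_r(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    subtest = [8, 5, 9, 3, 7]        # hypothetical IPSP subtest scores
    interview = [12, 9, 14, 6, 10]   # hypothetical think-aloud scale scores
    print(round(pearson_r(subtest, interview), 2))   # prints 0.98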

Another estimate of concurrent validity was derived from the IPSP test's relationship to several standardized achievement measures. The criteria selected were four Iowa Test of Basic Skills subtests: mathematics concepts, mathematics problem solving, reading comprehension, and graph skills.

Discriminant Validity

Campbell and Fiske (1959) state that validation usually proceeds by convergent methods, i.e., independent measurement techniques designed to measure the same trait. However, for purposes of test interpretation, discriminant validity is also required; that is, not only should a test be highly correlated with other tests purporting to measure the same trait but it should not correlate highly with tests that measure distinctly different traits. In the case of the IPSP test, the


discriminant validity issue refers to the degree to which the subtest scores differ from each other and from scores on other similar tests.

Discriminant validity in this study is approached through the use of matrices of correlations corrected for attenuation. Intercorrelated variables include steps 1, 3, and 4 of the IPSP test and the aforementioned subtests of the ITBS. In particular, the three IPSP subtests should not be highly correlated with each other nor with the ITBS subtests.
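The thesis' specific formula is given in Appendix E; as an orienting sketch only, the classical (Spearman) correction for attenuation divides the observed correlation between two measures by the geometric mean of their reliabilities:

    r_corrected = r_xy / sqrt(r_xx r_yy)

where r_xy is the observed correlation and r_xx, r_yy are the reliability coefficients of the two measures. A small Python illustration with invented values:

    from math import sqrt

    # Hypothetical values, not taken from the thesis.
    r_xy, r_xx, r_yy = 0.48, 0.80, 0.72    # observed r; reliabilities of the two tests
    print(r_xy / sqrt(r_xx * r_yy))        # corrected r, approximately 0.63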

Operational Definitions Used in This Study

1. Problem solving

   To search consciously for some action appropriate to attaining a clearly conceived, but not immediately attainable aim. To solve a problem means to find such action (Polya, 1962).

2. Cognitive processes

   Actions of cognitive performance such as perceiving, remembering, thinking, and desiring that depend on the subject's performance capacities. These processes are not directly observable but are presumed to underlie a person's behavior when faced with a cognitive task.

3. Four-step problem solving test model

   For purposes of this study Figure 1 defines the model. Demonstration of the skills in each of steps 1, 3, and 4 is taken as evidence of a student's ability in that step.

4. Problem solving processes

   a) the cognitive processes presumed to underlie or guide the


subject's choices of solution strategies;

   b) the observable actions or operations the subject actually adopts to attempt to solve the problem. Also called problem solving solution strategies. It should be clear from the context which meaning applies.

Overview

Previous approaches to problem solving research and attempts to measure problem solving are discussed in Chapter II, Review of the Literature. Chapter III includes a discussion of the theoretical framework and design of the IPSP test, procedures used to develop the verbal problems, scoring methods for the think aloud interviews, and data gathering procedures. The results are discussed in Chapter IV. A summary of the validation results, implications for future research, and implications for the teaching of mathematical problem solving are discussed in Chapter V.


CHAPTER II

REVIEW OF THE LITERATURE

Since the purpose of this paper is the development and validation of an instrument to test mathematical problem solving, first, various models that mathematical educators, mathematicians, psychologists, and other researchers have used to study the problem solving processes will be discussed, and second, testing instruments developed by other researchers to measure problem solving processes will be summarized. The reader who is interested in a more detailed discussion of heuristics, strategies, task analysis, structures and other pertinent factors involved in the problem solving processes is referred to one or more of the following reviews.

A collection of papers edited by Kleinmuntz (1966) includes discussions of particular aspects of problem solving research and theory. Davis (1966) gives a comprehensive survey of research and theory relative to traditional learning, cognitive and Gestalt approaches to problem solving, and computer and mathematical models of problem solving.

Kilpatrick (1967) presents an extensive discussion of the definition of heuristics. He describes in detail the notion of problem solving as a search process, the thinking aloud technique, and other methods used to study problem solving processes. He also develops a coding system for transcribing the problem solving protocol of subjects


from audio-tape to paper as they use the think aloud technique.

Hollander's (1973) review focuses on studies related to word problems for students in grades three through eight. The review includes studies carried out from 1922 to 1969 in seven categories: problem analysis, computation, general reading ability, specific reading skills, specificity of the problem statement, the problem situation, and language factors.

Webb (1975) notes that

    ...Because of the complexity of problem solving processes and the number of variables associated with problem solving, research in this area has been too diverse to have any real consolidation...(p. 1)

His review focuses on studies that involve problem solving tasks and strategies. These studies were conducted from 1967 to 1973 and involved students in grades three through eight.

Lucas (1972) discusses the nature of problem solving, the search mode used by information processors, some formal models of the problem solving process and some techniques used by earlier researchers to externalize thought processes.

Models of Problem Solving

The attempt to describe the thought processes used in mathematical problem solving is not a new quest. For many years mathematicians have sought to determine and understand the thought processes they use in discovering new mathematics. Some accounts discuss in great detail the individual thought processes used in formulating and proving mathematical conjectures.


The mathematician Henri Poincare (1914), in attempting "to see what happens in the very soul of the mathematician," gives an explicit account of his personal recollection of the discovery of the Fuchsian functions. After his discussion he states:

    ...As regards my other researches, the account I should give would be exactly similar, and the observations related by other mathematicians in the inquiry of l'Enseignement Mathematique¹ would only confirm them.

    One is at once struck by these appearances of sudden illumination, obvious indications of a long course of previous unconscious work. The part played by this unconscious work in mathematical discovery seems to me indisputable, and we shall find traces of it in other cases where it is less evident...(p. 55).

¹A review which instituted an inquiry into the habits of mind and methods of work of mathematicians during the early 20th century.

    "The appearances of sudden illumination" recounted by Poincare was

    also cited by Gauss in referring to a theorem on which he h ad worked for

    years.  He sta te s, "Like a sudden flash of lightning the riddle happened

    to be solved."

    "Sudden illumination" is the third s tage of the psychologist

    Wallas'  (1926) model. In an attempt to analyze and thereby

    further understand the problem solving process , Wallas observed accounts

    of thought processes related to him by his st udent s, colleagues and

    friends.

      It was the account of the German physicist Helmholz that in

    spired Wallas to describe thr ee stages in the formation of new thought:

    preparat ion: the first stage during which th e problem is investigated

    and all the facts are gathered,

    incubation: the second stage during which one rests from any conscious

    thought about the problem at hand and/or consciously

    X

    A review w hich instituted an inquiry into the habits of mind and meth

    ods of work of mathematicians during the early 20th century.


thinks of another problem,

illumination: the third stage, during which the idea and/or solution appears as a 'flash' or 'aha'.

Wallas added a fourth stage, verification, during which the validity of the idea is tested; Helmholtz did not describe this stage, but Poincare vividly describes it in his accounts.

Polya (1957, 1962) also developed a four-step model for problem solving. His principal aim in advocating a heuristic approach to problem solving was to enable the student to ask questions which focus on the essence of the problem. Drawing upon many years of experience as a mathematician and a teacher of mathematics, he describes the following four-step model:

(1) Understanding the problem: The student tries to understand the problem by looking at the data and asking the questions, Is it possible to satisfy the conditions of the problem?, Is there redundant or insufficient data?

(2) Devising a plan: The student tries to find a connection between the data and the unknown; the student should eventually choose a plan or strategy for the solution.

(3) Carrying out the plan: The student carries out the plan, checking each step along the way.

(4) Looking back: The student examines the solution by checking the results and/or arguments. The student also attempts to relate the method or result to other problems.

In a discussion of Polya's model, Mayer (1977) points out that some of Polya's ideas (restating the given and the goal) are examples of the


Gestalt idea of "restructuring." He concludes that:

    ...while Polya gives many excellent intuitions about how the restructuring event occurs and how to encourage it, the concept is still a vague one that has not been experimentally well studied...(p. 67)

The earlier mathematicians, as well as Wallas and Polya, based their accounts of the problem solving process on either introspection or retrospection. That is, the problem solvers reported on their thought processes as they worked (introspection) or they recalled these thought processes after they had completed the problem (retrospection). Kilpatrick (1967) finds difficulties with both of these methods. Of introspection he states, "but psychologists soon were raising questions about the nature and magnitude of the distortion introduced by requiring the subject to observe himself thinking." His criticism of retrospection refers to Bloom and Broder (1950), who stated that the difficulties with retrospection lie in remembering all the steps in one's thought processes, including errors and blind alleys, and in reproducing these steps without rearranging them into a more coherent, logical order.

It appears that Claparede (1917; 1934) was the first to use a third approach, the "think aloud" technique (Kilpatrick, 1967). This technique does not require the subjects to think and observe themselves thinking at the same time. The subjects are asked to vocalize their thought processes as they are thinking. Hence the subjects do not have to analyze their thought processes nor are they required to have special training. There are, however, potential difficulties with the think aloud technique: interference of speech with thinking, the lapse into silence when


the subject is deeply engrossed in thought, and the essential difference between the verbalized solution and the one found silently. Kilpatrick (1967) summarizes the views of several authors (Rota, 1966; Brunk, Collister, Swift and Slayton, 1958; Gagne and Smith, 1962; Dansereau and Gugg, 1966) concerning these difficulties and concludes that:

    ...The method of thinking aloud has the special virtues of being both productive and easy to use. If the subject understands what is wanted—that he is not only to solve the problem but also to tell how he goes about finding a solution—and if the method is used with an awareness of its limitation, then one can obtain detailed information about thought processes...(p. 8).

One of the first attempts to systematically gather empirical evidence was by Duncker (1945) who studied the problem solving protocol of subjects who were given a problem and asked to "think aloud." Two of the problems that he used were the tumor problem:

    ...Given a human being with an inoperable stomach tumor, and rays which destroy organic tissues at sufficient intensity, by what procedure can one free him of the tumor by these rays and at the same time avoid destroying the healthy tissue which surrounds it?...(p. 1).

and the 13 problem:

    ...Why are all six place numbers of the form 276,276, 591,591, 112,112 divisible by 13?...(p. 31).

Duncker illustrated a typical solution protocol for the tumor problem with a flow chart and observed that the problem solving process starts from a general solution, then progresses to a functional solution and then to a specific solution.

In a more recent attempt to gather data empirically, Restle and Davis (1962) developed a model which describes the subject as going


through sequential stages when solving a problem. Each stage is a subproblem with its own subgoal. Thus, the individual solves a sequence of subproblems which then enables him to continue on to the next stage. The model states that the number of stages, k, for any given problem can be determined by the square of the average time to solution, t, divided by the square of the standard deviation, s, of the time to solution, or

    k = t^2 / s^2.

They do not describe the stages, and assume that each is of equal difficulty.
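As a hypothetical numerical illustration (values invented, not from the thesis): a problem with a mean solution time of t = 60 seconds and a standard deviation of s = 30 seconds would be estimated to have k = 60^2 / 30^2 = 4 stages.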

Simon (1975) and his associates have also investigated thought processes used in problem solving. He used laboratory conditions to observe human beings working on well-structured problems that the subjects find difficult but not unsolvable. He states the following broad characteristics of the information processing system which he uses to describe human problem solving:

(1) serial processing, one process at a time;

(2) small short term memory capacity for only a few symbols; and

(3) infinite long term memory with fast retrieval but slow storage.

Simon further states that the solver always appears to search sequentially, adding small successive accretions to his store of information about the problem and its solution.

These models and other multistep models (e.g., Dewey, 1933; Johnson, 1955; Gagne, 1962; Guilford, 1967; Kilpatrick, 1969; Post, 1968; Webb, 1974, 1977; Lester, 1975) have appeared in the literature with varying degrees of supporting empirical evidence. Such models suggest a number of questions about teaching and measuring problem solving skills. If problem solving is a multistep


process, would it be possible to measure a person's ability at each step? Would this information be more useful to a teacher than just the single number right, percentile, or grade level equivalent score? How can this type of evaluation be effected? Can paper-pencil tests be used or is a one-to-one interview assessment necessary? A paper and pencil test can be a very accurate and efficient evaluation instrument, especially in the case of easily measured skills. However, the complex process of problem solving is more difficult to evaluate.

Testing the Ability to Solve Problems

In contrast to both the wealth and diversity of research in the problem solving process, there is a scarcity of instruments to measure these processes. Johnson (1961) cites the dearth of instruments to measure the problem solving process in the National Council of Teachers of Mathematics yearbook on Evaluation in Mathematics:

    ...The committee would have liked to include material on the appraisal of mental processes in the learning of mathematics. As teachers of mathematics, we are deeply concerned about developing skill in productive thinking. Too often, many of us find ourselves knowing little about the relations between the solutions given by our students and the thought processes that led to these solutions. However, tests for appraising higher mental processes such as concept formation, problem solving, and creative thinking in mathematics do not exist...(pp. 2, 3).

Recently, efforts have been made to develop instruments that measure problem solving processes. Speedie, Treffinger, and Feldhusen (1973) summarized what they and other authors (Ray, 1955; John, 1954; Keisler, 1969) view as characteristics of a good problem solving process test. Three of these characteristics are:


1. The test should yield a variety of continuous measures concerning the outcomes of the problem solving, the processes, and the intellectual skills involved.

2. The test should demonstrate both reliability and validity.

3. The test should be practical for group administration.

Many standardized group administered tests, such as the Iowa Test of Basic Skills and the Metropolitan Achievement Tests, contain subtests designed to measure problem solving ability. This measurement is obtained from a single score, i.e., the number of correct answers. No attempt is made to decompose either the verbal problem itself, or the solution, into component skills so that one can identify the specific skill which may be the source of the student's difficulty.

Proudfit (1978) cites other limitations of standardized tests. She lists eight tests that were examined by Charles and Moses (Webb & Moses, 1977) who concluded that most of the items did not measure problem solving processes but emphasized application of either previously learned skills or algorithms. In addition, Proudfit examined other problem solving instruments, e.g., the Purdue Elementary Problem Solving Inventory, and concluded that they did not meet the previously stated criteria on at least one of the three counts.

Post (1968), using a four step model (recognition, analysis, production, verification), designed a problem solving test which is in a multiple-choice format. However, the scores on the test are measures of the end product and not of any particular step in his model. His content validation procedure used the judgment of a panel of experts, and his split half reliabilities ranged from .60 to .82.


Foster (1972) developed a problem solving test which had a multiple-choice format for some items and an open-ended format for other items. Some of the items are designed to measure process more than product, but the test is not machine scorable.

Hiatt (1970) discusses the need for measuring mathematical problem solving processes. He designed a "creative problem solving" test in which points are awarded on the basis of approaches used, e.g., more points are awarded if the student does some computation mentally. Since this test is in an open-ended format, it would be difficult to administer to large groups.

Wearne (1977) developed a test of problem solving behavior which provides information about the child's mastery of the prerequisite mathematical concepts in the problems. Each item, called a super-item, is composed of three parts: a comprehension question, an application question, and a problem solving question. The comprehension question assesses the child's understanding of the problem setting. The application question assesses the child's understanding of the concept which is presumed to be a prerequisite to solving the problem, and finally, the problem is solved. However, it was difficult to substantiate the assertion that the comprehension and application items are assessing prerequisites of the problem solving items.

Proudfit (1978) has designed a problem solving process test in which the child is given three problems, asked to select one, solve it, and then answer a list of 12 questions before going on to the next problem. The questions refer to the child's solution processes. Two of the questions are open ended and the other 10 are in a multiple-choice


format. The test was administered to 100 fifth grade students but no formal reliability or validity data were reported.

Zalewski (1974) developed a set of verbal problems which he administered to a group of seventh graders. His purpose was to predict, from his paper and pencil test, the results of problem solving process assessments using the think aloud procedure and coding scheme. For the interview test he used Lucas' point system, which gave 1 point for 'Approach', 2 points for 'Plan', and 2 points for 'Result'. The written test was scored using the number of correct answers. The correlation between the rankings on the written tests and the interviews was .68, below the criterion Zalewski had set prior to his study.

Summary

Many researchers, including psychologists, mathematicians and educators, have investigated the problem solving processes using multistep models. Their consensus is that the multistep model is a valid model for the investigation of the problem solving processes. In addition, not only is there a scarcity of instruments to evaluate these models, but also a scarcity of instruments to measure problem solving processes in a machine scorable format. The IPSP test is an instrument developed to measure problem solving skills in a machine scorable format. The validation of this test is the purpose of this study.


CHAPTER III

DEVELOPMENT AND PROCEDURES

Purpose of the Iowa Problem Solving Project

The Iowa Problem Solving Project (IPSP) is a three year project directed by George Immerzeel of the University of Northern Iowa and funded under ISEA, Title IV, C. Its primary purpose is to develop, evaluate, and disseminate materials to improve the mathematical problem solving abilities of students in grades five through eight.

The approach advocated by the IPSP staff involves both the teaching of specific solution strategies and the solving of many interesting verbal problems with increasing difficulty levels. Eight instructional modules have been developed to help the student build a variety of skills and strategies necessary to successfully solve verbal problems. Each module consists of a booklet and a card deck. The booklet provides instructional activities aimed at developing a particular skill while the 100-problem card deck provides practice in solving problems of varying difficulty levels which are especially designed for that skill. Two IPSP members (Schoen and Oehmke) were assigned the task of developing a measurement instrument based on the IPSP problem solving model. A more complete description of the IPSP proposal, modules, and teaching strategies is given in Appendix A.


Development of the IPSP Test

After several meetings with the IPSP team and advisory board to discuss the purpose, goals and philosophy of the IPSP test, Schoen and Oehmke began the task of developing a problem solving process test. The testing model which was used, after several revisions, is given in Chapter I (Figure 1). It should be noted that while there were frequent consultations among the staff, the IPSP test and the IPSP modules were designed and developed independently; the modules were developed at the University of Northern Iowa and the testing instrument was developed at The University of Iowa.

Three of the more important constraints that were imposed on the testing instruments were: (1) the format should be multiple-choice so that the test can be machine scored, (2) the items should measure problem solving subskills and not just the ability to get a final answer, and (3) the test should be based on the IPSP testing model as developed by the IPSP team.

A search of the literature was completed to locate instruments which measure problem solving processes, and any items which were found were classified according to the IPSP testing model. Instruments were found which were in an open-ended or multiple-choice format or some combination of both (Kilpatrick, 1967; Post, 1968; Hiatt, 1970; Foster, 1972; Hollander, 1973; James, 1973; Zalewski, 1974; Wearne, 1976; Krutetskii, 1976). In many cases it was difficult to classify the items according to the specific subskills of the IPSP testing model. Those items that seemed to be most amenable to revision were selected and rewritten to conform to a specific step of the model. In addition, many


items were written to test subskills in each category. The objective was to build a large item bank to be used during the formative period. Thus, valid items which satisfied item analysis and reliability criteria in tryouts could be selected for inclusion in the IPSP test.

A first draft of the IPSP test was examined by two authors of the Iowa Test of Basic Skills (ITBS) at The University of Iowa, by the IPSP team, and by the IPSP Advisory Board. The consensus was that the items did measure the subskills in the IPSP testing model. However, there was also a consensus that the items were "too wordy" and tended to be "cute." These suggestions, as well as those of several doctoral students in mathematics education, were included in several revisions of the first draft.

Each revised, open-ended item was then typed on a 3" x 5" card and presented to twenty-nine students in individual interviews. These students were in grades five through eight and were selected by their teachers as representing a cross section of each of their classes. At the beginning of each interview the student was told that these problems might be different than any they had seen before, that s/he was not being tested but that his/her thoughts and suggestions as s/he was working the problems were needed to improve the test items. The student read each problem aloud, then talked aloud while solving the problem. Paper and pencil were available if the student chose to use them.

The primary purpose of these interviews was to get the reaction of the target audience to the items and to gather ideas for distractors. It was the interviewers' intent to be passive but still attempt to elicit as much information as possible from the students. Generally,


the students were extremely cooperative. Two of the brightest eighth graders not only solved the "hardest" problems immediately, but analyzed the items in great detail. Their comments were very informative, and some of their suggestions were used as foils for the next revision. The interviews were recorded on audio tape and analyzed according to reading difficulties, concept difficulties, strategies used, computational errors, and answers given. Using information obtained from these interviews, the first draft of the IPSP test was revised.

Experimental units of multiple-choice items were developed and administered at three different times with revisions following each tryout. The last tryout prior to the project evaluation was administered to representative samples of Iowa students in grades five through eight.

Appendix B contains a more complete description of the phases in the development of the IPSP test along with a summary of formative data.

The study reported here focuses on the validation, rather than the development, of the IPSP test. This validation was completed in three main phases: determining the relationship between the IPSP test and data gathered from individual interviews, determining the relationship between the IPSP subtests and several ITBS scales, and determining the relationships between the IPSP subtests. The first phase is an assessment of concurrent validity, and the last two are forms of discriminant validation using the Fiske and Campbell (1959) terminology. During the validation several different forms of the IPSP test were used. These are numbered and described in Table 1.

Phase one of the validation process was considered to be the most important and the least straightforward. Thus, most of the effort, as well as the emphasis in this report, was placed on the relationship between the interview data and IPSP test scores.


Table 1

Phases of Validation of the IPSP Test

Test Date and Sample             Test    Grade     Number of Items    Phase of
                                 Form    Level     Total (Subtest)    Validation

December, 1977                   561     5,6       40(12,4,12,12)     Pilot Interview Study; Pretest
Malcolm Price Lab School         562     5,6       40(12,4,12,12)     Pilot Interview Study; Posttest
                                 781     7,8       40(12,4,12,12)     Pilot Interview Study; Pretest
                                 782     7,8       40(12,4,12,12)     Pilot Interview Study; Posttest

January, 1978                    563     5,6       20(6,2,6,6)        Final Tryout Series
Representative Sample of         564     5,6       20(6,2,6,5)
Iowa Students                    581     5,6,7,8   20(6,2,6,6)
                                 582     5,6,7,8   20(6,2,5,7)
                                 783     7,8       20(6,2,6,6)
                                 784     7,8       20(4,2,7,7)

May, 1978                        561     5         40(12,4,12,12)     Final Interview Study
Two Fifth Grade Classes from
an Iowa City Elementary School

October, 1978                    565     5,6       30(10,0,10,10)     Summative Evaluation of IPSP
One Hundred Fifth and Sixth      785     7,8       30(10,0,10,10)     Project: Pretest
Grade Classes from Iowa Schools



    Design and Development of Interview Procedures

A preliminary list of interviewing procedures was developed using the experiences that were gained during the first round of interviews described in the previous section. The purpose of the later interviews was to discover the strategies the child uses to solve verbal problems. Hence it was decided that the interviewer should not lead the child into selecting a particular heuristic. Any questions asked by the interviewer to elicit more information should not lead the child. There were also instances of nonverbal leading that occurred during the trial interviews. For example, a child unsure of which operation to use would say, "I think I should multiply?" and then look at the interviewer's face to get some sort of reaction. Whether the child in fact multiplied or performed some other operation depended on the reaction of the interviewer. Consequently, a major reason for developing a set of written interview procedures was to minimize the interviewer's influence on the student's problem solving processes.

A first draft of the procedure was brief and contained such instructions to the interviewer as: Encourage the student to vocalize his thinking as much as possible during the sample warm-up time; Try to record something on tape; Don't go any longer than 15 seconds without recording something on tape. After the list was developed it was tested with two volunteer fifth and sixth graders. Revisions were made, and these were tried with two volunteer students, a sixth and a


seventh grader. In addition, two doctoral students in mathematics education tried the procedure with several of their students. Some of the suggestions that were incorporated into a revised edition were: Supply a pencil with the eraser cut off to prevent erasures; Rather than giving the student one or more sheets of paper for all the problems, put one problem on each sheet so that problem and computations are together; Tell the students that you will not let them know whether their solution is correct.

A difficulty that occurred with some frequency was that of the student becoming silent while using paper and pencil either to do arithmetic computations or while apparently thinking. If the interviewer interrupted and asked the child to tell what s/he was thinking, the reply would be, "I'll tell you as soon as I'm finished," or "Just a minute, I'm thinking." This difficulty was handled in several ways. If the child was in deep thought working on the problem, the interviewer would ask a question about the overt observable behavior of the child, e.g., "Are you doing some multiplication now?" or "Are you adding now?", when it was obvious that the child was doing that particular computation using paper and pencil. The child usually mumbled "yes" or shook his or her head and went on with the arithmetic. This strategy was used as an indicator on the tape to let the coder know what the student was doing during that time.

A second strategy that worked very well was to make a comment similar to the following: "That's fine, you're telling me what you're thinking"; "You're doing fine, Sue, you're telling me what I want to know"; "Can you put into words what you're thinking?" Such statements


seemed to encourage the child to vocalize and give more details, yet did not appear to lead the students into using a specific strategy. The final form of the list of interview procedures was the result of five tryouts and revisions. Figure 2 shows the final form used in this validation study.

    The Quantification Scale

    Development

Since one of the goals of this study was to assess the relationship between the IPSP test and the results of the think aloud interviews, a quantification code based on the IPSP testing model was needed to process the interview findings. Some related research was found.

Kilpatrick (1967) developed a coding scheme to analyze the protocols used by his subjects in a think aloud interview, but did not attempt to quantify these protocols. Lucas (1972) used a modification of Kilpatrick's coding scheme with calculus students. His five-point scoring code is based on three categories: Approach (one point), Plan (two points), and Result (two points). The "Approach" phase represented the subject's understanding of the problem, the "Plan" phase represented the subject's attempt to find a path to obtain the answer, and the "Result" phase was the subject's final answer.

Zalewski (1974) investigated the relationship between a paper-pencil test and interview results. He essentially followed the procedures established by Kilpatrick and Lucas with some modifications since his subjects were seventh graders. The process score obtained by using Lucas' scoring procedure provided a basis for ranking the subjects.


Interviewing Procedures

1. Problems should be typed one on a page, preferably placed at the side of the page so that the student can use the rest of the page for any computing, drawing diagrams, tables, or any type of thinking.

2. Start the interview with 2 sample problems, thus allowing the student to become familiar with the routine and with the type of information the interviewer would like to find. At all times make a conscious effort to put the child at ease.

3. Tell the student that no information will be given on whether the answer or strategies are correct since you want to get the best possible data.

4. Do encourage the students to go on by making comments such as, "You're doing just fine." "That's good, you're telling me what you're thinking." "Go ahead." BUT DO NOT LEAD THE STUDENT INTO USING A STRATEGY.

5. Don't go any longer than about 15-20 seconds without recording something on tape. EXCEPTION: If the student is doing computations or drawing a diagram or making tables, etc., make some sort of statement such as, "You're making a table," etc.

6. Encourage the student to vocalize his thinking as much as possible.

7. If a student falls silent while writing or drawing, prompt him by reading what he has written or ask him what he is doing. However, rule 5 takes precedence over rule 7.

8. If a student doesn't answer, or doesn't make any comments about his thinking, wait about 15 seconds and ask, "Can you tell me what you are thinking?" Wait another 10 seconds or so and ask again. This time, "Are you trying to figure something out?" If nothing happens, call this IMPASSE.

8a. Now ask the question, "Would you like a hint or another problem?"

8b. If the child says yes, this indicates to you that the rating part of the data gathering is over, but continue to get diagnostic data. This can be done by asking the student to identify the area that presented the trouble and why he had this trouble, e.g., didn't know method, lack of understanding of the problem, read problem incorrectly, etc.

Figure 2


Figure 2 (cont'd.)

8c. If the student says no, then allow more time and ask him if he would tell you what he is thinking or what method he is trying, or whether he would try to do his figuring on paper. Then repeat steps 8a, 8b and 8c again.

9. If the student is not trying to solve the problem, get him on the right track, but only after the IMPASSE.

10. For the first half of the problems observe the student. Does he have a habit of LOOKING BACK? If not, follow step 11.

11. If the student does NOT have the habit of LOOKING BACK, and has already been given the first half of the problems, then lead him on with prompts listed on the LOOKING BACK coding sheet, e.g., "Did you check your answer with the conditions of the problem?" "Did you check your answer?" "How sure are you that your answer is correct?"

DON'T

1. Do not allow the child to erase. Instruct him to make a line through the mistake.

2. Do not give any tutoring or prompting until after the IMPASSE and then only if the child asks a question. However, do use the procedure listed in step 8a.

3. Do not summarize what the child has done. Try to get him/her to do it.

4. Do not tell the student whether he is on the right track, or whether his answer is correct.

5. Do not tell the student that you are going to use the strategies listed in steps 8a, 8b and 8c.


Written tests were administered to these same subjects who were then ranked according to the number of correct answers. The correlation coefficient between the written tests and interviews was .68. Zalewski concluded that a higher correlation is necessary before the written test scores can be used as a substitute or predictor for interview results.
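Zalewski's comparison amounts to ranking the same subjects on two measures and correlating the rankings. As a point of reference only, the sketch below computes a Spearman rank-order coefficient for two such rankings; whether Zalewski used exactly this statistic is not stated here, and the score lists and function names are hypothetical illustrations, not data from his study or this one.

    # Rank correlation between written-test scores and interview process
    # scores; all values below are hypothetical.

    def ranks(scores):
        # Rank 1 goes to the highest score; ties are not handled here.
        order = sorted(range(len(scores)), key=lambda i: -scores[i])
        r = [0] * len(scores)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    def spearman_rho(x, y):
        # Spearman's rho: 1 - 6 * sum(d^2) / (n * (n^2 - 1))
        rx, ry = ranks(x), ranks(y)
        n = len(x)
        d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
        return 1 - 6 * d2 / (n * (n ** 2 - 1))

    written = [18, 15, 12, 11, 9, 7]    # hypothetical test scores
    process = [40, 35, 37, 25, 28, 20]  # hypothetical process scores
    print(round(spearman_rho(written, process), 2))  # 0.89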

Webb (1975) also used an adaptation of the coding system developed by Kilpatrick and Lucas. He used the "Approach," "Plan," and "Result" scoring system and obtained a frequency count from a check list of problem solving process variables.

From the preceding discussion it appeared that no 3-step quantification scheme was available to investigate the relationships between interviews and IPSP test results. A first attempt at developing the scale was made using Kilpatrick's processing sequence with some modifications in order to follow the IPSP testing model. In trying to quantify these processing sequences the procedures became very cumbersome. A new attempt was made in which flow charts were designed for each step of the model. Again, when it came time to assign a number at the various branches the instrument became unmanageable. Another attempt was made in which behavior in each step of the testing model was assigned three numbers: 0, 1, and 2, which were to serve as categories. In the 0 category would be those processes which were totally incorrect or a response such as "I don't know what to do." The 2 category would contain responses which were completely correct, and the 1 category would contain the intermediate responses. This new procedure was used with the audio tapes from the first interviews. It became immediately apparent


that at least one more category was needed and that the categories were not explicit enough for each step of the model.

These revisions were made and the resulting instrument now had four categories: 0, 1, 2, 3, with more explicit descriptors under each category. This new instrument was used to process additional tapes and further revisions were made. At this stage the instrument was examined by the same two mathematics educators who were consulted on the interview form. Each category and its descriptors were thoroughly discussed.

Step 4, the looking back step, presented the greatest difficulty. If the subject gives an answer, appears to be mulling it over, goes back to the problem, reads it again, and then gives another answer, is this checking the answer or trying to understand the problem? It was decided to include this process under step 1 and be more explicit with the descriptors under step 4.

After general agreement on the appropriateness of the scale was reached, three raters quantified audio tapes of interviews using this form. After a few minor additions the instrument was considered to be in "final" form. As a final test, each rater analyzed the same three interviews on audio tapes.
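No agreement statistic is reported for this three-rater check, but a simple way to summarize how consistently raters apply a 0-3 scale is percent exact agreement over the items they all scored. The sketch below is illustrative only; the rater names and ratings are hypothetical.

    # Percent exact agreement among three raters who scored the same
    # interviews on the 0-3 scale; all ratings below are hypothetical.

    from itertools import combinations

    rater_a = [3, 0, 2, 1, 3, 0, 2, 2, 1]
    rater_b = [3, 0, 2, 2, 3, 0, 2, 1, 1]
    rater_c = [3, 1, 2, 1, 3, 0, 2, 2, 1]

    def exact_agreement(x, y):
        # Proportion of items on which two raters gave the same score.
        return sum(a == b for a, b in zip(x, y)) / len(x)

    pairs = combinations([("A", rater_a), ("B", rater_b), ("C", rater_c)], 2)
    for (name1, r1), (name2, r2) in pairs:
        print(f"{name1} vs {name2}: {exact_agreement(r1, r2):.2f}")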

Use of the Scale

The final form of the quantification scale is given in Figures 3, 4, and 5. Behavior which involved reading, analyzing, and understanding the problem was classified as step 1 behavior.


Quantification Scheme for the Components of the IPSP Test Model

Step 1: Getting to Know the Problem

Score 0:
- Says he doesn't understand the problem and makes no attempt at solution.
- Tries to solve the problem unaware that there is insufficient information and never starts correct strategy.
- Fails to use data correctly in attempts at solution, e.g., uses all extraneous data to arrive at solution.
- Immediately tries to do some arithmetic operations using all the numbers in the problem without regard to a correct strategy.

Score 1:
- Makes a false start (recognizes it as such) but can't arrive at a correct strategy.
- Says he doesn't understand problem, rereads it, tries to make a start but is unsuccessful (includes rephrasing, trying to understand what is unknown, what is given, or searching for a path).
- Reads problem, knows there are "too many numbers" but can't organize proper data (extraneous data).
- Makes true statements about extraneous data but does not advance solution.
- Rereads problem, appears to know there is something missing but cannot state what is wrong and makes a false start (missing data).
- Tries to solve the problem without regard to using data correctly. After a brief trial and error realizes he is not using data correctly but cannot correct the situation.
- Tries to summarize data or repeat it in a different form but does not find correct strategy.

Score 2:
- Makes a false start but eventually arrives at a correct strategy (includes trial and error).
- Tries to solve the problem and unconsciously makes up his own missing data but does not state that there is insufficient data. His solution strategy is correct for the data he provides.
- Tries to summarize data or repeat it in a different form which starts him out on the correct strategy but later he gets off the track.
- States there is no solution because of insufficient data and attempts to modify conditions but is unsuccessful.
- Uses correct strategy but does not use data in its proper form (e.g., neglects units).
- Solves some of the cases involved in the problem but fails to consider all solutions.

Score 3:
- Any correct attempt to understand the problem by reading or rephrasing, i.e., trying to understand what is unknown, what is given, or searching for a path.
- Rereads the problem to assist in drawing figures, tables, equations, performing a check or introducing symbols. (Must be apparent that this is going to aid in understanding the problem and finding a correct strategy.)
- States a plan for an intermediate or final goal which is a correct strategy.
- Carries out exploratory manipulations which lead to correct solution.
- States problem can't be worked and tells what is needed to work it (modifies problem). States reason why it can't be worked (insufficient data).
- States what data is not needed in solution of problem while stating correct strategy (extraneous data).
- States the conditions and constraints of the problem correctly.
- Immediately starts out to work the problem and succeeds.

Figure 3


Quantification Scheme for the Components of the IPSP Test Model

Step 3: Carrying Out the Plan

Score 0:
- Any manipulations or computations that are done are incorrect.
- Strategy is set up correctly but is not able to carry it out, e.g., cannot solve an equation.
- Tries to use a diagram, figure, or table but does the computation incorrectly.
- Suggests a plan but cannot carry it out.

Score 1:
- Does less than half the number of necessary computations correctly.
- Sets up equation but cannot solve it completely. Does simple operations like addition and subtraction.
- Uses successive approximation (systematic trial and error) and does the first step correctly but cannot carry it to the end.

Score 2:
- Does half or more of the necessary computations correctly.
- Sets up equation but cannot solve it completely in that s/he makes errors on the harder operations, e.g., multiplication, division, clearing fractions, etc.
- Makes a mistake in copying correct number but carries out computations correctly.
- Makes an incorrect diagram, figure, or table but uses the numbers in the computation correctly.
- Uses successive approximation and does the first few steps correctly but bombs on the computations involved in the final step.
- Does computation correctly but uses units incorrectly.

Score 3:
- Sets up problem correctly and carries out actual computations correctly.
- Sets up problem incorrectly, but all computation is done correctly.
- Uses the algorithm or equation correctly, e.g., manipulates all parts of the equation correctly.
- When using successive approximations (trial and error), uses information from previous trial correctly, i.e., computes all these values correctly.
- Starts to execute plans, makes a mistake (computationally) but finds errors and corrects them.

Figure 4


Quantification Scheme for the Components of the IPSP Test Model

Step 4: Looking Back

Score 0:
- Makes no attempt to check answer or conditions of problem.
- Says, "It's probably wrong," and makes no attempt to check the answer.
- Says s/he doesn't know and makes no attempt to correct it.

Score 1:
- Expresses uncertainty about answer.
- Says it's probably wrong (or some version) and attempts to give a reason for his/her uncertainty.
- Makes an attempt to check the answer but is not successful enough to be convinced that it is right or wrong.
- Checks computations involved in answer but does not check to see if answer satisfies condition of problem. Errors here should be major.

Score 2:
- Makes some attempt to check answer or decide whether it is correct, but eventually gives up.
- Makes an attempt to check the answer by various methods (i.e., retraces steps, checks condition of problem, substitutes answer) but cannot carry out check completely.
- Makes an attempt to check the answer by various methods (i.e., retraces steps, checks condition of problem, substitutes answer) but fails to detect the incorrect answer. Errors here should be minor.

Score 3:
- Attempts to check the values of an unknown or the validity of an argument.
- Tries to decide whether the answer makes sense (i.e., realistic, reasonable estimates).
- Checks that all pertinent data has been used.
- Suggests a new problem that can be solved in the same way.
- Successfully attempts to simplify the problem.
- Checks solution by retracing steps or substitution.
- Checks that solution satisfies conditions of problem.

Figure 5


Briefly, a score of 0 was assigned to a student who failed completely to understand a problem; 1 was assigned to a student whose analysis of the problem was incorrect but who understood some of its elements; 2 was assigned to a student whose analysis was correct except for a minor error such as reading data incorrectly; 3 was assigned to an entirely correct understanding of the problem which led to a valid solution strategy. A crucial point is that the step 1 score was not affected by errors in the application of a solution strategy, once that strategy was chosen. Errors in the application of a chosen strategy were reflected in the step 3 score.

Step 4 behavior consisted of student moves after a tentative solution was reached. Many students stopped as soon as they had an answer and were given a score of 0 for step 4. Briefly, a score of 1 was assigned if some uncertainty was expressed but no systematic check was made; 2 was assigned if a check was attempted but was either incorrect or incomplete; 3 was assigned if a valid check of the computation, conditions and/or reasonableness of the solution was carried out. Again, specific criteria were described for each numerical score, but an important point is that the step 4 score was not affected by any behavior preceding a tentative solution. An exception was that students were assigned 0 on step 4 if no tentative solution was reached.
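Under this scheme each interviewed problem yields a small record of step scores, with step 4 forced to 0 when no tentative solution is reached. One way such records might be represented and totaled is sketched below; the class name, fields, and sample values are invented for illustration and are not the project's actual coding forms.

    # One record per problem: 0-3 scores for steps 1, 3, and 4 of the
    # IPSP testing model; all names and values here are hypothetical.

    from dataclasses import dataclass

    @dataclass
    class ProblemScores:
        step1: int                     # getting to know the problem
        step3: int                     # carrying out the plan
        step4: int                     # looking back
        reached_solution: bool = True  # was a tentative solution reached?

        def total(self):
            # Step 4 earns no credit if no tentative solution was reached.
            step4 = self.step4 if self.reached_solution else 0
            return self.step1 + self.step3 + step4

    record = ProblemScores(step1=3, step3=2, step4=0)
    print(record.total())  # 5

Summing such totals across a student's problems gives one possible interview process score of the kind related to test scores later in this chapter.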

The following two examples will illustrate the scoring scheme. Ann, a sixth grader, was presented with this problem:

A bag of XL-50 brand marbles contains 25 marbles and costs 19¢. How much will 125 marbles cost?

Ann read the problem aloud and this is the transcribed interview:

A: Uh . . . Oh boy . . . hm . . .

I: What are you doing now?

A: I'm trying to figure out how I'll do this. Either add or multiply


. . . O.K. I'm going to multiply 125 marbles by 19¢. (multiplies) It comes out $11.25. That's not right.

I: What are you trying to find?

A: I'm trying to get the right answer.

I: But what answer?

A: What I should do with 25, 19, and 125, because I know with those numbers I have to do something.

(Silence)

(Rereads the problem)

(Silence)

A: I want to see if I multiplied wrong . . .

(Remultiplies but is still stumped)

Ann exhibited a behavior which occurred very frequently in student interviews. She tried to do some arithmetic operations using all the numbers in the problem. She lacked, or at least failed to use, analytic skills, essentially step 1 behavior. However, her computational skills and ability to use tables and diagrams, as illustrated in other problems, were good. On this problem, she was given a score of 0 for step 1; 3 for step 3; 0 for step 4. If she had made a computational error or misused an equation, she would have received a 0, 1, or 2 on step 3. A similar pattern emerged in her solution to other problems.

Dave, a fifth grader, is an example of a student who was able to understand most of the problem settings presented to him, but had difficulty carrying out his solution strategies. This is illustrated with the following example of a single-step problem:


Mr. Price earned $75 in each of 8 weeks. How much did he earn for all 8 weeks?

D: O.K. 75, 75, 75, . . . (adds eight 75's) O.K. 1, 2, 3, . . .

I: So you wrote eight 75's down, right?

D: Yes, O.K., that'd be . . . eight 5's would be 40. It'd be 0 and 4 on top. And eight 7's would be . . . O.K. let's see . . . Hmm. (Writes them down and adds them)

D: It'd be $5.60.

I: O.K. that's your answer then?

D: Yes.

Dave chose to add eight 75's, which was a correct strategy. However, he had difficulty in finding the sum. To make the computation easier, he correctly noted that 8 sevens is the same as 4 fourteens. He was scored a 3 on step 1, and 2 on step 3 on this problem. His score on step 4 was 0 since he did not exhibit any behavior in that category. Later, with prompting, Dave realized that he had left out the 8 fives, and he corrected himself.
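For reference, the arithmetic Dave attempted can be written out as follows; this decomposition of his slip is a reconstruction inferred from the transcript, assuming dollars throughout:

\[
8 \times \$75 = (8 \times \$70) + (8 \times \$5) = \$560 + \$40 = \$600
\]

Dropping the 8 fives (the $40 term) leaves $560, which appears consistent with Dave's reported "$5.60" apart from the placement of the decimal point.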

    The Pilot Interview Study

    Procedures

After a year of development, a pilot study was designed in which data from interviews and the IPSP test, in their developing forms, were gathered from the same sample of students in Fall, 1977. The year of development included a trial run in January, 1977, and one in March, 1977, in which four 20-item forms of the IPSP test for each run were


administered to students in grades five through eight. The form of the IPSP test that was used in this pilot study was a revised one based on the data obtained from those trial runs.

Preparations for the pilot study included briefing the classroom teachers and administrators of the school, setting up a schedule for interviewing the students, and working out the logistics of the schedule. A final session was held which was attended by the IPSP staff, the classroom teachers who would be involved in the study, the head of the school's mathematics department, the principal, and involved counselors. The form of the IPSP test that would be used was presented, and the team discussed the purpose of the study and answered questions raised by the school personnel.

    Pilot Sample

All of the students in grades five through eight in the Malcolm Price Laboratory School, Cedar Falls, Iowa, were involved in this pilot study. The students were randomly divided into two groups across grade levels. Group one consisted of 99 students in grades five through eight. Group two consisted of 103 students from the same grade levels. However, within the groups, the fifth and sixth graders were administered one form of the test while the seventh and eighth graders completed another form of the test. Concurrently with the above groupings, each teacher was asked to divide each of their classes into an upper ability and lower ability half and to select one "verbal" student from each half. This resulted in the selection of 32 students, four from each grade in each group, to be involved in think aloud interviews. All of the interviews


were conducted by the investigator and followed the interview form discussed in the previous section. The interview tapes were then coded according to the quantification code previously described. Students in group two completed the IPSP test after the interviews were completed. Correlation coefficients between interview and IPSP test scores were then computed. Figure 6 shows the time schedule for the study.

Schedule of Events for Pilot Interview Study

    Date    Activity                                    Responsibility
    11/28   paper and pencil test for group 1           classroom teacher
    11/29   interview 4 students from each grade in     the investigator
            group 2 (room 4)
    11/30   paper and pencil test for group 2           classroom teacher
            interview 4 students from each grade in     the investigator
            group 1 (room 4)

Figure 6
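The correlation coefficients mentioned above relate paired scores for the same students; assuming ordinary product-moment correlations, the computation can be sketched as follows. The paired values below are hypothetical illustrations, not results from the study.

    # Pearson product-moment correlation between interview process scores
    # and IPSP test scores for the same students (hypothetical values).

    import math

    def pearson_r(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = math.sqrt(sum((a - mx) ** 2 for a in x))
        sy = math.sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy)

    interview = [42, 31, 38, 25, 29, 35, 20, 27]  # hypothetical
    ipsp_test = [17, 12, 15, 10, 13, 14, 8, 10]   # hypothetical
    print(f"r = {pearson_r(interview, ipsp_test):.2f}")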

Interview Problems

One hundred open-ended verbal problems were developed for grades five through eight independently from those on the IPSP test. These problems were reviewed by members of the IPSP staff. Samples from the 100 problems were administered to six volunteer students in grades five through eight in think aloud interviews. Information obtained from these interviews and suggestions from the staff were used in revising


some problems and eliminating others. A pool of 65 problems resulted. These problems were then classified into seven levels: level one containing simple one-step word problems and each succeeding level containing problems that were increasingly difficult in both concepts and computations to be used. Each problem was typed on a half sheet of paper so the student could do any needed computations on that paper. These problems are included in Appendix C.

    The Interviews

The investigator conducted all 32 interviews. Because the interviews were taking place during the regular school day, a rather brief time limit of 20 minutes per student was allotted. The first five minutes were used in talking to the student about the procedure to be used and in presenting two sample problems. Students were encouraged to talk but were not given any hints or told whether what they wer