Performance Assessment Design Principles Gleaned from Constructivist Learning Theory (Part 1)

By Thomas W. Zane




This article is Part 1 of a two-part series. Part 1 focuses on theory-based principles for domain definition and program-wide assessment specifications. Part 2, which will appear in the next issue, will focus on the linkages between underlying theory and performance task specification principles.

Foundations of Assessment Design and Development

Designing high-quality assessments requires well-founded principles and a reasoned development process. Measurement professionals are often guided by the principles of psychometric theory and the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999). But not all assessment design principles or development processes are founded upon psychometric theory or the Standards. Some practices and design principles can be traced further back to principles and tenets that originated in psychological and learning theory.

Objectively scored, large-scale tests have been suffused for the past century with a mix of psychometric theory and standards that, in turn, were founded upon certain principles within learning and psychological theory (see Figure 1).

Learning theory has much to offer to guide construction of high-quality learning and testing products (Ertmer & Newby, 1993). Careful examination of learning theory can yield linkages that ground design principles, provide reasoning for day-to-day design decisions, and can even offer assumptions useful for testing the viability of programs and products. Indeed, a century of objectivist learning theory has provided underpinnings for traditional standardized test design (Shepard, 1991). During this time, measurement experts have continued theoretical and practical efforts to create stronger foundational linkages between learning theories and modern instructional and testing practice. The linkage between underlying learning theory and objectively scored test design/development is now strong enough to frame our thinking about what constitutes reliable, valid ("good") tests.

Figure 1. Foundations of objective assessment design and development.

Large-Scale Performance Assessment Emerges

Performance assessment¹ is not new (Madaus & O’Dwyer, 1999). Anyone who has seen the development of an Apprentice’s Bowl at the Waterford Crystal factory in Ireland will recognize that performance assessment has a very long history (Waterford Wedgwood, 2007). Apprentices at the Waterford factory work for years to learn necessary skills and then prove their competence via a real-world performance assessment where they create an apprentice’s bowl that contains all of the types of wedge cuts performed at the factory.

While performance assessment is not new, previous uses were quite limited in scope. Large-scale, standardized, high-stakes performance testing is relatively new and therefore does not have the strong psychometric or learning theory foundations enjoyed by objective testing.

The Problem for Performance Assessment Designers

Objectivist theory, which provides a strong foundation for traditional large-scale testing, does not offer similar foundations for the complex decisions that go into performance assessment design. For example, objectivist theory does not do a good job of explaining the cognitive activities that occur when competent students perform real-world integrated activities. Therefore, it is necessary to look beyond objectivist learning theory. Unfortunately, the lack of literature on this topic suggests that assessment has not yet been subjected to the same process of scrutiny and refinement that develops firm foundations.

¹ A performance assessment consists of a series of performance tasks. Each task contains various elements but must at least include one or more prompts to direct students to do something. Each performance task is scored using a rubric.

Relatively recent literature suggests that constructivism has evolved as a lens for designing instruction, but until recently constructivist theory has not been used to view assessment (Jonassen, 1991a). While performance assessment developers are held to the same standards of psychometric quality and validity as any objective test developer, there is a singular lack of support from psychometric theory for performance assessment. For example, there is some disagreement about the nature of reliability, validity, and what constitutes “good” performance assessments (Dunbar, Koretz, & Hoover, 1991; Kane, Crooks, & Cohen, 1999; Linn & Baker, 1996; Linn, Baker, & Dunbar, 1991; Messick, 1994; Moss, 1994; Popham, 1997). In short, the measurement field has not sufficiently evolved to deal with complex performance assessment (Mislevy & Haertel, 2006).

Performance assessment developers are, therefore, in the position of making design decisions with relatively little psychometric theory, psychometric standards, or learning theory-based foundational guidance.

Potential Foundations for Performance Assessment Design and Development Principles

A review of constructivist learning and related thinking theory suggests that some underlying principles could be extracted to support performance assessment design decisions and guide development methods. For example, using constructivist theory as a foundation for assessment design suggests a greater emphasis on cognitive processing (versus content topics or visible behaviors) as assessments are designed. This greater emphasis on cognitive processing could then lead to the specification of more robust performance measures and the creation of scoring rubrics² that focus more on enduring traits than on content knowledge. The underlying theory could then lend itself to a more detailed assessment materials review for potential construct irrelevant variance³ in the performance tasks (Messick, 1989). Finally, the cognitively based assessment specifications could drive meaningful interpretation of scores because linkages could be made between the scores and the underlying cognitive traits of interest, thus supporting a stronger validity argument (Mislevy, 1994; Gorin, 2006).

² Rubrics – guidelines for scoring a performance or student submission. WGU uses a two-dimensional rubric table with scoring criteria in rows and scores in columns.

³ CIV (construct irrelevant variance) is variance in scores that is due to poor design of the assessments, e.g., including too few contextual variables in the task, which then leads to under-representing the construct of interest.
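Footnote 2 describes a two-dimensional rubric table with criteria in rows and score levels in columns. The following is a minimal sketch of such a structure as a data type; the class names, criteria, and descriptors are hypothetical illustrations, not WGU's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Criterion:
    name: str                    # row label: one scoring criterion (often an enduring trait)
    descriptors: dict[int, str]  # score level (column) -> observable descriptor


@dataclass
class Rubric:
    title: str
    criteria: list[Criterion] = field(default_factory=list)

    def total(self, ratings: dict[str, int]) -> int:
        """Sum the score level a rater chose for each criterion."""
        return sum(ratings[criterion.name] for criterion in self.criteria)


# Hypothetical example: two criteria scored on a 0-2 scale.
rubric = Rubric(
    title="Troubleshooting case study",
    criteria=[
        Criterion("Problem framing", {0: "Restates symptoms only",
                                      1: "Identifies plausible causes",
                                      2: "Prioritizes causes with evidence"}),
        Criterion("Decision making", {0: "No rationale given",
                                      1: "Partial rationale",
                                      2: "Systematic, justified rationale"}),
    ],
)

print(rubric.total({"Problem framing": 2, "Decision making": 1}))  # -> 3
```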



The balance of this paper documents efforts to extract performance design and development principles from selected constructivist theories. The extracted principles are organized under common steps in any performance assessment development process. These recognizable steps include:

1. Domain Definition
2. Assessment Program Specification
3. Performance Task Specification*
4. Rubric Specification and Development*
5. Review and Validation*

* Steps 3-5 will be presented in Part 2 of this series.

Domain Definition Principles

Imagine for a moment a cardboard box as a representation of a domain. Like a box, a domain is simply a container. The task of domain definition is to seek out, prioritize, select, and organize the most valuable knowledge, skill, ability, and dispositional components of real-world competence.

A common goal of performance assessment is to measure how well examinees can handle the messy (ill-structured) real world. However, the foundations of learning and assessment domains are sometimes limited to content listings that take the form of syllabi, learning objectives, or lists that look much like tables of contents. Alternatively, industrial and training developers may use some form of task analysis to define observable job behaviors. Both of these approaches are consistent with behaviorally-based frameworks founded in objectivist theory. They are also consistent with traditional instructional design methods that seek to deconstruct complex learning into component parts for more efficient delivery (Jonassen, 1991b).

Our experience has shown that domains can differ drastically in quality depending on the methods used to fill and organize the metaphorical cardboard box. Constructivist theory can offer three important principles that help developers identify, gather, select, and organize the right elements for guiding performance assessment development.

1. Learning and assessment should be based upon the complex and integrated nature of the real world.

From a constructivist perspective, real-world learning and thinking is contextual and process oriented (Baker, 1992; Cunningham, 1991; Ertmer & Newby, 1993; Greeno, 1989; Jonassen, 1991a; Kolb, 1985). Moreover, constructivist theory suggests that learning requires developers to maintain a level of complexity and integration within their domains (Jonassen & Rohrer-Murphy, 1999). Constructivist theory clearly suggests that domains should define real-world, integrated tasks as opposed to listing a series of content topics, decontextualized knowledge components, or individual decontextualized behaviors.

2. Boundaries of the real world should be reduced to a manageable size.

The real world is large, messy, and impossible to circumscribe within any assessment program. Yet, common assessment practice uses the results of a few objective items, performance tasks, and/or observation scores to make inferences about students’ real-world competence. These assessment items are samples from a theoretical, undefined real world: a universe of all possible test items called the Universe of Generalizability, or UG (Shavelson & Webb, 1991; Sireci, 1995). Leaving this real-world universe undefined makes it impossible to select an appropriate set of test items or performance tasks. For example, scores on an essay could be used to make inferences about a student’s overall writing ability. An undefined universe would require that developers consider all types of writing, in all situations, contexts, languages, and under every imaginable condition over the student’s lifetime. Obviously, developers need to narrow this down a bit.

Defining the UG is a useful way of conceptualizing the practical job of reducing the size of the real world down to something more manageable. To accomplish this, the UG should be viewed as a space with depth and breadth defined by how far and wide we wish to make inferences from the test items/tasks to meaningful activities in the real world. The UG space then represents the set of possible observations that could be made within stated or defined contexts, conditions, situations, events, time frames, etc., that are applicable to the sorts of inferences developers wish to make.

Constructivist theory offers foundations for narrowing and defining the UG while keeping the definitions from reducing down to singular and decontextualized knowledge elements. For example, Situated Cognition (SC) theory offers an understanding of the environment (situation) that contributes to our appreciation of how performances can be generalized to untested activities (Brown, Collins, & Duguid, 1989). While the ideas presented within SC theory are useful, SC does not provide a practical means for exploring and identifying the boundaries of the UG. Activity Theory (AT) presents a model of human interaction within an environment. AT suggests that “conscious learning emerges from activity (performance), not as a precursor to it” (Jonassen & Rohrer-Murphy, 1999, p. 62; Ryder, 2007). That is, human activity is initiated by a person motivated to solve a problem. The activity is mediated by tools and the community within which the activity occurs. All of this is constrained in a cultural context (Ryder, 2007). AT argues that all human activity (the real world) can be modeled within an activity system organized around:

1. Subject – the person who is engaged in the activity
2. Object – the physical or mental product that is sought
3. Tools – anything used in the transformation process (may include physical tools, mental aids, or rules)
4. Activity – goal-directed actions (tasks, actions, and operations), chains of operations

These ideas suggest that we narrow the UG by defining the boundaries, strata, and variables of interest. Take, for example, the UG for a driver’s license assessment. Boundaries could include limiting performances to passenger vehicles and to a particular state’s driving regulations. Strata and variables could include transmission types, time of day, variations in weather, traffic density, and so forth. In addition, developers could itemize such things as conditions, contexts, and situations to define the space within which a person must perform successfully.
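To make the driver’s-license example concrete, here is a minimal sketch of a UG outline expressed as a simple data structure; the field names and listed values are illustrative assumptions, not a prescribed format.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class UniverseOfGeneralizability:
    # Boundaries limit how far inferences are meant to reach.
    boundaries: dict[str, str] = field(default_factory=dict)
    # Strata/variables itemize the conditions, contexts, and situations sampled.
    strata: dict[str, list[str]] = field(default_factory=dict)

    def condition_profiles(self):
        """Enumerate every combination of the itemized strata values."""
        names = list(self.strata)
        for combo in product(*(self.strata[name] for name in names)):
            yield dict(zip(names, combo))


driver_ug = UniverseOfGeneralizability(
    boundaries={
        "vehicle class": "passenger vehicles only",
        "regulations": "one state's driving code",
    },
    strata={
        "transmission": ["automatic", "manual"],
        "time of day": ["daylight", "night"],
        "weather": ["clear", "rain"],
        "traffic density": ["light", "heavy"],
    },
)

# 2 * 2 * 2 * 2 = 16 distinct condition profiles a task could sample.
print(sum(1 for _ in driver_ug.condition_profiles()))
```

Itemizing the strata in this way makes explicit which conditions a set of performance tasks actually samples and which remain untested.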

Note, however, that content areas and other domain elements (discussed later) are not a part of this UG definition. The knowledge, skills, abilities, and dispositions suggested by content areas and other elements should be the purview of domain definition. In other words, a UG description is simply an outline that defines the boundaries, strata, and variables of interest for making appropriate inferences from scores to real-world competence. As such, this simple definition acts as a guide for subsequent construct identification, domain definition, assessment specification, item/task development, and ongoing validation work.

3. Domains should utilize appropriate theoretical foundations to help define cognitive processing and enduring characteristics of examinees.

Constructivist theory (such as Situated Cognition and Cognitive Flexibility theory) suggests that assessing complex real-world competence requires more than a listing of knowledge elements and steps in a task (Greeno, 1989; Glaser, Lesgold, & Gott, 1991; Gorin, 2006; La Marca, 2006; Mislevy & Haertel, 2006; Boger-Mehall, 2007). Rather, developers may need to dig deeper into the unobservable cognitive realm and then deeper still into the theoretical realm that helps explain the cognitive activity at work and the enduring characteristics of the examinee.

Take, for example, the job skill called troubleshooting. At a surface level, assessments could measure the knowledge about and tasks for troubleshooting. These sorts of measures are useful but may be insufficient because real-world competence requires more than this. Delving deeper into the cognitive realm requires that developers define and measure such things as decision making regarding a given troubleshooting situation or the causal models that troubleshooters may utilize. Going deeper still requires that the developer shift from a perspective of troubleshooting as a skill to troubleshooting as a theoretical human construct. The benefit of this conceptual shift is that the developer then has access to a wealth of published information about human constructs like troubleshooting, decision making, and problem solving (Gorin, 2006). Published information often lists many potential knowledge, skill, ability, and dispositional elements that define the enduring characteristics of an examinee who is a troubleshooter. Thus, constructs can provide important elements that yield better and more detailed descriptions of the contents of the cardboard box.
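To view the three layers of the troubleshooting example side by side, the following is an illustrative sketch of a domain entry that records surface knowledge, cognitive processes, and enduring characteristics together; the field names and listed items are assumptions for illustration, not a standard schema.

```python
from dataclasses import dataclass, field


@dataclass
class DomainElement:
    construct: str
    surface_knowledge: list[str] = field(default_factory=list)         # facts and task steps
    cognitive_processes: list[str] = field(default_factory=list)       # unobservable activity
    enduring_characteristics: list[str] = field(default_factory=list)  # traits and dispositions


troubleshooting = DomainElement(
    construct="Troubleshooting",
    surface_knowledge=["component functions", "standard diagnostic procedures"],
    cognitive_processes=["decision making in a given troubleshooting situation",
                         "reasoning from causal models of the system"],
    enduring_characteristics=["systematic, hypothesis-driven work habits",
                              "disposition to verify a fix before closing a case"],
)

print(troubleshooting)
```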

Summary of Domain Definition

General constructivist theory and the tenets of specific constructivist theories such as Cognitive Flexibility Theory, Situated Cognition, and Activity Theory provide principles that guide the development of real-world, competency-based domain definitions. Defining a full, rich domain of this sort requires more than a simple listing of topics, knowledge elements, or tasks. Rather, developers need to begin by reducing the real world by setting the boundaries of the Universe of Generalizability. Then, developers should identify, gather, select, and organize important elements of content, tasks, cognitive processes, and enduring characteristics (see Figure 2).



This multifaceted approach leads to a fuller domain definition containing many of the components of real-world competence. Moreover, this broad perspective provides developers with a lens for identifying, prioritizing, and selecting the knowledge, skill, ability, and dispositional components for their domains.

Assessment Specification

Many developers move directly from domain definition to development of testing materials, thus bypassing the critical second and third steps of Program Specification and Performance Task Specification. Assessment program and assessment specifications provide high-level guidance for the developer, much like a blueprint. To follow that metaphor, trying to build a housing project without a set of requirements and plans would be a difficult endeavor at best. More important, the final result may not serve the purpose for which it was intended.

There are several ways to conceptualize specifications, ranging from simple to complex (Mislevy & Haertel, 2006). A simple two-step process will suffice for the current discussion. The first specification step is at the program level, when the type and nature of the measurement process are outlined; the second step is when developers design the individual assessments.

Assessment Program Specification Principles

1. Assessments should be designed to gather evidence across all facets of competency.

The constructivist’s world is complex, multifaceted, ill-structured, and contextually based. As such, developers might rely almost exclusively upon contextually grounded, real-world performance-based assessments such as case-based situations, simulations, and live performances to assess competence. However, as stated earlier, domains should consist of a healthy mix of knowledge, skills, and abilities. There are reliable, valid, yet efficient means for testing many parts of this type of multifaceted domain. Moreover, exclusive reliance on case-based exams, simulations, and especially live performances can be relatively expensive and may not provide the necessary degree of reliability or validity. The obvious answer is to use an appropriate mix of assessment types that balances psychometric and budgetary requirements. For example, factual and conceptual knowledge and some reasoning can be efficiently measured using objectively scored exams. Knowledge of processes, connections, and deeper reasoning can be efficiently measured with simple performance-like tasks (e.g., essays, reports, simple experiments). Demonstration of skills and abilities can be measured using more complex performance tasks, simulations, and direct observation of performances.
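The mix described above can be summarized as a simple mapping from facets of competence to assessment modes. The sketch below assumes such a lookup is useful during program specification; the category labels and function name are hypothetical.

```python
# Facet of competence -> suggested assessment type (labels are illustrative).
ASSESSMENT_MIX = {
    "factual and conceptual knowledge": "objectively scored exam",
    "process knowledge and deeper reasoning": "simple performance task (essay, report, experiment)",
    "demonstration of skills and abilities": "complex performance task, simulation, or observed performance",
}


def recommended_mode(facet: str) -> str:
    """Look up the assessment type suggested for one facet of competence."""
    return ASSESSMENT_MIX.get(facet, "no mode specified for this facet; review the blueprint")


print(recommended_mode("factual and conceptual knowledge"))  # objectively scored exam
```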

2. Assessments should be integrated into the curriculum and sequenced to support a coherent pedagogy of learning.

This principle should resonate with most readers because constructivist theory has already been used to provide principles for instructional design (Jonassen, 1991a). The principles that apply to learning also apply to performance assessment. While this principle may seem common to many readers, experience has shown that designing assessments that are deeply integrated into the curriculum is rare. True integration is more than simply adding an assessment to the end of a unit. It is possible to embed assessments so deeply that students hardly recognize that they are being tested. Well-founded assessments provide an environment where students learn, struggle, produce, and then, almost incidentally, are scored on their performance. Clearly, this sort of assessment is more complex than traditional end-of-unit mastery checking.

Figure 2. Conceptual relationship between the real world and defined domains.


It involves designing the proper levels of contextual support (also known as scaffolding) to allow students to struggle just enough for learning, but not so much that they cannot move forward.

Theories such as the Social-Developmental Model of Adult Thinking and Learning (Scheurman, 1995) and its predecessor, Experiential Learning Theory (Kolb, 1985), offer useful foundational considerations for designing truly integrated assessments. For example, Scheurman (1995) suggested that developers provide an environment that contains sufficient “immediate and sustained contextual support” (p. 3). Further, Facione (1990) provides suggestions based upon this perspective to improve learning. In essence, Facione calls for supporting student progress toward epistemic development through the use of inquiry, cultivating experience with decision making, helping students accept the ill-structured nature of real life, encouraging recognition that there may be many (or no) solutions to a given situation, and cautioning students that certainty may be fleeting (Facione, 1990).

Our experience suggests that performance assessments can play a dual role, providing an environment for students to learn and for gathering evidence of real-world competence. As such, we propose that developers design a sequence of assessments within an environment that supports both goals. In addition to designing an assessment to measure whether students possess sufficient domain knowledge before moving on to procedural and conceptual knowledge, developers could also design appropriate scaffolding (contextual support) to guide students within a given stage in the learning sequence. Developers could embed these assessments into a sequence that begins with concrete situations and moves to more ill-structured situations over time. Finally, developers should consider sufficient assessment opportunities to allow students time to practice, learn, grow, and self-assess.

3. Assessments should be situated in multiple contexts and use multiple modes to account for differences in candidates, domains, and contexts.

A basic tenet of constructivist theory suggests that there are significant differences across individual students based on their unique experiences and their equally unique constructions of reality (Ertmer & Newby, 1993). Further, learning and real-world competence are situated within specific and previously experienced contexts (Brown, Collins, & Duguid, 1989; Greeno, 1989; Spiro, Feltovich, & Coulson, 2007). For example, situated cognition theory suggests that human learning is a function of the activity, context, and culture in which it occurs (Greeno, 1989). Brown, Collins, and Duguid (1989) advocated that knowledge and skills (real-world competence) are products “of the activity and situations in which they are produced” and thus situations “co-produce knowledge through activity” (p. 33). In addition to differences among individuals, there are significant differences in the nature of each domain and the uses of the material contained within that domain. A mathematics domain, for example, will be different from a business ethics domain. How either domain is used in a hospital, manufacturing, or training context can further complicate the matter. Thus, the student’s experience, the nature of the content, and the nature of the context are variables that need to be considered to provide solid contextual connections for assessments.

If the domain definition principles discussed previously were followed, developers would already have a handle on the contexts where the domain elements might be used in real life and would have a general understanding of the unique nature of the content. Consider the design of performance assessments for MBA students. Designers would limit their work to appropriate MBA or job-related content and would situate the assessments within business cases.

What often causes developers concern is the interaction between the relatively unknown student characteristics, the content of the domain, and the contextual uses of that domain. For example, a business student who has experience with retail management may be familiar with—and perform very well on—a retail business case-study assessment, but perform poorly on a similar assessment that poses an unfamiliar manufacturing business situation. To add one more level of complexity, unique student experiences may affect whether they perform better on a traditional exam, case study, role play, or other mode of assessment.

Clearly, it is difficult to disentangle whether differences in student scores are caused by misunderstanding the content, the nature of the contextual situations, the mode of assessment, or the interaction with the candidates’ experiences/characteristics. Therefore, fairness and measurement validity suggest designers do the following:

1. Begin by stating assumptions about candidate characteristics and experiences.
2. Consider whether the content used and measured in the assessment is within the defined Universe of Generalizability (see the sketch after this list).
3. Design a testing program that collects multiple data points (scores) from as many contexts as is feasible.
4. When possible, offer different modes of assessment over time.
5. Give students the ability to select familiar assessment contexts and/or topics that match their experiences and personal situations. For example, if the performance measure deals with the ability to perform experimental research, students could select familiar areas of research without damaging the intended measure.
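As a minimal illustration of the second guideline, the following sketch checks whether a task’s conditions fall inside an itemized UG; the dictionary contents, task fields, and function name are hypothetical assumptions rather than a prescribed procedure.

```python
# Itemized strata for an already-defined UG (values are illustrative).
UG_STRATA = {
    "transmission": ["automatic", "manual"],
    "weather": ["clear", "rain"],
    "traffic density": ["light", "heavy"],
}


def task_within_ug(task_conditions: dict[str, str],
                   ug_strata: dict[str, list[str]]) -> bool:
    """True when every condition the task uses is an itemized value in the UG."""
    return all(value in ug_strata.get(variable, [])
               for variable, value in task_conditions.items())


print(task_within_ug({"transmission": "automatic", "weather": "rain"}, UG_STRATA))  # True
print(task_within_ug({"weather": "snow"}, UG_STRATA))  # False: the condition lies outside the UG
```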


Summary and Subsequent Steps

Just as objectivist theories have provided foundations for traditional tests, constructivist theories can offer foundations for performance assessment design and development methods. The tenets and principles embedded in various learning theories provide a solid foundation that can be combined with psychometric principles to help assessment developers make consistent, well-founded design and development decisions.

Success in using these foundations depends upon understanding the linkages from theory to practice. The theoretical foundations discussed here in Part 1 of this series provide a set of general principles for defining domains and assessment programs. These foundational principles connect easily and smoothly with various psychometric principles. Part 2 of this series, which will appear in the next issue of TechTrends, will discuss principles for specifying individual performance tasks.

Together, the two parts of this article offer a first attempt at explicitly defining a set of performance assessment design principles. At least two important subsequent steps should be evident at this point for the long term. First, these and other performance assessment design principles need to be researched, tested, revised, and eventually codified into standards of best practice. Second, psychometric theory and practice needs to build means for dealing with large-scale, standardized performance assessment, perhaps following a similar path.

Thomas W. Zane is the Director of Assessment Quality and Validity at Western Governors University. He has been a principal architect of the WGU assessment systems, including traditional examinations, performance and e-portfolio testing, and live performance assessment protocols for the Business, Information Technology, Health Professions, and Teachers colleges. Dr. Zane teaches test development and psychometrics and conducts reliability, alignment, and other validity-related studies of WGU competency assessments. Dr. Zane’s credentials include a Ph.D. in Measurement from Brigham Young University, an MLIS, and an MS in Research and Statistics.

Western Governors University is an online, regionally and professionally accredited institution offering over 50 bachelor’s and master’s degree programs to more than 12,000 adult students worldwide. In WGU competency-based programs, student progress and achievement are measured by mastery of competencies rather than classroom seat time. On average, BS degree-seeking students must pass approximately 65 assessments (e.g., traditional exams, performance assessments, live performances, portfolios, and capstones), beginning with lower-division liberal arts domains and continuing throughout the upper-division (major) domains (see http://www.wgu.edu/about_WGU/who_we_are.asp).

References

American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

Baker, E. L. (1992). The role of domain specifications in improving the technical quality of performance assessments. Project 2.2: Alternative approaches to measuring liberal arts subjects: History, geography, and writing. Center for Research on Evaluation, Standards, and Student Testing, Los Angeles, CA. (ERIC Document Reproduction Service No. ED346133).

Boger-Mehall, S. R. (2007). Cognitive flexibility theory: Implications for teaching and teacher education. University of Houston. Retrieved March 1, 2007, from http://www.kdassem.dk/didaktik/l4-16.htm

Brown, J. S., Collins, A., & Duguid, P. (1989). Situated cognition and the culture of learning. Educational Researcher, 18(1), 32–42.

Clauser, B. E. (2000). Recurrent issues and recent advances in scoring performance assessments. Applied Psychological Measurement, 24(4), 310–324.

Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 443–507). Washington, DC: American Council on Education.

Cronbach, L. J. (1988). Five perspectives on validity argument. In H. Wainer (Ed.), Test validity (pp. 3–17). Hillsdale, NJ: Lawrence Erlbaum Associates.

Cunningham, D. J. (1991). Assessing constructions and constructing assessments: A dialog. Educational Technology, 31(5), 13–17.

Dunbar, S. B., Koretz, D. M., & Hoover, H. D. (1991). Quality control in the development and use of performance assessments. Applied Measurement in Education, 4(4), 289–303.

Endres, B. (1996). Habermas and critical thinking. Retrieved December 29, 2006, from http://www.ed.uiuc.edu/EPS/PES-yearbook/96_docs/endres.html

Ertmer, P. A., & Newby, T. J. (1993). Behaviorism, cognitivism, constructivism: Comparing critical features from an instructional design perspective. Performance Improvement Quarterly, 6(4), 50–72.

Facione, P. A. (1990). Critical thinking: A statement of expert consensus for purposes of educational assessment and instruction. Millbrae, CA: California Academic Press.

Glaser, R., Lesgold, A., & Gott, S. (1991). Implications of cognitive psychology for measuring job performance. In Performance assessment for the workplace: Vol. 2. Technical issues. The National Academy of Sciences. Retrieved January 15, 2007, from http://www.nap.edu/openbook/0309045398/html/1.html

Gorin, J. S. (2006). Test design with cognition in mind. Educational Measurement: Issues and Practice, 25(4), 21–35.

Greeno, J. G. (1989). A perspective on thinking. American Psychologist, 44(2), 134–141.

Jonassen, D. H. (1991a). Evaluating constructivist learning. Educational Technology, 31(9), 28–33.

Jonassen, D. H. (1991b). Objectivism versus constructivism: Do we need a new philosophical paradigm? Educational Technology, 31(9), 28–33.

Jonassen, D. H. (1997). Instructional design models for well-structured and ill-structured problem-solving learning outcomes. Educational Technology: Research and Development, 45(1), 65–94.

Jonassen, D. H., & Rohrer-Murphy, L. (1999). Activity theory as a framework for designing constructivist learning environments. Educational Technology: Research and Development, 47(1), 61–79.

Kane, M. T., Crooks, T., & Cohen, A. (1999). Validating measures of performance. Educational Measurement: Issues and Practice, 18(2), 5–17.

Kolb, D. A. (1985). Experiential learning: Experience as the source of learning and development. Englewood Cliffs, NJ: Prentice Hall.

La Marca, P. M. (2006). Commentary: Student cognition, the situated learning context, and test score interpretation. Educational Measurement: Issues and Practice, 25(4), 65–71.

Linn, R. L., & Baker, E. L. (1996). Can performance-based assessments be psychometrically sound? In J. B. Baron & D. P. Wolf (Eds.), Performance-based student assessment: Challenges and possibilities. Ninety-fifth Yearbook of the National Society for the Study of Education (chap. 4, pp. 84–103). Chicago: University of Chicago Press.


Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15–21.

Madaus, G. F., & O’Dwyer, L. M. (1999). A short history of performance assessment. Phi Delta Kappan, 80(9), 688–695.

McDaniel, E., & Lawrence, C. (1990). Levels of cognitive complexity: An approach to the measurement of thinking. New York: Springer-Verlag.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 3–104). New York: American Council on Education/Macmillan.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13–23.

Mislevy, R. J. (1994). Test theory reconceived. National Center for Research on Evaluation, Standards, and Student Testing (CRESST) CSE Technical Report 376.

Mislevy, R. J., & Haertel, G. D. (2006). Implications of evidence centered design for educational testing. Educational Measurement: Issues and Practice, 25(4), 6–20.

Moss, P. A. (1994). Can there be validity without reliability? Educational Researcher, 23(2), 5–12.

Popham, W. J. (1997). Consequential validity: Right concern—wrong concept. Educational Measurement: Issues and Practice, 16(2), 9–13.

Quellmalz, E. S. (1991). Developing criteria for performance assessments: The missing link. Applied Measurement in Education, 4(4), 319–331.

Ryder, M. (2007). What is activity theory? Denver: University of Colorado. Retrieved March 1, 2007, from http://carbon.cudenver.edu/~mryder/itc_data/act_dff.html

Scheurman, G. (1995, April). Constructivism, personal epistemology, and teacher education: Toward a social-developmental model of adult reasoning. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco, CA.

Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage Publications.

Shepard, L. (1991). Psychometricians’ beliefs about learning. Educational Researcher, 20(7), 2–16.

Siegel, H. (1992). On defining “critical thinker” and justifying critical thinking: An essay in response to McCarthy and Norris. Retrieved September 20, 2005, from http://www.ed.uiuc.edu/EPS/PES-Yearbook/92_docs/Siegel.HTM

Sireci, S. G. (1995, April). The central role of content representation in test validity. Paper presented at the Annual Meeting of the National Council on Measurement in Education (NCME), San Francisco, CA.

Spiro, R. J., Coulson, R. L., Feltovich, P. J., & Anderson, D. (1988). Cognitive flexibility theory: Advanced knowledge acquisition in ill-structured domains. In V. Patel (Ed.), Proceedings of the 10th Annual Conference of the Cognitive Science Society. Hillsdale, NJ: Erlbaum.

Spiro, R. J., Feltovich, P. J., & Coulson, R. L. (2007). Cognitive flexibility theory. Retrieved March 1, 2007 from http://tip.psychology.org/spiro.html

Taylor, E. W. (1997). Implicit memory and transformative learning theory: Unconscious cognition. In Proceedings of the 38th Annual Adult Education Research Conference, Stillwater, OK. Retrieved September 12, 2005, from http://www.edst.educ.ubc.ca/aerc/1997/97taylor1.htm

Terwilliger, J. (1997). Semantics, psychometrics, and assessment reform: A close look at ‘authentic’ assessments. Educational Researcher, 26(8), 24–27.

Tombari, M. L., & Borich, G. (1999). Authentic assessment in the classroom: Applications and practice. Upper Saddle River, NJ: Prentice Hall.

Waterford Wedgwood. (2007). Story of Waterford. Retrieved June 2, 2008, from http://www.waterford.com/about/history.asp

Wiggins, G. (1989). A true test: Toward more authentic and equitable assessment. Phi Delta Kappan, 70(9), 703–713.

Recommended Reading

American Psychological Association. (1997). Learner centered psychological principles: A framework for school reform and redesign. Retrieved August 2, 2007, from www.apa.org/ed/cpse/LCPP.pdf

Baker, E. L., & Herman, J. L. (1983). Task structure design: Beyond linkages. Journal of Educational Measurement, 20(2), 149–164.

Baker, E. L., O’Neil, H. F., Jr., & Linn, R. L. (1994). Policy and validity prospects for performance-based assessment. Journal for the Education of the Gifted, 17(4), 332–353.

Bloom, B. S., Englehart, M. D., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational objectives: Handbook I, cognitive domain. New York: David McKay Company.

Brennan, R. L. (2000). Performance assessments from the perspective of generalizability theory. Applied Psychological Measurement, 24(4), 339–353.

Burbules, N. C., & Berk, R. (1999). Critical thinking and critical pedagogy: Relations, differences, and limits. In T. S. Popkewitz & L. Fendler (Eds.), Critical theories in education. NY: Routledge. Retrieved September 20, 2005, from http://faculty.ed.uiuc.edu/burbules/papers/critical.html

Dewey, J. (1933). How we think: A restatement of the relation of reflective thinking to the educative process. New York: Houghton Mifflin.

Delandshere, G., & Petrosky, A. R. (1998). Assessment of complex performances: Limitations of key measurement assumptions. Educational Researcher, 27(2), 14–24.

Frederiksen, J. R., & Collins, A. (1989). A systems approach to educational testing. Educational Researcher, 18(9), 27–32.

Gronlund, N. E. (2000). Writing instructional objectives for teaching and measurement (7th ed.). Upper Saddle River, NJ: Pearson Prentice Hall.

Haertel, E. H. (1999). Validity arguments for high-stakes testing: In search of the evidence. Educational Measurement: Issues and Practice, 18(4), 5–9.

James, W. (1907). What pragmatism means. Lecture 2 in Pragmatism: A new name for some old ways of thinking. New York: Longman Green.

Kearsley, G. (2005). Explorations in learning & instruction: The theory into practice database (TIP). Retrieved September 8, 2005, from http://tip.psychology.org

Parkes, J. (2001). The role of transfer in the variability of performance assessment scores. Educational Assessment, 7(2), 143–164.

Schommer, M. (1990). Effects of beliefs about the nature of knowledge on comprehension. Journal of Educational Psychology, 82(3), 498–504.

