
Criterion-Referenced Definitions for Rating Scales in Clinical Evaluation

Kathleen Nowak Bondy, RN, PhD

ABSTRACT

The evaluation of clinical competence of nursing students is often considered subjective and inconsistent. To avoid these pitfalls, the author developed criteria for a five-point rating scale for evaluation of student clinical performance. This rating system may be applied to any professional behavior and provides the student with diagnostic feedback as well as a fair assessment of performance.

Nursing, like many other academic disciplines, has a "hands-on" or clinical component which is integral to its educational progress. Historically and unlike most areas of study, however, students of nursing are expected to demonstrate competency in the clinical area as a condition for graduation. Even a student who receives A's and B's on written tests usually does not complete a nursing program if clinical evaluations show evidence of unsatisfactory performance.

The importance of clinical competence and faculty desire to be "objective" when evaluating a student's performance has led to much work by faculty members on clinical evaluation but with little research based upon psychometric principles. While some schools have developed minimal competency or critical element systems or clinical performance examinations in order to promote objectivity and fairness, most schools use some type of rating form. The purpose of this paper is twofold: 1) to explain a set of labels and their definitions which were developed for use with five-point rating scales for clinical evaluation; and 2) to present a framework for their use.

Background

Rating forms when used as clinical evaluation instruments consist of three parts: first, the clinical behaviors to be evaluated; second, a predetermined range of reference points on the rating scale; and third, a label for each reference or scale point. The clinical behaviors vary to reflect differences in the type of educational program, philosophy and mission of the school, the course objectives and content, and the level of the learner. In addition, clinical behaviors are developed to reflect psychomotor learning (as in skill development), affective learning (as in professional socialization or development), and cognitive learning (as in use of a knowledge base). On the rating form behaviors can be organized as a series of discrete statements or, as shown in Figure 1, as subsets headed by a general course objective and explicated by a number of specific behaviors. Achievement in a specified period of time such as a semester or a quarter is inferred by a composite weighted score of the behaviors over multiple observation points. Since much thought, time and effort are appropriately spent in determining and wording behaviors so that they clearly reflect many considerations, it is frustrating to faculty to see their effectiveness reduced by inadequate definitions, descriptions, or criteria for the reference points.

KATHLEEN NOWAK BONDY, RN, PhD, Assistant Professor, School of Nursing, Center for Health Sciences, University of Wisconsin-Madison.

The reference points on a rating scale indicate the degree of competency or accomplishment with which the student has performed the behavior. Unlike checklists which utilize two-point scales (pass/fail, present/absent, satisfactory/unsatisfactory, and credit/no credit), rating scales can use from three to 20 scale points, and thus convey far more information about the quality of the performance and discriminate more accurately between learners. Psychometric studies indicate that the reliability of individual rating scales is a monotonically increasing function of the number of scale points; this reliability rises rapidly as the scale points increase from two to seven, with little additional gain from 11 to 20. While it might seem that the reliability would decrease with additional scale points on a retest, the true score variance increases at an even more rapid rate than the error variance (Nunnally, 1978). Considering this, undefined or vague definitions for the scale points and rater or observer problems are more likely to contribute to low reliabilities than the number of scale points (DeMers, 1978, pp. 89-115).
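The reasoning behind the Nunnally citation can be made explicit in classical test theory terms; the notation below is a standard convention supplied here, not taken from the article. An observed rating X decomposes into a true score and an error term, X = T + E, and the reliability of a scale is the proportion of observed-score variance attributable to true scores:

    \rho_{XX'} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}

Adding scale points may enlarge the error variance \sigma_E^2, but if the true score variance \sigma_T^2 grows faster, as Nunnally reports, the ratio, and with it the reliability, still increases.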

Most rating scales in nursing have used five-point scales. It might be helpful to briefly review some of the scale descriptors that are commonly used.



FIGURE 1

FORMAT FOR A CLINICAL EVALUATION RATING SCALE

                                   Scale Points: I  S  A  M  D  X

1.0 Objective
    1.1 Behavioral Statement
    1.2
    1.n
                                                         SUBSCORE
2.0 Objective
    2.1 Behavioral Statement
    2.2
    2.n
                                                         SUBSCORE
3.0 Objective
    3.1 Behavioral Statement
                                                         SUBSCORE

                                                      TOTAL SCORE

The most common are abstract labels such as A, B, C, D, E or 5, 4, 3, 2, 1. What is the real difference between an A and a B or between a 4 and a 3? Is an A equivalent to a 5 or to a 1? Rarely are definitional statements available to faculty and students to accompany this type of abstract labeling.

In a similar manner, qualitative labels such as excellent, very good, good, fair, and poor usually lack a description of the criteria that make a performance excellent instead of good. Also, it is difficult to discern if a performance is excellent for that student, for that group, for all students, or for the number of students to which the instructor has been exposed. Normative labels such as superior, above average, average, weak, and failing raise similar interpretative problems. Even when a description of satisfactory performance is written into each behavioral statement, it is usually not sufficient guidance to discriminate between levels of performance, and yet to write a full description of each behavior at each level is time-consuming and yields a very lengthy evaluation tool.

The problem is more acute with frequency labels such as always, usually, frequently, sometimes, and never. These pose problems when a student performs a behavior only once or the opportunity to demonstrate a behavior occurred only once early in the clinical experience. On occasion, these terms more adequately describe the clinical site instead of the clinical student.

In summary, while all the label sets fit five-point scales, they are not necessarily applicable to all types of behavior or situations. When weighted to obtain a score (for possible translation into a grade), the scores may not necessarily mean the same thing. For example, if one instructor used frequency labels, another normative labels, and a third qualitative labels, and all were weighted 5 (high) to 1 (low), a score of 3.5 on the behavior, able to make an unoccupied bed, may not indicate similar competency in that behavior even within the same educational setting. Therefore, even very conscientious and experienced faculty with carefully developed behavioral outcomes can experience qualms about the validity and reliability of clinical evaluation when definitions or criteria for the scale points are not explicit and explicable.

Given the situation, one would expect more confusion than has been experienced, but in fact, given a behavioral statement, clinical situations and level of learner, experienced faculty members through discussion can agree on the level of competency.

If areas of agreement evolve through substantial experience in clinical teaching, then identification of those elements and their development into a criterion-referenced set of scale labels may contribute to the validity and reliability of clinical evaluation.

Development of the Criteria

The criterion-referenced scale labels evolved from numerous discussions with clinical faculty, a comparison of the process of student learning with that of rehabilitation clients, and literature on the process of skill acquisition (Harrow, 1972; Simpson, 1966). Two groups of experienced faculty judged preliminary drafts and suggested items which made the criteria useful for affective, cognitive and action-pattern (Simpson, 1966) types of behaviors as well as psychomotor behaviors. Comments from students and faculty when the criteria were used at several curriculum levels and in diverse clinical settings further refined the definitions. Since their conception, over 150 faculty have reviewed and commented favorably on the validity, comprehensiveness and manageability of the criterion-referenced scale definitions. Additionally, literature appearing subsequent to their development validates the included criteria (Crocker, 1976; Smania, McClelland, and McClosky, 1978).

Explanation of the Criteria

The characteristics of competency or criteria for clinical evaluation cluster into three major areas:

1. Professional standards and procedures for the behavior.
2. Qualitative aspects of the performance.
3. Assistance needed to perform the behavior.

Five levels of competency are identified which can be descriptively labeled: Independent, Supervised, Assisted, Marginal, and Dependent, and which are applicable to each of the three major areas. The label names were chosen for their overall descriptive value and differences from other forms of labels. In this context, Independent means meeting the criteria identified in each of the three areas; it does not mean without observation, for the performance must be observed by someone other than the performer or client to be rated independent. A student can demonstrate independent judgment when appropriately requesting assistance or guidance in a situation. In a similar manner, each of the levels is defined by the description of the characteristics in the three areas, as shown in Figure 2.

PROFESSIONAL STANDARDS: The intent of educational programs in nursing is to produce practitioners who have acquired a knowledge base, therapeutic and interpersonal skills, and values and attitudes that characterize the nursing profession, are safe for the public and reflect the philosophy of the school. Standards are set by the profession and transmitted through the professional educational process (Smania et al., 1978).

Professional standards, when interpreted in terms of safety, accuracy, effect, and affect, can be applied to behaviors in three domains: cognitive, affective and psychomotor. A behavior should be safe not only for the client but for the nurse and others in the environment as well. Contaminating an area normally considered clean in strict isolation, without correcting the situation, makes that environment unsafe for others. Accuracy and precision can be evaluated in applying the knowledge base to the clinical situation, in verbal, non-verbal and written communications, in the appropriate use of one's professional vocabulary, and in approaches to various situations as well as in commonly recognized psychomotor skills.

EFFECT: Effect refers to achieving the intended purpose of the behavior. For example, if the desired effect of interviewing a client is to assess the current health status, the evaluation of performance is in part based on whether appropriate information was requested and obtained; similarly, if the purpose of a bath is to cleanse a client's body, is that body now clean?

AFFECT: Affect refers to the manner in which the behavior is performed and the demeanor of the student. This could influence the effect of the intervention on a client. For example, when a nurse enters a room to turn a spinal cord-injured client who has begun to question his life's value, it could make a considerable difference to his self-concept whether the nurse's tone of voice and demeanor convey that this is important because he is valued as a person or that this is just another boring task to be performed.

In describing the professional standards and procedures area at five levels, the performance must be safe and accurate, with the appropriate effect and affect, each time at both the Independent and Supervised levels.


FIGURE 2

CRITERIA FOR CLINICAL EVALUATION

Independent
  Standard Procedure: Safe; accurate; effect and affect achieved each time.
  Quality of Performance: Proficient, coordinated, confident; occasional expenditure of excess energy; within an expedient time period.
  Assistance: Without supporting cues.

Supervised
  Standard Procedure: Safe; accurate; effect and affect achieved each time.
  Quality of Performance: Efficient, coordinated, confident; some expenditure of excess energy; within a reasonable time period.
  Assistance: Occasional supportive cues.

Assisted
  Standard Procedure: Safe and accurate each time; effect and affect achieved most of the time.
  Quality of Performance: Skillful in parts of behavior; inefficiency and uncoordination; expends excess energy; within a delayed time period.
  Assistance: Frequent verbal and occasional physical directive cues in addition to supportive ones.

Marginal
  Standard Procedure: Safe but not alone; performs at risk; not always accurate; effect and affect achieved occasionally.
  Quality of Performance: Unskilled, inefficient; considerable expenditure of excess energy; prolonged time period.
  Assistance: Continuous verbal and frequent physical cues.

Dependent
  Standard Procedure: Unsafe; unable to demonstrate behavior.
  Quality of Performance: Unable to demonstrate procedure/behavior; lacks confidence, coordination, efficiency.
  Assistance: Continuous verbal and physical cues.

X
  Not observed.

At the Assisted level, the effect and affect are appropriate most of the time. At the Marginal level, the student performs with risk to the client, the student or others, or the student may be safe but not when alone; usually the latter is inferred from unsafe or at-risk behavior on previous occasions, from questionable safety as reported by others, or because the student's ability to describe or discuss the intended behavior raises questions of safety. At this level, the student is not always accurate, and occasionally achieves the accepted level of effect or affect. At the Dependent level, the student's performance is unsafe or the student cannot demonstrate the behavior.

QUALITY OF THE PERFORMANCE: The second area incorporates qualitative aspects of the performance. These are based upon degrees of skill development which encompass the use of time, space, equipment, and the utilization or expenditure of energy. While the total range of skill development extends from mastery or expertness through an inability to do something, the range in these criteria begins with proficiency in basic programs. While students could have "mastered" selected skills, basic programs prepare beginning practitioners, not master practitioners.

At the Independent level, proficiency means unusual efficiency and implies exceptional deftness, use of subtle perceptual cues to modify the behavior to achieve the desired effect, and exceptional coordination and integration. The sequence of movements and communication are fluid, even and intertwined. There is an economical use of movements, equipment, and conversation. The behavior is demonstrated within an expedient or minimal time period, as when giving a STAT medication. The student appears confident and relaxed and only occasionally or subtly expends excess energy in performance. This level of performance is seen when the student focuses on the client rather than on the skill that is being performed.

At the Supervised level, the student is efficient and coordinated but expends more of her energy or that of the client in accomplishing the behavior. She appears confident and focuses on the client but can be distracted to the skill as it becomes more complex. The behavior is performed within a reasonable time period; for example, she can prepare and administer a routine IM injection within an acceptable time period from the specified time but becomes flustered under a STAT situation.

At the Assisted level, the student is skillful in parts of the behavior while the rest of the performance is characterized by inefficiency and incoordination, thereby expending excess energy in movements, as in selecting supplies inappropriate in type or number. At times the student appears anxious, worried or flustered, but makes an effort to appear confident. Accomplishment of the behavior takes longer and the end result is sometimes late. The student focuses more attention on the behavior or on herself than on the client.

At the Marginal level, the student is unskilled, inefficient and expends considerable excess energy in performance; little thought appears to have been given to the sequence of activities that is to be performed. Anxiety may be apparent or masked. Completion of the behavior is considerably delayed to the extent that other activities are disrupted or omitted.

At the Dependent level, the student may attempt the procedure but is unsuccessful; unreasonable energy may be expended in attempting the procedure or the student appears unable to move.

ASSISTANCE REQUIRED: The third area that contributes to evaluation is the type and amount of instructor assistance or cues needed to demonstrate the behavior (Smania et al., 1978). Cues can be supportive or directive. Cues such as "that's right," "keep going" and the like are supportive, encouraging or reinforcing but do not change or direct what the student does or says. Directive cues, which can be verbal and/or physical, indicate either what to do or say next or correct an ongoing activity. Reminding a student to check a nameband as she hands the medication to the client is a directive cue; making the same comment before entering the room is superfluous since it interferes with observing what the student knows or will do. Telling a student she did well after the performance is not a cue but feedback information; cues refer to what is necessary to maintain or encourage the student's performance. Frequently, the cues a student needs can indicate that student's ability to follow directions or her anxiety level.

At the Independent level, a behavior is performed primarily without supporting cues or the student did not need the cues that were given. At the Supervised level, occasional supporting cues and infrequently a directive cue are needed. At the Assisted level, the student requires frequent verbal and occasional physical directive cues in addition to supportive cues; at the Marginal level, continuous verbal cues and frequent physical cues are required; at the Dependent level, the verbal and physical cues are so directive and continuous that essentially it is the instructor who actually performed the behavior.

These three basic areas of criteria are usually observed simultaneously and are taken together to determine the level of a student's performance on a given behavior. The behavior is evaluated at the lowest level of achievement in any of the three areas. For example, if the behavior is the ability to give a complete bed bath and the student performed it correctly (independent), efficiently within a reasonable time frame (supervised), but required frequent verbal in addition to some physical directive cues (assisted), then that performance is evaluated at the assisted level. The issue of whether the performance by standards is independent or supervised is moot since the lowest level determines the overall evaluation of the behavior. This, then, reflects the minimal level of performance which can be expected at another time or in a similar situation and most closely approximates reality. A focus on the best parts of the performance occludes those areas which would benefit from teaching and which may interfere with an expected performance level at another time or in a similar situation. The criteria on which the student demonstrated greater competency are more reflective of potential performance than actual performance when reported. In an effort to give students positive support and reduce anxiety, there is a tendency to overrate the strong aspects of a performance and underrate the weak aspects of the same performance.
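The "lowest level governs" rule just described is mechanical enough to express precisely. The sketch below is illustrative only, assuming the 5 (high) to 1 (low) weighting discussed later in the paper; the names Level and overall_level are supplied here, not drawn from the article.

    from enum import IntEnum

    class Level(IntEnum):
        # Numeric values follow the article's 5 (high) to 1 (low) weighting.
        DEPENDENT = 1
        MARGINAL = 2
        ASSISTED = 3
        SUPERVISED = 4
        INDEPENDENT = 5

    def overall_level(standards: Level, quality: Level, assistance: Level) -> Level:
        # A behavior is rated at the lowest level achieved in any of the
        # three criterion areas (standards, quality, assistance needed).
        return min(standards, quality, assistance)

    # The bed-bath example from the text: independent by standards, supervised
    # in quality of performance, assisted in cues required -> assisted overall.
    assert overall_level(Level.INDEPENDENT, Level.SUPERVISED,
                         Level.ASSISTED) == Level.ASSISTED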

If a part of a behavior is omitted, then the instructor must judge whether the omission relates to the acceptability or to the quality of the task. For example, failure to check a nameband when giving a medication pertains to the acceptability of the performance, but failure to ask if the client is comfortable or needs anything before leaving the room pertains to the quality of a performance. An X or NO (not observed) column is employed to protect the student from lack of opportunity to demonstrate a behavior and/or the necessity of random sampling of behavior. Use of the criteria assumes that the behaviors used are appropriate for the level of the learner and generally available in the clinical area. Therefore, a category of "not appropriate" is not necessary in addition to a "not observed" column.

Up to this point the discussion of the criteria has focused on observing a performance of a single behavior. In the clinical experience, many observations are made and these need to be summarized in some manner. Over the clinical experience, students tend to demonstrate patterns of development (Figure 3). The expected pattern of student performance is that when presented with new or more complex behaviors, students' level of competency is lower and rises as they learn to demonstrate the expected behavior.


FIGURE 3

PATTERNS OF COMPETENCY DEVELOPMENT OVER SEVERAL OBSERVATIONS

[Tabular figure: for each of five behaviors (I. Skills; II. Nursing Process; III. Prof. Develop.; IV. Communication Skills; V. Leadership), the competency level recorded (I, S, A, M, or D) is charted across roughly ten successive observations, illustrating steady development (Pattern I), predictable patterns (II and IV), a disintegrating performance (III), and an erratic series (V).]

There is an unstated assumption by most instructors that a student's performance will be generally consistent with the expected Pattern I or similar to Patterns II and IV, which show some degree of predictability. Consistency and predictability of performance are aspects of competence among professionals; hence, an erratic series of performances as in Pattern V is professionally considered marginal or even dependent. A disintegrating performance as in Pattern III may be the result of numerous causes. However, even if appropriate counseling and problem solving have been done, a sterling performance on the last clinical day more accurately reflects potential rather than reality; the consistency of performance is a more critical criterion for evaluation.

The use of pattern of performance in deriving a summative evaluation implies that I, S, A, M and D cannot be numerically weighted, summed over the semester and then averaged for a score. For example, if the weighting were 5, 4, 3, 2 and 1, respectively, the student in Pattern I with steady development would achieve a score no higher than 2.0-3.0, which does not reflect the correct competency level at the end of the course for that behavior.
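A small arithmetic illustration of why averaging misleads, using a made-up observation sequence, since the article does not give Pattern I's exact cell values:

    # Hypothetical steady-development sequence for one behavior, weighted
    # I=5, S=4, A=3, M=2, D=1; None marks a not-observed (X) entry and is
    # excluded from the computation.
    observations = ["D", "M", "A", None, "A", "S", "S", "I"]
    weights = {"I": 5, "S": 4, "A": 3, "M": 2, "D": 1}

    scores = [weights[obs] for obs in observations if obs is not None]
    average = sum(scores) / len(scores)  # (1+2+3+3+4+4+5) / 7 = 22/7, about 3.14

    # The semester average (about 3.1, "assisted") understates the level the
    # student actually reached by the final observation ("independent").
    print(f"average = {average:.2f}, final level = {observations[-1]}")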

Grading

The summative levels of competency over behaviors (or objectives) can be weighted, summed and averaged when an overall value is necessary for grading purposes.

Assuming a weighting of 5, 4, 3, 2, 1 for I, S, A, M, D, respectively, acceptable minimal clinical scores are set by a faculty with reference to the definitions these numbers represent. One faculty group may set 3.0 as the minimal overall score and another group may use 2.75 or 3.25. The clinical score can be translated to other equivalent forms such as letters, percentages or points as required. The equivalencies are also determined by the faculty using the criteria and, perhaps in part, by policies of the school. For example, a clinical score of 3.0 could be a C, C- or even B-; it could be 70%, 75% or 80%. The point is that these are faculty decisions which may vary from one school/course to another. The choice of letters, percentages or other forms may depend on whether the score stands alone or is incorporated with other material for a course grade. In other words, a "grade" is extrinsic to the criteria; the criteria make a grade more interpretable. The role of the clinical instructor primarily is to compare a student's performance to the criteria; the assignment of "grades" is relative to the criteria, not to the individual student.
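Read concretely, the translation step might look like the following sketch; the cutoffs and letter equivalencies are invented for illustration, since the article leaves them to each faculty:

    def letter_grade(clinical_score: float) -> str:
        # Hypothetical faculty-set equivalencies; the article stresses that
        # such cutoffs are local policy decisions, not part of the criteria.
        cutoffs = [(4.5, "A"), (4.0, "B+"), (3.5, "B"), (3.0, "C"), (2.75, "C-")]
        for minimum, grade in cutoffs:
            if clinical_score >= minimum:
                return grade
        return "F"  # below this faculty's acceptable minimal clinical score

    print(letter_grade(3.1))  # -> "C" under these invented cutoffs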

Implications

Use of the criteria has several benefits both for individual faculty and students as well as collectively within a program. Individual faculty members who have used the criterion-referenced scale definitions comment that the definitions enable them to describe and classify more accurately the strengths and limitations of a student's performance. Consequently, students are given more diagnostic feedback and perceive the comments as constructive and positive rather than as critical or negative in nature. Students have stated that the criteria "make sense" to them and that evaluation of performance is more "above board" and reduces perceived subjectivity.

When the criteria are the common base for discussion between student and instructor, the student learns to self-evaluate and to validate self-perceptions of performance.

JOURNAL OF NURSING EDUCATION 381

Page 7: Criterion-Referenced Definitions for Rating Scales in ...€¦ · Criterion-Referenced Definitions for Rating Scales in Clinical Evaluation Kathleen Nowak Bondy, RN, PhD ... rating

CRITERIA INCLINICAL EVALUATION

Several faculty have commented that the criteria can be actively used to teach students to self-evaluate with validation from the instructor. An occasional student has observed that the same words can be used to describe a client's progress in learning self-care skills.

In discussing student performance in general or in particular with colleagues, faculty have stated that they begin to identify various strategies that are more likely to succeed in assisting a student to improve performance. A student who performs the behavior with excessive energy and time requires a different approach to improve than either one who does not know how to perform the behavior or one who is inordinately relying on supportive cues. Using the criteria as a common framework within which to discuss student performance, faculty members are better able to identify the norm behaviors and competency levels achieved by students at the individual curricular levels. Thus, articulation between clinical courses and leveling of behaviors is enhanced.

The use of the criteria schema in all clinical courses decreases students' need to "psych out" the implicit criteria used by individual instructors, freeing that energy to be used instead to understand the expected behaviors and/or raise their competency levels; in other words, to learn. In like manner, a clinical evaluation in one clinical course can have some relationship to a clinical evaluation in another course.

In addition, the criteria can be used to give new faculty or clinical instructors a standardized framework within which to observe student clinical behavior.

Conclusion

When carefully developed, rating scales in clinical evaluation of student performance can be psychometrically effective and efficient. This includes equal attention to both the clinical behaviors and criteria for competency. Undefined or vague labels for the scale points can contribute to lower reliability. These criteria and their scale point descriptors have, in pilot testing and use over seven years, demonstrated their content validity and utility. Research is in progress to test the extent to which they affect the reliability of clinical evaluations of student performance.

References

Crocker, L.M. (1976). Validity of certification measures for occupational therapists. American Journal of Occupational Therapy, 30(4), 229-233.

DeMers, J.L. (1978). Observational assessment of performance. In M.K. Morgan & D.M. Irby (Eds.), Evaluating clinical competence in the health professions. St. Louis: The C.V. Mosby Co.

Harrow, A.J. (1972). A taxonomy of the psychomotor domain. New York: David McKay Company.

Nunnally, J.C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.

Simpson, E.J. (1966). The classification of educational objectives: Psychomotor domain (Research Project No. DE5-84-104). Urbana, IL: University of Illinois.

Smania, M.A., McClelland, M.G., & McClosky, J.C. (1978). And still another look at clinical grading: Minimal behaviors. Nurse Educator, 3(3), 6-9.
