Transcript of “Sterling Examples of Computer Simulations & OSCEs (Objective Structured Clinical Examinations)”

Page 1: “Sterling Examples of Computer Simulations & OSCEs (Objective Structured Clinical Examinations)”

Carol O’Byrne, Jeffrey Kelley, Richard Hawkins, Sydney Smee

Presented at the 2005 CLEAR Annual Conference, September 15-17, Phoenix, Arizona

Page 2: Session Format

Introduction: 25+ years of Performance Assessment

Presentations:

• Richard Hawkins, National Board of Medical Examiners: overview of a new national OSCE

• Jeff Kelley, Applied Measurement Professionals: development of a new real estate computer simulation

• Sydney Smee, Medical Council of Canada: setting performance standards for a national OSCE

• Carol O’Byrne, Pharmacy Examining Board of Canada: scoring performance and reporting results to candidates for a national OSCE

Q&A

Page 3: Session Goals

Consider the role and importance of simulations in a professional qualifying examination context

Explore development and large scale implementation challenges

Observe how practice analysis results are integrated with the implementation of a simulation examination

Consider options for scoring, standard setting and reporting to candidates

Consider means to enhance fairness and consistency

Identify issues for further research and development

Page 4: Defining ‘Performance Assessment’

...the assessment of the integration of two or more learned capabilities

…i.e., observing how a candidate performs a physical examination (technical skill) is not performance-based assessment unless findings from the examination are used for purposes such as generating a problem list or deciding on a management strategy (cognitive skills) (Mavis et al., 1996)

Page 5: Why Test Performance?

To determine if individuals can ‘do the job’
• integrating knowledge, skills and abilities to solve complex client and practice problems
• meeting job-related performance standards

To complement MC tests
• measuring important skills, abilities and attitudes that are difficult or impossible to measure through MCQs alone
• reducing the impact of factors, such as cuing, logical elimination and luck or chance, that may confound MC test results

Page 6: A 25+ Year Spectrum of Performance Assessment

• ‘Pot luck’ direct observation: apprenticeship, internship, residency programs

• Oral and pencil-paper, short- or long-answer questions

• Hands-on job samples: military, veterinary medicine, mechanics, plumbers

• Portfolios: advanced practice, continuing competency

Page 7: Simulations

• Electronic: architecture, aviation, respiratory care, real estate, nursing, medicine, etc.

• Objective Structured Clinical Examination (OSCE): medicine, pharmacy, physiotherapy, chiropractic medicine, massage therapy, as well as the legal profession, psychology, and others

Page 8: Simulation Promotes Evidence-based Testing…

1900 Wright brothers flight test: flew a manned kite 200 feet in 20 seconds

1903 Wright brothers flight test: flew the powered Flyer 852 feet in 59 seconds, 8 to 12 feet in the air!

In between they built a wind tunnel
• to simulate flight under various wind direction and speed conditions, varying wing shapes, curvatures and aspect ratios
• to test critical calculations and glider lift
• to assess performance in important and potentially risky situations without incurring actual risk

 

Page 9: Attitudes, Skills and Abilities Tested through Simulations

Attitudes:
• client centeredness
• alignment with ethical and professional values and principles

Skills:
• interpersonal and communications
• clinical, e.g. patient/client care
• technical

Abilities to:
• analyze and manage risk, exercise sound judgment
• gather, synthesize and critically evaluate information
• act systematically and adaptively, independently and within teams
• defend, evaluate and/or modify decisions/actions taken
• monitor outcomes and follow up appropriately

Page 10: Performance / Simulation Assessment Design Elements

• Domain(s) of interest and sampling plan
• Realistic context: practice-related problems and scenarios
• Clear, measurable performance standards
• Stimuli and materials to elicit performance
• Administrative, observation and data collection procedures
• Assessment criteria that reflect standards
• Scoring rules that incorporate assessment criteria
• Cut scores/performance profiles reflecting standards
• Quality assurance processes
• Meaningful data summaries for reports to candidates and others
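As a loose illustration only (hypothetical names, scenario and weights, not an actual exam blueprint), the design elements above might be captured per simulation station along these lines, with a simple weighted scoring rule linking assessment criteria back to the performance standards:

```python
from dataclasses import dataclass, field

@dataclass
class Criterion:
    description: str   # observable behaviour that reflects a performance standard
    weight: float      # contribution to the station score

@dataclass
class Station:
    domain: str                  # domain of interest from the sampling plan
    scenario: str                # realistic, practice-related problem or stimulus
    criteria: list = field(default_factory=list)

    def score(self, observed):
        """Weighted proportion of criteria met (one simple scoring rule)."""
        total = sum(c.weight for c in self.criteria)
        earned = sum(c.weight for c, met in zip(self.criteria, observed) if met)
        return earned / total if total else 0.0

station = Station(
    domain="Patient counselling",
    scenario="Counsel a standardized patient who is starting a new anticoagulant.",
    criteria=[Criterion("Verifies patient identity and allergies", 1.0),
              Criterion("Explains dosing and monitoring", 2.0),
              Criterion("Checks understanding and invites questions", 1.0)],
)
print(station.score([True, True, False]))   # 0.75
```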

Page 11: Score Variability and Reliability

Multiple factors interact and influence scores
• differential and compensatory aptitudes of candidates (knowledge, skills, abilities, attitudes)
• format, difficulty and number of tasks or problems
• consistency of presentation between candidates, locations, occasions
• complex scoring schemes (checklists, ratings, weights)
• rater consistency between candidates, locations, occasions

Designs are often complex (not crossed)
• examinees ‘nested’ within raters, within tasks, within sites, etc.

Problems and tasks are multidimensional

Page 12: Analyzing Performance Assessment Data

• Generalizability (G) studies – to identify and quantify sources of variation

• Dependability (D) studies – to determine how to minimize the impact of error and optimize score reliability

• Hierarchical linear modeling (HLM) studies – to quantify and rank sources of variation in complex nested designs
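To make the G-study machinery concrete, here is a minimal sketch (illustrative code, not taken from any of the programs presented) that estimates variance components for a fully crossed persons x tasks x raters design and reports the resulting generalizability coefficient; the function name and the simulated data are assumptions for the example.

```python
import numpy as np

def g_study(scores):
    """Estimate variance components for a fully crossed persons x tasks x raters
    design; scores has shape (n_persons, n_tasks, n_raters), one score per cell."""
    n_p, n_t, n_r = scores.shape
    grand = scores.mean()

    # Marginal and two-way means
    m_p = scores.mean(axis=(1, 2))
    m_t = scores.mean(axis=(0, 2))
    m_r = scores.mean(axis=(0, 1))
    m_pt = scores.mean(axis=2)
    m_pr = scores.mean(axis=1)
    m_tr = scores.mean(axis=0)

    # Mean squares from the usual ANOVA sums of squares
    ms_p = n_t * n_r * np.sum((m_p - grand) ** 2) / (n_p - 1)
    ms_pt = n_r * np.sum((m_pt - m_p[:, None] - m_t[None, :] + grand) ** 2)
    ms_pt /= (n_p - 1) * (n_t - 1)
    ms_pr = n_t * np.sum((m_pr - m_p[:, None] - m_r[None, :] + grand) ** 2)
    ms_pr /= (n_p - 1) * (n_r - 1)
    resid = (scores
             - m_pt[:, :, None] - m_pr[:, None, :] - m_tr[None, :, :]
             + m_p[:, None, None] + m_t[None, :, None] + m_r[None, None, :]
             - grand)
    ms_e = np.sum(resid ** 2) / ((n_p - 1) * (n_t - 1) * (n_r - 1))

    # Variance components from expected mean squares (negatives truncated at 0)
    var_e = ms_e
    var_pt = max((ms_pt - ms_e) / n_r, 0.0)
    var_pr = max((ms_pr - ms_e) / n_t, 0.0)
    var_p = max((ms_p - ms_pt - ms_pr + ms_e) / (n_t * n_r), 0.0)

    # Relative error and generalizability coefficient for this design
    rel_error = var_pt / n_t + var_pr / n_r + var_e / (n_t * n_r)
    return {"var_p": var_p, "var_pt": var_pt, "var_pr": var_pr,
            "var_ptr_e": var_e, "E_rho2": var_p / (var_p + rel_error)}

# Simulated example: 50 candidates, 12 tasks, 2 raters, with true person effects
rng = np.random.default_rng(0)
person_effect = rng.normal(0.0, 1.0, size=(50, 1, 1))
noise = rng.normal(0.0, 0.8, size=(50, 12, 2))
print(g_study(person_effect + noise))
```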

Page 13: Standard Setting

• What score or combination of scores (profile) indicates that the candidate is able to meet expected standards of performance, thereby fulfilling the purpose(s) of the test?

• What methods can be used to determine this standard?
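As one example of such a method, the sketch below applies borderline regression, a widely used standard-setting approach for OSCE stations; the checklist scores, global ratings and borderline value are invented for illustration and are not taken from any of these examinations.

```python
import numpy as np

def borderline_regression_cut(checklist_scores, global_ratings, borderline=3.0):
    """Regress checklist scores on examiners' global ratings; the station cut
    score is the predicted checklist score at the 'borderline' rating."""
    x = np.asarray(global_ratings, dtype=float)
    y = np.asarray(checklist_scores, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # simple least-squares line
    return intercept + slope * borderline

# Hypothetical station data: checklist out of 20, global rating on a 1-5 scale
checklist = [14, 17, 11, 19, 15, 9, 16, 12]
ratings = [3, 4, 2, 5, 4, 1, 4, 2]
print(f"Station cut score ≈ {borderline_regression_cut(checklist, ratings):.1f} / 20")
```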

Page 14: Reporting Results to Candidates

Pass-fail (classification)

May also include:

• Individual test score and passing score

• Sub-scores by objective(s) and/or other criteria

• Quantile standing among all candidates – or among those who failed

• Group data (score ranges, means, standard deviations)

• Reliability and validity evidence (narrative, indices and/or error estimates and their interpretation)

• Other
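A minimal sketch of how several of these report elements might be assembled for one candidate follows; all scores, field names and the cut score are hypothetical.

```python
import statistics

def candidate_report(candidate_score, all_scores, cut_score):
    """Assemble a simple results dictionary for one candidate."""
    below = sum(1 for s in all_scores if s < candidate_score)
    return {
        "result": "PASS" if candidate_score >= cut_score else "FAIL",
        "score": candidate_score,
        "passing_score": cut_score,
        "percentile": round(100 * below / len(all_scores)),   # quantile standing
        "group_mean": round(statistics.mean(all_scores), 1),
        "group_sd": round(statistics.stdev(all_scores), 1),
    }

cohort = [62.0, 71.5, 58.0, 80.0, 66.5, 74.0, 69.0, 55.5]   # hypothetical cohort scores
print(candidate_report(candidate_score=69.0, all_scores=cohort, cut_score=65.0))
```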

Page 15: Some Validity Questions

• Exactly what are we measuring with each simulation? Does it support the test purpose?

• To what extent is each candidate presented with the same or equivalent challenges?

• How consistently are candidates’ performances assessed, no matter who or where the assessor is?

• Are the outcomes similar to findings in other comparable evaluations?

• How ought we to inform and report to candidates about performance standards/expectations and their own performance strengths/gaps?

Page 16: Evaluation Goals

Validity evidence
• Strong links from job analysis to interpretation of test results
• Simulation performance relates to performance in training and other tests of similar capabilities
• Reliable, generalizable scores and ratings
• Dependable pass-fail (classification) standards

Feasibility and sustainability
• For program scale (number of candidates, sites, etc.)
• Economic, human, physical, technological resources

Continuous evaluation and enhancement plan

Page 17: Wisdom Bytes

• Simulations should be as true to life as possible (fidelity)

• Simulations should test capabilities that cannot be tested in more efficient formats

• Simulation tests should focus on integration of multiple capabilities rather than on a single basic capability

• The nature of each simulation/task should be clear, but candidates should be ‘cued’ only as far as is realistic in practice

• Increasing the number of tasks contributes more to the generalizability and dependability of results than increasing the number of raters
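The last point can be illustrated with a small decision (D) study projection. The variance components below are purely illustrative, not estimates from these examinations; because the person-by-task component typically dominates the person-by-rater component, doubling the number of tasks improves the generalizability coefficient more than doubling, or even quadrupling, the number of raters.

```python
def e_rho2(var_p, var_pt, var_pr, var_ptr_e, n_t, n_r):
    """Generalizability coefficient for n_t tasks and n_r raters per candidate."""
    rel_error = var_pt / n_t + var_pr / n_r + var_ptr_e / (n_t * n_r)
    return var_p / (var_p + rel_error)

# Illustrative variance components (person, person x task, person x rater, residual)
var_p, var_pt, var_pr, var_ptr_e = 0.30, 0.70, 0.02, 0.40

for n_t, n_r in [(10, 1), (20, 1), (10, 2), (10, 4)]:
    coeff = e_rho2(var_p, var_pt, var_pr, var_ptr_e, n_t, n_r)
    print(f"tasks={n_t:2d}  raters={n_r}  E(rho^2)={coeff:.2f}")
# Prints roughly 0.70, 0.80, 0.75 and 0.78: doubling tasks helps more than
# doubling (or even quadrupling) raters.
```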

Page 18: Expect the Unpredictable…

Candidate diversity
• Language
• Training
• Test format familiarity
• Accommodation requests

Logistical challenges
• Technological glitches
• Personnel fatigue and/or attention gaps
• Site variations

Security cracks
• Test content exposure in prep programs and study materials, in various languages