Transcript of “Sterling Examples of Computer Simulations & OSCEs (Objective Structured Clinical Examinations)”

Page 1: “Sterling Examples of Computer Simulations & OSCEs (Objective Structured Clinical Examinations)”

Carol O’Byrne, Jeffrey Kelley, Richard Hawkins, Sydney Smee

Presented at the 2005 CLEAR Annual Conference, September 15-17, Phoenix, Arizona

Page 2: Session Format

Introduction: 25+ years of Performance Assessment

Presentations:

• Richard Hawkins, National Board of Medical Examiners: overview of a new national OSCE

• Jeff Kelley, Applied Measurement Professionals: development of a new real estate computer simulation

• Sydney Smee, Medical Council of Canada: setting performance standards for a national OSCE

• Carol O’Byrne, Pharmacy Examining Board of Canada: scoring performance and reporting results to candidates for a national OSCE

Q&A

Page 3: Session Goals

Consider the role and importance of simulations in a professional qualifying examination context

Explore development and large scale implementation challenges

Observe how practice analysis results are integrated with the implementation of a simulation examination

Consider options for scoring, standard setting and reporting to candidates

Consider means to enhance fairness and consistency

Identify issues for further research and development

Page 4: Defining ‘Performance Assessment’

...the assessment of the integration of two or more learned capabilities

…i.e., observing how a candidate performs a physical examination (technical skill) is not performance-based assessment unless findings from the examination are used for purposes such as generating a problem list or deciding on a management strategy (cognitive skills) (Mavis et al., 1996)

Page 5: Why Test Performance?

To determine if individuals can ‘do the job’
• integrating knowledge, skills and abilities to solve complex client and practice problems
• meeting job-related performance standards

To complement MC tests
• measuring important skills, abilities and attitudes that are difficult or impossible to measure through MCQs alone
• reducing the impact of factors, such as cuing, logical elimination and luck or chance, that may confound MC test results

Page 6: A 25+ Year Spectrum of Performance Assessment

• ‘Pot luck’ direct observation: apprenticeship, internship, residency programs

• Oral and pencil-paper, short- or long-answer questions

• Hands-on job samples: military, veterinary medicine, mechanics, plumbers

• Portfolios: advanced practice, continuing competency

Page 7: Simulations

• Electronic: architecture, aviation, respiratory care, real estate, nursing, medicine, etc.

• Objective Structured Clinical Examination (OSCE): medicine, pharmacy, physiotherapy, chiropractic medicine, massage therapy, as well as the legal profession, psychology, and others

Page 8: Simulation Promotes Evidence-based Testing…

1900 Wright brothers flight test: flew a manned kite 200 feet in 20 seconds

1903 Wright brothers flight test: flew the powered Flyer 852 feet in 59 seconds, 8 to 12 feet in the air!

In between they built a wind tunnel
• to simulate flight under various wind direction and speed conditions, varying wing shapes, curvatures and aspect ratios
• to test critical calculations and glider lift
• to assess performance in important and potentially risky situations without incurring actual risk

 

Page 9: Attitudes, Skills and Abilities Tested through Simulations

Attitudes:
• client centeredness
• alignment with ethical and professional values and principles

Skills:
• interpersonal and communications
• clinical, e.g. patient/client care
• technical

Abilities to:
• analyze and manage risk, exercise sound judgment
• gather, synthesize and critically evaluate information
• act systematically and adaptively, independently and within teams
• defend, evaluate and/or modify decisions/actions taken
• monitor outcomes and follow up appropriately

Page 10: Performance / Simulation Assessment Design Elements

• Domain(s) of interest and sampling plan
• Realistic context: practice-related problems and scenarios
• Clear, measurable performance standards
• Stimuli and materials to elicit performance
• Administrative, observation and data collection procedures
• Assessment criteria that reflect standards
• Scoring rules that incorporate assessment criteria
• Cut scores/performance profiles reflecting standards
• Quality assurance processes
• Meaningful data summaries for reports to candidates and others
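As a loose illustration only (hypothetical names, scenario and weights, not an actual exam blueprint), the design elements above might be captured per simulation station along these lines, with a simple weighted scoring rule linking assessment criteria back to the performance standards:

```python
from dataclasses import dataclass, field

@dataclass
class Criterion:
    description: str   # observable behaviour that reflects a performance standard
    weight: float      # contribution to the station score

@dataclass
class Station:
    domain: str                  # domain of interest from the sampling plan
    scenario: str                # realistic, practice-related problem or stimulus
    criteria: list = field(default_factory=list)

    def score(self, observed):
        """Weighted proportion of criteria met (one simple scoring rule)."""
        total = sum(c.weight for c in self.criteria)
        earned = sum(c.weight for c, met in zip(self.criteria, observed) if met)
        return earned / total if total else 0.0

station = Station(
    domain="Patient counselling",
    scenario="Counsel a standardized patient who is starting a new anticoagulant.",
    criteria=[Criterion("Verifies patient identity and allergies", 1.0),
              Criterion("Explains dosing and monitoring", 2.0),
              Criterion("Checks understanding and invites questions", 1.0)],
)
print(station.score([True, True, False]))   # 0.75
```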

Page 11: Score Variability and Reliability

Multiple factors interact and influence scores
• differential and compensatory aptitudes of candidates (knowledge, skills, abilities, attitudes)
• format, difficulty and number of tasks or problems
• consistency of presentation between candidates, locations, occasions
• complex scoring schemes (checklists, ratings, weights)
• rater consistency between candidates, locations, occasions

Designs are often complex (not crossed)
• examinees ‘nested’ within raters, within tasks, within sites, etc.

Problems and tasks are multidimensional

Page 12: Analyzing Performance Assessment Data

• Generalizability (G) studies – to identify and quantify sources of variation

• Dependability (D) studies – to determine how to minimize the impact of error and optimize score reliability

• Hierarchical linear modeling (HLM) studies – to quantify and rank sources of variation in complex nested designs
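To make the G-study machinery concrete, here is a minimal sketch (illustrative code, not taken from any of the programs presented) that estimates variance components for a fully crossed persons x tasks x raters design and reports the resulting generalizability coefficient; the function name and the simulated data are assumptions for the example.

```python
import numpy as np

def g_study(scores):
    """Estimate variance components for a fully crossed persons x tasks x raters
    design; scores has shape (n_persons, n_tasks, n_raters), one score per cell."""
    n_p, n_t, n_r = scores.shape
    grand = scores.mean()

    # Marginal and two-way means
    m_p = scores.mean(axis=(1, 2))
    m_t = scores.mean(axis=(0, 2))
    m_r = scores.mean(axis=(0, 1))
    m_pt = scores.mean(axis=2)
    m_pr = scores.mean(axis=1)
    m_tr = scores.mean(axis=0)

    # Mean squares from the usual ANOVA sums of squares
    ms_p = n_t * n_r * np.sum((m_p - grand) ** 2) / (n_p - 1)
    ms_pt = n_r * np.sum((m_pt - m_p[:, None] - m_t[None, :] + grand) ** 2)
    ms_pt /= (n_p - 1) * (n_t - 1)
    ms_pr = n_t * np.sum((m_pr - m_p[:, None] - m_r[None, :] + grand) ** 2)
    ms_pr /= (n_p - 1) * (n_r - 1)
    resid = (scores
             - m_pt[:, :, None] - m_pr[:, None, :] - m_tr[None, :, :]
             + m_p[:, None, None] + m_t[None, :, None] + m_r[None, None, :]
             - grand)
    ms_e = np.sum(resid ** 2) / ((n_p - 1) * (n_t - 1) * (n_r - 1))

    # Variance components from expected mean squares (negatives truncated at 0)
    var_e = ms_e
    var_pt = max((ms_pt - ms_e) / n_r, 0.0)
    var_pr = max((ms_pr - ms_e) / n_t, 0.0)
    var_p = max((ms_p - ms_pt - ms_pr + ms_e) / (n_t * n_r), 0.0)

    # Relative error and generalizability coefficient for this design
    rel_error = var_pt / n_t + var_pr / n_r + var_e / (n_t * n_r)
    return {"var_p": var_p, "var_pt": var_pt, "var_pr": var_pr,
            "var_ptr_e": var_e, "E_rho2": var_p / (var_p + rel_error)}

# Simulated example: 50 candidates, 12 tasks, 2 raters, with true person effects
rng = np.random.default_rng(0)
person_effect = rng.normal(0.0, 1.0, size=(50, 1, 1))
noise = rng.normal(0.0, 0.8, size=(50, 12, 2))
print(g_study(person_effect + noise))
```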

Page 13: Standard Setting

• What score or combination of scores (profile) indicates that the candidate is able to meet expected standards of performance, thereby fulfilling the purpose(s) of the test?

• What methods can be used to determine this standard?
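As one example of such a method, the sketch below applies borderline regression, a widely used standard-setting approach for OSCE stations; the checklist scores, global ratings and borderline value are invented for illustration and are not taken from any of these examinations.

```python
import numpy as np

def borderline_regression_cut(checklist_scores, global_ratings, borderline=3.0):
    """Regress checklist scores on examiners' global ratings; the station cut
    score is the predicted checklist score at the 'borderline' rating."""
    x = np.asarray(global_ratings, dtype=float)
    y = np.asarray(checklist_scores, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # simple least-squares line
    return intercept + slope * borderline

# Hypothetical station data: checklist out of 20, global rating on a 1-5 scale
checklist = [14, 17, 11, 19, 15, 9, 16, 12]
ratings = [3, 4, 2, 5, 4, 1, 4, 2]
print(f"Station cut score ≈ {borderline_regression_cut(checklist, ratings):.1f} / 20")
```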

Page 14: Reporting Results to Candidates

Pass-fail (classification)

May also include:

• Individual test score and passing score

• Sub-scores by objective(s) and/or other criteria

• Quantile standing among all candidates – or among those who failed

• Group data (score ranges, means, standard deviations)

• Reliability and validity evidence (narrative, indices and/or error estimates and their interpretation)

• Other
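A minimal sketch of how several of these report elements might be assembled for one candidate follows; all scores, field names and the cut score are hypothetical.

```python
import statistics

def candidate_report(candidate_score, all_scores, cut_score):
    """Assemble a simple results dictionary for one candidate."""
    below = sum(1 for s in all_scores if s < candidate_score)
    return {
        "result": "PASS" if candidate_score >= cut_score else "FAIL",
        "score": candidate_score,
        "passing_score": cut_score,
        "percentile": round(100 * below / len(all_scores)),   # quantile standing
        "group_mean": round(statistics.mean(all_scores), 1),
        "group_sd": round(statistics.stdev(all_scores), 1),
    }

cohort = [62.0, 71.5, 58.0, 80.0, 66.5, 74.0, 69.0, 55.5]   # hypothetical cohort scores
print(candidate_report(candidate_score=69.0, all_scores=cohort, cut_score=65.0))
```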

Page 15: Some Validity Questions

• Exactly what are we measuring with each simulation? Does it support the test purpose?

• To what extent is each candidate presented with the same or equivalent challenges?

• How consistently are candidates’ performances assessed, no matter who or where the assessor is?

• Are the outcomes similar to findings in other comparable evaluations?

• How ought we to inform and report to candidates about performance standards/expectations and their own performance strengths/gaps?

Page 16: Evaluation Goals

Validity evidence
• Strong links from job analysis to interpretation of test results
• Simulation performance relates to performance in training and other tests of similar capabilities
• Reliable, generalizable scores and ratings
• Dependable pass-fail (classification) standards

Feasibility and sustainability
• For program scale (number of candidates, sites, etc.)
• Economic, human, physical, technological resources

Continuous evaluation and enhancement plan

Page 17: Wisdom Bytes

• Simulations should be as true to life as possible (fidelity)

• Simulations should test capabilities that cannot be tested in more efficient formats

• Simulation tests should focus on integration of multiple capabilities rather than on a single basic capability

• The nature of each simulation/task should be clear, but candidates should be ‘cued’ only as far as is realistic in practice

• Increasing the number of tasks contributes more to the generalizability and dependability of results than increasing the number of raters
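The last point can be illustrated with a small decision (D) study projection. The variance components below are purely illustrative, not estimates from these examinations; because the person-by-task component typically dominates the person-by-rater component, doubling the number of tasks improves the generalizability coefficient more than doubling, or even quadrupling, the number of raters.

```python
def e_rho2(var_p, var_pt, var_pr, var_ptr_e, n_t, n_r):
    """Generalizability coefficient for n_t tasks and n_r raters per candidate."""
    rel_error = var_pt / n_t + var_pr / n_r + var_ptr_e / (n_t * n_r)
    return var_p / (var_p + rel_error)

# Illustrative variance components (person, person x task, person x rater, residual)
var_p, var_pt, var_pr, var_ptr_e = 0.30, 0.70, 0.02, 0.40

for n_t, n_r in [(10, 1), (20, 1), (10, 2), (10, 4)]:
    coeff = e_rho2(var_p, var_pt, var_pr, var_ptr_e, n_t, n_r)
    print(f"tasks={n_t:2d}  raters={n_r}  E(rho^2)={coeff:.2f}")
# Prints roughly 0.70, 0.80, 0.75 and 0.78: doubling tasks helps more than
# doubling (or even quadrupling) raters.
```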

Page 18: Expect the Unpredictable…

Candidate diversity
• Language
• Training
• Test format familiarity
• Accommodation requests

Logistical challenges
• Technological glitches
• Personnel fatigue and/or attention gaps
• Site variations

Security cracks
• Test content exposure in prep programs and study materials, in various languages