Using simulation in medical student assessment
WR McIvor, MD
Associate Professor of Anesthesiology
Associate Director of WISER for Medical Student Simulation Education
I don’t use sim to determine med student proficiency
• 12 years’ experience
• “Teach” medical students, and presume my efforts improve their knowledge, skills, and attitudes (KSAs)
• “Day 1” MS III course – 90 minutes long, from 7:30 to 9:00 am
  – Goal: KSAs around “Do you want to intubate my patient?”
  – What is the value of keeping grossly incompetent students in the sim lab?
Factors driving assessment
• Public accountability1
• LCME2 – The educational program must provide a general professional education that prepares students for all career options in medicine, and cite relevant outcomes indicating success in that preparation.
– Ensure students have acquired core clinical skills.
• Performance (vs. time) as the criterion for advancement3
1 Crossing the Quality Chasm: A New Health System for the 21st Century. IOM, 2001.
2 http://www.lcme.org/selfstudyguide1011.pdf
3 Educating Physicians: A Call for Reform of Medical School and Residency. Carnegie Foundation, 2010.
Advantages of sim-based assessment
• Reproducible
• Realistic
• Safe for patients
• Assess ability across many medical and surgical scenarios
Challenges
• What should we expect of a trainee?
• How hard is this scenario?
• Limitations of what can be simulated
• A number of scenarios (4–8) is required to get an accurate assessment
  – Necessitates short experiences
  – Time validity?
• Clear understanding of what we are seeking to measure
  – Knowledge
  – Procedural skill
  – Decision making
  – Communication
  – Professionalism
Simulation used to assess medical students: USMLE Part 2 CS
• Uses standardized patients (SPs) to assess:
– Communication skills
– Diagnostic skills
– Interpersonal skills
– Documentation ability
– English proficiency
• Pass/fail exam
CS test characteristics1
• Utilizes a method that has 35 years of history
• The cases (12) all have the same difficulty
• Very specific instructions
  – Trust the vital signs, unless you don’t think you should
  – Do a focused, not necessarily complete, physical exam
  – Some physical findings will be real and some simulated (suspend disbelief)
  – Genital/rectal/pelvic simulators are used for those exams
• Only performed in Philadelphia
• Schools (certainly Pitt) rehearse this test
1 http://www.usmle.org/Examinations/step2/cs/content/description.html
Mannequin simulator limitations
• Some things the simulators do not model well
  – Cyanosis
  – Sweating
  – Respiratory distress
• Airway problems tend to be all or nothing
  – Can’t have a moderately difficult intubation
• Time issues (a toy model follows this list)
  – Students give drugs, or mask ventilate, and expect an instantaneous change in vital signs
  – Sometimes administer several drugs at once, which produces conflicting responses
• The frequency of simulators crashing
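To make the time issue concrete, here is a minimal Python sketch of why a simulator’s vital signs should lag a drug dose rather than jump. It uses a simple first-order response toward a new setpoint; the baseline, target, and time constant are invented for illustration and are not from any real simulator’s physiology engine.

```python
import math

# A toy first-order model of mean arterial pressure (MAP) after a
# vasopressor dose. All parameter values are assumptions chosen for
# illustration, not real pharmacodynamics.
def map_response(t_sec: float, baseline: float = 55.0,
                 target: float = 75.0, tau: float = 90.0) -> float:
    """MAP (mmHg) t seconds after the dose, approaching the target
    exponentially with time constant tau."""
    return target + (baseline - target) * math.exp(-t_sec / tau)

for t in (0, 30, 60, 120, 300):
    print(f"t={t:3d}s  MAP={map_response(t):.1f} mmHg")
# MAP climbs gradually from 55 toward 75 mmHg over several minutes,
# rather than changing instantaneously when the drug is given.
```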
Key areas of human-patient simulation (HPS) assessment1
1. Defining the skills to be assessed
• Choosing appropriate sim tasks
• Appropriate simulators
2. Establishing appropriate metrics
3. Determining the source of error in measurements
4. Evidence of the validity of test scores
1 Anesthesiology 2010; 112:1041–52
1. Defining the skills to be measured and choosing the correct simulation
• The assessment needs:
  – Defined purpose
– Delineation of the knowledge and skills evaluated
– Context for performance-based activities
• Targeted to the examinee’s ability
• Choose scenarios based upon:
  – Competency guidelines
– Curriculum information
– Simulation capabilities
2. Developing appropriate metrics – Do the scores reflect actual ability?
• Implicit and explicit scoring
  – Explicit: checklists or key actions
    • Established by content experts informed by experience and practice guidelines
    • Advantages: logical, objective scoring, modest reproducibility
    • Disadvantages: subjectively constructed; reward a scripted approach and “shotgun” performance; do not consider the order in which actions are taken (illustrated in the sketch below)
  – Implicit: the entire performance is rated as a whole (“global assessment”)
    • Applied to teamwork/communication assessment
    • Often requires multiple well-trained raters
    • Typically scored retrospectively
    • How to assess varying performance over time?
• “Patient” (simulator) outcome
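The checklist weakness above can be shown in a few lines of Python. The key actions and weights here are hypothetical, not from any published rubric; the point is that a score computed from set membership alone ignores ordering and extraneous actions, so indiscriminate “shotgun” performance earns full credit.

```python
# Hypothetical key-action checklist for one acute-care scenario.
# Actions and weights are invented for illustration.
KEY_ACTIONS = {
    "calls_for_help": 1,
    "checks_vital_signs": 1,
    "administers_oxygen": 2,
    "gives_epinephrine": 2,
}

def checklist_score(observed: set[str]) -> int:
    """Sum the weights of key actions the examinee performed.

    The score ignores the order of actions and any extraneous ones,
    which is exactly the checklist disadvantage noted above.
    """
    return sum(w for action, w in KEY_ACTIONS.items() if action in observed)

focused = {"calls_for_help", "checks_vital_signs",
           "administers_oxygen", "gives_epinephrine"}
shotgun = focused | {"orders_ct", "starts_antibiotics", "places_foley"}

print(checklist_score(focused))  # 6
print(checklist_score(shotgun))  # 6 -- shotgun performance scores the same
```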
3. Test score reliability
• Generalizability (G) studies are conducted to identify sources of error (score inconsistency) and their interactions
• Decision (D) studies are then conducted to determine the optimal scoring design
  – How many simulations and raters are necessary for reliable scores, given the construct being assessed?
• Task sampling variance has a greater impact on the assessment than the rater effect
  – Participants can do a great job treating hypotension and a poor job with hypoxia
  – Need more sim scenarios (not more raters) to improve reliability, as the sketch below illustrates
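The cases-versus-raters tradeoff can be made concrete with a small D-study projection. The variance components below are assumed for illustration only (chosen so that person × case variance dominates, as the slide describes) and are not taken from the cited study; the formula is the standard generalizability coefficient for relative decisions in a fully crossed person × case × rater design.

```python
# Assumed variance components for a person x case x rater design.
# Values are illustrative; the person-x-case (task sampling) term is
# deliberately large relative to the rater terms.
var_p   = 0.30  # true examinee variance (the "signal")
var_pc  = 0.45  # person x case interaction (task sampling error)
var_pr  = 0.02  # person x rater interaction (rater effect, small)
var_res = 0.10  # residual person x case x rater error

def g_coefficient(n_cases: int, n_raters: int) -> float:
    """Generalizability coefficient for relative decisions."""
    error = (var_pc / n_cases
             + var_pr / n_raters
             + var_res / (n_cases * n_raters))
    return var_p / (var_p + error)

# Adding cases improves reliability far more than adding raters:
for n_c, n_r in [(4, 1), (8, 1), (4, 3), (8, 3)]:
    print(f"{n_c} cases, {n_r} raters: G = {g_coefficient(n_c, n_r):.2f}")
# G(4,1)=0.66, G(8,1)=0.77, G(4,3)=0.70, G(8,3)=0.82
```

Doubling the number of cases raises G from 0.66 to 0.77, while tripling the raters only reaches 0.70, matching the slide’s point that more scenarios, not more raters, drive reliability.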
4. Validity of test scores – What inferences can be made from the assessment scores?
• Content validity:
  – Base simulations on actual occurrences/practice characteristics
  – Base scoring rubrics on evidence
  – Stakeholder feedback
  – Realistic modeling using real-world equipment
• Internal consistency
  – Good proceduralists are likely good communicators
• Criterion validity
  – Sim performance correlates positively with experience and with test scores (e.g., board scores)
• A competency threshold (“cut score”) must be determined (see the sketch below)
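The slides do not name a standard-setting procedure, so the sketch below assumes one common choice, a modified Angoff exercise, with invented judge ratings. Each judge estimates the probability that a borderline (minimally competent) examinee would perform each scored item, and the cut score is the mean expected total across judges.

```python
# Hypothetical modified-Angoff ratings: each judge estimates, per
# scored item, the probability that a minimally competent examinee
# would perform it. All numbers are invented for illustration.
judge_ratings = [
    [0.90, 0.60, 0.70, 0.50],  # judge 1
    [0.80, 0.50, 0.80, 0.60],  # judge 2
    [0.85, 0.55, 0.75, 0.50],  # judge 3
]

# Each judge's expected score for the borderline examinee,
# averaged across judges, sets the cut score.
per_judge = [sum(ratings) for ratings in judge_ratings]
cut_score = sum(per_judge) / len(per_judge)
print(f"cut score = {cut_score:.2f} of {len(judge_ratings[0])} points")
# cut score = 2.68 of 4 points
```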
Experience with mannequin-based sim assessment1
• Med school grads are expected to manage acute care scenarios
– These scenarios can’t be modeled with SPs
– Knowledge (cognitive tests) may not be sufficient to assess management skills
– Looked to HPS for a testing platform
• Had MS IVs and interns perform 6 of 10 scenarios
1 Anesthesiology 2003; 99:1270–80
Results
• Interns were more proficient than MS IVs
• Variance in student/resident scores was attributable to case content
• To improve the precision of the assessment, increase the number of cases performed
• Increasing the number of raters would not improve reliability
  – Raters agreed about key elements during scenario development
Results (continued)
– Based scoring on specific diagnostic and treatment guidelines
– Brief scenarios
– Evaluated technical, not non-technical skills
• Participants with ACLS/PALS certification and CCM experience performed better
Conclusions
• The rater facet did not impact overall reproducibility
  – Scenarios had a high degree of content validity (performance objectives established by experts)
– Well-defined scoring rubrics
• Person × case variance was large
  – The number of cases is the most important factor affecting the reliability of this assessment
• Clinical experience correlated with better performance
• HPS can be used to evaluate clinical performance in med students and residents
To be an effective assessment tool, HPS requires that participants be familiar with it
• More penetration of HPS into med school curricula
• ACGME statement that anesthesia residency programs use simulation yearly
• MOCA’s HPS requirement
• HPS is being studied as an evaluation instrument
• HPS will become commonplace in the next few years, therefore…