
Curriculum Evaluation Eva Aagaard MD University of Colorado Denver School of Medicine Stephen D. Sisson MD The Johns Hopkins University School of Medicine Abby L. Spencer MD MS Allegheny General Hospital Donna Windish MD MPH Yale Primary Care Residency Program


  • Curriculum Evaluation

    Eva Aagaard MD, University of Colorado Denver School of Medicine

    Stephen D. Sisson MD, The Johns Hopkins University School of Medicine

    Abby L. Spencer MD MS, Allegheny General Hospital

    Donna Windish MD MPH, Yale Primary Care Residency Program

  • Workshop Overview
    Review vocabulary, hypothesis testing, and study design
    Small group exercises
    Review evaluation objects and evaluation methods
    Small group exercises
    Review statistical methods
    Small group exercises
    Wrap-up/evaluations

  • Vocabulary

  • Evaluation
    Formative evaluation: evaluation with intent to improve performance, usually provided during the evaluated experience
    Summative evaluation: evaluation with intent to judge performance, usually provided at the end of the evaluated experience

  • Quantitative Data
    Quantifiable, numerically expressed data
    Examples: number of students taking a course; average post-test score of all PGY-3 residents

    Source: oerl.sri.com

  • Quantitative Analysis
    Use of computational procedures and statistical tests to evaluate quantitative data

    Examples: Means, standard deviations, tests of statistical significance

    Source: oerl.sri.com
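As a minimal sketch of the quantitative analysis described above, the following Python snippet computes means and standard deviations for two groups of post-test scores. The scores are hypothetical, invented purely for illustration:

```python
# Hypothetical post-test scores for two resident groups (illustrative data only).
from statistics import mean, stdev

group_a = [78, 85, 90, 72, 88]
group_b = [70, 75, 80, 68, 77]

# Sample mean and sample standard deviation for each group.
print(f"Group A: mean={mean(group_a):.1f}, sd={stdev(group_a):.1f}")
print(f"Group B: mean={mean(group_b):.1f}, sd={stdev(group_b):.1f}")
```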

  • Qualitative Data
    Non-quantified narrative information

    Example: Words used by physicians talking to patients to admit errors

    Source: oerl.sri.com

  • Qualitative Analysis
    Use of systematic procedures (inductive, iterative) for deriving meaning from qualitative data

    Example: a physician panel reviews doctor/patient discussions of medical errors to reach consensus on which words physicians use to admit errors

    Source: oerl.sri.com

  • Reliability
    Consistency or reproducibility of measurements

    Intrarater/interrater reliability: measurements are the same when repeated by the same/a different person
    Test/retest reliability: measurements are the same when repeated at different times
    Equivalence: alternate forms of the test produce the same results (e.g., paper forms vs. online forms)

    Source: Kern et al.

  • Validity
    Do results represent what they claim to?

    Validity is a measure of results, not just the instrument
    Based on several criteria, including: face validity, criterion validity, construct validity

  • Face Validity
    Degree to which the instrument seems to measure what it is supposed to (aka surface/content validity)

  • Criterion Validity
    Concurrent validity: results from the new instrument are the same as those from another proven instrument

    Example: Pass rate of students on new curriculum MCQ test is same as on shelf exam

    Predictive validity: Instrument predicts individual’s performance on specific abilities

    Example: Students who pass cardiology curriculum post-test are more likely to prescribe beta blockers when treating post-MI patients

    Source: Kern et al.

  • Construct Validity
    Instrument performs as expected when used in groups with or without the attribute being measured

    Example: Nutritionists score highly on test of knowledge of nutrition while first year medical students’ scores are very low


  • Congruence
    Some methods of measurement are more appropriate for measuring a specific attribute (e.g., knowledge, skill, behavior, attitude) than others

  • Examples of Congruence
    Knowledge: MCQ test, oral exam
    Skill: OSCE, standardized patient, observation checklist
    Attitude: self-assessment questionnaire, ratings form, interviews

    Source: Kern et al.

  • Hypothesis testing

  • Hypothesis Testing vs. Study Question
    Hypothesis testing: a statement; used to determine statistical significance
    Study question: a question; used to determine practical significance

  • Hypothesis Testing
    An approach that helps you make decisions about your results:
    1. Requires a statement of the null hypothesis.
    2. A threshold for declaring a p-value to be significant.
    3. Deciding if the p-value obtained is statistically significant.

  • Null Hypothesis
    A statement of no effect or no association.

    "Participants and controls do not differ in interpersonal scores at the end of the curriculum."

    Reject or fail to reject the null hypothesis based on the p-value obtained and the level of p-value you consider to be statistically significant.

  • P-Value
    Probability of obtaining an outcome as extreme or more extreme than the observed result, assuming the null hypothesis is true.

    p = 0.05 means: if the null hypothesis is true, there is a 1 in 20 chance that a difference this extreme could occur by chance alone.
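The three steps of hypothesis testing can be sketched in Python with a two-sample t-test on hypothetical interpersonal scores (assumes SciPy is available; the data are invented for illustration):

```python
# Hypothesis-testing sketch: two-sample t-test comparing hypothetical
# interpersonal scores for curriculum participants vs. controls.
from scipy import stats

participants = [4.1, 3.8, 4.5, 4.2, 3.9, 4.4]
controls     = [3.5, 3.9, 3.2, 3.6, 3.8, 3.4]

# Null hypothesis: participants and controls do not differ.
t, p = stats.ttest_ind(participants, controls)

if p < 0.05:  # significance threshold chosen before looking at the data
    print(f"Reject the null hypothesis (p = {p:.3f})")
else:
    print(f"Fail to reject the null hypothesis (p = {p:.3f})")
```

Note that a statistically significant p-value still leaves the study question of practical significance, discussed next.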

  • Study Question
    The question you wish to answer regarding a comparison.

    "Do participants have higher interpersonal scores than controls at the end of the curriculum?"

  • Study Question
    Do the study results have practical significance?

    "Is a Likert scale score of 1.99 practically different from 2.10, even if statistically they are different?"

  • Good Study Questions

    State what/who is being compared.

    State when the comparison is being made.

    State the outcome of interest.

    State the direction of change.

  • Good or Bad Study Question?

    Do students score higher on multiple choice tests?

  • Good or Bad Study Question?

    Do students differ before or after the curriculum?

  • Good or Bad Study Question?

    Do students who received the curriculum have improved interpersonal scores at the

    end of the intervention compared to control students?

  • Study designs

  • Study design: Post-test
    X--------O
    Used to test proficiency
    Threats: selection bias, history, maturation

    Example: after completion of the OB/GYN rotation, how many students pass the shelf exam?

    Source: Kern et al.

  • Study design: Pretest-Posttest
    O1---X---O2
    Used to quantify impact of an intervention
    Threats: selection bias, history, maturation, testing, instrumentation

    May include a control group or be randomized, which reduces bias and controls for the instrument

    Source: Kern et al.

  • Study design: Pretest-Posttest
    Example: Does student knowledge improve from baseline after completion of a module on hypertension?

    Does student knowledge improve more from baseline after completion of a module on hypertension compared to students who don't complete the module?
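The O1---X---O2 design above is typically analyzed with a paired test on each learner's gain. A sketch in Python, assuming SciPy and hypothetical hypertension-module scores:

```python
# Pretest-posttest (O1---X---O2) sketch: paired t-test on hypothetical
# scores for the same eight students before and after a module.
from scipy import stats

pretest  = [55, 60, 48, 62, 58, 65, 50, 59]
posttest = [68, 72, 61, 70, 66, 75, 63, 70]

# Paired test: each student serves as his/her own control.
t, p = stats.ttest_rel(posttest, pretest)

gain = sum(b - a for a, b in zip(pretest, posttest)) / len(pretest)
print(f"mean gain = {gain:.1f} points, t = {t:.2f}, p = {p:.4f}")
```

Pairing removes between-student variability, but note the threats listed above (testing, maturation, history) still apply without a control group.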

  • Study design: Randomized controlled trial
        E: O1---X---O2
    R
        C: O1--------O2

    Addition of a control group controls for instrument, maturation, and other factors
    Addition of randomization reduces bias
    Considered a true experimental design
    Disadvantages: resource intensive; intervention denied to control group

  • RCT: Option 2
        E: O1---X---O2
    R
        C: O1--------O2---X

    Allows both groups to receive the intervention
    Particularly useful when the desire is to offer the intervention to both groups

  • Alternate control groups
    Concurrent controls
        Like standard controls, do not receive the intervention
        Unlike standard controls, are not randomly selected
        Examples: convenience samples; learners from a different year of training; learners from a different institution

    Historical controls
        Similar to concurrent controls, but subjects are studied from a prior time frame

  • Study design: Randomized controlled trial

    Example: How are second-year medical students’ history-taking skills affected by taking a course on cultural sensitivity as compared to second-year medical students who do not take this course?

    True experimental design: students in the same class randomized to intervention or control group
    Concurrent controls: e.g., students from the second-year class at another medical school used for comparison with the intervention group
    Historical controls: e.g., results of a previous second-year medical school class' assessment of history-taking skills used for comparison with the intervention group

  • Breakout #1

  • Evaluation Objects

  • Evaluation objects
    Learner outcomes: e.g., changes in knowledge, skills, behaviors, attitudes
    Structural outcomes: e.g., attendance, ACGME compliance
    Patient outcomes: e.g., patient satisfaction, ER visits, LDL-C levels

  • Levels of Evaluation: Miller's Pyramid of Clinical Competence
    Does (Action)
    Shows how (Performance)
    Knows how (Competence)
    Knows (Knowledge)

    Miller. Acad Med 1990

  • Resources
    Kern DE, Thomas PA, Howard DM, Bass EB. Curriculum development for medical education: A six-step approach. Johns Hopkins University Press.
    National Science Foundation Online Evaluation Resource Library. http://oerl.sri.com

  • Evaluation Methods

  • Questionnaires/Surveys

  • Description
    Merriam-Webster: "a set of questions obtaining statistically useful or personal information from individuals"
    Proper instrument construction is essential to obtaining reliable/valid results

  • Step 1: Define goals and objectives of the questionnaire
    Key questions: What do you want to learn from the results? Who will be surveyed? What will be done with the results?

  • Step 2: Write draft questions to match the goals and answer the questions that are the purpose of the survey

  • Area of Inquiry | Purpose of Question | Indicators | First draft
    Attendance | Does attendance impact ratings? | % of lectures attended | What # of lectures did you attend?
    Impact of syllabus | Did people use the syllabus? | Read syllabus vs. not | Did you read any of the syllabus?
    Demographics | Does PGY year impact ratings? | PGY1/PGY2/PGY3 | What is your year of training?

  • Step 3: Determine format of final drafts of questions

  • Closed-ended vs. open-ended?
    Use closed-ended questions for quantitative results
        Range of results must be anticipated
        Many add "Other ______" to allow for unanticipated responses
    Use open-ended questions for qualitative results
        May establish unanticipated themes

    Questions must be clear, use unambiguous terms, and avoid bias

  • Closed-ended questions
    Dichotomous: yes/no, did/did not, etc.
    Scaled: Likert scale
        Traditionally 5-point; some advocate 7 or 9
        Label all options
        Central option typically the neutral choice
        An even-numbered scale is used to push fence-sitters

  • Rank order scale
    Objects are ranked based on a particular attribute

    E.g., "Rank the following clinical rotations according to how useful you view them in preparing you for your post-graduate career (1 = most useful; 4 = least useful)"

    __ MICU  __ CCU  __ General Medical Wards  __ Ambulatory Clinic

  • Open-ended questions
    Can be completely unstructured
        E.g., "Tell us how to improve this curriculum"
    Can be sentence or paragraph completion
        E.g., "The best way to improve the syllabus would be to ___________"

  • Step 4: Determine order of questions
    Questions should flow logically, from general to specific, from least sensitive to most sensitive, and from factual/behavioral to attitudinal/opinion

    Best to establish rapport at the beginning, especially if sensitive questions are included

  • Step 5: Pilot the survey

  • Step 6: Distribute/collect the survey
    Online resources are available for electronic survey distribution/collection
    E.g., www.surveymonkey.com

  • Use
    Useful for collecting a wide range of information from a large number of individuals
    Confidentiality is compromised in small groups
    Common uses: needs assessment during curriculum development; curriculum evaluation

  • Psychometric qualities
    Can evaluate: attitudes, satisfaction, self-reported behaviors, beliefs
    Not used to evaluate individuals or predict clinical performance

  • Bias is hard to avoid
    Central tendency bias: extreme response categories are avoided
    Acquiescence bias: respondents agree with statements as presented
    Order of questions can bias results

    Consider different forms of the same questionnaire

  • Feasibility/practicality
    Most resources are used for questionnaire construction
    Inappropriate questions, ordering, scaling, or format can compromise results
    Also challenging: distribution, collection, statistical analysis

  • Suggested references
    Online Evaluation Resource Library: http://oerl.sri.com
    Woodward CA. Questionnaire construction and question writing for research in medical education. Med Educ 1988; 22: 345-63.

  • Self-assessment exercises

  • Description
    Definition: "the involvement of learners in judging whether or not learner-identified standards have been met"

    Eva KW, Regehr G. Acad Med 2005

  • Three types of self-assessment:
    Predictive: physician predicts his/her performance on a task to be completed

    Concurrent: Physician assesses his/her performance while performing a task

    Summative: Physician compares performance on a completed task to some standard of reference

  • Most commonly used methods for obtaining self-assessment: questionnaires/surveys and checklists

  • Competence: The ability to perform a task properly, when compared to a standardized reference

    May include knowledge, skills, and behavior

    Confidence: A person’s sense of being capable

    Confidence does not equal competence

  • Motivational Discomfort
    When self-assessment is compared to external assessment, a performance gap creates "motivational discomfort," which leads to improvement

    Commonly used external assessment methods: OSCE, standardized patients, simulations, in-training/other exams, chart audit, oral exam

  • Use
    Often used to: establish learning needs; assess confidence; assess general clinical skills; assess medical knowledge; other (teaching skills, cultural competence)

    Davis DA et al. JAMA 2006

  • Psychometric qualities
    Little evidence that self-assessment predicts clinical performance
    Little correlation between self-assessment and external assessment

  • 20 studies reviewed comparing self-assessment to external assessment
    Majority (13/20) showed little, no, or an inverse relationship between self-assessment and external assessment
    Inability to self-assess was independent of level of training, specialty, or manner of comparison
    Those who performed least well by external assessment were also the worst at self-assessment

  • Feasibility/practicality
    Questionnaires/checklists are relatively easy to design and administer
    Lack of psychometric validation seriously limits this method
    Best use may be as a tool to create motivational discomfort to stimulate improvement

  • Resources
    Davis DA, Mazmanian PE, Fordis M, et al. Accuracy of physician self-assessment compared with observed measures of competence: A systematic review. JAMA 2006; 296: 1094-1102.
    Epstein RM. Assessment in medical education. N Engl J Med 2007; 356: 387-96.

  • Multiple Choice Questions (MCQs)

  • Description
    Most common type of written test in all of medical education
    Often written according to a test blueprint, which itself is based on learning objectives

  • Terminology
    "Item": an entire test question (stem + options)
    "Stem": the question-asking section
    "Options": the answer choices
    "Keyed response": the correct answer choice
    "Distractors": incorrect options

  • Option formats
    Conventional multiple choice
    Alternate choice
    True/False
    Matching
    Complex multiple choice ("K type")
    Context-dependent item (item set)

  • The Stem
    Should express a complete thought
    Best items are answerable by reading the stem only
    Best written in the positive, not the negative
    Should avoid "window dressing"
    Avoid: absolute terms ("always", "never"); imprecise terms ("seldom", "occasionally", "rarely"); opinion terms ("may", "could", "can")

  • Bad stem (1)
    Among the following antibiotics, which one could be used for endocarditis prophylaxis during dental procedures?

  • Better stem
    Of the antibiotics listed, which one is acceptable for endocarditis prophylaxis during dental procedures?

  • Bad stem (2)
    Your favorite patient, a 57-year-old woman, returns to follow up on her diabetes. It is the end of clinic, so you are running late. You note her hemoglobin A1c was 8.9% from blood work done last week. She was previously on glyburide, but 1 year ago you added metformin, which caused diarrhea for the first month, since resolved. She has been on full doses of glyburide and metformin for 6 months. She has seen the nutritionist. Which ONE of the following statements is correct?

  • Better stem
    A 57-year-old woman with type 2 diabetes, hepatitis C and congestive heart failure has a hemoglobin A1c of 8.9% despite being on maximal doses of glyburide/metformin. Which combination of medications should be used in this patient to improve diabetes control?

  • Bad stem (3)
    When obtaining informed consent, you should never do any of the following except…?

  • Better stem
    Which one of the following is a core principle of obtaining informed consent?

  • The Options
    3-5 options commonly provided
    Distractors are the most important discriminators of knowledge
        Should be accurate, plausible, but clearly incorrect
        May address common misconceptions
    Keyed response and distractors should be similar in grammar, format, etc.
    Avoid: "All of the above"; "None of the above"

  • Bad item (1)
    Of the antibiotics listed, which one is acceptable for endocarditis prophylaxis during dental procedures?
    A. Doxycycline
    B. Ciprofloxacin
    C. A semi-synthetic penicillin
    D. Metronidazole
    E. None of the above

  • Bad item (2)
    A 57-year-old woman with type 2 diabetes, hepatitis C and congestive heart failure has a hemoglobin A1c of 8.9% despite being on maximal doses of glyburide/metformin. Which combination of medications should be used in this patient to improve diabetes control?
    A. Double glyburide/metformin
    B. Metformin/glargine
    C. Add pioglitazone
    D. Obtain a nutrition consult and focus on lifestyle modification

  • Use
    MCQs can be used to assess: knowledge, comprehension, application, analysis
    Good for large-scale assessments of groups

  • Psychometric Qualities
    Tests cognitive processes
    Context-rich questions may assess more complex cognitive processes (i.e., "knows how" rather than just "knows")

  • Cueing
    Respondent is able to answer from the options, but couldn't if the options were not provided
    May mimic premature closure in clinical decision-making
    Minimized by using extended match lists or open-ended short-answer questions
    Remains a limitation of MCQ tests

  • Item discrimination
    Good items are:
        Answered correctly by those who do well on the test
        Answered incorrectly by those who do poorly on the test

    Multiple equations are used to determine an item discrimination score
    Scores range from -1 to +1
    Negative scores (and those near zero) flag items that fail to discriminate and should be reviewed
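One common discrimination statistic is the upper-lower index: the fraction of high-scoring examinees who answered the item correctly minus the fraction of low-scoring examinees who did. A sketch in Python, with hypothetical responses (the 27% group split is a conventional choice, not from the slides):

```python
# Item-discrimination sketch (upper-lower index, hypothetical data).
def discrimination_index(item_correct, total_scores, frac=0.27):
    """item_correct[i]: 1 if examinee i got the item right, else 0.
    total_scores[i]: examinee i's overall test score."""
    n = max(1, round(frac * len(total_scores)))
    # Rank examinees by total score; take bottom-n and top-n groups.
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    low, high = ranked[:n], ranked[-n:]
    return (sum(item_correct[i] for i in high) / n
            - sum(item_correct[i] for i in low) / n)

# Ten examinees: whether each answered this item correctly, and total scores.
item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [90, 85, 80, 75, 70, 65, 60, 88, 55, 50]
print(discrimination_index(item, scores))  # near +1: item discriminates well
```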

  • Cronbach’s alpha
    Measures how well a set of items measures a construct (i.e., knowledge)
    Tests the reliability of an entire set of items (i.e., a test)
    Improves with an increasing number of items and increasing inter-item correlations
    A score of 0.70 or higher is considered acceptable
    A score of 0.85 or higher is used for pass/fail decisions
    Lower scores are acceptable for low-stakes testing
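Cronbach's alpha can be computed directly from item and total-score variances: α = k/(k−1) × (1 − Σ item variances / variance of totals). A sketch in Python with a hypothetical 4-item test taken by 5 examinees:

```python
# Cronbach's alpha sketch: alpha = k/(k-1) * (1 - sum(item var) / var(totals)).
# Hypothetical 4-item test, 5 examinees (1 = correct, 0 = incorrect).
from statistics import pvariance

def cronbach_alpha(items):
    """items[j] is the list of scores on item j across all examinees."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per examinee
    item_var = sum(pvariance(s) for s in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

items = [
    [1, 1, 0, 1, 0],  # item 1 across the 5 examinees
    [1, 1, 1, 1, 0],  # item 2
    [1, 0, 0, 1, 0],  # item 3
    [1, 1, 0, 1, 1],  # item 4
]
print(f"alpha = {cronbach_alpha(items):.2f}")
```

On this tiny illustrative dataset, alpha lands near 0.75, around the 0.70 "acceptable" floor noted above; real tests use far more items and examinees.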

  • Feasibility/practicality
    Most resources are used in item writing
        Professionals expect ~1 hour to write 1 item
        Many programs use pre-written "shelf" exams (e.g., the ITE, which has reliability 0.90)

    Administering the test is the easy part
    Higher-stakes tests require more items and should be piloted and have reliability testing
        Pass/fail tests should have reliability of 0.85 or greater

  • Resources
    Case SM, Swanson DB. Constructing written test questions for the basic and clinical sciences (3rd edition, revised). Philadelphia: National Board of Medical Examiners, 2002.
    Haladyna TM, Downing SM, Rodriguez MC. A review of multiple-choice item-writing guidelines for classroom assessment. Appl Meas Ed 2001; 15: 309-34.

  • Patient Surveys

  • Miller’s Pyramid of Clinical Competence: patient surveys sit at the "Does" (Action) level of the pyramid

    Miller. Acad Med 1990

  • Description
    Evaluation completed by patients
    Generally assesses patient satisfaction

  • Psychometric Qualities
    Need 20-80 patient ratings for sufficient reliability
    Patients are unable to discriminate different dimensions of competence
    Patient ratings are often quite high
    Correlate poorly with physician ratings (poor concurrent validity)
    Difficult to assess the trainee/curriculum separately from the rest of the health care team and environment
    Professional resistance to use

    Evans RG, et al. Family Practice 2007; 24(2): 117-27. Matthews DA and Feinstein AR. JGIM 1989; 4: 14-21. ABIM PSQ Project Executive Summary. Philadelphia, PA 1989. Weaver, et al. JGIM 1993; 8: 135-9.

  • Uses
    Evaluation/feedback to providers on: communication, humanism, professionalism, overall satisfaction with care
    Impact of curriculum on patient perception
    Recertification/promotion
    Pay for performance

  • Feasibility
    Logistically challenging: large numbers needed; data collection and recording
    Primarily used for resident formative assessment; part of 360° evaluation
    Increasingly used by health care/insurance companies to determine salaries/bonuses

  • Example
    Effects of structured encounter forms on pediatric housestaff knowledge, parent satisfaction, and quality of care: A randomized, controlled trial

    Purpose: to evaluate the effects of structured encounter forms on pediatric housestaff knowledge, parent satisfaction, and quality of care
    Intervention: housestaff randomized to use structured encounter forms focused on developmental milestones during health supervision visits

    Zenni A. Arch Pediatr Adolesc Med. 1996 Sep;150(9):975-80

  • Example cont.
    Outcome measurements:
        Changes in housestaff knowledge: pretest and posttest MCQs
        Parent satisfaction: parent surveys
        Quality of care (compliance with recommended guidelines for age-specific health supervision): audiotaped visit review

    Zenni A. Arch Pediatr Adolesc Med. 1996 Sep;150(9):975-80

  • Example cont.
    Results:
        Intervention group had greater knowledge of developmental milestones (not statistically significant)
        Parent satisfaction with developmental screening was greater in the intervention group (P < .001)
        Compliance with recommended standards of developmental screening was greater in the intervention group (P = .001)

    Zenni A. Arch Pediatr Adolesc Med. 1996 Sep;150(9):975-80

  • References
    Evans RG, et al. Family Practice 2007; 24(2): 117-27.
    ABIM PSQ Project. ABIM. Philadelphia, PA. 1989.
    Chang JT, et al. Ann Int Med 2006; 144: 665-72.
    Thomas PA, et al. Acad Med 1999; 74: 90-91.
    Matthews DA, et al. Am J Med 1987; 83: 938-44.
    Weaver MJ. JGIM 1993; 8: 135-9.
    Calhoun JG, et al. Proc Annu Conf Res Med Educ 1984; 23: 205-10.

  • Oral Examinations & Chart Stimulated Recall

  • Miller’s Pyramid of Clinical Competence: position of chart stimulated recall and oral examination on the pyramid (Does, Shows how, Knows how, Knows)

    Miller. Acad Med 1990

  • Description
    Examiner presents a patient case scenario
    Examinee describes patient management
    Questions probe: clinical reasoning, interpretation of findings, treatment plans

  • Description: Oral Boards
    Committee of experts crafts clinical scenarios from patient cases
        Focus on key features of the case
        Representative cases chosen

    1-2 physician examiners; 18-60 clinical cases; each scenario 3-5 min; exam duration 90 min to 2.5 hours

  • Psychometric Qualities: Board Oral Exams
    Scoring: pre-defined scoring rules; scores from each scenario combined; analyzed using item response theory or generalizability theory
    Reliability: fair to good (0.45-0.88)
    Validity: concurrent 0.75; predictive 0.45

    Maatsch JL. Emergency Med Annual 1982. Soloman DG, et al. Acad Med 1990; 65: S34-44. Kearney RA, et al. Can J Anesth 2002 Mar;49(3):232-6.

  • Chart Stimulated Recall
    Trainee's own patient chart is used as the basis for examination/evaluation
    Predesigned questions are used as a framework for discussion
        Identify "disconnects" evident from the chart
        Define questions to probe the disconnects
        Define the desired response
    15-20 minutes long; often video- or audiotaped

    Jennett P and Affleck L. J Cont Educ Health Prof 1998; 18: 163-171

  • Psychometric Qualities: CSR (Best Circumstances)
    Requires 3-6 cases to assess competency
    Concurrent validity with ABEM written exam good: 0.7
    Reliability fair to good: 0.54-0.64

    Munger, Oral Examinations. Jennett P and Affleck L. J Cont Educ Health Prof 1998; 18: 163-171

  • Uses
    Assesses: knowledge; application of knowledge; underlying reasoning; areas for remediation/curriculum enhancement; impact of other variables (patient, provider, system, environment, etc.) on understanding/decisions
    Can be used as a teaching tool

  • Feasibility
    Extensive expertise required for scenario development in non-CSR formats
    Examiners must be trained and inter-rater reliability assessed prior to implementation
    Difficult to standardize
    Time and faculty intensive
    Expensive

  • Example: Oral Exam
    Poor inter-rater reliability on mock anesthesia oral examinations

    Purpose: assess the impact of a curriculum on oral examination communication and presentation techniques on resident performance on the oral examinations

    Can J Anaesth. 2006 Jul;53(7):659-68

  • Oral Exam cont.
    Methods: randomized, pretest-posttest trial of 25 residents taking a mock anesthesia board oral examination
    Videotaped oral exams graded by 6 experienced graders

    Results: the curriculum did not improve scores on the oral exams, but the study was limited by poor inter-rater reliability

    Can J Anaesth. 2006 Jul;53(7):659-68

  • Example: CSR
    Are physicians discussing prostate cancer screening with their patients and why or why not? A pilot study

    Purpose: assess whether primary care physicians routinely discuss prostate cancer screening (PCS); explore the barriers to and facilitators of these discussions
    Methods: 18 academic and community-based primary care physicians; semi-structured interviews; CSR

    J Gen Intern Med. 2007 Jul;22(7):901-7

  • CSR cont.
    Results:
        All physicians reported discussing PCS with patients
        6 reported ordering PSA tests without discussions
        PCS occurred in 36% of 44 encounters qualifying for CSR
        Important barriers to discussion: inadequate time for health maintenance; physician forgetfulness; patient characteristics

    J Gen Intern Med. 2007 Jul;22(7):901-7

  • References
    Mancall EL, Bashook PG (eds.) Assessing clinical reasoning: the oral examination and alternative methods. Evanston, IL: American Board of Medical Specialties, 1995.
    Jacobsen E, et al. Can J Anesthesia 2006; 53(7): 659-668.
    Jennett P and Affleck L. Chart audit and chart stimulated recall as methods of needs assessment in continuing professional health education. J Cont Educ 1998; 18: 163-71.

  • Performance Audit

  • Miller’s Pyramid of Clinical Competence: performance audit sits at the "Does" (Action) level of the pyramid

    Miller. Acad Med 1990

  • Performance Audit: Description
    Patient information abstracted from medical records; results compared to accepted standards
        Agency for Healthcare Research and Quality / USPSTF: http://www.ahrq.gov
        HEDIS (Health Plan Employer Data & Information Set): http://web.ncqa.org

    Most commonly used (and studied) to assess quality of care

  • Performance Audit: Description
    Patient data may include:
        Tests/studies ordered (lipids, mammogram)
        Laboratory/study results (hemoglobin A1C)
        Immunizations
        Diabetic foot examinations
        Counseling for smoking cessation
        Documentation of DNR or end-of-life discussions

    Data usually collected by trained chart reviewers or member(s) of the research team

  • Performance Audit: Uses
    Provides evidence about: clinical decision-making; follow-through of tests ordered; provision of preventive services; appropriate consultation
    Allows evaluation: before and after an educational intervention; exposed versus not exposed to an educational intervention
    E.g., a curriculum on screening guidelines, use of EBP, lipid or A1C targets...

  • Performance Audit: Psychometric Qualities
    Reliability: a sample size of 10 patient records is sufficient
    Accuracy threats:
        Recording bias
        Missing or incomplete data is interpreted as not meeting the accepted standard
        Variability in skills of chart reviewers
        Charting skills may differ from clinical skills

  • Performance Audit: Psychometric Qualities
    Chart abstraction vs. standardized patients
        20 GIM residents and faculty blindly evaluated and treated standardized patients (SPs)
        Each SP had one of four diagnoses
        Each resident was evaluated for 2 of 4 cases
        160 resident/SP encounters

    Luck et al. Am J Med 2000.

  • Performance Audit: Psychometric Qualities
    Compared chart abstraction by trained nurses to SP reports (gold standard) for 4 aspects of the encounter (pre-determined by national guidelines):
        Taking the history
        Performing the proper physical exam
        Making the correct diagnosis
        Prescribing the appropriate treatment

    Sensitivity of chart abstraction = 70%; specificity = 81%

    Luck et al. Am J Med 2000.
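The sensitivity/specificity arithmetic behind figures like these can be sketched in Python. The counts below are hypothetical, chosen only so the result matches the 70%/81% values quoted above, not taken from the study itself:

```python
# Sensitivity/specificity sketch for chart abstraction vs. a standardized-
# patient gold standard. Counts are illustrative, not the study's data.
def sens_spec(tp, fn, tn, fp):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Of 100 actions the SP confirms were performed, the chart documents 70;
# of 100 actions not performed, the chart correctly shows absence for 81.
sens, spec = sens_spec(tp=70, fn=30, tn=81, fp=19)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```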

  • Performance Audit: Psychometric Qualities
    Medical chart abstraction provides modest sensitivity and specificity, and may underestimate quality of care for common outpatient medical conditions

    From a medical education standpoint, chart abstraction can:
        Underestimate curricular success if actual practice outperforms documentation
        Overestimate success if the chart reflects actions or decisions by someone other than the learner

  • Performance Audit: Feasibility/Practicality
    Training reviewers to decode clinical data is time-consuming
    Review by trained reviewers averages 30 minutes; review by study authors may take even longer
    Records may be inaccurate or incomplete
    Documented care may represent decisions by other members of the health care team rather than the resident
    Must agree on the standard to be compared against

  • Performance Audit: Example in the literature
    Implementing achievable benchmarks in preventive health: A controlled trial in residency education

    Purpose: to evaluate the success of a preventive health curriculum
    Methods: practice-based evaluation of 208 residents' delivery of preventive care; compared baseline and follow-up data from 2001-04
    Outcome: difference in receipt of preventive care for patients seen by intervention vs. control residents
    Intervention: preventive health curriculum

    Houston et al. Acad Med 2000

  • Performance Audit: Example in the literature
    Results: charts reviewed for ~4000 resident patients
        Receipt of preventive care increased for patients of the intervention group, but not for patients of controls
        Intervention group: significant increases in screening for smoking, colon cancer, and lipids; advice to quit smoking; provision of pneumococcal vaccination

    Conclusions: residents exposed to the curriculum outperformed controls on a practice-based evaluation of provision of preventive services.

    Houston et al. Acad Med 2000

  • Performance Audit: Example in the literature 2
    Evaluation of an educational intervention to encourage advance directive discussions between medicine residents and patients

    Purpose: evaluate an educational intervention to teach residents to discuss advance directives
    Methods: didactic and role-play curriculum; chart audit 10 days prior to and 5 days post-intervention; DNR discussion rates noted

    Furman et al. J Palliat Med. 2006

  • Performance Audit: Suggested References

    Luck J, Peabody JW, Dresselhaus TR, Lee M, Glassman P. How well does chart abstraction measure quality? A prospective comparison of standardized patients with the medical record. American Journal of Medicine. 2000 Jun 1;108(8):642-9.

  • Standardized Patients and Objective Structured Clinical Examination

  • SPs and OSCEs

  • Miller’s Pyramid of Clinical Competence

    Miller's pyramid, from base to apex: Knows (knowledge), Knows how (competence), Shows how (performance), Does (action)

    Standardized patients/OSCEs assess the "Shows how" (performance) level

    Miller. Acad Med 1990

  • Standardized Patients: Description

    Person trained to simulate a specific patient with a medical condition in a standardized fashion

    Proposed in 1964 to overcome threats to the validity of written-simulation tests: students would say they would ask more questions or perform more physical exam maneuvers than they actually did

  • Standardized Patients: Vocabulary

    Simulated patient: a medical encounter conducted for educational purposes; may or may not use the simulator's personal medical history

    Standardized patient: consistent content of the verbal and behavioral responses of the SP to the stimuli provided by the trainee

    A standardized patient is a simulated patient, but a simulated patient may not be standardized

  • Standardized Patients: Uses

    Practice skills and formative feedback (will not be covered today)

    Evaluation of skills: interview skills (taking a good history), physical examination skills*, communication skills, differential diagnosis skills, management/treatment skills, professionalism skills

    *Barrows. Acad Med 1993

  • Standardized Patients: Uses

    SPs can be trained to provide: written and objective reports via checklists; patient-centered subjective ratings and descriptive evaluations of trainees' behavior; constructive verbal or written feedback to the student

    Additional raters/observers may also be present to assess competence

  • Standardized Patients: Uses

    Can be used in an actual clinical setting as a registered patient with false records to assess actual physician behaviors

    More commonly used as a summative exam to evaluate clinical skills, as an individual station or a collection of stations: the Objective Structured Clinical Examination (OSCE)

  • OSCE, AKA Clinical Skills Assessment/Exam (CSA/CSE), AKA Clinical Practice Examination (CPX)


  • OSCE: Description

    Multiple-station SP exercise using multiple focused clinical encounters; each encounter assesses different skills/competencies

    Often incorporates non-patient stations for additional evaluations: interpreting EKGs, CXRs, labs; mannequins for technical skills

  • OSCE: Description

    Students read patients' charts while waiting for the signal to enter the "exam rooms"

    The chart contains pertinent information about the "patient" and the background of the medical situation the student is about to enter


    http://medicine.iu.edu/body.cfm?id=836&oTopID=223

  • OSCE: Description

    Students alert the SPs by knocking and immediately begin to act out the patient-doctor relationship

    Behind the doors the students are presented with a variety of clinical situations

    Here, the student tells an older man that his wife had a heart attack and explains the EKG report to him


    http://medicine.iu.edu/body.cfm?id=836&oTopID=223

  • OSCE: Description

    SP presents the case history in response to the trainee's questions; the trainee examines the SP as appropriate

    The SP then completes a checklist to document actions (history, PE, behaviors, communication, ...)

    The score is usually determined by the percentage of actions recorded on the SP checklist
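The checklist-percentage scoring described here can be sketched in a few lines of Python; the checklist items below are invented for illustration.

```python
def osce_score(checklist):
    """Score an encounter as the percentage of checklist actions the SP recorded as performed."""
    return 100 * sum(checklist.values()) / len(checklist)

# Hypothetical SP checklist for one station: True = action observed
encounter = {
    "asked about chest pain": True,
    "auscultated heart": True,
    "washed hands": False,
    "explained diagnosis": True,
}
print(osce_score(encounter))  # 75.0
```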

  • OSCE: Uses

    Similar to standardized patients

    Medical school/residency assessments

    USMLE Step 2 Clinical Skills; Qualifying Examination for Licensure (Canada)

  • OSCE/SP: Psychometrics

    Reliability averages ~0.7 for scored tests; the recommended value for educational tests is 0.8

    Reliability can be improved by:
    Proper training of evaluators/raters (MDs or SPs)
    Increasing the number of cases/stations on the exam (especially important for more complex skills such as clinical reasoning, and for high-stakes exams; need 3.5 hrs depending on the number and complexity of cases)
    Using pass/fail rather than scored tests (up to 0.96)
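The reliability gained by adding stations is usually projected with the Spearman-Brown prophecy formula. The slide does not name the formula, so treat this as a standard-practice illustration rather than the deck's own method.

```python
def spearman_brown(r, k):
    """Projected reliability when test length is multiplied by factor k,
    given current reliability r (Spearman-Brown prophecy formula)."""
    return k * r / (1 + (k - 1) * r)

# Doubling the stations of an exam with reliability 0.7:
print(round(spearman_brown(0.7, 2), 2))  # 0.82
```

Doubling a 0.7-reliability exam projects to roughly 0.82, which clears the recommended 0.8 for educational tests.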

  • OSCE/SP: Psychometrics

    Construct validity:
    Senior residents perform better than junior residents
    Residents perform better than medical students
    Exam scores improve with more training and time in direct patient care

  • OSCE/SP: Psychometrics

    Face validity: excellent; similar to clinical tasks; can assess multiple aspects of competencies

    Concurrent validity: modest correlations between SP/OSCE scores and clinical ratings or written exams (are they measuring different competencies?)

  • OSCE/SP: Psychometrics

    Predictive validity: better predictor of resident performance than MCQ exams; poor correlation between SP performance in the testing vs. the real environment; correlation improves when efficiency and consultation time are factored in

    Rethans J-J, et al. BMJ 1991; 303: 1377-80

  • OSCE/SP: Feasibility/Practicality

    Creating an OSCE/SP exam:
    Determine the specific competencies to be tested
    Train SPs (case presentation, rating, feedback)
    Develop checklists or rating forms (e.g., "Listened to heart in 4 places"; "Did the doctor make you feel comfortable?")
    Set criteria for passing

  • OSCE/SP: Feasibility/Practicality

    Creating an OSCE/SP exam is time-intensive:
    A new SP can learn to simulate a new case in 8-10 hrs; an experienced SP can learn a new case in 6-8 hrs
    Learning to use checklists to evaluate resident performance takes much longer

    Cost-intensive: ~$300 per student tested
    Space-intensive (need rooms)
    Time/cost can be reduced by sharing SPs


  • OSCE/SP: Feasibility/Practicality

    Challenges to using SPs:
    Are SPs accurate and believable in portraying their roles?
    Are they consistent and accurate in completing checklists?

    What does the data show? When sent unannounced to MDs' offices, experienced MDs cannot differentiate SPs from real patients. Detection rate

  • OSCE/SP: Suggested References

    Colliver JA, Swartz MH. Assessing clinical performance with standardized patients. JAMA. 1997 Sep 3;278(9):790-1.
    Van der Vleuten CPM, Swanson D. Assessment of clinical skills with standardized patients: state of the art. Teach Learn Med. 1990;2:58-76.
    Barrows HS. An overview of the uses of standardized patients for teaching and evaluating clinical skills. Acad Med. 1993;68(6):443-51.
    Adamo G. Simulated and standardized patients in OSCEs: achievements and challenges 1992-2003. Med Teach. 2003 May;25(3):262-70.

  • Breakout #2

  • Statistical Methods

  • How Do You Choose A Statistical Test?

  • Goals

    Use a case of an educational intervention to help demonstrate:
    - Study designs
    - Variable types
    - Exploratory data analysis
    - Confirmatory (inferential) data analysis
    - Basic interpretation of results

  • Step 1. Study Question

    Step 2. Study Design

    Step 3. Type of Outcome Variable

    Step 4. Distribution of the Outcome Variable

    The Four Step Approach to Choosing a Statistical Test

  • Case Presentation: Educational Intervention

  • New curriculum for 2nd-year medical students aimed at improving:

    1. Physical examination skills
    2. Confidence in performing physical examination maneuvers
    3. Interpersonal skills

  • Randomized Controlled Trial

    120 students randomized:
    60 students → Standard Curriculum; 60 students → New Curriculum

    Standardized patient exam used to evaluate outcomes

  • Step 1. Study Question

    Step 2. Study Design

    Step 3. Type of Outcome Variable

    Step 4. Distribution of the Outcome Variable

    The Four Step Approach to Choosing a Statistical Test

  • The Case: Study Question 1

    Do participants and controls differ in the mean number of relevant physical examination maneuvers performed correctly at the end of the curriculum?

  • Hypothesis Testing

    Null hypothesis: participants and controls do not differ in the mean number of relevant physical examination maneuvers performed correctly at the end of the curriculum.

  • Step 1. Study Question

    Step 2. Study Design

    Step 3. Type of Outcome Variable

    Step 4. Distribution of the Outcome Variable

    The Four Step Approach to Choosing a Statistical Test

  • Types of Study Designs: Observational vs. Experimental

    Observational study design: studies that observe groups at one or more points in time without imposing an intervention:
    - Cross-sectional studies
    - Case-control studies
    - Cohort, longitudinal, prospective studies

  • Types of Study Designs: Observational vs. Experimental

    Experimental study design: studies that allocate interventions to one or more groups and make comparisons:
    - Pre-post tests
    - Controlled clinical trials
    - Randomized controlled trials

  • Designing a Study: Are the Data Paired or Unpaired?

    Importance: Measurements of paired subjects are more likely to be highly correlated (highly related) than measurements of two randomly selected subjects.

  • Designing a Study: Are the Data Paired or Unpaired?

    Paired measurements come from common origins:

    Same subject before and after; twins (genetic); husbands and wives (environmental); matched cases and controls (e.g., by age)

  • Designing a Study: Are the Data Paired or Unpaired?

    Unpaired measurements come from 2 independent (or unrelated) groups:

    e.g., Cholesterol levels from different study groups

  • Back to the Case Step 2: Study Design

    Observational or Experimental?

    Randomized controlled trial → experimental

  • Paired or Unpaired Data?

    Within our RCT, we could have paired and unpaired data depending on the question we wish to answer.

    Do participants and controls differ in the mean number of physical exam maneuvers at the end of the curriculum?

    Unpaired groups:
    - Intervention students
    - Control students

  • Step 1. Study Question

    Step 2. Study Design

    Step 3. Type of Outcome Variable

    Step 4. Distribution of the Outcome Variable

    The Four Step Approach to Choosing a Statistical Test

  • Types of Research Variables

    Continuous, dichotomous, ordinal, nominal

  • Types of Research Variables

    Continuous variable: a variable with no gaps in values

    Example: age (any value from birth to death)

  • Types of Research Variables

    Dichotomous variable: a discrete categorical variable with two possible values

    Example: gender (female/male)

  • Types of Research Variables

    Ordinal variable: a ranked or ordered variable

    Example: Likert scale (1-5)

  • Types of Research Variables

    Nominal variable: classifies data into categories

    Example: marital status (single, married, divorced, widowed)

  • Back to the Case Step 3: Type of Outcome Variable

    The mean number of relevant physical exam maneuvers performed correctly.

    Continuous variable

  • Step 1. Study Question

    Step 2. Study Design

    Step 3. Type of Outcome Variable

    Step 4. Distribution of the Outcome Variable

    The Four Step Approach to Choosing a Statistical Test

  • Exploratory Data Analysis

  • Why Explore Your Data?

    Look for mistakes in data entry; choose summary measures; choose a parametric or nonparametric statistical test

    Look at your data!

  • Summary Measures

    Measures of central tendency:
    Mean = the average (continuous)
    Median = the midpoint (continuous, ordinal)
    Mode = the most frequent value (any variable)
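All three measures of central tendency are in Python's standard library; the post-test scores below are made-up data for illustration.

```python
import statistics

# Hypothetical post-test scores for seven residents
post_test_scores = [72, 85, 85, 90, 61, 85, 78]

print(statistics.mean(post_test_scores))    # the average
print(statistics.median(post_test_scores))  # 85 (the midpoint)
print(statistics.mode(post_test_scores))    # 85 (the most frequent value)
```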

  • Types of Statistical Tests

    Parametric versus Nonparametric Statistical Tests

  • Parametric Tests

    Key point: use when evaluating continuous or ordinal variables with a normal distribution.

    [Histogram: frequency by age (0-100), showing an approximately normal distribution]

  • Parametric Tests

    Examples:
    - Student t-test: comparison of means (unpaired)
    - Paired t-test: comparison of means (paired)
    - Linear regression: analysis when the outcome is continuous and normally distributed
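A minimal sketch of the unpaired Student t statistic (pooled-variance form), using only the standard library; in practice a statistics package would also supply the p-value.

```python
import math
import statistics

def students_t(a, b):
    """Unpaired Student t statistic with pooled variance (equal-variance form)."""
    na, nb = len(a), len(b)
    # Pooled variance across the two samples
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

print(students_t([5, 6, 7], [5, 6, 7]))  # 0.0 (identical means)
```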

  • Nonparametric Tests

    Key point: use when the sample size is small or the data are NOT normally distributed. More conservative than parametric tests.

    [Boxplots: baseline seizure rate (number of seizures) by treatment group (placebo vs. drug), a skewed distribution]

  • Nonparametric Tests

    Examples:
    - Wilcoxon rank-sum test: unpaired
    - Wilcoxon signed-rank test: paired
    - Nonparametric regression
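The Wilcoxon rank-sum test starts from the sum of the ranks of one sample within the pooled data; here is a minimal sketch of that statistic, with midranks for ties (a full test would compare it to the reference distribution for a p-value).

```python
def rank_sum(a, b):
    """Wilcoxon rank-sum statistic: sum of the (mid)ranks of sample a in the pooled data."""
    pooled = sorted(list(a) + list(b))
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1..j for tied values
        i = j
    return sum(ranks[v] for v in a)

print(rank_sum([1, 2], [3, 4]))  # 3.0 (sample a holds the two lowest ranks)
```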

  • Confirmatory Data Analysis (Inferential Statistics)

  • Inferential Statistics

    Uses hypothesis testing to assess the strength of the evidence, make predictions, and draw conclusions about a population, based on sample data

  • Inferential Statistics

    1. Comparisons between two groups: bivariate analyses

    2. Assessing one outcome with more than one predictor variable: multivariable regression analyses

  • Regression

    A statistical method used to describe the association between one dependent (outcome) variable and one or more independent (predictor) variables.

    One reason to use regression: to adjust for confounding factors.
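For the one-predictor case, the regression line is the least-squares fit; a minimal stdlib sketch follows (real analyses, especially those adjusting for confounders, would use a statistics package for multivariable regression).

```python
def linreg(x, y):
    """Ordinary least-squares fit y ≈ slope * x + intercept for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # slope = covariance(x, y) / variance(x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return slope, my - slope * mx

print(linreg([0, 1, 2], [1, 3, 5]))  # (2.0, 1.0): the exact line y = 2x + 1
```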

  • Confounding Factor

    A variable related to ≥1 of the variables in a study. It may mask an actual association or falsely demonstrate an association where no real association exists.

    Examples: age, gender, comorbidities

  • Back to the Case

    Step 1: Study Question
    Is there a difference in the mean number of relevant physical examination maneuvers performed correctly between groups?

    Step 2: Study Design
    Randomized controlled trial using unpaired data.

  • Figure B: Two Unpaired (Independent) Samples

    Continuous outcome: normally distributed? yes → Student t-test (parametric); no → Wilcoxon rank-sum (nonparametric)
    Dichotomous outcome: small sample size? yes → Fisher's exact test; no → chi-square test
    Ordinal outcome: Wilcoxon rank-sum (nonparametric)
    Nominal outcome: Fisher's exact test (nonparametric)
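Figure B's decision tree can be encoded directly; the function below is an illustrative sketch of the figure's logic, not a substitute for checking the underlying assumptions on real data.

```python
def choose_unpaired_test(outcome, normal=False, small_sample=False):
    """Pick a test for two unpaired (independent) samples, following Figure B."""
    if outcome == "continuous":
        return "Student t-test" if normal else "Wilcoxon rank-sum"
    if outcome == "dichotomous":
        return "Fisher's exact test" if small_sample else "Chi-square test"
    if outcome == "ordinal":
        return "Wilcoxon rank-sum"
    if outcome == "nominal":
        return "Fisher's exact test"
    raise ValueError(f"unknown outcome type: {outcome}")

print(choose_unpaired_test("continuous", normal=True))  # Student t-test
```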

  • Back to the Case

    Step 1: Study Question
    Is there a difference in the mean number of relevant physical examination maneuvers performed correctly between groups?

    Step 2: Study Design
    Randomized controlled trial using unpaired data.

    Step 3: Type of Outcome Variable
    Continuous

  • Figure B: Two Unpaired (Independent) Samples

    Continuous outcome: normally distributed? yes → Student t-test (parametric); no → Wilcoxon rank-sum (nonparametric)
    Dichotomous outcome: small sample size? yes → Fisher's exact test; no → chi-square test
    Ordinal outcome: Wilcoxon rank-sum (nonparametric)
    Nominal outcome: Fisher's exact test (nonparametric)

  • Back to the Case Step 4: Distribution of the Outcome Variable

    [Histograms: frequency of physical exam items obtained, by group (1 = Case/Intervention, 2 = Control)]

  • The distribution of the number of physical exam maneuvers for each group plotted on a histogram appears normally distributed.
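Eyeballing a histogram is the check recommended here before choosing a parametric or nonparametric test; a quick text histogram (the values below are made up) can do this without plotting software.

```python
from collections import Counter

def text_histogram(values, width=30):
    """Return a crude text histogram, one line per distinct value, to eyeball a distribution."""
    counts = Counter(values)
    peak = max(counts.values())
    return [f"{v:>3} | {'#' * round(width * c / peak)} ({c})"
            for v, c in sorted(counts.items())]

# Hypothetical counts of exam maneuvers performed
for line in text_histogram([3, 4, 4, 5, 5, 5, 6, 6, 7]):
    print(line)
```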

  • Back to the Case

    Step 1: Study Question
    Is there a difference in the mean number of relevant physical examination maneuvers performed correctly between groups?

    Step 2: Study Design
    Randomized controlled trial using unpaired data.

    Step 3: Type of Outcome Variable
    Continuous

    Step 4: Distribution of Outcome Variable
    Normally distributed

  • Figure B: Two Unpaired (Independent) Samples

    Continuous outcome: normally distributed? yes → Student t-test (parametric); no → Wilcoxon rank-sum (nonparametric)
    Dichotomous outcome: small sample size? yes → Fisher's exact test; no → chi-square test
    Ordinal outcome: Wilcoxon rank-sum (nonparametric)
    Nominal outcome: Fisher's exact test (nonparametric)


  • Understanding and Interpreting our Statistical Results

  • Results of the Student t-test

    Mean number (standard deviation) of relevant physical examination maneuvers performed correctly:

    Intervention: 14.4 (1.1); Control: 12.1 (1.0)

    p

  • Case Interpretation

    Reject the null hypothesis and conclude: the intervention students scored statistically significantly higher than the controls.
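As a rough check of this result, the t statistic can be recomputed from the slide's summary statistics alone, using the unpooled (Welch-style) standard error; n = 60 per arm comes from the study design.

```python
import math

# Summary statistics from the slide; 60 students per arm from the RCT design
n = 60
mean_i, sd_i = 14.4, 1.1   # intervention
mean_c, sd_c = 12.1, 1.0   # control

se = math.sqrt(sd_i ** 2 / n + sd_c ** 2 / n)  # standard error of the difference in means
t = (mean_i - mean_c) / se
print(round(t, 1))  # ≈ 12.0
```

A t statistic near 12 with 60 students per arm is overwhelmingly significant, consistent with rejecting the null hypothesis.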

  • Curriculum for second-year medical students aimed at improving:

    1. Physical examination skills
    2. Confidence in performing physical examination maneuvers
    3. Interpersonal skills

  • Study Question 2

  • Step 1: Study Question 2

    Is there a difference in the intervention students' confidence level in performing physical examination maneuvers before and after the curriculum?

  • Step 2: Study Design

    Intervention students before and after the curriculum

    Pre-post design comparing a paired group

  • Figure C: Two Paired (Dependent) Samples

    Continuous outcome: normally distributed? yes → paired t-test (parametric); no → Wilcoxon signed-rank test (nonparametric)
    Dichotomous outcome: McNemar's test
    Ordinal outcome: Wilcoxon signed-rank test
    Nominal outcome: McNemar's test
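Figure C's decision tree for paired samples can be encoded the same way as Figure B's; again, an illustrative sketch of the figure's logic.

```python
def choose_paired_test(outcome, normal=False):
    """Pick a test for two paired (dependent) samples, following Figure C."""
    if outcome == "continuous":
        return "Paired t-test" if normal else "Wilcoxon signed-rank test"
    if outcome == "ordinal":
        return "Wilcoxon signed-rank test"
    if outcome in ("dichotomous", "nominal"):
        return "McNemar's test"
    raise ValueError(f"unknown outcome type: {outcome}")

print(choose_paired_test("ordinal"))  # Wilcoxon signed-rank test
```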

  • Step 3: Type of Outcome Variable

    The confidence level is measured on a 4-point Likert scale (1 = not very confident, 4 = very confident) and is a(an) _______ variable.

  • Step 3: Type of Outcome Variable

    The confidence level is measured on a 4-point Likert scale (1 = not very confident, 4 = very confident) and is an ordinal variable.

  • Figure C: Two Paired (Dependent) Samples

    Continuous outcome: normally distributed? yes → paired t-test (parametric); no → Wilcoxon signed-rank test (nonparametric)
    Dichotomous outcome: McNemar's test
    Ordinal outcome: Wilcoxon signed-rank test
    Nominal outcome: McNemar's test

  • Step 4: Distribution of the Outcome Variable

    [Histograms: frequency of confidence scores (1-4), before and after the curriculum]

  • Statistical Test

    Compare differences in confidence scores with a parametric or nonparametric test?

    Nonparametric test

  • Figure C: Two Paired (Dependent) Samples

    Continuous outcome: normally distributed? yes → paired t-test (parametric); no → Wilcoxon signed-rank test (nonparametric)
    Dichotomous outcome: McNemar's test
    Ordinal outcome: Wilcoxon signed-rank test
    Nominal outcome: McNemar's test

  • Results

    The median (interquartile range) of the confidence scores:

    Pre: 2 (IQR 2-3); Post: 3.5 (IQR 3-4)

    p

  • Interpretation

    Reject the null hypothesis and conclude: the intervention was successful at improving students' confidence.

  • Curriculum for second-year medical students aimed at improving:

    1. Physical examination skills
    2. Confidence in performing physical examination maneuvers
    3. Interpersonal skills

  • Study Question 3

  • Step 1: Study Question 3

    Do participants and controls differ in their overall interpersonal scores at the end of the curriculum?

  • Step 2: Study Design

    Randomized controlled trial comparing two paired or unpaired groups?

    Two unpaired groups:
    - Intervention students
    - Control students

  • Figure B: Two Unpaired (Independent) Samples

    Continuous outcome: normally distributed? yes → Student t-test (parametric); no → Wilcoxon rank-sum (nonparametric)
    Dichotomous outcome: small sample size? yes → Fisher's exact test; no → chi-square test
    Ordinal outcome: Wilcoxon rank-sum (nonparametric)
    Nominal outcome: Fisher's exact test (nonparametric)

  • Step 3: Type of Outcome Variable

    The overall interpersonal score is the sum of 20 item scores, each rated on a 5-point Likert scale (1 = poor, 5 = excellent).

    This score is continuous, ranging from 20 to 100.

  • Figure B: Two Unpaired (Independent) Samples

    Continuous outcome: normally distributed? yes → Student t-test (parametric); no → Wilcoxon rank-sum (nonparametric)
    Dichotomous outcome: small sample size? yes → Fisher's exact test; no → chi-square test
    Ordinal outcome: Wilcoxon rank-sum (nonparametric)
    Nominal outcome: Fisher's exact test (nonparametric)

  • Step 4: Distribution of the Outcome Variable

    [Histograms: frequency of total interpersonal score, by group (1 = Case/Intervention, 2 = Control)]

  • Step 4: Distribution of the Outcome Variable

    Although the outcome is continuous, the distribution of the scores plotted on a histogram appeared negatively skewed.

  • Statistical Test

    Compare differences in overall interpersonal scores with a parametric or nonparametric test?

    Nonparametric test

  • Figure B: Two Unpaired (Independent) Samples

    Continuous outcome: normally distributed? yes → Student t-test (parametric); no → Wilcoxon rank-sum (nonparametric)
    Dichotomous outcome: small sample size? yes → Fisher's exact test; no → chi-square test
    Ordinal outcome: Wilcoxon rank-sum (nonparametric)
    Nominal outcome: Fisher's exact test (nonparametric)

  • Results

    The median (interquartile range) of the interpersonal scores:

    Intervention: 79 (IQR 72-89); Control: 74 (IQR 65-86)

    p=0.06

  • Interpretation

    Cannot reject the null hypothesis and conclude:

    Our curriculum did not significantly improve interpersonal skills.

  • Any Questions ? ? ?

  • Your Turn!

  • Final Comments

    Consult a statistician or someone with statistical knowledge early in your research for guidance.

    Helpful books: Intuitive Biostatistics; Studying a Study and Testing a Test; Basic and Clinical Biostatistics

  • Other Resources

    1. Free Statistical Calculatorshttp://graphpad.com/quickcalcs/index.cfm

    2. Free Statistical & Power Analysis Softwarehttp://www.ncss.com/download.html

    3. Online Statistics Texthttp://www.statsoft.com/textbook/stathome.html


  • References

    Luck J, Peabody JW, Dresselhaus TR, Lee M, Glassman P. How well does chart abstraction measure quality? A prospective comparison of standardized patients with the medical record. Am J Med. 2000 Jun 1;108(8):642-9.
    Houston TK, Wall T, Allison JJ, Palonen K, Willett LL, Kiefe CI, Massie FS, Benton EC, Heudebert GR. Implementing achievable benchmarks in preventive health: a controlled trial in residency education. Acad Med. 2006 Jul;81(7):608-16.
    Furman CD, Head B, Lazor B, Casper B, Ritchie CS. Evaluation of an educational intervention to encourage advance directive discussions between medicine residents and patients. J Palliat Med. 2006 Aug;9(4):964-7.
    Miller GE. The assessment of clinical skills/competence/performance. Acad Med. 1990;65(9 Suppl):S63-7.
    Colliver JA, Swartz MH. Assessing clinical performance with standardized patients. JAMA. 1997 Sep 3;278(9):790-1.
    Van der Vleuten CPM, Swanson D. Assessment of clinical skills with standardized patients: state of the art. Teach Learn Med. 1990;2:58-76.
    Barrows HS. An overview of the uses of standardized patients for teaching and evaluating clinical skills. Acad Med. 1993;68(6):443-51.
    Epstein RM, Hundert EM. Defining and assessing professional competence. JAMA. 2002 Jan 9;287(2):226-35.
    Adamo G. Simulated and standardized patients in OSCEs: achievements and challenges 1992-2003. Med Teach. 2003 May;25(3):262-70.
    Norman GR, et al. J Med Educ. 1982;57:708-15.
    Petrusa ER, et al. Arch Intern Med. 1990;150:573-7.
    Rethans J-J, et al. BMJ. 1991;303:1377-80.
    Norman GR, et al. J Med Educ. 1985;60:925-34.
    King AM, et al. Teach Learn Med. 1994;6:6-14.
    Williams R. Teach Learn Med. 2004;16(2):215-222.
    http://mededonline.usc.edu/spcalconsortium.html
