Learn to love the argument: Using Kane's Framework in Simulation-based Assessment (transcript, Wiki@UCSF)
Conceptualizing Validity & Validation
Simulation Educator Training - Principles of Assessment in Simulation Supplement
Special thanks to David Cook & Rose Hatala.
[Diagram: Expected Actions; Observer's frame and Learners' frames; Judgment/Rating; Feedback]
Why should I care about this?
Objectives
• Re-conceptualize how you define and think about validity and validation
• Define differences between two contemporary validity frameworks
• List types of evidence associated with the four inferences in Kane’s validity framework
• Define the interpretation-use argument
• Evaluate common mistakes to avoid in validation
What is validity?
“Validity refers to the evidence presented to support or refute the meaning or interpretation assigned to assessment results.”
S. Downing, 2003
Inference, not instrument
“We used a validated checklist”
“Training using our validated simulation curriculum”
Collecting evidence to evaluate the appropriateness of inferences:
Assessment Score → Inference/Decision → Action/Use
Activity: Small Group Discussion (7-10 mins)
What is tripping you up about this new conceptualization of validity?
What is hard to let go from your previous way of thinking about validity and validation?
Objectives
• Re-conceptualize how you define and think about validity and validation
• Define differences between two contemporary validity frameworks
Validity Frameworks
• What if there is no gold standard? Risk of confirmation bias.
• Too many types; everything relates to the construct. Where to fit reliability?
• How to prioritize evidence?
Validity Frameworks
Validity of Test/Instrument Scores

Messick Model (Standards for Educational and Psychological Testing):
• Content
• Response Process
• Internal Structure
• Relations to Other Variables
• Consequences

Kane Model:
• Scoring
• Generalization
• Extrapolation
• Decision
Benefits of the Argument-based Framework
1. Focuses attention on a broad array of issues associated with interpreting and using assessment scores.
2. Emphasizes that we make assumptions when we interpret scores, and the need to check those assumptions.
3. Allows for alternative interpretations and uses of assessment scores.
Adaptability of Framework
A benefit of Kane's framework is that it is well suited to:
• Quantitative data
• Qualitative data
• Programmatic assessment
Objectives
• Re-conceptualize how you define and think about validity and validation
• Define differences between two contemporary validity frameworks
• List types of evidence associated with the four inferences in Kane’s validity framework
Distinction: Workplace-based vs. Simulation-based Assessment
Sampling and ability to have multiple ‘stations’
Distinction: Workplace-based vs. Simulation-based Assessment
Ensure simulation "matches" clinical context / construct of interest
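One quantitative form of extrapolation evidence is the correlation between simulation scores and a workplace-based measure expected to relate to them. As a hypothetical illustration only (the scores below are invented, not from this talk), a minimal pure-Python sketch:

```python
# Hypothetical extrapolation evidence: correlate simulator checklist scores
# with workplace supervisor ratings for the same residents.

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

sim_scores = [70, 82, 90, 65, 88]          # simulation-based assessment (invented)
ward_ratings = [3.1, 3.8, 4.5, 2.9, 4.2]   # workplace ratings (invented)
r = pearson_r(sim_scores, ward_ratings)
print(round(r, 2))  # → 0.99
```

A high convergent correlation supports extrapolation from the test setting to the real world; a low one refutes it, or suggests the two measures tap different constructs.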
Observation → (Scoring) → Single Score → (Generalization) → Performance: test setting → (Extrapolation) → Performance: real world → (Implication) → Decision
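The "reliability or generalisability" entry under Generalization can be made concrete with a simple internal-consistency coefficient. A hypothetical sketch (invented rater data, not from this talk) computing Cronbach's alpha across raters in pure Python:

```python
# Hypothetical generalization evidence: 3 raters each score 5 residents;
# Cronbach's alpha treats raters as "items" and residents as "subjects".

def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(scores_by_rater):
    """scores_by_rater: list of equal-length score lists, one per rater."""
    k = len(scores_by_rater)
    item_vars = sum(variance(r) for r in scores_by_rater)
    totals = [sum(col) for col in zip(*scores_by_rater)]  # per-resident totals
    return (k / (k - 1)) * (1 - item_vars / variance(totals))

raters = [
    [7, 5, 9, 4, 8],   # rater A (invented)
    [6, 5, 8, 4, 9],   # rater B (invented)
    [7, 6, 9, 5, 8],   # rater C (invented)
]
alpha = cronbach_alpha(raters)
print(round(alpha, 3))  # → 0.971
```

A generalizability study extends this idea, partitioning variance across raters, tasks, and occasions rather than collapsing it into one coefficient.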
Examples of validity evidence by inference:

Scoring
• Scoring rubric/criteria (e.g., empiric comparison of different procedures, think-aloud study)
• Observation format (e.g., empiric comparison of different formats, such as live vs. video-based)
• QUAL: richness, accuracy, authenticity and fairness of qualitative data

Generalization
• Reliability or generalisability (items, raters, tasks, occasions)
• QUAL: consistency and reflexivity of interpretations formed by different interpreters

Extrapolation
• Correlation with another measure having an expected relationship (convergent; concurrent or predictive)
• Discrimination (known-groups comparison)
• QUAL: agreement of stakeholders that interpretations will apply to new contexts in training or practice (transferability)

Implication
• Pass/fail standard (e.g., ROC curve)
• QUAL/QUAN: effectiveness of actions based on assessment results
• Intended or unintended consequences of testing
Cook DA, Zendejas B, Hamstra SJ, Hatala R, Brydges R. What counts as validity evidence? Examples and prevalence in a systematic review of simulation-based assessment. Adv Health Sci Educ Theory Pract. 2014 May;19(2):233-50.
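The "pass/fail standard (e.g. ROC curve)" evidence can likewise be illustrated: given scores for examinees independently judged competent or not yet competent, each candidate cutoff is a point on the ROC curve, and one common choice maximizes Youden's J (sensitivity + specificity − 1). A hypothetical sketch with invented scores, not from this talk:

```python
# Hypothetical implication evidence: choose a pass/fail cutoff on simulator
# scores by maximizing Youden's J, the ROC point farthest from chance.

def best_cutoff(competent, not_competent):
    """Return (cutoff, J): examinees pass if score >= cutoff."""
    best = (None, -1.0)
    for c in sorted(set(competent) | set(not_competent)):
        sens = sum(s >= c for s in competent) / len(competent)
        spec = sum(s < c for s in not_competent) / len(not_competent)
        j = sens + spec - 1
        if j > best[1]:
            best = (c, j)
    return best

competent = [78, 85, 90, 72, 88, 95]   # judged competent by faculty (invented)
not_competent = [55, 62, 70, 48, 74]   # judged not yet competent (invented)
cutoff, j = best_cutoff(competent, not_competent)
print(cutoff, round(j, 2))  # → 78 0.83
```

Note that choosing a cutoff is only part of the Implication inference; evidence about the consequences of applying that cutoff is still needed.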
Large Group Discussion (10 mins + short break)
How did all that land for/on you?
Questions / Comments?
Objectives
• Re-conceptualize how you define and think about validity and validation
• Define differences between two contemporary validity frameworks
• List types of evidence associated with the four inferences in Kane’s validity framework
• Define the interpretation-use argument
Interpretation/Use Argument (IUA)
• Making an interpretation/use argument
• Specific purpose, meaning or interpretation
• Specific point in time
• Well-defined population and use
• Organizes your inferences and assumptions and allows you to test whether the evidence supports or refutes them
Example: Interpretation/Use Argument
I plan to measure lumbar puncture skills using a simulator to assess residents’ performance during acquisition and retention; all for the purposes of discerning the effects of two learning interventions in a research program.
Activity: Small Group Discussion (20 mins)
Consider an assessment tool you plan to use in your simulation program
Develop your own Interpretation-Use Argument that fits with how you are using that tool to achieve your program purpose
What evidence would you still need to collect for this IUA?
Objectives
• Re-conceptualize how you define and think about validity and validation
• Define differences between two contemporary validity frameworks
• List types of evidence associated with the four inferences in Kane’s validity framework
• Define the interpretation-use argument
• Evaluate common mistakes to avoid in validation
Common Mistakes – Cook & Hatala, 2016
• Don’t use a validity framework
• Reinvent the wheel by creating a new instrument each time a need arises
• Make expert-novice comparisons the crux of the validity argument
• Focus on the easily-accessible validity evidence rather than the most important
• Focus on the instrument rather than score interpretations and uses
• Don't synthesize or critique the validity evidence
• Ignore best practices for assessment development
• Omit details about the instrument
• Let the availability of the simulator/assessment instrument drive the assessment
Activity: Small Group Discussion
Each table assigned 2-3 of the common mistakes
Why are these mistakes? What is the issue?
How would you talk to / teach a colleague (or your boss) about why this mistake matters?
Pearls
• Test scores can have multiple possible interpretations/uses; validity evidence is collected for these interpretations/uses, not for the instrument itself.
• Validity frameworks exist, and we can use them to organize our work.
• The validity of a proposed interpretation/use depends on how well the evidence supports the IUA.
• More ambitious proposed interpretations/uses require more evidence.
Learn to love the argument: Using Kane's Framework in Simulation-based Assessment
Email me for a template:
References
Clauser, B. E., Margolis, M. J., & Swanson, D. B. (2008). Issues of validity and reliability for assessments in medical education. Practical guide to the evaluation of clinical competence, 10-23.
Downing, S. M. (2003). Validity: on the meaningful interpretation of assessment data. Medical education, 37(9), 830-837.
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1-73.
Cook, D. A., Zendejas, B., Hamstra, S. J., Hatala, R., & Brydges, R. (2014). What counts as validity evidence? Examples and prevalence in a systematic review of simulation-based assessment. Advances in Health Sciences Education, 19(2), 233-250.
Cook, D. A., Brydges, R., Ginsburg, S., & Hatala, R. (2015). A contemporary approach to validity arguments: a practical guide to Kane's framework. Medical Education, 49(6), 560-575.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. American Educational Research Association.
Cook, D.A., Hatala R. (In Press). Validation of Educational Assessments: A Primer for Simulation and Beyond. Advances in Simulation.