Leadership of teacher learning - SSAT
Dylan Wiliam (@dylanwiliam)
Leadership of teacher learning
www.dylanwiliam.net
INTERNAL
Outline: Five questions
• Where should our efforts be focused?
• Where does formative assessment fit in?
• What makes effective teacher learning?
• What doesn’t get done?
• How will we know it’s working?
Evaluating teaching
Do we know a good teacher when we see one?
• Experiment 1
– Seven teachers (3 high-performing, 4 not)
• Group 1: at least 0.5 sd above mean value-added for 3 years
• Group 2: never 0.5 sd above average value-added in 3 years
– 7 video clips shown to 100 raters
– Average number of correct ratings: 2.8
Distribution of total correct ratings
Correct ratings: 0 1 2 3 4 5 6 7
Share of raters: 1% 11% 29% 36% 13% 9% 1% 0%
Strong, Gargani, and Hacifazlioğlu (2011)
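The average of 2.8 correct ratings can be checked directly against the distribution on this slide. The sketch below does that, and compares it with a chance level of 3.5 out of 7. The chance level is an assumption added for illustration: it supposes each rating is a two-way call (high-performing or not) that a guessing rater gets right half the time.

```python
# Share of raters (percent) by number of correct ratings out of 7,
# copied from the distribution on the slide.
distribution = {0: 1, 1: 11, 2: 29, 3: 36, 4: 13, 5: 9, 6: 1, 7: 0}


def mean_correct(dist):
    """Weighted mean number of correct ratings, weighted by share of raters."""
    total_share = sum(dist.values())
    return sum(k * share for k, share in dist.items()) / total_share


# Assumption (not from the slide): each rating is a binary call, so a rater
# guessing at random is right half the time on each of the 7 clips.
chance_level = 7 * 0.5

print(round(mean_correct(distribution), 2), chance_level)
```

Under that assumption, the raters did no better than guessing would.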
Ratings by rater type
Rater Number Accuracy (%)
Teachers 10 37
Parents 7 37
Mentors 10 47
University professors 9 41
Administrators 10 31
Teacher educators 10 31
College students 11 36
Math educators 10 34
Other adults 11 43
Primary school students 12 50
What if the difference is larger?
• Experiment 2
– Two groups of teachers (4 teachers in each group)
• Group 1: at least 0.5 sd above average value-added
• Group 2: at least 0.5 sd below average value-added
– 8 video clips shown to 165 experienced school leaders
– Average number of correct ratings: 3.85
Distribution of total correct ratings
Correct ratings: 0 1 2 3 4 5 6 7 8
Share of raters: 1% 3% 11% 25% 25% 24% 9% 1% 0%
Can we identify good teachers after training?
Framework for teaching (Danielson 1996)
• Four domains of professional practice
– Planning and preparation
– Classroom environment
– Instruction
– Professional responsibilities
• Links with student achievement (Sartain, et al. 2011)
– Domains 1 and 4: no impact on student achievement
– Domains 2 and 3: some impact on student achievement
Observations and teacher quality
[Figure: percentage change in rate of learning (y-axis, -15 to 20) by observation rating (Unsatisfactory, Basic, Proficient, Distinguished), shown separately for Reading and Mathematics]
Sartain, Stoelinga, Brown, Luppescu, Matsko, Miller, Durwood, Jiang, and Glazer (2011)
So, the highest rated teachers are 30% more productive than the lowest rated
But the best teachers are 400% more productive than the least effective
Imprecision in lesson observations
Hill, Charalambous and Kraft (2012)
Achieving a reliability of 0.9 in judging teacher quality through lesson observation is likely to require observing a teacher teaching 6 different classes, and for each lesson to be judged by 5 independent observers.
Bias in lesson observations
• A study of 834 teachers from six large US school districts found that teachers were more likely to be given a higher observation rating if they were teaching students with higher achievement.
• Compared with teachers teaching the lowest achieving students (bottom 20%), those teaching the highest achieving students (top 20%) were:
– 2.5 times as likely to be top-rated in English
– 6 times as likely to be top-rated in mathematics
Steinberg and Garrett (2016)
Can we identify good teachers from test scores?
Short-term and long-term effects
• Data: 10,534 students attending USAFA (2000-2007)
• Students randomly allocated to calculus instructors
Carrell and West (2010)
• Less qualified, less experienced instructors
– Higher end-of-course scores
– Lower scores on follow-on courses
– Higher end-of-course evaluations
• More qualified, more experienced instructors
– Lower end-of-course scores
– Higher scores on follow-on courses
– Lower end-of-course evaluations
Can we identify good teachers by combining evidence from different sources?
Measures of Effective Teaching project
• Three sources of evidence on teacher effectiveness
– Value-added estimates
– Classroom observation
– Student perception surveys
• Prediction accuracy maximised with these weights
– Value-added estimates 81%
– Classroom observation 17%
– Student perception surveys 2%
Bill and Melinda Gates Foundation (2012)
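To make the weighting concrete, here is a minimal sketch of a composite score built from the three weights on this slide. The teacher record is hypothetical, and the sketch assumes each measure has already been put on a common standardized scale (z-scores), which the slide does not specify.

```python
# Weights reported on the slide (they sum to 1.0).
WEIGHTS = {"value_added": 0.81, "observation": 0.17, "student_survey": 0.02}


def composite_score(measures):
    """Weighted sum of a teacher's measures; inputs are assumed to be z-scores."""
    return sum(WEIGHTS[name] * measures[name] for name in WEIGHTS)


# Hypothetical teacher: average value-added, but one standard deviation
# above the mean on both observation and student survey results.
teacher = {"value_added": 0.0, "observation": 1.0, "student_survey": 1.0}

print(round(composite_score(teacher), 2))
```

With these weights, a teacher who excels on observation and surveys but is average on value-added moves the composite by less than a fifth of a standard deviation.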
For secondary English teachers (S1 to S3)
Correlation with standardized test score gains 0.69
Correlation with higher-order assessments 0.29
Reliability 0.51
To get a 90% reliable prediction of a teacher’s quality, you would need to collect data for each teacher for 9 years
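The nine-year figure is consistent with the Spearman-Brown prophecy formula applied to a single-year reliability of 0.51. The sketch below shows that arithmetic. It assumes each year of data behaves as a parallel measurement of the same teacher; the slide does not say how the figure was derived, so treat this as one plausible reconstruction.

```python
import math


def years_needed(single_year_reliability, target_reliability):
    """Spearman-Brown: how many parallel measurements reach the target reliability."""
    r, t = single_year_reliability, target_reliability
    return t * (1 - r) / (r * (1 - t))


n = years_needed(0.51, 0.90)

# Fractional years are rounded up, since data arrives one whole year at a time.
print(round(n, 2), math.ceil(n))
```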
This is what a correlation of 0.69 looks like…
[Scatter plot: predicted against actual values]
…and this is a correlation of 0.29…
[Scatter plot: predicted against actual values]
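One way to get a feel for these two correlations is to simulate them. The sketch below generates paired standard-normal values with a chosen correlation and reports the sample correlation alongside the share of variance explained (the square of the correlation: about 48% for 0.69 and about 8% for 0.29). The simulated data is purely illustrative and is not the data behind the slides.

```python
import math
import random


def simulate_pairs(r, n, seed=0):
    """Generate n (actual, predicted) pairs with population correlation r."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        # Mixing x with independent noise gives correlation r by construction.
        y = r * x + math.sqrt(1 - r * r) * noise
        pairs.append((x, y))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


for r in (0.69, 0.29):
    sample = simulate_pairs(r, 20000)
    print(r, round(pearson(sample), 2), "variance explained:", round(r * r, 2))
```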
What is the impact of removing less effective teachers?
Effects of removing low-performing teachers
• Data: reading scores for 4th and 5th grade students in Florida’s public schools from 2004-05 to 2008-09
• A total of 227,014 students (96%) are matched to 15,152 teachers responsible for teaching reading
• A value-added score is estimated for each teacher each year
• Two policy options explored for teacher removal:
– Value-added score below threshold for two consecutive years
– Two-year average value-added score below threshold
Winters and Cowen (2013)
System-wide impact
Policy             Severity (percentile)   Increase in teacher value-added   Extra weeks of learning per student per year
Consecutive        5th                     .003                              0.0
Consecutive        10th                    .006                              0.1
Consecutive        25th                    .020                              0.3
Two-year average   5th                     .020                              0.3
Two-year average   10th                    .031                              0.4
Two-year average   25th                    .050                              0.7
What does this all mean?
• The only way to improve student achievement at scale is to invest in the teachers we already have
• The “love the one you’re with” strategy
Evaluation vs. improvement
• Evaluation frameworks:
– of necessity, have to be comprehensive
– include all aspects of teachers’ work
– at best, incentivize improvement on all aspects of practice
– at worst, incentivize improvement on aspects of practice that are easy to improve
• Improvement frameworks:
– are selective
– focus on those aspects of practice with the biggest payoff for students
• To maximize improvement, evaluation frameworks have to be used selectively
The ‘next big thing’
Things that don’t work
• Getting smarter people into teaching
• Paying good teachers more
• Brain Gym®
• Learning styles
• Copying other countries
Things that might work
• Differentiation
• Lesson study/Learning study
• Social and emotional aspects of learning
• Educational neuroscience
• Grit
Things that do work—a bit
• Firing bad teachers
• Class size reduction
• Growth mindset
There is no ‘next big thing’
Just lots of small, mostly old, things
Understanding meta-analysis
• A technique for aggregating results from different studies by expressing results with a common measure
• Problems with meta-analysis
– Inappropriate comparisons
– Aptitude x treatment interaction
– The “file drawer” problem
– Variations in intervention quality
– Selection of studies
• Problems with effect sizes
– Variation in population variability
– Sensitivity of outcome measures
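The point about population variability can be illustrated with a small calculation. In the sketch below, the same raw gain produces two different standardized effect sizes, simply because the spread of scores in the sample differs. All numbers are hypothetical and chosen only to show the mechanism.

```python
def effect_size(mean_treatment, mean_control, pooled_sd):
    """Standardized mean difference: raw difference divided by the pooled SD."""
    return (mean_treatment - mean_control) / pooled_sd


# Hypothetical: an identical 5-point raw gain measured in two populations.
d_narrow = effect_size(55.0, 50.0, pooled_sd=10.0)  # restricted-range sample
d_wide = effect_size(55.0, 50.0, pooled_sd=20.0)    # full-range sample

print(d_narrow, d_wide)
```

The intervention is equally effective in raw terms in both cases, yet the effect size in the narrower population is twice as large.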
Meta-analysis in education
• Some problems are unavoidable:
– Aptitude x treatment interactions
– Sensitivity to instruction
– Selection of studies
• Some problems are avoidable:
– Inappropriate comparisons
– File-drawer problems
– Intervention quality
– Variation in variability
• Unfortunately, many of the people doing meta-analysis in education:
– don’t discuss the unavoidable problems, and
– don’t avoid the avoidable ones
So what does this mean?
• Meta-analysis is hard to do well anywhere
• In education
– Meta-analysis is really hard to do well
– Meta-meta-analysis is impossible to do well
• Rejoinders
– The effects average out
– The rank order of effects is still OK
– There is no reason to suppose that either of these is the case
• Conclusion
– Meta-meta-analysis is an unsound basis for determining the impact of any educational intervention on student achievement
Learning from research
• Four questions we should ask of research
1. Does it solve a problem we have?
2. How much extra achievement will it yield?
3. How much will it cost?
4. Can we implement it here?
• Right now there are 2 “best bets”
– A knowledge-rich curriculum
– Greater use of classroom formative assessment
Formative assessment
• Long-cycle
– Span: across terms, teaching units
– Length: four weeks to one year
– Impact: monitoring, curriculum alignment
• Medium-cycle
– Span: within and between teaching units
– Length: one to four weeks
– Impact: student-involved assessment
• Short-cycle
– Span: within and between lessons
– Length: minute-by-minute and day-by-day
– Impact: engagement, responsiveness
Unpacking Formative Assessment
Columns: Where the learner is going; Where the learner is now; How to get the learner there
Rows: Teacher; Peer; Student
• Clarifying, sharing, and understanding learning intentions
• Eliciting evidence of learning
• Providing feedback that moves learners forward
• Activating students as learning resources for one another
• Activating students as owners of their own learning
Responsive teaching
The learner’s role
Before you can begin
Reasons not to do formative assessment
• Higher achievement isn’t needed
• These students lack the aptitude
• I don’t need to improve; I get great results
• It’s not relevant to my subject
• I don’t have time
• We have a syllabus to cover
• I’m doing it already
• Parents won’t like it
Designing for scale
• “In-principle” scalability
– A single model for a whole school
– Formative assessment as both generic and domain-specific
• Understanding what it means to scale (Coburn, 2003)
– Depth
– Sustainability
– Spread
– Shift in reform ownership
• Consideration of the diversity of contexts of application
• Clarity about components, and the theory of action
Using logic models to evaluate progress
What makes effective teacher learning?
Collaboration and teacher quality
Kraft and Papay (2014)
The knowing-doing gap (Pfeffer 2000)
Statement                                      We know we should do this   We are doing this
Getting ideas from other units in the chain    4.9                         4.0
Instituting an active suggestions program      4.8                         3.9
Detailed assessment processes for new hires    5.0                         4.2
Posting all jobs internally                    4.2                         3.5
Talking openly about learning from mistakes    4.9                         4.3
Providing employees with frequent feedback     5.7                         5.2
Sharing information on financial performance   4.3                         3.8
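The size of the knowing-doing gap for each statement is the difference between the two ratings. A minimal sketch of that subtraction, using the figures from the table on this slide (the variable names are illustrative only):

```python
# (statement, "we know we should do this", "we are doing this"),
# copied from the table on the slide.
ratings = [
    ("Getting ideas from other units in the chain", 4.9, 4.0),
    ("Instituting an active suggestions program", 4.8, 3.9),
    ("Detailed assessment processes for new hires", 5.0, 4.2),
    ("Posting all jobs internally", 4.2, 3.5),
    ("Talking openly about learning from mistakes", 4.9, 4.3),
    ("Providing employees with frequent feedback", 5.7, 5.2),
    ("Sharing information on financial performance", 4.3, 3.8),
]

# Round to one decimal place to match the precision of the source ratings.
gaps = [(statement, round(know - do, 1)) for statement, know, do in ratings]

for statement, gap in gaps:
    print(f"{gap:.1f}  {statement}")
```

Every statement shows a positive gap, from 0.5 to 0.9 rating points: in each case respondents rate what they know they should do above what they report doing.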
Knowing more than we can say
• Six video extracts of a person delivering cardiopulmonary resuscitation (CPR):
– Five of the video extracts feature students
– One of the video extracts features an expert
• Videos shown to three groups:
– students, experts, instructors
• Success rate in identifying the expert:
– Experts 90%
– Students 50%
– Instructors 30%
Klein & Klein (1981)
Why research hasn’t changed teaching
• The nature of expertise in teaching
• Aristotle’s main intellectual virtues
– Episteme: knowledge of universal truths
– Techne: ability to make things
– Phronesis: practical wisdom
• What works is not the right question
– Everything works somewhere
– Nothing works everywhere
– The interesting question is “under what conditions does this work?”
• Teaching is mainly a matter of phronesis, not episteme
Knowledge creation and conversion
• From tacit to tacit knowledge: Socialization (sympathised knowledge), through sharing experience
• From tacit to explicit knowledge: Externalization (conceptual knowledge), through dialogue
• From explicit to explicit knowledge: Combination (systemic knowledge), through networking
• From explicit to tacit knowledge: Internalization (operational knowledge), through learning by doing
Nonaka and Takeuchi (1995)
So much for the easy bit
Formative assessment for teaching
Columns: Where the learner is going; Where the learner is now; How to get the learner there
Rows: Teacher; Peer; Student
• Clarifying, sharing, and understanding learning intentions and criteria for success
• Eliciting evidence of development
• Providing feedback that moves teachers forward
• Activating teachers as learning resources for one another
• Activating teachers as owners of their own learning
Excuses
• Lack of time
• Lack of trust
• Unsexy
• Big problems require big solutions
• Lack of leadership
• Administrivia
• The knowing-doing gap
• The research evidence is unclear
• Culture of individual accountability
• Requires sophisticated systems of support
• Focus on knowing that, rather than knowing how
A case study in one district
• Cannington
– Urban school district serving ~20,000 students
– Approximately 20% of the population non-white
– No schools under threat of re-constitution, but all under pressure to improve test scores
• Funding for a project on “better learning through smarter teaching”
– Focus on mathematics, science and modern foreign languages (MFL)
– Commitment from Principals in November 2007
– Initial workshops in July 2008
Progress of TLCs in Cannington
Math Science MFL
Ash 1 — 1 — 0 —
Cedar 5 ▮ 1 ▮ 3 ▮ ▮
Hawthorne 4 ▮ ▮ 10 ▮ ▮ 5 ▮ ▮ ▮ ▮
Hazel 7 — 12 — 2 —
Larch 1 ▮ ▮ ▮ ▮ 0 ▮ 0 ▮
Mallow 6 ▮ ▮ ▮ 7 ▮ 3 ▮ ▮
Poplar 11 ▮ 3 ▮ ▮ ▮ 1 ▮ ▮ ▮
Spruce 7 ▮ ▮ ▮ ▮ 8 ▮ ▮ ▮ 5 ▮ ▮ ▮
Willow 2 ▮ 5 ▮ 2 ▮ ▮ ▮ ▮
Totals 44 47 21
Black nos. show teachers attending launch event; blue bars show progress of TLC
Progress of TLCs in Cannington
[Scatter plot: progress of TLC (y-axis, 0 to 4) against number of teachers attending training event (x-axis, 0 to 12)]
Correlation: 0.01
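The correlation can be recomputed from the Cannington table of TLC progress. The sketch below rests on two assumptions that the slides do not state: that progress is measured by the number of bars shown for each school and subject, and that Ash and Hazel (shown with a dash rather than bars) are excluded.

```python
import math

# (teachers attending launch event, number of progress bars) for each school,
# in the order Cedar, Hawthorne, Larch, Mallow, Poplar, Spruce, Willow.
# Assumption: Ash and Hazel, shown with a dash in the table, are left out.
tlc_data = {
    "Math": [(5, 1), (4, 2), (1, 4), (6, 3), (11, 1), (7, 4), (2, 1)],
    "Science": [(1, 1), (10, 2), (0, 1), (7, 1), (3, 3), (8, 3), (5, 1)],
    "MFL": [(3, 2), (5, 4), (0, 1), (3, 2), (1, 3), (5, 3), (2, 4)],
}


def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Pool all 21 school-subject combinations into one list of points.
pairs = [point for subject in tlc_data.values() for point in subject]
r = pearson(pairs)

print(len(pairs), round(r, 2))
```

Under these assumptions the result rounds to 0.01, in line with the figure on the slide: attendance at the launch event tells us essentially nothing about how far a TLC progressed.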
Why every school should do Pareto analysis
• Vilfredo Pareto (1848-1923)
– Economist, philosopher, and sociologist, associated with the 80:20 rule
• Pareto improvement
– A change that can make at least one person (e.g., a student) better off without making anyone else (e.g., a teacher) worse off.
• Pareto efficiency/Pareto optimality
– An allocation (e.g., of resources) is Pareto efficient or Pareto optimal when there are no more Pareto improvements
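The definition of a Pareto improvement translates directly into a check. The sketch below is illustrative only: the people and the "wellbeing" scores are hypothetical, and it assumes each person's position can be summarised by a single number.

```python
def is_pareto_improvement(before, after):
    """True if nobody is worse off and at least one person is better off."""
    if before.keys() != after.keys():
        raise ValueError("before and after must cover the same people")
    nobody_worse = all(after[p] >= before[p] for p in before)
    someone_better = any(after[p] > before[p] for p in before)
    return nobody_worse and someone_better


# Hypothetical scores for one student and one teacher.
current = {"student": 5, "teacher": 5}
change_a = {"student": 7, "teacher": 5}  # student gains, teacher unaffected
change_b = {"student": 8, "teacher": 4}  # student gains, teacher loses

print(is_pareto_improvement(current, change_a))
print(is_pareto_improvement(current, change_b))
```

Change A qualifies; change B does not, however large the student's gain, because the teacher ends up worse off.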
To find out more…
www.dylanwiliamcenter.com
www.dylanwiliam.net
https://www.ssatuk.co.uk/cpd/teaching-and-learning/embedding-formative-assessment/
Embedding Formative Assessment
Months additional progress on Attainment 8
Equivalent to a 25% increase in learning (D. Wiliam 2018)
Introducing the Embedding Formative Assessment Report
What was the impact?