Leadership of teacher learning - SSAT

Dylan Wiliam (@dylanwiliam)

Leadership of teacher learning

www.dylanwiliam.net

INTERNAL

Outline: Five questions

• Where should our efforts be focused?

• Where does formative assessment fit in?

• What makes effective teacher learning?

• What doesn’t get done?

• How will we know it’s working?

Evaluating teaching


Do we know a good teacher when we see one?

• Experiment 1

– Seven teachers (3 high-performing, 4 not)

• Group 1: at least 0.5 sd above mean value-added for 3 years

• Group 2: never 0.5 sd above average value-added in 3 years

– 7 video clips shown to 100 raters

– Average number of correct ratings: 2.8 (out of 7)

Distribution of total correct ratings (percentage of raters)

Number correct: 0 1 2 3 4 5 6 7

Percentage of raters: 1% 11% 29% 36% 13% 9% 1% 0%

Strong, Gargani, and Hacifazlioğlu (2011)
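The reported average of 2.8 can be recovered from the distribution above by weighting each score by its share of raters. The sketch below does this and, as a point of comparison, computes what a rater would score on average by pure guessing. The guessing baseline is an assumption, not something stated in the study: it treats each clip as an independent high/low call with a 50% chance of being right.

```python
# Distribution of total correct ratings (out of 7 clips), Experiment 1, in % of raters
distribution = {0: 1, 1: 11, 2: 29, 3: 36, 4: 13, 5: 9, 6: 1, 7: 0}


def mean_correct(dist: dict[int, float]) -> float:
    """Weighted mean score; dividing by the total share absorbs rounding in the percentages."""
    total = sum(dist.values())
    return sum(score * share for score, share in dist.items()) / total


N_CLIPS = 7
observed = mean_correct(distribution)
# Assumption: a guesser is right on each clip with probability 0.5
chance = N_CLIPS * 0.5

print(f"observed mean: {observed:.2f} of {N_CLIPS}")  # observed mean: 2.80 of 7
print(f"pure-guessing expectation: {chance:.2f}")     # pure-guessing expectation: 3.50
```

On these numbers, the raters' average of 2.8 sits below the 3.5 that the assumed guessing baseline would give.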


Ratings by rater type

Rater type Number of raters Accuracy (%)

Teachers 10 37

Parents 7 37

Mentors 10 47

University professors 9 41

Administrators 10 31

Teacher educators 10 31

College students 11 36

Math educators 10 34

Other adults 11 43

Primary school students 12 50


What if the difference is larger?

• Experiment 2

– Two groups of teachers (4 teachers in each group)

• Group 1: at least 0.5 sd above average value-added

• Group 2: at least 0.5 sd below average value-added

– 8 video clips shown to 165 experienced school leaders

– Average number of correct ratings: 3.85 (out of 8)

Distribution of total correct ratings (percentage of raters)

Number correct: 0 1 2 3 4 5 6 7 8

Percentage of raters: 1% 3% 11% 25% 25% 24% 9% 1% 0%
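A natural baseline for this distribution is what raters would score by pure guessing. The sketch below assumes (this is not stated in the study) that each of the 8 clips is an independent high/low call with a 50% chance of being right, so the number correct follows a binomial distribution. It prints that baseline next to the observed percentages and recomputes the observed mean; because the observed percentages sum to 99 after rounding, the mean is normalised by their total.

```python
from math import comb

# Observed distribution of correct ratings (out of 8 clips), Experiment 2, in % of raters
observed = [1, 3, 11, 25, 25, 24, 9, 1, 0]
N_CLIPS = 8


def guessing_distribution(n: int, p: float = 0.5) -> list[float]:
    """Binomial(n, p) probabilities, in %, for 0..n correct ratings (assumed guessing model)."""
    return [100 * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


expected = guessing_distribution(N_CLIPS)
observed_mean = sum(k * pct for k, pct in enumerate(observed)) / sum(observed)

for k, (obs, exp) in enumerate(zip(observed, expected)):
    print(f"{k} correct: observed {obs:>2}%, pure guessing {exp:4.1f}%")
print(f"observed mean {observed_mean:.2f} vs guessing mean {N_CLIPS * 0.5:.2f}")  # 3.85 vs 4.00
```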

Can we identify good teachers after training?


Framework for teaching (Danielson 1996)

• Four domains of professional practice

– Planning and preparation

– Classroom environment

– Instruction

– Professional responsibilities

• Links with student achievement (Sartain et al. 2011)

– Domains 1 and 4: no impact on student achievement

– Domains 2 and 3: some impact on student achievement


Observations and teacher quality

[Chart: percentage change in rate of learning (reading and mathematics) by observation rating: Unsatisfactory, Basic, Proficient, Distinguished]

Sartain, Stoelinga, Brown, Luppescu, Matsko, Miller, Durwood, Jiang, and Glazer (2011)

So the highest-rated teachers are 30% more productive than the lowest-rated

But the best teachers are 400% more productive than the least effective


Imprecision in lesson observations

Hill, Charalambous and Kraft (2012)

Achieving a reliability of 0.9 in judging teacher quality through lesson observation is likely to require observing a teacher teaching 6 different classes, and for each lesson to be judged by 5 independent observers.
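Why both more lessons and more observers are needed can be sketched with the standard generalizability-theory (D-study) expression for the reliability of a teacher's average rating. This assumes each teacher is observed in n_l lessons, each lesson is scored by n_r raters, σ²_p is the true variance between teachers, and error comes from teacher-by-lesson variation (σ²_pl), teacher-by-rater variation (σ²_pr) and a residual (σ²_plr,e). It is a generic textbook form under that assumed design, not necessarily the exact model fitted by Hill, Charalambous and Kraft.

```latex
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \frac{\sigma^2_{pl}}{n_l} + \frac{\sigma^2_{pr}}{n_r} + \frac{\sigma^2_{plr,e}}{n_l\,n_r}}
```

Adding observers shrinks only the rater-related error terms and adding lessons shrinks only the lesson-related ones, so reaching 0.9 typically requires increasing both.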


Bias in lesson observations

• A study of 834 teachers from six large US school districts found that teachers were more likely to be given a higher observation rating if they were teaching students with higher achievement.

• Compared with teachers teaching the lowest-achieving students (bottom 20%), those teaching the highest-achieving students (top 20%) were:

– 2.5 times as likely to be top-rated in English

– 6 times as likely to be top-rated in mathematics

Steinberg and Garrett (2016)

Can we identify good teachers from test scores?


Short-term and long-term effects

• Data: 10,534 students attending the US Air Force Academy (USAFA), 2000-2007

• Students randomly allocated to calculus instructors


Carrell and West (2010)

• Less qualified, less experienced instructors: higher end-of-course scores, lower scores on follow-on courses, higher end-of-course evaluations

• More qualified, more experienced instructors: lower end-of-course scores, higher scores on follow-on courses, lower end-of-course evaluations

Can we identify good teachers by combining evidence from different sources?


Measures of Effective Teaching project

• Three sources of evidence on teacher effectiveness

– Value-added estimates

– Classroom observation

– Student perception surveys

• Prediction accuracy maximised with these weights

– Value-added estimates 81%

– Classroom observation 17%

– Student perception surveys 2%

Bill and Melinda Gates Foundation (2012)
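A minimal sketch of how such weights might combine the three sources into a single composite, assuming each measure has first been standardized (z-scores) so that the weights are comparable. The helper name and the example values are hypothetical; the MET project's own composite may have been constructed differently.

```python
# Weights reported as maximising prediction accuracy in the MET project
WEIGHTS = {"value_added": 0.81, "observation": 0.17, "student_survey": 0.02}


def composite_score(z_scores: dict[str, float]) -> float:
    """Hypothetical helper: weighted sum of one teacher's standardized (z-score) measures."""
    missing = WEIGHTS.keys() - z_scores.keys()
    if missing:
        raise ValueError(f"missing measures: {sorted(missing)}")
    return sum(weight * z_scores[name] for name, weight in WEIGHTS.items())


# Hypothetical teacher: average value-added, one sd above average on observation and surveys
teacher = {"value_added": 0.0, "observation": 1.0, "student_survey": 1.0}
print(round(composite_score(teacher), 2))  # 0.19
```

Because value-added carries 81% of the weight, a teacher one standard deviation above average on both observations and surveys moves the composite by only 0.19 standard deviations.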


For secondary English teachers (S1 to S3)

Correlation with standardized test score gains 0.69

Correlation with higher-order assessments 0.29

Reliability 0.51

To achieve a reliability of 0.9 in predicting a teacher’s quality, you would need to collect data on each teacher for 9 years
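The figure of 9 years is consistent with the Spearman-Brown prophecy formula, which predicts the reliability of an average over k independent, equally reliable measurements. The sketch below assumes each year of data is one such measurement with reliability 0.51, and inverts the formula to find the number of years needed to reach 0.9.

```python
import math


def spearman_brown(r: float, k: float) -> float:
    """Reliability of the average of k parallel measurements, each with reliability r."""
    return k * r / (1 + (k - 1) * r)


def measurements_needed(r: float, target: float) -> int:
    """Smallest whole number of measurements whose average reaches the target reliability."""
    k = (target * (1 - r)) / (r * (1 - target))
    return math.ceil(k)


single_year = 0.51  # reliability reported above, assumed here to be one year of data
years = measurements_needed(single_year, 0.90)
print(years)                                         # 9
print(round(spearman_brown(single_year, years), 3))  # 0.904
```

The unrounded solution is about 8.6 years; eight years of data would give a reliability of about 0.89, just short of the target.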


This is what a correlation of 0.69 looks like…

[Scatterplot: predicted vs. actual]

…and this is a correlation of 0.29…

[Scatterplot: predicted vs. actual]

What is the impact of removing less effective teachers?


Effects of removing low-performing teachers

• Data: reading scores for 4th and 5th grade students in Florida’s public schools from 2004-05 to 2008-09

• A total of 227,014 students (96%) are matched to 15,152 teachers responsible for teaching reading

• A value-added score is estimated for each teacher each year

• Two policy options explored for teacher removal:

– Value-added score below threshold for two consecutive years

– Two-year average value-added score below threshold

Winters and Cowen (2013)
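The difference between the two removal rules can be made concrete with a small sketch. It assumes each teacher has one value-added score per year (most recent last) and that the threshold is a fixed value, for example a chosen percentile of the score distribution. The function names, scores, and threshold are hypothetical, not taken from Winters and Cowen's analysis.

```python
def below_two_consecutive_years(scores: list[float], threshold: float) -> bool:
    """Policy 1: the two most recent yearly scores are both below the threshold."""
    return len(scores) >= 2 and scores[-1] < threshold and scores[-2] < threshold


def two_year_average_below(scores: list[float], threshold: float) -> bool:
    """Policy 2: the mean of the two most recent yearly scores is below the threshold."""
    return len(scores) >= 2 and (scores[-1] + scores[-2]) / 2 < threshold


# Hypothetical value-added scores (most recent year last) and threshold
threshold = -0.10
teachers = {
    "A": [-0.15, -0.12],  # below the threshold in both years
    "B": [-0.30, 0.00],   # one very poor year, one average year
    "C": [0.05, -0.20],   # one good year, one poor year
}
for name, scores in teachers.items():
    # Prints: A True True / B False True / C False False
    print(name,
          below_two_consecutive_years(scores, threshold),
          two_year_average_below(scores, threshold))
```

Teacher B shows how the two rules can diverge: a single very low year can pull the two-year average below the threshold even though the other year is not below it, whereas the consecutive-years rule requires both years to fall below.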


System-wide impact

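Increase in teacher value-added (VA) and extra weeks of learning per student per year, by policy and severity (percentile threshold)

• Below threshold for two consecutive years

– 5th percentile: VA +.003, 0.0 weeks

– 10th percentile: VA +.006, 0.1 weeks

– 25th percentile: VA +.020, 0.3 weeks

• Two-year average below threshold

– 5th percentile: VA +.020, 0.3 weeks

– 10th percentile: VA +.031, 0.4 weeks

– 25th percentile: VA +.050, 0.7 weeks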

What does this all mean?

• The only way to improve student achievement at scale is to invest in the teachers we already have

• The “love the one you’re with” strategy


Evaluation vs. improvement

• Evaluation frameworks:

– of necessity, have to be comprehensive

– include all aspects of teachers’ work

– at best, incentivize improvement on all aspects of practice

– at worst, incentivize improvement on aspects of practice that are easy to improve

• Improvement frameworks:

– are selective

– focus on those aspects of practice with the biggest payoff for students

• To maximize improvement, evaluation frameworks have to be used selectively

The ‘next big thing’


Things that don’t work

• Getting smarter people into teaching

• Paying good teachers more

• Brain Gym®

• Learning styles

• Copying other countries


Things that might work

• Differentiation

• Lesson study/Learning study

• Social and emotional aspects of learning

• Educational neuroscience

• Grit


Things that do work—a bit

• Firing bad teachers

• Class size reduction

• Growth mindset

There is no ‘next big thing’

Just lots of small, mostly old, things


Understanding meta-analysis

• A technique for aggregating results from different studies by expressing results with a common measure

• Problems with meta-analysis

– Inappropriate comparisons

– Aptitude x treatment interaction

– The “file drawer” problem

– Variations in intervention quality

– Selection of studies

• Problems with effect sizes

– Variation in population variability

– Sensitivity of outcome measures


Meta-analysis in education

• Some problems are unavoidable:

– Aptitude x treatment interactions

– Sensitivity to instruction

– Selection of studies

• Some problems are avoidable:

– Inappropriate comparisons

– File-drawer problems

– Intervention quality

– Variation in variability

• Unfortunately, many of the people doing meta-analysis in education:

– don’t discuss the unavoidable problems, and

– don’t avoid the avoidable ones


So what does this mean?

• Meta-analysis is hard to do well anywhere

• In education

– Meta-analysis is really hard to do well

– Meta-meta-analysis is impossible to do well

• Rejoinders

– The effects average out

– The rank order of effects is still OK

– There is no reason to suppose that these are the case

• Conclusion

– Meta-meta-analysis is an unsound basis for determining the impact of any educational intervention on student achievement


Learning from research

• Four questions we should ask of research

1. Does it solve a problem we have?

2. How much extra achievement will it yield?

3. How much will it cost?

4. Can we implement it here?

• Right now there are 2 “best bets”

– A knowledge-rich curriculum

– Greater use of classroom formative assessment

Formative assessment

• Long-cycle

– Span: across terms, teaching units

– Length: four weeks to one year

– Impact: monitoring, curriculum alignment

• Medium-cycle

– Span: within and between teaching units

– Length: one to four weeks

– Impact: student-involved assessment

• Short-cycle

– Span: within and between lessons

– Length: minute-by-minute and day-by-day

– Impact: engagement, responsiveness

Unpacking Formative Assessment

Three processes (where the learner is going, where the learner is now, how to get the learner there) crossed with three agents (teacher, peer, student):

• Clarifying, sharing, and understanding learning intentions (teacher, peer, student: where the learner is going)

• Eliciting evidence of learning (teacher: where the learner is now)

• Providing feedback that moves learners forward (teacher: how to get the learner there)

• Activating students as learning resources for one another (peer)

• Activating students as owners of their own learning (student)

Responsive teaching

The learner’s role

Before you can begin


Reasons not to do formative assessment

• Higher achievement isn’t needed

• These students lack the aptitude

• I don’t need to improve; I get great results

• It’s not relevant to my subject

• I don’t have time

• We have a syllabus to cover

• I’m doing it already

• Parents won’t like it


Designing for scale

• “In-principle” scalability

– A single model for a whole school

– Formative assessment as both generic and domain-specific

• Understanding what it means to scale (Coburn, 2003)

– Depth

– Sustainability

– Spread

– Shift in reform ownership

• Consideration of the diversity of contexts of application

• Clarity about components, and the theory of action


Using logic models to evaluate progress

What makes effective teacher learning?


Collaboration and teacher quality

Kraft and Papay (2014)

The knowing-doing gap (Pfeffer 2000)

Statement (mean ratings: “We know we should do this”, then “We are doing this”)

Getting ideas from other units in the chain 4.9 4.0

Instituting an active suggestions program 4.8 3.9

Detailed assessment processes for new hires 5.0 4.2

Posting all jobs internally 4.2 3.5

Talking openly about learning from mistakes 4.9 4.3

Providing employees with frequent feedback 5.7 5.2

Sharing information on financial performance 4.3 3.8
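The size of the gap for each practice is simple arithmetic on the two columns above. The sketch below ranks the practices by the difference between knowing and doing; the rating scale itself is not given on the slide, so only the differences are meaningful here.

```python
# (statement, "we know we should do this", "we are doing this"), mean ratings from the table
ratings = [
    ("Getting ideas from other units in the chain", 4.9, 4.0),
    ("Instituting an active suggestions program", 4.8, 3.9),
    ("Detailed assessment processes for new hires", 5.0, 4.2),
    ("Posting all jobs internally", 4.2, 3.5),
    ("Talking openly about learning from mistakes", 4.9, 4.3),
    ("Providing employees with frequent feedback", 5.7, 5.2),
    ("Sharing information on financial performance", 4.3, 3.8),
]

# Gap = knowing minus doing, rounded to the table's one decimal place
gaps = [(statement, round(know - do, 1)) for statement, know, do in ratings]
gaps.sort(key=lambda item: item[1], reverse=True)  # stable sort: ties keep table order

for statement, gap in gaps:
    print(f"{gap:.1f}  {statement}")

mean_gap = sum(gap for _, gap in gaps) / len(gaps)
print(f"mean gap: {mean_gap:.1f}")  # mean gap: 0.7
```

Every practice is rated lower on doing than on knowing, with gaps from 0.5 to 0.9.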


Knowing more than we can say

• Six video extracts of a person delivering cardiopulmonary resuscitation (CPR):

– Five of the video extracts feature students

– One of the video extracts features an expert

• Videos shown to three groups:

– students, experts, instructors

• Success rate in identifying the expert:

– Experts 90%

– Students 50%

– Instructors 30%

Klein & Klein (1981)


Why research hasn’t changed teaching

• The nature of expertise in teaching

• Aristotle’s main intellectual virtues

– Episteme: knowledge of universal truths

– Techne: ability to make things

– Phronesis: practical wisdom

• “What works?” is not the right question

– Everything works somewhere

– Nothing works everywhere

– The interesting question is: under what conditions does this work?

• Teaching is mainly a matter of phronesis, not episteme

Knowledge creation and conversion

• Socialization (tacit to tacit): sympathised knowledge, built by sharing experience

• Externalization (tacit to explicit): conceptual knowledge, built through dialogue

• Combination (explicit to explicit): systemic knowledge, built through networking

• Internalization (explicit to tacit): operational knowledge, built by learning by doing

Nonaka and Takeuchi (1995)

So much for the easy bit

Formative assessment for teaching

• Clarifying, sharing, and understanding learning intentions and criteria for success

• Eliciting evidence of development

• Providing feedback that moves teachers forward

• Activating teachers as learning resources for one another

• Activating teachers as owners of their own learning

Excuses

• Lack of time

• Lack of trust

• Unsexy

• Big problems require big solutions

• Lack of leadership

• Administrivia

• The knowing-doing gap

• The research evidence is unclear

• Culture of individual accountability

• Requires sophisticated systems of support

• Focus on knowing that, rather than knowing how


A case study in one district

• Cannington

– Urban school district serving ~20,000 students

– Approximately 20% of the population non-white

– No schools under threat of re-constitution, but all under pressure to improve test scores

• Funding for a project on “better learning through smarter teaching”

– Focus on mathematics, science and modern foreign languages (MFL)

– Commitment from Principals in November 2007

– Initial workshops in July 2008

Progress of TLCs in Cannington

School Math Science MFL

Ash 1 — 1 — 0 —

Cedar 5 ▮ 1 ▮ 3 ▮ ▮

Hawthorne 4 ▮ ▮ 10 ▮ ▮ 5 ▮ ▮ ▮ ▮

Hazel 7 — 12 — 2 —

Larch 1 ▮ ▮ ▮ ▮ 0 ▮ 0 ▮

Mallow 6 ▮ ▮ ▮ 7 ▮ 3 ▮ ▮

Poplar 11 ▮ 3 ▮ ▮ ▮ 1 ▮ ▮ ▮

Spruce 7 ▮ ▮ ▮ ▮ 8 ▮ ▮ ▮ 5 ▮ ▮ ▮

Willow 2 ▮ 5 ▮ 2 ▮ ▮ ▮ ▮

Totals 44 47 21

Numbers show teachers attending the launch event; bars show the progress of each TLC

Progress of TLCs in Cannington

[Scatterplot: progress of TLC (0 to 4) against number of teachers attending training event (0 to 12)]

Correlation: 0.01

Why every school should do Pareto analysis

• Vilfredo Pareto (1848-1923)

– Economist, philosopher, and sociologist, associated with the 80:20 rule

• Pareto improvement

– A change that can make at least one person (e.g., a student) better off without making anyone else (e.g., a teacher) worse off.

• Pareto efficiency/Pareto optimality

– An allocation (e.g., of resources) is Pareto efficient or Pareto optimal when there are no more Pareto improvements
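A minimal sketch of the two definitions above, assuming each option's effect can be summarised as a tuple of outcomes, one per affected group, where higher is better for every group (for example, teacher workload expressed so that higher means lighter). The options and numbers are hypothetical, and efficiency is judged only among the options listed.

```python
Outcome = tuple[float, ...]


def is_pareto_improvement(before: Outcome, after: Outcome) -> bool:
    """True if no group is worse off and at least one group is strictly better off."""
    no_one_worse = all(a >= b for a, b in zip(after, before))
    someone_better = any(a > b for a, b in zip(after, before))
    return no_one_worse and someone_better


def pareto_efficient(options: dict[str, Outcome]) -> list[str]:
    """Options for which no other listed option would be a Pareto improvement."""
    return [
        name
        for name, outcome in options.items()
        if not any(is_pareto_improvement(outcome, other) for other in options.values())
    ]


# Hypothetical options scored as (student outcome, teacher outcome); higher is better for both
options = {
    "A": (1.0, 1.0),
    "B": (1.2, 0.6),
    "C": (1.2, 1.1),
    "D": (1.1, 1.3),
}
print(is_pareto_improvement(options["A"], options["C"]))  # True
print(is_pareto_improvement(options["A"], options["B"]))  # False: teachers worse off
print(pareto_efficient(options))                          # ['C', 'D']
```

Options A and B are not efficient because C improves on both without making anyone worse off; C and D remain because moving between them helps one group at the other's expense.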


To find out more…


http://www.dylanwiliamcenter.com/

http://www.dylanwiliam.net/


https://www.ssatuk.co.uk/cpd/teaching-and-learning/embedding-formative-assessment/


Embedding Formative Assessment

Months of additional progress on Attainment 8

Equivalent to a 25% increase in learning (D. Wiliam 2018)

Introducing the Embedding Formative Assessment Report

What was the impact?

https://webcontent.ssatuk.co.uk/wp-content/uploads/2017/09/15085809/SSAT-Embedding-Formative-Assessment-Report.pdf
