Using Student Test Scores to Distinguish Good Teachers from Bad
Edward Haertel
School of Education
Stanford University
AERA Presidential Session
Measuring and Developing Teacher Effectiveness: An Assessment of Research, Policy, and Practice
New Orleans, Louisiana
April 10, 2011
FRAMING THE PROBLEM
Policy makers wish to measure teachers' effectiveness
This goal is pursued using complex statistics to model student test scores
Complicating Factors
Start-of-year student achievement varies due to
◦Home background and community context
◦Individual interests and aptitudes
◦Peer culture
◦Prior schooling
Complicating Factors
End-of-year student achievement varies due to
◦Start-of-year differences
◦Continuing effects of out-of-school factors, peers, and individual aptitudes and interests
◦Instructional effectiveness
Complicating Factors
Instructional effectiveness reflects
◦District and state policies
◦School policies and climate
◦Available instructional materials and resources
◦Student attendance
◦The teacher
Two Simplified Assumptions
Teaching matters, and some teachers teach better than others
Simplified to
There is a stable construct we may refer to as a teacher’s “effectiveness”
Two Simplified Assumptions
Student achievement is a central goal of schooling
Valid tests can measure achievement
Simplified to
Achievement is a one-dimensional continuum
Brief, inexpensive achievement tests locate students on that continuum
More later on these simplified assumptions.
Let us press ahead.
Logic of the Statistical Model
What is a “Teacher Effect”?
◦Student growth (change in test score) attributable to the teacher
◦I.e., caused by the teacher
Logic of the Statistical Model
$$\text{Teacher Effect on One Student} = \text{Student's Observed Score} - \text{Student's Predicted Score}$$
“Predicted Score” is Counterfactual – an estimate of what would have been observed with a hypothetical average teacher, all else being equal
These (student-level) “Teacher Effects” are averaged up to the classroom level to obtain an overall score for the teacher.
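The subtract-then-average logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "predicted score" here comes from a simple least-squares line on prior-year score fitted to all students, and the data and function names (`fit_line`, `teacher_scores`) are hypothetical. Operational value-added models are considerably more elaborate.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def teacher_scores(records):
    """records: list of (teacher, prior_score, observed_score).

    Returns {teacher: mean of (observed - predicted) over that teacher's students}.
    """
    slope, intercept = fit_line([r[1] for r in records], [r[2] for r in records])
    residuals = {}
    for teacher, prior, observed in records:
        predicted = intercept + slope * prior  # assumed stand-in for the counterfactual
        residuals.setdefault(teacher, []).append(observed - predicted)
    return {t: mean(v) for t, v in residuals.items()}

# Hypothetical data: (teacher, prior-year score, current-year score)
records = [
    ("A", 40, 52), ("A", 50, 61), ("A", 60, 72),
    ("B", 40, 46), ("B", 50, 57), ("B", 60, 66),
]
scores = teacher_scores(records)
```

With these made-up numbers, teacher A's students score above the fitted line on average and teacher B's below it, so A receives a positive score and B a negative one. Everything then depends on whether the prediction really is the right counterfactual.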
In or out?
District leadership
School norms, academic press
Quality of school instructional staff
Early childhood history; medical history
Quality of schooling in prior years
Parent involvement
Assignment of pupils (to schools, to classes)
Peer culture
Students’ school attendance histories
…
Controlling for prior-year score is not sufficient
First problem (Measurement Error): prior-year achievement is imperfectly measured
Second problem (Omitted variables): models with additional variables predict different prior-year true scores as a function of
◦additional test scores
◦demographic / out-of-school factors
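One well-known consequence of the first problem can be shown with a small simulation. This is an illustration under assumed numbers, not an analysis from the talk: two hypothetical classrooms differ only in their students' true starting achievement, teachers contribute nothing, and the prior-year score is observed with noise. Because the noisy prior score under-adjusts for true starting differences, the class with the stronger intake still ends up with a positive average residual.

```python
import random
from statistics import mean

random.seed(0)

def simulate_class(true_mean, n):
    """Return (observed_prior, current) pairs; teachers add nothing here."""
    rows = []
    for _ in range(n):
        true_prior = random.gauss(true_mean, 1.0)
        observed_prior = true_prior + random.gauss(0.0, 0.7)  # assumed measurement error
        current = true_prior + random.gauss(0.0, 0.5)         # same growth process for all
        rows.append((observed_prior, current))
    return rows

high = simulate_class(+1.0, 2000)   # hypothetical class with higher-achieving intake
low = simulate_class(-1.0, 2000)    # hypothetical class with lower-achieving intake
pooled = high + low

# Pooled least-squares regression of current score on the noisy prior score.
mx = mean(x for x, _ in pooled)
my = mean(y for _, y in pooled)
slope = (sum((x - mx) * (y - my) for x, y in pooled)
         / sum((x - mx) ** 2 for x, _ in pooled))
intercept = my - slope * mx

def mean_residual(rows):
    """Average of observed minus predicted current score for one class."""
    return mean(y - (intercept + slope * x) for x, y in rows)

effect_high = mean_residual(high)
effect_low = mean_residual(low)
```

In this setup the fitted slope falls below 1 even though true prior achievement carries over one-for-one, and `effect_high` is positive while `effect_low` is negative, despite there being no teacher effect in the simulated data.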
Controlling for prior-year score is not sufficient
Third problem (Different trajectories): students with identical prior-year true scores have different expected growth depending on
◦individual aptitudes
◦out-of-school supports for learning
◦prior instructional histories
A small digression: Student Growth Percentiles
Construction
◦Each student’s SGP score is the percentile rank of that student’s current-year score within the distribution for students with the same prior-year score
[Figure: current-year score (vertical axis) plotted against prior-year score (horizontal axis)]
Student Growth Percentiles
Interpretation
◦How much this student has grown relative to others who began at the “same” (prior-year) starting point
Advantages
◦Invariant under monotone transformations of score scale
◦Directs attention to distribution of outcomes, versus point estimate
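The construction and the invariance property can both be checked with a short Python sketch. It takes the definition literally and groups students by exactly equal prior-year scores, which is a simplifying assumption; the data, the tie-handling rule (ties count as half), and the names `growth_percentiles` and `rescale` are hypothetical.

```python
from collections import defaultdict

def growth_percentiles(students):
    """students: list of (prior_score, current_score).

    Returns one percentile rank (0-100) per student: the share of peers with
    the same prior score whose current score is lower, counting ties as half.
    """
    groups = defaultdict(list)
    for prior, current in students:
        groups[prior].append(current)
    result = []
    for prior, current in students:
        peers = groups[prior]
        below = sum(1 for c in peers if c < current)
        ties = sum(1 for c in peers if c == current)  # includes the student
        result.append(100.0 * (below + 0.5 * ties) / len(peers))
    return result

# Hypothetical scale scores: (prior-year, current-year)
students = [(300, 310), (300, 325), (300, 340), (300, 355),
            (350, 340), (350, 360), (350, 380), (350, 400)]
sgp = growth_percentiles(students)

def rescale(score):
    """An arbitrary strictly increasing transformation of the score scale."""
    return (score / 100.0) ** 3

rescaled = [(rescale(p), rescale(c)) for p, c in students]
sgp_rescaled = growth_percentiles(rescaled)
```

Two students here share a current-year score of 340, but the one who started at 300 has an SGP of 62.5 and the one who started at 350 has an SGP of 12.5. The rescaled data yield exactly the same percentiles, which is the invariance listed under Advantages.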
Is anything really new here?
Thanks to Andrew Ho and Katherine Furgol for this graphic
EXAMINING THE EVIDENCE
Stability of “effectiveness” estimates
◦ That first “simplified assumption”
Problems with the tests
◦ That second “simplified assumption”
Random assignment?
Professional consensus
Stability of “Effectiveness” Estimates
Newton, Darling-Hammond, Haertel, & Thomas (2010) compared high school math and ELA teachers’ VAM scores across
◦Statistical models
◦Courses taught
◦Years
Full report at http://epaa.asu.edu/ojs/article/view/810
Sample* for Math and ELA VAM Analyses

| Academic Year | 2005-06 | 2006-07 |
|---|---|---|
| Math teachers | 57 | 46 |
| ELA teachers | 51 | 63 |
| Students, Grade 9 | 646 | 881 |
| Students, Grade 10 | 714 | 693 |
| Students, Grade 11 | 511 | 789 |

*Sample included all teachers who taught multiple courses. Ns in table are for teachers x courses. There were 13 math teachers for 2005-06 and 10 for 2006-07. There were 16 ELA teachers for 2005-06 and 15 for 2006-07.
Findings from Newton, et al.
% of Teachers Whose Effectiveness Ratings Change …

| | By at least 1 decile | By at least 2 deciles | By at least 3 deciles |
|---|---|---|---|
| Across models* | 56-80% | 12-33% | 0-14% |
| Across courses* | 85-100% | 54-92% | 39-54% |
| Across years* | 74-93% | 45-63% | 19-41% |

*Depending on the model
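For readers who want to see how figures of this kind are tallied, here is a minimal Python sketch. It is not the procedure used by Newton et al.; the decile rule, the ten made-up teacher scores, and the function names are assumptions for illustration only.

```python
def decile_ranks(scores):
    """Map each score to a decile rank from 1 (lowest) to 10 (highest)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for position, i in enumerate(order):
        ranks[i] = position * 10 // len(scores) + 1
    return ranks

def share_changing(scores_a, scores_b, k):
    """Percent of teachers whose decile rank differs by at least k."""
    a, b = decile_ranks(scores_a), decile_ranks(scores_b)
    moved = sum(1 for x, y in zip(a, b) if abs(x - y) >= k)
    return 100.0 * moved / len(a)

# Hypothetical effectiveness estimates for the same ten teachers in two years
year1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
year2 = [1, 0, 2, 3, 7, 5, 6, 4, 8, 9]

pct_1 = share_changing(year1, year2, 1)
pct_2 = share_changing(year1, year2, 2)
pct_3 = share_changing(year1, year2, 3)
```

In this toy example four of the ten teachers move by at least one decile and two move by at least three. The same comparison could be run across models or across courses by swapping in the corresponding pairs of estimates.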
One Extreme Case: An English language arts teacher
[Figure: the teacher's decile rank in Y1 and Y2 (axis 0 to 10), alongside % ELL, % Low-income, and % Hispanic students in Y1 and Y2 (axis 0 to 80)]
Comprehensive high school
Not a beginning teacher
White
Teaching English I
Estimates control for:
◦ Prior achievement
◦ Demographics
◦ School fixed effect
EXAMINING THE EVIDENCE
Stability of “effectiveness” estimates
◦ That first “simplified assumption”
Problems with the tests
◦ That second “simplified assumption”
Random assignment?
Professional consensus
11th Grade History/Social Studies
US11.11.2. Discuss the significant domestic policy speeches of Truman, Eisenhower, Kennedy, Johnson, Nixon, Carter, Reagan, Bush, and Clinton (e.g., education, civil rights, economic policy, environmental policy).
Item Testing US11.11.2
9th Grade English-Language Arts
9RC2.8 Expository Critique: Evaluate the credibility of an author’s argument or defense of a claim by critiquing the relationship between generalizations and evidence, the comprehensiveness of evidence, and the way in which the author’s intent affects the structure and tone of the text (e.g., in professional journals, editorials, political speeches, primary source material).
Item Testing 9RC2.8
Algebra I
25.1 Students use properties of numbers to construct simple, valid arguments (direct and indirect) for, or formulate counterexamples to, claimed assertions.
Item Testing 25.1
High School Biology
BI6.f Students know at each link in a food web some energy is stored in newly made structures but much energy is dissipated into the environment as heat. This dissipation may be represented in an energy pyramid.
Item Testing BI6.f
EXAMINING THE EVIDENCE
Stability of “effectiveness” estimates
◦ That first “simplified assumption”
Problems with the tests
◦ That second “simplified assumption”
Random assignment?
Professional consensus
Student Assignments Affected By
Teachers’ particular specialties
Children’s particular requirements
Parents’ requests
Principals' judgments
Need to separate children who do not get along
Teacher Assignments Affected By
Differential salaries / working conditions
Seniority / experience
Match to school’s culture and practices
Residential preferences
Teachers’ particular specialties
Children’s particular requirements
Does Non-Random Assignment Matter? A falsification test
Logically, future teachers cannot influence past achievement
Thus, if a model predicts significant effects of current-year teachers on prior-year test scores, then it is flawed or based on flawed assumptions
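The logic of the falsification test can be illustrated with a toy simulation in Python. It is a sketch under deliberately extreme assumptions, not a reproduction of any published analysis: prior-year gains are random noise, current-year teachers have had no contact with these students yet, and the "tracked" scenario sorts students into classrooms perfectly by prior gain. All names and sizes are hypothetical.

```python
import random
from statistics import mean, pstdev

random.seed(1)

N_STUDENTS, N_CLASSES = 1000, 20
# Prior-year gains, generated before any current-year teacher is involved.
prior_gain = [random.gauss(0.0, 1.0) for _ in range(N_STUDENTS)]

def class_means(assignment):
    """Mean prior-year gain for each current-year classroom."""
    by_class = {}
    for cls, gain in zip(assignment, prior_gain):
        by_class.setdefault(cls, []).append(gain)
    return [mean(v) for v in by_class.values()]

# Random assignment: classroom labels shuffled independently of prior gains.
random_assignment = [i % N_CLASSES for i in range(N_STUDENTS)]
random.shuffle(random_assignment)

# Tracked assignment (assumed, extreme): sort by prior gain, then split.
size = N_STUDENTS // N_CLASSES
order = sorted(range(N_STUDENTS), key=lambda i: prior_gain[i])
tracked_assignment = [0] * N_STUDENTS
for position, i in enumerate(order):
    tracked_assignment[i] = position // size

# Spread of apparent "effects" of current teachers on last year's gains.
spread_random = pstdev(class_means(random_assignment))
spread_tracked = pstdev(class_means(tracked_assignment))
```

Under random assignment the classroom means of prior-year gains differ only by sampling noise. Under tracking they differ widely, so a model that reads classroom differences as teacher effects would credit this year's teachers with last year's gains.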
Falsification Test Findings
Rothstein (2010) examined three VAM specifications using a large data set and found “large ‘effects’ of fifth grade teachers on fourth grade test score gains.”
Falsification Test Findings
Briggs & Domingue (2011) applied Rothstein’s test to LAUSD teacher data analyzed by Richard Buddin for the LA Times
◦For Reading, ‘effects’ from next year’s teachers were about the same as from this year’s teachers
◦For Math, ‘effects’ from next year’s teachers were about 2/3 to 3/4 as large as from this year’s teachers
EXAMINING THE EVIDENCE
Stability of “effectiveness” estimates
◦ That first “simplified assumption”
Problems with the tests
◦ That second “simplified assumption”
Random assignment?
Professional consensus
Professional Consensus
We do not think that their analyses are estimating causal quantities, except under extreme and unrealistic assumptions. – Donald Rubin
Professional Consensus
The research base is currently insufficient to support the use of VAM for high-stakes decisions about individual teachers or schools. – Researchers from RAND Corp.
Professional Consensus
VAM estimates of teacher effectiveness that are based on data for a single class of students should not be used to make operational decisions because such estimates are far too unstable to be considered fair or reliable.
– 2009 Letter Report from the Board on Testing and Assessment, National Research Council
UNINTENDED EFFECTS
Narrowing of curriculum and instruction
◦What doesn’t get tested doesn’t get taught
Instructional focus on students expected to make the largest or most rapid gains
◦Student winners and losers will depend on details of the model used
Erosion of teacher collegial support and cooperation
VALID AND INVALID USES
VALID
◦Low-stakes
◦Aggregate-level interpretations
◦Background factors as similar as possible across groups compared
INVALID
◦High-stakes, individual-level decisions, comparisons across highly dissimilar schools or student populations
UNINTENDED EFFECTS
“The most pernicious effect of these [test-based accountability] systems is to cause teachers to resent the children who don’t score well.”
—Anonymous teacher, in a workshop many years ago
Thank you