The Effect of Testing on Student Achievement: 1910-2010
© 2012, Richard P PHELPS International Test Commission, 8th Conference, Amsterdam, July, 2012
The effect of testing on student achievement: 1910-2010
Richard P. PHELPS
Meta-analysis
• A method for summarizing a large research literature, with a single, comparable measure.
The effect of testing on student achievement
• 12-year-long study
• analyzed close to 700 separate studies, and more than 1,600 separate effects
• 2,000 other studies were reviewed and found incomplete or inappropriate
• lacking sufficient time and money, hundreds of other studies will not be reviewed
Looking for studies to include in the meta-analyses
1. Included only those studies that found an effect from testing on student achievement or on teacher instruction…
Studies included in the meta-analyses
2. …when:
• a test is newly introduced, or newly removed
• quantity of testing is increased or reduced
• test stakes are introduced or increased, or removed or reduced
Studies included in the meta-analyses
3. …plus previous research summaries (e.g.)
• Kulik, Kulik, Bangert-Drowns, & Schwalb (1983-1991) on:
– mastery testing,
– frequency of testing, and
– programs for high-risk university students
• Basol & Johanson (2009) on testing frequency
• Jaekyung Lee (2007) on cross-state studies
• W.J. Haynie (2007) in career-tech ed
Number of studies of effects, by methodology type
Methodology type                                  Number of studies   Number of effects
Quantitative                                      177                 640
Surveys and public opinion polls (US & Canada)    247                 813
Qualitative                                       245                 245
TOTAL                                             669                 1698
Effect size: Cohen’s d
$d = (Y_E - Y_C) / S_{\text{pooled}}$
$Y_E$ = mean, experimental group
$Y_C$ = mean, control group
$S_{\text{pooled}}$ = pooled standard deviation
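As a minimal sketch of the definition above, the function below computes Cohen's d from two lists of scores. The function name, the sample scores, and the use of the usual sample-variance pooling formula are assumptions made for illustration; the slides do not state how the pooled standard deviation was computed in each study.

```python
import math
from statistics import mean, variance


def cohens_d(experimental, control):
    """Cohen's d: difference in group means over the pooled standard deviation."""
    n_e, n_c = len(experimental), len(control)
    # Pooled variance, weighting each group's sample variance by its degrees of freedom
    # (an assumed, conventional choice of pooling).
    pooled_var = (
        (n_e - 1) * variance(experimental) + (n_c - 1) * variance(control)
    ) / (n_e + n_c - 2)
    return (mean(experimental) - mean(control)) / math.sqrt(pooled_var)


# Hypothetical scores, invented for this example only.
d = cohens_d([2, 4, 6], [1, 3, 5])
print(d)  # 0.5: the means differ by 1 and the pooled standard deviation is 2
```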
Effect size: Other formulae
$d = t \sqrt{(n_1 + n_2) / (n_1 n_2)}$
$d = 2r / \sqrt{1 - r^2}$
$d = \left[ (Y_{E,\text{post}} - Y_{E,\text{pre}}) - (Y_{C,\text{post}} - Y_{C,\text{pre}}) \right] / S_{\text{pooled, post}}$
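The first two conversions can be sketched as small helper functions for studies that report only a t statistic or a correlation. The function names and the sample inputs are hypothetical; the formulas are the standard conversions from t (with group sizes n1 and n2) and from r.

```python
import math


def d_from_t(t, n1, n2):
    """Convert an independent-samples t statistic to d, given the two group sizes."""
    return t * math.sqrt((n1 + n2) / (n1 * n2))


def d_from_r(r):
    """Convert a correlation coefficient r to d (requires -1 < r < 1)."""
    return 2 * r / math.sqrt(1 - r ** 2)


# Hypothetical inputs, invented for this example only.
print(d_from_t(2.0, 8, 8))  # t = 2.0 with two groups of 8
print(d_from_r(0.6))        # r = 0.6
```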
Effect size: Interpretation
• d between 0.25 & 0.50 weak effect
• d between 0.50 and 0.75 medium effect
• d more than 0.75 strong effect
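The thresholds above can be expressed as a small lookup function. This is a sketch under two assumptions that the slide leaves open: values exactly at 0.50 and 0.75 are assigned to the medium band, and values below 0.25 (which the slide does not label) are reported as falling below the weak range. Using the magnitude of d for negative values is also an assumption.

```python
def interpret_effect(d):
    """Label an effect size using the bands listed on the slide."""
    size = abs(d)  # assumption: classify by magnitude
    if size > 0.75:
        return "strong"
    if size >= 0.50:  # assumption: boundary values go to the medium band
        return "medium"
    if size >= 0.25:
        return "weak"
    return "below the weak range"  # the slide gives no label for this band


for value in (0.30, 0.55, 0.88):
    print(value, interpret_effect(value))
```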
Quantitative studies
(population coverage ≈ 7 million persons)
Quantitative studies: Effect size
• “Bare bones” calculation:
d ≈ +0.55 …a medium effect
• Bare bones effect size adjusted for measurement error
d ≈ +0.71 …a stronger effect
• Using same-study-author aggregation
d ≈ +0.88 …a strong effect
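The slide does not say how same-study-author aggregation was carried out. One plausible reading, shown here purely as an assumption, is that effects are first averaged within each author (or author team) and the per-author means are then averaged, so that an author who contributes many effects does not dominate the result. The function names, author labels, and effect sizes below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def simple_mean(effects):
    """Unweighted mean over all (author, d) pairs."""
    return mean(d for _, d in effects)


def author_aggregated_mean(effects):
    """Average effects within each author first, then average the author means."""
    by_author = defaultdict(list)
    for author, d in effects:
        by_author[author].append(d)
    return mean(mean(ds) for ds in by_author.values())


# Hypothetical data: author "A" reports three effects, author "B" reports one.
effects = [("A", 0.4), ("A", 0.6), ("A", 0.5), ("B", 1.1)]
print(simple_mean(effects))             # every effect counts equally
print(author_aggregated_mean(effects))  # every author counts equally
```

With these made-up numbers the two summaries differ because author "A" supplies three of the four effects; that is the kind of imbalance aggregation is meant to address.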
Which predictors matter?
Treatment group…                                           Mean effect size
…is made aware of performance, and control group is not    +0.98
…receives targeted instruction (e.g., remediation)         +0.96
…is tested with higher stakes than control group           +0.87
…is tested more frequently than control group              +0.85
More Moderators – Source of Test
Source of test           Number of studies   Mean effect size
Researcher or Teacher    87                  0.93
National                 24                  0.87
Commercial               38                  0.82
State or District        11                  0.72
Total                    160
More Moderators – Sponsor of Test
Sponsor of test    Number of studies   Mean effect size
International      5                   1.02
Local              99                  0.93
National           45                  0.81
State              11                  0.64
Total              160
More Moderators - Study Design
Study design                     Number of studies   Mean effect size
Pre-post                         12                  0.97
Experiment, Quasi-experiment     107                 0.94
Multivariate                     26                  0.80
Experiment, posttest only        7                   0.60
Pre-post (with shadow test)      8                   0.58
Total                            160
More Moderators – Scale of Analysis
Scale of analysis    Number of studies   Mean effect size
Aggregated           9                   1.60
Small-scale          118                 0.91
Large-scale          33                  0.57
Total                160
More Moderators – Scale of Administration
Scale of administration    Number of studies   Mean effect size
Classroom                  115                 0.95
Mid-scale                  6                   0.72
Large-scale                39                  0.71
Total                      160
Surveys and opinion polls
Percentage of survey items, by respondent group and type of survey
[Figure: chart of percent of survey items (vertical axis, 0 to 50) for public opinion polls and program evaluation surveys*, with separate series for education providers and education consumers]
Number and percent of survey items, by test stakes and target group
Test stakes   Number   %      Target group   Number   %
High          507      62     Students       393      46
Medium        184      23     Schools        281      33
Low           33       4      Teachers       116      14
Unknown       89       11     No stakes      64       7
TOTAL         813             TOTAL          854
Opinion polls, by year
• 244 between 1958 and 2008, in the U.S. & Canada
• 813 unique question-response combinations
• close to 700,000 individual respondents
[Figure: chart by year (horizontal axis: Year, 1960 to 2005; vertical axis: 0 to 120)]
Surveys and opinion polls: Regular standardized tests, performance tests
Respondent opinion                        Regular tests (N ≈ 125), d   Performance tests (N ≈ 50), d
Achievement is increased                  1.2                          1.0
…weighted by size of study population     1.9                          0.5
Instruction is improved                   1.0                          1.4
…weighted by size of study population     0.9                          0.9
Tests help align instruction              1.0                          1.0
…weighted by size of study population     0.5                          0.9
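The rows labelled "weighted by size of study population" can be read as population-weighted means of the effect sizes. The slides do not give the exact weighting scheme, so the sketch below assumes a plain weighted average in which each effect counts in proportion to its respondent population. The function name and the data are hypothetical.

```python
def weighted_mean_effect(effects):
    """Mean effect size, weighting each (d, population) pair by its population."""
    total_population = sum(n for _, n in effects)
    return sum(d * n for d, n in effects) / total_population


# Hypothetical data: one small survey with a large effect, one large survey
# with a smaller effect.
effects = [(1.5, 100), (0.5, 300)]
unweighted = sum(d for d, _ in effects) / len(effects)
print(unweighted)                     # 1.0
print(weighted_mean_effect(effects))  # 0.75
```

Weighting pulls the summary toward the larger survey, which is why the weighted and unweighted rows in the table can differ noticeably.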
Qualitative studies: Summary
(One cannot calculate an effect size.)
Qualitative studies, by methodology type
Methodology                                       Number of studies   %
Case study                                        120                 43
Experiment or pre-post study                      21                  7
Interviews (individual or group)                  75                  27
Journal                                           2                   1
Review of official records, documents, reports    33                  12
Research review                                   8                   3
Survey                                            22                  8
TOTAL                                             281                 100
Qualitative studies: Effect on student achievement
Direction of effect   Number of studies   Percent of studies   Percent without the inferred
Positive              204                 84                   93
Positive inferred     24                  10
Mixed                 5                   2                    2
No change             8                   3                    4
Negative              3                   1                    1
TOTAL                 244                 100                  100
244 studies conducted in the past century in over 30 countries
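The "percent without the inferred" column can be reproduced from the study counts by dropping the 24 "positive inferred" studies and recomputing each share of the remaining 220. The short check below uses the counts from the table; the function name is hypothetical.

```python
counts = {
    "Positive": 204,
    "Positive inferred": 24,
    "Mixed": 5,
    "No change": 8,
    "Negative": 3,
}


def percent_excluding(counts, excluded):
    """Rounded percentage of each category after dropping one category."""
    kept = {name: n for name, n in counts.items() if name != excluded}
    total = sum(kept.values())
    return {name: round(100 * n / total) for name, n in kept.items()}


print(percent_excluding(counts, "Positive inferred"))
# {'Positive': 93, 'Mixed': 2, 'No change': 4, 'Negative': 1}
```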
Qualitative studies: Testing improves student achievement and teacher instruction
Achievement is improved   Number of studies   %
Yes                       200                 95
Mixed results             1                   <1
No                        10                  5
TOTAL                     211                 100

Instruction is improved   Number of studies   %
Yes                       158                 96
No                        7                   4
TOTAL                     165                 100
Qualitative studies: Variation by rigor and test stakes
By level of rigor
Direction of effect   High   Medium   Low   Total
Positive              95     67       42    204
Positive inferred     10     8        6     24
Mixed                 3      1        1     5
No change             4      3        1     8
Negative              1      1        1     3
TOTAL                 113    80       51    244

By stakes
Direction of effect   High   Medium   Low   Unknown   Total
Positive              133    27       38    6         204
Positive inferred     12     5        7     -         24
Mixed                 4      -        1     -         5
No change             2      1        5     -         8
Negative              3      -        -     -         3
TOTAL                 154    33       51    6         244
Qualitative studies: Regular standardized tests and performance tests
Study results                             Regular tests (N = 176), %   Performance tests (N = 69), %
Generally positive                        93                           95
High-stakes tests                         71                           42
High level of study rigor                 46                           48
Student attitudes toward test positive    60                           71
Teacher attitudes toward test positive    55                           80
Student achievement improved              95                           95
Instruction improved                      92                           100
Large-scale testing                       86                           68
An enormous research literature
• But, assertions that it does not exist at all are common
– Some claims are made by those who oppose standardized testing, and may be wishful thinking
– Others are “firstness” claims
Dismissive research reviews
• With a dismissive research literature review, a researcher assures all that no other researcher has studied the same topic
Firstness claims
• With a firstness claim, a researcher insists that he or she is the first to ever study a topic
Social costs are enormous
• Research conducted by those without power or celebrity is dismissed: ignored and lost
• Public policies are skewed, based exclusively on the research results of those with power or celebrity
• Society pays again and again for research that has already been done