
Vol. 75, No. 3, pp. 365-383.

©2009 Council for Exceptional Children.

Exceptional Children

Determining Evidence-Based Practices in Special Education

BRYAN G. COOK, University of Hawaii at Manoa

MELODY TANKERSLEY, Kent State University

TIMOTHY J. LANDRUM, University of Virginia

ABSTRACT: Determining evidence-based practices is a complicated enterprise that requires analyzing the methodological quality and magnitude of the available research supporting specific practices. This article reviews criteria and procedures for identifying what works in the fields of clinical psychology, school psychology, and general education; and it compares these systems with proposed guidelines for determining evidence-based practices in special education. The authors then summarize and analyze the approaches and findings of the 5 reviews presented in this issue. In these reviews, prominent special education scholars applied the proposed quality indicators for high-quality research and standards for evidence-based practice to bodies of empirical literature. The article concludes by synthesizing these scholars' preliminary recommendations for refining the proposed quality indicators and standards for evidence-based practices in special education, as well as the process for applying them.

Such forces as the standards-based education movement, the mandated participation of students with disabilities in state proficiency testing, inclusion, and the recognition that many students with disabilities are capable of higher levels of academic and social attainment than previously expected have driven an intensified focus on improving outcomes for students with disabilities. Perhaps because many factors that may inhibit the outcomes of students with disabilities are beyond the direct control of educators (e.g., poverty, limited resources, attitudes), special educators have tended to focus their attention on one determinant of students' outcomes over which they have always exercised primary control—teaching practices. Unfortunately, many teachers of students with disabilities have implemented teaching practices shown to have little effect on student outcomes while eschewing many research-based practices (e.g., B. G. Cook & Schirmer, 2003; Kauffman, 1996). In an effort to bridge this research-to-practice gap, lawmakers have emphasized practices that research has shown to be effective in such legislation as the No Child Left Behind Act of 2001 and the Individuals With Disabilities Education Act of 2004. Researchers in special education have also conducted initial research on how to effectively support teachers in adopting and maintaining the use of research-based or evidence-based practices (see Wanzek & Vaughn, 2006, for a review of this research). Despite the considerable interest in basing instructional practices on research evidence, special educators have not yet established definitively which practices are or are not evidence-based or settled on a systematic process for determining evidence-based practices (EBPs).

EVIDENCE-BASED PRACTICES

All interventions are not equal; some are much more likely than others to positively affect student outcomes (Forness, Kavale, Blum, & Lloyd, 1997). Simple logic appears to suggest that, in general, teachers should prioritize the use of instructional practices that are most likely to bring about desired student outcomes. Although some contend that research cannot reliably determine which educational practices produce desired gains in student outcomes (e.g., Gallagher, 1998), we proceed under the positivist assumption that it can (Lloyd, Pullen, Tankersley, & Lloyd, 2006). The use of EBPs, or those practices shown by research to work, seems particularly imperative in special education. As Dammann and Vaughn (2001) suggested, whereas many nondisabled students make adequate progress under a variety of instructional conditions, students with disabilities require the most effective teaching techniques to succeed. However, advocating for implementing EBPs in special education begs two critical questions: what are EBPs, and how can researchers identify them?


Determining whether a practice is evidence-based involves a number of issues: What types of research designs should researchers consider? How many studies with converging findings are necessary to instill confidence that a practice is effective? How methodologically rigorous must a study be for the results to be meaningful? To what extent must an intervention affect student outcomes for researchers to consider it effective? Although other issues certainly affect the difficult business of determining EBPs, we limit our discussion to these four issues—research design, quantity of research, methodological quality, and magnitude of effect.

RESEARCH DESIGN

A generally accepted tenet of educational research holds that research designs exhibiting experimental control most appropriately address the question of whether a practice works (B. G. Cook, Tankersley, Cook, & Landrum, 2008). We recognize that no research design can completely rule out all alternative explanations for findings when conducted in the real-world settings of schools and classrooms; however, some designs do so more meaningfully than others. By using a control group, randomly assigning participants to groups, and actively introducing the intervention to the experimental group, group experimental designs can produce reliable knowledge claims regarding whether an intervention affects student outcomes (L. H. Cook, Cook, Landrum, & Tankersley, 2008). We are not implying that experimental research is better than other research designs; rather, different types of research address different questions, and researchers should use them accordingly. Should true experiments be the only research design considered in determining EBPs? Can quasi-experiments, single-subject research (SSR), correlational research, and qualitative research also meaningfully determine whether a practice works?

QUANTITY OF RESEARCH

The process of conducting educational research and accumulating knowledge from research is tentative and cumulative (Rumrill & Cook, 2001). Because of the recognized vagaries in conducting field-based educational research (Berliner, 2002), it seems unwise to place too much faith in the results of a single study regardless of its design, effect size, or methodological rigor. Certainly, as more studies with converging evidence accrue, research consumers can have greater confidence in those findings. But how many studies supporting a practice are sufficient to reasonably conclude that it works?

METHODOLOGICAL QUALITY

The methodological rigor with which a study is conducted affects the confidence that one can have in its findings. For example, evidence of acceptable implementation fidelity seems to be a necessary feature of a trustworthy study. If the researchers did not implement the intervention as designed, they can draw no meaningful conclusion about the effectiveness of the practice. Indeed, Simmerman and Swanson (2001) reported that the presence of desirable methodological features in a study (e.g., controlling for teacher effects, using appropriate units of analysis in analyzing data, reporting psychometric properties of measurement tools) significantly corresponds with lower effect sizes. Examining and accounting for the methodological quality of studies in determining EBPs therefore appears important. Should researchers determine EBPs by using only studies of high methodological quality? What methodological features are critically important for a high-quality study?

MAGNITUDE OF EFFECT

EBPs should have a considerable and meaningful—as opposed to trivial—positive effect on student outcomes. Researchers have traditionally gauged the impact of an intervention in group studies by using tests of statistical significance, which estimate the likelihood that differences between the groups occurred by chance. However, in part because of concerns that studies involving a large number of participants can yield statistically significant findings even when outcomes may not be educationally meaningful, researchers have begun to report effect sizes (e.g., Cohen's d), which sample size does not affect, to help interpret an intervention's effect (American Psychological Association, 2001). Although Cohen (1988) suggested values for interpreting effect sizes as small, medium, and large, he was careful to point out that researchers should not consider these subjective guidelines to be absolute standards. How is the effect of an intervention best evaluated? If researchers use effect sizes to assess the impact of a practice, how large an effect is necessary to indicate a meaningful change? If researchers use SSR studies to determine EBPs in special education, how should they evaluate the effect of the intervention?
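For readers unfamiliar with the metric, Cohen's d for a two-group comparison is conventionally computed as the difference between the treatment and control means divided by a pooled standard deviation (the notation below is illustrative; the article does not prescribe a particular formula):

$$ d = \frac{\bar{X}_T - \bar{X}_C}{s_{pooled}}, \qquad s_{pooled} = \sqrt{\frac{(n_T - 1)s_T^2 + (n_C - 1)s_C^2}{n_T + n_C - 2}} $$

Under Cohen's (1988) oft-cited, and explicitly nonabsolute, benchmarks, values near 0.2, 0.5, and 0.8 are typically read as small, medium, and large effects, respectively.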

The importance of using practices shown by research to be the most effective is by no means unique to special education (Odom et al., 2005). The medical field generally receives credit for pioneering efforts in this area, with evidence-based medicine becoming prominent in the 1990s (see Sackett, Rosenberg, Gray, Haynes, & Richardson, 1996). Such other professions as clinical psychology (see Chambless et al., 1996, 1998); school psychology (see Kratochwill & Stoiber, 2002; Task Force on Evidence-Based Interventions in School Psychology, 2003); and general education (see What Works Clearinghouse, WWC, n.d.a) have followed suit, developing criteria and procedures for identifying EBPs in their fields. To contextualize efforts to determine EBPs in special education, we briefly review the criteria and standards for determining EBPs in these three fields.

DETERMINING EVIDENCE-BASED PRACTICES IN RELATED FIELDS

The Division 12 (Division of Clinical Psychology) Task Force on Promotion and Dissemination of Psychological Procedures (1995) delineated criteria, which Chambless et al. updated in 1996 and 1998, for well-established treatments and probably efficacious treatments in clinical psychology. Subsequently, in the field of school psychology, Division 16 and the Society for the Study of School Psychology Task Force developed a detailed system for coding and describing multiple aspects of research studies (Task Force on Evidence-Based Interventions in School Psychology, 2003). Instead of categorizing the degree to which interventions are evidence-based, the coding system generated by the school psychology team provides a detailed description of a research base, from which consumers "draw their own conclusions based on the evidence provided" regarding the sufficiency of research supporting an intervention (Kratochwill & Stoiber, 2002, p. 360).

In general education, the WWC, established in 2002 by the U.S. Department of Education's Institute of Education Sciences, rates reviewed practices as having positive, potentially positive, mixed, no discernible, potentially negative, or negative effects (WWC, n.d.b). This section examines how these three diverse approaches for identifying what works in fields closely related to special education treat the issues of research design, quantity of research, methodological quality, and magnitude of effect.

RESEARCH DESIGN

Clinical Psychology. Chambless et al. (1998) considered only studies employing between-group experimental and SSR designs in determining both well-established treatments and probably efficacious treatments.

School Psychology. The Task Force on Evidence-Based Interventions in School Psychology (2003) aims to provide descriptions of group research, SSR, confirmatory program evaluation, and qualitative research (Kratochwill & Stoiber, 2002). Coding manuals are currently available for group research and SSR, but are being expanded to include criteria for qualitative research and confirmatory program evaluation (T. Kratochwill, personal communication, September 26, 2008). Kratochwill and Stoiber suggested that coding nonexperimental research studies (i.e., qualitative and confirmatory program evaluation) provides information on a broad range of research relevant to consumers but did not indicate that these different research designs contribute equally to determining whether a practice works.

General Education. The WWC (2008) considers only randomized controlled trials and quasi-experimental studies (i.e., quasi-experiments with equating, regression discontinuity designs, and SSR) when determining the effectiveness of an intervention. The WWC classifies studies as meeting evidence standards, meeting evidence standards with reservations, or not meeting evidence standards. Only randomized controlled studies can meet evidence standards without reservation. Quasi-experimental studies that satisfy the WWC's methodological criteria, as well as randomized controlled studies with methodological limitations, can meet evidence standards with reservations. Methodological criteria for SSR and regression discontinuity designs have been under development since September 2006 but are not yet available (WWC).

QUANTITY OF RESEARCH

Clinical Psychology. The Division 12 Task Force considers a psychological treatment well-established when at least two good between-group design experiments or nine SSR studies support it (Chambless et al., 1998). The clinical psychology task force considers a treatment to be possibly efficacious when supported by at least (a) one group experiment that meets all methodological criteria for group experiments except the requirement for multiple investigators, (b) two group experiments that produce superior outcomes in comparison with a wait-list control group, or (c) three SSR studies that meet all SSR criteria except the requirement for multiple investigators.

School Psychology. Because the school psychology task force did not seek to categorize practices regarding their effectiveness, it did not establish criteria related to the number of required studies for evidence-based classifications.

General Education. The WWC (n.d.b) requires at least one or two studies for a practice or curriculum to be considered as having positive, potentially positive, mixed, potentially negative, or negative effects. The specific number and type of studies required varies within and between these categories of effectiveness. For example, a positive effect requires two or more studies showing statistically significant positive effects, at least one of which meets WWC evidence standards without reservations, and no studies showing statistically significant or substantively important negative effects. A potentially positive effect, however, requires at least one study showing a statistically significant or substantively important positive effect, no studies showing statistically significant or substantively important negative effects, and no more studies showing indeterminate effects than studies showing statistically significant or substantively important positive effects.
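To make these decision rules concrete, the sketch below encodes them in Python. It is a paraphrase of the WWC (n.d.b) criteria as summarized in the preceding paragraph, not WWC software; the Study record and its fields are hypothetical, the list is assumed to contain only studies that meet WWC evidence standards (with or without reservations), and the remaining WWC categories are collapsed into a single catch-all.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Study:
        # Hypothetical record of one reviewed study.
        meets_standards_without_reservations: bool
        effect: str  # "sig_positive", "important_positive", "indeterminate",
                     # "important_negative", or "sig_negative"

    def wwc_rating(studies: List[Study]) -> str:
        # Paraphrase of the WWC rules for the two highest effectiveness ratings.
        sig_pos = [s for s in studies if s.effect == "sig_positive"]
        any_pos = [s for s in studies
                   if s.effect in ("sig_positive", "important_positive")]
        any_neg = [s for s in studies
                   if s.effect in ("sig_negative", "important_negative")]
        indeterminate = [s for s in studies if s.effect == "indeterminate"]

        # Positive effects: two or more statistically significant positive studies,
        # at least one meeting evidence standards without reservations, and no
        # statistically significant or substantively important negative studies.
        if (len(sig_pos) >= 2
                and any(s.meets_standards_without_reservations for s in sig_pos)
                and not any_neg):
            return "positive effects"

        # Potentially positive effects: at least one positive study, no negative
        # studies, and no more indeterminate studies than positive studies.
        if any_pos and not any_neg and len(indeterminate) <= len(any_pos):
            return "potentially positive effects"

        return "other (mixed, no discernible, potentially negative, or negative)"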

METHODOLOGICAL QUALITY

Clinical Psychology. In addition to stipulating that researchers must compare interventions with a placebo or other treatment, the Division 12 criteria for well-established treatments require that researchers (a) conduct experiments with treatment manuals, (b) clearly describe participant characteristics, and (c) have two separate investigators or investigatory teams conduct supporting studies (Chambless et al., 1998). These standards are relaxed for possibly efficacious treatments. Group experiments that compare the treatment group with a wait-list control group and that the same investigators conduct may be considered for possibly efficacious practices, as can SSR studies that the same investigators conduct.

When evidence regarding the effects of an intervention is mixed, reviewers further assess the methodological quality of studies to determine which studies to weigh more heavily (Chambless et al., 1998). Chambless and Hollon (1998) recommend assessing such methodological features as the following:

• The descriptions of samples use standard diagnostic labels assigned from a structured diagnostic interview.

• Outcome measures demonstrate acceptable reliability and validity in previous research.

• With the exception of simple procedures, the researchers follow a written treatment manual when delivering the intervention.

• Researchers avoid Type I error (e.g., adjust alpha level when conducting multiple statistical tests), control for pretest scores when comparing groups' posttest measures, and adjust analysis and interpretation if differential attrition or participation rates exist between groups.

• A stable baseline, typically with at least three data points, is established in SSR.

School Psychology. Although the Division 16 procedures do not classify studies according to their methodological quality, reviewers do rate and describe a number of methodological features—which consumers use to make informed decisions about an intervention's evidence base and effectiveness (Kratochwill & Stoiber, 2002). Reviewers evaluate studies, regardless of design, by using multiple criteria along three dimensions: general characteristics, key evidence components, and other descriptive or supplemental features. For example, researchers rate the strength of eight key components for group research on a 4-point scale. These key components are measurement, comparison group, statistical significance of outcomes, educational and clinical significance, implementation fidelity, replication, site of implementation, and follow-up assessment. In addition to providing an overall rating for each component, reviewers record additional information for most components. Regarding the comparison group, for example, reviewers select the type of comparison group from a list of options; rate their confidence in determining the type of comparison group (from very low to very high); indicate how the researchers counterbalanced change agents (by change agent, statistical, other); check how the researchers established group equivalence (e.g., random assignment, post hoc matched set, statistical matching, post hoc test for group equivalence); and check whether and how mortality was equivalent between groups.

General Education. The WWC (2008) specifies that for randomized controlled trials to meet evidence standards without reservations, (a) researchers must randomly assign participants to conditions; (b) overall and differential attrition must not be high; (c) no evidence of intervention contamination (e.g., changed expectancy, novelty, disruption, local history event) exists; and (d) researchers avoid a teacher-intervention confound by either assigning more than one teacher to each condition or by presenting evidence that teacher effects are negligible. The WWC uses similar, but less stringent, criteria for randomized controlled trials and quasi-experimental studies to meet evidence standards with reservations.

MAGNITUDE OF EFFECT

Clinical Psychology. For a group design study to support a well-established or possibly efficacious treatment, Chambless et al. (1998) require that treatment groups achieve outcomes that are statistically significantly superior to a control group or equivalent to a comparison group that received a treatment that researchers had previously determined to be well-established. With regard to SSR, Chambless and Hollon (1998) suggest that "evaluators . . . carefully examine data graphs and draw their own conclusions about the efficacy of the intervention" (p. 13).


School Psychology. Because the Division 16 Task Force (Task Force on Evidence-Based Interventions in School Psychology, 2003) coding procedures do not classify interventions in terms of their effectiveness, no criteria are specified regarding magnitude of effect. However, reviewers code study characteristics related to significance of outcomes: statistical significance, educational and clinical significance, and effect size for group studies; and visual analysis, effect size, and educational and clinical significance for SSR.

General Education. The WWC (n.d.b) uses five categories to describe the magnitude of effect for reviewed studies: statistically significant positive effects, substantively important positive effects, indeterminate effects, substantively important negative effects, and statistically significant negative effects. Substantively important effects are educationally meaningful although not statistically significant; the WWC suggests using an effect size of greater than ±0.25 as a cutoff for substantively important effects. Indeterminate effects are neither statistically significant nor have effect sizes greater than ±0.25.
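As a rough illustration, the function below maps a single group study's result onto these five categories using the ±0.25 cutoff described above; the function and its inputs are illustrative rather than an official WWC procedure.

    def classify_effect(effect_size: float, statistically_significant: bool) -> str:
        # Illustrative mapping onto the WWC (n.d.b) magnitude-of-effect categories.
        if statistically_significant and effect_size > 0:
            return "statistically significant positive effects"
        if statistically_significant and effect_size < 0:
            return "statistically significant negative effects"
        if effect_size > 0.25:
            return "substantively important positive effects"
        if effect_size < -0.25:
            return "substantively important negative effects"
        return "indeterminate effects"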

CRITIQUES OF PROCESSES FOR DETERMINING WHAT WORKS IN OTHER FIELDS

Although it is difficult to disagree with the general notion that "evidence should play a role in educational practice" (Slavin, 2008, p. 47), controversy seems to follow closely on the heels of proposals for establishing EBPs. Indeed, Kendall (1998) likened EBPs to religion and politics as lightning rods for conflict. Elliott (1998) noted that criticisms of EBPs tend to fall into one of two categories: concerns about the general endeavor of designating EBPs and disagreements with the particular standards and criteria used. Although the first category includes many important issues (e.g., Can research conclusively identify any practice as truly effective? Will approaches not labeled as evidence-based be disregarded?), this article focuses here on critiques of specific features of the three processes reviewed.

Waehler, Kalodner, Wampold, and Lichtenberg (2000) noted that some have criticized the Division 12 criteria for determining empirically validated treatments in clinical psychology for relying too heavily on randomized clinical trials, psychological diagnoses, and adherence to treatment manuals, as well as for being too lenient. Scholars in school psychology also took issue with the Division 16 coding procedures as overwhelming and overly complex (Durlak, 2002; Levin, 2002; Nelson & Epstein, 2002; Stoiber, 2002); as seeming to endorse research designs that do not permit making causal inferences (Nelson & Epstein); and for producing ambiguous, descriptive reports rather than designating EBPs (Wampold, 2002). Finally, some researchers have criticized the WWC's (2008) standards as relying too heavily on randomized controlled trials, which are extremely difficult to conduct in school settings (Kingsbury, 2006); as overly rigorous, resulting in few practices with positive effects identified (causing some to refer to the WWC as the "'nothing works' clearinghouse," Viadero & Huff, 2006, p. 8); and as politically influenced (Schoenfeld, 2006).

Criticism regarding criteria and standards for determining EBPs may be unavoidable. Establishing EBPs involves addressing a number of questions that lack any unequivocally correct answers and about which different stakeholders are bound to disagree. For example, requiring a large number of randomized controlled trials that meet stringent methodological criteria and report large effect sizes will produce a high degree of confidence in practices shown to be evidence-based. However, this approach may be unnecessarily stringent, potentially excluding meaningful studies. Yet designating practices as evidence-based because of one study or a few research studies of any design without stringent methodological standards invites false positives.

The categorization of practices represents another contentious issue for which multiple valid approaches may exist. Using a dichotomous system for labeling practices (e.g., evidence-based or not evidence-based) provides straightforward input for prioritizing instructional practices. However, a binary categorization scheme may overlook the complexities involved in interpreting bodies of research literature as well as promote the unfounded view that practices are either completely effective or completely ineffective. In contrast, whereas in-depth descriptions of a research base might facilitate nuanced and comprehensive understanding, they may be of limited practical use for practitioners seeking guidance on how to teach in their classrooms the following day.

Any approach to determining what works in special education will inevitably have limitations. This recognition does not suggest that endeavors to establish EBPs are destined to fail. Rather, the strength of a system for determining EBPs lies in matching criteria and standards with the collective traditions, values, and goals of the field that will use it. Therefore, special educators should design a system for determining what works in special education based on the unique characteristics and needs of their field. Odom et al. (2005) endeavored to delineate the "devilish details" (p. 138) of guidelines for determining EBPs rooted in the history and research traditions of special education.

PROPOSED GUIDELINES FOR EVIDENCE-BASED PRACTICES IN SPECIAL EDUCATION

As an initial step for basing practice on research, the Division for Research of the Council for Exceptional Children, under the leadership of Sam Odom, commissioned a series of papers that proposed quality indicators (QIs; i.e., features present in high-quality research studies) for four different research designs: group experimental studies (Gersten et al., 2005); SSR (Horner et al., 2005); correlational research (Thompson, Diamond, McWilliam, Snyder, & Snyder, 2005); and qualitative research (Brantlinger, Jimenez, Klingner, Pugach, & Richardson, 2005). Gersten et al. also proposed standards for determining EBPs on the basis of group experimental/quasi-experimental research, and Horner et al. proposed standards for determining EBPs on the basis of SSR. Considered together, the proposed QIs and standards constitute initial guidelines for establishing EBPs in special education. The number of prominent special education researchers who developed the proposed criteria and standards and the incorporation of feedback from special education researchers who discussed the proposed criteria and standards at a Research Project Director's Meeting (hosted by the Office of Special Education Programs; Odom et al., 2004) enhances their credibility.

The following sections examine the proposed guidelines for determining EBPs in special education and compare the proposed guidelines in special education with the systems for determining what works in clinical psychology, school psychology, and general education in relation to research design, quantity of research, methodological quality, and magnitude of effect.

RESEARCH DESIGN

We assume that because standards for EBPs were proposed only for group experimental and quasi-experimental research (Gersten et al., 2005) and SSR (Horner et al., 2005), these research designs are the only ones to consider in determining whether a practice in special education is evidence-based. The Division for Research Task Force probably based this decision on the unique ability of these designs to exhibit experimental control (Cook, Tankersley, Cook, & Landrum, 2008). Special education, clinical psychology, and general education share many similarities in their treatment of research design in determining EBPs. For example, all three fields consider group experimental studies in determining EBPs. Researchers can also consider practices as evidence-based in special education, as well-established in clinical psychology, and as having potentially positive effects (but not as having positive effects) in general education on the basis of SSR. However, whereas Gersten et al. allowed for quasi-experimental studies to constitute the sole research support for EBPs in special education, Chambless et al. (1998) did not consider quasi-experimental research in establishing empirically validated therapies in clinical psychology, and the WWC (n.d.b) requires at least one true experiment to support practices with positive effects in general education.

QUANTITY OF RESEARCH

Gersten et al. (2005) required a minimum of two high-quality group studies or four acceptable-quality group studies to consider a practice evidence-based or promising in special education. These numbers are similar to the quantity of group-design studies required for determining EBPs in clinical psychology and general education. For example, Chambless et al. (1998) required two or more group studies for a well-established treatment, and the WWC (n.d.b) calls for two or more group design studies, at least one of which must be a randomized controlled trial, to support practices with positive effects.

To consider a practice to be evidence-based in special education, Horner et al. (2005) specified a minimum of five SSR studies that involve a total of at least 20 participants and that at least three different researchers conduct across at least three different geographical locations. This number is somewhat less than the number of SSR studies (n = 9) that Chambless et al. (1998) required to deem a treatment in clinical psychology well established. By contrast, the WWC (2008) considers SSR studies as quasi-experimental designs, which cannot alone constitute sufficient evidence to deem a practice as having positive effects.
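A minimal sketch of the Horner et al. (2005) threshold follows, under the assumption that each supporting SSR study of acceptable quality is represented by a simple record; the data structure and field names are ours, not Horner et al.'s.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class SSRStudy:
        # Hypothetical record of one acceptable-quality SSR study supporting a practice.
        participants: int
        research_team: str   # e.g., lead investigator or lab
        location: str        # geographical location of the study

    def meets_horner_standard(studies: List[SSRStudy]) -> bool:
        # Horner et al. (2005): at least 5 studies, 20 total participants,
        # 3 different researchers, and 3 different geographical locations.
        return (len(studies) >= 5
                and sum(s.participants for s in studies) >= 20
                and len({s.research_team for s in studies}) >= 3
                and len({s.location for s in studies}) >= 3)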

METHODOLOGICAL QUALITY

Gersten et al. (2005) proposed four essential QIs for group experimental research in the areas of describing participants, implementing interventions and describing comparison conditions, measuring outcomes, and analyzing data. Each QI subsumes a number of specific criteria that a study must meet for it to address the QI. For example, to meet the QI of describing participants, a study must address these three criteria:

1. Was sufficient information provided to determine/confirm whether the participants demonstrated the disability(ies) or difficulties presented?

2. Were appropriate procedures used to increase the likelihood that relevant characteristics of participants in the sample were comparable across conditions?

3. Was sufficient information characterizing the interventionists or teachers provided? Did it indicate whether they were comparable across conditions? (Gersten et al., p. 152)

Gersten et al. (2005) also proposed eight desirable QIs related to attrition, reliability and data collectors, outcome measures beyond posttest, validity, detailed assessment of implementation fidelity, nature of instruction in comparison condition, audiotape or videotape excerpts regarding the intervention, and presentation of results. In addition to meeting all the essential QIs, high-quality group studies must address at least four of the desirable QIs. Acceptable studies must meet only one of the desirable QIs in addition to addressing all but one of the essential QIs.

The QIs for group studies that Gersten et al. (2005) proposed are somewhat distinct from the criteria for high-quality group research used in other fields. For example, among the study features required for a high-quality group study in special education that the WWC (2008) does not require for a group study that meets evidence standards without reservations in general education are

• Detailed descriptions of participants, setting, and independent variable, and services provided in the comparison group.

• The use of multiple outcome measures collected at appropriate times.

• Documentation of implementation fidelity.

• Appropriate units of analysis (although WWC reviews must note misalignment between units of assignment and units of analysis).

Among the features that the WWC requires for a study that meets evidence standards without reservations but that Gersten et al. does not require for high-quality group studies are overall and differential attrition not severe or accounted for (although Gersten et al. included attrition as a desirable QI), and no intervention contamination. Both sets of criteria for high-quality group studies require researchers to demonstrate the comparability of interventionists across conditions.

Horner et al. (2005) proposed QIs for SSR in special education in seven areas: describing participants and settings, dependent variable, independent variable, baseline, experimental control and internal validity, external validity, and social validity. Horner et al. proposed 21 criteria to assess the presence of these QIs. For example, to meet the dependent variable QI, a study must meet the following criteria:


1. Dependent variables are described with operational precision.

2. Each dependent variable is measured with a procedure that generates a quantifiable index.

3. Measurement of the dependent variable is valid and described with replicable precision.

4. Dependent variables are measured repeatedly over time.

5. Data are collected on the reliability or interobserver agreement (IOA) associated with each dependent variable, and IOA levels must meet minimal standards (e.g., IOA = 80%; Kappa = 60%). (Horner et al., p. 174)
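Because this last criterion is expressed as a percentage of agreement (with kappa as an alternative), the following sketch shows one common way to compute both indices from two observers' interval-by-interval records; it reflects standard formulas rather than anything specific to Horner et al. (2005).

    from typing import Sequence

    def percent_agreement(obs1: Sequence[str], obs2: Sequence[str]) -> float:
        # Point-by-point interobserver agreement, expressed as a percentage.
        agreements = sum(a == b for a, b in zip(obs1, obs2))
        return 100.0 * agreements / len(obs1)

    def cohens_kappa(obs1: Sequence[str], obs2: Sequence[str]) -> float:
        # Cohen's kappa: observed agreement corrected for chance agreement.
        n = len(obs1)
        categories = set(obs1) | set(obs2)
        p_observed = sum(a == b for a, b in zip(obs1, obs2)) / n
        p_chance = sum((list(obs1).count(c) / n) * (list(obs2).count(c) / n)
                       for c in categories)
        return (p_observed - p_chance) / (1 - p_chance)

    # Example: 10 intervals scored for occurrence (+) or nonoccurrence (-) of a behavior.
    observer_1 = list("++-+-++--+")
    observer_2 = list("++-+--+--+")
    print(percent_agreement(observer_1, observer_2))        # 90.0
    print(round(cohens_kappa(observer_1, observer_2), 2))   # 0.8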

Horner et al. (2005) indicated that reviewers use the QIs "for determining if a study meets the 'acceptable' methodological rigor needed to be a credible example of SSR" (p. 173). Horner et al. do not explicitly state whether studies must meet all the QIs to be considered of acceptable methodological quality, although we infer that they must. The QIs for high-quality SSR studies in special education overlap somewhat with the criteria for studies that support empirically validated treatments in clinical psychology. Both Chambless et al. (1998) and Horner et al. require that researchers clearly describe participant characteristics and use an appropriate SSR design. Chambless et al. require that researchers compare the intervention with a placebo or another treatment and conduct the intervention by using treatment manuals, whereas Horner et al. do not (although Horner et al. do require that researchers overtly measure fidelity of implementation of the independent variable). Horner et al. require a number of criteria that Chambless et al. do not call for, such as description of physical location, description of the dependent variable with replicable precision, acceptable levels of interobserver agreement regarding the dependent variable, and documentation of the external and social validity of the dependent variable.

MAGNITUDE OF EFFECT

For a practice to be considered evidence-based in special education, Gersten et al. (2005) proposed that the weighted effect size of group experimental studies should be significantly greater than zero. We presume that this effect size derives from only those studies found to be acceptable or of high quality vis-à-vis the QIs. For promising practices, Gersten et al. required that a 20% confidence interval for the weighted effect size across studies be greater than zero. In contrast, both clinical psychology (Chambless et al., 1998) and general education (WWC, n.d.b) use statistical significance as the standard to judge whether group studies support well-established treatments and practices with positive effects, respectively. The WWC does consider effect sizes (e.g., d ≥ 0.25) in the absence of statistically significant findings for determining that a practice has potentially positive effects.
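Gersten et al. (2005) do not prescribe a specific weighting scheme, but a common meta-analytic approach weights each of the k supporting studies' effect sizes by the inverse of its sampling variance:

$$ \bar{d} = \frac{\sum_{i=1}^{k} w_i d_i}{\sum_{i=1}^{k} w_i}, \qquad w_i = \frac{1}{v_i}, \qquad SE(\bar{d}) = \sqrt{\frac{1}{\sum_{i=1}^{k} w_i}} $$

On this reading, a weighted effect size "significantly greater than zero" means that an interval of the form \bar{d} \pm z \cdot SE(\bar{d}) at a conventional confidence level excludes zero, whereas a narrower 20% confidence interval excluding zero is a weaker requirement, consistent with the lower bar Gersten et al. set for promising practices.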

Horner et al. (2005) did not prescribe a particular effect size needed for SSR studies to support a practice. However, for the authors to consider a practice as evidence-based on the basis of SSR in special education, they required a documented causal or functional relationship between use of the practice and change in a socially important dependent variable. Horner et al. suggested that visual analysis "of the level, trend, and variability of performance occurring during baseline and intervention conditions" (p. 171) establishes a functional relationship. Visual inspection of graphic displays of student behavior involves the following:

1. Immediacy of effects following the onset and withdrawal of the practice.

2. Overlap of data points in adjacent phases.

3. Magnitude of change in the dependent variable.

4. Consistency of data patterns across conditions (Horner et al.).

Chambless and Hollon (1998) similarly suggested using visual inspection criteria to determine the effect for SSR studies in clinical psychology. The WWC (2008) is developing guidelines, which are not yet available, for assessing the magnitude of effect in SSR studies.


APPLICATIONS OF PROPOSED GUIDELINES FOR DETERMINING EBPS IN SPECIAL EDUCATION

Gersten et al. (2005) suggested that their proposed criteria and standards for determining EBPs in special education were "merely a first step," which researchers should refine "based on field-testing" (p. 163). In response, special education researchers have begun to use the proposed QIs and standards in reviews and analyses of research literature. For example, Browder, Wakeman, Spooner, Ahlgrim-Delzell, and Algozzine (2006) applied the QIs and standards for EBPs proposed by Gersten et al. (2005) and Horner et al. (2005) to 128 intervention studies (88 SSR studies and 40 group quasi-experimental studies) that investigated reading outcomes for individuals with significant cognitive disabilities. Browder et al. condensed the seven QIs and 21 criteria that Horner et al. proposed for SSR into four categories:

• Dependent variable operationally defined and included data on reliability.

• Methods adequately described.

• Data collected on procedural fidelity.

• Baseline and experimental control (with particular focus on between- and within-participant replications).

Two coders independently coded the presence of these four categories for all 88 SSR studies. Interrater agreement was 100% in each category except procedural fidelity, for which interrater agreement was 93%. Fifty-six of the SSR studies met all four of Browder et al.'s (2006) categories of QIs for SSR. From these studies, massed trial as well as systematic prompting met Horner et al.'s (2005) standards for an EBP (i.e., at least five supporting studies involving a minimum of 20 total participants, conducted by at least three different researchers in at least three different locations) for the outcomes of sight-word vocabulary, picture vocabulary, and comprehension. The researchers determined that time delay was also an EBP for sight-word vocabulary and fluency and that pictures were an EBP for comprehension for the target population.

Browder et al. (2006) also clustered the four essential and eight desirable QIs that Gersten et al. (2005) proposed for group research into four categories:

• Outcome measures—operationally defined and evidence of reliability and validity.

• Intervention clearly defined.

• Measure of procedural fidelity.

• Use of comparison group and intervention defined.

Because of the perceived level of judgment required to code these methodological categories, Browder et al. (2006) used a consensus model to establish reliability. In this consensus model, two coders discussed coding decisions until they reached agreement. Therefore, Browder et al. did not report interrater reliability (IRR). Only 2 of the 40 group studies met all four of Browder et al.'s categories for group studies, with no particular practice having sufficient empirical support to be considered evidence-based.

In reviewing the empirical literature on interventions aimed at improving self-advocacy for students with disabilities, Test, Fowler, Brewer, and Wood (2005) assessed the presence of the QIs that Gersten et al. (2005) proposed in 11 group experimental studies and that Horner et al. (2005) proposed in 11 SSR studies. Test et al. found high levels of IRR for coding the QIs in a subset of studies: means of 98.5% agreement for SSR and 98.7% agreement for group experimental research. Test et al. reported that only one of the 11 SSR studies that they reviewed met all the QIs. Although most of the SSR studies met most QIs, only six sufficiently described how participants were selected and only two described and measured procedural fidelity. Test et al. assessed 23 criteria for group studies, examining essential and desirable QIs together and including criteria regarding the conceptualization of a study (that Gersten et al. included in their QIs for research proposals). None of the group studies met all or all but one of Test et al.'s criteria. Among the criteria that few studies met: data collectors unfamiliar with study conditions (n = 4), data collectors unfamiliar with participants (n = 4), documentation of attrition (n = 3), clear descriptions of the difference between intervention and control (n = 3), and measures of procedural fidelity (n = 1). Test et al. did not apply Gersten et al.'s proposed standards to determine whether any practices evaluated in the reviewed studies were evidence-based.

On a smaller scale, we applied the proposed QIs, as literally as possible, to two group experimental studies (B. G. Cook & Tankersley, 2007) and two SSR studies (Tankersley, Cook, & Cook, 2008). Although the small scope of these pilot projects limits their generalizability, our application of Gersten et al.'s (2005) criteria indicated that the group experimental studies that we reviewed met 40% of the QI components, whereas the SSR studies that we reviewed met 48% of the QI components that Horner et al. (2005) proposed. We reported a moderately low IRR of .69 for SSR QI components (Tankersley et al.; we used a consensus model and did not assess IRR for the group QIs). We found that reliably determining whether studies addressed many of the proposed QIs was difficult because of incomplete and ambiguous reporting in the articles reviewed and because of the lack of specificity and clarity (e.g., operationalized definitions) in the proposed QIs (B. G. Cook & Tankersley; Tankersley et al.).

It is encouraging that both Browder et al. (2006) and Test et al. (2005) applied the proposed QIs and reported high IRR in their coding. However, it is important to note that Browder et al. did not apply all the QIs. Furthermore, Test et al. made no distinction between essential and desirable QIs for group studies and did not apply standards for determining EBPs. Thus, to our knowledge, no published studies have applied the specific QIs and standards for EBPs as Gersten et al. (2005) and Horner et al. (2005) proposed across an entire body of research literature. Clearly, to meaningfully determine the feasibility of applying the proposed QIs and standards and to identify aspects of the QIs and standards that researchers might fruitfully refine, researchers should conduct additional field tests.

SUMMARY AND ANALYSIS OF FIVE FIELD TESTS OF DETERMINING EBPS IN SPECIAL EDUCATION

We asked five teams of expert reviewers to faithfully apply the QIs and standards for EBPs, proposed by Gersten et al. (2005) and Horner et al. (2005), to bodies of research literature on interventions relevant to their fields of expertise. Review teams evaluated the intervention literature on five interventions frequently used with students with disabilities: cognitive strategy instruction (Montague & Dietz, 2009); repeated reading (Chard, Ketterlin-Geller, Baker, Doabler, & Apichatabutra, 2009); self-regulated strategy development (Baker, Chard, Ketterlin-Geller, Apichatabutra, & Doabler, 2009); time delay (Browder, Ahlgrim-Delzell, Spooner, Mims, & Baker, 2009); and function-based interventions (Lane, Kalberg, & Shepcaro, 2009). This section summarizes and analyzes the approaches and findings of these five reviews with the goal of making preliminary recommendations for refining the proposed QIs and standards for EBPs, as well as the process for applying them.

SCOPE OF REVIEW

Reviewers initially had to determine whether and how to delimit the scope of their review. As Browder et al. (2009) suggest, reviewers might delimit "the specific population of focus, the scope of the dependent variable to be considered . . ., and other aspects of the studies" (p. 360). Four of the five review teams identified a target population more specific than students with disabilities—Baker et al. (2009) and Chard et al. (2009) reviewed studies involving students with and at risk for learning disabilities, Browder et al. focused on students with significant cognitive disabilities, and Lane et al. (2009) targeted students with or at risk for emotional and behavioral disorders. Only Lane et al. included an age parameter, reviewing outcomes for secondary students only. The review teams varied in the degree to which they set parameters for dependent variables. Browder et al. reviewed only studies that specifically assessed picture or word recognition. Montague and Dietz (2009) and Baker et al. stated more general outcome parameters for their reviews—mathematical problem solving and writing performance, respectively. Although Chard et al. and Lane et al. did not specify such outcome variables as inclusion criteria for their reviews, their interventions are associated with particular outcome areas (i.e., reading for Chard et al. and behavioral outcomes for Lane et al.).

The specific parameters that reviewers apply represent an important concern. Using overly broad parameters (e.g., students with or at risk for disabilities) may not address such critical questions as for whom the practice works with sufficient specificity for research consumers. Conversely, overly narrow parameters may reduce the number of studies available and limit the implications of the review. Although we realize that a variety of sensible rationales exists for focusing reviews on specific groups, in the absence of a compelling rationale, we recommend that reviews focus on as broad a population as seems reasonable and meaningful and that authors carefully describe participants across studies reviewed to inform consumers about the population for whom the intervention has been shown to be effective.

DETERMINING THE PRESENCE OF QUALITY INDICATORS

A particular element of high-quality research is often neither completely present nor completely absent in a research report but instead is partially present. Recognizing this issue, Baker et al. (2009) and Chard et al. (2009) collaboratively constructed 4-point rubrics for rating the presence of QI components for group experiments and SSR. The other review teams rated each component dichotomously, as met or not met. Because relatively low IRR was associated with using the 4-point rubric, we recommend that future reviews use a dichotomous approach for classifying the presence of QIs, at least until reviewers refine a more detailed rubric that they can use with greater reliability.

Ultimately, the method of choice for identifying the presence of methodological QIs may be a philosophical issue. If the purpose of the reviews is to provide in-depth descriptions of a research base, the use of a rubric—perhaps supplemented with descriptions of the strengths and weaknesses of the literature base for each QI—may be desirable. Alternatively, if the main intent of the reviews is to yield a straightforward decision about whether a practice is evidence-based, the benefit of additional information gained by using a more detailed rating system may not be worth the cost of extra time involved in assessing and reporting the information or the possibility of decreased IRR. Of course, the goals of providing in-depth information on a research base and categorizing practices as evidence-based are not mutually exclusive. Future reviewers in special education may want to provide descriptions, use a multiple-point rating system, and employ a "yes/no checklist" approach, thereby generating reviews to serve different purposes for different audiences.

The Division for Research of the Council for Exceptional Children asked Gersten et al. (2005) and Horner et al. (2005) to identify and briefly describe, not operationally define, sets of QIs (S. L. Odom, personal communication, April 7, 2006) in their development of the QIs. Accordingly, Gersten et al. and Horner et al. stated some of the QIs somewhat subjectively. For example, Horner et al. required that the dependent variable be practical and cost-effective but did not provide concrete guidelines for determining practicality or cost-effectiveness. Accordingly, many of the review teams interpreted, and in some cases modified, the QIs for their reviews. For example, Lane et al. (2009) required that researchers explicitly describe the cost-effectiveness of their intervention. At times, review teams also expanded on the QIs. For instance, Browder et al. (2009) specified that not only must researchers overtly measure implementation fidelity but that they must also document a minimum level of 80%. Lane et al. also required that all components for the internal validity QI for SSR studies be met as a precondition for the external validity QI. In other situations, review teams for this issue reduced the criteria for certain QIs (e.g., Lane et al., 2009, and Montague & Dietz, 2009, set their criteria at 3 data points for baseline, as opposed to the 5 points that Horner et al. suggested). Browder et al. also adapted some of the SSR QIs for the specific outcomes of their review (e.g., they defined the socially important change component as learning at least five new words or pictures).

Browder et al. (2009) suggested that adapting the QIs to optimize their applicability for the intervention being reviewed should be a critical component of each review. Determining whether the QIs can be sufficiently specific and operationalized to yield reliable ratings yet flexible enough to apply meaningfully to a wide variety of studies will be a considerable challenge. Indeed, perhaps some freedom to adapt QIs may be appropriate for certain reviews. We are concerned, however, that giving review teams too much latitude to interpret and adapt QIs may, in some situations, result in reviews that vary considerably in their rigor and findings.


TABLE 1
Summary of Single-Subject Research Quality Indicators Rated as Present

Quality Indicator      Chard et al.   Lane et al.   Browder et al.   Montague &      Baker et al.   Total
                       (2009)         (2009)        (2009)           Dietz (2009)    (2009)
Participants/setting   1/6            1/12          28/30            5/5             8/9            43/62, 69%
Dependent variable     3/6            5/12          30/30            1/5             9/9            48/62, 77%
Independent variable   2/6            6/12          26/30            0/5             7/9            41/62, 66%
Baseline               3/6            7/12          29/30            5/5             8/9            52/62, 84%
Internal validity      4/6            2/12          30/30            5/5             9/9            50/62, 81%
External validity      0/6            1/12          29/30            5/5             9/9            44/62, 71%
Social validity        4/6            1/12          28/30            5/5             9/9            47/62, 76%
Total                  17/42, 40%     23/84, 27%    200/210, 95%     26/35, 74%      59/63, 94%     325/434, 75%

Note. Column heads abbreviated from the original: Chard, Ketterlin-Geller, Baker, Doabler, & Apichatabutra (2009); Lane, Kalberg, & Shepcaro (2009); Browder, Ahlgrim-Delzell, Spooner, Mims, & Baker (2009); Montague & Dietz (2009); Baker, Chard, Ketterlin-Geller, Apichatabutra, & Doabler (2009).

FINDINGS

Quality Indicators. Table 1 summarizes the number of studies meeting Horner et al.'s (2005) QIs for SSR, and Table 2 reports the same information for Gersten et al.'s (2005) QIs for group experimental research (Browder et al., 2009, and Lane et al., 2009, reviewed only SSR studies). The review teams reported widely discrepant findings as to how frequently the studies reviewed met the QIs. The proportion of QIs met in specific reviews ranged from 27% to 95% for SSR studies and from 12.5% to 95% for group experimental studies. It is noteworthy that these considerable disparities were not associated with differences in the rating procedure used. That is, although they all used a dichotomous approach for identifying the presence of SSR QIs, Lane et al. found almost three fourths of QIs absent in the studies that they reviewed, whereas Browder et al. and Montague and Dietz (2009) indicated that almost all QIs were present in the studies that they reviewed. Moreover, both Baker et al. (2009) and Chard et al. (2009) used a 4-point rubric to identify the presence of QIs. However, Chard et al. found that only 25% of the QIs for group experiments were present in the five studies that they reviewed, whereas Baker et al. reported that 95% were present in the five group studies that they reviewed. The disparities in identified QIs may simply reflect significant variation in the methodological quality of the bodies of literature reviewed. Because many of the QIs are not operationally defined, another possibility is that review teams systematically varied in their interpretation of the QIs.
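As a concrete illustration of the arithmetic underlying these proportions, the brief sketch below aggregates the per-review counts reported in Table 1 and reproduces each review's percentage of SSR QIs rated as present. The variable names are ours; the counts are simply those shown in the table.

```python
# A minimal sketch of the arithmetic behind Table 1: each review's (met, possible)
# counts of SSR quality indicators, and the percentages they imply.

ssr_counts = {
    "Chard et al. (2009)":     (17, 42),
    "Lane et al. (2009)":      (23, 84),
    "Browder et al. (2009)":   (200, 210),
    "Montague & Dietz (2009)": (26, 35),
    "Baker et al. (2009)":     (59, 63),
}

for review, (met, possible) in ssr_counts.items():
    print(f"{review}: {met}/{possible} = {met / possible:.0%}")

met_total = sum(met for met, _ in ssr_counts.values())
possible_total = sum(possible for _, possible in ssr_counts.values())
print(f"Across reviews: {met_total}/{possible_total} = {met_total / possible_total:.0%}")
# Across reviews: 325/434 = 75%
```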

In comparison with the wide discrepancies in QIs met between reviews, the variance in specific QIs present across the studies reviewed was minimal. Of the SSR QIs, the most frequently met was baseline (achieved in 52 of a total of 62 SSR studies reviewed), whereas the least frequently met was independent variable (41 of the 62 SSR studies met this QI). For group experimental studies, the number of total studies that met a QI ranged from 5 (of 12 total studies reviewed) for independent variable/comparison condition to 7 for participants and outcome variable. Across the studies reviewed, the SSR studies met a much higher proportion of QIs than group experiments did. This outcome may have occurred because of differences in the quality of the studies reviewed, differences in the rigor required by the two sets of QIs, or both.



TABLE 2

Summary of Group Experimental Research Quality Indicators Rated as Present

Quality Indicator                           Chard et al.   Montague &      Baker et al.   Total
                                            (2009)         Dietz (2009)    (2009)
Participants                                1/5            1/2             5/5            7/12, 58%
Independent variable/comparison condition   1/5            0/2             4/5            5/12, 42%
Outcome measure                             2/5            0/2             5/5            7/12, 58%
Data analysis                               1/5            0/2             5/5            6/12, 50%
Total                                       5/20, 25%      1/8, 12.5%      19/20, 95%     25/48, 52%

Note. Chard et al. = Chard, Ketterlin-Geller, Baker, Doabler, & Apichatabutra (2009); Baker et al. = Baker, Chard, Ketterlin-Geller, Apichatabutra, & Doabler (2009).

The disproportionately high number of SSR studies reviewed, in comparison with group experiments, may indicate a relative dearth of group experiments in the special education literature (Seethaler & Fuchs, 2005). The small number of group experiments conducted in special education appears to pose a particular concern for those wishing to establish EBPs in the field, given that the results of group experimental research figure prominently in this process.

The identification of components that researchers addressed least often can suggest areas of focus for future researchers to improve the methodological rigor of intervention research in the field of special education. The least frequently addressed component of the dependent variable QI for SSR studies appears to be appropriate documentation of IRR. Issues related to implementation fidelity were clearly the primary reason that studies did not meet the independent variable QI, with each team of reviewers reporting that multiple SSR studies reviewed did not meet this component. For group studies, the least frequently addressed component of the intervention/comparison condition QI was also implementation fidelity. The primary shortcoming of group experiments for the outcome variable QI appears to be the failure to use multiple dependent measures, at least one of which is not tightly aligned with the independent variable. And the sole reason that group experimental studies reviewed did not meet the data analysis QI was failure to report effect sizes. It is important to note that this special issue reviewed a relatively small number of studies, especially group experimental studies, and that the studies may not represent the larger pool of intervention research in special education, suggesting that these methodological concerns may not be generalizable.
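Because the omission of effect sizes was the sole reason group studies missed the data analysis QI, the sketch below illustrates one commonly reported index, a standardized mean difference with a pooled standard deviation (cf. Cohen, 1988). The reviews do not prescribe a particular effect size metric, and the sample data here are invented for illustration only.

```python
# Illustrative only: a standardized mean difference (Cohen's d with a pooled SD),
# one common effect size whose omission caused group studies to miss the data
# analysis QI. The sample scores below are invented.

import statistics


def cohens_d(treatment: list, comparison: list) -> float:
    n1, n2 = len(treatment), len(comparison)
    m1, m2 = statistics.mean(treatment), statistics.mean(comparison)
    v1, v2 = statistics.variance(treatment), statistics.variance(comparison)  # sample variances
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd


print(round(cohens_d([14, 16, 15, 18, 17], [11, 12, 13, 12, 14]), 2))
```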

Interrater Reliability. Unlike the proportion of QIs met, IRR did appear to vary according to the method used to rate the presence of the QIs. Generally, the three reviews that categorized QIs dichotomously (i.e., present or absent) reported relatively high levels of IRR. For example, Lane et al. (2009) reported 100% IRR for 15 of the 21 SSR components, with only one component falling below 83% (IRR for the component change in dependent variable is socially valid was 75%). Browder et al. (2009) reported a mean IRR across SSR QI components of 97%, with a range from 83% to 100%. And Montague and Dietz (2009) reported a mean IRR of 93% across QIs for SSR studies and 77% for group studies. In contrast, Baker et al. (2009) and Chard et al. (2009), both of whom used a 4-point rubric to rate the presence of QIs, reported IRR of .36 and 62% for SSR studies and .53 and 77% for group studies. Although IRRs for these two reviews were much higher when allowing for 1-point discrepancies, the reliability for determining the presence of QIs appears to be meaningfully lower when using a 4-point rubric, which is not unexpected, given that Chard et al. reported some difficulties in discriminating between the multiple rating levels. No systematic differences appear to exist for IRR between SSR and group studies.



In the three reviews that considered both types of research, Baker et al. and Chard et al. reported higher IRR for the group studies, whereas Montague and Dietz indicated higher IRR for SSR studies.
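For readers unfamiliar with how such agreement figures are obtained, the sketch below illustrates simple percentage agreement for dichotomous (present/absent) ratings and exact versus within-1-point agreement for a 4-point rubric. The computations and sample ratings are illustrative only; the review teams' actual procedures may have differed.

```python
# Illustrative IRR arithmetic: percentage agreement for dichotomous ratings and
# exact vs. within-1-point agreement for a 4-point rubric. Sample ratings invented.

def percent_agreement(rater_a: list, rater_b: list) -> float:
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def rubric_agreement(rater_a: list, rater_b: list, tolerance: int = 0) -> float:
    within = sum(abs(a - b) <= tolerance for a, b in zip(rater_a, rater_b))
    return within / len(rater_a)


dichotomous_a = [1, 1, 0, 1, 0, 1]
dichotomous_b = [1, 1, 0, 0, 0, 1]
print(f"{percent_agreement(dichotomous_a, dichotomous_b):.0%}")        # 83%

rubric_a = [4, 3, 2, 4, 1, 3]
rubric_b = [3, 3, 1, 4, 2, 2]
print(f"{rubric_agreement(rubric_a, rubric_b):.0%}")                   # exact agreement: 33%
print(f"{rubric_agreement(rubric_a, rubric_b, tolerance=1):.0%}")      # within 1 point: 100%
```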

RECOMMENDATIONS FOR REFINING THE PROCESS

On the basis of their experiences applying Gersten et al.'s (2005) and Horner et al.'s (2005) QIs and standards for determining EBPs in special education, the review teams for this issue made a number of recommendations for refining the process. The reviewers suggested adding some new QIs or making some of the existing QIs and their components more rigorous. For example, Chard et al. (2009) proposed requiring researchers to describe the theoretical or conceptual framework for the intervention reviewed (see also Browder et al., 2009). And Montague and Dietz (2009) advocated that researchers specify inclusion and exclusion criteria for selecting participants, assess treatment fidelity with at least two impartial observers with interrater agreement of at least 80%, and report effect sizes for SSR. Moreover, both Chard et al. and Montague and Dietz suggested making some of the desirable QIs for group experiments, such as documenting the validity of measurement instruments and minimal attrition, essential QIs.

In contrast to these calls for additional or more rigorous QIs, Lane et al. (2009) suggested that some of the SSR QIs might be overly rigorous. They recommended, for example, that researchers reconsider the requirements for documenting the instruments and process used to determine the disability of participants and for describing the cost-effectiveness of the intervention in SSR studies. Lane et al. also advocated that the field consider requiring less than 100% of components for meeting a QI, perhaps using an 80% criterion.

Review teams also noted the need for greater operationalization of the QIs and their components. In particular, Montague and Dietz (2009) called for greater clarity with regard to what constitutes a typical intervention agent in SSR studies. Baker et al. (2009) offered another suggestion for improving reviewers' ability to determine the presence of QIs in reports of research: furnish opportunities for researchers, perhaps on Web sites linked to the journal, to provide additional, detailed information that might otherwise go unreported because of space limitations.

In regard to standards for EBPs, Lane et al. (2009) raised the issue of whether all QIs are equally important and, if not, whether they might be weighted differentially in determining EBPs (see also Montague & Dietz, 2009). Montague and Dietz also suggested that researchers develop standards for determining when to consider an intervention evidence-based for subpopulations (e.g., how many studies involving students with a particular disability are necessary to demonstrate that the intervention is evidence-based for that population?).
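If the field were to adopt differential weighting, a scoring scheme might look something like the sketch below. The weights and names are entirely hypothetical, invented here only to show the mechanics of a weighted QI score; neither Lane et al. (2009) nor Montague and Dietz (2009) proposed specific weights.

```python
# Purely hypothetical: what differential weighting of SSR QIs might look like.
# The weights below are invented for illustration and are not proposed by the
# review teams or by the present authors.

qi_weights = {
    "participants/setting": 1.0,
    "dependent variable": 1.5,
    "independent variable": 1.5,
    "baseline": 1.0,
    "internal validity": 2.0,
    "external validity": 1.0,
    "social validity": 0.5,
}


def weighted_quality_score(qis_met: dict) -> float:
    """Share of the total available weight earned by the QIs a study met."""
    earned = sum(qi_weights[name] for name, met in qis_met.items() if met)
    return earned / sum(qi_weights.values())


study = {name: True for name in qi_weights}
study["external validity"] = False
print(f"{weighted_quality_score(study):.2f}")  # 0.88
```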

These recommendations all appear to have merit and warrant further consideration as special educators work toward refining the process for determining EBPs in special education. However, we also advise caution against revising the QIs and the process for establishing EBPs too readily or repeatedly. Special educators can and should refine the QIs and standards, perhaps periodically over time, to optimize their efficiency, reliability, and validity. For example, we endorse the idea that the QIs should be further operationalized, a process that the Council for Exceptional Children has undertaken (Bruno, 2007). However, no single set of QIs or standards will meet every purpose; and for the most part, the review teams found the application of the proposed QIs and standards feasible and meaningful. Once the QIs and standards have been refined and vetted through what we envision as an iterative but limited sequence of field trials, stability and consistency in the QIs and standards for EBPs in the field will be of significant importance.

CONCLUSION

The authors of the five reviews in this topical issue took on a task that posed multiple challenges. The review teams not only had to systematically review a large number of studies, but they did so using criteria that often required interpretation while devising their own processes for field-testing the proposed QIs and standards for EBPs in special education.



Not surprisingly, this process was time-consuming; Browder et al. (2009) estimated that their review team devoted more than 400 hours to their review. The reviewers also no doubt found the review process difficult because we asked them to apply the QIs literally. At times, literally applying the QIs may have seemed to highlight limitations in the research of respected colleagues. It is important to note that the authors of the extant research wrote without foreknowledge of the future standards of methodological rigor to which their work might be held and that they conformed to the external requirements of the day (e.g., little emphasis on reporting effect sizes; the perpetual space limitations in journals). Nonetheless, the results of these reviews have provided the first large-scale application of the QIs and the standards for EBPs in special education. We appreciate and applaud the work of the reviewers and the scholars who conducted the original research reviewed, as well as the pioneering work of Gersten et al. (2005) and Horner et al. (2005).

Collectively, the application of QIs to determine high-quality group research and SSR across five bodies of special education intervention research indicates the following:

• Approximately three quarters of the SSR QIs were present across the studies reviewed, whereas approximately one half of the group experimental QIs were present.

• Considerable variability existed between reviews in the proportion of QIs met. The rating procedure used did not appear to explain this variability.

• The IRR for rating QIs varied markedly between reviews, although reviews using a dichotomous yes/no scheme for identifying QIs tended to yield adequate IRR.

Reviewers also made a number of suggestions for refining the QIs, such as operationalizing them, adding and deleting particular components of some QIs, and weighting the QIs according to their importance. In addition to considering these and other technical matters (e.g., should reviews be restricted to articles published in peer-reviewed journals?), special education leaders will need to address some foundational issues regarding the need for and merits of determining EBPs in special education so that they can garner the broad support of the special education community for this process.

The philosophical objections to EBPs that we have heard from special educators often parallel criticisms raised regarding the advent of evidence-based medicine. As described by Sackett et al. (1996), "criticism has ranged from evidence based medicine being old hat to it being a dangerous innovation, perpetrated by the arrogant to . . . suppress clinical freedom" (p. 71). Given the documented research-to-practice gap (e.g., B. G. Cook & Schirmer, 2003), the claim that EBPs are old hat seems unwarranted in special education. As for concerns that EBPs in special education will force instruction to conform to an approved menu of interventions, we believe that EBPs will not and should not ever take the place of professional judgment but can be used to inform and enhance the decision making of special education teachers. As Sackett et al. suggested for evidence-based medicine,

Good doctors use both individual clinical expertise and the best available external evidence, and neither alone is enough. Without clinical expertise, practice risks becoming tyrannised by evidence, for even excellent external evidence may be inapplicable to or inappropriate for an individual patient. Without current best evidence, practice risks becoming rapidly out of date, to the detriment of patients. (p. 71)

Likewise, we in no way imagine evidence-based special educators being directed as to when and in what situations they can or cannot use particular teaching practices. Instead, EBPs should interface with the professional wisdom of teachers to maximize the outcomes of students with disabilities (Cook, Tankersley, & Harjusola-Webb, 2008).


We concur, then, with Sackett et al.'s (1996) declaration that "clinicians who fear top down cookbooks will find the advocates of evidence based medicine [or special education] joining them at the barricades" (p. 72).



However, although we recognize the dangers of overemphasizing EBPs in a field premised on individualized instruction, we believe that special educators would be remiss if they did not make every effort to prioritize practices shown by our best research to result in meaningful improvements in student outcomes. Identifying practices that are evidence-based for students with disabilities is a necessary but insufficient step in a process that we hope will culminate in the consistent implementation of the most effective practices with fidelity, ultimately resulting in improved outcomes for students with disabilities.

REFERENCES

American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.

Baker, S. K., Chard, D. J., Ketterlin-Geller, L. R., Apichatabutra, C., & Doabler, C. (2009). Teaching writing to at-risk students: The quality of evidence for self-regulated strategy development. Exceptional Children, 75.

Berliner, D. C. (2002). Educational research: The hardest science of all. Educational Researcher, 31(8), 18-20.

Brantlinger, E., Jimenez, R., Klingner, J., Pugach, M., & Richardson, V. (2005). Qualitative studies in special education. Exceptional Children, 71, 195-207.

Browder, D., Ahlgrim-Delzell, L., Spooner, F., Mims, P. J., & Baker, J. N. (2009). Using time delay to teach literacy to students with severe developmental disabilities. Exceptional Children, 75, 343-364.

Browder, D. M., Wakeman, S. Y., Spooner, F., Ahlgrim-Delzell, L., & Algozzine, B. (2006). Research on reading instruction for individuals with significant cognitive disabilities. Exceptional Children, 72, 392-408.

Bruno, R. (2007). CEC's evidence based practice effort. Retrieved September 29, 2008, from http://education.uoregon.edu/grantmatters/pdf/DR/Showcase/Bruno.ppt

Chambless, D. L., Baker, M. J., Baucom, D. H., Beutler, L. E., Calhoun, K. S., Crits-Christoph, P., et al. (1998). Update on empirically validated therapies, II. The Clinical Psychologist, 51, 3-16.

Chambless, D. L., & Hollon, S. D. (1998). Defining empirically supported therapies. Journal of Consulting and Clinical Psychology, 66, 7-18.

Chambless, D. L., Sanderson, W. C., Shoham, V., Bennett Johnson, S., Pope, K. S., Crits-Christoph, P., et al. (1996). An update on empirically validated therapies. The Clinical Psychologist, 49, 5-18.

Chard, D. J., Ketterlin-Geller, L. R., Baker, S. K., Doabler, C., & Apichatabutra, C. (2009). Repeated reading interventions for students with learning disabilities: Status of the evidence. Exceptional Children, 75, 263-281.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Cook, B. G., & Schirmer, B. R. (Eds.). (2003). What is special about special education [Special issue]. The Journal of Special Education, 37(3).

Cook, B. G., & Tankersley, M. (2007). A preliminary examination to identify the presence of quality indicators in experimental research in special education. In J. Crockett, M. M. Gerber, & T. J. Landrum (Eds.), Achieving the radical reform of special education: Essays in honor of James M. Kauffman (pp. 189-212). Mahwah, NJ: Lawrence Erlbaum.

Cook, B. G., Tankersley, M., Cook, L., & Landrum, T. J. (2008). Evidence-based practices in special education: Some practical considerations. Intervention in School and Clinic, 44(2), 69-75.

Cook, B. G., Tankersley, M., & Harjusola-Webb, S. (2008). Evidence-based practice and professional wisdom: Putting it all together. Intervention in School and Clinic, 44(2), 105-111.

Cook, L. H., Cook, B. G., Landrum, T. J., & Tankersley, M. (2008). Examining the role of group experimental research in establishing evidence-based practices. Intervention in School and Clinic, 44(2), 76-82.

Dammann, J. E., & Vaughn, S. (2001). Science and sanity in special education. Behavioral Disorders, 27, 21-29.

Durlak, J. A. (2002). Evaluating evidence-based interventions in school psychology. School Psychology Quarterly, 17, 475-482.

Elliott, R. (1998). Editor's introduction: A guide to empirically supported treatments controversy. Psychotherapy Research, 8, 115-125.

Forness, S. R., Kavale, K. A., Blum, I. M., & Lloyd, J. W. (1997). What works in special education and related services: Using meta-analysis to guide practice. TEACHING Exceptional Children, 29, 4-9.



Gallagher, D. J. (1998). The scientific knowledge base of special education: Do we know what we think we know? Exceptional Children, 64, 493-502.

Gersten, R., Fuchs, L. S., Compton, D., Coyne, M., Greenwood, C., & Innocenti, M. S. (2005). Quality indicators for group experimental and quasi-experimental research in special education. Exceptional Children, 71, 149-164.

Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S., & Wolery, M. (2005). The use of single-subject research to identify evidence-based practice in special education. Exceptional Children, 71, 165-179.

Individuals With Disabilities Education Act, 20 U.S.C. § 1400 et seq. (2004).

Kauffman, J. M. (1996). Research to practice issues. Behavioral Disorders, 22, 55-60.

Kendall, P. C. (1998). Empirically supported psychological therapies. Journal of Consulting and Clinical Psychology, 66, 3-6.

Kingsbury, G. G. (2006). The medical research model: No magic formula. Educational Leadership, 63(6), 79-82.

Kratochwill, T. R., & Stoiber, K. C. (2002). Evidence-based interventions in school psychology: Conceptual foundations of the Procedural and Coding Manual of Division 16 and the Society for the Study of School Psychology Task Force. School Psychology Quarterly, 17, 341-389.

Lane, K. L., Kalberg, J. R., & Shepcaro, J. C. (2009). An examination of the evidence base for function-based interventions for students with emotional or behavioral disorders attending middle and high schools. Exceptional Children, 75, 321-340.

Levin, J. R. (2002). How to evaluate the evidence of evidence-based interventions. School Psychology Quarterly, 17, 483-492.

Lloyd, J. W., Pullen, P. C., Tankersley, M., & Lloyd, P. A. (2006). Critical dimensions of experimental studies and research syntheses that help define effective practices. In B. G. Cook & B. R. Schirmer (Eds.), What is special about special education: The role of evidence-based practices (pp. 136-153). Austin, TX: PRO-ED.

Montague, M., & Dietz, S. (2009). Evaluating the evidence base for cognitive strategy instruction and mathematical problem solving. Exceptional Children, 75, 285-302.

Nelson, J. R., & Epstein, M. H. (2002). Report on evidence-based interventions: Recommended next steps. School Psychology Quarterly, 17, 493-499.

No Child Left Behind Act, 20 U.S.C. § 6301 et seq. (2001).

Odom, S. L., Brantlinger, E., Gersten, R., Horner, R. H., Thompson, B., & Harris, K. (2004). Quality indicators for research in special education and guidelines for evidence-based practices: Executive summary. Retrieved September 29, 2008, from education.uoregon.edu/grantmatters/pdf/DR/Exec_Summary.pdf

Odom, S. L., Brantlinger, E., Gersten, R., Horner, R. H., Thompson, B., & Harris, K. R. (2005). Research in special education: Scientific methods and evidence-based practices. Exceptional Children, 71, 137-148.

Rumrill, P. D., & Cook, B. G. (Eds.). (2001). Research in special education: Designs, methods and applications. Springfield, IL: Charles C Thomas.

Sackett, D. L., Rosenberg, W. M. C., Gray, J. A. M., Haynes, R. B., & Richardson, W. S. (1996). Evidence based medicine: What it is and what it isn't. British Medical Journal, 312, 71-72.

Schoenfeld, A. H. (2006). What doesn't work: The challenge and failure of the What Works Clearinghouse to conduct meaningful reviews of studies of mathematics curricula. Educational Researcher, 35(2), 13-21.

Seethaler, P. M., & Fuchs, L. S. (2005). A drop in the bucket: Randomized controlled trials testing reading and math interventions. Learning Disabilities Research and Practice, 20(2), 98-102.

Simmerman, S., & Swanson, H. L. (2001). Treatment outcomes for students with learning disabilities: How important are internal and external validity? Journal of Learning Disabilities, 34, 221-236.

Slavin, R. E. (2008). Evidence-based reform in education: Which evidence counts? Educational Researcher, 37, 47-50.

Stoiber, K. C. (2002). Revisiting efforts on constructing a knowledge base of evidence-based intervention within school psychology. School Psychology Quarterly, 17, 533-546.

Tankersley, M., Cook, B. G., & Cook, L. (2008). A preliminary examination to identify the presence of quality indicators in single-subject research. Education and Treatment of Children, 31(4), 523-548.

Task Force on Evidence-Based Interventions in School Psychology. (2003). Procedural and coding manual for review of evidence-based interventions. Division 16 of the American Psychological Association. Retrieved from www.indiana.edu/~ebi/documents/_workingfiles/EBImanuall.pdf

Task Force on Promotion and Dissemination of Psychological Procedures. (1995). Training in and dissemination of empirically-validated psychological treatments. The Clinical Psychologist, 48, 3-23.



Test, D. W., Fowler, C. H., Brewer, D. M., & Wood, W. M. (2005). A content and methodological review of self-advocacy intervention studies. Exceptional Children, 72, 101-125.

Thompson, B., Diamond, K. E., McWilliam, R., Snyder, P., & Snyder, S. W. (2005). Evaluating the quality of evidence from correlational research for evidence-based practice. Exceptional Children, 71, 181-194.

Viadero, D., & Huff, D. J. (2006). "One stop" research shop seen as slow to yield views that educators can use. Education Week, 26(5), 8-9.

Waehler, C. A., Kalodner, C. R., Wampold, B. E., & Lichtenberg, J. W. (2000). Empirically supported treatments (ESTs) in perspective: Implications for counseling psychology training. Counseling Psychologist, 28, 657-671.

Wampold, B. E. (2002). An examination of the bases of evidence-based interventions. School Psychology Quarterly, 17, 500-507.

Wanzek, J., & Vaughn, S. (2006). Bridging the research-to-practice gap: Maintaining the consistent implementation of research-based practices. In B. G. Cook & B. R. Schirmer (Eds.), What is special about special education: The role of evidence-based practices (pp. 165-174). Austin, TX: PRO-ED.

What Works Clearinghouse. (2008). What Works Clearinghouse evidence standards for reviewing studies. Retrieved September 23, 2008, from http://ies.ed.gov/ncee/wwc/pdf/study_standards_final.pdf

What Works Clearinghouse. (n.d.a). Welcome to WWC. Retrieved September 23, 2008, from http://ies.ed.gov/ncee/wwc/

What Works Clearinghouse. (n.d.b). What Works Clearinghouse intervention rating scheme. Retrieved September 23, 2008, from http://ies.ed.gov/ncee/wwc/pdf/rating_scheme.pdf

ABOUT THE AUTHORS

BRYAN G. COOK (CEC HI Federation), Professor, Department of Special Education, University of Hawaii, Honolulu. MELODY TANKERSLEY (CEC OH Federation), Professor, Department of Special Education, Kent State University, Kent, Ohio. TIMOTHY J. LANDRUM (CEC VA Federation), Senior Scientist, Department of Curriculum, Instruction, and Special Education, University of Virginia, Charlottesville.

Address correspondence to Bryan G. Cook, University of Hawaii at Manoa, College of Education, Department of Special Education, 1776 University Ave., Wist Hall 117, Honolulu, HI 96822 (e-mail: [email protected]).

The authors thank the Division for Research of the Council for Exceptional Children for its support of this work and for its leadership in identifying and applying evidence-based practices in special education.

Manuscript received June 2008; accepted September 2008.

