Classworks Universal Screeners

Validity and Reliability of Classworks Universal Screeners Updated May 2018


Research on Validity and Reliability of Classworks Universal Screeners • 2

Table of Contents

Executive Summary

Test Design
    Vertical Scale and Item Bank Calibration
    Score Reporting
    Establishing Cut Scores

Item Development
    Guiding Principles of Item Construction

Test Validation
    Field Testing and Analysis

National Center for Response to Intervention Review
    Reliability
    Validity
    Classification Analyses

Addendum: Classworks Universal Screeners Update


Executive Summary

Purpose

Classworks Universal Screeners are formal assessments used to measure readiness for grade level instruction, help identify baseline learning levels, and measure growth. The Universal Screeners were specifically designed for the purpose of screening students who may need additional intervention and can be used as part of the Response to Intervention (RtI) process.

In addition to reporting an overall scaled score based on the total test, Classworks provides student strengths and weaknesses for key strands. Key strands include a minimum of four test questions to provide a reasonable estimate of student strengths and weaknesses. This information, when used in conjunction with other data such as high-stakes test results and classroom performance, can help provide a starting point for determining next steps.

Overview

Classworks Universal Screeners include multiple forms at each level for language arts and mathematics, grades K–10. The Universal Screeners are typically administered three times a year: at the beginning of the school year to assess readiness for instruction for all students, mid-year to measure progress for RtI tiers II and III, and end-of-year to measure overall growth for the year. Given that the test is primarily designed to identify readiness, the test includes multiple grade levels of content to allow sufficient reach for students who may be struggling.

The Universal Screeners are between 20 and 35 items in length depending on the grade level targeted, and must be administered in a single sitting. Two parallel forms of each Screener were developed; these forms measure similar content. The kindergarten level assessments are an exception to this approach, with two different forms reflecting earlier and later kindergarten content given the rapid development at the kindergarten level.

Overall test results are reported as a scaled score. Scoring on a vertical scale provides a single point of reference to compare individual student gains from one test administration to the next, within and across school years. Measuring growth vertically serves a dual purpose: to track learning gains for individual students and to determine whether learning must be accelerated.

Classworks Universal Screeners have been evaluated by the National Center for Response to Intervention (NCRTI), and they received the highest reliability ranking.


Universal Screener Quick Guide

Item | Description
Purpose | Measure grade level readiness, help identify baseline, measure growth
Grades | K–10 Math, K–10 Reading
Levels of coverage per test | Test includes multiple grade levels of content to allow sufficient reach to help identify strugglers (exception: Kindergarten)
Audio | Audio support available for all grades
Length of test | Must be taken in one sitting; 20–35 items depending on grade level/subject
Vertical scale? | Yes. All scores are vertically scaled from K–10 for longitudinal tracking.
Output from test | Average readiness scaled score of students by class, teacher, custom group, demographic, and/or grade level


Test Design

SEG Measurement (SEG) has been instrumental in the design, development, testing, and analysis of Classworks Universal Screeners. SEG is an assessment, measurement, and research firm that provides assessment design, development, and implementation services for K–12, higher education, and credentialing programs. They have delivered over 100 million assessments to tens of thousands of schools and colleges in all 50 states.

Classworks Universal Screeners were designed and built for the particular purpose they serve. For this reason, they meet all of the criteria that define quality screeners: the assessments are brief, reliable, valid, equated, and measured on a vertical scale.

SEG initially created the assessments by hand-selecting items for each level and form of the tests. Forms were then equated through field testing and calibration so that each measures the same sets of skills at the same level of difficulty. Individual test items and the assessments themselves were designed with diversity in mind, including populations of culturally and linguistically diverse students and special needs students. Guiding principles for assessment design were integrated into the process, including ensuring all items are written in a clear, concise manner and free of age, gender, ethnic, religious, or disability bias.

There are two parallel forms for each test in grades K–10. For 2nd grade and above, the test questions include content from the target grade level as well as from two grade levels below the target. Given that the test is primarily designed to identify readiness, the test includes multiple grade levels of content to allow sufficient reach and enough content coverage for students who may be struggling. The tests include approximately 50% of the content from the target grade, approximately 25% of the content from the grade below, and approximately 25% of the content from two grades below. The 1st grade assessment contains content from both 1st grade and kindergarten. The kindergarten assessment contains content drawn only from kindergarten with two different forms reflecting earlier and later kindergarten content, given the rapid development at the kindergarten level.


Grade Level | Number of Test Questions Scored | Number of Test Forms | Source of Test Questions
K Early | 15 Reading/Language Arts; 15 Mathematics | 1 | 100% early K content
K Late | 15 Reading/Language Arts; 15 Mathematics | 1 | 50% later K content; 50% early K content
Grade 1 | 20 Reading/Language Arts; 20 Mathematics | 2 | 50% grade 1 content; 50% grade K content
Grade 2 | 25 Reading/Language Arts; 25 Mathematics | 2 | 50% grade 2 content; 25% grade 1 content; 25% grade K content
Grade 3 | 25 Reading/Language Arts; 25 Mathematics | 2 | 50% grade 3 content; 25% grade 2 content; 25% grade 1 content
Grade 4 | 25 Reading/Language Arts; 25 Mathematics | 2 | 50% grade 4 content; 25% grade 3 content; 25% grade 2 content
Grade 5 | 30 Reading/Language Arts; 30 Mathematics | 2 | 50% grade 5 content; 25% grade 4 content; 25% grade 3 content
Grade 6 | 30 Reading/Language Arts; 30 Mathematics | 2 | 50% grade 6 content; 25% grade 5 content; 25% grade 4 content
Grade 7 | 30 Reading/Language Arts; 30 Mathematics | 2 | 50% grade 7 content; 25% grade 6 content; 25% grade 5 content
Grade 8 | 30 Reading/Language Arts; 30 Mathematics | 2 | 50% grade 8 content; 25% grade 7 content; 25% grade 6 content
Grade 9 | 30 Reading/Language Arts; 30 Mathematics | 2 | 50% grade 9 content; 25% grade 8 content; 25% grade 7 content
Grade 10 | 30 Reading/Language Arts; 30 Mathematics | 2 | 50% grade 10 content; 25% grade 9 content; 25% grade 8 content
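The content mix in the table above can be written as a small lookup. The Python sketch below is illustrative only: the function names and the coding of kindergarten as grade 0 are assumptions, and the item counts are left as unrounded expected values because the report gives the shares only as approximate percentages.

```python
def content_mix(target_grade):
    """Return {content_grade: share} for a screener's target grade.

    Grades are integers with kindergarten coded as 0 (an assumption of
    this sketch). The early and late kindergarten forms are not modeled.
    """
    if target_grade < 1:
        raise ValueError("kindergarten forms are not modeled in this sketch")
    if target_grade == 1:
        # Grade 1 draws half its content from grade 1 and half from K.
        return {1: 0.50, 0: 0.50}
    # Grade 2 and above: target grade plus the two grades below it.
    return {target_grade: 0.50, target_grade - 1: 0.25, target_grade - 2: 0.25}


def expected_item_counts(target_grade, n_items):
    """Expected (unrounded) number of items per content grade."""
    return {g: n_items * share for g, share in content_mix(target_grade).items()}
```

For example, `expected_item_counts(5, 30)` gives 15 grade 5 items and 7.5 items each for grades 4 and 3, which is why the report describes the split as approximate.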


Vertical Scale and Item Bank Calibration

The vertical scale was developed through a linked testing design such that all items could be calibrated together and placed on the same continuum. The field test data was used to calibrate the items and tests. Calibration is a process that places all tests and all test items on a common scale. This was used to create a single common scale from grade K to grade 10. In this way, scores from the tests are comparable across forms of the test and over time. A given score will have the same meaning regardless of which form is administered and regardless of when the student takes the test.

The assessments developed include sets of overlapping items across test forms at the same level and across adjacent grade levels. This facilitates the calibration of the item bank. SEG calibrated the items using IRT (one parameter Rasch model) to create a common vertical scale across grade levels.

The raw number of correct answers reflects a particular Rasch score (ranging from -4 to +4), which is then translated to the final scaled score for reporting purposes. When the student completes his/her screener, the scaled score and key strand level performance feedback are immediately available for reporting. The approach taken in the calibration and scoring process provides Rasch extrapolated norms.
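As a rough illustration of the scoring chain described above (raw score, then Rasch score, then scaled score), the sketch below uses the standard one-parameter Rasch model, under which the ability estimate for a raw score is the value whose expected number correct equals that raw score. The item difficulties, the function names, and especially the linear conversion from the -4 to +4 logit range onto a 200 to 800 reporting scale are assumptions made for illustration; the report does not state the actual conversion.

```python
import math


def rasch_probability(theta, difficulty):
    """P(correct) under the one-parameter Rasch model (both values in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def theta_from_raw_score(raw_score, difficulties, lo=-4.0, hi=4.0):
    """Ability whose expected raw score matches the observed raw score.

    Solved by bisection on [lo, hi]; raw scores outside the range reachable
    within those bounds are clamped to the nearest bound.
    """
    def expected(theta):
        return sum(rasch_probability(theta, b) for b in difficulties)

    if raw_score <= expected(lo):
        return lo
    if raw_score >= expected(hi):
        return hi
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if expected(mid) < raw_score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def to_scaled_score(theta):
    """HYPOTHETICAL linear map of -4..+4 logits onto 200..800 (not from the report)."""
    return round(500 + 75 * theta)
```

Because the raw score is a sufficient statistic under the Rasch model, every student with the same raw score on the same form receives the same ability estimate, which is consistent with the report's statement that a raw score "reflects a particular Rasch score".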

As a further measure to ensure that the test questions and assessments are technically sound and are performing as expected, SEG analyzes the data from the fall test takers each year.

Curriculum Advantage reviews the fall results to make sure the tests are performing well. SEG examines the statistics for the tests as a whole (e.g., average scores, distribution of scores) and the statistics for individual test items (e.g., question difficulty and the ability of the question to distinguish between different levels of student performance). Based on this analysis, Curriculum Advantage further refines the tests, revising and replacing questions as necessary.

During the 2014–2015 item analysis, Curriculum Advantage made the decision to update the Universal Screener. New items were created and field tested during the 2015–2016 school year and officially added to the assessment for the 2016–2017 school year. The Universal Screener update's overview, goals, and constraints can be found in Addendum I.

Score Reporting

Score Reporting is designed to provide reliable information useful for understanding overall student readiness and estimated student strengths and weaknesses in specific strands measured by the test. Scores are based on scaled scores that allow all tests to be placed on a common scale regardless of which form is administered and at what grade level. Results are reported at the total test and key strand level. Strands assessed vary by grade level and subject of the assessment. This approach provides a reasonable balance between the need for information on student strengths and weaknesses and the need for sufficient score reliability.

Raw scores are calculated as the total number of items answered correctly on the screener. Performance on the assessments is reported as a scaled score on a vertical scale ranging from 200 to 800 and spanning grades K–10 (see Vertical Scale and Item Bank Calibration above). Feedback is also provided at the key strand level.

These strands were determined based on an analysis of over 31 state standards and then re-examined with the introduction of the Common Core State Standards. Crosswalks are available to show the relationship between the Classworks strands and these state standards.

Reading:

• Grammar/Usage/Mechanics
• Reading Comprehension
• Study Skills
• Word Analysis
• Writing
• Writing Process

Math:

• Algebra
• Geometry
• Mathematical Processes
• Measurement
• Numeration
• Operations
• Patterns
• Statistics and Probability

Strands that are reported are required to include a minimum of four test questions to provide a reliable estimate of student strengths and weaknesses.
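The four-item rule for reported strands can be sketched as a simple filter. This is a minimal illustration, not the Classworks reporting code: the function names, the input layout (one strand label and one 0/1 response per item), and the choice of proportion correct as the strand-level feedback are all assumptions.

```python
from collections import Counter

MIN_ITEMS_PER_KEY_STRAND = 4  # minimum stated in the report


def key_strands(item_strands):
    """Return {strand: item_count} for strands with at least four items."""
    counts = Counter(item_strands)
    return {s: n for s, n in counts.items() if n >= MIN_ITEMS_PER_KEY_STRAND}


def key_strand_proportion_correct(item_strands, responses):
    """Proportion correct per key strand; responses are 0/1, aligned to items."""
    reportable = key_strands(item_strands)
    correct = Counter()
    for strand, score in zip(item_strands, responses):
        if strand in reportable:
            correct[strand] += score
    return {s: correct[s] / n for s, n in reportable.items()}
```

A strand with only two or three items on a form would simply be absent from the strand-level output, while its items still count toward the total raw score.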

Curriculum Advantage establishes score ranges that reflect levels of student readiness on the assessments. There are various approaches that can be used to identify appropriate cut points defining levels of readiness. The method SEG recommended for creating appropriate cut points is detailed below.

Establishing Cut Scores

The cut scores for Classworks Universal Screeners for grades 3–8 were established using a two-stage standard setting process. In the first stage, a Bookmark Procedure (Cizek and Bunch, 2007) was applied. This was followed by a second stage, in which the stage one potential cut scores were reviewed in light of student performance data and expectations for student performance.

The Bookmark Procedure is an item mapping approach to standard setting developed in the 1990s (Cizek and Bunch, 2007). The Bookmark Procedure as employed for Classworks involves the review of an ordered test booklet containing all the items for a given test arranged in order of difficulty from easiest to hardest (Mitzel, H. C., Lewis, D. M., Patz, R. J., and Green, D. R., 2001). The difficulty values for this procedure were taken from the Rasch item calibrations produced during the original development of the screeners. Based on the procedures suggested by Mitzel et al. (2001), content experts reviewed the ordered item booklet and were asked to identify (“bookmark”) the first item that the minimally proficient student would be unlikely to answer correctly (less than 50% probability). The difficulty of the item identified served as the potential cut score emerging from stage one of the standard setting.
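The stage one logic can be sketched in a few lines. In the actual procedure the bookmark is placed by content experts; in the illustration below a single hypothetical ability value stands in for the panel's picture of the minimally proficient student, and the Rasch model supplies the probability of a correct answer. The function name, the example difficulties, and that stand-in ability are assumptions.

```python
import math


def bookmark_cut_score(item_difficulties, minimally_proficient_theta):
    """Return (booklet_position, difficulty) of the bookmarked item.

    Items are ordered from easiest to hardest. The bookmark is the first
    item the minimally proficient student has less than a 50% chance of
    answering correctly under the Rasch model. Returns None if no item
    meets that condition.
    """
    ordered = sorted(item_difficulties)
    for position, difficulty in enumerate(ordered, start=1):
        p_correct = 1.0 / (1.0 + math.exp(-(minimally_proficient_theta - difficulty)))
        if p_correct < 0.5:
            return position, difficulty
    return None
```

Under the Rasch model the probability drops below 50% exactly when an item's difficulty exceeds the student's ability, so the bookmarked difficulty lands just above the ability level the panel has in mind.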

In the second stage, the potential cut scores produced in stage one of the process were reviewed against the distribution of scores from operational testing to evaluate the number and percentage of students that would “pass” and the number and percentage of students that would “fail” the assessment based on the stage one potential cut scores. In some cases, the stage one potential cut score was raised or lowered based on the impact rates or expected performance for the students.
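The stage two impact review amounts to counting students on either side of a candidate cut score. The sketch below is a minimal illustration with made-up scores; the function name and the convention that a score equal to the cut counts as passing are assumptions, since the report does not state how ties are handled.

```python
def impact_rates(scaled_scores, cut_score):
    """Number and percentage of students passing and failing at a cut score.

    Assumes a score at or above the cut counts as a pass.
    """
    n = len(scaled_scores)
    n_pass = sum(1 for s in scaled_scores if s >= cut_score)
    n_fail = n - n_pass
    return {
        "pass_n": n_pass,
        "pass_pct": 100.0 * n_pass / n,
        "fail_n": n_fail,
        "fail_pct": 100.0 * n_fail / n,
    }
```

Running this for several candidate cut scores shows how sharply the pass rate moves with each adjustment, which is the information a review panel weighs when deciding whether to raise or lower the stage one value.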

Item Development

The Classworks assessment item bank was developed by a team of content experts from a third-party developer, a leader in the creation of high-stakes content for assessments produced by states and testing companies. The test items have been reviewed and refined through a multi-step process involving members of this test development team.

The Universal Screeners are composed of 100% four-response-option multiple-choice type questions. The items were specifically developed for the Universal Screener or were selected and modified from the existing Curriculum Advantage item bank.

Guiding Principles of Item Construction

In order to ensure item reliability and validity, guiding principles were used in the item construction process.

Item Construction:

• Items are written in clear, concise language at the appropriate grade level
• Items are written without age, gender, ethnic, religious, or disability bias
• Each item set measures both basic knowledge and higher-order thinking skills
• Items adhere to the objectives being assessed
• Items are constructed in a consistent manner
• Item content is current and relevant to audience
• Items are written in the form of questions, avoiding open-ended or negative stems

Item Response Measurement:

• Items show consistency of student response
• Results can be generalized to the population
• Items are calibrated to ensure that scores have similar meaning over time
• After calibration, items are placed on a developmental/vertical scale to allow for the accurate comparison of students over time and across use of the items
• Student performance can be predicted from item response
• Target goals and norms can be developed from item response measures

Questions/Stems:

• Stems and reading passages will be at grade-level readability and must assess the skill being tested according to the level of Bloom’s indicated
• Stems are free of age, gender, ethnic, religious, or disability stereotypes or bias


• Stems are written in question format and do not require sentence completion, true/false, or fill-in-the-blank responses
• Each stem has only one correct answer

Answers/Distractors:

• Answers are presented in a multiple-choice format with four answer options
• Distractors are written in a logical order (alphabetical, chronological)
• Distractors are approximately the same length and must be grammatically parallel
• Distractors are plausible and should not contain grammatical clues
• Distractors address a variety of common errors rather than the same error
• Distractor rationale is provided for each answer choice

The test items are multiple-choice questions, offering an efficient and reliable way to assess students’ knowledge and skills. All items have one single best answer and responses are scored as correct or incorrect. Multiple choice measures have advantages over other types of item response, in that they are capable of covering a large amount of content in a relatively short period of time. Moreover, they can achieve high levels of reliability, providing users with a consistent and stable measure of student knowledge and skills over time.


Test Validation

Following the creation of the tests, SEG conducted a second verification of the assessment items. The verification process consisted of a comprehensive alignment review to establish the validity of the assessment items and to determine if they were accurately aligned to the objectives they purport to measure.

Curriculum Advantage continues to partner with SEG to ensure that the tests themselves, as well as assessment-related decisions, are psychometrically sound. This ongoing process includes further statistical analysis, item calibration, adjustments to the cut scores on the vertical scale, and overall evaluation of the quality of Classworks Universal Screeners.

Field Testing and Analysis

To ensure that the test items and assessments are psychometrically sound, SEG analyzed the item and test performance data from the field tests conducted by Curriculum Advantage in the fall of 2009, the fall of 2010, and the fall of 2011. Curriculum Advantage collected information from approximately 200–300 students per test form the first year, with exponential increases in each of the following years. SEG analyzes the results each year, providing both test and item level analyses including:

• Overall test and subtest statistics
    o Mean
    o Standard Deviation
    o Reliability
    o SEM (Standard Error of Measurement)
    o Overall Model Fit
    o Frequency Distribution
• Item statistics
    o P Value (percent correct)
    o Point biserial correlation (measure of item discrimination)
    o Logit value from -3 to +3 (person and item independent measure of item difficulty)
    o Item Infit statistic
    o Item Outfit statistic

SEG reviews the item statistics, and any item that does not demonstrate suitable psychometric characteristics is recommended for replacement. These statistics help ensure ongoing relevance and validity.


Here are some of the statistics SEG calculates:

Total Test Statistics

• Average Score on the Assessment – SEG computes the average (mean) score achieved by students taking the assessment. This helps us determine if the assessment is properly targeted to the level of the students assessed.
• Variation and Distribution of Scores on the Assessment – SEG calculates the amount of variability (standard deviation) in the test scores achieved by students taking the assessment. This is another indicator of how well the test is targeted to the level of students assessed.
• Reliability – SEG computes the reliability of the test to ensure that the test is consistently measuring the knowledge and skills measured by the assessment across forms of the test and is stable over time.
• Score Accuracy – Any assessment score is subject to variation when a student takes the test multiple times. SEG estimates the amount of variation expected for a student score (Standard Error of Measurement; SEM); this is an indicator of score accuracy.
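The total test statistics listed above can be computed from a matrix of scored responses. The sketch below is an illustration under stated assumptions: the report does not say which reliability coefficient SEG uses, so KR-20 (a common internal consistency coefficient for right/wrong items) is assumed here, and the SEM uses the classical formula SD × sqrt(1 − reliability). The function name and the tiny data set are made up.

```python
import math
from statistics import mean, pstdev


def total_test_statistics(responses):
    """Mean, SD, KR-20 reliability, and SEM for 0/1 item responses.

    responses: one list per student, each holding that student's 0/1 item
    scores in the same item order. Requires at least two items and some
    variation in total scores.
    """
    n_items = len(responses[0])
    totals = [sum(student) for student in responses]
    sd = pstdev(totals)  # population SD of total scores
    # Sum of item variances p * (1 - p), where p is the proportion correct.
    sum_pq = 0.0
    for i in range(n_items):
        p = mean(student[i] for student in responses)
        sum_pq += p * (1 - p)
    kr20 = (n_items / (n_items - 1)) * (1 - sum_pq / sd ** 2)
    sem = sd * math.sqrt(1 - kr20)
    return {"mean": mean(totals), "sd": sd, "reliability": kr20, "sem": sem}
```

With a reliability near 0.90, as reported later in this document for reading, the SEM is roughly a third of the score standard deviation, since sqrt(1 − 0.90) is about 0.32.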

Individual Question Statistics

• Question Difficulty – SEG computes the percentage of students who answer the questions correctly; this is an indicator of the difficulty of the question.
• Question Differentiation – SEG computes the relationship between student performance on each individual question and the assessment as a whole; this is an indicator of how well the question differentiates between those students who have the knowledge and skills measured by the assessment and those who do not have the knowledge and skills.
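Both question statistics can be computed directly from scored responses. In the sketch below, question difficulty is the proportion correct (the P value) and question differentiation is the point-biserial correlation, calculated as the Pearson correlation between the 0/1 item score and the total score. The function names and data layout are assumptions, and the total score here includes the item itself, matching the report's description of "the assessment as a whole".

```python
import math


def item_p_values(responses):
    """Proportion of students answering each item correctly."""
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(s[i] for s in responses) / n_students for i in range(n_items)]


def point_biserial(responses, item_index):
    """Pearson correlation between one item's 0/1 scores and total scores.

    Returns None when the item or the totals have no variance, because the
    correlation is undefined in that case.
    """
    x = [s[item_index] for s in responses]
    y = [sum(s) for s in responses]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)
```

An item with a low or negative point-biserial is answered correctly about as often by low scorers as by high scorers, which is the kind of result that would flag it for review.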


National Center for Response to Intervention Review

The National Center for Response to Intervention used data collected during the 2009–2010 and 2010–2011 school years to further evaluate the quality of Classworks Universal Screeners. Following the implementation of the final Universal Screener forms, performance on the screeners and on high-stakes tests was used to investigate the validity and classification accuracy of the Universal Screeners.

Reliability

Test reliability refers to the consistency and accuracy of test scores. Reliability values range from 0 to 1.00, with higher values indicating higher reliability. Using the data collected from the multi-state field test, the average reliability for Universal Screeners for reading from grades K–10 was found to be 0.90. For mathematics in grades K–10, the average reliability coefficient is 0.88. These high internal consistency measures indicate that the Universal Screeners are able to provide a reliable measure of student performance in reading and mathematics.

Validity

Test validity refers to the appropriateness of a test for its intended purpose. Evidence for validity of the tests is gathered from the item development and test development process as well as statistical analyses.

Classworks Universal Screeners were specifically designed for the purpose of screening students who may need additional intervention. The items and tests have been field tested and evaluated using Item Response Theory to ensure that the items and tests are performing as expected. The rigorous processes followed for item and test development provide support for the content validity of the Universal Screeners.

Performance on the Universal Screeners has been compared to other high-stakes tests to ensure that performance on the Universal Screeners is consistent with performance on other assessments. During the 2010–2011 school year, Classworks Universal Screener data and high-stakes test data from over 11,300 students in a large southern state were collected to evaluate the correlation between the Universal Screener scores and the high-stakes test scores.

Rules of Thumb – Armstrong (2006), reiterating the recommendations of Smith (1984), suggests the following rules of thumb for validity data examining one measure of a construct in relation to another measure of that construct:

• Over .50: excellent
• .40 to .49: good
• .30 to .39: acceptable
• Less than .30: poor

On average, the correlation between the Classworks Universal Screener scores and the high-stakes test scores was 0.46 for mathematics and 0.63 for reading. Further, the screeners were found to agree with other measures in classifying students as “not at-risk” 93% of the time in mathematics, and 97% of the time in reading. These correlations between two tests measuring similar constructs support the construct validity of the interpretation of the Universal Screener scores.
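Applying the rules of thumb to the reported correlations is a simple lookup. The sketch below is illustrative; the function name is hypothetical, and because the published bands leave values such as exactly .50 unassigned, it treats each band's lower bound as inclusive, which is an assumption.

```python
def validity_rating(correlation):
    """Label a validity correlation using the rules of thumb cited above.

    Lower bounds are treated as inclusive (an assumption of this sketch).
    """
    if correlation >= 0.50:
        return "excellent"
    if correlation >= 0.40:
        return "good"
    if correlation >= 0.30:
        return "acceptable"
    return "poor"
```

By these bands the reported mathematics correlation of 0.46 rates as good and the reading correlation of 0.63 rates as excellent.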


Classification Analyses

In addition to the reliability and validity of the measures, the Universal Screeners were also evaluated with regard to the accuracy of classifying students as at-risk in comparison to an independent measure. It is important that the screeners are able to appropriately identify students who are at-risk and those who are not at-risk. In particular, it is critical that at-risk students are properly identified as being at-risk to get the instructional help that they need.

In order to evaluate the classification accuracy, Classworks Universal Screeners classifications were compared to the classifications determined by performance on high-stakes state assessments in reading and math. The comparisons provided a classification of students into one of four cells in a “confusion matrix.” Students could be classified as at-risk or not at-risk based on the passing status for each of the two assessments as Pass-Pass, Pass-Fail, Fail-Pass, or Fail-Fail. The classification analyses were performed by evaluating sensitivity and specificity.

Negative predictive power is a measure that estimates the accuracy of classifying students as “not at-risk.” A useful screening tool should have very high negative predictive power such that at-risk students are not misidentified as not being at-risk. Using test data for more than 11,300 students, the Universal Screeners were found to have 93% and 97% negative predictive power for math and reading, respectively.
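The three classification measures named above follow directly from the four cells of the confusion matrix. The sketch below treats "at-risk" as the positive class, with the high-stakes state assessment as the reference measure. The function name and the cell counts in the usage note are hypothetical; the report gives only the resulting negative predictive power, not the underlying counts.

```python
def classification_metrics(fail_fail, fail_pass, pass_fail, pass_pass):
    """Sensitivity, specificity, and negative predictive power.

    Each argument names the screener result first and the state test
    result second. "Fail" on either measure means at-risk.
    """
    sensitivity = fail_fail / (fail_fail + pass_fail)  # at-risk students caught
    specificity = pass_pass / (pass_pass + fail_pass)  # not at-risk students cleared
    negative_predictive_power = pass_pass / (pass_pass + pass_fail)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "negative_predictive_power": negative_predictive_power,
    }
```

With hypothetical counts of 80 Fail-Fail, 40 Fail-Pass, 20 Pass-Fail, and 860 Pass-Pass, negative predictive power is 860 / 880, about 98%: of the students the screener clears, that share also pass the state test.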


Classworks Universal Screeners Update Technical Report

This document provides a summary of the tasks completed in updating the Classworks Universal Screeners for reading and math in grades K–High School.

August 2016

© SEG Measurement.

Contents

Overview
    Goals and Constraints
    Tasks
        Investigating strands and expectations of other assessments
        Finalizing plans for the updates to be made to the Universal Screeners
        Selecting items for replacement
        Developing new items
        Producing new items
        Creating field test forms and administering the field test
        Analyzing field test data
        Evaluating final test forms and scoring


Overview

This document provides supporting information regarding the updates to the Classworks Universal Screeners in Reading and Math for grades K–10 that will be in place officially for the 2016–2017 school year. This document provides the final plans that were executed between March 2015 and July 2016 and provides statistical information regarding the items and forms.

Goals and Constraints

The goals for this project were to modify the Universal Screeners to be more reflective of the latest multiple choice items and expectations of students in K–12 education, while at the same time keeping the Classworks Screeners consistent with the current forms. It was agreed that this project would include the development of 125 new Reading and 125 new Math items for use in the new Universal Screener forms. Further goals and guidelines are noted below.

• All items will be four-choice multiple choice with a single correct answer.
• There will be no audio or video passages associated with the items.
    o Grades K–2 forms will have text-to-speech support. Any new items written will get this applied so the entire field test form has this support.
    o There may be consideration for a passage to be a video or audio clip, if it can be fully owned and hosted (so as to avoid links expiring) and if the delivery can support it.
• The test lengths will remain consistent with the current test lengths of scoreable content (field test lengths will be longer).
• The majority of the items on the current screeners will remain on the new screeners.
• The new content should be as seamless as possible with the current content.
• As with the current forms, any strand with at least 4 items will be considered a key strand and will be linked to instructional content.
• All of the items must align to the current Classworks content hierarchy (subject/grade/strand/skill/objective – listed in the Appendix).
    o There will be no changes to, combinations of, or additions to the strands, skills, or objectives.
• Field test forms will include the entire current form plus new items for field testing. Scoring during the field test time period will continue to be based on the scored items on the current forms.
• Reporting on student performance on the final new Screeners will need to be comparable to historical performance on prior forms, whether by using the same scale or providing a translation of new to old scoring for comparison.
• The updates to the forms should be as seamless as possible.

Tasks

The following key tasks involved in updating the Universal Screeners are summarized in this report.

1. Investigating strands and expectations of other assessments
2. Finalizing plans for the updates to be made to the Universal Screeners
3. Selecting items for replacement
4. Developing new items
5. Producing new items
6. Creating field test forms and administering the field test
7. Analyzing field test data
8. Evaluating final test forms and scoring

Investigating strands and expectations of other assessments

As part of the initial planning stages, many assessments and standards were reviewed to gather information on the latest expectations of students in reading and math. This was to help meet the goal that the changes to the Universal Screeners would help to bring the forms more in line with expectations of other common assessments and standards.

The Common Core Reading/ELA strands are summarized in the following table.

Table 1: Common Core Reading/ELA Strands

Area | Strand | Grades marked Y (columns K, 1–10)
Reading | Literature - key ideas and details | K–10
Reading | Literature - craft and structure | K–10
Reading | Literature - integration of knowledge and ideas | K–10
Reading | Literature - range of reading and level of text complexity | K–10
Reading | Inf. Text - key ideas and details | K–10
Reading | Inf. Text - craft and structure | K–10
Reading | Inf. Text - integration of knowledge and ideas | K–10
Reading | Inf. Text - range of reading and level of text complexity | K–10
Reading | Foundational skills - print concepts | K–5
Reading | Foundational skills - phonological awareness | K–5
Reading | Foundational skills - phonics and word recognition | K–5
Reading | Foundational skills - fluency | K–5
Language | Conventions of Standard English | K–10
Language | Knowledge of Language | 2–10
Language | Vocabulary Acquisition and Use | K–10
Writing | Text types and purposes | K–10
Writing | Production and Distribution of Writing | K–10
Writing | Research to build and present knowledge | K–10
Writing | Range of writing | 3–10
Speaking and Listening | Comprehension and Collaboration | K–10
Speaking and Listening | Presentation of Knowledge and Ideas | K–10
Literacy in History/Social Studies, Science, & Technical Subjects | Key ideas and details | 6–10
Literacy in History/Social Studies, Science, & Technical Subjects | Craft and structure | 6–10
Literacy in History/Social Studies, Science, & Technical Subjects | Integration of knowledge and ideas | 6–10
Literacy in History/Social Studies, Science, & Technical Subjects | Range of reading and level of text complexity | 6–10

The Georgia Milestone Assessment tests in grades 3–8 include the following high level skills for ELA:

• Reading and Vocabulary
• Writing and Language

The National Assessment of Educational Progress (NAEP) for Reading includes the following skills:

• Literary and Informational text
    o Locate and recall
    o Integrate and interpret
    o Critique and evaluate
    o Vocabulary

The Common Core Math strands are summarized in the following table.

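Table 2: Common Core Mathematics Strands

Columns: grades K, 1, 2, 3, 4, 5, 6, 7, 8, followed by HS - Number & Quantity, HS - Algebra, HS - Functions, HS - Geometry, and HS - Stats & Probability.

Grade/Course | Strand | Column(s) marked Y
K-8 | Counting and Cardinality | K
K-8 | Operations and Algebraic Thinking | K–5
K-8 | Number and Operations in Base 10 | K–5
K-8 | Number and Operations - Fractions | 3–5
K-8 | Measurement and Data | K–5
K-8 | Geometry | K–8
K-8 | Ratios and Proportions | 6–7
K-8 | The Number System | 6–8
K-8 | Expressions and Equations | 6–8
K-8 | Functions | 8
K-8 | Statistics and Probability | 6–8
HS Number and Quantity | The Real Number System | HS - Number & Quantity
HS Number and Quantity | Quantities | HS - Number & Quantity
HS Number and Quantity | Complex Number System | HS - Number & Quantity
HS Number and Quantity | Vector and Matrix Quantities | HS - Number & Quantity
HS Algebra | Seeing Structure in Expressions | HS - Algebra
HS Algebra | Arithmetic with Polynomials and Rational Expressions | HS - Algebra
HS Algebra | Creating Equations | HS - Algebra
HS Algebra | Reasoning with Equations and Inequalities | HS - Algebra
HS Functions | Interpreting Functions | HS - Functions
HS Functions | Building Functions | HS - Functions
HS Functions | Linear, Quadratic, and Exponential Models | HS - Functions
HS Functions | Trigonometric Functions | HS - Functions
HS Geometry | Congruence | HS - Geometry
HS Geometry | Similarity, Right Triangles, and Trig | HS - Geometry
HS Geometry | Circles | HS - Geometry
HS Geometry | Expressing Geometric Properties with Equations | HS - Geometry
HS Geometry | Geometric Measurement and Dimension | HS - Geometry
HS Geometry | Modeling with Geometry | HS - Geometry
HS Statistics and Probability | Interpreting Categorical and Quantitative Data | HS - Stats & Probability
HS Statistics and Probability | Making Inferences and Justifying Conclusions | HS - Stats & Probability
HS Statistics and Probability | Conditional Probability and Rules of Probability | HS - Stats & Probability
HS Statistics and Probability | Using Probability to Make Decisions | HS - Stats & Probability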

The Georgia Milestone Assessment tests in grades 3–8 include the following strands for Math:

• Operations and Algebraic Thinking: Grades 3–5
• Number and Operations: Grade 3
• Number and Operations in Base 10: Grades 4–5
• Number and Operations - Fractions: Grades 4–5
• Measurement and Data: Grades 3–5
• Geometry: Grades 3–8
• The Number System: Grades 6–7
• Ratios and Proportions: Grades 6–7
• Statistics and Probability: Grades 6–8
• Numbers, Expressions, and Equations: Grade 8
• Expressions and Equations: Grades 6–7
• Algebra and Functions: Grade 8


The National Assessment of Educational Progress (NAEP) mathematics assessment covers the following strands:

• Algebra
• Number properties and operations
• Measurement
• Geometry
• Data analysis, statistics and probability

Finalizing plans for the updates to be made to the Universal Screeners

We made recommendations for changes to the strands (particulars on consolidating, renaming, adding, or expanding). After internal review of the impact on the system and the benefits of making the changes, Curriculum Advantage determined that the strands will remain consistent between the current Universal Screener forms and the new Universal Screener forms. Rather than changing the strands, the focus is on increasing the quality of the items included within the strands.

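Table 3: Classworks Reading/ELA Strands and Current Universal Screener Coverage

Strand columns, in order: Grammar/Usage/Mechanics; Reading; Study Skills; Word Analysis; Writing; Writing Process; not covered - Listening/Speaking/Viewing. Item counts are listed left to right in column order; empty cells are omitted.

Grade   Item counts by strand   Grand Total
K       9, 1, 5                 15
1       2, 10, 7, 1             20
2       3, 12, 1, 8, 1          25
3       7, 9, 3, 4, 1, 1        25
4       6, 10, 2, 5, 1, 1       25
5       7, 10, 3, 7, 1, 2       30
6       7, 11, 3, 6, 1, 2       30
7       7, 11, 4, 6, 2          30
8       8, 11, 2, 6, 3          30
9       8, 13, 3, 4, 2          30
10      9, 13, 5, 3             30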


Table 4: Classworks Mathematics Strands and Current Universal Screener Coverage

Strand columns, in order: Algebra; Concepts of Calculus; Geometry; Mathematical Processes; Measurement; Numeration; Operations; Patterns; Statistics and Probability; Trigonometry. Item counts are listed left to right in column order; empty cells are omitted.

Grade   Item counts by strand         Grand Total
K       2, 3, 5, 2, 2, 1              15
1       1, 5, 4, 4, 4, 1, 1           20
2       2, 4, 3, 5, 4, 2, 2, 3        25
3       2, 2, 2, 6, 1, 5, 2, 5        25
4       1, 5, 1, 3, 4, 4, 1, 6        25
5       5, 8, 4, 4, 1, 2, 1, 5        30
6       6, 1, 8, 3, 1, 3, 3, 1, 4     30
7       6, 2, 6, 2, 4, 2, 2, 1, 5     30
8       8, 1, 8, 1, 4, 1, 2, 1, 4     30
9       8, 7, 5, 2, 1, 1, 1, 4, 1     30
10      8, 6, 5, 4, 1, 1, 4, 1        30

The following decisions were made in conjunction with Curriculum Advantage with regard to the updates to the Universal Screeners:

o Each form would have approximately 20% of the form replaced with new items that align to the current Classworks objectives. (The objectives for each grade and subject were gathered through the Classworks item bank and included in Appendix A.)
o Items will be considered for replacement with a new item based on the quality of the current item, the importance of the objective measured, and the ability of the item to measure on-grade readiness.
o Items that are replaced may be replaced with a new item measuring the same objective, a different objective within the same strand, or an objective in a different strand that is in need of more coverage.
o In both reading and math, all new items will be single best answer multiple choice items.
o Items may be associated with one or more passages or images. Some items may need to be administered together in sequence as a set (i.e., a group of items that are all associated with the same passage(s)).
o Items will all be independent (not relate to or build on each other), even if they relate to the same passage or stimulus.
o New items may be used on multiple forms across or within grades (to follow similar overlap of current forms), but duplicate usage will only count as one item out of the 125 that will be developed.
o The field test forms will contain the entire current scoreable forms plus additional non-scored items for field testing. This will allow for the field test forms to continue to serve as live operational forms during the 2015-2016 school year.
o The new forms will maintain the current grade level coverage of the forms.
o All of the new items will be field tested to gather data.
o A linked form design with shared items will be used so that the entire pool of new items within a subject can be calibrated with the current pool.
o After the field test, the items and planned final forms will be evaluated.
o Scoring and comparability to the current forms will be evaluated to determine whether changes are warranted.

Table 5 shows the planned number of items developed for each grade (roughly 20% of each form). The actual item development matched these plans. Tables 6 and 7 show the breakdown of grade level coverage for each form, which remains consistent from the current scoreable items to the new scoreable items (after field testing).

Table 5: Number of New Items Per Form

Grade   Reading                          Math
K       3                                3
1       3                                3
2       5                                5
3       7                                7
4       6                                6
5       7                                7
6       6                                6
7       7                                7
8       6 on one form, 7 on the other    6 on one form, 7 on the other
9       6                                6
10      6                                6
Total   125                              125
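The totals in Table 5 can be checked with a few lines of arithmetic. The sketch below is only a consistency check, assuming two forms (A and B) per grade and the scored form lengths reported in the Grand Total columns of Tables 3 and 4; the variable names are illustrative, and the order of the 6 and 7 for Grade 8 is arbitrary because the table does not say which form has which count.

```python
# Planned new items per form (form A, form B), taken from Table 5.
# The same counts apply to Reading and to Math.
new_items_per_form = {
    "K": (3, 3), "1": (3, 3), "2": (5, 5), "3": (7, 7), "4": (6, 6),
    "5": (7, 7), "6": (6, 6), "7": (7, 7),
    "8": (6, 7),  # "6 on one form, 7 on the other"; order is an assumption
    "9": (6, 6), "10": (6, 6),
}

# Scored items per form, taken from the Grand Total columns of Tables 3 and 4.
scored_items_per_form = {
    "K": 15, "1": 20, "2": 25, "3": 25, "4": 25,
    "5": 30, "6": 30, "7": 30, "8": 30, "9": 30, "10": 30,
}

# Total new items for one subject across both forms of every grade.
total_new = sum(a + b for a, b in new_items_per_form.values())

# Total scored items for one subject across both forms of every grade.
total_scored = sum(2 * n for n in scored_items_per_form.values())

# Overall share of scored items that are planned replacements.
overall_share = total_new / total_scored

print(f"new items per subject: {total_new}")
print(f"scored items per subject: {total_scored}")
print(f"overall replacement share: {overall_share:.1%}")
```

Under these assumptions the per-form counts sum to the 125 items per subject given in the Total row, and the overall share lands close to the stated "roughly 20%", although individual grades vary around that figure.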

Table 6: Item Grade Level Coverage on Reading Screeners

Item grade level columns in the original table: K, 1, 2, 3, 4, 5, 6, 7, 8, HS. Counts are listed from lowest to highest item grade level; N is the total number of items on the form.

READING
Screener                      Form   Item counts   N
Grade K Reading Screener      A      15            15
                              B      15            15
Grade 1 Reading Screener      A      8, 12         20
                              B      8, 12         20
Grade 2 Reading Screener      A      5, 7, 13      25
                              B      5, 7, 13      25
Grade 3 Reading Screener      A      7, 7, 11      25
                              B      7, 7, 11      25
Grade 4 Reading Screener      A      6, 7, 12      25
                              B      6, 7, 12      25
Grade 5 Reading Screener      A      7, 8, 15      30
                              B      7, 8, 15      30
Grade 6 Reading Screener      A      7, 8, 15      30
                              B      7, 8, 15      30
Grade 7 Reading Screener      A      7, 8, 15      30
                              B      7, 8, 15      30
Grade 8 Reading Screener      A      7, 8, 15      30
                              B      7, 8, 15      30
Grade 9 Reading Screener      A      7, 8, 15      30
                              B      7, 10, 13     30
Grade 10 Reading Screener     A      2, 8, 20      30
                              B      2, 8, 20      30

Table 7: Item Grade Level Coverage on Math Screeners

Item grade level columns in the original table: K, 1, 2, 3, 4, 5, 6, 7, 8, HS. Counts are listed from lowest to highest item grade level; N is the total number of items on the form.

MATH
Screener                   Form   Item counts   N
Grade K Math Screener      A      15            15
                           B      15            15
Grade 1 Math Screener      A      8, 12         20
                           B      8, 12         20
Grade 2 Math Screener      A      5, 7, 13      25
                           B      5, 7, 13      25
Grade 3 Math Screener      A      6, 7, 12      25
                           B      6, 7, 12      25
Grade 4 Math Screener      A      6, 7, 12      25
                           B      6, 7, 12      25
Grade 5 Math Screener      A      7, 8, 15      30
                           B      7, 8, 15      30
Grade 6 Math Screener      A      7, 8, 15      30
                           B      7, 8, 15      30
Grade 7 Math Screener      A      4, 8, 18      30
                           B      4, 8, 18      30
Grade 8 Math Screener      A      7, 8, 15      30
                           B      7, 8, 15      30
Grade 9 Math Screener      A      2, 13, 15     30
                           B      2, 13, 15     30
Grade 10 Math Screener     A      2, 8, 20      30
                           B      2, 8, 20      30

Selecting items for replacement

Each current form was exported from the Classworks system into a separate Word document in preparation for review and update. For each form, the plans for numbers of items to be replaced and the blueprint for the form were noted in the document. Each current form was reviewed by experts to identify the specific items that would provide the most value by being removed from the form and replaced with a new item. The items were reviewed along multiple facets of quality including the general quality of the item, reflection of current expectations of the skill, importance and relevance of the item, and how well the item measures the objective within the skill/strand.

For each form, the items to be replaced were identified and item writing assignments were developed. In many cases, the current item was directly replaced with a new item that better measured the same objective within the strand. In some cases, it was determined that a different objective should be covered within the strand to better cover the focus of the particular strand.

Developing new items

After the items to be replaced and the item needs were identified, item development began. Test development experts in math and reading developed the new items to meet the item specifications. New items were written to maintain the current style of the Universal Screeners while also representing newer ways of measuring the objectives.


The draft items were reviewed and edited for style, grammar, content accuracy, appropriateness, and perceived psychometric quality. The final 250 new items were then prepared for online production into the Classworks system.

Appendix B contains the alignment information for each of the new items.

Producing new items

Once approved internally, we individually entered the items into the Classworks item bank database. Each item was coded with its subject, grade level, strand, and skill as per the requirements of the system. The system generated a unique assessment system ID number for each item. The correct answer was identified and artwork was uploaded. The items were reviewed for proper rendering on the platform.

After the items passed through the internal production review, items were released to Curriculum Advantage for review by their content experts. In addition to the items being available online, details about the items were sent externally to assist in review and tracking. After review by Curriculum Advantage, the items were finalized by SEG and approved in the system by Curriculum Advantage content experts.

Once the items were approved in our local item bank in Classworks, Curriculum Advantage programmers worked to port the items into the official Classworks item bank and activate the items for use on the field test forms. During this process, the item IDs were slightly modified to ensure that they were unique within the Classworks item bank while still allowing tracking against the original IDs assigned when the items were created. All of the new items in the official bank are in the 15,000s. For example, item ID 8 in the local bank is now 15008 and item ID 168 is now 15168.
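The two examples above are consistent with adding a fixed offset of 15,000 to each local ID. The sketch below illustrates that mapping under that assumption; the function names, the offset constant, and the 0 to 999 range check are hypothetical and are not taken from the Classworks system.

```python
# Assumption: official IDs are the local ID plus a fixed offset of 15000,
# inferred from the two examples in the text (8 -> 15008, 168 -> 15168).
OFFICIAL_ID_OFFSET = 15000

# Assumption: local IDs fall in 0..999 so that official IDs stay "in the 15,000s".
MAX_LOCAL_ID = 999


def to_official_id(local_id: int) -> int:
    """Map a local item bank ID to its official item bank ID."""
    if not 0 <= local_id <= MAX_LOCAL_ID:
        raise ValueError(f"local ID {local_id} is outside 0..{MAX_LOCAL_ID}")
    return OFFICIAL_ID_OFFSET + local_id


def to_local_id(official_id: int) -> int:
    """Recover the original local ID from an official item bank ID."""
    local_id = official_id - OFFICIAL_ID_OFFSET
    if not 0 <= local_id <= MAX_LOCAL_ID:
        raise ValueError(f"official ID {official_id} is not in the 15,000s")
    return local_id


print(to_official_id(8))    # local item 8
print(to_official_id(168))  # local item 168
print(to_local_id(15168))   # back to the local ID
```

Keeping the mapping reversible is what allows the official IDs to be traced back to the IDs assigned when the items were written.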

Creating field test forms and administering the field test

In order to allow for continued production use of the Universal Screeners while also field testing the new items, the field test forms were developed to include the entire set of scoreable items on the current form as well as additional items for field testing that did not count towards the student's score. The field test items included items that would end up being on that official form as well as other linking items that would be dropped from the form. Items were placed strategically across forms so that all of the forms would be linked and so that students who were on grade, above grade, and below grade were exposed to the items. The following two tables summarize the plans for the field test forms. Appendix C contains the item level details on the field test forms.


Table 8: Field Test Plans for Reading/ELA

Columns:
(1) Number of scored items on current form
(2) Number of non-scored items on current forms (these items will be dropped for new field test forms)
(3) Current total number of items (scored and non-scored)
(4) Number of item replacements (new field test items that will eventually replace current scored items)
(5) Number of linking items (additional non-scored linking items for field testing and calibration)
(6) New field test length (scored and non-scored items)
(7) New planned screener final test length (all scored items only, same number as current form scored item count)

Form                            (1)  (2)  (3)  (4)  (5)  (6)  (7)
Grade K Reading Screener A       15    5   20    3    3   21   15
Grade K Reading Screener B       15    5   20    3    3   21   15
Grade 1 Reading Screener A       20    5   25    3    3   26   20
Grade 1 Reading Screener B       20    5   25    4    2   26   20
Grade 2 Reading Screener A       25    5   30    5    3   33   25
Grade 2 Reading Screener B       25    5   30    5    3   33   25
Grade 3 Reading Screener A       25    5   30    7    3   35   25
Grade 3 Reading Screener B       25    5   30    7    3   35   25
Grade 4 Reading Screener A       25    5   30    6    4   35   25
Grade 4 Reading Screener B       25    5   30    6    4   35   25
Grade 5 Reading Screener A       30    5   35    7    2   39   30
Grade 5 Reading Screener B       30    5   35    7    2   39   30
Grade 6 Reading Screener A       30    5   35    6    3   39   30
Grade 6 Reading Screener B       30    5   35    7    2   39   30
Grade 7 Reading Screener A       30    5   35    7    2   39   30
Grade 7 Reading Screener B       30    5   35    7    2   39   30
Grade 8 Reading Screener A       30    5   35    6    3   39   30
Grade 8 Reading Screener B       30    5   35    7    2   39   30
Grade 9 Reading Screener A       30    5   35    7    2   39   30
Grade 9 Reading Screener B       30    5   35    7    2   39   30
Grade 10 Reading Screener A      30    5   35    7    2   39   30
Grade 10 Reading Screener B      30    5   35    7    2   39   30


Table 9: Field Test Plans for Math

Columns:
(1) Number of scored items on current form
(2) Number of non-scored items on current forms (these items will be dropped for new field test forms)
(3) Current total number of items (scored and non-scored)
(4) Number of item replacements (new field test items that will eventually replace current scored items)
(5) Number of linking items (additional non-scored linking items for field testing and calibration)
(6) New field test length (scored and non-scored items)
(7) New planned screener final test length (all scored items only, same number as current form scored item count)

Form                         (1)  (2)  (3)  (4)  (5)  (6)  (7)
Grade K Math Screener A       15    5   20    3    3   21   15
Grade K Math Screener B       15    5   20    3    3   21   15
Grade 1 Math Screener A       20    5   25    3    3   26   20
Grade 1 Math Screener B       20    5   25    3    3   26   20
Grade 2 Math Screener A       25    5   30    5    3   33   25
Grade 2 Math Screener B       25    5   30    5    3   33   25
Grade 3 Math Screener A       25    5   30    7    3   35   25
Grade 3 Math Screener B       25    5   30    7    3   35   25
Grade 4 Math Screener A       25    5   30    6    4   35   25
Grade 4 Math Screener B       25    5   30    6    4   35   25
Grade 5 Math Screener A       30    5   35    7    2   39   30
Grade 5 Math Screener B       30    5   35    7    2   39   30
Grade 6 Math Screener A       30    5   35    6    3   39   30
Grade 6 Math Screener B       30    5   35    6    3   39   30
Grade 7 Math Screener A       30    5   35    7    2   39   30
Grade 7 Math Screener B       30    5   35    7    2   39   30
Grade 8 Math Screener A       30    5   35    6    3   39   30
Grade 8 Math Screener B       30    5   35    7    2   39   30
Grade 9 Math Screener A       30    5   35    6    3   39   30
Grade 9 Math Screener B       30    5   35    7    2   39   30
Grade 10 Math Screener A      30    5   35    7    2   39   30
Grade 10 Math Screener B      30    5   35    9   0*   39   30

*Grade 10 B form already has new field test items that are also on other forms/grades.
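The column descriptions in the field test tables imply a simple relationship: the field test length appears to equal the scored items plus the replacement items plus the linking items, while the final form keeps the scored item count. The sketch below encodes that reading for a few rows; the class and field names are illustrative only, and the formula is an inference from the table values rather than something the report states explicitly.

```python
from typing import NamedTuple


class FieldTestPlan(NamedTuple):
    """One row of a field test plan table (names are hypothetical)."""
    scored: int        # scored items on the current form
    non_scored: int    # current non-scored items, dropped for the field test
    replacements: int  # new field test items that will replace scored items
    linking: int       # additional non-scored linking items

    @property
    def current_total(self) -> int:
        # Scored plus non-scored items on the current form.
        return self.scored + self.non_scored

    @property
    def field_test_length(self) -> int:
        # Inferred: current scored items plus all new non-scored field test items.
        return self.scored + self.replacements + self.linking

    @property
    def final_length(self) -> int:
        # Replacements swap in one-for-one, so the scored count is unchanged.
        return self.scored


# A few rows copied from the field test plan tables.
grade_k_a = FieldTestPlan(scored=15, non_scored=5, replacements=3, linking=3)
grade_1_reading_b = FieldTestPlan(scored=20, non_scored=5, replacements=4, linking=2)
grade_4_a = FieldTestPlan(scored=25, non_scored=5, replacements=6, linking=4)
grade_10_math_b = FieldTestPlan(scored=30, non_scored=5, replacements=9, linking=0)

for name, plan in [("K A", grade_k_a), ("1 B reading", grade_1_reading_b),
                   ("4 A", grade_4_a), ("10 B math", grade_10_math_b)]:
    print(name, plan.current_total, plan.field_test_length, plan.final_length)
```

The Grade 10 Math B row shows why the replacement and linking columns trade off: with nine new items already shared with other forms, no extra linking items were needed to reach the same field test length.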


The field test forms were administered during the 2015-2016 school year as part of operational Classworks usage until sufficient data was collected for each form. Curriculum Advantage exported the field test data for analysis in June 2016.

Analyzing field test data

SEG prepared the field test data for analyses for multiple purposes: evaluating the item quality of the new items, evaluating the item quality of the current items that will remain on the forms, calibrating the new items into the current pools of active items, evaluating the difficulty of the test forms, and reviewing the vertical scaling across the forms.

The items were reviewed first in terms of the percentage of students answering correctly. Any item answered correctly by fewer than 25 percent of students was reviewed for accuracy. Items that very few students answer correctly may simply be hard items, or they may be items that were miskeyed, did not render properly (particularly where images or graphs were required), or possibly had multiple correct answers. The point biserials were also reviewed for each item. The point biserial provides a measure of the relationship between performance on the item and performance on the form. All of the new items were determined to be functioning acceptably and no modifications or replacements were warranted. A small number of current items were flagged for content review internally at Curriculum Advantage for potential modification to improve the performance of the items. The items flagged for further review were items 8942, 13064, and 14628.
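The two statistics described above can be sketched in a few lines. This is a minimal illustration, not SEG's analysis code: the response data are made up, the function names are hypothetical, and the point biserial here is the uncorrected version (the item is included in the total score), which the report does not specify.

```python
from math import sqrt


def proportion_correct(item_scores):
    """Share of students who answered the item correctly (scores are 0 or 1)."""
    return sum(item_scores) / len(item_scores)


def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total form score."""
    n = len(item_scores)
    mean_item = sum(item_scores) / n
    mean_total = sum(total_scores) / n
    cov = sum((x - mean_item) * (y - mean_total)
              for x, y in zip(item_scores, total_scores))
    var_item = sum((x - mean_item) ** 2 for x in item_scores)
    var_total = sum((y - mean_total) ** 2 for y in total_scores)
    if var_item == 0 or var_total == 0:
        return float("nan")  # undefined when everyone has the same score
    return cov / sqrt(var_item * var_total)


def item_statistics(responses, low_p_threshold=0.25):
    """Compute per-item statistics from a students-by-items matrix of 0/1 scores."""
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        p = proportion_correct(item)
        stats.append({
            "item": j,
            "p": p,
            "point_biserial": point_biserial(item, totals),
            "flag_low_p": p < low_p_threshold,  # fewer than 25 percent correct
        })
    return stats


# Made-up data: 5 students (rows) by 3 items (columns).
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
]
stats = item_statistics(responses)
flagged = [s["item"] for s in stats if s["flag_low_p"]]
print(flagged)
```

In this toy data the third item is answered correctly by one student in five and also correlates negatively with the total score, which is the pattern that would prompt a check for a miskeyed answer or a rendering problem.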

The detailed item statistics are provided in Appendix D. The forms were reviewed to compare the overall difficulty of the planned new forms with the difficulty of the current forms. The new forms were found to be very consistent with the current forms as shown in Appendix E. These similarities were expected based on the final design and scope of the updates to the items on the forms.

Evaluating final test forms and scoring

After the field test data was evaluated and the definitions (item composition) of the new forms were confirmed, we evaluated the new forms to determine whether any changes to the scoring or use of the data would be warranted.

Using the data collected during the field testing, we calculated the estimated reliability of the new forms (including those items that will be scoreable on the final new forms). Reliability can be thought of as a measure of the consistency, stability, and accuracy of the scoring. Tests with high reliability will produce similar scores for students if they were to retake the test without further instruction or time passing. Overall, the reliabilities for the new Universal Screeners are very strong. At the tails where there are fewer students taking the forms (specifically 10th grade math), the reliabilities are a bit lower. The reliabilities are affected by the distribution of the scores and the students who took the test forms. It is expected that, with additional test takers and more consistent usage of the Screeners for those forms, reliability would improve for the forms where it is currently a bit weaker than for other forms.
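The report does not say which reliability coefficient was calculated. As an illustration only, the sketch below computes KR-20, a common internal-consistency coefficient for tests made up of right/wrong items; the choice of KR-20, the function name, and the data are all assumptions.

```python
def kr20(responses):
    """KR-20 reliability for a students-by-items matrix of 0/1 scores.

    KR-20 = (k / (k - 1)) * (1 - sum(p_i * q_i) / variance of total scores),
    where p_i is the proportion correct on item i and q_i = 1 - p_i.
    The population variance (dividing by n) is used for the total scores.
    """
    n = len(responses)
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")

    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("KR-20 is undefined when all total scores are equal")

    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1 - p)

    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Made-up data: 5 students by 4 items, ordered from lowest to highest scorer.
responses = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
print(round(kr20(responses), 2))
```

Because the coefficient depends on the variance of total scores, a form taken by a small or narrowly distributed group of students can show lower reliability even when its items are sound, which matches the report's explanation for the weaker values at the upper grades.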

Table 10: Form Reliability

Form   MATH   READING
KA     0.78   0.69
KB     0.88   0.86
1A     0.96   0.96
1B     0.77   0.82
2A     0.98   0.97
2B     0.87   0.97
3A     0.96   0.98
3B     0.97   0.97
4A     0.95   0.96
4B     0.91   0.89
5A     0.95   0.97
5B     0.93   0.93
6A     0.91   0.97
6B     0.92   0.94
7A     0.80   0.92
7B     0.89   0.93
8A     0.85   0.94
8B     0.85   0.93
9A     0.77   0.8
9B     0.51   0.75
10A    0.66   0.84
10B    0.48   0.82

The items were calibrated within subject across all grades and anchored to the current item pools. This was done to evaluate whether the items fit reasonably within the pool and whether changes to the vertical scaling were warranted. Given the consistency of the new forms with the current forms, it is recommended that the current scaling and reporting be continued. This will allow for longitudinal reporting in the system without changes to the system or increased complexity for teachers when they interpret the results and make decisions. The item level logit and fit data from the vertical scaling are included with the item level statistics in Appendix D.
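Reporting item difficulties in logits suggests a Rasch-type calibration, although this section does not name the model. Assuming a Rasch (one-parameter logistic) model, the sketch below shows how a student's ability and an item's difficulty, both on the same logit scale, combine into a probability of a correct answer; the function name and the example values are hypothetical.

```python
from math import exp


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch model.

    Both arguments are in logits on a common scale. When ability equals
    difficulty, the probability of a correct response is 0.5.
    """
    return 1.0 / (1.0 + exp(-(ability - difficulty)))


# Hypothetical item difficulty, anchored on the existing vertical scale.
item_difficulty = 0.5

# The same item is harder for a below-grade student than an above-grade one.
for ability in (-1.0, 0.5, 2.0):
    print(ability, round(rasch_probability(ability, item_difficulty), 3))
```

Anchoring new items to the current pool means their difficulties are expressed on this existing scale, so scores from the updated forms remain comparable with earlier results.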

The updated Universal Screener forms can be put into production as planned and can continue to be used as an integral component of the complete Classworks system.