Standardized Tests
Problems with Individually Administered Tests
  Time required to administer test
  Expense
  Need for trained examiners
  Unsuited for administration to large numbers of people
Group Intelligence Tests
  Robert M. Yerkes
  Army Alpha & Army Beta tests for WWI recruits (1917)
  These tests initiated mass testing
  Within a few years of the war’s end, mass testing moved to the schools
  8,000 students took the SAT when it was first administered in 1926
  Nearly 3 million take it annually now
Items from Army Beta Test
Group Tests of Intelligence: The Cognitive Abilities Test (COGAT)
  Latest revision is Form 6 (2001)
  Includes a kindergarten level, 2 levels for grades 1 & 2, and 8 levels (A to H) for grades 3 to 12
  Each level is printed in a separate booklet
Levels A to H
  Contain the same nine subtests, grouped into three batteries:
    Verbal
    Quantitative
    Nonverbal
  Each subtest preceded by practice exercises with detailed explanations
  Provides three separate scores: a verbal, quantitative & nonverbal score
  Scores have mean of 100, standard deviation of 16
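A score on a scale with mean 100 and standard deviation 16 can be read as a distance from the mean and, if scores are assumed to be normally distributed, as a percentile rank. The sketch below illustrates that reading only. The normality assumption and the function names are added here for illustration and are not taken from the test manual.

```python
from statistics import NormalDist

# Scale stated on the slide: mean 100, standard deviation 16.
# Treating scores as normally distributed is an assumption of this sketch.
SCALE = NormalDist(mu=100, sigma=16)

def z_score(standard_score: float) -> float:
    """Distance from the mean in standard-deviation units."""
    return (standard_score - SCALE.mean) / SCALE.stdev

def percentile_rank(standard_score: float) -> float:
    """Percentage of a normal population scoring at or below this score."""
    return 100 * SCALE.cdf(standard_score)

for score in (84, 100, 116, 132):
    print(score, round(z_score(score), 2), round(percentile_rank(score), 1))
```

Under these assumptions a score of 116 sits one standard deviation above the mean, at roughly the 84th percentile.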
Reliability & Validity
  Reliabilities in the .90s for each of the scores
  Good validity: correlates with other tests & school grades
  Correlates with scores in social studies, math, first grade reading, musical ability, even social status
Nonverbal Group Tests: Raven’s Standard Progressive Matrices
  Developed in UK by J.C. Raven (1938)
  Can be administered to individuals or groups aged 5 to elderly adult
  Consists of 60 matrices, each containing a logical pattern or design with a missing part, of increasing difficulty
Reliability & Validity
  Internal consistency studies using either the split-half method corrected for length or KR20 estimates result in values ranging from .60 to .98, with a median of .90
  Test-retest correlations range from a low of .46 for an eleven-year interval to a high of .97 for a two-day interval. The median test-retest value is approximately .82.
  Test-retest coefficients for several age groups: .88 (13 yrs. plus), .93 (under 30 yrs.), .88 (30-39 yrs.), .87 (40-49 yrs.), .83 (50 yrs. and over).
  Concurrent validity coefficients between the SPM and the Stanford-Binet and Wechsler scales range between .54 and .88, with the majority in the .70s and .80s.
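The two internal-consistency estimates named above can be sketched as follows. "Corrected for length" is read here as the Spearman-Brown formula, which projects the correlation between two half-tests up to the full test length, and KR-20 is computed for items scored right or wrong. The function names and the tiny response matrix are invented for illustration and are not Raven data. Population variance is used for the total scores, which is one common convention.

```python
from statistics import pvariance

def spearman_brown(half_test_r: float, length_factor: float = 2.0) -> float:
    """Projected reliability when test length is multiplied by length_factor."""
    return (length_factor * half_test_r) / (1 + (length_factor - 1) * half_test_r)

def kr20(responses: list[list[int]]) -> float:
    """KR-20 for dichotomous items; rows are examinees, columns are items (1 = correct)."""
    n_people = len(responses)
    n_items = len(responses[0])
    # Sum of p * q over items, where p is the proportion passing the item.
    sum_pq = 0.0
    for item in range(n_items):
        p = sum(row[item] for row in responses) / n_people
        sum_pq += p * (1 - p)
    # Population variance of total scores (assumed convention for this sketch).
    total_variance = pvariance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)

# Made-up responses: 5 examinees, 4 items ordered from easiest to hardest.
toy_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(round(spearman_brown(0.82), 2))
print(round(kr20(toy_responses), 2))
```

With a half-test correlation of .82, the Spearman-Brown projection for the full-length test is about .90, which shows why corrected split-half values run higher than the raw half-test correlation.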
Benefits of Using SPM
  Can be used without any verbal instructions with young children and with culturally deprived, language-handicapped, or brain-injured individuals
  Minimizes the effects of language & culture
  Differences between African Americans & Caucasians are smaller (7 or 8 points) with RPM than with SB or Wechsler scales
Goodenough-Harris Drawing Test
  Individual instructed to draw a picture of a whole man & do the best job possible
  Respondents given credit for each item included in drawings
  Each detail given 1 point (to a total of 70)
  Raw scores converted to standard scores with a mean of 100, s.d. of 15, using age norms
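The conversion from raw points to an age-normed standard score can be sketched as below. The norm values are HYPOTHETICAL placeholders, not the published Goodenough-Harris tables, and the linear z-score transformation is an assumption of this sketch; the actual test uses its published conversion tables.

```python
# HYPOTHETICAL norms for illustration only: age -> (mean raw score, SD of raw scores).
AGE_NORMS = {6: (18.0, 5.0), 8: (26.0, 6.0), 10: (34.0, 7.0)}

MAX_RAW_SCORE = 70  # one point per scorable detail, as stated on the slide

def standard_score(raw: int, age: int) -> int:
    """Convert a raw drawing score to a standard score (mean 100, s.d. 15) for the child's age."""
    if not 0 <= raw <= MAX_RAW_SCORE:
        raise ValueError("raw score must be between 0 and 70")
    mean_raw, sd_raw = AGE_NORMS[age]
    z = (raw - mean_raw) / sd_raw  # position relative to same-age children
    return round(100 + 15 * z)

print(standard_score(32, 8))   # one SD above the hypothetical age-8 mean
print(standard_score(27, 10))  # one SD below the hypothetical age-10 mean
```

The same raw score maps to different standard scores at different ages, which is the point of using age norms.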
Reliability & Validity
  Reliabilities (split-half, test-retest, inter-scorer) range from high .60s to low .90s
  Scores level off at ages 14 or 15, so can only be used with younger children
  Reasonable validity; correlation with standard IQ tests in one study was .81
Tests of Aptitude & Achievement
  Used in making decisions about admission to universities at the undergraduate level, graduate level, and to business & professional schools
  Referred to as “high stakes” tests because of the impact they have on people’s lives
The Scholastic Assessment Test
  Until 1995, known as the Scholastic Aptitude Test
  Has been in use since 1926
  Most widely used of university entrance tests
  Given to nearly 3 million students each year
  Newest form was introduced in March 2005, for entry into university in fall of 2006
  There is a Reasoning Test (general aptitude test) and Subject Tests in various subjects
Reasoning Test (formerly SAT-I)
  “The SAT Reasoning Test is a measure of the critical thinking skills you'll need for academic success in college. The SAT assesses how well you analyze and solve problems—skills you learned in school that you'll need in college.”
  Three sections:
    Critical reading
    Mathematics
    Writing
  Each section of the SAT is scored on a scale of 200-800, and the writing section generates two subscores.
  Administered seven times a year in the U.S., Puerto Rico, and U.S. Territories, and six times a year in other countries.
Critical Reading Section
  Reading comprehension, sentence completions, and paragraph-length critical reading
  Hoping to _______ the dispute, negotiators proposed a compromise that they felt would be _______ to both labor and management.
    (A) enforce . . useful
    (B) end . . divisive
    (C) overcome . . unattractive
    (D) extend . . satisfactory
    (E) resolve . . acceptable
Mathematics Section
  Content: Number and operations; algebra and functions; geometry; statistics, probability, and data analysis
  Item-types: Five-choice multiple-choice questions and student-produced responses
Writing Section
  Multiple choice questions (35 min.) and student-written essay (25 min.)
E.g., The following sentences test your ability to recognize grammar and usage errors. Each sentence contains either a single error or no error at all. No sentence contains more than one error. The error, if there is one, is underlined and lettered. If the sentence contains an error, select the one underlined part that must be changed to make the sentence correct. If the sentence is correct, select choice E. In choosing answers, follow the requirements of standard written English.
Example:
The other delegates (A) and him (B) immediately (C) accepted the resolution drafted (D) by the neutral states. No error (E)
Subject Tests (formerly SAT-II)
  Subject Tests are designed to measure students' knowledge and skills in particular subject areas, as well as their ability to apply that knowledge.
  Students take the Subject Tests to demonstrate to universities their mastery of specific subjects like English, history, mathematics, science, and language.
Reliability & Validity
  Studies of old SAT show high internal consistency (>.90), test-retest reliability (>.85 over 10 months)
  Predictive validity of test, using university grades as the criterion, is quite high
May 4, 2005
ON EDUCATION
SAT Essay Test Rewards Length and Ignores Errors
By MICHAEL WINERIP
http://www.nytimes.com/2005/05/04/education/04education.html?ei=5090&en=94808505ef7bed5a&ex=1272859200&partner=rssuserland&emc=rss&pagewanted=print&position=
Graduate Record Exam (GRE)
  One of the most commonly used tests for graduate-school entrance
  Used in combination with undergraduate grades, letters of recommendation in selecting students for graduate school
  General Test produces three scores:
    Verbal (GRE-V)
    Quantitative (GRE-Q)
    Analytic (GRE-A)
  Subject Tests in biology, chemistry, literature, psychology, etc.
  All scores have a mean of 500, standard deviation of 100
GRE Structure
  GRE (General)
    GRE-V
      Antonyms
      Analogies
      Sentence Completions
      Reading Comprehension
    GRE-Q
      Arithmetic
      Algebra
      Geometry
      Data analysis
    GRE-A
      Present your perspective
      Analyze an argument
Sample Questions
  See http://www.gre.org/
Reliability & Validity
  Stability (test-retest) & split-half reliability is good
  Predictive validity “far from convincing” (Kaplan & Saccuzzo, 2005, p. 330)
  Correlations between GRE and grade point average are low (.22 to .33 in one study, accounting for 5 to 10% of variance)
  High false negative rates
  When combined with undergraduate grades, correlated .63 with graduate grade point average
  See http://www.fairtest.org/facts/gre.htm
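The "5 to 10% of variance" figure follows from squaring the correlation coefficient: the proportion of variance in grade point average accounted for by a predictor is r squared. A minimal sketch of that arithmetic, using the correlations quoted above (the function name is made up for illustration):

```python
def variance_explained(r: float) -> float:
    """Percentage of criterion variance accounted for by a predictor with correlation r."""
    return 100 * r ** 2

for r in (0.22, 0.33, 0.63):
    print(f"r = {r:.2f} -> {variance_explained(r):.1f}% of variance")
```

The correlations of .22 and .33 correspond to about 4.8% and 10.9% of variance, while the combined correlation of .63 corresponds to about 39.7%, so most of the variation in graduate grades is still unexplained.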
High Stakes Tests in the Schools
  Several states in the US, Great Britain, New Zealand have implemented national testing programs
  Bill Clinton’s proposal in 1997 to implement nation-wide testing aroused considerable debate
  In 1999 National Academy of Sciences published report entitled “High Stakes: Testing for Tracking, Promotion & Graduation”
  Generally supported testing, but expressed concern that test results are commonly misinterpreted & misunderstanding of test results can damage individuals
Testing in Canada
  A number of provinces, including Alberta & Ontario, administer standardized ability tests to all students in their jurisdictions
  In Ontario, these tests are coordinated by the Education Quality & Accountability Office (EQAO)
  Budget for EQAO: approximately $50 million annually
The Ontario Secondary School Literacy Test (OSSLT)
  Given every fall to assess the reading and writing abilities of Grade 10 students
  Students must pass the OSSLT in order to obtain an Ontario Secondary School diploma
  Students who don’t pass can retake the test an unlimited number of times
  Their school transcript will only list whether or not they passed the OSSLT, not how many times they attempted the test.
OSSLT (continued)
  Reading: Students are given examples of different types of reading selections. They are then tested on their comprehension of what they have read.
  Writing: Students are required to write four different types of work
    A summary
    An opinion piece
    An information paragraph
    A news report
EQAO changes to standardized testing make them less disruptive but do not address the fundamental validity of the tests
September 23, 2004
(Toronto) - “The changes to standardized testing in Ontario’s schools announced by the Education Quality and Accountability Office (EQAO) today do not address the fundamental question posed by educators and parents as to whether the testing is in fact valid,” said Rhonda Kimberley-Young, president of the Ontario Secondary School Teachers’ Federation.
“These changes will mean that these intrusive tests will not disrupt the learning of students to the same degree as they have until now, but simply making the tests shorter and changing how the results are reported does not mean that the testing is any way a valid measure of student achievement.
“Teachers and educational workers believe the Ontario government should now take the next logical step and immediately conduct a validity study of the standardized testing taking place in Ontario schools.
“The EQAO and the testing it is conducting is a multi million dollar expense. At a time when financial resources for schools and students are stretched, OSSTF believes these education dollars would be far better spent on meeting the educational needs of students,” concluded Kimberley-Young.
OSSTF Position on Grade 10
  The EQAO Grade 10 literacy test is not a fair measure.
    The test is not administered consistently across the province. It is impossible to standardize preparation and administration conditions in a standardized test.
    According to Alfie Kohn, who crusades against standardized tests in the United States, socioeconomic status accounts for "an overwhelming proportion of the variance in test scores".
    Time is taken away from the regular curriculum in preparing for the test. Student anxiety affects learning in other areas.
OSSTF Criticism (cont’d)
  The EQAO Grade 10 literacy test is not a valid measure of student reading and writing.
    The test is very heavily weighted to writing.
    Students need over 60% in BOTH reading and writing to pass.
    No marked tests will be returned. Students who fail receive limited, vague feedback.
    There are very few funds or opportunities to provide help to students who perform poorly or fail.
    Instructions for questions are unclear. On a question which asked for one paragraph, students who wrote more than one paragraph failed the question because they did not follow the instructions exactly.
    EQAO is secretive and will reveal neither the marking criteria nor what constitutes a pass.
OSSTF Criticisms (cont’d)
  Cost of administering the tests
    The cost of last year’s literacy test was $15 million at the same time as there were textbook shortages, and cuts to library, music, guidance, educational assistants and support staff.
Canadian Teachers’ Federation
  High stakes testing
    Encourages “teaching to the test”
    Creates a situation in which students struggling with the material or who have special needs are seen as a liability because their low scores influence averages
    Squeezes “non-tested” subjects out of the curriculum
    Is frequently biased against certain groups of students
    Perpetuates the idea that a good education equals high test scores
    Transfers control over curriculum to the body that controls the exam
Not long ago, a widely respected middle-school teacher in Wisconsin, famous for helping students design their own innovative learning projects, stood up at a community meeting and announced that he "used to be" a good teacher. The auditorium fell silent at his use of the past tense. These days, he explained, he just handed out textbooks and quizzed his students on what they had memorized. The reason was very simple. He and his colleagues were increasingly being held accountable for raising test scores. The kind of wide-ranging and enthusiastic exploration of ideas that once characterized his classroom could no longer survive when the emphasis was on preparing students to take a standardized examination.
Benefits of Standardized Tests
  Allow for identification of children with problems, so that remediation can take place
  Allow for identification of schools that may need extra resources
  Increase accountability of schools to parents, Boards of Education, government
What do you think about standardized tests?