DOCUMENT RESUME

ED 128 366                                                    TM 005 488

AUTHOR        Silver, Evelyn Stern, Ed.
TITLE         Declining Test Scores: A Conference Report.
INSTITUTION   Johnson (Lawrence) and Associates, Inc., Washington, D.C.
SPONS AGENCY  National Inst. of Education (DHEW), Washington, D.C.
PUB DATE      Feb 76
CONTRACT      NIE-C-00-3-0060
NOTE          44p.; Proceedings from Conference on Declining Test Scores (Washington, D.C., June 19-21, 1975)
EDRS PRICE    MF-$0.83 HC-$2.06 Plus Postage.
DESCRIPTORS   Academic Achievement; *College Entrance Examinations; *Conference Reports; Curriculum Development; *Low Achievement Factors; Predictive Validity; Research Needs; *Scores; Standardized Tests; Student Ability; Student Attitudes; Test Interpretation; Test Validity; *Trend Analysis
IDENTIFIERS   American College Test; Scholastic Aptitude Test; *Test Score Decline

ABSTRACT
In response to increasing evidence of score declines with no apparent agreement as to meaning or causes, the National Institute of Education (NIE) sponsored a Conference on Declining Test Scores in June of 1975. The objectives of the conference were to (1) clarify the evidence and estimate the extent of test score declines; (2) review evidence for the seriousness and meaningfulness of the problem and assess the value of research in this area; (3) explore areas of agreement and disagreement among experts as to possible causes of the declines; (4) formulate research guidelines for efficient and effective investigation into score trends and possible remedies; and (5) identify NIE's concern for and responsiveness to recent reports of score changes which could have important social implications. There did not appear to be consensus on the reasons for the decline even after the evidence for the various viewpoints had been presented and discussed. But there did appear to be consensus that further research could, at the very least, narrow the options and begin to assess the importance of the reported score changes. (Author/BW)


Declining Test Scores

A Conference Report



The National Institute of Education
U.S. Department of Health, Education, and Welfare
Washington, D.C. 20208

DECLINING TEST SCORES

A Conference Report

Edited by Evelyn Stern Silver

Lawrence Johnson & Associates, Inc.

February 1976


U.S. DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
David Mathews, Secretary
Virginia Trotter, Assistant Secretary for Education

NATIONAL INSTITUTE OF EDUCATION
Harold L. Hodgkinson, Director
Andrew Porter, Acting Associate Director for Basic Skills


CONTENTS

EXECUTIVE SUMMARY

Chapter
1  The Need for a Conference
2  Conference Objectives and Overview
3  Research on Score Declines
4  Concerns and Recommendations of Participants
5  Policy and Planning Implications

References

Appendix

TABLES

Table 1.  ACT and SAT Score Averages for College-Bound Seniors, 1966-67 - 1974-75
Table 2.  Participants at the Conference on Declining Test Scores
Table 3.  Test Trends Over the Last Ten Years

FIGURES

Figure 1.  ACT Composite Score Trend
Figure 2.  SAT Verbal and Math Score Trends

EXECUTIVE SUMMARY

In response to increased attention and concern on the part of educators, legislators and the American public (see Chapter One), the National Institute of Education (NIE) sponsored a Conference on Declining Test Scores. Held on June 19-21, 1975 in Washington, D.C., the Conference was called to:

clarify the problem of test score declines;

review evidence for the seriousness of the problem and the value of research in this area;

explore areas of agreement and disagreement among experts as to possible causes and treatments of the declines;

formulate research guidelines for efficient and effective investigation into test score trends; and

identify NIE's concern for, and responsiveness to, this important national problem.

Both the American College Testing Program (ACT) and the College Entrance Examination Board (CEEB) have reported consistent declines in average (mean) college entrance examination scores over the past decade. ACT scores have declined about one standard score over the last ten years. Recent data from the CEEB illustrate a dramatic drop in Scholastic Aptitude Test scores--comparison of scores earned by 1975 high school graduates with 1974 scores shows a decline of ten points on the verbal part and eight points on the mathematical section.

The forty participants at the Conference on Declining Test Scores included representatives of test developers, test publishers, universities, private research centers and Federal agencies such as the NIE (see Chapter Two). Vigorous debate and dialogue gave participants the opportunity to review existing data and research (see Chapter Three), and to explore mutual areas of interest for future investigative efforts.


The three conference days included discussion groups on (1) Policy Issues and Impact, (2) Analysis of Causes, (3) Psychometric Scales and Data Requirements, and (4) Guidelines for Research and Remedies. Each group generated specific recommendations for research, ranging from analysis of test items and response patterns to surveys of test-taker attitudes and motivation (see Chapter Four). Several themes were stressed by participants:

The score declines on college entrance examinations are important and worthy of further investigation.

There are other trends in test scores that are similar to those reported for the college admissions tests but which involve other segments of the student population--these test trends should also be studied.

There is no single cause of test score changes--research should focus on patterns of causal variables.

Existing data, which are plentiful, should be fully exploited before expensive new data collection is undertaken.

There are important policy and planning implications for NIE from the Conference on Declining Test Scores (see Chapter Five). Existing data must be gathered together and sifted for descriptive information about trends in test scores. This review of research and related literature should be used to develop a comprehensive research framework within which secondary analyses of existing data and new research efforts can be planned. The coordination of investigations in this area will also insure comparability of data sources. The research framework that is developed should focus on the broader issue of test score variability, so that researchers are not tied into narrow explanations of score declines.

Such a comprehensive research framework would certainly include a longitudinal study of student attitudes about taking tests and testing. Score declines on college entrance examinations and on science achievement tests have recently been attributed to changes in student motivation. The lack of existing research on attitudinal factors makes it imperative that planning begin immediately so that data will be available for future analyses.


Several areas of research recommended by conference participants included a breakdown of college entrance test score trends by skill area tested. Such analysis would serve to pinpoint the locus of score declines and, perhaps, help to focus on the causes of declines. The Admissions Testing Program of the CEEB has analyzed scores for reading comprehension, vocabulary and "written English" for the 1974-75 population of test-takers. Follow-ups on this breakdown should yield the type of specific trend data that are needed.

Similarly, further analysis of score trends by sex seems warranted. The American College Testing Program has noted that the ACT Composite has dropped one standard score for men and 1.6 standard scores for women. The nature and extent of sex differences deserve further study so that causal factors may be isolated.

Finally, participants urged that NIE serve as a clearinghouse for research on test score trends. This function would insure coordination of diverse research efforts and accessibility to the data necessary for continued research and status reports.

The Conference on Declining Test Scores initiated an overview and investigation of this important topic. Speculation about the causes of score declines without empirical or research basis can lead to drastic and incorrect decisions--decisions that may affect educational personnel, programs, and priorities. Before remedial or preventive action is advocated, we must thoroughly investigate the evidence for score declines, analyze the possible meanings associated with such score changes, and then decide from educational research what factors might be responsible. Further research appears necessary in view of the diversity of opinion in these matters.


CHAPTER 1

THE NEED FOR A CONFERENCE

On September 7, 1975, the College Entrance Examination Board (CEEB) issued a press release highlighting the major findings of its report on college-bound seniors in 1974-75. The story appeared on the front pages of many Sunday newspapers (including The New York Times and The Washington Post) and revealed that average scores on the Scholastic Aptitude Test (SAT) had declined ten points on the verbal section and eight points on the mathematical section over the preceding year. The 1975 data continued the downward trend in SAT scores observed over the past ten years.

A similar decline was reported by the American College Testing (ACT) Program on three out of four test batteries (English, mathematics, and social studies). Scores on the ACT natural sciences battery remained essentially constant over the same time period. However, average ACT composite scores for college-bound students declined by approximately one standard score, or about one-fifth of a standard deviation, over the past decade. On a scale of one to 36, the mean ACT composite score has dropped from 19.9 in 1964-65 to 18.6 in 1974-75 (on the first four testing dates in that year). English scores have declined by about one standard score, mathematics scores have declined by about 1.5 standard scores, and social studies scores have declined by about 2.5 standard scores. The American College Testing Program has reported that, "because of the large number of students tested each year [850,000 high school seniors in 1974-75], the observed trends are clearly not due to random fluctuations in test scores but rather reflect actual changes" (Ferguson and Maxey, 1975, p. 5).

The CEEB is equally concerned about the meaning of the changes in SAT scores. The Board has examined score averages over the past twenty years and notes that scores were stable from 1956 through 1963. In 1964, average scores began to decline, particularly on the verbal section of the SAT. After leveling off in 1968, declines became more acute until 1974. In 1975, scores declined dramatically in the largest one-year drop yet recorded.

On a scale of 200-800, the average verbal SAT scores have dropped 39 points (from a mean score of 473 in 1956-57 to 434 in 1974-75). Average math scores declined 24 points (496 in 1956-57 to 472 in 1974-75). Over this time period, the deviations from the yearly mean of verbal scores have remained fairly stable, but the deviations around math score means have increased (McCandless, 1975).
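To give a rough sense of scale for these point changes, the short Python sketch below converts them into standard-deviation units. The standard deviation of roughly 100 points is an assumption about the SAT score scale, not a figure reported here, so the result is only an approximation.

    # Express the reported SAT mean declines in standard-deviation units.
    # ASSUMPTION: the SAT scale is treated as having a standard deviation of
    # roughly 100 points; this figure is not taken from the report itself.
    ASSUMED_SAT_SD = 100.0

    declines = {
        "SAT Verbal, 1956-57 to 1974-75": 473 - 434,  # 39 points, per the text above
        "SAT Math,   1956-57 to 1974-75": 496 - 472,  # 24 points, per the text above
    }

    for label, drop in declines.items():
        print(f"{label}: {drop} points, about {drop / ASSUMED_SAT_SD:.2f} standard deviations")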

ACT and SAT score averages for college-bound seniors from 1966-67 to 1974-75 are listed in Table 1. The data are illustrated in Figures 1 and 2.

Media coverage of the decline in test scores has intensified over the past twelve months. Editorials and news articles on this subject have appeared across the nation in papers such as The Arizona Republic (Phoenix, Arizona), the News-Star (Monroe, Louisiana) and the Mail (Charleston, West Virginia). Media coverage reflects the growing public concern over poor student performance, a concern matched by increasing professional attention. Education U.S.A., Education Daily, the Bulletin of the Education Commission of the States, The Independent School Bulletin, and the Chronicle of Higher Education are just a few of the professional publications that have pondered the test score decline. Entries in the Congressional Record in 1974 and 1975 demonstrate that concern among political leaders is also escalating.

As scores have declined, speculation about the causes and meaning of the reported changes has increased. There are data available to investigate some of these changes, but because these data are scattered and not generally accessible to the public, there has been little empirical research to support the explanations receiving the most publicity. Some of these explanations are: the failure of the public schools to teach basic computational and verbal skills; the decreasing benefits generated by a costly public school system; and the change in the composition of the student body that is seeking college admission. These are not the only speculations--a bewildering variety of specific causes for score declines among college-bound students has been considered. Hypotheses, which may often reflect the interests of their supporters, include changes in college admissions requirements and decreasing reliance on test scores, changes in the attitudes of students about tests in general, sociological and cultural trends that involve competitive and selective pressures, increases in curricular diversity, and changes in the content of test items.

Under its legislative mandate to improve education in the United States through research, the NIE has followed such speculation with interest.

Table 1.--ACT and SAT Score Averages for College-Bound Seniors, 1966-67 - 1974-75

              ACT             SAT Verbal               SAT Mathematical
           Composite     Male   Female   Total      Male   Female   Total

1966-67       19.4        463     468     466        514     467     492
1967-68       19.0        464     466     466        512     470     492
1968-69       19.4        459     466     463        513     470     493
1969-70       19.5        459     461     460        509     465     488
1970-71       18.9        454     457     455        507     466     488
1971-72       18.8        454     452     453        505     461     484
1972-73       18.9        446     443     445        502     460     481
1973-74       18.7        447     442     444        501     459     480
1974-75       18.3        437     431     434        495     449     472

[Figure 1.--ACT Composite Score Trend: ACT composite score plotted by date of test administration]

[Figure 2.--SAT Verbal and Math Score Trends: SAT scores plotted by date of test administration]

Because of the diversity of the speculation, it was thought appropriate to bring together many viewpoints to see whether a consensus would emerge or whether further investigation was necessary. The Conference on Declining Test Scores, convened in June 1975, was the beginning of this investigation. This report summarizes the viewpoints expressed and the research proposed.


CHAPTER 2

CONFERENCE OBJECTIVES AND OVERVIEW

In response to increasing evidence of score declines with no apparent agreement as to meaning or causes, NIE sponsored a Conference on Declining Test Scores in June of 1975. The objectives of the conference were to:

clarify the evidence and estimate the extent of test score declines;

review evidence for the seriousness and meaningfulness of the problem and assess the value of research in this area;

explore areas of agreement and disagreement among experts as to possible causes of the declines;

formulate research guidelines for efficient and effective investigation into score trends and possible remedies; and

identify NIE's concern for and responsiveness to recent reports of score changes which could have important social implications.

The Conference on Declining Test Scores was held from June 19 through June 21, 1975, in Washington, D.C. The forty participants included testing experts from universities, test designing and publishing organizations, research institutes, NIE and other Federal agencies. A list of attendees appears in Table 2.

The three-day schedule began with an opening address by the Director of NIE, Harold Hodgkinson, that set forth some of the evidence for the meaningfulness of score changes and the need for thoughtful investigation. Presentations followed concerning the nature of testing and test score interpretations as well as the public policy implications of alternative explanations for score declines.

Friday morning was devoted to short presentations followed by discussion on four problem areas related to score declines:


Table 2.--Participants at the Conference on Declining Test Scores

William Angoff          Educational Testing Service
Albert Beaton           Educational Testing Service
Elias Blake             Institute for Services to Education
Harold Bligh            Harcourt Brace Jovanovich, Inc.
John Carroll            University of North Carolina
James Cass              Saturday Review/World
T. Anne Cleary          College Entrance Examination Board
John Draper             McGraw-Hill/CTB
Roger Farr              Indiana University
Richard Ferguson        American College Testing Program
John Flanagan           American Institutes for Research
Walter Gillespie        National Science Foundation
Robert Guthrie          Office of Naval Research
Harold Harding          Yardstick Project
Lyle Jones              University of North Carolina
Hugh Lane               Institute for Services to Education
Sam McCandless          College Entrance Examination Board
Jason Millman           Cornell University
Amado Padilla           University of California at Los Angeles
Philip Rever            American College Testing Program
Don Searles             National Assessment of Educational Progress
Marion Shaycoft         American Institutes for Research
Robert Thorndike        Teachers College, Columbia University
Ralph Tyler             Center for Advanced Study in the Behavioral Sciences
David Wiley             University of Chicago
Dean Whitla             Harvard University
Belvin Williams         Educational Testing Service

From the National Institute of Education

Daniel Antonoplos       Garry McDaniels
Michael Cohen           Andrew Porter
Jane David              Jack Schwille
Linda Glendening        Marshall Smith
Harold Hodgkinson       Trevor Williams
Jackie Jenkins          Arthur Wise
Carlyle Maw

Summary of evidence for score declines--areas of declines, extent of declines, comparisons among tests and test-taking populations.

Impact of declines--public opinion and response to news of declines; implications for public policy and conduct of research; questions of special concern to subject matter areas or to sub-populations of students.

Causes of declines--historical and sociological correlates of declines; changes in tests, curricula, student populations and home environments.

Remedies and Research--summary of possible interventions, research guidelines, further data requirements, and aid to schools and colleges for administration and planning; options for immediate action, long-term planning, and future preparedness.

In the afternoon, participants formed discussion groups to analyze causes of score declines, impact of declines on public policy, remedies and research guidelines, and psychometric explanations and data requirements for further research. Each group produced a summary of its discussion for review by all participants on Saturday morning. The conference concluded with a discussion of problem definition and recommendations for future research.

The vigorous discussions at the Conference on Declining Test Scores raised a number of important concerns and generated valuable ideas for research. There did not appear to be consensus on the reasons for the declines even after the evidence for the various viewpoints had been presented and discussed. But there did appear to be consensus that further research could, at the very least, narrow the options and begin to assess the importance of the reported score changes. Before summarizing the content of these sessions, Chapter 3 will review existing research to provide a background for the discussions and recommendations.


CHAPTER 3

RESEARCH ON SCORE DECLINES

An important outcome of the Conference on Declining Test Scores was the exchange of information on previous and current investigations into test score trends. A number of such investigations have been reported by the American College Testing Program (ACT), the College Entrance Examination Board (CEEB), and the Educational Testing Service (ETS). Other investigations of time trends, which had not been so well publicized or which had not been seen as relevant to the interpretation of score changes, were also discussed at the conference. In this chapter, four broad areas of existing research are reviewed:

predictive validity of college entrance exams,

stability of score scales,

changes in test-taking populations, and

changes in overall student ability.

The specific issues and research recommendations raised by conference participants (Chapter 4) are based upon the studies cited below and upon updated information about student populations and test scores.

Predictive Validity of College Entrance Examinations

The steady decline in test scores has raised questions about the validity of college entrance examinations. Since these tests have been designed to help forecast academic performance in college, test publishers have focused on considerations of predictive validity. It should be noted that more than test scores are required for adequate prediction; the producers of tests by and large agree that test scores should be used as supplemental information about college potential. This caveat is important to remember in understanding what the tests are designed to measure and what meaning can be attached to score changes.1/

The predictive validity of the SAT is routinely examined by the Validity Study Service, through which CEEB provides colleges with analyses of how SAT scores are related to subsequent academic performance in college (McCandless, 1975). Usually, the freshman-year grade average is the criterion for academic success used in these validity studies.

Comparable studies for 1971 or 1972 freshmen and for three freshman classes since 1963 are available for 21 of the colleges that participate in the Validity Study Service. A comparison of these studies showed no evidence of a systematic increase or decrease in the predictive validity of the SAT (McCandless, 1975). Seventeen of the colleges had comparable data for 1967 and 1972 freshmen. For these colleges, the validity coefficient (a correlation coefficient) for the composite SAT scores (math and verbal) ranged from a low of .149 to a high of .602 in the 1967 studies, and from a low of .226 to a high of .632 in the 1972 studies. Since most of the SAT score decline is post-1967, it is interesting to note that the median validity coefficient for the 17 colleges was .421 for both time periods.

CEEB is cautious, however, about interpretation of the validity data:

"This evidence that the [predictive] validity of the SAT has not been somehow [affected] by the score decline is drawn, of course, from records that happen to be available for analysis after the fact and not from sets of data collected with [tracking] the SAT score decline in mind. But this is characteristic of the data that seem to bear on the score decline. All such data are fortuitously available and not exactly what we would like." (McCandless, 1975, p. 3)

1/ "The essential supplementary nature of the SAT is further attested to by the manner in which its effectiveness is ordinarily evaluated. Its simple (zero-order) correlation with college performance is by itself not a sufficient indicator of its usefulness in selection. Far more interest attaches to its incremental effectiveness, that is, to the degree to which it can improve the prediction of college grades when combined with high school records and Achievement Test scores. The SAT, therefore, is valued to the extent that it can add something unique to the other measures." (Angoff, 1971)


ACT has also found that the ability to predict college grades based on the ACT Assessment has remained stable. National norms from several hundred colleges that participate in ACT's Standard Research Service show that the multiple correlation of the ACT Assessment and first-year grade point averages has been very stable (about .5) over the 1966-1973 period (Ferguson and Maxey, 1975).
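As an illustration of the statistic being tracked here, the Python sketch below (with invented data) computes a multiple correlation by regressing first-year grade point average on a set of test subscores and correlating the fitted values with the observed grades. It is a generic sketch of the technique, not the ACT Standard Research Service procedure.

    import numpy as np

    # Invented data: 200 students, four test subscores each (roughly ACT-like scale).
    rng = np.random.default_rng(0)
    subscores = rng.normal(loc=20.0, scale=5.0, size=(200, 4))
    gpa = 0.05 * subscores.sum(axis=1) + rng.normal(0.0, 0.5, size=200)  # simulated freshman GPA

    # Ordinary least squares: predict GPA from the subscores plus an intercept.
    X = np.column_stack([np.ones(len(gpa)), subscores])
    coef, *_ = np.linalg.lstsq(X, gpa, rcond=None)
    fitted = X @ coef

    # The multiple correlation R is the correlation between observed and fitted GPA.
    R = np.corrcoef(gpa, fitted)[0, 1]
    print(f"Multiple correlation (validity coefficient): R = {R:.2f}")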

These ACT and CEEB findings are especially interesting in light of the steady increase in grade point averages at both the high school and college levels. ACT data, based on high school and college grades of students who attended approximately 400 colleges and universities which used the ACT predictive services from 1966-67 through 1972-73, provide evidence of this increase. Students at these colleges reported their latest high school grades prior to their senior year of high school for courses in each of four curriculum areas: English, mathematics, social studies and natural sciences. The mean of these four grades increased by .2, on a 4-point scale, over the six-year period. First semester college grade point averages of approximately the same group of students increased, on the average, by .27 units over the same time span (Ferguson and Maxey, 1975).

The increase in college and high school grades, known as "grade inflation," has received increasing public attention (Etzioni, 1975). The interaction of these two factors--increasing grade point averages and declining test scores--will probably result in closer scrutiny of the predictive validity measures of college entrance examinations.

Given the stated purposes of college entrance examinations, validity studies have focused almost exclusively on predictive validity. Yet several of the causal hypotheses for test score declines (e.g. changes in school curricula and in students' educational experiences) depend upon the specific content of the tests and the specific nature of skills assessed. The current focus on scores as indicators as well as predictors of student ability necessitates further research into other aspects of test validity. Work on content and construct validity is known to be difficult and has not been well researched in the past for a variety of reasons (Nunnally, 1975); nevertheless, further work in this area is indicated.


Stability of Score Scales

Several researchers at ETS have investigated the possibility that declining test scores result from the increasing difficulty of the tests. Every new form of the test is equated 2/ with earlier forms and scores are reported on a continuing score scale (Angoff, 1975). Over a number of years, however, small incremental changes in difficulty might result in a substantial "drift" in the score scales.

CEEB has long been concerned with the methods of equating test forms. The control and stabilization of test difficulty is achieved by retaining a group of calibrating or anchor items in several different test forms to allow comparison across forms used in different years. Of course this technique assumes that the group of unchanging items remains at a constant difficulty level in spite of changes in school curricula and course requirements.
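For readers unfamiliar with equating, the Python sketch below illustrates one simple member of this family of methods, chained linear equating through an anchor test: a new-form score is first placed on the anchor scale using the new group's statistics, then carried onto the old-form scale using the old group's statistics. The summary statistics are invented, and the operational CEEB procedures are more elaborate; this is only a schematic example.

    def linear_link(x, from_mean, from_sd, to_mean, to_sd):
        """Map a score from one scale to another by matching means and standard deviations."""
        return to_mean + to_sd * (x - from_mean) / from_sd

    # Invented summary statistics, for illustration only.
    new_form   = {"mean": 29.0, "sd": 7.0}   # new-form totals, group taking the new form
    anchor_new = {"mean": 11.5, "sd": 3.0}   # anchor-item scores in that same group
    anchor_old = {"mean": 12.0, "sd": 3.2}   # anchor-item scores in the earlier group
    old_form   = {"mean": 31.0, "sd": 7.5}   # old-form totals, earlier group

    def equate_new_to_old(x):
        # Step 1: new-form total -> anchor scale (new group's statistics).
        on_anchor = linear_link(x, new_form["mean"], new_form["sd"],
                                anchor_new["mean"], anchor_new["sd"])
        # Step 2: anchor scale -> old-form scale (old group's statistics).
        return linear_link(on_anchor, anchor_old["mean"], anchor_old["sd"],
                           old_form["mean"], old_form["sd"])

    for raw in (20, 29, 38):
        print(f"new-form score {raw} is reported as {equate_new_to_old(raw):.1f} on the old-form scale")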

Stewart (1966) investigated equality of SAT forms from 1944 to 1963, and Modu and Stern (1975) updated this investigation for the period of 1963 to 1973. In the latter study, the verbal and mathematical sections from 1963 and 1966 SAT forms were administered to two random samples of 1973 SAT test-takers. Modu and Stern concluded that the SAT may have been less difficult in 1973 than in 1963, and certainly not more difficult. Such a finding implies that the actual drop in ability levels may be greater than reported.

Studies of score stability, like the investigations of test validity, must be viewed cautiously, for they attach special meanings to the words "stability" and "validity" that are different from common usage. It is difficult to avoid confounding score stability research with other variables such as test reliability and student ability. It is especially difficult to avoid such confounding in studies which range over a decade of history and which use non-experimental data. Again, it seems fair to conclude that basic research in this area has not received sufficient attention in the past and that such research should now be actively encouraged.

2/ Equating is a technical term for a number of techniques used to compare different forms of a test given to different groups of students at different times. For a summary, see Angoff, 1971.


Changes in the Population of Test-Takers

Some experts have accounted for the decline in test scores by showing that there have been changes in the characteristics of the population taking college admissions tests. These population changes may be caused either by (1) the inclusion of student groups who, in previous years, were not college-bound, or (2) changes in the characteristics or test-taking patterns of traditional college-bound students.

The first hypothesis assumes that the population of test-takers has expanded or become more diverse. According to this hypothesis, college-bound students are the highest ability students. As the proportion of high school students who go to college increases, the average ability level must drop. If this hypothesis is correct, test score declines indicate the success of egalitarian educational policies (e.g. Federal student loan programs, open enrollment and affirmative action policies, and special recruitment and counseling services).
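The arithmetic behind this composition hypothesis is simple, and the Python sketch below makes it concrete with invented numbers: even when neither subgroup's performance changes, increasing the share of a lower-scoring group of new test-takers pulls the overall mean down.

    def overall_mean(share_new, mean_traditional, mean_new):
        """Weighted mean when a newly test-taking group makes up share_new of all test-takers."""
        return (1.0 - share_new) * mean_traditional + share_new * mean_new

    # Invented illustration: subgroup means stay fixed, only the mix changes.
    mean_traditional = 470.0   # hypothetical mean for traditional college-bound test-takers
    mean_new = 410.0           # hypothetical mean for students who previously would not have tested

    for share in (0.0, 0.1, 0.2, 0.3):
        print(f"new test-takers at {share:.0%} of the pool: overall mean = "
              f"{overall_mean(share, mean_traditional, mean_new):.0f}")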

Evidence of increasing test-taking rates is inconclusive, however. CEEB has examined the number of high school graduates, SAT-takers and college entrants as proportions of the total population of 18-year-olds, over time (McCandless, 1975). The rate of high school graduation for both sexes increased from 60-65% in 1959 to approximately 75% in 1965. Since then, the percentage of all youth graduating has stabilized at or near 75%. The proportion of high school students who go straight on to college has also increased since 1959. After a brief interruption around 1963, the growth trend continued until 1968-69, when it peaked at approximately 40% of 18-year-olds. Since then, the rate of immediate college entry after high school has declined five percentage points for females and ten percentage points for males.

Actual college admissions test-taking rates are difficult to obtain. Some indirect evidence indicates substantial increases, but only before the period of the observed score declines. In 1959, the number of students taking the SAT was about 50% of the number of immediate college entrants. By 1964, the number of SAT-takers was greater than 65% of the immediate college entrants; this percentage appears to have remained constant through 1973 (McCandless, 1975). Whether or not this constant percentage also implies an unchanging composition of student types and abilities deserves further investigation.

It is also of interest to note that the proportion of women taking the ACT and the SAT has increased during the past decade. In 1967, approximately 46% of SAT and ACT test-takers were women; by 1974, about 50% of SAT-takers and 52% of ACT-takers were women. Table 1 of this report reveals that female scores on the verbal section of the SAT have declined an average of 37 points from 1966-67 to 1974-75. Over this same time period, male scores on this section showed only a 26 point decline. Similarly, since 1965-66 the ACT Composite has dropped one standard score for men and 1.6 standard scores for women.

As the proportion of women taking college admissions tests has increased, their scores have decreased more markedly than have the scores of men. Supporting evidence that sex-related ability and achievement trends deserve more research attention comes from the National Assessment of Educational Progress (NAEP). Results from NAEP Assessments show that the area where females consistently outperform males is in writing. Males generally do better than females in mathematics, science, social studies and citizenship (NAEP Newsletter, October 1975). The NAEP results further show that in six subject areas, "males and females at age 9 show scholastic understandings that are fairly equal. By age 13, however, females have begun a decline in achievement, which continues downward through age 17 and into adulthood."

Subgroup breakdowns by sex are only one way to analyze changes in the characteristics and test-taking patterns of the population taking the ACT and SAT. Investigators have examined other test-taking subgroups, including:

college persisters: those students who remain in college after their freshman year;

achievement test-takers: those students who not only take the SAT, but who also take one or more of the CEEB subject-matter achievement tests; and

SAT repeaters: those students who take the SAT twice, in the junior and senior years of high school.

Rever and Kojaku (1975) analyzed ACT score trends among college persisters, those students who took the test in high school, enrolled in college and completed their freshman year. A sample of institutions that participate in annual predictive validity studies was taken from ACT's Research Service file. A comparison of the test scores of just those freshmen who subsequently completed their first year in college in 1966 and in 1973 yielded an important finding: no score decline. There were no uniform significant differences in test scores of students who completed their freshman year of college. Rever and Kojaku conclude that, "the data clearly suggest [a] tendency for lower-scoring students not to complete the first year of college" (p. 10).

The CEEB Achievement Tests are one-hour objective tests in fourteen academic subjects. About one-fourth of the students who take the SAT also take one or more CEEB Achievement Tests. Students take these tests to demonstrate subject matter preparation for college, to obtain placement in college level courses, and to fulfill admissions requirements.

Investigators have averaged the SAT scores over time of those students who take both the SAT and at least one CEEB Achievement Test. This subgroup of SAT-takers shows a markedly smaller decline in SAT scores than does the entire SAT-taking population. For some special groups of students who take specific CEEB Achievement Tests (such as physical sciences or European history) the results show an increase in SAT scores from 1966 to 1974. The conclusion from this finding depends on whether this subgroup is a stable and definable part of the college-bound population. These results do offer some further indirect evidence that population changes can make a difference in score trends. This reinforces the need for detailed breakdowns of overall score trends in any thorough investigation of the reasons for score declines.

It should be noted that trends in CEEB Achievement Test scores have also been investigated, but they have not received as much public attention as the SAT scores. The CEEB Admissions Testing Program has examined score distributions for the seven most frequently chosen achievement tests as well as a score distribution for averages across achievement tests taken. The overall average score increased from 526 in 1971-72 to 531 in 1974-75. Several individual achievement tests showed slight declines over the same time period. In general, however, these data do not support the hypothesized decline in achievement for this group of test-takers (College-Bound Seniors, 1974-75, 1975).

Another area of research related to the hypothesis of a changing population of test-takers has been the study of the decreasing number of SAT repeaters (students who take the SAT exam as high school juniors and repeat it as seniors, either for practice or to better previous scores). The decreasing number of SAT repeaters may be due either to early admission to college for those who do especially well on the junior year administration of the test, or to more negative attitudes (decreasing motivation) toward taking tests. Either reason could contribute to the score decline for the larger set of SAT test-takers. For example, if the most able students and highest junior year scorers were selectively less likely to take the senior year SAT because of early admission, then the decline in scores could reflect the fact that this group of able students was absent from the SAT-taking population.

McCandless (1975) found that although the number of repeaters has decreased recently, the number of such students is too small a proportion of the total SAT-taking population to have much of an effect (he estimates that the decrease in number of repeaters accounts for less than one-third of a point in the national mean in any single year).
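The order of magnitude of such an effect can be checked with weighted-mean arithmetic. The Python sketch below uses invented numbers (the repeater share and means are assumptions, not figures from this report) and computes the largest possible shift, the one that would occur if the repeater group disappeared entirely.

    def mean_without_subgroup(total_mean, subgroup_share, subgroup_mean):
        """Mean of the remaining test-takers after removing a subgroup from the total."""
        return (total_mean - subgroup_share * subgroup_mean) / (1.0 - subgroup_share)

    # Invented figures, for illustration only.
    total_mean = 440.0       # hypothetical national mean
    repeater_share = 0.03    # repeaters as a fraction of all test-takers (assumed)
    repeater_mean = 470.0    # hypothetical mean score of repeaters

    remaining = mean_without_subgroup(total_mean, repeater_share, repeater_mean)
    print(f"Mean without the repeater group: {remaining:.2f}")
    print(f"Largest possible shift from losing all repeaters: {total_mean - remaining:.2f} points")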

Using a different methodology, Bullock and Stern analyzed SAT scores for 1966-67 and 1973-74 by removing all non-senior scores and by selecting high school students who took the SAT on the same date. As a result of this grouping and the elimination of junior year scores, the mean decline of scores between these years was trimmed from 27 to 22 points.

Changes in Overall Student Ability

A hypothesis on score trends that has received national media attention relates score declines on college entrance exams to a more general decline in abilities of all students (not just the college-bound). This explanation has sometimes been accompanied by a criticism of educational programs and priorities. At the present time, however, research cannot link the score changes on tests for the college-bound to the probable score changes for the entire high school population because of sampling errors and non-random selection of students.

Other test data exist, however, which can be used to estimate trends in student performance over time. These tests are not college admissions tests and therefore are not subject to the criticism of a restricted population. There are other restrictions, however, on many of these scores (e.g. limitation to a particular state). Although these data are inconsistent and inconclusive, they are of interest both for what they indicate and for what they suggest for further research.

Flanagan and Jung (1970) administered a reading comprehension test in 1960 to a random sample of high school students (not just the college applicant population previously discussed) participating in Project TALENT. Ten years later the same test was given to a subsample of these same schools. Over this ten year period, the mean score on the test increased from 30.81 to 31.25.

Using data from the Preliminary Scholastic Aptitude Test (PSAT), administered in 1960, 1966, and 1970 in a random selection of schools, Schrader and Jackson (1975) found that mean verbal and mathematics scores had remained virtually constant. Although there may be sampling problems that could limit the generalizability of this finding, it is reasonable to conclude that the evidence does not support the hypothesis of a general decline in student abilities. More importantly, these data also show that the score decline does not exist for certain student populations even when SAT test items are used (the PSAT is composed of SAT test items).


NAEP data, referred to above, show an overall decline in student achievement. It is regrettable that the NAEP data collection effort was not begun sooner so that the time trends could be more easily compared, but there are across-time data available for three subject matter areas:

science, 1969-1973;

writing mechanics, 1969-1974; and

basic reading/functional literacy, 1971-1974.

In science, there was a decrease in numbers of students who could answer most questions correctly. This finding applies to both 9 and 13 year-olds; the largest decrease in correct responses was for the older group. In writing mechanics an overall decline was noted for 13 and 17 year-olds, with a possible increase in correct responses for 9 year-olds. As before, it was the oldest group that performed most poorly over time.

In basic reading, there were some gains for the 17 year-olds. This finding was cited as a major inconsistency and used to challenge the evidence for a decline in other studies. Although it must be admitted that NAEP data (unlike other test data) are sampled specifically for nation-wide representation, this NAEP test battery measures very elementary skills, such as alphabetic ordering, reading road signs, and using telephone books. Accordingly, it may not reflect a very high level of intellectual functioning.

Some participants at the Conference on Declining Test Scores noted that it was in this area of basic literacy that national attention and financial support had been focused over the time period in which other score declines had occurred. For example, the Right to Read effort of the Office of Education and other compensatory programs have been concerned with functional literacy at the level tested by the NAEP instruments.

Another test that yields score data over time is the Minnesota Scholastic Aptitude Test, which has been administered to over 90% of all Minnesota high school juniors since 1959. The test yields a single score and is a measure of verbal aptitude. Scores on this test rose from 29.39 to a peak of 34.71 in 1966-67; from that point on, a decrease in average scores is reported (31.05 in 1972-73). This is roughly the pattern of the ACT and SAT score trends for college-bound students.


The Iowa Tests of Educational Development (ITED) have been given to high school students in Iowa; results have been reported since 1962. The test includes seven skill areas, and is also reported in a single composite score. This composite score increased from 14.0 in 1962 to 14.5 in 1965, but declined to 13.5 in 1974.

The Iowa Tests of Basic Skills (ITBS) have also been part of the Iowa Testing Program; state-wide data have been reported for elementary schools since 1965. Average scores on most of the test batteries increased slightly or remained constant through 1966. Scores for the upper grades then declined until 1975. The lower primary grades, however, do not show a decline. Some conference participants noted that early elementary grades had been the major recipients of Federal and state support (e.g. Headstart, Follow-Through) throughout the period of other reported score declines and that such support may be related to the lack of a decline in the primary grades.

The Armed Forces Qualification Test (AFQT) contains four content areas: word knowledge, arithmetic reasoning, spatial perception, and knowledge of tool functions. AFQT scores from 1958 through 1972 are reported in Karpinos (1975). The results show that there has been continuous improvement in test scores throughout this period. The most significant change was the drop in the number of pre-inductees or draftees scoring below the tenth percentile score (a disqualifying score). Although the other percentile score groups therefore contain greater numbers of scores over the time period, the top ability group (the ninety-third percentile) actually shows a small decrease. This could be explained by changing deferment policies for college students during the 1960s.

CEEB has supported research on the hypothesis that changing student abilities account for the declines in scores. Using data from College Board "norm" studies in 1960, 1966, and 1974, the Board estimated SAT score averages for all eleventh graders (not just college-bound). The estimates showed increases in verbal ability between 1960 and 1966 which were then reversed between 1966 and 1974. Math ability reflected "trends" in the opposite direction: a decrease between 1960 and 1966 and an increase between 1966 and 1974 (The College Board News, 1975).

These estimates are projections for eleventh graders taking the SAT in October. The CEEB, based on other research, has hypothesized that the SAT score decline may be partly caused by a decrease in the growth of student abilities during the junior year of high school. Many students take the PSAT early in their junior year and then take the SAT in their senior year. A comparative study of seniors in 1967-68 and 1973-74 indicated that the amount of increase in test scores between the PSAT and the SAT testing was smaller for the 1974 class: the average gain in scores decreased 18 points (The College Board News, 1975). Once again, the reader should be cautioned concerning the acknowledged sampling restrictions.

This review of selected research on declining test scores illustrates the complexity of the problem which faced participants at the NIE Conference. In light of these facts and assumptions, participants raised the issues and recommendations included in the following section.

Readers interested in graphic and tabular displays of data discussed in this section are referred to the Technical Appendix of this report.


CHAPTER 4

CONCERNS AND RECOMMENDATIONS OF PARTICIPANTS

A summary of plenary and small group discussions at the Conference on Declining Test Scores is presented in this chapter. The concerns raised at each conference session are followed by participants' recommendations for further investigation. Some of this material, and information in earlier chapters, was provided by participants during their review of the draft of this report.

Plenary Session

After an initial review of the major hypotheses of test score declines (see Chapter 3), participants focused on the inconsistent trends in ability tests. Table 3 lists examples of tests which have shown score declines and those which have not. In general, the tests for early primary grades do not show declines, while tests for upper primary grades and secondary school levels do.

Some participants suggested that the concentration of Federal funds which have created educational programs in the lower grades may account for the absence of test score declines in primary age groups. But it was also suggested that tests of this age group were concerned with "rote" learning and that the declines in later scores might be restricted to more abstract abilities.

Societal changes that could have contributed to downward score changes were discussed:

categorical Federal aid for specific populations, rather than aid to the general school population;

reactions to the Vietnam War and their effect on student moods, attitudes, and general disaffiliation;

increased television viewing and other out-of-school activities which do not involve reading;


Table 3.--Test Trends Over the Last Ten Years

Declines:
American College Test (Composite)
Comprehensive Tests of Basic Skills
Iowa Tests of Basic Skills (later grades)
Iowa Tests of Educational Development
Minnesota Scholastic Aptitude Test
National Assessment of Educational Progress: Science & Functional Literacy
Scholastic Aptitude Test

Increases or No Changes:
Armed Forces Qualification Test
American College Test (Science)
Iowa Tests of Basic Skills (early grades)
National Assessment of Educational Progress: Reading Achievement
Preliminary Scholastic Aptitude Test
Project TALENT

[...] grades, retention in school, and graduation. On this trend, both scores and college grades themselves have a low correlation with success in subsequent careers, and their meaningfulness was challenged.

Friday morning's plenary session set the stage for the small group discussions in the afternoon. Four groups conducted simultaneous discussions on (1) policy issues and impact, (2) analysis of causes, (3) psychometric scales and data requirements, and (4) guidelines for research and remedies.

Small Group Discussion: Policy Issues and Impact

Regardless of the direction or magnitude of achievement trends, participants agreed that public policy should aim at providing students with a higher than current level of performance with respect to verbal and mathematical skills. Public education offers substantial and continuing opportunities for improvement and these opportunities should be actively supported.

Two areas of public policy were focused upon in this group: (1) attrition of students during their first year of college; and (2) cultural or social bias in testing. ACT scores have been declining for students applying to college (college-bound), yet the ACT scores of just those students completing their freshman year have remained stable. This research finding suggests that the score declines may be associated with increased attrition in the first year of college, rather than with the quality of persisting college students. In other words, increased opportunity to enter higher education may not have resulted in increased opportunity to obtain a degree because of increased dropout rates in the first year of college.

The discussion of whether college entrance examinations fairly represent the ability to benefit from a college education included comments on the differential effect of timed tests on subgroups of test-takers and on alternatives to paper and pencil assessment. There was general agreement that tests were an improvement over the subjective and arbitrary selection procedures they had historically replaced. The history of the college admissions tests was cited to illustrate that these tests effectively opened the doors of elite colleges and universities to the middle class. But there remained questions of fairness to minority and lower socioeconomic groups.

Recommendations:

Utilize existing data, e.g. National Assessment of Educational Progress, to reanalyze test data of comparable groups to corroborate the college-bound score decline with different data bases.

Survey teachers, school administrators and students to study their perspectives on the meaning and causes of score declines and of inconsistent score trends (non-declines).

Evaluate relevance of specific test items used in equating protocols to give content validity and interpretation to the score decline and to identify the "meaning" of the score decline with respect to item type.

Small Group Discussion: Analysis of Causes

This discussion group explored five hypotheses related to test score declines: changes over time in (1) the test-taking population, (2) school curricula, (3) attitudes toward taking tests, (4) test content, and (5) test-taking patterns. Each area of discussion is summarized below, with accompanying research recommendations. General recommendations growing out of the afternoon discussion are also included.

In considering changes in the test-taking population, participants cited a need for data on the stability of selected high school populations and on subgroup analyses to show test score changes within these subgroups. Additional demographic data, by geographical area, were seen as useful. The participants also urged quantitative and qualitative assessment of sociological trends: attitudinal surveys, drug use patterns, number of single parent homes, number of working mothers, etc. Sociological trends should be studied in relation to specific test patterns. For example, patterns of television viewing may affect out-of-school reading patterns (a shift from reading to auditory comprehension). The same sociological changes may have different effects on the abilities of different age groups.

Recommendations:

Analyze geographic and demographic breakdowns of available data on test scores over time.

Analyze test score trends by type of institution (research universities, four year private schools, state colleges and universities, community colleges).

Reanalyze ACT and SAT data by student subgroups to test hypotheses about changing populations as the cause of score declines on college admissions tests.

Conduct correlational studies, i.e. television viewing and reading patterns over time in specific subgroups of students.

Participants recognized the increasing diversity of public school curricula. It has not been possible for test publishers to incorporate all of these changes in their tests to give full representation. Many publishers have instead tried to diminish the effect of specific curricula by avoiding all "method-specific" questions and by staying on ground common to all curricula. Questions remained, however, about whether test score changes could be related to the type of school curriculum offered and skills emphasized. The content and skills taught by curricula developed in the 1960s may not be fully covered in tests which are equated to test forms developed in the 1940s and 1950s.1/

Recommendations:

Identify aspects of school structure, school practice, and teaching activities that show change over time and relate such changes to test score trends.

Analyze the possible effects of changes made in tests and item selection procedures which were designed to avoid dependence on particular curricula (tests were made less method-specific and designed to eliminate the advantage of having been taught a specific curriculum; also tests were "balanced" to include representative sections from different curricula).

Student attitudes toward test-taking may have shifted over the past decade. Participants believed that anxiety of test-takers may be an important variable to study. Similarly, the relationship between student motivation and the specific type of college to which the student aspires may be a fruitful area of research. Schools that select national student populations may show more or less decline in scores than do those institutions enrolling students from a restricted geographic area, and perhaps attaching different importance to test scores.

Recommendations:

Conduct attitudinal surveys over time, secondary analyses of existing data, and retrospective surveys to isolate motivational changes with regard to test-taking.

Collect data concerning test-taker anxiety (frequency of no-shows, noncompletions, attendance at review courses, and theft attempts, as possible proxy variables).

1/ The stimulus for criterion-referenced testing is in part a response to such issues. To the extent that nationally standardized tests focus on "general" educational objectives and exclude curricular changes at local levels or changes in local educational objectives, this concern with content validity seems justified.

In the process of test construction, especially with concern for the continuing relevance of entrance examinations in a rapidly changing society, the content of the tests may have changed significantly over time. In addition, the effort to broaden test relevance for new groups of test-takers may be adversely affecting the group for which the tests were originally developed (traditional college-bound students).

Recommendations:

Analyze response patterns to test items which have reappeared on tests over a ten-year period; reanalyze patterns for different population subgroups, if possible using information about curricular patterns over time.

Conduct detailed analysis of specific skills tested by college entrance examinations and, where pertinent, of curriculum changes over the past decade. Examine score trends by skill (conceptualization, computation). These analyses should be set in the context of curriculum changes over time.

Aspects of shifting test-taking patterns that may impact on score changes include the time of year that tests have been administered and the proportions of students taking the tests on different testing dates. A gradual change in test dates over several years may account for some of the score changes, especially when coupled with admissions patterns for colleges (the best students can get early admission and need not retake the test).

Recommendation:

Analyze score trends by test date and include separate analyses for students who re-take the tests, with information on college choice, college attended, and admissions policies (when such information is available for the time period of interest).

During the discussion, other ideas were introduced which did not fit easily under the above headings. These are listed below as general recommendations.

General Recommendations:

Examine international data to help isolate and test cultural or societal factors (e.g., television viewing habits, curriculum changes, admission policy patterns, etc.).

Study differences, and the impact of differences, between ability and achievement tests in item design and item selection procedures.

Compare different tests (content profiles, subject matter differences, etc.) to probe causes of inconsistent trends in test scores.

Small Group Discussion: Psychometric Scales and Data Requirements

Participants in this group agreed that test score changes are large enough in size and important enough in interpretation to warrant further investigation. Such investigation will be most cost effective if begun soon, since meaningful results require longitudinal studies. Also, uninformed speculation about score declines, which unhappily receives widespread media attention, may lead to incorrect decisions that could adversely affect educational programming.

Although it was generally doubted that the drop in test scores could be attributed to the mathematical techniques used to equate scores of different tests taken at different times by different groups of students, it was nevertheless felt by some that there was a need for detailed study of these procedures and better dissemination of the techniques so that they could receive discussion. Also identified was the need for more information on test-taking subgroups, details of score changes over time, and specific tests and measures showing changes.
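For readers unfamiliar with equating, the simplest member of this family of procedures is linear (mean-sigma) equating, sketched below with made-up raw scores. This is only an illustration of the general idea under an equivalent-groups assumption; operational programs use more elaborate anchor-test designs, and nothing here should be read as a publisher's actual procedure.

    import statistics

    def mean_sigma_equate(new_form_scores, old_form_scores):
        # Map a score on the new form onto the old form's scale so the two
        # score distributions have equal means and standard deviations.
        m_new, s_new = statistics.mean(new_form_scores), statistics.stdev(new_form_scores)
        m_old, s_old = statistics.mean(old_form_scores), statistics.stdev(old_form_scores)
        return lambda x: m_old + (s_old / s_new) * (x - m_new)

    # Hypothetical raw scores from two groups assumed comparable in ability.
    old_form = [31, 42, 55, 47, 60, 38, 52]
    new_form = [28, 39, 50, 44, 57, 35, 49]
    to_old_scale = mean_sigma_equate(new_form, old_form)
    print(round(to_old_scale(45), 1))  # a new-form score of 45 expressed on the old scale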

The group cautioned planners that most of the research that can be done in this area will not be rigorously experimental and therefore will lead to correlational, not causal, information. Participants agreed that the general focus of future research should be on sources of score variability over time, rather than on just the reported declines in college admission test scores.

Recommendations:

Gather and systematically analyze existing data; focus on new variables for further hypothesis testing in existing data sets; investigate the effect of grade inflation on validity coefficients for college admissions tests; investigate and compare procedures for equating test scores over time.

Conduct comparative analysis of Project TALENT (1960), Coleman (mid-1960s), and National Longitudinal Study (1972) data to corroborate college test score trends (a minimal sketch of such a cohort comparison follows this list).

Analyze PSAT, norms studies, state assessment tests, and follow-up studies which contain demographic and attitude information and which do not show score declines.

Survey school systems to determine availability of test data and willingness to cooperate in data collection; data should include scores on tests, indicators of curriculum content, and the relation of test scores to admission practices for college-bound students.

Continue support of the National Assessment of Educational Progress (NAEP); NAEP should (1) develop indicators of educational progress and (2) collect data on more variables that could lead to experimental studies.

Examine tests at item and response levels; look at alternative scoring procedures to isolate the content of score declines.

Review the literature (including studies in press) to develop a matrix of existing data; agree on a set of descriptors to ensure comparability of planned research and establish a clearinghouse to coordinate and disseminate information.

Conduct descriptive and comparative studies of ACT, PSAT, and SAT test-takers, and of college-bound and non-college-bound students.
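The cohort comparison called for above might begin with something as simple as a difference in means with an approximate standard error. The sketch below uses invented scores and assumes independent simple random samples; the surveys named in the recommendation have complex designs that would require survey weights in practice.

    import math
    import statistics

    def cohort_change(scores_1960, scores_1972):
        # Difference in cohort means with an approximate standard error,
        # assuming independent simple random samples.
        diff = statistics.mean(scores_1972) - statistics.mean(scores_1960)
        se = math.sqrt(statistics.variance(scores_1960) / len(scores_1960)
                       + statistics.variance(scores_1972) / len(scores_1972))
        return diff, se

    # Hypothetical composite scores, for illustration only.
    change, se = cohort_change([52, 48, 55, 61, 45, 58], [49, 44, 53, 57, 42, 50])
    print(f"estimated change = {change:.1f}, approx. SE = {se:.1f}")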

Small Group Discussion: Guidelines for Research and Remedies

Participants debated the meaning of test scores, questioning whether the scores were directly interpretable or were relevant only as proxies for specific educational achievements. The group agreed that a first step in further research on the decline in scores was to define the important questions in terms of test validity (predictive, content, and construct validity). Researchers should recognize that different constituent groups (parents, students, college admissions offices) will be interested in different questions and findings, and will have their own interpretations and emphases for test validity. Research should be general enough to be useful to this broad constituent group.

Once again, participants called for the assessment of student attitudes about tests. Within this area, researchers may want to focus on the effects of grade inflation or other admission requirements on motivation for test performance and on changes in the population of students taking the test (e.g., non-repeaters resulting from early admissions or open enrollment policies).

Recommendations:

Conduct studies to define the areas of investigation that are important to different interest groups (parents, administrators, etc.).

Study public policy questions related to skill levels, e.g., what level of reading is critical for performance of jobs essential to society? What level is required for college completion?

Concluding Session

Following a presentation by a representative of each small group session, the conference concluded with a general discussion of the topic and of future research directions. Three themes that had been mentioned throughout the meetings were reiterated: the decline in college entrance examination scores is large enough and consistent enough to be worthy of further investigation; we should not assume that there is a single cause or correlate of test score changes (thus the emphasis should be on patterns of important variables); and existing data, which are plentiful, should be fully exploited before expensive new data are collected.

Participants aired both agreements and disagreements as the conference drew to a close. They differed on whether shifts in the population of test-takers could be a significant contributor to score decline, and on whether the purpose of the ACT and SAT was to select (screen) college applicants or to assess student ability. Participants agreed, however, that the phenomenon of declining scores was both verifiable and statistically significant, and that the Federal government should guarantee support of a continuous monitoring system to avoid "panic research" and speculation.

Recommendations:

Guarantee a systematic Federal research effort to assess the broad impact of score trends.

Examine the possibility of different explanations for score trends in primary age groups and in college-bound groups.

Systematically track national changes in primary and secondary school curricula; formulate educational indicators which can be relied upon for this purpose.

CHAPTER 5

POLICY AND PLANNING IMPLICATIONS

The persuasive evidence for a declining trend in scores on college entrance examinations coincides with increasing concern about test bias and educational accountability. The overlap in these three areas of attention has generated an unusual amount of public, professional, and news media speculation. The Conference on Declining Test Scores concluded with a recommendation that NIE should assume a major responsibility for emphasizing objective and careful investigation of these problems and for coordinating the multi-faceted research into a comprehensive and useful framework.

Most of the diverse explanations offered for test score decline relate to the fundamental testing concerns of reliability and validity. Researchers in this area should be identifying and estimating the sources of variation (over time) both in individual test scores and in classroom or school district averages of scores. A possible research framework might include examination of four broad sources of variance in individual scores: family background characteristics, educational variables, individual psychological variables, and societal/cultural factors. These broad areas include the major variables discussed at the Conference on Declining Test Scores.

These four factors jointly influence high school performance and test scores, which in turn influence college selection, grades, and graduation. Finally, all of these factors may influence occupational satisfaction and success. The role of tests in this process and the interpretation of score changes with respect to this process need clarification and debate.
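One narrow but concrete piece of the framework described above is separating the variation in individual scores from the variation in school-level averages. The sketch below partitions total score variance into between-school and within-school components for each year in a hypothetical file; the file name and columns (school_id, year, score) are assumptions made for illustration.

    import pandas as pd

    # Hypothetical file of individual scores with school identifiers.
    df = pd.read_csv("scores.csv")  # columns: school_id, year, score

    def between_within_variance(frame):
        # Partition total variance into a between-school and a within-school part.
        grand_mean = frame["score"].mean()
        school_means = frame.groupby("school_id")["score"].transform("mean")
        between = ((school_means - grand_mean) ** 2).mean()
        within = ((frame["score"] - school_means) ** 2).mean()
        return between, within

    for year, frame in df.groupby("year"):
        between, within = between_within_variance(frame)
        print(year, round(between, 1), round(within, 1))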

With regard to test validity, the conference discussions highlighted a lack of clarity in what the college entrance examinations measure and in whether such measures can be used as predictors equally well with different groups of students. Such tests are sensitive to individual differences in student ability, but there is some question as to whether they are also sensitive to other individual, educational, or societal influences. Although the predictive validity of these test scores for college grades has been demonstrated, the grades themselves do not appear to be reliable predictors of post-schooling variables. The evidence of grade inflation seems to open suspicion about the stability of grades as a predicted or a predictor variable. Concern for validity is heightened by the lack of evidence that these test scores correlate well with factors other than college grades and persistence in college. Future research in the area of test validity should not focus exclusively on predictive validity coefficients and should use definitions of reliability and validity that reflect diverse social concerns.
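As a reminder of what a predictive validity coefficient is, and of how an unreliable criterion such as inflation-destabilized grades can depress it, the sketch below computes an observed coefficient from invented scores and applies the classical correction for attenuation. The scores and the assumed criterion reliability of 0.70 are illustrative only.

    import statistics

    def correlation(x, y):
        # Pearson correlation of two equal-length score lists.
        mx, my = statistics.mean(x), statistics.mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
        return cov / (statistics.stdev(x) * statistics.stdev(y))

    # Hypothetical admissions test scores and freshman grade averages.
    test_scores = [450, 520, 610, 480, 700, 560, 640]
    freshman_gpa = [2.4, 2.8, 3.1, 2.5, 3.6, 2.9, 3.3]

    observed = correlation(test_scores, freshman_gpa)

    # Classical correction for attenuation: dividing by the square root of the
    # criterion's reliability estimates the coefficient that would be seen
    # with a perfectly reliable criterion.
    criterion_reliability = 0.70  # assumed value, for illustration
    corrected = observed / criterion_reliability ** 0.5
    print(round(observed, 2), round(corrected, 2))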

The interest in score decline on college entrance exams has generated a great deal of activity in the research community. David Wiley, of the University of Chicago, is preparing an internal policy report for the Ford Foundation. The CEEB has announced the formation of a "blue ribbon" study panel. The Educational Research Service is studying declines in secondary level achievement test scores. Several conferences on this general topic are planned, including a seminar sponsored by the Edison Foundation and the I/D/E/A and a session on "Score Trends in National Exams and Their Implications," to be sponsored by the American Educational Research Association in April, 1976.

There is a significant role for NIE to play in the investigation of the declining test score phenomenon. As suggested by the conference participants, a first step is a comprehensive review of existing research as an aid in the development of a comprehensive research framework. NIE could serve as a clearinghouse and reduce duplication of effort while making data available for secondary analysis. Simultaneously, however, NIE should attempt to place the concern about test score trends in its proper perspective. To do so requires supporting research which will lead to a sound theoretical framework dealing with factors that affect test scores of both individual students and population subgroups. Only when such a framework is available can we relate test score changes to other educationally and socially important factors.

References

"A Million College-Bound Seniors Described in New ATP Report."The College Board News, September 1975.

Angoff, William H. "Scales, Norms and Equivalent Scores." In

Educational Measurement, (2nd Edition). Edited by R. L.Thorndike. Washington, D. C.: American Council on Educa-tion, 1971.

Angoff, William H., ed. The College Board Admissions Testing Program: A Technical Report on Research and Development Activities Relating to the Scholastic Aptitude Test and Achievement Tests. New York: College Entrance Examination Board, 1971.

Breland, Hunter M. A Summary of Research Relating to the SAT Score Decline. Princeton: Educational Testing Service, [1975].

Bullock, R., and Stern, J. Estimated SAT Summary Statistics for High School Cohorts 1967-1974. Princeton: Educational Testing Service, [1974].

College-Bound Seniors, 1974-75. Iowa City: Admissions Testing Program of the College Entrance Examination Board, [1975].

Etzioni, Amitai. "Grade Inflation." Science 190 (October 10, 1975).

Ferguson, Richard L., and Maxey, E. James. Trends in the Academic Performance of High School and College Students. Iowa City: The American College Testing Program, 1975.

Karpinos, Bernard. AFQT: Historical Data (1958-1972). Alexandria, Virginia: Human Resources Research Organization, [1975].

"Males Dominate in Educational Success." NAEP Newsletter,October 1975.

McCandless, Sam A. "The SAT Score Decline and Its Implications for College Admissions." Paper presented at the 1975 Western Regional meeting of the College Entrance Examination Board, January 1975.

Modu, Christopher C., and Stern, June. The Stability of the SAT Score Scale. Princeton: Educational Testing Service, [1975].

Nunnally, Jum C. "Psychometric Theory--25 Years Ago and Now." Educational Researcher 4 (November 1975): 7-21.

Rever, Phillip R., and Kojaku, Lawrence K. Access, Attrition, Test Scores and Grades of College Entrants and Persisters: 1965-1975. Iowa City: The American College Testing Program, [1975].

Stewart, Elizabeth E. The Stability of the SAT-Verbal Score Scale. Princeton: Educational Testing Service, [1966].

APPENDIX

The Technical Appendix summarizes data on the various tests discussed in this report, including date of testing, population tested, and other background information. Because these details may not be of general interest, this Appendix (and additional copies of this report) is available only upon request from:

National Institute of Education
Measurement and Methodology Division
1200 19th Street, N. W.
Washington, D. C. 20208

(202) 254-6512
