Last updated on November 16, 2009
Oregon Department of Education
2007–2008 ELPA Validity and Reliability Oregon’s Statewide Assessment System
Annual Report Volume 10
It is the policy of the State Board of Education and a priority of the Oregon Department of Education that there will be no discrimination or harassment on the grounds of race, color, sex, marital status, religion, national origin, age, or handicap in any educational programs, activities, or employment. Persons having questions about equal opportunity and nondiscrimination should contact the state superintendent of public instruction at the Oregon Department of Education.
Oregon Department of Education
Office of Assessment and Information Services
255 Capitol Street NE
Salem, OR 97310
503-947-5600
http://www.ode.state.or.us/

Susan Castillo, State Superintendent of Public Instruction
Doug Kosty, Assistant Superintendent
Tony Alpert, Director, Assessment and Evaluation
Kathleen Vanderwall, Manager, Test Development
Stephen Slater, Manager, Psychometrics and Validity
Melinda Bessner, Manager, Analysis and Reporting
Ken Hermens, Language Arts Assessment Specialist
Leslie Phillips, Science and Social Sciences Assessment Specialist
Jim Leigh, Mathematics Assessment Specialist
Guillaume Gendre, ELPA Assessment Specialist
Cindy Barrick, Research Analyst
Tom Tinkler, Psychometrics Specialist
Saleem Ahmad, Research Analyst
Sheila Somerville, Electronic Publishing Specialist
This technical report is one of a series that describes the development of Oregon’s Statewide Assessment System. The complete set of volumes provides comprehensive documentation of the development, procedures, technical adequacy, and results of the system:
Volume 1: 2007–2008 Annual Technical Report
Volume 2: Test Development
Volume 3: Standard Setting
Volume 4: Reliability and Validity
Volume 5: Test Administration
Volume 6: Score Interpretation Guide
Volume 7: Alternate Assessment, Program Description
Volume 8: Alternate Assessment, 2005-06 Statistical Summary
Volume 9: ELPA Test Development
Volume 10: ELPA Validity and Reliability
All volumes can be found at http://www.ode.state.or.us/search/page/?id=787.
ELPA Validity and Reliability Vol.10 Page 1 of 96
Section 1.0 – Overview - English Language Proficiency Assessment (ELPA)
    1.1 – Purpose of ELPA
    1.2 – Oregon Administrative Rule #581-023-0100
    1.3 - Oregon’s English Language Proficiency (ELP) Standards
    1.4 – Defining Academic English
    1.5 - Academic Contexts
    1.6 – Assessment Features
Section 2.0 - Introduction to Technical Adequacy
    2.1 - Overview
    2.2 Computer Adaptive Administration
    2.3 - Assessment Scaling
    2.4 - Field Testing
    2.5 – Annual Embedded Field-Testing Method
Section 3.0 – Content Validity
    3.1 - Rigorous Content Standards
    Section 3.2 – Consensus Driven Test Development
        3.2.1 – Key Decisions
        3.2.2 – Effective Test Administration and Design
        3.2.3 – Research Based Conceptual Framework - Forms and Functions
        3.2.4 – Technology Matrix
    3.3 – Consensus Driven Item Development
        3.3.1 - Life of an ELPA Item
        3.3.2 - Principal Item Types; Relation to Domains
        3.3.3 – Distribution Across Grade Bands
        3.3.4 – Order of Delivery
        3.3.5 – Item Type Explanations
    3.4 - Test Specifications
        3.4.1 – Relation to Validity
        3.4.2 – Alignment History
        3.4.3 – Ensuring Item Alignment with the Construct and Standards
Section 4.0 Concurrent Validity
    4.1 - Explanation
    4.2 – Description of Consistency
Section 5.0 – Reliability
    5.1 - Standard Error of Measure
    5.2 - Item Analysis Methods for the ELPA
        5.2.1 – Purpose of Item Analysis
        5.2.2 – Summary of Item Analysis Results
    5.3 - Strand Reliability
        5.3.1 – Reliability Through Number of Items
        5.3.2 – Reliability Through Standard Setting and Precision at the Cut Scores
Section 6.0 - Fairness and Accessibility
    6.1 – Test Administration
        6.1.1 - Testing Requirements to Produce Valid Test Results
        6.1.2 - Security of the Test Environment
        6.1.3 - Testing Improprieties
        6.1.4 - Responding to Student Questions
        6.1.5 – Testing Irregularities
    6.2 – Sensitivity Panel Review
    6.3 – Differential Item Analysis
Volume 10: VALIDITY AND RELIABILITY
Section 1.0 – Overview - English Language Proficiency Assessment (ELPA)
1.1 – Purpose of ELPA
The purpose of Oregon’s English Language Proficiency Assessment (ELPA) is to assess
academic English ability in reading, writing, listening, speaking, and comprehension for
English Language Learners (ELLs) enrolled in Oregon public schools in grades K-12.
As part of the No Child Left Behind Act (NCLB) enacted in 2001, states must annually
measure and report progress toward and attainment of English language proficiency by
ELLs enrolled in public schools. Under NCLB, states must develop English Language
Proficiency (ELP) content standards linked to content standards including those for
English Language Arts. Oregon’s English Language Proficiency test is aligned to the
forms and functions of the Oregon ELP content standards and describes the English
proficiency of students based on six domains: Total Proficiency, listening, speaking,
reading, writing, and comprehension. Comprehension is a combination of the reading and
listening measures. Total Proficiency is a combination of the listening, speaking, reading,
and writing measures.
Oregon’s ELP assessment is designed to satisfy the provisions of Title III of NCLB.
Scores are to be used for:
Providing an annual English language proficiency score and level for each
student;
Reporting annual measures of speaking, reading, listening, writing and
comprehension for each student;
Reporting Annual Measurable Achievement Objectives (AMAOs)
biennially to the federal government. Because ELLs enter school systems at
different ages with different degrees of English proficiency, AMAOs can be
based on cohorts, groups of students entering at a common age and
proficiency level.
AMAO #1: The number and percent of students making progress toward
English proficiency
AMAO #2: The number and percent of students attaining English
proficiency at the end of each school year
ELPA scores will not be used as the sole criterion for exiting students from English
development programs. Each district will continue to construct its own criteria and
procedures for ending services to students as they become fully proficient. ELP
assessment results may inform exit decisions as part of a set of evidence including
teacher recommendation, grades and other information supporting exit decisions.
1.2 – Oregon Administrative Rule #581-023-0100
Sections Relevant to ELPA (see Appendix A for entire rule)
(d) "Language Minority Student" means:
(A) Individuals whose native language is not English; or
(B) Individuals who come from environments where a language other than English is
dominant; or
(C) Individuals who are Native Americans or Native Alaskans and who come from
environments where a language other than English has had a significant impact on their
level of English proficiency.
(4) Pursuant to ORS 327.013(7)(a)(B), the resident school districts shall receive an
additional .5 times the ADM of all eligible students enrolled in an English as a Second
Language program. To be eligible, a student must be in the ADM of the school district in
grades K through 12 and be a language minority student attending English as a Second
Language (ESL) classes in a program which meets basic U.S. Department of Education,
Office of Civil Rights guidelines. These guidelines provide for:
(a) A systematic procedure for identifying students who may need ESL classes, and for
assessing their language acquisition and academic needs;
(b) A planned program for ESL and academic development, using instructional
methodologies recognized as effective with language minority students;
(c) Instruction by credentialed staff and trained in instructional strategies that are
effective with second language learners and language minority students, or by tutors
supervised by credentialed staff trained in instructional strategies that are effective with
second language learners and language minority students;
(d) Adequate equipment and instructional materials;
(e) Evaluation of program effectiveness in preparing ESL students for academic success
in the mainstream curriculum.
1.3 - Oregon’s English Language Proficiency (ELP) Standards
The Oregon Department of Education, in partnership with educators throughout the state,
developed Oregon’s English Language Proficiency Standards. These standards describe
progressive levels of competence in English acquisition for five proficiency levels:
beginning, early intermediate, intermediate, early advanced and advanced. English
language proficiency levels set clear benchmarks of progress that reflect differences for
students entering school at various grade levels.
As specified in Title III of NCLB, ELP content standards are designed to supplement the
existing ELA academic content standards to facilitate students’ transition into regular
education content classes. ELP Standards were designed to guide language acquisition
so that English Language Learners can participate successfully in regular education
classes. ELP assessments measure ELP standards, not English Language Arts (ELA)
standards. This is an important distinction, as ELP content validity is based on the degree
to which tests reflect ELP content standards, which, although designed to supplement the
ELA standards, are quite different in structure and meaning. ELLs are required to take
ELP assessments in addition to ELA and other content assessments. Therefore, the
domain of ELP assessments differs from that of English Language Arts.
1.4 – Defining Academic English
For the purpose of this test, Academic English is defined broadly as the English
necessary to function and communicate successfully in the United States’ school system.
It includes the language of interaction between students and teachers (How are you?;
Would you help me please?), vocabulary related to the school and classroom objects
(blackboard, pencil, dictionary, library), direction of student behavior (line-up, go to the
cafeteria, recess ends at 12:30), explicit content language (osmosis, square root, quarter
note), and reading passages connected to content standards and responding to questions
based on the reading passage (The first flying craft constructed by the Wright brothers
was a glider, which they flew like a kite. In the story the word “constructed” means the
same as built, bought, crashed, found.)
Regardless of specific language types found throughout the test, an important
consideration in the creation of ELPA concerns the differences inherent in testing
academic language as opposed to prior knowledge of a content area (see Academic
Contexts below).
1.5 - Academic Contexts
Because language use is always couched within a context, ELPA was designed to include
a number of different school-related situations and contexts, such as the following:
Math
Science
Social studies
Language arts
Supplementary (art, music, drama, sports, recess, library, cafeteria)
However, this test is constructed such that language skills are assessed independently of
any potential knowledge of subject matter, or lack thereof. The inclusion of context-
based items does not assume that the student possesses prior knowledge of explicit
content for these areas. Contexts differ from content, and should not be equated. Thus, a
dialogue between two students may take place in the science lab (context) and discuss the
class’s assignment (content), but the language skill being tested might be verb
conjugation, not science content (e.g., Yesterday we learned how to use the microscope;
the remaining foils might be learn, learning, learns). An ELPA item set within a science
context will not require students to have prior knowledge of, for example, the various
parts of a microscope, or the parts of a cell, in order to successfully complete the item.
That is, ELPA is not designed to assess content of specific subjects; rather, test items are
situated within, and draw upon the language of, familiar school-related contexts.
1.6 – Assessment Features
The State of Oregon ELP Assessment has the following features:
Web-based adaptive
Research-based and documented
Aligned to the Oregon ELP (English Language Proficiency) standards
Aligned to the Oregon ELA (English Language Arts) content standards
Valid and reliable
Conducted in English
Tests the following grade bands: K-1, 2-3, 4-5, 6-8 and 9-12 and is required of all
ELLs enrolled in these grades
Delivered within an assessment window
Produces a score and level for overall academic English proficiency. Cut points
are established on the overall English proficiency scale.
Produces sub-scores in four sub-domains: listening, speaking, writing, and
reading
Reports a measure of comprehension as a combination of listening and reading
Demonstrates growth in English language acquisition skills over time.
Applicable to students of any language or cultural background
Supports Title I accountability and Title III program evaluation in local school
districts.
Section 2.0 - Introduction to Technical Adequacy
2.1 - Overview
The Oregon English Language Proficiency Assessment (ELPA) is a cross-grade
(Kindergarten through 12th Grade), multi-domain assessment covering reading, listening,
writing, and speaking. (Comprehension is derived from reading and listening scores; a
Total Proficiency score is derived from the first four domains.) The assessment employs
multiple item types, including multiple-choice (MC) items; picture-click (PC) items; cloze (CZ)
items; elicited information (EI) items; short-answer (SA2) items; word-builder (WB) items; and
extended response (ER) items.
For purposes of scoring and item analysis, items can generally be classified into one
of two categories: selected-response (SR) or constructed-response (CR). SR items
typically provide multiple response options and require the examinee to select one of the
options (see footnote 1). CR items essentially allow free response, and the response is
scored against an established rubric. Rubrics can be dichotomous (i.e., correct=1, incorrect=0)
or polytomous, with scores ranging from 0 to 3 points.
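The SR/CR distinction above can be illustrated with a minimal Python sketch. The `Item` class, its field names, and the item identifiers are hypothetical and are not taken from the ELPA scoring system; the sketch only assumes that SR items are scored against a key and that CR items arrive with a rater-assigned rubric score between 0 and the item's maximum.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Item:
    item_id: str                 # hypothetical identifier
    category: str                # "SR" (selected-response) or "CR" (constructed-response)
    max_points: int = 1          # 1 = dichotomous; up to 3 = polytomous rubric
    key: Optional[str] = None    # keyed option, used only for SR items


def score_response(item: Item, response) -> int:
    """Return the points earned on one item.

    SR items: `response` is the selected option and is compared with the key.
    CR items: `response` is the rubric score already assigned by a rater.
    """
    if item.category == "SR":
        return 1 if response == item.key else 0
    if item.category == "CR":
        points = int(response)
        if not 0 <= points <= item.max_points:
            raise ValueError(f"rubric score {points} outside 0..{item.max_points}")
        return points
    raise ValueError(f"unknown item category: {item.category}")


mc_item = Item("mc-001", "SR", key="B")
er_item = Item("er-001", "CR", max_points=3)
print(score_response(mc_item, "B"))  # 1
print(score_response(er_item, 2))    # 2
```

Keeping the rubric range on the item itself lets the same function handle both dichotomous and polytomous CR items.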
2.2 Computer Adaptive Administration
The ELPA is administered as a two-stage computer-adaptive multistage (ca-MST)
test (Luecht & Nungester, 1998, 2000; Luecht, 2004). This type of test first presents a
fixed-length locator block. If an examinee scores poorly on the locator block, (s)he is
routed to an easier block (testlet) of items. If an examinee performs extremely well on the locator
block, (s)he is routed to a harder block of items; otherwise, the examinee is
administered a moderate-difficulty block. Figure 1 presents the generic ca-MST design.
                 Locator        Block A        Block B        Block C
Target Item                     (Easier)       (Moderate)     (Difficult)
Difficulty       L  S  R  W     L  S  R  W     L  S  R  W     L  S  R  W
-2.0             1  0  2  1     3  5  3  2     0  0  0  0     0  0  0  0
-1.5             2  0  2  2     2  5  2  3     3  5  3  2     0  0  0  0
-1.0             2  0  1  2     3  5  3  2     2  5  2  3     3  5  3  2
+1.0             2  0  1  2     2  5  2  3     3  5  3  2     2  5  2  3
+1.5             2  0  2  2     0  0  0  0     2  5  2  3     3  5  3  2
+2.0             1  0  2  1     0  0  0  0     0  0  0  0     2  5  2  3
Column Totals   10  0 10 10    10 20 10 10    10 20 10 10    10 20 10 10
Figure 1. ca-MST Layout for the Oregon ELPA
Footnote 1: Multiple selections can be accommodated under an SR item format; however, no
multiple-selection (mSR) items are included on the ELPA.
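The two-stage routing described above Figure 1 can be sketched in a few lines of Python. The report does not state the routing rule or its cut points, so the number-correct rule and the `low_cut` and `high_cut` values below are purely illustrative assumptions; only the 30-item locator length and the three block labels come from the text.

```python
LOCATOR_LENGTH = 30  # locator block length, from Figure 1


def route_second_stage(locator_correct: int,
                       low_cut: int = 10,
                       high_cut: int = 22) -> str:
    """Choose the second-stage block from a number-correct locator score.

    low_cut and high_cut are hypothetical thresholds, not ELPA values.
    """
    if not 0 <= locator_correct <= LOCATOR_LENGTH:
        raise ValueError("locator score must be between 0 and 30")
    if locator_correct < low_cut:
        return "A"   # easier block
    if locator_correct >= high_cut:
        return "C"   # difficult block
    return "B"       # moderate block


for score in (4, 15, 27):
    print(score, route_second_stage(score))
```

An operational system could just as well route on a provisional ability estimate rather than a raw score; the sketch shows only the branching structure.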
Reading from the left, the “Target Item Difficulty” column describes the relative
difficulty of the items, where a minus sign indicates easier items and a plus sign
indicates harder items. The column headers L, S, R, and W denote the domains
(listening, speaking, reading, and writing). The locator block contains 30 items and is
followed by a block of up to 50 easy (Block A), moderate (Block B), or difficult (Block
C) items. This type of ca-MST design is statistically more efficient than a fixed test form
because it tailors the difficulty of the item block or “testlet” to the examinee’s apparent
ability, resulting in more accurate scores (Luecht & Nungester, 1998). Item response
theory (IRT; Lord, 1980; Hambleton & Swaminathan, 1985) is used to calibrate all items
to a common scale, denoted θ. Although examinees may be administered items of differing
difficulty under the ca-MST design, IRT scoring places all examinees on the same
measurement scale.
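To illustrate why IRT scoring keeps examinees comparable when they see different blocks, the sketch below estimates θ by maximum likelihood. The report does not name the IRT model used for the ELPA, so the one-parameter (Rasch) model here is an assumption, and the item difficulties are made-up values in the range shown in Figure 1.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def estimate_theta(responses, difficulties, iterations: int = 25) -> float:
    """Maximum-likelihood ability estimate via Newton-Raphson.

    responses are 0/1 scores; difficulties are on the same scale as theta.
    """
    raw_score = sum(responses)
    if raw_score == 0 or raw_score == len(responses):
        raise ValueError("ML estimate is undefined for all-0 or all-1 patterns")
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_probability(theta, b) for b in difficulties]
        gradient = raw_score - sum(probs)             # first derivative of log-likelihood
        information = sum(p * (1 - p) for p in probs)  # negative second derivative
        step = gradient / information
        theta += max(-1.0, min(1.0, step))             # clamp the step for stability
    return theta


# Same raw score (1 of 2) on an easy pair and on a hard pair of items.
easy = estimate_theta([1, 0], [-2.0, -1.0])
hard = estimate_theta([1, 0], [1.0, 2.0])
print(round(easy, 3), round(hard, 3))
```

The same raw score yields a higher θ on the harder items, which is how examinees routed to different blocks end up on one scale.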
The ca-MST design described in Figure 1 reflects only the approximate item counts for
the operational (i.e., scored) items within domains and blocks. Individual ca-MST
“panels” (i.e., the combination of a locator block and the three possible second-stage
blocks for a particular grade level) may vary slightly in item composition, given the
availability of items in the ELPA item bank. The stage-two blocks also have “pretest”
slots to try out new ELPA items. The new, grade-level-appropriate items are randomly
seeded into the pretest slots for purposes of gathering data solely to determine the
psychometric and statistical quality of the pretest items. Pretest items do not appear on
the locator blocks and do not enter into scoring for any students. The pretest items are
subsequently added to the ELPA item bank for possible inclusion on future test forms.
A total of 496 operational items and 218 pretest items were administered in
Spring 2008 across the five ELPA grade levels (K-1, 2-3, 4-5, 6-8, and 9-12). A cross
tabulation of operational item counts by grade-level block is presented in the CART
Report (see http://www.ode.state.or.us/search/page/?id=1561 – Cart Technical Report).
This listing provides exact item counts on the diagonal as well as shared-item counts
across blocks. Because the pretest items are randomly seeded onto the operational ca-
MST forms, it is not possible to specifically tie a pretest item to any ca-MST item block
(module). Therefore, the 218 pretest items are not reflected in those counts.
2.3 - Assessment Scaling
Scaling decisions are based on the assumption that the four sub-domains (listening,
speaking, writing, and reading) work together to comprise a single English proficiency
scale. The scale is presumed to be unidimensional, although this assumption may be
revisited if data reveal pronounced dimensionality. The scale is a vertically linked
longitudinal scale, so that progress toward English proficiency can be measured as
required by Title III annual measurable achievement objectives. The comprehension
measure is derived from listening and reading scores. The method for this is a
mathematical formula, approved by a policy-making group.
There is an assumption that students at the same proficiency levels in adjacent grades
share substantial linguistic characteristics, differing primarily in developmental and social
factors. It is also assumed that transitional levels at upper grade bands will be higher than
those at lower bands. It is NOT assumed that students will grow one level each year.
Language acquisition experts have long agreed that younger students master linguistic
skill faster than older ones. They may disagree on the reasons for this, but everyone
agrees on the phenomenon. Proficiency levels represent stages of acquisition that
younger students, in general, work through faster than older ones. Consequently,
language proficiency levels across grades may look different than those for other content
areas where it is assumed that achievement levels across grades are progressively higher
on vertical scales. In ELP assessment, vertical linking blocks across grade bands may use
items of similar difficulty for all or most bands. Given these considerations, linking
blocks for the operational tests contain items from throughout the difficulty continuum.
2.4 - Field Testing
The original field test was conducted with a minimum of 6,000 students and provided
preliminary difficulty levels for items that fed the Spring 2006 baseline test. Each student
took four blocks of 20 items each. The blocks represented only two sub-domains of
varying combinations (reading and speaking; speaking and writing; writing and listening;
etc.).
In addition to providing within-grade scaling and item calibration, fall field testing
allowed a dimensionality study to be conducted (see Appendix B). We wanted to know
whether English Language Proficiency is a single skill, resting on acquisition of functions
and forms, or a combination of several skills with student responses more dependent on
sub-domain platforms than overall English proficiency.
The winter/early spring ’06 field test was for linking item difficulty to form the vertical
scale. Fall ’05 and Winter ’06 field tests provided scaled items for the Spring 2006
Baseline ELP assessment. The design for this assessment appears below.
GENERIC LOCATOR TEST - Operational 2006

                 LOCATOR Block  Block A        Block B        Block C
ELP Level        L  R  S  W     L  R  S  W     L  R  S  W     L  R  S  W
1                1  2  0  1     3  3  5  2     -  -  -  -     -  -  -  -
2                2  2  0  2     2  2  5  3     3  3  5  2     -  -  -  -
3                2  1  0  2     3  3  5  2     3  3  5  2     3  3  5  2   <= Core Block
4                2  1  0  2     2  2  5  3     2  2  5  3     2  2  5  3
5                2  2  0  2     -  -  -  -     2  2  5  3     2  2  5  3
6                1  2  0  1     -  -  -  -     -  -  -  -     3  3  5  2
Domain total
per block       10 10  0 10    10 10 20 10    10 10 20 10    10 10 20 10
Block total          30             50             50             50

A dash (-) marks a level with no items in that block.
Numbers represent numbers of items, not points.
Locator block contains all MC items, representing the full range of difficulty.
The SAME core set of 25 items repeats in blocks A, B, and C. These items are at intermediate difficulty.

Grade bands K-1 & 2-3
Block A contains NO SA2s or ERs (Speaking or Writing)
Blocks B & C contain 2 Speaking SA2s but NO Writing SA2s
All blocks contain NO ERs (Speaking or Writing)

Grade bands 4-12
Block A contains 2 Speaking SA2s and NO Writing SA2s
Block A contains no ERs (Speaking or Writing)
Blocks B and C contain 2 Writing SA2s and 2 Writing ERs (these are the exact same items in BOTH blocks)
Blocks B and C contain 2 Speaking SA2s and 2 Speaking ERs (these are the exact same items in BOTH blocks).

Unique items
30 Locator
25 Core
25 Unique low
25 Unique mid
25 Unique high
130 TOTAL Unique Items
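The item counts in the 2006 design can be checked with simple arithmetic. The short Python sketch below uses only the counts stated in the design notes; the variable names are illustrative.

```python
# Item counts copied from the 2006 design notes.
LOCATOR_ITEMS = 30
CORE_ITEMS = 25                              # shared by Blocks A, B, and C
UNIQUE_ITEMS = {"A": 25, "B": 25, "C": 25}   # unique low, mid, and high items

# Domain totals per second-stage block (L, R, S, W).
BLOCK_DOMAIN_TOTALS = {"L": 10, "R": 10, "S": 20, "W": 10}

# Each second-stage block is the shared core plus its own unique items.
block_sizes = {name: CORE_ITEMS + n for name, n in UNIQUE_ITEMS.items()}

# The core is counted once, because the same 25 items repeat in every block.
total_unique = LOCATOR_ITEMS + CORE_ITEMS + sum(UNIQUE_ITEMS.values())

# A student sees the locator block plus exactly one second-stage block.
items_per_student = LOCATOR_ITEMS + block_sizes["B"]

print(block_sizes)                        # {'A': 50, 'B': 50, 'C': 50}
print(sum(BLOCK_DOMAIN_TOTALS.values()))  # 50
print(total_unique)                       # 130
print(items_per_student)                  # 80
```

The block size of 50 agrees with both the domain totals and the core-plus-unique breakdown, and each student sees 80 of the 130 unique items.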
2.5 – Annual Embedded Field-Testing Method
Each year, field test items are loaded into a pool. For each test administration, items are
randomly selected from this pool, which spreads exposure evenly across items and yields
better data for analysis.
This plan is annually realized in the 180 embedded items and the 15 field test forms
loaded into the test delivery system. Selection of these 180 items is a direct result of the
review and approval by the Content and Sensitivity Review.
Field-testing requires broad exposure of the embedded items across multiple districts and
schools. Minimally, each item receives 600 exposures prior to any analyses. Past
experience has shown that the procedure used for random exposure of items results in
approximately 3500 exposures for each item during the course of the testing window.
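The random seeding of embedded items and the 600-exposure minimum can be sketched as a small simulation. The number of pretest slots per test and the number of administrations are not given in this section, so the values used below are hypothetical; only the pool size of 180 and the minimum of 600 exposures come from the text.

```python
import random
from collections import Counter


def seed_pretest_items(pool, slots, rng):
    """Randomly choose `slots` distinct field-test items for one administration."""
    return rng.sample(pool, slots)


def simulate_exposures(pool, slots, administrations, seed=0):
    """Count how often each pooled item is exposed across many administrations."""
    rng = random.Random(seed)
    counts = Counter({item: 0 for item in pool})
    for _ in range(administrations):
        counts.update(seed_pretest_items(pool, slots, rng))
    return counts


def items_below_minimum(counts, minimum=600):
    """Return the items that have not yet reached the minimum exposure count."""
    return sorted(item for item, n in counts.items() if n < minimum)


pool = [f"item-{i:03d}" for i in range(180)]   # 180 embedded items, as in the text
counts = simulate_exposures(pool, slots=10, administrations=20000)  # hypothetical values
print(min(counts.values()), max(counts.values()))
print(len(items_below_minimum(counts)))
```

With uniform random selection, the expected exposure per item is administrations × slots ÷ pool size, so the minimum can be checked against the expected testing volume before the window opens.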
Section 3.0 – Content Validity
Content validity is the degree to which an assessment measures the knowledge and skills
it was designed to measure. It is established through a consensus-driven process, typically
by expert judgment.
Evidence of content validity includes the following:
3.1 - Rigorous content standards identifying what students should know and be
able to do that were developed and revised with comprehensive review by Oregon
educators, parents, and other citizens.
3.2 and 3.3 - Consensus-driven test and item development processes, using panels
of educators from around the state to make judgments about the content relevance
and representativeness of potential items and tasks that ensure test item
faithfulness.
3.4 - Test specifications that provide a clear link between the test content and
the content standards and their corresponding performance levels; ongoing studies
to evaluate and increase the extent to which instruction, assessments, and the ELP
Standards are aligned.
3.1 - Rigorous Content Standards
Content Standards describe what students in Oregon should know and be able to do.
The ELP Standards delineate the proficiency levels required to move through the levels
of English-language development (see Appendix C - Stages of Language Acquisition –
Social Dimension, and Appendix D – Acquisition of Language Functions and Forms).
They are designed to move all students, regardless of their instructional program, into the
mainstream English-language arts curriculum. The levels of developing proficiency in a
second language have been well documented through research. The ELP Standards were
designed around these levels to provide teachers in all types of programs clear
benchmarks of progress. The standards provide different academic pathways that reflect
critical developmental differences for students who enter school at various grade levels.
The major benefit of adopting ELP Standards is to provide criteria that can be used to
document LEP students’ progress or lack of progress in learning English.
A committee composed of practitioners and experts in English language development
(ELD) and assessment developed the English Language Proficiency (ELP) Standards.
The standards were reviewed by teachers throughout Oregon, and the draft standards were
posted on the ODE website for public comment. The standards were presented as an
informational item to the State Board of Education (SBE) during its October 2003 meeting
with the understanding that the document would undergo some modifications and
additions to better align the ELP Standards with developmental proficiency levels, with
the Oregon English-Language Arts Content Standards that were adopted by the SBE
in January 2002 and June 2002, and with the language used in the content standards of
mathematics, science and social studies.
Section 3.2 – Consensus Driven Test Development
A consensus driven test development process is important to ensure test validity.
Another important consideration, for both validity and reliability, is the development of a
test that is not too time consuming in the classroom or burdensome on the student.
3.2.1 – Key Decisions
The following list summarizes the key decisions that were made by consensus with
regard to this assessment.
General
1. Testing will be conducted in English.
2. Assessment is not intended to be a placement or exit test. It is not intended to be
the only measure but rather one of many inputs to the overall plan for the student.
3. There are five grade groupings – K-1, 2-3, 4-5, 6-8, and 9-12. Tests cover grade
ranges; for example, there is a 4-5 test, not a fourth grade test, et cetera.
4. Tests are constructed to yield a single English language proficiency score which
maps directly to ELP levels (beginning, early intermediate, intermediate, early
advanced, advanced, transitional).
5. Proficiency level achievement standards were established for overall English
proficiency, not for sub-domains. Cut points will be established based on the
overall English proficiency scale.
6. Tests will report sub-scores in four sub-domains – reading, listening, writing and
speaking. A fifth sub-domain, comprehension, will be derived from sub-scores in
reading and listening.
7. Distribution of items among sub-domains is fixed so that each subdomain has an
equal number of items.
Standards
8. Standards include descriptors for six proficiency levels: proficient, advanced,
early advanced, intermediate, early intermediate, and beginning.
9. The ELP standards are designed to supplement the ELA standards to ensure that
LEP students develop proficiency in both the English language and the concepts
and skills contained in the ELA standards. This connection is not a perfect match
or one-to-one correspondence.
10. The ELP assessment must be aligned with the ELP standards. Alignment requires
that each item address a specific ELP content standard.
11. ELP tests are based on the subset of ELP standards that are assessable.
Measurement
12. The object of this set of assessments is to monitor growth in English proficiency
across time. Because of this and because of the nature of language acquisition, it
is desirable to use a longitudinal, vertically articulated scale for the English
language proficiency construct. This requires blocks of vertically linking items
between adjacent grade groups.
13. Test scores will be used in reference to proficiency criteria rather than
expectations generated by norms.
14. The overall proficiency score and level will be based on an English proficiency
scale, not on separate scales for each sub-domain. Sub-domains (reading, writing,
speaking and listening) are goals or strands within the overall English proficiency
construct.
3.2.2 – Effective Test Administration and Design
A key goal is to create an assessment that measures accurately and is valid, reliable, and
useful to the field. One of the important efforts is to create an assessment that is not too
time-consuming in the classroom and does not place undue burden on the student,
teachers or proctors during the course of test administration. This requires careful
attention to the efficiency of items in providing maximum information for each student.
In order to report sub-scores for both sub-domains and language functions, test forms are
designed so that items form a matrix with functions cutting across sub-domains. In this
way, information in both dimensions can be provided with minimum testing time. The
current design calls for an 80-item test delivered in two stages corresponding to a
locator block followed by a leveled tier. The basic design calls for a computer-delivered
test consisting of a locator block with machine scored reading, writing and listening
items, followed by a targeted block selected from one of three leveled tiers. The locator
block has items placed at wide intervals along the scale to tell whether the student is in
the lower, middle or upper end of the proficiency continuum. The student is then given a
targeted block of appropriate difficulty. The student experiences the test as a single
session and will not be aware of the transition from the locator block to the leveled tier.
The computer adaptive/multi-stage test begins with 30 items to determine initial
proficiency. After the locator block, students face about 50 at-level items. This type of
test results in improved reliability and therefore provides better support for content
validity decisions. It is statistically more efficient than a fixed test form because it tailors
the difficulty of the item block or “testlet” to the examinee’s apparent ability, resulting in
more accurate scores (Luecht & Nungester, 1998).
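The two-stage routing described above can be sketched as follows. This is a minimal illustration only: the report does not publish the routing rules, so the cut points in TIER_CUTS and the use of a raw locator score are hypothetical assumptions. The item counts (30 locator items, about 50 at-level items) come from the text.

```python
from bisect import bisect_right

# Hypothetical routing cut points on the locator raw score (0-30).
# The report does not state the actual routing rules.
TIER_CUTS = (10, 20)
TIERS = ("lower", "middle", "upper")

LOCATOR_ITEMS = 30   # from the report
TIER_ITEMS = 50      # "about 50 at-level items" in the report


def select_tier(locator_correct, cuts=TIER_CUTS):
    """Pick a leveled tier from the number of locator items answered correctly."""
    if not 0 <= locator_correct <= LOCATOR_ITEMS:
        raise ValueError("locator score out of range")
    # bisect_right counts how many cut points are at or below the score.
    return TIERS[bisect_right(cuts, locator_correct)]


def plan_test(locator_correct):
    """Describe the blocks a student would receive under this sketch."""
    return {
        "locator_items": LOCATOR_ITEMS,
        "tier": select_tier(locator_correct),
        "tier_items": TIER_ITEMS,
        "total_items": LOCATOR_ITEMS + TIER_ITEMS,
    }
```

Under these assumed cut points, a student with 15 locator items correct would be routed to the middle tier and would see 80 items in total. An operational system would more likely route on an ability estimate than on a raw count.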
3.2.3 – Research Based Conceptual Framework - Forms and Functions
The conceptual framework for the Oregon ELP Assessment is based on research in the
fields of education, applied linguistics, and English language acquisition.
After a great deal of research into current linguistic models, Oregon adopted a framework
which focuses on two major components of language competence: Grammatical
Competence and Illocutionary Competence. Each of these is further sub-divided,
resulting in a total of five assessable components of language competence (see Appendix
D for additional information on language functions and forms).
Grammatical Competence (Forms of Language)
1. Morphology (components of words)
2. Vocabulary (the words of the language or “parts of speech”)
3. Syntax (grammar)
Illocutionary Competence (Functions of Language)
4. Ideational (communication of ideas)
5. Manipulative (use of language to get something done)
The table below shows expected item distributions. Distribution of items among sub-
domains is fixed so that each has an equal number of items. This is because the design
must guarantee a usable sub-score for each sub-domain required by Title III of NCLB.
            | Forms - Grammatical Competence     | Functions - Illocutionary Competence |
Subdomain   | Morphology | Vocabulary | Syntax   | Ideational       | Manipulative      | TOTAL ITEMS
Listening   |            |            |          |                  |                   | 20
Reading     |            |            |          |                  |                   | 20
Speaking    |            |            |          |                  |                   | 20
Writing     |            |            |          |                  |                   | 20
Total Items | About 16   | About 16   | About 16 | About 16         | About 16          | 80
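The arithmetic behind the table can be checked with a short sketch. The per-cell figure is an inference under an assumed even split across competencies; the report gives only the row and column totals.

```python
SUBDOMAINS = ("Listening", "Reading", "Speaking", "Writing")
COMPETENCIES = ("Morphology", "Vocabulary", "Syntax", "Ideational", "Manipulative")
ITEMS_PER_SUBDOMAIN = 20  # fixed by the design described in the report

# Row totals: four sub-domains of 20 items each.
total_items = ITEMS_PER_SUBDOMAIN * len(SUBDOMAINS)

# Column totals: the same 80 items spread over five competencies.
items_per_competency = total_items / len(COMPETENCIES)

# Assumption (not stated in the report): an even split within each sub-domain.
items_per_cell = ITEMS_PER_SUBDOMAIN / len(COMPETENCIES)

print(total_items, items_per_competency, items_per_cell)
```

This reproduces the 80-item total and the "About 16" items per competency shown in the table.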
ELPA Test Specifications Page 14 of 96
3.2.4 – Technology Matrix
For each grade band and sub-domain, an appropriate level of familiarity with computer
mousing and keyboarding is required. The chart below shows which skills are required
for each area. At early elementary grades (K-3), items are restricted to those that require
only speaking into a microphone and point-and-click mousing (but not drag-and-drop or
double clicking). Our pilot testing, interviews with Oregon teachers and research done in
other states reveal that students at this age master these skills easily.
Domain and Computer Skill Required
Grade Band | Listening+ | Speaking+ | Reading | Writing
K-1 | Point & click mouse skills | Speak into a microphone, point & click mouse skills | Point & click mouse skills | Point & click mouse skills
2-3 | Point & click mouse skills | Speak into a microphone, point & click mouse skills | Point & click mouse skills | Point & click mouse skills
4-5 | Point & click mouse skills | Speak into a microphone, point & click mouse skills | Point & click mouse skills | Point & click mouse skills and keyboard words, phrases, paragraphs, and sentences*
6-8 | Point & click mouse skills | Speak into a microphone, point & click mouse skills | Point & click mouse skills | Point & click mouse skills and keyboard words, phrases, paragraphs, and sentences*
9-12 | Point & click mouse skills | Speak into a microphone, point & click mouse skills | Point & click mouse skills | Point & click mouse skills and keyboard words, phrases, paragraphs, and sentences*
+ All students are provided with a combination headset/microphone unit for completion of listening and speaking items.
* Degree of keyboarding in Writing in grades 4 and above depends on proficiency level
3.3 – Consensus Driven Item Development
For evidence of content validity, the process by which items are written and reviewed is
critical. In the ELPA consensus-driven item development process, panels of educators
from around the state make judgments about content relevance, bias issues,
representativeness of potential items, and tasks that ensure test item faithfulness.
3.3.1 - LIFE OF AN ELPA ITEM
After approximately 15 experienced Oregon teachers select text passages and determine
an appropriate proficiency and grade-level for each of them, a contractor employs
qualified teachers in Oregon to write items for ELPA. All items include audio (sound)
and visual (picture) components.
Oregon ELD teachers then review the items to verify that they are aligned with the forms
and functions of the ELP standards. Grade-level and proficiency levels are also verified.
In addition, a sensitivity panel reviews the items for bias. The assessment specialist
makes final recommendations for edits and revisions to the contractor.
Approved field test items are embedded in ELPA as part of the operational test. Data is
collected and analyzed to determine if the items “behave” as expected and staff calibrates
the items. Any item that is not “behaving” as expected is analyzed, revised and field-
tested again.
3.3.2 – Principal Item Types; Relation to Domains
All ELPA items consist of a stimulus, a stem, and, in the case of selected response items,
four foils. A stimulus may consist of a picture plus an audio or written text, or simply a
picture (all items, regardless of type, contain a graphic/picture prompt). A stem consists
of an audio and/or written prompt or question. Foils, where present, always number four
and may be in the form of text or pictures (but not a combination of the two), or text and
audio.
A variety of item types are designed to contribute to different aspects of English language
development. ELPA consists of four principal item types, some of which are presented
through various item sub-types. Some of these item types are presented in multiple
sub-domains, while others are used exclusively in one sub-domain:
Item Type | Domains | Score Points / Forms and Functions
1 Selected Response | | (Grammatical and Illocutionary)
  Multiple Choice | Reading, Listening, Writing | 0 or 1
  Picture Click* | Reading, Listening | 0 or 1
2 Short Answer | |
  Cloze* or Word Builder* (SA1) | Writing | 0 or 1 (Grammatical - morphology and vocabulary)
  Descriptive Short Answer (SA2) | Writing, Speaking | Short answer, four points, scored on a scale of 0, 1, 2 with two criteria g/i (Grammatical and Illocutionary)
3 Extended Response (ER) | Writing, Speaking | Six points, scored on a scale of 0, 1, 2, 3 with two criteria g/i (Grammatical and Illocutionary)
4 Elicited Imitation | Speaking | 0 or 1 (Grammatical – syntax)
*Definitions - Picture Click – click on matching picture; Word Builder – fill in missing letters;
Cloze – fill in the blank.
Each form (A, B and C) contains a mixture of selected response, short answer, extended
response, and elicited imitation items (see 3.3.5 for a detailed description of each item
type). Open-ended item types such as short answer and extended response are kept to a
minimum to facilitate quick and inexpensive scoring. All reading and listening items are
selected response. Writing items are divided among multiple choice, short answer,
and extended response item types. Speaking items are a mixture of elicited imitation,
short answer, and extended response. Extended response items are given only to students
in grade band 4-5 and above who receive the intermediate or advanced tier.
Each item is written to address the following information:
Grade level: K-1, 2-3, 4-5, 6-8, 9-12
Sub-domain: reading, writing, listening, speaking
Assessment point: grammatical (vocabulary, morphology, syntax), illocutionary (ideational, manipulative)
Intended difficulty: beginning, early intermediate, intermediate, early advanced, advanced, proficient
Item type: selected response, short answer, extended response, elicited imitation
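The item coding listed above could be represented as a small validated record. The sketch below is illustrative only: the class name ItemSpec and its field names are hypothetical and do not describe the item bank actually used for ELPA. The allowed values are taken from the list above.

```python
from dataclasses import dataclass

GRADE_BANDS = ("K-1", "2-3", "4-5", "6-8", "9-12")
SUB_DOMAINS = ("reading", "writing", "listening", "speaking")
ASSESSMENT_POINTS = {
    "grammatical": ("vocabulary", "morphology", "syntax"),
    "illocutionary": ("ideational", "manipulative"),
}
DIFFICULTIES = ("beginning", "early intermediate", "intermediate",
                "early advanced", "advanced", "proficient")
ITEM_TYPES = ("selected response", "short answer", "extended response",
              "elicited imitation")


@dataclass(frozen=True)
class ItemSpec:
    """Hypothetical record for the information each item is written to address."""
    grade_band: str
    sub_domain: str
    competence: str            # "grammatical" or "illocutionary"
    assessment_point: str      # must belong to the chosen competence
    intended_difficulty: str
    item_type: str

    def __post_init__(self):
        checks = (
            (self.grade_band, GRADE_BANDS, "grade band"),
            (self.sub_domain, SUB_DOMAINS, "sub-domain"),
            (self.competence, tuple(ASSESSMENT_POINTS), "competence"),
            (self.intended_difficulty, DIFFICULTIES, "intended difficulty"),
            (self.item_type, ITEM_TYPES, "item type"),
        )
        for value, allowed, label in checks:
            if value not in allowed:
                raise ValueError(f"unknown {label}: {value!r}")
        if self.assessment_point not in ASSESSMENT_POINTS[self.competence]:
            raise ValueError(
                f"{self.assessment_point!r} is not a {self.competence} assessment point"
            )


example = ItemSpec("4-5", "reading", "grammatical", "vocabulary",
                   "intermediate", "selected response")
```

Validating at construction time means that a mis-coded item (for example, an ideational assessment point filed under grammatical competence) is rejected before it reaches review.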
3.3.3 – Distribution Across Grade Bands
Item types are also sometimes grade band-specific; the following table shows the
distribution of item types across grade bands.
Domain | Item Type | K-1 | 2-3 | 4-5 | 6-8 | 9-12
Reading | Multiple Choice | x | x | x | x | x
Reading | Picture Click | x | x | x | x | x
Listening | Multiple Choice | x | x | x | x | x
Listening | Picture Click | x | x | x | x | x
Writing | Multiple Choice | x | x | x | x | x
Writing | Word Builder | x | x | - | - | -
Writing | Cloze | - | - | x | x | x
Writing | SA2 | - | - | x | x | x
Writing | Extended Response | - | - | x | x | x
Speaking | SA2 | x | x | x | x | x
Speaking | Extended Response | - | - | x | x | x
Speaking | Elicited Imitation | x | x | x | x | x
3.3.4 – Order of Delivery
The test is administered such that the sub-domains and item types within the domains are
delivered in the following order:
(1) Reading (picture click, followed by multiple choice)
(2) Writing (multiple choice, followed by short answer, followed by extended response)
(3) Listening (picture click, followed by multiple choice)
(4) Speaking (short answer, followed by extended response, followed by elicited imitation)
Item types within each sub-domain (picture click, multiple choice) are delivered such
that, in general, the least complex are presented first.
3.3.5 – Item Type Explanations
Selected response is essentially multiple choice. In SA1 items, a student has to produce a
small unit of language, e.g., a word, to get credit. In SA2 items, a student has to produce
language at more or less the sentence level to get credit. Extended response items require
that the student produce language consisting of several sentences to convey a message. In
elicited imitation, a student has to repeat verbatim a sentence he or she has heard.
Selected response items have a predetermined correct answer and are scored right or
wrong.
Short Answer-1 (SA1) items may have several acceptable responses, which are listed in
a look-up table. The student gets credit for any suitable response.
Short Answer-2 (SA2) and extended response (ER) items are scored on item-specific
rubrics. Thus the criteria for full credit on one item may differ from the criteria on
another item according to the complexity of responses obtained or the unique language
features elicited by the item, which could not be foreseen when the item was written. The
actual psychometric value of responses to different items lies not in the assigned score
but in the overall ELPA scores of respondents who obtained a given item score.
A given rubric score should not be presumed to correspond to a given level of proficiency
absent information about the respondent’s overall score.
Unlike stand-alone performance assessment prompts, SA2 and ER prompts are short
tasks of variable difficulty. They will be scaled for difficulty so that the rated response
becomes part of the set of responses to all items that generates the student’s overall test
score. Consequently each item has its own scoring guide describing the specific
performance needed to earn each rating. Scoring guides may follow a common template,
but they contain item-specific information needed to inform the rating process. Rubrics
generally address both functional and grammatical elements, but do not require specific
language unless the directions call for this. Thus, the general prompt, “Tell about what is
in the picture,” will not necessarily evoke a specific tense or word ending, but will be
judged on overall content and grammatical form. Rubrics may take into account
communicative effectiveness (illocutionary competency), correctness of syntax and
appropriateness of vocabulary. Thus three different elements of eligible content may
influence the rubric and the score the student receives.
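The scoring rules described above can be sketched in a few lines. This is an illustration under stated assumptions, not the operational scoring engine: the function names are hypothetical, the case-insensitive matching in the SA1 look-up is an assumption, and summing the grammatical and illocutionary ratings is inferred from the point totals in the item type table (two criteria rated 0-2 for four points on SA2, two criteria rated 0-3 for six points on ER). The accepted-answer set reuses the chicken example given later in this section.

```python
# Look-up table of acceptable SA1 responses for one hypothetical item.
ACCEPTED_CHICKEN = {"chicken", "rooster", "hen", "bird"}


def score_sa1(response, accepted):
    """Return 1 if the response is in the item's look-up table, else 0."""
    # Assumption: matching ignores case and surrounding whitespace.
    return 1 if response.strip().lower() in accepted else 0


def score_rubric(grammatical, illocutionary, max_per_criterion):
    """Combine the two rubric ratings (g/i) into one item score.

    Assumption: the item score is the sum of the two ratings, which gives
    a maximum of 4 for SA2 (0-2 scale) and 6 for ER (0-3 scale).
    """
    for rating in (grammatical, illocutionary):
        if not 0 <= rating <= max_per_criterion:
            raise ValueError(f"rating {rating} outside 0..{max_per_criterion}")
    return grammatical + illocutionary


sa1_points = score_sa1("Hen", ACCEPTED_CHICKEN)            # in the look-up table
sa2_points = score_rubric(2, 1, max_per_criterion=2)       # SA2: ratings on 0-2
er_points = score_rubric(3, 3, max_per_criterion=3)        # ER: ratings on 0-3
```

Because rubrics are item-specific, the same numeric rating on two different items does not imply the same proficiency; the sketch only shows how ratings combine into score points.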
Title III of NCLB requires that English proficiency tests assess four domains: reading,
writing, speaking, and listening. The following table shows which item types are used to
assess each domain.
Item Type          | Reading | Writing | Speaking | Listening
Selected Response  | X       | X       |          | X
SA1                |         | X       | X        |
SA2                |         | X       | X        |
Extended Response  |         | X       | X        |
Elicited Imitation |         |         | X        |
In most cases, there is not an exact match between item type and the eligible content
being assessed. However, the following table shows the kind of eligible content that an
item type may potentially assess.
Eligible Content
Item Type          | Syntax | Morphology | Vocabulary | Ideational | Manipulative
Selected Response  | X      | X          | X          | X          | X
SA1                |        | X          | X          |            |
SA2                | X      | X          | X          | X          | X
Extended Response  | X      | X          | X          | X          | X
Elicited Imitation | X      |            |            |            |
For example:
Selected Response
A selected response item in listening or reading might require that a student distinguish
between what happened in the past v. the present using knowledge of verb tenses to get
an item right. Thus the assessment point would be tense, which is part of syntax.
A selected response item in reading might have a student see a picture of a desk and
choose which of four written words matches the picture, thus demonstrating the ability to
read the word “desk.” The assessment point would be vocabulary.
A selected response item in writing might require that a student recognize that “ate” v.
“eating,” “eat” or “eaten” describes what a student did the day before. The assessment
point would be the morphological inflection for the past tense of “eat.”
A selected response item in reading might require that a student use vocabulary and
syntax to understand that a conversation occurred yesterday in a library. The assessment
point would be the ideational competency. The response might also hinge upon a
student’s understanding of certain words; thus the assessment point would be vocabulary.
A selected response listening item might require a student to understand the last thing that
needs to be done in a short series of steps in a science experiment. The assessment point
would be the manipulative competency, specifically, understanding of following
directions.
Short Answer
A short answer-1 item in writing or speaking might require a student to look at a picture
of a chicken and respond to the prompt, “What is this?” The student might write chicken,
rooster, hen, or even bird and receive credit.
A short answer-2 item in writing or speaking might require a student to see a picture of
students playing baseball and respond to the prompt, “What’s happening in the picture?”
Full credit might be given for such responses as, The students are playing a game, The
kids are playing baseball, They’re playing a sport, etc. Partial credit might be given for
They’re playing, Playing a game, Play a game, Baseball, etc. Thus full credit might be
given for clearly communicating (ideational function), correct grammar and appropriate
vocabulary and partial credit for appropriate syntax and vocabulary but failure to
communicate clearly with the ideational function.
Extended response items are designed to elicit more writing or speaking than short
answer-2 items. For example, a student might be asked to speak or write in response to a
prompt such as, “What are your hardest and your easiest classes? Describe what makes
one hard and one easy.” As in the SA2 items, full credit might depend on communicative
effectiveness, correctness and complexity of grammar, and clear use of vocabulary to
convey ideas, and partial credit might be assigned where syntax is flawed or a student
does not convey the complete ideas sought by the prompt.
Elicited imitation tasks are part of speaking. The student hears a sentence and is asked to
repeat it exactly as he or she heard it. For example, the student might hear: Mrs. Jones
teaches biology and chemistry but not physics. The student might get credit for saying,
“Mrs. Jones teaches chemistry and biology but not physics.” The order of the two
subjects was changed, but all the sentence elements were there, and the meaning did not
change. However, the student would not get credit for, “Mrs. Jones teaches biology and
chemistry. She doesn’t teach physics.” That response alters the syntax of the sentence and
converts one sentence into two somewhat simpler sentences. Elicited imitation response
items represent a range of syntactic complexity from simple sentences to complex
sentences with embedded clauses. The more syntactically complex sentences students can
repeat, the more proficient they are in English.
3.4 - Test Specifications
3.4.1 – Relation to Validity
Test specifications help ensure validity because they provide a clear link between the test
content and the content standards and their corresponding performance levels. One
particularly powerful source of support for intended interpretations of test scores is
documentation that each test item aligns to the knowledge or skill required to achieve the
content standards. Items are developed to measure these academic standards, per the
content specifications. The Joint Standards (AERA, 1999), pages 11–12 in particular,
underscore the importance of this type of content evidence of validity.
Test specifications also define how the content standards are to be assessed (e.g., multiple
choice, state performance assessment, local work sample), provide further specificity to
the skills and knowledge expected of students, and convey to teachers what they can
expect on state assessments.
3.4.2 – Alignment History
The earliest draft of the ELP standards was based on the state’s English Language Arts
standards to comply with an NCLB requirement that the two be linked. In February 2004,
the Content and Assessment Panel reviewed that draft to identify which standards were a)
relevant to English proficiency as opposed to language arts and b) assessable. The
resultant document was condensed into consolidated standards because a great deal of
redundancy occurred among standards and between standards for the grade levels
grouped for the ELPA grade bands. That document was used to guide the first ELPA item
writing session in July 2004, and that document has maintained the Halliday coding
system.
EII and the ELPA team agreed that English proficiency is a separate construct from
English language arts, and in fact, the above-described language competency framework
was included in the standards document approved by the State Board of Education in
June 2004 in order to draw attention to that fact.
Subsequently, EII adopted the Bachman framework, which consists of the same major
elements but uses somewhat different terms than the Halliday framework. In order to
ensure consistency and clarity of communication, the ELPA project adopted the terms of
the Bachman framework. Therefore, the Bachman construct of language competence is
considered to comprise the essence of the English proficiency standards. For purposes of
test construction, the ELPA team determined that the eligible content for assessment from
the English Proficiency Standards would consist of these five components of the
Bachman framework (See p. 13 under Forms and Functions). The alignment of these
elements of the English proficiency construct is documented in the list of consolidated
standards. Appendix E further describes the components of the eligible content.
3.4.3 – Ensuring Item Alignment with the Construct and Standards
In May 2005, the ELPA team and EII agreed on an approach to coding items’ alignment
to the ELP standards based on the competency framework and the above-listed eligible
content. See Appendix F for Content/Assessment Panel Review Sheets (with
competency code).
When the Content and Assessment Panel met that month for item review, they used the
new approach rather than the language arts CCG system. Under the new approach, items
were coded to indicate which competency (syntax, vocabulary, morphology,
manipulative, ideational) was demonstrated by a student's correct response to an item (the
assessment point). In other words, what aspect of language does a student have to
command to receive credit for the item? All items, whether grammatical or illocutionary,
were also coded for the “functional context” as further evidence of standards alignment.
The ELP standards document lists 23 specific functions, and items are coded according to
that list.
Section 4.0 – Concurrent Validity
4.1 – Explanation
A basic concept of validity is that persons who score high on a test should score high on
other measures of the same construct. To the extent that two measures address the same
latent construct, scores for the same individuals should agree. Conversely, a lack of
relationship with theoretically unrelated measures helps substantiate the meaning of the
test score. The extent to which related measures correlate with the test scores, and thereby
support or contradict state assessment scores, helps validate the measure of academic
achievement for the intended purposes.
4.2 – Description of Consistency
The department provides a description of the consistency in English Language
Proficiency designations between the state’s English Language Proficiency Assessment
(ELPA) and the Idea Proficiency Test (IPT), Language Assessment Scale (LAS), and
Woodcock-Muñoz Language Survey to help teachers understand and use the ELPA.
While this analysis should not be considered an equating or comparability study, it can
provide additional context for the ELPA by referring to tests with which teachers are
more familiar.
The ELPA data used in this analysis were collected in 2005-06. While the intent was to
collect the ELPA data via a random sample (i.e., students with even SSIDs), because of the
complex nature of the assessment and the circumstances, the sample is unlikely to be
completely random (i.e., some districts tested additional students and some students were
not assessed). In addition to the required ELPA testing, some districts chose to submit
commercial test data (e.g., IPT, LAS, Woodcock-Muñoz, and Stanford Proficiency Test)
for some students. The consistency analysis is based on a subset of 2005-06 Oregon
LEP students: those who obtained a valid score on the ELPA and for
whom districts chose to submit an additional commercial test score.
In addition to the obvious consideration that the ELPA is a computer based test while the
commercial tests are based on paper and structured interviews, there are several caveats
that should be considered when examining these data. First, given the methodology
described above, this sample is unlikely to be random. Second, these commercial tests
may not have been administered at the same time as the ELPA. Finally and most
importantly, the commercial tests:
• Do not assess all of the required domains of reading, speaking, listening and writing.
• Are not based on Oregon eligible content.
• Use a different set of proficiency standards.
For these reasons, we would expect differences between the identification of proficient
students based on the ELPA versus the other commercial tests.
Comparison of ELPA to Woodcock-Muñoz, IPT and LAS

Grade | Not Proficient on the Woodcock-Muñoz        | Proficient on the Woodcock-Muñoz            | Consistency
Band  | Not Proficient on ELPA | Proficient on ELPA | Not Proficient on ELPA | Proficient on ELPA |
      | N        | %           | N       | %        | N        | %           | N       | %        | %
K-1   | 2248     | 92.8        | 55      | 2.3      | 88       | 3.6         | 31      | 1.3      | 94.1
2-3   | 2229     | 92.7        | 132     | 5.5      | 22       | 0.9         | 21      | 0.9      | 93.6
4-5   | 1837     | 88.1        | 233     | 11.2     | 4        | 0.2         | 10      | 0.5      | 88.6
6-8   | 1925     | 81.1        | 425     | 17.9     | 7        | 0.3         | 18      | 0.8      | 81.9
9-12  | 1650     | 85.4        | 266     | 13.8     | 8        | 0.4         | 9       | 0.5      | 85.9

Grade | Not Proficient on the IPT                   | Proficient on the IPT                       | Consistency
Band  | Not Proficient on ELPA | Proficient on ELPA | Not Proficient on ELPA | Proficient on ELPA |
      | N        | %           | N       | %        | N        | %           | N       | %        | %
K-1   | 689      | 89.5        | 67      | 8.7      | 10       | 1.3         | 4       | 0.5      | 90.0
2-3   | 592      | 85.7        | 53      | 7.7      | 28       | 4.1         | 18      | 2.6      | 88.3
4-5   | 523      | 83.8        | 53      | 8.5      | 36       | 5.8         | 12      | 1.9      | 85.7
6-8   | 918      | 80.9        | 100     | 8.8      | 74       | 6.5         | 43      | 3.8      | 84.7
9-12  | 664      | 87.7        | 18      | 2.4      | 55       | 7.3         | 20      | 2.6      | 90.3

Grade | Not Proficient on the LAS                   | Proficient on the LAS                       | Consistency
Band  | Not Proficient on ELPA | Proficient on ELPA | Not Proficient on ELPA | Proficient on ELPA |
      | N        | %           | N       | %        | N        | %           | N       | %        | %
K-1   | 489      | 90.6        | 42      | 7.8      | 5        | 0.9         | 4       | 0.7      | 91.3
2-3   | 427      | 80.6        | 70      | 13.2     | 11       | 2.1         | 22      | 4.2      | 84.8
4-5   | 494      | 67.0        | 123     | 16.7     | 59       | 8.0         | 61      | 8.3      | 75.3
6-8   | 428      | 63.2        | 156     | 23.0     | 41       | 6.1         | 52      | 7.7      | 70.9
9-12  | 405      | 78.5        | 34      | 6.6      | 51       | 9.9         | 26      | 5.0      | 83.5
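The Consistency column appears to be the percentage of students classified the same way by both tests, that is, the two agreeing cells (not proficient on both, proficient on both) divided by all students in the row. The sketch below recomputes it from the counts in the tables above. The function and argument names are hypothetical.

```python
def consistency_pct(both_not_prof, elpa_only_prof, commercial_only_prof, both_prof):
    """Percent of students given the same designation by ELPA and the other test.

    Arguments are the four cell counts (N) of one table row:
      both_not_prof        - not proficient on both tests
      elpa_only_prof       - proficient on ELPA, not proficient on the commercial test
      commercial_only_prof - proficient on the commercial test, not proficient on ELPA
      both_prof            - proficient on both tests
    """
    total = both_not_prof + elpa_only_prof + commercial_only_prof + both_prof
    if total == 0:
        raise ValueError("row has no students")
    return round(100.0 * (both_not_prof + both_prof) / total, 1)


# Counts copied from the tables above.
woodcock_k1 = consistency_pct(2248, 55, 88, 31)
las_6_8 = consistency_pct(428, 156, 41, 52)
print(woodcock_k1, las_6_8)
```

For the two rows shown, the recomputed values match the published Consistency figures (94.1 for Woodcock-Muñoz at K-1 and 70.9 for LAS at grades 6-8).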
Section 5.0 – Reliability
5.1 - Standard Error of Measurement
Reliability refers to the consistency, stability, and accuracy expected from test scores.
Reliability is best reported through the Standard Error of Measurement (SEM)
because the SEM is expressed on the same scale as the student scores. The SEM
curve shows the precision of the measure at various
points along the score distribution (see Appendix G). When Item Response Theory is
employed, the standard error changes, conditioned on the relative position of the score in
the ability or person score distribution.
When interpreting the graphs, the x axis describes various levels of person
performance, while the y axis provides the standard error at each ability
level. Typically, scores occurring in the middle of the distribution have smaller standard
errors since there is more item information targeting students around the middle ranges of
performance. For this reason, the graph is often observed as a U-shaped curve.
The SEM of the ELPA tests compares favorably with the SEM of the state test when it
ranges from 2 to 4 RITs in magnitude.
5.2 - Item Analysis Methods for the ELPA
5.2.1 – Purpose of Item Analysis
A key factor in the life of an item (see p. 14) occurs when an item is proven reliable
enough to be promoted from a field test item to an operational item. What indicators of
item quality determine whether this occurs? The item analysis described below focuses
on total score correlation, distracters, level of difficulty, and item p-value (item mean
divided by the maximum number of points). It was performed for the ELPA in 2008 (see the
CART Technical Report, http://www.ode.state.or.us/search/page/?id=1561).
An essential purpose of an Item Analysis (IA) is to flag items with questionable statistics.
CART conducts all item analyses in two stages. Following data reconciliation, the first
stage involves a thorough IA and key validation of only the operational (scored) ELPA
items. The second stage adds in the pretest items. In 2008, 496 operational items and
218 pretest items were analyzed.
CITAN flags items for any of five reasons:
A = the item has a negative item-total score correlation, indicating a possible miskey;
B = an SR item has one or more incorrect distracters with a positive point-biserial correlation;
C = the item is very difficult, with an item p-value (item mean divided by the maximum number of points) less than 0.3; and
D = the item is very easy, with an item p-value (item mean divided by the maximum number of points) greater than 0.95.
The table below provides a cross-tabulation of the item flags for the 496 operational items.
Item  | Flags
Types | A,B,C | B  | B,C | C  | D | None | Total
CZ    | 0     | 0  | 0   | 2  | 0 | 39   | 41
EI    | 0     | 0  | 0   | 2  | 0 | 16   | 18
ER    | 0     | 0  | 0   | 2  | 0 | 22   | 24
MC    | 1     | 13 | 3   | 12 | 5 | 271  | 305
PC    | 0     | 2  | 1   | 0  | 0 | 56   | 59
S2    | 0     | 0  | 0   | 0  | 1 | 11   | 12
WB    | 0     | 0  | 0   | 0  | 0 | 0    | 0
Total | 1     | 15 | 4   | 23 | 6 | 447  | 496
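The flagging rules described above can be sketched as follows. This is not the CITAN implementation: the function names and data layout are hypothetical, and whether the item's own points are removed from the total score before correlating is not stated in this section, so the sketch uses the total as supplied. The thresholds (p-value below 0.3 or above 0.95) are taken from the text.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def flag_item(item_scores, total_scores, max_points=1, distractor_choices=None):
    """Return the list of flags (A, B, C, D) raised for one item.

    item_scores        - points each examinee earned on the item
    total_scores       - each examinee's total test score
    distractor_choices - for selected response items, a dict mapping each
                         incorrect option to a 0/1 list (1 = examinee chose it)
    """
    flags = []
    # A: negative item-total correlation (possible miskey).
    if pearson(item_scores, total_scores) < 0:
        flags.append("A")
    # B: an incorrect option correlates positively with total score.
    if distractor_choices:
        for chosen in distractor_choices.values():
            if pearson(chosen, total_scores) > 0:
                flags.append("B")
                break
    # p-value: item mean divided by the maximum number of points.
    p_value = (sum(item_scores) / len(item_scores)) / max_points
    if p_value < 0.3:
        flags.append("C")   # very difficult
    if p_value > 0.95:
        flags.append("D")   # very easy
    return flags


hard_item = flag_item([0, 0, 0, 0, 1], [10, 12, 15, 18, 30])
miskeyed_item = flag_item([1, 1, 0, 0], [5, 6, 20, 22],
                          distractor_choices={"b": [0, 0, 1, 1]})
```

In this toy data the first item is answered correctly only by the top scorer (flag C), while the second is answered correctly only by low scorers and has a distractor favored by high scorers (flags A and B), the pattern expected from a miskeyed item.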
5.2.2 – Summary of Item Analysis Results
Despite some flagged items in Figure 2, nothing in the operational IA suggested serious
or widespread problems that might jeopardize the subsequent Item Response Theory
(IRT) calibrations. Furthermore, Avant Assessment staff re-verified all operational item
answer keys. As a result, the calibration and linking steps were completed.
The calibration worked out well (the magnitude of misfit was very minor). Based on the
success of the BVS-anchored calibration of the 496 operational items, it was decided to
use those parameter estimates in a final, anchored, joint calibration of the 496 items and
the 218 pretest items (see http://www.ode.state.or.us/search/page/?id=1561 – CART Technical
Report for details on the 2008 Item Analysis and Calibration). See Appendix H for the
Spring 2007 ELPA Operational Item Analysis Summary.
5.3 - Strand Reliability
5.3.1 – Reliability Through Number of Items
Strand reliability is ensured if there are enough items per strand. ELPA has adequate
items in each domain. There are at least 8 items for each domain. Domains in the locator
block have more items (see Appendix I for item distribution by domain, form and
function and grade band).
5.3.2 – Reliability Through Standard Setting and Precision at the Cut Scores
Precision at the cut scores is necessary so that students on the borderline can be correctly
classified. The process for re-establishing the achievement standards on the statewide
assessments in reading, mathematics, science and for the English Language Proficiency
Assessments (ELPA) consists of three key phases:
Phase One - Establish a broadly representative panel for each grade and subject
area;
Phase Two - Determine "cut scores" through established process;
Phase Three – Conduct field review and public input.
On November 5-6, 2007, staff members from the Oregon Department of Education
(ODE) and CTB/McGraw-Hill worked in collaboration to perform standard setting on the
English Language Proficiency Assessments (see CTB Standard Setting Report -
AchievementScores at link from www.ode.state.or.us/go ELPA). Educators from across
the state of Oregon with specialization in English-language development convened to
study the ELPA, consider the English language skills required of students in each
proficiency level, and discuss these expectations with their colleagues.
The purpose of the standard setting was to recommend cut scores on the ELPA to divide
students into five proficiency levels: Beginning, Early Intermediate, Intermediate, Early
Advanced and Advanced. The Bookmark Standard Setting Procedure (BSSP) was used to
set the proficiency standards for the ELPA. Participants recommended a well-articulated
set of proficiency standards at six grades: Kindergarten and Grades 1, 2, 5, 7, and 11.
Proficiency standards for the remaining grades were statistically interpolated based on
participants’ recommendations. The ODE divided participants into five grade groups,
each with approximately 3 participants. Participants were divided into assigned grade
groups that were balanced in terms of relevant demographic characteristics (e.g., gender,
geographic location). The standard setting consisted of training, orientation, three rounds
of judgments, an articulation discussion, and proficiency level description writing.
Following the standard setting, ODE made adjustments to the recommended cut
scores. These adjustments took into account the impact of the cut scores on
students, so that a more appropriate distribution of students by proficiency level
could be achieved based on 2006-07 performance data.
On Thursday, March 13, 2008, the State Board of Education voted to adopt changes to
the Performance Standards for the English Language Proficiency Assessment.
Achievement Standards (Cut Scores) for the English Language Proficiency Standards
Adopted March 13, 2008

Grade Level   Early Intermediate   Intermediate   Early Advanced   Advanced (Proficient)
K             482                  492            498              507
1             492                  507            514              523
2             495                  508            514              523
3             501                  514            521              529
4             497                  508            514              521
5             497                  508            516              523
6             497                  506            515              522
7             497                  507            517              524
8             499                  508            518              526
9             491                  501            515              526
10            493                  501            516              527
11            494                  501            515              528
12            498                  504            516              530
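The cut scores above can be applied mechanically to a scale score. The following Python sketch is illustrative only. It assumes that each cut score is the lowest scale score in its level (so a Kindergarten score of 482 would fall in Early Intermediate) and that any score below the first cut is Beginning; the report does not state this boundary rule explicitly. The names CUT_SCORES, LEVELS, and proficiency_level are hypothetical and are not part of any ODE system.

```python
from bisect import bisect_right

# Cut scores copied from the table above. Each tuple holds the cut scores for
# Early Intermediate, Intermediate, Early Advanced, and Advanced (Proficient).
CUT_SCORES = {
    "K": (482, 492, 498, 507),
    "1": (492, 507, 514, 523),
    "2": (495, 508, 514, 523),
    "3": (501, 514, 521, 529),
    "4": (497, 508, 514, 521),
    "5": (497, 508, 516, 523),
    "6": (497, 506, 515, 522),
    "7": (497, 507, 517, 524),
    "8": (499, 508, 518, 526),
    "9": (491, 501, 515, 526),
    "10": (493, 501, 516, 527),
    "11": (494, 501, 515, 528),
    "12": (498, 504, 516, 530),
}

LEVELS = (
    "Beginning",
    "Early Intermediate",
    "Intermediate",
    "Early Advanced",
    "Advanced",
)


def proficiency_level(grade, scale_score):
    """Return the proficiency level for a scale score in a given grade.

    Assumption: a score equal to a cut score belongs to the higher level.
    """
    cuts = CUT_SCORES[str(grade)]
    # bisect_right counts how many cut scores are less than or equal to the score.
    return LEVELS[bisect_right(cuts, scale_score)]
```

For example, under the stated assumption, proficiency_level("K", 481) gives Beginning and proficiency_level("K", 482) gives Early Intermediate. Because classification changes at a single scale point, measurement precision near each cut score matters most for borderline students.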
Section 6.0 - Fairness and Accessibility
Fairness concerns occur throughout testing. Standardization itself is intended to ensure
that no examinees are given advantages or impediments through administration practices.
Nevertheless fairness issues arise simply because uniform conditions trigger different
levels of comfort in examinees. Although absolute fairness cannot be guaranteed, sources
of bias should be investigated and controlled to the extent practicable. There are several
components of fairness and accessibility.
6.1 – Test Administration
All test items, test materials, and student-level testing information are secure documents
and must be appropriately handled. Secure handling must protect the integrity, validity,
and confidentiality of assessment questions, prompts, and student results. Any deviation
in test administration must be reported to ensure the validity of the assessment results.
Mishandling of test administration puts student information at risk and disadvantages the
student as tests that are improperly administered may be invalidated. Failure to honor
security severely jeopardizes district and state accountability requirements and the
accuracy of student data.
6.1.1 - Testing Requirements to Produce Valid Test Results
Requirements for ethical testing that results in valid test results are mandated to ensure
that each Oregon student has a fair opportunity to demonstrate his/her abilities and that
school districts are fairly rated for state accountability. Requirements include but are
not limited to:
All Oregon Statewide Assessments must be administered by a trained Test
Administrator (TA).
TAs must receive annual training from the District Test Coordinator (DTC) or
School Test Coordinator (STC) on the test administration policies and procedures
included in this Test Administration Manual. Specifically, TAs must receive
training on the components of the Oregon assessment system, requirements for
valid test administration, testing options, and requirements for both standard
administration and modified administration.
All TAs must read and understand Parts I – VIII and Appendices A, D, E, Q, R,
and T of the Test Administration Manual, as well as all appendices pertaining to
those specific assessments which the TA will be administering.
Each TA must receive security training and have a signed Test Administrator
Assurance of Test Security form valid for the current school year, prior to
administering any assessments. TAs must renew this form annually upon
completion of the security training.
STCs and DTCs must receive security training and have a signed School Test
Coordinator or District Test Coordinator Assurance of Test Security form on
file at the District Office, valid for the current school year. STCs and DTCs must
renew this form annually upon completion of the security training.
Any person (office staff, volunteers, computer lab support staff, substitutes, etc.)
who has access to or participates in the handling of test materials but who does
NOT administer the test must sign a Non-Administrator Assurance of Test
Security form. This signed form must be kept on file at the District Office, valid
for the current school year.
All test administrators, including paraprofessionals, are trained in how to administer the
ELPA. In addition to properly configuring computer systems to run the ELPA
application, school staff ensure that students have the skills necessary to interact with the
application (Table 1, p. 14 describes the skills students will need in different grade bands
to receive a valid score on the ELPA). Websites and computer programs offering
opportunities for students to practice or to demonstrate these skills are included among
the training links described below.
Training materials are available from the ELPA home page (www.oregonelp.net). These
training materials include a document illustrating the different types of items used
throughout the ELPA (Item Guide), training regarding technologies and content of the
ELPA (Training Guide), and several videos describing the technology of ELPA and
information around access to ELPA (Training Videos).
6.1.2 - Security of the Test Environment
The test environment refers to all aspects of the testing situation while students are
testing. The test environment includes what a student can see, hear, or access. During
online testing, the test environment also includes the electronic resources to which the
student has access.
Requirements of a secure test environment include but are not limited to:
A quiet environment, void of talking or other distractions that might interfere with
a student’s ability to concentrate or compromise the testing situation. Read aloud
accommodations for one student must not interfere with other students’ test-taking
environment.
Visual barriers or adequate spacing between students’ seating.
Student access to and use of only allowable resources.
Observation of any assessment items by only the student taking an assessment
and, to a limited extent, the trained TA.
No electronic devices that allow communication among students or the
photographing of test content.
Administration of online testing only through the Secure Browser. Test
administrators double check the student name and school identification carefully
to avoid errors.
Students are instructed to log in and work independently, not offering help to
other students.
Directions are the only portion of ELPA that may be translated.
6.1.3 - Testing Improprieties
Adult and student-initiated test improprieties are behaviors prohibited during test
administration because they can give students an unfair advantage or otherwise
compromise the State’s standard test administration. Adults (TAs) may not assist or
interfere with student testing. Adults must carefully adhere to all test administration
procedures to avoid test improprieties (see p. 12 of the Test Administration Manual for a
list). A list of student-initiated test improprieties that have been reported to ODE in previous
school years is provided in a table on p. 13 of the Test Administration Manual. It is not
intended to be exhaustive.
6.1.4 - Responding to Student Questions
Helping students violates the integrity and validity of the test. If a student asks for help,
remind the student to “do your best,” but do not initiate assistance or give any indication
that you can help. Use caution: check your verbal and nonverbal cues to ensure that the
student does not receive any inappropriate coaching that may affect the student’s response
to a test item.
6.1.5 – Testing Irregularities
Testing irregularities are unusual circumstances that impact a group of students who are
testing and may potentially affect student performance on the test or interpretation of
the resulting scores. Examples of testing irregularities include major disruptions to a test, such as
a fire drill, a school-wide power outage, or a force majeure (e.g., a natural disaster).
During an event such as a fire drill or other evacuation, safety is the top priority. If the
TA can safely access the TA workstation before evacuating the testing environment, then
the TA should pause all tests before evacuating. If the TA cannot safely access the TA
workstation, then the TA should evacuate and secure the testing environment consistent
with the school’s evacuation policy. Upon returning to the testing environment, the TA
should pause all tests while students return to their stations. Testing irregularities also
include the administration of Test Accommodations to a group of students or to an entire
class without an investigation of individual student need. As with testing improprieties,
all testing irregularities should be reported immediately to your DTC. The DTC will then
report the irregularity to ODE within one business day.
6.2 – Sensitivity Panel Review
Fairness and accessibility are also addressed by the sensitivity panel, which ensures that
items
present racial, ethnic, and cultural groups in a positive light.
do not contain controversial, offensive, or potentially upsetting content.
avoid content familiar only to specific groups of students because of race or
ethnicity, class, or geographic location.
aid in the elimination of stereotypes.
avoid words or phrases that have multiple meanings.
6.3 – Differential Item Analysis
Differential Item Analyses were conducted using the WINSTEPS IRT software. Two
analyses were completed. The first involved a simple standardized difference in Rasch
model difficulty parameters calculated using the Reference Group (males) and Focal
Group (females).
Several problem areas require further substantive review. Among the First Grade Reading
PC items, nine of the 22 items were found to be significantly different, seven
favoring females and two favoring males. Eight of the 25 Fourth Grade Listening MC items
and nine of the 31 Ninth Grade Listening items were found to be statistically significant,
all favoring males. Other notable problem areas were Fourth Grade Reading MC items
(seven out of 29 were significant) and Sixth Grade Reading MC items (ten out of 33 were
significant).
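A standardized difference of this kind is commonly computed as the difference between the two group-specific Rasch difficulty estimates divided by their joint standard error. The Python sketch below assumes that form, a two-sided critical value of 1.96, and the convention that a positive difference (the item is harder for the focal group) favors the reference group. The report does not state which formula, threshold, or sign convention was used, and the function names are hypothetical.

```python
import math


def standardized_difference(d_ref, se_ref, d_focal, se_focal):
    """Difference in item difficulty (logits) divided by its joint standard error.

    d_ref, d_focal: difficulty estimates from separate calibrations of the
    reference group (males) and the focal group (females).
    se_ref, se_focal: standard errors of those estimates.
    """
    return (d_focal - d_ref) / math.sqrt(se_ref ** 2 + se_focal ** 2)


def flag_item(t, critical=1.96):
    """Label an item from its standardized difference (assumed threshold)."""
    if abs(t) < critical:
        return "not significant"
    # Assumed convention: harder for the focal group means the item
    # favors the reference group.
    return "favors reference group" if t > 0 else "favors focal group"
```

With illustrative (not reported) values, an item calibrated at 0.0 logits for males and 0.5 logits for females, each with a standard error of 0.1, gives a standardized difference of about 3.54 and would be flagged as favoring the reference group.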
The second analysis, also conducted via WINSTEPS calibration, used a statistical
equivalent of the Mantel-Haenszel DIF statistic (Holland & Thayer, 1986) called
MH prox (MHp). Linacre and Wright (1989) converted the MHp into ETS DIF categories that
can be used to design and maintain tests equivalent for groups of subjects on which the
original test data are calibrated. The first category, the A-type items, displays negligible
DIF and can be used freely. The second category, the B-type items, displays slight to
moderate DIF; if possible, these items should be replaced by equivalent items with smaller
MHp absolute values. The third category, the C-type items, displays a moderate to large amount
of DIF and should be selected only if essential to meet the test specifications. See the
CART Technical Report (Appendix 1) at http://www.ode.state.or.us/search/page/?id=1561
for the number of A, B, and C items for each item type by grade level. In all, there
were 451 A-type items, 34 B-type items, and only 11 C-type items.
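The A/B/C labeling can be sketched as a simple size rule. The Python example below is a simplification under stated assumptions: it classifies on the absolute DIF size alone, using thresholds of 0.43 and 0.64 logits, which correspond to the conventional ETS boundaries of 1.0 and 1.5 delta units divided by 2.35. The full ETS rule also applies significance tests, and this report does not state the thresholds that were used, so the function name and defaults are hypothetical.

```python
def ets_category(dif_logits, slight=0.43, moderate=0.64):
    """Classify an item by absolute DIF size in logits (size rule only).

    A: negligible DIF; B: slight to moderate DIF; C: moderate to large DIF.
    The default thresholds are assumptions, not values taken from the report.
    """
    size = abs(dif_logits)
    if size >= moderate:
        return "C"
    if size >= slight:
        return "B"
    return "A"


# Counts reported in the text above.
reported_counts = {"A": 451, "B": 34, "C": 11}
total_items = sum(reported_counts.values())
share_a = reported_counts["A"] / total_items
```

The reported counts sum to 496 items, of which roughly 91 percent are A-type.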
APPENDIX
APPENDIX A
ESL PROGRAM FUNDING AND EVALUATION – STATE LAW
Oregon Administrative Rule #581-023-0100
Eligibility Criteria for Student Weighting for Purposes of State School Fund Distribution
(1) The following definitions apply to this rule:
(a) "Average Daily Membership" or "ADM" means the membership defined in ORS
327.006(3) and OAR 581-023-0006;
(b) "Days in Session" means number of days of instruction during which students are
under the guidance and direction of teachers;
(c) "Department" means the Oregon Department of Education;
(d) "Language Minority Student" means:
(A) Individuals whose native language is not English; or
(B) Individuals who come from environments where a language other than English is
dominant; or
(C) Individuals who are Native Americans or Native Alaskans and who come from
environments where a language other than English has had a significant impact on their
level of English proficiency.
(e) "Superintendent" means the State Superintendent of Public Instruction;
(f) "Weighted Average Daily Membership" or "ADMw" means the ADM plus an
additional amount or weight as described in ORS 327.013, subject to the limitations
imposed by Section (4)(a), Chapter 780, Oregon Laws 1991.
(2) Pursuant to ORS 327.013(7)(a)(A) the resident school districts shall receive one
additional ADM or "weight" for children with disabilities who comprise up to 11 percent
of the district's ADM. The Department will calculate the percentage of children with
disabilities on the basis of resident counts of students eligible for weighting from the
Special Education Child Count and the resident ADM:
(a) To be eligible, a student must be in the ADM of the school district and meet the
following criteria:
(A) The student must be eligible for special education having been evaluated as having
one of the following conditions: Mental retardation, hearing impairment including
difficulty in hearing and deafness, speech or language impairment, visual impairment,
serious emotional disturbance, orthopedic or other health impairment, autism, traumatic
brain injury or specific learning disabilities; and
(B) The student must be between the ages 5 and 21 and generate federal funding for
purposes of special education.
(b) Districts may apply for an exception to the 11 percent ceiling. Applications are to be
made on forms provided by the Department. Upon receipt of the application the
Superintendent may conduct a complete review of a district's special education records.
The Superintendent shall develop a process for conducting such reviews which will
include the following elements:
(A) Comparison of district claims with those submitted by other districts;
(B) Participation of school district and education service district staff in the review. No
district staff shall be asked to review claims submitted by the employing district.
(c) After considering the recommendations of the review committee the Superintendent
may allow all or a portion of the requested added weighted ADM over 11 percent;
(d) The Superintendent shall make the determination of approval for funding above the
11 percent limitation. Such determination may be appealed for review by the State Board
of Education according to a process established by the Superintendent;
(e) If the review indicates that a district has claimed ineligible special education students,
the Superintendent also shall withhold the related federal funds from the district, pursuant
to OAR 581-015-0049;
(f) A district must submit an application for an exception to the 11 percent ceiling no later
than six months after the close of the year for which payment is being sought. Payments
for allowable exceptions shall be made in the following school year as part of the May 15
payment.
(3) Pursuant to ORS 336.640(4), the resident school districts shall receive an additional
1.0 times the ADM of all eligible pregnant and parenting students:
(a) To be eligible, a student must be in the ADM of the resident school district and meet
the following criteria:
(A) The student must be identified through systematic procedures established by the
district;
(B) The student must be enrolled and receiving services described in ORS 336.640(1)(b)
and (d);
(C) The student must have an individualized written plan for such services which
identifies the specific services, their providers, and funding resources.
(b) Students counted in section (2) of this rule are not eligible under this section.
(4) Pursuant to ORS 327.013(7)(a)(B), the resident school districts shall receive an
additional .5 times the ADM of all eligible students enrolled in an English as a Second
Language program. To be eligible, a student must be in the ADM of the school district in
grades K through 12 and be a language minority student attending English as a Second
Language (ESL) classes in a program which meets basic U.S. Department of Education,
Office of Civil Rights guidelines. These guidelines provide for:
(a) A systematic procedure for identifying students who may need ESL classes, and for
assessing their language acquisition and academic needs;
(b) A planned program for ESL and academic development, using instructional
methodologies recognized as effective with language minority students;
(c) Instruction by credentialed staff and trained in instructional strategies that are
effective with second language learners and language minority students, or by tutors
supervised by credentialed staff trained in instructional strategies that are effective with
second language learners and language minority students;
(d) Adequate equipment and instructional materials;
(e) Evaluation of program effectiveness in preparing ESL students for academic success
in the mainstream curriculum.
(5) Students served in the following programs are not eligible for weighting:
(a) Programs funded fully by state funds, programs funded fully by federal funds, and
programs funded fully by a combination of state and federal funds;
(b) Private and parochial schools unless placed by the resident district in a registered
private alternative program or state approved special education program;
(c) Instruction by a private tutor or parent under ORS 339.035.
(6) No later than January 15 of each year, the designated official for a school district shall
submit to the Department a report of students eligible under sections (3) and (4) of this
rule. The report shall include the following data for the period October 1 through
December 31:
(a) Total days in session for the quarter ending December 31 for the school or program
reporting;
(b) Total days membership for the quarter ending December 31 for all students served in
eligible programs.
(7) Not later than July 10 of each year, the designated official for a school district shall
submit to the Department a final report of students eligible under sections (3) and (4) of
this rule. The report shall include the following:
(a) Total days in session during the regular school year for the school or program
reporting;
(b) Name of each student;
(c) Total days membership beginning with the first day of instruction for each student and
ending with the date of withdrawal from the eligible program or the end of the regular
school year, whichever comes first;
(d) Grade level of the student.
(8) School districts must retain supporting documentation for a minimum of two years.
(9) The Department shall perform periodic reviews of the eligibility of students reported
for additional weighting. Any funds provided for ineligible students shall be recovered by
the Department for redistribution to school districts.
(10) This rule is effective beginning with the 1993-94 school year.
Stat. Auth.: ORS 327.013 & ORS 327.125
Stats. Implemented: ORS 327.013 & ORS 327.125
Hist.: EB 31-1992, f. & cert. ef. 10-14-92; EB 6-1994, f. & cert. ef. 4-29-94
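The arithmetic in sections (2) through (4) of the rule above can be sketched as follows. This Python example is illustrative only and is not part of the rule: it covers only the three additional weights named in this rule (children with disabilities capped at 11 percent of ADM, pregnant and parenting students at 1.0, and ESL students at 0.5), ignores approved exceptions to the 11 percent ceiling, and ignores any other weights or limitations in ORS 327.013. The function and parameter names are hypothetical.

```python
def weighted_adm(adm, disability_count, pregnant_parenting_adm, esl_adm,
                 disability_cap_rate=0.11):
    """Return an illustrative ADMw using only the weights named in this rule."""
    # Section (2): one additional weight per eligible child with disabilities,
    # counted only up to 11 percent of the district's ADM.
    disability_weight = min(disability_count, disability_cap_rate * adm)
    # Section (3): an additional 1.0 times the ADM of eligible pregnant and
    # parenting students (students counted in section (2) are excluded).
    pregnant_parenting_weight = 1.0 * pregnant_parenting_adm
    # Section (4): an additional 0.5 times the ADM of eligible ESL students.
    esl_weight = 0.5 * esl_adm
    return adm + disability_weight + pregnant_parenting_weight + esl_weight
```

For a hypothetical district with an ADM of 1,000, 150 eligible children with disabilities, 10 eligible pregnant and parenting students, and 80 eligible ESL students, the disability weight is capped at 110 and the sketch returns 1,000 + 110 + 10 + 40 = 1,160.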
APPENDIX B
Executive Summary of Dimensionality Analysis
Link: http://www.ode.state.or.us/teachlearn/testing/dev/techaspects/elpa/executive-summary-cart-oregonelpa-operitemanalysis-aug07.pdf
APPENDIX C
APPENDIX D
LANGUAGE FUNCTIONS and FORMS
The English Language Proficiency Standards are written as pathways to the Oregon English Language Arts standards. The ELP
Standards are designed to supplement the ELA standards to ensure that LEP students develop proficiency in both the English language
and the concepts and skills contained in the ELA standards. They can be found on the web at
www.ode.state.or.us/teachlearn/standards/elp/files/all.doc.
This section contains language functions and forms that native English speakers acquire mostly before entering school or naturally at
home. These language functions and forms, however, need to be explicitly taught to English language learners (ELLs). They may be
taught to ELLs at all grade levels, as the need and context arise.
Forms of a language deal with the internal grammatical structure of words. The relationship between boy and boys, for example, and
the relationship (irregular) between man and men would be forms of a language.
A language function refers to the purpose for which speech or writing is being used.
In speech these include:
giving instructions
introducing ourselves
making requests
In academic writing we use a range of specific functions in order to communicate ideas clearly.
These include:
describing processes
comparing or contrasting things or ideas, and
classifying objects or ideas
The contrast between form and function in language can be illustrated through a simple medical analogy. If doctors studied only a
limited portion of the human system, such as anatomical form, they would be unable to adequately address their patients’ needs. To
fully treat their patients, physicians must understand the purposes of the human body and the relationships between organs, cells, and
genes (Pozzi, 2004). Similarly, ELLs need to understand both the form (structure) and the function (purpose) of the English language
in order to reach higher levels of proficiency.
Pozzi, D.C. (2004). Forms and functions in language: Morphology, syntax. Retrieved March 10, 2005, from University of Houston, College of Education Web site: http://www.viking.coe.uh.edu/grn11.intr/intr.0.1.2.htm
Language Functions and Examples of Forms

Language Function                           Examples of Language Forms
Expressing needs and likes                  Indirect/direct object, subject/verb agreement, pronouns
Describing people, places, and things       Nouns, pronouns, adjectives
Describing spatial and temporal relations   Prepositional phrases
Describing actions                          Present progressive, adverbs
Retelling/relating past events              Past tense verbs, perfect aspect (present and past)
Making predictions                          Verbs: future tense, conditional mode
Asking Informational Questions              Verbs and verb phrases in questions
Asking Clarifying Questions                 Questions with increasing specificity
Expressing and Supporting Opinions          Sentence structure, modals (will, can, may, shall)
Comparing                                   Adjectives and conjunctions, comparatives, superlatives, adverbs
Contrasting                                 Comparative adjectives
Summarizing                                 Increasingly complex sentences with increasingly specific vocabulary
Persuading                                  Verb forms
Literary Analysis                           Sentence structure, specific vocabulary
Cause and Effect                            Verb forms
Drawing Conclusions                         Comparative adjective
Defining                                    Nouns, pronouns, and adjectives
Explaining                                  Verb forms, declarative sentences, complex sentences, adverbs of manner
Generalizing                                Abstract nouns, verb forms, nominalizations
Evaluating                                  Complex sentences; increasing specificity of nouns, verbs, and adjectives
Interpreting                                Language of propaganda, complex sentences, nominalizations
Sequencing                                  Adverbs of time, relative clauses, subordinate conjunctions
Hypothesizing and speculating               Modals (would, could, might), compound tenses (would have been)
ACQUISITION OF LANGUAGE FUNCTIONS AND GRAMMATICAL FORMS
1. Language Function: Expressing Needs and Likes
BEGINNING: Students demonstrate minimal comprehension of general meaning; gain familiarity with the sounds, rhythms and patterns of English. Early stages show no verbal responses while in later stages one or two word responses are expected. Students respond in single words and phrases, which may include subject or a predicate. Many speech errors are observed. (bear, brown)
  Forms: One or two-word answers (nouns or yes/no) to questions about preferences (e.g., two, apples, or tree)
EARLY INTERMEDIATE: Students demonstrate increased comprehension of general meaning and some specific meaning; use routine expressions independently and respond using phrases and simple sentences, which include a subject and predicate. Students show basic errors in speech. (The bear is brown. He is eating.)
  Forms: Simple sentences with subject/verb/object. “I like/don’t like ___ (object) ___.” “I need a/some ___ (object) ___.”
INTERMEDIATE: Students demonstrate good comprehension of general meaning; increased comprehension of specific meaning; respond in more complex sentences, with more detail using newly acquired vocabulary to experiment and form messages. (The brown bear lived with his family in the forest.)
  Forms: Elaborated sentences with subject/verb/object
EARLY ADVANCED: Students demonstrate consistent comprehension of general meaning; good understanding of implied meaning; sustain conversation, respond with detail in compound and complex sentences; actively participate using more extensive vocabulary, use standard grammar with few random errors. (Can bears live in the forest if they find food there?)
  Forms: Sentences with subject/verb/object and dependent clause
ADVANCED: Students’ comprehension of general and implied meaning, including idiomatic and figurative language. Students initiate and negotiate using appropriate discourse, varied grammatical structures and vocabulary; use of conventions for formal and informal use. (Would you like me to bring pictures of the bear that I saw last summer?)
  Forms: Complex sentences, perhaps with tags or embedded questions
TARGET FORMS: Sentence Structure: The basic sentence structures that we use to express needs and likes are foundations to the more complex sentence structure we use for academic purposes.
ALL GRADES
2. Language Function: Describing People, Places and Things
BEGINNING: Common nouns and adjectives
EARLY INTERMEDIATE: Simple sentences with the verb to be, using common nouns and adjectives. The (my, her) ______ is/are _______. A (it) has/have _________.
INTERMEDIATE: Elaborated sentences has/have/had or is/are/were with nouns and adjectives
EARLY ADVANCED: Compound sentences with more specific vocabulary (nouns, adjectives)
ADVANCED: Complex sentences with more specific vocabulary (nouns, adjectives)
TARGET FORMS: Nouns, Pronouns and Adjectives: Students learn to understand and generate oral and written language with nouns, pronouns and adjectives.
3. Language Function: Describing Location
BEGINNING: Demonstrated comprehension of total physical response commands, including prepositions (e.g., on, off, in, out, inside, outside)
EARLY INTERMEDIATE: Simple sentences with prepositional phrases (e.g., next to, beside, between, in front of, in back of, behind, on the left/right, in the middle of, above, below, under)
INTERMEDIATE: May include two prepositional phrases with more difficult prepositions (e.g., in front of, behind, next to)
EARLY ADVANCED: Complex sentences with phrases using prepositions (e.g., beneath, within)
ADVANCED: Complex sentences with phrases using prepositions (e.g., beneath, within)
TARGET FORMS: Prepositional Phrases: Students learn to understand and generate oral and written language with prepositional phrases.
4. Language Function: Describing Action
BEGINNING: Demonstrate comprehension (perform or describe actions)
EARLY INTERMEDIATE: Present progressive
INTERMEDIATE: Variety of verb tenses and descriptive adverbs
EARLY ADVANCED: Adverb clauses telling how, where, or when
ADVANCED: Adverb clauses telling how, where, or when
TARGET FORMS: Present Progressive, Adverbs: Students learn to understand and generate oral and written language skills with present progressive and adverbs.
5. Language Function: Retelling/Relating Past Events (Kinder – General Understanding)
BEGINNING: Single words in response to past tense question
EARLY INTERMEDIATE: Simple sentences with past progressive __ (pronoun) ___ was/were _____-ing.
INTERMEDIATE: Simple sentences with regular and irregular past tense verbs “Yesterday/Last ____/On ___day (pronoun) ____ -ed (prep. phrase or other direct object).” First ___ and then __ . Finally
EARLY ADVANCED: Compound sentences using past tense and adverb
ADVANCED: Present progressive/past perfect tense with specialized prepositions _____ have/has been ____-ing since/for ____.
TARGET FORMS: Past Tense Verbs: Students learn to understand and generate oral and written language with past tense verbs.
6. Language Function: Making Predictions
BEGINNING: In response to questions, may respond by circling, pointing, and so on, or answer with one or two words
EARLY INTERMEDIATE: The _____ is/are going to ______.
INTERMEDIATE: The ________ will ________.
EARLY ADVANCED: Conditional (could, might) mood in complex sentences
ADVANCED: Conditional (could, might) mood in complex sentences
TARGET FORMS: Verbs: Future Tense, Conditional Mood: Students learn to understand and generate oral and written language with future tense verbs and conditional mood.
7. Language Function: Asking Informational Questions
BEGINNING: Simple questions about familiar or concrete subjects
EARLY INTERMEDIATE: Present or present progressive tense questions with to be
INTERMEDIATE: Who, what, where, why questions with do or did
EARLY ADVANCED: Detailed questions with who, what, when, where, why and how
ADVANCED: Detailed questions with expanded verb phrase
TARGET FORMS: Verbs and Verb Phrases in Questions: Students learn to understand and generate oral and written language with verbs and verb phrases in questions.
9. Language Function: Expressing and Supporting Opinions
BEGINNING: I like/don’t like ______ (concrete topics).
EARLY INTERMEDIATE: I think/agree with (don’t) ______.
INTERMEDIATE: I think/agree with (don’t) ____ because _____.
EARLY ADVANCED: In my opinion ____ should ____ because/so ______.
ADVANCED: Complex sentences using modals and clauses
TARGET FORMS: Sentence Structure
10. Language Function: Comparing
BEGINNING: Single words or phrases in response to concrete comparison questions
EARLY INTERMEDIATE: Sentences with subject/verb/adjective showing similarities and differences
INTERMEDIATE: Subject/verb/adjective, but _____. Adjective with –er or –est
EARLY ADVANCED: Varied sentence structures with specific comparative adjectives and phrases
ADVANCED: Complex sentence structure with specific comparative language
TARGET FORMS: Adjectives and Conjunctions
11. Language Function: Contrasting
BEGINNING | EARLY INTERMEDIATE | INTERMEDIATE | EARLY ADVANCED | ADVANCED
Sentences with subject/verb/adjective showing similarities and differences
Subject/verb/adjective like ____ but subject/verb/adjective
Subject/verb/adjective, both subject/verb, but
Appropriately used idiomatic phrases and contrasting words (e.g., whereas, and in contrast)
TARGET FORMS: Comparative Adjectives
8. Language Function: Asking Clarifying Questions
BEGINNING: Not Applicable
EARLY INTERMEDIATE: Formula questions clarifying classroom procedures, rules and routines
INTERMEDIATE: Formula questions clarifying classroom procedures, rules and routines
EARLY ADVANCED: A variety of fairly specific questions clarifying procedures or content
ADVANCED: Varied, specific questions clarifying procedures or content
TARGET FORMS: Questions with Increasing Specificity
12. Language Function: Summarizing
BEGINNING | EARLY INTERMEDIATE | INTERMEDIATE | EARLY ADVANCED | ADVANCED
Simple sentences with key nouns, adjectives, and verbs
Compound sentences with and/but
Conjunctions that summarize (to conclude, indeed, in summary, in short)
Conjunctions that summarize (indeed, therefore, consequently)
TARGET FORMS: Increasingly Complex Sentences with Increasingly Specific Vocabulary
13. Language Function: Persuading
BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED    ADVANCED    TARGET FORMS
Imperative verb forms Complex sentences with
future and conditional
Complex sentences with
varied verb forms and tag
questions, idiomatic
expressions or embedded
clauses
Verb Forms
14. Language Function: Literary Analysis
BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED    ADVANCED    TARGET FORMS
Single words for character and
setting
Simple sentences
(subject/verb/adjective)
(subject/verb/object)
Compound sentences
with and, because,
before, after
Descriptive language in
more complex sentences
Specific descriptive
language in complex
sentences
Sentence Structure and
Specific Vocabulary
15. Language Function: Cause and Effect Relationship
BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED    ADVANCED    TARGET FORMS
Answer cause and effect
question with a simple
response
Descriptive sentences
with past tense verbs
Complex sentences with
past tense verbs
Conditional: If ___
had/hadn’t _____. _____
would/wouldn’t have
_____.
Verb Forms
16. Language Function: Draw Conclusions
BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED    ADVANCED    TARGET FORMS
Comparative adjectives
with past tense verbs in
simple sentences
Comparative adjectives
with conjunctions such as
although, because, that
Comparative adjectives
with idiomatic phrases
and passive voice
Comparative Adjectives
17. Language Function: Defining
BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED    ADVANCED    TARGET FORMS
Patterned responses: A table is
furniture/ A boy is a person.
Simple terms, aspects of
concrete and familiar
objects, regular nouns
singular and plural,
personal pronouns,
present tense, simple
sentences
Connected text including
irregular nouns, personal,
possessive pronouns and
adjectives with some
irregular past tense verbs
Concrete and abstract
topics using irregular
nouns, singular and plural,
personal and possessive
pronouns and adjectives
Clear, well-structured,
detailed language on
complex subjects,
showing controlled use of
nouns, pronouns,
adjectives
Nouns, Abstract Nouns,
Pronouns, Adjectives:
Students learn to define
concrete and abstract
objects/concepts with
correct nouns, pronouns,
and adjectives
18. Language Function: Explaining
BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED    ADVANCED    TARGET FORMS
Main points in familiar
idea or problem with
some precision using
simple indicative verb
forms in simple
declarative sentences
(Large oaks grew in the
park/ The length of the
room is 40 feet.)
Explain simple,
straightforward
information of immediate
relevance, using regular
verbs and adverbs of
manner in declarative
sentences and compound
sentences (Maria planted
the petunia seeds
carefully.)
Get across important
points using declarative,
compound and complex
sentences, regular and
irregular verb forms
Complex: As I came home,
I stopped at the store.
Compound: The children
who came in early had
refreshments, but those
who came late had none.
Get across which point
he/she feels is most
important using regular
and irregular verb forms,
adverbs of manner and
compound-complex
sentences.
Adverbs of manner: The
children who sang loudly
got a cookie, but those
who didn’t sing had none.
Verb Forms- Indicative
verb (makes a statement
of fact), Declarative
Sentences, Complex
Sentences, Adverbs of
Manner:
Students learn to develop
and use explanations
using appropriate verb
forms, declarative and
complex sentences and
adverbs of manner.
19. Language Function: Generalizing
BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED    ADVANCED    TARGET FORMS
Imperative mode: expresses command (Take me home. Stay there.)
Collective nouns name, as a unit, the members of a group (herd, class, jury, congregation).
Indicative mode: makes a statement of fact (The temperature is low.)
Abstract nouns: name things or ideas that people cannot touch or handle (beauty, honesty, comfort, love).
Subjunctive mode: expressing a condition contrary to fact or expressing a doubt (If only he were here.)
Nouns – Common, Collective and Abstract Nouns; Verb Forms:
Students learn to develop and use generalizations using abstract nouns, verb forms and nominalizations.
20. Language Function: Evaluating
BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED    ADVANCED    TARGET FORMS
Adjectives that point out
particular objects (that wagon,
those toys, each person, every
girl)
Number adjectives: (two men, ten
ships, the third time, the ninth
boy)
Adjectives used to limit:
(few horses, much snow,
little rain)
Evaluate simple direct
exchange of limited
information on familiar
and routine matters using
simple verbs and
adjectives.
Correlative conjunctions
are used in pairs: both –
and; not only – but also
(Neither the teacher nor
the students could solve
the problem.)
Qualify opinions and
statements precisely in
relation to degrees of
certainty/uncertainty,
belief/doubt, likelihood,
etc.
Convey finer, precise
shades of meaning by
using, with reasonable
accuracy, a wide range of
qualifying devices, such
as adverbs that express
degree (This class is too
hard.); clauses expressing
limitations (This is a
school van, but it is only
used for sports.); and
complex sentences
Complex Sentences;
Increasing Specificity of
Nouns, Verbs, and
Adjectives; Correlative
Conjunctions:
Students learn to
understand and use
complex sentences using
very specific nouns,
verbs and adjectives.
21. Language Function: Interpreting
BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED    ADVANCED    TARGET FORMS
Interpret a single phrase at a time,
picking up familiar names, words,
and basic phrases (D’Onofrio
chocolates are the best.)
Interpret short, simple
texts containing the
highest frequency
vocabulary
Interpret short, simple
texts on familiar matters
of a concrete type, which
consist of high frequency
everyday or school-
related language
Interpret a wide range of
long and complex texts,
appreciating subtle
distinctions of style and
implicit as well as explicit
meaning
Interpret critically
virtually all forms of the
written language
including abstract,
structurally complex, or
highly colloquial non-
literary writings
Language of
Propaganda, Complex
Sentences:
Students learn to identify
and interpret the
language of propaganda
and use complex
sentences.
22. Language Function: Sequencing
BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED    ADVANCED    TARGET FORMS
Subject (The girl who was sick went home.)
Direct object (The story that I read was long.)
Prepositional object (I found the book that John was talking about.)
Possessive (I know the woman whose father is visiting.)
Object of comparison (The person whom Susan is taller than is Mary.)
Natural sequencing (I hit him and he fell over.)
Indirect object (The man to who[m] I gave the present was absent.)
Subordinate conjunctions: used to join two grammatical parts of equal rank (Although he worked hard, he did not finish his homework.)
Adverbs of time, Relative clauses, Subordinate conjunctions:
Students learn sequencing using adverbs of time, relative clauses and subordinate conjunctions.
23. Language Function: Hypothesizing and Speculating
BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED    ADVANCED    TARGET FORMS
Auxiliary verbs that
indicate futurity: will and
shall
Auxiliary verb indicating
desire or intent: would
Auxiliary verbs include
modal verbs, which may
express possibility: may,
might, can, could.
Modals (would, could,
might), Compound
tenses (would have
been):
Students learn to
hypothesize and
speculate using modals
and compound tenses.
24. Language Function: Summarizing
BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED    ADVANCED    TARGET FORMS
Copy out short texts;
can copy out single words and
short texts
Paraphrase short written
passages in a simple
fashion, using the original
text wording and
ordering; pick out and
reproduce key words and
phrases or short sentences
from a short text within
the learner’s limited
competence and
experience
Summarize extracts from
news items, interviews or
documentaries containing
opinions, argument and
discussion; summarize
the plot and sequence of
events in a poem or play;
collate short pieces of
information from several
sources and summarize
them for someone else
Summarize a wide range
of factual and imaginative
texts, commenting on and
discussing contrasting
points of view and the
main themes
Summarize information
from different sources,
reconstructing arguments
and accounts in a
coherent presentation of
the overall result
Modals (would, could,
might), Compound
tenses (would have
been):
Students learn to
summarize and speculate
using modals and
compound tenses.
APPENDIX E
Explanation of Eligible Content
Each of the five components of the eligible content will be explained. However, the five
components interact: Morphology reflects syntax, words with similar meanings occur in different
syntactic structures, and illocutionary functions can only be expressed through forms. Forms
never exist without illocutionary meaning, and meaning cannot be conveyed without forms.
Syntax refers to what is traditionally called “grammar.” Syntax occurs at the sentence level. It is
often explained as “word order,” but in fact the order of words in a sentence is governed by
rules that convey the interrelated meanings of the words and phrases in a sentence. Examples of
syntax include:
Tenses and Aspects:
Simple present
Simple past
Simple future
Modals
Tenses with modals
Perfect tenses
Perfect tenses with modals
Tenses with progressive -ing
Examples of Tenses and Aspects
Simple Present: I ride the bus to school every day. Mario studies English.
Simple Past: I rode the bus to school this morning. Mario studied English last year.
Simple future: I will ride the bus to school tomorrow. Mario will study English next semester.
Tenses with Modals: I should (may, can, etc.) ride the bus to school tomorrow. Mario might
study English next semester.
Perfect Tenses: I have ridden the bus to school every day this year. Mario has studied
English for three years. I had always ridden the bus until I got a car. Mario had studied
English before he immigrated to the United States.
Perfect Tenses with Modals: I should have ridden the bus to school this morning. At the end
of this semester, Mario will have studied English for five years.
Tenses with Progressive –ing: I’m riding the bus to school tomorrow. (Present progressive
functioning as future) Mario has been studying English for five years.
Sentence Structure
Simple subject+verb(+NP)
Simple subject+verb with compound subject or verb phrase
Compound sentences: Two or more subject+verb(+NP)
Complex sentences with subordinate clauses.
Complex sentences with relative clauses
Examples of Sentence Structures
Simple subject+verb: Rebecca eats pizza.
Simple subject+verb with compound subject or verb: Rebecca and Jessica eat pizza. Rebecca
eats pizza and drinks soda.
Compound sentences: Rebecca eats pizza and she drinks soda. Rebecca eats pizza, but she
doesn’t drink soda. (Note the coordinate conjunctions, and and but, which signal a relationship
between the two independent clauses.)
Complex Sentences with Subordinate Clauses: Subordinate clauses are sentences within
sentences. They can be introduced with a subordinate conjunction that expresses the
relationship between the main clause and the subordinate clause. Rebecca eats pizza because
she likes it. Rebecca drinks soda after she eats the pizza. Rebecca drinks soda when she eats
pizza. Rebecca likes pizza better than Jessica does. (In this example, note that “Jessica” is
the subject of the subordinate clause, and “does” takes the place of “likes pizza.”) Other
examples: Mary stayed home from school because she felt sick. After the students returned
from gym class, the alarm sounded for a fire drill. Katie held the door open while the
students filed out. (Note again that the subordinate conjunctions, when, better than, because,
after, while, indicate a relationship between the main and subordinate clauses.)
Complex sentences with relative clauses, including deleted relative pronouns, e.g., The man
driving the car ran the stop sign. The man [who was] driving the car ran the stop sign. Mario
read the instructions to Al, who carried out the experiment.
Negation
Negation can occur in independent and dependent clauses:
Rebecca doesn’t like pizza, but she likes seafood.
Rebecca likes pizza, but she doesn’t like seafood.
Rebecca doesn’t like pizza, and she doesn’t like seafood either.
Mary stayed home from school because she didn’t feel well.
Mary didn’t stay home from school even though she didn’t feel well.
The placement of the negation indicates which part of a complex sentence is negated. Consider:
It’s not important that you speak to the school board.
It’s important that you not speak to the school board.
Indirect Speech
Indirect speech can be difficult for the English learner. Dependent clauses in indirect speech are
introduced with “for” or “to.” John asked Sally to open the window. Robert asked for the waiter to
bring the check. (In the latter case, he didn’t speak directly to the waiter.) John told us to go
ahead. John said for us to go ahead. Using the “for” or the “to” construction depends on the
main verb, tell or say, which are semantically similar but occur in different syntactic contexts.
Vocabulary, or “lexicon,” consists of the words of the language. Words fall into several common
so-called “parts of speech”:
Nouns
Verbs
Adjectives
Adverbs
Prepositions
Pronouns
Articles
Conjunctions
ELL students acquire a great deal of vocabulary without instruction, particularly vocabulary that
they frequently hear, words that represent tangible or concrete experiences, or words that relate
to the students’ immediate experiences.
ELL students often use relatively general words, and often, teachers use simplified vocabulary to
make meaning more comprehensible. However, ELL students need to learn the subtle
distinctions of vocabulary, e.g., look, stare, glare, gaze, peer, watch, see.
Two-word verbs may challenge ELL students because they can resemble verb + preposition but
mean different things: Look up a word v. Look up a chimney. Get on the bus v. Get on with your
business.
Language arts classes cover such prefixes as un-, mis- and re-. However, many words such
as prepositions can serve as prefixes to create new words: outshine, outrun, overeat, overdo,
overreact, underachieve, undercut.
Morphology refers to the components of words, such as their base forms, prefixes, suffixes, and
inflectional and derivational endings, and even changes in the base forms themselves to indicate
syntactic roles such as tense (am v. was, eat v. ate, etc.). Common morphemes include:
Third-person –s
Other inflections for person, e.g., am, is, are
Plural –s or –es
Other inflections for number, e.g., ox, oxen
Tense and aspect markers, e.g., -ed, -en, -ing
Derivational suffixes, e.g., -er, -ing, -able
Illocutionary competencies refer to the ability to use English, applying correct forms, to
communicate or understand communication. Illocutionary competencies that may appear on the
ELPA are ideational and manipulative functions.
Ideational functions communicate ideas from one person to another, e.g., describing actions,
expressing likes and dislikes, comparing and contrasting, explaining, defining, cause and effect,
and sequencing. Those are listed in the standards document. Ideational functions are prevalent in
instruction. Examples of language forms that can occur in ideational functions include big, bigger
than, less than, similar to, and different from, for comparing and contrasting; prefer and would
rather for expressing likes and dislikes; because, as a result, for cause and effect; before, after,
having completed, for sequencing or describing temporal relations.
Manipulative functions are the use of language to get something done or influence behavior,
such as requesting or giving instructions. Language forms that occur in manipulative functions
might include the imperative, e.g., Sit down. Other forms can also be used, such as Would you
please, I’d like for you to, Why don’t you, and many others.
APPENDIX F
APPENDIX G
STANDARD ERROR OF MEASUREMENT (SEM)
TOTAL PROFICIENCY
[Figure: SEM Conditioned on Composite Score, Grade Band K to 1. Horizontal axis: Composite Score (430 to 560); vertical axis: SEM (0 to 16).]
[Figure: SEM Conditioned on Composite Score, Grade Band 2 to 3. Horizontal axis: Composite Score (430 to 560); vertical axis: SEM (0 to 12).]
[Figure: SEM Conditioned on Composite Score, Grade Band 4 to 5. Horizontal axis: Composite Score (470 to 550); vertical axis: SEM (0 to 6).]
[Figure: SEM Conditioned on Composite Score, Grade Band 6 to 8. Horizontal axis: Composite Score (460 to 550); vertical axis: SEM (0 to 6).]
[Figure: SEM Conditioned on Composite Score, Grade Band 9 to 12. Horizontal axis: Composite Score (460 to 560); vertical axis: SEM (0 to 7).]
LISTENING
[Figure: SEM Conditioned on Listening Score, Grade Band K to 1. Horizontal axis: Listening Score (450 to 550); vertical axis: SEM (0 to 25).]
[Figure: SEM Conditioned on Listening Score, Grade Band 2 to 3. Horizontal axis: Listening Score (450 to 550); vertical axis: SEM (0 to 20).]
[Figure: SEM Conditioned on Listening Score, Grade Band 4 to 5. Horizontal axis: Listening Score (450 to 550); vertical axis: SEM (0 to 16).]
[Figure: SEM Conditioned on Listening Score, Grade Band 6 to 8. Horizontal axis: Listening Score (460 to 550); vertical axis: SEM (0 to 14).]
[Figure: SEM Conditioned on Listening Score, Grade Band 9 to 12. Horizontal axis: Listening Score (440 to 560); vertical axis: SEM (0 to 14).]
SPEAKING
[Figure: SEM Conditioned on Speaking Score, Grade Band K to 1. Horizontal axis: Speaking Score (460 to 540); vertical axis: SEM (0 to 14).]
[Figure: SEM Conditioned on Speaking Score, Grade Band 2 to 3. Horizontal axis: Speaking Score (460 to 540); vertical axis: SEM (0 to 16).]
[Figure: SEM Conditioned on Speaking Score, Grade Band 4 to 5. Horizontal axis: Speaking Score (460 to 560); vertical axis: SEM (0 to 30).]
[Figure: SEM Conditioned on Speaking Score, Grade Band 6 to 8. Horizontal axis: Speaking Score (450 to 560); vertical axis: SEM (0 to 30).]
[Figure: SEM Conditioned on Speaking Score, Grade Band 9 to 12. Horizontal axis: Speaking Score (0 to 600); vertical axis: SEM (0 to 30).]
READING
[Figure: SEM Conditioned on Reading Score, Grade Band K to 1. Horizontal axis: Reading Score (440 to 560); vertical axis: SEM (0 to 16).]
[Figure: SEM Conditioned on Reading Score, Grade Band 2 to 3. Horizontal axis: Reading Score (440 to 560); vertical axis: SEM (0 to 16).]
[Figure: SEM Conditioned on Reading Score, Grade Band 4 to 5. Horizontal axis: Reading Score (450 to 550); vertical axis: SEM (0 to 18).]
[Figure: SEM Conditioned on Reading Score, Grade Band 6 to 8. Horizontal axis: Reading Score (460 to 560); vertical axis: SEM (0 to 16).]
[Figure: SEM Conditioned on Reading Score, Grade Band 9 to 12. Horizontal axis: Reading Score (440 to 560); vertical axis: SEM (0 to 16).]
WRITING
[Figure: SEM Conditioned on Writing Score, Grade Band K to 1. Horizontal axis: Writing Score (440 to 540); vertical axis: SEM (0 to 16).]
[Figure: SEM Conditioned on Writing Score, Grade Band 2 to 3. Horizontal axis: Writing Score (450 to 560); vertical axis: SEM (0 to 18).]
[Figure: SEM Conditioned on Writing Score, Grade Band 4 to 5. Horizontal axis: Writing Score (440 to 560); vertical axis: SEM (0 to 18).]
[Figure: SEM Conditioned on Writing Score, Grade Band 6 to 8. Horizontal axis: Writing Score (450 to 550); vertical axis: SEM (0 to 14).]
[Figure: SEM Conditioned on Writing Score, Grade Band 9 to 12. Horizontal axis: Writing Score (450 to 560); vertical axis: SEM (0 to 14).]
COMPREHENSION
(Listening and Reading)
[Figure: SEM Conditioned on Comprehension Score, Grade Band K to 1. Horizontal axis: Comprehension Score (450 to 550); vertical axis: SEM (0 to 18).]
[Figure: SEM Conditioned on Comprehension Score, Grade Band 2 to 3. Horizontal axis: Comprehension Score (440 to 560); vertical axis: SEM (0 to 14).]
[Figure: SEM Conditioned on Comprehension Score, Grade Band 4 to 5. Horizontal axis: Comprehension Score (460 to 560); vertical axis: SEM (0 to 12).]
[Figure: SEM Conditioned on Comprehension Score, Grade Band 6 to 8. Horizontal axis: Comprehension Score (460 to 560); vertical axis: SEM (0 to 12).]
[Figure: SEM Conditioned on Comprehension Score, Grade Band 9 to 12. Horizontal axis: Comprehension Score (460 to 560); vertical axis: SEM (0 to 12).]
APPENDIX H
Executive Summary of Oregon English Language Proficiency
Examination Operational Item Analysis
Background on Oregon’s English Language Proficiency Assessment
Oregon’s English Language Proficiency Examination (ELPA) assesses students in kindergarten through
grade 12, measuring reading, listening, writing, and speaking, with a calculated score for
comprehension (combining reading and listening) and a composite score across the range of the
domains. The assessment employs two general item types: selected response (including
multiple-choice, picture-click, and cloze items) and constructed response (including
elicited-imitation, short-answer, word-builder, and extended-response items). Selected response
items provide multiple potential responses from which to choose. Constructed response items
essentially allow free response, and the response is scored using an established rubric. Rubrics
can be dichotomous (correct or incorrect) or polytomous (multiple score points).
ELPA is administered as a two-stage computer-adaptive multistage (ca-MST) test (Luecht &
Nungester, 1998, 2000; Luecht, 2004) structured for each of five grade bands (K-1, 2-3, 4-5, 6-8,
and 9-12). The test opens with a 30-item locator block that presents exactly the same questions to
all students within a grade band. Examinee responses to the locator block are scored so that the
examinee is routed immediately from the locator to one of three leveled follow-on tests of 50
items each for each grade band. Using this model, the test can achieve greater precision, with far
fewer items, than could be obtained from a single test administered to all students within a
grade band.
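The two-stage routing described above can be sketched in a few lines of Python. This is only an illustration: the report does not publish the routing rules, so the cut scores, the level names, and the assumption that every locator item is scored 0 or 1 are hypothetical.

```python
from bisect import bisect_right

# Hypothetical cut scores on the 30-item locator block. The report does not
# state the actual routing rules, so these values are illustrative only.
ROUTING_CUTS = (12, 22)
LEVELS = ("low", "medium", "high")   # the three leveled follow-on tests
LOCATOR_ITEMS = 30
LEVELED_ITEMS = 50


def route(locator_scores):
    """Return the leveled follow-on test for a list of 0/1 locator item scores."""
    if len(locator_scores) != LOCATOR_ITEMS:
        raise ValueError("the locator block has 30 items")
    raw_score = sum(locator_scores)
    # bisect_right counts how many cut scores the raw score has reached.
    return LEVELS[bisect_right(ROUTING_CUTS, raw_score)]


def items_administered():
    """Items seen by one examinee: the locator plus one leveled test."""
    return LOCATOR_ITEMS + LEVELED_ITEMS
```

Under this sketch each examinee answers 80 items, while the pool for a grade band holds the locator plus three 50-item leveled tests.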
Using techniques collectively described as Item Response Theory, all of the tests can be placed
onto a single scale. Item Response Theory provides a means of placing each item on the test onto a scale of difficulty. Equating across tests is done by including common items across the
tests and fixing the scale at the difficulty point for each of these common items.
Item Analysis Methods Applied to ELPA
Two types of item analyses were performed for the 494 operational items used during the Spring
2007 ELPA administration: (1) a modified classical item analysis and (2) a concurrent IRT calibration using the Rasch model. The reason for these two analyses is explained below.
Modified classical item analysis served to evaluate patterns of distractors for selected response
items and frequency of scoring patterns for the constructed response items. The principal
modification of this approach was the use of an external proficiency score as a grouping variable
for item-test correlations. This was essential because the number-correct total score is
confounded with item difficulty under the ca-MST design. This item-difficulty confounding can
also carry over to the item statistics produced during a classical item analysis. Difficult items
may appear easier because they are administered only to higher-proficiency examinees, and easier
items may appear more difficult because they are administered only to lower-proficiency
examinees, so a typical analysis tends to deflate item means. Item standard deviations likewise
are “range restricted,” and the associated item-test correlations are similarly systematically
reduced. To avoid some of these variances and range restrictions, IRT scores based on a
concurrent calibration of all operational items were used in conjunction with the item analysis.
A special, modified version of the Classical ITem ANalysis (CITAN; Luecht, 2005) program was used
for the operational analyses that included conditioning on external scores, in this case the
estimated IRT θ scores from the Spring 2006 examinee sample. A total of 494 items were analyzed
for 62,296 examinees. The item analysis comprises two components: the classical item analysis and
the IRT-based WinSteps (Linacre, 2006) Rasch calibration analysis. The Rasch calibration provides
an indication of item difficulty, independent of the ca-MST design pathways or routes, and also
provides various fit analyses.
Modified Classical Item Analysis
A sparse 62,296 by 494 matrix (rows=examinees, columns=items) of raw responses was
analyzed using a modified version of CITAN (Luecht, 2005). CITAN provides a classical item and
test analysis, including distractor analysis and high-low group statistics. For this analysis, the
program was modified to input an estimated proficiency score, θ, for each of the examinees. The
proficiency scores were obtained from a concurrent, local Rasch calibration of all 62,296
examinees and all 494 items using WinSteps (Linacre, 2006). The estimated proficiency scores
were used in place of the number-correct total test scores for all score groupings and for
computing all item-test correlations.¹ These proficiency scores are summarized in Table 1.
Table 1. Summary of Proficiency Scores (N=62,296)
Statistic Value
N (Examinees) 62296
Mean 0.86
Std. Deviation 1.26
Variance 1.58
Skewness -0.32
Kurtosis -0.45
Minimum -5.55
Maximum 5.48
These values match the results from the WinSteps calibration,² and as noted above, were used
by the modified version of the CITAN item analysis software to compute score groupings and
all item-test correlations.
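The idea of correlating item scores with an external proficiency score, rather than with the number-correct total, can be sketched as follows. This is a minimal illustration and not the CITAN implementation: the function names and the use of None for items an examinee was never administered are assumptions.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_theta_correlation(item_scores, thetas):
    """Correlate one item's scores with external theta estimates.

    item_scores holds one entry per examinee, with None where the item was
    not administered (the sparse cells of the response matrix). Returns the
    correlation and the number of valid responses (Np).
    """
    pairs = [(s, t) for s, t in zip(item_scores, thetas) if s is not None]
    scores, used_thetas = zip(*pairs)
    return pearson(scores, used_thetas), len(pairs)
```

Because θ is estimated on a common scale for every examinee, the correlation does not depend on which leveled test an examinee happened to take.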
The classical item statistics are summarized in Table 2, reported by item type and then
aggregated for all 494 items. Item type codes are: CZ=cloze; EI=elicited imitation;
ER=extended response; MC=multiple choice; PC=picture click; S2=short answer; and WB=word
builder items.
¹ The point-biserial correlations produced by CITAN matched the WinSteps point-biserial
correlations exactly.
² Examinees with extreme values are trimmed from the WinSteps summary report. Extreme
scores are assigned by the software for examinees with near-perfect or near-null total-test
scores. All examinees are summarized in Table 1, including examinees with extreme scores.
Table 2. Summary of Item Statistics by Item Type and For All Items (n=494)
Type   Statistic   Item Mean  Item SD  Min. Score  Max. Score  r(pbis)  r(bis)  Np
CZ     Item Count  38
       Minimum     0.177      0.262    0           1           0.163    -       1002
       Maximum     0.926      0.500    0           1           0.565    -       23944
       Mean        0.611      0.451    0           1           0.402    -       10178.658
       Std. Dev.   0.180      0.056    0           0           0.109    -       5125.450
EI     Item Count  18
       Minimum     0.170      0.376    0           1           0.257    -       5365
       Maximum     0.804      0.500    0           1           0.433    -       45495
       Mean        0.512      0.463    0           1           0.334    -       24255.556
       Std. Dev.   0.191      0.039    0           0           0.049    -       12388.371
ER     Item Count  24
       Minimum     1.293      0.528    0           3           0.326    -       1824
       Maximum     2.403      0.976    0           3           0.535    -       17942
       Mean        1.803      0.743    0           3           0.436    -       9229.917
       Std. Dev.   0.261      0.124    0           0           0.053    -       3710.058
MC     Item Count  304
       Minimum     0.161      0.178    0           1           0.015    0.021   900
       Maximum     0.967      0.500    0           1           0.662    0.857   36470
       Mean        0.622      0.448    0           1           0.358    0.480   9349.313
       Std. Dev.   0.175      0.060    0           0           0.117    0.162   5972.248
PC     Item Count  66
       Minimum     0.192      0.278    0           1           0.111    0.186   900
       Maximum     0.916      0.500    0           1           0.642    0.805   16897
       Mean        0.660      0.437    0           1           0.346    0.464   6577.682
       Std. Dev.   0.173      0.061    0           0           0.128    0.156   5118.222
S2     Item Count  12
       Minimum     1.501      0.334    0           2           0.305    -       9472
       Maximum     1.903      0.727    0           2           0.404    -       21148
       Mean        1.751      0.483    0           2           0.364    -       11849.667
       Std. Dev.   0.132      0.126    0           0           0.034    -       4419.253
WB     Item Count  32
       Minimum     0.170      0.339    0           1           0.158    -       2176
       Maximum     0.868      0.500    0           1           0.611    -       15743
       Mean        0.494      0.471    0           1           0.373    -       7988.219
       Std. Dev.   0.166      0.040    0           0           0.110    -       4789.984
Total  Item Count  494
       Minimum     0.161      0.178    0           1           0.015    0.021   900
       Maximum     2.403      0.976    0           3           0.662    0.857   45495
       Mean        0.699      0.464    0           1.121       0.364    0.477   9552.721
       Std. Dev.   0.356      0.091    0           0.452       0.114    0.161   6677.238
As shown, sample sizes ranged from 900 to 45,495 valid responses per item; the average number
of examinee responses per item was approximately 9,553. The means and standard deviations of
the item scores shown in the “Total” block should be interpreted cautiously, since both selected
response items (scored 0 or 1) and constructed response items are included, with the latter
having raw score points ranging from 0 to 1, 0 to 2, or 0 to 3 points. The intersections of rows
labeled “Minimum” and “Maximum” with columns labeled “Min. Score” and “Max. Score” specify the
appropriate range of scores for each item type.
Two item-test correlations are reported. The point biserial correlations (rpbis) are Pearson
product-moment correlations. The biserial correlations (rbis) are reported only for the MC and PC
selected-response (SR) items. The point biserial correlations are typically lower than the
biserial correlations. On average, the point biserial correlations are fairly consistent across
item types, with the cloze items demonstrating the highest degree of discrimination by a nominal
margin. Item-test score correlations of less than 0.10 should be investigated on an individual
basis.
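The relationship between the two correlations can be illustrated with the standard textbook conversion from a point biserial to a biserial correlation for a dichotomous item. The sketch below is an assumption about how the two statistics relate in general; the report does not state how CITAN or WinSteps computes them, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def biserial_from_point_biserial(r_pbis, p):
    """Convert a point biserial to a biserial correlation.

    p is the proportion of examinees answering the item correctly. The
    conversion assumes a normally distributed latent variable underlies
    the 0/1 item score.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    std_normal = NormalDist()
    # Height of the standard normal density at the point that cuts off p.
    ordinate = std_normal.pdf(std_normal.inv_cdf(p))
    return r_pbis * sqrt(p * (1.0 - p)) / ordinate
```

The multiplier sqrt(p(1-p)) divided by the ordinate is always greater than 1, which is why the biserial values exceed the point biserial values for the same item.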
Twenty-two items were flagged as having at least one distractor other than the correct-answer
key showing a positive correlation with the proficiency scores, with the majority being multiple-choice items. The data suggest that, for the majority of those items, the positive non-key
distractor correlation was only nominally above zero. Nonetheless, the data indicate that these items should be substantively reviewed.
In general, the classical item analysis results suggest that the 494 operational items are
performing reasonably well. Some of the specific distractors for items flagged might be reviewed to discover a possible item writing fix that could avoid the positive correlations with the total test
proficiency scores for distractors.
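The distractor flag described above can be sketched as a simple screen: for each non-key option, correlate an indicator of having chosen that option with the proficiency scores, and flag the option if the correlation is positive. The function names and data layout below are hypothetical and are not taken from CITAN.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either sequence has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def flag_distractors(choices, thetas, key):
    """Return {option: correlation} for non-key options that correlate
    positively with the proficiency scores.

    choices holds the option selected by each examinee; thetas holds the
    matching proficiency estimates.
    """
    flagged = {}
    for option in sorted(set(choices)):
        if option == key:
            continue
        chose_option = [1.0 if c == option else 0.0 for c in choices]
        r = pearson(chose_option, thetas)
        if r > 0:
            flagged[option] = r
    return flagged
```

A flag from a screen like this is only a prompt for substantive review; a correlation barely above zero may reflect sampling noise rather than a flawed distractor.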
IRT Analysis
A concurrent (all grade bands, K-12) local calibration of all 494 operational items was conducted
in WinSteps (Linacre, 2006). A total of 62,296 examinees were included in the calibration. Raw
score groupings and recoding of the ordered response categories were done within WinSteps for the
extended response (ER) items. ER items are normally scored on a 0 to 3 point scale. Some items
are scored on two different dimensions: grammatical aspects (g-scored) and illocutionary aspects
(i-scored). For the g-scored ER items, the recoding was Xi = {0,1,2,3} → {0,1,1,2}. For the
i-scored ER items, the recoding was Xi = {0,1,2,3} → {0,0,1,2}.
This recoding was determined as part of the calibration of the Spring 2006 data.
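The category recoding for the ER items can be written as two lookup tables, reading the two sets given in the text as the original scores and their recoded values. The names below are hypothetical; the report says only that the recoding was done within WinSteps.

```python
# Recoding of the 0-3 extended response scores, as listed in the text.
G_RECODE = {0: 0, 1: 1, 2: 1, 3: 2}   # g-scored (grammatical) items
I_RECODE = {0: 0, 1: 0, 2: 1, 3: 2}   # i-scored (illocutionary) items


def recode_er(score, scoring):
    """Recode a raw ER score; scoring is "g" or "i"."""
    table = {"g": G_RECODE, "i": I_RECODE}[scoring]
    return table[score]
```

Both recodings collapse four raw categories into three ordered categories, merging 1 with 2 for g-scored items and 0 with 1 for i-scored items.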
Results from this analysis indicate that the ER items are the most difficult, with positive mean
b-values. The short-answer (S2) items are easiest, and the remaining item types are moderately
difficult (mean b near zero). To interpret this mean difficulty, consider that the average
proficiency score for all 62,296 examinees is 0.86 on the θ metric. That translates to a
probability of approximately 0.70 of correctly answering an average item on the ELPA.
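The probability quoted above follows from the dichotomous Rasch model if the average item is taken to have difficulty b = 0 on the θ (logit) metric. That value of b is an assumption made here for illustration, consistent with the statement that mean b is near zero for most item types.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Average examinee (theta = 0.86) on an item of assumed average difficulty (b = 0).
p_average = rasch_probability(0.86, 0.0)
```

With these inputs the function returns a value close to 0.70, in line with the figure reported in the text.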
MS(Infit), a statistic derived during IRT analysis, denotes the fit of the response data to the Rasch model and is most sensitive where the density of examinee scores is highest. A second statistic, MS(Outfit), indicates the fit of the Rasch model to the data for examinees who are located farther from the item location (difficulty). Of the two measures, MS(Infit) is generally preferred because it tells us which items are potentially misfitting the calibration model for a majority of the examinees. In general, MS values in the range 0.7 to 1.3 are considered to indicate a good-fitting item.
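For readers who want to see how the two fit statistics differ, the following Python sketch computes them for a single dichotomous Rasch item using the standard definitions: the outfit mean square is the average squared standardized residual, and the infit mean square weights squared residuals by the model variance. The proficiencies, responses, and item difficulty are hypothetical, the sketch does not cover the polytomous ER items, and it is not a substitute for the WinSteps output.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one dichotomous item."""
    sq_resid = 0.0  # sum of squared residuals (x - p)^2
    info = 0.0      # sum of model variances p(1 - p)
    z2 = 0.0        # sum of squared standardized residuals
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        var = p * (1.0 - p)
        sq_resid += (x - p) ** 2
        info += var
        z2 += (x - p) ** 2 / var
    infit = sq_resid / info        # information-weighted mean square
    outfit = z2 / len(responses)   # unweighted mean square
    return infit, outfit

def is_good_fit(ms, low=0.7, high=1.3):
    """Apply the 0.7 to 1.3 range cited in the text."""
    return low <= ms <= high

# Hypothetical examinee proficiencies, scored responses, and item difficulty
thetas = [-1.0, 0.0, 0.5, 1.0, 2.0]
responses = [0, 1, 0, 1, 1]
infit, outfit = item_fit(responses, thetas, b=0.3)
print(infit, outfit, is_good_fit(infit), is_good_fit(outfit))
```

Because the infit statistic weights each residual by the model variance, responses from examinees located near the item difficulty dominate it, whereas the outfit statistic is more affected by unexpected responses from examinees far from the item.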
Review of these statistics reveals that, of the 458 items in score-group A, most fit quite well. This is encouraging, given that the concurrent calibration puts all examinees, K-12, taking the reading, listening, writing, and speaking items, on a common scale. Several extended response items exhibit a small degree of misfit, as do several short-answer items. The two fit indices were highly correlated. Even among the most extreme MS(Infit) and MS(Outfit) values, only six items exhibited MS(Infit) values outside the “good” range.
Discussion
In general, the operational items performed quite well. The modified classical item analysis flagged several items as potentially too easy or too difficult, as well as a number of selected response items with slightly positive correlations between a distractor and the total test score. The Rasch IRT analysis suggested a reasonable range of item difficulties, with the extended response items being the most difficult. The item misfit analysis highlighted six items as having MS(Infit) values outside the “good” range (0.7 to 1.3). All were extended response items, but none of the misfit was overly extreme.
APPENDIX I
ITEM DISTRIBUTION
FORMS AND FUNCTIONS
The conceptual framework for the Oregon ELP Assessment is based on research in
the fields of Education, Applied Linguistics, and the English Language Acquisition
process. After a great deal of research into current linguistic models, Oregon has
adopted a framework which focuses on two major components of language
competence: Grammatical Competence and Illocutionary Competence. Each of
these is further sub-divided, resulting in a total of five assessable components of
language competence:
Grammatical Competence (Forms of Language)
1. Morphology
2. Vocabulary
3. Syntax
Illocutionary Competence (Functions of Language)
4. Ideational [replaces original's 'Representative']
5. Manipulative
The tables below show expected item distributions. Distribution of items among
sub-domains is fixed so that each has an equal number of items. This is because the design
must guarantee a usable sub-score for each sub-domain required by Title III of NCLB.
Grade Band K-1 (Form A—Beginning/Easy)
Forms - Grammatical Competence: Morphology, Vocabulary, Syntax
Functions - Illocutionary Competence: Ideational, Manipulative
Subdomain    Morphology  Vocabulary  Syntax  Ideational  Manipulative  TOTAL ITEMS
Listening             -           8       -          10             2           20
Reading               -          19       -           2             -           21
Speaking              1           -       8           -             -            9
Writing               -          16       -           -             -           16
Total Items           1          43       8          12             2           66
Grade Band K-1 (Form B—Medium)
Forms - Grammatical Competence: Morphology, Vocabulary, Syntax
Functions - Illocutionary Competence: Ideational, Manipulative
Subdomain    Morphology  Vocabulary  Syntax  Ideational  Manipulative  TOTAL ITEMS
Listening             -           8       -          10             2           20
Reading               -          22       -           3             -           25
Speaking              -           -      11           -             -           11
Writing               2          12       -           -             -           14
Total Items           2          42      11          13             2           70
Grade Band K-1 (Form C—Hard)
Forms - Grammatical Competence: Morphology, Vocabulary, Syntax
Functions - Illocutionary Competence: Ideational, Manipulative
Subdomain    Morphology  Vocabulary  Syntax  Ideational  Manipulative  TOTAL ITEMS
Listening             -           9       -           9             2           20
Reading               3          21       -           2             -           26
Speaking              -           -      10           -             -           10
Writing               1          14       -           -             -           15
Total Items           4          44      10          11             2           71
Grade Band 2-3 (Form A—Beginning/Easy)
Forms - Grammatical Competence: Morphology, Vocabulary, Syntax
Functions - Illocutionary Competence: Ideational, Manipulative
Subdomain    Morphology  Vocabulary  Syntax  Ideational  Manipulative  TOTAL ITEMS
Listening             -           7       -          14             -           21
Reading               3          13       -           5             -           21
Speaking              -           -       8           -             -            8
Writing               6           4       -          10             -           20
Total Items           9          24       8          29             -           70
Grade Band 2-3 (Form B—Medium)
Forms - Grammatical Competence: Morphology, Vocabulary, Syntax
Functions - Illocutionary Competence: Ideational, Manipulative
Subdomain    Morphology  Vocabulary  Syntax  Ideational  Manipulative  TOTAL ITEMS
Listening             -           7       -          14             -           21
Reading               4          11       -           9             -           24
Speaking              -           -      12           1             -           13
Writing               6           3       -           9             -           18
Total Items          10          21      12          33             -           76
Grade Band 2-3 (Form C—Hard)
Forms - Grammatical Competence: Morphology, Vocabulary, Syntax
Functions - Illocutionary Competence: Ideational, Manipulative
Subdomain    Morphology  Vocabulary  Syntax  Ideational  Manipulative  TOTAL ITEMS
Listening             -           9       -          13             -           22
Reading               3          12       -           6             -           21
Speaking              -           -      12           1             -           13
Writing               6           3       -          12             -           21
Total Items           9          24      12          32             -           77
Grade Band 4-5 (Form A—Beginning/Easy)
Forms - Grammatical Competence: Morphology, Vocabulary, Syntax
Functions - Illocutionary Competence: Ideational, Manipulative
Subdomain    Morphology  Vocabulary  Syntax  Ideational  Manipulative  TOTAL ITEMS
Listening             -           5       -          18             -           23
Reading               -           4       -          17             -           21
Speaking              -           -       6           -             -            6
Writing               7           1       -          13             -           21
Total Items           7          10       6          48             -           71
Grade Band 4-5 (Form B—Medium)
Forms - Grammatical Competence: Morphology, Vocabulary, Syntax
Functions - Illocutionary Competence: Ideational, Manipulative
Subdomain    Morphology  Vocabulary  Syntax  Ideational  Manipulative  TOTAL ITEMS
Listening             -           6       -          17             -           23
Reading               -           6       -          14             -           20
Speaking              -           -       8           2             -           10
Writing               9           -       -          13             -           22
Total Items           9          12       8          46             -           75
Grade Band 4-5 (Form C—Hard)
Forms - Grammatical Competence: Morphology, Vocabulary, Syntax
Functions - Illocutionary Competence: Ideational, Manipulative
Subdomain    Morphology  Vocabulary  Syntax  Ideational  Manipulative  TOTAL ITEMS
Listening             -           3       -          17             -           20
Reading               -           6       -          18             -           24
Speaking              -           -       8           2             -           10
Writing               7           -       -          16             -           23
Total Items           7           9       8          53             -           77
Grade Band 6-8 (Form A—Beginning/Easy)
Forms - Grammatical Competence: Morphology, Vocabulary, Syntax
Functions - Illocutionary Competence: Ideational, Manipulative
Subdomain    Morphology  Vocabulary  Syntax  Ideational  Manipulative  TOTAL ITEMS
Listening             -          10       -          12             -           22
Reading               -           6       -          13             -           19
Speaking              -           -       9           -             -            9
Writing              11           1       -          10             -           22
Total Items          11          17       9          35             -           72
Grade Band 6-8 (Form B—Medium)
Forms - Grammatical Competence: Morphology, Vocabulary, Syntax
Functions - Illocutionary Competence: Ideational, Manipulative
Subdomain    Morphology  Vocabulary  Syntax  Ideational  Manipulative  TOTAL ITEMS
Listening             -           9       -          12             -           21
Reading               -           9       -          12             -           21
Speaking              -           -      10           2             -           12
Writing               9           -       -          14             -           23
Total Items           9          18      10          40             -           77
Grade Band 6-8 (Form C—Hard)
Forms - Grammatical Competence: Morphology, Vocabulary, Syntax
Functions - Illocutionary Competence: Ideational, Manipulative
Subdomain    Morphology  Vocabulary  Syntax  Ideational  Manipulative  TOTAL ITEMS
Listening             -           8       -          12             -           20
Reading               -           9       -          15             -           24
Speaking              -           -      10           2             -           12
Writing               5           1       -          16             -           22
Total Items           5          18      10          45             -           78
Grade Band 9-12 (Form A—Beginning/Easy)
Forms - Grammatical Competence: Morphology, Vocabulary, Syntax
Functions - Illocutionary Competence: Ideational, Manipulative
Subdomain    Morphology  Vocabulary  Syntax  Ideational  Manipulative  TOTAL ITEMS
Listening             -           6       -          15             1           22
Reading               -           4       -          18             -           22
Speaking              -           -       6           -             -            6
Writing               4           3       -          15             -           22
Total Items           4          13       6          48             1           72
Grade Band 9-12 (Form B—Medium)
Forms - Grammatical Competence: Morphology, Vocabulary, Syntax
Functions - Illocutionary Competence: Ideational, Manipulative
Subdomain    Morphology  Vocabulary  Syntax  Ideational  Manipulative  TOTAL ITEMS
Listening             -           3       -          18             1           22
Reading               -           6       -          16             -           22
Speaking              -           -       8           2             -           10
Writing               3           2       -          17             -           22
Total Items           3          11       8          53             1           76
Grade Band 9-12 (Form C—Hard)
Forms - Grammatical Competence: Morphology, Vocabulary, Syntax
Functions - Illocutionary Competence: Ideational, Manipulative
Subdomain    Morphology  Vocabulary  Syntax  Ideational  Manipulative  TOTAL ITEMS
Listening             -           3       -          17             1           21
Reading               -           4       -          17             -           21
Speaking              -           -       9           2             -           11
Writing               4           1       -          18             -           23
Total Items           4           8       9          54             1           76