
Last updated on November 16, 2009

Oregon Department of Education

2007–2008 ELPA Validity and Reliability Oregon’s Statewide Assessment System

Annual Report Volume 10


It is the policy of the State Board of Education and a priority of the Oregon Department of Education that there will be no discrimination or harassment on the grounds of race, color, sex, marital status, religion, national origin, age, or handicap in any educational programs, activities, or employment. Persons having questions about equal opportunity and nondiscrimination should contact the state superintendent of public instruction at the Oregon Department of Education.

Oregon Department of Education
Office of Assessment and Information Services
255 Capitol Street NE
Salem, OR 97310
503-947-5600
http://www.ode.state.or.us/

Susan Castillo, State Superintendent of Public Instruction
Doug Kosty, Assistant Superintendent
Tony Alpert, Director, Assessment and Evaluation
Kathleen Vanderwall, Manager, Test Development
Stephen Slater, Manager, Psychometrics and Validity
Melinda Bessner, Manager, Analysis and Reporting
Ken Hermens, Language Arts Assessment Specialist
Leslie Phillips, Science and Social Sciences Assessment Specialist
Jim Leigh, Mathematics Assessment Specialist
Guillaume Gendre, ELPA Assessment Specialist
Cindy Barrick, Research Analyst
Tom Tinkler, Psychometrics Specialist
Saleem Ahmad, Research Analyst
Sheila Somerville, Electronic Publishing Specialist


This technical report is one of a series that describes the development of Oregon’s Statewide Assessment System. The complete set of volumes provides comprehensive documentation of the development, procedures, technical adequacy, and results of the system:

Volume 1: 2007–2008 Annual Technical Report
Volume 2: Test Development
Volume 3: Standard Setting
Volume 4: Reliability and Validity
Volume 5: Test Administration
Volume 6: Score Interpretation Guide
Volume 7: Alternate Assessment, Program Description
Volume 8: Alternate Assessment, 2005-06 Statistical Summary
Volume 9: ELPA Test Development
Volume 10: ELPA Validity and Reliability

All volumes can be found at http://www.ode.state.or.us/search/page/?id=787.


Section 1.0 – Overview - English Language Proficiency Assessment (ELPA)
1.1 – Purpose of ELPA
1.2 – Oregon Administrative Rule #581-023-0100
1.3 - Oregon’s English Language Proficiency (ELP) Standards
1.4 – Defining Academic English
1.5 - Academic Contexts
1.6 – Assessment Features

Section 2.0 - Introduction to Technical Adequacy
2.1 - Overview
2.2 Computer Adaptive Administration
2.3 - Assessment Scaling
2.4 - Field Testing
2.5 – Annual Embedded Field-Testing Method

Section 3.0 – Content Validity
3.1 - Rigorous Content Standards
Section 3.2 – Consensus Driven Test Development
3.2.1 – Key Decisions
3.2.2 – Effective Test Administration and Design
3.2.3 – Research Based Conceptual Framework - Forms and Functions
3.2.4 – Technology Matrix
3.3 – Consensus Driven Item Development
3.3.1 - Life of an ELPA Item
3.3.2 - Principal Item Types; Relation to Domains
3.3.3 – Distribution Across Grade Bands
3.3.4 – Order of Delivery
3.3.5 – Item Type Explanations
3.4 - Test Specifications
3.4.1 – Relation to Validity
3.4.2 – Alignment History
3.4.3 – Ensuring Item Alignment with the Construct and Standards

Section 4.0 Concurrent Validity
4.1 - Explanation
4.2 – Description of Consistency

Section 5.0 – Reliability
5.1 - Standard Error of Measure
5.2 - Item Analysis Methods for the ELPA
5.2.1 – Purpose of Item Analysis
5.2.2 – Summary of Item Analysis Results
5.3 - Strand Reliability
5.3.1 – Reliability Through Number of Items
5.3.2 – Reliability Through Standard Setting and Precision at the Cut Scores

Section 6.0 - Fairness and Accessibility
6.1 – Test Administration
6.1.1 - Testing Requirements to Produce Valid Test Results
6.1.2 - Security of the Test Environment
6.1.3 - Testing Improprieties
6.1.4 - Responding to Student Questions
6.1.5 – Testing Irregularities
6.2 – Sensitivity Panel Review
6.3 – Differential Item Analysis


Volume 10: VALIDITY AND RELIABILITY

Section 1.0 – Overview - English Language Proficiency Assessment (ELPA)

1.1 – Purpose of ELPA

The purpose of Oregon’s English Language Proficiency Assessment (ELPA) is to assess academic English ability in reading, writing, listening, speaking, and comprehension for English Language Learners (ELLs) enrolled in Oregon public schools in grades K-12.

As part of the No Child Left Behind Act (NCLB) enacted in 2001, states must annually measure and report progress toward and attainment of English language proficiency by ELLs enrolled in public schools. Under NCLB, states must develop English Language Proficiency (ELP) content standards linked to content standards including those for English Language Arts. Oregon’s English Language Proficiency test is aligned to the forms and functions of the Oregon ELP content standards and describes the English proficiency of students based on six domains: Total Proficiency, listening, speaking, reading, writing, and comprehension. Comprehension is a combination of the reading and listening measures. Total Proficiency is a combination of listening, speaking, reading, and writing.

Oregon’s ELP assessment is designed to satisfy the provisions of Title III of NCLB. Scores are to be used for:

• Providing an annual English language proficiency score and level for each student;

• Reporting annual measures of speaking, reading, listening, writing and comprehension for each student;

• Reporting Annual Measurable Achievement Objectives (AMAOs) biennially to the federal government. Because ELLs enter school systems at different ages with different degrees of English proficiency, AMAOs can be based on cohorts, groups of students entering at a common age and proficiency level.

    AMAO #1: The number and percent of students making progress toward English proficiency

    AMAO #2: The number and percent of students attaining English proficiency at the end of each school year
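The two AMAO counts can be illustrated with a short sketch. The record layout, the field names, the rule that "making progress" means reaching a higher proficiency level than in the prior year, and the level treated as "attaining proficiency" are all assumptions made for illustration; this section of the report does not define them.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    prior_level: int    # hypothetical: proficiency level last year (1 = beginning)
    current_level: int  # hypothetical: proficiency level this year


def count_and_percent(records, predicate):
    """Return (number, percent) of records that satisfy the predicate."""
    n = sum(1 for r in records if predicate(r))
    pct = 100.0 * n / len(records) if records else 0.0
    return n, pct


PROFICIENT_LEVEL = 5  # assumption: level counted as attaining proficiency

records = [
    StudentRecord(1, 2),
    StudentRecord(2, 2),
    StudentRecord(4, 5),
    StudentRecord(3, 4),
]

# AMAO #1 (assumed rule): students whose level rose since the prior year
amao1 = count_and_percent(records, lambda r: r.current_level > r.prior_level)
# AMAO #2 (assumed rule): students at or above the proficient level
amao2 = count_and_percent(records, lambda r: r.current_level >= PROFICIENT_LEVEL)
print(amao1, amao2)
```

A cohort-based report would apply the same function to each subset of students who entered at a common age and proficiency level.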

ELPA scores will not be used as the sole criterion for exiting students from English development programs. Each district will continue to construct its own criteria and procedures for ending services to students as they become fully proficient. ELP assessment results may inform exit decisions as part of a set of evidence including teacher recommendation, grades and other information supporting exit decisions.


1.2 – Oregon Administrative Rule #581-023-0100

Sections Relevant to ELPA (see Appendix A for entire rule)

(d) "Language Minority Student" means:

(A) Individuals whose native language is not English; or

(B) Individuals who come from environments where a language other than English is dominant; or

(C) Individuals who are Native Americans or Native Alaskans and who come from environments where a language other than English has had a significant impact on their level of English proficiency.

(4) Pursuant to ORS 327.013(7)(a)(B), the resident school districts shall receive an additional .5 times the ADM of all eligible students enrolled in an English as a Second Language program. To be eligible, a student must be in the ADM of the school district in grades K through 12 and be a language minority student attending English as a Second Language (ESL) classes in a program which meets basic U.S. Department of Education, Office of Civil Rights guidelines. These guidelines provide for:

(a) A systematic procedure for identifying students who may need ESL classes, and for assessing their language acquisition and academic needs;

(b) A planned program for ESL and academic development, using instructional methodologies recognized as effective with language minority students;

(c) Instruction by credentialed staff trained in instructional strategies that are effective with second language learners and language minority students, or by tutors supervised by credentialed staff trained in instructional strategies that are effective with second language learners and language minority students;

(d) Adequate equipment and instructional materials;

(e) Evaluation of program effectiveness in preparing ESL students for academic success in the mainstream curriculum.

1.3 - Oregon’s English Language Proficiency (ELP) Standards

The Oregon Department of Education, in partnership with educators throughout the state, developed Oregon’s English Language Proficiency Standards. These standards describe progressive levels of competence in English acquisition for five proficiency levels: beginning, early intermediate, intermediate, early advanced and advanced. English language proficiency levels set clear benchmarks of progress that reflect differences for students entering school at various grade levels.

As specified in Title III of NCLB, ELP content standards are designed to supplement the existing ELA academic content standards to facilitate students’ transitioning into regular education content classes. ELP Standards were designed to guide language acquisition to allow English Language Learners to successfully participate in regular education classes. ELP assessments measure ELP standards, not English Language Arts (ELA) standards. This is an important distinction, as ELP content validity is based on the degree to which tests reflect ELP content standards, which, although designed to supplement the ELA standards, are quite different in structure and meaning. ELLs are required to take ELP assessments in addition to ELA and other content assessments. Therefore, the domain of ELP assessments differs from English Language Arts.

1.4 – Defining Academic English

For the purpose of this test, Academic English is defined broadly as the English necessary to function and communicate successfully in the United States’ school system. It includes the language of interaction between students and teachers (How are you?; Would you help me please?), vocabulary related to the school and classroom objects (blackboard, pencil, dictionary, library), direction of student behavior (line-up, go to the cafeteria, recess ends at 12:30), explicit content language (osmosis, square root, quarter note), and reading passages connected to content standards and responding to questions based on the reading passage (The first flying craft constructed by the Wright brothers was a glider, which they flew like a kite. In the story the word “constructed” means the same as built, bought, crashed, found.)

Regardless of specific language types found throughout the test, an important consideration in the creation of ELPA concerns the differences inherent in testing academic language as opposed to prior knowledge of a content area (see Academic Contexts below).

1.5 - Academic Contexts

Because language use is always couched within a context, ELPA was designed to include a number of different school-related situations and contexts, such as the following:

• Math
• Science
• Social studies
• Language arts
• Supplementary (art, music, drama, sports, recess, library, cafeteria)

However, this test is constructed such that language skills are assessed independently of any potential knowledge of subject matter, or lack thereof. The inclusion of context-based items does not assume that the student possesses prior knowledge of explicit content for these areas. Contexts differ from content, and should not be equated. Thus, a dialogue between two students may take place in the science lab (context) and discuss the class’s assignment (content), but the language skill being tested might be verb conjugation, not science content (e.g. Yesterday we learned how to use the microscope; the remaining foils might be learn, learning, learns.) An ELPA item set within a science context will not require students to have prior knowledge of, for example, the various parts of a microscope, or the parts of a cell, in order to successfully complete the item. That is, ELPA is not designed to assess content of specific subjects; rather, test items are situated within, and draw upon the language of, familiar school-related contexts.


1.6 – Assessment Features

The State of Oregon ELP Assessment has the following features:

• Web-based adaptive
• Research-based and documented
• Aligned to the Oregon ELP (English Language Proficiency) standards
• Aligned to the Oregon ELA (English Language Arts) content standards
• Valid and reliable
• Conducted in English
• Tests the following grade bands: K-1, 2-3, 4-5, 6-8 and 9-12 and is required of all ELLs enrolled in these grades
• Delivered within an assessment window
• Produces a score and level for overall academic English proficiency. Cut points are established on the overall English proficiency scale.
• Produces sub-scores in four sub-domains: listening, speaking, writing, and reading
• Reports a measure of comprehension as a combination of listening and reading
• Demonstrates growth in English language acquisition skills over time.
• Applicable to students of any language or cultural background
• Supports Title I accountability and Title III program evaluation in local school districts.

Section 2.0 - Introduction to Technical Adequacy

2.1 - Overview

The Oregon English Language Proficiency Assessment (ELPA) is a cross-grade (Kindergarten through 12th Grade), multi-domain assessment covering reading, listening, writing, and speaking. (Comprehension is derived from reading and listening scores; a Total Proficiency score is derived from the first four domains.) The assessment employs multiple item types, including multiple-choice (MC) items; picture-click (PC) items; cloze (CZ) items; elicited information (EI) items; short-answer (SA2) items; word-builder (WB) items; and extended response (ER) items.

For purposes of scoring and item analysis, items can generally be classified into one of two categories: selected-response (SR) or constructed-response (CR). SR items typically provide multiple response options and require the examinee to select one of the options (footnote 1). CR items essentially allow free response, and the response performance is scored by some established rubric. Rubrics can be dichotomous (i.e., correct=1, incorrect=0) or polytomous, with scores ranging from 0 to 3 points.


2.2 Computer Adaptive Administration

The ELPA is administered as a two-stage computer-adaptive multistage (ca-MST) test (Luecht & Nungester, 1998, 2000; Luecht, 2004). This type of test first presents a fixed-length locator block. If an examinee scores poorly on the locator block, (s)he is routed to an easier block (testlet) of items. If an examinee performs extremely well on the locator block, (s)he is routed to a harder block of items; otherwise, the examinee is administered a moderate-difficulty block. Figure 1 presents the generic ca-MST design.

Target Item     | Locator        | Block A        | Block B        | Block C
Difficulty      |                | (Easier)       | (Moderate)     | (Difficult)
                |  L   S   R   W |  L   S   R   W |  L   S   R   W |  L   S   R   W
-2.0            |  1   0   2   1 |  3   5   3   2 |  0   0   0   0 |  0   0   0   0
-1.5            |  2   0   2   2 |  2   5   2   3 |  3   5   3   2 |  0   0   0   0
-1.0            |  2   0   1   2 |  3   5   3   2 |  2   5   2   3 |  3   5   3   2
+1.0            |  2   0   1   2 |  2   5   2   3 |  3   5   3   2 |  2   5   2   3
+1.5            |  2   0   2   2 |  0   0   0   0 |  2   5   2   3 |  3   5   3   2
+2.0            |  1   0   2   1 |  0   0   0   0 |  0   0   0   0 |  2   5   2   3
Column totals   | 10   0  10  10 | 10  20  10  10 | 10  20  10  10 | 10  20  10  10

Figure 1. ca-MST Layout for the Oregon ELPA
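The routing rule described for the two-stage design can be sketched as follows. This is a minimal illustration, not the operational algorithm: the function name, the use of a number-correct locator score, and the two cut scores are assumptions, since the report does not state the routing thresholds or whether routing uses raw scores or ability estimates.

```python
def route_second_stage(locator_score, low_cut, high_cut):
    """Choose the stage-two block from a number-correct locator score."""
    if locator_score < low_cut:
        return "A"  # poor locator performance: easier block
    if locator_score > high_cut:
        return "C"  # very strong locator performance: harder block
    return "B"      # otherwise: moderate-difficulty block


LOCATOR_ITEMS = 30          # locator block length shown in Figure 1
LOW_CUT, HIGH_CUT = 12, 22  # hypothetical cut scores, for illustration only

for score in (5, 17, 28):
    print(score, route_second_stage(score, LOW_CUT, HIGH_CUT))
```

Scores that fall exactly on a cut are sent to the moderate block here; that tie-breaking choice is also an assumption.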

Footnote 1: Multiple selections can be accommodated under an SR item format; however, no multiple-selection SR (mSR) items are included on the ELPA.

Reading from the left, the “Target Item Difficulty” column describes the relative difficulty of the items, where a minus sign indicates easier items and a plus sign indicates harder items. The column headers, L, S, R, and W, denote the domain (listening, speaking, reading, and writing). The locator blocks therefore contain 30 items, followed by a block of up to 50 easy (Block A), moderate (Block B), or difficult (Block C) items. This type of ca-MST design is statistically more efficient than a fixed test form because it tailors the difficulty of the item block or “testlet” to the examinee’s apparent ability, resulting in more accurate scores (Luecht & Nungester, 1998). Item response theory (IRT; Lord, 1980; Hambleton & Swaminathan, 1985) is used to calibrate all items to a common scale, denoted θ. Although examinees may be administered items of differing difficulty under the ca-MST design, IRT scoring can put all examinees on the same measurement scale.
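How IRT scoring places examinees who saw different blocks on one scale can be illustrated with a small sketch. It assumes the one-parameter (Rasch) model for dichotomous items, which this section of the report does not confirm; the item difficulties and responses are invented, and polytomous items would need an extended model.

```python
import math


def p_correct(theta, b):
    """Rasch model: probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(difficulties, responses, iterations=25):
    """Maximum-likelihood ability estimate by Newton-Raphson (dichotomous items).

    Requires a mixed response pattern (not all correct or all incorrect).
    """
    theta = 0.0
    for _ in range(iterations):
        probs = [p_correct(theta, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        theta += gradient / information
    return theta


# Hypothetical calibrated difficulties for an easier and a harder block
easy_block = [-2.0, -1.5, -1.0, 1.0]
hard_block = [-1.0, 1.0, 1.5, 2.0]
responses = [1, 1, 1, 0]  # three of four items correct on either block

theta_easy = estimate_theta(easy_block, responses)
theta_hard = estimate_theta(hard_block, responses)
print(round(theta_easy, 2), round(theta_hard, 2))
```

The same raw score yields a higher θ estimate on the harder block, because the estimate accounts for the calibrated difficulty of the items actually administered.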

The ca-MST design described in Figure 1 reflects only the approximate item counts for the operational (i.e., scored) items within domains and blocks. Individual ca-MST “panels” (i.e., the combination of a locator block and the three possible second-stage blocks for a particular grade level) may vary slightly in item composition, given the availability of items in the ELPA item bank. The stage-two blocks also have “pretest” slots to try out new ELPA items. The new, grade-level-appropriate items are randomly seeded into the pretest slots for purposes of gathering data solely to determine the psychometric and statistical quality of the pretest items. Pretest items do not appear on the locator blocks and do not enter into scoring for any student. The pretest items are subsequently added to the ELPA item bank for possible inclusion on future test forms.

A total of 496 operational items and 218 pretest items were administered in Spring 2008 across the five ELPA grade levels (K-1, 2-3, 4-5, 6-8, and 9-12). A cross tabulation of operational item counts by grade-level block is presented in the CART Report (see http://www.ode.state.or.us/search/page/?id=1561 – Cart Technical Report). This listing provides exact item counts on the diagonal as well as shared-item counts across blocks. Because the pretest items are randomly seeded onto the operational ca-MST forms, it is not possible to specifically tie a pretest item to any ca-MST item block (module). Therefore, the 218 pretest items are not reflected in those counts.

2.3 - Assessment Scaling

Scaling decisions are based on the assumption that the four sub-domains (listening, speaking, writing, and reading) work together to comprise a single English proficiency scale. The scale is presumed to be unidimensional, although this assumption may be revisited if data reveal pronounced dimensionality. The scale is a vertically linked longitudinal scale, so that progress toward English proficiency can be measured as required by Title III annual measurable achievement objectives. The comprehension measure is derived from listening and reading scores. The method for this is a mathematical formula, approved by a policy-making group.

There is an assumption that students at the same proficiency levels in adjacent grades share substantial linguistic characteristics, differing primarily in developmental and social factors. It is also assumed that transitional levels at upper grade bands will be higher than those at lower bands. It is NOT assumed that students will grow one level each year. Language acquisition experts have long agreed that younger students master linguistic skill faster than older ones. They may disagree on the reasons for this, but everyone agrees on the phenomenon. Proficiency levels represent stages of acquisition that younger students, in general, work through faster than older ones. Consequently, language proficiency levels across grades may look different than those for other content areas where it is assumed that achievement levels across grades are progressively higher on vertical scales. In ELP assessment, vertical linking blocks across grade bands may use items of similar difficulty for all or most bands. Given these considerations, linking blocks for the operational tests contain items from throughout the difficulty continuum.
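One common way to use linking items to place two grade bands on a single scale is a mean shift of item difficulties. The sketch below illustrates that general idea only; the report does not state which linking method was used for the ELPA, and the difficulty values, band labels, and function name are hypothetical.

```python
from statistics import mean


def mean_shift(common_lower, common_upper):
    """Constant that moves upper-band difficulties onto the lower-band scale."""
    return mean(common_lower) - mean(common_upper)


# Hypothetical difficulties of the same linking items, calibrated separately
# within two adjacent grade bands.
linking_in_2_3 = [-0.5, 0.25, 1.0, 1.75]
linking_in_4_5 = [-1.5, -0.75, 0.0, 0.75]

shift = mean_shift(linking_in_2_3, linking_in_4_5)

# Apply the shift to other items calibrated only in the upper band.
band_4_5_items = [-1.0, 0.5, 2.0]
on_common_scale = [b + shift for b in band_4_5_items]
print(shift, on_common_scale)
```

Drawing the linking items from throughout the difficulty continuum, as the text describes, keeps the estimated shift from depending on only one region of the scale.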

2.4 - Field Testing

The original field test was conducted with a minimum of 6000 students and provided preliminary difficulty levels for items that fed the Spring 2006 baseline test. Each student took four blocks of 20 items each. Each block represented only two sub-domains, in varying combinations (reading and speaking; speaking and writing; writing and listening; etc.).

In addition to providing within-grade scaling and item calibration, fall field testing allowed a dimensionality study to be conducted (see Appendix B). We wanted to know whether English Language Proficiency is a single skill, resting on acquisition of functions and forms, or a combination of several skills with student responses more dependent on sub-domain platforms than overall English proficiency.

The winter/early spring ’06 field test was for linking item difficulty to form the vertical scale. Fall ’05 and Winter ’06 field tests provided scaled items for the Spring 2006 Baseline ELP assessment. The design for this assessment appears below.

GENERIC LOCATOR TEST - Operational 2006

ELP Level       | LOCATOR Block  | Block A        | Block B        | Block C
                |  L   R   S   W |  L   R   S   W |  L   R   S   W |  L   R   S   W
1               |  1   2   0   1 |  3   3   5   2 |  -   -   -   - |  -   -   -   -
2               |  2   2   0   2 |  2   2   5   3 |  3   3   5   2 |  -   -   -   -
3               |  2   1   0   2 |  3   3   5   2 |  3   3   5   2 |  3   3   5   2   <= Core Block
4               |  2   1   0   2 |  2   2   5   3 |  2   2   5   3 |  2   2   5   3
5               |  2   2   0   2 |  -   -   -   - |  2   2   5   3 |  2   2   5   3
6               |  1   2   0   1 |  -   -   -   - |  -   -   -   - |  3   3   5   2
Domain total    | 10  10   0  10 | 10  10  20  10 | 10  10  20  10 | 10  10  20  10
per block       |                |                |                |
Block total     | 30             | 50             | 50             | 50

Numbers represent numbers of items, not points.

Locator block contains all MC items, representing the full range of difficulty.

The SAME core set of 25 items repeats in blocks A, B, and C. These items are at intermediate difficulty.

Grade bands K-1 & 2-3

• Block A contains NO SA2s or ERs (Speaking or Writing)
• Blocks B & C contain 2 Speaking SA2s but NO Writing SA2s
• All blocks contain NO ERs (Speaking or Writing)

Grade bands 4-12

• Block A contains 2 Speaking SA2s and NO Writing SA2s
• Block A contains no ERs (Speaking or Writing)
• Blocks B and C contain 2 Writing SA2s and 2 Writing ERs (these are the exact same items in BOTH blocks)
• Blocks B and C contain 2 Speaking SA2s and 2 Speaking ERs (these are the exact same items in BOTH blocks)

Unique items

 30  Locator
 25  Core
 25  Unique low
 25  Unique mid
 25  Unique high
130  TOTAL Unique Items
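The item counts in the 2006 layout above can be checked with a few lines of arithmetic. The dictionary below encodes one reading of that layout in (L, R, S, W) order; the assignment of each row group to Blocks A, B, and C, and the treatment of ELP levels 3 and 4 as the shared 25-item core, are inferences from the column totals rather than statements in the report.

```python
# Items per ELP level in each block, in (L, R, S, W) order.
design = {
    "Locator": {1: (1, 2, 0, 1), 2: (2, 2, 0, 2), 3: (2, 1, 0, 2),
                4: (2, 1, 0, 2), 5: (2, 2, 0, 2), 6: (1, 2, 0, 1)},
    "A": {1: (3, 3, 5, 2), 2: (2, 2, 5, 3), 3: (3, 3, 5, 2), 4: (2, 2, 5, 3)},
    "B": {2: (3, 3, 5, 2), 3: (3, 3, 5, 2), 4: (2, 2, 5, 3), 5: (2, 2, 5, 3)},
    "C": {3: (3, 3, 5, 2), 4: (2, 2, 5, 3), 5: (2, 2, 5, 3), 6: (3, 3, 5, 2)},
}


def domain_totals(block):
    """Sum items per domain (L, R, S, W) across the ELP levels of one block."""
    return tuple(sum(row[i] for row in block.values()) for i in range(4))


block_totals = {name: sum(domain_totals(rows)) for name, rows in design.items()}

# Assumed core: the level 3 and level 4 rows, identical in Blocks A, B, and C.
core = sum(design["A"][3]) + sum(design["A"][4])

# Locator items + shared core + items unique to each second-stage block.
unique_items = (block_totals["Locator"] + core
                + sum(block_totals[b] - core for b in "ABC"))

print(block_totals, core, unique_items)
```

Under this reading, the totals reproduce the figures in the layout: a 30-item locator, 50-item second-stage blocks, a 25-item core, and 130 unique items.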


2.5 – Annual Embedded Field-Testing Method

Each year, field test items are loaded into a pool. For each test administration, items are randomly selected from this pool, which gives the items equal coverage and yields better data for analysis.

This plan is realized annually in the 180 embedded items and the 15 field test forms loaded into the test delivery system. Selection of these 180 items is a direct result of the review and approval by the Content and Sensitivity Review.

Field-testing requires broad exposure of the embedded items across multiple districts and schools. At a minimum, each item receives 600 exposures prior to any analyses. Past experience has shown that the procedure used for random exposure of items results in approximately 3500 exposures for each item during the course of the testing window.
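The effect of random selection on item exposure can be illustrated with a small simulation. Only the pool size of 180 and the 600-exposure minimum come from the text; the number of pretest slots per test, the number of tests, and the function name are hypothetical, so the resulting counts are not expected to match the figures reported above.

```python
import random
from collections import Counter


def simulate_exposures(pool_size, slots_per_test, n_tests, seed=0):
    """Count how often each pool item is drawn when every test administration
    receives a simple random sample (without replacement) of pool items."""
    rng = random.Random(seed)
    exposures = Counter()
    for _ in range(n_tests):
        exposures.update(rng.sample(range(pool_size), slots_per_test))
    return exposures


POOL_SIZE = 180       # embedded field test items, from the text
SLOTS_PER_TEST = 10   # hypothetical number of pretest slots per test
N_TESTS = 20000       # hypothetical number of test administrations
MIN_EXPOSURES = 600   # minimum exposures before analysis, from the text

exposures = simulate_exposures(POOL_SIZE, SLOTS_PER_TEST, N_TESTS)
expected = N_TESTS * SLOTS_PER_TEST / POOL_SIZE

print(round(expected), min(exposures.values()), max(exposures.values()))
print(all(count >= MIN_EXPOSURES for count in exposures.values()))
```

With uniform random selection, the expected exposure per item is the number of tests times the slots per test, divided by the pool size, and individual items vary only modestly around that value.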

Section 3.0 – Content Validity

Content validity is the degree to which an assessment measures the knowledge and skills

it was designed to measure. It is a consensus driven process, typically determined by

expert judgment.

Evidence of content validity includes the following:

3.1 - Rigorous content standards identifying what students should know and be able to do, developed and revised with comprehensive review by Oregon educators, parents, and other citizens.

3.2 and 3.3 - A consensus-driven test and item development process, using panels of educators from around the state to make judgments about the content relevance and representativeness of potential items and tasks, which helps ensure test item faithfulness.

3.4 - Test specifications that provide a clear link between the test content and the content standards and their corresponding performance levels; ongoing studies to evaluate and increase the extent to which instruction, assessments, and the ELP Standards are aligned.

3.1 - Rigorous Content Standards

Content Standards describe what students in Oregon should know and be able to do. The ELP Standards delineate the proficiency levels required to move through the levels of English-language development (see Appendix C - Stages of Language Acquisition – Social Dimension, and Appendix D – Acquisition of Language Functions and Forms). They are designed to move all students, regardless of their instructional program, into the mainstream English-language arts curriculum. The levels of developing proficiency in a second language have been well documented through research. The ELP Standards were designed around these levels to provide teachers in all types of programs clear benchmarks of progress. The standards provide different academic pathways that reflect critical developmental differences for students who enter school at various grade levels. The major benefit of adopting ELP Standards is to provide criteria that can be used to document LEP students’ progress or lack of progress in learning English.

A committee composed of practitioners and experts in English language development (ELD) and assessment developed the English Language Proficiency (ELP) Standards. The standards were reviewed by teachers throughout Oregon, and the draft standards were posted on the ODE website for public comment. The standards were presented as an informational item to the State Board of Education (SBE) during its October 2003 meeting, with the understanding that the document would undergo some modifications and additions to better align the ELP Standards with developmental proficiency levels, with the Oregon English-Language Arts Content Standards that were adopted by the SBE in January 2002 and June 2002, and with the language used in the content standards of mathematics, science and social studies.

Section 3.2 – Consensus Driven Test Development

A consensus driven test development process is important to ensure test validity.

Another important consideration, for both validity and reliability, is the development of a

test that is not too time consuming in the classroom or burdensome on the student.

3.2.1 – Key Decisions

The following list summarizes the key decisions that were made by consensus with regard to this assessment.

General

1. Testing will be conducted in English.

2. Assessment is not intended to be a placement or exit test. It is not intended to be the only measure but rather one of many inputs to the overall plan for the student.

3. There are five grade groupings – K-1, 2-3, 4-5, 6-8, and 9-12. Tests cover grade ranges; for example, there is a 4-5 test, not a fourth grade test, et cetera.

4. Tests are constructed to yield a single English language proficiency score which maps directly to ELP levels (beginning, early intermediate, intermediate, early advanced, advanced, transitional).

5. Proficiency level achievement standards were established for overall English proficiency, not for sub-domains. Cut points will be established based on the overall English proficiency scale.

6. Tests will report sub-scores in four sub-domains – reading, listening, writing and speaking. A fifth sub-domain, comprehension, will be derived from sub-scores in reading and listening.

7. Distribution of items among sub-domains is fixed so that each sub-domain has an equal number of items.

Standards

8. Standards include descriptors for six proficiency levels: proficient, advanced, early advanced, intermediate, early intermediate, and beginning.

9. The ELP standards are designed to supplement the ELA standards to ensure that LEP students develop proficiency in both the English language and the concepts and skills contained in the ELA standards. This connection is not a perfect match or one-to-one correspondence.

10. The ELP assessment must be aligned with the ELP standards. Alignment requires that each item address a specific ELP content standard.

11. ELP tests are based on the subset of ELP standards that are assessable.

Measurement

12. The object of this set of assessments is to monitor growth in English proficiency across time. Because of this and because of the nature of language acquisition, it is desirable to use a longitudinal, vertically articulated scale for the English language proficiency construct. This requires blocks of vertically linking items between adjacent grade groups.

13. Test scores will be used in reference to proficiency criteria rather than expectations generated by norms.

14. The overall proficiency score and level will be based on an English proficiency scale, not on separate scales for each sub-domain. Sub-domains (reading, writing, speaking and listening) are goals or strands within the overall English proficiency construct.

3.2.2 – Effective Test Administration and Design

A key goal is to create an assessment that measures accurately and is valid, reliable, and

useful to the field. One of the important efforts is to create an assessment that is not too

time-consuming in the classroom and does not place undue burden on the student,

teachers or proctors during the course of test administration. This requires careful

attention to the efficiency of items in providing maximum information for each student.

In order to report sub-scores for both sub-domains and language functions, test forms are

designed so that items form a matrix with functions cutting across sub-domains. In this

way, information in both dimensions can be provided with minimum testing time. The

current design calls for an 80-item test delivered in two sessions corresponding to a

locator block followed by a leveled tier. The basic design calls for a computer-delivered

test consisting of a locator block with machine scored reading, writing and listening

items, followed by a targeted block selected from one of three leveled tiers. The locator

block has items placed at wide intervals along the scale to tell whether the student is in

the lower, middle or upper end of the proficiency continuum. The student is then given a

targeted block of appropriate difficulty. The student experiences the test as a single

session and will not be aware of the transition from the locator block to the leveled tier.

The computer adaptive/multi-stage test begins with 30 items to determine initial

proficiency. After the locator block, students face about 50 at-level items. This type of

test results in improved reliability and therefore provides better support for content


validity decisions. It is statistically more efficient than a fixed test form because it tailors

the difficulty of the item block or “testlet” to the examinee’s apparent ability, resulting in

more accurate scores (Luecht & Nungester, 1998).
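The two-stage flow described above (a 30-item locator block followed by one of three leveled tiers) can be sketched as follows. This is only an illustration: the routing thresholds, the function name `route_to_tier`, and the use of a raw locator score are assumptions made for the example. The report does not state how the operational test selects a tier.

```python
# Hypothetical sketch of a two-stage (locator block + leveled tier) design.
LOCATOR_ITEMS = 30  # locator block length, from the text
TIER_ITEMS = 50     # approximate leveled-tier length, from the text

# Assumed routing thresholds on the locator raw score (NOT from the report).
LOWER_MAX = 10
MIDDLE_MAX = 20


def route_to_tier(locator_raw_score):
    """Pick one of three leveled tiers from a locator raw score (0..30)."""
    if not 0 <= locator_raw_score <= LOCATOR_ITEMS:
        raise ValueError("locator score out of range")
    if locator_raw_score <= LOWER_MAX:
        return "lower"
    if locator_raw_score <= MIDDLE_MAX:
        return "middle"
    return "upper"


# Locator block plus leveled tier gives the overall test length.
total_items = LOCATOR_ITEMS + TIER_ITEMS
```

With these assumed thresholds, a student answering 15 locator items correctly would be routed to the middle tier, and the total test length matches the 80-item design.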

3.2.3 – Research Based Conceptual Framework - Forms and Functions

The conceptual framework for the Oregon ELP Assessment is based on research in the

fields of education, applied linguistics, and the English language acquisition process.

After a great deal of research into current linguistic models, Oregon adopted a framework

which focuses on two major components of language competence: Grammatical

Competence and Illocutionary Competence. Each of these is further sub-divided,

resulting in a total of five assessable components of language competence (see Appendix

D for additional information on language functions and forms).

Grammatical Competence (Forms of Language)

1. Morphology (components of words)

2. Vocabulary (the words of the language or “parts of speech”)

3. Syntax (grammar)

Illocutionary Competence (Functions of Language)

4. Ideational (communication of ideas)

5. Manipulative (use of language to get something done)

The table below shows expected item distributions. Distribution of items among sub-

domains is fixed so that each has an equal number of items. This is because the design

must guarantee a usable sub-score for each sub-domain required by Title III of NCLB.

Subdomain      Forms - Grammatical Competence          Functions - Illocutionary Competence    TOTAL ITEMS
               Morphology    Vocabulary    Syntax      Ideational    Manipulative
Listening                                                                                      20
Reading                                                                                        20
Speaking                                                                                       20
Writing                                                                                        20
Total Items    About 16      About 16      About 16    About 16      About 16                  80


3.2.4 – Technology Matrix

For each grade band and sub-domain, an appropriate level of familiarity with computer

mousing and keyboarding is required. The chart below shows which skills are required

for each area. At early elementary grades (K-3), items are restricted to those that require

only speaking into a microphone and point-and-click mousing (but not drag-and-drop or

double clicking). Our pilot testing, interviews with Oregon teachers and research done in

other states reveal that students at this age master these skills easily.

Domain and Computer Skill Required, by Grade Band

K-1
  Listening+: Point & click mouse skills
  Speaking+: Speak into a microphone, point & click mouse skills
  Reading: Point & click mouse skills
  Writing: Point & click mouse skills

2-3
  Listening+: Point & click mouse skills
  Speaking+: Speak into a microphone, point & click mouse skills
  Reading: Point & click mouse skills
  Writing: Point & click mouse skills

4-5
  Listening+: Point & click mouse skills
  Reading: Point & click mouse skills
  Writing: Point & click mouse skills and keyboard words, phrases, paragraphs, and sentences*

6-8
  Listening+: Point & click mouse skills
  Writing: keyboard words, phrases, paragraphs, and sentences*

9-12
  Listening+: Point & click mouse skills
  Writing: keyboard words, phrases, paragraphs, and sentences*

+ All students are provided with a combination headset/microphone unit for completion of listening and speaking items.

* Degree of keyboarding in Writing in grades 4 and above depends on proficiency level

3.3 – Consensus Driven Item Development

For evidence of content validity, the process by which items are written and reviewed is

critical. In the ELPA consensus-driven item development process, panels of educators

from around the state make judgments about content relevance, bias issues,

representativeness of potential items, and tasks that ensure test item faithfulness.

3.3.1 - LIFE OF AN ELPA ITEM

After approximately 15 experienced Oregon teachers select text passages and determine

an appropriate proficiency and grade-level for each of them, a contractor employs

qualified teachers in Oregon to write items for ELPA. All items include audio (sound)

and visual (picture) components.

Oregon ELD teachers then review the items to verify that they are aligned with the forms

and functions of the ELP standards. Grade-level and proficiency levels are also verified.

In addition, a sensitivity panel reviews the items for bias. The assessment specialist

makes final recommendations for edits and revisions to the contractor.

Approved field test items are embedded in ELPA as part of the operational test. Data is

collected and analyzed to determine if the items “behave” as expected and staff calibrates

the items. Any item that is not “behaving” as expected is analyzed, revised and field-

tested again.


3.3.2 – Principal Item Types; Relation to Domains

All ELPA items consist of a stimulus, a stem, and, in the case of selected response items,

four foils. A stimulus may consist of a picture plus an audio or written text, or simply a

picture (all items, regardless of type, contain a graphic/picture prompt). A stem consists

of an audio and/or written prompt or question. Foils, where present, always number four

and may be in the form of text or pictures (but not a combination of the two), or text and

audio.

A variety of item types are designed to contribute to different aspects of English language

development. ELPA consists of four principal item types, some of which are presented

through various item sub-types. Some of these item types are presented in multiple sub-

domains, while others are used exclusively in one sub-domain:

Item Type                            Domains                        Score Points / Forms and Functions
1 Selected Response                                                 (Grammatical and Illocutionary)
    Multiple Choice                  Reading, Listening, Writing    0 or 1
    Picture Click*                   Reading, Listening             0 or 1
2 Short Answer
    Cloze* or Word Builder* (SA1)    Writing                        0 or 1 (Grammatical - morphology and vocabulary)
    Descriptive Short Answer (SA2)   Writing, Speaking              Short answer, four points, scored on a scale of 0, 1, 2 with two criteria g/i (Grammatical and Illocutionary)
3 Extended Response (ER)             Writing, Speaking              Six points scored on a scale of 0, 1, 2, 3 with two criteria g/i (Grammatical and Illocutionary)
4 Elicited Imitation                 Speaking                       0 or 1 (Grammatical – syntax)

Each form (A, B and C) contains a mixture of selected response, short answer, extended

response, and elicited imitation items (see 3.3.5 for a detailed description of each item

type). Open-ended item types such as short answer and extended response are kept to a

minimum to facilitate quick and inexpensive scoring. All reading and listening items are

selected response. Writing items are divided among multiple choice, short answer,

and extended response item types. Speaking items are a mixture of elicited imitation,

short answer, and extended response. Extended response items are given only to students

in grade band 4-5 and above who receive the intermediate or advanced tier.

*Definitions - Picture Click – click on matching picture; Word Builder – fill in missing letters;

Cloze – fill in the blank.

Each item is written to address the following information:

Grade level K-1, 2-3, 4-5, 6-8, 9-12

Sub-domain reading, writing, listening, speaking

Assessment Point grammatical (vocabulary, morphology, syntax),

illocutionary (ideational, manipulative)

Intended difficulty beginning, early intermediate, intermediate, early

advanced, advanced, proficient

Item Type selected response, short answer, extended response,

elicited imitation

3.3.3 – Distribution Across Grade Bands

Item types are also sometimes grade band-specific; the following table shows the

distribution of item types across grade bands.

Sub-domain   Item Type            K-1   2-3   4-5   6-8   9-12
Reading      Multiple Choice      x     x     x     x     x
             Picture Click        x     x     x     x     x
Listening    Multiple Choice      x     x     x     x     x
             Picture Click        x     x     x     x     x
Writing      Multiple Choice      x     x     x     x     x
             Word Builder         x     x     -     -     -
             Cloze                -     -     x     x     x
             SA2                  -     -     x     x     x
             Extended Response    -     -     x     x     x
Speaking     SA2                  x     x     x     x     x
             Extended Response    -     -     x     x     x
             Elicited Imitation   x     x     x     x     x

3.3.4 – Order of Delivery

The test is administered such that the sub-domains and item types within the domains are

delivered in the following order:

(1) Reading (picture click, followed by multiple choice)

(2) Writing (multiple choice, followed by short answer, followed by extended response)

(3) Listening (picture click, followed by multiple choice)

(4) Speaking (short answer, followed by extended response, followed by elicited imitation)

Item types within each sub-domain (picture click, multiple choice) are delivered such

that, in general, the least complex are presented first.


3.3.5 – Item Type Explanations

Selected response is essentially multiple choice. In SA1 items, a student has to produce a

small unit of language, e.g., a word, to get credit. In SA2 items, a student has to produce

language at more or less the sentence level to get credit. Extended response items require

that the student produce language consisting of several sentences to convey a message. In

elicited imitation, a student has to repeat verbatim a sentence he or she has heard.

Selected response items have a predetermined correct answer and are scored right or

wrong.

Short Answer-1 (SA1) items may have several acceptable responses, which are listed in

a look-up table. The student gets credit for any suitable response.

Short Answer-2 (SA2) and extended response (ER) items are scored on item-specific

rubrics. Thus the criteria for full credit on one item may differ from the criteria on

another item according to the complexity of responses obtained or the unique language

features elicited by the item, which could not be foreseen when the item was written. The

actual psychometric value of responses to different items lies not in the assigned score

but in the overall ELPA scores of respondents who obtained given item scores.

A given rubric score should not be presumed to correspond to a given level of proficiency

absent information about the respondent’s overall score.

Unlike stand-alone performance assessment prompts, SA2 and ER prompts are short

tasks of variable difficulty. They will be scaled for difficulty so that the rated response

becomes part of the set of responses to all items that generates the student’s overall test

score. Consequently each item has its own scoring guide describing the specific

performance needed to earn each rating. Scoring guides may follow a common template,

but they contain item-specific information needed to inform the rating process. Rubrics

generally address both functional and grammatical elements, but do not require specific

language unless the directions call for this. Thus, the general prompt, “Tell about what is

in the picture,” will not necessarily evoke a specific tense or word ending, but will be

judged on overall content and grammatical form. Rubrics may take into account

communicative effectiveness (illocutionary competency), correctness of syntax and

appropriateness of vocabulary. Thus three different elements of eligible content may

influence the rubric and the score the student receives.

Title III of NCLB requires that English proficiency tests assess four domains: reading,

writing, speaking, and listening. The following table shows which item types are used to

assess each domain.

Item Type             Domain
                      Reading   Writing   Speaking   Listening
Selected Response     X         X                    X
SA1                             X         X
SA2                             X         X
Extended Response               X         X
Elicited Imitation                        X


In most cases, there is not an exact match between item type and the eligible content

being assessed. However, the following table shows the kind of eligible content that an

item type may potentially assess.

Item Type             Eligible Content
                      Syntax   Morphology   Vocabulary   Ideational   Manipulative
Selected Response     X        X            X            X            X
SA1                            X            X
SA2                   X        X            X            X            X
Extended Response     X        X            X            X            X
Elicited Imitation    X

For example:

Selected Response

A selected response item in listening or reading might require that a student distinguish

between what happened in the past v. the present using knowledge of verb tenses to get

an item right. Thus the assessment point would be tense, which is part of syntax.

A selected response item in reading might have a student see a picture of a desk and

choose which of four written words matches the picture, thus demonstrating the ability to

read the word “desk.” The assessment point would be vocabulary.

A selected response item in writing might require that a student recognize that “ate” v.

“eating,” “eat” or “eaten” describes what a student did the day before. The assessment

point would be the morphological inflection for the past tense of “eat.”

A selected response item in reading might require that a student use vocabulary and

syntax to understand that a conversation occurred yesterday in a library. The assessment

point would be the ideational competency. The response might also hinge upon a

student’s understanding of certain words, thus the assessment point would be vocabulary.

A selected response listening item might require a student to understand the last thing that

needs to be done in a short series of steps in a science experiment. The assessment point

would be the manipulative competency, specifically, understanding of following

directions.

Short Answer

A short answer-1 item in writing or speaking might require a student to look at a picture

of a chicken and respond to the prompt, “What is this?” The student might write chicken,

rooster, hen, or even bird and receive credit.
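The look-up-table scoring of SA1 items can be sketched as below, using the chicken example. The function name `score_sa1` and the normalization step (trimming whitespace and ignoring case) are assumptions for illustration; the report does not describe how responses are normalized or whether misspellings are tolerated.

```python
# Hypothetical look-up table for one SA1 item ("What is this?" with a chicken picture).
ACCEPTABLE = {"chicken", "rooster", "hen", "bird"}


def score_sa1(response, acceptable=ACCEPTABLE):
    """Return 1 if the normalized response is in the look-up table, else 0."""
    normalized = response.strip().lower()  # assumed normalization, not from the report
    return 1 if normalized in acceptable else 0
```

Under these assumptions, `score_sa1(" Hen ")` earns credit while `score_sa1("dog")` does not.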

A short answer-2 item in writing or speaking might require a student to see a picture of

students playing baseball and respond to the prompt, “What’s happening in the picture?”

Full credit might be given for such responses as, The students are playing a game, The

kids are playing baseball, They’re playing a sport, etc. Partial credit might be given for

They’re playing, Playing a game, Play a game, Baseball, etc. Thus full credit might be


given for clearly communicating (ideational function), correct grammar and appropriate

vocabulary and partial credit for appropriate syntax and vocabulary but failure to

communicate clearly with the ideational function.

Extended response items are designed to elicit more writing or speaking than short

answer-2 items. For example, a student might be asked to speak or write in response to a

prompt such as, “What are your hardest and your easiest classes? Describe what makes

one hard and one easy.” As in the SA2 items, full credit might depend on communicative

effectiveness, correctness and complexity of grammar, and clear use of vocabulary to

convey ideas, and partial credit might be assigned where syntax is flawed or a student

does not convey the complete ideas sought by the prompt.

Elicited imitation tasks are part of speaking. The student hears a sentence and is asked to

repeat it exactly as he or she heard it. For example, the student might hear: Mrs. Jones

teaches biology and chemistry but not physics. The student might get credit for saying,

“Mrs. Jones teaches chemistry and biology but not physics.” The order of the two

subjects was changed, but all the sentence elements were there, and the meaning did not

change. However, the student would not get credit for, “Mrs. Jones teaches biology and

chemistry. She doesn’t teach physics.” That response alters the syntax of the sentence and

converts one sentence into two somewhat simpler sentences. Elicited imitation response

items represent a range of syntactic complexity from simple sentences to complex

sentences with embedded clauses. The more syntactically complex sentences students can

repeat, the more proficient they are in English.

3.4 - Test Specifications

3.4.1 – Relation to Validity

Test specifications help ensure validity because they provide a clear link between the test

content and the content standards and their corresponding performance levels. One

particularly powerful source of support for intended interpretations of test scores is

documentation that each test item aligns to the knowledge or skill required to achieve the

content standards. Items are developed to measure these academic standards, per the

content specifications. The Joint Standards (AERA, 1999), pages 11–12 in particular,

underscore the importance of this type of content evidence of validity.

Test specifications also define how the content standards are to be assessed (e.g., multiple

choice, state performance assessment, local work sample), provide further specificity to

the skills and knowledge expected of students, and convey to teachers what they can

expect on state assessments.

3.4.2 – Alignment History

The earliest draft of the ELP standards was based on the state’s English Language Arts

standards to comply with an NCLB requirement that the two be linked. In February 2004,


the Content and Assessment Panel reviewed that draft to identify which standards were a)

relevant to English proficiency as opposed to language arts and b) assessable. The

resultant document was condensed into consolidated standards because a great deal of

redundancy occurred among standards and between standards for the grade levels

grouped for the ELPA grade bands. That document was used to guide the first ELPA item

writing session in July 2004, and that document has maintained the Halliday coding

system.

EII and the ELPA team agreed that English proficiency is a separate construct from

English language arts, and in fact, the above-described language competency framework

was included in the standards document approved by the State Board of Education in

June 2004 in order to draw attention to that fact.

Subsequently, EII adopted the Bachman framework, which consists of the same major

elements but uses somewhat different terms than the Halliday framework. In order to

ensure consistency and clarity of communication, the ELPA project adopted the terms of

the Bachman framework. Therefore, the Bachman construct of language competence is

considered to comprise the essence of the English proficiency standards. For purposes of

test construction, the ELPA team determined that the eligible content for assessment from

the English Proficiency Standards would consist of these five components of the

Bachman framework (See p. 13 under Forms and Functions). The alignment of these

elements of the English proficiency construct is documented in the list of consolidated

standards. Appendix E further describes the components of the eligible content.

3.4.3 – Ensuring Item Alignment with the Construct and Standards

In May 2005, the ELPA team and EII agreed on an approach to coding items’ alignment

to the ELP standards based on the competency framework and the above-listed eligible

content. See Appendix F for Content/Assessment Panel Review Sheets (with

competency code).

When the Content and Assessment Panel met that month for item review, they used the

new approach rather than the language arts CCG system. Under the new approach, items

were coded to indicate which competency (syntax, vocabulary, morphology,

manipulative, ideational) was demonstrated by a student's correct response to an item, the

assessment point. In other words, what aspect of language does a student have to

command to receive credit for the item? All items, whether grammatical or illocutionary,

were also coded for the “functional context” as further evidence of standards alignment.

The ELP standards document lists 23 specific functions, and items are coded according to

that list.


Section 4.0 – Concurrent Validity

4.1 – Explanation

A basic concept of validity is that persons who score high on a test should score high on

other measures of the same construct. To the extent that two measures address the same

latent construct, scores for the same individuals should agree. Conversely, a lack of

relationship with theoretically unrelated measures helps substantiate the meaning of the

test score. The extent to which related measures correlate with the test scores, and

thereby support or contradict state assessment scores, helps validate the measure of academic

achievement for the intended purposes.

4.2 – Description of Consistency

The department provides a description of the consistency in English Language

Proficiency designations between the state’s English Language Proficiency Assessment

(ELPA) and the Idea Proficiency Test (IPT), Language Assessment Scale (LAS), and

Woodcock-Muñoz Language Survey to help teachers understand and use the ELPA.

While this analysis should not be considered an equating or comparability study, it can

provide additional context for the ELPA by referring to tests with which teachers are

more familiar.

The ELPA data used in this analysis were collected in 2005-06. While the intent was to

collect the ELPA data via a random sample (i.e., students with even SSIDs), because of the

complex nature of the assessment and the circumstances, the sample is unlikely to be

completely random (i.e. some districts tested additional students and some students were

not assessed). In addition to the required ELPA testing, some districts chose to submit

commercial test data (e.g., IPT, LAS and Woodcock-Muñoz, Stanford Proficiency Test)

for some students. The consistency analysis is based on the subset of 2005-06 Oregon

LEP students. This group included students obtaining a valid score on the ELPA and for

whom districts chose to submit an additional commercial test score.

In addition to the obvious consideration that the ELPA is a computer based test while the

commercial tests are based on paper and structured interviews, there are several caveats

that should be considered when examining these data. First, given the methodology

described above, this sample is unlikely to be random. Second, these commercial tests

may not have been administered at the same time as the ELPA. Finally and most

importantly, the commercial tests:

• Do not assess all of the required domains of reading, speaking, listening and writing.

• Are not based on Oregon eligible content.

• Use a different set of proficiency standards.

For these reasons, we would expect differences between the identification of proficient

students based on the ELPA versus the other commercial tests.


Comparison of ELPA to Woodcock-Muñoz, IPT and LAS

              Not Proficient on the Woodcock-Muñoz     Proficient on the Woodcock-Muñoz         Consistency
              Not Proficient     Proficient            Not Proficient     Proficient
              on ELPA            on ELPA               on ELPA            on ELPA
Grade Band    N        %         N        %            N        %         N        %            %
K-1           2248     92.8      55       2.3          88       3.6       31       1.3          94.1
2-3           2229     92.7      132      5.5          22       0.9       21       0.9          93.6
4-5           1837     88.1      233      11.2         4        0.2       10       0.5          88.6
6-8           1925     81.1      425      17.9         7        0.3       18       0.8          81.9
9-12          1650     85.4      266      13.8         8        0.4       9        0.5          85.9

              Not Proficient on the IPT                Proficient on the IPT                    Consistency
K-1           689      89.5      67       8.7          10       1.3       4        0.5          90.0
2-3           592      85.7      53       7.7          28       4.1       18       2.6          88.3
4-5           523      83.8      53       8.5          36       5.8       12       1.9          85.7
6-8           918      80.9      100      8.8          74       6.5       43       3.8          84.7
9-12          664      87.7      18       2.4          55       7.3       20       2.6          90.3

              Not Proficient on the LAS                Proficient on the LAS                    Consistency
K-1           489      90.6      42       7.8          5        0.9       4        0.7          91.3
2-3           427      80.6      70       13.2         11       2.1       22       4.2          84.8
4-5           494      67.0      123      16.7         59       8.0       61       8.3          75.3
6-8           428      63.2      156      23.0         41       6.1       52       7.7          70.9
9-12          405      78.5      34       6.6          51       9.9       26       5.0          83.5
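The Consistency column in the table above appears to be the share of students whom both tests classify the same way (not proficient on both, or proficient on both). The sketch below reproduces that arithmetic for three rows; the function name `consistency` is an illustrative choice, and reading the column as simple percent agreement is an inference from the reported numbers rather than a definition given in the report.

```python
def consistency(np_np, np_p, p_np, p_p):
    """Percent of students classified the same way by both tests.

    Arguments are counts for (commercial test, ELPA) classifications:
    np_np = not proficient on both, np_p = not proficient / proficient,
    p_np = proficient / not proficient, p_p = proficient on both.
    """
    total = np_np + np_p + p_np + p_p
    return round(100.0 * (np_np + p_p) / total, 1)


wm_k1 = consistency(2248, 55, 88, 31)    # Woodcock-Muñoz, K-1 -> 94.1
ipt_k1 = consistency(689, 67, 10, 4)     # IPT, K-1 -> 90.0
las_45 = consistency(494, 123, 59, 61)   # LAS, 4-5 -> 75.3
```

Percent agreement of this kind does not correct for agreement expected by chance, which is one more reason to treat the comparison as context rather than as an equating study.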

Section 5.0 – Reliability

5.1 - Standard Error of Measure

Reliability refers to the consistency, stability, and accuracy expected from test scores.

Reliability is best handled by showing the Standard Error of Measurement (SEM)

because the SEM is expressed on the same scale as the student scores. The Standard

Error of Measurement (SEM) curve evaluates the precision of the measure at various

points along the score distribution (see Appendix G). When Item Response Theory is

employed, the standard error changes, conditioned on the relative position of the score in

the ability or person score distribution.


When interpreting the graphs, the x axis describes various levels of person

performance, while the y axis gives the standard error at each ability

level. Typically, scores occurring in the middle of the distribution have smaller standard

errors since there is more item information targeting students around the middle ranges of

performance. For this reason, the graph is often observed as a U-shaped curve.

The SEM of the ELPA tests compares favorably with the SEM of the state test when it

ranges between 2 and 4 RITs in magnitude.
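To illustrate why the standard error depends on where a score falls, the sketch below computes a conditional SEM under a Rasch (one-parameter) model, where the SEM is the reciprocal of the square root of the test information. The choice of a Rasch model, the item difficulties, and the function name `rasch_sem` are assumptions for the example; values are in logits, and the report does not give the conversion to the RIT scale.

```python
import math


def rasch_sem(theta, difficulties):
    """Conditional standard error at ability theta under a Rasch model (logits)."""
    info = 0.0
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))  # probability of a correct response
        info += p * (1.0 - p)                      # information contributed by this item
    return 1.0 / math.sqrt(info)


# Hypothetical item bank: 41 difficulties spread evenly from -2 to +2 logits.
bank = [-2.0 + 0.1 * i for i in range(41)]

# SEM at a low, a middle, and a high ability level.
sem_low, sem_mid, sem_high = (rasch_sem(t, bank) for t in (-4.0, 0.0, 4.0))
```

With this hypothetical bank, the standard error is smallest at the middle ability level, where the items are concentrated, and larger at both extremes.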

5.2 - Item Analysis Methods for the ELPA

5.2.1 – Purpose of Item Analysis

A key factor in the life of an item (see p. 14) occurs when an item is proven reliable

enough to be promoted from a field test item to an operational item. What indicators of

item quality determine whether this occurs? The item analysis described below focuses

on total score correlation, distracters, level of difficulty, and item p-value (item mean

divided by the maximum number of points). It was performed for the ELPA in 2008; see the

CART Technical Report (http://www.ode.state.or.us/search/page/?id=1561).

An essential purpose of an Item Analysis (IA) is to flag items with questionable statistics.

CART conducts all item analyses in two stages. Following data reconciliation, the first

stage involves a thorough IA and key validation of only the operational (scored) ELPA

items. The second stage adds in the pretest items. In 2008, 496 operational items and

218 pretest items were analyzed.

CITAN flags items for any of five reasons: A = item has a negative item-total score

correlation, indicating a possible miskey; B = SR items have one or more incorrect

distracters having a positive point-biserial correlation; C = items are very difficult, where the

item p-value (item mean divided by the maximum number of points) is less than 0.3; and

D = items are very easy, where the item p-value (item mean divided by the maximum

number of points) is greater than 0.95. The table below provides a cross-tabulation of the

item flags for the 496 operational items.

Item     Flags
Types    A,B,C   B     B,C   C     D     None   Total
CZ       0       0     0     2     0     39     41
EI       0       0     0     2     0     16     18
ER       0       0     0     2     0     22     24
MC       1       13    3     12    5     271    305
PC       0       2     1     0     0     56     59
S2       0       0     0     0     1     11     12
WB       0       0     0     0     0     0      0

Total    1       15    4     23    6     447    496
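The flagging rules listed above can be sketched as follows. This is not CITAN's code; the function names `flag_item` and `p_value` and their argument layout are hypothetical, and only the four flags whose thresholds are stated in the text (A through D) are implemented.

```python
def p_value(item_scores, max_points):
    """Item p-value: item mean divided by the maximum number of points."""
    return sum(item_scores) / (len(item_scores) * max_points)


def flag_item(item_total_corr, item_p_value, distractor_pt_biserials=()):
    """Return the set of flags (A-D) raised for one item."""
    flags = set()
    if item_total_corr < 0:
        flags.add("A")  # negative item-total correlation: possible miskey
    if any(r > 0 for r in distractor_pt_biserials):
        flags.add("B")  # an incorrect option correlates positively with total score
    if item_p_value < 0.3:
        flags.add("C")  # very difficult item
    if item_p_value > 0.95:
        flags.add("D")  # very easy item
    return flags
```

An item can carry several flags at once, which is why the cross-tabulation has combined columns such as A,B,C and B,C.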

5.2.2 – Summary of Item Analysis Results

Despite some flagged items in Figure 2, nothing in the operational IA suggested serious

or wide-spread problems that might jeopardize the subsequent Item Response Theory

(IRT) calibrations. Furthermore, Avant Assessment staff re-verified all operational item

answer keys. As a result, the calibration and linking steps were completed.

The calibration worked out well (the magnitude of misfit was very minor). Based on the

success of the BVS-anchored calibration of the 496 operational items, it was decided to

use those parameter estimates in a final, anchored, joint calibration of the 496 items and

the 218 pretest items (see http://www.ode.state.or.us/search/page/?id=1561 – CART Technical

Report for details on the 2008 Item Analysis and Calibration). See Appendix H for the

Spring 2007 ELPA Operational Item Analysis Summary.

5.3 - Strand Reliability

5.3.1 – Reliability Through Number of Items

Strand reliability is ensured if there are enough items per strand. ELPA has adequate

items in each domain. There are at least 8 items for each domain. Domains in the locator

block have more items (see Appendix I for item distribution by domain, form and

function and grade band).

5.3.2 – Reliability Through Standard Setting and Precision at the Cut Scores

Precision at the cut scores is necessary so that students on the borderline can be correctly

classified. The process for re-establishing the achievement standards on the statewide

assessments in reading, mathematics, science and for the English Language Proficiency

Assessments (ELPA) consists of three key phases:

Phase One - Establish a broadly representative panel for each grade and subject

area;

Phase Two - Determine "cut scores" through established process;

Phase Three – Conduct field review and public input.

On November 5-6, 2007, staff members from the Oregon Department of Education

(ODE) and CTB/McGraw-Hill worked in collaboration to perform standard setting on the

English Language Proficiency Assessments (see CTB Standard Setting Report -

AchievementScores at link from www.ode.state.or.us/go ELPA). Educators from across

the state of Oregon with specialization in English-language development convened to

study the ELPA, consider the English language skills required of students in each

proficiency level, and discuss these expectations with their colleagues.



The purpose of the standard setting was to recommend cut scores on the ELPA to divide

students into five proficiency levels: Beginning, Early Intermediate, Intermediate, Early

Advanced and Advanced. The Bookmark Standard Setting Procedure (BSSP) was used to

set the proficiency standards for the ELPA. Participants recommended a well-articulated

set of proficiency standards at six grades: Kindergarten and Grades 1, 2, 5, 7, and 11.

Proficiency standards for the remaining grades were statistically interpolated based on

participants’ recommendations. The ODE divided participants into five grade groups,

each with approximately 3 participants. Participants were divided into assigned grade

groups that were balanced in terms of relevant demographic characteristics (e.g., gender,

geographic location). The standard setting consisted of training, orientation, three rounds

of judgments, an articulation discussion, and proficiency level description writing.

Following the standard setting, ODE made adjustments to the recommended cut

scores. These adjustments took into account the impact of the cut scores on

students, so that a more appropriate distribution of students by proficiency level

could be achieved based on 2006-07 performance data.

On Thursday, March 13, 2008, the State Board of Education voted to adopt changes to

the Performance Standards for the English Language Proficiency Assessment.

Achievement Standards (Cut Scores) for the English Language Proficiency Standards Adopted March 13, 2008

Grade    Early          Intermediate   Early      Advanced
Level    Intermediate                  Advanced   (Proficient)

K        482            492            498        507
1        492            507            514        523
2        495            508            514        523
3        501            514            521        529
4        497            508            514        521
5        497            508            516        523
6        497            506            515        522
7        497            507            517        524
8        499            508            518        526
9        491            501            515        526
10       493            501            516        527
11       494            501            515        528
12       498            504            516        530
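
The cut scores above can be applied mechanically to place a scale score in a proficiency level. The following is a minimal sketch, not ODE's scoring code. It assumes that a score at or above a cut score falls in that level and that any score below the Early Intermediate cut is Beginning; the report does not state the boundary convention. The names LEVELS, CUT_SCORES, and proficiency_level are hypothetical.

```python
from bisect import bisect_right

# Proficiency levels in ascending order, as named in the report.
LEVELS = ["Beginning", "Early Intermediate", "Intermediate",
          "Early Advanced", "Advanced"]

# Cut scores adopted March 13, 2008, copied from the table above:
# (Early Intermediate, Intermediate, Early Advanced, Advanced).
CUT_SCORES = {
    "K": (482, 492, 498, 507),
    "1": (492, 507, 514, 523),
    "2": (495, 508, 514, 523),
    "3": (501, 514, 521, 529),
    "4": (497, 508, 514, 521),
    "5": (497, 508, 516, 523),
    "6": (497, 506, 515, 522),
    "7": (497, 507, 517, 524),
    "8": (499, 508, 518, 526),
    "9": (491, 501, 515, 526),
    "10": (493, 501, 516, 527),
    "11": (494, 501, 515, 528),
    "12": (498, 504, 516, 530),
}


def proficiency_level(grade, scale_score):
    """Return the level name for a scale score.

    Assumption (not stated in the report): a score equal to a cut
    score belongs to the higher level.
    """
    cuts = CUT_SCORES[str(grade)]
    # bisect_right counts how many cut scores are <= scale_score.
    return LEVELS[bisect_right(cuts, scale_score)]
```

For example, under these assumptions proficiency_level("K", 481) falls in Beginning and proficiency_level(12, 530) falls in Advanced.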


Section 6.0 - Fairness and Accessibility

Fairness concerns occur throughout testing. Standardization itself is intended to ensure

that no examinees are given advantages or impediments through administration practices.

Nevertheless fairness issues arise simply because uniform conditions trigger different

levels of comfort in examinees. Although absolute fairness cannot be guaranteed, sources

of bias should be investigated and controlled to the extent practicable. There are several

components of fairness and accessibility.

6.1 – Test Administration

All test items, test materials, and student-level testing information are secure documents

and must be appropriately handled. Secure handling must protect the integrity, validity,

and confidentiality of assessment questions, prompts, and student results. Any deviation

in test administration must be reported to ensure the validity of the assessment results.

Mishandling of test administration puts student information at risk and disadvantages the

student as tests that are improperly administered may be invalidated. Failure to honor

security severely jeopardizes district and state accountability requirements and the

accuracy of student data.

6.1.1 - Testing Requirements to Produce Valid Test Results

Requirements for ethical testing that results in valid test results are mandated to ensure

that each Oregon student has a fair opportunity to demonstrate his/her abilities and that

school districts are fairly rated for state accountability. Requirements include but are

not limited to:

All Oregon Statewide Assessments must be administered by a trained Test

Administrator (TA).

TAs must receive annual training from the District Test Coordinator (DTC) or

School Test Coordinator (STC) on the test administration policies and procedures

included in this Test Administration Manual. Specifically, TAs must receive

training on the components of the Oregon assessment system, requirements for

valid test administration, testing options, and requirements for both standard

administration and modified administration.

All TAs must read and understand Parts I – VIII and Appendices A, D, E, Q, R,

and T of the Test Administration Manual, as well as all appendices pertaining to

those specific assessments which the TA will be administering.

Each TA must receive security training and have a signed Test Administrator

Assurance of Test Security form valid for the current school year, prior to

administering any assessments. TAs must renew this form annually upon

completion of the security training.

STCs and DTCs must receive security training and have a signed School Test

Coordinator or District Test Coordinator Assurance of Test Security form on

file at the District Office, valid for the current school year. STCs and DTCs must

renew this form annually upon completion of the security training.

Any person (office staff, volunteers, computer lab support staff, substitutes, etc.)

who has access to or participates in the handling of test materials but who does


NOT administer the test must sign a Non-Administrator Assurance of Test

Security form. This signed form must be kept on file at the District Office, valid

for the current school year.

All test administrators, including paraprofessionals, are trained in how to administer the

ELPA. In addition to properly configuring computer systems to run the ELPA

application, school staff ensure that students have the skills necessary to interact with the

application (Table 1, p. 14 describes the skills students will need in different grade bands

to receive a valid score on the ELPA). Websites and computer programs offering

opportunities for students to practice or to demonstrate these skills are included among

the training links described below.

Training materials are available from the ELPA home page (www.oregonelp.net). These

training materials include a document illustrating the different types of items used

throughout the ELPA (Item Guide), training regarding technologies and content of the

ELPA (Training Guide), and several videos describing the technology of ELPA and

information around access to ELPA (Training Videos).

6.1.2 - Security of the Test Environment

The test environment refers to all aspects of the testing situation while students are

testing. The test environment includes what a student can see, hear, or access. During

Online testing, the test environment also includes the electronic resources to which the

student has access.

Requirements of a secure test environment include but are not limited to:

A quiet environment, void of talking or other distractions that might interfere with

a student’s ability to concentrate or compromise the testing situation. Read aloud

accommodations for one student must not interfere with other students’ test-taking

environment.

Visual barriers or adequate spacing between students’ seating.

Student access to and use of only allowable resources.

Observation of any assessment items by only the student taking an assessment

and, to a limited extent, the trained TA.

No electronic devices that allow communication among students or the

photographing of test content.

Administration of online testing only through the Secure Browser. Test

administrators double check the student name and school identification carefully

to avoid errors.

Students are instructed to log in and work independently, not offering help to

other students.

Directions are the only portion of ELPA that may be translated.


6.1.3 - Testing Improprieties

Adult and student-initiated test improprieties are behaviors prohibited during test

administration because they can give students an unfair advantage or otherwise

compromise the State’s standard test administration. Adults (TAs) may not assist or

interfere with student testing. Adults must carefully adhere to all test administration

procedures to avoid test improprieties (see p. 12 of the Test Administration Manual for a

list). A list of student-initiated test improprieties that have been reported to ODE in

previous school years is provided in a table on p. 13 of the Test Administration Manual.

It is not intended to be exhaustive.

6.1.4 - Responding to Student Questions

Helping students violates the integrity and validity of the test. If a student asks for help,

remind the student to “do your best,” but do not initiate assistance or give any indication

that you can help. Use caution: check your verbal and nonverbal cues to ensure that the

student does not receive any inappropriate coaching that may affect the student’s response

to a test item.

6.1.5 – Testing Irregularities

Testing irregularities are unusual circumstances that impact a group of students who are

testing and may potentially affect student performance on the test or interpretation of

those scores. Examples of testing irregularities include major disruptions to a test, such as

a fire drill, a school-wide power outage, or a force majeure (e.g. a natural disaster).

During an event such as a fire drill or other evacuation, safety is the top priority. If the

TA can safely access the TA workstation before evacuating the testing environment, then

the TA should pause all tests before evacuating. If the TA cannot safely access the TA

workstation, then the TA should evacuate and secure the testing environment consistent

with the school’s evacuation policy. Upon returning to the testing environment, the TA

should pause all tests while students return to their stations. Testing irregularities also

include the administration of Test Accommodations to a group of students or to an entire

class without an investigation of individual student need. As with testing improprieties,

all testing irregularities should be reported immediately to your DTC. The DTC will then

report the irregularity to ODE within one business day.

6.2 – Sensitivity Panel Review

Fairness and accessibility are also addressed by the sensitivity panel, which ensures that

items:

present racial, ethnic, and cultural groups in a positive light.

do not contain controversial, offensive, or potentially upsetting content.

avoid content familiar only to specific groups of students because of race or

ethnicity, class, or geographic location.

aid in the elimination of stereotypes.

avoid words or phrases that have multiple meanings.


6.3 – Differential Item Analysis

Differential Item Analyses were conducted using the WINSTEPS IRT software. Two

analyses were completed. The first involved a simple standardized difference in Rasch

model difficulty parameters calculated for the Reference Group (males) and the Focal

Group (females).

Several problem areas require further substantive review. Among the First Grade Reading

PC items, nine of the 22 items were found to be significantly different, seven favoring

females and two favoring males. Eight of the 25 Fourth Grade Listening MC items

and nine of the 31 Ninth Grade Listening items were found to be statistically significant,

all favoring males. Other notable problem areas were Fourth Grade Reading MC items

(seven out of 29 were significant) and Sixth Grade Reading MC items (ten out of 33 were

significant).
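
The first analysis can be illustrated with a short sketch. It assumes the usual form of a standardized difference, the gap between the two group-specific Rasch difficulty estimates divided by their joint standard error, and a two-sided critical value of 1.96. The report does not give the exact formula or the significance criterion that was applied, so both are assumptions, and the function names and numbers below are hypothetical.

```python
import math


def standardized_difference(d_ref, se_ref, d_focal, se_focal):
    """Standardized difference between two Rasch item difficulties.

    d_ref, d_focal: item difficulty (logits) estimated separately in
    the reference group (males) and the focal group (females).
    se_ref, se_focal: standard errors of those estimates.
    """
    joint_se = math.sqrt(se_ref ** 2 + se_focal ** 2)
    return (d_ref - d_focal) / joint_se


def flag_dif(z, critical=1.96):
    """Flag an item when |z| meets the assumed critical value."""
    return abs(z) >= critical


# Hypothetical item: harder for the reference group than the focal group.
z = standardized_difference(d_ref=0.50, se_ref=0.10,
                            d_focal=0.10, se_focal=0.12)
print(round(z, 2), flag_dif(z))
```

With this sign convention, a positive value means the item was harder for the reference group, that is, it favored the focal group.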

The second analysis, also conducted via WINSTEPS calibration, used a statistical

equivalent of the Mantel-Haenszel DIF statistic (Holland & Thayer, 1986) called

MH prox (MHp). Linacre and Wright (1989) converted the MHp into ETS DIF categories

that can be used to design and maintain tests equivalent for groups of subjects on which

the original test data are calibrated. The first category, the A-type items, displays

negligible DIF, and these items can be used freely. The second category, the B-type

items, displays slight to moderate DIF; if possible, these items should be replaced by

equivalent items with smaller MHp absolute values. The third category, the C-type items,

displays a moderate to large amount of DIF, and these items should be selected only if

essential to meet the test specifications. The CART Technical Report (Appendix 1),

available at http://www.ode.state.or.us/search/page/?id=1561, gives the number of A, B,

and C items for each item type by grade level. In all, there were 451 A-type items,

34 B-type items, and only 11 C-type items.
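
As a rough illustration of the second analysis, the sketch below computes the classical Mantel-Haenszel common odds ratio from counts stratified by ability, converts it to the ETS delta scale (MH D-DIF = -2.35 ln alpha), and assigns an A, B, or C label by size alone. This is not the WINSTEPS MH prox computation used in the report. The size thresholds of 1.0 and 1.5 delta units are the commonly cited ETS conventions and are not stated in the report, the full ETS rule also involves significance tests that are omitted here, and the function names and counts are hypothetical.

```python
import math


def mh_common_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across ability strata.

    strata: iterable of (ref_right, ref_wrong, focal_right, focal_wrong)
    counts, one tuple per matched ability stratum.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        n = ref_right + ref_wrong + focal_right + focal_wrong
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += ref_right * focal_wrong / n
        denominator += ref_wrong * focal_right / n
    return numerator / denominator


def mh_d_dif(alpha):
    """Convert the common odds ratio to the ETS delta scale."""
    return -2.35 * math.log(alpha)


def ets_category(delta):
    """Size-only A/B/C label (simplification: no significance tests)."""
    size = abs(delta)
    if size < 1.0:
        return "A"
    if size < 1.5:
        return "B"
    return "C"


# Hypothetical counts for one item in two ability strata.
example_strata = [(40, 10, 30, 20), (30, 20, 20, 30)]
alpha = mh_common_odds_ratio(example_strata)
print(ets_category(mh_d_dif(alpha)))
```

For scale, the counts reported above sum to 496 items, so A-type items make up roughly 91 percent of the pool, B-type about 7 percent, and C-type about 2 percent.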



APPENDIX


APPENDIX A

ESL PROGRAM FUNDING AND EVALUATION – STATE LAW

Oregon Administrative Rule #581-023-0100

Eligibility Criteria for Student Weighting for Purposes of State School Fund Distribution

(1) The following definitions apply to this rule:

(a) "Average Daily Membership" or "ADM" means the membership defined in ORS

327.006(3) and OAR 581-023-0006;

(b) "Days in Session" means number of days of instruction during which students are

under the guidance and direction of teachers;

(c) "Department" means the Oregon Department of Education;

(d) "Language Minority Student" means:

(A) Individuals whose native language is not English; or

(B) Individuals who come from environments where a language other than English is

dominant; or

(C) Individuals who are Native Americans or Native Alaskans and who come from

environments where a language other than English has had a significant impact on their

level of English proficiency.

(e) "Superintendent" means the State Superintendent of Public Instruction;

(f) "Weighted Average Daily Membership" or "ADMw" means the ADM plus an

additional amount or weight as described in ORS 327.013, subject to the limitations

imposed by Section (4)(a), Chapter 780, Oregon Laws 1991.

(2) Pursuant to ORS 327.013(7)(a)(A) the resident school districts shall receive one

additional ADM or "weight" for children with disabilities who comprise up to 11 percent

of the district's ADM. The Department will calculate the percentage of children with

disabilities on the basis of resident counts of students eligible for weighting from the

Special Education Child Count and the resident ADM:

(a) To be eligible, a student must be in the ADM of the school district and meet the

following criteria:

(A) The student must be eligible for special education having been evaluated as having

one of the following conditions: Mental retardation, hearing impairment including

difficulty in hearing and deafness, speech or language impairment, visual impairment,

serious emotional disturbance, orthopedic or other health impairment, autism, traumatic

brain injury or specific learning disabilities; and

(B) The student must be between the ages 5 and 21 and generate federal funding for

purposes of special education.

(b) Districts may apply for an exception to the 11 percent ceiling. Applications are to be

made on forms provided by the Department. Upon receipt of the application the

Superintendent may conduct a complete review of a district's special education records.


The Superintendent shall develop a process for conducting such reviews which will

include the following elements:

(A) Comparison of district claims with those submitted by other districts;

(B) Participation of school district and education service district staff in the review. No

district staff shall be asked to review claims submitted by the employing district.

(c) After considering the recommendations of the review committee the Superintendent

may allow all or a portion of the requested added weighted ADM over 11 percent;

(d) The Superintendent shall make the determination of approval for funding above the

11 percent limitation. Such determination may be appealed for review by the State Board

of Education according to a process established by the Superintendent;

(e) If the review indicates that a district has claimed ineligible special education students,

the Superintendent also shall withhold the related federal funds from the district, pursuant

to OAR 581-015-0049;

(f) A district must submit an application for an exception to the 11 percent ceiling no later

than six months after the close of the year for which payment is being sought. Payments

for allowable exceptions shall be made in the following school year as part of the May 15

payment.

(3) Pursuant to ORS 336.640(4), the resident school districts shall receive an additional

1.0 times the ADM of all eligible pregnant and parenting students:

(a) To be eligible, a student must be in the ADM of the resident school district and meet

the following criteria:

(A) The student must be identified through systematic procedures established by the

district;

(B) The student must be enrolled and receiving services described in ORS 336.640(1)(b)

and (d);

(C) The student must have an individualized written plan for such services which

identifies the specific services, their providers, and funding resources.

(b) Students counted in section (2) of this rule are not eligible under this section.

(4) Pursuant to ORS 327.013(7)(a)(B), the resident school districts shall receive an

additional .5 times the ADM of all eligible students enrolled in an English as a Second

Language program. To be eligible, a student must be in the ADM of the school district in

grades K through 12 and be a language minority student attending English as a Second

Language (ESL) classes in a program which meets basic U.S. Department of Education,

Office of Civil Rights guidelines. These guidelines provide for:

(a) A systematic procedure for identifying students who may need ESL classes, and for

assessing their language acquisition and academic needs;

(b) A planned program for ESL and academic development, using instructional

methodologies recognized as effective with language minority students;

(c) Instruction by credentialed staff and trained in instructional strategies that are

effective with second language learners and language minority students, or by tutors

supervised by credentialed staff trained in instructional strategies that are effective with

second language learners and language minority students;

(d) Adequate equipment and instructional materials;

(e) Evaluation of program effectiveness in preparing ESL students for academic success

in the mainstream curriculum.

(5) Students served in the following programs are not eligible for weighting:


(a) Programs funded fully by state funds, programs funded fully by federal funds, and

programs funded fully by a combination of state and federal funds;

(b) Private and parochial schools unless placed by the resident district in a registered

private alternative program or state approved special education program;

(c) Instruction by a private tutor or parent under ORS 339.035.

(6) No later than January 15 of each year, the designated official for a school district shall

submit to the Department a report of students eligible under sections (3) and (4) of this

rule. The report shall include the following data for the period October 1 through

December 31:

(a) Total days in session for the quarter ending December 31 for the school or program

reporting;

(b) Total days membership for the quarter ending December 31 for all students served in

eligible programs.

(7) Not later than July 10 of each year, the designated official for a school district shall

submit to the

Department a final report of students eligible under sections (3) and (4) of this rule. The

report shall include the following:

(a) Total days in session during the regular school year for the school or program

reporting;

(b) Name of each student;

(c) Total days membership beginning with the first day of instruction for each student and

ending with the date of withdrawal from the eligible program or the end of the regular

school year, whichever comes first;

(d) Grade level of the student.

(8) School districts must retain supporting documentation for a minimum of two years.

(9) The Department shall perform periodic reviews of the eligibility of students reported

for additional weighting. Any funds provided for ineligible students shall be recovered by

the Department for redistribution to school districts.

(10) This rule is effective beginning with the 1993-94 school year.

Stat. Auth.: ORS 327.013 & ORS 327.125

Stats. Implemented: ORS 327.013 & ORS 327.125

Hist.: EB 31-1992, f. & cert. ef. 10-14-92; EB 6-1994, f. & cert. ef. 4-29-94
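
The weighting arithmetic in sections (2) through (4) of the rule above can be sketched as follows. This is a simplified illustration and not the Department's calculation: it covers only the three weights named in this rule, applies the 11 percent ceiling without the exception process in section (2)(b), and assumes the caller has already removed ineligible students, including those excluded by section (3)(b) and section (5). The function and parameter names are hypothetical.

```python
def weighted_adm(adm, special_ed, pregnant_parenting, esl,
                 special_ed_cap=0.11):
    """Approximate ADMw from the three weights described in the rule.

    adm: resident average daily membership of the district.
    special_ed: eligible children with disabilities (weight 1.0,
        counted only up to special_ed_cap of the district's ADM).
    pregnant_parenting: eligible pregnant and parenting students
        (weight 1.0).
    esl: eligible students in an ESL program (weight 0.5).
    """
    capped_special_ed = min(special_ed, special_ed_cap * adm)
    return (adm
            + 1.0 * capped_special_ed
            + 1.0 * pregnant_parenting
            + 0.5 * esl)


# Hypothetical district: 130 special education students exceed the
# 11 percent ceiling of 110, so only 110 are weighted.
print(weighted_adm(adm=1000, special_ed=130, pregnant_parenting=4, esl=120))
```

In this hypothetical case the ESL weight contributes 60 additional ADM (0.5 times 120 students), for a weighted total of about 1,174.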


APPENDIX B

Executive Summary of Dimensionality Analysis

Link: http://www.ode.state.or.us/teachlearn/testing/dev/techaspects/elpa/executive-summary-cart-oregonelpa-operitemanalysis-aug07.pdf


APPENDIX C


APPENDIX D

LANGUAGE FUNCTIONS and FORMS

The English Language Proficiency Standards are written as pathways to the Oregon English Language Arts standards. The ELP

Standards are designed to supplement the ELA standards to ensure that LEP students develop proficiency in both the English language

and the concepts and skills contained in the ELA standards. They can be found on the web at

www.ode.state.or.us/teachlearn/standards/elp/files/all.doc.

This section contains language functions and forms that native English speakers acquire mostly before entering school or naturally at

home. These language functions and forms, however, need to be explicitly taught to English language learners (ELLs). They may be

taught to ELLs at all grade levels, and as the need and context arises.

Forms of a language deal with the internal grammatical structure of words. The relationship between boy and boys, for example, and

the relationship (irregular) between man and men would be forms of a language.

A language function refers to the purpose for which speech or writing is being used.

In speech these include:

giving instructions

introducing ourselves

making requests

In academic writing we use a range of specific functions in order to communicate ideas clearly.

These include:

describing processes

comparing or contrasting things or ideas, and

classifying objects or ideas

The contrast between form and function in language can be illustrated through a simple medical analogy. If doctors studied only a

limited portion of the human system, such as anatomical form, they would be unable to adequately address their patients’ needs. To

fully treat their patients, physicians must understand the purposes of the human body and the relationships between organs, cells, and

genes (Pozzi, 2004). Similarly, ELLs need to understand both the form (structure) and the function (purpose) of the English language

in order to reach higher levels of proficiency.

Pozzi, D.C. (2004). Forms and functions in language: Morphology, syntax. Retrieved March 10, 2005, from University of Houston, College of Education Web site: http://www.viking.coe.uh.edu/grn11.intr/intr.0.1.2.htm



Language Functions and Examples of Forms

Language Function                           Examples of Language Forms

Expressing needs and likes                  Indirect/direct object, subject/verb agreement, pronouns
Describing people, places, and things       Nouns, pronouns, adjectives
Describing spatial and temporal relations   Prepositional phrases
Describing actions                          Present progressive, adverbs
Retelling/relating past events              Past tense verbs, perfect aspect (present and past)
Making predictions                          Verbs: future tense, conditional mode
Asking Informational Questions              Verbs and verb phrases in questions
Asking Clarifying Questions                 Questions with increasing specificity
Expressing and Supporting Opinions          Sentence structure, modals (will, can, may, shall)
Comparing                                   Adjectives and conjunctions, comparatives, superlatives, adverbs
Contrasting                                 Comparative adjectives
Summarizing                                 Increasingly complex sentences with increasingly specific vocabulary
Persuading                                  Verb forms
Literary Analysis                           Sentence structure, specific vocabulary
Cause and Effect                            Verb forms
Drawing Conclusions                         Comparative adjective
Defining                                    Nouns, pronouns, and adjectives
Explaining                                  Verb forms, declarative sentences, complex sentences, adverbs of manner
Generalizing                                Abstract nouns, verb forms, nominalizations
Evaluating                                  Complex sentences; increasing specificity of nouns, verbs, and adjectives
Interpreting                                Language of propaganda, complex sentences, nominalizations
Sequencing                                  Adverbs of time, relative clauses, subordinate conjunctions
Hypothesizing and speculating               Modals (would, could, might), compound tenses (would have been)


ACQUISITION OF LANGUAGE FUNCTIONS AND GRAMMATICAL FORMS

1. Language Function: Expressing Needs and Likes

Level descriptors:

BEGINNING: Students demonstrate minimal comprehension of general meaning; gain familiarity with the sounds, rhythms and patterns of English. Early stages show no verbal responses while in later stages one or two word responses are expected. Students respond in single words and phrases, which may include subject or a predicate. Many speech errors are observed. (bear, brown)

EARLY INTERMEDIATE: Students demonstrate increased comprehension of general meaning and some specific meaning; use routine expressions independently and respond using phrases and simple sentences, which include a subject and predicate. Students show basic errors in speech. (The bear is brown. He is eating.)

INTERMEDIATE: Students demonstrate good comprehension of general meaning; increased comprehension of specific meaning; responds in more complex sentences, with more detail using newly acquired vocabulary to experiment and form messages. (The brown bear lived with his family in the forest.)

EARLY ADVANCED: Students demonstrate consistent comprehension of general meaning; good understanding of implied meaning; sustain conversation, respond with detail in compound and complex sentences; actively participate using more extensive vocabulary, use standard grammar with few random errors. (Can bears live in the forest if they find food there?)

ADVANCED: Students’ comprehension of general and implied meaning, including idiomatic and figurative language. Students initiate and negotiate using appropriate discourse, varied grammatical structures and vocabulary; use of conventions for formal and informal use. (Would you like me to bring pictures of the bear that I saw last summer?)

Forms by level:

BEGINNING: One or two-word answers (nouns or yes/no) to questions about preferences (e.g., two, apples, or tree)

EARLY INTERMEDIATE: Simple sentences with subject/verb/object. “I like/don’t like ____ (object).” “I need a/some ____ (object).”

INTERMEDIATE: Elaborated sentences with subject/verb/object

EARLY ADVANCED: Sentences with subject/verb/object and dependent clause

ADVANCED: Complex sentences, perhaps with tags or embedded questions

TARGET FORMS: Sentence Structure: The basic sentence structures that we use to express needs and likes are foundations to the more complex sentence structure we use for academic purposes.


ALL GRADES

2. Language Function: Describing People, Places and Things

BEGINNING: Common nouns and adjectives

EARLY INTERMEDIATE: Simple sentences with the verb to be, using common nouns and adjectives. The (my, her) ______ is/are _______. A (it) has/have _________.

INTERMEDIATE: Elaborated sentences has/have/had or is/are/were with nouns and adjectives

EARLY ADVANCED: Compound sentences with more specific vocabulary (nouns, adjectives)

ADVANCED: Complex sentences with more specific vocabulary (nouns, adjectives)

TARGET FORMS: Nouns, Pronouns and Adjectives: Students learn to understand and generate oral and written language with nouns, pronouns and adjectives.

3. Language Function: Describing Location

BEGINNING: Demonstrated comprehension of total physical response commands, including prepositions (e.g., on, off, in, out, inside, outside)

EARLY INTERMEDIATE: Simple sentences with prepositional phrases (e.g., next to, beside, between, in front of, in back of, behind, on the left/right, in the middle of, above, below, under)

INTERMEDIATE: May include two prepositional phrases with more difficult prepositions (e.g., in front of, behind, next to)

EARLY ADVANCED: phrases using prepositions (e.g., beneath, within)

ADVANCED: phrases using prepositions (e.g., beneath, within)

TARGET FORMS: Prepositional Phrases: Students learn to understand and generate oral and written language with prepositional phrases.

4. Language Function: Describing Action

BEGINNING: Demonstrate comprehension (perform or describe actions)

EARLY INTERMEDIATE: Present progressive

INTERMEDIATE: Variety of verb tenses and descriptive adverbs

EARLY ADVANCED: Adverb clauses telling how, where, or when

ADVANCED: Adverb clauses telling how, where, or when.

TARGET FORMS:

Present Progressive,

Adverbs: Students learn

to understand and


language skills with

present progressive and

adverbs.


5. Language Function: Retelling/Relating Past Events (Kinder – General Understanding)

BEGINNING EARLY

INTERMEDIATE

INTERMEDIATE EARLY

ADVANCED


Single words in response to past

tense question


past progressive __

(pronoun) ___ was/were

_____-ing.


regular and irregular past

tense verbs

“Yesterday/Last ____/On

___day (pronoun) ____ -

ed (prep. phrase or other

direct object).” First ___

and then __ . Finally

Compound sentences

using past tense and

adverb

Present progressive/past

perfect tense with

specialized prepositions

_____ have/has been

____-ing since/for ____.

Past Tense Verbs:

Students learn to


oral and written

language with past tense

verbs.

6. Language Function: Making Predictions

BEGINNING: In response to questions, may respond by circling, pointing, and so on, or answer with one or two words

EARLY INTERMEDIATE: The _____ is/are going to ______.

INTERMEDIATE: The ________ will ________.

EARLY ADVANCED: Conditional (could, might) mood in complex sentences

ADVANCED: Conditional (could, might) mood in complex sentences

TARGET FORMS:

Verbs: Future Tense,

Conditional Mood:

Students learn to


oral and written

language with future

tense verbs and

conditional mood.

7. Language Function: Asking Informational Questions

BEGINNING: Simple questions about familiar or concrete subjects

EARLY INTERMEDIATE: Present or present progressive tense questions with to be

INTERMEDIATE: Who, what, where, why questions with do or did

EARLY ADVANCED: Detailed questions with who, what, when, where, why and how

ADVANCED: Detailed questions with expanded verb phrase

TARGET FORMS:

Verbs and Verb Phrases

in Questions: Students

learn to understand and


language with verbs and

verb phrases in

questions.


9. Language Function: Expressing and Supporting Opinions

BEGINNING: I like/don’t like ______ (concrete topics).

EARLY INTERMEDIATE: I think/agree with (don’t) ______.

INTERMEDIATE: I think/agree with (don’t) ____ because _____.

EARLY ADVANCED: In my opinion ____ should ____ because/so ______.

ADVANCED: Complex sentences using modals and clauses

TARGET FORMS: Sentence Structure

10. Language Function: Comparing

BEGINNING: Single words or phrases in response to concrete comparison questions

EARLY INTERMEDIATE: Sentences with subject/verb/adjective showing similarities and differences

INTERMEDIATE: Subject/verb/adjective, but _____. Adjective with –er or –est

EARLY ADVANCED: Varied sentence structures with specific comparative adjectives and phrases

ADVANCED: Complex sentence structure with specific comparative language

TARGET FORMS: Adjectives and Conjunctions

11. Language Function: Contrasting

BEGINNING EARLY

INTERMEDIATE

INTERMEDIATE EARLY

ADVANCED


Sentences with


showing similarities and

differences

Subject/verb/adjective

like ____ but


Subject/verb/adjective,

both

subject/verb, but

Appropriately used

idiomatic phrases and

contrasting words (e.g.,

whereas, and in contrast)

Comparative Adjectives

8. Language Function: Asking Clarifying Questions

BEGINNING: Not Applicable

EARLY INTERMEDIATE: Formula questions clarifying classroom procedures, rules and routines

INTERMEDIATE: Formula questions clarifying classroom procedures, rules and routines

EARLY ADVANCED: A variety of fairly specific questions clarifying procedures or content

ADVANCED: Varied, specific questions clarifying procedures or content

TARGET FORMS: Questions with Increasing Specificity


12. Language Function: Summarizing

BEGINNING EARLY

INTERMEDIATE

INTERMEDIATE EARLY

ADVANCED


Simple sentences with key

nouns, adjectives, and

verbs

Compound sentences

with and/but

Conjunctions that

summarize (to conclude,

indeed, in summary, in

short)

Conjunctions that

summarize (indeed,

therefore, consequently)

Increasingly Complex

Sentences with

Increasingly Specific

Vocabulary

13. Language Function: Persuading

BEGINNING EARLY

INTERMEDIATE

INTERMEDIATE EARLY

ADVANCED


Imperative verb forms Complex sentences with

future and conditional


varied verb forms and tag

questions, idiomatic

expressions or embedded

clauses

Verb Forms

14. Language Function: Literary Analysis

BEGINNING: Single words for character and setting

EARLY INTERMEDIATE: Simple sentences (subject/verb/adjective) (subject/verb/object)

INTERMEDIATE: Compound sentences with and, because, before, after

EARLY ADVANCED: Descriptive language in more complex sentences

ADVANCED: Specific descriptive language in complex sentences

TARGET FORMS: Sentence Structure and Specific Vocabulary

15. Language Function: Cause and Effect Relationship

BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED


Answer cause and effect

question with a simple

response

Descriptive sentences

with past tense verbs


past tense verbs

Conditional: If ___ had/hadn’t _____. _____ would/wouldn’t have _____.

Verb Forms


16. Language Function: Draw Conclusions

BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED



with past tense verbs in

simple sentences


with conjunctions such as

although, because, that


with idiomatic phrases

and passive voice

Comparative Adjectives

17. Language Function: Defining

BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED


Patterned responses: A table is furniture/ A boy is a person.

Simple terms, aspects of concrete and familiar objects, regular nouns singular and plural, personal pronouns, present tense, simple sentences

Connected text including irregular nouns, personal, possessive pronouns and adjectives with some irregular past tense verbs

Concrete and abstract topics using irregular nouns, singular and plural, personal and possessive pronouns and adjectives

Clear, well-structured, detailed language on complex subjects, showing controlled use of nouns, pronouns, adjectives

Nouns, Abstract Nouns, Pronouns, Adjectives: Students learn to define concrete and abstract objects/concepts with correct nouns, pronouns, and adjectives

18. Language Function: Explaining

BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED


Main points in familiar idea or problem with some precision using simple indicative verb forms in simple declarative sentences (Large oaks grew in the park/ The length of the room is 40 feet.)

Explain simple, straightforward information of immediate relevance, using regular verbs and adverbs of manner in declarative sentences and compound sentences (Maria planted the petunia seeds carefully.)

Get across important points using declarative, compound and complex sentences, regular and irregular verb forms. Complex: As I came home, I stopped at the store. Compound: The children who came in early had refreshments, but those who came late had none.

Get across which point he/she feels is most important using regular and irregular verb forms, adverbs of manner and compound-complex sentences. Adverbs of manner: The children who sang loudly got a cookie, but those who didn’t sing had none.

Verb Forms- Indicative verb (makes a statement of fact), Declarative Sentences, Complex Sentences, Adverbs of Manner: Students learn to develop and use explanations using appropriate verb forms, declarative and complex sentences and adverbs of manner.

19. Language Function: Generalizing

BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED


Imperative mode: expresses command (Take me home. Stay there.)

Indicative mode: makes a statement of fact (The temperature is low.)

Subjunctive mode: expressing a condition contrary to fact or expressing a doubt (If only he were here.)

Nouns – Common, Collective and Abstract

Collective nouns name, as a unit, the members of a group (herd, class, jury, congregation).

Abstract nouns: name things or ideas that people cannot touch or handle (beauty, honesty, comfort, love).

Nouns; Verb Forms: Students learn to develop and use generalizations using abstract nouns, verb forms and nominalizations.

20. Language Function: Evaluating

BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED


Adjectives that point out particular objects (that wagon, those toys, each person, every girl)

Number adjectives: (two men, ten ships, the third time, the ninth boy)

Adjectives used to limit: (few horses, much snow, little rain)

Evaluate simple direct exchange of limited information on familiar and routine matters using simple verbs and adjectives.

Correlative conjunctions are used in pairs: both – and; not only – but also (Neither the teacher nor the students could solve the problem.)

Qualify opinions and statements precisely in relation to degrees of certainty/uncertainty, belief/doubt, likelihood, etc.

Convey finer, precise shades of meaning by using, with reasonable accuracy, a wide range of qualifying devices, such as adverbs that express degree (This class is too hard.); clauses expressing limitations (This is a school van, but it is only used for sports.); and complex sentences

Complex Sentences; Increasing Specificity of Nouns, Verbs, and Adjectives; Correlative Conjunctions: Students learn to understand and use complex sentences using very specific nouns, verbs and adjectives.

21. Language Function: Interpreting

BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED


Interpret a single phrase at a time, picking up familiar names, words, and basic phrases (D’Onofrio chocolates are the best.)

Interpret short, simple texts containing the highest frequency vocabulary

Interpret short, simple texts on familiar matters of a concrete type, which consist of high frequency everyday or school-related language

Interpret a wide range of long and complex texts, appreciating subtle distinctions of style and implicit as well as explicit meaning

Interpret critically virtually all forms of the written language including abstract, structurally complex, or highly colloquial non-literary writings

Language of Propaganda, Complex Sentences: Students learn to identify and interpret the language of propaganda and use complex sentences.

22. Language Function: Sequencing

BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED


Subject (The girl who was sick went home.)

Direct object (The story that I read was long.)

Prepositional object (I found the book that John was talking about.)

Possessive (I know the woman whose father is visiting.)

Object of comparison (The person whom Susan is taller than is Mary.)

Natural sequencing (I hit him and he fell over.)

Indirect object (The man to who[m] I gave the present was absent.)

Subordinate conjunctions: used to join a subordinate clause to a main clause (Although he worked hard, he did not finish his homework.)

Adverbs of time, Relative clauses, Subordinate conjunctions: Students learn sequencing using adverbs of time, relative clauses and subordinate conjunctions.

23. Language Function: Hypothesizing and Speculating

BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED


Auxiliary verbs that indicate futurity: will and shall

Auxiliary verb indicating desire or intent: would

Auxiliary verbs include modal verbs, which may express possibility: may, might, can, could.

Modals (would, could, might), Compound tenses (would have been): Students learn to hypothesize and speculate using modals and compound tenses.

24. Language Function: Summarizing

BEGINNING    EARLY INTERMEDIATE    INTERMEDIATE    EARLY ADVANCED


Copy out short texts; can copy out single words and short texts

Paraphrase short written passages in a simple fashion, using the original text wording and ordering; pick out and reproduce key words and phrases or short sentences from a short text within the learner’s limited competence and experience

Summarize extracts from news items, interviews or documentaries containing opinions, argument and discussion; summarize the plot and sequence of events in a poem or play; collate short pieces of information from several sources and summarize them for someone else

Summarize a wide range of factual and imaginative texts, commenting on and discussing contrasting points of view and the main themes

Summarize information from different sources, reconstructing arguments and accounts in a coherent presentation of the overall result

Modals (would, could, might), Compound tenses (would have been): Students learn to summarize and speculate using modals and compound tenses.


APPENDIX E

Explanation of Eligible Content

Each of the five components of the eligible content will be explained. However, the five

components interact: Morphology reflects syntax, words with similar meanings occur in different

syntactic structures, and illocutionary functions can only be expressed through forms. Forms

never exist without illocutionary meaning, and meaning cannot be conveyed without forms.

Syntax refers to what is traditionally called “grammar.” Syntax occurs at the sentence level. It is

often explained as “word order,” but in fact the order of words in a sentence is governed by

rules that convey the interrelated meanings of the words and phrases in a sentence. Examples of

syntax include:

Tenses and Aspects:

Simple present

Simple past

Simple future

Modals

Tenses with modals

Perfect tenses

Perfect tenses with modals

Tenses with progressive -ing

Examples of Tenses and Aspects

Simple Present: I ride the bus to school every day. Mario studies English.

Simple Past: I rode the bus to school this morning. Mario studied English last year.

Simple future: I will ride the bus to school tomorrow. Mario will study English next semester.

Tenses with Modals: I should (may, can, etc.) ride the bus to school tomorrow. Mario might

study English next semester.

Perfect Tenses: I have ridden the bus to school every day this year. Mario has studied

English for three years. I had always ridden the bus until I got a car. Mario had studied

English before he immigrated to the United States.

Perfect Tenses with Modals: I should have ridden the bus to school this morning. At the end

of this semester, Mario will have studied English for five years.

Tenses with Progressive –ing: I’m riding the bus to school tomorrow. (Present progressive

functioning as future) Mario has been studying English for five years.


Sentence Structure

Simple subject+verb(+NP)

Simple subject+verb with compound subject or verb phrase

Compound sentences: Two or more subject+verb(+NP)

Complex sentences with subordinate clauses.

Complex sentences with relative clauses

Examples of Sentence Structures

Simple subject+verb: Rebecca eats pizza.

Simple subject+verb with compound subject or verb: Rebecca and Jessica eat pizza. Rebecca

eats pizza and drinks soda.

Compound sentences: Rebecca eats pizza and she drinks soda. Rebecca eats pizza, but she

doesn’t drink soda. (Note the coordinate conjunctions, “and” and “but,” which signal a relationship

between the two independent clauses.)

Complex Sentences with Subordinate Clauses: Subordinate clauses are sentences within

sentences. They can be introduced with a subordinate conjunction that expresses the

relationship between the main clause and the subordinate clause. Rebecca eats pizza because

she likes it. Rebecca drinks soda after she eats the pizza. Rebecca drinks soda when she eats

pizza. Rebecca likes pizza better than Jessica does. (In this example, note that “Jessica” is

the subject of the subordinate clause, and “does” takes the place of “likes pizza.”) Other

examples: Mary stayed home from school because she felt sick. After the students returned

from gym class, the alarm sounded for a fire drill. Katie held the door open while the

students filed out. (Note again that the subordinate conjunctions, when, better than, because,

after, while, indicate a relationship between the main and subordinate clauses.)

Complex sentences with relative clauses, including deleted relative pronouns, e.g., The man

driving the car ran the stop sign. The man [who was] driving the car ran the stop sign. Mario

read the instructions to Al, who carried out the experiment.

Negation

Negation can occur in independent and dependent clauses:

Rebecca doesn’t like pizza, but she likes seafood.

Rebecca likes pizza, but she doesn’t like seafood.

Rebecca doesn’t like pizza, and she doesn’t like seafood either.

Mary stayed home from school because she didn’t feel well.

Mary didn’t stay home from school even though she didn’t feel well.

The placement of the negation indicates which part of a complex sentence is negated. Consider:

It’s not important that you speak to the school board.

It’s important that you not speak to the school board.


Indirect Speech

Indirect speech can be difficult for the English learner. Dependent clauses in indirect speech are introduced with “for” or “to.” John asked Sally to open the window. Robert asked for the waiter to bring the check. (In the latter case, he didn’t speak directly to the waiter.) John told us to go ahead. John said for us to go ahead. Using the “for” or the “to” construction depends on the main verb, tell or say, which are semantically similar but occur in different syntactic contexts.

Vocabulary, or “lexicon,” consists of the words of the language. Words fall into several common

so-called “parts of speech”:

Nouns

Verbs

Adjectives

Adverbs

Prepositions

Pronouns

Articles

Conjunctions

ELL students acquire a great deal of vocabulary without instruction, particularly vocabulary that they frequently hear, words that represent tangible or concrete experiences, or words that relate to the students’ immediate experiences.

ELL students often use relatively general words, and often, teachers use simplified vocabulary to

make meaning more comprehensible. However, ELL students need to learn the subtle

distinctions of vocabulary, e.g., look, stare, glare, gaze, peer, watch, see.

Two-word verbs may challenge ELL students because they can resemble verb + preposition but

mean different things: Look up a word v. Look up a chimney. Get on the bus v. Get on with your

business.

Language arts classes cover such prefixes as un-, mis- and re-. However, many words such

as prepositions can serve as prefixes to create new words: outshine, outrun, overeat, overdo,

overreact, underachieve, undercut.

Morphology refers to the components of words, such as their base forms, prefixes, suffixes, and

inflectional and derivational endings, and even changes in the base forms themselves to indicate

syntactic roles such as tense (am v. was, eat v. ate, etc.). Common morphemes include:

Third-person –s

Other inflections for person, e.g., am, is, are

Plural –s or –es


Other inflections for number, e.g., ox, oxen

Tense and aspect markers, e.g., -ed, -en, -ing

Derivational suffixes, e.g., -er, -ing, -able

Illocutionary competencies refer to the ability to use English, applying correct forms, to

communicate or understand communication. Illocutionary competencies that may appear on the

ELPA are ideational and manipulative functions.

Ideational functions communicate ideas from one person to another, e.g., describing actions,

expressing likes and dislikes, comparing and contrasting, explaining, defining, cause and effect,

and sequencing. Those are listed in the standards document. Ideational functions are prevalent in

instruction. Examples of language forms that can occur in ideational functions include big, bigger

than, less than, similar to, and different from, for comparing and contrasting; prefer and would

rather for expressing likes and dislikes; because, as a result, for cause and effect; before, after,

having completed, for sequencing or describing temporal relations.

Manipulative functions are the use of language to get something done or influence behavior,

such as requesting or giving instructions. Language forms that occur in manipulative functions

might include the imperative, e.g., Sit down. Other forms can also be used, such as Would you

please, I’d like for you to, Why don’t you, and many others.


APPENDIX F


APPENDIX G

STANDARD ERROR OF MEASUREMENT (SEM)

TOTAL PROFICIENCY

[Figure: SEM conditioned on the composite score, Grade Band K to 1. x-axis: Composite Score; y-axis: SEM.]

[Figure: Test SEM conditioned on composite score, Grade Band 2 to 3. x-axis: Composite Score; y-axis: SEM.]

[Figure: SEM conditioned on composite score, Grade Band 4 to 5. x-axis: Composite Score; y-axis: SEM.]

[Figure: SEM conditioned on composite scores, Grade Band 6 to 8. x-axis: Composite Score; y-axis: SEM.]

[Figure: SEM conditioned on composite score, Grade Band 9 to 12. x-axis: Composite Score; y-axis: SEM.]

LISTENING

[Figure: SEM conditioned on listening score, Grade Band K to 1. x-axis: Listening Score; y-axis: SEM.]

[Figure: SEM conditioned on listening score, Grade Band 2 to 3. x-axis: Listening Score; y-axis: SEM.]

[Figure: SEM conditioned on listening score, Grade Band 4 to 5. x-axis: Listening Score; y-axis: SEM.]

[Figure: SEM plot with no caption in the source. x-axis: Listening Score; y-axis: SEM.]

[Figure: SEM plot with no caption in the source. x-axis: Listening Score; y-axis: SEM.]

SPEAKING

[Figure: SEM conditioned on speaking score, Grade Band K to 1. x-axis: Speaking Score; y-axis: SEM.]

[Figure: SEM conditioned on speaking score, Grade Band 2 to 3. x-axis: Speaking Score; y-axis: SEM.]

[Figure: SEM conditioned on the speaking score, Grade Band 4 to 5. x-axis: Speaking Score; y-axis: SEM.]

[Figure: SEM plot with no caption in the source. x-axis: Speaking Score; y-axis: SEM.]

[Figure: SEM plot with no caption in the source. x-axis: Speaking Score; y-axis: SEM.]

READING

[Figure: SEM conditioned on reading score, Grade Band K to 1. x-axis: Reading Score; y-axis: SEM.]

[Figure: SEM conditioned on the reading score, Grade Band 2 to 3. x-axis: Reading Score; y-axis: SEM.]

[Figure: SEM conditioned on the reading score, Grade Band 4 to 5. x-axis: Reading Score; y-axis: SEM.]

[Figure: SEM conditioned on reading score, Grade Band 6 to 8. x-axis: Reading Score; y-axis: SEM.]

[Figure: SEM conditioned on reading score, Grade Band 9 to 12. x-axis: Reading Score; y-axis: SEM.]

WRITING

[Figure: SEM conditioned on writing score, Grade Band K to 1. x-axis: Writing Score; y-axis: SEM.]

[Figure: SEM conditioned on writing score, Grade Band 2 to 3. x-axis: Writing Score; y-axis: SEM.]

[Figure: SEM plot with no caption in the source. x-axis: Writing Score; y-axis: SEM.]

[Figure: SEM plot with no caption in the source. x-axis: Writing Score; y-axis: SEM.]

[Figure: SEM plot with no caption in the source. x-axis: Writing Score; y-axis: SEM.]

COMPREHENSION

(Listening and Reading)

[Figure: SEM conditioned on comprehension score, Grade Band K to 1. x-axis: Comprehension Score; y-axis: SEM.]

[Figure: SEM conditioned on comprehension score, Grade Band 2 to 3. x-axis: Comprehension Score; y-axis: SEM.]

[Figure: SEM plot with no caption in the source. x-axis: Comprehension Score; y-axis: SEM.]

[Figure: SEM plot with no caption in the source. x-axis: Comprehension Score; y-axis: SEM.]

[Figure: SEM conditioned on comprehension score, Grade Band 9 to 12. x-axis: Comprehension Score; y-axis: SEM.]

APPENDIX H

Executive Summary of Oregon English Language Proficiency

Examination Operational Item Analysis

Background on Oregon’s English Language Proficiency Assessment

Oregon’s English Language Proficiency Examination (ELPA) assesses students in kindergarten through grade 12. It measures reading, listening, writing, and speaking, and reports a calculated comprehension score (combining reading and listening) and a composite score across all domains. The assessment employs two general item types: selected response (including multiple-choice, picture-click, and cloze items) and constructed response (including elicited-imitation, short-answer, word-builder, and extended-response items). Selected-response items provide multiple potential responses from which to choose. Constructed-response items essentially allow a free response, which is scored against an established rubric. Rubrics can be dichotomous (correct or incorrect) or polytomous (multiple possible score points).

ELPA is administered as a two-stage computer-adaptive multistage (ca-MST) test (Luecht & Nungester, 1998, 2000; Luecht, 2004) structured for each of five grade bands (K-1, 2-3, 4-5, 6-8, and 9-12). The test opens with a 30-item locator block that presents exactly the same questions to all students within a grade band. Responses to the locator block are scored so that the examinee is routed immediately to one of three leveled follow-on tests of 50 items each for that grade band. With this design, the test can achieve greater precision than a single test administered to all students within a grade band, and with far fewer items than would otherwise be necessary.
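The two-stage routing described above can be sketched as a simple lookup. This is only an illustration: the report does not give the routing rule, so the number-correct scoring of the locator, the cut scores (10 and 20), and the form labels are hypothetical placeholders.

```python
from bisect import bisect_right

LOCATOR_ITEMS = 30            # locator block length stated in the report
FORMS = ("A", "B", "C")       # hypothetical labels for the three leveled follow-on tests


def route(locator_score, cut_scores=(10, 20)):
    """Return the follow-on form for a locator number-correct score.

    cut_scores are HYPOTHETICAL thresholds; the report does not state the
    actual routing rule. Scores below the first cut go to the easiest form,
    scores at or above the last cut go to the hardest form.
    """
    if not 0 <= locator_score <= LOCATOR_ITEMS:
        raise ValueError("locator score out of range")
    return FORMS[bisect_right(cut_scores, locator_score)]


if __name__ == "__main__":
    for score in (4, 15, 27):
        print(score, "->", route(score))
```

Each examinee therefore answers 80 items in total (30 locator items plus 50 follow-on items), while the item pool for a grade band spans all three difficulty levels.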

Using techniques collectively described as Item Response Theory, all of the tests can be placed

onto a single scale. Item Response Theory provides a means of placing each item on the test onto a scale of difficulty. Equating across tests is done by including common items across the

tests and fixing the scale at the difficulty point for each of these common items.

Item Analysis Methods Applied to ELPA

Two types of item analyses were performed for the 494 operational items used during the Spring

2007 ELPA administration: (1) a modified classical item analysis and (2) a concurrent IRT calibration using the Rasch model. The reason for these two analyses is explained below.

Modified classical item analysis served to evaluate patterns of distractors for selected response items and frequency of scoring patterns for the constructed response items. The principal

modification of this approach was the use of an external proficiency score as a grouping variable

for item-test correlations. This was essential because the number-correct total score is confounded with item difficulty under the ca-MST design. This item-difficulty confounding can

also carry over to the item statistics produced during a classical item analysis. Difficult items may appear easier because they are administered only to higher-proficiency examinees, and easier items may appear more difficult because they are administered only to lower-proficiency examinees, so a typical analysis tends to deflate item means. Item standard deviations likewise are “range restricted,” and the associated item-test correlations are similarly systematically reduced. To avoid some of these variances and range restrictions, IRT scores based on a concurrent calibration of all operational items were used in conjunction with the item analysis. A special, modified version of the Classical ITem ANalysis (CITAN; Luecht, 2005) program was used for the operational analyses that included conditioning on external scores (in this case, the estimated IRT θ scores from the Spring 2006 examinee sample). A total of 494 items were analyzed for 62,296 examinees. The item analysis comprises two components: the classical item analysis and the IRT-based WinSteps (Linacre, 2006) Rasch calibration analysis. The Rasch calibration provides an indication of item difficulty, independent of the ca-MST design pathways or routes, and also provides various fit analyses.

Modified Classical Item Analysis

A sparse 62,296 by 494 matrix (rows=examinees, columns=items) of raw responses was analyzed using a modified version of CITAN (Luecht, 2005). CITAN provides a classical item and test analysis, including distractor analysis and high-low group statistics. For this analysis, the program was modified to input an estimated proficiency score, θ, for each examinee. The proficiency scores were obtained from a concurrent, local Rasch calibration of all 62,296 examinees and all 494 items using WinSteps (Linacre, 2006). The estimated proficiency scores were used in place of the number-correct total test scores for all score groupings and for computing all item-test correlations (see footnote 1). These proficiency scores are summarized in Table 1.

Table 1. Summary of Proficiency Scores (N=62,296)

Statistic Value

N (Examinees) 62296

Mean 0.86

Std. Deviation 1.26

Variance 1.58

Skewness -0.32

Kurtosis -0.45

Minimum -5.55

Maximum 5.48

These values match the results from the WinSteps calibration (see footnote 2) and, as noted above, were used by the modified version of the CITAN item analysis software to form score groupings and to compute all item-test correlations.

The classical item statistics are summarized in Table 2, reported by item type and then

aggregated for all 494 items. Item type codes are: CZ=cloze; EI=elicited imitation; ER=extended response; MC=multiple choice; PC=picture click; S2=short answer; and WB=word

builder items.

1 The point-biserial correlations produced by CITAN matched the WinSteps point-biserial correlations exactly.

2 Examinees with extreme values are trimmed from the WinSteps summary report. Extreme scores are assigned by the software for examinees with near-perfect or near-null total-test scores. All examinees are summarized in Table 1, including examinees with extreme scores.


Table 2. Summary of Item Statistics by Item Type and For All Items (n=494)

Type / Statistic    Item Mean    Item SD Min. Score Max. Score    r(pbis)     r(bis)         Np

CZ (Item Count 38)
  Minimum               0.177      0.262          0          1      0.163                  1002
  Maximum               0.926      0.500          0          1      0.565                 23944
  Mean                  0.611      0.451          0          1      0.402             10178.658
  Std. Dev.             0.180      0.056          0          0      0.109              5125.450

EI (Item Count 18)
  Minimum               0.170      0.376          0          1      0.257                  5365
  Maximum               0.804      0.500          0          1      0.433                 45495
  Mean                  0.512      0.463          0          1      0.334             24255.556
  Std. Dev.             0.191      0.039          0          0      0.049             12388.371

ER (Item Count 24)
  Minimum               1.293      0.528          0          3      0.326                  1824
  Maximum               2.403      0.976          0          3      0.535                 17942
  Mean                  1.803      0.743          0          3      0.436              9229.917
  Std. Dev.             0.261      0.124          0          0      0.053              3710.058

MC (Item Count 304)
  Minimum               0.161      0.178          0          1      0.015      0.021        900
  Maximum               0.967      0.500          0          1      0.662      0.857      36470
  Mean                  0.622      0.448          0          1      0.358      0.480   9349.313
  Std. Dev.             0.175      0.060          0          0      0.117      0.162   5972.248

PC (Item Count 66)
  Minimum               0.192      0.278          0          1      0.111      0.186        900
  Maximum               0.916      0.500          0          1      0.642      0.805      16897
  Mean                  0.660      0.437          0          1      0.346      0.464   6577.682
  Std. Dev.             0.173      0.061          0          0      0.128      0.156   5118.222

S2 (Item Count 12)
  Minimum               1.501      0.334          0          2      0.305                  9472
  Maximum               1.903      0.727          0          2      0.404                 21148
  Mean                  1.751      0.483          0          2      0.364             11849.667
  Std. Dev.             0.132      0.126          0          0      0.034              4419.253

WB (Item Count 32)
  Minimum               0.170      0.339          0          1      0.158                  2176
  Maximum               0.868      0.500          0          1      0.611                 15743
  Mean                  0.494      0.471          0          1      0.373              7988.219
  Std. Dev.             0.166      0.040          0          0      0.110              4789.984

Total (Item Count 494)
  Minimum               0.161      0.178          0          1      0.015      0.021        900
  Maximum               2.403      0.976          0          3      0.662      0.857      45495
  Mean                  0.699      0.464          0      1.121      0.364      0.477   9552.721
  Std. Dev.             0.356      0.091          0      0.452      0.114      0.161   6677.238

As shown, sample sizes (Np) ranged from 900 to 45,495 valid responses per item; the average number of examinee responses per item was approximately 9,553. The means and standard deviations of the item scores shown in the “Total” block should be interpreted cautiously, since that block pools selected-response items (scored 0 or 1) with constructed-response items, whose raw score points range from 0 to 1, 0 to 2, or 0 to 3. The intersections of the rows labeled “Minimum” and “Maximum” with the columns labeled “Min. Score” and “Max. Score” give the range of scores for each item type.


Two item-test correlations are reported. The point-biserial correlations (rpbis) are Pearson product-moment correlations between the item score and the proficiency score. The biserial correlations (rbis) are reported only for the MC and PC selected-response items. Point-biserial correlations are typically lower than the corresponding biserial correlations. On average, the point-biserial correlations are fairly consistent across item types. Among the dichotomously scored item types, the cloze items show the highest mean point-biserial correlation (0.402) by a nominal margin; the extended-response items have the highest mean overall (0.436). Items with item-test score correlations of less than 0.10 should be investigated on an individual basis.
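As a rough illustration of the item-test correlation described above, the sketch below computes a Pearson (point-biserial) correlation between one item's scores and an external proficiency score, skipping examinees who were not administered the item. It is not the CITAN implementation; the function names, the use of None for unadministered items, and the sample data are assumptions made for the example.

```python
import math


def point_biserial(item_scores, proficiency):
    """Pearson correlation between item scores and an external proficiency score.

    Entries of item_scores that are None (item not administered, as happens
    in a sparse multistage response matrix) are skipped.
    """
    pairs = [(x, t) for x, t in zip(item_scores, proficiency) if x is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two administered responses")
    mean_x = sum(x for x, _ in pairs) / n
    mean_t = sum(t for _, t in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    stt = sum((t - mean_t) ** 2 for _, t in pairs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in pairs)
    if sxx == 0 or stt == 0:
        raise ValueError("correlation undefined when a variable has zero variance")
    return sxt / math.sqrt(sxx * stt)


def needs_review(r, threshold=0.10):
    """Flag an item whose item-test correlation falls below the review threshold."""
    return r < threshold


if __name__ == "__main__":
    # Illustrative data only: four administered responses, one skipped.
    scores = [0, None, 0, 1, 1]
    theta = [-1.0, 9.9, 0.0, 0.5, 1.5]
    r = point_biserial(scores, theta)
    print(f"r(pbis) = {r:.3f}, review: {needs_review(r)}")
```

Using the proficiency estimate rather than the number-correct total as the second variable is what keeps the correlation comparable across examinees who took different follow-on forms.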

Twenty-two items were flagged as having at least one distractor other than the correct-answer

key showing a positive correlation with the proficiency scores, with the majority being multiple-choice items. The data suggest that, for the majority of those items, the positive non-key

distractor correlation was only nominally above zero. Nonetheless, the data indicate that these items should be substantively reviewed.

In general, the classical item analysis results suggest that the 494 operational items are performing reasonably well. For the flagged items, the specific distractors might be reviewed to find an item-writing fix that removes the positive correlations between distractors and the total test proficiency scores.

IRT Analysis

A concurrent (all grade bands, K-12) local calibration of all 494 operational items was conducted in WinSteps (Linacre, 2006). A total of 62,296 examinees were included in the calibration. Raw score groupings and recoding of the ordered response categories were done within WinSteps for the extended response (ER) items. ER items are normally scored on a 0 to 3 point scale. Some items are scored on two different criteria: grammatical aspects (g-scored) and illocutionary aspects (i-scored). For the g-scored ER items, the recoding was Xi = {0,1,2,3} → {0,1,1,2}. For the i-scored ER items, the recoding was Xi = {0,1,2,3} → {0,0,1,2}. This recoding was determined as part of the calibration of the spring 2006 data.
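The category recoding for the extended response items amounts to a lookup from raw score to collapsed score. The sketch below shows one way to express it; the recoding itself was done inside WinSteps, so the function and variable names here are illustrative assumptions only.

```python
# Raw 0-3 rubric scores mapped to collapsed categories, as stated in the text.
G_SCORED_RECODE = {0: 0, 1: 1, 2: 1, 3: 2}   # grammatical aspects
I_SCORED_RECODE = {0: 0, 1: 0, 2: 1, 3: 2}   # illocutionary aspects


def recode_er(raw_scores, scoring):
    """Recode raw ER scores; scoring is "g" or "i". None (not administered) passes through."""
    table = {"g": G_SCORED_RECODE, "i": I_SCORED_RECODE}[scoring]
    return [None if s is None else table[s] for s in raw_scores]


if __name__ == "__main__":
    print(recode_er([0, 1, 2, 3], "g"))
    print(recode_er([0, 1, 2, 3], "i"))
```

In both cases two adjacent raw categories are merged, so each recoded ER item has three ordered categories (0, 1, 2) instead of four.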

Results from this analysis indicate that the ER items are the most difficult, with positive mean b-values. The short-answer (S2) items are the easiest, and the remaining item types are moderately difficult (mean b near zero). To interpret this mean difficulty, consider that the average proficiency score for all 62,296 examinees is 0.86 on the θ metric. That translates to a probability of approximately 0.70 of correctly answering an item of average difficulty on the ELPA.
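The 0.70 figure can be checked against the dichotomous Rasch model, in which the probability of a correct response is 1 / (1 + exp(-(θ - b))). The sketch below assumes that an "average item" has difficulty b = 0 on the logit scale; that value is an assumption for illustration, not a number reported here.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


if __name__ == "__main__":
    mean_theta = 0.86        # mean proficiency from Table 1
    average_item_b = 0.0     # ASSUMED difficulty of an average item
    p = rasch_probability(mean_theta, average_item_b)
    print(f"P(correct) = {p:.2f}")
```

With these inputs the probability comes out at about 0.70, which agrees with the interpretation given in the text.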

MS(Infit), a mean-square statistic derived during IRT analysis, denotes the fit of the response data to the Rasch model and is most sensitive where the density of examinee scores is highest. A second statistic, MS(Outfit), indicates the fit of the Rasch model to the data for examinees who are located farther away from the item location (difficulty). Of the two measures, MS(Infit) is generally preferred because it tells us which items potentially misfit the calibration model for a majority of the examinees. In general, values of either statistic in the range 0.7 to 1.3 are considered to indicate a good-fitting item.
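For a dichotomous item, the two mean squares are commonly defined as follows: the outfit mean square is the average squared standardized residual, and the infit mean square is the sum of squared residuals divided by the sum of the model variances. The sketch below uses those textbook definitions; it is not taken from the WinSteps output in this report, and polytomous items such as the ER items would need category-based expectations and variances instead.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def fit_mean_squares(responses, thetas, b):
    """Return (infit, outfit) mean squares for one dichotomous item.

    responses: 0/1 scores of the examinees who took the item
    thetas:    their proficiency estimates
    b:         the item difficulty
    """
    resid_sq_sum = 0.0   # sum of squared raw residuals
    var_sum = 0.0        # sum of model variances p(1 - p)
    z_sq_sum = 0.0       # sum of squared standardized residuals
    n = 0
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, b)
        var = p * (1.0 - p)
        resid_sq = (x - p) ** 2
        resid_sq_sum += resid_sq
        var_sum += var
        z_sq_sum += resid_sq / var
        n += 1
    if n == 0:
        raise ValueError("no responses")
    return resid_sq_sum / var_sum, z_sq_sum / n


def is_good_fit(mean_square, low=0.7, high=1.3):
    """Apply the 0.7 to 1.3 range given in the text."""
    return low <= mean_square <= high


if __name__ == "__main__":
    # Illustrative data only.
    infit, outfit = fit_mean_squares([1, 0, 1, 1], [0.5, -0.5, 1.0, 0.0], b=0.0)
    print(f"infit={infit:.2f} ok={is_good_fit(infit)}; outfit={outfit:.2f} ok={is_good_fit(outfit)}")
```

Because the outfit statistic weights every examinee equally, a single unexpected response from an examinee far from the item's difficulty can inflate it sharply, whereas the infit statistic down-weights such responses through the variance term.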

Review of these statistics reveals that, of the 458 items in score-group A, most fit quite well. This is encouraging, given that the concurrent calibration puts all examinees, K-12, taking the reading, listening, writing, and speaking items, on a common scale. Several of the extended response items exhibit a small degree of misfit, as do the short-answer items. The two fit indices were highly correlated. Only six items exhibited MS(Infit) values outside the “good” range.


Discussion

In general, the operational items performed quite well. The modified classical item analysis flagged several items as potentially too easy or too difficult, in addition to a number of selected response items with slightly positive correlations between an incorrect distractor and the total test score. The Rasch IRT analysis suggested a reasonable range of item difficulties, with the extended response items being the most difficult. The item misfit analysis highlighted six items as having MS(Infit) values outside the “good” range (0.7 to 1.3). All were extended response items, but none of the misfit was overly extreme.
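The distractor check mentioned above can be illustrated with a short sketch: for each incorrect option, correlate an indicator of choosing that option with the total test score, and flag any positive value. This is an assumed reading of the classical analysis, not the report's actual procedure; the function names and the data are made up.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def distractor_correlations(choices, totals, key):
    """Correlate selection of each incorrect option with the total score."""
    result = {}
    for option in sorted(set(choices) - {key}):
        chose = [1 if c == option else 0 for c in choices]
        result[option] = pearson(chose, totals)
    return result


# Made-up data for illustration only: option chosen and total test score.
choices = ["A", "B", "A", "C", "A", "B"]
totals = [30, 24, 28, 10, 15, 29]
correlations = distractor_correlations(choices, totals, key="A")
flags = {opt: r for opt, r in correlations.items() if r > 0}
```

In this made-up example, option B is flagged because the examinees who chose it have above-average total scores, which is the pattern the analysis treats as a warning sign for a distractor.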


APPENDIX I

ITEM DISTRIBUTION

FORMS AND FUNCTIONS

The conceptual framework for the Oregon ELP Assessment is based on research in the fields of Education and Applied Linguistics and into the English Language Acquisition process. After a great deal of research into current linguistic models, Oregon has adopted a framework which focuses on two major components of language competence: Grammatical Competence and Illocutionary Competence. Each of these is further sub-divided, resulting in a total of five assessable components of language competence:

Grammatical Competence (Forms of Language)

1. Morphology

2. Vocabulary

3. Syntax

Illocutionary Competence (Functions of Language)

4. Ideational [replaces original’s ‘Representative’]

5. Manipulative

The tables below show expected item distributions. Distribution of items among sub-domains is fixed so that each has an equal number of items. This is because the design must guarantee a usable sub-score for each sub-domain required by Title III of NCLB.

Grade Band K-1 (Form A—Beginning/Easy)

Subdomain | Morphology | Vocabulary | Syntax | Ideational | Manipulative | TOTAL ITEMS (rows omit empty cells)

Listening 8 10 2 20

Reading 19 2 21

Speaking 1 8 9

Writing 16 16

Total Items 1 43 8 12 2 66
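The counts in a table like the one above can be checked for internal consistency with a few lines of arithmetic. The sketch below copies the Grade Band K-1 (Form A) values as printed and treats the last value in each row as that row's total; the variable names are illustrative.

```python
# Counts copied from the Grade Band K-1 (Form A) table. The last value in
# each row is that row's TOTAL ITEMS entry; empty cells are not represented.
rows = {
    "Listening": [8, 10, 2, 20],
    "Reading": [19, 2, 21],
    "Speaking": [1, 8, 9],
    "Writing": [16, 16],
}
total_row = [1, 43, 8, 12, 2, 66]

# Each row's counts should add up to its stated total.
rows_consistent = all(sum(r[:-1]) == r[-1] for r in rows.values())

# The column totals and the row totals should agree on the grand total.
grand_total = sum(r[-1] for r in rows.values())
totals_consistent = sum(total_row[:-1]) == total_row[-1] == grand_total
```

The same check can be applied to any table in this appendix that includes a Total Items row.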


Grade Band K-1 (Form B—Medium)

Subdomain | Morphology | Vocabulary | Syntax | Ideational | Manipulative | TOTAL ITEMS (rows omit empty cells)

Listening 8 10 2 20

Reading 22 3 25

Speaking 11 11

Writing 2 12 14

Total Items 2 42 11 13 2 70

Grade Band K-1 (Form C—Hard)

Subdomain | Forms-Grammatical Competence (Morphology | Vocabulary | Syntax) | Ideational | Manipulative | TOTAL ITEMS (rows omit empty cells)

Listening 9 9 2 20

Reading 3 21 2 26

Speaking 10 10

Writing 1 14 15

Total Items 4 44 10 11 2 71


Grade Band 2-3 (Form A—Beginning/Easy)

Subdomain | Forms-Grammatical Competence (Morphology | Vocabulary | Syntax) | Ideational | Manipulative | TOTAL ITEMS (rows omit empty cells)

Listening 7 14 21

Reading 3 13 5 21

Speaking 8 8

Writing 6 4 10 20

Total Items 9 24 8 29 70

Grade Band 2-3 (Form B—Medium)

Subdomain | Forms-Grammatical Competence (Morphology | Vocabulary | Syntax) | Ideational | Manipulative | TOTAL ITEMS (rows omit empty cells)

Listening 7 14 21

Reading 4 11 9 24

Speaking 12 1 13

Writing 6 3 9 18

Total Items 10 21 12 33 76


Grade Band 2-3 (Form C—Hard)

Subdomain | Morphology | Vocabulary | Syntax | Ideational | Manipulative | TOTAL ITEMS (rows omit empty cells)

Listening 9 13 22

Reading 3 12 6 21

Speaking 12 1 13

Writing 6 3 12 21



Subdomain | Morphology | Vocabulary | Syntax | Ideational | Manipulative | TOTAL ITEMS (rows omit empty cells)

Listening 5 18 23

Reading 4 17 21

Speaking 6 6

Writing 7 1 13 21




Subdomain | Morphology | Vocabulary | Syntax | Ideational | Manipulative | TOTAL ITEMS (rows omit empty cells)

Listening 6 17 23

Reading 6 14 20

Speaking 8 2 10

Writing 9 13 22



Subdomain | Morphology | Vocabulary | Syntax | Ideational | Manipulative | TOTAL ITEMS (rows omit empty cells)

Listening 3 17 20

Reading 6 18 24

Speaking 8 2 10

Writing 7 16 23




Subdomain | Morphology | Vocabulary | Syntax | Ideational | Manipulative | TOTAL ITEMS (rows omit empty cells)

Listening 10 12 22

Reading 6 13 19

Speaking 9 9

Writing 11 1 10 22



Subdomain | Morphology | Vocabulary | Syntax | Ideational | Manipulative | TOTAL ITEMS (rows omit empty cells)

Listening 9 12 21

Reading 9 12 21

Speaking 10 2 12

Writing 9 14 23




Subdomain | Morphology | Vocabulary | Syntax | Ideational | Manipulative | TOTAL ITEMS (rows omit empty cells)

Listening 8 12 20

Reading 9 15 24

Speaking 10 2 12

Writing 5 1 16 22



Subdomain | Morphology | Vocabulary | Syntax | Ideational | Manipulative | TOTAL ITEMS (rows omit empty cells)

Listening 6 15 1 22

Reading 4 18 22

Speaking 6 6

Writing 4 3 15 22

Total Items 4 13 6 48 1 72



Subdomain | Morphology | Vocabulary | Syntax | Ideational | Manipulative | TOTAL ITEMS (rows omit empty cells)

Listening 3 18 1 22

Reading 6 16 22

Speaking 8 2 10

Writing 3 2 17 22



Subdomain | Morphology | Vocabulary | Syntax | Ideational | Manipulative | TOTAL ITEMS (rows omit empty cells)

Listening 3 17 1 21

Reading 4 17 21

Speaking 9 2 11

Writing 4 1 18 23

Total Items 4 8 9 54 1 76