Page 1: Test standardization

Test Standardization

Page 2: Test standardization

Standardization is the process of trying out the test on a group of people to see the scores that are typically obtained. Standardization provides a mean (average) and a standard deviation (spread) relative to a certain group. When an individual takes the test, she can determine how far above or below the average her score is, relative to the normative group.
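
For example (with invented numbers), if the normative group's mean is 100 and its standard deviation is 15, a score of 130 lies two standard deviations above the average. A minimal Python sketch of this comparison:

# Hypothetical norms from a standardization group (values invented for illustration).
mean, sd = 100.0, 15.0
score = 130.0                # one individual's score on the same test

z = (score - mean) / sd      # standard deviations above (+) or below (-) the mean
print(f"z = {z:+.2f}")       # prints: z = +2.00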

A standardized test is a test administered and scored in a consistent manner. Tests are designed in such a way that the "questions, conditions for administering, scoring procedures, and interpretations are consistent and are administered and scored in a predetermined, standard manner."

Page 3: Test standardization

Understanding Norms and Test Scores

Standardization is the process of testing a group of people to see the scores that are typically attained. With a standardized test, the participant can see where her score fell relative to the standardization group's performance.

In standardization, the normative group must reflect the population for which the test was designed. The group's performance is the basis for the test norms.

What is standardized testing? Standardized tests are tools designed to measure student performance relative to all others taking the same test.

Page 4: Test standardization

History of Standardized Testing

1909 - The Thorndike Handwriting Scale became the first popular standardized achievement test used in public schools.

1930 - Most schools in the United States and Canada were using some form of standardized testing.

1950 - A student would graduate from high school having taken perhaps three standardized tests; from then to the present, where students take between 18 and 21 tests, it is easy to believe that the "volume of testing has an annual growth rate of 10-20 percent."

1965 - Standardized tests were not used in the early grades, because these were considered years of growth and development.

1980 - Sixteen states, and districts in 21 others, required children to take a standardized test before entering kindergarten, and districts in at least 42 states required students to pass a standardized test before "graduating" from kindergarten.

Page 5: Test standardization

Types of Standardized Testing

Norm-referenced

Norm-referenced testing measures performance relative to all other students taking the same test. Use it when you want to know how a student compares to the rest.

Criterion-referenced

Criterion-referenced testing measures factual knowledge of a defined body of material. The multiple-choice tests people take to get their license, or a test on fractions, are both examples of this type of testing.

Page 6: Test standardization

Application in Classroom and Similar Settings

Standardized tests are intended to help a teacher, school, or district make decisions about what is working in the classroom, how to improve education, and how to help a specific student.

However, standardized test scores should not be the only thing a teacher, school, or district looks at when making a decision about programs or students. Other areas of consideration should be: observations in the classroom; evaluation of day-to-day class work, homework, and assignments; meetings with parents; and observation of student change and growth throughout the year.

Page 7: Test standardization

Establishing Test Validity

According to Calmorin, the degree of validity is the most important attribute of a test. Validity refers to the degree to which a test is capable of achieving certain aims. Validity must be determined with reference to the particular use for which the test is being considered, and it must always be judged in relation to the purpose the test serves. Validity is always specific to some definite situation: a test is never valid in general, only for a particular purpose.

Page 8: Test standardization

Item Analysis

It is done after the first tryout of the test. One method of conducting item analysis is the U-L Index Method; a computational sketch follows the steps below.

1. The teacher scores the papers and ranks them from highest to lowest according to total score.

2. Separate the upper 27% and lower 27% of the papers.

3. Tally the responses made to each test item by each student in the upper 27%, then do the same for the lower 27%.

4. Compute the percentage of the upper group that got the item right. This is called U.

5. Compute the percentage of the lower group that got the item right. This is called L.

6. Average the U and L percentages. The result is the difficulty index.

7. Subtract the L percentage from the U percentage. The result is the discrimination index.
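
A minimal Python sketch of the U-L procedure above; the papers (a total score plus whether the examinee got the analyzed item right) are invented for illustration:

# Each paper: (total score, answered the analyzed item correctly?) -- invented data.
papers = [(48, True), (45, True), (44, True), (40, True), (38, False), (35, True),
          (33, False), (30, False), (27, False), (22, False), (20, False)]

papers.sort(key=lambda p: p[0], reverse=True)   # step 1: rank highest to lowest
n = max(1, round(0.27 * len(papers)))           # step 2: size of the 27% groups
upper, lower = papers[:n], papers[-n:]

U = sum(correct for _, correct in upper) / n    # step 4: upper-group proportion right
L = sum(correct for _, correct in lower) / n    # step 5: lower-group proportion right

difficulty = (U + L) / 2                        # step 6: difficulty index
discrimination = U - L                          # step 7: discrimination index
print(f"difficulty = {difficulty:.2f}, discrimination = {discrimination:.2f}")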

Page 9: Test standardization

After the item analysis, the tester uses the following table of equivalents to interpret the difficulty index:

.00 - .20 - Very Difficult

.21 - .80 - Moderately Difficult

.81 - 1.00 - Very Easy
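
Expressed as a small helper (the function name is mine, not from the slides):

def interpret_difficulty(index: float) -> str:
    # Table of equivalents for the difficulty index, as given above.
    if index <= 0.20:
        return "Very Difficult"
    if index <= 0.80:
        return "Moderately Difficult"
    return "Very Easy"

print(interpret_difficulty(0.50))   # prints: Moderately Difficult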

Page 10: Test standardization

Item Revision

On the basis of the item analysis data, test items are revised for improvement. After revising the items that need it, the tester conducts another tryout: the revised test must be administered to the same set of samples.

Third tryout

After two revisions, the test is considered ready for its final form. The test is now acceptable in terms of its difficulty and discrimination indices. At this point, the test is ready for reliability testing.

Page 11: Test standardization

How to Establish Reliability

Reliability may be estimated through a variety of methods that fall into two types: single-administration and multiple-administration.

Multiple-administration methods require that two assessments be administered.

• Test-retest reliability

Estimated as the Pearson product-moment correlation coefficient between two administrations of the same measure. This is sometimes known as the coefficient of stability.

• Alternative-forms reliability

Estimated as the Pearson product-moment correlation coefficient between two different forms of a measure, usually administered together. This is sometimes known as the coefficient of equivalence.
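
Both multiple-administration estimates come down to a Pearson product-moment correlation between two score vectors. A sketch using Python's standard library (statistics.correlation requires Python 3.10+; the scores are invented):

from statistics import correlation   # Pearson r by default (Python 3.10+)

# Five examinees measured twice: two administrations of the same test
# (test-retest) or two parallel forms given together (alternative forms).
first  = [12, 18, 25, 30, 34]
second = [14, 17, 27, 29, 35]

r = correlation(first, second)       # coefficient of stability / equivalence
print(f"reliability estimate r = {r:.3f}")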

Page 12: Test standardization

Single-administration methods include split-half and internal consistency.

Split-half reliability

Treats the two halves of a measure as alternative forms. This half-test reliability estimate is then stepped up to the full test length using the Spearman-Brown prediction formula. This is sometimes referred to as the coefficient of internal consistency.

Internal consistency

The usual measure is Cronbach's alpha, which can be interpreted as the mean of all possible split-half coefficients. Cronbach's alpha is a generalization of an earlier form of estimating internal consistency, Kuder-Richardson Formula 20.
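
A sketch of Cronbach's alpha computed from an item-score matrix (rows are examinees, columns are items; the data are invented):

from statistics import pvariance

items = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
k = len(items[0])                                    # number of items
item_vars = [pvariance(col) for col in zip(*items)]  # variance of each item
total_var = pvariance([sum(row) for row in items])   # variance of total scores

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha = {alpha:.3f}")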

Page 13: Test standardization

Reliability Estimation Using a Split-Half Methodology

The split-half design in effect creates two comparable test administrations. The items in a test are split into two tests that are equivalent in content and difficulty, often by splitting between odd- and even-numbered items. This assumes that the assessment is homogeneous in content.
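
A sketch of the odd/even split with the Spearman-Brown step-up (same invented matrix layout as before; correlation requires Python 3.10+):

from statistics import correlation

items = [                 # rows: examinees; columns: items
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
odd  = [sum(row[0::2]) for row in items]   # half-test scores: items 1, 3, 5
even = [sum(row[1::2]) for row in items]   # half-test scores: items 2, 4, 6

r_half = correlation(odd, even)            # correlation between the two halves
r_full = 2 * r_half / (1 + r_half)         # Spearman-Brown step-up to full length
print(f"half-test r = {r_half:.3f}, full-test estimate = {r_full:.3f}")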

Page 14: Test standardization

Estimating Reliability Using Kuder-Richardson Formula 20

The rationale for Kuder and Richardson's most commonly used procedure is roughly equivalent to:

Securing the mean inter-correlation of the k items in the test.

Considering this to be the reliability coefficient for the typical item in the test.

Stepping up this average with the Spearman-Brown formula to estimate the reliability coefficient of an assessment of k items.

Formula for Kuder-Richardson Formula 20:

KR-20 = (k / (k - 1)) × (1 - Σpq / SD²)

Where:

k - the number of items in the test

SD - the standard deviation of total test scores

p - the proportion of examinees who got an item correct

q - the proportion who got the item incorrect (q = 1 - p)
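
A sketch of KR-20 using the symbols above, on an invented 0/1 item matrix; for dichotomous items it agrees with Cronbach's alpha:

from statistics import pstdev

items = [                 # rows: examinees; columns: items scored 1/0
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
k = len(items[0])
n = len(items)

p  = [sum(col) / n for col in zip(*items)]   # proportion correct per item
q  = [1 - pi for pi in p]                    # proportion incorrect per item
SD = pstdev([sum(row) for row in items])     # standard deviation of total scores

kr20 = (k / (k - 1)) * (1 - sum(pi * qi for pi, qi in zip(p, q)) / SD**2)
print(f"KR-20 = {kr20:.3f}")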

Page 15: Test standardization

SUBMITTED BY:

Aileen B. Ferriols

SUBMITTED TO:

MRS. KATHERINE PARANGAT