東äșŹć€–ć›œèȘžć€§ć­Š ćšćŁ«ć­Šäœè«–æ–‡ Doctoral thesis (Tokyo University of Foreign Studies), repository.tufs.ac.jp

CHAPTER 4 RESEARCH DESIGN

     In the previous chapters, ways of approaching how reading ability could be defined from the perspective of test item specifications were explored. In Chapter 2 it was examined and emphasized that, in investigating the nature of reading tests in relation to the latent structure of reading ability, the scope of the present study is on the “product” of FL reading as a result of the FL reading “process.” Furthermore, Chapter 3 described a way in which a construct of reading ability could be defined by developing test items that elicit certain types of reading product in test takers’ reading comprehension. Reading “competence” was termed a facet that constitutes a major part of reading “performance,” and in defining the reading construct for the purpose of reading test item development, it was proposed that although a test item is defined as a tool which elicits a reading performance, that performance should be accepted as something that allows the testers to draw inferences and make generalizations about what sort of reading activities the test taker might be able to do. Furthermore, this should be considered analytically as an interaction of his competence and the context rather than as something holistic and content-representative. To continue along the same lines of approach, the significance of specifying the components of a test item, “question types” in particular, in operationalizing the reading construct to be tested was discussed. This was further explored by reflecting on item difficulty, or a quantitative aspect of a test item. The discussion concluded by suggesting a possibility of a link between the question type of a test item and its difficulty, which provides the following research questions for the present research.

4.1 Research Questions

Research Question 1

     Is it valid to employ ‘question types’ as a prime component that constructs test items used in eliciting test takers’ L2 reading performances?


     What are the factors that constitute the L2 reading performances of learners of English at secondary education in Japan, when they are extracted from factor analytic studies of reading products elicited using reading test items? Would they differ across learners with different reading abilities?

     In an attempt to come up with a test item specification that effectively operationalizes different reading performances to be tested, inspired by Negishi (1996) and Wada (2003), the present study proposes the ‘question type’ of a test item to be a prime component to constitute such a framework. At the same time, however, because Negishi (1996) and Wada (2003) had not accommodated the interactions of these constructing components with the latent reading structure of test takers, attention will be rendered to this aspect in much greater depth, as it is possible that the prime factors could change in accordance with the test takers’ reading abilities.

Research Question 2

     Is it valid to assume a certain relationship between question types and item difficulty in eliciting test takers’ L2 reading performances?

     Is the item difficulty of a test item, calibrated using Item Response Theory, affected by its question type? If so, how? Would this relationship differ across learners with different reading abilities?

     With an interest in suggesting the facets of a reading test item that would allow the writers of test items to predetermine the difficulty of a test item, the present study investigates the possibility of a link between the item difficulty of a test item and its question type. Attention will also be given to cases with different abilities of test takers to see if the orders of perceived difficulties across different question types differ according to the different ability groups of test takers.


4.2 Data Collection

4.2.1 Subjects

     A sample of 830 learners of English from senior high school and university in Japan participated in the main part of the present study. Of these, 280 were third-year high school students and 550 were first-year undergraduate students in university.

     The majority of the high school students had five years of English education under the Course of Study provided by the Ministry of Education, Culture, Sports, Science and Technology in a foreign language environment. They were told that the test was administered to collect data on individuals’ English proficiency. The students had five English classes a week; nothing was done in the classroom that would help the students prepare for the tests administered in this study.

     For the university students, the circumstances were the same as for the high school students except that the duration of time English was learned was mostly six years. All of the university students majored in one foreign language other than English and were given the test early in April, immediately after they had entered university, as a placement test for the English classes that were prerequisite in the university curriculum. This was to ensure that the test takers did not have any special knowledge of English or of any other academic field that would distort the outcome of data collection.

     There were some variations in both high school and university students’ backgrounds of how and how long English was learned (e.g. students who had overseas experiences); however, the variation in the number of years they had spent abroad or the intensity of how much English they had learned was so great that it was not possible to come up with any generalizable criterion for omitting the scores. Moreover, it could be assumed that those variations would be an inherent factor in learners’ reading ability that enables them to score high on the test, so the present author decided to disregard such factors in the process of data collection as long as they did not affect the distribution of scores too greatly.


4.2.2 Materials

     Two sets of test instruments were employed in the main study.

4.2.2.1 Test Set A

     Test Set A (presented in Appendix A) consists of nine passages, each passage with three multiple-choice test items (one correct option and three distracters provided) to be responded to on the basis of its comprehension. These nine passages were selected after an item selection was done in the pilot study, providing 27 reading test items. The features of these nine passages are as follows:

Table 41 The features of passages employed in Test Set A

TEXT Item REase Gr Level Words1 13 568 87 952 46 664 73 1093 79 552 10 1085 1315 652 96 1106 1618 648 75 957 1921 657 76 101

8 2224 681 74 1049 2527 558 84 97

10 2830 571 95 103

6168 844 10244

TeXt 4ïŒŒă€€as well as ltem 1011 and 12 are missing from the table because they were omitted after the

item SeleCtiOn

     All of the passages are taken from the Reading Comprehension Section (advanced level) of the Global Test of English Communication (GTEC) developed by Benesse Corporation. The present author determined GTEC to be an appropriate source of reading texts since it was designed to test the English proficiency of high-intermediate learners in senior high schools and universities in Japan, which is at an equivalent level to the subjects to be tested and also to what the Course of Study provided by the Ministry of Education, Culture, Sports, Science and Technology aims for.


     In Table 41‘‘RïŒŽă€€Ease”indicates the Flesch Reading Ease and‘‘GrïŒŽă€€Level”

indicates the FleschKincaid Grade Leve1ïŒŽă€€They both indicate a readability indexïŒŒă€€a

means of describing how easily written materials could be read and皿derstood

Although they employ the same core measuresword length and sentence lengthto

calculate the indexïŒŒă€€they have different weighting factorsïŒŒă€€which sometimes create

incoherence in the outcome of calculationsïŒŽă€€The indices provided by the Flesch

Reading Ease indicates the easiness of reading a passage from the scale of zero to one

h皿dredïŒŒă€€zero being the most difficult to one h皿dred being the easiest

FleschKincaid Grade Level expresses the readability in a grade level of US

educational systemïŒŒă€€making it easier to j udge the readability level of various books

and textsïŒŽă€€Observing these indices fbr the nine passages used in Test Set AïŒŒă€€the

present author assumes the diffriculty of passages were appropriate for the subj ects

and fbr the purpose of the present researchsee 431 for fUrther explanations on how

the subj ect groups were predetermined for the main study
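The two indices can be computed from three surface counts of a passage. The following is a minimal sketch: the formulas are the standard published Flesch and Flesch-Kincaid ones, and the word, sentence, and syllable counts in the example are hypothetical, not taken from the study’s passages.

```python
# Standard formulas behind the "R. Ease" and "Gr. Level" columns of
# Table 4.1. Inputs are plain counts; the example numbers are invented.

def flesch_reading_ease(words, sentences, syllables):
    # Scale of roughly 0 (hardest) to 100 (easiest)
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    # Readability expressed as a US school grade level
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# A hypothetical 95-word passage with 6 sentences and 130 syllables:
print(round(flesch_reading_ease(95, 6, 130), 1))   # higher = easier
print(round(flesch_kincaid_grade(95, 6, 130), 1))  # lower = easier
```

Because the two formulas weight the same word-length and sentence-length measures differently, two passages can be ranked differently by the two indices, which is the incoherence mentioned above.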

     The number of words in each passage was counted so as to regulate the characteristics of each passage. The present author selected passages that were around 100 words in total, considering the time constraints of testing environments. The numbers at the bottom indicate the means for each index.

     As for the three multiple-choice test items that were to be answered after reading each passage, the present author wrote the questions and four options. The validity of which question type (see 3.4.2 for detailed explanations) each item represented was checked by her colleagues (two teachers at a senior high school), and their assessments had a sufficient correlation of .76. For the items where disagreements were found, the items were discussed and revised so that all three people (the two colleagues and I) were satisfied with the decision.

     For each passage, the first item was written so that the question elicits a “global-inferential” comprehension of the passage. These were the items numbered 1, 4, 7, 13, 16, 19, 22, 25, and 28, and they asked for the main idea of the passage. For example, item 1 of Test Set A, “(1) What is the main idea of this passage?”, can be answered correctly if a test taker comprehends that the main idea of the passage is the


growing seam in the seafloor of the Atlantic Ocean. The wording and phrases used in each question may vary, but all nine questions (items 1, 4, 7, 13, 16, 19, 22, 25, and 28) are made to elicit the “global-inferential” type of reading.

     The second item was written so that the question asks for a “local-literal” comprehension. These were the items numbered 2, 5, 8, 14, 17, 20, 23, 26, and 29, and they asked for information which is directly interpreted from a relatively small amount of text source. With regard to the first passage which appears in Test Set A, item 2 is such a test item. Item 2 requires a test taker to complete the sentence “(2) The speed at which the seafloor is spreading is ___.” The correct option, “(C) half as fast as human fingernails grow,” can be chosen if the test taker can spot and understand the last sentence in the passage, “This spreading occurs in half of a speed of how fast fingernails grow,” as it is, without any further inferring from the text.

     The last item was composed so that the question provokes a “local-inferential” understanding of the passage. These were items 3, 6, 9, 15, 18, 21, 24, 27, and 30, and they called for information which could be obtained after making an inference from a relatively small amount of text source. With regard to the first passage which appears in Test Set A, item 3, “(3) The breakoff of Pangaea started because ___,” requires such a type of comprehension and asks for the cause of the growing seam in the seafloor of the Atlantic Ocean. In order to choose the correct option, “(B) a plate started to develop underwater and the land was separated,” a test taker needs to understand the sentence “Since that time, the Atlantic Ocean has widened along a hot rock-producing seam in the seafloor” and infer that the ‘rock-producing seam’ is the cause of the breakoff of Pangaea.

     The three questions for each passage were asked so that the global-inferential question would come first, the local-literal question second, and the local-inferential question third. The present author chose to provide them in this order because this is the order in which the questions seem to appear in the reading sections of common standardized proficiency tests, such as TOEFL or TOEIC.

     As for the time allocated to this test, because one class period in senior high schools is usually 50 minutes, 50 minutes was the maximum length of time allowed


to implement Test Set A. Ideally, sufficient time should be given to the test takers since the focus of the present study is on the test takers’ ‘power’, rather than their ‘speed’. Therefore, special attention was given so that the test takers would be able to complete the test set within the time allocated.

     Prior to the test implementation for the main study, a pilot test was carried out in order to validate the test items developed by the procedures described above. The subjects were 143 students from a senior high school which is considered to be of an equivalent academic level to the high school at which Test Set A was implemented in the main study.

     The main interest in carrying out the pilot test was to find and edit the test items that exhibited problems with their item discrimination indices. Item discrimination is “the capacity of test items to differentiate among candidates possessing more or less of the trait that the test is designed to measure” (Davies et al., 1999: 96). In developing a test instrument, it is essential that the test items have high levels of item discriminability to ensure a reliable measurement of test takers’ ability. Items with a low item discrimination index are usually eliminated from a test or edited. In the present study, item discriminability was calculated using classical test theory (point-biserial correlation calculated by ITEMAN) due to the small number of subjects and items.
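The two classical indices reported for each item can be sketched in a few lines. This is an illustrative reimplementation, not ITEMAN’s code: the uncorrected item-total point-biserial is simply the Pearson correlation between the 0/1 item score and the total score, and the small response matrix below is invented for the example.

```python
# Sketch of the two classical-test-theory indices used in the pilot
# analyses: item-total point-biserial correlation ("PBs") and
# proportion correct ("PC"). Illustrative data, not the study's.

def item_indices(responses):
    """responses: one row per test taker, each a list of 0/1 item scores."""
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_t = sum(totals) / n_people
    sd_t = (sum((t - mean_t) ** 2 for t in totals) / n_people) ** 0.5
    indices = []
    for i in range(n_items):
        scores = [row[i] for row in responses]
        p = sum(scores) / n_people  # "PC": proportion answering correctly
        # Pearson r between the dichotomous item score and the total score
        cov = sum(s * t for s, t in zip(scores, totals)) / n_people - p * mean_t
        sd_i = (p * (1 - p)) ** 0.5
        pbs = cov / (sd_i * sd_t) if sd_i > 0 and sd_t > 0 else 0.0
        indices.append((round(pbs, 2), round(p, 2)))
    return indices

# Five hypothetical test takers answering three items:
data = [[1, 1, 0],
        [1, 0, 0],
        [0, 1, 1],
        [1, 1, 1],
        [0, 0, 0]]
for item, (pbs, pc) in enumerate(item_indices(data), start=1):
    print(item, pbs, pc)  # items with PBs below about .25 would be flagged
```

An item that high scorers tend to answer correctly yields a positive PBs; the negative values seen for items 10 and 11 in Table 4.2 mean the more able test takers were more likely to get them wrong, which is why the whole passage was discarded.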

     In Table 42‘‘PBs”indicates pointbiserial correlationïŒŒă€€and‘‘PCう’indicates the

percentage of test takers who correctly answered each itemïŒŽă€€Indices fbr

pointbiserial correlation are used to indicate how well an item discriminates test

takers who are more capable with those who are not so capableïŒŽă€€It is often defined

that point biserial correlations of25 and above are acceptableHenning 198753

and most of the items surpassed this criterionïŒŽă€€Percentage correct is used to show

how easyor difficultatest item is because the higherlowerthe percentage of

people who correctly answered a test itemïŒŒă€€the easiermore diffiT cultatest item had

been perceived by the test takers

     As is apparent, items 10, 11, and 12 were considered to be problematic because they show negative or very low discrimination. These were the items provided for


the same passage, so it could be presumed that the passage itself was problematic for this level of test takers. For this reason, the present author decided it best to eliminate all three items along with the passage. Items 1, 2, 3, 9, and 16 also had low discriminability, so the present author reviewed and revised each item. Test Set A presented in Appendix A is the final version of these items after the revision. (The item numbers were left as they were when the test set was implemented in the main study, and this was announced orally to test takers by the proctors.)

Table 42 The discrimination indices of test items in the pilot version of Test Set A

ITEM PBs PC1 005 046

2 013 043

3 021 033

4 051 08

5 049 07

6 039 04

7 035 076

8 042 064

9 001 013

10 äž€ă€‡ïŒŽ02 019

ll äž€ă€‡ïŒŽ1 012

12 018 038

13 041 047

14 056 036

15 051 069

16 013 04

17 049 043

18 049 048

19 042 054

20 042 057

21 039 028

22 051 062

23 058 045

24 044 053

25 051 062

26 054 043

27 038 068

28 047 042

29 056 047

30 051 042

     In order to compare the reading abilities of test takers who took this test set


and, more importantly, to observe the alteration of the latent ability structure among test takers with different reading abilities, items 1, 2, and 3 reappear in Test Set B as items 1, 2, and 3; items 7, 8, and 9 as items 4, 5, and 6; and items 10, 11, and 12 as items 7, 8, and 9. However, as was stated in the previous paragraph, because items 10, 11, and 12 were omitted from Test Set A, items 7, 8, and 9 had to be omitted from Test Set B as well.

     As for the time allocated for the completion of the test, it was reported by the teachers who proctored the pilot study that most of the test takers appeared to have reached the last item of the test, which indicates that 50 minutes was a sufficient time for the test takers in the present study.

4.2.2.2 Test Set B

     Test Set B is presented in Appendix B. In total, there are 27 test items in the test set: nine passages are provided, each with three multiple-choice test items to test the test takers’ comprehension. Each item has one correct option and three distracters. These nine passages were selected after an item selection was done in the pilot study. The features of these nine passages are presented in Table 4.3.

Table 43 The features of passages employed in Test Set B

TEXT 1tem REase Gr Level Words1 13 568 87 952 46 552 10 108

4 1012 341 12 157

5 1315 353 12 142

6 1618 348 12 1607 1921 377 12 160

8 2224 389 12 152

9 2527 334 12 155

10 2830 336 12 151

40 114 14222

ïŒˆă€ŒreXt 3ïŒŒă€€as well as ltem 78and g are missing from the table because they were omitted after the

item seleCtion

     Text 1 is the same passage as Text 1 in Test Set A, Text 2 is the same passage as


Text 3 in Test Set A, and Text 3 is the same passage as Text 4 in Test Set A. This was done to compare the reading abilities of the test takers who took this test set, Test Set B, with the test takers who took Test Set A and, in particular, to see if any alteration would emerge with regard to the test takers’ latent ability structure among different ability groups. The rest of the passages were taken from the Reading Comprehension Section of the TOEFL Test Preparation Kit Workbook (ETS, 1998). The present author determined TOEFL test preparation material to be an appropriate source of reading passages because, since TOEFL was designed to test the English proficiency of students who are seeking to study at an undergraduate or graduate level in an English-speaking environment, the level of English proficiency required to succeed in completing them would be the same as that of advanced learners in Japan, which is at an equivalent level to the subjects to be tested by Test Set B.

     In the Table 43“RïŒŽă€€Ease”indicates Flesch Reading Ease and‘‘GrïŒŽă€€Leverう

indicate s FleschKincaid Grade LevelïŒŽă€€The number of words were counted so as to

regulate the characteristics of each passageïŒŽă€€The present author had selected

passages that were around 150 words in total for Texts 4 to 10ïŒŒă€€considering the time

constraint of testing environmentsïŒŽă€€The numbers at the bottom indicate the means

fbr each index

     As for the three multiple-choice test items that were to be answered after reading each passage, the present author wrote the questions and four options. These questions and options were examined for their validity by her two colleagues. After each passage, a “global-inferential” question, a “local-literal” question, and a “local-inferential” question (see 3.4.2 for detailed explanations of ‘question types’) are presented in the same manner as these questions are presented in Test Set A. This means that, for each passage, a “global-inferential” question is the first item that comes after the passage, a “local-literal” question the second, and a “local-inferential” question the last. Therefore, items numbered 1, 4, 10, 13, 16, 19, 22, 25, and 28 are “global-inferential” questions which asked for the main idea of the passage, items numbered 2, 5, 11, 14, 17, 20, 23, 26, and 29 are “local-literal” questions which asked for information which is directly interpreted from a relatively small amount of


text source, and items 3, 6, 12, 15, 18, 21, 24, 27, and 30 are “local-inferential” questions which asked for information which could be obtained after making an inference from a relatively small amount of text source (see pp. 51–52 for detailed explanation and examples of how these questions were presented). The validity of which question type each item represented was confirmed by the two colleagues who had worked on the question types of Test Set A, and their correlation was .71. For the items where disagreements were found, the items were discussed and revised so that all three people (the two colleagues and I) were satisfied with the decision.

     For Test Set B, the time allocated to the test was 50 minutes in order to parallel Test Set A. In writing and revising Test Set B, special attention was also given so that the test takers would be able to complete the test set within the time allocated.

     Prior to the test implementation for the main study, a pilot test was carried out in order to validate the test items developed by the procedures described above. The subjects were 156 students from the same university at which Test Set B was implemented in the main study. They were of the same academic background as the subjects who participated in the main study.

     The main interest in carrying out the pilot test was to find and edit the test items that exhibited problems with their item discrimination indices. As was done in the pilot study for Test Set A, item discriminability was calculated using classical test theory (point-biserial correlation calculated by ITEMAN) due to the small number of subjects.

     In Table 44‘‘PBs”indicates pointbiserial correlation fbr item discriminability

and‘‘PC”indicates the percentage of test takers who correctly answered each item to

show item di䌍cultyïŒŽă€€Items 789were automatically eliminated because they were

the same items as those eliminated from Test Set Aitems 1011ïŒŒă€€and 12ïŒ‰ïŒŽă€€The

present author had originally intended to use these three items fbr level comparison

across different subject groups but decided to discard them fbr this reason and also

due to the time constraint expected in the testing environmentïŒŽă€€FurthermoreïŒŒă€€items l

and 2ïŒŒă€€which reveal low item discrimination in Table 44ïŒŒă€€were revised because they

were the items presented as items l and 2 in Test Set A and had also shown low item


discrimination in the pilot test for Test Set A. The same was true for items 4 and 6, which were numbered 7 and 9 in Test Set A. Items 23 and 29 also had low discriminability, so they were reviewed and revised accordingly. Test Set B, which is presented in Appendix B, is the final version after these revisions. (The item numbers were left as they were when the test set was implemented in the main study, and this was announced orally to test takers by the proctors.)

Table 4.4 The discrimination indices of test items in the pilot version of Test Set B

ITEM  PBs   PC
1     .27   .94
2     .18   .99
3     .30   .86
4     .29   .43
5     .43   .81
6     .20   .44
7     .38   .80
8     .56   .63
9     .18   .67
10    .45   .71
11    .39   .84
12    .49   .57
13    .42   .84
14    .54   .31
15    .41   .36
16    .30   .36
17    .33   .63
18    .32   .21
19    .20   .97
20    .28   .91
21    .33   .81
22    .23   .65
23    .15   .36
24    .42   .51
25    .47   .52
26    .27   .40
27    .36   .22
28    .29   .91
29    .18   .84
30    .33   .75

     As for the time allocated for the completion of the test, it was reported by the


teachers who proctored the pilot study that most of the test takers appeared to have reached the last item of the test, which indicates that 50 minutes was a sufficient time for the test takers in the present study.

4.2.3 Test Administration

     Test Set A and Test Set B were both administered in 50 minutes. Senior high school students were given Test Set A. It was implemented as a reading proficiency test in a 50-minute class period, proctored by the teachers who taught the class in the regular lesson.

     For university students, the test was administered as a part of a placement test for their required English classes, which consisted of a listening comprehension section and a reading comprehension section. They were given either Test Set A or Test Set B, depending on the date they were taking the test. Those students who took the test on the first day of the placement test were given the test which included Test Set A as the reading comprehension section, and those who took the test on the second day, Test Set B. The scores on the reading comprehension section of the test were not counted in the placement itself because of the difference in difficulty between the two test sets. In the first half of the testing time, students were given 50 items that tested their listening skills. In this part of the test, the time was regulated by the listening material. At the end of this section, which was announced by the listening material itself, students were told to begin the reading section. The students were given 50 minutes for the reading section. The test was proctored by the teachers who teach the required English classes.

     Both high school students and university students were asked to provide their answers on mark sheets. These mark sheets were scored electronically on a mark-sheet scanner.

4.3 Data Analysis

4.3.1 Predetermining Ability Groups


     Prior to the data analyses, three groups of different abilities were determined based on the results of the data collection above. The three groups are: Group A-Low, Group A-High, and Group B.

     Group ALow and Group B were to represent the groups of test takers who

were responding to the items that had a difficulty that is equivalent to their reading

abilityïŒŒă€€and Group AHigh to represent the test takers who were responding to the

items that were considered to have a difficulty lower than their reading abilityïŒŽă€€In

this wayïŒŒă€€the results of Group ALow and Group AHigh could be compared to

investigate the differences exhibited by test takers with different reading abilities

tackling the test items of the same difficultyïŒŽă€€R耐hermoreïŒŒă€€the results of Group

ALow and Group B were to be compared to observe the differences presented by test

takers with different reading abilities responding to the test items that had the

difficulty equivalent to their ability

     Here, an explanation may be necessary of what is meant by "test takers with different reading abilities responding to the test items that had the difficulty equivalent to their ability" for Group A-Low and Group B, and by "the test takers who were responding to the items that were considered to have the difficulty lower than their reading ability" for Group A-High. In Item Response Theory (IRT), the theory on which the calculation of item difficulty was based in the analyses of Section 5.3, the idea is to find the relationship between the difficulty of a test item, the ability of a test taker, and the probability of the test taker answering the test item correctly (Ohtomo, 1996: 69). The difficulty of a test item is determined by its "item characteristic curve," a graph drawn after calibration using a logistic function. On this graph, the point where the probability of a correct response is 0.50 (50%) indicates the ability level of the person whose probability of answering that test item correctly is 0.50, and that ability index is employed as the difficulty of the test item. Therefore, the index provided as "theta" in Appendices C-1, D-1, and E-1 indicates the ability level (from −3.0 to 3.0) of a person whose probability of responding correctly to that item is 0.50, and it also represents the difficulty of the test item.

This relationship between the ability of a test taker and the difficulty of a test item brings the present reader to characterize each subject group as having an ability that is "equivalent to" or "higher than" the difficulty level of the test items.
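The 0.50-probability definition of item difficulty described above can be illustrated with a minimal sketch of the one-parameter logistic (Rasch) model. The function name and the toy theta and difficulty values below are illustrative only and are not taken from the study:

```python
import math

def rasch_icc(theta, b):
    """Probability of a correct response under the one-parameter
    logistic (Rasch) model, for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When a test taker's ability equals the item's difficulty, the
# probability of a correct response is exactly 0.50 -- the point
# that defines the item's difficulty on the theta scale.
print(rasch_icc(theta=1.2, b=1.2))  # 0.5
print(rasch_icc(theta=2.0, b=1.2))  # above 0.5: ability exceeds difficulty
print(rasch_icc(theta=0.0, b=1.2))  # below 0.5: ability below difficulty
```

This is why the same theta index can be read both as a person's ability level and as an item's difficulty: the two coincide wherever the curve crosses 0.50.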

     Originally, the present author had chosen to give Test Set A to the high school students and to half of the university students, so that the high school students would represent Group A-Low and the university students Group A-High. Test Set B was given to the rest of the university students to represent Group B. At this point, the author had assumed that the university students would possess higher ability in English reading comprehension, since they had had an extra year of English education along with their preparatory learning experience for university entrance examinations. However, this method of predetermining the ability groups did not function for the present study because virtually no difference could be found between the scores of the high school students and the university students on Test Set A (the mean scores were 17.6 for the high school students and 17.9 for the university students). One possible cause is that the university students were given the reading comprehension test after they had worked on the listening comprehension section of the placement test. The cognitive load imposed on the test takers while working on the listening comprehension could have exhausted them cognitively and impeded their performance on the reading section, rendering the result above. However, when the listening test material was evaluated, it did not appear to exhibit a difficulty that would influence test takers' performance in the latter section of the test. Therefore, it was presumed that there indeed was little difference in reading ability between the high school students and the university students who were given Test Set A. For this reason, at this point, the present author decided to look at the results of all test takers who worked on Test Set A as a whole, regardless of whether they were high school students or university students, and to predetermine the ability groups based on their scores on Test Set A. A detailed description of how these groups were decided is presented in Chapter 5.

No change was made in predetermining Group B, since the university students who worked on Test Set B had averaged 16.3, which showed that they were advanced learners whose reading ability was at the level expected to respond correctly to the test items in Test Set B.

4.3.2 Statistical Procedures

     Three statistical procedures were employed in order to analyze the data collected.

4321Descriptive Statistics

     For each test setïŒŒă€€mean and standard deviation were calculatedïŒŽă€€KR20 was

used to estimate the intemal consistency of each test set to ensure its reliability in

measuring students’reading abilityïŒŽă€€For the purpose of test validationïŒŒă€€the facility

valuepercentage correctand discrimination indexpointbiserial correlation

calculated using Classical Test Theory by ITEMANă€€ïŒˆAssessment Systems

Corporationwas also provided
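The indices named above are computed here by ITEMAN; purely as a hedged illustration of what they measure, the sketch below (illustrative function names, toy data not from the study) computes KR-20 and the point-biserial discrimination index from a dichotomous response matrix:

```python
from statistics import mean, pvariance

def kr20(responses):
    """Kuder-Richardson formula 20 for a matrix of dichotomous
    responses (rows = test takers, columns = items)."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    facility = [mean(col) for col in zip(*responses)]  # proportion correct per item
    pq_sum = sum(p * (1 - p) for p in facility)
    return (k / (k - 1)) * (1 - pq_sum / pvariance(totals))

def point_biserial(item_scores, totals):
    """Discrimination index: correlation between one dichotomous
    item (0/1) and the test takers' total scores."""
    n = len(totals)
    mx, my = mean(item_scores), mean(totals)
    cov = sum((x - mx) * (y - my) for x, y in zip(item_scores, totals)) / n
    return cov / (pvariance(item_scores) ** 0.5 * pvariance(totals) ** 0.5)

# Toy response matrix: 5 test takers, 4 items.
data = [[1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 0]]
totals = [sum(row) for row in data]
print(kr20(data))                                        # approx. 0.80
print(point_biserial([row[0] for row in data], totals))  # approx. 0.71
```

The facility value reported by ITEMAN corresponds to each item's proportion correct (the `facility` list above).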

4.3.2.2 Factor-Analytic Studies

     In an attempt to come up with a test item specification that effectively operationalizes the different reading performances to be tested, the present study proposes that the "question type" of a test item could be a prime component of such a framework. In order to identify the components, or factors, that constitute L2 reading performances, factor analyses are conducted on the data collected for each test set. The nature of the factors generated is interpreted qualitatively.

     Full-information factor analysis was applied in the factor-analytic studies of both test sets via TESTFACT 2 (Scientific Software International). Although some problems have been pointed out in using traditional factor analysis methods with binary data (i.e., items scored dichotomously as right or wrong), full-information factor analysis has been evaluated as being able to accommodate such data (Negishi, 1996; Bock, 1984).

4322」rtem A n alyses

     To discover which facets of a reading test item would allow the writers of test

items to predetermine the diffriculty of a test itemïŒŒă€€the present study investigates the

61

東äșŹć€–ć›œèȘžć€§ć­Š ćšćŁ«ć­Šäœè«–æ–‡ Doctoral thesis (Tokyo University of Foreign Studies)

Page 17: CHAPTER 4 RESEARCH DESIGN - repository.tufs.ac.jp

possibility of a link between the item difficulty of a test item and its question type

For this purposeïŒŒă€€test items are analyzed by consulting their item difficulty indices

calculated via Rasch Analysis using RASCALAssessment Systems Corporationin

relation with question typeïŒŽă€€Other information in the final parameter estimates as

well as a raw score conversion tableïŒŒă€€an item by person distribution mapïŒŒă€€a test

characteristic curveïŒŒă€€and a test information curve are provided in this section of

analysis
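RASCAL's maximum-likelihood calibration is beyond a short sketch, but the logit scale on which its difficulty indices live can be illustrated. A crude first-pass estimate of each item's difficulty is the log-odds of an incorrect response, centered so the mean difficulty is zero; this toy approximation (illustrative function name and data) is not the procedure RASCAL itself implements, which refines such values iteratively:

```python
import math

def logit_difficulties(responses):
    """Crude first-pass item difficulties on the Rasch logit scale:
    log-odds of an *incorrect* response per item, centered at zero.
    (Calibration programs refine such values by maximum likelihood.)"""
    n = len(responses)
    raw = []
    for col in zip(*responses):
        correct = sum(col)
        if correct == 0 or correct == n:
            raise ValueError("items answered all-right or all-wrong cannot be scaled")
        raw.append(math.log((n - correct) / correct))  # harder item -> larger value
    shift = sum(raw) / len(raw)
    return [b - shift for b in raw]

# Toy matrix: 5 test takers, 4 items of increasing difficulty.
data = [[1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 0]]
print(logit_difficulties(data))  # increasing values: the last item is hardest
```

Linking these indices to question type then amounts to checking whether items of the same question type cluster at similar positions on this logit scale.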
