Article Analysis - Language Testing

In the Name of God Article Analyzing By: Masoud Dolatshahi MA Student in TEFL University of Guilan 1394-2


Page 1: Article Analysis - Language Testing

In the Name of God

Article Analysis by: Masoud Dolatshahi, MA Student in TEFL, University of Guilan

1394-2

Page 2: Article Analysis - Language Testing

Detecting Gender DIF with an English Proficiency Test in EFL Context

Seyed Mohammad Reza Amirian, Seyed Mohammad Alavi, Angel M. Fidalgo

Differential Item Functioning:

also referred to as measurement bias, occurs when people from different groups (commonly gender or ethnicity) with the same latent trait (ability/skill) have a different probability of giving a certain response on a questionnaire or test.[1] DIF analysis provides an indication of unexpected behavior of items on a test. An item does not display DIF merely because people from different groups have a different probability of giving a certain response; it displays DIF if and only if people from different groups with the same underlying true ability have a different probability of giving a certain response. Common procedures for assessing DIF are Mantel-Haenszel, item response theory (IRT) based methods, and logistic regression.
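The definition above can be written as a condition on response probabilities. The notation here is an assumption introduced for illustration and does not come from the article: X_i is the scored response to item i, θ is the latent ability, and G is group membership (reference or focal).

```latex
% No DIF on item i: matched examinees have equal success probability
P(X_i = 1 \mid \theta, G = \text{reference})
  = P(X_i = 1 \mid \theta, G = \text{focal})
  \quad \text{for all } \theta

% DIF on item i: the equality fails for at least one ability level
\exists\, \theta :\;
P(X_i = 1 \mid \theta, G = \text{reference})
  \neq P(X_i = 1 \mid \theta, G = \text{focal})
```

A raw difference in group pass rates does not by itself satisfy the second condition, because the groups may simply differ in ability.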

Page 3: Article Analysis - Language Testing

Why Gender Discrimination?

• Test fairness is an issue of utmost importance in language testing which is closely related to test validity and test validation.

Page 4: Article Analysis - Language Testing

Methods (SPSS)

• Mantel-Haenszel: a procedure for comparing two groups on a dichotomous/categorical response.

It is used when the effect of the explanatory variable on the response variable is influenced by covariates that can be controlled. It is often used in observational studies where random assignment of subjects to different treatments cannot be controlled, but influencing covariates can.
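As a rough illustration of how the Mantel-Haenszel idea applies to DIF, the sketch below pools 2x2 tables (group by correct/incorrect) across matched score levels into a common odds ratio and converts it to the ETS delta scale. This is a minimal sketch, not the article's SPSS procedure: the function name `mantel_haenszel_dif` and the example counts are hypothetical, and matching on total score is an assumption.

```python
import math

def mantel_haenszel_dif(strata):
    """Pool 2x2 tables over matched score levels.

    Each stratum is (a, b, c, d):
      a = reference group correct, b = reference group incorrect,
      c = focal group correct,     d = focal group incorrect.
    Returns the MH common odds ratio and the ETS delta, -2.35 * ln(alpha).
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined: no discordant counts")
    alpha_mh = num / den
    delta_mh = -2.35 * math.log(alpha_mh)
    return alpha_mh, delta_mh

# Hypothetical counts for one item at three score levels (not from the article)
example_strata = [(30, 20, 20, 30), (40, 10, 30, 20), (45, 5, 40, 10)]
alpha, delta = mantel_haenszel_dif(example_strata)
print(f"alpha_MH = {alpha:.3f}, delta_MH = {delta:.3f}")
```

An odds ratio above 1 (negative delta) suggests the item favors the reference group among examinees of matched ability; a value below 1 suggests it favors the focal group.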

• Logistic Regression: a regression model for binary dependent variables, that is, variables that can take only two values, such as pass/fail, win/lose, alive/dead or healthy/sick. Models for outcomes with more than two categories are referred to as multinomial logistic regression or, if the categories are ordered, as ordinal logistic regression.[2]
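For DIF detection, logistic regression is commonly specified with an ability term, a group term, and their interaction. The model below is the standard textbook form; the slides do not state the exact specification used in the article, so the symbols are assumptions: u is the item response (1 = correct), θ is the matching variable (typically total score), and g is the group indicator.

```latex
\ln \frac{P(u = 1)}{1 - P(u = 1)}
  = \beta_0 + \beta_1 \theta + \beta_2 g + \beta_3 (\theta \cdot g)
```

In this form a nonzero β_2 points to uniform DIF (one group is favored at every ability level), and a nonzero β_3 points to nonuniform DIF (the advantage changes with ability).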

Page 5: Article Analysis - Language Testing

Participants

1- The data for the present study were gathered from 1550 test takers who took the UTEPT in 2010. The sample was divided into a reference group of 899 male test takers and a focal group of 651 female test takers of various ages.

2- All participants were PhD candidates.

Page 6: Article Analysis - Language Testing

Statistics

After generating the data, the descriptive statistics were computed using SPSS. The results indicated a slight difference between the mean scores of male (M = 50.96, SD = 12.70) and female (M = 50.61, SD = 13.14) examinees. The effect size of the mean difference was small according to Cohen's (1988) criteria (d = .027). The reliability of the data was estimated at 0.88 using Cronbach's alpha, which indicates that the test has high reliability.
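The reported effect size can be checked from the summary statistics on this slide together with the group sizes on the Participants slide. The sketch below assumes the pooled-standard-deviation form of Cohen's d; the slides do not say which variant was used, and the function name `cohens_d` is hypothetical.

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d using the pooled standard deviation of two groups."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

# Male (reference) vs. female (focal) summary statistics from the slides
d = cohens_d(50.96, 12.70, 899, 50.61, 13.14, 651)
print(round(d, 3))  # 0.027
```

Under this assumption the result agrees with the d = .027 reported above.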

Page 7: Article Analysis - Language Testing

Results

• The aim: to check the comparability of MH and LR DIF findings, and to examine the content of DIF items for potential sources of linguistic bias.

• 100 items in UTEPT: 31 items (31%) were flagged for gender DIF by MH.

Page 8: Article Analysis - Language Testing

Logistic Regression

29 items (29%) showed gender DIF

Page 9: Article Analysis - Language Testing

Comparison of two methods

Page 10: Article Analysis - Language Testing

Grammar

Item 1. Having been ……………….. the prize, the professor continued working hard on his project.
A. awarded B. award C. awarding D. the award

This item shows moderate DIF magnitude in favor of female test takers. Content experts believed there is no clue in such a short grammar item as to why this item is working to the advantage of females.

Item 26. Without regular supplies of some hormones, our capacity to behave would be seriously impaired; without others we would soon die. Tiny amount of some hormones can ……….. our moods and our actions.
A. Modification B. Modifying C. Modify D. Modified

This item was shown to advantage examinees in the female group by both MH and LR methods. The expert judges believed that since females show more interest in topics such as human biology, they are systematically favored by this item. One of the judges also pointed out that words such as "behave" and "mood" in the item indicate social interaction, which is a topic of interest to women.

Page 11: Article Analysis - Language Testing

Vocabulary Items - 9 items

Item 38. Analytic tools enable one to get at the most fundamental logic of any discipline.
A. enforced B. essential C. established D. escorted

Reading Comprehension - Out of 35 items in this section, only six items showed DIF

Item 92. According to the passage, which of the following was one of the distinguishing characteristics of Impressionist painting?
A. The emphasis on people rather than nature scenes
B. The way the subjects were presented from multiple angles
C. The focus on small solid objects
D. The depiction of the effects of light and color

Page 12: Article Analysis - Language Testing

Discussion

The MH and LR methods show close correlation in the DIF items they detect.

Most items that favor females belong to the grammar section (14 items); only 3 belong to the reading section, and no item in the vocabulary section favors females.

Most items that favor males come from the vocabulary section (10 items), with 1 from grammar and 3 from reading comprehension.
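The counts on this slide can be tallied to check that they are consistent with the totals reported earlier. This is only an arithmetic sketch over the numbers given in the slides; the variable names are hypothetical.

```python
# DIF items by favored group and test section, as reported on this slide
dif_counts = {
    "female": {"grammar": 14, "vocabulary": 0, "reading": 3},
    "male": {"grammar": 1, "vocabulary": 10, "reading": 3},
}

# Total flagged items per favored group
totals_by_group = {g: sum(c.values()) for g, c in dif_counts.items()}

# Total flagged items per section, across both groups
sections = ["grammar", "vocabulary", "reading"]
totals_by_section = {s: sum(c[s] for c in dif_counts.values()) for s in sections}

total_flagged = sum(totals_by_group.values())
print(totals_by_group, totals_by_section, total_flagged)
```

The tally gives 17 items favoring females and 14 favoring males, 31 in all, which matches the 31 items flagged by MH; the reading total of 6 matches the count given for that section.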

Page 13: Article Analysis - Language Testing

Ethical conclusion

This finding indicates that the developers or users of UTEPT should be aware of the test takers for whom the test is intended. For example, high scores on the grammar subtest may not provide sufficient information to permit inferences about a female test taker’s overall English ability, since this subtest showed DIF to the advantage of females.
