Automatic Generation of Verbal Analogy Items
Alan D. Mead, Illinois Institute of Technology
AIG in employment testing
• Rise of unproctored Internet testing (UIT)
• UIT may cause many security problems
  – One is item theft and coaching
• Solution: Generate the entire test from scratch for each examinee
  – Item theft is less of a problem
  – Coaching is less effective
  – Items could be “watermarked”
• Also reduces cost and speeds deployment
AIG in employment testing (cont.)
• Need a variety of test content
  – Verbal analogies
  – Vocabulary
  – Math
  – Perceptual speed and accuracy
  – Spatial ability
  – Personality
  – Situational judgment
  – Etc.
Verbal Analogies
Pair responses:
Shovel:Dig
a) Bag:Buy
b) Baby:Cry
c) Fork:Eat
d) Car:Stop
Word responses:
Shovel:Dig::Fork:?
a) Buy
b) Cry
c) Eat
d) Stop
• Identify a “bridge”: you DIG with a SHOVEL
• Find a matching answer: you EAT with a FORK
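The two bullets above describe solving an analogy as bridge matching. A minimal sketch of that idea, assuming a hypothetical lookup table `BRIDGES` that maps a word pair to a bridge label (the labels are invented, and only pairs from the SHOVEL example are listed; `bridge_of` and `solve_pair_item` are illustrative names):

```python
from __future__ import annotations

from typing import Optional

# HYPOTHETICAL relationship lookup: (first word, second word) -> bridge label.
# "bag:buy" and "car:stop" are deliberately absent, so they have no known bridge.
BRIDGES: dict[tuple[str, str], str] = {
    ("shovel", "dig"): "you do SECOND with FIRST",
    ("fork", "eat"): "you do SECOND with FIRST",
    ("baby", "cry"): "FIRST typically does SECOND",
}


def bridge_of(pair: tuple[str, str]) -> Optional[str]:
    """Return the bridge for a word pair (case-insensitive), or None if unknown."""
    return BRIDGES.get((pair[0].lower(), pair[1].lower()))


def solve_pair_item(stem: tuple[str, str], options: list[tuple[str, str]]) -> int:
    """Return the index of the single option that shares the stem's bridge."""
    # Step 1: identify the bridge linking the stem words.
    target = bridge_of(stem)
    if target is None:
        raise ValueError(f"no bridge known for stem {stem}")
    # Step 2: find the option(s) that express the same bridge.
    matches = [i for i, option in enumerate(options) if bridge_of(option) == target]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one key, found {len(matches)}")
    return matches[0]


options = [("Bag", "Buy"), ("Baby", "Cry"), ("Fork", "Eat"), ("Car", "Stop")]
print(solve_pair_item(("Shovel", "Dig"), options))  # 2, i.e. option c) Fork:Eat
```

Raising an error when more than one option shares the stem's bridge is one simple way to catch a foil that inadvertently works as a second key.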
Generating Verbal Analogies
• Identified database of relationships (e.g., “RIDER operates a BIKE”)
• Identified additional bridge relationships (“BOVINE means COW-like” & “ABSENT is the opposite of PRESENT”)
• Gathered data on word frequency and (as part of this study) word familiarity
Generating Verbal Analogies (cont.)
1. Randomly select a bridge
2. Randomly select TWO pairs for this bridge (one for the stem, one for the key)
3. Randomly select 2-3 additional pairs from other bridges
4. Randomly assign the key pair to a response position; fill in the remaining pairs
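The four steps above can be sketched as follows. The miniature `DATABASE` (bridge label to word pairs) is hypothetical: its pairs are taken from the sample items and its labels paraphrase the key explanations. `generate_item` and the returned dict layout are illustrative, not the study's code.

```python
from __future__ import annotations

import random

# HYPOTHETICAL miniature relationship database: bridge label -> word pairs.
DATABASE: dict[str, list[tuple[str, str]]] = {
    "SECOND is described by FIRST": [("paternal", "father"), ("juvenile", "child")],
    "SECOND operates FIRST": [("rocket", "astronaut"), ("jet", "pilot")],
    "FIRST and SECOND are opposites": [("unfold", "fold"), ("demand", "supply")],
    "SECOND is a typical agent of FIRST": [("chalk", "writer")],
    "SECOND is a typical theme of FIRST": [("microphone", "sound")],
}


def generate_item(db: dict[str, list[tuple[str, str]]], rng: random.Random) -> dict:
    # 1. Randomly select a bridge (only bridges with at least two pairs qualify).
    bridge = rng.choice([b for b, pairs in db.items() if len(pairs) >= 2])
    # 2. Randomly select TWO pairs for this bridge: one for the stem, one for the key.
    stem, key = rng.sample(db[bridge], 2)
    # 3. Randomly select 2-3 additional pairs (foils) from other bridges.
    pool = [pair for b, pairs in db.items() if b != bridge for pair in pairs]
    foils = rng.sample(pool, rng.choice([2, 3]))
    # 4. Randomly assign the key a position; the foils fill the remaining ones.
    key_index = rng.randrange(len(foils) + 1)
    options = foils[:key_index] + [key] + foils[key_index:]
    return {"bridge": bridge, "stem": stem, "options": options, "key": key_index}


item = generate_item(DATABASE, random.Random(42))
print(":".join(item["stem"]), ":: ?")
for letter, pair in zip("abcd", item["options"]):
    print(f"{letter}. {':'.join(pair)}")
```

Restricting step 1 to bridges with at least two pairs follows from step 2. It also shows why a bridge with few pairs gets overused: with only two pairs, every item built on that bridge reuses the same stem and key.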
Sample Items
1. paternal:father :: ?
   a. juvenile:child
   b. microphone:sound
   c. chalk:writer
   d. unfold:fold
3. rocket:astronaut :: ?
   a. lamp:light
   b. stick:skating rink
   c. jet:pilot
   d. demand:supply
Alternative format
1. paternal:father :: juvenile:?
   a. child
   b. sound
   c. writer
   d. fold
3. rocket:astronaut :: jet:?
   a. light
   b. skating rink
   c. pilot
   d. supply
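Judging from the sample items, the alternative format appears to be derived mechanically from the pair format: the keyed pair's first word moves into the stem and each option keeps only its second word. A minimal sketch of that transformation (the function name and argument layout are hypothetical):

```python
def to_word_format(stem, options, key):
    """Convert a pair-response item to the word-response format.

    stem: (a, b) pair; options: list of (c, d) pairs; key: index of the keyed pair.
    """
    key_first, _ = options[key]
    # The keyed pair's first word completes the stem.
    word_stem = f"{stem[0]}:{stem[1]}::{key_first}:?"
    # Each option keeps only its second word; the key stays at the same position.
    word_options = [second for _, second in options]
    return word_stem, word_options, key


word_stem, word_options, word_key = to_word_format(
    ("paternal", "father"),
    [("juvenile", "child"), ("microphone", "sound"), ("chalk", "writer"), ("unfold", "fold")],
    0,
)
print(word_stem)     # paternal:father::juvenile:?
print(word_options)  # ['child', 'sound', 'writer', 'fold']
```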
Keys
1. paternal:father :: ?
   [Bridge: FATHER is described by PATERNAL]
   a. juvenile:child ***
   b. microphone:sound (unrelated: sound is a (typical) theme of microphone)
   c. chalk:writer (unrelated: writer is a (typical) agent of chalk)
   d. unfold:fold (unrelated: unfold and fold are opposites/opposed)
3. rocket:astronaut :: ?
   [Bridge: ASTRONAUT operates ROCKET]
   a. lamp:light (unrelated: lamp is a (typical) result of light)
   b. stick:skating_rink (unrelated: skating_rink is a (typical) location of stick)
   c. jet:pilot ***
   d. demand:supply (unrelated: supply and demand are opposites/opposed)
Present Study
• H1: Two forms of AIG analogies (word responses and pair responses) will have comparable reliability & validity
• H2: AIG scales will have reliability comparable to the manually written scale
• H3: AIG scales will have construct and criterion validity comparable to the manually written scale
Method
• Sample of N=251 gathered online and from psychology classes
• Measures:
  – 20-item AIG and human-written verbal analogy scales
  – 40-item vocabulary scale
  – Self-reported performance at work and school
Feasibility
• Manually examined items for feasibility
• 40 of 64 items (63%) were feasible
• Reasons for infeasibility
  – Overuse of a bridge or a pair (some bridges have few pairs)
  – Ambiguous pairs (drum:drum?)
  – A foil that was inadvertently also a correct key
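Future Directions lists automated feasibility checking. Overuse, and a foil that the database itself lists under the item's bridge, can be screened mechanically; ambiguity only crudely. A minimal sketch, assuming each item is a dict with `bridge`, `stem`, `options` and `key` entries and the database maps bridge labels to word pairs (both layouts, the function name and the thresholds are hypothetical):

```python
from __future__ import annotations

from collections import Counter


def screen_form(items: list[dict], db: dict[str, list[tuple[str, str]]],
                max_bridge_uses: int = 2, max_pair_uses: int = 1) -> list[tuple[int, str]]:
    """Return (item index, reason) flags for a generated form.

    The thresholds are invented defaults, not values from the study.
    """
    bridge_uses = Counter(item["bridge"] for item in items)
    pair_uses: Counter = Counter()
    for item in items:
        pair_uses[item["stem"]] += 1
        pair_uses.update(item["options"])

    flags = []
    for i, item in enumerate(items):
        # Overuse of a bridge across the form.
        if bridge_uses[item["bridge"]] > max_bridge_uses:
            flags.append((i, f"bridge overused: {item['bridge']}"))
        for pair in [item["stem"], *item["options"]]:
            # Overuse of a pair across the form.
            if pair_uses[pair] > max_pair_uses:
                flags.append((i, f"pair overused: {pair[0]}:{pair[1]}"))
            # Crude ambiguity proxy: both words identical (e.g. drum:drum).
            if pair[0].lower() == pair[1].lower():
                flags.append((i, f"ambiguous pair: {pair[0]}:{pair[1]}"))
        # A foil that the database also lists under the item's bridge is a second key.
        for j, option in enumerate(item["options"]):
            if j != item["key"] and option in db.get(item["bridge"], []):
                flags.append((i, f"foil is also a key: {option[0]}:{option[1]}"))
    return flags


db = {"SECOND operates FIRST": [("rocket", "astronaut"), ("jet", "pilot"), ("bike", "rider")]}
form = [{"bridge": "SECOND operates FIRST", "stem": ("rocket", "astronaut"),
         "options": [("drum", "drum"), ("jet", "pilot"), ("bike", "rider")], "key": 1}]
for index, reason in screen_form(form, db):
    print(index, reason)
```

A foil that fits the bridge semantically but is filed under another bridge would still slip through, so a check like this supplements rather than replaces manual review.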
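Results for H1
Variable                          Mean    SD   n        1        2        3        4
1 Vocabulary                      0.75  0.14  40   (0.86)     0.66     0.66     0.69
2 Human-written items             0.65  0.14  20     0.46   (0.57)     0.97     1.04
3 AIG items with pair responses   0.73  0.16  20     0.52     0.63   (0.73)     0.94
4 AIG items with word responses   0.81  0.14  19     0.54     0.67     0.68   (0.72)
5 Self-rated performance          3.72  0.61   6    -0.04    -0.01     0.05     0.10
6 Academic performance            0.02  0.72   3     0.14     0.22     0.20     0.14
Note. n = number of items per measure. Parenthesized diagonal values are reliabilities; values below the diagonal are observed correlations; values above the diagonal are consistent with correlations corrected for attenuation.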
H1: Two forms of AIG analogies (word responses and pair responses) will have comparable reliability & validity: CONFIRMED
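The above-diagonal entries in the results table are consistent with the classical correction for attenuation, r_xy / sqrt(r_xx · r_yy), applied to the observed correlations below the diagonal and the parenthesized reliabilities. The slides do not state the formula; this is an inference from the numbers, and the short check below reproduces those entries to within about 0.01 (the inputs are themselves rounded to two decimals):

```python
import math


def disattenuate(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Correct an observed correlation for unreliability in both measures."""
    return r_xy / math.sqrt(r_xx * r_yy)


# Reliabilities (parenthesized diagonal) and observed correlations (below the diagonal).
reliability = {"vocab": 0.86, "human": 0.57, "aig_pairs": 0.73, "aig_words": 0.72}
observed = {
    ("vocab", "human"): 0.46,
    ("vocab", "aig_pairs"): 0.52,
    ("vocab", "aig_words"): 0.54,
    ("human", "aig_pairs"): 0.63,
    ("human", "aig_words"): 0.67,
    ("aig_pairs", "aig_words"): 0.68,
}

for (x, y), r in observed.items():
    corrected = disattenuate(r, reliability[x], reliability[y])
    print(f"{x:>9} vs {y:<9} observed {r:.2f} corrected {corrected:.2f}")
```

Because the correction divides by sample estimates of reliability, a corrected value can exceed 1, as the 1.04 entry for human-written vs. word-response items does; values near 1 suggest the two scales measure essentially the same construct.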
Results for H2 (correlation table identical to Results for H1)
H2: AIG scales will have reliability comparable to the manually written scale: NOT CONFIRMED, because the AIG scales had better reliability (0.73 and 0.72 vs. 0.57)
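The slides do not say which reliability coefficient produced the diagonal values. For right/wrong items a common choice is KR-20 (coefficient alpha for dichotomous scores). A minimal sketch of that coefficient, assuming it is the one used, on invented responses:

```python
def kr20(responses: list[list[int]]) -> float:
    """KR-20 internal consistency for a persons x items matrix of 0/1 scores."""
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    # Population variance of total scores, matching the p*q item variances below.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people  # proportion correct on item j
        sum_pq += p * (1 - p)
    return n_items / (n_items - 1) * (1 - sum_pq / var_total)


# Invented responses: 4 examinees x 3 items.
print(kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))  # 0.75
```

KR-20 rises with test length, but the human-written and pair-response scales both have 20 items, so the gap between 0.57 and 0.73 cannot be attributed to length.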
Results for H3 (correlation table identical to Results for H1)
H3: AIG scales will have construct and criterion validity comparable to the manually written scale: CONFIRMED
Predicting Item Difficulty
Predictor                                              Correlation
Automatically generated (1) or manually written (0)     0.28*
Familiarity of least familiar word in item              0.33*
Familiarity of second least familiar word in item       0.39**
Mean familiarity of all words in item                   0.37**
Lowest log(count(word))                                 0.14
Second lowest log(count(word))                         -0.06
Mean log(count(word))                                   0.17
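The slides do not define the difficulty index or the log base. The sketch below assumes difficulty is the classical p-value (proportion answering correctly) and uses natural logs; the base does not affect a Pearson correlation, because changing it rescales every log count by the same positive constant. The function names, familiarity ratings, counts, and p-values are all invented for illustration:

```python
import math
import statistics


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_predictors(words, familiarity, counts):
    """Word-level predictors from the table, computed for one item's words."""
    fam = sorted(familiarity[w] for w in words)
    logs = sorted(math.log(counts[w]) for w in words)
    return {
        "min_familiarity": fam[0],
        "second_min_familiarity": fam[1],
        "mean_familiarity": statistics.mean(fam),
        "min_log_count": logs[0],
        "second_min_log_count": logs[1],
        "mean_log_count": statistics.mean(logs),
    }


familiarity = {"paternal": 2.9, "father": 5.0, "juvenile": 3.2, "child": 5.0}  # invented
counts = {"paternal": 150, "father": 9000, "juvenile": 400, "child": 12000}  # invented
print(item_predictors(["paternal", "father", "juvenile", "child"], familiarity, counts))

# Across items: correlate one predictor with item p-values (invented values).
min_familiarity = [2.9, 3.8, 4.4, 3.1, 4.0]
p_values = [0.55, 0.70, 0.85, 0.60, 0.78]
print(round(pearson(min_familiarity, p_values), 2))
```

The dichotomous predictor (generated = 1, written = 0) can go through the same `pearson` function, which then yields the point-biserial correlation.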
Future Directions
• Better handling of senses (DRUM is for DRUMMING)
• Better difficulty calculations based on a larger sample of items
• Automated feasibility checking
• Enhanced database of relationships
• Choosing foils to have more semantic similarity to other words
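One possible reading of the last item is to rank candidate foil pairs by vector similarity to the stem's words. The slides do not specify a similarity measure; the sketch assumes word vectors are available and uses cosine similarity. The vectors below are invented 2-D toys, and `rank_foils` is a hypothetical name:

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_foils(stem, candidates, vectors, k):
    """Return the k candidate pairs whose words are most similar to the stem's words.

    A pair's score is the mean cosine of first-with-first and second-with-second words.
    """
    def score(pair):
        return (cosine(vectors[stem[0]], vectors[pair[0]])
                + cosine(vectors[stem[1]], vectors[pair[1]])) / 2

    return sorted(candidates, key=score, reverse=True)[:k]


# Invented 2-D "embeddings"; a real system would use a trained word-vector model.
vectors = {
    "rocket": [1.0, 0.0], "astronaut": [0.0, 1.0],
    "lamp": [0.7, 0.3], "light": [0.5, 0.5],
    "stick": [0.2, 0.8], "skating rink": [0.3, 0.7],
    "demand": [-1.0, 0.2], "supply": [-0.9, 0.3],
}
candidates = [("demand", "supply"), ("stick", "skating rink"), ("lamp", "light")]
print(rank_foils(("rocket", "astronaut"), candidates, vectors, k=2))
# [('lamp', 'light'), ('stick', 'skating rink')]
```

More similar foils should be more plausible distractors, but a very similar pair is also more likely to be a second correct answer, so this ranking would need to be paired with a key check.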