Automatic Generation of Verbal Analogy Items

Automatic Generation of Verbal Analogy Items Alan D. Mead Illinois Institute of Technology


Transcript of Automatic Generation of Verbal Analogy Items

Page 1: Automatic Generation of Verbal Analogy Items

Automatic Generation of Verbal Analogy Items

Alan D. Mead
Illinois Institute of Technology

Page 2: Automatic Generation of Verbal Analogy Items

AIG in employment testing

• Rise of unproctored Internet testing (UIT)
• UIT may cause many security problems
– One is item theft and coaching
• Solution: Generate entire test from scratch for each examinee
– Item theft less of a problem
– Coaching less effective
– Items could be “watermarked”
• Also reduces cost and speeds deployment

Page 3: Automatic Generation of Verbal Analogy Items

AIG in employment testing (cont.)

• Need a variety of test content
– Verbal analogies
– Vocabulary
– Math
– Perceptual speed and accuracy
– Spatial ability
– Personality
– Situational judgment
– Etc.

Page 4: Automatic Generation of Verbal Analogy Items

Verbal Analogies

Pair responses
Shovel:Dig
a) Bag:Buy
b) Baby:Cry
c) Fork:Eat
d) Car:Stop

Word responses
Shovel:Dig::Fork
a) Buy
b) Cry
c) Eat
d) Stop

• Identify a “bridge”; you DIG with a SHOVEL
• Find a matching answer; you EAT with a FORK

Page 5: Automatic Generation of Verbal Analogy Items

Generating Verbal Analogies

• Identified database of relationships (e.g., “RIDER operates a BIKE”)

• Identified additional bridge relationships (“BOVINE means COW-like” & “ABSENT is the opposite of PRESENT”)

• Gathered data on word frequency and (part of this study) word familiarity

Page 6: Automatic Generation of Verbal Analogy Items

Generating Verbal Analogies (cont.)

1. Randomly select a bridge
2. Randomly select TWO pairs for this bridge (one for the stem, one for the key)
3. Randomly select 2-3 additional pairs from other bridges
4. Randomly assign key pair; fill in remaining pairs
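The four steps above can be sketched in Python. This is a minimal illustration, not the study's actual generator: the `BRIDGES` dictionary, its bridge labels, and the `generate_item` function are hypothetical names, and the word pairs are simply borrowed from the sample items shown later in this deck.

```python
import random

# Hypothetical miniature relationship database: bridge label -> word pairs.
# The pairs come from the sample items in this deck; a real database
# would be far larger.
BRIDGES = {
    "described_by": [("paternal", "father"), ("juvenile", "child")],
    "operates": [("rocket", "astronaut"), ("jet", "pilot")],
    "opposite": [("unfold", "fold"), ("demand", "supply")],
    "theme": [("microphone", "sound")],
    "agent": [("chalk", "writer")],
    "result": [("lamp", "light")],
    "location": [("stick", "skating rink")],
}

def generate_item(bridges, n_foils=3, rng=random):
    """Build one pair-response analogy item following the four steps."""
    # Step 1: randomly select a bridge (it needs at least two pairs).
    eligible = [b for b, pairs in bridges.items() if len(pairs) >= 2]
    bridge = rng.choice(eligible)
    # Step 2: randomly select two distinct pairs, one stem and one key.
    stem, key = rng.sample(bridges[bridge], 2)
    # Step 3: randomly select foil pairs from the other bridges.
    foil_pool = [p for b, pairs in bridges.items() if b != bridge for p in pairs]
    foils = rng.sample(foil_pool, n_foils)
    # Step 4: place the key at a random position among the foils.
    options = foils + [key]
    rng.shuffle(options)
    return {
        "bridge": bridge,
        "stem": stem,
        "options": options,
        "key_index": options.index(key),
    }

if __name__ == "__main__":
    item = generate_item(BRIDGES, rng=random.Random(1))
    print(f"{item['stem'][0]}:{item['stem'][1]}:: ?")
    for letter, (w1, w2) in zip("abcd", item["options"]):
        print(f"{letter}. {w1}:{w2}")
```

Passing a seeded `random.Random` instance makes a generated form reproducible, which may be useful if every examinee receives a different test that must be re-scored later.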

Page 7: Automatic Generation of Verbal Analogy Items

Sample Items

1. paternal:father:: ?
a. juvenile:child
b. microphone:sound
c. chalk:writer
d. unfold:fold

3. rocket:astronaut:: ?
a. lamp:light
b. stick:skating rink
c. jet:pilot
d. demand:supply

Page 8: Automatic Generation of Verbal Analogy Items

Alternative format

1. paternal:father::juvenile:?
a. child
b. sound
c. writer
d. fold

3. rocket:astronaut::jet:?
a. light
b. skating rink
c. pilot
d. supply
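The alternative format appears to be derivable mechanically from the pair format: the first word of the key pair moves into the stem, and each option keeps only its second word. A minimal sketch, assuming pairs are stored as `(first, second)` tuples; the function name `to_word_format` is hypothetical.

```python
def to_word_format(stem, options, key_index):
    """Convert a pair-response item into the word-response format.

    stem      -- (first, second) tuple for the stem pair
    options   -- list of (first, second) tuples, key included
    key_index -- position of the key within options
    """
    # The first word of the key pair becomes part of the stem.
    key_first = options[key_index][0]
    stem_text = f"{stem[0]}:{stem[1]}::{key_first}:?"
    # Each response option keeps only the second word of its pair.
    choices = [second for _, second in options]
    return stem_text, choices

if __name__ == "__main__":
    stem_text, choices = to_word_format(
        ("paternal", "father"),
        [("juvenile", "child"), ("microphone", "sound"),
         ("chalk", "writer"), ("unfold", "fold")],
        key_index=0,
    )
    print(stem_text)  # paternal:father::juvenile:?
    print(choices)    # ['child', 'sound', 'writer', 'fold']
```

The key position is unchanged by the conversion, so the same answer key can score both formats.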

Page 9: Automatic Generation of Verbal Analogy Items

Keys

1. paternal:father:: ?
[Bridge: FATHER is described by PATERNAL]
a. juvenile:child ***
b. microphone:sound (unrelated: sound is a (typical) theme of microphone)
c. chalk:writer (unrelated: writer is a (typical) agent of chalk)
d. unfold:fold (unrelated: unfold and fold are opposites/opposed)

3. rocket:astronaut:: ?
[Bridge: ASTRONAUT operates ROCKET]
a. lamp:light (unrelated: lamp is a (typical) result of light)
b. stick:skating_rink (unrelated: skating_rink is a (typical) location of stick)
c. jet:pilot ***
d. demand:supply (unrelated: supply and demand are opposites/opposed)

Page 10: Automatic Generation of Verbal Analogy Items

Present Study

• H1: Two forms of AIG analogies (word responses and pair responses) will have comparable reliability & validity

• H2: AIG scales will have reliability comparable to manually-written scale

• H3: AIG scales will have construct and criterion validity comparable to manually-written scale

Page 11: Automatic Generation of Verbal Analogy Items

Method

• Sample of N=251 gathered online and from psychology classes

• Measures:
– n=20 AIG & human-written verbal analogy scales
– n=40 vocabulary
– Self-reported performance at work & school

Page 12: Automatic Generation of Verbal Analogy Items

Feasibility

• Manually examined items for feasibility
• 40/64 (63%) items were feasible
• Reasons for infeasibility
– Over-use of a bridge or a pair (some bridges have few pairs)
– Ambiguous pairs (drum:drum?)
– Foil inadvertently a correct key

Page 13: Automatic Generation of Verbal Analogy Items

Results for H1

Variable                            Mean    SD     n     1        2        3        4
1 Vocabulary                        0.75    0.14   40    (0.86)   0.66     0.66     0.69
2 Human-written items               0.65    0.14   20    0.46     (0.57)   0.97     1.04
3 AIG items with pair responses     0.73    0.16   20    0.52     0.63     (0.73)   0.94
4 AIG items with word responses     0.81    0.14   19    0.54     0.67     0.68     (0.72)
5 Self-Rated Performance            3.72    0.61   6     -0.04    -0.01    0.05     0.10
6 Academic Performance              0.02    0.72   3     0.14     0.22     0.20     0.14

H1: Two forms of AIG analogies (word responses and pair responses) will have comparable reliability & validity CONFIRMED
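The slide does not label the matrix layout, but the numbers are consistent with a common convention: reliability estimates in parentheses on the diagonal, observed correlations below it, and correlations corrected for attenuation above it. Under that assumption, each above-diagonal value is the observed correlation divided by the square root of the product of the two reliabilities. The sketch below checks this against the table; the function name `disattenuate` is hypothetical.

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Correct an observed correlation for unreliability in both measures."""
    return r_xy / math.sqrt(rel_x * rel_y)

if __name__ == "__main__":
    # (observed r, reliability 1, reliability 2, value reported above diagonal)
    checks = [
        (0.63, 0.57, 0.73, 0.97),  # human-written vs. AIG pair responses
        (0.67, 0.57, 0.72, 1.04),  # human-written vs. AIG word responses
        (0.68, 0.73, 0.72, 0.94),  # AIG pair vs. AIG word responses
    ]
    for r, rel_x, rel_y, reported in checks:
        print(f"computed {disattenuate(r, rel_x, rel_y):.3f}, reported {reported}")
```

The computed values agree with the reported ones to within about 0.01, which is plausible given that the inputs are themselves rounded to two decimals. A corrected value above 1.0, as in the 1.04 entry, can occur with this formula when the reliability estimates are low.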

Page 14: Automatic Generation of Verbal Analogy Items

Results for H2

Variable                            Mean    SD     n     1        2        3        4
1 Vocabulary                        0.75    0.14   40    (0.86)   0.66     0.66     0.69
2 Human-written items               0.65    0.14   20    0.46     (0.57)   0.97     1.04
3 AIG items with pair responses     0.73    0.16   20    0.52     0.63     (0.73)   0.94
4 AIG items with word responses     0.81    0.14   19    0.54     0.67     0.68     (0.72)
5 Self-Rated Performance            3.72    0.61   6     -0.04    -0.01    0.05     0.10
6 Academic Performance              0.02    0.72   3     0.14     0.22     0.20     0.14

H2: AIG scales will have reliability comparable to manually-written scale NOT CONFIRMED because the AIG scales had better reliability

Page 15: Automatic Generation of Verbal Analogy Items

Results for H3

Variable                            Mean    SD     n     1        2        3        4
1 Vocabulary                        0.75    0.14   40    (0.86)   0.66     0.66     0.69
2 Human-written items               0.65    0.14   20    0.46     (0.57)   0.97     1.04
3 AIG items with pair responses     0.73    0.16   20    0.52     0.63     (0.73)   0.94
4 AIG items with word responses     0.81    0.14   19    0.54     0.67     0.68     (0.72)
5 Self-Rated Performance            3.72    0.61   6     -0.04    -0.01    0.05     0.10
6 Academic Performance              0.02    0.72   3     0.14     0.22     0.20     0.14

H3: AIG scales will have construct and criterion validity comparable to manually-written scale CONFIRMED

Page 16: Automatic Generation of Verbal Analogy Items

Predicting Item Difficulty

Predictor                                               Correlation
Automatically generated (1) or manually written (0)     0.28*
Familiarity of least familiar word in item              0.33*
Familiarity of second least familiar word in item       0.39**
Mean familiarity of all words in item                   0.37**
Lowest log(count(word))                                 0.14
Second lowest log(count(word))                          -0.06
Mean log(count(word))                                   0.17
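The word-level predictors in this table are simple summaries over the words of an item. The sketch below shows one way such predictors and their correlation with item difficulty could be computed. The function names and the dictionary-based lookup are assumptions; the slides do not describe the actual computation or the familiarity scale.

```python
import math

def word_predictors(words, score):
    """Summarize per-word scores (e.g. familiarity ratings or log counts).

    words -- the words appearing in one item (at least two)
    score -- dict mapping each word to its score (hypothetical lookup)
    """
    values = sorted(score[w] for w in words)
    return {
        "lowest": values[0],
        "second_lowest": values[1],
        "mean": sum(values) / len(values),
    }

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

The same `word_predictors` helper would serve for the frequency rows by passing a dictionary of log word counts instead of familiarity ratings. Each predictor would then be correlated with item difficulty across items via `pearson`.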

Page 17: Automatic Generation of Verbal Analogy Items

Future Directions

• Better handling of senses (DRUM is for DRUMMING)

• Better difficulty calculations based on larger sample of items

• Automated feasibility checking
• Enhanced database of relationships
• Choosing foils to have more semantic similarity to other words
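As a rough illustration of what automated feasibility checking might look like, the sketch below flags the three infeasibility reasons reported on the Feasibility slide. Everything here is hypothetical: the item dictionary layout (`bridge`, `stem`, `options`, `key_index`), the `pair_to_bridges` lookup, and the `max_uses` threshold are assumptions, not part of the study.

```python
from collections import Counter

def feasibility_problems(item, pair_to_bridges):
    """Return a list of problems found in one pair-response item.

    item            -- dict with "bridge", "stem", "options", "key_index"
    pair_to_bridges -- dict mapping a (word1, word2) pair to the set of
                       bridges it is known to satisfy (hypothetical lookup)
    """
    problems = []
    # Ambiguous pairs: the same word on both sides (e.g. drum:drum).
    for w1, w2 in [item["stem"]] + item["options"]:
        if w1 == w2:
            problems.append(f"ambiguous pair {w1}:{w2}")
    # Foil inadvertently a correct key: a foil that also fits the stem's bridge.
    for i, option in enumerate(item["options"]):
        if i == item["key_index"]:
            continue
        if item["bridge"] in pair_to_bridges.get(option, set()):
            problems.append(f"foil {option[0]}:{option[1]} also fits the bridge")
    return problems

def overused_bridges(items, max_uses=2):
    """Return bridges used by more than max_uses items on one form."""
    counts = Counter(item["bridge"] for item in items)
    return sorted(b for b, c in counts.items() if c > max_uses)
```

A check like this only catches problems that the relationship database already encodes; a foil that fits the bridge through a word sense missing from the database would still need human review.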