The Ins and Outs of making a Wordlist Rob Waring Notre Dame Seishin University


Page 1

The Ins and Outs of making a Wordlist

Rob Waring
Notre Dame Seishin University
www.robwaring.org/presentations/

Page 2

Overview

• Purpose - What kind of list?
• List structure
• Selection factors
• Definitions or translations?
• Mechanics
• Validating

Page 3
Page 4
Page 5

Android too

Page 6

Black – in level
Red – out of list
Red underline – out of level
Green – ignored words
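A minimal sketch of the kind of check this colour coding implies, assuming the levelled wordlist is held as a mapping from word to level. The names `profile_text`, `LEVELS` and `IGNORED`, and the sample words, are illustrative assumptions and not the tool shown on the slide.

```python
import re

# Hypothetical levelled list: word -> level number (assumed data, not from the slides).
LEVELS = {"the": 1, "cat": 1, "sat": 1, "on": 1, "mat": 2, "quietly": 3}
# Words the analysis should skip, e.g. character names.
IGNORED = {"tom"}


def profile_text(text, target_level, levels=LEVELS, ignored=IGNORED):
    """Label each token as 'in level', 'out of level', 'out of list' or 'ignored'."""
    results = []
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in ignored:
            label = "ignored"
        elif token not in levels:
            label = "out of list"
        elif levels[token] <= target_level:
            label = "in level"
        else:
            label = "out of level"
        results.append((token, label))
    return results


report = profile_text("Tom the cat sat quietly on the zyx mat", target_level=2)
```

Each label corresponds to one of the four colours above; a real tool would also need to decide how inflected forms map onto listed headwords.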

Page 7

What kind of list - Purpose?

• To give to students to learn from (paper or digital)
• To analyze texts against e.g. a graded reader
• To cover the majority of words in a given field (e.g. top 1000 business words)
• Master list to source sublists from?
• Multiple level lists, or one list?
• For a single class – or general (e.g. all natives, all intermediates)
• Spoken, written, mixed?
• For a specific audience?
  – TOEIC, business, academic
  – A certain age
  – A certain level (intermediates)

Page 8

What kind of list - Starting point?

• Use existing wordlists – GSL, Nation’s BNC lists, NGSL, NAWL ...
• Use existing corpus (e.g. BNC, COCA) and dig out what you want
• Create your own corpus (business, TOEIC, nursing)

• Does it suit your purpose? Will BNC give you an academic list?
• Is it structured the way you want? Headwords only? Lemmas? Mixture?

Page 9

BNC raw (by type)
BNC Nation Family list
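The difference between a raw list counted by type and a family list can be sketched as below. The mini-corpus and the `FAMILY_OF` table are assumptions made up for illustration; a real family table would come from a source such as Nation's BNC lists.

```python
from collections import Counter

# Hypothetical mini-corpus and family table (assumptions for illustration only).
tokens = "use used uses useful use book books use".split()
FAMILY_OF = {"used": "use", "uses": "use", "useful": "use", "books": "book"}

# Raw list: every distinct form (type) is counted separately.
type_counts = Counter(tokens)

# Family list: each form's count is credited to its headword.
family_counts = Counter()
for form, count in type_counts.items():
    family_counts[FAMILY_OF.get(form, form)] += count
```

In the raw list "use" and "useful" are separate entries; in the family list both are credited to the headword "use", which pushes the family higher up a frequency ranking.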

Page 10

List structure I - List with Levels

• How many levels? Why?
• What are the breaks between levels? Will learners get from one level to another with ease?
• Will the breaks be even (say 560 words each) or vary?
• Level by frequency? utility? range? intuition? Learnability?
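As a rough sketch of levelling by frequency, a ranked list can be cut at chosen break points. The function name `split_into_levels` and the tiny ranked list are assumptions for illustration; the break points are cumulative positions in the ranked list.

```python
def split_into_levels(ranked_words, breaks):
    """Split a frequency-ranked list at cumulative cut points, e.g. [2, 5]."""
    levels, start = [], 0
    for end in breaks:
        levels.append(ranked_words[start:end])
        start = end
    if start < len(ranked_words):
        levels.append(ranked_words[start:])  # the remainder forms the last level
    return levels


ranked = ["the", "of", "and", "to", "in", "that", "it"]
even = split_into_levels(ranked, breaks=[3, 6])    # even breaks of 3 words each
uneven = split_into_levels(ranked, breaks=[2, 5])  # breaks that vary in size
```

Levelling by utility, range, or intuition would need a different ranking as input; the cutting step itself would stay the same.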

Page 11

Selection Criteria I

• Representativeness: The list should adequately represent the wide range of uses of language
• Frequency and range: A word should occur frequently across a wide range of texts.
• Word families: Sensible set of criteria regarding what forms and uses are counted as being members of the same family
• Utility: how useful will the words be to the target learners
• Idioms and set expressions: Some items larger than a word behave like high frequency words
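Frequency and range can be measured separately, as in this minimal sketch. It assumes whitespace tokenisation and treats each string as one text; `frequency_and_range` and the sample texts are hypothetical.

```python
from collections import Counter


def frequency_and_range(texts):
    """Return {word: (total frequency, number of texts the word appears in)}."""
    freq, text_range = Counter(), Counter()
    for text in texts:
        tokens = text.lower().split()
        freq.update(tokens)
        text_range.update(set(tokens))  # count each word once per text
    return {word: (freq[word], text_range[word]) for word in freq}


texts = ["the market fell", "the market rose and the bank fell", "the cat sat"]
stats = frequency_and_range(texts)
```

A word that is frequent in one text but absent from the others gets a high frequency and a low range, which is why the two criteria are usually considered together.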

Page 12

Selection Criteria II

• Learnability: how easy to learn? Related words may be easier
• Regularity: regular forms are easier than irregular forms, but some derivatives operate differently within a family, e.g. excuse, inexcusable
• Coverage: it is not efficient to be able to express the same idea in different ways. It is more efficient to learn a word that covers a quite different idea
• Stylistic level and emotional words: West saw second language learners as initially needing neutral vocabulary
• Intuition: how well does it match the teacher’s sense of what to include

Page 13

Which of these would you put in your list?

out of, per cent, such as, of course, for example, in front of, all right, as soon as, in general, in addition to, next to, on top of, instead of, in charge of, just about, provided that, as good as, with a view to

in between, by and large, at random, per se, old fashioned, grown up, matter of fact, sq m, fait accompli, straight forward, habeas corpus, self-same, haute cuisine, a good deal, laissez faire, persona non grata

Page 14

How frequently do lexical phrases occur (BNC)?

Raw rank  Word               Per million words
177       out of             490
222       per cent           382
272       such as            321
285       of course          309
378       for example        238
1538      in front of        65
1725      all right          58
2159      as soon as         47
2491      in general         41
2970      in addition to     34
3307      next to            30
3755      on top of          26
4378      instead of         21
5409      in charge of       17
5987      just about         15
7396      provided that      11
7885      as good as         10
9125      with a view to     8
11459     in between         6
13507     by and large       5
14369     at random          4
16684     per se             4
19505     old fashioned      3
22060     grown up           2
28441     matter of fact     2
43572     sq m               1
48241     fait accompli      1
51717     straight forward   1
58511     habeas corpus      1
74321     self-same          0
76170     haute cuisine      0
82928     a good deal        0
83882     laissez faire      0
89371     persona non grata  0
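A per-million figure is a raw count scaled by corpus size. The sketch below shows that arithmetic together with a simple way of counting a multi-word phrase as a run of adjacent tokens. The function names and the toy sentence are assumptions; the figures in the table come from the BNC, not from this code.

```python
def per_million(raw_count, corpus_size):
    """Normalise a raw count to occurrences per million running words."""
    return raw_count * 1_000_000 / corpus_size


def count_phrase(tokens, phrase):
    """Count how often a multi-word phrase occurs as a run of adjacent tokens."""
    target = phrase.lower().split()
    n = len(target)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)


# Toy corpus (an assumption for illustration only).
tokens = "he ran out of the room as soon as he could of course".split()
hits = count_phrase(tokens, "as soon as")
rate = per_million(hits, corpus_size=len(tokens))
```

On a corpus of 100 million words, a phrase found 40 times works out at 0.4 per million, so the 0 entries in the table presumably reflect rounding rather than phrases that never occur.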

Page 15

Selection criteria – a new headword or in the family?

• Only mega-headwords (that cover all meaning senses)
• Inflections only? - Plurals, verb forms, -er, -est adjectives. Keep them all together? If not, where do low frequency derivatives go?
  – USE uses using used user users useful useless usefulness usefully usable misused misuse misusing misuses misuser misusers uselessness uselessly unused usability reuse reuses reused reusing unusable
• Derivatives in the family or as a new headword?
  – interest, interesting, interested, disinterested, interestingly
• Homographs with different meaning senses – book, bank, bat, bill
• Nuances – a brain, to brain someone
• Phrasal verbs – bring down, bring back, bring up, bring over
• Compound words – handbag, policeman, airflow, birdwatching
• Multi-word units? – traffic light, lunch box, all right, by and large

Page 16

Selection – where to put derivatives?

Level 1: A different form is a different word. Capitalization is ignored.

Level 2: Regularly inflected words are part of the same family. The inflectional categories are - plural; third person singular present tense; past tense; past participle; -ing; comparative; superlative; possessive.

Level 3: -able, -er, -ish, -less, -ly, -ness, -th, -y, non-, un-, all with restricted uses.

Level 4: -al, -ation, -ess, -ful, -ism, -ist, -ity, -ize, -ment, -ous, in-, all with restricted uses.

Level 5: -age (leakage), -al (arrival), -ally (idiotically), -an (American), -ance (clearance), -ant (consultant), -ary (revolutionary), -atory (confirmatory), -dom (kingdom; officialdom), -eer (black marketeer), -en (wooden), -en (widen), -ence (emergence), -ent (absorbent), -ery (bakery; trickery), -ese (Japanese; officialese), -esque (picturesque), -ette (usherette; roomette), -hood (childhood), -i (Israeli), -ian (phonetician; Johnsonian), -ite (Paisleyite; also chemical meaning), -let (coverlet), -ling (duckling), -ly (leisurely), -most (topmost), -ory (contradictory), -ship (studentship), -ward (homeward), -ways (crossways), -wise (endwise; discussion-wise), ante- (anteroom), anti- (anti-inflation), arch- (archbishop), bi- (biplane), circum- (circumnavigate), counter- (counter-attack), en- (encage; enslave), ex- (ex-president), fore- (forename), hyper- (hyperactive), inter- (inter-African, interweave), mid- (mid-week), mis- (misfit), neo- (neo-colonialism), post- (post-date), pro- (pro-British), semi- (semi-automatic), sub- (subclassify; subterranean), un- (untie; unburden).

Level 6: -able, -ee, -ic, -ify, -ion, -ist, -ition, -ive, -th, -y, pre-, re-.

Level 7: Classical roots and affixes.
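To make the Level 2 cut-off concrete, here is a deliberately simplified sketch that groups forms under a headword using only regular inflections. The helper names are hypothetical, and the rules are an assumption covering just -s, -ed, -ing and the possessive; comparatives, superlatives, consonant doubling, -y to -ies, and irregular forms are left out and would need extra rules or a lookup table.

```python
def level2_forms(headword):
    """Regular Level 2 inflections only (simplified; see the assumptions above)."""
    stem = headword[:-1] if headword.endswith("e") else headword
    return {
        headword,
        headword + "s",    # plural / third person singular
        stem + "ed",       # past tense / past participle
        stem + "ing",      # -ing form
        headword + "'s",   # possessive
    }


def group_into_families(types, headwords):
    """Assign each form to the first headword whose Level 2 forms include it."""
    families = {h: [] for h in headwords}
    leftovers = []
    for t in types:
        owner = next((h for h in headwords if t in level2_forms(h)), None)
        if owner is None:
            leftovers.append(t)
        else:
            families[owner].append(t)
    return families, leftovers


types = ["use", "uses", "used", "using", "useful", "user", "walked", "walking"]
families, leftovers = group_into_families(types, ["use", "walk"])
```

With a Level 2 definition, "useful" and "user" are left outside the USE family; under the levels listed above they would join it only once -er (Level 3) and -ful (Level 4) are admitted.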

Page 17

Selection Criteria - How will you deal with … I

• Proper nouns: SONY, Dave, Jackson, Thomson, Paris, London
• Proper nouns that are words - Bell, Sue, Jack, Nation, Mark
• Numbers: 1, one, thirty, twenty-seven, thousand, billion
• Acronyms – NATO, DNA, UN, NSA, DARPA
• Dialectal differences (e.g. US vs UK spelling)
• Multi-word units – post office, train station, city hall
• Closed lexical sets such as days of the week, months etc.
• Typos – mispelings, heros, amatur, arguement, bellweather
• Incomplete words – travelin’, roarin’, ‘cept
• Slang forms – gonna, wanna, nuffink, wassup

Page 18

Selection Criteria - How will you deal with … II

• Offensive words – pooh, shit, crap, bugger, bastard, fart
• Culturally loaded words – temple vs. church, hijab, sporran
• Non-pc words – stewardess, waitress, negro, retarded, stupid
• NCLB words - beer, alcohol, drugs, tobacco, smoking
• Archaic words – thou, thee, thine, groovy, gay
• Prototypical sets – words often taught in sets
  – foods - pizza, apple, cake, bread, salt, tomato, zucchini, eggplant, capsicum
  – drinks – coffee, tea, juice, water, cola, mojito, screwdriver, bloody Mary
  – buildings – office, station, hotel, city hall, auditorium, ice rink
  – shops – supermarket, mall, barber, stationer, grocer
  – colors – red, blue, green, yellow, pink, violet, scarlet, puce

Page 19

Definitions - What aspects of word knowledge to include?

• Definition
• POS – how detailed do you want to be?
• Translations – how will you deal with translators who disagree?
• Example sentence – authentic, contrived?
• Usage notes – which ones?
• Synonyms
• Antonyms
• Distractors? (for online test auto-create software)
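One possible way to hold these aspects of word knowledge is a record per entry, from which test distractors can be drawn automatically. This is a sketch under assumptions: the `Entry` fields, the `pick_distractors` rule (same POS, same level, not a synonym) and the sample entries are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Entry:
    word: str
    pos: str
    definition: str
    level: int
    synonyms: list = field(default_factory=list)
    antonyms: list = field(default_factory=list)


def pick_distractors(target, entries, k=3, seed=0):
    """Pick up to k other words with the same POS and level that are not synonyms."""
    pool = [
        e.word for e in entries
        if e.pos == target.pos and e.level == target.level
        and e.word != target.word and e.word not in target.synonyms
    ]
    return random.Random(seed).sample(pool, min(k, len(pool)))


entries = [
    Entry("apple", "noun", "hard red or green fruit", 1),
    Entry("bread", "noun", "food made from flour", 1),
    Entry("cake", "noun", "sweet baked food", 1),
    Entry("run", "verb", "move fast on foot", 1),
    Entry("salt", "noun", "white powder for food", 2),
]
distractors = pick_distractors(entries[0], entries)
```

Whether distractors should match on POS, level, or topic is a design choice; the rule here is only one plausible option.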

Page 20

Definitions - style

What style? e.g. Apple

synonym: fruit
short definition: hard red or green fruit
long definition: the fleshy usually rounded red, yellow or green edible fruit of a usually cultivated tree (genus Malus) of the rose family

Use of a defining vocabulary list? Which one? Which words?
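If a defining vocabulary is used, each definition can be checked against it mechanically. The sketch below assumes a small made-up `DEFINING_VOCAB` set and a hypothetical helper name; a real check would use a published defining vocabulary and handle inflected forms.

```python
import re


def words_outside_defining_vocab(definition, defining_vocab):
    """Return the sorted words in a definition that are not in the defining vocabulary."""
    tokens = re.findall(r"[a-z]+", definition.lower())
    return sorted({t for t in tokens if t not in defining_vocab})


# Hypothetical defining vocabulary (an assumption for illustration only).
DEFINING_VOCAB = {"hard", "red", "or", "green", "fruit", "a", "of", "the", "tree"}

short_check = words_outside_defining_vocab("hard red or green fruit", DEFINING_VOCAB)
long_check = words_outside_defining_vocab(
    "the fleshy usually rounded red, yellow or green edible fruit", DEFINING_VOCAB
)
```

Here the short definition passes cleanly, while the long one uses several words a learner at that level may not know.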

Page 21

Mechanics

• Word? Excel?
• Specialized database software such as Access or Filemaker?
• Versions. Is it important to know which version of your wordlist was given to which users?
• Do you have the time and patience?
• SERIOUSLY. Do you have the time and patience?
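One lightweight way to tell versions apart, whatever software holds the list, is to stamp each release with a fingerprint of its contents. This is a sketch assuming the list can be exported as plain words; `wordlist_fingerprint` is a hypothetical helper built on Python's standard `hashlib`.

```python
import hashlib


def wordlist_fingerprint(words):
    """Short fingerprint of a wordlist's contents, ignoring order, case and spacing."""
    normalised = "\n".join(sorted({w.strip().lower() for w in words}))
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:12]


v1 = wordlist_fingerprint(["apple", "Bank"])
v1_reordered = wordlist_fingerprint(["bank ", "apple"])
v2 = wordlist_fingerprint(["apple", "bank", "cat"])
```

Two exports with the same words give the same fingerprint, and adding or removing a word changes it, so the fingerprint can be recorded alongside the name of whoever received that version.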

Page 22

Validating your wordlist

• How will you evaluate the list’s integrity?
• How will you check if you missed words?
• How will you check mis-levelled words?
• How will you check consistency of definitions, examples, translations?
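Some of these checks can be automated. The sketch below assumes each entry is a dict with `word`, `level` and `definition` keys, and that a frequency rank per word is available from a reference corpus; all function names, the `min_count` threshold and the sample data are hypothetical.

```python
from collections import Counter


def integrity_problems(entries):
    """Report duplicate headwords (ignoring case) and entries with no definition."""
    problems = []
    counts = Counter(e["word"].lower() for e in entries)
    problems += [f"duplicate entry: {w}" for w, n in counts.items() if n > 1]
    problems += [
        f"missing definition: {e['word']}"
        for e in entries if not e.get("definition", "").strip()
    ]
    return problems


def missed_words(corpus_tokens, entries, min_count=2):
    """Words that recur in a corpus but are absent from the list."""
    listed = {e["word"].lower() for e in entries}
    counts = Counter(t.lower() for t in corpus_tokens)
    return sorted(w for w, n in counts.items() if n >= min_count and w not in listed)


def mis_levelled(entries, rank_of, level_size):
    """Flag entries whose assigned level differs from the level implied by rank."""
    flagged = []
    for e in entries:
        rank = rank_of.get(e["word"].lower())
        if rank is None:
            continue  # no rank available, so nothing to compare against
        expected = (rank - 1) // level_size + 1
        if expected != e["level"]:
            flagged.append((e["word"], e["level"], expected))
    return flagged


entries = [
    {"word": "apple", "level": 1, "definition": "hard red or green fruit"},
    {"word": "Apple", "level": 2, "definition": ""},
    {"word": "bank", "level": 1, "definition": "a place to keep money"},
]
problems = integrity_problems(entries)
missing = missed_words("the bank the apple river river".split(), entries)
flags = mis_levelled(entries, {"apple": 1500, "bank": 300}, level_size=1000)
```

Flags from `mis_levelled` are prompts for review rather than errors, since a list levelled by utility or intuition will legitimately depart from pure frequency.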

Page 23

And soooooo much more!

Questions?

If you want help [email protected]