The Ins and Outs of making a Wordlist
Rob Waring
Notre Dame Seishin University
www.robwaring.org/presentations/
Overview
• Purpose - What kind of list?
• List structure
• Selection factors
• Definitions or translations?
• Mechanics
• Validating
Android too
• Black – in level
• Red – out of list
• Red underline – out of level
• Green – ignored words
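A color key like the one above amounts to sorting every token of a text into one of four classes. The following is a minimal sketch of that idea, not the tool shown in the talk. The `levels` mapping (word to level number), the `ignored` set, and the toy data are all assumptions made for illustration, and matching is on exact lower-cased forms only, with no lemma or family handling.

```python
import re

def profile(text, levels, max_level, ignored=frozenset()):
    """Label each token 'in level', 'out of level', 'out of list' or 'ignored'."""
    labels = []
    for token in re.findall(r"[A-Za-z']+", text):
        word = token.lower()
        if word in ignored:
            label = "ignored"        # e.g. character names in a graded reader
        elif word not in levels:
            label = "out of list"    # not on the wordlist at all
        elif levels[word] <= max_level:
            label = "in level"       # at or below the target level
        else:
            label = "out of level"   # on the list, but above the target level
        labels.append((token, label))
    return labels

# Hypothetical toy data, for illustration only.
levels = {"the": 1, "cat": 1, "sat": 1, "on": 1, "a": 1, "sofa": 3}
result = profile("Tom the cat sat on a velvet sofa", levels,
                 max_level=2, ignored={"tom"})
```

Here `result` pairs each token with its label, which a display layer could then map to black, red, red underline, and green.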
What kind of list - Purpose?
• To give to students to learn from (paper or digital)
• To analyze texts (e.g. a graded reader) against
• To cover the majority of words in a given field (e.g. top 1000 business words)
• Master list to source sublists from?
• Multiple level lists, or one list?
• For a single class – or general (e.g. all natives, all intermediates)
• Spoken, written, mixed?
• For a specific audience?
– TOEIC, business, academic
– A certain age
– A certain level (intermediates)
What kind of list - Starting point?
• Use existing wordlists – GSL, Nation’s BNC lists, NGSL, NAWL ...
• Use an existing corpus (e.g. BNC, COCA) and dig out what you want
• Create your own corpus (business, TOEIC, nursing)
• Does it suit your purpose? Will BNC give you an academic list?
• Is it structured the way you want? Headwords only? Lemmas? Mixture?
[Figure: two sample lists side by side – BNC raw (by type) and BNC Nation family list]
List structure I - List with Levels
• How many levels? Why?
• What are the breaks between levels? Will learners get from one level to another with ease?
• Will the breaks be even (say 560 words each) or vary?
• Level by frequency? Utility? Range? Intuition? Learnability?
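The choice between even and varying breaks can be tried out quickly on a ranked list. The sketch below assumes the list is already ordered (here by frequency, most frequent first). The function names, the placeholder words, and the break points are all hypothetical.

```python
def split_into_levels(ranked_words, level_size):
    """Cut a ranked list into consecutive levels of equal size."""
    return [ranked_words[i:i + level_size]
            for i in range(0, len(ranked_words), level_size)]

def split_at_breaks(ranked_words, breaks):
    """Cut at explicit cumulative break points, e.g. [300, 700, 1500]."""
    levels, start = [], 0
    for end in breaks:
        levels.append(ranked_words[start:end])
        start = end
    if start < len(ranked_words):
        levels.append(ranked_words[start:])  # remainder becomes the last level
    return levels

# Placeholder list, most frequent first (illustration only).
ranked = [f"word{i}" for i in range(1, 11)]
even = split_into_levels(ranked, 4)      # level sizes 4, 4, 2
uneven = split_at_breaks(ranked, [2, 5])  # level sizes 2, 3, 5
```

With even breaks the last level is usually a short remainder. With explicit breaks the early levels can be kept small so that beginners reach the next level sooner.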
Selection Criteria I
• Representativeness: The list should adequately represent the wide range of uses of language.
• Frequency and range: A word should occur frequently across a wide range of texts.
• Word families: Sensible set of criteria regarding what forms and uses are counted as being members of the same family.
• Utility: How useful will the words be to the target learners?
• Idioms and set expressions: Some items larger than a word behave like high frequency words.
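Frequency and range are two separate counts: how often a word occurs overall, and in how many texts it occurs at all. A minimal sketch, assuming each text is a plain string and using a deliberately simple tokenizer; the three sample texts are invented for illustration.

```python
import re
from collections import Counter

def frequency_and_range(texts):
    """Return {word: (total frequency, number of texts containing the word)}."""
    freq, rng = Counter(), Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        freq.update(tokens)
        rng.update(set(tokens))  # each text counts at most once per word
    return {word: (freq[word], rng[word]) for word in freq}

# Invented sample texts, standing in for separate corpus files.
texts = [
    "the market rose and the market fell",
    "the nurse checked the chart",
    "the market for nurses is tight",
]
stats = frequency_and_range(texts)
# stats["market"] is (3, 2): three occurrences, spread over two texts.
```

A word with high frequency but a range of 1 is probably topic-specific, which is one reason to look at both figures before admitting it to a general list.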
Selection Criteria II
• Learnability: How easy is the word to learn? Related words may be easier.
• Regularity: Regular forms are easier than irregular forms, but some derivatives operate differently within a family (e.g. excuse, inexcusable).
• Coverage: It is not efficient to be able to express the same idea in different ways. It is more efficient to learn a word that covers a quite different idea.
• Stylistic level and emotional words: West saw second language learners as initially needing neutral vocabulary.
• Intuition: How well does it match the teacher’s sense of what to include?
Which of these would you put in your list?
out of, per cent, such as, of course, for example, in front of, all right, as soon as, in general, in addition to, next to, on top of, instead of, in charge of, just about, provided that, as good as, with a view to
in between, by and large, at random, per se, old fashioned, grown up, matter of fact, sq m, fait accompli, straight forward, habeas corpus, self-same, haute cuisine, a good deal, laissez faire, persona non grata
How frequently do lexical phrases occur (BNC)?
Raw rank   Phrase              Per million words
177        out of              490
222        per cent            382
272        such as             321
285        of course           309
378        for example         238
1538       in front of         65
1725       all right           58
2159       as soon as          47
2491       in general          41
2970       in addition to      34
3307       next to             30
3755       on top of           26
4378       instead of          21
5409       in charge of        17
5987       just about          15
7396       provided that       11
7885       as good as          10
9125       with a view to      8
11459      in between          6
13507      by and large        5
14369      at random           4
16684      per se              4
19505      old fashioned       3
22060      grown up            2
28441      matter of fact      2
43572      sq m                1
48241      fait accompli       1
51717      straight forward    1
58511      habeas corpus       1
74321      self-same           0
76170      haute cuisine       0
82928      a good deal         0
83882      laissez faire       0
89371      persona non grata   0
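A "per million words" figure is a raw count scaled by corpus size, which makes phrases comparable across corpora of different sizes. The sketch below is an assumption about how such a figure could be computed, not a description of how the BNC figures above were produced. It treats a phrase as a contiguous run of tokens and divides by the total token count; the sample text is invented.

```python
import re

def per_million(phrase, text):
    """Occurrences of a multi-word phrase per million running words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    # Count every position where the next n tokens equal the phrase.
    hits = sum(1 for i in range(len(tokens) - n + 1)
               if tokens[i:i + n] == target)
    return hits / len(tokens) * 1_000_000 if tokens else 0.0

# Invented sample: 13 running words, "out of" occurs twice.
sample = "He ran out of the room. Of course, we were out of time."
rate = per_million("out of", sample)
```

On a 13-word sample the rate is meaninglessly large. The scaling only becomes informative on a corpus of millions of words, where a phrase such as "out of" can be ranked alongside single words.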
Selection criteria – a new headword or in the family?
• Only mega-headwords (that cover all meaning senses)
• Inflections only? – Plurals, verb forms, -er, -est adjectives. Keep them all together? If not, where do low frequency derivatives go?
– USE uses using used user users useful useless usefulness usefully usable misused misuse misusing misuses misuser misusers uselessness uselessly unused usability reuse reuses reused reusing unusable
• Derivatives in the family or as a new headword?
– interest, interesting, interested, disinterested, interestingly
• Homographs with different meaning senses – book, bank, bat, bill
• Nuances – a brain, to brain someone
• Phrasal verbs – bring down, bring back, bring up, bring over
• Compound words – handbag, policeman, airflow, birdwatching
• Multi-word units? – traffic light, lunch box, all right, by and large
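Whether a form sits inside a family or stands as its own headword changes the frequency each headword ends up with. A minimal sketch of that effect, assuming a hand-made family map and made-up counts; the `FAMILY` table is hypothetical and deliberately leaves "reuse" out of the USE family to show the consequence.

```python
from collections import Counter

# Hypothetical family map: each listed form points to its headword.
FAMILY = {
    "uses": "use", "using": "use", "used": "use",
    "user": "use", "users": "use", "useful": "use", "useless": "use",
}

def family_counts(form_counts, family=FAMILY):
    """Sum counts of word forms under their headword; unmapped forms stand alone."""
    totals = Counter()
    for form, count in form_counts.items():
        totals[family.get(form, form)] += count
    return totals

# Made-up counts, for illustration only.
forms = {"use": 50, "used": 30, "useful": 12, "user": 8, "reuse": 3}
totals = family_counts(forms)
# "use" collects 100 in total, while "reuse" stays a separate headword with 3.
```

Moving a derivative in or out of the map shifts both the family total and the rank of the headword, so the family definition needs to be fixed before any frequency cut-off is applied.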
Selection – where to put derivatives?
Level 1: A different form is a different word. Capitalization is ignored.
Level 2: Regularly inflected words are part of the same family. The inflectional categories are - plural; third person singular present tense; past tense; past participle; -ing; comparative; superlative; possessive.
Level 3: -able, -er, -ish, -less, -ly, -ness, -th, -y, non-, un-, all with restricted uses.
Level 4: -al, -ation, -ess, -ful, -ism, -ist, -ity, -ize, -ment, -ous, in-, all with restricted uses.
Level 5: -age (leakage), -al (arrival), -ally (idiotically), -an (American), -ance (clearance), -ant (consultant), -ary (revolutionary), -atory (confirmatory), -dom (kingdom; officialdom), -eer (black marketeer), -en (wooden), -en (widen), -ence (emergence), -ent (absorbent), -ery (bakery; trickery), -ese (Japanese; officialese), -esque (picturesque), -ette (usherette; roomette), -hood (childhood), -i (Israeli), -ian (phonetician; Johnsonian), -ite (Paisleyite; also chemical meaning), -let (coverlet), -ling (duckling), -ly (leisurely), -most (topmost), -ory (contradictory), -ship (studentship), -ward (homeward), -ways (crossways), -wise (endwise; discussion-wise), ante- (anteroom), anti- (anti-inflation), arch- (archbishop), bi- (biplane), circum- (circumnavigate), counter- (counter-attack), en- (encage; enslave), ex- (ex-president), fore- (forename), hyper- (hyperactive), inter- (inter-African, interweave), mid- (mid-week), mis- (misfit), neo- (neo-colonialism), post- (post-date), pro- (pro-British), semi- (semi-automatic), sub- (subclassify; subterranean), un- (untie; unburden).
Level 6: -able, -ee, -ic, -ify, -ion, -ist, -ition, -ive, -th, -y, pre-, re-.
Level 7: Classical roots and affixes.
Selection Criteria - How will you deal with … I
• Proper nouns: SONY, Dave, Jackson, Thomson, Paris, London
• Proper nouns that are words – Bell, Sue, Jack, Nation, Mark
• Numbers: 1, one, thirty, twenty-seven, thousand, billion
• Acronyms – NATO, DNA, UN, NSA, DARPA
• Dialectal differences (e.g. US vs UK spelling)
• Multi-word units – post office, train station, city hall
• Closed lexical sets such as days of the week, months etc.
• Typos – mispelings, heros, amatur, arguement, bellweather
• Incomplete words – travelin’, roarin’, ‘cept
• Slang forms – gonna, wanna, nuffink, wassup
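Several of the categories above can be flagged mechanically before any decision is made about them. The following is a rough sketch under stated assumptions: the rules are simple surface heuristics, `NUMBER_WORDS` is a tiny illustrative set, and capitalization alone cannot tell a proper noun from a sentence-initial word, so the output is a prompt for human review, not a verdict.

```python
import re

NUMBER_WORDS = {"one", "twenty", "thirty", "thousand", "billion"}  # illustrative, not complete
APOSTROPHES = ("'", "\u2019", "\u2018")  # straight and curly apostrophes

def classify(token):
    """Return a rough surface category so each category can get its own policy."""
    if re.fullmatch(r"\d+(?:[.,]\d+)*", token) or token.lower() in NUMBER_WORDS:
        return "number"
    if re.fullmatch(r"[A-Z]{2,}", token):
        return "acronym"        # also catches all-caps names such as SONY
    if token.startswith(APOSTROPHES) or token.endswith(APOSTROPHES):
        return "incomplete"     # clipped forms such as travelin'
    if token[:1].isupper():
        return "capitalized"    # proper noun, or merely sentence-initial
    return "word"

labels = {t: classify(t)
          for t in ["NATO", "27", "thirty", "travelin'", "Paris", "bell"]}
```

Typos, slang, and proper nouns that are also ordinary words (Bell, Mark) are not detectable this way; they need a reference list or manual checking.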
Selection Criteria - How will you deal with … II
• Offensive words – pooh, shit, crap, bugger, bastard, fart
• Culturally loaded words – temple vs. church, hijab, sporran
• Non-PC words – stewardess, waitress, negro, retarded, stupid
• NCLB words – beer, alcohol, drugs, tobacco, smoking
• Archaic words – thou, thee, thine, groovy, gay
• Prototypical sets – words often taught in sets
– foods – pizza, apple, cake, bread, salt, tomato, zucchini, eggplant, capsicum
– drinks – coffee, tea, juice, water, cola, mojito, screwdriver, bloody Mary
– buildings – office, station, hotel, city hall, auditorium, ice rink
– shops – supermarket, mall, barber, stationer, grocer
– colors – red, blue, green, yellow, pink, violet, scarlet, puce
Definitions - What aspects of word knowledge to include?
• Definition
• POS – how detailed do you want to be?
• Translations – how will you deal with translators who disagree?
• Example sentence – authentic, contrived?
• Usage notes – which ones?
• Synonyms
• Antonyms
• Distractors? (for online test auto-create software)
Definitions - style
What style? e.g. Apple
• Synonym: fruit
• Short definition: hard red or green fruit
• Long definition: the fleshy usually rounded red, yellow or green edible fruit of a usually cultivated tree (genus Malus) of the rose family
Use of a defining vocabulary list? Which one? Which words?
Mechanics
• Word? Excel?
• Specialized database software such as Access or FileMaker?
• Versions. Is it important to know which version of your wordlist was given to which users?
• Do you have the time and patience?
• SERIOUSLY. Do you have the time and patience?
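One lightweight way to track which version of a list went to which users is to stamp every release with a fingerprint derived from its contents. This is only a sketch of one possible approach, not something proposed in the talk. The `fingerprint` function is hypothetical and covers headwords only, so changes to definitions or levels would need those fields added to the hashed text.

```python
import hashlib

def fingerprint(words):
    """Short content hash of a wordlist, independent of order, case and duplicates."""
    normalized = sorted({w.strip().lower() for w in words})
    data = "\n".join(normalized).encode("utf-8")
    return hashlib.sha256(data).hexdigest()[:12]

v1 = fingerprint(["book", "bank", "bat"])
v1_reordered = fingerprint(["Bat", "book", "bank", "book"])
v2 = fingerprint(["book", "bank", "bill"])
# v1 and v1_reordered are equal; v2 differs because one headword changed.
```

Recording the fingerprint next to the date and the recipient group is enough to tell later whether two classes were working from the same list.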
Validating your wordlist
• How will you evaluate the list’s integrity?
• How will you check if you missed words?
• How will you check mis-levelled words?
• How will you check consistency of definitions, examples, translations?
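Some of these checks can be automated. The sketch below is one possible set of checks under stated assumptions: entries are dicts with hypothetical `word`, `level`, `definition` and `example` fields, `ranks` is a reference frequency ranking from whatever corpus the list is based on, and levels are assumed to be even bands of `level_size` ranks. Lists levelled by utility or intuition would need a different mis-levelling test.

```python
def validate(entries, ranks, level_size):
    """Return (word, problem) pairs found in a list of entry dicts."""
    problems, seen = [], set()
    for entry in entries:
        word = entry["word"]
        if word in seen:
            problems.append((word, "duplicate entry"))
        seen.add(word)
        for field in ("definition", "example"):
            if not entry.get(field, "").strip():
                problems.append((word, f"missing {field}"))
        if word in ranks:
            expected = (ranks[word] - 1) // level_size + 1  # band implied by rank
            if expected != entry["level"]:
                problems.append(
                    (word, f"level {entry['level']}, rank suggests {expected}"))
    # Words the reference ranking places inside the list's range but that are absent.
    cutoff = level_size * max(entry["level"] for entry in entries)
    for word, rank in ranks.items():
        if rank <= cutoff and word not in seen:
            problems.append((word, "missing from list"))
    return problems

# Toy data, invented for illustration.
entries = [
    {"word": "go", "level": 1, "definition": "to move", "example": "I go home."},
    {"word": "see", "level": 1, "definition": "", "example": "I see it."},
    {"word": "go", "level": 1, "definition": "to move", "example": "Go now."},
    {"word": "tax", "level": 1, "definition": "money paid to the state",
     "example": "We pay tax."},
]
ranks = {"go": 1, "see": 2, "the": 3, "tax": 5}
problems = validate(entries, ranks, level_size=3)
```

Consistency of wording across definitions, examples and translations still needs a human reader. The script only narrows down where to look.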