Research and Practice in Applied Linguistics

General Editors: Christopher N. Candlin and David R. Hall, Linguistics Department, Macquarie University, Australia.

All books in this series are written by leading researchers and teachers in Applied Linguistics, with broad international experience. They are designed for the MA or PhD student in Applied Linguistics, TESOL or similar subject areas and for the language professional keen to extend their research experience.

Titles include:

Dick Allwright and Judith Hanks
THE DEVELOPING LANGUAGE LEARNER
An Introduction to Exploratory Practice

Francesca Bargiela-Chiappini, Catherine Nickerson and Brigitte Planken
BUSINESS DISCOURSE

Alison Ferguson and Elizabeth Armstrong
RESEARCHING COMMUNICATION DISORDERS

Sandra Beatriz Hale
COMMUNITY INTERPRETING

Geoff Hall
LITERATURE IN LANGUAGE EDUCATION

Richard Kiely and Pauline Rea-Dickins
PROGRAM EVALUATION IN LANGUAGE EDUCATION

Marie-Noëlle Lamy and Regine Hampel
ONLINE COMMUNICATION IN LANGUAGE LEARNING AND TEACHING

Virginia Samuda and Martin Bygate
TASKS IN SECOND LANGUAGE LEARNING

Norbert Schmitt
RESEARCHING VOCABULARY
A Vocabulary Research Manual

Helen Spencer-Oatey and Peter Franklin
INTERCULTURAL INTERACTION
A Multidisciplinary Approach to Intercultural Communication

Cyril J. Weir
LANGUAGE TESTING AND VALIDATION

Tony Wright
CLASSROOM MANAGEMENT IN LANGUAGE EDUCATION

Forthcoming titles:

Anne Burns and Helen da Silva Joyce
LITERACY

Lynn Flowerdew
CORPORA AND LANGUAGE EDUCATION

Sandra Gollin and David R. Hall
LANGUAGE FOR SPECIFIC PURPOSES

Numa Markee and Susan Gonzo
MANAGING INNOVATION IN LANGUAGE TEACHING

Marilyn Martin-Jones
BILINGUALISM

Martha Pennington
PRONUNCIATION

Annamaria Pinter
TEACHING ENGLISH TO YOUNG LEARNERS

Devon Woods and Emese Bukor
INSTRUCTIONAL STRATEGIES AND PROCESSES IN LANGUAGE EDUCATION

Research and Practice in Applied Linguistics
Series Standing Order ISBN 978–1–4039–1184–1 hardcover
Series Standing Order ISBN 978–1–4039–1185–8 paperback
(outside North America only)

You can receive future titles in this series as they are published by placing a standing order. Please contact your bookseller or, in case of difficulty, write to us at the address below with your name and address, the title of the series and one of the ISBNs quoted above.

Customer Services Department, Macmillan Distribution Ltd, Houndmills, Basingstoke, Hampshire RG21 6XS, England

Also by Norbert Schmitt

WHY IS ENGLISH LIKE THAT? (with R. Marsden, 2006)

FOCUS ON VOCABULARY (with D. Schmitt, 2005)

FORMULAIC SEQUENCES: ACQUISITION, PROCESSING, AND USE (editor, 2004)

AN INTRODUCTION TO APPLIED LINGUISTICS 2nd edition (editor, 2010)

VOCABULARY IN LANGUAGE TEACHING (2000)

VOCABULARY: DESCRIPTION, ACQUISITION, AND PEDAGOGY (co-editor with M. McCarthy, 1997)

Researching Vocabulary
A Vocabulary Research Manual

Norbert Schmitt
University of Nottingham, UK

© Norbert Schmitt 2010

All rights reserved. No reproduction, copy or transmission of this publication may be made without written permission.

No portion of this publication may be reproduced, copied or transmitted save with written permission or in accordance with the provisions of the Copyright, Designs and Patents Act 1988, or under the terms of any licence permitting limited copying issued by the Copyright Licensing Agency, Saffron House, 6-10 Kirby Street, London EC1N 8TS.

Any person who does any unauthorized act in relation to this publication may be liable to criminal prosecution and civil claims for damages.

The author has asserted his right to be identified as the author of this work in accordance with the Copyright, Designs and Patents Act 1988.

First published 2010 by
PALGRAVE MACMILLAN

Palgrave Macmillan in the UK is an imprint of Macmillan Publishers Limited, registered in England, company number 785998, of Houndmills, Basingstoke, Hampshire RG21 6XS.

Palgrave Macmillan in the US is a division of St Martin’s Press LLC, 175 Fifth Avenue, New York, NY 10010.

Palgrave Macmillan is the global academic imprint of the above companies and has companies and representatives throughout the world.

Palgrave® and Macmillan® are registered trademarks in the United States, the United Kingdom, Europe and other countries.

ISBN: 978–1–4039–8535–4 hardback
ISBN: 978–1–4039–8536–1 paperback

This book is printed on paper suitable for recycling and made from fully managed and sustained forest sources. Logging, pulping and manufacturing processes are expected to conform to the environmental regulations of the country of origin.

A catalogue record for this book is available from the British Library.

Library of Congress Cataloging-in-Publication Data

Schmitt, Norbert.
Researching vocabulary : a vocabulary research manual / Norbert Schmitt.
p. cm. -- (Research and practice in applied linguistics)
Includes bibliographical references and index.
ISBN 978–1–4039–8536–1 (pbk. : alk. paper) – ISBN 978–1–4039–8535–4 (alk. paper)
1. Language and languages--Study and teaching. 2. Vocabulary--Study and teaching. 3. Second language acquisition. I. Title.
P53.9.S365 2010
418.007'2--dc22 2009046796

10 9 8 7 6 5 4 3 2 1
19 18 17 16 15 14 13 12 11 10

Printed and bound in Great Britain by
CPI Antony Rowe, Chippenham and Eastbourne

Improve the World
Start with Knowledge

Contents

Quick Checklist
General Editors’ Preface
Preface
Acknowledgements

Part 1 Overview of Vocabulary Issues

1 Vocabulary Use and Acquisition
  1.1 Ten key issues
    1.1.1 Vocabulary is an important component of language use
    1.1.2 A large vocabulary is required for language use
    1.1.3 Formulaic language is as important as individual words
    1.1.4 Corpus analysis is an important research tool
    1.1.5 Vocabulary knowledge is a rich and complex construct
    1.1.6 Vocabulary learning is incremental in nature
    1.1.7 Vocabulary attrition and long-term retention
    1.1.8 Vocabulary form is important
    1.1.9 Recognizing the importance of the L1 in vocabulary studies
    1.1.10 Engagement is a critical factor in vocabulary acquisition
  1.2 Vocabulary and reading
  1.3 A sample of prominent knowledge gaps in the field of vocabulary studies

Part 2 Foundations of Vocabulary Research

2 Issues of Vocabulary Acquisition and Use
  2.1 Form-meaning relationships
    2.1.1 Single orthographic words and multi-word items
    2.1.2 Formal similarity
    2.1.3 Synonymy and homonymy
    2.1.4 Learning new form and meaning versus ‘relabelling’
  2.2 Meaning
    2.2.1 Imageability and concreteness
    2.2.2 Literal and idiomatic meaning
    2.2.3 Multiple meaning senses
    2.2.4 Content versus function words
  2.3 Intrinsic difficulty
  2.4 Network connections (associations)
  2.5 Frequency
    2.5.1 The importance of frequency in lexical studies
    2.5.2 Frequency and other word knowledge aspects
    2.5.3 L1/L2 frequency
    2.5.4 Subjective and objective estimates of frequency
    2.5.5 Frequency levels
    2.5.6 Obtaining frequency information
  2.6 L1 influence on vocabulary learning
  2.7 Describing different types of vocabulary
  2.8 Receptive and productive mastery
  2.9 Vocabulary learning strategies/self-regulating behavior
  2.10 Computer simulations of vocabulary
  2.11 Psycholinguistic/neurolinguistic research

3 Formulaic Language
  3.1 Identification
  3.2 Strength of association – hypothesis tests
  3.3 Strength of association – mutual information
  3.4 A directional measure of collocation
  3.5 Formulaic language with open slots
  3.6 Processing formulaic language
  3.7 Acquisition of formulaic language
  3.8 The psycholinguistic reality of corpus-extracted formulaic sequences
  3.9 Nonnative use of formulaic language

Part 3 Researching Vocabulary

4 Issues in Research Methodology
  4.1 Qualitative research
  4.2 Participants
  4.3 The need for multiple measures of vocabulary
  4.4 The need for longitudinal studies and delayed posttests
  4.5 Selection of target lexical items
  4.6 Sample size of lexical items
  4.7 Interpreting and reporting results

5 Measuring Vocabulary
  5.1 Global measurement issues
    5.1.1 Issues in writing vocabulary items
    5.1.2 Determining pre-existing vocabulary knowledge
    5.1.3 Validity and reliability of lexical measurement
    5.1.4 Placing cut-points in study
  5.2 Measuring vocabulary size
    5.2.1 Units of counting vocabulary
    5.2.2 Sampling from dictionaries or other references
    5.2.3 Recognition/receptive vocabulary size measures
    5.2.4 Recall/productive vocabulary size measures
  5.3 Measuring the quality (depth) of vocabulary knowledge
    5.3.1 Developmental approach
    5.3.2 Dimensions (components) approach
  5.4 Measuring automaticity/speed of processing
  5.5 Measuring organization
  5.6 Measuring attrition and degrees of residual lexical retention

6 Example Research Projects

Part 4 Resources

7 Vocabulary resources
  7.1 Instruments
    7.1.1 Vocabulary levels test
    7.1.2 Vocabulary size test
    7.1.3 Meara’s _lognostics measurement instruments
  7.2 Corpora
    7.2.1 Corpora representing general English (mainly written)
    7.2.2 Corpora representing spoken English
    7.2.3 Corpora representing national varieties of English
    7.2.4 Corpora representing academic/business English
    7.2.5 Corpora representing young native English
    7.2.6 Corpora representing learner English
    7.2.7 Corpora representing languages other than English
      7.2.7.1 Parallel corpora
      7.2.7.2 Monolingual corpora
    7.2.8 Corpus compilations
    7.2.9 Web-based sources of corpora
    7.2.10 Bibliographies concerning corpora
  7.3 Concordancers/tools
  7.4 Vocabulary lists
  7.5 Websites
  7.6 Bibliographies
  7.7 Important personalities in the field of vocabulary studies

Notes

References

Index

Quick Checklist (Principal sections which discuss these issues)

Target lexical items

● Do any lexical characteristics potentially confound your results? (2.1–2.4, 4.5)

● Have you taken frequency into account? (2.5)

● Does L1 influence potentially confound your results? (2.6)

● Is your sampling rate sufficient to make your results meaningful? (4.6)

● Have you considered including formulaic sequences as well as individual words? (3)

Measurement instruments

● Are they valid, reliable, and appropriate for your participants? (5)

● Are they suitable for answering your research questions? (whole book)

● Are you measuring receptive or productive mastery, or both? (2.8)

● Have you considered measuring word knowledge aspects besides meaning and form? (1.1.5, 4.3, 5.3)

● Have you considered measuring depth of lexical knowledge? (5.3)

● Have you considered measuring lexical organization and speed of processing? (2.4, 2.11, 5.4, 5.5)

● If the study is focused on acquisition, is previous lexical knowledge determined or controlled for? (5.1.2)

● If the study is focused on acquisition, are there delayed posttests? (4.4)

Participants

● Are there enough participants to make the study viable? (4.2)

Corpus issues

● Is the corpus you use appropriate for your research questions? (1.1.4, 3.8, 6.2)

Reporting

● Were the units of counting clearly described? (5.2.1)

● Did you discuss the absolute size of any gain/attrition? (4.7)

● Did you report effect sizes? (4.7)

● Are your interpretations and conclusions warranted based on your results? (4.7)

Bottom line

● Is your study interesting?

● Is your study useful to anyone?

General Editors’ Preface

Research and Practice in Applied Linguistics is an international book series from Palgrave Macmillan which brings together leading researchers and teachers in Applied Linguistics to provide readers with the knowledge and tools they need to undertake their own practice-related research. Books in the series are designed for students and researchers in Applied Linguistics, TESOL, Language Education and related subject areas, and for language professionals keen to extend their research experience.

Every book in this innovative series is designed to be user-friendly, with clear illustrations and accessible style. The quotations and definitions of key concepts that punctuate the main text are intended to ensure that many, often competing, voices are heard. Each book presents a concise historical and conceptual overview of its chosen field, identifying many lines of enquiry and findings, but also gaps and disagreements. It provides readers with an overall framework for further examination of how research and practice inform each other, and how practitioners can develop their own problem-based research.

The focus throughout is on exploring the relationship between research and practice in Applied Linguistics. How far can research provide answers to the questions and issues that arise in practice? Can research questions that arise and are examined in very specific circumstances be informed by, and inform, the global body of research and practice? What different kinds of information can be obtained from different research methodologies? How should we make a selection between the options available, and how far are different methods compatible with each other? How can the results of research be turned into practical action?

The books in this series identify some of the key researchable areas in the field and provide workable examples of research projects, backed up by details of appropriate research tools and resources. Case studies and exemplars of research and practice are drawn on throughout the books. References to key institutions, individual research lists, journals and professional organizations provide starting points for gathering information and embarking on research. The books also include annotated lists of key works in the field for further study.

The overall objective of the series is to illustrate the message that in Applied Linguistics there can be no good professional practice that isn’t based on good research, and there can be no good research that isn’t informed by practice.

Christopher N. Candlin and David R. Hall
Macquarie University, Sydney

Preface

This is a vocabulary research manual. It aims to give you the background knowledge necessary to design rigorous and effective research studies into the behavior of L1 and L2 vocabulary. It can also help you better understand other people’s research and interpret it more accurately. In order to keep the manual to a reasonable length, I assume that you already have an understanding of basic research methodology for language research in general, and also have a basic understanding of statistics. I also assume you have a general understanding of vocabulary issues. The manual will build on this knowledge and discuss the issues which have particular importance for vocabulary research. The exception to these assumptions of previous knowledge is statistical knowledge about corpus linguistics (e.g. t-score and MI), which is more specific to vocabulary research, and so the calculations behind these statistical procedures are spelled out in Chapter 3. In addition, I have almost always built descriptions of terminology and concepts into the text, but in a few cases have added Concept Boxes to supplement the text.

I did not want this book to be just my personal take on vocabulary research, but rather wished it to be a consensus state-of-the-art research manual. While it inevitably reflects my own interests and biases (and uses many of the studies I have been involved with for illustration), I have been extremely fortunate that many of my friends in the field of vocabulary studies have been willing to read all or parts of the book and provide comments. I often incorporated their insightful critiques more-or-less directly into the text, and the final version of the book is greatly improved by the process. As a result, I feel that the book does reflect a (somewhat personalized) consensus view of good vocabulary research practice. While many of my colleagues might do certain things differently than indicated in this book, it does indicate the major issues which need to be considered to carry out worthwhile vocabulary research, and hopefully will help you to avoid many of the pitfalls that exist.

Although most of the issues discussed in this handbook pertain to vocabulary research in any language, the majority of research to date has been on English, including my own personal research. Almost inevitably, this has led to the majority of examples and citations referring to the English language. There is no value judgement intended in this, and I hope you are able to take the ideas and techniques and apply them to the languages you are researching.

This handbook can’t tell you the exact research methodologies to use, as every lexical study is different, entailing unique goals and difficulties. However, I have tried to provide enough background information about the nature of vocabulary and discussion of possible research methodologies to help guide you in thinking about the issues necessary in selecting and developing sound methodologies for the lexical research you wish to do.

I love vocabulary research, and with so many questions still unanswered, I want to encourage as much of it as I can. I hope this book stimulates you to begin researching vocabulary yourself, or to keep researching if you are already at it. It is a fascinating area, and I hope to hear your results at a future conference and/or read them in a future journal.

Norbert Schmitt
Nottingham
June 1, 2009

Acknowledgements

I would like to thank the University of Michigan for giving my wife a Morley scholarship to study in Ann Arbor in July 2008. This allowed me to write a large portion of this book in the wonderful environs of the Rackham Building on their campus. All that was missing from the atmosphere was Indiana Jones sliding through the library on his motorcycle.

Colleagues who have graciously commented on the entire manuscript include Paul Nation, Birgit Henriksen, Averil Coxhead, and Ronald Carter. Their many perceptive comments have improved the final version, and helped to make it more complete. I also owe a debt of thanks to numerous colleagues who commented on the parts of the book where their particular specialisms were covered, or who contributed material. Their input has added much to the rigor of the book: Frank Boers, Tom Cobb, Kathy Conklin, Zoltán Dörnyei, Philip Durrant, Catherine Elder, Nick Ellis, Glen Fulcher, Tess Fitzpatrick, Lynne Flowerdew, Gareth Gaskell, Sylviane Granger, Kirsten Haastrup, Marlise Horst, Jan Hulstijn, Kon Kuiper, Batia Laufer, Phoebe Lin, Ron Martinez, Paul Meara, Imma Miralpeix, Anne O’Keeffe, Spiros Papageorgiou, Sima Paribakht, Aneta Pavlenko, Pam Peters, Diana Pulido, Ana Maria Pellicer Sánchez, Paul Rayson, John Read, Ute Römer, Diane Schmitt, Rob Schoonen, Barbara Seidlhofer, Anna Siyanova, Suhad Sonbul, Pavel Trofimovich, Mari Wesche, Cristina Whitecross, and David Wood.

Comments from my editors Chris Candlin and David Hall did much to sharpen both the thinking and presentation of the material. Of course, everyone had slightly different views on the best research methodologies and other content of the book, and so the final distillation of the various points of view is my personal interpretation for which I alone am responsible.

Finally, to my wife Diane, for commenting on the manuscript, but more importantly, for taking me to places like Carcassonne, Ann Arbor, Auckland, and Copenhagen where writing various parts of the book was a pleasure. I love you more than ever.

The author and publishers wish to thank Wiley-Blackwell, Elsevier and Lee Osterhout for permission to reproduce copyright material:

Figure 2.1 The Relationship between Historical Origin and Register, G. Hughes A History of English Words, 2000, Malden, MA: Blackwell p. 15

Figure 2.12 ERP plots showing N400 and P600 phenomena, Osterhout, L., McLaughlin, J., Pitkänen, I., Frenck-Mestre, and Molinaro, N. (2006). Novice learners, longitudinal designs, and event-related potentials: A means for exploring the neurocognition of second language processing. Language Learning 56, Supplement 1: p. 204.

Figure 2.13 fMRI brain location results; Hauk, O., Johnsrude, I. & Pulvermüller, F. Somatotopic representation of action words in the motor and premotor cortex. Neuron 41, 301–307 (2004), Elsevier Science

Part 1

Overview of Vocabulary Issues

1 Vocabulary Use and Acquisition

This is a vocabulary research manual whose primary goal is to provide readers with a solid foundation of vocabulary research methodology, both in terms of good research practice, and in terms of the common pitfalls to avoid. But in doing research, we must always make methodology serve the research issues we are interested in exploring. The issues which attract the most attention (and thus research) in the field of vocabulary concern the nature of lexis, its employment in language use, and the best ways of facilitating its acquisition. In order to design good vocabulary research on these issues, one must be on good terms with what the field already knows about these issues. There are a number of good overviews/collections which should be reviewed to gain a general understanding of vocabulary and its behavior (e.g. Bogaards and Laufer, 2004; Carter, 1998; Coady and Huckin, 1997; Daller, Milton, and Treffers-Daller, 2007; Hunt and Beglar, 1998, 2005; McCarthy, 1990; Meara, 2009; Nation, 1990, 2001; Read, 2000, 2004; Schmitt, 2000, 2008; Schmitt and McCarthy, 1997; Singleton, 1999). This chapter of the book will follow up on the information in these publications and highlight ten key issues which must be taken into account when designing vocabulary research. They are outlined below and have direct implications for the discussion of methodology in the following chapters of the book. I will then identify a number of important vocabulary issues about which we do not yet have much knowledge, and how these gaps affect lexical research.¹

1.1 Ten key issues

1.1.1 Vocabulary is an important component of language use

Quote 1.1 Wilkins on the importance of vocabulary for communication

Without grammar very little can be conveyed, without vocabulary nothing can be conveyed.

(1972: 111)


One thing that all of the partners involved in the learning process (students, teachers, materials writers, and researchers) can agree upon is that learning vocabulary is an essential part of mastering a second language. The importance of vocabulary is highlighted by the oft-repeated observation that learners carry around dictionaries and not grammar books. However, it is important to provide empirical evidence to back up this type of anecdotal observation (after all, this is a research manual!). This is easily done, as there is plenty of evidence pointing to the importance of vocabulary in language use.

One strand of this evidence is the typically high correlations between vocabulary (usually measures of vocabulary size) and various measures of language proficiency. For example, a close relationship has been shown between vocabulary size and reading (e.g. correlations of .50–.75, Laufer, 1992).² Furthermore, Laufer and Goldstein (2004) found that knowing the form-meaning link of words accounted for 42.6% of the total variance in participants’ class grades according to a regression analysis. Given that the language class grade reflected performance on reading, listening, speaking and writing, grammatical accuracy, sociolinguistic appropriateness, and language fluency, the above figure indicates that vocabulary knowledge contributes a very great deal to overall language success.

Albrechtsen, Haastrup, and Henriksen (2008) compared measures of vocabulary size and depth (association tests) of Danish ESL learners with several measures of the ability to use English. In the L1, lexical size correlated with lexical inferencing success (guessing the meaning of unknown words in written text/written discourse) at .69–.82, and in the L2 at .48–.66. L2 vocabulary size correlated with L2 reading ability at .73–.80.

One of the most systematic explorations of the relationship between vocabulary knowledge and language proficiency occurred as part of the development of the DIALANG³ tests (Alderson, 2005). Alderson’s research team, with Paul Meara heading the vocabulary section, compared scores on various vocabulary tests with the scores from the other language components of the DIALANG test. The results are illustrated in Table 1.1.

As is clear from the table, vocabulary has strong relationships with the language skills. The checklist test and the vocabulary test battery correlate with reading at .64, listening from .61–.65, writing from .70–.79, and grammar at .64. Thus the r² (i.e. correlation values squared) values indicate that vocabulary accounts for 37–62% of the variance in the various language proficiency scores. Considering the multitude of the factors which could affect these scores (e.g. learner motivation, background knowledge, familiarity with test task), it is striking that a single factor, vocabulary knowledge, can account for such a large percentage of the variation. The relationship between vocabulary and writing is particularly strong, but even the individual skill subcomponents (e.g. inferencing) have strong relationships with vocabulary knowledge. Moreover, this strong relationship is not a ‘one-off’; rather it is consistent across the board. The lowest correlation reported was between the checklist test and understanding specific detail in listening (.44), which still accounts for a very respectable 19% of variance. In short, the DIALANG data clearly support the intuitive notion that vocabulary is important for language use.
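As a quick check on the arithmetic, the variance percentages quoted above can be reproduced by squaring the reported correlations. The short Python sketch below is an illustration only: the function name and the dictionary labels are my own assumptions, while the correlation values are the ones quoted in the text.

```python
def variance_explained(r: float) -> float:
    """Percentage of variance accounted for by a correlation r (r squared, times 100)."""
    return r ** 2 * 100


# Correlations quoted in the text for the DIALANG data (Alderson, 2005).
# The labels are illustrative descriptions, not terms from the original table.
correlations = {
    "listening (low end of the skill range)": 0.61,
    "writing (high end of the skill range)": 0.79,
    "listening: understanding specific detail": 0.44,
}

for label, r in correlations.items():
    pct = variance_explained(r)
    print(f"{label}: r = {r:.2f}, variance explained = {pct:.0f}%")
```

Squaring .61 and .79 gives the 37–62% range reported for the main skills, and squaring .44 gives the 19% figure for the lowest correlation.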

Quote 1.2 Alderson on the importance of vocabulary for language use

What [the DIALANG analysis] would appear to show is that the size of one’s vocabulary is relevant to one’s performance on any language test, in other words, that language ability is to quite a large extent a function of vocabulary size.

(2005: 88)

Table 1.1 Correlations between vocabulary and other language proficienciesᵃ

                                  Vocabulary  Vocabulary test battery
Test                              checklist   Meaningᵇ  Collocationᵇ  Gap-fillᶜ  Word formationᶜ  Total
Reading                           .64
– Identifying main idea           .50
– Understanding specific detail   .47
– Lexical inferencing             .58
Listening                         .61         .44       .43           .56        .50              .65
– Identifying main idea           .60
– Understanding specific detail   .44
– Lexical inferencing             .56
Writing                           .70         .62       .63           .66        .71              .79
– Accuracy                        .70
– Register                        .57
– Textual organization            .51
Grammar                           .64

ᵃ This table is compiled from Alderson (2005: 87, 89, 205).
ᵇ Receptive test.
ᶜ Productive test.



1.1.2 A large vocabulary is required for language use

People use language to communicate, and so naturally one key issue in vocabulary studies is how much vocabulary is necessary to enable this communication. The short answer is a lot, but it depends on one’s learning goals. If one wishes to achieve native-like proficiency, then presumably it is necessary to have a vocabulary size similar to native speakers. Because most of the research on vocabulary size has been done on English, my discussion will focus on that language, although there are reasons to believe that the figures for other languages may be lower (Nation and Meara, 2002). Unfortunately, much of the research into native speaker vocabulary size has been methodologically flawed, leading to wildly varying estimates (Nation, 1993). In fact, the estimates are sometimes an order of magnitude apart. However, there have been a few well-designed studies which provide reliable estimates. Goulden, Nation, and Read (1990) found that their New Zealand university undergraduates had a vocabulary size of about 17,000 word families. (See Section 5.2.1 for a description of the various units for counting vocabulary.) D’Anna, Zechmeister, and Hall (1991) found that their university students knew a little under 17,000 of the headwords in the 1980 Oxford American Dictionary. Using the same methodology as D’Anna et al., Zechmeister, Chronis, Cull, D’Anna, and Healy (1995) found similar results for university students (around 16,000 headwords), while junior high school students knew 11,836 headwords on average, and retired adults 21,252. When I use the Goulden et al. (1990) checklist test with my educated friends and university students, I normally come up with estimates in line with the above, ranging between 15,000 and 18,000 word families. Native speakers will always vary in their vocabulary size to some extent, depending on the amount and the manner in which they use their language. We would expect highly educated persons to have a larger vocabulary than less educated persons, but this may not always be true. For example, a crossword enthusiast may well have a wider vocabulary than a holder of a PhD. Nevertheless, a range of 16,000–20,000 word families seems a fair estimate of the vocabulary size for educated native speakers.

Quote 1.3 Nation and Waring on native-speaker vocabulary size

The best conservative rule of thumb that we have is that up to a vocabulary size of around 20,000 word families, we should expect that [English] native speakers will add roughly 1,000 word families a year to their vocabulary size. This means that a [L1] five year old beginning school will have a vocabulary of around 4,000 to 5,000 word families. A university graduate will have a vocabulary of around 20,000 word families. These figures are very rough and there is likely to be a large variation between individuals. These figures exclude proper names, compound words, abbreviations, and foreign words.

(1997: 7–8)



Luckily, second language learners do not need to achieve native-like vocabulary sizes in order to use English well. A more reasonable vocabulary goal for these learners is the amount of lexis necessary to enable the various forms of communication in English. One of the most basic things a person might want to do is to communicate orally on an everyday basis (e.g. asking directions to the train station, describing one’s holiday). If we assume that 98% of the vocabulary needs to be known (Hu and Nation, 2000; Schmitt, Jiang, and Grabe, in press), and also assume that the proper nouns in the discourse are known, we can estimate the number of word families it takes to be able to engage in informal daily conversation. Nation (2006), using word lists based on the Wellington Corpus of Spoken English, calculated that 6,000–7,000 word families are required to reach this goal. An analysis of the spoken CANCODE corpus (Adolphs and Schmitt, 2003) found coverage figures congruent with Nation’s at the 3,000 word family level (the upper limit of their analysis), supporting Nation’s calculations.

Kon Kuiper (2009) argues it is useful to make a distinction between ‘communicatively competent’ in a genre and ‘native-like’ in a genre. He makes the case for genre-specific native-like competence and performance, because no native speaker has native-like competence and communicative performance in all genres. For L2 speakers, communicative competence is possible in a number of genres, but native-like competence is much more difficult.

However, it is not yet clear that the 98% coverage figure (derived from research on written discourse) is the most appropriate figure for spoken discourse. Nation (2006), using the Wellington Corpus, calculated that 95% coverage would require knowledge of about 3,000 word families, plus proper nouns. In addition, Staehr (2009) found that advanced Danish listeners who knew the 5,000 most frequent word families in English were also able to demonstrate adequate listening ability on the Cambridge-ESOL Certificate of Proficiency in English (CPE) listening exam. Overall, the current evidence suggests that it requires between 2,000 and 3,000 word families to be conversant in English (if 95% coverage is adequate) or between 6,000 and 7,000 word families if 98% coverage is needed. However, there is simply not enough evidence to confidently establish a coverage requirement for listening at the moment.

For estimates of written vocabulary, we are on firmer ground. Nation (2006) went on to calculate that 8,000–9,000 word families are necessary to read a range of authentic texts (e.g. novels or newspapers), based on British National Corpus (BNC) data and 98% coverage. Similarly, both the highest level (C2) of the Common European Framework and the CPE require between about 4,500 and 5,000 word families on a 5,000 level test (i.e. knowing most of these frequent families) (Milton and Hopkins, 2006).4 Because learners are also likely to need some families beyond the 5,000 level, it seems that 8,000–9,000 word families is the realistic target if they wish to read a wide variety of texts without unknown vocabulary being a problem.
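The notion of lexical coverage behind these estimates can be made concrete with a short calculation. The following Python sketch is only an illustration, not the procedure used by Nation (2006): the function name lexical_coverage, the sample sentence, and the learner's known-word set are all hypothetical, and the sketch counts individual word forms rather than word families and does not treat proper nouns as known.

```python
import re


def lexical_coverage(text, known_words):
    """Return the percentage of running words (tokens) in text that are in known_words."""
    # Crude tokenization: lower-case the text and keep alphabetic strings.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return 100.0 * known / len(tokens)


# Hypothetical learner vocabulary and text, for illustration only.
known = {"the", "cat", "sat", "on", "a", "mat", "and", "looked", "at", "dog"}
sample = "The cat sat on the mat and looked at the obstreperous dog"

coverage = lexical_coverage(sample, known)
print(f"{coverage:.1f}% of running words known")
```

In this toy example one of the twelve running words is unknown, so coverage is about 92%, which falls short of both the 95% and the 98% thresholds discussed above.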



These figures may seem daunting to both teachers and learners, but even so, they probably underestimate the learning challenge. Each word family includes several individual word forms, including the root form (stimulate), its inflections (stimulated, stimulating, stimulates), and regular derivations (stimulation, stimulative). Nation’s (2006) BNC lists show that the most frequent 1,000 word families average about six members (types per family), decreasing to about three members per family at the 9,000 frequency level. According to his calculations, a vocabulary of 6,000 word families (enabling listening) entails knowing 28,015 individual word forms, while the 8,000 families (enabling wide reading) entails 34,660 words. Sometimes these word family members are transparently related (nation–national) and relatively guessable if unknown. However, this is not always the case (involve–involvedness), and learners may have trouble with these less-transparent members, especially in terms of production. While Horst and Collins (2006) found a growing morphological productive ability in their French learners of English over 100, 200, 300, and 400 hours of instruction, Schmitt and Zimmerman’s (2002) advanced learners of English (preparing to enter English-medium universities) typically knew only some, but not all, of the noun/verb/adjective/adverb members of word families taken from the Academic Word List (Coxhead, 2000). Thus, it cannot be assumed that knowing one word family member implies knowing (or being able to guess) other related members.
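The arithmetic behind these totals is easy to check. The sketch below simply divides the word-form totals cited above by the corresponding number of families; the helper name forms_per_family is my own, and the third calculation (the families between the 6,000 and 8,000 levels) is an inference from the two cited totals rather than a figure reported by Nation (2006).

```python
def forms_per_family(families, word_forms):
    """Average number of individual word forms per word family."""
    return word_forms / families


# Totals cited in the text from Nation's (2006) BNC lists.
listening = (6000, 28015)      # families, word forms
wide_reading = (8000, 34660)   # families, word forms

print(f"First 6,000 families: {forms_per_family(*listening):.2f} forms per family")
print(f"First 8,000 families: {forms_per_family(*wide_reading):.2f} forms per family")

# Inferred, not reported: the 2,000 families between the two targets.
extra_families = wide_reading[0] - listening[0]
extra_forms = wide_reading[1] - listening[1]
print(f"Families 6,001-8,000: {forms_per_family(extra_families, extra_forms):.2f} forms per family")
```

On these figures the first 6,000 families average a little under five word forms each, while the additional 2,000 families needed for wide reading average closer to three, which is consistent with the decline in family size at lower frequency levels noted above.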

The upshot is that learners must learn a very large number of lexical items to be able to operate in English, especially considering that the above figures do not take into account the multitude of phrasal lexical items (see Chapter 3) which have been shown to be extremely widespread in language use (e.g. Schmitt, 2004; Wray, 2002). Learning such a large number of lexical items is one of the greatest hurdles facing learners in acquiring English. Moreover, it is one which a great many learners fail to cross successfully, as the vocabulary sizes of learners reported in research studies typically fall well short of these size requirements (Table 1.2).

The scope of the vocabulary learning task, and the fact that many learners fail to achieve even moderate vocabulary learning goals, indicate that it can no longer be assumed that an adequate lexis will simply be ‘picked up’ from exposure to language tasks focusing either on other linguistic aspects (e.g. grammatical constructions) or on communication alone (e.g. communicative language teaching). Rather, a more proactive, principled approach needs to be taken in promoting vocabulary learning, which includes both explicit teaching and exposure to large amounts of language input, especially through extensive reading (Laufer, 2005a; Schmitt, 2008).

1.1.3 Formulaic language is as important as individual words

Vocabulary instruction has tended to focus on individual words because they have been considered the basic lexical unit, but also because they are easier to work with than formulaic language. Languages like English indicate individual words in text by placing spaces around them, while formulaic language is seldom rendered as single forms (e.g. with hyphens: state-of-the-art). Individual words are convenient units to teach and incorporate into materials. The main vocabulary reference source, dictionaries, are set up around individual headwords. Word processors give counts of individual words in documents. It is therefore no wonder that most teachers and students tend to think of vocabulary in terms of individual words. For similar reasons, most vocabulary research has studied individual words.

Table 1.2 English vocabulary size of foreign learners (a)

| Country | Vocab. size | Hours of instruction (b) | Reference (re: size) |
|---|---|---|---|
| Japan, EFL University | 2,000; 2,300 | 800–1,200 | Shillaw, 1995; Barrow et al., 1999 |
| China, English majors | 4,000 | 1,800–2,400 | Laufer, 2001 |
| Indonesia, EFL University | 1,220 | 900 | Nurweni and Read, 1999 |
| Oman, EFL University | 2,000 | 1,350+ | Horst, Cobb, and Meara, 1998 |
| Israel, High school graduates | 3,500 | 1,500 | Laufer, 1998 |
| France, High school | 1,000 | 400 | Arnaud et al., 1985 |
| Greece, Age 15, high school | 1,680 | 660 | Milton and Meara, 1998 |
| Germany, Age 15, high school | 1,200 | 400 | Milton and Meara, 1998 |

a Table is taken from Laufer (2000a: 48, slightly adapted).
b The data on hours of instruction were largely obtained by Laufer’s personal communication with colleagues from the respective countries.

However, it is becoming increasingly clear that formulaic language5 is an important element of language learning and use, in ways outlined over the years by Pawley and Syder (1983), Nattinger and DeCarrico (1992), Moon (1997), Wray (2002), Schmitt and Carter (2004), Fellbaum (2007), and Granger and Meunier (2008), among others. There are a number of reasons why we should give formulaic language a prominent place in vocabulary research:

● Normal discourse, both written and spoken, contains large (but not yet fully determined) percentages of formulaic language. Erman and Warren (2000) calculated that 52–58% of the L1 English language they analyzed was formulaic, and Foster (2001) came up with a figure of 32% using different procedures and criteria.

● If much discourse is made up of formulaic language, then this implies that proficient language users know a large number of formulaic expressions. Pawley and Syder (1983: 213) suggest that the number of ‘sentence-length expressions familiar to the ordinary, mature English speaker probably amounts, at least, to several hundreds of thousands’. Jackendoff (1995) concludes from a small corpus study of spoken language in a TV quiz show that people may know at least as many formulaic sequences as single words. Mel’cuk (1995: 169) believes that phrasemes are more numerous than words by a ratio of at least 10 to 1. It must be said however, that there is little hard research yet to either support or refute these assertions.

● Formulaic language is not a homogeneous phenomenon, but is, on the contrary, rather varied. Formulaic sequences can be long (You can lead a horse to water, but you can’t make him drink) or short (Oh no!), or anything in between. They are commonly used for different purposes. They can be used to express a message or idea (The early bird gets the worm = do not procrastinate), functions ([I’m] just looking [thanks] = declining an offer of assistance from a shopkeeper), social solidarity, and to transact specific information in a precise and understandable way. They realize many other purposes as well, as formulaic sequences can be used for most things society requires of communication through language. These sequences can be totally fixed (Ladies and Gentlemen) or have a number of ‘slots’ which can be filled with appropriate words or strings of words ([someone/thing, usually with authority] made it plain that [something as yet unrealized was intended or desired]). Formulaic language also includes the multitude of collocations which exist in language (blue sky, hard work).6

● Similarly, formulaic language is used to realize a number of different communicative purposes in language use, including:

Functional use There are recurring situations in the social world that require language to deal with them. These are often described as functions, and include such speech acts as apologizing, making requests, giving directions, and complaining. These functions typically have conventionalized language attached to them, such as I’m (very) sorry to hear about ——— to express sympathy and I’d be happy/glad to ——— to comply with a request (Nattinger and DeCarrico, 1992). Because members of a speech community know these expressions, they serve as a quick and reliable way to achieve the related speech act.

Social interaction (phatic communion) People commonly engage in ‘light’ conversation for pleasure or to pass the time of day, where the purpose is not really information exchange or to get someone to do something. Rather, the purpose is social solidarity, and people rely on non-threatening phrases to keep the conversation flowing, including comments about the weather (Nice weather today; Cold isn’t it?), agreeing with your interlocutor (Oh, I see what you mean; OK, I’ve got it), providing backchannels and positive feedback to another speaker (Did you really?; How interesting). Research has shown that such phrases are a key element of informal spoken discourse (McCarthy and Carter, 1997).

Discourse organization Formulaic phrases are a common way to signpost the organization of both written (in other words, in conclusion) and spoken discourse (on the other hand, as I was saying).

Precise information transfer Technical vocabulary are words which have a single and precise meaning in a particular field (scalpel is a specific type of knife used in medicine). But this phenomenon is not restricted to individual words. Indeed, fields often have phraseology to transact information in a way which minimizes any possible misunderstanding. For example, in aviation language, the phrase Taxi into position and hold clearly and concisely conveys the instructions to move onto the runway and prepare for departure, but to wait for final clearance for takeoff.

● The use of formulaic language helps speakers be fluent. Pawley and Syder (1983) suggest native-speakers have cognitive limitations in how quickly they can process language, but they are also able to produce language seemingly beyond these limitations. They present evidence that the largest unit of novel discourse that native speakers are able to process is a single clause of eight to ten words. When speaking, they will speed up and become fluent during these clauses, but will then slow down or even pause at the end of these clauses (Dechert, 1983). Presumably these pauses permit the speaker to formulate the next clause. Speakers seldom pause in the middle of a clause, or at least not for long. Together, this evidence suggests that speakers are unable to compose more than about eight to ten words at a time.

On the other hand, native speakers can fluently say multi-clause utterances. Consider the following examples:

1. You shouldn’t believe everything you hear.
2. It just goes to show, you can’t be too careful.
3. You can lead a horse to water, but you can’t make him drink.

They have increasingly more words, and Example 3 is clearly beyond the limit of eight to ten words. Yet native speakers can say them all without hesitation. Pawley and Syder suggest that these examples can be fluently produced because they are actually already memorized, i.e. as prefabricated phrases which are stored as single wholes and are, as such, instantly available for use without the cognitive load of having to assemble them on-line as one speaks. Pawley and Syder suggest that the mind uses its vast memory to store these prefabricated phrases in order to compensate for a limited working memory (and the capacity to compose novel language on-line). Indeed, research by Kuiper (2004) shows that speakers who operate under severe time constraints (play-by-play sports announcers, auctioneers) use a great deal of formulaic language in their speech. In addition, there is now converging evidence that collocations and other formulaic language are indeed processed more quickly than non-formulaic language (Ellis, 2006a; Conklin and Schmitt, 2008; Underwood, Schmitt, and Galpin, 2004).

Overall, these points illustrate that formulaic language is intrinsically connected with functional, fluent, communicative language use. As such, it is just as important as individual words. Thus, vocabulary researchers always need to be aware of both single- and multi-word lexical items, and whenever practical, include both types in their research and discussions.

1.1.4 Corpus analysis is an important research tool

One of the most significant developments in vocabulary studies in recent years has been the use of corpus evidence to provide an empirical basis for determining vocabulary behavior, instead of relying on appeals to intuition or tradition. The first major corpus study I could find any reference to was carried out by F.W. Kaeding in 1898 (Howatt, 2004: 290). He supervised a massive analysis of German, where hundreds of workers manually counted nearly 11 million words. This is an amazing feat for such an early time, without even typewriters, let alone computers! By the first half of the 1900s, corpus analysis was already making an impact on pedagogy. Several scholars (Harold Palmer, Michael West, Edward Thorndike, Lawrence Faucett, Irving Lorge) were concerned with ways to systematize the selection of vocabulary for learners. They also tried to make vocabulary easier by limiting it to some degree, and so their attempts came to be collectively known as the vocabulary control movement (see Howatt, 2004; Schmitt, 2000; and Zimmerman, 1997, for overviews). The work of several of these scholars merged into what came to be referred to as ‘The Carnegie Report’ (Palmer, West, and Faucett, 1936). The report recommended the development of a list of vocabulary which would be useful in the production of simple reading materials. The list ended up having about 2,000 words, and was finally published as the General Service List of English Words (GSL) (West, 1953). A key feature of the GSL is that each word’s different parts-of-speech and different meaning senses are listed, which makes the list much more useful than a simple frequency count. It has been immensely influential in lexical research and materials design, but is now dated (as it is based on word counts from the first part of the last century) and requires a complete revision based on current corpora. (See Sections 2.5, 2.7, and 6.4 for more details.)

Quote 1.4 Wray on the development of formulaic language in L1 and L2

Wray suggests that the development of good collocation intuitions comes down to how language is learned. Natives appear to learn formulaic language throughout the language acquisition process, while nonnatives focus more on individual words than sequences because they are more manageable and give a feeling of control over the language:

The consequence [of focusing on word-sized units in L2 learning] is a failure to value the one property of nativelike input which is most characteristic of the idiomaticity to which the learner ultimately aspires: words do not go together, having first been apart, but, rather, belong together, and do not necessarily need separating.

(2002: 212)

However, corpus analysis really took off when it became computerized. Early computerized corpora of 1 million words were considered large, e.g. the Brown Corpus (Kucera and Francis, 1967) focusing on American English, and its counterpart in Europe, the Lancaster-Oslo/Bergen Corpus (LOB) (Hofland and Johansson, 1982; Johansson and Hofland, 1989) focusing on British English. Nowadays, much larger corpora are the norm. Perhaps the best-known corpus of general English is the 100 million word British National Corpus (including 10 million words of unscripted spoken discourse). American English is also well catered for with the 385 million word Corpus of Contemporary American English. The Bank of English Corpus is larger (524 million words),7 but contains almost exclusively written text. The TOEFL 2000 Spoken and Written Academic Language Corpus consists of 2.7 million words sampled at four US universities, including almost 1.7 million spoken (1.2 million from class sessions) and 1 million written. There are also several corpora based on unscripted spoken English, including the Cambridge and Nottingham Corpus of Discourse in English (CANCODE – 5 million words), the Michigan Corpus of Academic Spoken English (MICASE – 1.7 million words), and the British Academic Spoken English corpus (BASE – 1.6 million words). (See Section 6.2 for detailed descriptions of these and many other corpora.)

Frequency is one of the most important characteristics of vocabulary, affecting most or all aspects of lexical processing and acquisition. Corpus data is the best source of frequency information, and several findings invariably appear. One key characteristic is that a relatively small number of the most frequent words cover an inordinate percentage of word occurrences in language. For example, the is the most frequent word in written and spoken English, making up approximately 6.2% of all word occurrences. The top three words (the, of, and) make up about 12.8%, the top ten words (the, of, and, a, in, to, it, is, was, I) 22.2% of all tokens. (These figures are based on unlemmatized BNC data (Leech, Rayson, and Wilson, 2001).) Nation and Waring (1997) report that 2,000 lemmas cover about 80% of the occurrences in the Brown Corpus. Thus, a relative handful of words cover the vast majority of language, while the rest occur much less frequently.
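The way in which a handful of frequent words accounts for a large share of running text can be illustrated with a simple frequency count. The Python sketch below is a toy illustration only: the function name top_n_coverage and the miniature ‘corpus’ are hypothetical, the count is unlemmatized, and the percentages it yields say nothing about the BNC or Brown figures cited above.

```python
from collections import Counter
import re


def top_n_coverage(text, n):
    """Percentage of all tokens accounted for by the n most frequent word forms."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(n))
    return 100.0 * covered / len(tokens)


# A made-up miniature 'corpus' of 20 running words, for illustration only.
toy_corpus = (
    "the cat sat on the mat and the dog sat by the door "
    "and the bird sang in the tree"
)

for n in (1, 3, 10):
    print(f"Top {n} word forms cover {top_n_coverage(toy_corpus, n):.1f}% of tokens")
```

Even in this tiny sample the single most frequent word form (the) accounts for 30% of the tokens. With a real corpus the same calculation would be run over millions of tokens, ideally with a principled tokenization and lemmatization scheme.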

Another finding is that the most frequent words in English tend to be grammatical words. This stems from the commonsense fact that such grammatical words are necessary to the structure of English regardless of the topic. Articles, prepositions, pronouns, conjunctions, forms of the verb be etc. are equally necessary whether we are talking about cowboys, space exploration, botany, or music.

A third insight is that the frequencies of lexical items differ considerably between spoken and written discourse. For example, a number of content words, such as know, well, got, think, and right, are more frequent in spoken discourse than written discourse. On closer inspection, it turns out that these words are not content words at all but actually elements of interpersonal phrases (you know, I think), single-word organisational markers (well, right), smooth-overs (never mind), hedges (kind of/sort of ), and other kinds of discourse items which are characteristic of the spoken mode (McCarthy and Carter, 1997). This shows that spoken language makes frequent use of these types of discourse markers, while they rarely occur in written language. A related difference is that the same word may take different meanings in the two modes. McCarthy and Carter (1997) show that got is used mainly in the construction have got in the CANCODE as the basic verb of possession or personal association with something. However, they highlight the following two sentences from the corpus which are indicative of other meanings:

1. I’ve got so many birthdays in July.
2. I’ve got you.

In Example 1, the speaker means something like ‘I have to deal with’, because he/she is referring to the obligation of sending numerous birthday cards. In Example 2, I’ve got seems to mean ‘I understand you’. Neither of these meaning senses would be common in the formal written mode.

These insights about vocabulary frequency have immediate ramifications for research. The most important is that frequency must be considered in the selection of target words for vocabulary studies. Language learners typically acquire higher frequency vocabulary before lower frequency vocabulary, so matching the vocabulary frequency to the level of the participants in a study is important. For example, if your participants are beginners, higher frequency words will need to be selected. Another implication is that we need to be aware of the differences in spoken and written frequency and not assume that they are interchangeable. Thus, a study concerning spoken vocabulary will usually be best designed based on data from spoken, rather than written, corpora. It should also be noted that, in some sense, genre is relevant: some lexical items that are infrequent in general English (e.g. collocation) become much higher in frequency in specific genres, like applied linguistics. There will be further detailed discussion of vocabulary frequency effects in Section 2.5.

Another major vocabulary characteristic for which corpus data supplies information is the kind of lexical patterning described in the above section on formulaic language. It is also worth mentioning that corpus data has been influential in lexicography, as all of the major modern learner dictionaries have been based on corpus data.
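One simple way in which corpus data can point to lexical patterning is by counting recurrent word sequences (n-grams). The sketch below is a minimal illustration under my own assumptions: the function name recurrent_ngrams, its min_count threshold, and the sample utterance are hypothetical. Recurrence alone does not establish that a sequence is formulaic, and the sketch ignores sentence boundaries, so the output should be treated only as a list of candidates for closer inspection.

```python
from collections import Counter
import re


def recurrent_ngrams(text, n=2, min_count=2):
    """Return word sequences of length n that occur at least min_count times."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n tokens across the text.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in ngrams)
    return {gram: count for gram, count in counts.items() if count >= min_count}


# A made-up spoken-style sample, for illustration only.
sample = "you know I think it is fine you know and I think we can go"

print(recurrent_ngrams(sample, n=2))
```

Here the only two-word sequences that recur are you know and I think, the kind of interpersonal phrases noted earlier as typical of the spoken mode. Corpus studies of collocation normally go beyond raw counts and use measures of association strength.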

1.1.5 Vocabulary knowledge is a rich and complex construct

In addition to needing a large vocabulary size to function in a language, a person must also know a great deal about each individual lexical item in order to use it well. This is often referred to as the quality or ‘depth’ of vocabulary knowledge, and is as important as vocabulary size. Most laymen (including many teachers and learners) might consider a lexical item ‘learned’ if the spoken/written form and meaning are known. Furthermore, Brown (in press) found that the nine general English textbooks he analysed focused mainly on meaning and form, with some attention to grammatical function, to the exclusion of other types of word knowledge (see below). While it is true that the form-meaning link is the first and most essential lexical aspect which must be acquired, and may be adequate to allow recognition, much more must be known about lexical items, particularly if they are to be used productively.

Quote 1.5 Reppen and Simpson on the value of corpora

Corpus linguistics provides an extremely powerful tool for the analysis of natural language and can provide tremendous insights as to how language use varies in different situations, such as spoken versus written, or formal interactions versus casual conversation.

(2002: 92)

Quote 1.6 Anderson and Freebody on breadth and depth of vocabulary knowledge

The first [type of vocabulary knowledge] may be called ‘breadth’ of knowledge, by which we mean the number of words for which the person knows at least some of the significant aspects of meaning ... . [There] is a second dimension of vocabulary knowledge, namely the quality or ‘depth’ of understanding. We shall assume that, for most purposes, a person has a sufficiently deep understanding of a word if it conveys to him or her all of the distinctions that would be understood by an ordinary adult under normal circumstances.

(1981: 92–93)



There are a number of ways ‘depth of knowledge’ can be conceptualized. One is overall proficiency with a word, ranging from no knowledge at all to complete mastery. This ‘developmental’ conceptualization (Read, 2000) is typically measured along a scale. Examples of such scales include the Vocabulary Knowledge Scale (Paribakht and Wesche, 1997) and a four-stage scale used by Schmitt and Zimmerman (2002). (See Section 5.3.1 for a fuller description and evaluation of developmental scales.)

A second way of conceptualizing vocabulary knowledge is by breaking it down into its separate elements, which could be described as a ‘component’ or ‘dimensions’ approach. The genesis of this approach is usually traced back to an article in 1976 by Jack Richards in TESOL Quarterly, where he discussed several assumptions about knowing vocabulary. His article attracted notice, and led Paul Nation to specify the kinds of knowledge one must have about a word in order to use it well. The original list included eight types of word knowledge:

● spoken form
● written form
● grammatical patterns
● collocations
● frequency
● appropriateness (register)
● meaning
● associations

(Nation, 1990: 31)

He presented a revised and expanded version in 2001, which is the best specification of the range of ‘word knowledge’ aspects to date (Table 1.3).

These various types of word knowledge become important when teaching a language or developing research for a number of reasons. First, some of these word knowledge aspects are relatively amenable to intentional learning, such as word meaning and word form, while the more contextualized aspects, such as collocation and intuitions of frequency, are much more difficult to teach explicitly. They probably have to be acquired instead through massive exposure to the L2. Likewise, some aspects are relatively easy to measure in research (e.g. written form, meaning), while some are extremely difficult to capture (register, collocation). In addition, although all of the word knowledge types are learned concurrently, some are mastered sooner than others (Schmitt, 1998a). This has implications for research, as different vocabulary measures might be appropriate at the different stages of acquisition of an item. At the beginning of the incremental learning process, measuring the meaning-form link is probably most appropriate, but as the word becomes more established, it might be better to measure some of the contextual types of word knowledge (e.g. collocation) to determine the degree of higher-level mastery of a lexical item.


10.1057/9780230293977 - Researching Vocabulary, Norbert Schmitt

Copyright material from www.palgraveconnect.com - licensed to University of South Florida - PalgraveConnect - 2011-05-25


Table 1.3 What is involved in knowing a word

Form
  Spoken
    R  What does the word sound like?
    P  How is the word pronounced?
  Written
    R  What does the word look like?
    P  How is the word written and spelled?
  Word parts
    R  What parts are recognizable in this word?
    P  What word parts are needed to express this meaning?
Meaning
  Form and meaning
    R  What meaning does this word form signal?
    P  What word form can be used to express this meaning?
  Concept and referents
    R  What is included in the concept?
    P  What items can the concept refer to?
  Associations
    R  What other words does this make us think of?
    P  What other words could we use instead of this one?
Use
  Grammatical functions
    R  In what patterns does the word occur?
    P  In what patterns must we use this word?
  Collocations
    R  What words or types of words occur with this one?
    P  What words or types of words must we use with this one?
  Constraints on use (register, frequency ...)
    R  Where, when, and how often would we expect to meet this word?
    P  Where, when, and how often can we use this word?

R = receptive knowledge; P = productive knowledge. (Nation, 2001: 27).

Another way lexical items are mastered lies in the automaticity with which they can be recognized and produced. This approach has often been used in psycholinguistic experiments (Section 2.11), although often in studies where the focus was on some other aspect (e.g. the influence of the L1 on L2 processing) and where vocabulary was just an expedient linguistic element to measure. Measures of automaticity have just begun to be used in general vocabulary research (e.g. Siyanova and Schmitt, 2008). However, automaticity measures can be highly important if we wish to move away from declarative knowledge and explore procedural knowledge. As the essence of vocabulary mastery is the ability to use it fluently in communication (not the ability to talk about it metalinguistically), measures which tap into fluent and accurate usage are crucial. Barcroft (2002) suggests that the mind has limited cognitive resources, and so if they are focused on one aspect (e.g. form), there will be less available to apply to other aspects (e.g. meaning). Thus, the more automatic some word knowledge aspects are, the more resources can be given to other aspects. Furthermore, these cognitive constraints are not limited only to vocabulary. The ability to use lexical items without thinking frees up resources for other language processes, e.g. planning, and so automatic lexical processing has benefits for language use as a whole.

The above perspectives highlight the ways people gain better mastery over lexical items, i.e. knowing more lexical items, knowing more about each item, and being able to utilize the items more automatically. However, vocabulary mastery can also be considered in terms of the overall mental lexicon rather than individual items. The main method of exploring this lexical organization has been with word associations, a methodology probably best known for its application in the field of psychology. A stimulus word is given to participants and they are asked to respond with the first word or words which come to mind. For example, the stimulus word needle typically elicits the responses thread, pin(s), sharp, and sew(s). The assumption is that automatic responses which have not been thought out will consist of words which have the strongest connections with the stimulus word in the subjects’ mental lexicon. By analyzing associations, we can gain clues about the mental relationships between words and thus the organization of the mental lexicon. In general, we find that association responses exhibit a great deal of systematicity, i.e. many of the same responses are produced by a wide variety of participants, signalling similar lexical organization. We also find that nonnatives produce a wider variety of responses than natives, suggesting less-well-organized lexicons. These results will be discussed in more detail in Section 2.4.
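The kind of tally that underlies such comparisons can be sketched in a few lines of Python. Everything in the sketch is hypothetical: the function name association_profile, the two simple indices it reports (number of distinct responses and the share of the most common response), and above all the response lists, which are invented for illustration and are not data from any study.

```python
from collections import Counter


def association_profile(responses):
    """Summarize how concentrated or varied a set of association responses is."""
    counts = Counter(response.strip().lower() for response in responses)
    if not counts:
        return {"distinct_responses": 0, "top_response": None, "top_share": 0.0}
    top_response, top_count = counts.most_common(1)[0]
    return {
        "distinct_responses": len(counts),
        "top_response": top_response,
        "top_share": top_count / sum(counts.values()),
    }


# Invented responses to the stimulus word 'needle', for illustration only.
group_a = ["thread", "thread", "pin", "thread", "sew", "sharp", "thread", "pin"]
group_b = ["thread", "hurt", "doctor", "pin", "small", "sew", "metal", "thread"]

print(association_profile(group_a))
print(association_profile(group_b))
```

In this invented example the first group converges on a small set of responses (four distinct responses, with thread given by half the participants), while the second group is more dispersed (seven distinct responses), which is the kind of contrast in response variety described above.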

All of these dimensions of word mastery are interrelated and holistically connected. However, it is not possible to measure all of these dimensions individually in any kind of test that can be envisaged. Even if it were possible, the test battery would probably be too long and complex for research purposes, and would certainly be too extended to be of any kind of pedagogical use. Therefore, vocabulary researchers need to carefully consider which limited aspects they are going to measure in their vocabulary studies, and carefully consider the limitations and implications of their choices.

Quote 1.7 Read on depth of vocabulary knowledge

... learners need to have more than just a superficial understanding of the meaning [of a word]; they should develop a rich and specific meaning representation as well as knowledge of the word’s formal features, syntactic functioning, collocational possibilities, register characteristics, and so on.

(2004: 155)





1.1.6 Vocabulary learning is incremental in nature

Vocabulary acquisition is incremental both in terms of acquiring an adequate vocabulary size, and in terms of mastering individual lexical items. The gradual acquisition of increasingly larger lexicons is well-illustrated in a study by Henriksen (2008). She measured the L2 vocabulary size of Danish EFL students, but almost uniquely, she also measured their L1 size. She found that there was consistent improvement in vocabulary size across the increasing grades in both the L1 and L2, although this growth was achieved over an extended period of time. Unsurprisingly, she also found the L1 scores were larger than the L2 scores, even though the L1 test included very low-frequency items compared to the L2 test.

| | | Grade 7ᵃ | Grade 10 | Grade 13 |
|---|---|---|---|---|
| L1 | M | 50.2ᵇ | 83.5 | 102.1 |
| | SD | 18.4 | 18.7 | 10.1 |
| L2 | M | 33.8 | 71.9 | 94.8 |
| | SD | 22.4 | 20.6 | 14.7 |

ᵃ Each grade had 29 Danish informants.
ᵇ Max. score = 120.

The above are total vocabulary size estimates, but the Vocabulary Levels Test (Section 5.2.3) which Henriksen used also provides a profile of how much learners know at various frequency levels. These profiles similarly show the gradual growth of vocabulary through the various frequency levels.

Number of participants masteringᵃ each frequency level (N = 29 in each grade)

| Level | Grade 7 | Grade 10 | Grade 13 |
|---|---|---|---|
| < 2,000ᵇ | 24 | 11 | 0 |
| 2,000 | 5 | 12 | 8 |
| 3,000 | 0 | 2 | 11 |
| 5,000 | 0 | 4 | 5 |
| 10,000 | 0 | 0 | 5 |

ᵃ Mastery was set at scoring 26 out of 30 items.
ᵇ These students failed to meet the criterion of 26 correct items on the 2,000 level.

Considering the incremental acquisition of individual lexical items, it is well-established that individual lexical items need to be met many times in order to be learned (Nation, 2001). Thus, it is obvious that lexical items cannot be fully learned from only a single exposure, yet much research and testing seems to revolve around the assumption that they can. Much vocabulary research discusses items as being either ‘unknown’ or ‘learned’ depending on whether a test item was answered correctly or not. I think this is unfortunate, as the true underlying vocabulary learning process is clearly incremental in nature. Moreover, it is incremental in a variety of ways. Let us examine this in more detail.

We have seen that complete mastery of an item entails a number of types of word knowledge, as shown in Table 1.3, not all of which can be learned completely from a few exposures. Experience has shown that some are mastered before others. For example, learners will surely know a word’s basic meaning sense before they have full collocational competence. However, at the moment it is difficult to confidently say much about how the different word knowledge types develop in relation to each other, simply because there is a shortage of studies which look at the acquisition of multiple types of word knowledge concurrently. Those that do exist (Pigada and Schmitt, 2006; Schmitt, 1998a; Schmitt and Meara, 1997; Webb, 2005, 2007a, 2007b) seem to confirm that some word knowledge types do develop before others, but it is difficult to come to any conclusion about an overall pattern. However, based on such studies and my own understanding of vocabulary, I would suggest the following scenario. On the first exposure to a new word, all that is likely to be picked up is some sense of word form and meaning. If the exposure was oral, the person might remember the pronunciation of the whole word, but might only remember what other words it rhymes with or how many syllables it has. If the exposure came from a written text, the person may only remember the first few letters of the word, or its broad structural outline. Since it was only a single exposure, it is only possible to gain the single meaning sense which was used in that context. There is also the possibility that the word class was noticed, but not much else. As the person gains a few more exposures, these features will start to be consolidated, and perhaps some other meaning senses will be encountered. But it will probably be relatively late in the acquisition process before a person develops intuitions about the word’s frequency, register constraints, and collocational behavior, simply because these features require a large number of examples to determine the appropriate values. This account allows for a great deal of variability in how individual lexical items are learned, but the key point is that some word knowledge aspects develop before others.

Thus, vocabulary learning is incremental because some types of word knowledge are established before others. However, I would also argue that each individual type of word knowledge is learned incrementally as well. As part of Henriksen’s (1999) description of the incremental development of vocabulary knowledge, she proposes that learners have knowledge of any lexical aspect which ranges from zero to partial to precise. This would mean that all word knowledge ranges on a continuum, rather than being known versus unknown. Even knowledge as seemingly basic as spelling can behave in this manner, ranging on a cline something like this:

can’t spell word at all ↔ knows some letters ↔ phonologically correct ↔ fully correct spelling

Other word knowledge aspects would follow a similar zero→partial→precise development. I found evidence for partial/precise degrees of knowledge in a study I made of advanced L2 learners at university level (Schmitt, 1998a). I followed their mastery of a number of word knowledge aspects for 11 words over the majority of an academic year. The students rarely knew all of the words’ derivational forms or meaning senses. They normally knew the word class of the stimulus word and one derivation, but rarely all of the four main forms (noun, verb, adjective, adverb). Likewise, they normally knew the core meaning sense, but almost never all of the possible senses. The association scores for my students generally became more native-like over time, indicating the words were gradually becoming better integrated into the students’ mental lexicons. All of this shows that learner knowledge of the various word knowledge aspects is often partially mastered, and that it takes time to develop each of these word knowledge aspects towards more precision.

One word knowledge component which is well-researched is meaning. From this body of research, it is clear that receptive mastery generally develops before productive mastery, although this may not be the case for every item. This is illustrated by studies which have compared the number of words known productively versus receptively. For example, Laufer (2005a) compared learners’ productive test scores on L1-L2 recall tests as a percentage of their receptive test scores on L2-L1 translation tests. She found productive/receptive ratios ranging from 16% at the 5,000 frequency level to 35% at the 2,000 level, while Fan (2000) found a range from 53% to 81% (mean 69.2%) for words taken from the 2,000, 3,000, and UWL levels. Laufer and Paribakht (1998) found an average ratio of 77% for Israeli EFL students and 62% for Canadian ESL students. While the ratios are highly dependent on the types of receptive/productive tests used (Laufer and Goldstein, 2004), it seems clear that a learner’s receptive lexicon is likely to be larger than his/her productive lexicon. See Sections 2.8, 5.2, and 5.3 for a detailed discussion of receptive versus productive mastery of vocabulary and tests thereof.
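The productive/receptive ratios reported above are simple percentages, and a short sketch may make the calculation explicit. The function name and the learner scores below are hypothetical and chosen only for illustration; the studies cited used their own tests and scoring procedures, which are not reproduced here.

```python
def productive_receptive_ratio(productive_score, receptive_score):
    """Return productive knowledge as a percentage of receptive knowledge."""
    if receptive_score <= 0:
        raise ValueError("receptive_score must be positive")
    return 100 * productive_score / receptive_score


# Hypothetical learner: 18 items correct on a receptive (L2-L1) test and
# 7 items correct on the matching productive (L1-L2) recall test.
ratio = productive_receptive_ratio(7, 18)
print(f"{ratio:.1f}%")
```

Because the ratio depends on both scores, a change in either test format can shift it considerably, which is one reason the text cautions that reported ratios depend heavily on the tests used.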

We also know that learners vary in their ability to use lexical items in written and spoken discourse, i.e. their orthographic and phonological mastery of items. Milton and Hopkins (2006) compared the written and spoken English vocabulary sizes of Greek- and Arabic-speaking learners, and found that the written was generally larger (mean written size: 2,655 words; spoken: 2,260). The correlation between the two sizes was moderate (.68), but varied according to L1 (Greek .81; Arabic .65). The relationship between orthographic and phonological knowledge also varied according to proficiency: for both language groups, low scores tended to be associated with a greater tendency for phonological vocabulary knowledge to exceed orthographic vocabulary knowledge, but for high scorers, the reverse was true. Thus we cannot assume a straightforward relationship between the written and spoken knowledge of words in a learner’s lexicon, although these do seem generally to increase in a parallel manner.

In sum, not only is vocabulary acquisition incremental, but it is incremental in a variety of ways. First, lexical knowledge is made up of different kinds of word knowledge and not all can be mastered simultaneously. Second, each word knowledge aspect may develop along a cline, which means not only is word learning incremental in general, but learning of the individual word knowledge aspects is as well. Third, each word knowledge type varies in the degree of receptive/productive mastery. Taken together, this indicates that word learning is a complicated, but gradual process. The implication for research is that simple knows/doesn’t know descriptions of vocabulary knowledge (usually based only on the initial form-meaning link) are wholly inadequate for describing vocabulary knowledge. If, for practical reasons, only a single word knowledge aspect like meaning can be measured in a study, at a minimum, the results need to be interpreted in terms of an incremental learning perspective. For example, instead of reporting that words in such studies are ‘learned’ because a form-meaning test item was correctly answered, it is better to interpret this result as showing that the word’s form-meaning link has been established at either the receptive or productive level, and acknowledge that this does not imply that other word knowledge aspects have been mastered. Thus, such a test item only indicates initial learning of a word. However, it is better to establish a norm in which multiple measures of vocabulary are used in studies to paint a more complete picture of vocabulary knowledge and acquisition. This could be in terms of receptive/productive mastery, different types of word knowledge, degree of mastery of an individual word knowledge aspect, contexts of use, etc., or some combination of these.

Quote 1.8 Newton on research implications of the incremental nature of vocabulary acquisition

There is a need to develop instruments which are more sensitive to degrees of acquisition and to both receptive and productive vocabulary knowledge.

(1995: 171)





1.1.7 Vocabulary attrition and long-term retention

Vocabulary acquisition is not a tidy linear affair, with only incremental advancement and no backsliding. All teachers recognize that learners forget material as well. This forgetting (attrition) is a natural fact of learning. We should view partial vocabulary knowledge as being in a state of flux, with both learning and forgetting occurring until the word is mastered and ‘fixed’ in memory. In Schmitt (1998a), I found that advanced L2 university students improved their knowledge of the meaning senses of target words about 2.5 times more than that knowledge was forgotten (over the course of one year), but this means there was some backsliding as well.

Of course attrition can also occur even if vocabulary is relatively well known, such as when one does not use a second language for a long time, or one stops a course of language study. Studies into attrition have produced mixed results, largely due to the use of different methods of measuring vocabulary retention (e.g. Bahrick, 1984; Hansen and McKinney, 2002; Weltens and Grendel, 1993). In general though, lexical knowledge seems to be more prone to attrition than other linguistic aspects, such as phonology or grammar. This is logical because vocabulary is made up of individual units rather than a series of rules, although we have seen that lexis is much more patterned than previously thought. It appears that receptive knowledge does not attrite dramatically, and when it does, it is usually peripheral words, such as low-frequency noncognates, which are affected (Weltens and Grendel, 1993). On the other hand, productive mastery is more likely to be lost (Cohen, 1989; Olshtain, 1989), although see Schmitt (1998a) for contrary results. There is some evidence that the rate of attrition is connected to proficiency level, with learners with larger vocabularies retaining more residual knowledge of their vocabulary (Hansen et al., 2002). Weltens, Van Els, and Schils (1989) found that most of the attrition for the participants in their study occurred within the first two years and then levelled off. Overall, once vocabulary is learned, it does not seem to ever completely disappear, as Bahrick (1984) found residual vocabulary knowledge in his informants even after 50 years of language disuse. It therefore is probably best to think of attrition in terms of loss of lexical access, rather than in terms of a complete elimination of lexical knowledge. See Section 5.6 for more on attrition and its measurement.

Quote 1.9 Hansen, Umeda, and McKinney on the absence of complete attrition even after long periods of language disuse

... even though access to lexical knowledge is lost, attriters may retain a substantial advantage in regaining that knowledge, in comparison with others who are learning the same words for the first time.

(2002: 669)





1.1.8 Vocabulary form is important

As mentioned above, learning a lexical item is often conceptualized as learning its meaning. While learning meaning is undoubtedly an essential initial step, more precisely this involves developing a link between form and meaning. If one thinks about it, this linkage is the minimum specification for knowing a word. If a lexical form is familiar, but its meaning is not known, then this item is of no communicative use. Likewise, if a meaning is known, but not its corresponding form, then the item cannot be either recognized or produced. It is thus little wonder that most vocabulary materials attempt to teach this form-meaning link, and that most tests measure it in one way or another.

However, another common assumption seems to be that meaning is the key component of this link, while the form element is often downplayed or disregarded. In fact, there is a large body of research indicating that L2 learners often have trouble with the word form. For example, Laufer (1988) studied words with similar forms and found that some similarities were particularly confusing for students, especially words which were similar except for suffixes (comprehensive/comprehensible) and for vowels (adopt/adapt). Similarly, Bensoussan and Laufer (1984) found that a mis-analysis of word forms, which looked transparent but were not, sometimes led to misinterpretation. Their learners interpreted outline (which looks like a transparent compound) as ‘out of line’, and discourse (which looks as if it has a prefix) as ‘without direction’. Moreover, it is not only the forms of the words themselves which can lead to problems. Regardless of the word itself, if there are many other words which have a similar form in the L2 (i.e. large orthographic neighborhoods (Grainger and Dijkstra, 1992)), it makes confusion more likely. For example, the word poll may not be difficult in itself, but the fact that there are many other similar forms in English can lead to potential confusion (pool, polo, pollen, pole, pall, pill).

One reason people can learn their L1 so easily is that the mind becomes attuned to the features and regularities in the L1 input (Doughty, 2003; Ellis, 2006b). This developmental sharpening applies to the word form as well, as people become attuned to the particular set of phonemes and graphemes in their L1, and the ways in which they combine. This specialization makes L1 processing efficient, but can cause problems when there is an attempt to process an L2 in the same way, even though this may be counterproductive because the languages have different characteristics. For example, English speakers use mainly stress to parse words in the speech stream, while French speakers rely more on syllable cues. Cutler and her colleagues have found that both French and English speakers used their L1 cue processing strategies when learning the other language as an L2, causing problems for both groups (e.g. Cutler, Mehler, Norris, and Segui, 1986; Cutler and Norris, 1988). The same type of mismatch has been found in the processing of written language, for example, between Chinese and English (e.g. Koda, 1997, 1998). What this means is that learners not only have to learn new oral and written forms in the L2, but they may also have to develop a completely new way of processing those forms, one which is in opposition to the automatic processes in their L1. The effect of this shows up in laboratory experiments, where de Groot (2006) found that L2 words which match L1 orthographical and phonological patterns are easier to learn and are less susceptible to forgetting than L2 words which do not match the L1 patterns.

Thus, while Ellis (1997) argues that form is mainly acquired through exposure, it is clear that this may not occur without problems in an L2. Consequently, vocabulary researchers must not take orthographic/phonological mastery of word form for granted, but must consider it an essential (and often problematic) component part of lexical learning. Another implication is that formal aspects of target vocabulary need to be carefully considered when designing vocabulary research, in order to control for the difficulty that accrues from lexical form.

1.1.9 Recognizing the importance of the L1 in vocabulary studies

There is no doubt from research that the L1 exerts a considerable influence on the learning and use of L2 vocabulary in a number of ways (Ringbom, 2007; Swan, 1997). To start with, for learners studying an L2 through junior high school, senior high school, and university, the sizes of the L1 and L2 lexicons correlate strongly (.61–.75), showing parallel growth to a large extent (Henriksen, 2008, see Section 1.1.6). In terms of learner output, Hemchua and Schmitt (2006) studied the lexical errors in Thai university EFL compositions, and found that nearly one-quarter were judged to be attributable to L1 influence. But for verb-noun collocation errors in particular, the percentage may be over 50% (Nesselhauf, 2003). Learners also typically employ their L1 in learning an L2, most noticeably in the consistently high usage of bilingual dictionaries (Schmitt, 1997). They also strongly believe that translating helps them acquire English language skills such as reading, writing, and particularly vocabulary words, idioms, and phrases (Liao, 2006). But perhaps the best evidence for L1 influence comes from psycholinguistic studies, which demonstrate that the L1 is active during L2 lexical processing in both beginning and more advanced learners (e.g. Hall, 2002; Jiang, 2002; Sunderman and Kroll, 2006).

Quote 1.10 Koda on the problems related to poor recognition of word form

[I]nefficient orthographic processing can lead not only to inaccurate lexical retrieval, but to poor [reading] comprehension as well.

(1997: 35)

The importance of the L1 is highlighted by a number of second-language vocabulary acquisition studies. Prince (1996) found that more newly learned words could be recalled using L1 translations than L2 context, particularly for less proficient learners. With secondary school Malaysian learners, using L1 translations was tremendously more effective than providing L2-based meanings (Ramachandran and Rahim, 2004). Laufer and Shmueli (1997) found the same trend with Hebrew students. Lotto and de Groot (1998) found that L2-L1 word pairs lead to better learning than L2-picture pairs, at least for relatively experienced foreign-language learners.

The ubiquitous influence of the L1 on L2 vocabulary learning and use needs to be taken into consideration in vocabulary research. It has important implications for the selection of target vocabulary, as words which are cognates are typically easier than non-cognates. Also, words which follow the orthographic/phonologic regularities of the L1 will normally be easier than those which do not. In addition, the L1 needs to be considered in selecting measurement formats. For example, it has been hypothesized that the initial form-meaning link consists of the new L2 word form being attached to a representation of the corresponding L1 word which already exists in memory (Hall, 2002), and so L1 translation tasks would be a natural task for measuring this. See Section 2.6 for more discussion of L1 lexical influence.

1.1.10 Engagement is a critical factor in vocabulary acquisition

It is a commonsense notion that the more a learner engages with a new word, the more likely he/she is to learn it. A number of attempts have tried to define this notion of engagement more precisely. Craik and Lockhart’s (1972) Depth/Levels of Processing Hypothesis laid the basic groundwork by stating that the more attention given to an item, and the more manipulation involved with the item, the greater the chances it will be remembered. Laufer and Hulstijn (2001; also Hulstijn and Laufer, 2001) refined the notion further and suggested that the total involvement for vocabulary learning consists of three components: need, search, and evaluation. Need is the requirement for a linguistic feature in order to achieve some desired task, such as needing to know a particular word in order to understand a passage. Search is the attempt to find the required information, e.g. looking up the meaning of that word in a dictionary. Evaluation refers to the comparison of the word, or information about a word, with the context of use to see if it fits or is the best choice. The authors found some support for their hypothesis: learners writing compositions remembered a set of target words better than those who saw the words in a reading comprehension task, and learners who supplied missing target words in gaps in the reading text remembered more of those words than learners who read marginal glosses. In both comparisons, the ‘better learning’ case had higher involvement according to the Laufer and Hulstijn scheme. They also reviewed a number of studies and found that the tasks with relatively more need, search, and evaluation elements were more effective (Table 1.4).

Quote 1.11 Swan on L1 influence on second-language vocabulary

The mother tongue can influence the way second-language vocabulary is learnt, the way it is recalled for use, and the way learners compensate for lack of knowledge by attempting to construct complex lexical items.

(1997: 179)

While this is almost certainly true, research also shows that many other factors make a difference as well. For example, while Laufer and Hulstijn’s Involvement Load Hypothesis is useful for materials writers to set up good materials which can facilitate incidental vocabulary learning, it does not fully take the student into account. The ‘need’ component does have a motivational aspect, because strong need is when learners decide that a lexical item is necessary for some language task they wish to do. However, Joe (2006) points out that students can scan, engage, and interpret in many different ways regardless of material design, and there is little way to know in advance exactly how. Moreover, students’ strategic behavior has an effect as well. It appears that vocabulary learning is part of a cyclical process where one’s self-regulation of learning leads to more involvement with and use of vocabulary learning strategies, which in turn leads to better mastery of their use. This better mastery enhances vocabulary learning, the effectiveness of which can then be self-appraised, leading to a fine-tuning of self-regulation and the start of a new cycle (Tseng and Schmitt, 2008). (See Section 2.9 for more on vocabulary learning strategies and self-regulation.) Furthermore, Folse (2006) suggests that the number of exposures to the target items may be at least as important as the type of learning activity.

Table 1.4 Relative effectiveness of vocabulary learning methods

| The more effective task | The less effective task | Study |
|---|---|---|
| Meaning selected from several options | Meaning explained by synonym | Hulstijn, 1992 |
| Meaning looked up in a dictionary | Reading with/without guessing | Knight, 1994; Luppescu and Day, 1993 |
| Meaning looked up in a dictionary | Meaning provided in a marginal gloss | Hulstijn, Hollander, and Greidanus, 1996 |
| Reading and a series of vocabulary exercises | Reading only (and inferring meaning) | Paribakht and Wesche, 1997 |
| Meaning negotiated | Meaning not negotiated | Newton, 1995 |
| Negotiated input | Premodified input | Ellis, Tanaka, and Yamazaki, 1994 |
| Used in original sentences (oral task) | Used in non-original sentences | Joe, 1995, 1998 |
| Interactionally modified output | Interactionally modified input | Ellis and He, 1999 |
| Used in a composition (L1-L2 look up) | Encountered in a reading task (L2-L1 look up) | Hulstijn and Trompetter, 1998 |
| Reading, words looked up in a dictionary | Reading only, words not looked up | Cho and Krashen, 1994 |

(Laufer and Hulstijn, 2001: 13).
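The need/search/evaluation scheme discussed in this section lends itself to a simple comparison of tasks. The sketch below is only one possible way of operationalizing it: the 0 to 2 rating scale, the decision to sum the three components into a single figure, and the ratings given to the three example tasks are all assumptions made for illustration and are not values reported in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    need: int        # assumed scale: 0 = absent, 1 = moderate, 2 = strong
    search: int      # same assumed scale
    evaluation: int  # same assumed scale

    def involvement(self) -> int:
        """Total involvement, taken here as the sum of the three components."""
        return self.need + self.search + self.evaluation


# Illustrative ratings only; they are not taken from the source.
tasks = [
    Task("reading with marginal glosses", need=1, search=0, evaluation=0),
    Task("reading with gap-fill", need=1, search=0, evaluation=1),
    Task("composition writing", need=1, search=0, evaluation=2),
]

# Rank tasks from highest to lowest assumed involvement.
ranked = sorted(tasks, key=Task.involvement, reverse=True)
for task in ranked:
    print(task.involvement(), task.name)
```

A ranking of this kind describes only the task design. As the discussion above notes, it says nothing about how individual students actually engage with the task, or how many exposures they receive.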

There are a range of other factors which recur throughout the literature as facilitating vocabulary learning, including the following:

– increased frequency of exposure
– increased attention focused on lexical item
– increased noticing of lexical item
– increased intention to learn lexical item
– a requirement to learn lexical item (by teacher, test, syllabus)
– a need to learn/use lexical item (for task or for a personal goal)
– increased manipulation of lexical item and its properties
– increased amount of time spent engaging with lexical item
– amount of interaction spent on lexical item

Overall, it seems that virtually anything that leads to more exposure, attention, manipulation, or time spent on lexical items adds to their learning. In fact, even the process of being tested on lexical items appears to facilitate better retention, as research designs which include multiple posttests usually lead to better results on the final delayed posttest compared to similar designs with fewer or no intermediate posttests (e.g. Mason and Krashen, 2004). Previously, there was no one cover term that encompassed all of these involvement possibilities, and so in an overview of instructed second-language vocabulary learning, I proposed the term engagement (Schmitt, 2008). Because it has such an important influence on vocabulary learning, it needs to be especially carefully controlled in vocabulary research. Comparison conditions need to be as equivalent as possible in terms of the number of exposures, total time of exposure, type of manipulation, and even the number of interim tests before the final assessment.
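The advice to keep comparison conditions equivalent can be turned into a simple design check. The following sketch is a hypothetical illustration, not an established procedure: the `Condition` fields are one possible way of recording the four engagement variables named above, and the two example conditions are invented.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Condition:
    name: str
    exposures: int           # number of encounters with each target item
    minutes_on_task: float   # total time of exposure
    manipulation: str        # type of manipulation required
    interim_tests: int       # tests given before the final assessment


def engagement_mismatches(a, b):
    """List the engagement variables on which two conditions differ."""
    return [
        f.name
        for f in fields(Condition)
        if f.name != "name" and getattr(a, f.name) != getattr(b, f.name)
    ]


# Invented design: the conditions are meant to differ only in manipulation,
# but they also differ in exposures and in the number of interim tests.
control = Condition("glossed reading", 3, 20.0, "read gloss", 0)
treatment = Condition("sentence writing", 5, 20.0, "write sentence", 1)

mismatches = engagement_mismatches(control, treatment)
print(mismatches)
```

Any variable listed other than the one deliberately manipulated is a potential confound, since a difference in final scores could then reflect extra exposures or extra testing and not the treatment itself.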





1.2 Vocabulary and reading

It is beyond the scope of this book to provide a detailed overview of vocabulary in all four skills. However, the vast majority of skills/vocabulary research has focused on reading. I will briefly survey the vocabulary/reading research as a means of giving a flavor of the kind of lexical research which can be done in relation to the skills. Readers interested in vocabulary and the other skills are directed to Chapters 4 and 5 in Nation (2001).

The effectiveness of incidental vocabulary learning from reading

Quote 1.12 Schmitt on the importance of engagement

In essence, anything that leads to more and better engagement should improve vocabulary learning, and thus promoting engagement is the most fundamental task for teachers and materials writers, and indeed, learners themselves.

(2008: 339–340)

Concept 1.1 Incidental learning

Incidental learning is learning which accrues as a by-product of language usage, without the intended purpose of learning a particular linguistic feature. An example is any vocabulary learned while reading a novel simply for pleasure, with no stated goal of learning new lexical items.

Copyright material from www.palgraveconnect.com - licensed to University of South Florida - PalgraveConnect - 2011-05-25

Early research on vocabulary acquisition from incidental exposure in reading found a discouragingly low pickup rate, with about 1 word being correctly identified out of every 12 words tested (Horst, Cobb, and Meara, 1998). However, the early studies typically had a number of methodological weaknesses, including very small amounts of reading, insensitive measurement instruments, inadequate control of text difficulty, small numbers of target words, and no delayed posttests. More recent studies which have addressed some or all of these problems have found more gains from reading than previous studies indicated. Horst et al. (1998) found learning of about 1 new word out of every 5, and that this learning persisted over a period of at least ten days. Horst (2005) found that her participants learned well over half of the unfamiliar words they encountered in their extensive reading. Pigada and Schmitt (2006) studied the learning of spelling, meaning, and grammatical characteristics during a one-month extensive reading case study. They found that 65% of the target words were enhanced on at least one of these word knowledge types, for a pickup rate of about 1 of every 1.5 words tested. Spelling was strongly enhanced, even from a small number of exposures, while meaning and grammatical knowledge were enhanced to a lesser degree. Brown, Waring, and Donkaewbua (2008) found encouraging amounts of durable incidental vocabulary learning in terms of recognition of word form and recognition of meaning in a multiple-choice test, but far less in terms of being able to produce the meaning in a translation task. Waring and Takaki (2003) also found stronger gains and retention for recognition than recall knowledge. Their Japanese participants recognized the meaning of 10.6 out of 25 words on an immediate multiple-choice test, but were only able to provide a translation for 4.6 out of 25. However, after three months, while the recognition of meaning score dropped to 6.1, the translation score dropped much more sharply to 0.9. This indicates that incidental vocabulary learning from reading is more likely to push words to a partial rather than full level of mastery, and that any recall learning is more prone to forgetting than recognition learning.
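The pickup and retention figures reported above are easier to compare once they are put on a common scale. The following is only an illustrative sketch: the function names are hypothetical, and the inputs are simply the figures cited in the text (65% for Pigada and Schmitt, and the Waring and Takaki mean scores out of 25).

```python
def one_in(proportion):
    """Convert a proportion (0-1) into N for a '1 word in N' pickup rate."""
    return 1 / proportion


def retained_share(immediate_score, delayed_score):
    """Fraction of the immediate posttest score still present at the delayed test."""
    return delayed_score / immediate_score


# Pigada and Schmitt (2006): 65% of target words showed some gain.
print(round(one_in(0.65), 2))                 # 1.54, i.e. about 1 word in 1.5

# Waring and Takaki (2003): mean scores out of 25, immediate vs. three months.
print(round(retained_share(10.6, 6.1), 3))    # 0.575 for meaning recognition
print(round(retained_share(4.6, 0.9), 3))     # 0.196 for translation (recall)
```

On these figures, a little over half of the recognition score survived three months, against roughly a fifth of the translation score, which is the contrast the paragraph draws between recognition and recall.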

Number of exposures necessary to promote incidental learning from reading

An important issue related to lexical acquisition from reading is the number of exposures which are necessary to push the incremental learning of a word forward, especially in a way that is durable. Webb (2007a) compared the learning of words from the study of L2-L1 word pairs, both with and without the addition of a single example sentence. The results for the two conditions were the same, indicating that a single context had little effect on gaining vocabulary knowledge. Beyond a single exposure, learning increases, but there does not appear to be any firm threshold when it is certain. At the lower end of the frequency spectrum, Rott (1999) found that 6 exposures led to better learning than 2 or 4 exposures. Pigada and Schmitt (2006) found that there was no frequency point where the acquisition of meaning was assured, but by about 10+ exposures, there was a discernible rise in the learning rate. However, even after 20+ exposures, the meaning of some words eluded their participant. Waring and Takaki (2003) found it took at least 8 repetitions in order for learners to have about a 50% chance of recognizing a word’s form, or its meaning on a multiple-choice test, three months later. However, even if a new word was met 15–18 times, there was less than a 10% chance that a learner would be able to give a translation for it after three months, and no words met fewer than 5 times were successfully translated. Horst et al. (1998) also found that words appearing 8 or more times in their study had a reasonable chance of being learned, while Webb (2007b) found that 10 encounters led to sizeable learning gains across a number of word knowledge types. Of course, learning a word depends on more than just the frequency of exposure. The quality of engagement will obviously also play a part. Furthermore, Zahar, Cobb, and Spada (2001) suggest that the number of encounters needed to learn a word might depend on the proficiency level of the learners, because more advanced learners who know more words seem to be able to acquire new words in fewer encounters. Nevertheless, the research seems to suggest that 8–10 reading exposures may give learners a reasonable chance of acquiring an initial receptive knowledge of words.

Quote 1.13 Nation on the relationship between vocabulary and reading

Research on L1 reading shows that vocabulary knowledge and reading comprehension are very closely related to each other ... This relationship is not one directional. Vocabulary knowledge can help reading, and reading can contribute to vocabulary growth.

(2001: 144)

Taken together, the research confirms that worthwhile vocabulary learning does occur from reading. However, the pickup rate is relatively low, and it seems to be difficult to gain a productive level of mastery from just exposure. Hill and Laufer (2003) estimate that, at the rates of incidental learning reported in many studies, an L2 learner would have to read over 8 million words of text, or about 420 novels, to increase his/her vocabulary size by 2,000 words. This is clearly a daunting prospect, and thus it is probably best not to rely upon incidental learning as the primary source of learning for new words. Rather, incidental learning from reading seems to be better at enhancing knowledge of words which have already been met. This conclusion is congruent with Waring and Takaki’s (2003) findings that reading graded readers does not lead to the learning of many new words, but that it is very useful in developing and enriching partially-known vocabulary. Studies with a variety of test types have shown that exposure leads to improvements in multiple types of word knowledge. Also, given that repetition is key to learning words, the benefits of repeated exposures in different contexts for consolidating fragile initial learning and moving it along the path of incremental development should not be underestimated.
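The scale of the Hill and Laufer (2003) estimate can be unpacked with simple arithmetic. This is a rough sketch only: the three constants come from the figures cited above, while the daily reading pace passed to `years_of_reading` is a purely hypothetical assumption added for illustration.

```python
WORDS_OF_TEXT = 8_000_000   # reading volume cited from Hill and Laufer (2003)
NOVELS = 420                # the same volume expressed as novels
TARGET_GAIN = 2_000         # new words to be learned

words_per_novel = WORDS_OF_TEXT / NOVELS
words_read_per_word_learned = WORDS_OF_TEXT / TARGET_GAIN


def years_of_reading(words_per_day):
    """Years needed to cover WORDS_OF_TEXT at a hypothetical daily reading pace."""
    return WORDS_OF_TEXT / (words_per_day * 365)


print(round(words_per_novel))              # 19048 words per novel implied
print(words_read_per_word_learned)         # 4000.0 running words per word gained
print(round(years_of_reading(5_000), 1))   # 4.4 years at an assumed 5,000 words a day
```

In other words, the estimate implies about 4,000 running words of reading for each new word added, which is why incidental learning looks better suited to consolidation than to initial learning.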

Quote 1.14 Webb on the value of repetition in reading for vocabulary learning

Repetition affects incidental vocabulary learning from reading. Learners who encounter an unknown word more times in informative contexts are able to demonstrate significantly larger gains in [various] vocabulary knowledge types than learners who have fewer encounters with an unknown word.

(2007b: 64)





Extensive reading

One way of maximizing the vocabulary learning benefits from reading is to organize an extensive reading component (Day and Bamford, 1998). Although readers need to know 98–99% of the words in a text, many authentic texts will still be suitable for more advanced learners, especially if teachers provide support for the more difficult vocabulary (see below). However, for developing learners, the vocabulary load will probably be too high in authentic texts, and so the use of graded readers is recommended, as the vocabulary load is both fine-tuned for the learner’s level, and systematically recycled (Nation and Wang, 1999). Graded readers used to have a bad reputation for being boring and poorly written, but that is no longer the case, with several major publishers providing a series of interesting and well-presented readers. Most importantly, research shows that substantial vocabulary learning can be derived from graded readers. For example, Horst (2005) found that her participants learned over half of the unfamiliar words they encountered in the graded readers they read. Likewise, Al-Homoud and Schmitt (2009) found that Saudi learners in a short ten-week course incorporating extensive reading and graded readers increased their vocabulary at the 2,000, 3,000, and 5,000 frequency levels, as well as improving their reading speed and attitudes towards reading. Unsurprisingly, the amount of reading is key: of ten variables entered into a regression analysis, only the amount of extensive reading done during a two-month course came up as a significant predictor of gain scores in overall language proficiency (Renandya, Rajan, and Jacobs, 1999).

Lexical inferencing

While extensive reading programs can maximize the amount of exposure, it is possible to help learners utilize that exposure more effectively. One way is to train them in lexical inferencing (Haastrup, 1991: 13):

The process of lexical inferencing involves making informed guesses as to the meaning of a word in the light of all available linguistic cues in combination with the learner’s general knowledge of the world, her awareness of the co-text and her relevant linguistic knowledge.

In Haastrup’s view, it is clear that lexical inferencing is much more than merely ‘guessing from context’, as learners use both their existing knowledge and the textual context to guess the meaning of unknown lexical items. It is probably best to think of lexical inferencing as qualified guessing of the meaning of lexical items in context, rather than guessing from context, as contextual cues are only one of several knowledge sources.

Learners typically rate lexical inferencing as a useful strategy (Schmitt, 1997; Zechmeister, D’Anna, Hall, Paus, and Smith, 1993) and research has shown that it is one of the most frequent and preferred strategies for learners when dealing with unknown words in reading. In one study, Paribakht and Wesche (1999) found that their university ESL students used inferencing in about 78% of all cases where they actively tried to identify the meanings of unknown words, while Fraser (1999) found that her students used inferencing in 58% of the cases where they encountered an unfamiliar word. It also seems to be a major strategy when learners attempt to guess the meaning of phrasal vocabulary, at least for idioms (Cooper, 1999). Unfortunately, this does not mean that it is necessarily effective. Nassaji (2003) found that of 199 inferences, learners only made 51 (25.6%) that were successful, and another 37 (18.6%) that were partially successful. This low success rate is similar to the 24% rate that Bensoussan and Laufer’s (1984) learners achieved. In an extensive cross-sectional study, Haastrup (2008) studied the lexical inferencing success of young Danish learners of English, in both their L1 and L2, in Grades 7, 10, and 13.

Language  Grade 7   Grade 10  Grade 13
L2        16.83%    37.27%    48.10%
L1        28.93%    50.07%    58.80%

Unsurprisingly, she found that her participants’ L1 lexical inferencing was better than their L2 inferencing, but she also found increasing success as the learners matured, both in the L1 and the L2. However, by Grade 13, the lexical inferencing success rate had still only improved to the region of 50%.

One of the reasons for this relatively poor rate is that learners often confuse unknown words with words which they already know and which have a similar form (Nassaji, 2003), again highlighting the importance of form in learning vocabulary. Other factors include the percentage of unknown words in the text, word class of the unknown words, and learner proficiency. Liu and Nation (1985) found that unknown words embedded in a text where 96% of the other words were known were guessed more successfully than unknown words in a text with only 90% known. They also found that verbs were easier to infer than nouns, and nouns easier than adjectives or adverbs. Finally they found that higher proficiency learners successfully inferred 85–100% of the unknown words, while the lowest proficiency learners only inferred 30–40% successfully.

This uneven success in lexical inferencing suggests that these skills need to be taught. Two meta-analyses (Fukkink and De Glopper, 1998; Kuhn and Stahl, 1998) and an overview (Walters, 2004) have found a positive effect for instruction in the use of context. Both meta-analyses found that context clue instruction was as effective as, or more effective than, other forms of instruction (e.g. cloze exercises, general strategy instruction), but the inferencing improvement may be mostly about attention given to the inferencing process, as Kuhn and Stahl concluded that there was little difference between teaching learners inferencing techniques and just giving them opportunities to practise guessing from context. Walters (2006) found that learners of different proficiencies seemed to benefit from different approaches, with beginning learners benefiting most from instruction in a general inferencing procedure (Clarke and Nation, 1980), and more advanced learners benefiting more from instruction in the recognition and interpretation of context clues. She also found that instruction in inferencing may do more to improve reading comprehension than the ability to infer word meaning from context.

Glossing

Another way to help learners utilize reading exposure better is to give them information about unknown words in the text. One way this can be done in teacher-prepared texts is with glossing. Nation (2001) believes there are several reasons why glossing can be useful: it allows more difficult texts to be read, it provides accurate meanings for words that might not be guessed correctly, it interrupts reading only minimally (especially compared to dictionary use), and it draws attention to words, which should aid the acquisition process. Research tends to support these views. Hulstijn (1992) found that glosses helped prevent learners from making erroneous guesses about unknown words, which is important because learners seem reluctant to change their guesses once made (Haynes, 1993). Moreover, Hulstijn et al. (1996) found that L2 readers with marginal glosses learned more vocabulary than dictionary-using readers, or readers with no gloss/dictionary support. (This is mainly because the L2 readers used the glosses more than their dictionaries. However, when the readers did use their dictionaries, the results were better than for using glosses, which is why Laufer and Hulstijn (2001) judged that dictionary use is more effective; see Table 1.4, this volume.) But how and where to gloss? Research indicates that it does not matter much whether the gloss is an L2 description or an L1 translation, as long as the learner can understand the meaning (Jacobs, Dufon, and Fong, 1994; Yoshii, 2006), which suggests that there is no reason not to use L1 glosses with less proficient learners. Glosses just after the target word do not seem to be very effective (Watanabe, 1997), but glosses in the margin, at the bottom of the page, or at the end of the text have similar effectiveness (Holley and King, 1971). As learners seem to prefer marginal glosses, this is probably the best place for them (Jacobs et al., 1994). If phrasal vocabulary is being glossed, it helps to make the phrases more salient by highlighting their form (e.g. by printing them in color, and/or underlining them), so that the learner can recognize them as chunks (Bishop, 2004).

Supplementing incidental vocabulary acquisition with explicit activities

Glossing is one way of focusing explicit attention on lexical items during reading where otherwise only incidental learning would occur. Furthermore, it is effective: reading with marginal glosses or referral to a dictionary leads to better receptive knowledge of words than reading alone (Hulstijn et al., 1996). But there are many other possibilities for adding an explicit learning component to reading, based on the general principle that intentional and incidental learning are complementary approaches which can be usefully integrated. Perhaps the most effective way of improving incidental learning is by reinforcing it afterwards with intentional learning tasks. Hill and Laufer (2003) found that post-reading tasks explicitly focusing on target words led to better vocabulary learning than comprehension questions which required knowledge of the target words’ meaning. Atay and Kurt (2006) found that young Turkish EFL learners who carried out reading comprehension and interactive tasks as post-reading activities outperformed students who did written vocabulary tasks, and that the interactive tasks were much more appealing for the young learners. Mondria (2003) gives a particularly good illustration of the value of post-reading exercises. Dutch students who inferred the meaning of French words from sentence contexts, and then verified the meaning with the aid of a word list before memorization, learned just as much vocabulary (about 50% of the target words on a two-week delayed receptive test) as students who were given a translation before memorization. This shows that incidental learning plus explicit follow-up (particularly the memorization element) can be just as effective as a purely explicit approach. However, it is not as time-efficient, as the ‘translation + memorization’ method used 26% less time than the ‘incidental + follow-up’ method to achieve the same results.

However, although the greater engagement of reading + explicit attention leads to greater learning, this learning is still fragile and needs to be followed up. Rott, Williams, and Cameron (2002) found that while reading + multiple-choice glosses led to better immediate scores than reading-only incidental learning, after five weeks the scores had decayed to the same level as the incidental learning condition. Thus, the improved learning gained from incidental exposure + supplementary tasks can be useful if subsequently consolidated and maintained, but if not followed up, the advantage may well be lost.

1.3 A sample of prominent knowledge gaps in the field of vocabulary studies

We have learned much about vocabulary since the blossoming of lexical research which Meara first noted in 1987, with the points noted above being just a sample of some of the important findings from the last 20 years. Moreover, we must not forget that a great deal was learned about vocabulary in the early and mid-1900s, much of which has been forgotten, and which is often presented as ‘new’ in more recent studies which are unaware of the earlier research. However, there are still a large number of important lexical issues about which we have little or no understanding. This section will highlight some of what I feel are the biggest gaps. All of these would make excellent (and challenging) research topics for the ambitious vocabulary researcher.

● No overall theory of vocabulary acquisition

This has often been commented upon, and represents the Holy Grail of vocabulary studies. While we are gaining an increasing understanding of the development of some isolated aspects of vocabulary, the overall acquisition system is far too complex and variable for us to comprehend in its entirety, and so it still eludes description. It is difficult to visualize any particular study which could unlock the mystery; rather it will probably take a large number of studies using a combination of methodologies before the key developmental patterns become obvious. Some of the studies will need to be done in the actual environments in which learners learn (e.g. classrooms, private reading). There will also need to be experiments done in the laboratory, where the large number of potential learning variables can be better controlled. Computer simulations of learning models can be very useful, both because they make us think carefully about the assumptions we make about acquisition (in order to write the programming rules), and because a vast number of trials can be run (without the need for finding vast numbers of participants) (e.g. Meara, 2004, 2005, 2006; see Section 2.10). In addition, neurolinguistics is now beginning to shed light on the physiological underpinnings of language acquisition and use. Combining the strengths of these diverse research paradigms offers the best chance of understanding vocabulary acquisition well enough to formulate an explanation of its mechanisms.

● The relationship between receptive and productive mastery of vocabulary

An important part of the overall development of vocabulary is the movement from no knowledge to receptive mastery to productive mastery. Although we know that receptive mastery usually precedes productive mastery, it is unclear how the process proceeds, or exactly what input/practice is required to initiate it. The relationship between the two has been seen by some as a continuum (e.g. Melka, 1997), where gradually increasing knowledge helps one move from receptive to productive mastery. In contrast, Meara (1990) has argued that the two might be quantifiably different, perhaps depending on an item’s status within the lexical network. One of the problems in describing receptive and productive mastery lies in the difficulty of measurement. Waring (1999) found that the relationship between the two largely depends on the measurement instruments used. For example, if a researcher uses a relatively difficult receptive measurement and a relatively easy productive measure, it might even be found that productive mastery precedes receptive mastery! Thus research into receptive and productive mastery requires careful selection of the measurement instruments, and a careful interpretation of the results. Only by carrying out a number of studies with both receptive and productive measurements of the same lexical items can the true relationship between the two levels of mastery be clarified. Only after this descriptive stage will it then be possible to hypothesize about the acquisition mechanisms which can explain the results. See Section 2.8 for much more on the issue of receptive and productive mastery.

● Measuring the various word knowledge aspects

In order to better understand the nature and acquisition of vocabulary knowledge, it is necessary to develop measures for the different word knowledge aspects. While it is not practically possible, and probably not desirable, to measure all word knowledge aspects in any particular study, it is important to have a better understanding of how each of the various word knowledge aspects develops. For some aspects, like form-meaning, there are numerous measurement instruments which are commonly used. However, some aspects have hardly been researched at all and so no measurement instruments have been developed. Register is a good example of this, and we therefore have virtually no idea of how it develops. The acquisition of other aspects (e.g. collocation, intuitions of frequency, association) has received some attention, but there are still no accepted measurement instruments. Until valid and reliable measurements can be taken of more word knowledge aspects, it will be impossible to chart the incremental acquisition of overall vocabulary knowledge.

● Understanding implicit/procedural as well as explicit/declarative vocabulary knowledge

The vast majority of research into vocabulary involves measurement and discussion of explicit/declarative knowledge. For example, most vocabulary tests measure form and meaning, both lexical aspects which can be described by the learner (e.g. the French word merci means thank you in English). There has been much less research on how well lexical items can be deployed in actual language use, that is, implicit/procedural knowledge. For example, this would include how well vocabulary is utilized when giving a speech. This research bias largely results from the relative ease of measuring explicit/declarative knowledge, compared to the difficulty of measuring vocabulary in use and the underlying implicit/procedural knowledge which makes this possible. It also stems from the belief that lexical knowledge is mainly declarative, ignoring the complex nature of word knowledge. A few researchers are now beginning to discuss the nature of the declarative/procedural and explicit/implicit distinctions in language learning in general (e.g. DeKeyser, 2003; Hulstijn, 2007) and it is time to begin applying this type of thinking to exploring vocabulary knowledge, especially as research methodologies from the fields of psycholinguistics and neurolinguistics now hold the promise of meaningful measurement of implicit/procedural knowledge. It is early days yet, but research into implicit/procedural knowledge can only help to clarify the full nature of what it means to know vocabulary, and the factors which allow this knowledge to be employed accurately, appropriately, and fluently by both L1 and L2 speakers in their language communication.

● Vocabulary in spoken discourse

There has been a great deal of research on reading’s relationship with vocabulary, and as a result, we know a great deal about the interaction between the two. However, there has been much less research on spoken discourse, again probably because it is harder to research. Therefore, there is a big gap in the field’s understanding of spoken discourse and vocabulary. We have little idea of how vocabulary is learned from listening, how many repetitions it requires, or what makes a word salient for learning in spoken discourse. Another big gap is the percentage of lexical coverage which is necessary for listening comprehension, that is, what percentage of lexical items in spoken discourse need to be known in order to comprehend it? This is critical, because as we have seen above, a few percentage points of coverage make a huge difference in the amount of vocabulary required. If only 95% coverage were required, then something like 2,000–3,000 word families would be sufficient (Adolphs and Schmitt, 2003), but if 98% were necessary, this would entail a vocabulary size of 6,000–7,000 word families (Nation, 2006). Clearly, these are vastly different vocabulary size targets, which have huge implications for pedagogy and syllabus design.
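To make the notion of lexical coverage concrete, the sketch below computes the proportion of running words in a text that a learner knows, and shows what 95% versus 98% coverage means in terms of how often an unknown word is met. It is a simplification under stated assumptions: the tokenizer is deliberately crude, the `known` set is a hypothetical learner vocabulary, and matching is done on word forms rather than on the word families used in the studies cited above.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_coverage(text, known_words):
    """Proportion of running words (tokens) in text that are in known_words."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(token in known_words for token in tokens) / len(tokens)


def running_words_per_unknown(coverage):
    """Average number of running words per unknown word (coverage must be below 1)."""
    return 1 / (1 - coverage)


known = {"the", "cat", "sat", "on"}                  # hypothetical learner vocabulary
print(lexical_coverage("The cat sat on the mat", known))
print(round(running_words_per_unknown(0.95)))        # 20
print(round(running_words_per_unknown(0.98)))        # 50
```

At 95% coverage a listener or reader meets an unknown word about once every 20 running words, whereas at 98% it is about once every 50, which gives some feel for why the two thresholds imply such different vocabulary size targets.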

● Measurement of vocabulary in free language production

Vocabulary studies have traditionally focused on receptive vocabulary. There could be several reasons for this. One could be that, because the receptive form-meaning link is the first step in learning vocabulary, it makes good sense to measure it. It must be said, however, that receptive measurement of this aspect has probably had more to do with the ease of measurement than with any theoretical consideration of the nature of vocabulary acquisition. Another reason could be that both participants and researchers know and are comfortable with the type of multiple-choice items which typically appear in receptive formats. But probably a more important reason is that receptive test formats usually offer researchers more control than productive ones. Read (2000: 9) describes one of the dimensions of vocabulary assessment as a continuum between selective measures (where specific vocabulary items are the focus of the assessment) and comprehensive measures (where the measure takes account of the whole vocabulary content of the input material, e.g. a composition). Receptive test writers typically select target vocabulary, and there are numerous advantages to being able to do this, perhaps the greatest being a limiting of the scope of the enquiry. In an analysis of free written production, however, there is little control of the output vocabulary (other than limiting an examinee to all of the possible lexical items which could be used to discuss a particular topic), and one may therefore find a relatively unpredictable variety of lexical items. Because one cannot predict which items will appear with any precision (other than some necessary technical vocabulary particular to the topic area), it is not possible to define the behavior of specific lexical items in advance as part of a scoring scheme.

This limitation has led to measures of free lexical output which focus on statistical analyses of the overall lexical output, e.g. the number of types and tokens produced, or the number of items produced from particular frequency bands. An example of this is Morris and Cobb’s (2004) analysis of entrance exam essays with the 2002 version of VocabProfiler. They found that resultant vocabulary profiles correlated with scores in a pedagogical grammar course, but only at between .34 and .37, or around 13% of the variance. The profiles could distinguish between the writing of successful native and nonnative TESL trainees, but this was limited to comparisons of the first 1,000 words, words from the AWL, and function/content words. Overall, the authors felt that the profiles could add useful information to other measurements, but could not stand on their own.
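The kind of global statistics described above (types, tokens, and the share of output falling in particular frequency bands) can be sketched in a few lines. This is an illustrative toy, not a reimplementation of VocabProfiler: the function name, the crude tokenizer, and the two tiny band lists are all hypothetical, and real profilers rely on large frequency lists organized by word family.

```python
import re
from collections import Counter


def lexical_profile(text, bands):
    """Return token/type counts, type-token ratio, and share of tokens per band."""
    tokens = re.findall(r"[a-z']+", text.lower())
    band_counts = Counter()
    for token in tokens:
        # Assign each token to the first band that lists it, else 'off-list'.
        band = next((name for name, words in bands.items() if token in words), "off-list")
        band_counts[band] += 1
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    return {
        "tokens": n_tokens,
        "types": n_types,
        "ttr": n_types / n_tokens if n_tokens else 0.0,
        "band_share": {name: count / n_tokens for name, count in band_counts.items()},
    }


# Hypothetical miniature band lists, for illustration only.
bands = {"1k": {"the", "cat", "sat", "on"}, "2k": {"mat"}}
profile = lexical_profile("The cat sat on the mat. The cat purred.", bands)
print(profile["tokens"], profile["types"])   # 9 6
print(profile["band_share"])
```

Note that nothing in such a profile records whether a word was used appropriately in context; the statistics only describe which words appeared and how often.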

This seems a fair evaluation, as my personal experience has shown the various kinds of profiling to be quite limited in the insights they can produce. For example, I find that compositions which appear clearly different in lexical terms often do not show much of a difference when analyzed according to such profile statistics. It seems it doesn’t matter so much what particular lexical items are used, but rather whether they are used appropriately for the particular context in terms of register, collocation, etc. Thus I feel that global measures of lexis (e.g. type-token ratios) will generally be less informative than measures of how appropriately the individual items are used. At the moment, there is no recognized measure of the appropriacy of written lexis in compositions, and this is a major gap in the field. Until we develop one, it will be very difficult to distinguish relatively better lexical performance in free composition writing from relatively weaker performance, at least in quantifiable and replicable terms. The same argument can also be made for the vocabulary in spoken output, and perhaps even more so. It must be said, however, that more detailed lexical profiling tools are continuously being developed, and they may well prove more informative than earlier tools. An especially good example of this continuing development is the Compleat Lexical Tutor by Tom Cobb, which now gives a frequency analysis of each 1,000 frequency band to the 20,000 level (available at <http://www.lextutor.ca/>).





● The amount of formulaic language in language use

Research has clearly shown that there is a substantial amount of formulaic language in language use (Biber, Johansson, Leech, Conrad, and Finegan, 1999; Erman and Warren, 2000; Foster, 2001), and that this is true in both written and spoken discourse. However, it is not yet clear what the proportion of formulaic language typically is compared to that creatively generated through grammar + vocabulary, and how this proportion varies according to the mode, genre, topic, or speaker. There must be a huge number of formulaic sequences in most languages, but we have no principled estimate of the size of the phrasal lexicon in English, or in any other language for that matter. It has been established that formulaic language has transactional and functional uses, but how much is required to operate in the four skills remains a mystery. Part of the problem is that there is no accepted methodology of how to identify and count formulaic items, which has led to a range of size estimates. Until an accepted categorization system appears, we will continue to have studies which are difficult to compare due to different counting methodologies.

Another problem is that studies into formulaic language have almost always been corpus- and computer-based. Computers are amazingly effective at finding and counting any linguistic feature which can be unambiguously described. However, they are largely incapable of working with any feature that cannot be so described. Researchers of formulaic language usually ask a computer to find instances of contiguous words, because the computer has a hard time identifying probabilistic patterns where collocations can appear in any of a number of slots. Thus, most formulaic language research has examined only contiguous sequences (e.g. Biber et al., 1999). This is a problem, as many (most? – we simply do not know) formulaic sequences have slots into which one or more words can be inserted. (See Fellbaum, 2007, for one approach to this.)

a ——— ago (hour, day, year ...)
Would you please ———? (open the door, shut up ...)
——— (someone or something) thinks nothing of ——— (some unusual or unexpected activity)

Patterns with open slots have a lot of flexibility to match different language use situations, and so are likely to be very common. In fact, flexible formulaic sequences may well outnumber the fixed kind, and may have greater importance in language use. The mind doesn’t seem to have a problem working with these flexible slots, but until very recently, computer programs were not able to handle this variability. Fortunately, there are now several programs (e.g. kfNgrams, ConcGrams) which can work with open-slot phraseology, and over time, should provide a much fuller description of the scope and nature of formulaic language. (See Section 6.3 for more information on these concordancing tools.)
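The difference between counting contiguous sequences and matching a frame with an open slot can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `*` slot notation, the `max_fill` parameter, and the sample sentence are all invented here, and the sketch does not describe how kfNgrams or ConcGrams actually work.

```python
import re
from collections import Counter


def contiguous_ngrams(tokens, n):
    """Count every run of n adjacent tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def open_slot_matches(text, frame, max_fill=2):
    """Return the fillers found for a frame such as 'a * ago', where '*' marks
    an open slot that may hold between one and max_fill words."""
    slot = r"(\w+(?:\s+\w+){0,%d})" % (max_fill - 1)
    parts = [slot if part == "*" else re.escape(part) for part in frame.split()]
    pattern = r"\b" + r"\s+".join(parts) + r"\b"
    return re.findall(pattern, text.lower())


# Invented sample containing three realisations of the frame 'a ___ ago'.
text = "I saw her a year ago, he left a day ago, and they moved a long time ago."
tokens = re.findall(r"\w+", text.lower())

# No three-word sequence repeats, so a contiguous count sees nothing recurrent.
print(max(contiguous_ngrams(tokens, 3).values()))  # 1

# The open-slot frame groups all three instances together.
print(open_slot_matches(text, "a * ago"))  # ['year', 'day', 'long time']
```

The contiguous count treats a year ago, a day ago, and a long time ago as three unrelated sequences, whereas the frame-based search recovers them as variants of one pattern.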

● Not enough vocabulary research is replicated

As with most research in Applied Linguistics, vocabulary research suffers from the lack of replication to confirm and refine results. This often results in key information in the field being based on the findings from single studies. A good example of this was the finding that learners require knowledge of 95% of the words in a text in order to comprehend it adequately. Batia Laufer came to this conclusion after carrying out a small study with her students. She published it in a relatively obscure edited volume (Laufer, 1989), and never made great claims for it, other than as an initial finding based on her particular participants, instruments, and criteria. Yet as so often happens, the 95% ‘number’ was picked out and widely cited. The chaining of citations eventually led to the figure having an authority which was never intended, and a great deal of research was based upon it, at least until newer research came along. In a study published in 2000, Hu and Nation carried out a more carefully controlled analysis, and came up with a coverage figure closer to 98–99% for adequate comprehension. However, even here, these new results are based on very small participant numbers (66 in total, split into four groups of only 16–17).
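For readers who want to see what a coverage figure means operationally, here is a minimal Python sketch. It assumes that coverage is counted over running word forms (tokens) rather than lemmas or word families, and the function name, the passage, and the learner's set of known words are all invented for illustration. In these terms, 95% coverage corresponds to one unknown word in every 20 running words, and 98% to one in every 50.

```python
import re


def lexical_coverage(text, known_words):
    """Return the proportion of running words in text that appear in known_words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


# Invented example: a 20-word passage and a hypothetical learner who knows
# every word in it except "jetty".
passage = ("the old man walked slowly along the river bank and watched "
           "the small boats drift past the crumbling stone jetty")
known = set(passage.split()) - {"jetty"}

coverage = lexical_coverage(passage, known)
print(f"{coverage:.0%}")  # 95%
```

A real study would also have to decide on the unit of counting and on how knowledge of each word is established, and those decisions may shift the resulting percentage.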

What is still needed in this case is two types of replication. The first is a ‘pure replication’ with the same type of participants and the same instruments, to discover whether the results of the original studies are sound and reliable. If not, then either the overall research designs and instrumentation are suspect, or there are hidden factors which are affecting the results. If the replications confirm the original results, then additional replications in different contexts are warranted. These could include participants with different L1s or L2 proficiencies, learner contexts, etc. Since the generally correct answer to most SLA questions is ‘It depends’, it is necessary for researchers to explore the various factors which can affect results and describe their effect.

Given the importance of replication, one may wonder why more replications are not done. The main reason is probably that journals are not inclined to publish replication studies, which has a chilling effect on their production, especially by established researchers who have a high-stakes interest in getting their work published. The exception is the journal Language Teaching, which now explicitly encourages submission of well-done replications.

Overall, replication is an important part of the research cycle, and our area needs to be more creative in developing ways to encourage it. Perhaps a start would be not to use the term replication, which has for some a negative image, but speak rather of cycles of research, which more people might view favourably. More established researchers could build replication into their research studies, either in follow-up studies, or by doing several concurrent studies incorporating different participants/contexts. This could be done in collaboration with international colleagues, perhaps even providing them with the design and instrumentation, with the offer to potentially co-publish the combined findings.

Another obvious solution to the problem is to involve emerging researchers. Replications are a very good way to break into research, as the research design and analysis methodology (and in many cases the instruments as well) are already provided. Beginning researchers (e.g. BA and MA students) may well benefit from carrying out replications, and having a much better chance of being able to do good research, as it will be based on designs which have already been tried, and vetted by journals’ editors and reviewers. The field could benefit by having a steady stream of replication either confirming or disconfirming existing results. The replication findings could be disseminated in several ways: (a) in journals which allow replications, (b) by ‘packaging’ and reporting several similar replications together, which may make them more attractive to journals which have traditionally liked only ‘original’ research, (c) by publishing them on-line on their university’s research web pages, or (d) the students’ supervisors could refer to the replications in the supervisors’ own work.

● Vocabulary specialists in different fields do not talk to one another

There are many researchers who focus on vocabulary, but often in separate fields. For example, vocabulary specialists work with disabled patients in speech pathology. Bilingual specialists look at knowledge of vocabulary in two or more languages, often through laboratory research. Lexicographers decide which words belong in dictionaries and how they can best be defined. All of these fields have rich insights which could prove beneficial to lexical scholars and practitioners working in the other areas. However, it seems that researchers working in one field often do not search out and read the relevant lexical literature from another field. While this is inevitable at the practitioner level for reasons of time, vocabulary researchers owe it to themselves and their readers to cast their nets more widely, and take advantage of the wider lexical insights available.

In addition to these research gaps, Paul Nation lists numerous research topics on his website (http://www.victoria.ac.nz/lals/staff/paul-nation/vocrefs/researchlval.aspx). Some examples include:

● Make a replacement for the General Service List.
● Investigate the qualitative differences between receptive and productive vocabulary knowledge.
● What unique information do different techniques add to word knowledge? What common information do they add?
● Determine the factors influencing incidental vocabulary learning by using a message-focused computer game.
● How does learners’ focus of attention change as a text is listened to several times? Where does vocabulary fit in this range of focuses of attention?
● How can vocabulary learning from graded readers be optimized?
● What aspects of word knowledge are learned by guessing from context?
● Develop a list of frequent collocations using well-defined and carefully described criteria.
● Devise a well-based measure of total vocabulary size for non-native speakers.
● Measure the pattern of native-speaker and ESL non-native-speaker vocabulary growth.

Quote 1.15 Meara on the necessity of having vocabulary research mirror ‘real world’ conditions

One of the main shortcomings of ... [some vocabulary research] is that it has focused attention on the acquisition of vocabulary divorced from use or from real context. Many of the subjects tested in the methodological comparisons were not real language learners, the time-scale studied was short compared to the time it takes to learn a language, and the vocabularies learned were actually quite small in comparison to what a real language learner has to acquire to become fluent. There is a serious shortage of good research that has looked at the behavior of real language learners acquiring vocabularies over a long time-scale.

(1999: 565)




Part 2

Foundations of Vocabulary Research




2 Issues of Vocabulary Acquisition and Use

What’s a word? This seems like a simple question and almost any layman could come up with a description of word. It might be something like ‘a group of letters with an empty space on either side, which has a meaning’. For the layman, such a definition is probably perfectly adequate, as most people only need to conceptualise word well enough to look up ‘words’ in a dictionary, or to count how many ‘words’ are in a document they are writing. But if we want to push beyond this basic level of understanding, we must understand the intricacies of lexis and control for them in our vocabulary research designs. This section will discuss some of the inter-related characteristics of vocabulary you will need to address when dealing with words in your research.

Psycholinguistic research has taken the lead in describing the various characteristics of vocabulary, because these characteristics need to be carefully controlled in order to isolate the effects of whatever linguistic variable is being studied. It has identified a large number of lexical characteristics which affect the way vocabulary is acquired and used. This is well-illustrated by a website that provides sets of target words in which users can manipulate over 30 characteristics (http://www.psy.uwa.edu.au/mrcdatabase/uwa_mrc.htm). Below is a partial listing of the lexical characteristics on this website:

● number of letters
● number of phonemes
● number of syllables
● written frequency
● familiarity rating
● concreteness rating
● imagability rating
● meaningfulness rating
● age of acquisition rating
● common part of speech
● morphemic status (prefix/suffix/abbrev/hyph, etc.)
● contextual status (colloquial/dialect/archaic, etc.)
● pronunciation variability
● capitalization
● irregular plural
● stress pattern (reduced, unstressed, stressed).

Concept 2.1 Psycholinguistics

Psycholinguistics is the study of language acquisition, processing, and use through the use of theories and research tools drawn from the field of psychology.

As impressive as this list seems, it is far from comprehensive. It does not include spoken frequency, as the website relies on early corpus evidence (Kucera and Francis, 1967; Thorndike and Lorge, 1944, L count), which was based on written texts. It also does not capture issues based around the form-meaning link, such as synonymy (different forms with the same meaning).

The list also tends to focus on lexis as individual words, and so does not include information based on those words’ connections to other words. Corpus research has been at the forefront of highlighting how the behavior of individual words is both constrained and enriched by their contextual environment (e.g. collocation and other phrasal patterning). This is particularly essential if vocabulary is conceptualized (as it should be) as multi-word units as well as single words. Likewise, research into word associations has shown that words have connections with many others in the mental lexicon, through formal, paradigmatic, and syntagmatic links. Thus, in many kinds of vocabulary research, it is important to consider how a word’s behavior is affected by these contextual and mental network connections. This brings up a number of other characteristics which may be relevant:

● a word’s collocations
● whether a word’s meaning is largely driven by its phrasal patterning (semantic prosody)
● whether a word’s frequency differs according to mode (written, spoken, sign, etc.)
● whether a word’s meaning and usage is connected to particular extralingual cues (e.g. some spoken words can be tightly connected with specific gestures or body language)
● context availability (how easy it is to think of a sentence or phrase which a word can appear in)
● a word’s associations (formal, paradigmatic, syntagmatic).




All of the above characteristics involve either the lexical item itself, or its connections to its discourse context or mental associations. However, if one is studying L2 vocabulary acquisition, it is also important to consider the match between L1 and L2 lexical characteristics. This is because research has shown that typicality (the degree to which the form structures (phonological and orthographic) and the semantic classifications of the L2 lexis resemble those of the learner’s L1 lexis) has a strong impact on the learnability of L2 vocabulary, with more similarity leading to better learning (de Groot, 2006; Ellis and Beaton, 1993). The formal similarities between words are also related to cognateness, and this characteristic has been shown to affect L2-L1 translation.

It is probably impossible to fully control for all of these characteristics when doing research. Furthermore, different sets of characteristics will be relevant for different research designs and goals. Psycholinguistic studies will need to control for a relatively comprehensive range of characteristics, as the precise measures used in this type of research (e.g. often measured in milliseconds) can be confounded by even small differences in processing which can be caused by lexical factors. On the other hand, in lexical acquisition studies, it is important to focus on the characteristics which can make lexical items relatively more or less difficult to learn. In L2 vocabulary acquisition, this largely relates to how similar or dissimilar L2 word knowledge aspects are compared to their counterparts in the L1. Corpus research is not usually burdened by processing issues, as the focus is on the linguistic description of language output, where syntagmatic patterning comes to the fore.

The important point from this discussion is that vocabulary researchers need to be aware of the various lexical characteristics, and so be able to make conscious and principled decisions about which characteristics to control for in their studies. Careful consideration at the initial stages of research design can be the best insurance against a study being later contaminated by unwanted lexical behavior which confounds interpretation of the results. Below are short discussions of some of the factors which may be worth considering in your studies. See Ellis and Beaton (1993) and de Groot (2006) for more discussion of these factors.

2.1 Form-meaning relationships

2.1.1 Single orthographic words and multi-word items

A basic characteristic of vocabulary is that meaning and form do not always have a one-to-one correspondence. Consider the following items:

● die
● expire
● pass away
● bite the dust
● kick the bucket
● give up the ghost

(Schmitt, 2000: 1)

The six examples are synonymous, all with the meaning ‘to die’. However, several of the items contain more than one word. In some languages, and especially in English, meanings can be represented by multiple words operating as single units. To accommodate the fact that both single and multi-word units can realize meaning, we use the terms lexeme, lexical unit, and lexical item. These interchangeable terms are all defined as ‘an item that functions as a single meaning unit, regardless of the number of words it contains’. Thus, all of the above examples are lexemes with the same meaning. I will generally use the term lexical item in this book to emphasize the point that most vocabulary research issues apply to both single orthographic words and multi-word lexemes, but will use the term word when words were the unit of counting in particular studies. (See Section 5.2.1 for more on units of counting in vocabulary studies.)

2.1.2 Formal similarity

We should also note that the forms of the above items vary considerably. Although the first two items are synonymous single words, they bear no formal similarity to one another. The same thing is true for the multi-word items. This is common in English: consider the English terms for stealing cattle (rustling), appropriating writing and ideas (plagiarism), and commandeering an aircraft in flight (hijacking). These words have no formal similarities whatsoever, even though they are all theft in one form or another (Nation and Meara, 2002). However, in other languages, these concepts would be expressed by words or expressions that literally translate as stealing cows or stealing writing or stealing aircraft. In these languages, the meaning of these expressions is relatively transparent, and they could easily be understood by people who knew the basic words of which these expressions are composed.

Of course, English often does give formal cues to words of related meaning. For example, swim is the action of propelling oneself through the water, swimmer is a person who swims, and swimming pool is a place where this can be done. However, as we have seen above, words with related meaning are not always this obvious. As a comparison, Arabic is an example of a language which gives more reliable formal cues to meaning. As opposed to Indo-European languages, which tend to have relatively stable roots to which affixes are attached, Arabic is based on roots that normally consist of three consonants, which can be combined with various vowels to form families of words that share a common meaning (Ryan, 1997: 188):

k-t-b    maktaba (library)     ketaab (book)       kataba (he wrote)
d-r-s    mudarris (teacher)    madrasa (school)    darrasa (to learn)

One reason for English’s inconsistent meaning-form relationships lies in its historical development. English was originally a Germanic language (Old English), though only around 15% of the original 24,000 or so lexical items still exist in Modern English (Schmitt and Marsden, 2006: 82). However, these are the most basic and frequent words in the language: man, wife, live, good, eat, strong. From the beginning, English absorbed loanwords from many different languages, particularly French after the Norman conquest and Latin/Greek after the 1600s. In many cases, English retained several words with no formal similarities as synonyms, but each with different register marking, e.g. kingly (Old English), royal (French), and regal (Latin). The relationship between source language/time of absorption and register marking in these cases is illustrated in Figure 2.1 from Hughes (2000).

The lack of formal similarity among semantically-related lexical items is a factor that makes vocabulary in languages like English relatively more difficult to learn than vocabulary from languages with more transparent formal relationships. This may be important to consider when working with semantically-related items in your research. Because all L2 vocabulary research is affected by the L1, it might be useful to consider whether your participants’ L1(s) handle semantically-related lexical items in a more transparent way than your target language.

Figure 2.1 The relationship between historical origin and register

[Figure not reproduced. Axes: TIME (Anglo-Saxon, Middle English, Early Modern English) and REGISTER (GENERAL, FORMAL, SPECIFIC/TECHNICAL). Words plotted: hearty, cordial, cardiac; ask, question, interrogate; word, term, lexeme.]

2.1.3 Synonymy and homonymy

Synonymy is common in languages (several forms → one meaning), but so is the converse, where a single form has several meanings. This can be called either polysemy or homonymy. The distinction usually revolves around whether the different meaning senses are related or not. Chip is usually considered polysemous, in that a chip of wood, a computer chip, a potato chip, and a poker chip all have the same underlying concept of being small, thin, and flat(ish). A financial bank, a river bank, and the banking of an airplane when it turns are usually thought of as homonyms, as the meaning senses are totally unrelated. The distinction between polysemy and homonymy is important for lexicographers, as they have to decide the best way to list words in a dictionary. For other vocabulary researchers, it probably does not make much difference whether words are considered polysemous or homonymous; the important issue is the complexity of the form-meaning relationship, and the difficulties this leads to in vocabulary acquisition and use. In other words, it is probably the degree of variation between form and meaning that is the important factor to consider in most cases.

2.1.4 Learning new form and meaning versus ‘relabelling’

For adolescent and adult learners, most of the concepts connected with L2 words are likely to be already known. (This is obviously not always true for younger learners.) In this case, the learning task mainly consists of attaching an L2 label to a known concept, and then perhaps later fine-tuning the concept to match the exact L2 semantic representation. However, sometimes learners acquire new concepts in the L2, especially the technical vocabulary of the particular fields they are studying (e.g. legal language, business terminology). In these cases, the student is simultaneously learning both the concept and the L2 label. This is considered more difficult than the simpler ‘relabelling’ mentioned above. This difference in cognitive difficulty may be important when comparing the acquisition of different sets of lexical items. Those items where the participants must learn both concept and L2 label will presumably be more difficult than items where they already know the underlying concept. This makes it important to control for the ‘concept + label’ versus ‘relabelling only’ items in target vocabulary.

2.2 Meaning

In addition to the issues concerning the form-meaning link, there are several aspects of meaning itself which may warrant attention.





2.2.1 Imageability and concreteness

Imageability refers to how easy it is to imagine a concept. Concreteness is ‘a variable that expresses the degree to which a word (or, rather, the entity the word refers to) can be experienced by the senses’ (de Groot, 2006: 473). These two characteristics are strongly associated, because in practice, lexical items that refer to concrete entities are usually easier to imagine, and items referring to abstract entities more difficult to imagine. Thus psychological research often conflates them, and reports them as a single variable. For example, de Groot (2006) gave her participants an imageability rating task and reported the results as ‘concreteness’. However, it is important not to assume that abstract entities are always more difficult to imagine: ‘Typical exceptions are words with strong emotional or evaluative connotations, such as anxiety and jealousy: words for fictitious creatures, such as demon and devil; and some concrete but extremely rare words, such as armadillo and encephalon’ (de Groot, 2006: 473).

The degree of imageability/concreteness is important because it has been shown that more concrete/imageable words are learned far better than less concrete/imageable words, with the effect being both large and robust (de Groot, 2006; Ellis and Beaton, 1993). Therefore, in studies which compare the acquisition of groups of words, it is necessary to ensure that each group has equivalent concreteness, or any advantage in learning for a group may be due to higher concreteness alone, and not whatever acquisition variable is being studied.
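One simple way to check that two groups of target words have comparable concreteness is to compare their mean ratings before the study begins. The Python sketch below illustrates the idea only. The ratings are invented values on a 1–7 scale, and the function names and the 0.5 tolerance are arbitrary assumptions. In an actual study the ratings would come from published norms, and a statistical test would normally replace the fixed tolerance.

```python
from statistics import mean

# Invented concreteness ratings on a 1-7 scale (not taken from any published norms).
ratings = {
    "apple": 6.8, "chair": 6.6, "river": 6.4, "stone": 6.2,
    "truth": 2.0, "hope": 2.2, "idea": 2.4, "fate": 2.6,
}


def group_mean(words):
    """Mean concreteness rating of a list of target words."""
    return mean(ratings[word] for word in words)


def is_balanced(group_a, group_b, tolerance=0.5):
    """True if the two groups' mean ratings differ by no more than tolerance."""
    return abs(group_mean(group_a) - group_mean(group_b)) <= tolerance


# Unbalanced split: the first group holds three of the four concrete words.
unbalanced = (["apple", "chair", "river", "truth"], ["stone", "hope", "idea", "fate"])

# Balanced split: each group holds two concrete and two abstract words.
balanced = (["apple", "stone", "truth", "fate"], ["chair", "river", "hope", "idea"])

print(is_balanced(*unbalanced))  # False
print(is_balanced(*balanced))    # True
```

With the unbalanced split, any learning advantage for the first group could reflect its higher concreteness rather than the acquisition variable under study.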

2.2.2 Literal and idiomatic meaning

Most lexical items have meanings that are literal. For example, die and expire literally mean ‘to stop living’. Others can only be interpreted idiomatically: put your nose to the grindstone = work hard and diligently. (I would be very surprised if anyone has ever heard of someone literally pressing his nose against a grindstone!) Some can be interpreted both literally and figuratively (a breath of fresh air). For some research purposes, it may be important to determine the difficulty/knowledge of both literal and idiomatic meaning senses. Although it can be difficult to determine absolute learning difficulty, frequency can be a good guide to the chances of literal versus idiomatic meaning senses being known. One might assume that the literal meaning would usually be the most frequently used, but research with formulaic language has shown that it is often the idiomatic meaning that is far more frequent (e.g. Conklin and Schmitt, 2008). Since frequency tends to predict acquisition, this indicates that idiomatic meaning senses may often be more likely to be known than literal meaning senses. When using target items that have the possibility of both literal and idiomatic meaning senses, a researcher should determine the relative frequencies of each, to better understand the relationship between the various senses.





2.2.3 Multiple meaning senses

In languages like English, many lexical items have multiple meaning senses (see Section 2.1.3). This is an important feature to consider in vocabulary research. Learners will typically acquire the most frequent meaning senses before less frequent ones, so it often makes sense to do a frequency analysis not only on target words themselves, but also on their various meaning senses. This can be particularly true in acquisition studies where the researcher is interested in the depth of vocabulary knowledge as indicated by knowledge of the various meaning senses. While knowledge of the most frequent meaning sense is certainly important, knowledge of rarer meaning senses can indicate more comprehensive knowledge of a lexical item.

2.2.4 Content versus function words

A useful distinction is between lexical items that carry propositional content (i.e. meaning) and those which carry out grammatical functions. The former are commonly called content vocabulary (e.g. cow, fly, excruciating), while the latter are referred to as grammatical or function vocabulary (e.g. is, he, the). Corpus word counts consistently show that function words are among the most frequent in language, which is not surprising because they are necessary for communicating about any topic, from daily life to astrophysics. This holds true regardless of whether the discourse is general in nature, technical, or academic. This is illustrated in Table 2.1, which lists the most frequent word forms in the BNC (Leech, Rayson, and Wilson, 2001: 120).

The first 50 word forms in English are made up entirely of function word forms (depending on whether you consider I and you as content or function words). In fact, you must go beyond the first 100 word forms before content words are consistently found. Corpus research by Johansson and Hofland (1989) and Francis and Kučera (1982) found that the approximately 270 function word types in English (176 word families) accounted for 43–44% of the running words in most texts.

Table 2.1 The most frequent 50 word forms in English (a)

 1 the     11 I       21 have     31 she      41 do
 2 of      12 for     22 are      32 that     42 been
 3 and     13 that    23 not      33 which    43 their
 4 a       14 you     24 this     34 or       44 has
 5 in      15 he      25 ’s (b)   35 we       45 would
 6 to      16 be      26 but      36 ’s (c)   46 there
 7 it      17 with    27 had      37 an       47 what
 8 is      18 on      28 they     38 ~n’t     48 will
 9 to      19 by      29 his      39 were     49 all
10 was     20 at      30 from     40 as       50 if

(a) Some word forms occur more than once in different word classes, e.g. to occurs as both infinitive marker and as preposition.
(b) Genitive marker.
(c) Verb.

This has important implications for vocabulary research. A very large proportion of discourse is made up of function words, yet these lexical items are extremely difficult to test. Likewise, although function words are among the most frequent in a language, learners often find them the most difficult to learn, e.g. articles are notoriously difficult for learners of English. Therefore, measures and discussions of vocabulary knowledge often address only content words, completely ignoring the extremely frequent category of function words. While this might often make sense, it can also be misleading if not handled properly. For example, in discussions about vocabulary coverage in texts, it has been found that the first 1,000 words in English make up about 70–85% of discourse (e.g. Nation, 2001: 17; Table 2.7 in this volume), but it is usually not pointed out how important the first 100 (mainly function) words are in achieving this high percentage of coverage.

2.3 Intrinsic difficulty

An interesting question is whether some words are relatively easier or harder to learn than others. Laufer (1997) lists a number of factors which affect the difficulty of learning a lexical item (Table 2.2).

Some of these factors have to do with the intrinsic difficulty of words, e.g. a word’s length and a word’s grammatical class. Other factors relate to the language system, e.g. whether an affixation rule is regular in a language and whether the particular lexical item conforms with it. The relationship between a word and others in the language also makes a difference: if several words have a similar written or orthographic form (synformy), it can make learning more difficult. For many of these factors, it is the relative similarity/dissimilarity between L2 and L1 which makes the difference. For instance, whether a word is difficult to pronounce depends largely on the phonological features learners already have in their inventory from previous languages. If those features match the features of the new word, then it is comparatively easy; if not, it is comparatively difficult. This means that the absolute difficulty of a lexical item’s phonological requirements depends on the learner, and to a large extent, their L1. For example, an English word like rapid will be relatively difficult to pronounce for Japanese learners who do not have /r/ in their native repertoire, but relatively easy for French learners who do. Thus whether words are easy or difficult depends on intrinsic difficulty, the regularity of the systematic elements of the language being learned, and similarity with languages already known.

Nation (1990) notes that the manner in which lexical items are taught can also affect the learning burden, with inappropriate techniques actually





making items harder to learn. For example, teaching two new items together initially which have formal or meaning similarities can lead to cross-association (e.g. teaching left and right together in the first instance). In addition, teaching exceptions before the underlying rule is learned can make learning that rule more difficult. For example, teaching the words reply and release before the prefix re- is mastered can make learning that prefix more difficult. (Of course, this might often be in tension with the need to teach such words relatively early to learners because their meanings are required.)

Examining the intrinsic factors in more detail, we unsurprisingly find that regularity is a facilitating characteristic. We see this in phonotactics, which refers to the phoneme/grapheme clusters which occur in a language. For instance, str occurs in English at the beginning of words but not in a word-final position. Words which contain clusters which follow the norms of a language (phonotactic regularity) will be easier to learn than those which do not. Likewise, regularity of stress placement (e.g. fixed initial stress in Finnish) will aid the recognition and production of spoken vocabulary compared to languages where the stress is variable (e.g. English: phótograph, photógraphy, photográphic). It is not difficult to see how a consistent relationship

Table 2.2 Factors which affect vocabulary learning

Facilitating factors                      Difficulty-inducing factors                Factors with no clear effect
familiar phonemes                         presence of foreign phonemes
phonotactic regularity                    phonotactic irregularity
fixed stress                              variable stress and vowel change
consistency of sound-script relationship  incongruency in sound-script relationship
-                                         -                                          word length
inflexional regularity                    inflexional complexity
derivational regularity                   derivational complexity
morphological transparency                deceptive morphological transparency
-                                         synformy
-                                         -                                          part of speech
-                                         -                                          concreteness/abstractness
generality                                specificity
register neutrality                       register restrictions
-                                         idiomaticity
one form for one meaning                  one form with several meanings

(Laufer, 1997: 154)





between sounds and their written correlates can make learning easier, as well as an affixation system which is limited and regular. Lexical items which can be used in a wide variety of contexts (generality) are less prone to error than items which can only be used appropriately in particular contexts (specificity), precisely because they have a wider range of appropriate usage: famous can be used to describe almost any person who is well-known, while notorious can be used to describe only people famous for unsavoury reasons. This is linked to register – general items usually have fewer register constraints than specific items.

Laufer (1997) suggests that idiomatic expressions (make up one’s mind) are more difficult than their non-idiomatic meaning equivalents (decide). This is undoubtedly true for idioms, which are very numerous in language as a category, but which occur relatively infrequently as individual items (Moon, 1997). The infrequency of all but a few idioms means learners have trouble obtaining enough exposure to them for much acquisition to occur. Learners also have trouble with more frequent types of idiomatic expressions, such as phrasal verbs (put off), preferring instead their one-word equivalents (postpone) (Dagut and Laufer, 1985), even in informal spoken contexts where the multi-word verbs would be more appropriate (Siyanova and Schmitt, 2007). However, idioms and phrasal verbs are only two types of idiomatic expressions. Many lexical items can have both literal and figurative meaning senses (dog = animal/very poor example of something; let off steam = release steam pressure/release mental stress), and in these cases, the idiomatic meaning may well be the most frequent (Section 2.2.2), leading to a tension between the facilitating effects of frequency and the inhibiting effects of idiomaticity.

It is interesting to note that Laufer categorizes word length, part of speech, and concreteness/abstractness as factors which have no clear effect. While this may be true of the earlier studies she reviewed (many relying on paper-and-pencil measurement), these factors do affect the results of more sensitive psycholinguistic experiments. For example, Ellis and Beaton (1993) argue that nouns are much easier to learn than verbs, possibly because they

Quote 2.1 Blum and Levenston on the attractiveness of general words

... learners will prefer words which can be generalized to use in a large number of contexts. In fact they will over-generalize such words, ignoring register restrictions and collocational restraints, falsifying relationships of hyponymy, synonymy and antonymy.

(1978: 152)





are more imageable, and thus more memorable, and perhaps also because they do not have the complex argument structures and associated thematic roles that verbs do. Likewise, de Groot (2006) argues that L2 translations of concrete L1 words are learned far better than those of abstract L1 words. The three factors are therefore included among the basic lexical characteristics which need to be controlled for in psycholinguistic experiments (see Section 2 for a fuller list):

● number of letters
● number of phonemes
● number of syllables
● common part of speech
● familiarity rating
● concreteness rating
● imageability rating
● meaningfulness rating.

2.4 Network connections (associations)

Lexical items have numerous formal and semantic connections with other items in every person’s mental lexicon. These connections can lead not only to appropriate lexical usage (e.g. being able to think of words which rhyme; being able to retrieve an appropriate synonym or collocation), but also more automaticity in using this knowledge, as a well-organized mental lexicon is thought to improve accessibility. Lexical connections are apparent in a number of types of language output. In slips-of-the-tongue (when you mean one thing but say another), the misspoken word usually has some close connection to the intended word, e.g. saying left for right, or Tuesday for Wednesday. Likewise, similar words are sometimes blended together (Aitchison, 2003: 88):

I went to Noshville (Nashville + Knoxville, Tennessee towns).

When a person has a concept in mind, but cannot remember the word for it, they often produce related words in their attempt to retrieve the word they

Quote 2.2 Scrivener on enhancing knowledge of ‘old’ words

... much of the difficulty of lexis isn’t to do with learning endless new words, it’s learning how to successfully use words one already knows, i.e. learning how ‘old’ words are used in ‘new’ ways.

(2005: 246)




http://www.eat.rl.ac.uk


this group. On the other hand, there were a large number of participants who gave idiosyncratic responses. In fact, this pattern describes very well the distribution of responses for almost any stimulus word for almost any group: a small number of responses being relatively frequent, with a larger number of responses being relatively infrequent. This pattern of communality has been demonstrated across numerous studies. For example, for Lambert and Moore’s (1966) English-speaking high school and university subjects, the primary response covered about one-third of the total responses and the primary, secondary, and tertiary responses together accounted for 50–60%. This is congruent with the 57% figure reported by Johnston (1974) when she studied the three most popular responses of 10–11-year-olds.

Associations can be analyzed according to what category they belong to. The three traditional categories are clang associations, syntagmatic associations, and paradigmatic associations. In clang associations, the response is similar in form to the stimulus word, but is not related semantically. An example is reflect-effect. The other two categories take into account the associations’ word class. Responses which have a sequential relationship to the stimulus word are called syntagmatic, and usually, but not always, have differing word classes. Examples from Table 2.3 would be adjective-noun pairs like black-magic, verb-noun pairs like eat-food, and adverb-verb pairs like slowly-walk. Responses of the same word class as the stimulus are labelled paradigmatic. Examples are verb-verb pairs like eat-drink, noun-noun pairs like house-home, and adjective-adjective pairs like black-white. While syntagmatic relationships involve the contiguity (occurring in close proximity) of words in language, paradigmatic relationships are more semantic in nature. Sometimes paradigmatic pairs are roughly synonymous (blossom-flower) and sometimes they exhibit other kinds of sense relation (deep-shallow, table-furniture).

Table 2.3 Frequent word associations

Stimulus        black        eat           house           slowly
Responses (%)   white (58)   food (45)     home (28)       quickly (24)
                brown (3)    drink (16)    garden (8)      fast (22)
                color (3)    sleep (5)     door (6)        walk (3)
                magic (3)    fat (4)       boat (4)        walking (3)
                night (3)    hunger (2)    chimney (4)     amble (2)
                belt (2)     hungry (2)    roof (4)        car (2)
                blue (2)     now (2)       flat (3)        crawling (2)
                cat (2)      quickly (2)   brick (2)       plod (2)
                death (2)    a lot (1)     building (2)    stop (2)
                bag (1)      bite (1)      bungalow (2)    surely (2)
Others (1–2%)   21           19            32              34
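The communality pattern described above (a few popular responses, many rare ones) can be checked directly against the percentages in Table 2.3. The sketch below, in Python, computes the share of the primary response and of the three most popular responses for each stimulus; the function name `communality` is an illustrative assumption, and the figures are simply those printed in the table rather than a re-analysis of the underlying norms.

```python
# Response percentages copied from Table 2.3 (ten listed responses per stimulus).
responses = {
    "black": [58, 3, 3, 3, 3, 2, 2, 2, 2, 1],
    "eat": [45, 16, 5, 4, 2, 2, 2, 2, 1, 1],
    "house": [28, 8, 6, 4, 4, 4, 3, 2, 2, 2],
    "slowly": [24, 22, 3, 3, 2, 2, 2, 2, 2, 2],
}

def communality(percentages, k=3):
    """Return (primary share, combined share of the k most popular responses)."""
    ranked = sorted(percentages, reverse=True)
    return ranked[0], sum(ranked[:k])

summary = {word: communality(p) for word, p in responses.items()}
for word, (primary, top3) in summary.items():
    print(f"{word}: primary {primary}%, top three {top3}%")
```

The top-three shares vary considerably by stimulus (from 42% for house to 66% for eat), which is a reminder that summary figures such as 50–60% describe an average tendency rather than every word.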





Analyzing the associations gives us clues about the process in which words are acquired. Aitchison (2003) lists three basic recurrent findings. First, the responses are almost always items from the semantic field of the stimulus word: in response to needle, people usually gave words related to sewing, and not nail or poker, even though these are sharp pointed objects. Second, if a stimulus word was part of an obvious pair (husband/wife) or had a clear antonym (tall/short), the partner word was usually given as the response. Third, adults usually give a response which is the same word class as the prompt word, e.g. nouns elicited noun responses.

Another recurrent finding in association studies is that responses tend to shift from being predominantly syntagmatic to being predominantly paradigmatic as a person’s language matures. There is also a decrease with age in clang associations. Quite early on, it was demonstrated that L1 children have different associations from adults (Woodrow and Lowell, 1916). Later, Ervin (1961) elicited associations from kindergarten, first-grade, third-grade, and sixth-grade students and found that as the students’ age increased, their proportion of paradigmatic responses also increased. This syntagmatic → paradigmatic shift is not exclusive to English. Sharp and Cole (1972) studied subjects who spoke Kpelle, an African language structurally different from most European languages, and found the same shift. The shift occurs at different times for different word classes, however. Research by Entwisle and her colleagues (1964, 1966) suggests that nouns are the first to shift, with adjectives next. The shift begins later for verbs and is more gradual. On the other hand, it is interesting to note that more recent research (Nissen and Henriksen, 2006) found that native informants demonstrated an overall preference for syntagmatic responses, leading those researchers to question the validity of the syntagmatic → paradigmatic shift.

What can we infer about the organization of the lexicon from such association research? The large degree of agreement in native responses suggests that the lexicons of different native speakers are organized along similar lines. If natives have a ‘normal’ or ‘preferred’ organizational pattern, then it seems reasonable that nonnatives would benefit if their lexicons were organized similarly. We do not really know how to facilitate this yet, but the fact that responses usually have either syntagmatic or paradigmatic relationships with the stimulus words suggests that these relationships might be important in vocabulary teaching and learning. As for how lexical organization changes over time, the presence of clang associations indicates that word form similarity may initially play some role in the early lexical organization of L1 children. But formal similarity is obviously a less preferred way of organizing the lexicon, as evidenced by the rapid disappearance of clang associations as learners mature. Syntagmatic relationships are next to be focused upon by the young learner, suggesting a salient aspect of language at this point is contiguity. Later, as learners sort out the word class and sense relations of the word, their associations become more meaning-based and





paradigmatic. It must be stressed that not every word passes through this progression, and as the child becomes more proficient, there will probably be no clang associations at all. Rather, the progression indicates the general evolution of lexical organization patterns as a learner’s language matures.

The linking of words based on similar concepts, senses, and relations presumably aids the acquisition of new words. The interrelations between words form categorical clusters, and once these develop, they begin to connect with other clusters creating lexicons based on shared connections (Haastrup and Henriksen, 2000). These connections facilitate the quick growth of lexicons as new words can be assimilated with known words. As more words become associated, the lexical network strengthens (Haastrup and Henriksen, 2000; see this paper also for an interesting card-sorting methodology for tapping association knowledge).

Although most of the association research has dealt with young native speakers, it has also been applied to second-language acquisition research. Meara (1980, 1983) surveyed the research available at that time and detected several traits of L2 associations. First, although L2 learners typically have smaller vocabularies than native speakers, their association responses are much less regular and often not of the type which would be given by native speakers. This is partly because L2 responses often include clang associations. It is also presumably because the organization of L2 learners’ mental lexicons is usually less advanced. Second, L2 subjects frequently misunderstand the stimulus words, leading to totally unrelated associations. Third, nonnative speakers, like L1 children, tend to produce more syntagmatic responses, while native-speaking adults tend towards paradigmatic responses. Fourth, L2 responses are relatively unstable.

A recent association study, using an enhanced methodology, found that the association profiles of natives and nonnatives were remarkably similar, although natives produced somewhat more synonyms and collocation-based associations, and nonnatives more form-based associations and those with only a loose conceptual relationship (Fitzpatrick, 2006, see also Zareva, 2007). Some previous studies attempted to show that L2 acquisition mirrors first-language acquisition in that association preferences systematically shift from syntagmatic to paradigmatic (Politzer, 1978; Söderman, 1993), but Fitzpatrick concludes that, although the nonnative response behavior did change with proficiency, there was no evidence that it became more native-like.

Quote 2.4 Fitzpatrick on word association research

It is important that future studies investigate the similarities as well as the differences between L1 and L2 response patterns, and the differences as well as the similarities within each subject group.

(2006: 144)





2.5 Frequency

2.5.1 The importance of frequency in lexical studies

As briefly introduced in Section 1.1.4, the frequency with which a word occurs in language permeates all aspects of vocabulary behavior. It is arguably the single most important characteristic of lexis that researchers must address. It is not difficult to see why, as it affects the acquisition, processing, and use of vocabulary. In terms of acquisition, frequent words are, by definition, the words most likely to be met in discourse. As a result, learners generally acquire more frequent vocabulary before less frequent lexis (e.g. Read, 1988; Schmitt, Schmitt, and Clapham, 2001). This effect is robust for both L1 and L2. Tremblay, Baayen, Derwing, and Libben (2008), using ERP (event-related potentials) methodology (Section 2.11), found that every experience of a lexical item leaves a memory trace, and that this effect holds for formulaic sequences as well as individual words: the higher the frequency of lexical bundles, the better people remembered them.

Frequent vocabulary is also processed better, and this has been demonstrated in a number of ways. For example, more frequent lexical items are correctly translated more often, faster, and with fewer errors than lower frequency items (de Groot, 1992). Ellis (2002: 152) summarizes a range of research and concludes that

‘for written language, high-frequency words are named more rapidly than low frequency ones ... , they are more rapidly judged to be words in lexical decisions tasks ..., and they are spelled more accurately ... Auditory word recognition is better for high-frequency than low frequency words ... there are strong effects of word frequency on the speed and accuracy of lexical recognition processes (speech perception, reading, object naming, and sign perception) and lexical production processes (speaking, typing, writing, and signing), in children and adults as well as in L1 and L2.’

(See SSLA 24, 2 for frequency effects on other aspects of language.)

In terms of use, frequency plays a prominent part in how lexical items are employed in discourse. For instance, more frequent words tend to have less register marking (connotation, formality), which allows them to be used in a wide variety of contexts (and thus be more frequent). Lower-frequency lexical items often have semantic and collocational constraints which limit their usage to certain contexts. For example, odd (6,162 occurrences in the 179 million-word New Longman Corpus) can be used to describe virtually anything that is not quite the usual or normal case. On the other hand, eccentric (1,014) is usually used to describe a person, or his/her behavior, and has the additional connotation of being odd in a somehow endearing manner. It also collocates with a more coherent set of nouns (behavior,





habits, millionaire, father, lady, inventor, genius). It is useful to note that even a relatively modest disparity in frequency (only about 6×) is enough to highlight the differences in contextual constraints between these two words.
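Raw counts such as those given for odd and eccentric are easier to compare with other corpora once they are normalized to occurrences per million words. The following Python sketch shows the arithmetic, using the counts and corpus size quoted in the text; the helper name `per_million` is an illustrative assumption.

```python
CORPUS_SIZE = 179_000_000  # running words in the corpus, as stated in the text

def per_million(raw_count, corpus_size=CORPUS_SIZE):
    """Convert a raw occurrence count to occurrences per million words."""
    return raw_count / corpus_size * 1_000_000

odd = per_million(6162)
eccentric = per_million(1014)
ratio = 6162 / 1014  # how many times more frequent odd is than eccentric

print(f"odd: {odd:.1f} per million")
print(f"eccentric: {eccentric:.1f} per million")
print(f"ratio: {ratio:.1f}")
```

Because both counts come from the same corpus, the ratio is the same whether it is computed from raw or normalized figures; normalization only matters when the corpora being compared differ in size.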

2.5.2 Frequency and other word knowledge aspects

In fact, frequency interrelates with most other lexical characteristics. We have seen an example of frequency/meaning/collocation interaction, but the interrelationships extend beyond this. In the realm of word form, more frequent words are shorter, and less frequent words are longer. Moreover, this relationship is systematic, following Zipf’s law. Zipf’s law states there is a relatively constant relationship between the rank of a lexical item on a frequency list and its frequency of occurrence. We can see an example of this in the word-length/frequency relationship outlined in Table 2.4, where Crystal (1987: 87) illustrates data from one of the earliest corpus studies in 1898 by Kaeding.

The table shows that for every additional syllable of word length, the number of words of that length roughly halves. Another way of looking at this is that for every increase in syllable ranking (e.g. 1→2, 2→3), the frequency of occurrence in the corpus systematically decreases by about half. A more modern word count of the BNC confirms this pattern (Leech et al., 2001: 121).

Table 2.4 The relationship between word length and frequency of occurrence for German words

Number of syllables in word   Number of word occurrences   Percentage of whole
 1                            5,426,326                    49.76
 2                            3,156,448                    28.94
 3                            1,410,494                    12.93
 4                            646,971                      5.93
 5                            187,738                      1.72
 6                            54,436                       0.50
 7                            16,993                       all remaining = 0.22
 8                            5,038
 9                            1,225
10                            461
11                            59
12                            35
13                            8
14                            2
15                            1
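The claim that each additional syllable roughly halves the number of occurrences can be checked against Table 2.4 by dividing each count by the count for words one syllable shorter. The Python sketch below does this for the first six rows (the longer words are too sparse to be informative); the function name `step_ratios` is an illustrative assumption.

```python
# Word occurrences by syllable count, copied from Table 2.4 (Kaeding's data).
occurrences = {
    1: 5_426_326, 2: 3_156_448, 3: 1_410_494,
    4: 646_971, 5: 187_738, 6: 54_436,
}

def step_ratios(counts):
    """Ratio of each band's count to that of the band one syllable shorter."""
    lengths = sorted(counts)
    return {n: counts[n] / counts[n - 1] for n in lengths[1:]}

ratios = step_ratios(occurrences)
for n, r in ratios.items():
    print(f"{n - 1} -> {n} syllables: count multiplied by {r:.2f}")
```

The resulting ratios fall between roughly 0.3 and 0.6, so "about half" is best read as an approximation that fits the shorter words more closely than the longer ones.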





Research has shown that frequency also varies according to mode. The most frequent words in both spoken and written discourse largely consist of function words, but there can be quite a difference in the relative frequencies of some content words. The Leech et al. (2001) word count lists the items that have the greatest frequency contrasts between speech and writing. Twenty of the most ‘distinctive’ spoken/written items are illustrated in Table 2.5.

Some of these words very much have a spoken or written flavor, and it is not surprising to find that yeah and okay are used more in speech, or that thus and political are used more in writing. Indeed, the frequency disparity between the modes is largely what gives these words their register distinctiveness. However, it might have been hard to predict the disparity of some of the other words, e.g. that know and mean are more frequent in speech, or that most and new are more frequent in writing. This highlights the limitations of intuition, which is not always a reliable indicator of frequency (see below for more on this).

Table 2.5 Distinctiveness list contrasting speech and writing

Word            Spoken frequency (a)   Written frequency (a)
yeah            7,890                  17
no              4,388                  230
know            5,550                  734
think           3,977                  562
mean            2,250                  198
just            3,820                  982
okay            950                    7
like            784                    7
really          1,727                  337
say             2,116                  512
want            1,776                  432
mind            246                    51
however         90                     664
thus            8                      228
while           156                    543
most            199                    607
new             603                    1,208
political       71                     333
international   38                     242
former          21                     187

(a) Per million words.
Adapted from Leech et al. (2001, List 2.4).
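One simple way to read Table 2.5 is to express each contrast as a ratio between the two per-million frequencies. The Python sketch below does this for a handful of rows; the plain ratio is used here as a simplification and is not claimed to be the statistic behind the published distinctiveness list, and the function name `mode_preference` is an illustrative assumption.

```python
# (spoken, written) frequencies per million words, copied from Table 2.5.
frequencies = {
    "yeah": (7890, 17),
    "know": (5550, 734),
    "mean": (2250, 198),
    "thus": (8, 228),
    "most": (199, 607),
    "political": (71, 333),
}

def mode_preference(spoken, written):
    """Return the favoured mode and how many times more frequent the word is there."""
    if spoken >= written:
        return "speech", spoken / written
    return "writing", written / spoken

preferences = {w: mode_preference(s, wr) for w, (s, wr) in frequencies.items()}
for word, (mode, factor) in preferences.items():
    print(f"{word}: {factor:.1f} times more frequent in {mode}")
```

Ratios of this kind become unstable when one of the two frequencies is very small, which is one reason a dedicated significance measure is usually preferred for ranking items.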





McCarthy and Carter (1997) explored the differences between spoken and written discourse, and made a number of interesting observations. First, many of the frequency discrepancies were caused by the words being part of highly frequent spoken interpersonal markers, such as you know, I think, and never mind. This emphasizes one of the weaknesses of most corpus counts: they are usually based on orthographic words and seldom capture the frequencies of formulaic language. As it is becoming increasingly clear that vocabulary is largely phrasal in nature, this bias towards single orthographic words can sometimes lead to misleading results.

They also found that synonyms can have different distributions in speech and writing. It can be argued that there are no true interchangeable synonyms in language, as it would be redundant to have two words that are used exactly the same way in exactly the same contexts (McCarthy, 1990: 16–17). Rather, they will have some differences in their context of use (different collocations, syntactic behavior, or register), and here we see that one possible difference is modal preference. To illustrate this, McCarthy and Carter (1997) compare start and begin, and too and also. In terms of meaning, it is difficult to discern a difference within the pairs, but frequency information shows differences in their mode of use (Table 2.6).

While start and too appear with similar frequency in both spoken and written contexts, begin and also are used more in written contexts. McCarthy and Carter conclude that findings like these argue for the utility of separating spoken and written corpora when examining the distribution and usage patterns of lexis.

2.5.3 L1/L2 frequency

In L2 learning situations, there are, of course, two frequencies to consider: the frequency of the L2 word and of its L1 counterpart. If these were very different, it might be a confounding factor in vocabulary research. Luckily, there is some evidence that frequency in L1 and L2 can be fairly parallel.

Table 2.6 Comparison of written and spoken frequency

Word                              Written (a)   Spoken (b)
start (c)                         232           260
begin (c)                         119           27
too (excluding too + adjective)   119           132
also                              289           107

(a) Occurrences in a 330,000 word written corpus.
(b) Occurrences in a 330,000 word spoken corpus.
(c) Lemmas.
Adapted from McCarthy and Carter (1997: 27).





De Groot (1992) compared the log frequencies of target words in a 42.5 million word Dutch written corpus with those from their equivalents in an 18.8 million word written English corpus (both part of the CELEX corpora) and found they correlated at .78. However, we need to be cautious about this finding, because the corpora used were relatively small, and because it is not clear whether such a close correlation would hold for other L1/L2 combinations which are more dissimilar than Dutch and English.
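A comparison of this kind amounts to taking the logarithm of each word’s frequency in the two corpora and computing a Pearson correlation over the translation pairs. The Python sketch below illustrates the calculation; the counts are invented for illustration and are not de Groot’s data, and the helper `pearson` is written out by hand so that the example depends only on the standard library.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical raw counts for five translation pairs (invented for illustration).
l1_counts = [12000, 3400, 560, 75, 9]
l2_counts = [9000, 5100, 300, 120, 4]

# Log-transform before correlating, since raw frequencies are heavily skewed.
log_l1 = [math.log10(c) for c in l1_counts]
log_l2 = [math.log10(c) for c in l2_counts]
r = pearson(log_l1, log_l2)
print(f"r = {r:.2f}")
```

If the two corpora differ in size, the counts would normally be converted to a common scale first, although a constant scaling factor only shifts the log values and leaves the correlation unchanged.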

2.5.4 Subjective and objective estimates of frequency

Most work on frequency has been corpus-based for obvious reasons. Corpus counts are objective and quantifiable, and computers are well suited to fast and accurate counting. Furthermore, the above discussion shows how corpus evidence can uncover lexical behavior that would be difficult to intuit. However, it is useful to not become complacent and too trusting of automatized computer counts. A frequency count is only as good as the corpus it is based upon, and every corpus has limitations.

First, no corpus can truly mirror the experience of an individual person; rather it is hopefully representative of either the language across a range of contexts (e.g. general English corpus – BNC), or of a particular segment of language (e.g. a corpus of automotive repair manuals – AUTOHALL (Milton and Hales, 1997)). Similarly, the frequency of a lexical item in a language and a person’s psychological ‘impression’ of that frequency will not necessarily always tally. Every person will have his/her own unique experience of language exposure, and will have relatively more or less exposure to particular lexical items depending on his/her environment and interests. (Pilots will be exposed to more aviation vocabulary both because they are interested in it, and because they move in circles where that vocabulary is more often used.) This means that intuitions of frequency differ from person to person, and so are an idiolectal feature, at least for all but the highest frequency items.

Second, every corpus is necessarily a compromise limited by the amount and the types of language extracts which can be collected. The number of words is usually less than the compilers would like, and some kinds of language (e.g. closed-door boardroom discussions, intimate bedtime chat, secret intelligence documents) are difficult to collect. Thus, to some extent, corpora are usually biased towards language types that are easy to collect. Third, some features (e.g. very low-frequency words, longer phrasal strings) appear so infrequently that even extremely large corpora have difficulty providing a good picture of their usage.

Given the limitations of corpus counts, in some cases it may make sense to consider the other main way of determining frequency – user intuitions. The main case in which one can envisage this being useful is in reflecting the amount of exposure that particular learners have received. The main language corpora represent the usage of language in native contexts. Unless





there is a corpus based on the L2 language available in a learner’s native environment (I am not aware of any such corpus), then learners’ intuitions of frequency may better mirror their exposure than a corpus which was compiled in countries which the learners have probably never been to, and may have little interest in. Another possibility is using ESL teachers’ frequency intuitions, as Wang and Koda (2005) note that ESL instructors’ familiarity ratings may be more appropriate for studies with ESL learners as participants than frequency counts made from texts and discourse for native English speakers.

But this raises the question of the accuracy of frequency intuitions. Earlier research tended to show that people do have reasonable frequency intuitions (e.g. Shapiro, 1969), but more recent studies show rather disappointing correlations between corpus frequency figures and figures derived from intuition elicitation (e.g. around .67, Alderson, 2007; .53–.65, Schmitt and Dunham, 1999). Furthermore, both Alderson and Schmitt and Dunham found great variability between their raters, and even Alderson’s very linguistically-aware corpus linguist judges failed to do very well. McCrostie (2007) found that native speaker intuitions were limited to differentiating between very frequent and very infrequent words, with teachers performing no better than first-year university undergraduates. However, McGee (2008) notes that it is not surprising that corpus data and intuitive data sometimes diverge, just as different corpora will often differ on relative word frequencies. His suggestion to consider both corpus- and intuition-based information as useful seems reasonable. Overall, it seems only prudent to consult corpus-based evidence if it is available, albeit while recognizing its inherent limitations (e.g. carefully considering the content of the corpus, and realizing corpus counts do not usually capture formulaic language). If one considers it appropriate to use intuition-based frequency information, or if no corpus-based frequency counts are available, Alderson suggests that the best introspection-based indicator is to average the frequency judgements of a group of raters, because of the variability of individual ratings.
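Averaging across raters is straightforward to operationalize. The Python sketch below pools hypothetical frequency ratings on a 1–7 scale from three raters and ranks the words by their mean rating; the words, the scale, and the ratings are all invented for illustration, and the function name `pooled_ratings` is an assumption rather than a procedure taken from the studies cited.

```python
from statistics import mean

# Hypothetical 1-7 frequency ratings from three raters (invented for illustration).
ratings = {
    "house": [7, 6, 7],
    "amble": [2, 4, 1],
    "plod": [1, 3, 2],
}

def pooled_ratings(ratings_by_word):
    """Average each word's ratings across raters to smooth individual variability."""
    return {word: mean(scores) for word, scores in ratings_by_word.items()}

pooled = pooled_ratings(ratings)
ranked = sorted(pooled, key=pooled.get, reverse=True)
print(ranked)
```

In this toy data the second rater places amble above plod while the third reverses the order; the pooled means settle such disagreements, which is the rationale for using a group of raters rather than any single judge.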

2.5.5 Frequency levels

As we have seen, frequency is an absolutely crucial factor to consider in vocabulary research. But how is the broad range of frequency to be classified? After all, the frequency range in English extends from the most frequent word the, which occurs 61,847 times per million words (Leech et al., 2001), to a word like persnickety, which might occur only once in many millions. The most common distinction is high- versus low-frequency words. High-frequency words are the most basic and essential words in a language. Although it is obvious that these words will be necessary for almost any communicative purpose, there is no fixed upper limit to this category, as the general rule ‘The more vocabulary, the better’ applies. However, by convention, the first 2,000 items (words? lemmas? word families? See Section 5.2.1) have generally been accepted as high-frequency vocabulary. This figure partly comes from the GSL, which includes about this many headwords, and from research by Schonell, Meddleton, and Shaw (1956), which showed that 2,000 families covered around 99% of the spoken language they studied. As a rule of thumb, the most frequent 2,000 items will make up about 80% of an average written text. However, the first 1,000 and second 1,000 do disproportionate amounts of the work, with the first 1,000 typically covering about 70–75%, and the second 1,000 adding only around another 5–8% (Nation, 2001).

Low-frequency vocabulary has been conceptualized in widely varying ways. Sometimes it has been defined as all words beyond the ‘2,000+ academic vocabulary’ level, especially in studies which have used Paul Nation’s Vocabulary Profiler, which classifies vocabulary into four categories: first 1,000, second 1,000, academic vocabulary, and all other words. Other studies consider words beyond the suspiciously round number of 10,000 as low frequency, based partly on the fact that this is the Vocabulary Levels Test’s highest level, and on Hazenberg and Hulstijn’s (1996) finding that around 10,000 word families would provide the lexical resources for university study in Dutch.

However, these traditional frequency levels have been called into question by recent research by Nation (2006). His analysis estimates that it actually requires some 6,000–7,000 word families to operate in a spoken English environment, and about 8,000–9,000 families for a written one. If these figures hold up, they will force a reappraisal of vocabulary levels. Low-frequency vocabulary should probably be thought of as vocabulary beyond the 8,000–9,000 word families needed for wide reading in English. For if a person knows enough vocabulary to read widely, and can also use this vocabulary productively, he or she should have the lexical resources to be successful in most language use situations. Any vocabulary beyond this is a luxury and clearly not essential. However, if 6,000 word families or more are necessary to speak English, then it is difficult to maintain that high-frequency vocabulary stops at 2,000 families. On the other hand, words much beyond this level drop off in frequency quite rapidly.

Quote 2.5 Nation and Hwang on low-frequency vocabulary

Low frequency vocabulary consists of words that occur with low frequency over a range of texts, that are so rare that low frequency is inevitably related to narrow range, or that are the technical vocabulary of other subjects (one person’s technical vocabulary is another person’s low frequency vocabulary!).

(1995: 37)





It seems that we need a new category which can bridge the gap between the highest frequency vocabulary and the amount that is required for language use. This mid-frequency vocabulary category (2,000 to 8,000–9,000 level) is important, especially in terms of pedagogy. The classical advice has been to explicitly teach and learn the first 2,000 items, and to more-or-less disregard low-frequency vocabulary, because it does not occur often enough to justify the time and effort to learn it. The vocabulary in between could be acquired through exposure, especially extensive reading, and the skilled use of learner strategies (e.g. Nation, 1990). This advice made sense when the field believed that 2,000 items allowed verbal usage, and 5,000 items written usage, as there was not such a great number to be independently acquired through exposure and strategies. However, if the learning target is 6,000–9,000 word families, it is clearly not realistic for learners to acquire the lexis beyond the 2,000 level without a great deal of help. Thus, Nation’s newer figures would suggest that all of the partners involved in the learning process (learners, teachers, materials writers, and researchers) will have to focus attention on mid-frequency vocabulary in order to help learners acquire a large enough vocabulary to be able to use language without a lack of lexis being a problem.
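The three-way banding proposed above can be expressed as a simple lookup on an item’s frequency rank. The sketch below is illustrative only: the function name is hypothetical, and fixing the upper edge of the mid-frequency band at 9,000 (the text gives a range of 8,000–9,000) is an assumption that a researcher would adjust to suit the counting unit and purpose of a study.

```python
def frequency_band(rank, high_limit=2000, mid_limit=9000):
    """Classify a frequency rank (1 = most frequent item) as high, mid, or low.

    high_limit follows the conventional 2,000 cut-off; mid_limit is an
    assumed value taken from the top of the 8,000-9,000 range.
    """
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    if rank <= high_limit:
        return "high"
    if rank <= mid_limit:
        return "mid"
    return "low"

for rank in (1, 2000, 2001, 9000, 9001):
    print(rank, frequency_band(rank))
```

Keeping the two limits as parameters makes it easy to rerun an analysis with, say, an 8,000 ceiling and see how sensitive the results are to the choice of boundary.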

Beyond the broad frequency bands of high-, mid-, and low-frequency vocabulary, lexis is also often described by 1,000 band levels, especially as newer technology has made this finer-grained analysis easier. To illustrate such an analysis, Table 2.7 gives frequency figures from the Lextutor website (<http://www.lextutor.ca>) for extracts from four different types of text. The first is a Level 1 graded reader called Inspector Logan (MacAndrew, 2002). The second is an award-winning novel (Spies, Frayn, 2002), while the third contains news stories from the British Observer newspaper (September 7, 2008). The fourth type of text is an academic journal article published in Applied Linguistics (Conklin and Schmitt, 2008).

The detailed K1-K20 analysis shows clear differences between the graded reader meant for L2 learners and the other three texts meant for native speakers. As might be expected, a very high percentage (87.68%) of the text in the beginning level graded reader is made up of K1 and K2 words. Moreover, almost all of the off-list words are proper names from the story. The novel, newspaper, and article have similar frequency distributions to each other, with the K1 vocabulary making up 73–78% of the text, and the cumulative K1-K5 vocabulary making up very close to 92% for all three texts. This shows that although the vocabulary levels for different texts will vary to some extent, the typical overall frequency distribution is always likely to be evident.
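The cumulative figures quoted above can be recomputed directly from the token counts in Table 2.7. The following sketch copies the K1–K5 counts and the totals from that table; the function and variable names are hypothetical, and rounding to two decimal places is simply chosen to match the table’s presentation.

```python
# Token counts for K1-K5, copied from Table 2.7.
tokens = {
    "Graded reader": [1679, 71, 34, 4, 3],
    "Novel": [2822, 269, 133, 63, 37],
    "Newspaper": [2486, 331, 109, 55, 33],
    "Journal": [4971, 548, 187, 502, 51],
}
# Total running words in each extract, also from Table 2.7.
totals = {"Graded reader": 1996, "Novel": 3616, "Newspaper": 3274, "Journal": 6787}

def cumulative_coverage(counts, total):
    """Return cumulative coverage (%) after each successive 1,000-word band."""
    coverage, running = [], 0
    for count in counts:
        running += count
        coverage.append(round(100 * running / total, 2))
    return coverage

for text_type, counts in tokens.items():
    cum = cumulative_coverage(counts, totals[text_type])
    print(f"{text_type}: K1-K2 = {cum[1]}%, K1-K5 = {cum[4]}%")
```

Working from token counts rather than from the already rounded percentages avoids small rounding discrepancies when bands are summed.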

2.5.6 Obtaining frequency information

The discussions in this section highlight the importance of considering the effects of frequency in vocabulary research. There are a number of sources of frequency information available. These corpora, concordancing tools, and vocabulary lists are described in detail in the Resources Sections 6.2, 6.3, and 6.4.

2.6 L1 influence on vocabulary learning

As noted in the section above, many of the factors that affect learning difficulty involve the relative similarity/dissimilarity between a learner’s L1 and the target L2. This includes formal aspects like phonemes, graphemes, the suprasegmental system of pitch, stress, and juncture, and the degree of sound/symbol correspondence. Morphological aspects are also important, including inflexional and derivational regularity and complexity, as well as how transparent morphological transformations are. In addition, languages can conceptualize real-world phenomena in different ways, and how similarly two languages parse a concept’s ‘semantic space’ has an effect on learning burden.

Table 2.7 Lextutor 1,000–20,000 frequency profile of four text types

Tokens (coverage %)

| Frequency level | Graded reader | Novel | Newspaper | Journal |
| --- | --- | --- | --- | --- |
| K1 | 1679 (84.12) | 2822 (78.04) | 2486 (75.93) | 4971 (73.24) |
| K2 | 71 (3.56) | 269 (7.44) | 331 (10.11) | 548 (8.07) |
| K3 | 34 (1.70) | 133 (3.68) | 109 (3.33) | 187 (2.76) |
| K4 | 4 (0.20) | 63 (1.74) | 55 (1.68) | 502 (7.40) |
| K5 | 3 (0.15) | 37 (1.02) | 33 (1.01) | 51 (0.75) |
| K6 | | 35 (0.97) | 26 (0.79) | 58 (0.85) |
| K7 | 5 (0.25) | 27 (0.75) | 17 (0.52) | 42 (0.62) |
| K8 | 2 (0.10) | 16 (0.44) | 9 (0.27) | 63 (0.93) |
| K9 | | 16 (0.44) | 11 (0.34) | 15 (0.22) |
| K10 | | 12 (0.33) | 9 (0.27) | 11 (0.16) |
| K11 | | 14 (0.39) | 10 (0.31) | 40 (0.59) |
| K12 | | 4 (0.11) | 2 (0.06) | 13 (0.19) |
| K13 | | 11 (0.30) | 3 (0.09) | 4 (0.06) |
| K14 | | 4 (0.11) | 1 (0.03) | 12 (0.18) |
| K15 | | 4 (0.11) | | |
| K16 | | 1 (0.03) | 3 (0.09) | |
| K17 | | 3 (0.08) | 3 (0.09) | 5 (0.07) |
| K18 | | 1 (0.03)* | | 1 (0.01) |
| K19 | | 1 (0.03)* | | 5 (0.07) |
| K20 | | 1 (0.03)* | | |
| Off-List | 198 (9.92) | 143 (3.95) | 165 (5.04) | 259 (3.82) |
| Total | 1996 (100) | 3616 (100) | 3274 (100) | 6787 (100) |

* In the source layout this entry could belong to either the Novel or the Newspaper column; the column totals imply that two of the three marked entries are Novel tokens and one is a Newspaper token.

A number of researchers have studied the role of formal similarity in second language learning. Ellis and Beaton (1993) found that the match of L1 and L2 phonological features (i.e. the ease of pronunciation of the L2 words) had a major influence on L2 vocabulary learnability. Likewise, de Groot (2006: 466) concludes that ‘words with a “cognate” translation in the FL [foreign language] (where the FL word to be learned is orthographically and phonologically similar to its L1 equivalent) were learned far better than those with a noncognate translation’. This effect stems not only from the relative similarity/dissimilarity of the forms in the L1 and L2, but also from the way those forms are processed. In her research, Koda (e.g. 1997) finds that learners often transfer their L1 processing routines over to the L2 in their attempt to process the L2 forms, whether those routines are appropriate for the L2 form system or not. This means that L1-L2 formal dissimilarity has a potential double negative effect: both the forms themselves and the underlying routines for processing those forms can be different to varying degrees.

The congruence of the morphological systems in the L1 and L2 can also make lexical items easier or more difficult to learn. L2 lexical items with features such as irregularity of plural, gender of inanimate nouns, and noun cases are intrinsically more difficult to learn than items with no such complexity, but this difficulty is exacerbated if there are no corresponding features in the L1 from which the learner can draw analogies.

While most second-language vocabulary learning involves attaching an L2 lexical item onto an already known concept (relabelling), there are often times where the L2 conceptualizes that concept differently than in the L1. Cases of this are illustrated in Table 2.8, where the semantic boundaries of tree, wood, and forest are parsed quite differently in four languages. There are four concepts to be realized (tree, the building material derived from trees, a small group of trees standing together, and a larger group of trees). None of the four languages has a different word for each of these concepts, and Danish uses only two words. From this comparison, all other things being equal (which they never are), one would expect that the congruency of the French and English categorizations should facilitate the learning of the words by speakers of the other. Conversely, Danish speakers would find it more difficult learning the French or English words, because in addition to learning the word forms, they also have to learn to make semantic distinctions which do not exist in their own language.

Concept 2.2 Crosslinguistic influence

Under the influence of the behaviorist school, most L1 influence was once considered negative, making the learning of an L2 more difficult. The L1 influence was often referred to as negative transfer. However, it was eventually realized that L1 influence can aid, as well as hinder, L2 learning. A new term, crosslinguistic influence, was developed to denote this neutral view of L1 influence on L2 learning. In terms of L2 lexical acquisition, the nature of the L1 influence (whether positive or negative) largely depends on whether there are congruent cognate items in the L1 and L2.

Even where the semantic space is parsed in basically the same way in two languages, there are often other differences. Translation ‘equivalents’ may have different types of register marking, such as formality/informality, being technical/non-technical vocabulary, or being more frequent in speech or writing. There is also a good chance they will have different collocations (German: Opfer bringen [bring sacrifice]; English: make a sacrifice). Thus, just as it can be argued that there are no exact synonyms in a language (McCarthy, 1990), there may be few or no translation equivalents that are truly the same.

It is impossible to discuss L1 lexical influence without mentioning cognates. Although de Groot (2006) uses the term cognate above to mean words with orthographical and phonological similarity between L1 and L2, the term more commonly refers to lexical items in different languages which are similar because they have descended from a single lexical parent in a common ancestor language, e.g. English five, Latin quinque, and Greek pénte all evolved from the common Indo-European *penkwe (McArthur, 1992: 229). Research has shown that cognates can be helpful for second-language learners, but learners are also tripped up by false friends (unrelated words which look as though they are cognate but are not in fact: English actual = real or true; Spanish actual = current; French actuel = current). Moreover, cognates can range from having virtually the same meaning (French somptueux = English sumptuous) to being partially deceptive (French expérience = English experience and experiment) to being totally deceptive (French actuel ≠ English actual) (Granger, 1993). Furthermore, even if the meaning of cognates is the same across two languages, other characteristics may not be, following the ‘no translation equivalent’ argument above. Therefore researchers must be cautious when using cognates in their research, taking care to establish the degree of relationship between the cognate items.

Table 2.8 Parsing the semantic space of tree, wood, and forest

| English | French | Danish | Swedish |
| --- | --- | --- | --- |
| tree | arbre | træ | träd |
| wood (material) | bois | træ | trä |
| wood (small forest) | bois | skov | skog |
| forest | forêt | skov | skog |

(Swan, 1997: 158)

Jarvis (2000) surveys the sometimes confusing research into lexical transfer from the L1, and concludes that many of the discrepancies have to do with fuzzy definitions and methodology. He proposes a methodological framework for the study of L1 influence, which begins with a theory-neutral definition of L1 influence that can be empirically tested in a consistent manner: ‘L1 influence refers to any instance of learner data where a statistically significant correlation (or probability-based relation) is shown to exist between some feature of learners’ IL [interlanguage] performance and their L1 background’ (p. 252). He then lists three potential L1 effects that must be considered in a rigorous investigation of transfer. The first is intra-L1-group homogeneity in learners’ interlanguage performance (i.e. when learners who speak the same L1 behave in a uniform manner when using the L2). The second is inter-L1-group heterogeneity in learners’ interlanguage performance (i.e. when comparable learners of a common L2 who speak different L1s diverge in their L2 performance). The third is the congruence between learners’ L1 and interlanguage performance, where learners’ use of some L2 feature can be shown to parallel their use of a corresponding L1 feature. As part of the methodology, Jarvis also provides a list of outside variables that should ideally be controlled. Consideration of the elements in Jarvis’ methodological framework would go some way towards making L1 transfer research more rigorous and comparable.

In sum, it is worthwhile considering the congruence of the L2 target lexical items with the L1s of the participants in a study. Of course, if the participants are from a single L1, then it is more feasible to consider the various crosslinguistic factors discussed above, and these factors may be an important part of the interpretation and discussion of results. If the participant pool consists of mixed L1s, then the above factors may help to explain any differential results between the various L1 groups. With the L1 wielding such a strong influence on second-language vocabulary acquisition (and potentially processing), it is important to control for it as much as possible in a research design and to consider its effects in the interpretation of study results.

Quote 2.6 Granger on the usefulness of cognates in L2 language learning

Cognates are both an aid and a barrier to successful L2 vocabulary development. Teachers should therefore seek to find a happy medium between over-reliance on cognates and near-pathological mistrust of them, two attitudes which are equally detrimental to learners’ vocabulary development.

(1993: 55)

2.7 Describing different types of vocabulary

Each lexical item is uniquely suited to its purpose. For example, nice is a very common modifier that can describe a wide variety of nouns, and no other word can replace it in all of those contexts with exactly the same meaning. Likewise, scalpel is a very particular type of medical knife, and its restricted meaning helps to ensure there is no confusion about which kind of cutting utensil a surgeon is asking for. Have a nice day is a frequent greeting, and its use identifies the speaker as a friendly North American. While each of these items has equal intrinsic merit within its own context of use, it is still obvious that they are different kinds of vocabulary. Nice is a high-frequency word, scalpel is a technical medical term, and Have a nice day is a formulaic sequence which is used only in informal spoken discourse. In other words, it is possible to classify these diverse lexical items into different categories. It is often useful to narrow the broad notion of vocabulary down into smaller, more manageable (and identifiable) classifications in vocabulary research, with the following being some of the more common distinctions:

● word class (e.g. nouns, verbs)
● content and function words
● frequency (e.g. high-frequency vocabulary)
● written and spoken vocabulary
● formulaic sequences
● general vocabulary
● technical vocabulary
● academic vocabulary.

The first four distinctions have been covered already in previous sections, and formulaic language is such a huge topic that it merits its own upcoming chapter (Chapter 3), so this section will focus on the last three distinctions.

General vocabulary

Although general vocabulary is not really a technical term, it is sometimes used to describe the higher-frequency vocabulary necessary to achieve a basic functionality with a language. There are no agreed limits for which vocabulary this might include, as the notion itself is rather vague. The term is most often used in connection with discussions of the roughly 2,000 words in the General Service List (GSL), so named because West (1953) and colleagues² wished to create a general service list of words, rather than one for any specific set of purposes. The criteria used to compile the GSL include the following:

1. Word frequency
2. Structural value (all structural words included)
3. Universality (words likely to cause offence locally excluded)
4. Subject range (no specialist items)
5. Definition words (for dictionary-making etc.)
6. Word-building capacity
7. Style (‘colloquial’ or slang words excluded)

(Howatt, 2004: 289)

The list makes it clear that the GSL is not a frequency list, although frequency was a key factor. However, it was moderated by a number of other factors designed to ensure that the words would be useful in a pedagogical context, such as including words which could be used to define a large number of other words. West also wanted to exclude words whose function could be covered by other words in the list, which resulted in the inclusion of some low-frequency words and the exclusion of some high-frequency words. In fact, the selection criteria largely revolved around the purpose of creating simple reading materials, which was an interest for many of the scholars working on the list. This is one reason why the GSL is limited as a list of general English words. For example, it tends to neglect vocabulary from colloquial spoken English, and lacks many of the words necessary for everyday situations (Howatt, 2004).

For these reasons, it is necessary to be cautious in using the GSL as a representative of general English vocabulary. It is now very old: it was published in its final form in 1953, but the vocabulary selection began in 1935. There are some obvious examples in the GSL of ageing words which have lost much of their importance over the years (plough, crown), while some important newer words are not included (the GSL includes telegraph, but not television or computer). On the other hand, the most frequent and important words in English tend not to change much over time, and the majority of GSL words are still essential in English (e.g. a random dip into the first ten ‘D’ words shows they have all retained their usefulness: damage, damp, dance, danger, dare, dark, date, daughter, day, dead). The coverage of the GSL is still around 75% of the running words in non-fiction texts, and around 90% of the running words in fiction (Nation and Hwang, 1995). There have been several attempts to revise the GSL (see Section 6.4), and these may prove more useful to many researchers than the original. In the end, a researcher must look at the age and selection criteria of the GSL (or its revisions) and decide if it is an appropriate indicator of general vocabulary for their own purposes. If the research purpose is pedagogical, the GSL may still be of value, but if the researcher needs frequency information, a modern word count (e.g. Leech et al., 2001) will almost certainly prove more suitable.





Technical vocabulary

Technical words or phrases are those which are recognizably specific to a particular field. They range from items which are unique to the field and do not occur elsewhere (computing: pixel) to items that have the same form as high-frequency items but specialized meanings within a field (computing: memory). Technical items are reasonably common within a field, but not so common elsewhere, and differ from subject area to subject area. Technical vocabulary is essential to understanding discourse in a field, and can cover 10% or more of the running words in a text from that field (Sutarsyah, Nation, and Kennedy, 1994). There may even be formal technical terms and informal technical equivalents.

There are two main ways of identifying the technical vocabulary in a field. The first is through the intuitions of experts in the field. The results of this method will depend on the knowledge of the experts, how systematic they are in producing their technical lists, and how obvious the technical vocabulary in the field is to identify. The resulting lists from this method can be highly variable, as different experts may have quite different ideas of what the key vocabulary of a field is. However, if a large number of experts are consulted, and their consensus taken, the resulting list can be a useful indication of the technical vocabulary in a field. I would venture that this method is used in the compilation of most technical dictionaries.

The other approach is analyzing a corpus of technical discourse, and extracting the technical vocabulary. This usually involves first creating a frequency list from the corpus and then eliminating the high-frequency words which will be common to all subject areas. Formerly, the GSL was used for this deletion, but now it is probably better to use current frequency counts. Next, the researcher looks for items which have a wide range (occur across many texts) and reasonable frequency of occurrence. Range is important to ensure that items that are very common only to a single author or small subfield are not included on the final list. The AWL is a good example of this methodology (Coxhead, 2000).
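The corpus-based procedure just described (build a frequency list, delete general high-frequency words, then filter by range and frequency) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than any published method: the function name, the toy corpus, the general-word set, and the two thresholds are all hypothetical, and a real study would work with lemmas or word families and a proper high-frequency list.

```python
from collections import Counter

def technical_candidates(texts, general_words, min_range=2, min_freq=3):
    """Return candidate technical words from a list of tokenized texts.

    texts: one list of tokens per text in the specialist corpus.
    general_words: high-frequency words to delete (e.g. from a general list).
    min_range / min_freq: assumed thresholds, to be tuned for a real corpus.
    """
    freq = Counter()        # total occurrences of each word in the corpus
    text_range = Counter()  # number of texts in which each word occurs
    for tokens in texts:
        lowered = [t.lower() for t in tokens]
        freq.update(lowered)
        text_range.update(set(lowered))
    return sorted(
        w for w in freq
        if w not in general_words
        and text_range[w] >= min_range
        and freq[w] >= min_freq
    )

# Toy corpus: "zorb" is frequent but confined to a single text.
corpus = [
    "the pixel is a pixel and the zorb zorb zorb zorb".split(),
    "a pixel in the bitmap".split(),
    "the bitmap and the bitmap".split(),
]
general = {"the", "is", "a", "and", "in"}
print(technical_candidates(corpus, general))
```

The invented item zorb shows why range matters: it is the most frequent non-general word in the toy corpus, but because it appears in only one text it is excluded, just as an author-specific term would be.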

Typically, the technical vocabulary of a field is fairly restricted, with Nation (2001) reporting that technical dictionaries will usually contain about 1,000 entries. However, not all technical vocabulary is the same. There are degrees of ‘technicalness’ depending on how restricted a word is to a particular area. Nation (2001: 198–199) gives four categories of technical vocabulary, along with examples from the field of Applied Linguistics for each:

● Category 1. The word form appears rarely, if at all, outside this particular field (morpheme, hapax legomena, lemma)
● Category 2. The word form is used both inside and outside this particular field, but not with the same meaning (sense, reference, type, token)
● Category 3. The word form is used both inside and outside this particular field, but the majority of its uses with a particular meaning, though not all, are in this field. The specialized meaning it has in this field is readily accessible through its meaning outside the field (range, frequency)
● Category 4. The word form is more common in this field than elsewhere. There is little or no specialization of meaning, though someone knowledgeable in the field would have a more precise idea of its meaning (word, meaning).

Technical vocabulary is usually learned in the course of the study of a particular field, and may be easier to learn than non-technical vocabulary. Cohen, Glasman, Rosenbaum-Cohen, Ferrara, and Fine (1988) found that non-technical vocabulary poses more difficulty to EFL learners than technical vocabulary, since technical vocabulary has fixed meanings which can be more easily learned. Moreover, these terms are often defined in the content classroom. For example, Flowerdew (1992) found that a term was defined about every two minutes on average in a foundation science course given to Omani students attending an English-medium university. For native speakers of Romance languages, English technical vocabulary may be relatively easy to learn simply because it often derives from Latin origins.

Academic vocabulary

Academic texts contain high-frequency vocabulary, and technical vocabulary pertinent to the field in question. However, they also contain a considerable amount of non-high-frequency vocabulary which is common across academic disciplines. This vocabulary is necessary to express ideas in various disciplines, such as insert, orient, ratio, and technique. This ‘support’ vocabulary is usually termed academic vocabulary. Typically these words make up about 9–10% of the running words in an academic text, and so are very important for people learning or working in academic areas.

Early lists of academic vocabulary were compiled by either manually extracting words from small academic corpora, or by noting which words students annotated in their textbooks. Results from four early studies into academic vocabulary were combined into the University Word List (UWL) (Xue and Nation, 1984), which contained over 800 words and covered 8.5% of words in academic texts. This was a big improvement over the component studies, but suffered from the fact that it was an amalgam of existing lists, and so lacked consistent selection principles.

Coxhead (2000) took advantage of computing power to create a new word list from scratch. She first collected a large and diverse corpus of academic texts which totalled 3.5 million words (70,377 types) from 414 academic texts written by more than 400 authors, balanced across four broad areas of about 875,000 words each: arts, commerce, law, and science. Each of these areas was further broken down into seven subject areas, e.g. Arts = Education, History, Linguistics, Philosophy, Politics, Psychology, and Sociology. Coxhead then created a frequency list from the corpus and eliminated the GSL words. Words were selected from the remaining list on the basis of range (occurring at least 10 times in each of the four broad areas, and in 15 or more of the 28 subject areas) and frequency (occurring at least 100 times in the academic corpus). The resulting Academic Word List (AWL) contains 570 word families and covers 10% of the academic corpus. Thus, the AWL has fewer words than the UWL, but has greater coverage. The AWL is the best list of academic vocabulary currently available, and is widely used in vocabulary research. It has also inspired a wide range of pedagogic materials, including textbooks from several publishers. However, it is also important to recognize the limits of the AWL. Hyland and Tse (2007) found that although the AWL covered 10.6% of their 3.3 million word academic corpus (different from Coxhead’s), the individual items on the list occurred and behaved in different ways across the various subject areas in terms of range, frequency, collocation, and meaning. They argue that this varied usage undermines the notion of an academic vocabulary which is general in nature, and suggest focusing on the more restricted academic vocabulary which occurs in more contextualized, discipline-specific environments.
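The range and frequency thresholds reported above for the AWL can be written out as a single check on pre-computed counts. The sketch below is only an approximation for illustration: the function name, the data layout, and the example counts are hypothetical, and it tests a single item rather than a whole word family, which is the unit the AWL actually uses.

```python
def meets_awl_style_criteria(word, counts_by_area, counts_by_subject, gsl_words,
                             min_per_area=10, min_subjects=15, min_total=100):
    """Apply the thresholds described in the text to one item's counts.

    counts_by_area: occurrences in each of the four broad areas.
    counts_by_subject: occurrences in each of the 28 subject areas.
    gsl_words: general high-frequency words, excluded outright.
    """
    if word in gsl_words:
        return False
    in_every_area = all(c >= min_per_area for c in counts_by_area.values())
    subjects_present = sum(1 for c in counts_by_subject.values() if c > 0)
    total = sum(counts_by_area.values())
    return in_every_area and subjects_present >= min_subjects and total >= min_total

# Invented counts for one item: 115 occurrences spread over 16 of 28 subjects.
areas = {"arts": 30, "commerce": 25, "law": 40, "science": 20}
subjects = {
    f"subject_{i:02d}": (7 if i < 15 else 10 if i == 15 else 0)
    for i in range(28)
}
print(meets_awl_style_criteria("ratio", areas, subjects, gsl_words={"the", "of"}))
```

Expressing the three conditions separately makes it straightforward to see which criterion an item fails, for example a word that is frequent overall but falls below 10 occurrences in one broad area.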

The full AWL can be accessed from Coxhead (2000), as well as a number of websites (Section 6.5), many of which include pedagogic tools and material based on the list.

Quote 2.7 Coxhead on the uses of the AWL

An academic word list should play a crucial role in setting vocabulary goals for language courses, guiding learners in their independent study, and informing course and material designers in selecting texts and developing learning activities.

(2000: 214)

2.8 Receptive and productive mastery

From the discussion so far, we see that vocabulary knowledge is multi-faceted, and contains a number of interrelated, though separable, aspects. The word-knowledge framework helps to illustrate the range of these vocabulary knowledge aspects, but as Meara and Wolter (2004) point out, its comprehensiveness is also its weakness. It is virtually impossible to measure all of the word-knowledge aspects for words for at least three reasons. The first is that many of the word knowledge aspects do not have accepted methods of measurement. While there are numerous formats for measuring a word’s meaning, anyone attempting to measure a person’s intuitions about that word’s register characteristics, for example, will have to develop their own new methodology. A second reason has to do with time. A test battery that measured all of the word-knowledge aspects for words would be extremely unwieldy and time consuming. Although such a battery might be practical in a research context, where one has the luxury of focusing on target words for an extended period of time, it would be totally impractical for any kind of pedagogical purpose. It would simply take too long, and only a very limited number of words could ever be covered. A third reason is related to the difficulty of controlling for cross-test effects. The various types of word knowledge are interrelated, and so organizing the different word-knowledge tests in a test battery so that answers to one do not affect the others is not always straightforward.

For these reasons, many pedagogic and research purposes require a much simpler conceptualization of vocabulary depth of knowledge. One of the most common is the distinction between receptive and productive knowledge (sometimes referred to as passive and active mastery). This dichotomy has great ecological validity, as virtually every language teacher will have experience of learners understanding lexical items when listening or reading, but not being able to produce those items in their speech or writing. Unsurprisingly, studies have generally shown that learners are able to demonstrate more receptive than productive knowledge, but the exact relationship between the two is less than clear. Melka (1997) surveyed several studies which claim the difference is rather small; one estimates that 92% of receptive vocabulary is known productively. Takala (1984) suggests the figure may be even higher. Other studies suggest that there is a major gap between the two: Laufer (2005a) found that only 16% of receptive vocabulary was known productively at the 5,000 frequency level, and 35% at the 2,000 level. Other studies conclude that around one-half to three-quarters of receptive vocabulary is known productively (Fan, 2000; Laufer and Paribakht, 1998).

The inconsistency of these figures highlights the difficulties and confusion involved in dealing with the receptive/productive issue. One problem is the lack of an accepted conceptualization of what receptive and productive mastery of vocabulary entails. The second problem concerns measurement issues, where the productive/receptive results are highly dependent on the types of tests used (Laufer and Goldstein, 2004).

Because measurement problems often stem from an unclear conceptualization of the construct to be measured, let us look at the theoretical issue first. The distinction between receptive and productive mastery has been incorporated into theoretical accounts of vocabulary knowledge in various ways. For example, Henriksen (1999) lists three components of vocabulary knowledge, of which receptive/productive mastery is one:

1. partial → precise knowledge of word meaning
2. depth of knowledge of the different word knowledge aspects
3. receptive knowledge → productive knowledge.





While descriptions like this are useful in pointing out that receptive/productive mastery is an important component of overall vocabulary knowledge, they do not actually tell us much about the receptive/productive relationship itself (although Henriksen does hypothesize about this relationship later in her article). Melka (1997) suggests that receptive and productive mastery lie on a continuum, and that knowledge gradually shifts from receptive mastery towards productive mastery as more is learned about the lexical item. If this is true, Read (2000) notes that the problem lies in determining the threshold where receptive mastery turns into productive mastery in this incremental process. He poses the essential question: ‘Is there a certain minimum amount of word knowledge that is required before productive use is possible?’ (p. 154). To date there has been little research to inform on this key issue. Most research has compared the ratios between receptive and productive vocabulary, but very little has explored the type and amount of lexical knowledge necessary to enable productive use of lexical items. From a word-knowledge perspective, the minimum would appear to be a form-meaning link, with the learner being capable of producing either the verbal or written form.

Meara (1997) proposes a different possibility, that the move from receptive to productive mastery is the result of a fundamental change in the way a lexical item is integrated into the mental lexicon. Rather than treating them as the ends of a continuum of mastery, he takes a lexical organization perspective and suggests that receptive and productive vocabulary may reflect differing types of connection between lexical items. He wonders whether productively-known lexical items are those which can be activated by their links to other items in a lexical network. Thus when lexical items connected to a ‘productive’ item become active, it somehow ‘lights up’ the item, and it becomes accessible for the person to use. Conversely, receptively-known items have no ‘incoming’ links from the lexicon, and so cannot be recalled unless activated by some outside stimulus. This is shown in Figure 2.2, where the word (W) is at a receptive state. When it is read or heard and understood, it can be linked to the rest of the mental lexicon (L) and used. However, the lexicon itself cannot activate the word, because there are no links in that direction, and so the word is not at a productive state, as it cannot be activated by other words in the lexicon.

As there is no natural progression from a receptive to a productive state in this view, it has the potential to explain how students can learn some words productively with very little input over a short period of time (e.g. as in studies using word lists). It can also explain why words sometimes seem to be known productively and at other times do not: if the words in the lexicon connected to the item are activated, the item will be accessible. However, if the particular words connected to the item are not activated, even though other parts of the lexicon are, the item will not be recallable. On the assumption that not all lexical items in the lexicon are connected to each other, then productive mastery is largely a matter of the number of other words the item is connected to, since a greater number of total connections provide a greater chance of a connection to an active word in the lexicon. This view has parallels with connectionism, and measurement of the strength of productive mastery would presumably require determining the relative number of links to other members of the lexicon. (See Section 2.10 for matrix models of acquisition based on this lexical organization perspective.)
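The lexical organization view above can be made concrete with a very small directed network. The following is a minimal sketch under my own assumptions, not Meara's implementation: the dictionary layout, the function names, and the toy words are all hypothetical. An item counts as accessible for production only if at least one currently active item has a link pointing to it.

```python
def incoming_links(links, item):
    """Return the items that have a directed link pointing at `item`."""
    return {source for source, targets in links.items() if item in targets}


def is_accessible(links, item, active_items):
    """True if at least one currently active item links to `item`."""
    return bool(incoming_links(links, item) & set(active_items))


# Toy lexicon: each key links TO the items in its set (hypothetical data).
links = {
    "dog": {"cat", "bark"},
    "cat": {"dog"},
    "bark": {"dog"},
    "hound": {"dog"},   # links into the lexicon, but nothing links back to it
}

print(incoming_links(links, "hound"))           # set()
print(is_accessible(links, "dog", {"cat"}))     # True
print(is_accessible(links, "dog", {"tree"}))    # False
print(len(incoming_links(links, "dog")))        # 3
```

In this toy network 'hound' is in a receptive state, since it has no incoming links, while 'dog' is productive but recallable only when one of the items linked to it happens to be active. Counting incoming links, as in the last line, is one rough way to operationalize the 'relative number of links' mentioned above.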

Figure 2.2 Receptive knowledge in a lexical organization framework (Meara, 1997: 120). [Diagram nodes: W, L]

Thinking of vocabulary knowledge from a word-knowledge perspective makes receptive/productive knowledge even more complex, as it is obvious that learners do not acquire all of the word knowledge components in a uniform manner. Rather, if one follows the ‘continuum’ metaphor, then each of the word-knowledge aspects will lie at various points upon a receptive-productive cline at any point in time. This was certainly the case for me in Japan, where I had a pretty good productive mastery of the spoken form and meaning of many Japanese words, but only a tenuous receptive mastery of a very few of the Chinese written characters. Extrapolating from a range of vocabulary studies, it appears that some word-knowledge aspects will reach a productive level of mastery sooner than others. For example, I found that my advanced English learners generally could produce the spelling of the base form of target words, but often could not produce some of that word’s derivative forms and meaning senses (Schmitt, 1998a). In general, one would expect that the ‘contextual’ word-knowledge aspects, like collocation and register, are especially likely to lag behind in reaching a productive state, as this type of knowledge requires a great deal of exposure to acquire. However, little is known about the relative progression along the receptive/productive continuum for the various word-knowledge aspects, as research into this topic requires both a multi-component approach and receptive and productive measurement, and few studies have studied vocabulary acquisition in this level of detail. (For an exception, see a discussion of Webb (2005) below.)

Highlighting the problem of measurement, Read (2000) points out how receptive and productive vocabulary has been measured in different ways.

Discrete/selective/context-independent test formats tend to focus on recognition and recall. Recognition is when ‘test-takers are presented with the target word and are asked to show that they understand its meaning, whereas in the case of recall they are provided with some stimulus designed to elicit the target word from memory’ (2000: 155). This is often tested with L1-L2 translations. For recognition, the L2 form would be presented and its meaning shown with an L1 translation. Conversely, recall involves an L1 stimulus prompting the meaning, and then the participant remembering and producing the L2 form. In these translations, meaning is related to the L1 item, and so assumed to be fully and automatically known. Thus these translations do not really test meaning; it is already in place through the L1. What is being measured is the form-meaning link of the L2 item. To measure this, the translations focus on the L2 form (the meaning is already known). Thus in recognition, the key is recognizing the form of the L2 item, while in recall, it is the ability to recall and produce the L2 form.

Read contrasts recall and recognition with comprehension and use, which are more typically used for embedded/comprehensive/context-dependent vocabulary. For example, comprehension could entail learners reading a passage and then being tested on how well they understood the words in the text. Use could be measured by analyzing the vocabulary produced in a task designed to elicit target lexical items (e.g. describing a picture).

These four ways of measuring the mastery of vocabulary have often been confounded and used interchangeably. The problem is that different measures lead to different scores, which has resulted in inconsistent research findings, which then do less than they should to clarify receptive/productive issues. Waring (1999) found that scores of receptive and productive vocabulary varied considerably depending on the type of measurement used. In extreme cases, a difficult receptive test format could even lead to lower scores than a relatively easy productive one. It obviously would be extremely useful for the field to develop more consistent ways of measuring and reporting receptive and productive mastery.

Quote 2.8 Waring on receptive and productive mastery of vocabulary

The notions of Receptive and Productive vocabulary are part of the folklore surrounding vocabulary acquisition. The distinction between them is rarely questioned. One major hurdle that the researcher interested in Receptive and Productive vocabulary must overcome and tiptoe through is the definition, description and categorization of these notions we have come to blithely accept as a ‘given’ ... Rarely do we see researchers or theorists working within pedagogy or language acquisition get down the nitty gritty of what is actually meant by Receptive vocabulary and by Productive vocabulary or even the relationship between the two. These notions on closer examination are extremely difficult to pin down, despite the average teacher and language researcher being able to come up with a ‘good enough’ definition or description.

(1999: Section 1.1)

One step in this direction for measuring the form-meaning link is Laufer and Goldstein’s (2004) work in developing a computer-adaptive test of vocabulary knowledge (CATSS). They developed a categorization of vocabulary knowledge, based on the relationships between supplying the form for a given meaning versus supplying the meaning for a given form, and being able to recall versus only being able to recognize (whether form or meaning). This results in four possible degrees of form-meaning knowledge (note that the authors prefer the terms active and passive to productive and receptive):

                                  Recall                   Recognition
Active (retrieval of form)        1. Supply the L2 word    3. Select the L2 word
Passive (retrieval of meaning)    2. Supply the L1 word    4. Select the L1 word

(Laufer and Goldstein, 2004: 407)

The item format involves various permutations of L1/L2 translations. They are illustrated below in tests for German-speaking learners of English (German hund = English dog).

1. Active recall: hund  d_____
Participants have to provide the word form. The first letter is given to minimize the suppliance of other English words with the same meaning.

2. Passive recall: dog  h_____
Participants are required to provide the L1 equivalent, which demonstrates understanding of the meaning of the L2 word.

3. Active recognition: hund
a. cat b. dog c. mouse d. bird
Laufer and Goldstein argue that recognizing word form once meaning is known can be considered as active. This is debatable, and using that term for this test format is probably something of a misnomer. However, the particular terminology used is not so important. What matters is that Laufer and Goldstein have given this particular meaning-form test format a standard descriptor which can be consistently used.

4. Passive recognition: dog
a. katze b. hund c. maus d. vogel
Here the L2 word is given, and the participant must recognize its form and select the L1 word with the same meaning.

Laufer and Goldstein tested 435 high school and university L2 learners with all four formats. They found evidence that the four categories form a very reliable hierarchy of difficulty, at least for the higher-level students they studied (> = more difficult than):

active recall > passive recall > active recognition > passive recognition

However, the results of Laufer, Elder, Hill, and Congdon (2004) distinguished three rather than four different modalities, as the results for the two recognition categories were not significantly different from one another. The authors suggested that picking the correct definition of a word may not necessarily be easier than choosing the word form that matches a given set of definitions. Also, it must be noted that most of the students studied were relatively high-level, and it remains to be demonstrated that the implicational scaling of the categories also works with lower-level students.

However, despite these limitations, Laufer and colleagues have provided four categories of meaning-form link, and have gone some way towards empirically demonstrating their relative placement within a hierarchy. Their research studied Hebrew and Arabic learners of English in Israel (Laufer and Goldstein, 2004), and intermediate to advanced nonnatives in New Zealand and Australia (Laufer et al., 2004), so the hierarchy seems to work with a variety of learners. As expected, the production of the L2 word from an L1 meaning prompt (active recall) is the most difficult test format, and presumably represents the highest degree of form-meaning knowledge strength. Likewise the multiple-choice recognition of meaning (passive recognition) appears to be the easiest, and represents the minimum form-meaning strength.
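To make the idea of implicational scaling concrete, the hierarchy can be read as a prediction about response patterns: a learner who succeeds on a harder format should also succeed on every easier one. The sketch below checks a single learner's results for one word against that prediction. It is an illustration only; the function name and data layout are hypothetical, and it is not the statistical procedure used by Laufer and colleagues.

```python
# Formats ordered from most to least difficult, as reported in the text.
HIERARCHY = [
    "active recall",
    "passive recall",
    "active recognition",
    "passive recognition",
]


def fits_hierarchy(results):
    """True if passing any format implies passing every easier format.

    `results` maps each format name to True (correct) or False (incorrect).
    """
    passed_harder = False
    for fmt in HIERARCHY:
        if passed_harder and not results[fmt]:
            return False          # failed an easier format after passing a harder one
        passed_harder = passed_harder or results[fmt]
    return True


consistent = {"active recall": False, "passive recall": True,
              "active recognition": True, "passive recognition": True}
inconsistent = {"active recall": True, "passive recall": False,
                "active recognition": True, "passive recognition": True}

print(fits_hierarchy(consistent))    # True
print(fits_hierarchy(inconsistent))  # False
```

Counting how many learner-by-word patterns violate the ordering gives a rough sense of how well a data set scales, although a proper analysis would use an established scalability statistic.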

The advantage of using these categories is the avoidance of the type of confusion Read (2000) points out in his discussion of receptive/productive mastery. The categories are worth using when possible in order to clearly state what aspects of receptive/productive knowledge are being tapped into, and to make form-meaning research more comparable across studies. Even if a decision is made not to use such categories, the underlying distinctions between recognition and recall, and meaning and form, need to be taken on board.

However, Laufer and Goldstein’s categories do have a limitation. The translation tasks would not work well in mixed L1 participant groups, where it would be infeasible to make a separate test version for each L1, and even then the different test versions may not be equivalent. Of course, there are other methods for measuring the form-meaning link besides using the L1, and these could be substituted for translation tasks.

Laufer and Goldstein’s distinctions are important, but I find that their terminology tends to confuse both my students and myself. I think the distinctions could be more useful if they carried more transparent descriptions. To do this, let us look at the four distinctions in detail, to see what is given, and what is being elicited in tests. Active recall refers to the case when the meaning is given and the L2 form must be produced. Active recognition is when meaning is given and the form must be recognized (i.e. usually selected from a number of options). In passive recall, the form is given, and the meaning must be produced. Finally, passive recognition refers to when the form is given, and the meaning must be recognized (again, usually from options). It seems to me that what is being addressed here is not so much an active/passive distinction, but rather which elements of word-knowledge are given and which are being elicited. Focusing on form and meaning, we come up with a relabelled version of Laufer and Goldstein’s table:

Word knowledge    Word knowledge tested
given             Recall                                 Recognition

Meaning           Form recall                            Form recognition
                  (supply the L2 item)                   (select the L2 item)

Form              Meaning recall                         Meaning recognition
                  (supply definition/L1 translation,     (select definition/L1 translation,
                  etc.)                                  etc.)

Of course, this is the same table as before, but with terms which I feel are much more transparent. Because of the transparency, I advocate the use of form recall, form recognition, meaning recall, and meaning recognition to cover Laufer and Goldstein’s categories. If we match these labels with the relevant test formats, the construct being measured is much more obvious, both in terms of what aspect is required, and the degree of mastery (recall versus recognition):





1. Form recall: hund  d_____
2. Meaning recall: dog  h_____
3. Form recognition: hund  a. cat b. dog c. mouse d. bird
4. Meaning recognition: dog  a. katze b. hund c. maus d. vogel
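The relabelling can be summarized as a mapping from two facts about a test item (which element is given, and whether the response is recalled or recognized) to a label in either terminology. The short sketch below is merely an illustration of that mapping; the class and function names are hypothetical.

```python
from typing import NamedTuple


class ItemFormat(NamedTuple):
    given: str      # "meaning" or "form": the element supplied to the test-taker
    response: str   # "recall" or "recognition"


def transparent_label(fmt):
    """Label by what is elicited: form recall, meaning recognition, etc."""
    elicited = "form" if fmt.given == "meaning" else "meaning"
    return f"{elicited} {fmt.response}"


def laufer_goldstein_label(fmt):
    """Label in Laufer and Goldstein's active/passive terms."""
    voice = "active" if fmt.given == "meaning" else "passive"
    return f"{voice} {fmt.response}"


for fmt in (ItemFormat("meaning", "recall"), ItemFormat("form", "recall"),
            ItemFormat("meaning", "recognition"), ItemFormat("form", "recognition")):
    print(laufer_goldstein_label(fmt), "=", transparent_label(fmt))
```

Running the loop prints the four correspondences, for example that active recall is form recall and passive recognition is meaning recognition.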

It is also interesting to consider how these terms fit with the more general notions of receptive and productive knowledge. The above terms cover permutations of the form-meaning link, but of course there is much potential vocabulary knowledge beyond this link. A person will presumably use all of the various vocabulary knowledge they possess for a lexical item when engaging with it in real life, i.e. recognizing and understanding vocabulary when listening or reading, or recalling and producing vocabulary when speaking or writing. These skills-based usages of vocabulary are what is commonly thought of when using the terms receptive and productive, and it makes sense to reserve them for this purpose. Thus, receptive knowledge entails knowing a lexical item well enough to extract communicative value from speech or writing. Productive knowledge involves knowing a lexical item well enough to produce it when it is needed to encode communicative content in speech or writing. Following this reasoning, receptive/productive knowledge of vocabulary would be usage-based definitions of mastery, and would presumably need to be measured with skill-based instruments.

Conversely, the form-meaning levels of mastery discussed above can be measured in isolation, as they are in Laufer and Goldstein’s test formats. Even so, form recall and meaning recall can be related to skills-based productive knowledge and receptive knowledge, respectively, and presumably are a necessary component of this knowledge. We might consider meaning recall as the first step along the road to receptive mastery, and form recall as the first step in productive mastery. We do not yet understand the process, but one could speculate that incremental vocabulary acquisition might proceed something like this: a learner would first establish meaning recall, then perhaps begin building up other aspects of vocabulary knowledge (e.g. grammatical or morphological) which would facilitate receptive recognition of the word when listening or reading. Eventually, the learner might achieve a form recall level of knowledge, but would require more time to fill in the ‘contextualized’ elements of word knowledge (e.g. collocation, register) to a point where the lexical item could be confidently used in an appropriate manner in a variety of spoken and written contexts.

On the other hand, form recognition and meaning recognition probably only come into play in reference look-up situations in the real world. If a learner looks up a concept in a thesaurus or reference book like the Longman Language Activator (1993), this might be considered form recognition. Likewise, when a learner looks up a polysemous word in a dictionary, and considers the various meaning senses, this resembles meaning recognition. However, this does not happen in interpersonal communication, as people are not given a choice of options of form or meaning. Rather they are expected to have the form-meaning link established at meaning recall at least. Thus, form recognition and meaning recognition levels of knowledge are useful in measuring the initial stages of vocabulary acquisition, but have limited utility in describing usage-based receptive and productive mastery.

Although not using terminology in the way I advocate above, Webb (2005) provides an exceptional example of a test of vocabulary depth, which provided a very rich description of the amount and type of receptive and productive vocabulary acquisition which took place. Sixty-six Japanese EFL learners tried to learn ten target nonwords.3 Two tasks were contrasted: (1) reading three example sentences with an attached L1 translation for the target words, and (2) receiving the L1 translation and writing a sentence using the words. There was an immediate test to measure learning, but the point of interest here is its comprehensiveness. It contained ten components measuring both receptive/productive mastery and a number of word knowledge aspects:4

Test 1: Receptive knowledge of orthography
Test 2: Productive knowledge of orthography
Test 3: Receptive knowledge of meaning and form
Test 4: Productive knowledge of meaning and form
Test 5: Receptive knowledge of grammatical functions
Test 6: Productive knowledge of grammatical functions
Test 7: Receptive knowledge of syntax
Test 8: Productive knowledge of syntax
Test 9: Receptive knowledge of association
Test 10: Productive knowledge of association.

Both tasks led to considerable learning in a short time. When the same amount of time was spent on both tasks, the reading task was superior, but when the allotted time on tasks depended on the amount of time needed for completion (with the writing task requiring more time), the writing task was more effective. Moreover, it seems that productive learning is superior to receptive learning, not only in developing productive knowledge, but also in producing larger gains in receptive knowledge. In terms of classroom practice, it appears that writing a sentence (productive task) might be better than reading three sentences (receptive task) for facilitating both receptive and productive knowledge, as long as adequate time is available.





In sum, future research needs to be clearer about what facets of recognition/recall and receptive/productive mastery are being addressed in a study. There is a good case for using both receptive and productive measures of vocabulary in lexical research when possible. If, for practical or other reasons, a researcher chooses to use only a receptive or productive measure, then it should be made plain in the interpretation and discussion what this does (and does not) tell us about overall vocabulary knowledge. A great many vocabulary studies have used a meaning recognition test format. With this being the case, the strength of the vocabulary acquisition and knowledge they report lies at the very beginning stages of learning in terms of depth of knowledge, and probably in terms of automaticity and lexical integration as well. However, this is seldom discussed, with studies typically referring to the target items as being ‘learned’. While this is true in the sense of the form-meaning link being established in an initial manner, it is necessary not only to discuss the learning that has occurred, but also to discuss the strength of that learning.

Quote 2.9 Webb on measuring vocabulary knowledge more comprehensively

The experiments [in his 2005 study] also highlight the importance of using multiple tests to measure vocabulary gains. Many vocabulary acquisition studies have measured only one aspect of knowledge – meaning – with only one test. Experiment 1 showed that no significant differences would have been found between the groups if only a receptive measure of meaning had been used. However, there were significant differences on four of the five productive tests and one of the receptive tests; this indicates that using only receptive or productive tests to measure learning might provide misleading results. Using receptive and productive tests to measure an aspect of knowledge and testing multiple aspects of knowledge may give a much more accurate assessment of the degree and type of learning that has occurred.

(2005: 504)

2.9 Vocabulary learning strategies/self-regulating behavior

Research into the area of language strategies began in earnest in the 1970s as part of the movement away from a predominantly teaching-oriented perspective, to one which included interest in how the actions of learners might affect their acquisition of language. By the 1990s, a number of studies on vocabulary learning strategies had been carried out. These studies showed that many learners do use strategies for learning vocabulary, probably because learning individual lexical items is more manageable than strategically tackling larger, more holistic elements of language proficiency like the four skills or grammatical knowledge. Chamot (1987) found that high school ESL learners reported more strategy use for vocabulary learning than for any other language learning activity, including listening comprehension, oral presentation, and social communication.

A number of large scale studies identified the most frequently used vocabulary learning strategies, although the individual strategies they included in their surveys tended to vary considerably. For example, Table 2.9 shows the top 10 most frequently used strategies from two of the larger studies (Gu and Johnson, 1996, N = 850; Schmitt, 1997, N = 600).

Table 2.9 Top 10 vocabulary strategies of L2 English learners

Gu and Johnson (1996)                                      M [a]
 1. Beliefs: Learn vocabulary and put it to use            5.74
 2. Dictionaries: Use for comprehension                    4.97
 3. Beliefs: Acquire vocabulary in context                 4.94
 4. Dictionaries: Use extended dictionary strategies       4.82
 5. Guessing strategies: Use wider context                 4.60
 6. Metacognitive: Self initiation                         4.58
 7. Dictionaries: Looking up strategies                    4.55
 8. Guessing strategies: Use immediate context             4.47
 9. Note taking strategies: Usage-oriented note taking     4.27
10. Metacognitive: Selective attention                     4.23

Schmitt (1997)                                             % [b]
 1. Bilingual dictionary                                   85
 2. Verbal repetition                                      76
 3. Written repetition                                     76
 4. Study the spelling                                     74
 5. Guess from textual context                             74
 6. Ask classmates for meaning                             73
 7. Say new word aloud when studying                       69
 8. Take notes in class                                    64
 9. Study the sound of a word                              60
10. Word lists                                             54

[a] 1 = the strategy/belief was extremely unlikely to be used/believed. 7 = the strategy/belief was extremely likely to be used/believed.
[b] Percentage of respondents reporting that they used the strategy.

Other studies focused on the category of strategies which were used. An example of this approach is Fan (2003), illustrated in Table 2.10.

Most studies (like the three illustrated above) tended to look at vocabulary learning strategies as discrete phenomena, but this approach fell out of favor for reasons discussed by Tseng, Dörnyei, and Schmitt (2006). The first relates to the diverse conceptualizations of ‘learning strategies’. There has never been coherent agreement on the defining criteria for language learning strategies, including whether they should be regarded as either observable behaviors or inner mental operations, or both. This is evident in Gu and Johnson’s list in Table 2.9, where beliefs are mixed together with learning strategies. With this definitional confusion, it was difficult to confidently distinguish strategic learning from ‘ordinary’ learning.

There is also the argument that an activity becomes strategic when it is particularly appropriate for the individual learner, in contrast to general learning activities which a student may find less helpful. Accordingly, learners engage in strategic learning if they exert purposeful effort to select, and then pursue, learning procedures that they believe will increase their individual learning effectiveness. This, however, means that learning strategies conceptualized in this vein can only be defined relative to a particular person, because a specific learning activity may be strategic for one learner and non-strategic for another. In other words, it is not what learners do that makes them strategic learners, but rather the fact that they put creative effort into trying to improve their own learning. This is an important shift from focusing on the product – the actual techniques employed – to the self-regulatory process itself and the specific learner capacity underlying it.

The second problem noted by Tseng et al. concerns the measurement of strategies. Strategy use has typically been measured by self-report question-naires in the past, since strategic learning is driven by mental processes that do not often lend themselves to direct observation and, therefore, for an accurate assessment of the extent of their functioning, we need to draw on the learners’ own accounts. The self-report questionnaires were based on the

Table 2.10 Mean scores in frequency of use by nine categories

Category          M (a)

Guessing          3.54
Known words (b)   3.51
Analysis (c)      3.25
Dictionary        3.22
Sources (d)       3.07
Repetition (c)    3.04
Grouping (c)      2.54
Association (c)   2.51
Management (e)    2.51

a 1 = never use, 5 = very often use.
b Using known words as part of learning, e.g. revisiting words recently learned.
c Different strategies for establishing meaning.
d Replaces the social/affective category.
e Metacognitive strategies.





assumption that strategy use and strategic learning are related to an underlying trait because items ask respondents to generalize their actions across situations rather than referencing singular and specific learning events (Winne and Perry, 2000). However, in practice most questionnaire scales do not load on a single trait. To illustrate this, let us consider the ‘Motivated Strategies for Learning Questionnaire’ (MSLQ), developed at the University of Michigan by Paul Pintrich and his colleagues (Pintrich, Smith, Garcia, and McKeachie, 1991). The MSLQ is aimed at college students and, as the name of the instrument indicates, the items cover two broad areas, motivation and learning strategies. The Learning Strategies category includes 50 items, each using a seven-point Likert scale anchored by ‘Not at all true of me’ (1) and ‘Very true of me’ (7), and is divided into two sections: (a) ‘Cognitive and metacognitive strategies’, which includes subscales labelled rehearsal, elaboration, organization, critical thinking, and metacognitive self-regulation; and (b) ‘Resource management strategies’, which includes the subscales of time and study environment, effort regulation, peer learning, and help seeking. All these subscales are cumulative in the sense that composite subscale scores are formed by computing the means of the individual item scores in a subscale.

Let us compare this to the ‘Strategy Inventory for Language Learning’ (SILL) (Oxford, 1990), a frequently used instrument for assessing general language learning strategy use; the points made will hold true for many vocabulary strategy studies as well. The SILL consists of six scales, including ‘Remembering more effectively’ (memory strategies) and ‘Using your mental processes’ (cognitive strategies). Scale scores are obtained, similarly to the MSLQ, by computing the average of the item scores within a scale. The SILL items all involve five-point rating scales ranging from ‘Never or almost never true of me’ to ‘Always or almost always true of me’. At first sight, these scales are similar to the scales used in the MSLQ discussed above, but a closer look reveals two fundamental differences. First, although both scale types use the term ‘true of me’, the MSLQ scales range from ‘not at all’ to ‘very’ whereas the SILL scales range from ‘never or almost never’ to ‘always or almost always’. Second, the items themselves are of a different nature. The items in the MSLQ are general declarations or conditional relations focusing on general and prominent facets of the learning process (i.e. when doing this ... I try to ...). On the other hand, the SILL items are more specific, each one more or less corresponding to a language learning strategy.

These two differences, however, result in a major difference in the psychometric character of the two inventories. The items in the MSLQ scale tap into general trends and inclinations and can therefore be assumed to be in a linear relationship with some corresponding underlying learner trait. This is further enhanced by the rating scales asking about the extent of the correspondence between the item and the learner, answered by marking a point on a continuum between ‘not at all’ and ‘very’. Thus, every attempt
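The scoring rule just described can be sketched in a few lines of Python. The item labels, the responses, and the item-to-subscale mapping below are invented for illustration and are not actual MSLQ items or scoring keys; the only element taken from the text is that a composite subscale score is the mean of the item scores on the seven-point scale.

```python
from statistics import mean

# Hypothetical responses of one student on a seven-point scale
# (1 = 'Not at all true of me', 7 = 'Very true of me').
# The item labels are illustrative only, not real MSLQ item codes.
responses = {"reh1": 5, "reh2": 6, "reh3": 4, "reh4": 5,
             "elab1": 3, "elab2": 4, "elab3": 2}

# Hypothetical mapping of items to two of the subscales named in the text.
subscales = {"rehearsal": ["reh1", "reh2", "reh3", "reh4"],
             "elaboration": ["elab1", "elab2", "elab3"]}

def subscale_scores(responses, subscales):
    """Return the mean item score for each subscale."""
    return {name: mean(responses[item] for item in items)
            for name, items in subscales.items()}

scores = subscale_scores(responses, subscales)
print(scores)
```

Averaging in this way only makes sense if every item in a subscale can be read as a graded indicator of the same underlying trait, which is the assumption at issue in this section.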





has been made to make the items cumulative, which is why the scale scores can be formulated by pooling all the scale items. The SILL, on the other hand, focuses on specific strategic behaviors and the scale descriptors indicate frequencies of strategy use (ranging from ‘never’ to ‘always’). These items are, therefore, behavioral items, which means that we cannot assume a linear relationship between the individual item scores and the total scale scores; for example, one can be a good memory strategy user in general while scoring low on some of the items in the memory scale (e.g. acting out a new word or using flashcards). Thus, the scales in the SILL are not cumulative and computing mean scale scores is not justifiable psychometrically.

To illustrate the problem in broad terms, a high score on the SILL is achieved by a learner using as many different strategies as possible. Therefore, it is largely the quantity that matters. This is in contradiction with more recent learning strategy theory, which has indicated clearly that in strategy use it is not the quantity but the quality of the strategies that is important (e.g. the point above about ‘appropriateness’ as a critical feature of learning strategies). At one extreme, one can go a long way by using only one strategy that perfectly suits the learner’s personality and learning style; and even if someone uses several strategies, it does not necessarily mean that the person is an able strategy user because, as Ehrman, Leaver, and Oxford (2003: 315) have found, ‘less able learners often use strategies in a random, unconnected, and uncontrolled manner’. Such qualitative aspects, however, are not addressed by the SILL, or by vocabulary questionnaires which focus on frequency of use.

The conceptual fuzziness and the inadequacy of the psychometric instruments that have been developed to measure the capacity of strategic learning have driven a conceptual shift towards a notion of self-regulation (Tseng et al., 2006) drawn from the field of educational psychology. Rather than focusing on the outcomes of strategic learning (i.e. the actual strategies and techniques the learners apply to enhance their own learning), this conceptual approach highlights the importance of the learners’ innate self-regulatory capacity that fuels their efforts to search for and then apply personalized strategic learning mechanisms. That is, in line with contemporary theories of self-regulation in educational psychology (e.g. Zeidner, Boekaerts, and Pintrich, 2000), the approach targets the core learner difference that distinguishes self-regulated learners from their peers who do not engage in strategic learning.

Concept 2.3 Structural equation modelling (SEM)

Structural equation modelling is a modern multivariate statistical technique that allows a set of relationships to be examined simultaneously. It is useful for determining the relationships between a large number of variables. A model of these relationships is first hypothesized by the researcher, and SEM can either support or refute this model, allowing for further refinement, until the most parsimonious explanation of the data is arrived at.
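For readers who find equations helpful, one widely used way of writing such a model is sketched below. The notation is the conventional matrix form found in SEM textbooks and is an assumption here; it is not taken from the studies discussed in this section. The first two equations are the measurement part, relating observed indicators to latent variables, and the third is the structural part, relating the latent variables to one another.

```latex
% A general SEM in conventional matrix notation (notation assumed, not from the source).
% x, y       : observed indicators of exogenous and endogenous latent variables
% \xi, \eta  : exogenous and endogenous latent variables
% \Lambda    : factor loadings;  B, \Gamma : structural path coefficients
% \delta, \varepsilon, \zeta : error and disturbance terms
\begin{align}
  x    &= \Lambda_x \, \xi + \delta \\
  y    &= \Lambda_y \, \eta + \varepsilon \\
  \eta &= B \, \eta + \Gamma \, \xi + \zeta
\end{align}
```

Supporting or refuting a hypothesized model then amounts to checking how well the covariances implied by these equations reproduce the covariances observed in the data.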





Tseng et al. (2006) used a structural equation modelling (SEM) approach in an attempt to describe what self-regulation entails in terms of vocabulary learning. They developed a model (illustrated in Figure 2.3) which suggests that this self-regulation consists of five facets:

● Commitment control, which helps to preserve or increase the learners’ original goal commitment (e.g. keeping in mind favorable expectations or positive incentives and rewards; focusing on what would happen if the original intention failed).
● Metacognitive control, which involves the monitoring and controlling of concentration, and the curtailing of any unnecessary procrastination (e.g. identifying recurring distractions and developing defensive routines; focusing on the first steps to take when getting down to an activity).
● Satiation control, which helps to eliminate boredom and to add extra attraction or interest to the task (e.g. adding a twist to the task; using one’s fantasy to liven up the task).
● Emotion control, which concerns the management of disruptive emotional states or moods, and the generation of emotions that will be conducive to implementing one’s intentions (e.g. self-encouragement; using relaxation and meditation techniques).
● Environmental control, which helps to eliminate negative environmental influences and to exploit positive environmental influences by making the environment an ally in the pursuit of a difficult goal (e.g. eliminating distractions; asking friends to help and not to allow one to do something).

The SEM approach was taken a step further by Tseng and Schmitt (2008), who, building upon the self-regulation model, developed an enhanced model for the vocabulary learning process as a whole. It takes a process-oriented perspective, operationalized as the process whereby strategic behaviors are instigated, sustained, and evaluated, drawing on work by Dörnyei (2001a, 2001b, 2005) on the stages of motivation. The model is given in Figure 2.4, with the six latent variables labelled. Interested readers are referred to the article for a complete explanation of the model, including a description of the various facets (or more technically, indicators) making up each latent variable. For example, the variable Self-Regulating Capacity in Vocabulary Learning (SRCvoc) is made up of the five facets illustrated in Figure 2.3.

The model indicates that the vocabulary learning process is cyclical in nature. It starts with an Initial Appraisal of Vocabulary Learning Experience, which is conceptualized as the initial motivational level of vocabulary learning, which can be indicated by value, interest, effort, or desire. This affects a learner’s Self-Regulating Capacity in Vocabulary Learning. The current view of the nature of self-regulating capacity is that it is an aptitude which is developable, i.e. it can change incrementally with experience and





instruction, and the model indicates it is dependent on the instigation of the initial appraisal of vocabulary learning experience, with its related motivational state.

Self-regulating capacity in turn drives the use of vocabulary strategies. However, in this model, strategic behavior is divided into two components: Strategic Vocabulary Learning Involvement (SVLI) and Mastery of Vocabulary Learning Tactics (MVLT). The former refers to a ‘quantity’ dimension of strategy use, which concerns effortful covert or overt acts to discover or improve the effectiveness of particular tactics. (We used the term tactic to avoid the baggage associated with the term strategy, and the two are essentially interchangeable.) It entails the overall involvement with vocabulary learning and the attempts made to pursue it. This includes several elements: how frequently a learner is involved in vocabulary learning behaviors, the range of vocabulary learning behaviors a learner is involved with, and having a metacognitive awareness of how to best enhance the effectiveness of vocabulary learning tactics. One might think of SVLI as a learner’s general experience with, and understanding of, their vocabulary learning behaviors.

The latter refers to the ‘quality’ dimension of strategy use, which concerns mastering specific or special covert or overt learning methods to acquire vocabulary knowledge. This mastery dimension is about using specific vocabulary learning behaviors effectively. Reaching the mastery level entails developing an awareness of what learning tactics to use and when

[Figure 2.3: path diagram linking self-regulatory capacity in vocabulary learning to five facets (commitment control, metacognitive control, satiation control, emotion control, environment control); the path coefficients shown are 0.88, 0.85, 0.84, 0.88, and 0.69.]

Figure 2.3 A structural equation model of self-regulatory capacity in vocabulary learning.

(Tseng et al., 2006: 93).





and how to use them effectively. The model indicates that having a wide range of vocabulary learning involvement and experience helps organize a learner’s strategic options and helps learners gain mastery over the learning tactics that prove useful, i.e. the repeated appropriate usage of tactics (as governed by SVLI) eventually also leads to mastery over those tactics.

The skilled and appropriate use of strategies/tactics directly leads to increased Vocabulary Knowledge, which is indicated by both size and depth components.

After a learning experience, it is only natural for a learner to think about how well they have done. This period of self-reflection of task processes when the task is completed is represented in the model by Postappraisal of Vocabulary Learning Tactics. Dörnyei (2001b: 91, emphasis in original) argues that this phase is very important in that such a ‘critical retrospection contributes

[Figure 2.4: path diagram of the six latent variables with their indicators, error terms, and path coefficients.]

Figure 2.4 A structural equation model of motivated vocabulary learning

IAVLE = initial appraisal of vocabulary learning experience
SRCvoc = self-regulating capacity in vocabulary learning
SVLI = strategic vocabulary learning involvement
MVLT = mastery of vocabulary learning tactics
VOCkno = vocabulary knowledge
PAVLT = postappraisal of vocabulary learning tactics.





significantly to accumulated experience, and allows the learner to elaborate his or her internal standards and the repertoire of action specific strategies’. In particular, it has been found that learners’ causal attributions as a result of task retrospection exert a critical influence on subsequent expectancy for success, self-efficacy belief, achievement behaviors, and emotional responses (Dörnyei, 2001b). Hence, it seems that not only does initial motivational state influence the processes of task performance, but also a retrospection of task performance is likely to in turn influence this state. Thus in the model, the term ‘initial motivational state’ should be understood as the current motivational state in the subsequent recursive stages of the evaluation process.

This model is certainly not comprehensive, but it does take into account some of the recent thinking on the dynamic role of motivation in language learning (Dörnyei, 2001a, 2001b). Motivation appears to be involved in all stages of learning (instigating, sustaining, and evaluating), thus permeating the whole process. Another aspect taken into consideration is the necessity for the learners to self-regulate their learning. Learners need to understand the way they learn best and be proactive in pursuing methods of learning that are effective for themselves. Much of the value of the model is that it begins to show the relationship between a number of learning-based variables, and a number of implications seem supportable. First, the vocabulary learning process is systematic and cyclic in nature. Second, initial motivation and self-regulation both have important parts to play in the vocabulary learning process. Third, metacognitive control of vocabulary learning tactics is necessary for efficient learning. Finally, postlearning evaluation is important to the learning process. Overall, such a dynamic, integrated perspective of lexical strategic behavior is a step forward from viewing strategies as independent learner behaviors, just as thinking of vocabulary in terms of an integrated lexicon is a step closer to reality than thinking of it as a bunch of independent lexical units.

2.10 Computer simulations of vocabulary

It is a common observation that there is currently no overall theory of vocabulary acquisition (e.g. Nation, 1995; Read, 2000; Schmitt, 2000). This is perhaps not surprising given the complexity of vocabulary knowledge, the large number of lexical items that have to be learned, and the diversity of those items. Of course there have been many theories limited to how specific aspects of lexical knowledge are acquired or used. Below are just a few:

● the fast mapping of initial meaning (Carey, 1978)
● the initial establishment of a form-meaning link by attaching the L2 form to an existing L1 meaning (‘parasitic model’) (Barcroft, 2002; Hall, 2002; Jiang, 2002)
● prototype theories of concept categorization (Aitchison, 2003)
● Levelt’s (1989) view of the lexicon in listening comprehension and speech production
● the dynamic model of the multilingual mental lexicon (de Bot, Lowrie, and Verspoor, 2005)
● frequency/exposure and pattern-extraction-based models of vocabulary acquisition (Ellis, 2002)
● the DEVLEX model of the growth of L1 lexicons (Li, Farkas, and MacWhinney, 2004).

It is beyond the scope of a methodology book such as this to discuss and evaluate the various theories. What is of interest are the tools which allow researchers to explore the lexical features and properties which these theories are trying to capture. One of these tools is the computerized simulation of vocabulary learning and processing. While there are some quite complex simulations of vocabulary learning (e.g. the DEVLEX model), Meara (2006) notes that a truly comprehensive model of the mental lexicon is extremely difficult to build. Even a small lexicon will have 2,000 or more words, and each of these will have multiple links to the others (orthographic, phonological, morphological, semantic, grammatical, collocational, etc.). Modelling this complexity in any realistic way is verging on the impossible. However, it is possible to use much simpler models, which are easier both to understand and to manipulate when exploring the nature of small lexicons. Although relatively basic, they can be useful in suggesting how real, more complex, mental lexicons behave.

Meara (2004, 2005, 2006) has been at the forefront of developing these basic models of lexicon behaviour. He conceptualizes the lexicon as a Random Autonomous Boolean Network, where each word’s varying levels of formal, grammatical, and semantic activation are reduced to a simple binary distinction. In the following truncated and slightly edited extract, he explains the basics of his Boolean models (2006: 625–630).

In these simplified models, each lexical network consists of a set of WORDS. Each word has two states: ACTIVATED or UNACTIVATED, depending on the way it interacts with other words in the lexicon (corresponding roughly to productive and receptive vocabulary respectively). Each word is randomly connected to only two other words in the network, and each word receives an INPUT from each of these link words. Words react to inputs in different ways. Some words (conventionally called AND words) become activated only when both the words they are linked to are activated, while others (conventionally called OR words) become activated if only one of their input words is activated. (That is, some words have low activation thresholds, while for other words activation is more difficult.) A simple example will illustrate these features.





AND units are labelled with uppercase letters, OR units are labelled with lower case letters. Arrows show the direction of activation.

In Figure 2.5, we have a simple five word lexicon. Word A and Word B are AND words: they have a high activation threshold, and only become activated if BOTH of their inputs are already activated. The other words are OR words: they have a low activation threshold, and will become activated if either one of their inputs is already activated. Word A gets its inputs from Word C and Word E; Word B gets its inputs from Word A and Word C, etc. Initially, all the words are unactivated. However, let us suppose that an external stimulus temporarily activates Word B. This causes a ripple of spreading activation to percolate through the entire system, as shown in Figure 2.6.

[Figure 2.5: diagram of five linked units, A and B (AND units) and c, d, and e (OR units).]

Figure 2.5 A simple random autonomous Boolean network

[Figure 2.6: four panels, (a) to (d), each showing the five-unit network of Figure 2.5.]

Figure 2.6 How a random autonomous Boolean network responds to an external stimulus

Activated units are shaded grey.
Figure 2.6a Word B has been activated by an external stimulus ...
Figure 2.6b ... causing Word C and Word D to become activated
Figure 2.6c This activates both Word E and Word D ...
Figure 2.6d ... resulting in the reactivation of Word C.





Activating Word B (Figure 2.6a) causes Words C and D to become activated. Both are OR words, which activate easily. The activation of Word B was temporary, so it reverts to the deactivated state (Figure 2.6b). The activation of Words C and D spreads to Word E. Word D is still receiving input from Word C, so it remains in the activated state. Word C is no longer receiving any activation, so it reverts to the deactivated state (Figure 2.6c). Activation in Words D and E causes Word E to remain activated, and causes Word C to be reactivated. Word D is not receiving any input, so it reverts to the deactivated state (Figure 2.6d). This process continues until the network settles down into a stable state.

First impressions might lead us to expect that large Boolean networks would be extremely unstable, but surprisingly, this is not the case. Irrespective of size, Boolean networks with the characteristics that we have described above quickly settle down into an attractor state, a stable configuration in which some words are permanently activated, while other words are permanently deactivated, and the overall pattern of activation in the network remains stable as long as it receives no external stimulation. Sometimes, a small number of words form an oscillating pattern, where individual words move between the two states, but it is unusual for these oscillations to be very large. Figure 2.7 shows an example of a Boolean network moving from an initial random configuration into a stable attractor state.

Here we have a lexical network consisting of 1,000 words, where the initial connections and activation values of the words are set at random. When the simulation starts, it iterates through the implications of these activation patterns, in much the same way as we did in Figure 2.6, but on a much larger scale. The illustration shows that the number of activated words in the initial state of the model is about half of the total. In each subsequent iteration of the model, this figure changes, until the model eventually settles down into an attractor state where just under two-thirds of the words are activated, and the remaining words are inactive. The wobble in the number of activated words indicates that a small number of words are oscillating between the activated and deactivated states.

Once a network reaches one of its attractor states, it will tend to resist any arbitrary changes that are inflicted on it by external agencies. For instance, let us define a kick event as a temporary activation of a handful of words (e.g. 50) which are deactivated in the attractor state. (In real life, a kick event would correspond to some kind of interaction which activates a number of deactivated words. Reading a text, for example, or interacting with another speaker might have this effect.) Events of this type generally result in only a small flurry of activity which spreads through the network, and rapidly dies away. Typically, the effects of the additional activation last for only a few iterations of the model, and the
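The walk-through above can be reproduced with a short Python sketch. The inputs of Word A and Word B are those stated in the text; the inputs of the three OR words are not given explicitly, so the wiring used for c, d, and e below is an assumption, chosen because it yields the sequence described for Figure 2.6. All words are updated at the same time on each iteration, which is also an assumption about how the model steps.

```python
# Wiring of the five-word lexicon. The inputs of A and B come from the text;
# the inputs of c, d and e are ASSUMED so that the run matches Figure 2.6.
INPUTS = {
    "A": ("c", "e"),   # AND word (inputs given in the text)
    "B": ("A", "c"),   # AND word (inputs given in the text)
    "c": ("B", "e"),   # OR word (assumed wiring)
    "d": ("B", "c"),   # OR word (assumed wiring)
    "e": ("c", "d"),   # OR word (assumed wiring)
}
AND_WORDS = {"A", "B"}

def step(active):
    """Update every word at once from the current set of active words."""
    nxt = set()
    for word, (x, y) in INPUTS.items():
        if word in AND_WORDS:
            on = x in active and y in active   # both inputs needed
        else:
            on = x in active or y in active    # one input is enough
        if on:
            nxt.add(word)
    return nxt

state = {"B"}          # Figure 2.6a: the external stimulus activates Word B
history = [state]
for _ in range(3):     # Figures 2.6b, 2.6c and 2.6d
    state = step(state)
    history.append(state)

for label, active in zip("abcd", history):
    print(f"2.6{label}: {sorted(active)}")
```

Run as written, the loop lists the active words for panels a to d as B; c and d; d and e; and c and e, which is the sequence described in the text.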
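A simulation of this kind can be sketched as follows. This is not Meara's program: the proportion of AND words, the simultaneous updating of all words, and the random seed are assumptions made for illustration, and the activation level the sketch settles at depends on those choices, so it should not be expected to reproduce Figure 2.7.

```python
import random

def make_network(n, rng, p_and=0.5):
    """Give each word two distinct input words (never itself) and a type.

    p_and, the share of AND words, is an assumed parameter.
    """
    inputs = []
    for i in range(n):
        a, b = rng.sample(range(n - 1), 2)
        # Shift indices at or above i up by one so that word i is skipped.
        inputs.append((a + (a >= i), b + (b >= i)))
    is_and = [rng.random() < p_and for _ in range(n)]
    return inputs, is_and

def step(state, inputs, is_and):
    """Update all words at once: AND words need both inputs, OR words one."""
    return [
        (state[a] and state[b]) if is_and[i] else (state[a] or state[b])
        for i, (a, b) in enumerate(inputs)
    ]

def run(n=1000, iterations=50, seed=0):
    """Return the number of activated words at each iteration."""
    rng = random.Random(seed)
    inputs, is_and = make_network(n, rng)
    state = [rng.random() < 0.5 for _ in range(n)]   # random initial activation
    counts = [sum(state)]
    for _ in range(iterations):
        state = step(state, inputs, is_and)
        counts.append(sum(state))
    return counts

counts = run()
print(counts)
```

Plotting `counts` against the iteration number gives the same kind of trace as Figure 2.7, and a residual wobble at the end of the list would correspond to a small group of oscillating words.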





activation level associated with the attractor state is quickly restored. It is possible to nudge a network into a new attractor state by using kick events, but usually only very large kicks can bring about a change of this kind.

It is also possible to force a network rather than kick it. In forcing, we activate a small number of different words repeatedly. This repetition sometimes prevents the network from returning to its attractor state. Rather surprisingly, constant dripping of this sort produces very different effects from what we get with a single kick event, even a relatively large one. Typically, forcing produces a massive increase in the number of activated words in the network. For example, if we use repeated individual forcing events of only five words, even this very small input forces the overall level of activation in the network to rise rapidly, and very large numbers of words are affected by these small events: in one simulation, 200 forcing events were sufficient to raise the overall activation level of the network by more than a third, although it eventually returned to its stable attractor state.

These basic tools, as Meara describes above, consist of only a very simple network structure, and a small repertoire of simple operations. However, computerized simulations can be used in a variety of ways to make interesting observations about real lexicons, including the issues of attrition, the nature of multilingual lexicons, and measuring productive lexicons.
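The difference between kicking and forcing can be explored with a sketch along the following lines. Several details are assumptions rather than Meara's procedure: the share of AND words, running a fixed 100 iterations as a stand-in for 'reaching an attractor state', and treating each forcing event as the activation of five freshly chosen deactivated words followed by one iteration.

```python
import random

def make_network(n, rng, p_and=0.5):
    """Two distinct input words per word (never itself), plus an AND/OR type."""
    inputs = []
    for i in range(n):
        a, b = rng.sample(range(n - 1), 2)
        inputs.append((a + (a >= i), b + (b >= i)))   # skip index i
    is_and = [rng.random() < p_and for _ in range(n)]
    return inputs, is_and

def step(state, inputs, is_and):
    """Update all words at once."""
    return [(state[a] and state[b]) if is_and[i] else (state[a] or state[b])
            for i, (a, b) in enumerate(inputs)]

def activate(state, k, rng):
    """Switch on up to k randomly chosen words that are currently off."""
    off = [i for i, on in enumerate(state) if not on]
    new = list(state)
    for i in rng.sample(off, min(k, len(off))):
        new[i] = True
    return new

def settle(state, inputs, is_and, iterations=100):
    """Run a fixed number of iterations (assumed long enough to settle)."""
    for _ in range(iterations):
        state = step(state, inputs, is_and)
    return state

def kick(state, inputs, is_and, rng, size=50, iterations=20):
    """One large one-off activation, then let the network run freely."""
    state = activate(state, size, rng)
    counts = [sum(state)]
    for _ in range(iterations):
        state = step(state, inputs, is_and)
        counts.append(sum(state))
    return counts

def force(state, inputs, is_and, rng, size=5, events=200):
    """Many small activations, one before each iteration."""
    counts = []
    for _ in range(events):
        state = step(activate(state, size, rng), inputs, is_and)
        counts.append(sum(state))
    return counts

rng = random.Random(0)
inputs, is_and = make_network(1000, rng)
start = settle([rng.random() < 0.5 for _ in range(1000)], inputs, is_and)
kick_counts = kick(start, inputs, is_and, rng)
force_counts = force(start, inputs, is_and, rng)
print(sum(start), kick_counts[-1], max(force_counts))
```

Comparing the last value of `kick_counts` with the settled level, and the peak of `force_counts` with the same level, is one simple way of checking whether a given network shows the contrast described above; whether it does will depend on the assumed parameters.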

Figure 2.7 A Boolean network reaching one of its attractor states





Figure 2.8 Ten examples of attrition in a network

(Meara, 2004: 143).





Meara (2004) notes that typical attrition studies look at whether individual lexical items are retained or forgotten, but do not explore how attrition of individual items affects the overall structured network lexicon. In fact, there is no way of doing this with real human subjects at present, but computer simulations can allow exploration of attrition at a network level. Meara defined attrition as changing OR words into AND words (i.e. words which are easily activated become words which are less easily activated). He then ran ten simulations where such a switch was made on a random word in a 2,500 word network, and the network was allowed to iterate five times so the impact of the change could be absorbed. This was repeated 255 times, and any resulting degradation noted. The networks were allowed to reach a stable attractor state before the attrition events began. The ten trials are illustrated in Figure 2.8.

A number of interesting observations about attrition can be made from the graphs in Figure 2.8. The overall trend is similar across the cases, as all trials start at a very high level of activation, and they all end up with relatively low levels of activated words after a period of very heavy loss. However, the individual trials are quite different in the progress of this decline. Sometimes it sets in quite early, but in other trials it is significantly delayed. Although the average of these ten trials suggests a steadily declining pattern of vocabulary loss (Figure 2.9), this ‘average’ pattern is very misleading. This result suggests we need to be cautious when interpreting averaged results from ‘real’ attrition studies, as the variability of individual results may well be more important than the averaged trend (see Section 4.7). Meara also notes that not all attrition events result in an immediately observable loss of activity in the network, which indicates that it may be important to differentiate between the attrition events (i.e. changes in characteristics of individual lexical items) and the resulting vocabulary loss (when items become no longer activatable). In these simulations, vocabulary loss is always triggered
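The procedure just described can be sketched in Python as follows. The sketch is not Meara's code: the initial share of AND words, the fixed settling period, and the choice to draw each attrition event from the words that are still OR words are all assumptions. The function defaults follow the figures in the text (2,500 words, 255 events, five iterations per event), while the demonstration at the end is scaled down so that it runs quickly.

```python
import random

def make_network(n, rng, p_and):
    """Two distinct input words per word (never itself), plus an AND/OR type."""
    inputs = []
    for i in range(n):
        a, b = rng.sample(range(n - 1), 2)
        inputs.append((a + (a >= i), b + (b >= i)))   # skip index i
    is_and = [rng.random() < p_and for _ in range(n)]
    return inputs, is_and

def step(state, inputs, is_and):
    """Update all words at once."""
    return [(state[a] and state[b]) if is_and[i] else (state[a] or state[b])
            for i, (a, b) in enumerate(inputs)]

def attrition_trial(n=2500, events=255, absorb=5, settle_iters=100,
                    p_and=0.1, seed=0):
    """Return activated-word counts before and after each attrition event."""
    rng = random.Random(seed)
    inputs, is_and = make_network(n, rng, p_and)   # p_and is an assumed value
    state = [rng.random() < 0.5 for _ in range(n)]
    for _ in range(settle_iters):                  # assumed long enough to settle
        state = step(state, inputs, is_and)
    counts = [sum(state)]
    for _ in range(events):
        or_words = [i for i, flag in enumerate(is_and) if not flag]
        if or_words:
            is_and[rng.choice(or_words)] = True    # attrition event: OR -> AND
        for _ in range(absorb):                    # let the change be absorbed
            state = step(state, inputs, is_and)
        counts.append(sum(state))
    return counts

# Scaled-down demonstration: ten trials on a smaller network.
trials = [attrition_trial(n=500, events=100, settle_iters=50, seed=s)
          for s in range(10)]
# Pooled (mean) curve across the ten trials, one value per attrition event.
pooled = [sum(column) / len(column) for column in zip(*trials)]
print(pooled[0], pooled[-1])
```

Keeping the individual `trials` alongside the `pooled` curve makes it possible to compare the shape of each run with the shape of the average.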

[Figure 2.9: line graph of activated words (y-axis, 0 to 2,500) against attrition events (x-axis, 0 to 225).]

Figure 2.9 Pooled data abstracted from the ten cases in Figure 2.8

(Meara, 2004: 145).





by attrition events, but attrition events do not always trigger vocabulary loss events. Finally, large vocabulary loss events can result from a relatively small number of attrition events. In a 2,500 word lexical network, the loss of fewer than 200 items typically leads to catastrophic vocabulary loss.

Meara (2006) also used Boolean networks to explore the relationships between the languages in multilingual lexicons. The patterns which spontaneously emerged from the simulations closely mirrored some of the behavior reported for real lexicons. He found that his bilingual simulations exhibited something reminiscent of a lexical switching mechanism, which bilinguals need in order to switch between the vocabularies of two languages (i.e. allowing rapid activation of one language and the simultaneous deactivation of the other language). His trilingual models showed that, under certain conditions, activity in an L3 can sometimes generate spontaneous reactivation of words in an L2, a phenomenon often reported by trilingual speakers. Research along these lines is useful in that it can suggest lexical behaviors which are emergent properties of a systematically-organized lexicon.

Another application of computer simulations is to explore the validity of vocabulary measurement instruments. Meara (2005) used a different type of computer simulation – a Monte Carlo analysis – to explore the behavior of the Lexical Frequency Profile (LFP – see Section 5.2.4) (Laufer and Nation, 1995). His analysis suggests that the LFP can reliably distinguish between learners with relatively large vocabulary size differentials (1,000–2,000+ words), but that it is not particularly sensitive in distinguishing between learners who have similar vocabulary sizes. However, Laufer (2005b) questions many of the assumptions underlying Meara’s analysis. For our purpose of discussing the methodology of computer simulations, the most important criticisms are (1) that computer simulations do not always reflect real-world phenomena, potentially resulting in misleading results because key characteristics are not accounted for in the simulations, and (2) that

Quote 2.10 Meara on the nature of vocabulary attrition in computerized simulations

Although attrition events [in computer simulations] do not necessarily result in immediate vocabulary loss, they do weaken the structure of the vocabulary. Each attrition event that does not result in immediate loss produces a cluster of words which are more dependent on each other than they were before, and more likely to be affected by a change in one of their immediate neighbours ... It seems then, that the attrition process has the effect of re-structuring the lexical network so that it is very vulnerable to small changes in activation.

(2004: 147)



Co

pyr

igh

t m

ater

ial f

rom

ww

w.p

alg

rave

con

nec

t.co

m -

lice

nse

d t

o U

niv

ersi

ty o

f S

ou

th F

lori

da

- P

alg

rave

Co

nn

ect

- 20

11-0

5-25


simulated data is not necessarily as valid as real data. These are legitimate points, as Meara himself acknowledges, and serve to reinforce the need to be very careful when formulating the underlying assumptions behind simulations.

Despite being a relatively new research tool, computer simulations appear to have a great deal of potential. The number of trials that can be run far exceeds the number of human participants which could be gathered for a study. Similarly, the parameters of research design can be easily adjusted to explore different scenarios, something which is not practical when new participant groups are needed for each research design variant. Perhaps most importantly, computer simulations challenge researchers to explicitly and specifically define the linguistic behavior they are exploring. For more discussion of lexical computer simulations, see Meara (2006). Some tools for carrying out this type of research are available on the _lognostics website (http://www.lognostics.co.uk/).

2.11 Psycholinguistic/neurolinguistic research

Vocabulary research has tended to focus on issues of vocabulary size, and consequently there are a number of techniques and measures available to tap into this aspect of vocabulary knowledge. With some exceptions (e.g. word associations), vocabulary researchers have only recently turned their attention to depth of knowledge, and so techniques and measures in this area are much less well-developed. Moreover, there has been comparatively little research in the applied linguistics arena into lexical processing, or into how lexical knowledge is automatized. Conversely, the fields of psycholinguistics and neurolinguistics have given a great deal of attention to lexical processing and the physiological mechanisms underlying it. As a consequence, they have developed a number of measurement techniques for dealing with these issues. These have great potential to be informative in a broad range of vocabulary research, as they often tap types of lexical knowledge and learner behavior which traditional size/depth methodologies cannot address. Furthermore, they often give more precise and nuanced measurements of knowledge, allowing quantification of smaller amounts of learning, especially at the beginning stages of the incremental learning process. As such, they are an important component of the vocabulary researcher’s toolkit.

Below I outline some of the major techniques categorized according to whether they primarily address issues of speed of processing, connections between lexical items in input or in the mind, or processing mechanisms. (Note that all of these categories are of course interrelated, and this division is not intended to suggest that they are discrete processes.) I will illustrate some of the techniques with studies, many of them drawn from research into formulaic language carried out at the University of Nottingham’s Centre for Research in Applied Linguistics (CRAL). Many of these techniques are relatively new to mainstream vocabulary research, but have very great potential in the hands of innovative researchers. (See Dörnyei, 2009, and Harley, 2008, for more on psycholinguistic/neurolinguistic research techniques and their possibilities and limitations.)

Speed of processing

Part of mastering lexical items means that processing becomes more automatized. This is evidenced in faster recognition/comprehension speeds when listening or reading, and faster retrieval/production speeds when speaking or writing. In other words, learners need to develop what most language specialists would call fluency. Adequate speed of processing is essential for efficient language use, and this is perhaps most clearly illustrated in reading. If lexical recognition is not fast enough (native speakers read at 200–300 words per minute; Grabe and Stoller, 2002), then reading slows down to a frustrating word-by-word (or even letter-by-letter) decoding, in which meaning construction is impaired and the overall flow of the text cannot be understood. Also, it seems that even though advanced L2 learners can match native performance in certain ways (e.g. answering comprehension questions after reading), they find it difficult to do this at a native-like speed (McMillion and Shaw, 2008, 2009).

Speed of processing is often referred to as automaticity, although automaticity can also contain the notion of an absence of attentional control in the execution of a cognitive activity. Segalowitz and Hulstijn (2005) give the example of the recognition of the single letter A. It is thought that recognition of a letter like this by a proficient reader requires no conscious effort or effortful attention, is extremely rapid, and cannot be interfered with by other ongoing activities. In fact, a fluent reader cannot help but recognize the letter, and so the process is thought of as ‘automatic’. In contrast, Segalowitz and Hulstijn relate the case of a novice L2 reader who may require considerable consciously directed effort, applied slowly over an interval much longer than it takes to recognize a letter in their L1. ‘Thus, the relatively rapid, effortless, and ballistic (unstoppable) activities underlying fluent letter recognition are said to be automatic, standing in contrast to slower, effortful activities that can be interrupted or influenced by other ongoing internal processes (e.g. distractions, competing thoughts)’ (2005: 372).

Concept 2.4 Milliseconds

A millisecond is one one-thousandth (1/1,000th) of a second. Since human processing is very fast, it is the standard unit of measurement for timed experiments in psycholinguistics.





At the moment, not much is known about the acquisition and development of automaticity, with empirical research into the facilitation of automaticity and its impact on subsequent skills just beginning (Segalowitz and Hulstijn, 2005). As such, there is certainly great scope for research into the processing speeds of virtually all aspects of lexical mastery, and how to facilitate its development. For example, in addition to measuring whether a person knows a word’s meaning, it would also be informative to measure how quickly they could recognize its written form when reading.

Measuring automaticity typically entails using techniques which have a timed element, i.e. measuring the time taken by participants to complete some task. The types of timed task vary, but they are usually quick judgement tasks which are less likely to be contaminated by conscious thought processes. (See Section 5.4 for methodologies to measure automaticity/speed of processing.)

Connections between lexical items in input or in the mind

Word association research has illustrated the connections between lexical items in the mental lexicon. Research has also shown that lexical items are not processed in isolation, but are affected by their surrounding context. This is particularly true of the preceding context. For example, when reading a James Bond story about international espionage, words like spy are recognized more quickly than non-espionage words of the same frequency level. That is, the story context primes the word spy, i.e. facilitates its processing in terms of speed and accuracy.

Quote 2.11 de Bot on lexical automaticity in speaking

When we consider that the average rate of speech is 150 words per minute, with peak rates of about 300 words per minute, this means that we have about 200 to 400 milliseconds to choose a word when we are speaking. In other words: 2 to 5 times a second we have to make the right choice from those 30,000 words [in the productive lexicon]. And usually we are successful; it is estimated that the probability of making the wrong choice is one in a thousand.

(de Bot 1992: 11)
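The arithmetic behind de Bot's figures can be checked with a short calculation. The sketch below is only an illustration: the function names are hypothetical, and the two speech rates are the ones given in the quote.

```python
MS_PER_MINUTE = 60_000  # milliseconds in one minute


def ms_per_word(words_per_minute):
    """Average time available to select each word, in milliseconds."""
    return MS_PER_MINUTE / words_per_minute


def words_per_second(words_per_minute):
    """Number of word choices that must be made each second."""
    return words_per_minute / 60


# Speech rates taken from the quote: average (150) and peak (300) words per minute.
for rate in (150, 300):
    print(rate, "wpm:", ms_per_word(rate), "ms per word,",
          words_per_second(rate), "words per second")
```

At 150 words per minute this gives 400 ms per word (2.5 choices per second), and at 300 words per minute it gives 200 ms per word (5 choices per second), which is in line with the rounded figures in the quote.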

Quote 2.12 McDonough and Trofimovich on priming

Priming refers to the phenomenon in which prior exposure to language somehow influences subsequent language ... Priming is believed to be an implicit process that for the most part occurs with little awareness on the part of individual language users ... In other words, the exact forms and meanings that speakers use can be affected by the language that occurred in discourse they recently engaged in.

(2008: 1–2)





Priming can accrue from either repetition of form, or from meaning relationships between words. In terms of form, people have a tendency to process a word or word combination more quickly and more accurately when they have had previous exposure to that word or word combination. For example, if participants listen to a list of words such as glasses, chair, and picture spoken one at a time, and then are asked to listen and repeat words like mug, printer, and chair, they will repeat the words that were on the initial list (chair) more quickly and accurately than the words that did not appear on that list (mug and printer). Similarly, there is a tendency for people to process a word more quickly and more accurately when they have been previously exposed to a word that is related in meaning (semantic priming). For example, participants will correctly identify boy as a word more quickly if they recently read the word girl as opposed to an unrelated word like road (McDonough and Trofimovich, 2008). It is useful to note, however, that priming effects are short-lived and last a matter of seconds rather than days or weeks.
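A semantic priming effect of this kind is often summarized as the difference in mean reaction time between targets that follow unrelated primes and targets that follow related primes. The following is a minimal sketch of that calculation, not the procedure of any study cited here. All reaction times and word pairs other than girl/boy and road/boy are invented for illustration, and the field names are hypothetical.

```python
from statistics import mean

# Invented lexical-decision trials: reaction times (ms) to a target word
# after a related or an unrelated prime.
trials = [
    {"prime": "girl",   "target": "boy",   "related": True,  "rt_ms": 520, "correct": True},
    {"prime": "doctor", "target": "nurse", "related": True,  "rt_ms": 540, "correct": True},
    {"prime": "cat",    "target": "dog",   "related": True,  "rt_ms": 500, "correct": True},
    {"prime": "road",   "target": "boy",   "related": False, "rt_ms": 580, "correct": True},
    {"prime": "table",  "target": "nurse", "related": False, "rt_ms": 600, "correct": True},
    {"prime": "cloud",  "target": "dog",   "related": False, "rt_ms": 590, "correct": True},
    {"prime": "lamp",   "target": "boy",   "related": False, "rt_ms": 900, "correct": False},
]


def priming_effect(trials):
    """Mean RT after unrelated primes minus mean RT after related primes.

    Only correct responses are included; a positive value means that
    related primes speeded up recognition of the target.
    """
    related = [t["rt_ms"] for t in trials if t["correct"] and t["related"]]
    unrelated = [t["rt_ms"] for t in trials if t["correct"] and not t["related"]]
    return mean(unrelated) - mean(related)


print(priming_effect(trials))
```

With these invented values the related targets average 520 ms and the unrelated targets 590 ms, so the sketch reports a 70 ms facilitation. Real studies would also test whether such a difference is statistically reliable.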

Because priming is a well-known and robust effect, it can be used to determine the effects of repetition in language exposure, and whether there are semantic relationships between words. This can be illustrated by a study inquiring into whether L2 learners form collocational memory traces. Durrant and Schmitt (in press) selected a number of adjective-noun combinations which could logically occur together, but do not in fact do so in the BNC. Therefore the word combinations (e.g. wonderful drink) are not collocations in the English speech community, but were plausible nonetheless. Non-collocation pairs were selected so that participants would have no prior collocational mental link between the two words, as the purpose of the experiment was to determine if these were formed from exposure. Durrant and Schmitt exposed their L2 participants to various adjective-noun combinations in sentence contexts, including the non-collocation pairs. They then administered a priming test to determine whether using the adjective component of a non-collocation pair (e.g. wonderful) as a prime allowed the participants to produce the related noun in a completion task (dr_ _ _). Durrant and Schmitt found that even one exposure to the non-collocation pair in the input stage led to a small, but significant, facilitation of noun completion. In other words, even one exposure to a particular word combination led to an initial collocational memory trace. Furthermore, they found that two repetitions of the word combinations led to a large facilitation effect. Thus the very sensitive priming paradigm allowed the measurement of the very initial stages of collocation formation. (For a good source for advice on a number of different priming techniques, including a discussion of language studies using them, see McDonough and Trofimovich, 2008.)

Processing

A number of techniques now make it possible to ‘look inside’ the brain (both figuratively and literally), and to gain insights into how language is processed. They are potentially some of the most exciting ways of exploring vocabulary questions which have long evaded answers, such as whether different types of vocabulary are stored in different areas of the brain, and whether acquisition/attrition involves physiological change in the brain. Below are three techniques with considerable potential for researchers with the vision to use them creatively.

Eye movement studies track the movement of the eye while participants do linguistic tasks. The apparatus can either track the eye as it reads text on a monitor screen, or note which of several picture images it flicks to first when doing a language task. One major advantage is that the participant does not have to do anything (like push a button), which eliminates any physiological ‘noise’ from the experiment (e.g. slow reflexes, pushing the wrong button by mistake). If the task focuses on reading, the eye-tracking paradigm is as close to normal reading as is possible in an experimental setting (Duyck, van Assche, Drieghe, and Hartsuiker, 2007).

Reading-based eye movement studies

To illustrate an eye movement study which focuses on lexical issues, let us look at a follow-on study to the Conklin and Schmitt (2008) self-timed reading study discussed in Section 5.4. The study is particularly interesting because it demonstrates how the same research questions can be investigated with different methodologies.

The researchers (Siyanova, Conklin, and Schmitt, under review) were interested in how formulaic sequences are read in context. While the Conklin and Schmitt self-timed reading methodology showed that formulaic sequences were read more quickly than non-formulaic sequences, the eye-movement technique provided a much richer description of the reading behavior. This included measures of not only the ‘first pass’ at reading a particular word or phrase (in the region of interest), but also successive rereadings in that region. Siyanova et al. examined the following five measures, which give an indication of the type of information which the eye-movement paradigm provides (illustrated in Figure 2.10).

Total Reading Time (TRT) – the sum of all fixation durations made within a region of interest. This measure indicates how much time the participant spent reading the target lexical items and includes all fixations which landed on those items.

First Pass Reading Time (1PRT) – the sum of all fixation durations made within a region of interest until the point of fixation leaves the region either to the left or to the right (also known as gaze duration). This measure tells us how long the reader fixated on the target item the first time it was encountered, and excludes any possible rereadings and regressions.





Fixation Count (FC) – the number of all fixations made within a given region of interest. This measure indicates how many times the target was fixated upon and includes all possible regressions to and rereadings of the target item.

Regression Path Duration (RPD) – the sum of all fixation durations starting with the first fixation within a region of interest up to but excluding the first fixation to the right of this region. This measure gives us the durations of all fixations that were made on the target item plus all later regressions to the left of the target.

Rereading (RR) – the regression path duration for the region of interest minus first pass reading time for this region. Rereading time gives an indication of the time the participant spent rereading the text after having encountered a problem.
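The five definitions above can be expressed as a short calculation over a sequence of fixations. The sketch below is one possible reading of those definitions, not the analysis code used by Siyanova et al. The `Fixation` class, the word-index representation of the region of interest, the choice of region, and all durations are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    word_index: int   # position of the fixated word in the sentence (0-based)
    duration_ms: int  # fixation duration in milliseconds


def reading_measures(fixations, roi_start, roi_end):
    """Compute TRT, 1PRT, FC, RPD and RR for a region of interest (ROI).

    The ROI is the inclusive range of word indices roi_start..roi_end.
    Simplification: first pass starts at the first fixation that lands in
    the ROI, whether or not the reader skipped past the region earlier.
    """
    def in_roi(fix):
        return roi_start <= fix.word_index <= roi_end

    roi_durations = [f.duration_ms for f in fixations if in_roi(f)]
    first = next((i for i, f in enumerate(fixations) if in_roi(f)), None)
    if first is None:  # the ROI was never fixated
        return {"TRT": 0, "1PRT": 0, "FC": 0, "RPD": 0, "RR": 0}

    # First pass: consecutive ROI fixations until the eye leaves the region.
    first_pass = 0
    for fix in fixations[first:]:
        if not in_roi(fix):
            break
        first_pass += fix.duration_ms

    # Regression path: everything from the first ROI fixation up to, but
    # excluding, the first fixation to the right of the region.
    regression_path = 0
    for fix in fixations[first:]:
        if fix.word_index > roi_end:
            break
        regression_path += fix.duration_ms

    return {
        "TRT": sum(roi_durations),
        "1PRT": first_pass,
        "FC": len(roi_durations),
        "RPD": regression_path,
        "RR": regression_path - first_pass,
    }


# Hypothetical record of eight fixations; the ROI is assumed to be words 3-6.
record = [
    Fixation(0, 200), Fixation(2, 180),  # fixations 1-2: before the ROI
    Fixation(4, 220), Fixation(6, 210),  # fixations 3-4: first pass in the ROI
    Fixation(1, 150),                    # fixation 5: regression to the left
    Fixation(5, 190),                    # fixation 6: rereading inside the ROI
    Fixation(8, 200), Fixation(9, 230),  # fixations 7-8: to the right of the ROI
]
measures = reading_measures(record, roi_start=3, roi_end=6)
print(measures)
```

The invented record follows the same pattern as the formulas in Figure 2.10: fixations 3, 4, and 6 fall inside the region, fixation 5 is a regression to the left, and rereading time equals the durations of fixations 5 and 6.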

Stimuli similar to the Conklin and Schmitt passages were used, containing idioms (left a bad taste in my mouth) and matched control phrases (the bad taste left in his mouth). Whole passages were presented on the monitor screen and the participants were eye-tracked while they read these passages. The eye-movement analysis found that the native-speaking participants processed the idioms significantly faster than the non-formulaic controls, and that there were no differences between figurative and literal readings of idioms, as indicated by TRT, FC, RPD, and RR.⁵ This result replicates Conklin and Schmitt’s findings, but is much more powerful, as it includes measures not only of total reading time (similar to self-paced reading), but also of the amount of reading time including regressions and rereading, as well as number of fixations. For the nonnative participants, there was no evidence on any of the measures for idioms being processed any differently from the matched controls, but the figurative readings seemed to be read more slowly than literal readings. The sum result from all of the eye-movement measures builds a strong case that native speakers and L2 learners process idiomatic language quite differently.

[Figure: the sentence ‘She’s always been as cold as ice with her children.’ overlaid with fixations numbered 1–8]

Total Reading Time (TRT) = 3 + 4 + 6
First Pass Reading Time (1PRT) = 3 + 4
Fixation Count (FC) = 3 + 4 + 6
Regression Path Duration (RPD) = 3 + 4 + 5 + 6
Rereading (RR) = 5 + 6

Figure 2.10 Hypothetical eye-movement record. Shaded area represents the region of interest

Visual world paradigm

The eye movement methodology can also be used to determine how the eye fixates on non-linguistic features, such as pictures on a screen. This is useful because research has shown that

when participants are simultaneously presented with spoken language whilst viewing a visual scene, their eye movements are very closely synchronised to a range of different linguistic events in the speech stream ... eye movements are closely synchronised to the referential processing of the concurrent linguistic input ... the probability of fixating one object amongst several when hearing that object’s name is closely related, and the fixations closely timelocked, to phenomena associated with auditory word recognition. (Altmann and Kamide, 1999: 249)

In other words, when the mind is processing speech, eye movements reflect that underlying processing. This can be used in a number of ways in lexical research, including researching how lexical items constrain the subsequent items which can logically follow, helping the mind to ‘predict’ those subsequent items, facilitating fast lexical access. For example, the modifiers frizzy and blonde will cumulatively constrain the choices for the follow-on noun down to little more than hair.

Altmann and Kamide (1999) explored whether this type of constraint also applies to verbs. They set up an eye-movement study where participants looked at pictures and listened to a number of sentences. One sentence included a verb for which only one of the objects in the picture made sense, while another sentence included a verb for which any of the four or five objects could logically follow. This is exemplified by Figure 2.11, which shows a boy sitting next to a toy car, a ball, a toy train, and a cake. The stimulus sentences for this picture included The boy will move the cake and The boy will eat the cake. The researchers found that the verb eat allowed the participants to shift their fixation to the picture of a cake much faster than did the verb move. Moreover, in 54% of the cases, the shift to the picture of the cake started before the onset of cake in the speech stream. This provides evidence that knowledge of a lexical item is not only used in the recognition and processing of that individual item, but also aids in the processing of downstream items.
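A figure such as the 54% of anticipatory shifts reported above is, in essence, the proportion of trials in which the first look to the target picture began before the target noun was heard. The sketch below shows how such a proportion could be computed. It is an illustration under stated assumptions and not Altmann and Kamide's analysis: the look times, the noun onset time, and the field names are all invented.

```python
NOUN_ONSET_MS = 1000  # hypothetical onset of the word "cake" in each sentence


def make_trials(look_times_ms):
    """Build trial records; None means the target picture was never fixated."""
    return [
        {"first_target_look_ms": t, "noun_onset_ms": NOUN_ONSET_MS}
        for t in look_times_ms
    ]


def anticipatory_proportion(trials):
    """Share of trials whose first look to the target began before the noun."""
    early = sum(
        1 for t in trials
        if t["first_target_look_ms"] is not None
        and t["first_target_look_ms"] < t["noun_onset_ms"]
    )
    return early / len(trials)


# Invented launch times (ms from sentence onset) of the first look to the cake.
trials_by_verb = {
    "eat": make_trials([850, 920, 1150, 980]),
    "move": make_trials([1200, 1350, 990, None]),
}

proportions = {
    verb: anticipatory_proportion(trials)
    for verb, trials in trials_by_verb.items()
}
print(proportions)
```

In this invented data set, three of the four eat trials but only one of the four move trials show a look to the cake before the noun begins, which is the direction of the effect described in the text.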

The visual world paradigm has also been used to look at how knowledge of a first language influences processing in a second language. Conklin, Dijkstra, and van Heuven (under review) looked at processing by high-proficiency Dutch learners of English and an English monolingual control group to determine whether knowledge about words in an L1 influences the processing of words that mean the same thing (either cognates or non-cognates) in the L2. Participants listened to English while viewing a corresponding visual scene containing a cartoon character (Donald Duck) and an inanimate object (tractor). Interpretation of pronouns was investigated following the presentation of inanimate nouns which have gender in Dutch, but not English (i.e. tractor is masculine in Dutch). To refer to a previously mentioned tractor in Dutch, the masculine singular pronoun hij is used. However, in English he refers to an animate object. Upon hearing the pronoun he, the Dutch native speakers had increased looks and fixation durations to inanimate objects for cognates (e.g. tractor) relative to the animate character (Donald Duck) that the English pronoun referred to. However, there were only increased looks to inanimate non-cognates initially (e.g. kite, which is vlieger in Dutch), while the effect for cognates was long-lasting. Monolinguals only had looks to the animate character. Results indicate Dutch learners of English activate information about Dutch gender when processing spoken discourse in English. Further, this demonstrates that the amount of overlap between the two languages influences processing, with a closer relationship (i.e. cognateness) leading to more crosslinguistic influence.

Figure 2.11 Visual world paradigm picture stimulus

(Altmann and Kamide, 1999: 250).





Event-related potentials (ERP)

ERP methodology measures the brain’s electrical activity during language comprehension through a number of sensors placed around the scalp (normally attached to a kind of swimcap-type headgear). It gives a very precise millisecond-by-millisecond record of brain activity, and so can provide a window into the online processing of language. The electrical activity is plotted onto graphs with the time element on the x-axis, and the amplitude of the electrical activity plotted on the y-axis, such as in Figure 2.12. The figure illustrates the two major language-based ERP patterns which have emerged in the research.

The first is called N400 (N = negative voltage wave; 400 = 400 milliseconds after a word is read or heard). (Note that negative waves are indicated above the x-axis and positive waves are below it.) N400 seems to be generated by lexical and semantic processing (both form and meaning), but does not seem to be affected by syntactic variables. Conversely, the second pattern (P600: positive 600 millisecond onset) is sensitive to syntactic but not lexical variables, and so will not be discussed further.

The N400 wave is particularly sensitive to semantic anomaly between a target word and its preceding context.⁶ For example, in Figure 2.12, the word bake, which does not make sense in the context, generates a much higher N400 peak than the word eat, which generates a noticeable, but smaller, peak. For native speakers, the results are usually as follows: ‘the N400 amplitude is largest for pronounceable, orthographically legal nonwords (pseudowords; e.g. flirth), intermediate for words preceded by a semantically unrelated context, and smallest for words preceded by a semantically related context’ (Osterhout et al., 2006: 209). So overall, the N400 effect grows with stimuli that are harder to integrate semantically. It also informs about how well the word is known, in the sense that the N400 amplitude indicates whether a participant has picked up on the congruency/incongruency of the word with its preceding text. This is expanded upon by Kutas, Van Petten, and Kluender (2006: 668), who believe that ‘The correct characterization of the N400 context effect is thus not that anomalous or unrelated words elicit unusual brain responses, but rather that a large negativity between 200 and 500 ms or so (N400) is the default response, and that its amplitude is reduced to the degree trial context aids in the interpretation of a potentially meaningful stimulus.’
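To make the amplitude comparison concrete, the following is a minimal sketch of how an N400 context effect might be quantified as a difference in mean voltage within a time window. It is not the procedure of any study cited here. The voltage values, the 100 ms sampling interval, and the function names are invented for illustration, and the 200–500 ms window is simply borrowed from the Kutas et al. quote.

```python
from statistics import mean

# Hypothetical sample times (ms after word onset); real ERP data are
# sampled far more densely than this.
TIMES_MS = [0, 100, 200, 300, 400, 500, 600, 700]


def mean_window_amplitude(epoch, times_ms, start_ms, end_ms):
    """Mean voltage of one trial for samples with start_ms <= t < end_ms."""
    values = [v for v, t in zip(epoch, times_ms) if start_ms <= t < end_ms]
    return sum(values) / len(values)


def condition_amplitude(epochs, times_ms, start_ms=200, end_ms=500):
    """Average the windowed amplitude over all trials in a condition."""
    return mean([mean_window_amplitude(e, times_ms, start_ms, end_ms)
                 for e in epochs])


# Invented single-trial voltages (microvolts) for targets after a related
# context and after an unrelated context.
related_epochs = [
    [0, 0, -1, -2, -3, -1, 0, 0],
    [0, 0, -1, -4, -1, -1, 0, 0],
]
unrelated_epochs = [
    [0, 0, -3, -5, -7, -2, 0, 0],
    [0, 0, -2, -6, -7, -3, 0, 0],
]

related = condition_amplitude(related_epochs, TIMES_MS)
unrelated = condition_amplitude(unrelated_epochs, TIMES_MS)

# A negative difference means the unrelated context produced the larger
# (more negative) N400.
n400_effect = unrelated - related
print(related, unrelated, n400_effect)
```

In this toy data the related condition averages -2 microvolts and the unrelated condition -5 microvolts in the window, so the unrelated context yields the larger negativity, as the quoted pattern would predict.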

Figure 2.12 ERP plots showing N400 and P600 phenomena

(Osterhout, McLaughlin, Pitkänen, Frenck-Mestre, and Molinaro, 2006: 204).





ERP offers an exciting potential method of measuring implicit lexical knowledge, without this being confounded with declarative knowledge. It also offers a method of documenting the incremental development of lexical knowledge over time, particularly at the beginning stages of the learning process. In a very interesting study, Osterhout et al. (2006) show how this can be accomplished using ERP methodology. They studied beginning L2 French learners (L1 = English), and measured their knowledge of French words and nonwords at 14 hours, 60 hours, and 140 hours of instruction. They asked the participants to make word/nonword judgements, and took ERP readings while they did this. At 14 hours they found that the conscious lexical judgements were at chance level, but that the N400 results were more robust for pseudowords than real words. They interpret this to mean that ‘the French learners rapidly extracted enough information about French word forms so that their brains could discriminate between actual words and pseudowords, even if the learners themselves could not do so’ (2006: 211). In other words, they were learning about word form after only 14 hours.

By the 60th hour of instruction, the participants were learning about meaning, which was indicated by smaller N400s when the target items were preceded by related words than when preceded by unrelated words. By the 140th hour, the amplitude of the N400s approximated native results. In contrast, the conscious lexical judgements remained very poor. Thus, while the learners were not able to demonstrate much learning in the lexical decision task (and so would be unlikely to do well on typical vocabulary size and depth tests either), the ERP methodology was able to show that vocabulary learning was accruing below this ‘conscious threshold’.

ERP would appear to be a very useful technique for tapping into the earliest stages of vocabulary learning. It can show that a participant has some sort of lexical entry that has a semantic component. For nonnatives, N400 is an ideal indicator of whether learners have started to create a lexical representation. ERP might also be useful for obtaining measures of implicit vocabulary knowledge which could be contrasted with explicit knowledge obtained from conscious declarative techniques. However, ERP cannot show how complete the knowledge of a lexical item is.

Functional Magnetic Resonance Imaging (fMRI)

Blood flow and blood oxygenation are closely linked to neural activity. This is because neurons in the brain do not have internal reserves of energy, and so after they fire, more energy needs to be brought in quickly via the bloodstream. This causes an increase in blood flow to regions of increased neural activity. These active brain regions take more oxygen out of the blood than less active regions. There is a difference in the magnetic signature between oxygenated and deoxygenated blood, and an fMRI scan can be used to detect this difference. Although the difference is very small, through numerous repetitions of a thought, action or experience, statistical procedures make it possible to determine the areas of the brain which reliably have more of this difference, and therefore which areas of the brain are active during that mental process.

The fMRI technology was developed in the early 1990s, and has become one of the neural imaging methods of choice, because it is non-invasive, safe (not using radiation like some other techniques) and has excellent spatial resolution. In terms of vocabulary research, fMRI can be used as a tool to locate the parts of the brain which are active during various types of lexical processing.

Pulvermüller (2005) gives an illustration of what fMRI can do. He discusses the processing of ‘action’ verbs like lick, pick, and kick. The fMRI scans showed that these verbs were processed not in a single area of the brain, but rather near the areas of the brain which control movement of the tongue, fingers, and feet, respectively (Figure 2.13). In addition, when face-, arm-, and leg-related words were given to participants, they also activated the physiological centers in the brain for that related part of the body, e.g. silently reading kick activated the part of the brain which controls the leg. It thus appears that at least some lexis is intimately connected with the physiological response, and part of ‘knowing’ action words includes this automatic activation of the brain’s motor control centers. This is early research, but hints that it may be possible to study the learning of word meaning through large-scale neurophysiological techniques. For one introduction to fMRI research methodology, see Jezzard, Matthews, and Smith (2003).

Figure 2.13 fMRI brain location results

(Pulvermüller, 2005: 578).

[Figure panels: (a) Movement: tongue movements, finger movements, foot movements; (b) Passive reading of action words: face-related words, arm-related words, leg-related words]




3 Formulaic Language

As introduced in Section 1.1.3, we are now aware that vocabulary typically behaves not as single words which are held together by syntax, but rather has a strong tendency to occur in multiple word phraseological units. The phenomenon of formulaic language has long been recognized in the case of idioms, because they have the very noticeable trait of non-compositionality, i.e. the meaning of an idiom cannot be derived from the meaning of its component words. However, as research on formulaic language developed (largely through the use of computerized corpora), it became obvious that it was no mere peripheral feature, but rather was ubiquitous and must be a core characteristic of language (e.g. Biber, Johansson, Leech, Conrad, and Finegan, 1999; Sinclair, 1991).

Not only is formulaic language very common in language overall, but a great deal of it occurs in both spoken and written modes.1 While much of the research has been done on written discourse (largely because it is more easily turned into computer-readable data), formulaic language is equally, if not more, important in spoken discourse (Altenberg, 1998; Davou, 2008; McCarthy and Carter, 2002; Kuiper, 2004; O’Keeffe, McCarthy and Carter, 2007). Oppenheim (2000) counted the multi-word stretches of talk that occurred identically in practice and final renderings of a short speech on the same topic. She found that between 48% and 80% (overall mean of 66%) of the spoken output produced by six nonnative participants consisted of these identical strings. Some of this considerable amount must surely be formulaic in nature, but it is impossible to tell from her report how much. Sorhus (1977) calculated that speakers in her corpus of spontaneous Canadian speech used an item of formulaic language once every five words. (This includes one-word fillers, such as eh, well, and OK, but even without these, there is still a very high frequency of formulaic sequences like for example, at times, and a lot of.) Erman and Warren (2000) calculated that 52–58% of the language they analyzed was formulaic, and Foster (2001) came up with a figure of 32% using different procedures and criteria. Biber et al. (1999) found that around 30% of the words in their conversation corpus consisted of lexical bundles, and about 21% of their academic prose corpus. Howarth (1998) looked at frequent verbs in a social science/academic corpus and found that they occurred in either restricted collocations or in idioms in 31–40% of the cases. Rayson (2008) found that 15% of text is formulaic according to a Wmatrix analysis (see Section 6.3).

Furthermore, formulaic language has been found in a range of languages, including Russian, French, Spanish, Italian, German, Swedish, Polish, Arabic, Hebrew, Turkish, Greek, and Chinese (Conklin and Schmitt, 2008). Although this does not prove that formulaic language is a universal trait of all languages (and most of the research has still been done on English), the widespread existence of formulaicity in the above languages strongly suggests that it is a common phenomenon.

Being such a big part of language, it is not surprising that formulaic language as a category is not homogeneous (although many researchers treat it as if it were). It realizes different purposes in language use, including transacting routinized meanings (That’ll be X dollars = typical way for American shopkeepers to state the cost of a bill), lexicalizing various functions (Pardon me = a short conventionalized form of apologizing), and smoothing social interaction (yeah, it is = a routinized way of agreeing with an interlocutor’s assertion) (Schmitt and Carter, 2004). It also provides the building blocks upon which one can create more extended strings of language (e.g. with collocations (valid point), and with lexical bundles: it should be noted that = a standard academic phrase which highlights a point of interest (Biber et al., 1999)).

Idioms are one type of formulaic language which, being very salient, have attracted perhaps the greatest amount of research. Although they are generally not frequent as individual items, there are a large number of them, with Moon (1997: 48) noting that ‘the largest specialist dictionaries of English multi-word items ... contain some 15,000 phrasal verbs, idioms and fixed phrases, but the total number of multi-word items in current English is clearly much higher’. Despite the low frequencies of the individual items, such large numbers of idioms inevitably mean that they are going to be a noticeable element in language, at least in some forms of discourse. For example, Nippold (1991, cited in Cain, Oakhill, and Lemmon, 2005) found that 6–10% of sentences in (American) reading programme books designed for 8–12-year-olds contained idiomatic expressions. Moreover, in genre-specific corpora (e.g. meetings, supermarket checkout-operator talk), the frequency of the central formulaic sequences, including idioms, can be very high (Keller, 1981; Kuiper and Flindall, 2000). This means that, while there are all-purpose formulaic sequences which might cross genre boundaries, each individual genre might have its own formulaic characteristics with its own particular formulaic sequences.

However, as Kuiper, Columbus, and Schmitt (2009) note, idioms are only a small part of the phrasal lexicon of both a language and individual speakers of a language. In fact, there seem to be many types of formulaic language, varying in degree of fixedness, institutionalism/conventionality, and opacity/non-compositionality. This lack of homogeneity is one reason for the wide range of terminology in the area. For example, when researching what she termed formulaic sequences, Wray (2002: 9) found over 50 terms to describe the phenomenon of formulaic language, such as:

chunks                    formulaic speech    multi-word units
collocations              formulas            prefabricated routines
conventionalized forms    holophrases         ready-made utterances

Just as the five blind men of Hindustan who went out to learn about an elephant felt different parts of the elephant’s body and came to very different conclusions about what an elephant is like, researchers seem to be looking at different aspects of formulaic language and using terminology to make sense of that aspect. For example, Nattinger and DeCarrico (1992) stressed the relationship between formulaic language and its functional usage and called the forms lexical phrases. Work on collocations mainly looks at the relationships between two-word pairs. Terms like prefabricated expressions and chunks focus on the holistic storage of the forms. However, when looking at formulaic language as a phenomenon, one must include all of these types.

Moreover, one reason that formulaic language is so widespread is that it realizes a wide range of referential, textual, and communicative functions in discourse. Formulaic sequences can be used to express a concept (Get out of Dodge = get out of town quickly, usually in uncomfortable circumstances), state a commonly believed truth or advice (Too many cooks spoil the broth = it is difficult to get a number of people to work well together), provide phatic expressions which facilitate social interaction (Nice weather today is a non-intrusive way to open a conversation), signpost discourse organization (on the other hand signals an alternative viewpoint), and provide technical phraseology which can transact information in a precise and efficient manner (two-mile final is a specific location in an aircraft landing pattern) (Schmitt and Carter, 2004). Likewise, Nattinger and DeCarrico (1992) argue that formulaic language fulfils the functions of maintaining conversations (How are you?, See you later), describing the purposes for which the conversations take place (I’m sorry to hear about X, Would you like to X?), and realizing the topics necessary in daily conversations (When is X? (time), How far is X? (location)). In fact, one might suppose that for every conventional activity or function in a culture, there will be associated phrasal vocabulary. If that is so, there are bound to be a large number of formulaic expressions, perhaps even a larger number than that of single word vocabulary.

Formulaic sequences become particularly important in language use when we consider their pragmatic value. For instance, they are very often used to accomplish recurrent communication needs. These recurrent communicative needs typically have conventionalized language attached to them, such as I’m (very) sorry to hear about ——— to express sympathy and I’d be happy/glad to ——— to comply with a request (Nattinger and DeCarrico, 1992: 62–63). Because members of a speech community know these expressions, they serve as a quick and reliable way to achieve the desired communicative effect. Formulaic sequences also realize a variety of conversational routines and gambits and discourse objectives (Coulmas, 1979, 1981). They are typically used for particular purposes and are inserted in particular places in discourse. For instance, formulaic sequences regularly occur at places of topic-transition and as summaries of gist (Drew and Holt, 1998). Most (all?) conventional speech acts are realized by families of formulaic language and not normally by original expressions (I’m very sorry versus I am feeling apologetic towards you). Overall, understanding the pragmatic role of formulaic language can tell us much about the nature of interaction (McCarthy and Carter, 2002).

Moreover, formulaic sequences do more than just carry denotative meaning and realize pragmatic function. They can often have a type of register marking called semantic/collocational prosody (Hunston, 2007; Sinclair, 2004; Stubbs, 2002). This is often negative, for example, the verb cause frequently has a negative evaluation (cause pain, cause inflation). However, semantic prosody can also be positive, as in collocations that form around the word provide (provide information, provide services). This semantic prosody is one means of showing a speaker/writer’s attitude or evaluation. For example, his/her stance can be indicated concerning the knowledge status of the proposition following the formulaic item (I don’t know if X indicates uncertainty about X), his/her attitude towards an action or event (I want you to X shows a positive attitude towards this action), and his/her desire to avoid personal attribution (it is possible to avoids a directly attributable suggestion) (Biber, Conrad, and Cortes 2004). Likewise, the choice of formulaic sequences can reflect an author’s style and voice (Gläser, 1998). Formulaic sequences can also be used to encode cultural ideas, as Teliya, Bragina, Oparina, and Sandomirskaya (1998) have demonstrated for Russian.

3.1 Identification

The extent and diversity of formulaic language make it very tricky to define and identify. In fact, identification is probably the biggest problem in researching formulaic language. Wray’s (2002: 9) oft-cited definition of formulaic sequences was intentionally broad and inclusive, and was meant to capture the widest range of formulaic language possible:

a sequence, continuous or discontinuous, of words or other elements, which is, or appears to be, prefabricated: that is, stored and retrieved whole from memory at the time of use, rather than being subject to generation or analysis by the language grammar.





However, for the purposes of identification, tighter definitions are helpful in guiding the selection decisions. Wray (2008: 12) proposes a more restricted definition, termed morpheme equivalent unit:

a word or word string, whether incomplete or including gaps for inserted variable items, that is processed like a morpheme, that is, without recourse to any form-meaning matching of any subparts it may have.

There have been several approaches to identification based on various research purposes, and on the richness of the data available (e.g. is it possible to access phonological information?). It is possible to discern at least four of these. In L1 acquisition studies, the criteria tend to focus on the lexis that children repeat. Much of this is formulaic in nature. Sometimes the formulaic sequences are fused strings which the child has constructed from grammar rules and lexical items and stored whole for later use, and which may or may not be fully adult-like. Sometimes the sequences are extracted from the input the child receives, and may or may not yet be fully analyzed into the component words (Wray, 2002).

The acquisition approach is related to what might be called the ‘psycholinguistic’ approach, where formulaic language is assumed to be holistically stored in the mind. There is evidence for this on the phonological front: formulaic sequences are typically spoken more fluently, with a coherent intonation contour, to the extent that this has been accepted as one criterion of formulaicity (e.g. van Lancker, Canter, and Terbeek, 1981; Peters, 1983: 10). This criterion means that there should be no hesitation pauses within the chunk when it is spoken (Kuiper, 1996), and neither should there be any internal errors or transformations (to be honest with you; *to be honest [pause] with you; *to be honest on you; *to be with you honestly), although Nooteboom (1999) notes that the pronunciation may be more ‘sloppy’, possibly because chunks may not get as much attentional resources as novel utterances do in production. (However, see Ashby, 2006, and Lin (in preparation) for a somewhat different view on phrasal prosody, and for possible differences between child and adult formulaic speech.)

Quote 3.1 Kuiper on the nature of formulaic language

Koenraad Kuiper was most helpful in pointing out that there are two underlying properties which define [formulaic sequences]: a) the units of formulaic language are not merely any sequence of words, but phrases, and b) they are lexical items exactly like other lexical items such as words, and with the same properties as words would have if they were phrases.

(In Schmitt and Carter, 2004: 4)





Although hesitation pauses and errors are relatively easy to identify, variation within a formulaic sequence is a trickier issue. While we would indeed expect a chunk that is preformulated in a speaker’s mind to be articulated without any changes, how do we know whether it is in fact preformulated? Using a criterion of ‘no transformation’ would be circular. To get around this, we often use corpus data to establish norms of the way formulaic sequences are used in the speech community. The problem is that despite the overall idea that formulaic language is fixed, in fact it tends to be quite variable. For example, Moon (1997, 1998) shows that formulaic language (illustrated here with idioms) can vary across a number of factors:

● British/American variations: not touch someone/something with a bargepole (British); not touch someone/something with a ten foot pole (American)
● Varying lexical component: burn your boats/bridges
● Unstable verbs: show/declare/reveal your true colours
● Truncation: every cloud has a silver lining/silver lining
● Transformation: break the ice/ice-breaker/ice-breaking

(Moon, 1997: 53)

This widespread tolerance for different types of variation makes it difficult to define the ‘standard form’ of any formulaic sequence, and therefore to set a criterion for judging whether a person’s output of that sequence is appropriate or not. This means phonological features are usually a more practical identification criterion. Thus the psycholinguistic approach mainly focuses on spoken discourse, because of the time-sensitive nature of the on-line output. Also, it usually works with spontaneous data, because a chance to rehearse may allow the production of strings resembling formulaic sequences even though they are not preformulated.

Another approach to identification, which could be called a ‘phraseological approach’, has been taken up by scholars of the ‘Russian tradition’ like Vinogradov, Amosova, Kunin, Mel’cuk and Cowie, who define formulaic language in terms of transparency and substitutability (see Cowie, 1998, Chapters 1 and 10). For example, they look at words like cut and slash, and note how they are constrained by particular collocation restrictions: cut/*slash one’s throat versus slash/*cut one’s wrists. They might also note that several modes of transportation collocate with the verb drive: drive a car/bus/truck, but some do not: drive a *bicycle/*motorcycle.

However, the phraseological approach to identification is problematic for several reasons (Durrant, 2008). First, it is not easy to operationalize the criteria of transparency and substitutability (see, e.g. Nesselhauf, 2005: 25–33). Second, phraseological approaches rely on human analysts to identify formulaic language. This makes analysis extremely labour intensive, and so only suitable for limited enquiries. It is difficult to see, for instance, how an entire corpus of any size could be analyzed. Third, the human analysis also makes the process rather subjective. Unless multiple raters were used and coordinated, it would be impossible to know whether any results were the outcome of an individual analyst’s preferences and prejudices, or whether they had more general validity. It is also not clear if manual identification can capture the whole range of formulaic sequences, e.g. would only ‘interesting’ sequences be identified, or could the process also capture frequent, but perhaps less salient, sequences as well?

By far the most common means of identifying formulaic language is through corpus statistics.2 The main idea here is identifying sequences which recur in a corpus, based on the underlying criterion of frequency, and this has the great advantage of being easily automated. To extract formulaic sequences, concordancers can be asked to identify all of the word combinations according to a predetermined frequency criterion. For example, Biber et al. (1999) interrogated the 40 million word Longman Spoken and Written English Corpus to find strings of various lengths which occurred a minimum number of times (five to ten). This produced numerous three-word (I don’t think; in order to), four-word (I don’t want to; in the case of), five-word (I see what you mean; the aim of this study), and six-word (do you know what I mean; from the point of view of) combinations. Taken together, these combinations were very frequent, making up about 30% of conversation (almost 45% if two-word contractions are included (I don’t)) and about 21% of academic prose. Biber et al. refer to these combinations as lexical bundles, but other scholars refer to the results from this kind of procedure as N-grams, i.e. fixed strings of N length. It is interesting to note that this kind of procedure tends to produce strings which are not complete structural units (e.g. the end of the), as opposed to the Russian school, which focuses on formulaic language strings which relate to identifiable meaning or function correlates, and so tend to be structurally complete (e.g. on the other hand is structurally complete because the form fully realizes the notion of contrast).
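The frequency-based extraction described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `extract_ngrams`, the toy sentence, and the raw-count threshold are assumptions made for the example, and they do not reproduce the procedure or thresholds of Biber et al. (1999).

```python
from collections import Counter


def extract_ngrams(tokens, n, min_count):
    """Return every n-word string that occurs at least min_count times."""
    counts = Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Toy "corpus" (hypothetical); a real study would use millions of words.
text = "i do not think so but i do not know and i do not think he does"
tokens = text.split()

# Keep three-word strings that recur at least twice.
bundles = extract_ngrams(tokens, n=3, min_count=2)
print(bundles)
```

Note that the recurring strings found this way (here `i do not` and `do not think`) need not be complete structural units, which is the point made about lexical bundles above.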

Concept 3.1 Concordances and concordancers

Concordances (or concordance lines) are lines of data from a corpus which are lined up so that comparison of the queried item (node) is facilitated. (See Chapter 6, Research Project 6, for an example of concordance lines.) This allows quick and easy comparison of the immediate context surrounding the node item. Concordancers are the software programs which search the corpus and sort the data for the user into concordance lines. However, the term concordancer has become the generic name for software programs which do all types of corpus enquiry. Thus concordancers typically build frequency lists, compare texts, and do various other things, as well as building concordance lines.

Simple frequency can also be used to inform whether word combinations are collocations or not. In this approach, the concordancers count how many times these combinations occur in a corpus. For example, nice day occurs 189 times in the 170 million word New Longman Corpus, while crazy day occurs only four times. On the basis of this, one might assume that nice day is likely to be a collocation, while crazy day is much less so. However, this method suffers from two problems. First, the most frequent combinations consist of function words (which are the most frequent words in language) (Hunston, 2002: 69–70) and so the results (e.g. of the (1,069,458 occurrences); and a (86,772)) will be made up of words which co-occur by chance simply because they are so frequent, and not because they have an interesting relationship. Second, this method also misses real collocations at the other end of the frequency spectrum, simply because the combinations are so infrequent (cloven hoof (4), sheer lunacy (7)).

To get around these problems, researchers usually use strength of association measures. These measures compute the likelihood of two words occurring together as opposed to the likelihood of their occurring separately. However, there are two conceptually-different approaches to making these calculations (asymptotic hypothesis tests and mutual information), and the resulting collocation lists can be quite different (Durrant, 2008). The different methods, their formulas, and characteristics are discussed below.

3.2 Strength of association – hypothesis tests

The principal ‘hypothesis testing’ strength of association measures include z-score, t-score, chi-squared and log-likelihood tests. These all test the null hypothesis that words appear together no more frequently than we would expect by chance alone. All of these methods start by calculating how many times we would expect to find word pairs together in a corpus of a certain size by chance alone, given the frequencies of their component words. This is calculated by first determining the probability that a word combination, if chosen at random from the corpus, would occur:

P(Word 1 Word 2) = P(Word 1) * P(Word 2)

This simply states that the probability that any randomly selected pair of words will be the combination (Word 1 Word 2) is equal to the probability of Word 1 occurring on its own multiplied by the probability of Word 2 occurring on its own. In the case of black coffee, the word black appears in the New Longman Corpus 44,422 times, and the word coffee appears 10,305 times. Since the New Longman Corpus consists of 179,012,415 words, the probabilities of occurrence for each of these words are:

$$P(\text{black}) = \frac{44{,}422}{179{,}012{,}415} \approx 0.00025$$

$$P(\text{coffee}) = \frac{10{,}305}{179{,}012{,}415} \approx 0.000058$$

Thus, if we selected any word at random from the New Longman Corpus, the probability that it would be black is 0.00025 and the probability that it would be coffee is 0.000058. Using the above formula, we can calculate that the probability that a two-word combination, picked at random from the New Longman Corpus, would be the pair black coffee is:

P(black coffee) = 0.00025 * 0.000058 = 1.45e-08

This is a very low probability, but given the very large size of the New Longman Corpus, the word combination black coffee will still occur by chance alone at some point. By multiplying the above probability by the size of the corpus, we can predict that it will occur about 2.60 times:

1.45e-08 * 179,012,415 ≈ 2.60 times
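The expected-frequency arithmetic above can be checked with a short Python sketch. The variable names are assumptions made for the example; the figures are those quoted in the text. The sketch also shows that the figure of 2.60 follows from the rounded probabilities, whereas the unrounded frequencies give a slightly lower value of about 2.56.

```python
# Corpus figures quoted in the text for the New Longman Corpus.
N = 179_012_415        # corpus size in words
FREQ_BLACK = 44_422    # occurrences of "black"
FREQ_COFFEE = 10_305   # occurrences of "coffee"

p_black = FREQ_BLACK / N
p_coffee = FREQ_COFFEE / N
p_pair = p_black * p_coffee   # P(black coffee) if the words were independent
expected = p_pair * N         # expected co-occurrences by chance alone

# The same calculation using the rounded probabilities given in the text.
expected_rounded = 0.00025 * 0.000058 * N

print(f"P(black)  = {p_black:.5f}")
print(f"P(coffee) = {p_coffee:.6f}")
print(f"expected (unrounded frequencies)  = {expected:.2f}")
print(f"expected (rounded probabilities)  = {expected_rounded:.2f}")
```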

Consulting the New Longman Corpus, we find that in fact it occurs 139 times. This strongly suggests that the pair collocates more frequently than by chance. However, in research, results that appear obvious are not always reliable. The various hypothesis testing methods determine whether the apparent differences are statistically significant, thus giving a much firmer basis for interpretation. The following formulas and discussion are taken from Durrant (2008), Manning and Schütze (1999: 162–163), and Evert (2004).

We can calculate the z-score with the formula:

$$z\text{-score} = \frac{O - E}{\sqrt{E}}$$

O = the observed frequency of occurrence of the combination
E = the expected frequency of occurrence based on the null hypothesis that there is no relationship between the words

For black coffee, the figures are:

$$z\text{-score} = \frac{139 - 2.60}{\sqrt{2.60}} = 84.59$$

A problem with z-score is that, because it takes (the square root of) expected occurrence as its denominator, a misleadingly high score can be returned if the words involved are infrequent in the corpus (Evert, 2004). The t-score test tries to avoid this problem by taking (the square root of) observed occurrence as the denominator. It is calculated as follows:

$$t\text{-score} = \frac{O - E}{\sqrt{O}}$$

Thus, for black coffee:

$$t\text{-score} = \frac{139 - 2.60}{\sqrt{139}} = 11.57$$
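As a minimal sketch, the two scores can be computed side by side in Python. The function names are assumptions made for the example; the black coffee figures are those used in the text, and the final rare-pair figures are hypothetical, chosen only to illustrate how z-score inflates when expected frequency is very small.

```python
import math


def z_score(observed, expected):
    """(O - E) divided by the square root of the expected frequency."""
    return (observed - expected) / math.sqrt(expected)


def t_score(observed, expected):
    """(O - E) divided by the square root of the observed frequency."""
    return (observed - expected) / math.sqrt(observed)


# black coffee: observed 139 times, expected about 2.60 times.
z_black_coffee = z_score(139, 2.60)
t_black_coffee = t_score(139, 2.60)
print(f"z-score = {z_black_coffee:.2f}")
print(f"t-score = {t_black_coffee:.2f}")

# Hypothetical rare pair: seen once, expected 0.0001 times.
# z-score is very high on the strength of a single occurrence; t-score is not.
z_rare = z_score(1, 0.0001)
t_rare = t_score(1, 0.0001)
print(f"rare pair: z = {z_rare:.2f}, t = {t_rare:.2f}")
```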

Both of these statistics can be criticized on the grounds that they assume an approximately normal distribution of results. However, this is probably not the case for rare events like collocations (Dunning, 1993). The hypothesis testing methods conceive of a corpus as a series of bigrams, each of which may have a value of 1 (the bigram is the word pair being examined) or 0 (the bigram is not the word pair being examined). Two-outcome tests of this sort (analogous to a series of coin tosses) generate a binomial distribution. Where the mean number of positive outcomes is relatively high (as in the case of getting heads from a coin toss), the binomial distribution approximates the normal distribution. However, where the mean number of positive outcomes is relatively low (as in the case of collocation), the binomial distribution is heavily skewed (Dunning, 1993: 64–65).
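One way to make this contrast concrete, which is not taken from the sources cited here, is to compare the skewness of the two binomial distributions. The sketch below uses the standard formula for binomial skewness, (1 − 2p)/√(np(1 − p)); a perfectly symmetric distribution such as the normal has a skewness of 0. The choice of 100 coin tosses is an assumption made for the example.

```python
import math


def binomial_skewness(n, p):
    """Skewness of a binomial distribution with n trials and success rate p."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))


# 100 tosses of a fair coin: symmetric, so close to a normal distribution.
coin_skew = binomial_skewness(100, 0.5)

# black coffee under the null hypothesis: one trial per corpus position,
# with the chance probability of 1.45e-08 calculated in the text.
collocation_skew = binomial_skewness(179_012_415, 1.45e-08)

print(f"coin toss skewness:   {coin_skew:.2f}")
print(f"collocation skewness: {collocation_skew:.2f}")
```

Despite the very large number of trials, the collocation case remains clearly asymmetric because the expected number of positive outcomes is only about 2.6.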

One way of getting around this problem is by using non-parametric tests, which do not rely on the assumption of normality. One such test is Pearson’s chi-squared. This relies on the following 2 × 2 contingency tables showing the expected and observed occurrences in the corpus of each word and its collocate:

Expected:

                Word 2 = X        Word 2 ≠ X
Word 1 = Y      E11 = R1C1/N      E12 = R1C2/N
Word 1 ≠ Y      E21 = R2C1/N      E22 = R2C2/N

Observed:

                Word 2 = X    Word 2 ≠ X
Word 1 = Y      O11           O12           = R1
Word 1 ≠ Y      O21           O22           = R2
                = C1          = C2          = N

If we plug in values for the word pair black coffee, we get the tables:

Expected:

                  Word 2 = coffee    Word 2 ≠ coffee
Word 1 = black    2.6                44,419.4
Word 1 ≠ black    10,302.4           178,957,690.6

Observed:

                  Word 2 = coffee    Word 2 ≠ coffee
Word 1 = black    139                44,283              R1 = 44,422
Word 1 ≠ black    10,166             178,957,827         R2 = 178,967,993
                  C1 = 10,305        C2 = 179,002,110    N = 179,012,415

On the basis of these tables, chi-squared is calculated as follows:

$$\chi^2 = \frac{N (O_{11} - E_{11})^2}{E_{11} E_{22}}$$

$$\chi^2 = \frac{179{,}012{,}415 \, (139 - 2.6)^2}{2.6 \times 178{,}957{,}690.6} = 7282.3$$

(The expected frequency E11 is shown here rounded to 2.6; the result of 7282.3 follows from the unrounded value of about 2.557.)
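The contingency tables and the chi-squared calculation for black coffee can be reproduced with the following Python sketch. The function names and argument order are assumptions made for the example. The observed table is derived from the pair frequency, the two word frequencies, and the corpus size; the expected cells are row total times column total divided by N; and the statistic uses the simplified 2 × 2 form N(O11 − E11)² / (E11 E22).

```python
def contingency_tables(o11, freq_word1, freq_word2, n):
    """Build observed and expected 2 x 2 tables for a word pair."""
    r1, r2 = freq_word1, n - freq_word1      # row totals
    c1, c2 = freq_word2, n - freq_word2      # column totals
    o21 = c1 - o11
    observed = [[o11, r1 - o11], [o21, r2 - o21]]
    expected = [[r * c / n for c in (c1, c2)] for r in (r1, r2)]
    return observed, expected


def chi_squared(o11, freq_word1, freq_word2, n):
    """Simplified Pearson chi-squared for a 2 x 2 table."""
    observed, expected = contingency_tables(o11, freq_word1, freq_word2, n)
    e11, e22 = expected[0][0], expected[1][1]
    return n * (observed[0][0] - e11) ** 2 / (e11 * e22)


observed, expected = contingency_tables(139, 44_422, 10_305, 179_012_415)
chi2 = chi_squared(139, 44_422, 10_305, 179_012_415)

print("observed:", observed)
print("expected:", [[round(e, 1) for e in row] for row in expected])
print(f"chi-squared = {chi2:.1f}")
```

Because the sketch keeps the expected values unrounded, its result matches the published figure of about 7282.3 rather than the value that the rounded 2.6 would give.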





All procedures seem to have some drawback, and chi-squared is known to be inaccurate when small numbers are involved. Dunning (1993) therefore recommends using the log-likelihood ratio instead, which is more robust at lower frequencies. Log-likelihood also makes use of the contingency tables described above, and is calculated as follows (this version of the equation comes from Evert (2004)):

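$$\text{log-likelihood} = 2 \sum_{ij} O_{ij} \ln \frac{O_{ij}}{E_{ij}}$$

$$2 \times \left[ 139 \ln\!\left(\frac{139}{2.6}\right) + 10{,}166 \ln\!\left(\frac{10{,}166}{10{,}302.4}\right) + 44{,}283 \ln\!\left(\frac{44{,}283}{44{,}419.4}\right) + 178{,}957{,}827 \ln\!\left(\frac{178{,}957{,}827}{178{,}957{,}690.6}\right) \right] = 840.1$$

(As with chi-squared, the expected values are shown rounded; the result of 840.1 follows from the unrounded expected frequencies.)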
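The log-likelihood calculation can be sketched in Python as follows. The function name and argument order are assumptions made for the example; the formula is the one given above (twice the sum, over the four cells, of the observed value times the natural log of observed over expected), with expected cells computed as row total times column total divided by N.

```python
import math


def log_likelihood(o11, freq_word1, freq_word2, n):
    """Log-likelihood ratio for a word pair from a 2 x 2 contingency table."""
    rows = (freq_word1, n - freq_word1)      # R1, R2
    cols = (freq_word2, n - freq_word2)      # C1, C2
    o21 = freq_word2 - o11
    observed = [[o11, freq_word1 - o11], [o21, rows[1] - o21]]

    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = rows[i] * cols[j] / n        # expected cell frequency
            if o > 0:                        # a zero cell contributes nothing
                total += o * math.log(o / e)
    return 2 * total


ll_black_coffee = log_likelihood(139, 44_422, 10_305, 179_012_415)
print(f"log-likelihood = {ll_black_coffee:.1f}")
```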

As noted above, the rationale behind all of these statistics is that of testing the null hypothesis that a word combination appears together no more frequently than we would expect by chance alone. If we take this conception literally, we can consult tables of critical values to determine a confidence level in rejecting the null hypothesis. A t-score of greater than 2.576, for example, would enable us to reject the null hypothesis with 99.5% confidence (Manning and Schütze, 1999: 164). However, Durrant (2008) cautions us to note exactly what is meant by a word pair’s being more frequent than we would expect ‘by chance’. The calculation of expected occurrence in the above formulas is based on a model in which words are drawn entirely at random, as if from a hat. But language is far more regular than a ‘random word generator’ (Manning and Schütze, 1999: 166). Semantics, grammar, discourse organization, and real-world occurrences all mold and constrain the construction of language, and so it is not uncommon for word combinations to co-occur ‘more frequently than random’, regardless of collocational relations.

With this in mind, levels of ‘statistical significance’ are probably not best used as cut-off points in identifying collocations. Rather, they are much more useful in ranking word combinations according to their relative collocational strength (Durrant, 2008; Manning and Schütze, 1999: 166; Stubbs, 1995: 33). In fact it is difficult to set minimum scores for the identification of collocations, although a figure of 2 has been suggested for t-score (Hunston, 2002). However, one interesting study looked at this issue by comparing several strength of association measures and determining how well they predicted word association results. Durrant (2008) calculated the minimum score from each measure for predicting which word pairs would be linked on association lists (i.e. whether one word pair member as stimulus would elicit the other word pair member as a response in association tasks). He did this using all association responses (including any association given by at least one respondent), and also with only more ‘robust’ associations (those given by at least 5% of the respondents). The minimum scores are given in Table 3.1, and provide some initial guidance as to the lower end of the values which could be acceptable for identifying collocations, at least those that have psychological (word association) correlates.

Values for identifying collocations that describe textual collocation, but not psychological association, may be different, as collocation cannot always be equated with association. Of course they are often linked, but there are also many associations which are not strong collocations. Reanalyzing two widely-used sets of association norms, Hutchison (2003: 787) finds only 15.7% of associates to be ‘phrasal associates’. Similarly, Fitzpatrick (2006) found only about a quarter of native-speaker word associations to be based on collocation. This merely reinforces the point that formulaic language (including collocation) is difficult to unambiguously identify, and that few firm selection criteria currently exist.

Table 3.1 Scores at which each measure becomes informative about collocations which are also word associations

Measure                      All associations(a)   Robust associations

Raw frequency                16                    16
Z-score                      38                    56
T-score                      3.9                   4.2
Chi-squared                  1,520                 3,112
Log-likelihood               60                    142
MI(b)                        3.7                   5.0
Conditional probability(b)   0.21                  .056

(a) Occurrences per 100 million words.
(b) See discussion of these statistics below.
(Adapted from Durrant, 2008).



3.3 Strength of association – mutual information

The other strength of association approach is mutual information (MI) (Church and Hanks, 1990). It employs the following formula:

$$\mathrm{MI} = \log_2 \frac{O}{E}$$

For black coffee, the figures look like this:

$$\mathrm{MI} = \log_2 \frac{139}{2.6} = 5.8$$

Although the formula also compares expected and observed occurrences of word pairs, the results are often quite different from the hypothesis testing statistics. Mutual information can be thought of as a ‘measure of how much one word tells us about the other’ (Manning and Schütze, 1999: 178). That is, when we encounter one part of a word pair which has a high mutual information score, there is a high probability that the other member of the pair is nearby. This is fundamentally different from the hypothesis testing methods described above. Clear (1993: 279–282) lucidly spells out this difference: ‘MI is a measure of the strength of association between two words’, while hypothesis testing methods are measures of ‘the confidence with which we can claim there is some association’ (emphases in original). The practical effect of this is that different types of word pairs tend to be retrieved by the two methods.
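As a hedged illustration of the arithmetic, the MI calculation can be sketched in a few lines of Python. The function name mutual_information is an assumption made for this example; the observed and expected values are the black coffee figures quoted above. Because the expected value is quoted to only one decimal place, the recomputed score (about 5.74) differs slightly from the published 5.8.

```python
import math


def mutual_information(observed: float, expected: float) -> float:
    """Return MI = log2(O / E) for one word pair."""
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected counts must be positive")
    return math.log2(observed / expected)


# Black coffee figures quoted in the text: O = 139, E = 2.6.
black_coffee_mi = mutual_information(139, 2.6)
print(f"{black_coffee_mi:.2f}")
```

Because the score is a logarithm of a ratio, each additional point of MI corresponds to a doubling of the observed-to-expected ratio.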

This can be illustrated with a word pair having a high MI score: tectonic plates. This pair is not particularly frequent in general English, and occurs 73 times in the New Longman Corpus. (Note this figure has been pushed higher than one might expect by the science component of the corpus.) However, crucially, 73 out of the 214 occurrences of tectonic in the corpus appear with the word plate(s). This means the two words are strongly associated, because where we find tectonic, we are also likely to find plate(s). Conversely, every day is typical of a pair with a high score based on the hypothesis testing statistics. The pair appears together much more frequently (5,991 occurrences) than tectonic plates, and thus the connection is more reliable, even though the strength of association between the two words is weaker. In short, MI tends to highlight word pairs which may have relatively low frequency, but which are relatively ‘exclusive’ to one another; hypothesis testing methods highlight items which may be less closely associated but which occur with relatively high frequency.



A commonly cited threshold for statistical significance for MI is 3 (e.g. Hunston, 2002; Evert and Krenn, 2001). However, it is important to note that, like z-score, MI uses ‘expected occurrence’ as its denominator. It can therefore give very high scores for collocations which include low-frequency words, even if the total number of occurrences of the collocation is very low. To safeguard against accepting word pairs as strong collocations on the basis of minimal evidence, MI needs to be used in conjunction with a minimum frequency threshold³ (e.g. Church and Hanks, 1990: 24). Minimum figures of 3–5 occurrences have been suggested (e.g. Church and Hanks, 1990; Clear, 1993; Stubbs, 1995). Another suggestion is that MI be used in conjunction with a minimum t-score threshold to ensure the MI collocations are valid (Church and Hanks, 1990), with Stubbs (1995) suggesting a t-score cut-off of 2.
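The safeguard described above can be sketched as a simple filter. This is only an illustration: the function name, the default thresholds (an MI of at least 3 and at least 5 co-occurrences) and the candidate counts are assumptions chosen for the example, not figures from any corpus.

```python
import math


def strong_collocations(candidates, min_mi=3.0, min_freq=5):
    """Keep pairs whose MI and raw co-occurrence count both reach a threshold.

    candidates maps a word pair to (observed, expected) co-occurrence counts.
    """
    kept = {}
    for pair, (observed, expected) in candidates.items():
        if observed < min_freq:
            continue  # too little evidence, however high the MI
        mi = math.log2(observed / expected)
        if mi >= min_mi:
            kept[pair] = round(mi, 2)
    return kept


# Invented counts for illustration only.
toy_candidates = {
    ("rare", "pairing"): (2, 0.01),      # very high MI, but only 2 occurrences
    ("frequent", "pairing"): (400, 20),  # MI = log2(20), about 4.32
    ("weak", "pairing"): (50, 30),       # MI below 3
}
print(strong_collocations(toy_candidates))
```

With the frequency floor removed (min_freq=1), the rare pair would pass on its MI score alone, which is the situation the minimum frequency threshold is meant to prevent.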

It is debatable which strength of association measure is the best, although t-score and MI are probably the ones most commonly used. The choice will depend on what type of collocation you wish to work with. Hypothesis testing statistics like t-score tend to highlight frequent collocations made up of relatively frequent words (e.g. fresh air), while MI score tends to highlight collocations made up of less frequent words, but those with stronger and more exclusive links (e.g. cloven hoof).

3.4 A directional measure of collocation

The measures of collocation discussed above are all non-directional, in the sense that it makes no difference which part of the word pair is taken as node and which as collocate. However, for some collocations, directionality may be a feature. Stubbs (1995: 35) points out that though the pair kith and kin has the same score on all of the measures regardless of which word is taken as the node, the relationship between the two words is clearly not symmetrical: kith predicts kin with virtually 100% certainty, whereas kin can stand alone. The same asymmetry is found with to and fro, and starlit night. The non-directionality of the above measures may not be problematic in most corpus research. After all, flexibility is a feature of many (most?) collocations: suspicious circumstances; the circumstances were suspicious. The situation is different when identifying collocations for use in psycholinguistic experiments. In this case, one member of a word pair may well prime the other better than vice versa. For example, it seems highly likely that any associative links running from kith to kin will be stronger than those running in the opposite direction, and this could have an effect in experiments using techniques like word associations or timed judgements. It would therefore be useful to have a statistic which could describe any directionality bias.

Durrant (2008) proposes one potential procedure for calculating the conditional probability of one word, given the other. He suggests dividing the frequency of the word pair by the frequency of the node. Since the conditional probabilities will usually be rather small, this figure is multiplied by 100 for ease of reading:

$$P(\text{Collocate Word} \mid \text{Node Word}) = \frac{\text{Frequency of the Word Pair}}{\text{Frequency of Node}} \times 100$$

For kith and kin, the New Longman Corpus frequencies are kith (17), kin (994), and kith and kin (16). Thus, the conditional probability of the collocate kith, given the node kin, is:

$$\frac{16}{994} \times 100 = 1.61$$

Conversely, the conditional probability of the collocate kin, given the node kith, is:

$$\frac{16}{17} \times 100 = 94.12$$

This provides quantitative evidence supporting our previous impression that the associative links running from kith to kin will likely be much stronger than those running from kin to kith.
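The two calculations can be reproduced with a short Python sketch. The function name conditional_probability is an assumption for this example; the frequencies are the New Longman Corpus figures quoted above.

```python
def conditional_probability(pair_freq: int, node_freq: int) -> float:
    """Return P(collocate | node) as a percentage: pair / node * 100."""
    if node_freq <= 0:
        raise ValueError("node frequency must be positive")
    return pair_freq / node_freq * 100


# Frequencies quoted in the text: kith (17), kin (994), kith and kin (16).
kith_given_kin = conditional_probability(16, 994)  # node = kin
kin_given_kith = conditional_probability(16, 17)   # node = kith
print(round(kith_given_kin, 2), round(kin_given_kith, 2))
```

Swapping the node reverses the direction of the measure, which is what distinguishes it from the non-directional statistics discussed earlier.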

3.5 Formulaic language with open slots

Up until now we have looked at procedures for analyzing the fixed elements of formulaic language. However, as Moon (1997, 1998) shows, there is actually much about formulaic language that is not fixed. One of the most important types of non-fixed formulaic language is the ‘open slot’ variety. This type combines a number of words which are frozen, but also allows variety in one or more slots. These slots can be filled with various words or phrases, but they also involve semantic constraints. For example, the phrase a(n) ——— ago is usually completed with a word or words which have the meaning of ‘time’, e.g. hour, year, very long time. The phrase is a common way of expressing a particular meaning, i.e. signifying a point in the past. The useful thing about this phrase is that the slot allows the ‘time point’ to be adjusted, and so is maximally useful in describing many different temporal settings.
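A surface search for this kind of frame can be sketched with a regular expression. The sketch below is an illustration under stated assumptions: the pattern, the limit of three words in the slot, and the sample sentence are all invented for the example. It finds candidate slot fillers only; it does not check the semantic constraint (that the filler denotes time), which still needs manual inspection.

```python
import re

# Frame "a(n) ___ ago" with one to three words in the open slot.
AGO_FRAME = re.compile(r"\ban?\s+((?:\w+\s+){0,2}\w+)\s+ago\b", re.IGNORECASE)


def slot_fillers(text: str) -> list:
    """Return the word sequences that fill the slot in 'a(n) ___ ago'."""
    return AGO_FRAME.findall(text)


sample = "I saw her an hour ago, and we first met a very long time ago."
print(slot_fillers(sample))
```

The fixed elements anchor the search, while the slot is left open to any short word sequence, so the output is a list of candidates to be sorted by hand.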

We can see the same thing in a longer formulaic item with two slots: ——— thinks nothing of ———. Again we find that the slots are semantically constrained, with the first being filled by an ‘animate object’, and the second with some ‘activity which is surprising, unexpected, or unusual’. It is commonly used to express the meaning ‘someone habitually does something which we would not expect’. It is very flexible in allowing us to express this underlying meaning about a wide range of situations:

Diane thinks nothing of running 20 miles before breakfast.

He thinks nothing of sleeping 15 hours a day.

The body builder thinks nothing of having eight raw eggs for a post-workout snack.

The semantic constraints of each slot are easy to see if they are violated, as the results are either amusing or strange:

The house thinks nothing of standing on the side of a cliff. (not animate)

She thinks nothing of eating lunch every day. (not very unusual)

In teaching seminars (although not in his published work), Sinclair called these flexible open slot phrases variable expressions and argued that they are very widespread, simply because they realize semantic concepts that people commonly wish to use, and because their flexibility allows their use in a wide range of contexts. They may well be a major (perhaps the major) component of language, but research is still embryonic. (See Sinclair, 2004, for some analyses of variable expressions.) The main problem with researching variable expressions is that their variable slots make them difficult to identify and describe. Unlike N-grams, where computers can be told to automatically extract contiguous strings of various lengths, the slots in variable expressions can be filled with a wide variety of different words. As computers cannot currently search for semantic categories (such as ‘person’ or ‘unusual activity’), this type of analysis needs to be done manually. It can take hours to identify and describe a single variable expression, and this can severely limit the scope of any study. An alternative is to develop semantic/functional tagging of corpora, so that concordancers can use these tags in their searches. This approach is now being pursued with the International Corpus of Learner English by Sylviane Granger (personal communication). (See also Fellbaum, 2007.)

There is one automatized approach which has considerable promise. It is called ConcGrams, and is designed to find ‘all of the permutations of constituency variation and positional variation generated by the association of two or more words’ (Cheng, Greaves, and Warren, 2006: 414). That is, the program searches for the patterning which forms around a number of specified words, rather than only a single node word. Furthermore, the patterning does not have to be contiguous, but rather within a preset span (e.g. +/– 5 words). The ConcGram procedure, being automatized, has real potential for exploring the extent of variable expression in language. This is especially true as it is available on the web and is now part of a mainstream concordancing software package, Mike Scott’s WordSmith Tools (see Section 6.3).
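The idea of searching for co-occurring words regardless of order or adjacency can be illustrated with a toy sketch. This is not the ConcGram algorithm itself, whose internals are not described here; the function name, the whitespace tokenization, the span of two words, and the sample sentence are assumptions made purely for illustration.

```python
def cooccurrences(tokens, word_a, word_b, span=5):
    """Find positions where word_a and word_b occur within `span` tokens
    of each other, in either order and with or without intervening words."""
    hits = []
    for i, token in enumerate(tokens):
        if token != word_a:
            continue
        start = max(0, i - span)
        end = min(len(tokens), i + span + 1)
        for j in range(start, end):
            if j != i and tokens[j] == word_b:
                hits.append((i, j))
    return hits


text = ("police found suspicious circumstances and later said "
        "the circumstances were suspicious")
tokens = text.lower().split()
print(cooccurrences(tokens, "suspicious", "circumstances", span=2))
```

Each hit is a pair of token positions (word_a, word_b), so both the contiguous order (suspicious circumstances) and the reversed, non-contiguous order (the circumstances were suspicious) are retrieved by the same search.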

Variable expressions may be part of even more widespread patterning in language. Extensive phraseology has been a prominent feature of recent accounts of the interrelationship between lexis and grammar. For example, Hunston and Francis (2000) propose a description of language in terms of patterns. They give the example of the word matter, which is found often to occur in the expression ‘a matter of -ing’ (as in a matter of developing skills; a matter of learning a body of information; a matter of being able to reason coherently). The structure ‘a ——— of -ing’ may therefore be described as a characteristic pattern of matter. It is possible that this type of patterning (combining some fixed and some open components) will turn out to be the major feature of the lexico-grammatical system of language organization.

3.6 Processing formulaic language

Formulaic language is very common in language overall, but one might ask why it is so widespread. The answer must be that it achieves some useful purpose in communication. Several of these purposes can be discerned. First, the syntagmatic aspects of phraseology help to shape, define, and enhance meaning, following Firth’s (1935) proposal that some of a word’s meaning is derived from the sequences in which it resides. This can be shown with the word border. In isolation it means ‘the edge or boundary of something’. It might also be assumed that the various inflections of border (bordered, bordering, borders) carry a similar meaning, but this would be wrong.

Schmitt (2005) looked at the behavior of the border lemma⁴ in the 100 million word British National Corpus and came up with the following figures:

Word form    BNC frequency    X + on       Figurative sense

border       8,011            89 (1%)
borders      2,539            84 (3%)
bordering    367              177 (48%)    71%
bordered     356              99 (28%)     75%

From these figures we can see that border and borders (mainly noun forms) are the most frequent members of the family. This is not at all surprising as most word families have more and less frequent members. However, once we put the words into phrases (in this case by adding the preposition on), the behavior changes dramatically. Only 1–3% of the cases of border and borders occur in combination with on, but about one-quarter of the occurrences of bordered do, as do almost one-half of the occurrences of bordering. Clearly there is a strong tendency for bordered and bordering to occur in a pattern with on. But the patterning involves not only the combination of the words; it also affects the meaning. Whereas border and borders almost always refer to the expected or literal meaning of ‘edge or boundary’ (even when in combination with on), in about three-quarters of the cases bordering on and bordered on refer to some ‘figurative’ meaning not to do with edges or boundaries. In fact, when we look at concordance lines from the BNC, we find quite a different usage:

– His passion for self-improvement bordered on the pathological.
– But his approach is unconscionable, bordering on criminal.

For further evidence of this usage, here are some other words which occur to the right of bordered/ing on:

a slump                      arrogance       chaos
a sulk                       austerity       conspiracy
acute alcoholic poisoning    bad taste       contempt
antagonism                   blackmail       cruelty
apathy                       carelessness    cynicism

There is clearly a trend here, and I would argue that it stems from an underlying variable expression that looks something like this:

SOMETHING/SOMEONE (be) bordered/bordering on AN UNDESIRABLE STATE (OFTEN OF MIND)

The point to take from all this is that the lexical patterning is intrinsically linked to meaning, and in this case, changes the meaning of border from ‘boundary’ to ‘nearing an undesirable state’.
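The proportions reported for the border lemma can be recomputed from the raw counts. The short sketch below uses the BNC figures quoted above; the variable names are assumptions for the example.

```python
# (BNC frequency, occurrences in combination with "on") for each form,
# as quoted in the text.
border_counts = {
    "border": (8011, 89),
    "borders": (2539, 84),
    "bordering": (367, 177),
    "bordered": (356, 99),
}

# Percentage of each form's occurrences that combine with "on".
on_share = {
    form: round(with_on / total * 100)
    for form, (total, with_on) in border_counts.items()
}
print(on_share)
```

Setting the counts side by side in this way makes the contrast explicit: the two most frequent forms rarely combine with on, while the two least frequent forms do so in a large share of their occurrences.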

Phraseology also serves to separate synonyms. Although their underlying meaning is similar, near synonyms like sheer, pure, complete, utter and absolute can be distinguished in terms of their typical collocates (Partington, 1998: Chapter 2). Similarly, Hoey (2005: Chapter 5) shows how the different meaning senses of polysemous words are systematically distinguished by their characteristic co-occurrences, and how violation of these distinct preferences may lead to ambiguity or humor.

However, the main reason for widespread formulaicity must be that formulaic language typically is attached to common meanings or functions which people need to use. As we have seen, formulaic language is tightly connected to functional and transactional language use and much of the communicative content of language is tied to these phrasal expressions. As such, they ease the cognitive burdens of language production and comprehension; i.e. they represent an easy-to-employ and easy-to-understand conventionalized way of realizing real-world needs.

This ‘easiness’ has often been asserted since Pawley and Syder (1983) and Kuiper and Haggo (1984) made the case for the notion that formulaic sequences offer processing efficiency because single memorized units, even if made up of a sequence of words, are processed more quickly and easily than the same sequences of words which are generated creatively. In effect, the mind uses an abundant resource (long-term memory) to store a number of prefabricated chunks of language that can be used ‘ready-made’ in language production. This compensates for a limited resource (working memory), which can potentially be overloaded when generating language on-line from individual lexical items and syntactic/discourse rules.

The case for this has always looked convincing, but it is only recently that the assertion has been put to empirical test. There is now considerable converging support for the notion that formulaic language provides processing advantages over creatively generated (i.e. non-formulaic) language. Research into the processing of idioms (e.g. Gibbs, Bogdanovich, Sykes, and Barr, 1997) provides evidence that L1 readers quickly understand formulaic sequences in context and that they are not more difficult to understand than literal speech. Formulaic sequences are read more quickly than non-formulaic equivalents, as shown by eye-movement studies (Underwood, Schmitt, and Galpin, 2004; Siyanova, Conklin, and Schmitt, under review) and self-paced reading tasks (Conklin and Schmitt, 2008). Grammaticality judgements of formulaic items were both faster and more accurate than judgements for matched non-formulaic control strings (Jiang and Nekrasova, 2007).

Finally, looking at actual language use in the real world, Kuiper (1996, 2004) found that the speech of ‘smooth talkers’ (people who need to produce fluent speech under severe time pressure, such as auctioneers and sports announcers) was largely formulaic in nature. This mirrors findings by Dechert (1983), who found that the spoken output of a German learner of English was smoother and more fluent when she used formulaic language. These formulaic sequences were so useful in providing a platform for more fluent and accurate output, that Dechert called them ‘islands of reliability’, suggesting that they may anchor the processes necessary for planning and executing speech in real time.

3.7 Acquisition of formulaic language

The learning of individual words is incremental and each word has its own particular learning burden (Schmitt, 2000; Nation, 1990), and there is no reason to believe that formulaic language is any different in this respect. This would suggest that many formulaic sequences are partially known for a number of exposures until the point where they become mastered.



While some may be learned quickly as wholes, especially short salient ones like Go Away!, there are good arguments for why some formulaic sequences are not learned in an ‘all-or-nothing’ manner. Some first language (L1) acquirers seem to acquire an initial phonological mapping of formulaic sequences proceeding from the whole to the individual parts, but with some elements still incompletely grasped, especially the unstressed phonemic constituents (Wray, 2002, Chapter 6). In these cases, the formulaic sequences are learned over time, with the later stages of acquisition consisting of ‘filling in’ the gaps in the initial incomplete rendering of the sequence. Likewise, some of the component words in the formulaic sequence, as well as the syntactic structure, may not be known initially either. Peters (1983) suggests that these elements may be later extracted from the formulaic sequence through a process of segmentation (see also Myles, Hooper, and Mitchell, 1998).

Another way formulaic sequences are learned over time involves the flexible slots in variable expressions. If the formulaic sequences are initially acquired with these slots as part of the structure, one might expect that it would take longer to learn the appropriate language insertions for these slots than to learn the fixed elements of the sequence. Alternatively, if the slots are created when paradigmatic variation is noticed at one location in a previously fully-fixed string, then this learning is also incremental in the sense that a fixed formulaic sequence must first be acquired before it is analyzed to form a formulaic sequence with slots. Moreover, shorter formulaic sequences can be combined together into longer and more complex formulaic sequences (Peters, 1983: 73), which means that the component formulaic sequences need to be learned as the initial step to acquiring the subsequent formulaic sequence.

Certainly, some L1 acquirers do learn and use formulaic sequences before they have mastered the sequences’ internal makeup. Moreover, the acquisition of formulaic sequences might depend to some extent on whether children are referential or expressive learners, that is, whether they are ‘system learners’ more than they are ‘item-learners’ (Cruttenden, 1981) (see also Brown, 1973; Peters, 1983). Nelson (1973) found that children who had referential preferences (naming things or activities and dealing with individual word items) usually learned more single words, particularly nouns. Conversely, children who had more expressive tendencies (having interactional goals; focusing on the social domain) were more likely to learn whole expressions which were not segmented. The reason for these preferences may be psycholinguistic in nature (Bates and MacWhinney, 1987), or may only reflect what the child ‘supposes the language to be useful for’: predominantly naming things in the world or engaging in social interaction (Nelson, 1981: 186). It may also reflect the input a child receives: games for naming things in the world or social control clumps such as ‘D’ya wanna go out?’ (Nelson, 1981). Regardless of the underlying reason, there seems to be a link between the need and desire to interact and the use of formulaic sequences.

In L2 acquisition, formulaic sequences are also relied on initially as a quick means to be communicative, albeit in a limited way. This can lead to quicker integration into a peer group, which can result in increased language input. Wong Fillmore (1976) found this was the case with five young Mexican children trying to integrate into an English-medium school environment. She identified eight strategies the children used, and at least three of them directly involved formulaic language:

● Give the impression, with a few well-chosen words (phrases), that you speak the language.
● Get some expressions you understand, and start talking.
● Look for recurring parts in the formulas you know.

The use of formulaic sequences enabled the realization of these strategies even though the children’s language capabilities were quite limited. Furthermore, the use of formulaic sequences to facilitate language production is not restricted to L2 children. Schmidt’s (1983) study of ‘Wes’ is a good example of the phenomenon in L2 adults; Wes’s speech is filled with formulaic language as a means of fulfilling his desire to be communicative, but not necessarily accurate. Additionally, Adolphs and Durow (2004) found that the amount of social integration into the L2 community (with presumably a commensurate need to be communicative in the L2) was linked to the amount of formulaic language produced in the speech of L2 postgraduate students.

But formulaic sequences may provide language learners with more than an expedient way to communicate; they might also facilitate further language learning. For L1 learners, it has been proposed that unanalyzed sequences provide the raw material for language development, as they are segmented into smaller components and grammar (Peters, 1983). For example, when a child realizes that the phrase I wanna cookie (previously used as a holistic unit) is actually I wanna + noun, he or she gains information about the way syntax works in the language, as well as the independent new word cookie. Wray (2000) looked at a number of studies and concluded that some children segment formulaic sequences into smaller units, and in doing so, advance their grammatical and lexical knowledge. It seems that formulaic sequences serve the same purpose for L2 learners (e.g. Bardovi-Harlig, 2002; Myles, Hooper, and Mitchell, 1998). Moreover, there is little doubt that the automatic use of acquired formulaic sequences allows chunking, freeing up memory and processing resources (Kuiper, 1996, and Ellis, 1996, who explores the interaction between short-term and long-term phonological memory systems). These can then be utilized to deal with conceptualising and meaning, which must surely aid language learning.



L1 children are exposed to many formulaic sequences in their input, but how do they decide what to analyze and what to keep at the holistic level? Wray (2002) suggests that a ‘needs-only analysis’ is the mechanism. Rather than segmenting every sequence into the grammar system, children will operate with the largest possible unit, and only segment sequences when it is useful for social communication. Thus the segmentation process is driven by pragmatic concerns (communication), rather than an instinctive urge to segment in order to push grammatical and lexical acquisition. The default would be to not analyze, and to retain holistic forms. Thus children maintain many formulaic sequences into adulthood, even though the components of those sequences are likely to be stored individually as well (perhaps being acquired from the segmentation analyses of other formulaic sequences). This suggests that dual storage is the norm.

Of course, relying on holistic versus analytical approaches to language acquisition and use is not an either/or proposition, and children will use both approaches in varying degrees. However, Wray and Perkins (2000) and Wray (2002) suggest that the relative ratios between the approaches may change according to age. During Phase 1 (birth to around 20 months), the child will mainly use memorized vocabulary for communication, largely learned through imitation. Some of this vocabulary will be single words, and some will consist of sequences. At the start of Phase 2 (until about age 8), the child’s grammatical awareness begins, and the proportion of analytic language compared to holistic language increases, although with overall language developing quickly in this phase, the amount of holistically-processed language is still increasing in real terms. During Phase 3 (until about age 18), the analytic grammar is fully in place, but formulaic language again becomes more prominent. ‘During this phase, language production increasingly becomes a top-down process of formula blending as opposed to a bottom-up process of combining single lexical items in accordance with the specification of the grammar’ (Wray and Perkins, 2000: 21). By Phase 4 (age 18 and above), the balance of holistic to analytic language has developed into adult patterns.

Quote 3.2 Wood on the possible double role of formulaic sequences in language acquisition

They are acquired and retained in and of themselves, linked to pragmatic competence and expanded as this aspect of communicative ability and awareness develops. At the same time, they are segmented and analyzed, broken down, and combined as cognitive skills of analysis and synthesis grow. Both the original formulas and the pieces and rules that come from analysis are retained.

(2002: 5)

The course of formulaic sequence development is more difficult to chart in L2 learners. Typically there is early use of formulaic sequences, often after a silent period. As learners’ proficiency improves, there is the reasonable expectation of language which is more accurate and appropriate. In natives, this is achieved to a large extent through the use of formulaic sequences. Unfortunately, the formulaic language of L2 learners tends to lag behind other linguistic aspects (Irujo, 1993), but this is not so much a case of the amount of formulaic language use, but rather a lack of native-like diversity. This is probably largely due to a lack of sufficient input. In 1986, Irujo suggested that one specific class of formulaic language (idioms) is often left out of speech addressed to L2 learners, leading to a lack of idioms in learner output. More recently, Durrant and Schmitt (2009) show that a more general type of formulaic language (collocations) seems to be tuned to frequency, with L2 learners producing frequent, but not infrequent, collocation pairs. Furthermore, Siyanova and Schmitt (2008) showed that spending a year in an English-speaking country (with presumably a great increase in the amount of L2 input) led to better intuitions of collocation.

However, it may not be just the amount of input that is crucial, but also the quality. Siyanova and Schmitt (2007) found that the amount of exposure to native-speaking environments did not have an effect on the likelihood of using multi-word verbs. This, however, might be explained by Adolphs and Durow’s (2004) findings that sociocultural integration was the key to their case study learner’s acquisition. This suggests that it may not be exposure per se that is important, but the kind of high-quality exposure that presumably occurs in a socially-integrated environment.

Quote 3.3 Dörnyei, Durow, and Zahran on the sociocultural aspects of L2 formulaic language acquisition

Success in the acquisition of formulaic sequences appears to be the function of the interplay of three main factors: language aptitude, motivation and sociocultural adaptation. Our study shows that if the latter is absent, only a combination of particularly high levels of the two former learner traits can compensate for this, whereas successful sociocultural adaptation can override below-average initial learner characteristics. Thus, sociocultural adaptation, or acculturation, turned out to be a central modifying factor in the learning of the international students under investigation.

(2004: 105)

The nature of formulaic language and its acquisition is likely to become of ever-greater interest as the field turns to more pattern-based models of language acquisition (e.g. pattern grammar (Hunston and Francis, 2000) and construction grammar (Tomasello, 2003)), which posit that the human facility for language learning is based on the ability to extract patterns from input, rather than being under the guidance of innate principles and parameters which determine what aspects of grammar can and cannot be acquired (see Ellis, 1996, 2002; SSLA 24, 2). This line of thinking suggests that we learn the letter sequences which are acceptable in a language (the consonant cluster sp can be word-initial in English, but hg cannot) simply by repeatedly seeing sp at the beginning of words, but not hg. This learning is implicit, and may not be amenable to conscious metalinguistic explanation. Of course, learners may eventually reach the point where they can declare a ‘rule’ for this consonant clustering, but the rule is an artefact of the pattern-based learning, rather than the underlying source of learning.
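The kind of distributional evidence involved can be illustrated with a toy count of word-initial letter pairs. This is only a sketch of the counting idea, not a model of implicit learning; the word list and the function name are invented for the example.

```python
from collections import Counter


def initial_clusters(words):
    """Count the first two letters of each word of length >= 2."""
    return Counter(word[:2].lower() for word in words if len(word) >= 2)


# Invented mini word list standing in for language input.
toy_input = ["speak", "spend", "sport", "house", "high", "ghost", "spin"]
counts = initial_clusters(toy_input)
print(counts["sp"], counts["hg"])
```

In this toy input the cluster sp is attested several times at the start of words and hg never is, which is the sort of frequency asymmetry a pattern-based learner is assumed to pick up without any explicit rule.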

This pattern-based learning also works for larger linguistic units, such as how sequences of morphemes can combine to form words (un-question-able, un-reli-able, un-fathom-able). Moving to words, we gain intuitions about which words collocate together and which do not (blonde hair, *blonde paint; auburn hair, but only for women, not men). Many of these collocations must be based solely on associative pairing, because there is often no semantic reasoning behind acceptable/non-acceptable combinations (*blonde paint makes perfectly logical sense). Neither are most collocations likely to be learned explicitly, because they are not normally taught, and even if they are, only possible cases are illustrated, not inappropriate combinations. Longer formulaic strings, which are also based on patterns rather than rules, seem to fit very nicely with such sequence-based models of acquisition as well. Time will tell whether this kind of model best captures the mechanics of formulaic sequence acquisition (and that of language in general), but one thing seems certain. Given the increasingly evident importance of formulaic sequences in language use, convincing explanations of the mechanics of their acquisition must become an essential feature of any model of language acquisition.

3.8 The psycholinguistic reality of corpus-extracted formulaic sequences

We know that formulaic language occurs very frequently in language output, as evidenced by corpus data, and that formulaic language is an important part of overall language processing. This makes the relationship between the two (formulaic sequences extracted from corpora and their psycholinguistic bases in the mind) a very interesting issue. Some scholars believe that collocation is an entirely textual phenomenon, and is not indicative of how language is represented in the mind (e.g. Bley-Vroman, 2002). They believe that collocations arise spontaneously in text as an epiphenomenon of the meaningful use of language in context, rather than being linguistic patterns which can be learned and used. For example, the words dark night occur together simply because nights are dark, and so people will naturally use these words together when speaking about the nighttime. However, given all the evidence for the processing advantages of formulaic language, it is difficult to believe that it does not somehow exist in the mind. For one thing, if all language were generated without any formulaic component, we would not expect the degree of conventionality and repetition that we find in corpus evidence.

So the question is whether the formulaic sequences extracted from corpus data are psycholinguistically real. There is indirect evidence from word association studies that they may be, because collocation is one of the main categories of association response (Aitchison, 2003; Fitzpatrick, 2006). However, to date there have been few studies which have addressed the issue directly. An exception is Schmitt, Grandage, and Adolphs (2004). They identified a number of different types of formulaic sequence and embedded them in a spoken dictation task. Each ‘burst’ of dictation was longer than short-term memory could hold (i.e. 20–24 words), so the respondents were not able to repeat a burst from rote memory. This meant they were forced to reconstruct the language. The researchers assumed that if the formulaic sequences in the bursts were stored holistically, then they would be reproduced intact, with no hesitation pauses or transformations. The results showed that many of the formulaic sequences in the dictation responses did meet this ‘holistic’ criterion, but also that many did not. A sort of continuum of holisticness seemed to emerge. The authors concluded that many of the corpus formulaic sequences were not stored holistically, but that this varied from individual to individual. From this one study, it seems that any particular formulaic sequence extracted from a corpus may or may not be stored holistically by any particular person.

3.9 Nonnative use of formulaic language

We have seen that formulaic language is very widespread in L1 language use (e.g. Biber et al., 1999; Erman and Warren, 2000; Foster, 2001). In other words, native speakers use formulaic sequences a lot. But what about nonnative speakers? There is a widespread feeling that formulaic language is especially problematic for L2 learners, and its lack/misuse is a major reason why L2 output can feel unnatural and nonnative-like, at least in their compositions (most research on formulaic language has focused on written discourse). Research has only partially supported this impression. We can look at nonnative mastery of formulaic language along at least three dimensions: amount of use, accuracy/appropriacy of use, and goodness/speed of the underlying formulaic intuitions. There is a growing literature about the first two dimensions (based largely on learner corpus data; see Magali Paquot’s web-based bibliography – Section 6.6) but only embryonic research on the last. Let us look at each dimension in turn.





Amount of use

It is easy to assume that the problem with nonnatives is that they simply do not use as much formulaic language as natives. This is largely incorrect, although there can be an element of avoidance (Laufer and Eliasson, 1993; Laufer, 2000b). A series of studies⁵ have found that L2 usage depends on which formulaic sequences one is focusing on. It is now clear that nonnatives actually use more of certain favorite formulaic sequences which they know well and tend to overuse as ‘safe bets’, compared to natives (De Cock, 2000; Foster, 2001; Granger, 1998). Conversely, they use fewer of other sequences, presumably because they do not know them as well and are not as confident in their use (e.g. Foster, 2001; Granger, 1998; Howarth, 1998). One type of formulaic sequence which seems to be particularly underused is multi-word verbs. For example, Altenberg and Granger (2001) found that their EFL learners had great difficulty with the verb make, especially the delexicalized uses, such as make a decision and make a claim. This is particularly problematic as high frequency verbs like make, look, and do are used in numerous formulaic sequences. Interestingly though, Granger, Paquot, and Rayson (2006) compared formulaic sequences in a 1 million word native-speaker academic corpus and 1 million words from the ICLE (International Corpus of Learner English) learner corpus and found more cases of overused formulaic sequences than underused ones.

Durrant and Schmitt (2009) go some way in explaining which formulaic sequences are overused and which underused. Using a corpus composed of written academic output from Turkish and Bulgarian university EFL students and a mixed group of international university students studying in the UK, they found that these students tended to use frequent premodifier-noun collocations at a rate similar to native students. (Congruently, Siyanova and Schmitt (2008) found that their nonnatives used adjective-noun collocations in frequencies similar to natives.) These are the kind of collocations which are identified by the hypothesis testing measures which focus on frequency such as t-score (good example, long way, hard work).

However, the nonnatives produced many fewer low-frequency collocations (densely populated, bated breath, preconceived notions), even though these were very strongly linked (the kind identified by MI). Because of their strong ties, and relative infrequency, they are likely to be especially salient for natives, and so their absence in nonnative output is particularly noticeable. The authors conclude that the lack of these ‘MI’ collocations is one key feature which distinguishes native from nonnative production. In terms of acquisition, L2 learners seem to be able to acquire and use the collocations which appear frequently, but do not seem to pick up as many non-frequent collocations, whose individual component words may also be infrequent in themselves. This is highly suggestive of the role of frequency in the acquisition process. This is supported by Ellis, Simpson-Vlach, and Maynard (2008), who found that for natives, it is predominantly the MI of a formula which determines processability, while for nonnatives, it is predominantly the frequency.
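The contrast between frequency-sensitive and strength-sensitive measures can be seen in a short calculation. The sketch below uses the commonly cited textbook formulations of the two statistics: MI as the base-2 logarithm of observed over expected co-occurrences, and t-score as observed minus expected, divided by the square root of observed. It is a simplified illustration only: it ignores collocation span, the function names are hypothetical, and all counts are invented rather than taken from the studies discussed.

```python
import math


def expected_cooccurrence(freq_x, freq_y, corpus_size):
    """Co-occurrences expected by chance if the two words were independent."""
    return freq_x * freq_y / corpus_size


def mi_score(observed, freq_x, freq_y, corpus_size):
    """Mutual information: log2 of observed over expected co-occurrences."""
    return math.log2(observed / expected_cooccurrence(freq_x, freq_y, corpus_size))


def t_score(observed, freq_x, freq_y, corpus_size):
    """t-score: (observed - expected) / sqrt(observed)."""
    expected = expected_cooccurrence(freq_x, freq_y, corpus_size)
    return (observed - expected) / math.sqrt(observed)


CORPUS_SIZE = 1_000_000  # hypothetical corpus size in tokens

# Hypothetical counts: (pair frequency, word 1 frequency, word 2 frequency).
frequent_pair = (2000, 50_000, 20_000)  # two common words, often together
rare_pair = (8, 1000, 1000)             # two rarer words, tightly linked

for label, (obs, fx, fy) in [("frequent", frequent_pair), ("rare", rare_pair)]:
    print(label,
          round(mi_score(obs, fx, fy, CORPUS_SIZE), 2),
          round(t_score(obs, fx, fy, CORPUS_SIZE), 2))
```

With these invented counts, the frequent pair receives the higher t-score while the rare pair receives the higher MI, which mirrors the distinction between collocations like hard work and those like bated breath.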

Accuracy/appropriacy of use

Oppenheim (2000) found that much of the language which her nonnative subjects produced in consecutive speeches on the same topic consisted of the same recurrent word strings, but that most of these were idiosyncratic in comparison to native-speaker norms. Thus, just because L2 learners produce formulaic language, it is not necessarily formulaic in the sense of matching what natives would produce.

Nesselhauf (2003, 2005) gives us some idea of how formulaic language can be ‘nonnative’. She extracted 1072 English verb-noun combinations from 32 essays in the ICLE written by German university students. Almost one-quarter of these collocations were judged to be incorrect; moreover, the L1 was deemed to be an influence in 45% of the errors.⁶ However, the incorrect usage was often the result, not of combining words in an unconventional way, but of using conventional word pairs which are not appropriate (Nesselhauf, 2005). This suggests that the difficulty learners have is not only that of learning which words go together, but also learning how to employ the chunks they know. Therefore, at least for the more frequent collocations, the problem may not be so much in the amount of formulaic language learners use, but in using the formulaic sequences they know appropriately in the right contexts.

Goodness/automaticity of intuitions of formulaic language

So formulaic sequences can be overused, underused, and misused by nonnative writers (most of this research has been based on analysis of written text), but they are definitely used; there is no question of L2 output being devoid of formulaic language. But how good are the nonnative intuitions of this language? There is little research which addresses this; however, three studies found that nonnative intuitions were not as well-developed as native intuitions.

Siyanova and Schmitt (2008) directly compared native and nonnative judgements of the frequency of high-frequency, mid-frequency, and low-frequency adjective-noun collocations on a six-point Likert scale. They also measured how long it took to make these judgements. They found that the natives had fairly good intuitions of the collocation frequency, and that they made their frequency judgements relatively quickly. Compared to these native norms, the nonnatives judged the high-frequency collocations as being lower frequency, and judged the low-frequency collocations as being much higher. Furthermore, natives were able to distinguish the frequency difference between mid- and high-frequency collocations, but the nonnatives as a whole were not. Interestingly though, the nonnatives who spent a year or more in an English-speaking country were able to make this distinction. Also, the nonnatives took much longer to make their frequency judgements. Taken together, Siyanova and Schmitt conclude that the nonnatives’ intuitions were not as developed as the natives’, nor were they as automatized.

Hoffman and Lehmann (2000) elicited native and nonnative speakers’ intuitions regarding 55 collocations from the BNC with high log-likelihood scores (mainly adjective-noun and noun-noun pairs). Respondents were presented with each node in a questionnaire, and were to supply the collocates. On average, the native speakers supplied the ‘correct’ collocate in 70% of cases, which, like the results in Siyanova and Schmitt, indicates relatively good intuitions by the natives. The nonnatives did far less well, achieving an average accuracy of only 34%. This shows a major gap between the native and nonnative intuitions, although in absolute terms, the nonnative results (producing about one-third of the collocates) still indicate considerable knowledge.

Phongphio and Schmitt (2006) found that their 21 Thai university undergraduates were quite confident of their ability to recognize multi-word verbs when listening or reading, but they scored only 55% on a multiple choice test. Indeed, there was little relationship between the self-rating intuitions and multiple choice test percentages, showing that the learners did not know the verbs as well as they thought they did. However, when given a context to guess the meaning of the verbs, the learners were able to produce a Thai definition 75% of the time. This 20 percentage-point increase indicates that the Thai learners were able to use the context relatively successfully to infer the meanings of many of the unknown multi-word verbs. This suggests the usefulness of lexical inferencing as a strategy in dealing with formulaic language.

This poorer intuitive mastery is reflected in learners’ production. While natives tend to resort to formulaic language to get through time-pressurized communicative situations, nonnatives do not seem to make greater use of formulaic language in such cases, either in speech or writing (Foster, 2001; Nesselhauf, 2005). In terms of speech, nonnatives tend to use many recurrent dysfluency markers (such as filled pauses and hesitation markers), although it seems that extensive interaction with native speakers enables them to overcome this (Adolphs and Durow, 2004; De Cock, Granger, Leech, and McEnery, 1998). However, in terms of writing, neither amount of use nor accuracy of collocation appears to increase with time spent in an English-speaking country (Nesselhauf, 2005; Yorio, 1980). So, although a year or more spent in an English-speaking country can lead to better intuitions of collocation (Siyanova and Schmitt, 2008), it seems difficult to extend this into increased production of formulaic language.

Summary

It seems that mastery of formulaic language takes a long time to acquire, and is a hallmark of the highest stages of language mastery. Language testers have picked up on this and often include items which focus on phraseology in their highest level examinations. Formulaic language is an important element of language overall, perhaps the essential element. Research into it is only now gaining momentum, but given its ubiquitousness and demonstrated processing advantages, it looks to be one of the most important areas of enquiry in the applied linguistics field for the foreseeable future. This will become even more so as interest increases in newer approaches to language description which focus on larger lexico-grammatical units, such as pattern grammar and construction grammar. For the moment, there are many important questions about its acquisition and use that remain unanswered. For example, one basic one concerns the size of the formulaic lexicon, i.e. just how many formulaic sequences do natives and nonnatives know (Kuiper et al., 2009; Schmitt, Dörnyei, Adolphs, and Durow, 2004)? Furthermore, the questions posed by Schmitt and Carter in 2004 cover some other key areas which are still awaiting research attention, and which would make excellent topics for PhD research:

1. How are formulaic sequences acquired in naturalistic and formal settings? What is the same/different about learning formulaic sequences in these settings? What is the best way to teach formulaic sequences? Can they be taught at all?
2. What is the relationship between knowledge of formulaic sequences and knowledge of their individual component words?
3. How many exposures are necessary to learn formulaic sequences with various kinds of input? Is it the same as for individual words?
4. What is the nature of attrition of formulaic sequences? Are some elements retained better than others, or is the whole chunk either retained or forgotten?
5. Which elements of a formulaic sequence are most salient? Do formulaic sequences cluster around a key word or core collocation?
6. Are formulaic sequences learned in an all-or-nothing manner?
7. Does giving attention to formulaic sequences increase the chances of their acquisition?




Part 3

Researching Vocabulary




4 Issues in Research Methodology

4.1 Qualitative research

Although this book focuses mainly on quantitative research, it is worth noting the value and uses of qualitative research. Current best research practice is to use multiple measurements to triangulate results from more than one approach in order to achieve more robust findings. Qualitative methodologies can often enhance the information we get from quantitative approaches. For example, interviewing a sample of the participants used in a quantitative study can often provide much rich information which supplements the statistical findings. A good example of this is in the validation study for the Vocabulary Levels Test (VLT), where Schmitt, Schmitt, and Clapham (2001) interviewed a subset of the participants who completed the VLT and asked them in individual interviews about the meanings of the words on the test. In this way, the researchers could confirm whether the words were actually known or not, and this could then be compared to the results on the test.

Such in-depth interviews can also be very informative about self-report data, such as questionnaire responses. Interviewing a sample of informants can tell much about how carefully and accurately they completed their surveys. Similarly, observation can be useful in verifying self-report behavioral data. For example, this is an obvious methodology for confirming the self-report questionnaire data usually used in strategy research, but is hardly ever taken up, leading to serious questions about the veracity of most of the questionnaire results.

Rich qualitative descriptions of vocabulary (such as from case studies) can go some way towards providing an account of how well lexical items are used. They can be particularly useful in studies of productive lexical output, where post hoc statistical analyses of vocabulary (e.g. type token analyses) have often proved to be less than informative. In one such case study, Li and Schmitt (2009) followed a single advanced learner (‘Amy’) and described her acquisition of formulaic sequences. They reported that Amy learned 166 sequences over a ten-month MA course. However, their in-depth interviews and analyses also allowed them to discuss the source of acquisition for these sequences, and Amy’s improvement in her appropriacy of usage with them. This is the kind of information that can often only be acquired from concerted qualitative engagement.

In short, vocabulary researchers should consider what value qualitative methodologies might add to their studies. A number of research manuals provide guidance on qualitative approaches, including Dörnyei (2007), Heigham and Croker (2009), and Wray and Bloomer (2006).

4.2 Participants

One of the most difficult aspects of doing research is often getting subjects. Even if you have an unusually cooperative subject pool, it is important to find a ‘carrot’ for the informants (and often the teachers of student informants), for, without the feeling that there is something ‘in it for them’, it is difficult to maintain motivation and interest of the participants, or even gain access to them in many cases. This is particularly true in studies which require large numbers of subjects, or require longitudinal data. A good example of this is Albrechtsen, Haastrup, and Henriksen (2008), who had to maintain interest in their 140 participants over four four- to five-hour sessions completed within two weeks (i.e. 15 hours and 30 minutes time-on-task plus breaks). Personal conversation with the researchers revealed the full extent of their motivational techniques, including payment, cookies, reminders of each upcoming session, some pleading, plus a party and a lottery for a trip to London! Unsurprisingly, there was still a certain amount of attrition, but the researchers’ efforts kept this to a minimum. This points out the wisdom of starting out with more subjects than you wish to end up with in longitudinal studies, as attrition is inevitable, although it can be managed to some extent if participant motivation is addressed in the initial research design.
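The advice to start with more subjects than you wish to end up with can be turned into a rough planning calculation. The sketch below is one simple way of doing this, assuming the researcher is willing to guess an attrition rate in advance; the function name `recruits_needed` and the figures used are hypothetical and do not come from any study cited here.

```python
def recruits_needed(target_completers, expected_attrition_pct):
    """Smallest starting sample that still leaves the target after attrition.

    Uses integer arithmetic (ceiling division) to avoid rounding surprises.
    """
    if not 0 <= expected_attrition_pct < 100:
        raise ValueError("expected_attrition_pct must be in the range 0-99")
    remaining_pct = 100 - expected_attrition_pct
    return -(-target_completers * 100 // remaining_pct)


# Hypothetical planning figures, not taken from any study cited here.
print(recruits_needed(100, 20))
print(recruits_needed(140, 10))
```

Such a figure is only as good as the attrition estimate behind it, so it is probably best treated as a minimum and combined with the motivational measures described above.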

Meara (1996a) notes that lexical researchers need to carefully consider the number of subjects required for a study. This should be enough to iron out the variation due to individual differences. For psycholinguistic tasks with very precise measurement (Section 2.11), this may require only 10 or 20 subjects. For some statistical procedures (e.g. IRT (Item Response Theory) analysis, factor analysis) it may take hundreds. Even more crucially, the characteristics of the learners need to be considered. The L1 is of paramount importance, as the following example by Meara clearly illustrates:

... different types of language present quite different learning problems to individual learners. Take for instance, the cases of a Dutch speaker, a Spanish speaker, an Arab and a Vietnamese learning English. By and large, the Dutch speaker will find basic English vocabulary easy, since most of it is cognate with items in his own language. He might have problems with less frequent vocabulary, but by the time he gets to that stage, he probably has reached a high level of independence and autonomy anyway. In contrast, a Spanish speaker will generally find basic English vocabulary difficult: it is structurally very different from basic vocabulary in Spanish, and there are few cognates. However, Spanish speakers have a huge latent vocabulary of low frequency English words which are cognate with Spanish items, and this should mean that their ability to acquire new words improves dramatically with their general level of competence in English. The Arab and the Vietnamese speakers have no such help from their L1, and the process of acquiring new words will never get any easier for them. At the same time, however, these two learners will find English vocabulary difficult in different ways because of the way their L1 lexicons are shaped and structured. (1996a: 6)

Just as lexical items have different characteristics which affect their acquisition and use, so do participants. As part of controlling potential confounding factors in a study, it is necessary to consider informant characteristics as well. Perhaps the most obvious one is the informants’ L1. Another learner variable is L2 language proficiency, as acquiring words in a completely new language is quite different from acquiring the same number of words in a language that is moderately well-known. Subjects who are stark beginners will not have any background knowledge of the target language to help them learn. Conversely, more advanced learners will have previous language knowledge which facilitates their learning. Such learners

will already have developed a good feel for the formal aspects of words in the target language. This should reduce the learning burden considerably and make it easier to acquire the target language words, the more proficient the learner is. At the same time, morphological information and comparisons with known words of similar meaning should also make it easier to fix the meaning and form of a target language word. (Meara, 1996a: 5)

Language proficiency also determines to what degree learners can take advantage of any contextualization in language learning tasks or tests. Unless it is carefully adjusted to their level, beginners may not understand much of the language in a contextualized task/test, and so it may offer no more support than a non-contextualized one.

However, lexical acquisition and use can also vary systematically according to several other factors, including: experience, exposure, type of exposure (e.g. academic texts, casual conversation), and type of strategy used (e.g. reading technical manuals in one’s field, talking with native taxi drivers). It is important to think about all these factors in the initial research design stage, so that there is a good match between the research goals and the informant base. In addition, it is useful to gather detailed biodata during the study, as otherwise unexplainable cases of vocabulary variation can sometimes be resolved from this information. For example, the type of previous exposure and strategy preferences might explain why some participants might know more academic and technical vocabulary (exposure mainly from technical written discourse), while others might know much more about casual spoken conversation (strategy of talking to native speakers). From a practical standpoint, the information gathered might turn out to be crucial to the analysis, but even if not, its non-use will not spoil the study.

The above considerations also mean that researchers need to be very careful when generalizing from a group of participants in a study to the wider population of L2 language learners in general. Most often, generalizations will have to be constrained in terms of various learner characteristics mentioned above, most particularly L1 and proficiency.

4.3 The need for multiple measures of vocabulary

We have seen that vocabulary knowledge is a complex construct, and that any single measure of it will give only a very minimal impression of the overall lexical knowledge constellation. This means that good vocabulary research is advantaged by multiple measures of vocabulary, to better capture a wider range of lexical knowledge. This can include facets of vocabulary size, depth, automaticity, and network richness. Of course, it is virtually impossible to measure all of these aspects at the same time. But vocabulary researchers should be committed to using more thorough measures when their research contexts allow, in order to provide a better description of the lexical effects they are exploring.

Quote 4.1 Meara on individual differences in vocabulary acquisition

It seems to me that the question of how much individual variation there is in vocabulary skills really needs to be made a top priority in L2 vocabulary acquisition research.

(1996a: 8)

The need to use multiple measures takes on several aspects. One is describing vocabulary knowledge according to receptive/productive mastery. If using a productive measure, it might be reasonable to assume that a demonstration of productive knowledge also implies receptive mastery, based on research which shows that this is generally true. The real danger is making generalizations in the other direction. Meara (1996a) warns of making generalizations about lexical items being ‘learned’ on the basis of only receptive multiple choice tests, as is very often the case in vocabulary studies. Most receptive multiple choice tests measure only the form-meaning link, and then only at a recognition level. This can probably be considered the most basic initial stage along the incremental learning process, and so should be described as such when this type of item is interpreted. Researchers need to take the ideas in Section 2.8 on board and report clearly the degree of receptive/productive mastery their study is addressing. This caveat about overgeneralization can probably be extended to any receptive test format.

Another reason to include both receptive and productive measures is that they can give quite different impressions of vocabulary knowledge. For example, Groot (2000) describes tests on participants who learned vocabulary with word lists and a computerized vocabulary training program. When tested with receptive translation/definition tests, it was found that word lists led to better learning. However, when tested with a productive cloze format, the computer program led to better results. Thus, each individual test format provided only partial information, and if interpreted by itself, would give a misleading picture of the vocabulary learning which took place. By combining these measures, Groot concluded that it is probably best to use both teaching methods in conjunction. The word list builds on L1 knowledge, and the learning program reinforces this knowledge and goes on to enhance it.

Quote 4.2 Meara on translation as a vocabulary test format

[The studies Meara critiqued] simply ask for the Target Language word to be translated into English, and this means that even in the experiments where words were initially learned in contexts, only the ability to recognise decontextualised words was measured. It is not obvious to me that this measure is a good test of how well vocabulary items have been learned. At best it tests passive recognition skills rather than active acquisition of items ... Testing in this way gives no indication of whether a particular word can be put to active use, or whether some partial knowledge might have been acquired which could facilitate learning in future encounters. Furthermore, this kind of testing gives no indication of how resistant the word might be to forgetting or to confusion with other words, both problems which increase as the number of words to be learned gets larger.

(1996a: 7–8)

Another aspect involves describing lexical knowledge according to the various types of word knowledge. Again, most studies measure only the form-meaning link, and if only one thing can be measured for practical reasons, this is a logical choice, because it is the essential ‘core’ knowledge, without which little constructive use can be made of the lexical item. Furthermore, measuring the form-meaning link makes sense for beginners, who are unlikely to have developed much of the more advanced types of word knowledge for any of the items in their lexicon. In fact, this is true for even more advanced learners and native speakers: any time a new lexical item has just been learned, little more is likely to be known than the form-meaning link. However, for learners beyond the beginning stage, there are likely to be lexical items which are more advanced along the incremental pathway, and form-meaning test items are likely to miss much of the additional knowledge in place. Thus, for more advanced learners (or even some of the lexical items relatively well-known by beginners), it may be appropriate to measure word knowledge types beyond just form and meaning.

It is also useful to note that there is an interaction between word knowledge aspects and receptive/productive mastery. For example, learners may know the form-meaning link of a lexical item productively, but only know its various derivative forms receptively. This makes it important to discuss both receptive/productive mastery and word knowledge when reporting results. Similar kinds of observations can be made concerning knowledge of spoken versus written vocabulary.

Up until now, the discussion has focused on individual items, but based on the ideas in Section 2.4, it may be appropriate to also consider how well lexical items cohere to make up a systematic lexical network (i.e. lexical organization). Likewise, automaticity is a key requirement of real-time vocabulary use. This aspect has generally been ignored in applied linguistic lexical research (although a mainstay in the psycholinguistic approach), and is deserving of attention. Again, we can hypothesize an interrelationship with the other aspects of lexical knowledge (e.g. Henriksen, 1999). Improvement of overall lexical mastery is likely to consist of both declarative knowledge (e.g. the form-meaning link), and the ability to use this knowledge with ever-greater (hopefully) speed and ease. An improvement of mastery may well involve a plateau of knowledge, but an increase in speed of processing. We might speculate¹ that an increase in knowledge would speed up processing by adding connections between the lexical item and the rest of the lexicon. (Or would the additional lexical information slow down the processing initially, until it is integrated?) Measures of only the ‘knowledge’ aspects of lexical mastery will fail to spot any improvements in automaticity. The exception might be if there is a timed element added to the measurement.

The upshot is that different measures tap into different facets of lexical knowledge. When reporting results, it is essential to report exactly what lexical knowledge can be inferred from the item format(s) used, and preferable to also highlight what knowledge cannot be inferred, so that readers have a very clear idea of the extent of lexical knowledge/gain in the study. If all researchers do this, the various studies will be much more comparable, and there will be a much greater chance of subsequent studies being able to build upon previous ones in a logical and coherent manner, in the way Meara (1996a) envisages.

4.4 The need for longitudinal studies and delayed posttests

Although there have been non-treatment studies which sought to describe the development of vocabulary over time (e.g. the research reported in Albrechtsen et al., 2008), most vocabulary acquisition studies involve some form of treatment (various kinds of instruction or input) and then an immediate posttest to determine the effect of the treatment. While this ‘immediate posttest’ information can be informative, it is limited in a number of ways. First, one or a small number of exposures are unlikely to lead to long-term acquisition, and so an immediate posttest has the very real danger of overestimating the degree of durable learning. Similarly, learning is always liable to attrition. In fact, one of the most reliable findings in vocabulary studies is that scores on immediate posttests almost inevitably drop when later measured on a delayed posttest. This means it is not possible to interpret scores on an immediate posttest as long-term learning. Immediate posttests can only give a snapshot of vocabulary knowledge, and cannot inform about the dynamic and incremental nature of the learning process. We know that attrition occurs in any learning, and so need to include delayed posttests in order to capture the long-term learning.

Second, learning is not linear, and so the rate of learning achieved in a study may or may not be applicable over the longer term. For example, if a student was able to ‘learn’ five new words from a 300-word reading passage, this does not mean she would necessarily be able to learn another five new words from the next 300 words in the passage tomorrow. In some cases, the learning rate may decrease because the input becomes less effective over time. In other cases, the learning may accelerate, as learners become more skilled in using the input. Only by measuring participants’ learning after subsequent sessions can the effect of cumulative learning be effectively determined.

Third, it is a well-known phenomenon that the effectiveness of practice decreases with the amount. This is known as the power law of learning, where the effects of practice are greatest at early stages of learning, but then eventually reach asymptote (e.g. Ellis, 2006a). This means that the initial practice will be more effective than later practice. Thus the rate of learning from a small amount of vocabulary learning practice will probably not be maintained as the amount of practice increases; there will be diminishing returns. Where studies typically have a limited amount of engagement with the vocabulary items for practical reasons, the resulting learning rate may be higher than if the study examined a longer-term/more intensive treatment regimen. Non-longitudinal studies thus need to be interpreted carefully when generalizing about possible effects of increased practice.
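The diminishing returns described above can be made concrete with a small numerical sketch. The curve below is only an illustration: the function name `words_known`, the ceiling of 20 items, and the exponent of 0.5 are all assumptions chosen for convenience, not values taken from Ellis (2006a) or any study. The point is simply that a learning rate observed over a few early practice trials cannot be extrapolated in a straight line.

```python
def words_known(trials, ceiling=20.0, exponent=0.5):
    """Hypothetical power-law curve: items known after `trials` practice trials.

    Gains shrink with every additional trial and approach `ceiling`
    asymptotically. Both parameters are illustrative assumptions.
    """
    return ceiling * (1 - (trials + 1) ** -exponent)


early_gain = words_known(3) - words_known(0)    # gain over trials 1-3
later_gain = words_known(15) - words_known(3)   # gain over trials 4-15
early_rate = early_gain / 3                     # items per trial in a short study
naive_projection = early_rate * 15              # straight-line extrapolation

print(f"gain over first 3 trials: {early_gain:.1f}")
print(f"gain over next 12 trials: {later_gain:.1f}")
print(f"straight-line projection for 15 trials: {naive_projection:.1f}")
print(f"value on the curve after 15 trials: {words_known(15):.1f}")
```

Under these assumed parameters, the first three trials yield twice the gain of the following twelve, and the straight-line projection exceeds the ceiling of the curve itself, which is the kind of overgeneralization a short study invites.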





Fourth, the different aspects of word knowledge will require longer or shorter periods of time to master. While an initial form-meaning link might be established with small amounts of input over a short period of time (habit = a tendency to behave in a particular way, especially regularly and repeatedly over a period of time), it will probably require a large amount of input to develop intuitions about the more nuanced meaning connotations of the word (habit very often refers to negative behavior: Smoking is a nasty habit; annoying habit).

In summary, vocabulary learning is longitudinal and incremental in nature, and only research designs with a longitudinal element can truly describe it. There are two main ways of incorporating a longitudinal element. The first is simply by adding one or more delayed posttests. The value of delayed posttests is self-evident, having the key attribute of confirming durable learning. Given the incremental nature of vocabulary learning, and its susceptibility to attrition (especially if not well-established), only delayed posttests can demonstrate if long-term retention (i.e. learning) has occurred.

I would suggest that immediate posttests should be interpreted mainly as showing whether the treatment had any effect (e.g. did the process of acquisition begin?, were the target lexical items enhanced in any way?, did learners notice the target items in the treatment?), and only delayed posttests interpreted as showing learning. Of course, it is not always practical to use delayed posttests (participants disappear or are unwilling to repeat an assessment), but if immediate posttests are used as the sole measurement, researchers must be extremely cautious when interpreting the type and amount of knowledge enhancement that these tests demonstrate.

When interpreting learning, any interim posttest exposures should be factored into the interpretations. That is, the immediate posttest is an additional exposure which will increase learning in a delayed posttest, compared to a research design with only a delayed posttest. This is especially true as participants tend to give tests a great deal of focused attention, which generally facilitates learning. We would therefore expect better delayed scores in a T1-treatment-T2-delayed T3 design than a T1-treatment-delayed T2 design, even though the treatment is equivalent.

There is then the practical question of the length of the delay. The short answer is that there is no standard period of delay, and that any delay beyond the immediate posttest is better than nothing. Research by Gaskell and colleagues (Davis, Di Betta, Macdonald, and Gaskell, 2008; Dumay and Gaskell, 2007; Gaskell and Dumay, 2003; Dumay, Gaskell, and Feng, 2004) has shown that the integration of a new lexical item into the mental lexicon begins to take place within 24 hours after exposure, with the key factor being a night’s sleep. This indicates that the delayed posttest must be a minimum of two days after the treatment in order to capture this integration. Furthermore, memory research shows that most attrition occurs relatively soon after a learning event, and then the rate of forgetting decreases over time (Baddeley, 1990). This suggests that the delayed posttest needs to be administered after the period of initial large loss.

Most of the colleagues I have spoken with feel confident that a delayed posttest of three weeks should be indicative of learning which is stable and durable. If this three-week ideal cannot be met for practical reasons, I would suggest that any delayed posttest of less than one week is likely to be relatively uninformative, and should be avoided if possible. But whatever interval is practical, delayed posttests should be included in all acquisition research designs.
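When planning a study timetable, the guidance above can be turned into a simple check. The sketch below is an assumption-laden illustration: the function name `classify_delay` and the four labels are invented here, and the cut-offs simply restate the figures mentioned in the text (two days, one week, three weeks), which are rules of thumb rather than established standards.

```python
from datetime import date


def classify_delay(treatment_day, posttest_day):
    """Label a planned delayed posttest using the rule-of-thumb cut-offs above.

    The labels are hypothetical; only the day thresholds come from the text.
    """
    days = (posttest_day - treatment_day).days
    if days < 2:
        return "too early"                  # overnight integration not yet captured
    if days < 7:
        return "relatively uninformative"   # under one week
    if days < 21:
        return "acceptable"                 # one to three weeks
    return "preferred"                      # three weeks or more


treatment = date(2024, 3, 4)  # arbitrary example date
for planned in (date(2024, 3, 5), date(2024, 3, 8), date(2024, 3, 14), date(2024, 3, 25)):
    print(planned.isoformat(), "->", classify_delay(treatment, planned))
```

A check like this is only a planning aid; practical constraints on participant availability will often decide the interval actually used.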

There is also the issue of test format type for measuring durable learning. Groot (2000) notes that if the assessment goal is long-term learning (as opposed to results from immediate posttests), then productive formats such as cloze tests might be more suitable than receptive test formats (see Quote 4.3). However, balanced against this, I think that learning of some lexical items (e.g. low-frequency words) to only a receptive level of mastery is a perfectly reasonable goal, and for these, receptive tests would be more appropriate.

Concept 4.1 Cloze

Cloze is simply another name for the fill-in-the-blank format. It is commonly used in teaching materials and in language tests. The name cloze comes from the psychological notion of closure, where the mind abhors a vacuum, and attempts to fill in any noticed gap with a logical concept or linguistic feature.

Quote 4.3 Groot on measuring long-term retention

Knowing a word may be seen in operational terms as a continuum ranging from vague recognition of its spelling to (semantically, syntactically, stylistically) correct and contextually appropriate productive use. Retrieval of a word from the mental lexicon for productive use requires a higher degree of accessibility or, in other words, a more solid integration in various networks than is needed for receptive use. For measuring this higher level of mastery, a test which asks testees to simply recognise a word and give its meaning is unsuitable; a test [such as] using the cloze technique, which measures testees’ ability to produce the word themselves, is much more valid for that purpose ... [F]or a meaningful interpretation of the data, it is essential to give an accurate description of what one understands by the trait ‘knowing a word’ and of what trait is intended to be measured by what testing method.

(Groot, 2000: 76)





The second way of incorporating a longitudinal element into vocabulary studies is to design longitudinal studies which track the learning process over a period of time through a series of measurements. Such studies can be much more difficult to set up, and require the longer-term cooperation of participants, but are often very worthwhile. If participant numbers are a problem, longitudinal case studies can provide some of the best information on the way vocabulary learning progresses because of the level of detail that can be achieved. Including longitudinal elements is particularly important when studying word knowledge types that take longer to learn, such as connotation or collocation.

4.5 Selection of target lexical items

In vocabulary research, one of the most basic (and critical) steps is selecting the target lexical items. If they are mismatched to the research purpose for whatever reason, the resulting study is likely to be severely compromised. It is thus important to consider the vocabulary characteristics discussed in Chapter 2 when making this selection. Some of the implications for selection are outlined below.

Single words and multi-word items

One decision is whether to include only single word lexemes, only multi-word items, or both. Most vocabulary research has focused on one-word items, because they have been traditionally seen as the ‘words’ that make up vocabulary, with multi-word items being only a relatively unimportant peripheral category of lexis, mainly made up of low-frequency items like idioms or proverbs. However, corpus research has demonstrated that much more lexical patterning exists in language than previously realized, and that this patterning plays an important role in the way language is both used and processed (see Chapter 3). With this in mind, not including multi-word lexemes runs the risk of excluding a major component of vocabulary.

However, if you choose to use multi-word items, several hurdles must be overcome. First, there are many types of formulaic sequence, including but not limited to: idioms, proverbs, clichés, sayings, explanations, lexical phrases, lexical bundles, collocations, and phrasal verbs. There is no principled way to decide which of these types to include, unless the research is interested in a particular category, for instance, idioms. Moreover, there is no comprehensive listing of these multi-word items to refer to, and in any case, new ones are always coming into play, and old ones dropping out of the language. This is one factor that makes a principled sampling of formulaic sequences almost impossible.

In addition, although formulaic sequences as a category occur frequently in discourse, with the exception of a few high-frequency items (e.g. on the other hand), any particular individual multi-word item is likely to be relatively infrequent. This also makes the selection of a representative sample extremely difficult. Just because a certain formulaic sequence on a measurement instrument is correctly answered, it is not necessarily possible to extrapolate from this that other formulaic sequences are also known. The lack of information on the scope and makeup of the phrasal lexicon means that chance can play a large, yet unknowable, role in whether any particular formulaic sequence is selected for a study, and how representative it might be.

In the end, frequency may be the best way of deciding which (if any) formulaic sequences to include in a study. When studying the acquisition of L2 learners, there will probably be a focus on the most frequent vocabulary, which excludes most multi-word items (which are generally low frequency). However, the class of phrasal verbs might be included, as many are quite frequent, particularly in spoken discourse. Conversely, formulaic sequences can be a very good class of lexis to use when distinguishing between very advanced learners. Of course, if research is focused on formulaic sequences, they may be the only vocabulary items included in the study (e.g. the series of studies on formulaic sequences in Schmitt, 2004).

Formal similarity

Beginning learners of English often confuse words that are orthographically similar with one another. This occurs in reading, when learners come across a word and confuse it with a similarly-spelled word. Unfortunately, once a word is misrecognized, learners often do not realize their error, even if the context is totally inappropriate for the mistaken meaning they have assigned to the word (Haynes, 1993). There is relatively little research into whether this happens in listening as well (slips of the ear?), but there is no reason to believe that it does not. Association tasks have also shown that learners, especially beginners, often confuse words of a similar form in what are called clang associations (Section 2.4). Based on this tendency to misrecognize formally similar words, it is probably best not to select target words which have a similar form to other words in the target word pool (unless you are specifically studying this phenomenon). Also, L2 words with large orthographic neighborhoods (i.e. having formal similarities to many other words in the L2) might be more difficult than L2 words which are similar to few other L2 words (Section 1.1.8). A testing implication is that even though a learner indicates that they know a word (such as on a self-report Yes/No test), this does not necessarily mean they know that word, for they might be mistaking it for another word they know with a similar form. Therefore some check for this is advisable when validating the measurement instrument, in the case of Yes/No tests, usually through the use of plausible nonwords.
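One simple way to carry out the nonword check mentioned above is to compare how often a participant says ‘yes’ to real words with how often they say ‘yes’ to nonwords. The sketch below is illustrative only: the function name `yes_no_rates`, the invented nonwords, the made-up responses, and the 0.2 cut-off are all assumptions, not part of any published Yes/No test.

```python
def yes_no_rates(responses, nonwords):
    """Return (hit rate, false-alarm rate) for one participant.

    responses: dict mapping each test item to True ('yes, I know it') or False.
    nonwords:  set of items that are not real words.
    """
    real_items = [item for item in responses if item not in nonwords]
    fake_items = [item for item in responses if item in nonwords]
    hit_rate = sum(responses[item] for item in real_items) / len(real_items)
    false_alarm_rate = sum(responses[item] for item in fake_items) / len(fake_items)
    return hit_rate, false_alarm_rate


# Invented nonwords and made-up responses, for illustration only.
nonwords = {"plindle", "tharvin", "morslet", "glenter"}
responses = {
    "habit": True, "useful": True, "extent": False, "bank": True,
    "plindle": False, "tharvin": True, "morslet": False, "glenter": True,
}

hit_rate, false_alarm_rate = yes_no_rates(responses, nonwords)
FALSE_ALARM_LIMIT = 0.2  # hypothetical threshold; set it to suit the study
suspect = false_alarm_rate > FALSE_ALARM_LIMIT
print(f"hits: {hit_rate:.2f}, false alarms: {false_alarm_rate:.2f}, suspect: {suspect}")
```

A participant who claims to know many nonwords may be overestimating their knowledge of the real words too, so their ‘yes’ responses deserve cautious interpretation.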





Word class

Although Laufer’s (1997) review of early vocabulary research concluded that word class made little difference in the ease or difficulty of learning words, psychological researchers have found it does make a difference, at least in their research paradigms. It thus seems prudent to control for word class in all vocabulary research. This can be done by making sure that each word class being addressed in a study has the same number of target items. Another way is to only work with one word class in a study, to ensure that this factor does not confound the results.
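The first option above, equal numbers of target items per word class, is easy to enforce when drawing targets from a candidate pool. The following is a minimal sketch under stated assumptions: the function name `balanced_sample` is invented, and the pool and its word-class tags are a toy example (several of these words can belong to more than one class in real usage).

```python
import random
from collections import defaultdict


def balanced_sample(pool, per_class, seed=0):
    """Draw the same number of target words from every word class.

    pool: list of (word, word_class) pairs. A fixed seed keeps the draw
    reproducible so the selection can be reported and replicated.
    """
    by_class = defaultdict(list)
    for word, word_class in pool:
        by_class[word_class].append(word)

    rng = random.Random(seed)
    sample = {}
    for word_class, words in sorted(by_class.items()):
        if len(words) < per_class:
            raise ValueError(f"not enough {word_class} candidates")
        sample[word_class] = sorted(rng.sample(words, per_class))
    return sample


# Toy candidate pool with hypothetical word-class tags.
pool = [
    ("habit", "noun"), ("bank", "noun"), ("chip", "noun"), ("extent", "noun"),
    ("learn", "verb"), ("confuse", "verb"), ("select", "verb"), ("measure", "verb"),
    ("useful", "adjective"), ("frequent", "adjective"),
    ("durable", "adjective"), ("stable", "adjective"),
]

targets = balanced_sample(pool, per_class=2)
for word_class, words in targets.items():
    print(word_class, words)
```

Raising an error when a class has too few candidates is a deliberate choice here: it forces the imbalance to be dealt with at the design stage rather than discovered during analysis.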

Homonymy and polysemy

Homonymy and polysemy come into play mainly when counting vocabulary (see also Section 2.1.3). For instance, should the homonyms bank (financial), bank (river), and bank (plane turning) be counted as one lexeme or three? Similarly, should the different polysemous versions of chip be counted as one or several lexemes? The choices made can lead to quite different size figures, and so the method used must be clearly stated in your research report. For more details see Section 5.2.1.

The method of counting these multiple form-meaning correspondences should also affect the way we interpret the results of such counting studies. A large percentage of the most frequent words (at least in English) are polysemous, and this has enormous implications for the learning load of these words. Clearly mastering a highly polysemous word will involve more learning burden than learning a technical word with one specific meaning. Therefore researchers need to be careful about equating the learning of different types of vocabulary.

Literal and idiomatic meaning

The knowledge of idiomatic meaning senses is a hallmark of more proficient language users, and so a researcher must normally match target words with idiomatic meanings to participants of a relatively higher proficiency. If one is interested in whether idiomatic targets are known or not, then these targets are appropriate with any level of student. However, if one is using a vocabulary test to discriminate between learners, then learner proficiency is crucial. Idiomatic targets can make up some or most of the test items for discriminating between learners with a very good knowledge of lexis who can operate comfortably in most situations, and those with an excellent lexical knowledge, who can understand and use the less common meaning senses in a language. In fact, many of the standardized proficiency tests use idiomatic targets, particularly phrasal verbs, to separate the very best examinees from the rest. Conversely, it makes little sense to include idiomatic targets on a test meant to discriminate between beginners. This is because very few beginners will be successful with these items, and as a result, all of the ‘zero’ scores will tell you little about the differences in lexical knowledge between the beginners in your study group.

Frequency

Frequency is probably the most important aspect of vocabulary you must control for in your research designs. This is because frequency is the best indicator we have of how likely people are to know words, and in what order those words will be learned. Frequent words are for the most part not inherently any easier than nonfrequent words, but, on average, they will be encountered more often, which means that they are more likely to be known than nonfrequent words. (Or more precisely, the most frequent meaning senses of these mostly-polysemous high-frequency words will be more likely to be known.) One might argue that frequent words have some ‘ease advantage’ due to Zipf’s law, which roughly states that the shorter a word is, in terms of syllables or letters, the more frequent it is. However, shorter, more frequent words also suffer from a greater degree of polysemy than longer, less frequent words, and so any ease in terms of learning the form of a word might be offset by the difficulty in dealing with multiple meaning senses.

Although lexemes can be inherently more or less difficult to learn due to other factors than word length (Laufer, 1997), the effects of frequency can easily override this inherent ease/difficulty, and so if you wish to create a pool of target words that are of equivalent difficulty, then you must ensure that they are roughly the same frequency. This does not mean that they must appear in adjacent positions on a frequency list (e.g. useful – 1,000th most frequent word in English; extent – 1,001st most frequent word), but they should normally be within 100–200 positions of one another for most study designs. If one is working within a psycholinguistic paradigm, e.g. using reaction times measured in milliseconds to measure familiarity with words, then the words should be as similar as possible in frequency, because frequency is a powerful influence in such experimental designs.
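The ‘within 100–200 positions’ guideline above can be applied mechanically once a frequency-ranked list is available. The sketch below is a hedged illustration: the function name `within_band` is invented, the ranks for useful and extent are the ones quoted in the text, and the remaining entries are placeholder items with made-up ranks standing in for a real frequency list.

```python
def within_band(ranks, anchor_rank, max_gap=200):
    """Return the words whose frequency rank lies within `max_gap` of the anchor.

    ranks: dict mapping word -> position on a frequency list (1 = most frequent).
    """
    return sorted(word for word, rank in ranks.items()
                  if abs(rank - anchor_rank) <= max_gap)


ranks = {
    "useful": 1000,   # rank quoted in the text
    "extent": 1001,   # rank quoted in the text
    "item_a": 1150,   # placeholder with a made-up rank
    "item_b": 1900,   # placeholder with a made-up rank
    "item_c": 4800,   # placeholder with a made-up rank
}

print(within_band(ranks, anchor_rank=1000))              # default gap of 200
print(within_band(ranks, anchor_rank=1000, max_gap=10))  # tighter matching
```

Passing a much smaller `max_gap` is one way to express the tighter matching that reaction-time designs call for.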

Frequency is important in a range of other issues as well. A few examples will serve to give an idea of the many ways frequency needs to be considered when setting up a vocabulary study. In most studies it is useful to take a measurement of the general vocabulary knowledge of your participants. This is usually done with a vocabulary size test (see Section 5.2). There are a range of such tests, with the Vocabulary Levels Test (Nation, 1990; Schmitt et al., 2001) being commonly used in second-language vocabulary studies. It samples English vocabulary at four frequency bands: 2,000 frequency level, 3,000 level, 5,000 level, and 10,000 level. It also includes an estimate of academic vocabulary. The entire test takes between 30 and 75 minutes to administer, depending on the examinee. However, it is not usually necessary to give all five sections. If given to beginners, certainly the 10,000 level, and probably the 5,000 level as well, should be excluded. They will provide little useful information, because examinees are likely to know very few of the words at these lower-frequency levels, and are only likely to become discouraged and frustrated in attempting them. Conversely, the 2,000 and 3,000 levels are good candidates for exclusion for advanced learners, as these examinees should obtain maximum scores at these levels. While they will not get frustrated by getting all of the items correct at these levels, a better use of time might be to measure more of the lexical feature you are focusing on in the study.

Another example of the effects of frequency is in association tasks. In these, a stimulus lexeme is given, and participants are asked to give one or more responses as quickly as they can (Section 2.4). The general behavior from a group of native speakers is that a small number of association responses are relatively frequent (e.g. for the stimulus sun: moon, shine) and a larger number of responses are relatively infrequent (warmth, ray). However, the response behavior depends to a great degree on the stimulus lexemes. Stimuli which have a high frequency of occurrence in a language tend to have more stable responses. For example, in the Edinburgh Associative Thesaurus, the high-frequency stimulus night elicits day over half of the time (52 out of 99 responses; 26 different response types). On the other hand, its lower-frequency near-synonym evening elicits a much wider range of responses (37 different response types), with the most frequent being night (20/98) and morning (17/98). Because of this, it is better not to use very high-frequency words as stimuli (Fitzpatrick, 2007), unless perhaps if you are working with beginners (Henriksen, 2008).
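The contrast between the two stimuli is easier to see when the counts quoted above are converted to proportions. The short calculation below uses only the figures given in the text; the helper name `share` is an assumption introduced for illustration.

```python
def share(count, total):
    """Proportion of all responses accounted for by `count` of them."""
    return count / total


# Counts as quoted above from the Edinburgh Associative Thesaurus.
night_primary = share(52, 99)     # 'day' as a response to night
evening_primary = share(20, 98)   # 'night' as a response to evening
night_types = share(26, 99)       # distinct response types per response
evening_types = share(37, 98)

print(f"night:   primary response {night_primary:.1%}, types per response {night_types:.2f}")
print(f"evening: primary response {evening_primary:.1%}, types per response {evening_types:.2f}")
```

On these figures the dominant response to night accounts for roughly half of all responses, while the dominant response to evening accounts for about a fifth, which is one way of quantifying how much more ‘stable’ the high-frequency stimulus is.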

Characteristics of lexical items for use in psycholinguistic experiments

The selection of target lexis is important in all vocabulary studies, but it becomes even more critical in psycholinguistic experiments. When measuring lexical processing in milliseconds, any uncontrolled confounding factor that affects processing speed can jeopardize the entire experiment. Thus it is especially essential to control as much as possible for all factors which can affect processing speed. A number of authors have commented on these factors (e.g. de Groot and van Hell, 2005; de Groot, 2006; Ellis and Beaton, 1993; see also Chapter 2), of which the following are among those usually highlighted:

● phonotactic regularity
● structural complexity (e.g. containing more or less complex consonant clusters)
● degree of correspondence between graphemes and phonemes
● familiarity of formal features
● conformity of L2 word form to L1 norms
● morphological complexity
● word length
● word class
● concrete/abstract words
● imageability of concept
● word meaningfulness
● cognate status
● frequency.

The effect of some of these factors has been shown to be substantial. For example, de Groot and van Hell (2005) review the literature and report that recall scores for concrete words are from 11% to 27% higher than for abstract words. Likewise, cognate words produced scores 15–19% higher for highly experienced foreign-language learners, and 25% (receptive test) to 50% (productive test) higher for less-experienced learners. This indicates that factors like these need to be controlled for, but getting measures for all the above factors would be quite a task. For instance, quantifying just imageability would entail having groups of participants rate lists of words according to how easy or difficult it is to form a mental image of the underlying concept. Doing something like this for each potential confounding factor would be prohibitive.

Luckily, there is a website with data for the different word factors. It is the MRC Psycholinguistic Database (Coltheart, 1981) and is available at <http://www.psy.uwa.edu.au/mrcdatabase/uwa_mrc.htm>. To create lists of target words, the researcher checks the boxes of the factors which need to be controlled for, and then sets the minimum and maximum parameters, e.g. check the ‘Number of letters’ box, and set the length between three and five letters long. Below are the factors which are available on the site.

Number of letters                      Meaningfulness: Paivio Norms
Number of phonemes                     Age of acquisition rating
Number of syllables                    Word Type
Kucera–Francis written freq            Comprehensive syntactic category
Kucera–Francis no. of categories       Common part of speech (N/V/adj/Other)
Kucera–Francis no. of samples          Morphemic status (Prefix/Suffix/Abbrev/etc.)
Thorndike–Lorge written freq           Contextual status (Colloquial/Dialect/etc.)
Brown verbal frequency                 Pronunciation variability
Familiarity rating                     Capitalization
Concreteness rating                    Irregular plural
Imagability rating                     Stress pattern
Meaningfulness: Colorado Norms
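The minimum/maximum selection described above can also be reproduced offline once word data have been exported to a local table. The sketch below does not use the MRC website or any interface it provides; the function name `filter_words`, the field names, and the concreteness values are all hypothetical and invented purely to show the filtering logic (only the letter counts are real, being the lengths of the words themselves).

```python
def filter_words(records, **limits):
    """Keep the words whose fields all fall inside the given (min, max) limits.

    records: list of dicts, each with a 'word' key plus numeric fields.
    limits:  keyword arguments of the form field=(minimum, maximum).
    """
    kept = []
    for record in records:
        if all(low <= record[field] <= high
               for field, (low, high) in limits.items()):
            kept.append(record["word"])
    return kept


# Toy records: 'concreteness' values are invented placeholders, not MRC data.
records = [
    {"word": "sun", "letters": 3, "concreteness": 620},
    {"word": "night", "letters": 5, "concreteness": 550},
    {"word": "habit", "letters": 5, "concreteness": 300},
    {"word": "extent", "letters": 6, "concreteness": 280},
    {"word": "evening", "letters": 7, "concreteness": 520},
]

# Three to five letters long, as in the example in the text.
print(filter_words(records, letters=(3, 5)))
# Adding a second constraint narrows the pool further.
print(filter_words(records, letters=(3, 5), concreteness=(500, 700)))
```

Each added constraint shrinks the candidate pool, so it is worth checking how many items survive before committing to a long list of controlled factors.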







There is one caveat to using the MRC database, or others like it. You may have noticed that it uses quite old frequency information, i.e. the Kucera–Francis (1967) and Thorndike–Lorge (1944) counts (see Section 5.1.2). Because frequency is one of the most important factors affecting processing, it makes sense to refer to more current frequency counts, such as Leech, Rayson, and Wilson (2001), which are based on more contemporary English and much larger corpora.

Researchers can also choose to use nonwords in their studies (see Section 5.1.2). A website which automatically generates nonwords formed to the norms of English is the ARC Nonword Database (Rastle, Harrington, and Coltheart, 2002), available at <http://www.maccs.mq.edu.au/~nwdb>.

4.6 Sample size of lexical items

Time is precious. You never have enough when doing vocabulary research. This is because the more samples you obtain from your participants, the more valid and reliable your results should be. This measurement truism is always in tension with the practical reality that the amount of time to administer a study is always limited. It is usually limited by practical considerations, e.g. teachers will give you one class period to do your research with their students, and this quite often is around an hour long. Even if you are doing one-to-one interviews and there is no ‘official’ time limit, you will find that there is a limit of how long even the most enthusiastic participant can concentrate on a task. This means there is still an effective time limit, beyond which data becomes invalid, or at least suspect. The upshot of this is that you must carefully design your study with time limitations in mind. This will entail prioritizing which information you are able to measure. You will need to pilot your instruments, not only to see if they work, but also to see if they can be completed in the time period you have available to administer them.

Time constraints have an inevitable impact on the number of lexical items that can be incorporated into a study. Most lexical studies try to generalize to quite large amounts of vocabulary (e.g. the size of a learner’s lexicon, the number of words in a book, the first 2,000 word families in English). It is therefore important to have a large sample population of words (e.g. the target words on a test) from which to generalize to the whole population of vocabulary one wishes to make statements about (e.g. a learner’s total vocabulary size). Thus, in terms of generalizability, the more target items the better, with the (usually) unachievable ideal being the measurement of all the lexical items in question. In practice, a researcher is usually able to measure only a small proportion of all of these lexical items. Also, the test type has an effect. Some item formats, like checklist tests, allow relatively more lexical items to be measured, while more involved formats, like cloze items or think-aloud methodologies, will take more time and so fewer items can be included. There is also a tension between the depth of knowledge which can be measured and the number of items included.

This leaves the tricky question of how large (or how small) a target sample is acceptable. Unfortunately, there is no set answer, only the observation that ‘The more, the better’. The only concrete guidance is that the researcher needs to be able to argue that the sample which is measured gives reliable and meaningful information about the vocabulary being discussed. In other words, the sample size needs to be large enough to realistically model the behavior of the total vocabulary population. A discussion about this needs to be included in all research reports, including any limitations to the generalizations drawn from the sample to the whole. One should take account of the points made in Quote 4.4, and carefully consider how the acquisition and use of larger sets of vocabulary may differ from the much smaller sets of target lexical items commonly used in studies. For example, the time required to learn a word in a small set of lexis may be much less than the time required to learn the same word in a larger set of vocabulary, due to the possible decreasing efficiency of whatever learning strategy is being used. It would thus be erroneous to generalize the faster rate of acquisition from a small set of words to a much larger set of words.

Quote 4.4 Meara on the problems of using small numbers of target words in lexical studies

The basic problem is that ... [researchers assume] it is possible to model the acquisition of an entire vocabulary by looking at how effectively a tiny subset of this vocabulary is acquired in tightly controlled conditions. There are a number of obvious reasons why this position is untenable. Firstly, learning a set of 20–40 words may pose some difficulties for short-term memory, but seen from a long-term perspective, and in comparison with the number of words a fluent speaker needs to know, such numbers are basically trivial. Many people can handle a vocabulary of a few tens of words by using simple mnemonic techniques, for example. It is not obvious, however, that these techniques would enable a learner to handle, say, two thousand new words – the number of words you need to handle about 80% of English text. Secondly, and more importantly, a vocabulary of 30–40 words can be efficiently handled by treating it as an unconnected list of discrete items. Bigger vocabularies on the other hand will contain subsets of words which are linked together either on semantic or morphological grounds, and these linkages must make it inefficient to treat the vocabulary as a simple list. At the very least some sort of network structure must develop in a large vocabulary which reflects these relationships between the component items of the total vocabulary. Presumably, what makes it difficult to acquire a large vocabulary is that it takes time and effort for these connections to develop, and for a properly organised lexicon to emerge. This problem does not arise when the target lexicon contains only a handful of words.

(1996a: 6–7)





4.7 Interpreting and reporting results

A key requirement of good vocabulary research is that researchers need to be quite clear about which aspects of vocabulary they are addressing, since vocabulary knowledge is too complex to capture everything within a single study. This entails thinking precisely about target lexical items, and what one wishes to find out about them. Rather than just considering a construct as broad as ‘vocabulary knowledge’, it is necessary to specify which elements of vocabulary knowledge are being addressed. Many of the contradictory and confusing results in the field have stemmed from imprecise research definitions and reporting. For example, the wide variation in the estimates of the vocabulary size of the average native or nonnative speaker stems mainly from the fact that different studies used different units of counting (e.g. individual word forms, lemmas, word families) (see Section 5.2.1), but regardless of whatever unit was used, the term employed in the report was usually word. Likewise, there is a large range of results for how much vocabulary can be learned from various input techniques, but more of this variation can usually be attributed to the measurement technique than the method of input (e.g. a relatively easy multiple choice meaning-focused format versus a more difficult productive spelling task). Furthermore, it is known that performance is usually better when the test conditions mirror the learning task than when they are incongruent (Lotto and de Groot, 1998). The point is that research reporting needs to be very explicit about what measures are used, and what they tell us about the degree of lexical knowledge or learning. This entails reporting whether the knowledge is receptive or productive in nature, what facets of word knowledge are being measured, and the degree to which the learning is stable (e.g. through the use of delayed posttests).

Beyond more precise reporting in general, a number of other issues come into play in the interpretation and reporting of vocabulary studies. The first is reporting the results of statistical tests. Previously, when the significance of statistics was determined manually with tables, it was standard practice to set a p value, and stick to it throughout the research report (e.g. p < .05). Nowadays, virtually everyone uses statistical packages that give the exact p values. Given that exact p values provide more information than a range value, the APA Publication Manual (fifth edition) now recommends providing exact p values. One of the things an exact p value can provide is an indirect indication of the strength of effect. However, the standard practice is now to provide a separate effect size measure as well as a p value if possible. This is to be encouraged, as an effect size statistic makes the magnitude of an observed effect explicit, and makes comparison between studies much easier. There are various different effect size statistics (e.g. Cohen’s d and Pearson’s correlation coefficient), and in general, Cohen’s figures for effect size have been widely accepted (Field, 2005: 32):





● r = .10 (small effect): the effect explains about 1% of the total variance
● r = .30 (medium effect): the effect accounts for 9% of the total variance
● r = .50 (large effect): the effect accounts for 25% of the variance.
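The percentages attached to these benchmarks follow from squaring r, since the squared correlation gives the proportion of variance accounted for. A minimal Python sketch of that arithmetic is given here for illustration; the function name variance_explained is an assumption introduced for this example and does not come from Field (2005).

```python
# Benchmark effect sizes (r) as listed above.
benchmarks = {"small": 0.10, "medium": 0.30, "large": 0.50}


def variance_explained(r: float) -> float:
    """Return the proportion of total variance accounted for by an effect of size r."""
    return r ** 2


for label, r in benchmarks.items():
    # Format as a whole-number percentage, e.g. r = 0.30 -> 9%.
    print(f"r = {r:.2f} ({label} effect): {variance_explained(r):.0%} of the variance")
```

Seen this way, a ‘medium’ effect still leaves over 90% of the variance unaccounted for, which is a useful reminder when interpreting correlations in lexical studies.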

It is quite common to make comparisons in lexical studies, e.g. comparing learning from paired-associates input versus keyword method input. However, in addition to reporting the statistical comparisons (e.g. the significance of t-test comparisons), it is also important to report and interpret the absolute results. It is possible to have statistically significant results that are essentially meaningless. I have read numerous studies where Learning Method A was compared to Learning Method B, and a great deal was made of the significant result that one was better than the other. The discussion section then typically went on to discuss the great facility of using this method. However, when the descriptive statistics were checked carefully, it became obvious that neither method was of much value. It might be that Method A leads to the learning of two new words out of 50, while Method B leads to four new words being learned. While Method B might have been significantly better, the important news is that both methods were ineffective. Always consider acquisition/use in terms of the broader view; is there enough real learning/use occurring to make meaningful statements about? In short, report the absolute learning and use in addition to the comparisons, as they may be the more important result.

Second-language learners vary considerably in their mastery of most linguistic aspects, and vocabulary is the prime case, as even native speakers vary a great deal in the size and content of their lexicons, especially at the lower-frequency levels. While frequency gives a good rough indication of the likelihood of a lexical item being known, even people with very similar lexical sizes will have a proportion of their lexicon which is idiosyncratic. For example, a learner with a vocabulary size of 1,000 items will likely know words like cat, look, wall, and pretty, but may also know much lower-frequency words like carburettor and fanbelt if they have a special interest in automobile engines. Conversely, they may not have yet picked up quite frequent words like policy or management simply due to the chance lack of exposure. When doing group research, it is important to indicate the central tendency of the group behavior, but it is likewise important to note any diversity in individual scores. This is usually indicated by the standard deviation statistic, and sometimes by the range of the scores. However, it is essential to interpret the variation for the reader, and indicate the extent to which the mean/median figures reported represent the behavior of the group, and the extent to which individuals vary around the central tendency. Individuals typically vary widely in their lexical knowledge; a report should always consider the extent to which this is true in the particular study.

The way that mean scores can hide individual variation is dramatically shown in a study by Li and Schmitt (in press). They followed four Chinese students (‘LH’, ‘TT’, ‘WL’, ‘YJ’) doing an MA-ELT course at a British university. The feature of interest was adjective-noun combinations in the students’ written assignments. To determine whether these combinations were in fact collocations typical of academic discourse, the academic sub-corpus of the BNC (see Section 6.2) was searched to see if the combinations occurred there, and a t-score (Section 3.2) calculated. Higher t-scores indicate very typical, higher-frequency collocations (e.g. black coffee, future role). Li and Schmitt found that the t-score means for assignments submitted over three university terms indicated a plateau from Term One to Term Two, but then an improvement at Term Three. This suggested that the students as a group used relatively more typical and frequent adjective-noun collocations at the end of the academic year. However, when individual behavior was examined, it was found that every student had a completely different profile, none of which were represented even vaguely by the mean scores! This is illustrated in Figure 4.1. The lesson seems clear: central tendency figures usually are useful in indicating group trends, but they can sometimes hide a great deal of important information.
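The contrast between the group means and the individual profiles can be checked directly from the values reported in Figure 4.1. The Python sketch below is purely illustrative and is not part of Li and Schmitt’s analysis; the names t_scores, group_means, and direction are assumptions introduced for this example. It recomputes the term means and summarizes each student’s term-to-term movement as a pattern of rises (+) and falls (-).

```python
from statistics import mean

# Mean t-scores per term for each student, as reported in Figure 4.1.
t_scores = {
    "LH": [5.20, 5.47, 5.10],
    "TT": [5.09, 5.40, 5.68],
    "WL": [5.60, 5.29, 5.94],
    "YJ": [5.36, 5.05, 5.04],
}
terms = ["Term One", "Term Two", "Term Three"]

# Group mean for each term, rounded to two decimals as in the figure.
group_means = [
    round(mean(scores[i] for scores in t_scores.values()), 2)
    for i in range(len(terms))
]


def direction(scores):
    """Return '+' for a rise and '-' for a fall between consecutive terms."""
    return "".join("+" if later > earlier else "-"
                   for earlier, later in zip(scores, scores[1:]))


profiles = {student: direction(scores) for student, scores in t_scores.items()}

print("Group means:", dict(zip(terms, group_means)))
for student, pattern in profiles.items():
    print(student, pattern, "range:", round(max(t_scores[student]) - min(t_scores[student]), 2))
```

Each of the four students produces a different rise/fall pattern, which is the kind of individual variation that a report of the means alone would conceal.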

T-score of bigrams over time

| Student | Term One | Term Two | Term Three |
| --- | --- | --- | --- |
| LH | 5.20 | 5.47 | 5.10 |
| TT | 5.09 | 5.40 | 5.68 |
| WL | 5.60 | 5.29 | 5.94 |
| YJ | 5.36 | 5.05 | 5.04 |
| Mean | 5.31 | 5.30 | 5.44 |

Figure 4.1 T-scores of adjective-noun combinations over three terms

To finish this section, I will present an example of what I consider misinterpretation of research results due to a lack of consideration of the amount of vocabulary engagement various tasks engender. I suggest you first read the study for yourself and come to your own conclusions, before reading the following critique. The study is:

B. Mason and S. Krashen (2004). Is form-focused vocabulary instruction worthwhile? RELC Journal 35, 2: 179–185. Available at <http://rel.sagepub.com/cgi/content/abstract/35/2/179>.

The study

Mason and Krashen compared vocabulary learning from two different input conditions. The first was a Story-only condition, where L2 learners listened to a 15-minute story, The Three Little Pigs. The second, the Story-plus-study condition, entailed listening to the story plus doing explicit exercises to learn 20 target words from the story. The two conditions were carried out in two separate classes in a Japanese junior college with students of relatively low English proficiency. The procedures are detailed below. The first three steps were the same for both groups. Time on task in minutes is shown in parentheses. The pretest, posttests, and delayed posttest are identical.

| Story-only condition | Story-plus condition |
| --- | --- |
| 1. 20 target words from a story were written on the board in front of the class | 1. 20 target words from a story were written on the board in front of the class |
| 2. The students took a translation test (pretest) on the target words (5 minutes) | 2. The students took a translation test (pretest) on the target words (5 minutes) |
| 3. The students listened to the story (15) | 3. The students listened to the story (15) |
| 4. Posttest on the 20 words (5) | 4. Comprehension questions (10) |
| | 5. Posttest on the 20 words (5) |
| | 6. Students corrected the posttest with a partner and the teacher (10) |
| | 7. Students read a written version of the story (10) |
| | 8. The students retold the story with encouragement to use the target words, but in fact they made no special effort in this regard (20) |
| | 9. Students retook the posttest (5) |
| | 10. Teacher gave the correct answers (5) |

Both groups took the same delayed posttest five weeks later.





The results

These are the results (means) reported in the study for the various administrations of the test. Maximum score is 20. Standard deviations are in parentheses.

| | Pretest | 1st posttest | 2nd posttest | 5-week delayed posttest |
| --- | --- | --- | --- | --- |
| Story-only | 4.6 (2.3) | 13.9 (3.4) | – | 8.4 (3.5) |
| Story-plus-study | 4.7 (1.7) | 15.1 (2.6) | 19.7 (.6) | 16.1 (2.2) |

Mason and Krashen also calculated the efficiency of the learning, in terms of words learned per minute of treatment. The calculations are based on a posttest score minus the pretest score (pre-existing knowledge), divided by the number of minutes spent on task. For example, the 1st posttest Story-only efficiency figure was calculated as (13.9 − 4.6) ÷ 15 minutes = 9.3 ÷ 15 = .62 words per minute.

| | 1st posttest | 2nd posttest | 5-week delayed posttest |
| --- | --- | --- | --- |
| Story-only | .62 | – | .25 |
| Story-plus-study | .42 | .23 | .16 |

The interpretation

Mason and Krashen acknowledge that the Story-plus condition led to greater vocabulary learning overall, but question its efficiency. They conclude, on the basis of their calculations, that the learning from the Story-only condition was actually more efficient in learning per time expended, and that ‘additional focus on form in the form of traditional vocabulary exercises is not as efficient as hearing words in the context of stories’ (p. 179).

A re-evaluation

Schmitt (2008) suggests that one of the most important elements in determining whether various tasks facilitate vocabulary learning is the degree of engagement, a cover term for all the exposure, attention, manipulation, and time spent on lexical items. If we carefully consider the level of engagement of the various tasks in the Story-only and Story-plus procedures, we are led to quite different conclusions than Mason and Krashen came to. Let us consider several points. The first, which Mason and Krashen briefly discuss in a footnote, is the effect of testing. Research shows that an increase in virtually any aspect of engagement leads to better vocabulary learning, and there are few tasks which capture and focus learners’ attention like tests. This indicates that the time spent on testing needs to be included in the calculations for vocabulary exposure, as it is likely to have engendered deeper engagement than any of the other tasks.

Second, some of the tasks in the Story-plus procedure which Mason and Krashen include in the ‘vocabulary learning time’ did not actually generate engagement with the target vocabulary, and therefore should be excluded from the efficiency calculations. In particular, when the students read the written version of the story, they were instructed to underline the words they wanted to learn. So while they should have been briefly exposed to the 20 target words again within the story, there is no indication that they focused on these words, and so counting three minutes out of the ten required for completion of the task as vocabulary learning time is probably generous. Furthermore, Mason and Krashen indicate that students made no special effort to use the target vocabulary in their story retellings, and so not all of the time on this task (20 minutes) can be counted as learning time for those target words. Again, there might have been some engagement, so let us generously count five minutes as the vocabulary learning time in this task.

A recalculation of the time on task where engagement occurred with the target vocabulary looks like this:

| Story-only | Story-plus-study |
| --- | --- |
| Pretest (5) | Pretest (5) |
| Listening to story (15) | Listening to story (15) |
| Posttest (5) | Comprehension questions (10) |
| | Posttest (5) |
| | Correction of posttest (10) |
| | Reading written version of story (3) |
| | Story retelling (5) |
| | 2nd posttest (5) |
| | Correction of 2nd posttest (5) |
| Total: 25 minutes | Total: 63 minutes |

Based on these revised time figures the efficiency table looks like the following:

| Condition | Measure | 1st posttest | 2nd posttest | 5-week delayed posttest |
| --- | --- | --- | --- | --- |
| Story-only | Gain score | 9.3 | – | 3.8 |
| | Time on task (minutes) | 20 | – | 25 |
| | Efficiency (words learned per minute) | .47 | – | .15 |
| Story-plus-study | Gain score | 10.4 | 15.0 | 11.4 |
| | Time on task (minutes) | 30 | 53 | 63 |
| | Efficiency (words learned per minute) | .35 | .28 | .18 |

While the Story-only condition had a somewhat higher efficiency figure at the first posttest, the discussion in Section 4.3 argues that an immediate posttest should be mainly interpreted in terms of the effect of the treatment. The figures of importance are the delayed posttest scores. After a period of five weeks, any remaining knowledge can surely be classed as durable learning. Looking at these ‘learning’ figures, we find that, contrary to Mason and Krashen’s conclusions, the Story-plus-study condition actually produces slightly better figures, meaning that the insertion of explicit study leads to more efficient learning. It is important to also look at the absolute figures: after 53 minutes of listening plus explicit exercises, the relatively weak Japanese learners had achieved virtually maximum scores on the second posttest, with 74% of the students achieving perfect scores. Clearly, the addition of explicit teaching was not only relatively efficient, it was highly effective. It was also durable: a gain of 11.4 words out of 15.3 possible (75%) is an excellent result after five weeks (4.7 out of 20 target words were already known at the pretest, and so not available for learning). We can compare this to the durable learning of the Story-only condition: a gain of 3.8 out of 15.4 possible = 25%. We should also note that one element which likely led to this good learning is the effect of the additional posttest (second posttest) in the Story-plus-study treatment, which may well have helped the learners to consolidate the learning achieved in the listening and explicit exercises. This suggests that tests can act as a useful form of recycling in the vocabulary learning process.

In sum, contrary to Mason and Krashen’s general questioning of the efficiency of the traditional vocabulary exercises in this study, the Story-plus-study condition led to students learning three times as many words with slightly better efficiency. This is in line with almost all other studies which compare incidental learning with incidental-plus-explicit learning. However, this would not have shown up without a careful consideration of the tasks and the engagement they entailed. This is a warning to both be very careful with one’s own interpretation and reporting, and cautious about too easily accepting other authors’ interpretation of their data.
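The revised efficiency figures can be reproduced from the reported means and the recalculated engagement times. The following Python sketch is offered as an illustration only, not as the procedure used in either study; the names gain_and_efficiency and durable_share are hypothetical, and half-up rounding to two decimal places is an assumption made so that the output can be compared with the rounded figures in the re-evaluation.

```python
from decimal import Decimal, ROUND_HALF_UP

TOTAL_ITEMS = Decimal(20)  # number of target words on the test

# Pretest means reported by Mason and Krashen (2004).
pretest = {"Story-only": Decimal("4.6"), "Story-plus-study": Decimal("4.7")}

# (mean score, revised minutes of engagement up to that test)
posttests = {
    "Story-only": {
        "1st posttest": (Decimal("13.9"), 20),
        "delayed posttest": (Decimal("8.4"), 25),
    },
    "Story-plus-study": {
        "1st posttest": (Decimal("15.1"), 30),
        "2nd posttest": (Decimal("19.7"), 53),
        "delayed posttest": (Decimal("16.1"), 63),
    },
}

TWO_PLACES = Decimal("0.01")


def gain_and_efficiency(condition, test):
    """Return (gain score, words learned per minute) for one test administration."""
    score, minutes = posttests[condition][test]
    gain = score - pretest[condition]
    rate = (gain / minutes).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)
    return gain, rate


def durable_share(condition):
    """Return the delayed-posttest gain as a share of the words not known at pretest."""
    gain, _ = gain_and_efficiency(condition, "delayed posttest")
    available = TOTAL_ITEMS - pretest[condition]
    return (gain / available).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)


for condition, tests in posttests.items():
    for test in tests:
        gain, rate = gain_and_efficiency(condition, test)
        print(f"{condition}, {test}: gain {gain}, {rate} words per minute")
    print(f"{condition}: durable share of learnable words = {durable_share(condition)}")
```

Keeping the gain, the time, and the resulting ratio visible side by side makes it easier for a reader to see which assumption about time on task drives any difference in ‘efficiency’.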




5 Measuring Vocabulary

5.1 Global measurement issues

A great deal of vocabulary research involves measurement of lexical items in one way or another. This means that vocabulary researchers must be able to choose or design measurement instruments or procedures which validly and accurately describe the aspects of vocabulary being explored. This section will discuss a number of measurement issues specifically related to vocabulary. It will start with some general issues which need to be considered.

In discussing vocabulary measurement, it is useful to first explore the ways in which measurement formats differ. Read (2000: 9) proposes that formats can vary along three clines, as illustrated in Figure 5.1.

Tests which focus specifically on vocabulary knowledge are likely to be discrete in the sense that particular lexical items are highlighted. However, vocabulary measures can be a component of measures of broader linguistic proficiency, and in this case, the test would be embedded. Receptive vocabulary measures are almost always selective, because the test writer needs to select the lexical items to measure, determine their characteristics, and then write test items for them. On the other hand, a measure of the complete vocabulary output of learners’ speaking or writing production would be comprehensive. If this is ‘free’ output, it poses difficulties for the tester, as there is no way to know in advance exactly what the produced vocabulary will be. In terms of context, vocabulary items can range from completely context-independent (e.g. an L2-L1 translation task), to completely context-dependent (e.g. define the target word according to the meaning sense used in Passage X). Context-dependent formats will obviously provide a better way of tapping into the ‘contextualized’ facets of word knowledge like collocation and register.





Discrete: A measure of vocabulary knowledge or use as an independent construct
Embedded: A measure of vocabulary which forms part of the assessment of some other, larger construct

Selective: A measure in which specific vocabulary items are the focus of the assessment
Comprehensive: A measure which takes account of the whole vocabulary content of the input material (reading/listening tasks) or test-takers’ response (writing/speaking tasks)

Context-independent: A vocabulary measure in which the test-taker can produce the expected response without referring to any context
Context-dependent: A vocabulary measure which assesses the test-taker’s ability to take account of contextual information in order to produce the expected response

Figure 5.1 Dimensions of vocabulary assessment

5.1.1 Issues in writing vocabulary items

Concept 5.1 Test terminology

The field of language testing has its own technical vocabulary to describe different parts of a test. Although I have tried to avoid much of this jargon, a quick overview of basic terms is probably useful. Each individual ‘question’ on a language test is called an item, simply because most items are not in fact questions. In multiple choice items, the part that is given is called the stem, and the possible answers are called options. The correct option is the key and the others are called distractors (their purpose is to distract the examinees away from the key if it is not known).

When measuring knowledge of a lexical item, it is necessary to ensure that the test format does not limit the ability of participants to demonstrate whatever knowledge they have of the item. One of the most basic problems to avoid is using unknown vocabulary (or any other linguistic feature for that matter) in measurement instruments. This includes the instructions and any context provided in the test items. If the purpose of the measure is to ascertain knowledge of the target vocabulary, it does no good for participants to underachieve due to constraints not related to those targets, e.g. not recognizing a word that they know due to its being placed in an unknown syntactic construction (e.g. passive sentence). This principle is most critical in devising definitions, where participants must know the defining vocabulary in order for items to be effective. This contrived example illustrates an extreme case of definitions being ‘harder’ than the word being tested.

cow
(a) a feline animal
(b) a bovine animal
(c) a porcine animal
(d) a marsupial animal

Avoiding this kind of problem is not completely straightforward, as vocabulary difficulty is not an absolute intrinsic characteristic (as we have seen in Section 2.3), and depends to some extent on the similarity or difference with a participant’s L1 (Laufer, 1997). If you are researching in a homogeneous L1 environment, you may be able to consider the L1-L2 relationship in determining lexical difficulty. For example, while the above example may seem absurdly difficult, for speakers of some Romance languages it may not be quite so extreme, as bovine is cognate with their L1 (Spanish bovina; French bovin; Italian bovino). (See Jarvis, 2000, for more on measurement of L1-L2 transfer.) However, many research contexts will involve mixed groups with speakers of many L1s, which makes considering the various L1-L2 relationships infeasible.

The most common method of ensuring that defining vocabulary is easier than the target vocabulary is through using higher-frequency words in the definitions. It is a well-attested phenomenon that learners typically learn more frequent lexical items before less frequent ones (e.g. Schmitt, Schmitt, and Clapham, 2001), and so it is a reasonable assumption that if the defining vocabulary is of higher frequency than the target vocabulary, the participants will know it if they know the lexical item being measured. This frequency approach usually works well, both because frequency data is now readily available (either through published lists or corpus analysis), and because it is not L1-specific. Using higher-frequency defining vocabulary, the following example largely avoids the problem of unknown vocabulary in the definition.

bovine
(a) cat-like
(b) cow-like
(c) pig-like
(d) kangaroo-like

The only potential exception is kangaroo. A check of the 179 million word New Longman Corpus shows that bovine occurs 293 times, while there are 241 instances of kangaroo. In terms of simple frequency, this suggests that kangaroo is somewhat less frequent than bovine, and so a bad component for a definition. Conversely, the VocabProfiler on the Lextutor website puts kangaroo in the 7,000 frequency band, while bovine is in the 12,000 band. This shows that raw frequency information needs to be used with some caution and common sense. The makeup of a corpus obviously has a great effect on frequency counts, and researchers need to carefully consider whether a particular corpus (and the resultant frequency data) is suitable for their needs. For example, in the ACE and ICE-AUS corpora of Australian English, kangaroo occurs 33 and 51 times per million words respectively, making it a high-frequency word for learners of that national variety of English. Given the inevitable differences in frequency figures between even very comparable corpora, it is prudent to consult several appropriate corpora and their frequency counts when determining frequency information (e.g. Schmitt, Dörnyei, Adolphs, and Durow, 2004).
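Raw counts from corpora of different sizes are easier to compare once they are normalized to occurrences per million words. The short Python sketch below shows the arithmetic using the New Longman Corpus figures quoted above; the function name per_million is an assumption introduced for this illustration, and the corpus size is taken as exactly 179 million words for simplicity.

```python
LONGMAN_SIZE = 179_000_000  # corpus size in words, as stated in the text

# Raw counts reported for the New Longman Corpus.
raw_counts = {"bovine": 293, "kangaroo": 241}


def per_million(count: int, corpus_size: int) -> float:
    """Convert a raw corpus count into a frequency per million words."""
    return count / corpus_size * 1_000_000


for word, count in raw_counts.items():
    print(f"{word}: {per_million(count, LONGMAN_SIZE):.2f} per million words")
```

On this normalized scale both words occur fewer than two times per million in the Longman corpus, whereas the Australian figures of 33 and 51 per million for kangaroo are more than an order of magnitude higher, which makes the effect of corpus makeup very visible.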

It is also useful to consider how lexical items are used in the corpora. An analysis of the concordance lines in the Longman corpus indicates that bovine mainly occurs in medical or scientific contexts, while kangaroo occurs in general English contexts. Moreover, kangaroo is likely to be a loanword in many languages and so known through the L1 for many learners. Taken together, this is additional evidence that most participants would probably know kangaroo before they knew bovine, unless perhaps they were medical students.

While the defining vocabulary should always be of higher frequency than the target vocabulary, in general it is best to limit oneself to the highest frequency words when possible. For example, even though the relatively rare word lithe (Lextutor 14,000 band) could be defined with the higher-frequency word supple (8,000 band), it would probably be better to use the even higher-frequency words flexible or graceful from the 2,000 band. It is possible to define most words with only vocabulary from the most frequent 2,000–3,000 words of English, as demonstrated by learner dictionaries, which tend to have a defining vocabulary of around this size (e.g. Macmillan English Dictionary for Advanced Learners, 2002: < 2,500 words; Longman Advanced American Dictionary, 2000: ≈ 2,100 words). In fact, learner dictionaries are good sources from which to extract accessible definitions for research instruments, as the lexicographers have already taken great pains to make them as easy and transparent as possible. Also, as copyright issues make it difficult for different dictionaries to have exactly the same definitions, a perusal of several learner dictionaries will provide you with a number of definition options to choose from. This is illustrated by the following similar, but subtly different definitions of lithe, most of which could probably be suitably adapted for a discrete item test of the form-meaning link:

● having a body that moves easily and gracefully (Longman Advanced American Dictionary, 2000)

● moving and bending in a graceful way (Macmillan English Dictionary for Advanced Learners, 2002)

● young, healthy, attractive, and able to move and bend gracefully. (Cambridge Advanced Learner’s Dictionary, 2003)



● A lithe person is able to move and bend their body easily and gracefully. (COBUILD English Dictionary for Advanced Learners, 2001)

The ease of these definitions is illustrated with a comparison to a popular online monolingual dictionary. For example, the second definition below is clearly more difficult than the above definitions and thus less suitable for L2 language research.

1. easily bent or flexed
2. characterized by easy flexibility and grace; also athletically slim.
(Merriam-Webster Online Dictionary)

It is also important that the test items used in measurement instruments are natural and make sense to the participants. Too often, they are created to fulfil the requirements of the research design, but end up being awkward and atypical of the target language. This can happen when contriving context for lexical items, or when developing distractors. One way around this is to use or mimic authentic language in the test items. Another way is to elicit native judgements of ‘naturalness’ to ensure that the target items do not have any unintended ‘strangeness’ about them. For example, if creating sentences for target lexical items, a panel of native speakers can be asked to rate each of the sentences on a six-point Likert scale according to how ‘natural’ they are (1 = ‘very unnatural’, 6 = ‘completely natural’). Then only sentences receiving a mean rating of 5 or above are retained. This is an easy way to avoid uninterpretable results based on unnatural stimuli.
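The rating-and-retention step just described can be sketched in a few lines of Python. This is only an illustration: the candidate sentences, the panel ratings, and the function name `retain_natural` are all invented for the example, and the threshold of 5 is the one suggested above.

```python
from statistics import mean

# Hypothetical panel data: each candidate sentence is rated by four
# native speakers on a six-point scale
# (1 = 'very unnatural', 6 = 'completely natural').
ratings = {
    "She gave a lithe leap over the fence.": [5, 6, 5, 6],
    "The lithe is on the table quickly.": [1, 2, 1, 2],
    "The dancer's lithe movements impressed the judges.": [6, 5, 6, 6],
}

def retain_natural(ratings, threshold=5.0):
    """Return the sentences whose mean rating is at or above the threshold."""
    return [s for s, scores in ratings.items() if mean(scores) >= threshold]

retained = retain_natural(ratings)
for sentence in retained:
    print(sentence)
```

In a real study the ratings would come from the panel's response sheets, and it may also be worth inspecting the spread of ratings for each sentence rather than the mean alone.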

Best practice is to use delayed posttests in acquisition studies (Section 4.4), but this raises the issue of memory effect. Many researchers are inclined to use different versions of a test in order that participants won’t score artificially highly because they remember a test from a previous administration. However, the problem is that it is difficult to make different versions of a vocabulary test which are truly equivalent. For example, in the validation report of two versions of the Vocabulary Levels Test (Schmitt et al., 2001), the researchers concluded that while the two VLT versions were statistically equivalent for groups of learners, they were not for individual learners. In fact, with the possible exception of psycholinguistic designs that use lists of individual words which can be precisely matched in their characteristics, it is virtually impossible to create multiple versions of a vocabulary test on which individuals will produce the same score. This indicates that pretest-posttest(s) designs need to use the same test in all administrations to avoid unknown variation due to use of not-fully-equivalent tests.

If memory effect is a concern because two or more administrations are given close together, a good technique is to give participants a cognitively-challenging task (e.g. a math exercise) immediately after the test administration, in order to get them thinking about something else, and ‘flush out’ their memories of the test. It is also useful to disguise the true targets of the research by adding ‘red herring’ distractor elements to the pretest. Let us take the example of a research design which is interested in the incidental learning of vocabulary from reading, where it is necessary to give a pretest to measure pre-existing knowledge of the target lexical items. The researcher can add some reading-based test element(s) (e.g. a short test of reading speed) to avoid participants guessing that vocabulary is in fact the target. It might also be useful to add some extra words to the vocabulary pretest, so that the actual target items are less prominent. These additional pretest elements will take up extra time, but would be worthwhile if they distract the participants from the true nature of the research design.

Although many commentators would have vocabulary always taught and tested in context, this probably depends on what is being measured. If the construct is the form-meaning link, then non-contextualized formats like L2-L1 translation can work well. However, if the desired construct is the ability to recognize a lexical item from context in written texts or speech, then context is obviously required, presumably the type (level) of context in which you expect your participants to meet these items during non-testing (i.e. real usage) conditions. This follows language testing views that emphasize the need for testing situations to replicate situations of language use or learning (e.g. Bachman and Palmer, 1996). Similarly, if the construct is productive ability, the demonstration of the lexical items in contextualized speech or writing would also be required. Of course, this makes testing harder and messier, as it can be difficult to tell if you are testing the target items or the context.

Quote 5.1 Cameron on test items which measure vocabulary in isolation

In the process of acquiring a vocabulary item, the meeting and making sense of a new word in context is likely to be the first step in a longer process; initial encounters with a word do not necessarily lead to that word being recognized on further occasions. Further meaningful encounters will be needed to establish the full range of a word’s meaning possibilities, and to engrave the word in memory. Eventually, after sufficient contextualized encounters, a word will be recognized when it is met in a new context or in isolation. If we then think about the process of completing a word recognition test, we can surmise that decontextualized presentation of a word in a test does not imply that a testee makes sense of the test word in a ‘decontextualized’ mental void. Rather, the recognition process may activate recall of previous encounters and their contexts. Since we would expect and want secondary-level [or any other L2] students to be able to operate this way with large sections of their vocabulary, it does not seem unreasonable to test to see how much vocabulary can be recognized without extended linguistic or textual contexts.

(Cameron, 2002: 150–151)



Although most vocabulary tests have been of the paper-and-pencil variety until now, it is worthwhile to consider the advantages of computerized testing and internet-based testing. They can make it easier to develop participant pools, as it is possible to set up the tests so that participants can take them at their convenience. It is also possible to set up the tests so that they are automatically scored, with immediate feedback to informants. Such tests can also be multimedia, with spoken, written, and graphic elements. They can also be interactive, and can adjust to the proficiency levels of the participants. It is beyond the scope of this book to discuss the development of computerized tests, but one relatively easy way into this area is Mike Scott’s freeware WebQuiz, which is available at <http://www.lexically.net/software/index.htm>.

5.1.2 Determining pre-existing vocabulary knowledge

Research into the acquisition of vocabulary necessitates determining what vocabulary knowledge exists at a point in time (usually before an experimental treatment), and then establishing what the state of knowledge is at a later point. This is often explored with some form of T1 (Test 1/pretest)–treatment–T2 (Test 2/posttest) research design. The need for the pretest is obvious, because if pre-existing knowledge is not established at the beginning, it is impossible to know whether T2 knowledge is new acquisition, or simply knowledge that was in place before the study began. (Pretests are also important for determining whether the groups being compared are actually similar before the treatment.) There are two main ways in which the degree of pre-existing knowledge can be controlled for. The first involves ensuring none exists, and the second entails measuring and then adjusting for it.

Ensuring no pre-existing knowledge exists

One case where no pre-existing vocabulary should be in place is with rank beginners. If learners have had no exposure to an L2, then it can be assumed that they will have no knowledge of its vocabulary, as long as it is not a cognate language and there are no loanwords from the learners’ L1.

Picking the lexis from an unknown language as targets can be a useful technique, as in the Clockwork Orange studies, where non-Russian speakers were tested to see how much Russian slang vocabulary they picked up from reading this novel (Saragi, Nation, and Meister, 1978). However, when English is the target language, it is probably unsafe to assume that the learners have had zero experience with it, given its wide usage and influence around the world.

Another approach is to use very low-frequency words from the L2 the learners are studying, e.g. glimmer and coalesce in English. This usually works well, but researchers should be aware that learners do not learn words in strict frequency order, and will very likely know some low-frequency words even if they are only beginners. While the number of low-frequency words known will be quite small, it is prudent to check for any pre-existing knowledge by pretesting the participant group or a sample thereof, or a similar group with the same type of participant. It is also important to ensure that the low-frequency vocabulary has no word parts which can make guessing feasible, e.g. fishhook, which is not frequent in itself, but is still relatively transparent if learners know the very frequent words fish and hook.

An increasingly common approach is using nonwords (sometimes called pseudowords or nonce words). This is effective, as it guarantees informants do not know the invented items in advance (e.g. Hall, 2002; Webb, 2007a). There are no technical reasons not to use nonwords, as de Groot and Keijzer (2000) report that the success rate with nonword training is comparable to that of training words from an existing language. This is on the condition that the nonwords are phonologically/orthographically legal in the learners’ L1. One way of creating nonwords is to manually take real words and then change one or more letters (e.g. prod → prok). However, it is easier and more comprehensive to use a nonword generator, such as the ARC Nonword Database <http://www.maccs.mq.edu.au/~nwdb/> (Rastle, Harrington, and Coltheart, 2002). This website creates lists of nonwords based on a wider range of criteria than it would be feasible to control for manually, including: the number of letters or phonemes, the number of real words with a similar form, and the frequency of bigrams and trigrams within the nonword.
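The manual letter-substitution method (prod → prok) can be roughly sketched in Python as follows. This is a minimal illustration rather than a substitute for a nonword generator: the word list `REAL_WORDS` and the function `one_letter_nonwords` are hypothetical, and the sketch makes no attempt to check phonological or orthographic legality, which would still need to be judged separately.

```python
import string

# Hypothetical mini-lexicon standing in for a full list of real words.
REAL_WORDS = {"prod", "prop", "prom", "pros", "plod", "trod", "prow"}

def one_letter_nonwords(word, real_words, position=-1):
    """Replace the letter at `position` with every other letter and
    keep only the results that are not in the real-word list."""
    candidates = []
    for letter in string.ascii_lowercase:
        if letter == word[position]:
            continue  # skip the original letter
        chars = list(word)
        chars[position] = letter
        candidate = "".join(chars)
        if candidate not in real_words:
            candidates.append(candidate)
    return candidates

candidates = one_letter_nonwords("prod", REAL_WORDS)
print(candidates)
```

The output is only a list of candidates. Some of them may be real words missing from the small list used here, or may be illegal strings in the learners’ L1, so each would need to be screened before use.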

The success rate with nonword training also suggests that learners can be motivated to learn non-real material, but there may be ethical issues, especially if the subjects are actively involved in language learning and expecting that participation in a study will enhance their language skills. In such a case, it is probably unreasonable to trade on their goodwill to teach them imitation vocabulary. However, if the subjects are paid, get some other credit or benefit from participation, or agree in advance to learning nonwords, then there would be no ethical barrier to the use of nonwords.

Measuring and adjusting for pre-existing knowledge

This approach entails using a pretest to measure any pre-existing knowledge of the target lexical items. As mentioned in Section 5.1.1, it is usually best to use the same test for the pretest and subsequent posttest(s), often with a memory-flushing task after the pretest. It is also useful to have distractors in the pretest to minimize the chances of learners becoming aware of the target items. The practice in psychology studies is to have about one-third targets and two-thirds nontarget distractors. If the target vocabulary is at the level which the learners would naturally be learning at their level of proficiency, there will probably be considerable variation in pretest knowledge, which often makes it sensible to report gain scores (T2 scores minus T1 scores) rather than raw T2 scores.
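As a simple illustration of gain scores, the following Python sketch subtracts each participant's T1 score from their T2 score. The participant IDs and scores are invented for the example, and `gain_scores` is a hypothetical helper name.

```python
# Hypothetical pretest (T1) and posttest (T2) scores per participant.
t1 = {"P01": 4, "P02": 9, "P03": 0, "P04": 6}
t2 = {"P01": 10, "P02": 12, "P03": 7, "P04": 6}

def gain_scores(pre, post):
    """Gain score = T2 score minus T1 score, for participants with both."""
    return {pid: post[pid] - pre[pid] for pid in pre if pid in post}

gains = gain_scores(t1, t2)
mean_gain = sum(gains.values()) / len(gains)
print(gains)
print(mean_gain)
```

Participants missing either administration are left out here; how to treat such cases in a real study is a design decision this sketch does not address.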



5.1.3 Validity and reliability of lexical measurement

Vocabulary measurement needs to be both valid and reliable. These are technical terms in the field of language assessment, and have been widely discussed (e.g. Bachman, 1990; Bachman and Palmer, 1996). Although validity is a complex construct, for the purposes of our short discussion, let us think of it as whether vocabulary tests actually measure what they purport to measure. This can be established in different ways. One common way is through criterion validity. In this method, a new test is judged according to how closely it correlates with an already established measure. This can work well if an accepted standard measure already exists to compare against. However, in vocabulary, few such standards exist, with the Vocabulary Levels Test being perhaps the closest we have at the moment. Moreover, the complex nature of vocabulary knowledge dictates that any particular test would be severely limited as a criterion measure. For example, the VLT is a receptive test, measuring knowledge of the form-meaning link only, and is based on four frequency levels (2,000, 3,000, 5,000, 10,000). For an alternative test that has these precise characteristics, the VLT will serve as an adequate criterion. But it would probably not be suitable as a criterion for a productive vocabulary test, one which measures other word knowledge facets, or one which samples vocabulary from different frequency levels. Therefore, a criterion validity approach has serious limitations at the moment.

This means that the validity of a vocabulary test will usually have to be demonstrated through its own development and performance. The development part starts with specifying the content, i.e. content validity. It is here that one must apply the themes running through this book, and not specify a ‘general’ vocabulary, but be much more specific about what lexical items the test includes and what is being measured about those items. Some of these specifications will include:

● whether the test measures only a specific set of lexical items (such as a group of items which have been previously taught), or whether the lexical items on the test are supposed to represent a wider population of vocabulary

● if the test items represent a wider population, what is the basis for this generalization, and is it supportable (e.g. frequency band, type of vocabulary, word class)

● what word knowledge aspects are being addressed

● whether the test measures recall/recognition/receptive/productive levels of mastery.

Quote 5.2 Read questioning whether language tests always measure what they purport to

We need to be cautious in making assumptions about what aspect of language is being assessed just on the basis of the label that a test has been given.

(2000: 99)

It should be possible to develop detailed and focused specifications for new vocabulary tests, based on the literature in the field and previous research. After test items have been written for these specifications, it is time to gauge how well the test captures the specified content. That is, when examinees take the test, how well do their scores represent this content? There are numerous statistical analyses available which shed light on this, but I feel the best way is through a separate direct, in-depth assessment of the underlying knowledge. Perhaps the best way of determining ‘true’ underlying knowledge is through interactive face-to-face interviews where the interviewer can probe the examinees’ lexical knowledge in detail, and come to a very confident determination of this knowledge.

This method was carried out in the validation of the VLT. A subsample of learners who had taken the VLT were asked to attend individual interviews with a two-person interview panel. The two interviewers probed the learners about their knowledge of a 50-word subsample from the test. The learners could demonstrate their knowledge in a number of ways, and the probing continued until both interviewers were satisfied the learner either did or did not know the form-meaning link for the word. Once the interviews were completed, the results from the interviews were compared to the results from the test. This comparison is shown in Table 5.1.

Table 5.1 Comparison of interview results with levels test results

                            Levels Test
                            Correct     Incorrect    Total
Interview   Knew            A 731       B 47         778
            Did not know    C 65        D 255        320
            Total           796         302          1,098

(Schmitt et al., 2001: 75).

Using the interview procedure, the raters could be confident of the learners’ knowledge of the target test words. This was reflected in high inter-rater reliability figures of .95–.97. Note that interviewers did not know the learners’ responses to the VLT, and so this could not bias their interview judgements. The contingency table gives a wealth of information about the validity of the test. Ideal performance entails all responses being either in Boxes A or D. While not perfect, the VLT performs well in this regard: 90% (986/1,098) of the responses were so placed. What is more, the table illustrates the problems with the remaining 10% of responses, where the VLT did not indicate the examinees’ true lexical knowledge, as indicated by the interview. The mismatches came in two types. The first, knowing a word but not matching the correct option on the Levels test (Box B), did not seem to be too much of a problem with an occurrence rate of only about 4%. Thus, if learners knew the word, they usually were able to answer the test item correctly. The second, not knowing a word but still matching the correct Levels option (Box C), occurred slightly more often, at about 6%. This box relates to guessing, and a 6% ‘guess rate’ must be considered quite acceptable in a matching test format where guessing can never be eliminated completely.
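The percentages discussed here follow directly from the four cell counts in Table 5.1, and the arithmetic can be checked with a short Python sketch. The variable names are my own; only the counts come from the table.

```python
# Cell counts from Table 5.1 (Schmitt et al., 2001: 75).
a, b, c, d = 731, 47, 65, 255    # Boxes A, B, C, and D
total = a + b + c + d

agreement = (a + d) / total       # interview and test agree (Boxes A and D)
knew_but_wrong = b / total        # Box B: knew the word, wrong on the test
correct_but_unknown = c / total   # Box C: did not know, correct on the test

print(f"Responses in Boxes A or D: {agreement:.1%}")
print(f"Box B rate: {knew_but_wrong:.1%}")
print(f"Box C rate: {correct_but_unknown:.1%}")
```

Note that all three rates are calculated against the full set of 1,098 responses, which is how the approximate figures of 90%, 4%, and 6% are obtained.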

The interviews also suggested that many of the mismatches were not the result of guessing, but of partial lexical knowledge. For example, one learner believed that collapse meant ‘to break’. While this was not a full enough understanding to warrant a ‘know’ rating in the interview, it did allow him to choose the correct option, ‘to fall down suddenly’, on the VLT. The information from the table, combined with additional insights like the one above, demonstrates the value of this type of methodology in providing rich data from which to fashion a validity argument for a vocabulary test.

Another requirement of good measurement is that testing instruments give consistent (i.e. reliable) results. In other words, if a participant takes a test today and scores 50%, he/she should also score 50% on that test tomorrow, assuming no learning has taken place. In reality, no test will be able to deliver 100% reliability, because participants are human and vary day-to-day in their performance. However, if tests produce scores which vary widely even though the underlying participant abilities remain the same, it becomes impossible to interpret them: are the higher scores closer to ‘true’ ability? Or the lower scores? Or an average? Because reliability is essential to valid testing, reliability should be determined for all your instruments, and reported.

Quote 5.3 Bachman on reliability

We can all think of factors such as poor health, fatigue, lack of interest or motivation, and test-wiseness, that can affect individuals’ test performance, but which are not generally associated with language ability, and thus not characteristics we want to measure with language tests ... When we minimize the effects of these various factors, we minimize measurement error and maximize reliability. In other words, the less these factors affect test scores, the greater the relative effect of the language abilities we want to measure, and hence, the reliability of the test scores.

(1990: 160)



Because developing valid and reliable measurement instruments is time-consuming, it can be useful to use established tests if they are available and appropriate for your purpose. One advantage of this is that it makes your research more comparable to previous research because the tests are the same and can be interpreted in similar ways. Another advantage is that the tests (hopefully) have been through a validation process where some reliability evidence has been collected. Having previous reliability evidence is useful and offers some quality assurance; however, test performance can vary from one participant population to another, even if they appear to be similar on the surface. It is not difficult to understand why this is: people are inherently individuals, and vary according to factors such as L1, language proficiency, motivation, how well they like their teacher, parental support for learning, and many others. Because of this, best practice dictates that reliability needs to be established for the measurement instruments for each participant population in their own environment. That is, although previous reliability evidence can suggest the consistency of a test, you do not really know how it will work with your particular participants until you try it with them. Therefore, even when using existing measurement instruments, you should report the reliability figures of those measures with your population.

The reliability of most research instruments can be established using different methods. The most easily conceptualized is the test-retest method. A test is given one day, and then again soon afterwards, before any learning can occur. However, there are several problems with this method: it requires two administrations, participants may remember something of the test in the second administration if it is given too soon after the first administration, they may have forgotten some of their knowledge if the second administration is given too long after the first, and participants are usually not keen to take the same test twice in a row. It is also possible to establish reliability if there are two equivalent versions of the test, which can be compared against each other. However, in my experience, it is difficult to create two tests which are truly equivalent (e.g. Schmitt et al., 2001), and so this may not be a viable option, although Rasch analysis can be a useful aid.

Concept 5.2 Rasch analysis

Rasch (also known as one-parameter) analysis is part of the item response theory approach to language measurement. It involves complex statistical modelling, but in essence, ranks examinees according to their ability (as determined by their total test scores) and simultaneously ranks the test items according to difficulty (in terms of how many examinees were successfully able to answer them). Comparisons between the examinees and test items can then be made (e.g. we would expect the most difficult items to be answered by only the strongest examinees). Any items which ‘misfit’ in these comparisons can be flagged for further consideration of their merit.



In practice, reliability is usually established via an internal consistency approach. Instead of giving a test twice as in the test-retest method, it is given only once, but then split into smaller parts which can then be compared. For example, the split-half method divides the test into two halves, each considered an alternative form in its own right. The scores from the two halves can be compared, and if the scores are similar, the overall test is considered to be reliable. Internal consistency methods have the obvious advantage of only requiring a single administration, and Hughes (2003: 40) concludes they work well providing ‘that the alternate forms are closely equivalent to each other’.
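To make the split-half idea concrete, here is a minimal Python sketch. It rests on several assumptions of my own: the halves are formed from the odd- and even-numbered items (one common convention, not the only one), the comparison is a Pearson correlation between the two half-scores, and the response data are invented. The function names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def split_half(responses):
    """responses: one list of 0/1 item scores per participant.
    Items in positions 1, 3, 5, ... form one half; 2, 4, 6, ... the other."""
    first = [sum(r[0::2]) for r in responses]
    second = [sum(r[1::2]) for r in responses]
    return pearson(first, second)

# Hypothetical data: five participants, six dichotomously scored items.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
r = split_half(responses)
print(round(r, 2))
```

The correlation describes only half-length forms, and statistical packages usually apply a further correction when reporting split-half reliability for the full test; that step is left out of this sketch.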

When the construct being measured is a single rule or ability (e.g. third-person -s, reading speed), it is possible to create tests where all the items address that single construct. However, vocabulary is largely item-based learning, and so each item addresses a separate construct, i.e. knowledge of a single lexical item. Therefore, just because one lexical item is known (e.g. the word succinct) does not mean that another will be known (pithy). This is true even if the words are similar in terms of form, meaning, or topic area. In fact, the best guide for whether words are known or not seems to be frequency of input. The easiest way of determining input is frequency lists, but these can only ever be a general guide. For example, the words pencil, eraser, and notebook are not among the most frequent words in English, but they will be especially frequent in classrooms of beginning language learners. Furthermore, while L2 learners tend to learn words in frequency order, their lexicon will follow a frequency profile rather than strict frequency order. This is illustrated by the vocabulary profiles of three Japanese learners (Figure 5.2).

Figure 5.2 Vocabulary profiles of three EFL learners (max score = 30)

[Figure: scores (y-axis, 0–30) plotted by frequency band (x-axis: 2,000, 3,000, 5,000, 10,000) for Student 1, Student 2, and Student 3.]



From Figure 5.2, we see that knowledge of vocabulary generally decreases as frequency decreases, in a type of stair-step pattern. That is, learners typically know fewer families in the lower-frequency bands than they do in the higher-frequency bands, but they usually still know some. Student 1 is a beginner, and knows only a few of the 2,000 and 3,000 families. Student 2 still has not mastered the 2,000 level, but knows some words at the 3,000 and 5,000 levels. Student 3 has a much larger lexicon, and scored maximum points on the 2,000 and 3,000 levels (and nearly so on the 5,000 level), and knows about two-thirds of the 10,000 level. These profiles show that it cannot be assumed that all of the lexical items within a frequency band will be learned together. To state this another way, just because some lexical items in a frequency band are known, it does not follow that other items at that band will also be known. This is illustrated by five words from the Leech, Rayson, and Wilson (2001) BNC frequency counts. Film, process, useful, conference, and operation are five words in frequency sequence, but just because one or more of these are known, it does not follow from their frequency placement alone that the others are known too. Frequency is a good tool to gauge the probabilities of a target word being known, but is not strong enough to predict knowledge with certainty.

This causes problems for internal consistency methods. They work by assuming that the division of a test results in equivalent forms. However, because it is difficult to establish whether the lexical items of one part of a test are of equal difficulty to those of another part, this assumption is suspect. For vocabulary, it would be ideal to establish reliability using a test-retest format, because the lexical items on both administrations of the test would have exactly the same characteristics, as they would be exactly the same items. However, practical limitations may make this infeasible in many contexts. If internal consistency measures are used instead, it may be better to manually ensure that the alternate forms are as equivalent as possible, using criteria such as frequency, typical acquisition order of lexical items for your type of participant, and similarity with L1.

A similar problem lies with using item-total score correlations. Again, when a single underlying construct is being assessed, every single item should reflect the total score on a test, and high correlations are one indication of a well-working item. However, when a test is made up of individual lexical items, with knowledge of each essentially a different construct, then item-total correlations do not make sense as a measure of an item’s goodness. A perfectly good item may well behave differently from others, e.g. when the item measures a word that for some reason has occurred less often than other words in a frequency band in a particular learning environment or for a particular age group. Even though this item would generate lower scores than those for other ‘higher frequency in environment’ words, it is not necessarily a bad item. The point that matters is whether that test item truly reflects testees’ knowledge of the lexical item, and not whether it matches the results from other test items.

The methods of establishing reliability are just another case where vocabulary might require different techniques than other linguistic features which are more rule- or system-based.

5.1.4 Placing cut-points in study

Although this is not strictly an issue specific to lexical research, it occurs often enough to warrant a brief mention. When one has a set of data or participants, and wants to divide them into separate groups, it is necessary to divide them at one or more places. However, where the ‘cut’ is made is crucial. Typically, it is made at some point, and all of the cases below this point are put into the lower group and all of the cases above it are placed in the higher group. The problem is the borderline cases. We can see this in the small contrived dataset below.

40, 44, 45, 48, 49, 50, 59, 60, 62, 65

Many researchers would make the cut-point at 50, and so have five cases below the cut-point and five at or above it. But when dividing into groups, we need to be able to argue that the members of the various groups are exhibiting separate behavior, and are clearly different from each other. With this cut, though, 49 and 50 would be in different groups even though they are only one point apart. Clearly, they are much more similar to each other than the scores of 40 and 44 are, even though those two fall in the same lower group. In this dataset, there is a ‘natural’ cut-point, where the data above and below are clearly differentiated: between 50 and 59. A researcher can use such a natural cut-point when it exists to form groups that are supportably different. However, if no natural cut-point exists, or if it is necessary/desirable to retain the same number of cases in the different groups, it may be necessary to delete some of the close borderline cases in order to clearly differentiate the groups. For this dataset, deleting 49 and 50 would result in two groups of four cases (40–48 and 59–65) which have an obvious gap and should represent truly different behaviors. However one places the cut-point, it is important to be able to argue that the resulting groups in fact represent separate behavior.
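One simple way of looking for a ‘natural’ cut-point is to find the largest gap between adjacent scores. The Python sketch below applies this to the contrived dataset above. Taking the largest gap is my own operationalization of ‘natural’, offered as one possibility only, and `natural_cut` is a hypothetical function name.

```python
scores = [40, 44, 45, 48, 49, 50, 59, 60, 62, 65]

def natural_cut(values):
    """Return (lower, upper): the adjacent pair of sorted values
    separated by the largest gap."""
    ordered = sorted(values)
    pairs = zip(ordered, ordered[1:])
    return max(pairs, key=lambda p: p[1] - p[0])

lower, upper = natural_cut(scores)
low_group = [s for s in scores if s <= lower]
high_group = [s for s in scores if s >= upper]

print(lower, upper)
print(low_group)
print(high_group)
```

With real data the largest gap may be small or may fall near one end of the distribution, so the result still needs to be judged against the research design rather than accepted automatically.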

5.2 Measuring vocabulary size

Much vocabulary research involves counting lexical items for some reason, e.g. to discover how many items a learner has acquired, to find out how many items a person needs to know to understand a conversation, or to count how many academic words a book contains. This section will first focus on issues involved in measuring vocabulary size, and will then review a number of vocabulary size testing instruments.





5.2.1 Units of counting vocabulary

The main issue in counting vocabulary is in setting the unit of measure. Different ways of counting lexical items will lead to vastly different results, and a persistent problem in lexical studies is that size figures are reported, but without a clear indication of how they were derived. The following describes the main methods of counting vocabulary.

Tokens and types

Two useful terms when discussing vocabulary, particularly corpus research, are token and type. Tokens are the number of running words in a text, while types are the number of different words. Thus there are five tokens in the following example, but only four types, as the two occurrences of fat belong to the same type.

Fat cats eat fat rats.
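The token/type distinction can be illustrated with a short count. This is a minimal sketch which assumes that a token is any run of letters (allowing internal apostrophes or hyphens) and that upper- and lower-case spellings belong to the same type; real corpus tools make these decisions in their own ways.

```python
import re


def tokens_and_types(text):
    """Split a text into running words (tokens) and the set of distinct words (types)."""
    # Assumption: case is ignored, so 'Fat' and 'fat' count as one type.
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    return tokens, set(tokens)


tokens, types = tokens_and_types("Fat cats eat fat rats.")
print(len(tokens), "tokens;", len(types), "types")
```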

Word forms

The most basic way to calculate vocabulary is to count each type (also called an individual word form) separately. In the nearby box, I list all of the variations of teach that I could find in the BNC (see Section 6.2). You will see that there are 11 different word forms in the box. This is the easiest way to count, as lexemes with even slightly different spellings are counted as new word forms. No judgements need to be made about issues like meaning, word class, or frequency. There is also some evidence that word forms are the basic psycholinguistic element. The speech production research done by Levelt and his associates shows that both in terms of production and perception, speakers generally activate the base word form. This is true even if it is an inflection or derivation (i.e. base form + inflectional or derivational affix). This evidence suggests that people acquire mainly base word forms, although in some cases inflected forms of those base forms are acquired where the frequency is high. (See Kuiper, van Egmond, Kempen, and Sprenger, 2007, for more on lexical activation, especially of formulaic sequences.)

Quote 5.4 Meara on the importance of vocabulary size

All other things being equal, learners with big vocabularies are more proficient in a wide range of language skills than learners with smaller vocabularies, and there is some evidence to support the view that vocabulary skills make a significant contribution to almost all aspects of L2 proficiency.

Meara (1996b: 37)





Lemmas

It is clear that the items in the box are all semantically related, and in addition to this, some have a very close grammatical relationship. The first item is the base (or root) form of the verb teach. The next three items (taught, teaching, teaches) are grammatical inflections of the base form. It is not difficult to argue that these four forms are so closely related that they should be counted as one item – a lemma. Teacher and its plural inflection teachers combine to form another lemma, but the rest of the items in the box would each be a separate lemma.

One good reason to count vocabulary using lemmas concerns the way the mind processes vocabulary. Some psycholinguistic research indicates that the mind stores only the base form of a lemma and then attaches inflectional suffixes (which are usually regular and consistent) on-line when they are needed (Aitchison, 2003). Thus, to the mind, a lemma operates as one lexeme, albeit one which can be grammatically manipulated. The exception is irregular forms (e.g. taught), which need to be stored separately as individual lexemes. There are only around 200 verbs with irregular past forms and many fewer irregular plural nouns in English (Schmitt and Marsden, 2006), so individual storage of irregular lemma members is definitely the exception compared to the vast majority of regular nouns and verbs which are inflected on-line. It is important to note, however, that individual irregular word forms can be quite frequent (e.g. be, men, ran).

Another reason to use lemmas concerns learning burden. If a student knows the inflectional system and the base form of a word (sew), then learning its inflected forms (sewed, sewing, sews) should be relatively easy. This is not the case for irregular forms though, and so it is unclear whether they should be included in a lemma, or be counted as a new one.

Word families

Variations of teach found in the BNC with number of occurrences

teach        3,298     teachable        12
taught       4,224     teachability      1
teaching     9,581     teachableness     1
teaches        536     unteachable       8
teacher      9,145     teacher-like      0
teachers    12,370

While lemmas are groups of related word forms within a word class, words from other word classes can also be related to a base form. For the verb teach, for instance, there is the noun teacher and the adjective teachable. These word forms have related meanings, and clearly fit with the lemma teach. We call all of the word forms which are semantically related a word family. Thus, the word family formed around the base form teach includes all of the words in the box. This unit of counting is the best at capturing all of the word forms related to a concept.

There is some psycholinguistic evidence that the mind processes word forms like these together in some way, e.g. Nagy, Anderson, Schommer, Scott, and Stallman (1989) found that the speed at which a native speaker could recognize a word was better predicted by the total word family frequency than by the frequency of the individual word form. (See also Bertram, Baayen, and Schreuder, 2000, and Bertram, Laine, and Virkkala, 2000.) However, Kon Kuiper (personal communication) points out that the psycholinguistic reality of word families is not straightforward. The question as to whether there are word families in the mental lexicon depends in part on how one looks at formal similarity/idiosyncrasy. Let us suppose that two words are lexically related, such as silly and silliness. There is some formal similarity here, but there is also idiosyncrasy. Some related lexical items will have only very small amounts of idiosyncrasy (i.e. the forms are very similar: activate/activation), while others will have as much, or even more, idiosyncratic information than there is predictable information (deceive/deception). But even if one assumes that the learner has less to learn when there is much formal similarity, it does not necessarily follow that speakers also store lexically-related items under a single mental entry. Psycholinguistic evidence shows that activation of a lexical item spreads to related lexical items, but that the activation of each item is unique. So it seems that the psycholinguistic status of word families is still undetermined.

In practical terms, word families can be much more difficult to use than other units of counting, especially in deciding which word forms belong in the family and which do not. While relatively frequent word forms like teacher are easy to include, other seemingly acceptable forms like teachable are relatively infrequent. Then we have very uncommon forms like teachableness which still follow the rules of morphology and can be found in a native corpus. Finally, there are forms like teacher-like which one might intuitively place in the word family, but which might not occur in a corpus at all. There are no hard-and-fast rules to determine the members of a word family, but if one has a good idea of their research purpose, it is still possible to make the selection in a principled way. Bauer and Nation (1993) give some guidance with a hierarchy of inflection and derivation affixes based on the criteria of frequency, regularity of form, regularity of meaning, and productivity.1
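The effect of the counting unit can be seen by tallying the teach data in three ways. The sketch below copies the BNC frequencies from the box and groups the forms by hand, following the lemma grouping described earlier in this section; the dictionary and function names are illustrative assumptions, not the output of any lemmatizer.

```python
# Frequencies copied from the box of teach variations (BNC).
FORM_COUNTS = {
    "teach": 3298, "taught": 4224, "teaching": 9581, "teaches": 536,
    "teacher": 9145, "teachers": 12370, "teachable": 12,
    "teachability": 1, "teachableness": 1, "unteachable": 8,
    "teacher-like": 0,
}

# Hand-built lemma assignment following the grouping described in the text.
# Any form not listed here is treated as its own lemma.
LEMMA_OF = {
    "taught": "teach", "teaching": "teach", "teaches": "teach",
    "teachers": "teacher",
}


def lemma_counts(form_counts, lemma_of):
    """Add up word-form frequencies under the lemma each form belongs to."""
    totals = {}
    for form, count in form_counts.items():
        lemma = lemma_of.get(form, form)
        totals[lemma] = totals.get(lemma, 0) + count
    return totals


lemmas = lemma_counts(FORM_COUNTS, LEMMA_OF)
family_total = sum(FORM_COUNTS.values())  # every form belongs to the one family

print("word forms:", len(FORM_COUNTS))
print("lemmas:", len(lemmas))
print("word families: 1, with", family_total, "occurrences in total")
```

The same set of corpus hits is thus reported as 11 items, 7 items, or 1 item, depending solely on the unit chosen.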

Deciding the best unit of counting

The best unit of counting will depend somewhat on the technical resources available. Using word forms is often the easiest option, as concordancing programs can count only items which have an exact formal match. There are now lemmatizers available which can automatically ‘tag’ a corpus and lemmatize frequency counts (see Sections 6.2 and 6.3), although they still need some manual oversight. I am aware of no program which can reliably count word families automatically on its own. For families, it is necessary to first give the program a list of family members, against which it can match forms in the corpus.
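Matching corpus forms against a supplied family list might be sketched as follows. The family list, the sample sentence, and the function names are all hypothetical; forms not on the list are set aside for manual checking rather than guessed at.

```python
import re
from collections import Counter

# Hypothetical family list supplied by the researcher: headword -> member forms.
FAMILY_LIST = {
    "teach": ["teach", "teaches", "taught", "teaching",
              "teacher", "teachers", "teachable"],
    "learn": ["learn", "learns", "learned", "learnt", "learning",
              "learner", "learners"],
}


def build_lookup(family_list):
    """Invert the family list so that each member form points to its headword."""
    return {form: head for head, forms in family_list.items() for form in forms}


def count_families(text, family_list):
    """Tally tokens by word family; return (family counts, unmatched forms)."""
    lookup = build_lookup(family_list)
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    counts, unmatched = Counter(), Counter()
    for token in tokens:
        if token in lookup:
            counts[lookup[token]] += 1
        else:
            unmatched[token] += 1  # left for manual checking
    return counts, unmatched


sample = "Good teachers teach what learners are learning; she taught us well."
counts, unmatched = count_families(sample, FAMILY_LIST)
print(dict(counts))
print(sorted(unmatched))
```

The quality of the count depends entirely on the list: a form missing from the list is simply not credited to its family.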

However, it is better to decide the unit of counting based on the research questions, and perhaps one’s conceptualization of lexical storage. There are some research situations where it makes the most sense to distinguish between individual word forms. Charting the early vocabulary acquisition of L1 children is a case in point. Each new word form spoken at this initial stage has significance, and the acquisition of past and plural forms is an important milestone in a child’s linguistic development. Thus a counting measure which captures each new individual word form is the appropriate methodology for this research setting.

However, when studying the acquisition of new lexemes by older native speakers, then lemmas may well be a better measure, because counting word forms might give an overestimate of lexical growth. We would expect older natives to have mastered the basic morphology of their language, and so be able to attach grammatical affixes relatively automatically to newly learned word forms. This might result in several new word forms being counted (using word form as counting unit), when in reality perhaps only one word form was newly learned, and the rest generated via the on-line attachment of affixes.

With nonnatives, the interpretation of the numbers becomes trickier. They are likely to vary widely in their ability to attach grammatical affixes, and so we cannot always assume that all members of a lemma are available to nonnative learners, particularly beginners, just because one member has been demonstrated. To be truly confident in our count, word forms seem a better option. Despite this caveat, lemmas are widely used in acquisition studies for several reasons. First, if we use word forms as our unit of counting, any measurement instrument is likely to have a variety of inflected forms on it (e.g. boy, cars, walk, carried, seeing, sleeps). This is potentially confusing for the participant, and so measurement instruments typically include only base forms (or sometimes the most frequent forms) of a lemma or word family (boy, car, carry, see, sleep). Second, with more proficient learners, we can be more confident that they will have some mastery over the morphology of a language, and so for them lemmas are an appropriate measure. Finally, lemmas have a transparent definition, and so the readers of research studies can form a clear idea of exactly what vocabulary is being discussed. This is not always true of word families, where the criteria for word form inclusion vary considerably.

However, word families have their own advantages as a unit of counting. They correspond most closely to the notion of headwords used in dictionaries, a concept most participants are familiar and comfortable with. This makes word families a reasonable measure when using dictionaries as a baseline for how many words exist in a language. Word families also reduce the redundancies that can occur in word counting: because all the semantically related members are included in a word family, they do not have to be handled again in another category, as they would have to be with lemmas, and especially word forms.

Nevertheless, word families inevitably also have drawbacks as a unit of counting. Some of these have already been alluded to: that they are difficult to interpret because the criteria for word form inclusion vary, and that concordancing software does not usually automatically tabulate word family figures. Moreover, we cannot assume that even relatively advanced L2 learners will have good control over the derivational affixes used to form the members of a word family (e.g. permit + -ion = permission; + -ive = permissive). Bauer and Nation (1993) suggest that if one member of a word family is known, then the other members can probably be recognized as well. There is evidence to show that this holds true, at least to some extent, for receptive knowledge. However, we cannot make the same assumption for productive knowledge. Schmitt and Zimmerman (2002) found that learners studying to enter English-medium universities typically did not know all of the members of a word family productively, even though they knew one or more. If they knew any of the forms of a word family, it was usually the noun and verb forms, but they had much more trouble producing the adjective and adverb forms.

There is also the issue of receptive versus productive use to consider. Paul Nation (personal communication) suggests that, in general, for receptive use, word families are the best unit to use, with the definition of what is included in the word family being related to the proficiency level of the participants involved. This makes sense because learners should be able to perceive the similarities between members of a word family in the receptive mode, at least to some extent. For productive use, Nation feels that the lemma, or even word form, is the best unit of counting to use. This is because productive use is more difficult, and having productive mastery over one member of a word family does not necessarily entail having it over other members. He also points out that John Sinclair would probably argue for individual word form, because the collocates are often different for the different word forms included in a lemma, and thus each word form would require a slightly different kind of knowledge for productive use.

While this argument is persuasive, it does create problems for vocabulary measurement. If a researcher uses word families as the counting unit in receptive research and lemmas or word forms in productive research, the different units of counting would mean that there could be no truly parallel receptive/productive tests. This would make it difficult to directly compare the receptive and productive knowledge of participants.





So what are we to make of all this? I am personally developing a feeling that lemmas might be the best general unit of counting for four reasons: (1) the unit is relatively straightforward, which means that consumers of research studies will know what it means; (2) this relative simplicity also makes replication and comparison of studies more feasible; (3) lemmas might be a reasonable compromise for counting both receptive and productive vocabulary, and thus making receptive and productive studies comparable; and (4) it takes a lot of vocabulary to function in a language, and estimates based on word families may give the impression that less is necessary than is the case, especially as many consumers may simply interpret word family figures as ‘words’. Using lemmas as the counting unit counteracts this because the lemmas are easier to conceptualize, and because the figures will be higher than word family figures in any case.

In sum, the unit of counting vocabulary must reflect the goals, participants, and resources of your study. If it is a corpus study, then choose the unit that best matches the facet of lexis you are exploring, the amount of tagging in your corpus, and the capabilities of your concordance software. However, if you are running a study using participants, you must take their background and abilities into account. You should consider how many affixes they are likely to know, and how well they can use them. For most aspects of lexis, word form counts make sense for less proficient participants, while lemma or word family counts can be more appropriate for more proficient participants. There is also the issue of standardization. Vocabulary research would be much more comparable, and thus useful, if all researchers used the same unit of counting. On balance, I feel that lemmas are probably the best unit overall, as it is relatively easy to lemmatize words and they are unambiguous to interpret.

Whichever unit you choose to use, it is critical that you report this unit prominently in your research account, and clearly explain how it was used in making the size counts. If not using individual word forms as your unit, it is also probably useful to include a discussion describing how your unit compares to word forms, as many consumers will think in terms of word forms as a default.

5.2.2 Sampling from dictionaries or other references

One of the most common ways of establishing the vocabulary size of both natives and nonnatives is to first establish the overall population of lexical items which could be known, select a sample of these items to fix on a test, and then assume that the percentage of items answered correctly on the test represents the percentage of items known in the total population. There is no one reference source which includes all of the lexical items which could possibly be known, including high-, mid-, and low-frequency words, formulaic sequences, technical words and phrases, brand names, names of people and places, etc. The closest we have to a comprehensive source is dictionaries, although these vary widely in their coverage. In English, the most comprehensive dictionary available is the Oxford English Dictionary. It is a massive multi-volume work, but still does not include many items, such as scientific terminology or the names of many geographical features.
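The logic of sample-based size estimation is a simple proportion: the share of sampled items answered correctly is scaled up to the whole population of items. The following is a minimal sketch with invented figures; the function name and the numbers are illustrative assumptions only.

```python
def estimate_vocabulary_size(items_correct, items_tested, population_size):
    """Scale the proportion of sampled items known up to the whole population."""
    if items_tested <= 0:
        raise ValueError("items_tested must be positive")
    if not 0 <= items_correct <= items_tested:
        raise ValueError("items_correct must lie between 0 and items_tested")
    return round(items_correct * population_size / items_tested)


# Illustrative figures only: 60 of 100 sampled items known,
# sampled from a population of 30,000 lexical items.
print(estimate_vocabulary_size(60, 100, 30_000))
```

The estimate is only as good as the sample: if the sampled items are easier than the population as a whole, the scaled-up figure will be inflated in the same proportion.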

Nation (1993) reviews early size measurements and concludes that none sampled from dictionaries in a manner that produced valid results. Most included far too many high-frequency words (which participants are more likely to know), which generally led to inflated estimates of vocabulary size. He outlines a number of essential, but rarely followed, procedures which are necessary for a representative sampling from dictionaries and other reference sources.

1. Choose a dictionary that is big enough to cover the known vocabulary of the people being investigated.

For educated adult native speakers, Nation suggests using a dictionary with at least 30,000 base words. Most modern dictionaries, both native and learner, meet this criterion. In fact, with second language learners, smaller dictionaries or word frequency lists may be more appropriate, so that learners have some chance of knowing a reasonable proportion of the target lexical items. If a full-sized dictionary were sampled, then beginning learners would know only a few of the sampled items, leaving most of the lower-frequency vocabulary items far beyond their level, and so effectively wasted. It is better to choose a more limited vocabulary population, so there are more samples of vocabulary within the examinees’ potential knowledge range.

2. Use a reliable way of discovering the total number of entries in the dictionary.

Dictionary makers are not particularly concerned about the number of entries in their dictionary, except in terms of keeping the length of the dictionary to a reasonable size. However, in advertising, a greater number of entries is more impressive, and Nation found that publishers’ statements greatly exaggerated the number of entries. Therefore it is not possible to rely on these publishers’ figures. The true total number of entries can be found by counting each entry either manually or with a computer, or by counting a sample of the dictionary.

3. Use explicit criteria for deciding and stating (a) what items will not be included in the count and (b) what will be regarded as members of a word family.

All dictionary publishers have criteria for what will be included in their dictionaries. However, criteria based around lexicographic purposes will probably not be suitable for research purposes. Therefore researchers need to set exclusion criteria to delete entries which are not appropriate for their studies into vocabulary size. Entries can be deleted for several reasons:

● because they may not be important for the participant population (e.g. scientific terminology, geographical place names)

● in order to lower the number of words in the total population, so that the resulting test can have a greater percentage of feasible items (see above)

● to avoid counting multiple headwords with related forms (legal, legalese, legalism, legality, legalize). Such duplication can lead to inflated estimates of vocabulary size, and so such headwords are better coalesced into word families.

4. Use a sampling procedure that is not biased towards items which occupy more space and have more entries.

Some lexical items have many meaning senses, and will take up much more space than other items. These longer entries tend to be high-frequency vocabulary, and so it is important not to over-sample these items, or inflated estimates of vocabulary size will occur, because participants are more likely to know the high-frequency vocabulary. Ways to compensate for this problem include using numbered entries (choosing every nth word), choosing every nth complete entry (which is not a homograph of a previous entry) on every mth page, random sampling, and stratified sampling based on letters of the alphabet.
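Two of the options just mentioned, taking every nth numbered entry and random sampling, can be sketched as follows. Both select by entry rather than by position on the page, so a long entry has no greater chance of selection than a short one. The numbered headword list is a hypothetical stand-in for a dictionary, and the function names are illustrative.

```python
import random


def systematic_sample(entries, step, start=0):
    """Take every `step`-th entry, beginning at index `start`."""
    if step <= 0:
        raise ValueError("step must be positive")
    return entries[start::step]


def simple_random_sample(entries, size, seed=None):
    """Draw `size` entries without replacement; a seed makes the draw repeatable."""
    return random.Random(seed).sample(entries, size)


# Hypothetical numbered headword list standing in for a 30,000-entry dictionary.
headwords = [f"entry{i:05d}" for i in range(1, 30_001)]

every_300th = systematic_sample(headwords, step=300, start=299)
random_100 = simple_random_sample(headwords, size=100, seed=1)
print(len(every_300th), every_300th[:3])
print(len(random_100))
```

Recording the step, the starting point, or the random seed also helps with the reporting requirement in procedure 8, since another researcher can then reproduce the same sample.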

5. Choose a sample that is large enough to allow an estimate of vocabulary size that can be given with a reasonable degree of confidence.

More items on a vocabulary test lead to more confident estimates of vocabulary size. In general, it is advisable to include the greatest number of items possible. This will depend primarily on the amount of time available for the test and the test item format (e.g. checklist items are very quick to answer; gap-fill items less so).
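The link between test length and confidence can be made concrete with a margin of error around the size estimate. The sketch below uses a normal-approximation interval for a proportion and treats the test items as a simple random sample; both are simplifying assumptions introduced here, not a procedure taken from Nation (1993), and the figures are invented.

```python
import math


def size_estimate_with_margin(items_correct, items_tested, population_size, z=1.96):
    """Return (estimate, margin) from a normal-approximation interval for a proportion.

    z=1.96 corresponds to a 95% interval under the normal approximation.
    """
    p = items_correct / items_tested
    standard_error = math.sqrt(p * (1 - p) / items_tested)
    return p * population_size, z * standard_error * population_size


# The same 60% success rate on tests of increasing length,
# sampled from a 30,000-item population (illustrative figures).
for n in (50, 100, 400):
    estimate, margin = size_estimate_with_margin(n * 6 // 10, n, 30_000)
    print(f"{n:>3} items: {estimate:,.0f} +/- {margin:,.0f} words")
```

Under these assumptions the margin shrinks with the square root of the number of items, so quadrupling the test length halves the margin.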

6. The sampling should be checked for the reliability of the application of the criteria for exclusion and inclusion of items.

There are several ways of checking whether the inclusion criteria for lexical selection are being consistently applied. For example, the sample can be done in sections and the figures for each section compared, or more than one rater used and inter-rater reliability established.





7. The sample should be checked against a frequency list to make sure that there is no bias in the sampling towards high-frequency items.

Once the sample is taken, it should be checked against frequency counts to see if it contains the appropriate number of lexical items at each frequency level. For example, if each item in the sample represents 100 words in the dictionary, then there should be ten words in the sample within the first 1,000 frequency band (10 x 100). Similarly, there should be ten words in the second 1,000 band, etc.
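This check amounts to counting the sampled items in each frequency band and comparing the counts with the number the sampling ratio predicts. The sketch below assumes that each sampled item has already been given a frequency rank from some frequency list; the ranks and the function names are hypothetical.

```python
from collections import Counter


def band_counts(sample_ranks, band_width=1000):
    """Count sampled items per frequency band (band 1 = ranks 1 to 1,000, etc.)."""
    return Counter((rank - 1) // band_width + 1 for rank in sample_ranks)


def overrepresented_bands(sample_ranks, words_per_item, band_width=1000):
    """List the bands holding more sampled items than the sampling ratio predicts."""
    expected = band_width / words_per_item
    counts = band_counts(sample_ranks, band_width)
    return sorted(band for band, n in counts.items() if n > expected)


# Hypothetical frequency ranks for a 30-item sample in which each item
# should stand for 100 words, i.e. 10 items per 1,000 band.
ranks = (list(range(50, 1000, 70))
         + list(range(1100, 2000, 100))
         + list(range(2150, 3000, 125)))
print(dict(band_counts(ranks)))
print(overrepresented_bands(ranks, words_per_item=100))
```

In this invented sample the first 1,000 band holds 14 items instead of the expected ten, which is the kind of high-frequency bias the procedure is designed to catch.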

8. In the written report of the study, describe clearly and explicitly how each of the previous seven procedures was followed in sufficient detail to allow replication of any or all of the procedures.

This reflects the same need for clear and precise reporting discussed in Section 4.7.

5.2.3 Recognition/receptive vocabulary size measures

The next two sections will overview a number of vocabulary tests which measure recognition/receptive and recall/productive mastery of lexical items. The tests discussed have all been used in research to various degrees, and some, but certainly not all, have a degree of validity evidence in place.

Peabody Picture Vocabulary Test (PPVT)

Most vocabulary tests have been used for measuring L2 lexical knowledge, simply because most native speakers have a considerable-sized vocabulary, with Goulden, Nation, and Read (1990) estimating sizes of about 17,000 word families for their New Zealand university students. With relatively large vocabularies forming comparatively early in life, a more interesting facet of lexical knowledge for most native speakers is their relative lexical automaticity. However, for young children where vocabulary is in an early stage of development (two to six years old), and in the elderly (90+ years), where language is attriting, measures of vocabulary size can be interesting. One test used in these L1 cases is the Peabody Picture Vocabulary Test. It is a meaning-recognition test in which examinees listen to words spoken by the test administrator, and then point to the picture which best represents this meaning from a group of four simple, black-and-white illustrations. The test takes about 11–12 minutes on average. It is available in two parallel forms, each containing four training items followed by 204 test items divided into 17 sets of 12 items each. The sets are progressively more difficult. The PPVT test is commercially available at <http://ags.pearsonassessments.com/Group.asp?nGroupInfoID=a12010#dvd>.





Vocabulary Levels Test (VLT)

Perhaps the most widely used vocabulary size test in the ESL context is the Vocabulary Levels Test. In 1996, Meara called it the ‘nearest thing we have to a standard test in vocabulary’ (1996b: 38), and it may still hold that distinction today after going through several iterations (Nation, 1983, 1990; Beglar and Hunt, 1999; Schmitt et al., 2001) (although see the Vocabulary Size Test below). It is called the Levels test because it focuses on vocabulary at four frequency levels: 2,000, 3,000, 5,000, and 10,000. These bands coincided with the then-current consensus of how much vocabulary was necessary for achieving key goals. Based on Schonell, Meddleton, and Shaw (1956), it was thought that around 2,000 word families were sufficient to engage in daily conversation; 3,000 families were thought to enable initial access to authentic reading, and 5,000 families independent reading of that material. In addition, 5,000 families represented the upper limit of general high-frequency vocabulary; 10,000 families was a round figure for a wide vocabulary which would enable advanced usage in most cases. (Note that current estimates of the vocabulary size requirements of English are much higher; see Section 1.1.2.) In addition, there is a section focusing on academic vocabulary, which is not frequency-based.

The VLT test uses a form-recognition matching format, in which the stem is the definition, and the options are the target words. Each cluster of items contains three stems and six options. In the latest Schmitt et al. versions, each level has ten clusters (i.e. 30 items). Below is a sample cluster:

You must choose the right word to go with each meaning. Write the number of that word next to its meaning.

1. concrete
2. era          _____ circular shape
3. fiber        _____ top of a mountain
4. hip          _____ a long period of time
5. loop
6. summit

A number of points can be made about this format.

● Based on the definitions in Section 2.8, the VLT is a form recognition test.

● For consistency, each cluster contains words from only one word class. Roughly reflecting the distribution of word classes in English, there are five noun clusters, three verb clusters, and two adjective clusters per frequency level.

● Both the three target words and the three distractors are from the particular frequency band, which means that the examinees are considering six words from the band in each cluster.

● The definitions are kept short, so that there is a minimum of reading, allowing for more items to be taken within a given period of time.

● The test is designed to tap into the initial stages of form-meaning linkage. Therefore, the option words in each cluster are chosen so that they have very different meanings. Thus, even if learners have only a minimal impression of a target word’s meaning, they should be able to make the correct match.

● The clusters are designed to minimize aids to guessing. The target words are in alphabetical order, and the definitions are in order of length. In addition, the target words to be defined were selected randomly from the six options in each cluster.

● The words used in the definitions are always more frequent than the target words. The 2,000 level words are defined with 1,000 level words, and wherever possible, the target words at other levels are defined with words from the General Service List (GSL) (essentially the 2,000 level) (see Nation, 1990: 264, for more details). This is obviously important as it is necessary to ensure that the ability to demonstrate knowledge of the target words is not compromised by a lack of knowledge of the defining words.

● The word counts from which the target words were sampled typically give base forms. However, derived forms are sometimes the most frequent members of a word family. Therefore, the frequency of the members of each target word family was checked, and the most frequent one attached to the test. In the case of derivatives, affixes up to and including Level 5 of Bauer and Nation’s (1993) hierarchy were allowed.

● As much as possible, target words in each cluster begin with different letters and do not have similar orthographic forms. Likewise, similarities between the target words and words in their respective definitions were avoided whenever possible.

The test is not really designed to provide an estimate of a person’s overall vocabulary size, although some studies have combined the frequency levels to produce a total size figure. The test is better used to supply a profile of learners’ vocabulary, which is particularly useful for placement and diagnostic purposes. The profiles illustrated in Figure 5.2 in Section 5.1.3 are products of the VLT.
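A profile of this kind simply reports the score at each level separately instead of adding the levels together. The sketch below assumes the 30 items per level of the Schmitt et al. versions and, as a further assumption, the same number of items in the academic section; the learner scores and the function name are invented for illustration.

```python
ITEMS_PER_LEVEL = 30  # ten clusters of three items in the Schmitt et al. versions


def vlt_profile(correct_by_level):
    """Convert raw scores per level into proportions, keeping the levels separate."""
    profile = {}
    for level, correct in correct_by_level.items():
        if not 0 <= correct <= ITEMS_PER_LEVEL:
            raise ValueError(
                f"score for {level} must be between 0 and {ITEMS_PER_LEVEL}"
            )
        profile[level] = correct / ITEMS_PER_LEVEL
    return profile


# Hypothetical learner scores (academic section assumed to have 30 items too).
learner_scores = {"2,000": 27, "3,000": 21, "5,000": 12, "10,000": 3, "academic": 18}
for level, share in vlt_profile(learner_scores).items():
    print(f"{level:>8}: {share:.0%}")
```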

Validity evidence for the VLT is available in Read (1988), Beglar and Hunt (1999), and Schmitt et al. (2001). The two (relatively) equivalent Schmitt et al. versions are available in Section 6.1.

Vocabulary Size Test (VST)

The Vocabulary Size Test made its first appearances in Appendix 2 of Focus on Vocabulary (Nation and Gu, 2007) and Appendix 4 of Teaching Vocabulary (Nation, 2008). It is now also available in an interactive web format on Tom Cobb’s Lextutor website <http://www.lextutor.ca/tests/>. It employs a traditional four-option multiple choice meaning-recognition format, with the target word and a non-defining example sentence as the stem. The VST is broken into 1,000-word frequency bands, and ranges from the first 1,000 band to the thirteenth or fourteenth 1,000, depending on the version. Each 1,000 word frequency band contains ten items, so each item represents 100 words within that frequency band. The item format is illustrated below:

sheriff: The sheriff was friendly.
(a) person who flies aeroplanes
(b) person who takes care of babies
(c) person who makes sure the law is obeyed
(d) person who teaches children at home
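Because each item stands for 100 words of its frequency band, a raw score can be converted into a size estimate with simple arithmetic. The following is a minimal sketch of that conversion, not an official scoring routine; the function name `vst_size_estimate` and the sample scores are hypothetical.

```python
def vst_size_estimate(correct_per_band, words_per_item=100):
    """Estimate vocabulary size from VST scores.

    correct_per_band: number of correct answers in each 1,000-word band
    (ten items per band, so each value should lie between 0 and 10).
    words_per_item: how many words each item represents (100 in the text).
    """
    for score in correct_per_band:
        if not 0 <= score <= 10:
            raise ValueError("each band score must be between 0 and 10")
    return sum(correct_per_band) * words_per_item


# Hypothetical examinee: scores on the first four 1,000-word bands.
example_scores = [10, 9, 7, 4]
example_estimate = vst_size_estimate(example_scores)  # 30 correct items * 100
```

With these invented scores the examinee answered 30 items correctly, which the sketch converts to an estimate of 3,000 words.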

The words on the test were randomly selected from the Collins English Dictionary and sequenced into the 14 frequency bands based on range and frequency figures from the spoken section of the BNC. Beglar (2010) carried out a Rasch validation study of the VST with 178 Japanese EFL learners and 19 native speakers and found that the examinees’ scores generally decreased towards the lower-frequency bands (i.e. highest scores on the first 1,000 band; lowest scores on the fourteenth 1,000 band). The Rasch model was able to account for 86% of the total variation in the test scores, and the test items generally had good technical characteristics. The reliability figures were very high (.96–.98). Although this is only an initial validation analysis, the results are promising, and give no indication why the test should not be used.

As opposed to the VLT, which produces a profile of knowledge at various frequency levels, the VST is intended as a test of overall vocabulary size. It should have value in measuring learners’ progress in vocabulary learning. The most frequent 14,000 words of English along with proper nouns account for over 99% of the running words in written and spoken text (Nation, 2006). Although adult native speakers’ vocabularies are much larger than 14,000 words, these 14,000 words include all the most important words. Initial studies using the test indicate that undergraduate nonnative speakers successfully coping with study at an English-speaking university have a vocabulary around 5,000–6,000 word families. Nonnative-speaking PhD students have around a 9,000 word family vocabulary.

Checklist tests from Meara and colleagues

A number of checklist tests have been developed by Paul Meara and his colleagues. Checklist tests utilize the simplest format of any vocabulary test, where examinees read lists of lexical items in isolation and simply indicate whether they think they know the items or not. For this reason, they are also called Yes/No tests, and probably should be considered meaning-recall items, even though the meaning does not have to be demonstrated. The basic format will look something like the following extract from the Yes/No test, an interactive checklist test created by Paul Meara (1992), placed into web format by Tom Cobb, and available on the Lextutor website <http://www.lextutor.ca>.

Test 1, Level 1:

1 obey 2 thirsty 3 nonagrate 4 expect 5 large
6 accident 7 common 8 shine 9 sadly 10 balfour

A number of points can be made about the checklist test format.

● The test is easy to take. The examinees simply decide whether they think they know an item or not, and either ‘check’ (✓) known words on a paper-and-pencil version, or click on the box in a computerized version. This simplicity means that the items are quick to take, and so many more items can be included on a test compared to more time-intensive formats. This makes it possible to have relatively higher sample rates with checklist tests.

● The test rubrics usually ask examinees to judge whether they ‘know’ the lexical items. This means that examinee variability can have an effect on participants’ scores. If examinees are conservative, and only check items they are completely sure of, their scores will be relatively lower than examinees who are less rigorous and check items they have some sense they might know. In fact the true underlying knowledge may be the same, but the test results can be different based on the examinees’ relative judgement behavior. In addition to this ‘degree of confidence’ issue, examinees can differ in how they understand the notion of ‘know an item’. Most will probably take this to mean that they know the form-meaning link, but some may think in terms of being able to recognize the item when listening or reading, or being able to produce the item in their speech and/or writing. To avoid these problems, the test rubrics should spell out more precisely what the criteria of ‘knowing’ are. For example, the rubrics could specify that examinees should check any items for which they know at least one meaning. Alternatively, a can-do approach can be taken: e.g. check any item which you are confident that you can use in your own writing without using a dictionary.

● A checklist test has no direct demonstration of knowledge, and there is always the chance that examinees will overestimate their vocabulary knowledge, i.e. check items that they do not in fact know. This connects with the above point that examinees may be relatively more or less careful in checking words as known. This is usually controlled for by adding plausible nonwords to the test to see if examinees check them. Nonwords usually make up around 25–33% of the total items. There is no set number or percentage of checked nonwords which invalidates a test, but if more than a few are selected, it raises serious doubts about the real items which are checked. For example, on the Yes/No test illustrated above, there are 40 real words and 20 nonwords. If only four nonwords are checked out of the 20 (20%), this suggests that 20% of the real words might also be at risk, i.e. eight. Since the 40 real words on the test represent 1,000 words in the total population (the first 1,000 word band), each of the target words represents 25 words. Thus selecting four nonwords indicates that the estimate of total vocabulary size might be overestimated by 200 words. Thus even a few nonwords checked is a potential problem.
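The overestimation arithmetic in the example (40 real words, 20 nonwords, four nonwords checked) can be traced step by step. This is only an illustrative sketch of that reasoning; the function name `nonword_overestimate` and its parameter names are hypothetical.

```python
def nonword_overestimate(real_items, nonword_items, nonwords_checked,
                         words_represented):
    """Rough number of words by which a checklist size estimate may be inflated.

    real_items: number of real words on the test (40 in the example)
    nonword_items: number of nonwords on the test (20 in the example)
    nonwords_checked: nonwords the examinee claimed to know (4)
    words_represented: size of the word population sampled (1,000)
    """
    false_alarm_rate = nonwords_checked / nonword_items   # 4 / 20 = 0.20
    items_at_risk = false_alarm_rate * real_items         # 0.20 * 40 = 8
    words_per_item = words_represented / real_items       # 1,000 / 40 = 25
    return items_at_risk * words_per_item                 # 8 * 25 = 200


example_overestimate = nonword_overestimate(40, 20, 4, 1000)
```

The sketch reproduces the figure of 200 words given in the example, and shows how quickly the possible inflation grows as more nonwords are checked.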

There are two approaches to dealing with this problem. The first is to set a maximum of nonwords, over which the data is discarded as unreliable. Schmitt, Jiang, and Grabe (in press) took this approach and deleted all participants who checked more than three nonwords out of a total of 30 (i.e. 3/30 (10%) maximum nonwords allowed to be checked). The other approach is to use some formula to adjust the test score downwards according to the number of nonwords checked. The problem is deciding which adjustment formula to use, as it is still unclear how well the various adjustment formulas work (Beeckmans, Eyckmans, Jansens, Dufranne, and Vande Velde, 2001; Huibregtse, Admiraal, and Meara, 2002; Mochida and Harrington, 2006). An example of a simple formula is Anderson and Freebody’s (1983):

\[ \text{True } h = \frac{h - f}{1 - f} \]

This formula follows from Signal Detection Theory, which compares hits (appropriate responses correctly selected) and false alarms (inappropriate responses incorrectly selected). In checklist testing, this translates to:

h = hit rate (real words selected as known)
f = false alarm rate (nonwords selected as known)
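As a worked illustration, the simple correction can be computed directly from the two rates. This is a minimal sketch assuming the correction has the form (h − f) / (1 − f); the function name `corrected_hit_rate` and the sample rates are hypothetical.

```python
def corrected_hit_rate(h, f):
    """Adjust a hit rate h downwards for a false alarm rate f.

    Both arguments are proportions between 0 and 1. The correction is
    undefined when every nonword is checked (f == 1).
    """
    if not (0 <= h <= 1 and 0 <= f < 1):
        raise ValueError("h must be in [0, 1] and f in [0, 1)")
    return (h - f) / (1 - f)


# Hypothetical examinee: checks 80% of real words and 20% of nonwords.
example_true_h = corrected_hit_rate(0.8, 0.2)  # (0.8 - 0.2) / (1 - 0.2)
```

For this invented examinee the raw hit rate of .80 is adjusted down to .75. An examinee who checks no nonwords keeps the raw hit rate unchanged.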

A more complex formula (Index of Signal Detection) which attempts to correct for sophisticated guessing and response style was developed by Huibregtse et al. (2002):

\[ I_{SDT} = 1 - \frac{4h(1 - f) - 2(h - f)(1 + h - f)}{4h(1 - f) - (h - f)(1 + h - f)} \]
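The index can be computed in the same way as the simpler correction. The sketch below assumes the formula reads I_SDT = 1 − [4h(1 − f) − 2(h − f)(1 + h − f)] / [4h(1 − f) − (h − f)(1 + h − f)], with h and f as defined above; the function name `i_sdt` and the handling of a zero denominator are choices made for this illustration only.

```python
def i_sdt(h, f):
    """Index of Signal Detection for hit rate h and false alarm rate f.

    Assumed form (see lead-in):
      1 - [4h(1-f) - 2(h-f)(1+h-f)] / [4h(1-f) - (h-f)(1+h-f)]
    """
    numerator = 4 * h * (1 - f) - 2 * (h - f) * (1 + h - f)
    denominator = 4 * h * (1 - f) - (h - f) * (1 + h - f)
    if denominator == 0:
        # Illustrative choice: the index is left undefined in this case.
        raise ValueError("index undefined for these rates")
    return 1 - numerator / denominator


perfect = i_sdt(1.0, 0.0)      # all real words checked, no nonwords
chance = i_sdt(0.5, 0.5)       # real words and nonwords checked equally often
typical = i_sdt(0.8, 0.2)      # hypothetical examinee
```

Under this reading, perfect discrimination yields 1 and equal hit and false alarm rates yield 0, which is the behavior one would expect from a discrimination index.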

There is even some question whether such adjustments are necessary for all types of participants. Shillaw (1996) found that his Japanese university EFL subjects were careful enough that a Rasch analysis of checklist results using only the real words on the test was not substantially different from results including both the real words and the nonwords. Thus Shillaw concludes that ‘on these [checklist] tests and for these students, the presence of non-words had little effect on their test performance’ (p. 7).

Overall, it is still unclear which nonword adjustment method is best to calculate final checklist test scores, as the various adjustment formulas seem to lead to very similar results (Huibregtse et al., 2002; Mochida and Harrington, 2006). However, unless participants are predisposed to answer very carefully, it will be necessary to use some adjustment formula, or to delete tests with too many nonwords chosen, in order to control for overestimation.

In addition to the Yes/No test on Lextutor, the following two checklist tests are available on Paul Meara’s lognostics website <http://www.lognostics.co.uk>.

X_Lex

This computerized checklist test measures words up to the 5,000 level, and provides a profile of vocabulary known at the first five 1,000 frequency bands, as well as an overall vocabulary size estimate. At the time of writing, it was at Version 2.05.

Y_Lex

Y_Lex is the advanced companion test to the X_Lex, and is aimed at more advanced speakers. It measures words in the 6,000–10,000 frequency range. Like X_Lex, it provides an overall vocabulary size estimate and a profile of vocabulary knowledge, but in this case with the 6,000, 7,000, 8,000, 9,000, and 10,000 frequency bands. It was also at Version 2.05 when this book was written.

Computer Adaptive Test of Size and Strength (CATSS)

Laufer and Goldstein (2004) developed a computerized test of vocabulary knowledge called CATSS. It uses four different item formats to give estimates of both vocabulary size and depth of knowledge, in the sense that an indication is given of how ‘strong’ the form-meaning link is. Several aspects of the test have already been discussed in Section 2.8, but an additional point is worth making. The test has the advantage of being adaptive in two ways. First, if an examinee does well on the early high-frequency words, the computer program quickly advances the test to the next frequency band. This repeats until the examinee starts missing a number of the words in a band. The computer can then concentrate on words at around that frequency level to get a more accurate picture of the examinee’s vocabulary size. This avoids wasting time on words which are far too easy or too difficult for the examinee, and allows a far better sampling at the level where there is uncertainty whether the words are known or not. This adaptiveness has a great advantage over static tests, where either the test administrators must guess the frequency levels to give to the examinees, or the examinees must work their way through the whole test in a lockstep fashion. CATSS is also adaptive in terms of the four levels of form-meaning link ‘strength’. If an examinee knows a higher level (e.g. form recall), then the easier ones are not tested (e.g. form recognition).
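The first kind of adaptiveness, advancing through frequency bands until the examinee starts missing words, can be sketched in a few lines. This is a hypothetical illustration of the general idea only, not the CATSS algorithm itself: the function name `find_uncertain_band`, the probe size, the miss threshold, and the `answers_correctly` callback are all assumptions.

```python
def find_uncertain_band(bands, answers_correctly, probe_items=3, max_misses=1):
    """Return the first frequency band where the examinee starts missing words.

    bands: frequency bands ordered from most to least frequent
    answers_correctly(band, k): hypothetical callback that presents the
        k-th probe item from a band and returns True if it is answered
        correctly
    probe_items, max_misses: illustrative thresholds, not taken from CATSS
    Returns None if the examinee passes every band.
    """
    for band in bands:
        misses = sum(
            1 for k in range(probe_items) if not answers_correctly(band, k)
        )
        if misses > max_misses:
            return band  # concentrate further testing around this band
    return None


# Simulated examinee who knows every word up to the 3,000 level.
bands = [1000, 2000, 3000, 4000, 5000]
uncertain = find_uncertain_band(bands, lambda band, k: band <= 3000)
```

For the simulated examinee the walk stops at the 4,000 band, which is where a fuller sample of items would then be drawn.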

5.2.4 Recall/productive vocabulary size measures

The Productive Vocabulary Levels Test (PVLT)

Laufer and Nation (1995, 1999) used the words and frequency bandings from the form-recognition version of the VLT to create a form-recall version. The item format is a defining sentence context with a blank for the target word which examinees fill in. In order to disambiguate between the possible synonyms which could be inserted into the blank, enough initial letters are given at the beginning of the blank to hopefully limit the possible answers down to the target word.

1. Every working person must pay income t———.
2. The differences were so sl——— that they went unnoticed.
3. There are a doz——— eggs in the basket.
4. The telegram was deli——— two hours after it had been sent.
5. The pirates buried the trea——— on a desert island.
6. The afflu——— of the western world contrasts with the poverty in other parts.
7. Farmers are introducing innova——— that increases the productivity per worker.

A number of points can be made about this format. One of the most noticeable is that some of the target words have only one letter to disambiguate them, while others have up to six. What effect this variation has on the relative difficulty of the various target words is unclear. It may make little difference, but given the potential difficulty of learning form (Sections 1.1.8 and 2.1), one would expect that it might have a considerable effect. Another issue is the ‘transparency’ of the answers. In Example 1, the very strong collocation income tax serves to make the answer rather obvious in this sentence context (if it can be assumed that the examinee has intuitions about this type of more frequent collocation; see Durrant and Schmitt (2009) in Section 3.9). The same would appear to be true for Examples 3 (dozen eggs), 4 (telegram delivered), 5 (pirates buried the treasure). Examples 6 and 7 have associations which may help in answering the items (affluence – poverty; innovation – productivity). However, Example 2 lacks such a strong collocation, or for that matter, any obvious schema from which to fill in the gap. For this reason, it might be expected that this item would be more difficult than the others. In addition, it only has two given letters, which may make it more difficult to identify the target word. Thus, the items on this test vary in both their formal characteristics (number of letters), and the defining power of the context sentences.

These issues may or may not be problematic, and the only way to know is to carry out a careful validation study to ensure that the individual items and the test overall are working as desired. Laufer and Nation (1999) carried out a small validation study and found some initial positive evidence. The different levels produced scores in an expected ‘stair-step’ profile, with higher-frequency levels being known better than lower-frequency levels. (Note that the University Word List (UWL)2 is not frequency-based, and so cannot be considered as part of this evidence, but is still shown for interest.) Furthermore, examinees at higher-grade levels (and presumably higher L2 English proficiencies) scored better than lower-grade examinees (Table 5.2). Thus, the levels as a whole appear to behave as one would expect.

However, there was no inquiry into the behavior of the individual items within the frequency levels. Thus, we have little idea of the effects of the differing numbers of prompt letters and differing contextual defining power. There is also the question of what the test is measuring. I describe it above as a form-recall test, but this is not totally accurate, as some of the form is already given in the items, and in some cases, a great deal. Laufer and Nation (1999) describe the measure as a test of active vocabulary, but it is not entirely clear how this is to be interpreted (words which are available for productive use in writing, or only words which can be produced when prompted). In an earlier study, Laufer and Nation (1995) found some moderate correlations between the PVLT and the Lexical Frequency Profile (LFP – a frequency-based measure of learner writing; see below). This suggests that there is some relationship between the scores on the PVLT and participants’ ability to produce vocabulary in their writing.

Unfortunately, this suggestion is not straightforward. The LFP only measures at the first 1,000, second 1,000, UWL, and Not in Lists (all other words not on the first three lists). This does not map well with the frequency band breakdowns in the PVLT, and so the correlations between the two measures are not direct comparisons. In another study, Laufer (1998) found no correlations between the 2,000+ level of the LFP and the PVLT, which also potentially raises doubts about the ability of PVLT to measure the productive ability to use vocabulary. This leads Read (2000) to wonder whether the PVLT might be better considered an alternative way to measure receptive vocabulary knowledge rather than as a measure of productive vocabulary.

Table 5.2 Scores on the Productive Vocabulary Levels Test (max = 18)

              2,000   3,000   UWL    5,000   10,000
10th grade    11.8     6.3     2.6    1.0     0.0
11th grade    15.0     9.3     5.3    3.9     0.0
12th grade    16.2    10.8     7.4    4.7     0.9
University    17.0    14.9    12.6    7.4     3.8

(Adapted from Laufer and Nation, 1999: 39)

All of this is not to say that the PVLT is problematic; it is simply pointing out that there is not enough evidence to know its true value. As with all vocabulary tests, the confidence we can place on interpretations drawn from this test is directly reliant on the rigor and comprehensiveness of the validation argument. I feel that a direct validation procedure, as suggested in Section 5.1.3, is an essential component of building such an argument. To further explore the PVLT, I would have examinees take the test, and then have them try to use the target words in a writing context as part of an interview process. Their ability or inability to do this would be very insightful into whether the PVLT is actually a productive measure as claimed. I would also run an item analysis, and, controlling for frequency, check whether the context/number of letters given has an impact on the difficulty of the individual test items. This fuller validation analysis would surely go some way towards understanding how the PVLT should be interpreted, and the degree to which those interpretations can be relied upon.

The PVLT has been promoted as an active/productive test, but according to my proposed terminology (Section 2.8), it is closest to a form-recall test which assesses the form-meaning link. Further, it is not productive in the sense that it does not require examinees to produce the lexical items in the course of their spoken or written output. Similarly, in Read’s (2000) terms, it is a selective test, in that target items were preselected by the test creators. This preselection is necessary if the test developer wishes to create tests with particular lexical items. However, it would be extremely useful, and more ecologically valid, if a means were available to measure all of the lexical items in an examinee’s output, i.e. a comprehensive measure in Read’s terms (a measure which takes account of the whole vocabulary content of the examinees’ response (writing/speaking tasks)). There have been several approaches to creating comprehensive measures, with some of the major efforts outlined below.

Frequency-based comprehensive methods

One of the major methods is to classify the lexical output according to frequency. Several measures have taken somewhat different approaches using frequency.

Lexical Frequency Profile (LFP)

One of the best known frequency-based measures is the Lexical Frequency Profile. It utilizes the VocabProfile software developed by Paul Nation, Alex Heatley, and Averil Coxhead from the Victoria University of Wellington, and is available on Nation’s website. VocabProfile breaks the vocabulary of inputted language into four categories: first 1,000 frequency band (1K), second 1,000 frequency band (2K), words in the AWL (the UWL was used in early versions), and all remaining vocabulary not in any of these three categories (Off-list/2K+). The idea is that less proficient learners will produce texts mostly made up of the highest-frequency vocabulary (first 1,000), and very little of the lower-frequency bands and the AWL. Conversely, more advanced learners would have a larger vocabulary and so produce more of this lower-frequency vocabulary. Laufer and Nation (1995) trialled the measure with 22 mixed L1 low intermediate ESL learners in New Zealand, 20 Israeli first-year first-semester university students, and 23 Israeli first-year second-semester university students, which the authors believe to be three clearly-distinct proficiency levels. The means of these three groups at each LFP frequency level are shown below:

                          1st 1,000   2nd 1,000   AWL   Not in Lists
Low intermediate          87.0a       7.1         3.7   3.1
University 1 semester     79.6        6.8         8.0   6.1
University 2 semesters    75.5        6.1         9.1   8.1

a The figures are averages of scores from two compositions, and so the total percentages do not add up to 100% due to rounding.

These results illustrate that all of the learners produce a typical stair-step profile, with the vast majority of words produced coming from the first 1,000 frequency band. Moreover, the more advanced learners do use increasingly higher percentages of lower-frequency vocabulary. This is evidence that the LFP is tapping into the ability to produce compositions with a richer vocabulary. However, this conclusion must be tempered somewhat by several observations. First, the AWL is not solely frequency-based, and so one cannot view the profile in a sequential order, with Not in List words being considered less frequent than AWL words. In fact, academic vocabulary varies widely in frequency, with some words like major being very frequent (426th most frequent) and others being much rarer (reluctance 5,455th) (figures according to Adam Kilgarriff’s BNC frequency counts, see Section 6.4). It is probably best to interpret the LFP profile as first 1,000, second 1,000, and then everything else, broken into two concurrent categories: academic support vocabulary (AWL) and general English vocabulary (Not on Lists).3

With this in mind, the LFP only makes frequency distinctions at three levels: first 1,000, second 1,000, and others (AWL or Not on Lists). In practice this may be too crude to be all that informative. Beginning learners of English will know little vocabulary beyond the 2,000 level, and so the LFP will only be able to give a rather unsophisticated distinction of the percentage of vocabulary produced at the first 1,000 versus the second 1,000 levels. More advanced learners will produce vocabulary at the 2,000+ level, but the LFP lumps all of this together, and so does not make any distinctions at this level, with the exception of the academic/non-academic dichotomy. Another problem is that the vocabulary produced by even very advanced learners (or native speakers for that matter) will still be largely made up of the first 1,000 words. For example, 71.4% of Coxhead’s Academic Corpus consisted of the first 1,000 headwords of the GSL, with the second 1,000 making up 4.7%. Thus the percentages of 2,000+ words are always going to be relatively small in comparison. It can be considered problematic to base the major part of the analysis on a relatively small percentage of the lexical output.
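The four-category breakdown behind an LFP-style profile can be illustrated with a short sketch. It is not the VocabProfile program: it matches raw word forms rather than word families, and the function name `lexical_frequency_profile` and the tiny word lists are hypothetical stand-ins for the real 1K, 2K, and AWL lists.

```python
from collections import Counter


def lexical_frequency_profile(tokens, first_1000, second_1000, academic):
    """Percentage of tokens falling into each of four LFP-style categories."""
    counts = Counter()
    for token in tokens:
        word = token.lower()
        if word in first_1000:
            counts["1K"] += 1
        elif word in second_1000:
            counts["2K"] += 1
        elif word in academic:
            counts["AWL"] += 1
        else:
            counts["Off-list"] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tokens to profile")
    return {band: 100 * counts[band] / total
            for band in ("1K", "2K", "AWL", "Off-list")}


# Toy word lists and text, invented for this illustration only.
toy_1k = {"the", "big", "of"}
toy_2k = set()
toy_awl = {"analysis"}
profile = lexical_frequency_profile(
    "The big analysis of the zygote".split(), toy_1k, toy_2k, toy_awl
)
```

Even in this toy text the first category accounts for four of the six tokens, which mirrors the observation that most of any text falls into the first 1,000 band.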

Another problem is that the LFP only indicates whether lower-frequency vocabulary appears in compositions; it can give no information about how well it is used. The words could be used inappropriately, or even totally incorrectly, and the measure would still indicate their usage. Thus the LFP procedure itself gives little indication of the degree of mastery of the productive lexical items. To avoid this problem, the texts need to be manually edited beforehand and words used incorrectly deleted because they cannot be considered as known. While this works, the necessity for previous manual analysis (i.e. correction) by a proficient language user before the data can be analysed by the software must be considered a weakness. Also, I think it is preferable to have measures that address all learner output, whether appropriately used or not, in order to come up with an estimate of the quality of vocabulary knowledge/usage.

But in my own experience, the main problem is that I have found the LFP too blunt a measure to consistently indicate the lexical differences between compositions with more- and less-advanced use of vocabulary, as judged by rater judgement or intuition.

It can also have problems showing vocabulary improvement. Horst and Collins (2006) looked at narrative texts produced by 210 beginner French learners of English over four 100-hour intervals of intensive language instruction. The learners made substantial progress in language proficiency during this time, but an LFP analysis of the longitudinal compositions did not reflect this improvement. However, a more detailed analysis did show lexical improvement in terms of using fewer French cognates, a greater variety of frequent words, and more morphologically developed forms. In other words, there was clearly improvement in lexical production, but just not of the frequency-based type which the LFP could discern.

So although I feel the frequency-based methodology behind the LFP is worth pursuing, I have to wonder just how informative this early operationalization can be. For more discussion on the merits of the LFP, see Meara’s (2005) critique, Laufer’s rebuttal (2005b), and Section 2.8.

BNC 20,000 Profile

It may be possible that a more fine-grained frequency analysis might have better measurement characteristics. The BNC-20 version of VocabProfile provides such an analysis. It is based on the work of Paul Nation and Tom Cobb, and was launched in 2007 on the Lextutor website. The program gives a frequency breakdown of vocabulary inputted into the website, but has several advantages over the older version of VocabProfile upon which the LFP was based. First, BNC-20 gives frequency bandings at each 1,000 level up to and including the 20,000 level. This is a much fuller frequency description of the vocabulary, although the key additions are the 3,000–10,000 bandings, as the amount of vocabulary falling into the 10,000+ bands is relatively minor, sometimes making up only a handful of words. The second advantage is that there are no academic vocabulary categories in the BNC-20, therefore all of the banding categories are solely frequency-based, and so directly comparable. Third, the frequency information comes from the BNC, and so is probably a better representation of current English than the GSL information used by VocabProfile. Another advantage is that BNC-20 gives a wealth of frequency information: types, tokens, percentage of coverage of each band, cumulative percentage of coverage, and word families. The output is color-coded (as is output from the ‘classic’ VocabProfile also available on the website), with words in each frequency band indicated by a different color. This holds true for both a reproduction of the inputted text with each word color-coded for frequency, and for lists of words broken down per frequency level in three different ways: tokens, types, families. Table 5.3 gives an illustration of the summary frequency table, based on a little over 10,000 words taken from a draft version of this unit on measuring vocabulary.

It would seem that the BNC-20 is a much improved tool from which to draw an LFP-like frequency analysis of written compositions. However, it only came on line less than a year before the writing of this book, and so little research based upon it had reached publication. Nevertheless, while it still cannot address the ‘quality of use’ issue, it should prove to be a very valuable tool for describing the frequency distribution of written output. The BNC-20 should also be of use for analyzing spoken discourse, as the K1 and K2 levels are essentially based on the spoken component of the BNC.

P_Lex

Another measure using the frequency-band approach is P_Lex (Meara, accessed 2008). Similarly to the LFP, it measures the lexical complexity of texts in terms of the amount of vocabulary beyond the 2,000 level. It divides the text into ten-word segments and determines how many 2,000+ words are in them. It then graphs the number of 2,000+ words per segment for each segment. This creates a curve, such as the one illustrated in Figure 5.3. In this case, a proportion of .4 (i.e. 40%) of the segments contained no (0) 2,000+ words, .4 contained one 2,000+ word, .2 contained two 2,000+ words, and no segment contained more than two 2,000+ words. This curve is then fit to already-established theoretical curves. Each of the theoretical curves has a lambda value (λ), and so we can assign the lambda value of the closest fitting curve to the data inputted into the P_Lex program. This has the advantage of reducing a complex frequency profile to a single parameter – lambda.4 The lambda values usually range from about .5 to about 4.5, although higher and lower values are possible. The lambda values get more reliable with longer texts, but even relatively short texts seem to produce lambda values that are workable.
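The segment-and-fit procedure can be sketched as follows. This is not the P_Lex program: the sketch assumes, for illustration, that the theoretical curves are Poisson distributions indexed by lambda, it picks the closest curve by least squares over a grid of candidate values, and it drops any trailing segment shorter than ten words. All function names are hypothetical.

```python
import math
from collections import Counter


def segment_counts(tokens, easy_words, segment_length=10):
    """Count 'difficult' (non-easy) words in each full ten-word segment."""
    counts = []
    for start in range(0, len(tokens) - segment_length + 1, segment_length):
        segment = tokens[start:start + segment_length]
        counts.append(sum(1 for t in segment if t.lower() not in easy_words))
    return counts


def observed_proportions(counts, max_n=10):
    """Proportion of segments containing 0, 1, ..., max_n difficult words."""
    tally = Counter(counts)
    return [tally.get(n, 0) / len(counts) for n in range(max_n + 1)]


def poisson_probability(n, lam):
    """Probability of n events under a Poisson curve with parameter lam."""
    return math.exp(-lam) * lam ** n / math.factorial(n)


def closest_lambda(proportions, candidates=None):
    """Lambda of the candidate Poisson curve closest to the observed curve."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 601)]  # 0.01 to 6.00

    def squared_error(lam):
        return sum((p - poisson_probability(n, lam)) ** 2
                   for n, p in enumerate(proportions))

    return min(candidates, key=squared_error)


# The distribution described in the text: .4, .4, and .2 of the segments
# contain zero, one, and two 2,000+ words respectively.
example_props = observed_proportions([0, 0, 0, 0, 1, 1, 1, 1, 2, 2])
example_lambda = closest_lambda(example_props)
```

Under these assumptions the example distribution yields a lambda a little below 1, which sits toward the low end of the range of values mentioned above.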

Table 5.3 BNC-20 frequency analysis

Freq. level   Families   Types   Tokens   Coverage   Cum. %
K1 Words      509        901     8,065    79.65      79.65
K2 Words      217        337     881      8.70       88.35
K3 Words      74         92      236      2.33       90.68
K4 Words      61         88      313      3.09       93.77
K5 Words      38         47      86       0.85       94.62
K6 Words      16         18      33       0.33       94.95
K7 Words      18         20      47       0.46       95.41
K8 Words      18         18      23       0.23       95.64
K9 Words      7          11      21       0.21       95.85
K10 Words     8          9       11       0.11       95.96
K11 Words     6          6       6        0.06       96.02
K12 Words     7          7       15       0.15       96.17
K13 Words     4          4       7        0.07       96.24
K14 Words     12         14      65       0.64       96.88
K15 Words                                 0.00       96.88
K16 Words                                 0.00       96.88
K17 Words     1          1       1        0.01       96.89
K18 Words     1          1       1        0.01       96.90
K19 Words     1          1       1        0.01       96.91
K20 Words     1          2       4        0.04       96.95
Off-list      ?          152     310      3.06       100.00
Total         999+?      1,728   10,126   100        100

Words in text (tokens): 10,126
Different words: 1,728
Type-token ratio: 0.17
Tokens per type: 5.86

Pertaining to on-list only
Tokens: 9,816
Types: 1,576
Families: 999
Tokens per family: 9.83
Types per family: 1.58

(Lextutor, October 2008)
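The coverage and summary figures in Table 5.3 follow from the token and type counts by simple division. The sketch below recomputes a few of them as a check; it is an illustration only, not Lextutor's code, and the function name `coverage_rows` is hypothetical.

```python
def coverage_rows(token_counts):
    """Coverage and cumulative coverage (in %) for each frequency band.

    token_counts: mapping from band label to token count, in band order.
    """
    total = sum(token_counts.values())
    rows = []
    cumulative = 0.0
    for band, tokens in token_counts.items():
        coverage = 100 * tokens / total
        cumulative += coverage
        rows.append((band, round(coverage, 2), round(cumulative, 2)))
    return rows


# K1 and K2 token counts from Table 5.3; all other bands and off-list
# tokens are lumped together (10,126 - 8,065 - 881 = 1,180).
rows = coverage_rows({"K1": 8065, "K2": 881, "Other": 1180})

type_token_ratio = round(1728 / 10126, 2)   # different words / tokens
tokens_per_type = round(10126 / 1728, 2)
```

The recomputed values agree with the table: 79.65% coverage for K1, a cumulative 88.35% after K2, a type-token ratio of 0.17, and 5.86 tokens per type.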





It should be possible to match the theoretical curves with the vocabulary sizes of norming subjects. That is, norming participants can be asked to write compositions which are analyzed and lambda values derived. These values can then be compared to the vocabulary sizes of the participants as established by vocabulary tests. Thus, with adequate norming, it may be possible to accurately estimate productive vocabulary size from the lambda values. This is an ingenious method of estimating total vocabulary size from a single sample of participant written output. Furthermore, because it samples text in ten-word segments, it seems to work even with shorter texts, which is an advantage when assessing lower-proficiency learners, who typically write shorter compositions. (This method could be carried out for spoken output as well, but would almost certainly require new calculation of spoken curves, as spoken and written lexical discourse differs substantially in terms of frequency of lexical content.) However, at the moment, the P_Lex manual states that there is not yet enough good normative data to allow this type of vocabulary size extrapolation (Meara, accessed 2008). For more information on this tool, and general approach, see Bell (2002), Meara (accessed 2008), Meara and Bell (2001), and Miralpeix (2007, 2008).

[Two panels, each plotting P(n words per segment) (0–0.45) against No. of difficult words per segment (0–10).]

Figure 5.3 P_Lex output

(Meara, accessed 2008: 1).

V_Size

A somewhat different approach to generating estimates of productive vocabulary size from relatively small amounts of writing has been developed by Paul Meara and Imma Miralpeix (accessed 2008a). Their V_Size software program creates lexical frequency profiles from inputted texts, and then estimates what these profiles tell us about the size of the productive vocabulary of the people who produced those texts. As in P_Lex, the inputted text is matched against theoretical profiles, but unlike P_Lex, V_Size uses calculations based around Zipf’s Law distributions of vocabulary (see Section 2.5.2) to form the theoretical profiles.

Zipf argued that language was one of very many things where a simple relationship could be found between the rank order of an event and the size of the event. For language, Zipf argued that the pattern of frequencies exhibited by words followed this law, and he claimed that there was a straightforward relationship between the number of times a word occurred in a corpus and its rank order in a frequency list generated from the corpus. In simple terms, Zipf claimed that some words are more frequent than others, and you could tell roughly how many times a word would occur in a large corpus by looking at its rank order. (Meara and Miralpeix, accessed 2008a: 1)
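The rank-frequency relationship described in this quotation can be illustrated in its simplest textbook form, in which a word's expected frequency is inversely proportional to its rank. The function below is a sketch under that assumption; the exponent of 1 and the example count are illustrative, and the calculations V_Size actually uses are not given here.

```python
def zipf_expected_count(top_count, rank, exponent=1.0):
    """Expected occurrences of the word at a given rank, from the count of the rank-1 word."""
    return top_count / rank ** exponent


# Hypothetical corpus in which the most frequent word occurs 60,000 times.
rank_10 = zipf_expected_count(60_000, 10)
rank_100 = zipf_expected_count(60_000, 100)
```

Under these assumptions the tenth-ranked word would be expected about 6,000 times and the hundredth-ranked word about 600 times.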

Also unlike P_Lex, there are some initial vocabulary size norms for the various theoretical Zipf curves. These are based on Miralpeix’s PhD thesis (Miralpeix, 2008) which compared V_Size estimates for groups of Spanish EFL learners with differing amounts of exposure to L2 classes, and different starting ages. Also, Meara (in preparation) carried out a more technical analysis using longer texts than Miralpeix, and found that the norms appeared to provide reasonable estimates of vocabulary size. However, the norms should probably still be seen as tentative until more data is collected, and anyone using V_Size with non-Spanish learners should establish norms for the L1(s) of their participants, or at least confirm that the V_Size size estimates have sensible correspondences with any other evidence of lexical size available.

To use V_Size, a text (.txt) version of the learner output is selected and the program initially analyzes it according to five frequency categories: A (500), B (1,000), C (1,500), D (2,000), and E (2,000+). The researcher is able to reclassify any words which are believed to be misclassified, including numerals and proper names. The program then matches the frequency profiles to the theoretical profiles in its memory, and produces an estimate of vocabulary size. This is illustrated in a screen shot from the V_Size Manual (Figure 5.4), which shows the text words with their classification, a table which shows the percentages of vocabulary at each frequency band, a graph which shows the profile and the best-matching theoretical counterpart, and a vocabulary size estimate.

To further illustrate the program, I ran a passage from a draft version of this productive vocabulary measurement section, using a word form analysis based on BNC frequency data (the bnc~strict database shown in the screenshot), with the following results:

BAND    A    B    C    D    E    Err    Size estimate
data    55   7    5    3    30          19,400
model   62   7    4    3    24   86





Although I have tried to write this book in an accessible style, I am glad that the vocabulary size estimate puts me in the range of educated native speakers! (See Section 1.1.2.)
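The matching step behind these results can be sketched in a few lines. How V_Size computes the Err column is not stated in this section; the reported value of 86 happens to equal the sum of squared differences between the data and model rows, so the sketch assumes that is how the mismatch is scored. The second candidate profile below is invented purely for illustration, as is the size label attached to it.

```python
def profile_error(data, model):
    """Sum of squared differences between two band-percentage profiles (assumed metric)."""
    return sum((d - m) ** 2 for d, m in zip(data, model))


def best_match(data, candidates):
    """Return the (size, profile) pair whose profile lies closest to the data."""
    return min(candidates.items(), key=lambda item: profile_error(data, item[1]))


data = [55, 7, 5, 3, 30]    # percentages in bands A-E from the table above
model = [62, 7, 4, 3, 24]   # best-matching theoretical profile from the table above

candidates = {
    19_400: model,
    5_000: [80, 8, 4, 2, 6],  # hypothetical profile, not taken from V_Size
}

err = profile_error(data, model)
size_estimate, _ = best_match(data, candidates)
```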

Figure 5.4 V_Size screenshot

(Meara and Miralpeix, accessed 2008a: 6).

Type-token-based methods (lexical diversity)

Another method of quantifying participant output is by measuring its lexical diversity/variation, that is, by determining the number of individual types produced relative to the total number of tokens (i.e. establishing the type-token ratio). A relatively greater number of word types means that a wider range of vocabulary has been demonstrated, and the plausible assumption is that this reflects a larger, richer lexicon. The most basic way of quantifying the relationship between types and tokens is computing the type-token ratio. The formula for this is:

\[ \text{type-token ratio} = \frac{\text{number of different types}}{\text{total number of tokens}} \times 100 \]

This gives a simple idea of lexical diversity, but the problem is that it is strongly affected by the length of the text. As a text gets longer, there is less and less chance for new word types to appear, as a greater percentage of the frequent types have already appeared before. Thus, longer texts tend to have increasingly lower type-token ratios as an artefact of text length alone. This means that a basic type-token ratio cannot be used unless the text is controlled for length. As different participants tend to write different lengths of text (e.g. more proficient learners tend to write longer texts), a researcher would have to cut all of the texts to the length of the shortest one received, which has the effect of wasting a lot of valuable data from the longer texts.
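The length effect is easy to demonstrate. The sketch below computes the basic type-token ratio (multiplied by 100, as in the formula) for progressively longer stretches of the same text; the sample text is made up for illustration, and tokenization by whitespace is a simplifying assumption.

```python
def type_token_ratio(tokens):
    """Number of different types divided by number of tokens, times 100."""
    if not tokens:
        raise ValueError("cannot compute a ratio for an empty text")
    return 100 * len(set(tokens)) / len(tokens)


# Hypothetical learner text, deliberately repetitive.
text = ("the man saw the woman and the woman saw the man then the man "
        "and the woman saw the dog and the dog saw the man")
tokens = text.split()

# The same text measured at three different lengths.
ttr_by_length = {n: type_token_ratio(tokens[:n]) for n in (5, 10, 20)}
```

Here the ratio falls from 80 at 5 tokens to 50 at 10 tokens and 35 at 20 tokens, even though the writer and the vocabulary are the same throughout.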

Standardized type-token ratio

One way around this problem is to use a standardized type-token ratio. A program (such as WordSmith Tools) can divide a participant’s text into a number of 100-word samples (in some cases, every possible 100-word sample), and then compute the type-token ratios for all of these. The average of all these computations is the standardized type-token ratio. This method avoids the problem of length attenuation.
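A minimal sketch of this averaging procedure is given below. It is not the WordSmith Tools implementation: it uses successive non-overlapping samples and simply discards any tokens left over at the end, both of which are assumptions made for the sake of a short example.

```python
def standardized_ttr(tokens, sample_size=100):
    """Mean type-token ratio (x100) over successive samples of sample_size tokens."""
    n_samples = len(tokens) // sample_size
    if n_samples == 0:
        raise ValueError("text is shorter than one sample")
    ratios = []
    for i in range(n_samples):
        sample = tokens[i * sample_size:(i + 1) * sample_size]
        ratios.append(100 * len(set(sample)) / sample_size)
    return sum(ratios) / n_samples


# Toy example with 4-token samples so that the arithmetic is easy to follow.
toy_tokens = ["a", "b", "c", "d", "a", "a", "b", "b", "x"]
toy_result = standardized_ttr(toy_tokens, sample_size=4)
```

In the toy example the first sample has four types (ratio 100) and the second has two (ratio 50), giving a standardized ratio of 75; the ninth token is ignored.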

Vocd

Another way to avoid the length problem is by using a curve-fitting method. David Malvern and Brian Richards have pioneered this approach (e.g. Malvern, Richards, Chipere, and Durán, 2004), and have created a statistic called vocd (i.e. vocabulary d statistic) and the software to compute it. It has now become the most accepted way of doing type-token analyses, in both L1 and L2 contexts. The software is available as part of the CHILDES L1 child database (<http://childes.psy.cmu.edu/manuals/vocd.doc>; see also Section 6.2), but requires texts to be formatted in the CHILDES format. A more user-friendly version called D_Tools (Meara and Miralpeix, accessed 2008b) is available on the _lognostics website, which accepts text files (.txt).

The process behind vocd takes several steps. The program generates 100 samples of 35 randomly-selected words from a text, and calculates a type-token ratio for each of these. These 100 ratios are then averaged to produce a composite mean ratio for all 100 samples. The program goes on to do the same thing for samples of 36 randomly-selected words, 37, 38, ... all the way to samples of 50 words. The end result is a list of 16 means for the 35–50 word samples. These means form a curve, and it is compared to a number of theoretical curves generated by the D formula. The value of D which produces the best matching curve is assigned to the source text. D typically varies between 0 and around 50, with lower values indicating more repetition and a vocabulary which is not lexically rich, and vice versa for higher values.
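The steps just described can be sketched as follows. This is an illustration rather than the vocd software: the D formula is not reproduced in this section, so the form used below, TTR = (D/N)(sqrt(1 + 2N/D) - 1), is the one generally attributed to Malvern et al. (2004) and should be checked against that source. Sampling without replacement, the fixed random seed, and the simple grid search over candidate D values are all assumptions made for the example. The text must contain at least 50 tokens.

```python
import random
from math import sqrt


def d_curve(d, n):
    """Theoretical type-token ratio (0-1) for a sample of n tokens, given D."""
    return (d / n) * (sqrt(1 + 2 * n / d) - 1)


def mean_sample_ttr(tokens, n, trials, rng):
    """Average type-token ratio over `trials` random samples of n tokens."""
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(tokens, n)  # drawn without replacement
        total += len(set(sample)) / n
    return total / trials


def estimate_d(tokens, sizes=range(35, 51), trials=100, seed=0):
    """Find the D whose theoretical curve best fits the observed mean ratios."""
    rng = random.Random(seed)
    observed = [(n, mean_sample_ttr(tokens, n, trials, rng)) for n in sizes]

    def squared_error(d):
        return sum((ttr - d_curve(d, n)) ** 2 for n, ttr in observed)

    candidates = [x / 10 for x in range(1, 2001)]  # D from 0.1 to 200.0
    return min(candidates, key=squared_error)


# Two artificial 100-token texts: one highly repetitive, one with no repetition.
repetitive = ["a", "b"] * 50
diverse = [f"word{i}" for i in range(100)]
d_repetitive = estimate_d(repetitive)
d_diverse = estimate_d(diverse)
```

The two artificial texts sit at the extremes, so their estimates fall at the edges of the search grid; real learner texts would be expected to fall somewhere in between.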

Although vocd is a relatively sophisticated form of type-token analysis, and for the most part counteracts the length effect problem (although see McCarthy and Jarvis, 2007), it still has the same ‘quality of use’ limitations as other type-token analyses. Meara and Miralpeix illustrate this clearly with the following three examples:

1. The man saw the woman.
2. The bishop observed the actress.
3. The prelate glimpsed the wench.

Each of these sentences will produce the same type-token ratio, even though they clearly demonstrate different degrees of vocabulary usage. This is not only a matter of differences in frequency; Meara and Miralpeix note that bishops observing actresses would be highly marked in English (perhaps indicating some form of inappropriate behavior), and so production of this sentence would require considerable command of culture and background knowledge beyond the mere denotative meaning of the individual words. For this reason, they suggest that type-token analyses are probably best used with low-level learners who produce texts with lots of repetition and high-frequency vocabulary (i.e. type-token ratios of 10–30). More advanced users tend to produce texts with higher type-token ratios which are ‘not easy to distinguish from each other’, which led Meara and Miralpeix to question whether ‘D has good measurement properties at higher levels’ (p. 6). Similarly, Jarvis (2002) found that D had trouble discriminating between groups of speakers with obvious differences in vocabulary (native speakers, Swedish-speaking EFL learners, Finnish-speaking EFL learners at Grades 5, 7, and 9). This point about lack of sensitivity reflects my own experience with these measures, where they often have trouble differentiating between compositions with clearly different degrees of lexical quality, as judged by intuition or raters. Ultimately, I feel that the type-token approach (in all its guises) can only offer very limited information about lexical output, and certainly nothing about the appropriateness of use of that output. It should therefore only be used to supplement other lexical measures, and rarely (if at all) as the sole means of measurement. For more discussion of vocd, see Meara and Miralpeix (accessed 2008b), Read (2005), Richards and Malvern (2007), and van Hout and Vermeer (2007).
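The claim that the three example sentences yield identical type-token ratios can be checked directly. The sketch below lowercases each sentence and strips the final full stop before counting, which is an assumption about tokenization rather than a description of any particular program.

```python
sentences = [
    "The man saw the woman.",
    "The bishop observed the actress.",
    "The prelate glimpsed the wench.",
]


def sentence_ttr(sentence):
    """Type-token ratio (x100) of a single sentence, ignoring case and the final full stop."""
    tokens = sentence.lower().rstrip(".").split()
    return 100 * len(set(tokens)) / len(tokens)


ratios = [sentence_ttr(s) for s in sentences]
```

Each sentence has five tokens and four types (the occurs twice), so all three score 80, regardless of how rare or marked the content words are.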

It can also be pointed out that all of the comprehensive analyses in this section focus on individual word forms. They cannot cope with the multi-word units which are so common in language (Chapter 3). For example, the term concentration camp can be analyzed as two individual words, but the sum of these two meanings will never add up to the composite meaning of the dastardly forceful internment that has been used in so many wars. To whatever extent the analysis methodologies discussed in this section can inform about individual words, they are likely to miss a large portion of the lexical behavior and meaning that is tied into formulaic language.

These ideas about the limitations of type-token methods of analysis, and indeed, all other vocabulary measurement, highlight the necessity for good validation, discussed all too briefly in Section 5.1.3. Quote 5.5 reminds us that researchers need to provide rigorous tests of their measures, and indicate their limitations as well as their advantages when presenting them to the wider field to use. Although the authors are discussing measures of lexical diversity, the same point is appropriate for all types of vocabulary measure.

Quote 5.5 Malvern, Richards, Chipere, and Durán on the need for rigorous validation of vocabulary measures

[Validation issues] matter. Much of the research based on flawed measures has significant implications for theory, practice, and policy. It is important therefore that methodological issues of measuring vocabulary richness are understood and that these confusions are cleared up.

(2004: 180)

A comprehensive measure based on other criteria

Coh-Metrix

All of the above comprehensive measures rely on statistical counts of frequency or types and tokens. One other program gives a fuller description of the language in source texts by counting a wider range of linguistic characteristics: Coh-Metrix, developed by McNamara, Louwerse, Cai, and Graesser (<http://cohmetrix.memphis.edu/cohmetrixpr/index.html>). Coh-Metrix is a multi-component computerized analysis program that produces 60 indices of the linguistic and discourse representations of a text. Of these, a number are focused on the words in a text, and the characteristics of those words. The ones most obviously applicable to vocabulary study are listed below:

● Number of words
● Average syllables per word
● Average words per sentence
● Raw frequency, mean for content words (0–1,000,000)
● Log frequency, mean for content words (0–6)
● Raw frequency, minimum in sentence for content words (0–1,000,000)
● Log frequency, minimum in sentence for content words (0–6)
● Type-token ratio for all content words
● Proportion of content words that overlap between adjacent sentences
● Latent semantic analysis, sentence to sentence
● Latent semantic analysis, paragraph to paragraph
● Concreteness, mean for content words
● Concreteness, minimum in sentence for content words
● Mean hypernym values of nouns
● Mean hypernym values of verbs
● Flesch reading ease score (0–100)
● Flesch-Kincaid grade level (0–12)

Just as the MRC Psycholinguistic Database (Coltheart, 1981) (see Section 4.5) can give detailed information about individual lexical items, Coh-Metrix can give enhanced information about the vocabulary in texts. It is a relatively recent tool, but is worth reviewing by any vocabulary researcher. It is available on-line after free registration. For more information, see the Coh-Metrix website and Graesser, McNamara, Louwerse, and Cai (2004).
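Two of the indices listed for Coh-Metrix, the Flesch reading ease score and the Flesch-Kincaid grade level, are built from the simpler word, sentence, and syllable counts in the same list. The sketch below uses the standard published formulas; whether Coh-Metrix computes them in exactly this way, or restricts the output to the ranges shown in the list, is not stated here and should not be assumed. The counts in the example are hypothetical.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch reading ease formula (higher scores mean easier text)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Standard Flesch-Kincaid grade level formula."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical text: 100 words, 5 sentences, 150 syllables.
ease = flesch_reading_ease(100, 5, 150)
grade = flesch_kincaid_grade(100, 5, 150)
```

Both formulas depend only on average sentence length and average syllables per word, which is why they say little about vocabulary beyond word length.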

5.3 Measuring the quality (depth) of vocabulary knowledge

The measures in the previous section focus on how many lexical items are known. This section will look at measures which attempt to quantify how well items are known. However, the two notions are not discrete. The observant reader may have noticed that all size measures have a (sometimes implicit) criterion of minimum knowledge for a lexical item to be counted as ‘known’. In the Vocabulary Levels Test, it is the ability to recognize the word forms which match the definitions given. In the frequency-based and type-token-based analyses, it is the fact that a lexical item was produced (however inaccurately or inappropriately) in a person’s written or spoken output. Thus, it can be said that all size measures are also depth measures in the sense that some quality of knowledge, no matter how minimal, must be operationalized as the criterion of sufficient knowledge. This size/depth connection is explicitly made in the CATSS test (Laufer and Goldstein, 2004; see Section 2.8), which deliberately uses four different criteria (i.e. four different types of test item) in order to give an indication of the depth of knowledge of the form-meaning link.

Read (2000) suggests that there are two main ways to conceptualize the quality of knowledge of individual vocabulary items. (Let us disregard issues of automaticity and organization for the moment.) The first is describing the incremental acquisition of a word along a continuum of mastery ranging from ‘Do not know at All’ at the beginning end, all the way to ‘Full Mastery of a Lexical Item in All Contexts of Use’ at the advanced end. Read calls this the developmental approach. The second approach is specifying the various types of word knowledge one can have about lexical items. This has usually been termed the dimensions or components approach. We will look at each in turn.

5.3.1 Developmental approach

Developmental scales

It is undeniable that vocabulary is learned incrementally, and so using a developmental scale to model this would appear sensible. However, the problem lies in operationalizing the developmental process into a workable scale. In fact, we have little idea about how vocabulary development advances, so creating a valid scale is rather speculative at the moment. Vocabulary acquisition theory is not advanced enough to guide the creation of a principled developmental scale, and previous research has not really been that helpful either. The result is that current developmental scales are often based on pedagogical rationales. This has the advantage of their being useful in learning contexts, but they have a number of problems resulting from their atheoretical development.

First, for a scale to exist, there must be rational beginning and ending points. Having absolutely no knowledge of a lexical item seems a clear-cut beginning, but even this is not straightforward. If a person knows the spelling, pronunciation, and morphological rules of a language, then they will already know something about almost any new lexical item they meet. More problematic is the ending point. It must be something like ‘full knowledge of an item’, but how does one quantify this? There is no test imaginable which can verify that a word can be used accurately, appropriately, and fluently in every possible context. Thus any beginning and ending points will necessarily be approximations. This is despite the fact that the end points of scales are usually easier to establish than the gradations in the middle.

We then come to the question of how many stages there are in the acquisition process. Since vocabulary learning is slow and gradual, built up over many, many meetings with a lexical item (although big jumps in knowledge can occur from focused, intentional learning), I tend to think that vocabulary learning is a continuum, with an uncountable number of small knowledge increments. But this is no good for developing a scale; we must have reasonable stages that are identifiable. However, there is currently no principled way of knowing how many stages an acquisition scale should contain. At a minimum, there must be the beginning ‘no knowledge’ stage, the ending ‘acceptable mastery’ stage, and one stage in between corresponding to receptive, but not productive, knowledge. But even this is problematic, as words can be known productively and not receptively. For example, I knew the word indict and used it productively in my speech, but did not know it was spelled i-n-d-i-c-t, and so did not recognize it in written discourse. A three-point scale may be the minimum, but there is no way to determine the maximum, or more importantly the appropriate, number of stages. Is it five stages, as in the Vocabulary Knowledge Scale (VKS) below, or four stages, as in Schmitt and Zimmerman’s scale? Or will we eventually find there are ten or more stages? In the end, we have to decide what kinds of vocabulary knowledge are important, and build the best scale we can around our decision.

While setting the number of steps in a scale is not an insurmountable problem, the question of equal intervals between scale steps may be. Researchers have typically assigned a numerical value to each of the stages, and then proceeded to run statistical analyses on the resulting data. The analyses are almost always inferential (t-tests, ANOVAs, correlations, etc.), which require the use of interval scales, where the distance between the intervals is equivalent and consistent. With the scales currently available, this assumption cannot be met. This problem does not disqualify the use of scales, but it does mean that we should be very wary of analyses that use parametric statistics on data derived from these scales. It would seem much more appropriate to use non-parametric procedures with such data, because they rely only on rank hierarchies (i.e. ordinal data), which such scales should be able to provide. Another possibility would be to illustrate vocabulary knowledge gains graphically by showing pre-post patterns of movement from one scale stage/category to another (e.g. Paribakht and Wesche, 1997: 191).
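One way of summarizing pre-post movement without treating the scale as interval data is to tabulate how many items moved from each stage to each other stage, and to count gains, losses, and ties. The sketch below does this for a five-stage scale; the function names and the scores are hypothetical, and this is not the presentation used by Paribakht and Wesche.

```python
from collections import Counter


def transition_table(pre, post, stages=(1, 2, 3, 4, 5)):
    """Rows are pre-test stages, columns are post-test stages, cells are counts of items."""
    pairs = Counter(zip(pre, post))
    return [[pairs[(a, b)] for b in stages] for a in stages]


def direction_counts(pre, post):
    """Number of items that moved up, stayed at the same stage, or moved down (ordinal information only)."""
    up = sum(1 for a, b in zip(pre, post) if b > a)
    same = sum(1 for a, b in zip(pre, post) if b == a)
    down = sum(1 for a, b in zip(pre, post) if b < a)
    return up, same, down


# Hypothetical scale scores for five target words before and after a treatment.
pre_scores = [1, 1, 2, 2, 3]
post_scores = [2, 3, 2, 4, 3]

table = transition_table(pre_scores, post_scores)
moves = direction_counts(pre_scores, post_scores)
```

The direction counts use only the rank order of the stages, so they remain meaningful even if the intervals between stages are unequal.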

The Vocabulary Knowledge Scale (VKS)

The best known and most widely-used depth-of-knowledge scale is the Vocabulary Knowledge Scale (see Paribakht and Wesche, 1997, and Wesche and Paribakht, 1996, for the most complete descriptions of the instrument). Although developmental scales have been around for a long time (e.g. Dale’s (1965) scale), Paribakht and Wesche were instrumental in reintroducing this measurement approach in more recent times. The VKS was designed to track the early development of learners’ knowledge of specific words at a given time in an instructional or experimental situation. As such, it was designed to provide a relatively efficient means of demonstrating certain changes in the receptive and initial productive knowledge of specific words resulting from instructional interventions (e.g. vocabulary exercises) or activities (e.g. reading), and in showing comparative gains resulting from different treatments (Wesche and Paribakht, personal communication). The VKS was designed to capture initial stages in word learning that are amenable to accurate self-report or demonstration through the use of a five-category Elicitation Scale that provides information for scoring using a five-level Scoring Scale. For Categories III–V, there is also a requirement for a demonstration of knowledge.

I. I don’t remember having seen this word before.
II. I have seen this word before, but I don’t know what it means.
III. I have seen this word before, and I think it means ______. (synonym or translation)
IV. I know this word. It means ______. (synonym or translation)
V. I can use this word in a sentence: ______. (Write a sentence.) (If you do this section, please also do Section IV.)

The testee judgements and performance data are then evaluated according to the separate Scoring Scale, with the scores at levels 2–5 depending on the quality of the synonym, translation, or sentence responses (Figure 5.5).

The VKS was originally developed to measure vocabulary learning in the English language programs at the University of Ottawa (Paribakht and Wesche, 1993, 1997; Wesche and Paribakht, 1996). More recently, Paribakht (2005) and Wesche and Paribakht (2009) used it to seek evidence of retention of new vocabulary knowledge by university ESL students after they attempted to infer the meanings of unknown words. The VKS proved useful in these studies, and seems to have value for its intended purpose of tapping into the early stages of vocabulary learning, rather than more advanced knowledge. Also, it seems to have good reliability (.89), as demonstrated using a test-retest method (Wesche and Paribakht, 1996). In addition, the fact that there is a requirement to demonstrate knowledge may well enhance the care with which the self-reports are made on the instrument.

[Diagram linking the self-report categories (I–V) to the possible scores (1–5).]

Possible scores and meaning of scores:
1 The word is not familiar at all.
2 The word is familiar but its meaning is not known.
3 A correct synonym or translation is given.
4 The word is used with semantic appropriateness in a sentence.
5 The word is used with semantic appropriateness and grammatical accuracy in a sentence.

Figure 5.5 VKS scoring scale

(Paribakht and Wesche, 1997: 181).





However, the VKS, like all developmental scales, suffers from a number of limitations, and these need to be clearly understood before it can be used appropriately. These have been pointed out by Meara (1996c) and Schmitt (2000), and perhaps most fully outlined in Read’s (2000) critique of the VKS. First, as Paribakht and Wesche point out, the VKS is not an appropriate instrument to estimate lexical knowledge in general, nor does it provide a precise characterization of the process of learning individual words (see Henriksen, 1999, for a discussion of multiple developmental scales).

The second potential limitation is that the initial two stages of the Elicitation Scale are unverified, but after this, stages require a demonstration of knowledge. There is no obvious way to elicit verification of knowledge at the first two stages, but this may not matter much in practice. Testees should be able to provide an accurate self-assessment at these stages, and there is no reason why they should wish to fake them. Furthermore, the demonstration of knowledge in Stages 3–5 should provide more valuable information than would self-judgement scores alone.

Third, the knowledge constructs addressed between stages are not consistent. Categories I–IV of the Elicitation Scale essentially deal with various degrees of knowing the form-meaning link, but Category V jumps to mastery strong enough to use the word in a semantically appropriate way in a sentence. This can involve a constellation of lexical knowledge, including collocation, register, derivation (correct word family member), and grammatical knowledge (noun, verb, etc.). Thus, it seems that the scale is not unidimensional.

Similarly, the intervals between the stages of the Scoring Scale do not seem to be consistent, which echoes the point made above about the intervals in developmental scales not necessarily being equidistant, and so inappropriate for parametric statistics. The different stages of the Scoring Scale do, however, appear to be ordinal in nature (i.e. representing the progressive initial steps in acquiring knowledge of a given lexical item).

The Elicitation Scale also mixes receptive and productive elements in ways that are not necessarily straightforward. Category I involves form-recognition, Categories II–IV elicit meaning-recall, and Category V requires full productive output. Likewise, the amount of contextualization varies among the categories, with Categories I–IV dealing with the lexical item in isolation, and only Category V involving context.

Another potential limitation is that Categories III and IV of the Elicitation Scale require a judgement of degree of mastery: I think I know the meaning versus I know the meaning. Such metalinguistic judgements can be difficult for some learners to make, and many learners are better at judging what they can do (see below). The point about examinee judgement variability raised in the discussion of checklist tests (Section 5.2.3) also pertains here: some examinees will only select Category IV if they are absolutely positive of their meaning knowledge, while other examinees might select it if they barely know the lexical item’s meaning. It is for this reason that testees are asked to respond to Category III if they attempt IV.

A potential practical problem with the Elicitation Scale resides at Category V, where examinees are asked to produce a sentence which illustrates the meaning of the target lexical item. Unfortunately, as Read notes, examinees all too often write sentences which do not clearly demonstrate knowledge of the item. For example, one of my respondents once produced the following sentence for the target word access: I like the word access. Read cites McNeill’s (1996) finding that Chinese trainee teachers of English in Hong Kong could often produce plausible, and even quite sophisticated, sentences without really knowing the target words. So it seems that sentence writing is an uncertain method of eliciting evidence of productive knowledge, but in its favor, the VKS has the backup of the meaning definition at Stage IV as an explicit means of dealing with this uncertainty.

However, some of the limitations of the Elicitation Scale might be mitigated in the process of interpreting the responses according to the Scoring Scale. The various forms of lexical demonstration potentially provide raters with enough information to place the testee responses at the appropriate levels of the Scoring Scale. Nevertheless, this involves a degree of rater subjectivity (e.g. ‘Is the synonym/translation “close enough” to show knowledge of meaning?’ ‘Does the sentence clearly show semantic appropriateness?’). Thus the final testee score on the Scoring Scale comes from a potentially complex combination of testee interaction with the Elicitation Scale and rater judgement of the testee responses.

The variability in interpreting the VKS is well-illustrated in the VKS Scoring Scale (Figure 5.5). While Elicitation Scale Categories I and II map directly onto discrete interpretations, all of the other categories have multiple interpretations, with Category V having four possible outcomes. This illustrates both the strength and weakness of the VKS. The strength derives from the knowledge demonstration components of the Elicitation Scale, which should provide more trustworthy information than self-judgement, and the chance to adjust scores where the demonstration result disagrees with the judgement result. In the hands of a researcher who has a good understanding both of the instrument and of the learners being studied, this should lead to more accurate ratings on the Scoring Scale. On the other hand, it is always desirable to have a direct, consistent, and unambiguous relationship between learner output on a test and the scoring interpretation of that output.
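One possible reading of this category-to-score mapping is sketched below. It is a reconstruction for illustration only, not the official scoring procedure: the downgrade rules are assumptions chosen to be consistent with the description above (Categories I and II map directly, the other categories have several outcomes, and Category V has four), and the rater judgements are reduced to hypothetical yes/no flags.

```python
def vks_score(category, meaning_correct=False,
              sentence_semantic=False, sentence_grammatical=False):
    """Map a self-report category (1-5) plus rater judgements onto a 1-5 score (assumed rules)."""
    if category == 1:
        return 1
    if category == 2:
        return 2
    if category in (3, 4):
        # A wrong synonym or translation is treated as 'familiar, meaning not known'.
        return 3 if meaning_correct else 2
    if category == 5:
        if sentence_semantic and sentence_grammatical:
            return 5
        if sentence_semantic:
            return 4
        # Sentence fails: fall back on the synonym/translation given for Category IV.
        return 3 if meaning_correct else 2
    raise ValueError("category must be between 1 and 5")


# All outcomes that a Category V response can receive under these assumptions.
category_v_outcomes = {
    vks_score(5, m, s, g)
    for m in (False, True)
    for s in (False, True)
    for g in (False, True)
}
```

Even in this simplified form, the same self-report category can lead to several different scores, which is the source of both the flexibility and the rater subjectivity discussed above.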

Schmitt and Zimmerman scale

An example of a less complex scale is the one developed by Schmitt and Zimmerman (2002), based on the earlier Test of Academic Lexicon scale (Scarcella and Zimmerman, 1998). Schmitt and Zimmerman recognized the limitations of the VKS, and opted for a simpler scale, utilizing a can-do paradigm. Can-do protocols are common in the field of assessment (e.g. the Can-do statements connected with the Common European Framework of Reference (CEFR): <http://www.alte.org/can_do/alte_cando.pdf> and <http://www.cambridgeesol.org/exams/exams-info/cefr.html>), where examinees self-evaluate what they can do with their language proficiency, rather than providing metalinguistic judgements of what they know. For many examinees, it is easier to say what they are able to achieve with a language rather than making a statement about how well they know it. This can-do idea is incorporated into Stages C and D of the scale, which essentially translate into receptive and productive knowledge respectively.

A. I don’t know the word.
B. I have heard or seen the word before, but am not sure of the meaning.
C. I understand the word when I hear or see it in a sentence, but I don’t know how to use it in my own speaking or writing.
D. I know this word and can use it in my own speaking and writing.

Like the VKS, the scale has advantages and limitations. One advantage might be its apparently transparent descriptions. Rather than making judgements about degrees of metalinguistic knowledge, the C and D stages are written so participants only need to reflect on whether they can understand a word, and whether they can use it in their speaking and writing. This might make them easier to judge, but as Kirsten Haastrup (personal communication) has pointed out to me, it may not be straightforward to conflate speaking and writing mastery. Testees may well have facility with a lexical item in one mode, but not the other (e.g. can produce a word in compositions when not under time pressure, but cannot produce it in on-line speech).

This would make judgements more difficult, and consequently, the level descriptions may be less transparent than we originally thought. Read (2000) notes that this type of inconsistency might be inevitable when trying to reduce the complexity of vocabulary knowledge down to a single scale, but it still poses problems for interpretation. (Note that the VKS avoids this by being specifically worded for written vocabulary, although Joe (1995, 1998) adapted it for oral use.) Overall, I think the ‘can-do’ approach is valid, but it might be better to specify it for either speaking or writing, but not both in developmental scales.

Another stage which might be considered problematic is Stage B, which covers the somewhat tricky stage of knowledge where the word form is recognizable but has not yet been connected to a meaning. If a form-meaning link is considered the minimum criterion of useful vocabulary knowledge, then it could certainly be argued that Stage B be deleted, leaving a three-stage scale corresponding to No knowledge/Receptive knowledge/Productive knowledge.





As with all scales, deciding on the optimal number of acquisition stages is difficult, with little theoretical guidance to rely on. At the moment, it is probably best to select/develop scales which cover the aspects of acquisition which are important to the particular study at hand, as no scale currently describes the whole incremental acquisition process. Even if one could be developed, it may turn out to be too complex for any practical measurement usage.

The Schmitt and Zimmerman scale would seem to have rational beginning and ending points. Having no knowledge of a word is the obvious beginning point, and the ability to use a word in one’s speaking and writing would appear to be a reasonable end goal for learners (although the caveat above suggests it might be better to focus on a single mode). Of course, the intended simplicity of the scale prohibits the testing of notions of full semantic and collocational appropriacy, as learners would probably find this very high level of mastery extremely difficult to judge. Despite the rational beginning and end points, it is not clear that the scale has interval spacings. Certainly the A→B increment intuitively feels smaller than the C→D increment, for instance. However, a three-stage version (A→C→D) may be closer to an interval scale, although it is difficult to think how this could be empirically established. Until a convincing argument can be made for equidistant stages, it is probably best to avoid parametric statistics with this and all scales.
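The caution about parametric statistics can be illustrated with a minimal sketch in Python. It assumes that ratings are stored as the letters A to D, and the sample ratings are invented for illustration; neither the storage format nor the data come from Schmitt and Zimmerman's study. The stages are treated as ordered categories, so the summary reports the distribution and the median stage rather than a mean, which would presuppose equal spacing between stages.

```python
from collections import Counter

STAGES = ["A", "B", "C", "D"]  # ordered from no knowledge to productive use

def stage_distribution(ratings):
    """Count how many ratings fall at each stage, in scale order."""
    counts = Counter(ratings)
    return {stage: counts.get(stage, 0) for stage in STAGES}

def median_stage(ratings):
    """Return the middle rating by rank (lower middle for even-sized samples)."""
    ranked = sorted(ratings, key=STAGES.index)
    return ranked[(len(ranked) - 1) // 2]

# Hypothetical ratings of one target word by seven learners.
ratings = ["A", "C", "B", "D", "C", "C", "B"]
print(stage_distribution(ratings))  # {'A': 1, 'B': 2, 'C': 3, 'D': 1}
print(median_stage(ratings))        # C
```

Group comparisons on such data would likewise call for rank-based (nonparametric) tests rather than tests that assume interval measurement.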

Compared to the VKS, the scale has the limitation of no demonstration of knowledge. This can be addressed in at least three ways. A first approach is to simply build demonstrations of knowledge into the scale. Second, a sample of participants could be interviewed after completing the instrument in order to confirm the accuracy of their self-reports (as discussed in Section 4.1). Alternatively, nonwords can be used. Schmitt and Zimmerman realized that their scale, like all self-evaluation measures, suffered from potential learner overestimation of knowledge. To control for this, they borrowed an idea from checklist test methodology. Along with the AWL words they were testing, they included a number of nonwords, and alerted the participants to this fact in order to encourage them to be careful in their judgements. Participants who rated a nonword at Stage C or D were deleted from the data pool. Judgements at Stage B were allowed to remain, as the nonwords were purposely English-like, and so the researchers did not consider it unreasonable for learners to believe that they had seen a nonword before but did not know it. Of course, these last two techniques are not confined to the Schmitt and Zimmerman scale, but could be used with other developmental scales as well.
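The nonword screening rule described above can be sketched in a few lines of Python. The data layout, the participant labels, and the nonwords "plinder" and "crathe" are all invented for illustration and are not taken from the original study; only the rule itself (exclude anyone rating a nonword at Stage C or D, tolerate Stage B) follows the description in the text.

```python
def screen_participants(responses, nonwords, disallowed=("C", "D")):
    """Split participants into kept and excluded lists.

    responses: dict mapping participant id -> dict of item -> stage letter.
    A participant is excluded if any nonword is rated at a disallowed stage.
    """
    kept, excluded = [], []
    for participant, ratings in responses.items():
        overclaimed = any(ratings.get(item) in disallowed for item in nonwords)
        (excluded if overclaimed else kept).append(participant)
    return kept, excluded

# Hypothetical data: "plinder" and "crathe" are invented nonwords.
nonwords = {"plinder", "crathe"}
responses = {
    "P01": {"analyze": "D", "plinder": "A", "crathe": "B"},
    "P02": {"analyze": "C", "plinder": "C", "crathe": "A"},
    "P03": {"analyze": "B", "plinder": "B", "crathe": "B"},
}
kept, excluded = screen_participants(responses, nonwords)
print(kept)      # ['P01', 'P03']
print(excluded)  # ['P02']
```

The `disallowed` parameter is left adjustable because a researcher adapting the technique to a different scale would need to decide which stages count as an overclaim.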

It may seem that I have been quite harsh in my critique of developmental scales. This is because I think it is very important for researchers to be aware of these scales’ limitations. No current scale gives a full account of the incremental path to mastery of a lexical item, and perhaps lexical acquisition is too complex to be so described. I have also reviewed numerous journal submissions where the VKS was used inappropriately, without any understanding of its limitations. However, I still feel that developmental scales have considerable research potential, but a sustained cycle of research is required to better determine these scales’ psychometric properties, so that they can be consistently interpreted with confidence. Until this happens, researchers need to carefully consider what these scales can and cannot do, and whether they are suitable for their research purposes. If there is any doubt, my personal feeling at the moment is that simpler, more transparent scales are probably more useful than more complex ones.

5.3.2 Dimensions (components) approach

The second approach to measuring the quality of vocabulary knowledge involves specifying some of the types of word knowledge one can have about lexical items, and then quantifying participants’ mastery of those types. Schmitt (1998a) outlines some of the potential advantages of a dimensions/components approach. The first is its possible comprehensiveness. While measuring knowledge of several types of word knowledge is time consuming and limits the number of lexical items that could be studied, it can produce a very rich description of vocabulary knowledge, and so can be well worth the effort. The dimensions approach can also have a simplifying effect of breaking complex behavior (vocabulary acquisition) into its more manageable components for analysis. Furthermore, analyzing the components separately allows the possibility of discerning their relationships. A number of these relationships have long been obvious (e.g. between frequency of occurrence and formality of register; between word class and derivational suffixes), and one study has empirically demonstrated certain word knowledge interrelationships correlationally (Schmitt and Meara, 1997). An intriguing possibility is that some of these relationships are hierarchal; that is, learned in some type of developmental order. Developmental sequencing has been posited in other areas of language, such as syntactic structures (e.g. Pienemann and Johnston, 1987) and morphemes (e.g. Larsen-Freeman, 1975), so it would not be surprising if the principle obtained in the area of lexical acquisition as well.

In fact, it seems counterintuitive that word knowledge is not at least partially hierarchal. It is unlikely that the initial exposure to a word yields much more than some partial impression of its written or phonological form and one of its meaning senses. After more exposures (or some explicit study), a learner would gradually learn the other kinds of word knowledge, with perhaps collocational and stylistic knowledge being the last. Indeed, it doesn’t seem reasonable that a learner would have a rich associative and collocational network built up without a knowledge of the word’s form, for instance. Research designs based on a word knowledge framework allow investigation into whether some kinds of word knowledge are acquired before others.

Finally, such word knowledge research may lead to a better understanding of the movement of vocabulary from receptive to productive mastery. This movement is still not well understood (Section 2.8), and researchers are not even sure whether receptive and productive knowledge form a continuum, as Melka (1997) argues, or whether the movement is subject to a threshold effect, as Meara (1997) has suggested. Part of the problem is the typical assumption that lexical items are either receptively or both receptively and productively known. The actual situation is probably that, for any individual item, each of the different types of word knowledge is known to different receptive and productive degrees. For example, an item’s spelling might be productively known, some of its meaning senses receptively known, and its register constraints totally unknown. Thus, research into the underlying receptive/productive word knowledge states should prove informative about learners’ overall ability to use words in a receptive versus productive manner.

Of course, the dimensions approach has limitations as well. It is impossible in practical terms to measure all word knowledge aspects, as a test battery that comprehensive would soon become unwieldy, especially if both receptive and productive facets were addressed. Therefore researchers following this approach have typically focused either on one, or a limited number, of word knowledge aspects. Another limitation of this approach is that some word knowledge aspects seem much more amenable to testing than others. For example, I know of no test for the register/stylistic appropriacy of lexical items, and this probably has much to do with the difficulty in devising a test for this knowledge aspect. Similarly, I can personally attest that devising a test tapping into frequency intuitions is not easy (Schmitt and Dunham, 1999).

This section will first discuss measures of individual types of word knowledge which have been used in research, and then overview a number of studies that have concurrently measured multiple aspects.

Quote 5.6 Read on the limitations of the dimensions approach to vocabulary measurement

Further work in this area [the dimensions approach to vocabulary assessment] has value for research purposes, in helping us to understand better the complex nature of vocabulary knowledge at the microlevel of individual items. It is not so clear what the role of such measures is in making decisions about learners. If a whole set of them [dimensions tests] is created, there is a danger of finding out more and more about the test-takers’ knowledge of fewer and fewer words, unless we have a definite assessment purpose in mind.

(2000: 248)





Word Associates Format (WAF)

The test format which has been most utilized as a depth of knowledge measure is probably the Word Associates Format. The format was originally created by John Read and has evolved over several versions by Read (1993, 1998, 2000), and other scholars (see below). As the name indicates, it measures word associations, and so could logically be presented in Section 5.5, but due to its popularity as a depth test, it is discussed here. The 1993 version consists of eight options for each target word. Four of the options are associated with the target word in three ways: paradigmatic link (synonym – group), syntagmatic link (collocation – scientists), and analytic link (component – together). The other four options are unrelated distractors.

team

alternate chalk ear group
orbit scientists sport together

Read’s 1998 version contains eight options within two boxes for each target word, all of which are adjectives. The examinees are required to find four words that associate with the target word out of the eight options.

sudden

beautiful quick surprising thirsty | change doctor noise school

common

complete light ordinary shared | boundary circle name party

The associates in the first box are paradigmatic in nature: either synonyms or representing an aspect of the target item’s meaning (sudden – quick, surprising). The second box includes syntagmatic associates (sudden – change, noise). The four associates can be evenly divided (2–2) between the two boxes as in the examples, but they can also be split 1–3 or 3–1. This is to make guessing more difficult.

Read developed these receptive formats in response to the difficulty in judging the appropriateness of free associations (see Section 5.5). With a selective format (Section 5.1), the target words can be analyzed in advance and clear associates found via piloting of test items. The main advantage of the format is that it covers both meaning and collocation. Moreover, it can tap into multiple instances of both. For example, the two main meaning senses of common (ordinary and shared) are both addressed, as well as two collocations (common boundary, common name), in the above 1998 item. Also, while it focuses on individual words, the fact that it includes a collocation element means that it taps into knowledge of formulaic language to some limited extent.





The format has been used by a number of scholars in their study of vocabulary depth of knowledge (see Greidanus, Bogaards, van der Linden, Nienhuis, and de Wolf, 2004, for one listing). For example, Qian (2002) used it in a study relating vocabulary size and depth knowledge to the reading comprehension component of the TOEFL (Test of English as a Foreign Language) test. He found the WAF was reliable (.88) and correlated with the TOEFL reading test at .77 and with vocabulary size (based on an early Vocabulary Levels Test) at .70. Most researchers have modified the WAF for their own purposes, including type of vocabulary assessed, whether the test was targeted at L1 or L2 participants, number of options (most researchers have opted for six-option (three response and three distractor) versions), and nature of the distractors. Much of the research with the WAF has been carried out in northern Europe, with scholars considering various characteristics of the format. For example, Beks (2001, cited in Greidanus et al., 2004) found that it did not matter much whether there was a fixed number of correct responses per item (e.g. three), or whether the number varied across items. Greidanus and Nienhuis (2001) explored distractor type (semantically-related versus semantically-non-related), association type (paradigmatic, syntagmatic, analytic), and frequency. The semantically-related distractors worked better for the researchers’ advanced learners, who also showed a preference for paradigmatic responses. As expected, the learners showed more knowledge on the test for more frequent words. Schoonen and Verhallen (2008) used a six-item version with participants aged 9–12 years. They found that the test appeared valid for use with their younger learners on the basis of IRT evidence, reliability of .75–.83, and concurrent correlations with a definition test of .82. They also found that their WAF could distinguish between students with previously-known differences in level of more advanced word knowledge.

One issue that has not been satisfactorily addressed to my knowledge is the problem of how to interpret ‘split’ WAF scores. If an examinee correctly chooses all associated options, this is good evidence that some advanced (receptive) knowledge of the target item is in place. Likewise, the inability to select any ‘correct’ associates indicates that little, if any, knowledge of the items exists. But how should a score of two associates and two distractors be interpreted (on an eight-option version), given that an examinee would be expected to select two correct associates simply by random guessing? As with most multiple choice formats, if examinees are not prone to guessing, then this issue is probably not a problem. But active guessing can make the results of the test difficult to interpret. Unfortunately, Read (1998) found that guessing played a role in examinee performance on the test. Most researchers simply take the number of correct associates selected as their score (e.g. Qian, 2002), but this practice is questionable if examinees are guessing. If so, then a better approach might be to accept only scores above ‘chance level’ (e.g. two associates), but this approach would require a validation study to confirm its efficacy. Another approach is to only count items as correct in which all appropriate options are marked, and none of the distractors (e.g. Schoonen and Verhallen, 2008).
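The scoring options just discussed can be compared in a short Python sketch. It is only an illustration under stated assumptions: the function names are invented here, the learner response is hypothetical, and the chance calculation assumes an examinee who picks exactly four of the eight options purely at random (a hypergeometric model), which real examinees may not do.

```python
from math import comb

def associates_score(selected, associates):
    """Number of correct associates chosen (the common scoring practice)."""
    return len(set(selected) & set(associates))

def all_or_nothing(selected, associates):
    """1 only if every associate and no distractor is chosen, else 0."""
    return int(set(selected) == set(associates))

def expected_by_chance(n_options, n_associates, n_picks):
    """Expected number of associates hit when picking options at random."""
    return n_picks * n_associates / n_options

def prob_at_least(k, n_options, n_associates, n_picks):
    """Hypergeometric probability of hitting at least k associates by chance."""
    total = comb(n_options, n_picks)
    hits = sum(
        comb(n_associates, j) * comb(n_options - n_associates, n_picks - j)
        for j in range(k, min(n_associates, n_picks) + 1)
    )
    return hits / total

# The 1998 item for 'sudden' from the text, with a hypothetical response.
associates = {"quick", "surprising", "change", "noise"}
answer = ["quick", "thirsty", "change", "doctor"]
print(associates_score(answer, associates))  # 2
print(all_or_nothing(answer, associates))    # 0
print(expected_by_chance(8, 4, 4))           # 2.0
print(round(prob_at_least(3, 8, 4, 4), 3))   # 0.243
```

Under this random-picking assumption, a score of two is exactly what guessing alone would be expected to produce, and even three correct associates would arise by chance in roughly a quarter of items, which is one way of seeing why split scores are hard to interpret.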

The amount of interest in the WAF gives weight to the notion that it is a viable depth-of-knowledge approach, and indeed there is a growing amount of validation evidence (e.g. Schoonen and Verhallen, 2008). Most studies have found it to be reliable and to provide useful information about vocabulary knowledge. Thus, the WAF would appear to have plenty of potential, and has an emerging track record of use. The downside (in addition to the uncertainty surrounding scoring and interpretation) is that there is no single ‘accepted’ test version which is available to be used for a wide variety of lexical research purposes. Rather, the format is available, but must be carefully adapted to individual research purposes, which entails becoming familiar with the previous attempts to use the format.

Test of English Derivatives (TED)

Given the importance of form in vocabulary acquisition, it can often be a reasonable word knowledge aspect to measure, and has been in several studies (e.g. Schmitt, 1998a; Webb, 2005, 2007a, 2007b). At the beginning stages of acquisition, it can be sensible to elicit demonstration of an item’s spelling or pronunciation. However, for lexis which is past the very beginning stages of acquisition, a more advanced facet of form to measure is knowledge of the different derivative forms within a word family. This can be informative because Schmitt and Zimmerman (2002) found that even relatively advanced learners (students studying in presessional courses preparing to enter English-medium universities) typically did not know the main derivatives of AWL target words. Their measurement instrument, Test of English Derivatives, illustrates how this type of knowledge can be tapped. The items in TED consist of four sentences with blanks for the participants to write in the appropriate derivative form of the target item.

1. philosophy

Noun       She explained her ——— of life to me.
Verb       She was known to ——— about her life.
Adjective  She was known as a ——— person.
Adverb     She discussed her life ———.

2. ethnic

Noun       The people in his neighborhood shared the same ———.
Verb       The neighborhood ———.
Adjective  The people lived in ——— neighborhoods.
Adverb     The neighborhoods were divided ———.





The test measures form-recall, and is productive to the extent that this recall is done in the context of sentences. The researchers did not want to rely on participants’ metalinguistic knowledge by framing the prompts in metalinguistic terms, e.g. by asking, ‘What is the noun form of ethnic?’, as Alderson, Clapham, and Steel (1997) found that even native speakers often lack this kind of grammatical metalinguistic knowledge. Rather, they drew on an idea from Nagy, Diakidoy, and Anderson (1993), presenting a series of four similar, contextualized sentences for each prompt word, to which participants could respond whether or not they had the respective metalinguistic knowledge. However, the word class information was also provided for participants who did possess this knowledge. The participants were instructed to write the appropriate derivative form of the target word in each blank and were informed that the prompt word could be the proper form without alteration.

The sentences were written to be similar semantically and to recycle as much vocabulary as possible. The vocabulary was drawn exclusively from the 2,000-word GSL (West, 1953), with the exception of a few other words of relatively high frequency. The sentences were mainly designed to constrain the possible derivatives for each sentence to one word class.

A key concern with this format is producing a list of derivatives which would be accepted as correct responses. In order to build this list, dictionaries, corpora, and native judgements were employed. Sometimes there was more than one possible answer for a word class (adjective: philosophical, philosophic), and in these cases, either possibility was accepted. There were also many cases where a particular word class did not have a typical derivative (e.g. *ethicize), and participants were instructed to fill the blank with an ‘X’ in these cases, to show positive knowledge that no such possibility existed. However, given the fact that native speakers are often creative with language, piloting showed that many natives sometimes ‘made up’ words which did not occur in any dictionary or corpus (verb: ?traditionalize), even while many other natives indicated that no derivative existed. Given this split opinion, decisions were made on a case-by-case basis, balancing the information from all three input sources. Given this potential fuzziness, any researcher developing new versions of this instrument will need to be careful about developing their answer list, and consider it an inventory of ‘typical’ derivative forms, rather than a list of ‘correctness’ in any absolute sense.
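One way to operationalize such an answer list is as a mapping from word class to a set of accepted forms, so that alternative derivatives are credited equally. The following Python sketch is only an illustration: the data structure, the function name, and the learner response are invented here, and apart from the two adjective forms mentioned above, the accepted forms are assumptions rather than the published TED key.

```python
# Illustrative answer key: each word class maps to a set of accepted forms.
# Only the two adjective forms are confirmed by the text; the rest are assumed.
ANSWER_KEY = {
    "philosophy": {
        "noun": {"philosophy"},
        "verb": {"philosophize", "philosophise"},
        "adjective": {"philosophical", "philosophic"},
        "adverb": {"philosophically"},
    },
}

def score_item(target, responses, key=ANSWER_KEY):
    """Return the number of word classes answered with an accepted form."""
    accepted = key[target]
    return sum(
        responses.get(word_class, "").strip().lower() in forms
        for word_class, forms in accepted.items()
    )

# Hypothetical learner response: the verb blank repeats the prompt word.
learner = {
    "noun": "philosophy",
    "verb": "philosophy",
    "adjective": "Philosophic",
    "adverb": "philosophically",
}
print(score_item("philosophy", learner))  # 3
```

A word class judged to have no typical derivative could be handled in the same structure by listing "x" as its only accepted form, so that the ‘X’ response is credited like any other answer once lowercased.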

Collocation measures

One of the most important types of ‘contextualized’ word knowledge is collocation, which makes it a good candidate for a depth-of-knowledge test. Most researchers of collocation have used strength of association measures (e.g. t-score, MI) to identify and analyze collocations in learner output (i.e. a comprehensive approach in Read’s (2000) terms) and these have already been discussed in Sections 3.2–3.4. However, a smaller number of researchers have explored the assessment of selected target collocations. Though there is not yet a test which can be put forward as an accepted approach, this section will look at the various test formats which have been used in the collocational studies.

One measurement method used to elicit productive collocation knowledge in early studies was translation. For example, Bahns and Eldaw (1993) used this format when testing German speakers on their knowledge of 15 English verb + noun collocations. Prompt sentences in German containing the translation equivalents of the target English collocations were given, with the instructions to translate these into English. In principle, this format would seem to be an ideal way to elicit a productive demonstration of target collocations, but only if participants choose to use the collocations (if known) in their translations. There are usually many ways of making a translation, and there is no guarantee that the translated sentences will contain the target collocations. There is also no way of knowing whether the absence of a collocation indicates a total lack of knowledge of the collocation, some knowledge but avoidance of it, or just that the sentence was translated in a way that the collocation was not required. These issues can be mitigated by careful writing and piloting of prompt sentences, but probably never fully eliminated.

Another early method to prompt the production of collocations was with cloze items. Farghal and Obiedat (1995) used this technique in their study of the collocation knowledge of Arabic-speaking students. The following example is their attempt to elicit the collocation weak tea:

I prefer ——— tea to strong tea.

As with translation, there is always the chance that participants will foil the test designer’s intentions, and fill in the blank with an acceptable answer not related to the target collocation, such as herbal in the above example. The addition of a translation of the key target collocation could help guide the participants towards the intended collocation, but this would only work if the L1 translation is not a word-for-word calque of the L2 version, otherwise the task would then become a simple translation of individual words rather than the collocation overall. For example, for the English collocation take a picture, the German translation ein Foto machen would be a good translation prompt, as a direct translation would be *make a photo. Thus, in order to produce the appropriate English equivalent, collocation knowledge is required, rather than simply the ability to translate the German prompt word-for-word.

Also, items like the above example only require production of one element of a collocational pair. It may be possible to have cloze blanks for the entire collocation, but the relative lack of structure may make it difficult to constrain the possible answers down to the desired one. Again, a translation would help in this regard, and in principle, it may prove workable to even provide multiple translations for different L1s (for use with mixed L1 participant groups), as long as the direct translation problem is avoided. A researcher might also include initial letter(s) in the blanks to constrain choice.

I like to drink a ——— ——— in the morning in order to wake up. (French: café fort)

I need to ——— some ——— from my savings account into my checking account. (Czech: převod peněz; German: Geld überweisen; Swedish: överföra pengar)

(strong coffee, transfer money)

Unfortunately, early researchers using these approaches did not address measurement issues, and so their studies provide little guidance concerning either the formats’ validity or problems. Gyllstad (2005, 2007) highlights several limitations of these early studies: usually only a small number of items were tested (typically 10–20), which makes it difficult to draw any firm conclusions; selection of target collocations was usually made in an unsystematic way or not described at all; often no reliability values were reported for the test instruments, and few of the studies compared learners at different levels of formal instruction. Happily, research moves on, and some later researchers have been much better in describing the validity evidence for their instruments, and so we have a clearer idea of those instruments’ behavior.

One of these researchers is Bonk (2001), who explored three different collocation test formats. Two utilized sentence clozes, the first focusing on verb + object collocations with a gap for the verbs to be inserted:

Punk rockers dye their hair red and green because they want other people to ——— attention to them.

(pay)

The second focused on verb + preposition collocations, with a gap for prepositions to be inserted:

Many of the birds in the area were killed ——— by local hunters. (to exterminate)

(off)





The third test utilized multiple-choice items. The target collocations were verb phrases, with each item containing four option sentences. The test-takers’ task was to identify which of the four sentences did not contain a correct usage of the verb.

(a) Are the Johnsons throwing another party?
(b) She threw him the advertising concept to see if he liked it.
(c) The team from New Jersey was accused of throwing the game.
(d) The new information from the Singapore office threw the meeting into confusion.

(b)

After checking the three tests with ten native speakers, Bonk gave them to 98 mainly Asian EFL university students, along with a condensed TOEFL test to measure general language proficiency. He found that the students scored similarly on the three collocation tests (8.7/17, 8.8/17 and 7.8/16 respectively), which equates to about 50% of the maximum possible scores. A Kuder-Richardson-20 analysis showed that reliability for the three combined tests was .83, but also that the verb + preposition test was weak in this regard, with an unacceptably low figure of .47. Bonk also did classical item analyses, including item facility, item discrimination, and point-biserial coefficients. These showed that most of the items functioned and discriminated well. A Rasch (IRT) and Generalizability analysis showed that the three combined collocation tests worked reasonably well on the whole participant population, but that the verb + preposition test was relatively weak. Bonk found no instances of low proficiency (TOEFL) scores combined with high collocation scores, or vice versa. Overall, Bonk’s analyses suggest that his verb + object cloze and multiple choice formats are valid methods of assessing collocational knowledge, but that the verb + preposition format could not be recommended.
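For readers unfamiliar with the statistics mentioned, item facility and Kuder-Richardson-20 can be sketched as follows. The standard KR-20 formula is k/(k − 1) × (1 − Σpq / variance of total scores), where k is the number of items, p is the proportion answering an item correctly, and q = 1 − p. The Python below is a minimal illustration with an invented score matrix; it is not Bonk's data or analysis code, and it uses the population variance of the total scores.

```python
def item_facility(scores):
    """Proportion of test-takers answering each item correctly."""
    n = len(scores)
    return [sum(row[i] for row in scores) / n for i in range(len(scores[0]))]

def kr20(scores):
    """Kuder-Richardson 20 for a matrix of 0/1 item scores (rows = people)."""
    n = len(scores)
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    # Population variance of the total scores.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    pq = sum(p * (1 - p) for p in item_facility(scores))
    return (k / (k - 1)) * (1 - pq / var_total)

# Hypothetical responses: five test-takers, four items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(item_facility(scores))   # [0.8, 0.6, 0.4, 0.2]
print(round(kr20(scores), 3))  # 0.8
```

With real data the function would fail if every test-taker obtained the same total score, since the total variance would then be zero; a production analysis would need to guard against that case.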

Mochizuki (2002) also used a multiple choice format, but without sentence contexts. His four-choice format listed one component of the target collocation and testees had to decide which of the four alternatives was the appropriate partner:

job (1) answer (2) find (3) lay (4) put

Mochizuki gave his collocation test as part of a vocabulary battery to 54 Japanese first-year university students in April and then in January (one academic school year in Japan), in which students received 75 hours of instruction (reading and conversation classes). Mochizuki found that although there was no significant improvement on the vocabulary size or paradigmatic knowledge tests in the battery, there was a significant gain in mean collocation score (41.7 → 42.8 (max = 72)). The collocation test had variable reliability, .54 at T1 and .70 at T2 (Cronbach alpha). However, the low reliability scores may be partially caused by the Japanese participant group being relatively homogeneous, as the lack of variance generally results in low internal reliability values (see Brown, 1983: 86). It might be noted that although the increase in collocation scores might be statistically significant in this case, in absolute terms, an improvement of 1.1 on a 72-point test (about 1.5% of the maximum score) is not particularly meaningful.
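The point about homogeneous groups depressing reliability can be demonstrated with a small Python sketch of Cronbach's alpha, using the standard formula k/(k − 1) × (1 − Σ item variances / variance of total scores). The score matrix is invented for illustration and has nothing to do with Mochizuki's data; the restricted group is simply the mid-range scorers drawn from the same hypothetical sample.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix (rows = people, columns = items)."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 0/1 scores for eight test-takers on four items.
full_group = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 0, 0],
    [1, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0],
]
# The same data restricted to the mid-range scorers (a more homogeneous group).
narrow_group = full_group[2:6]

print(round(cronbach_alpha(full_group), 2))    # 0.72
print(round(cronbach_alpha(narrow_group), 2))  # 0.33
```

The items and the response patterns are identical in both cases; only the spread of total scores differs, yet alpha falls sharply for the narrower group. This is why a low reliability figure from a homogeneous sample does not necessarily mean the test itself is poor.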

There has also been an attempt to use a developmental scale approach to assessing collocation knowledge. Barfield (2003) developed a four-stage scale, on which participants were asked to judge the frequency of the target collocations:

I. I don’t know this combination at all.
II. I think this is not a frequent combination.
III. I think this is a frequent combination.
IV. This is definitely a frequent combination.

Barfield’s test focused on decontextualized verb + noun collocations. He took 40 lexical verbs and found three noun collocates for each (e.g. break + ground, break + record, break + rules). Similar to the methodology in checklist tests, non-collocations were substituted for some of the verb + noun collocations to check on the validity of the responses (adopt + approach, adopt + child, *adopt + profit). Barfield gave the test (100 real collocations and 20 non-collocations) to 93 Japanese university students, after first confirming that they knew the nouns in the collocations (they did). The mean result for the collocation scale was 2.56 (SD .39), with real collocations scoring 2.65 (SD .47), and non-collocations 2.15 (SD .62). The reliability for the real collocations was high (.97, Cronbach alpha), as was the reliability for the non-collocations (.93).

Barfield studied two participant groups (high and low), and there was a significant difference between them on the real collocation scores, but not on the non-collocations. Thus, higher proficiency learners self-evaluated their knowledge of collocations more highly than lower proficiency learners, which suggests more advanced collocation knowledge. Conversely, both proficiency levels judged the non-collocations about the same, which suggests that both groups were similar in rejecting the non-collocations. (Note that many of these were rated at Stage 2 – I think this is not a frequent combination).

Barfield’s study is an interesting attempt to use a developmental scale with collocation knowledge, since it is clearly not an all-or-nothing proposition. However, one potential problem highlighted by Gyllstad (2005: 11) is that some of the non-collocations are possible in certain contexts, e.g. ?explain address, ?approve opportunity and ?create temperature are all possible combinations in the contexts of ‘to explain an address to someone’, ‘to approve of a job opportunity’, and ‘to create a temperature at which certain solid elements become liquid’. However, although these combinations may be possible in certain constrained contexts, compared to the other much more frequent collocations, they certainly would not be typical. Given the creativity of language users, typicality of language use is probably a more reasonable criterion than possibility of use.

Another issue is the scale itself. It essentially measures self-evaluation of whether a verb + noun combination is a frequent collocation or not. This only makes sense if the target collocations are in fact frequent. If the target combinations are collocations but somewhat infrequent ones (as high MI score collocations are likely to be), then Stage 2 is actually the most appropriate judgement. It is not clear from Barfield’s report whether the collocations are frequent or not. Another issue is whether the stages are equidistant and so can be considered an interval scale. I am not sure this could be established one way or the other, and so the use of means to summarize the results is dubious.

Gyllstad (2005, 2007) notes that most collocation studies focus on collocations made up of content words. However, research has shown that delexical verbs (make, take, do, give, have) occur frequently in English and are difficult even for advanced learners (Altenberg and Granger, 2001; Nesselhauf, 2004). For this reason, he focuses on collocations containing these in his two collocational test formats. In the format he calls COLLEX 5, testees must decide which verb + noun combination is a real collocation out of three options, and check the corresponding box:

     a                   b                   c
1.   do damage           make damage         run damage
2.   turn out a fire     put out a fire      set out a fire
3.   hold discussions    do discussions      set discussions
4.   receive a cold      fetch a cold        catch a cold
5.   press charges       run charges         push charges

The 50 items were created mainly from high frequency words (first 3,000), based on Kilgarriff’s frequency lists (see Section 6.4). This helped ensure that participants knew the component words of the collocations. Also, to the extent possible, the three verbs in an item were similar in frequency to each other. The targets were confirmed to be collocations by requiring a z-score above 3, although most of the collocations had scores far higher than this. Similarly, the distractor combinations were checked against the BNC to make sure they were not collocations. The validation of the test showed that there was a clear progression in the COLLEX 5 scores as proficiency increased, and that the test had good reliability at .89.





The COLLMATCH 3 format presents verb + noun combinations and requires informants to make yes/no decisions as to whether they are collocations or not:

1. press charges    2. lay pressure    3. bend a rule    4. score problems
   yes                 yes                yes               yes
   no                  no                 no                no

As with COLLEX 5, COLLMATCH 3 produced a clear progression across proficiency levels, and a reliability of .89. Through an extended series of piloting and validation, Gyllstad builds a considerable validation case for the COLLEX 5 and COLLMATCH 3 formats. Interestingly, as part of the validation, Gyllstad found that both tests correlated highly with the Vocabulary Levels Test (COLLEX 5: .88; COLLMATCH 3: .83), indicating a strong relationship between vocabulary size and receptive recognition knowledge of English collocations.

As part of his piloting, Gyllstad used item-whole correlations, even though knowledge of any particular collocation does not necessarily entail knowledge of other particular ones. According to my ‘item-whole test’ argument in Section 5.1.3, this is questionable. The approach may be more supportable if one can ensure that the target collocations are representative of collocations in general, but this requires a principled way to ensure representativeness. However, at the moment, I know of no way this can be accomplished, simply because our understanding of the domain of collocations (its extent and nature) is too limited.

It is interesting to look at two of the earlier test formats in Gyllstad’s test development. The original format used for COLLMATCH 1 was a grid, such as illustrated below:

Check each combination which you think exists in use in English.

        charges   patience   weight   hints   anchor   blood
drop
lose
shed

The number of real combinations was not indicated to testees, making the test relatively demanding, since the number of alternatives is large. After piloting, Gyllstad decided to scrap this version, mainly because most of the items were in fact non-collocations (93/144, 65%). This meant that the test mainly measured learners’ ability to reject non-collocations rather than their ability to recognize real collocations. This ratio is inherent in the grid design, as it was difficult to find objects that partnered with two or all three of the verbs. For example, in the 18-box grid above, there are only eight collocations (drop charges, drop hints, drop anchor, lose patience, lose weight, lose blood, shed weight, shed blood). This led to the revised COLLMATCH 2 format, where the number of real collocations could be controlled and a better ratio achieved (65 real collocations and 35 non-collocations). It presented five noun combinations with the target verb, and the participants’ task was to tick the collocations they thought existed in English, and to leave the boxes of the non-collocations blank:

a. draw the curtains
b. draw a sword
c. draw a favour
d. draw a breath
e. draw blood

Although this format had better test characteristics than COLLMATCH 1, Gyllstad decided to abandon it as well, mainly because it was hard to generalize its results to the wider domain of general collocational knowledge, as each of these extended items was based around the collocations of only a single verb.

Studies illustrating multiple word knowledge measures

Schmitt (1998a)

One of the first studies to concurrently measure a number of word knowledge aspects was Schmitt (1998a), where I measured advanced learners’ knowledge of spelling, word class/derivative form, meaning senses, and association. In fact, I also attempted to measure collocation, but was not all that happy with the measurement instrument (Schmitt, 1998b). I tried to devise word knowledge measures which captured the incremental nature of vocabulary learning. The test of spelling consisted of a four-point rating system. Zero (0) on the scale indicated that the participant demonstrated no knowledge of a word’s spelling. One (1) signified that the participant could give the initial letters of the target word, but omitted some later letters, added unnecessary letters, or transposed letters. Two (2) indicated that the word was phonologically correct, but perhaps some vowels or consonants were replaced by similar-sounding but erroneous items (brood – *brud; illuminate – *elluminate). Three (3) indicated fully correct spelling. This approach is similar to Barcroft (2002), who proposed a five-point scale.5

The association measurement procedure asked participants to give three responses for each target word stimulus. These responses were compared to a native norming list. In Category 0, none of the three responses matched any of those on the norming list, in which case, no native-like association behavior was demonstrated. In Category 1, some responses matched infrequent ones on the norming list, indicating a minimal amount of native-like association knowledge. In Category 2, the responses were similar to those typical of the native norming group, indicating native-like associations. Lastly, in Category 3, the responses were similar to those in the top half of the native norming group, indicating a native-like rating in which even more confidence could be put (see Schmitt, 1998c, for more detail).





The norms for the word class and derivational forms were obtained from three dictionaries. The participants received one point for knowing the word class of the target word, and one point for knowing how to transform it into each of the three other word classes. If a form for a word class did not exist, participants got credit for being able to state that fact. When two or more forms were possible for any word class, only one was required for credit. For example, participants were awarded one point for knowing that illuminate is a verb, and one point each for knowing illumination is the noun form, illuminated or illuminating (only one required) is the adjective form, and that no common adverb form exists. During the development of this measure, I noted that the norming data from the dictionaries sometimes conflicted with the native pilot participants’ answers, particularly for adverbials, with the dictionaries occasionally listing forms that the natives found strange. In these cases, I consulted the BNC to check those forms’ frequency of occurrence. If it was very low, I still accepted it as a possible form for that word class, but I also considered acceptable an answer that no form existed. For example, the very rare adverb form of circulate, circularly, is so uncommon that I also accepted the answer “No form exists.” Thus, the possible scores ranged from 0 (knowledge for no word class) up to 4 (knowledge for all four word classes).

Because this study attempted to describe vocabulary acquisition up to the level of full mastery, it was important to measure knowledge of all of the major meaning senses of the target words. (Knowing only a single meaning sense for a polysemous word must be considered only partial knowledge.) I consulted three dictionaries to determine the major meaning senses. For cases in which they disagreed, I made decisions based on the responses from both native and nonnative pilot participants and on corpus data. Whereas I only measured the other word knowledge types productively, it was both feasible and desirable to measure both receptive and productive knowledge of word meaning, because a major part of the incremental acquisition of word meaning probably involves the move from receptive to productive mastery of different meaning senses. I asked the participants to explain all of the meaning senses they knew for each target word. After the participant could not think of any additional senses, I gave prompt words designed to elicit additional senses that the participant might know but could not recall. The prompts were designed to trigger the related sense if the participant knew it, but not to give it away if it were unknown. For example, for the target word spur, the prompt word horse was designed to suggest the meaning, “metal device worn on the heel of a boot used to guide or encourage a horse”. Unprompted explanations of a meaning sense demonstrated meaning recall and were awarded 2 points. Acceptable explanations given after a prompt were assumed to be less well mastered (i.e. still meaning recall but with the aid of a semantically-related prompt) and received 1 point. If the participant could not describe the meaning sense after the prompt, that sense was scored ‘unknown’ (0 points). Because the target words had differing numbers of meaning senses, a way of comparing the different words was to calculate a meaning proportion by taking the participant’s total point score for each word and dividing that by the number of possible points (i.e. number of meaning senses × 2 points each). While convenient, this method was somewhat problematic in that a single proportion score could represent different constellations of meaning knowledge. For example, a meaning proportion of .50 could indicate knowing all meaning senses receptively, half productively, or some combination of the two.
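The meaning proportion calculation, and the ambiguity it carries, can be made concrete with a short sketch. The function name and the example score patterns are hypothetical; only the scoring rule (2 points unprompted, 1 point after a prompt, 0 points unknown, divided by the number of senses × 2) comes from the description above.

```python
def meaning_proportion(sense_scores):
    """Total points divided by possible points (number of senses x 2).

    sense_scores: one score per major meaning sense, where
    2 = explained unprompted, 1 = explained after a prompt, 0 = unknown.
    """
    return sum(sense_scores) / (2 * len(sense_scores))

# Three hypothetical participants on a word with four meaning senses.
all_prompted = [1, 1, 1, 1]     # every sense known, but only after a prompt
half_unprompted = [2, 2, 0, 0]  # two senses fully known, two unknown
mixed = [2, 1, 1, 0]            # a combination of the two patterns

for pattern in (all_prompted, half_unprompted, mixed):
    print(pattern, meaning_proportion(pattern))  # each prints 0.5
```

All three patterns collapse to the same proportion of .50, so anyone using this kind of score would probably want to keep the per-sense scores alongside the proportion.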

Webb (2005, 2007a, 2007b)

Stuart Webb has taken the notion of concurrent multi-componential vocabulary measurement and pushed it to a new level. In a series of studies, he used an extensive test battery measuring receptive and productive knowledge of orthography, the form-meaning link, grammatical functions, collocation, and association. The ten-part battery is illustrated below. As with Schmitt’s (1998a) battery, the individual tests were carefully sequenced to avoid the risk of earlier tests affecting answers to later tests. The target items were nonwords matched with the meanings of low-frequency vocabulary (e.g. dangy = boulder; masco = locomotive). As Webb employed the battery to explore the efficacy of various learning tasks, the use of nonwords was beneficial in ensuring that participants had no previous knowledge of the target words.

Test 1: Productive knowledge of orthography:

Participants heard each nonsense word pronounced twice, and then had ten seconds to write it down. Only fully correct spelling was marked as correct.

Test 2: Receptive knowledge of orthography:

The correct spelling of the target nonword was given along with three distractors with similar phonetic and orthographic forms.

(a) dengie (b) dengy (c) dungie (d) dangy

Test 3: Productive knowledge of meaning and form:

This is a form-recall test using translation. The L1 translation was given and the informants were required to write the L2 nonword beside it. As the ability to spell the nonword had already been measured in Test 1, the response was not marked down for slightly deviant spelling, as long as the nonword could be clearly discerned.





Test 4: Productive knowledge of grammatical functions:

Participants were asked to write a sentence using the target nonword. The instructions made clear that the sole criterion for correctness was grammatical accuracy in the usage of the nonword. For example, The girl mascoed to school would be marked as incorrect (verb form), while It is a masco (noun form) would be correct, regardless of the semantic quality of the sentence. Thus, this test essentially measured knowledge of word class and the attachment of inflections appropriate to that class.

Test 5: Productive knowledge of collocation:

It is interesting to note that Webb describes this test as a ‘syntax’ test in his papers. However, it is probably most accurately labelled an association test, but one which only allows for syntagmatic associations, which are essentially collocations. This can be compared to his ‘association’ tests (Tests 6 and 9) which focus solely on paradigmatic associations. Participants were required to produce one syntagmatic association to the nonword prompt. Webb reports that it was made clear to the participants that only a syntagmatic response would be accepted, but it is not reported how this was done in practice. It is also not clear how judgments concerning the informant responses were made. He reports using a ‘common sense’ approach in deciding whether the responses are typically encountered in contexts with the meaning attached to the nonword, e.g. locomotive (masco): station, tracks, left, arrived, or whether they are less frequently found in that context (clock, ate, hard). Moreover, it is not reported whether this was based on the researcher’s sole intuition, a panel of judges, corpus evidence, or a combination. However, the description in Webb (2007a) indicates that a second rater was employed with good inter-rater reliability, so this hints at individuals using their intuition. This point holds true for all of the association-based tests.

Test 6: Productive knowledge of association:

This used the same test format as Test 5, but focused on meaning-based paradigmatic associations. Therefore responses such as masco: train, airplane, vehicle would be scored as correct, but non-associations and syntagmatic associations would not.

Test 7: Receptive knowledge of grammatical functions:

This multiple choice test gave the nonword in three sentence contexts in which it had different word classes. The participants needed to choose the sentence where it was illustrated correctly. As the sentences were semantically barren, the only type of knowledge from which to make this judgement was grammatical. Thus knowledge that masco (locomotive) is a noun should lead to the selection of the correct answer (a).

(a) It is a masco.
(b) It mascoed.
(c) It is very masco.

Test 8: Receptive knowledge of collocation:

This is another multiple choice test, where informants need to circle the words which are most likely to appear in context with the target nonwords. Note that the options are all from the same word class. The first item is for dangy (boulder) and the second is for hodet (lane).

dangy (a) fall (b) wash (c) walk (d) catch
hodet (a) drive (b) sit (c) take (d) know

Test 9: Receptive knowledge of association:

This test is the same format as Test 8, but focuses on paradigmatic associations.

dangy (a) stone (b) plant (c) tree (d) person
hodet (a) park (b) highway (c) garden (d) building

Test 10: Receptive knowledge of meaning and form:

(2005, 2007a studies):
The last test is the counterpart to Test 3, but here the L2 word form is given, and the meaning must be recalled. To demonstrate this meaning, the corresponding L1 word must be written in the blank.

masco ———

(2007b study):
This study used a multiple choice meaning-recognition test. This is illustrated by the following example of ancon (hospital).

ancon (a) hospital (b) house (c) car (d) city

Several observations can be made about Webb’s test battery. The first is that it is much more comprehensive than any used before to explore pedagogical issues in second language vocabulary, which allowed him to describe vocabulary learning much more fully. His 2005 study showed that both the short reading and writing tasks he used led to more than just form-meaning links, enhancing all of the word knowledge aspects he measured. Also, it seemed that the productive task led to somewhat more learning for all word knowledge types, in terms of both receptive and productive mastery. In contrast, the 2007a study compared vocabulary learning from an L1 translation with learning from an L1 translation plus a short context sentence. He found that the addition of a single sentence context made no difference in the enhancement of any of the word knowledge aspects over the translation input alone. However, more exposures do lead to acquisition. Webb (2007b) exposed learners to nonwords in reading texts once, three times, seven times, and ten times. He found that each additional exposure band led to enhancement of at least one type of word knowledge, and usually many/most of them.

The second point is that although his individual tests are scored dichotomously (correct/incorrect), the fact that he uses both receptive and productive formats for each type of word knowledge means that he can show a progression from no receptive knowledge → receptive knowledge → productive knowledge. This is very useful, as he was able to show not only that different pedagogical tasks enhanced different types of word knowledge, but also that they did so to different degrees for the different word knowledge types. This much more detailed description gives a much clearer insight into the mechanics of vocabulary acquisition and what factors tend to enhance it and how.

Almost inevitably with such an ambitious measurement program, there are limitations to the measurement methodology. The most obvious is that only a very limited number of target items could be addressed (10–20). In none of the studies is it reported how long the entire battery took, but it must be assumed to be a considerable time, which presumably constrained the number of target items.

Summary

All of the tests in this section illustrate different ways of tapping into depth of knowledge. Although none of them are yet established as accepted standards, it seems obvious that they provide much richer information about informants’ lexical knowledge than typical form-meaning formats. This being the case, I feel that this approach definitely should be followed up and included to the extent possible in vocabulary research. While there will always be the issue of sampling rate vs. richness of measurement information elicited, many research questions require information on depth of knowledge to truly understand what is happening. Perhaps the best solution is to combine approaches, with some measures estimating the ‘quantity’ realm (e.g. size of lexicon), and others tapping into the ‘quality’ of the lexical knowledge within that realm. These combined measures could be contained within the same study, or if time is a constraint, then within consecutive studies, whose results can be linked for greater understanding.





5.4 Measuring automaticity/speed of processing

Most vocabulary research has been couched in terms of knowledge rather than in terms of how automatically that knowledge can be deployed (Section 2.11). This is partly because it is easier to measure knowledge than automaticity with paper-and-pencil tests. It is no doubt also due to the feeling that having knowledge is more important than how quickly it can be utilized. For example, many vocabulary studies have had pedagogical aims, where knowledge was considered the key attribute. Conversely, psycholinguistic research has commonly used measures of automaticity (e.g. reaction times), but in many cases the vocabulary involved was merely a convenient stimulus task, rather than the main focus of the research (e.g. the effects of the L1 lexicon on the L2 lexicon).

Regardless of this focus on knowledge (typically of form and meaning), automaticity is also a key factor in how well vocabulary can be used. This is obvious in the verbal skills, which are carried out on-line in real time. A person usually has one chance to catch the words an interlocutor has spoken; the mind cannot rewind and replay the stretch of discourse. This is unless one asks the interlocutor to repeat themselves, which is annoying if done too much, and should be considered part of discourse strategy competence. On-line processing deficits can be even more obvious in speech; if one does not have words at one’s disposal, the speech can become very disfluent indeed.

Speed of processing is also important in reading and writing. Recognition speed is essential in reading, as sight vocabulary is a key requirement for fluent reading (Carrell and Grabe, 2002). Van Gelderen, Schoonen, de Glopper, Hulstijn, Simis, Snellings, and Stevenson (2004) found evidence supporting this, in the form of substantial correlations between speed of processing and reading comprehension, both in the L1 and L2 (although speed did not add additional predictive power to a regression analysis which included vocabulary, grammar, and metacognitive knowledge). In the companion study to Van Gelderen et al. (2004), Schoonen, van Gelderen, de Glopper, Hulstijn, Simis, Snellings, and Stevenson (2003) found similar results for writing: speed of processing correlated with writing proficiency in both L1 and L2, but added no unique contribution in a regression analysis.

The importance of automaticity means it is an aspect of vocabulary mastery that should be given more attention in vocabulary research. Luckily, new technology makes it increasingly possible to measure this construct. Many vocabulary measurement tasks can now be done either on a computer or online on the internet. In either case, programs can be set up to measure the response times as informants provide their answers. Researchers devising new research which is computer/internet-based (and it is likely that this will increasingly become the norm) should definitely consider the possibility of adding a timing element to their design.
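To give a sense of how little is needed to add a timing element, here is a minimal sketch of a program that records both the correctness and the response time of yes/no judgements. Everything in it is an assumption made for illustration: the function name, the item format, and the use of console input. Console timing of this kind is far coarser than dedicated experiment software, so it should be read as a sketch of the idea rather than a usable instrument.

```python
import time

def run_timed_items(items, ask=input, clock=time.perf_counter):
    """Present each stimulus, collect a y/n answer, and time the response.

    items: list of (stimulus, expected_answer) pairs, expected_answer 'y' or 'n'.
    ask and clock can be replaced (e.g. by scripted stand-ins) for testing.
    """
    results = []
    for stimulus, expected in items:
        start = clock()                                   # stimulus onset
        answer = ask(f"{stimulus}  (y/n): ").strip().lower()
        rt_ms = (clock() - start) * 1000.0                # elapsed milliseconds
        results.append({"item": stimulus,
                        "correct": answer == expected,
                        "rt_ms": rt_ms})
    return results

# Interactive use (hypothetical items; uncomment to run at a console):
# results = run_timed_items([("press charges", "y"), ("score problems", "n")])
# for row in results:
#     print(row)
```

Passing the clock and the input function in as parameters is a design choice that makes the timing logic checkable with scripted answers before any participant sits down at the keyboard.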





Psycholinguistic tasks provide very precise timings, with responses measured to milliseconds (e.g. in priming studies). However, psycholinguistic experiments typically use very basic and controlled tasks, most often lexical decision, where only a simple decision needs to be made (e.g. Is the item a real word or not?) and a button pushed. Lexical decision tasks have been used in studies addressing a wide variety of lexical phenomena, but they are not the only measurement task possible. More ‘lifelike’ tasks which measure vocabulary in use could also be used (e.g. recalling a word to be used in an essay, using a target item in a sentence, recognizing target items in a passage). However, they are likely to be relatively ‘noisy’, as any differences in processing speed may be overshadowed by other factors, for example, differences in writing speed. Thus great care needs to be taken in setting up such studies to control for as much of this ‘external’ variation as possible. Still, many relatively realistic vocabulary tasks can be sensibly timed if the research design is developed with this in mind. Once the design is set, it is fairly easy to set up an automaticity measurement with a software program like E-Prime.

An example of this type of study is Siyanova and Schmitt (2008), who used a timed task to compare the speed of native versus nonnative judgements of collocation frequency. A total of 27 native and 27 advanced nonnative speakers judged the frequency of adjective-noun combinations, half of which were frequent collocations in the BNC (criminal offence), while the other half did not occur in the BNC, although they made sense and were possible combinations (exclusive crimes). Siyanova and Schmitt found that the nonnatives were much slower in making the judgements of both typical collocations (NNS: 2,813 milliseconds, NS: 1,945) and noncollocation combinations (NNS: 3,904, NS: 3,023). Based on this evidence, they concluded that, not only was the nonnatives’ collocational knowledge less accurate (based on another part of the study not reported here) than natives’, but that it was less automatic as well.

Another example of an automaticity task is self-timed reading, a technique for measuring the speed at which participants can read target lexical items in context. The target items are embedded in a context, which is then shown on a monitor screen one line/phrase/sentence at a time. Once this is read, the participant presses a button to bring up the next screen. The participants are instructed to read as quickly as they can, while still understanding the meaning. The reading times (between button pushes) can then be recorded. While mainly a technique to measure reading, it can be useful for measuring the speed at which longer formulaic sequences are read.

Just such a use is illustrated by Conklin and Schmitt (2008), who embedded formulaic sequences and matched non-formulaic control strings in story passages. There were three conditions: formulaic sequences where the context forced an idiomatic reading (a breath of fresh air = an interesting new situation), formulaic sequences with a literal meaning (a breath of fresh air = breathing nice air), and non-formulaic control strings which contained all/most of the words from the formulaic sequences, but in a different order (fresh breath of some air). These conditions are illustrated below in one of the study’s passages (italics = idiomatic, bold = literal, underlined = non-formulaic). (Note that this formatting is for the reader’s convenience; in the actual study, all of the text was rendered in the same regular font.)

Dave used to work in Japan.
He really liked his job there
but he hated riding the trains
because they were so overcrowded
with hundreds of people
crammed into the railway cars.
Every morning the people were
packed in like sardines
during the rush hour time.
Dave hated having people pushing
against him from every side so
would try to get a place
where he could stand with
his back against the wall
if there was a place there.
At least the wall was cooler than a
hot and sweaty body against his back.
In the summer the cars were always
hot and the open windows
hardly seemed to help at all.
On one extremely hot trip,
he felt like he was choking
and desperately needed a
fresh breath of some air
before he became ill.
He managed to hold out
until the next station and
staggered off of the train.
He decided then and there
that it was time to leave Japan
and escape the rush-hour craziness.
(Conklin and Schmitt, 2008: supplementary material)

Conklin and Schmitt found that their participants read both types of formulaic sequence more quickly than the matching non-formulaic control strings, but there was no difference between the idiomatic and literal meanings. They interpreted this as evidence that formulaic language is more easily processed than non-formulaic language. Their study demonstrates how self-timed methodology can be used to measure the processing of formulaic sequences in context.

Although I have used the terms automaticity and fluency as synonyms for relatively quick speed of processing, it is important to note that some scholars have made rather more precise distinctions. Lennon (2000) distinguishes between a lower-order fluency (essentially speed, e.g. rate of articulation) and a higher-order fluency which involves a broader proficiency: ‘the rapid, smooth, accurate, lucid and efficient translation of thought into language’ (p. 40). Likewise, automaticity has been more tightly defined. Segalowitz and Segalowitz (1993) and Segalowitz and Hulstijn (2005) point out that increased speed can accrue from two different sources. One is the simple speeding up of processes which a person already possesses. Another is through the development of new processes which allow quicker operations. They argue that this second route (which they term automaticity) is important because it represents some mental restructuring or reorganization which makes language processing more efficient (and for nonnatives, more native-like), compared to a general speeding up of existing processes. Segalowitz and Segalowitz (1993) propose a statistical method of distinguishing between these two sources of faster processing based around the coefficient of variation (CV). CV is defined as the standard deviation of the reaction times divided by the mean reaction time. The procedure is described in detail in Segalowitz and Segalowitz (1993), who use it in their studies, as do Segalowitz, Segalowitz, and Wood (1998), and Segalowitz, Poulsen, and Segalowitz (1999). However, it is probably best illustrated by the explanation in Segalowitz and Hulstijn (2005: 374–375):

Suppose a videotaped recording of a person making a cup of tea on 50 different occasions is viewed. Each component of the action – putting the water on to boil, pouring the hot water into a cup, inserting the tea bag, and so on – will take a particular length of time. A mean execution time and a standard deviation for this mean can be calculated across the 50 repetitions for both the global action of ‘making tea’ and for each of component of this event. Suppose now a new videotape is created by rerecording the original at twice the normal speed. On the new tape, the entire event will appear to be executed in half the time with half the original standard deviation overall; moreover, the mean duration of each component and the standard deviation associated with each component will also be reduced by exactly half. This corresponds to what Segalowitz and Segalowitz (1993) argued to be the null case of generalized speed up; performance becomes faster because the underlying component processes are executed more quickly and for no other reason ... Suppose now we are shown still another videotape in which the mean time for the global action of making tea is again half of the original mean time, but the standard deviation for the 50 repetitions is far less than half the original standard deviation. This tape cannot have been produced simply by rerecording the original at twice the normal speed. Instead, there must have been some change in the way the activity of making tea had been carried out, such that some of the slower and more variable components of the action sequence had been dropped or replaced by faster, less variable components. In other words, there must have been a change that involved more than simple speed-up, namely, some form of restructuring of the underlying processes.
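The arithmetic behind the tea-making analogy can be made concrete with a minimal sketch. It assumes CV is computed as the sample standard deviation divided by the mean (the choice of sample rather than population standard deviation is my assumption), and all reaction times are invented for illustration only.

```python
from statistics import mean, stdev
from typing import Sequence


def coefficient_of_variation(reaction_times: Sequence[float]) -> float:
    """CV = standard deviation of the reaction times / mean reaction time."""
    return stdev(reaction_times) / mean(reaction_times)


# Hypothetical reaction times in milliseconds (not data from any study).
baseline = [800.0, 900.0, 1000.0, 1100.0, 1200.0]

# Case 1: simple speed-up. Every RT is halved, so the mean and the SD are
# both halved and the CV stays the same.
speed_up = [rt / 2 for rt in baseline]

# Case 2: the mean is again half the baseline mean, but the SD has shrunk
# by far more than half, so the CV drops. This is the pattern the quotation
# attributes to restructuring.
restructured = [480.0, 490.0, 500.0, 510.0, 520.0]

for label, rts in [("baseline", baseline),
                   ("speed-up", speed_up),
                   ("restructured", restructured)]:
    print(f"{label:13s} mean={mean(rts):7.1f}  sd={stdev(rts):6.1f}  "
          f"cv={coefficient_of_variation(rts):.3f}")
```

On this reading, an unchanged CV alongside a faster mean suggests general speed-up, while a falling CV suggests something more than speed-up.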

This speeding up versus automaticity distinction may not be important if the purpose is to determine whether vocabulary is being processed more quickly and what teaching or input led to this speed increase. But it may well be important to researchers who are interested in understanding the mental lexicon and explaining the mechanisms underlying any increases in processing speed. However, Hulstijn, van Gelderen, and Schoonen (2009) wonder whether the distinction between speeding up and automaticity can be easily made in practice by means of the coefficient of variation. They reviewed seven previous studies using CV, then analyzed two of their own which were part of the L1 Dutch/L2 English NELSON project. Overall, they found minimal support for the proposition that CV reliably indicates developing automaticity. They feel that it is problematic to use the CV approach as an operationalization of automaticity when interpreting the reaction-time data typically used in automaticity studies, and that the holistic nature of language acquisition makes it difficult to differentiate between gains in knowledge itself and gains in the skill of processing that knowledge:

... gains in knowledge itself and gains in processing it cannot be adequately disentangled in the RT [reaction-time] tasks used in the studies reviewed and reported. This may not just be an unfortunate feature of the RT tasks but may be an inherent characteristic of language learning. Although conceptually skill acquisition can be distinguished from knowledge accumulation, in reality knowledge accumulation forms part of skill acquisition because, in real L2 learning, exposure to new words goes hand in hand to exposure of words encountered before. L2 learning is both a matter of knowledge accumulation and of an increase in the efficiency with which that knowledge can be processed in knowledge-access tasks (listening and reading) and in knowledge-retrieval tasks (speaking and writing). (2009: 576)





5.5 Measuring organization

We have examined measurement of vocabulary size, depth, and automaticity, but these have tended to focus on individual lexical items (with the principal exception being the WAF test format). However, Meara (1996b, 1996c) and Nation (2001) have noted that knowledge of individual items does not operate in isolation, but rather works in conjunction with knowledge of other lexical items. That is, the lexical connections between words (and the resulting organization of the lexicon) are also important to vocabulary usage. In fact, Meara and Wolter (2004) argue that the distinction between size and depth is somewhat unfortunate, as the real distinction which should be made is between size and organization. After all, they point out that size is not really about individual words, but rather about learners’ overall lexicons. Henriksen (1999) and Aitchison (2003) argue that two central processes of vocabulary acquisition are mapping (establishing and fine-tuning the form-meaning link) and network building (forming internal links between items in the mental lexicon). If this is right, then looking at lexical networks may be a useful way of considering the incremental acquisition of vocabulary, as network building can be seen as the outcome of elaborating the initial form-meaning link. Likewise, Meara (1996b) argues that size is the key lexical component for beginners, but as learners get a bigger lexicon, lexical organization becomes increasingly important, as better organization allows better access to the mental store of lexical items.

Quote 5.7 Lennon on fluency in L1 and L2 language processing

For proficient speakers, lower-level processes, such as phonological articulation, are likely to be highly automatized, as are much lexical accessing and syntactic and morphological processing. For less proficient speakers of a foreign language, these processes will be as yet imperfectly automatized and may require much time, effort, concentration, and monitoring, especially for those who have learned the foreign language in the classroom and have had little chance to use it communicatively. Thus, correspondingly little mental energy will be freed for higher-order processes, such as the conceptualization of a message, discourse planning, and the sociolinguistic skills of turn taking, involving the interlocutor in discourse, achieving rhetorical effect, and so on.

(2000: 28)

Quote 5.8 Henriksen on meaning versus network knowledge

Acquiring word meaning involves, as we have seen, two interrelated processes of (a) adding to the lexical store via a process of labelling and packaging (i.e. creating extensional links) and (b) reordering or changing the lexical store via a process of network building. There is a need for clarification in the research literature as to which process is being described, tested, and discussed. There has, in my view, been a tendency in L2 vocabulary research to focus on the first aspect (i.e. mapping meaning onto form) and to disregard the second aspect (i.e. network building).

(1999: 309)

This all suggests that lexical organization is worthy of research, and thus needs to be measured in some way. The main method of researching lexical organization is word associations. It is clear that association data provides insights into the organization of the mental lexicon (Section 2.4), but equally, it must be said that the data is often confusing and difficult to interpret. It is still unclear just how much associations can tell us about lexical organization (as well as lexical acquisition and processing), and it seems that this approach is still waiting for a breakthrough in methodology which can unlock its undoubted potential.

One reason why concrete conclusions have eluded many previous association studies is flawed methodology. Fitzpatrick (2006, 2007) highlights a number of recurrent problems which have led to difficult-to-interpret data. The first has to do with the lexical items used as stimuli. While target word selection is important for any vocabulary study (Section 4.5), it seems to be particularly crucial for association research, as responses are influenced or even determined by characteristics of the stimulus words. Prompt words from different word classes tend to elicit responses from similar or closely related word classes, with nouns prompting nouns, verbs prompting verbs, and adjectives prompting nouns, etc. Moreover, high-frequency stimuli tend to elicit more predictable responses (Meara, 1983), which is not particularly useful in studies set up to investigate differences between subjects. Many studies have used the 100 words from the Kent-Rosanoff association list (1910), simply because the response norms are already available. Unfortunately, the words on this list may not be suitable for L2 research, as they are of very high frequency, and are almost all adjectives or nouns, and so produce very similar responses in both the L1 and L2 (e.g. black→white; noir→blanc). This makes it difficult to decide whether an association is a direct response to the L2 prompt, or whether it is produced via translation into the L1 and back again. Furthermore, almost all the words in the list are from the highest frequency band, and so will be among the first words that a learner acquires in a second language. It is not clear whether the word association behavior for these basic L2 words is the same as or different from that for more advanced lower-frequency vocabulary, and it might be misleading to generalize from one to the other (Meara, 1983).

The way around this problem is to choose stimulus words which are less frequent, but are still known to the participants. It also makes sense to choose words which are matched to the participants. Albrechtsen, Haastrup, and Henriksen (2008) studied participants ranging from Grade 7 to university level, and needed to ensure the lower-proficiency Grade 7 students would know the prompt words. They used 24 concrete nouns drawn from six semantic fields which the younger informants were expected to be familiar with (people – child, body – stomach, animals – eagle, house – window, food – cheese, geography – mountain) and 24 adjectives (e.g. beautiful, hungry, blue, afraid). Fitzpatrick (2006) studied more advanced L2 students who had gained entrance to a British university, and so chose words from the Academic Word List. The AWL does not include the highest frequency words, and includes relatively few concrete nouns, which tend to produce predictable responses. If the participant pool is more specific, e.g. ESP students in a particular field, it may be worth considering using the technical vocabulary of that field as stimuli.

Another problem Fitzpatrick highlights is the categorization of association responses. Previous studies have tended to use three main categories: paradigmatic, syntagmatic, and clang. These categories do not account for all possible association types, as indicated by the frequent inclusion of an ‘other’ category for the (sometimes numerous) responses which did not fit comfortably into the main categories, or for which no decipherable link was obvious. Fitzpatrick also suggests that these categorizations are too broad. She points out that the paradigmatic category includes such diverse relationships as synonymy, hierarchical relationships, and quality associations (x is a quality of y), while the syntagmatic category contains collocations (xy and yx) and words that are part of longer formulaic sequences.

To address this problem, a finer-grained categorization system can be used. In addition to the paradigmatic, syntagmatic, and clang categories, Albrechtsen et al. (2008) discuss categorizing responses according to whether they are canonical or not, i.e. very common, primary, almost ‘standard’ responses such as black→white, eat→food, and house→home from Table 2.3 in Section 2.4. These associations are so strong that they are likely to have some important role in structuring the lexicon, and so distinguishing between canonical and less common associations may have value. Namei (2004) explored clang/syntagmatic/paradigmatic responses, but also looked at the word frequency of responses. She found that frequency had a relationship to the proficiency of the L1/bilingual informants, with more advanced informants tending to supply more low-frequency and abstract associations.

Fitzpatrick (2006) took the approach of developing a much more detailed system based on three sets of information. First, she looked at the categorizations from previous research. Second, she examined responses from previous studies and determined which categories were necessary to classify those responses. Third, she drew upon Nation’s (2001) word knowledge taxonomy to identify three main categories (meaning, position, and form) and 17 subcategories in total. The complete system is illustrated in Table 5.4.





Table 5.4 Fitzpatrick categories for word association responses (x = stimulus word, y = response word)

Category | Subcategory | Definition
Meaning-based associations | Defining synonym | x means the same as y
Meaning-based associations | Specific synonym | x can mean y in some specific contexts
Meaning-based associations | Hierarchical/lexical set relationship | x and y are in the same lexical set or are coordinates or have a meronymous or superordinate relationship
Meaning-based associations | Quality association | y is a quality of x or x is a quality of y
Meaning-based associations | Context association | y gives a conceptual context for x
Meaning-based associations | Conceptual association | x and y have some other conceptual link
Position-based associations | Consecutive xy collocation | y follows x directly, or with only an article between them (includes compounds)
Position-based associations | Consecutive yx collocation | y precedes x directly, or with only an article between them (includes compounds)
Position-based associations | Phrasal xy collocation | y follows x in a phrase but with a word (other than an article) or words between them
Position-based associations | Phrasal yx collocation | y precedes x in a phrase but with a word (other than an article) or words between them
Position-based associations | Different word class collocation | y collocates with x + affix
Form-based associations | Derivational affix difference | y is x plus or minus derivational affix
Form-based associations | Inflectional affix difference | y is x plus or minus inflectional affix
Form-based associations | Similar form only | y looks or sounds similar to x but has no clear meaning link
Form-based associations | Similar form association | y is an associate of a word with a similar form to x
Erratic associations | False cognate | y is related to a false cognate of x in the L1
Erratic associations | No link | y has no decipherable link to x

(Fitzpatrick, 2006: 131)





There are a number of other methodological issues in association research. One is using an adequate number of prompt words. Some studies have used only a handful of stimulus items, e.g. Kruse, Pankhurst, and Sharwood Smith (1987) used 12, while Ruke-Dravina (1971) used only four. Clearly this is not much of a sample to extrapolate from. Much better in this respect is Henriksen (2008), which used 48 stimulus words. We know that association responses vary according to word class, semantic category, word frequency, etc., and so it is necessary to have enough stimuli to smooth out the inevitable variation in order to obtain useful results. If it is only possible for practical reasons to use a limited number of prompts, it may be necessary to constrain the prompts to a narrowly focused set, e.g. only adjectives from a certain frequency range.

Another issue is how many responses to require from participants. Most studies have asked for a single response, and this makes sense if there are numerous stimuli, or if the study is interested in whether canonical responses are given. But it is also possible to ask for several responses (e.g. Schmitt, 1998c, asked for three). This may be more appropriate if a researcher is interested in depth of knowledge, as several appropriate responses give evidence of a greater breadth of lexical knowledge than a single appropriate response does, and because the simple ability to produce three responses at all will also discriminate between learners.

Asking for multiple responses can also be an expedient for gathering greater amounts of data from a limited number of stimuli. Indeed, some studies have asked informants to list as many associations as possible, with the total number produced being the variable of interest. However, if too many responses are asked for, there is the danger that the later responses in the chain will actually be associations of earlier responses rather than responses to the original prompt word (e.g. snow→cold, winter, ski, white, black).

Sometimes it is difficult to categorize a response in terms of its relationship to the prompt item. Fitzpatrick (2006: 125) gives the example of the stimulus partnership and response business. It could be a collocation (They have a business partnership) or it could be a synonymous response (Their partnership/business went bankrupt). She also gives the cautionary example of habit→red eyes, grass, big ears, where it was only the third response which gave the researchers the clue that the first two were not responses in line with a ‘drug habit’, but rather, that the informant mistook the prompt habit for rabbit! Fitzpatrick suggests conducting retrospective interviews with informants to confirm the links between prompts and responses. While this would clearly be too time-intensive if large numbers of participants were involved, a compromise solution would be to check the responses quickly after administration, and then go back only to the informants who produced responses which needed resolving. Alternatively, Henriksen (2008) added a think-aloud session which she used in the coding to help her determine the response types.





Word association data can be analyzed in a number of ways. Responses can be sorted into categories and then the relationships between categories explored, as in studies which compare paradigmatic and syntagmatic responses. Association responses can also be compared to norms of various types. Native data can be compared to existing norm lists, and L2 learner responses can be compared to native norms. Responses can also be analyzed according to how many other responses they relate to in a Graph Theory approach (see below).

A number of norm lists exist, although many are in out-of-print books and are difficult to access, e.g. Postman and Keppel (1970). Below are two internet sites and one book still in print.

• Edinburgh Association Thesaurus <http://www.eat.rl.ac.uk>
The Edinburgh Association Thesaurus is an interactive site which gives word association responses for a number of stimulus words. There is a space into which to type a stimulus word, and its responses are then produced. It is also possible to type in a response in order to see the various stimuli which produced it. About 8,400 stimulus words are available, each having been administered to 100 British university students (not the same students for every word). Thus each stimulus has a maximum of 100 responses. The full report of the collection procedure is available in Kiss, Armstrong, Milroy, and Piper (1973).

• The University of South Florida word association, rhyme, and word fragment norms (D.L. Nelson, C.L. McEvoy, and T.A. Schreiber, 1998) <http://w3.usf.edu/FreeAssociation/>
The website claims to be the largest database of English free associations ever collected in the United States. More than 6,000 participants produced nearly three-quarters of a million responses to 5,019 stimulus words. On average, 150 participants worked with sets of 100–120 words each. About three-quarters of the words are nouns (76%), with adjectives making up 13%, and verbs 7%. The site includes association norms, matrices showing the links between related sets of associations, and information on rhyme and assonance.

• Birkbeck Word Association Norms (Moss and Older, 1996)
This more recent book contains free association norms for over 2,000 words, collected from groups of 40–50 British English speakers between the ages of 17 and 45. There is also an index of stimulus words organized according to semantic category to aid selection of experimental materials.

Most available norms, including those above, are of native English speakers. These are usually used without question, but Fitzpatrick (2007) sounds a word of warning. She found that her 30 native informants varied hugely in their responses to AWL prompt words not only in terms of the responses themselves, but also in the category of association they produced (i.e. the categories in Table 5.4). Her finding of a lack of homogeneity among natives is congruent with earlier research by Rosenzweig (1961, 1964). He found that association responses from speakers of a language can vary according to education, and that responses differed somewhat between speakers of different languages. This suggests researchers need to be careful in how they determine ‘native-like’ responses. Unless a study uses very frequent stimulus words (which tend to have canonical responses), then native-like behavior is likely to consist of a wide range of responses.

An important part of association research is deciding on the elicitation methodology. Free association tasks are useful in that they require the production of association responses, without the ‘hints’ available in a receptive format. However, as informants can come up with any response, these tasks bring the difficulties of categorization and scoring seen above. On the other hand, the elicitation instruments are at least easy to design, being some variation of the following basic format:

Write the first word you think of when you see each of the following words. Do not think about your answer, but write the first word that comes into your mind.

1. available ———

Henriksen (2008) developed the following scoring system, based on the various developmental shifts in response behavior across proficiency levels: (1) a shift from form-related to meaning-based responses, (2) an increase in the number of canonical responses (based on her two norming groups), and (3) an increase in the number of low frequency responses. The scoring rubric is outlined in Table 5.5. Each response from a participant was rated according to this scale, and then the scores from all of that participant’s responses were summed. This total score was then divided by the number of responses given, which gave the response type score for the participant. Weaker participants tend to give the same response to a number of prompt words, and to minimize the effect of such repetition on her dataset, Henriksen adjusted for this by including a lexical variation score. It was calculated by dividing the number of lexical types an individual participant used by the number of semantically related responses given by that participant and multiplying this by 100. For example, in a 50 item association task, if the participant produced 45 response types and repeated five of them, then the calculation would be (45 ÷ 50) × 100 = 90.
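The two calculations just described can be sketched in a few lines. This is only an illustration under stated assumptions, not Henriksen’s own procedure: the function names are hypothetical, the ratings are assumed to have already been assigned by a human rater using the 0–5 scale in Table 5.5, and the list passed to the lexical variation function is assumed to contain only the participant’s semantically related responses.

```python
from typing import Sequence


def response_type_score(ratings: Sequence[int]) -> float:
    """Sum of the 0-5 ratings divided by the number of responses given."""
    if not ratings:
        raise ValueError("no responses to score")
    return sum(ratings) / len(ratings)


def lexical_variation_score(responses: Sequence[str]) -> float:
    """(number of distinct response types / number of responses) x 100.

    Assumption: `responses` holds only the semantically related responses;
    case and surrounding whitespace are ignored when counting types.
    """
    if not responses:
        raise ValueError("no responses to score")
    types = {response.strip().lower() for response in responses}
    return 100 * len(types) / len(responses)


# Hypothetical ratings for six responses from one participant.
ratings = [3, 4, 2, 0, 5, 4]
print(response_type_score(ratings))

# The worked example from the text: 50 responses, of which 45 are distinct
# types and five repeat an earlier response. The words are placeholders.
responses = [f"word{i}" for i in range(45)] + [f"word{i}" for i in range(5)]
print(lexical_variation_score(responses))
```

With the placeholder data, the second call reproduces the (45 ÷ 50) × 100 = 90 calculation given above.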

Receptive formats have the disadvantage of not requiring active production of association responses, but also have the considerable advantage of researcher control, where associations on an instrument can be selected and manipulated in a range of ways in order to research various aspects of association knowledge. This was seen in the Word Associates Format test (Section 5.3.2), where several types of association knowledge (paradigmatic, syntagmatic, analytic) were targeted. In addition to a free association task, Henriksen (2008) also used a receptive task to elicit information about the ability to discriminate between strong and weak association links. She selected the five most frequent responses to her target stimulus words from norming lists, and also five responses that were given by only one person on the norm lists (i.e. these responses represented potential, but clearly more peripheral, links in the lexical network). The task for the participants was to select the five strong associations.

Select the five words most strongly connected to the key word.

cold war water frost hand hot warm snow pain winter ice

Table 5.5 Scores awarded to different response types

Response type | Examples from L2 with the stimulus word ‘bread’ | Score
Inability to supply an L1 or L2 response (‘unqualified’) | brød (L1 translation); bread (repetition of stimulus) | 0
Form-related | red | 1
Chaining | table | 1
High frequency non-canonical, but semantically related | white, birds | 2
High frequency canonical | food, water | 3
Low frequency canonical | toast, loaf | 4
Low frequency non-canonical, but semantically related | grainy, flour | 5

(Henriksen, 2008: 50)

A different approach to analyzing associations is related to Graph Theory. Paul Meara has pursued this approach with a number of colleagues (e.g. Meara and Wolter, 2004; Wilks and Meara, 2002). Instead of looking at individual responses and their categories, they concentrate on the interconnectivity of the responses, i.e. network density: the relative number of association links for each stimulus word. They used a software program called V_Links, which presents 10–12 stimulus words, and the participant’s task is to decide whether there is a connection between these words. The words were chosen so that there are some obvious association pairs and some less obvious pairs. The participant is also asked to indicate the strength of any connection they choose on a four-point scale. The V_Links interface is illustrated in Figure 5.6. In this example, the participant has indicated links between quiet–morning, quiet–peace, quiet–sound, heavy–sound, rest–peace, and dream–bed. He is also in the process of indicating a link between sound and health. This methodology clearly discriminates between native and nonnative speakers, with one pilot study showing Japanese EFL learners indicating only about half of the links of native speakers (Meara and Wolter, 2004). They also found only a modest correlation between V_Links scores and vocabulary size, indicating that V_Links is measuring something separate from size. This supports the notion that organization is a viable independent construct of vocabulary knowledge in addition to vocabulary size.
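To show what counting links might look like in practice, the sketch below tallies the six completed links described for Figure 5.6. It is not how V_Links itself computes its score, which is not specified here. The word list contains only the words named in the text (a real screen shows 10–12), and the overall density measure, links made divided by possible word pairs, is one common graph-theoretic definition that I am assuming for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Link = Tuple[str, str]

# Words named in the description of Figure 5.6 (an incomplete list).
WORDS: List[str] = ["quiet", "morning", "peace", "sound", "heavy",
                    "rest", "dream", "bed", "health"]

# The six links the participant has completed; sound-health is still
# in progress, so it is left out.
LINKS: List[Link] = [("quiet", "morning"), ("quiet", "peace"),
                     ("quiet", "sound"), ("heavy", "sound"),
                     ("rest", "peace"), ("dream", "bed")]


def links_per_word(words: Iterable[str], links: Iterable[Link]) -> Dict[str, int]:
    """Count how many links each word takes part in (zero if none)."""
    counts = Counter({word: 0 for word in words})
    for first, second in links:
        counts[first] += 1
        counts[second] += 1
    return dict(counts)


def density(words: Iterable[str], links: Iterable[Link]) -> float:
    """Distinct links made divided by the number of possible word pairs."""
    n = len(set(words))
    possible = n * (n - 1) // 2
    made = {frozenset(link) for link in links}
    return len(made) / possible if possible else 0.0


for word, count in sorted(links_per_word(WORDS, LINKS).items(),
                          key=lambda item: -item[1]):
    print(f"{word:8s} {count}")
print(f"density = {density(WORDS, LINKS):.3f}")
```

A fuller version could also weight each link by the four-point strength rating, which this sketch ignores.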

Meara has developed a streamlined version of V_Links called V_Quint which presents five randomly-chosen high-frequency stimulus words and asks informants to find a single link between them. His research indicates that the ability to do this can be extrapolated to estimate the total number of links in the lexicon. V_Quint is available on-line with documentation at Meara’s website _lognostics <http://www.lognostics.co.uk>. Meara and his colleagues admit that much more research is required before we know how to use this type of approach to best effect, but the possibilities are certainly exciting. (See more about Meara’s website in Section 6.5. It also contains a free association test, where informants produce up to four responses each for 30 stimulus words, and receive a score directly after finishing the test.)

It is probably noticeable that this section contained less firm guidance on measurement methodology than other measurement sections. This is down to the fact that there is still not a consensus on the best way to use associations in language research, either in terms of how to run the methodology, or in what associations can tell us about the mental lexicon. Also, there is the major problem of finding a way of transforming patterns of association and shifts of patterns into a score which enables researchers to compare across learners and learner groups. Word associations have obvious potential, but the field awaits innovative techniques which can fully exploit this potential.

Quote 5.9 Meara and Wolter on vocabulary size and the lexicon

Vocabulary size is not a feature of individual words: rather it is a characteristic of the test taker’s entire vocabulary.

(2004: 87)






Figure 5.6 Screen shot from V_Links

(Meara and Wolter, 2004: 91)

5.6 Measuring attrition and degrees of residual lexical retention

Vocabulary acquisition is dynamic, and while we hope to measure improvement in vocabulary mastery, there will inevitably be attrition as well. However, just as vocabulary acquisition is incremental and multi-faceted, we might expect that vocabulary attrition would also be complex. Thus, attrition is not an all-or-nothing concept, but may affect various types of word knowledge differently. Likewise, it is useful to distinguish between the attrition of receptive and productive mastery of lexical items.

Researchers have explored the attrition of these various lexical aspects through a number of elicitation methodologies: oral production of monologues (e.g. Cohen, 1989), production of conversation (Tomiyama, 1999), response to visual stimuli (Hansen and Chen, 2001), recognition of written and spoken form (Weltens, 1989), and measurement of speed of recognition (Grendel, 1993).

Attrition concerns vocabulary researchers in both short-term and long-term guises. The first has been a recurrent theme throughout this book: knowledge of lexical items learned in a study will usually decay over time after the treatment, and so only delayed posttests give a true indication of durable learning. This is the main reason why delayed posttests are so important in vocabulary research (see Section 4.4).

Beyond this, short-term attrition is an important issue for SLA, and vocabulary in particular. The key issue is how long a memory trace from an exposure can endure, so that it can be subsequently built upon. If this period is exceeded, then the next exposure will merely be ‘starting over’ with no incremental gain. There is very little research to inform this question, although the answer should drive most of pedagogy, at least that concerning the earliest learning stages. For example, syllabuses should be designed so that vocabulary recycling occurs within the ‘retention period’. Another example is incidental learning from reading. A learner must read enough so that a new lexical item will be met again before its memory trace disappears. The length of the retention period will dictate the maximum number of pages which can be read before the item needs to occur again (for any particular reading rate, i.e. number of pages per day). Again, if this number of pages is exceeded, the acquisition of that item will suffer. The writers of graded readers would particularly benefit from this kind of information. It may well be that different kinds of exposure lead to stronger memory traces, with most current research showing explicit engagement outperforms incidental engagement (see Schmitt, 2008, for an overview). Thus, the retention period may vary in systematic ways. A related issue about which little is known is the number of exposures which are necessary to make vocabulary knowledge durable.
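The relationship between the retention period, reading rate, and the maximum gap between encounters reduces to simple arithmetic. The following is a minimal sketch in Python, assuming a retention period expressed in days and a constant reading rate; the function names and the figures used to call them are hypothetical illustrations, not empirical values (the true length of the retention period is exactly what remains unknown).

```python
def max_pages_between_encounters(retention_days: float, pages_per_day: float) -> float:
    """Upper bound on the pages that can be read before an item must recur.

    Both inputs are assumptions supplied by the researcher; no value for the
    retention period is established in the research discussed here.
    """
    if retention_days < 0 or pages_per_day < 0:
        raise ValueError("inputs must be non-negative")
    return retention_days * pages_per_day


def recurs_in_time(gap_in_pages: float, retention_days: float, pages_per_day: float) -> bool:
    """True if the next occurrence of an item falls inside the retention period."""
    return gap_in_pages <= max_pages_between_encounters(retention_days, pages_per_day)


# Hypothetical reader: 10 pages per day, assumed 7-day retention period.
print(max_pages_between_encounters(7, 10))
print(recurs_in_time(gap_in_pages=100, retention_days=7, pages_per_day=10))
```

A writer of graded readers could run the second function over the gaps between successive occurrences of each target item to flag those that recur too late for a given assumed reading rate.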

Long-term attrition and retention are also of interest to vocabulary researchers. Studies with this focus usually test people who learned a language previously in their life, but for whatever reason have not used it for a long time. Sometimes there is a record of the previous level of knowledge (e.g. length of study and grades received) and so a T1–long deactivation–T2 approach can be used. For example, Bahrick (1984) studied the loss of L2 Spanish and found that some vocabulary knowledge was retained for more than 50 years. Moreover, recognition was less affected by attrition than production. Overall, Bahrick’s cross-sectional data suggests that vocabulary knowledge declines exponentially for an initial period of three to six years after instruction, but then remains steady for several decades, although with an additional decline in middle age. However, it appears that learners who achieved relatively high levels of proficiency are more resistant to the initial attrition, and maintain a plateau before attrition begins (e.g. Hansen, Umeda, and McKinney, 2002).

Bahrick’s study focused on the form-meaning link, partially because there were no previous measurements of any other aspect of lexical knowledge, and so only a basic level of word knowledge could be assumed. This is a common problem, as any ‘previous knowledge’ indicators seldom include a precise specification of lexical knowledge, i.e. which lexical items informants knew and how well. However, there have also been some studies where learners have been intentionally tested at the end of their language studies with the express goal of exploring their long-term retention/attrition. In these cases, dedicated lexical measurements can be given. A good example of this is Grendel’s (1993) dissertation research (reported in Weltens and Grendel, 1993), which studied the automaticity of receptive orthographic and semantic knowledge. Orthographic knowledge was measured by a lexical decision task, where Dutch learners of French were asked to judge whether stimuli were words or not. The stimuli included French words (poivre, ‘pepper’), nonwords containing a high-frequency cluster (poible), nonwords containing a low-frequency cluster (poifle) and nonwords with unusual clusters (poizye). The nonwords with high-frequency clusters were expected to be recognized faster than those with low-frequency clusters, because they look more like real words. But if attrition set in, participants would become insensitive to the frequency of certain French vowel or consonant clusters in specific word positions, and so the high- and low-frequency cluster nonwords would eventually become indistinguishable for the subjects. The participants were measured at the end of their language instruction, then after two and four years of language disuse. The results showed that the speed difference between these two categories was maintained across both two and four years of disuse, indicating there was no attrition of the awareness of these phonotactic patterns.

The semantic test was similar, but used a priming paradigm with stimuli including words and nonwords. Half of the real French word targets were primed by semantically related words (doux–dur, ‘soft–hard’), and half of the words were primed by semantically unrelated words (genou–rue, ‘knee–street’). Semantic priming typically speeds recognition (see Section 2.11), and so one would expect words primed by semantically-related words to be recognized faster than those primed by unrelated words. If attrition occurred, it could be expected that this priming effect would decrease over time. However, the same result was obtained as in the orthographic part of the study, with the size of the priming effect remaining more or less the same over the four-year test period.





One of the most commonly used attrition methodologies is the ‘savings paradigm’ (de Bot and Stoessel, 2000). It highlights residual knowledge by comparing the relearning of old, previously-known items with the learning of new items. Informants are tested on a list of previously-known words from the disused language, and those forgotten are noted. The informants are asked to study these forgotten items, along with a number of new words which were not previously known. Sometimes instead of these new words, nonwords are used. If so, they are carefully matched with the previously-known words in terms of complexity, so that the learning burden is equivalent. The informants are then tested on both the previously-known words and new words/nonwords, and the percentage of ‘old’ words relearned is compared to the percentage of ‘new’ words/nonwords learned. If there is a higher percentage of old items than new items, then it is assumed that this is because some residual learning remained, facilitating the relearning. In other words, some learning effort is ‘saved’ by the residual learning. Using this methodology, there is no evidence for complete attrition, as there are always some savings effects (see Hansen et al., 2002, for an overview). This indicates that, once learned, vocabulary never completely disappears, but only becomes inactive. Thus, people relearning the vocabulary of a previously-known language, even after a very long time and with no apparent knowledge still evident, will enjoy a substantial advantage over people learning the vocabulary of the language for the first time.
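The comparison at the heart of the savings paradigm can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, the savings effect is expressed here as a simple percentage-point difference (published studies may quantify it differently), and the counts in the usage line are invented.

```python
def savings_effect(old_relearned: int, old_total: int,
                   new_learned: int, new_total: int) -> float:
    """Percentage-point advantage of relearned 'old' items over 'new' items.

    A positive value is read as evidence of residual knowledge; zero or a
    negative value shows no savings on this particular measure.
    """
    if old_total <= 0 or new_total <= 0:
        raise ValueError("each list must contain at least one item")
    pct_old = 100.0 * old_relearned / old_total
    pct_new = 100.0 * new_learned / new_total
    return pct_old - pct_new


# Invented data: 15 of 20 forgotten words relearned, 10 of 20 new words learned.
print(savings_effect(15, 20, 10, 20))  # 25.0
```

Because the two lists must impose an equivalent learning burden, the figure is only interpretable when the new words or nonwords have been matched to the old items as described above.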

However, this conclusion must be limited to words, as, to my knowledge, there has yet been no research on the long-term attrition and retention of formulaic language. Similarly, Meara (2004) comments that the attrition methodologies suffer from the fact that they work with individual lexical items, and do not take into account that lexicons are structured networks of knowledge (see Section 2.4). He feels that computerized simulations can prove illuminative, in that they can model what attrition of such networks might look like. See Section 2.10 for a more detailed discussion.

Quote 5.10 Hansen, Umeda, and McKinney on the ‘Matthew effect’ extending to vocabulary retention

... Stanovich’s (1986) insight from the reading research literature that ‘[t]he rich get richer’ also applies to the relearning of vocabulary. The larger the lexical network retained, the greater the chances of reactivating successful pathways to old words and the greater the chances of having the relevant infrastructure in which to integrate new words. Further language attrition studies, incorporating careful control of the original proficiency levels of individual attriters, will allow us to verify the aptness of extending Stanovich’s maxim to read, ‘[t]he rich get richer, and they stay richer.’

(2002: 672–673)




6 Example Research Projects

Although this research manual is mainly written with the more advanced researcher in mind, I am well aware that everyone has to start someplace. With this in mind, I offer the following ten research projects on which emerging researchers can develop and hone their skills. They have all been designed to be challenging, but still ‘do-able’, hopefully with enough background information to make both the goals and the required methodology clear. However, although the basic research designs are relatively straightforward, understanding the full implications of the results will require a more sophisticated mastery of the ideas presented throughout this volume.

On a practical note, I do not mention a minimum number of participants for the experimental studies. As usual, ‘more is better’, but if the main purpose of the study is to develop research expertise, even a small number of participants can provide enough data to gain experience with the research techniques. However, if a more rigorous approach using inferential statistics is desired, the rule of thumb seems to be that around 30 participants are required to achieve a normal distribution of the data.

Research Project 1: Estimating the vocabulary size of native and/or nonnative speakers of English

Goal

1. To obtain valid estimates of the vocabulary size of your native and/or nonnative participants

2. To interpret the native results in terms of previous estimates of native lexical size (and/or)

3. To interpret the nonnative results in terms of what language skills their lexical resources will support.




Methodology

This project is essentially a replication of Goulden, Nation, and Read (1990; also available at <http://applij.oxfordjournals.org/cgi/content/abstract/11/4/341>). The first step is to read the original article, and sections of this book concerning vocabulary size and size measurement, especially Sections 1.1.2 and 5.2. The article provides five versions of a checklist test, and detailed instructions of how to administer the tests. Given that responses to any version of a test will be variable to some extent, you should use at least two of the versions and average the results. The first two versions are provided below.

Version 1

 1 as          11 abstract      21 aviary           31 comeuppance      41 cupreous
 2 dog         12 eccentric     22 chasuble         32 downer           42 cutability
 3 editor      13 receptacle    23 ferrule          33 geisha           43 regurge
 4 shake       14 armadillo     24 liven            34 logistics        44 lifemanship
 5 pony        15 boost         25 parallelogram    35 panache          45 atropia
 6 immense     16 commissary    26 punkah           36 setout           46 sporophore
 7 butler      17 gentian       27 amice            37 cervicovaginal   47 hypomagnesia
 8 mare        18 lotus         28 chiton           38 abruption        48 cowsucker
 9 denounce    19 squeamish     29 roughy           39 kohl             49 oleaginous
10 borough     20 waffle        30 barf             40 acephalia        50 migrationist

Version 2

 1 bag          11 avalanche     21 bastinado           31 detente        41 gamp
 2 face         12 firmament     22 countermarch        32 draconic       42 paraprotein
 3 entire       13 shrew         23 furbish             33 glaucoma       43 heterophyllous
 4 approve      14 atrophy       24 meerschaum          34 morph          44 squirearch
 5 tap          15 broach        25 patroon             35 permutate      45 resorb
 6 jersey       16 con           26 regatta             36 thingamabob    46 goldenhair
 7 cavalry      17 halloo        27 asphyxiate          37 piss           47 axbreaker
 8 mortgage     18 marquise      28 curricle            38 brazenfaced    48 masonite
 9 homage       19 stationery    29 weta                39 loquat         49 hematoid
10 colleague    20 woodsman      30 bioenvironmental    40 anthelmintic   50 polybrid
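Averaging across test versions, as recommended above, can be sketched as follows. This is a minimal illustration, not the scoring procedure of the original article: the function name is hypothetical, and the scaling factor `words_per_item` is an assumption that must be replaced by the scoring rule given in Goulden, Nation, and Read's instructions. The value 500 and the scores in the usage line are placeholders only.

```python
from statistics import mean


def estimate_vocabulary_size(known_counts, words_per_item):
    """Average the items checked as known across versions, then scale.

    known_counts   -- one count per test version taken by the participant
    words_per_item -- how many words each test item is taken to represent
                      (an assumption here; use the article's own rule)
    """
    if not known_counts:
        raise ValueError("at least one test version is needed")
    return mean(known_counts) * words_per_item


# Placeholder data: 34 items known on Version 1, 36 on Version 2.
print(estimate_vocabulary_size([34, 36], words_per_item=500))
```

Keeping the per-version counts, rather than only the average, also makes it easy to examine how far the versions disagree for each participant.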

Questions to consider

1. Goulden, Nation, and Read (1990) report that their New Zealand university undergraduates averaged about 17,000 word families on these tests. How do your native-speaking participants compare? Why are their scores higher or lower than the New Zealand university students?

2. Section 1.1.2 outlines current thinking on how much vocabulary is required to use English in various ways. Based on these targets, what language abilities do your nonnative participants have the lexical resources to pursue? If you know your participants’ language proficiency goals, how much vocabulary do they have to learn in order to help achieve those goals?

3. The scores on the various test versions you give are likely to vary. What does the degree of variation tell you about the validity and reliability of the tests? That is, if the scores are quite different, does this reflect a weakness in the tests? How much of the variation between versions should be considered a normal reflection of people’s intrinsic variability in doing a series of tasks?

Research Project 2: Exploring word associations

Goal

To explore the nature of word associations across increasing levels of language proficiency, and to explore various word association evaluation techniques.

Methodology

Read the sections on word associations in this volume (2.4 and 5.5). Then select the stimulus items (words and/or phrases), considering the issues discussed in Section 4.5.

Decide how many responses you will require your participants to produce for each stimulus (usually between one and three). If you are interested in exploring how this variable affects the nature of the responses, ask your participants to produce X number of responses for some stimuli and Y number for other stimuli. Fix the stimuli on an instrument (either paper- or computer-based), with clear instructions of what the participants are to do. Administer the instruments to participants of three or more levels of language proficiency. Use native speakers for your highest level of proficiency.

Once collected, the responses will need to be evaluated. Run three separate analyses. First, use the traditional distinction between paradigmatic, syntagmatic and clang associations. Second, try Albrechtsen, Haastrup, and Henriksen’s (2008) notion of canonical associations. Third, use Fitzpatrick’s categorization system illustrated in Table 5.4.

The first comparison is between native and nonnative responses. How are the responses from your native and nonnative participants similar and how are they different? Another source of native responses is the Edinburgh Associative Thesaurus (EAT) <http://www.eat.rl.ac.uk>. Also compare your nonnative responses with those of the EAT.

It has been posited that association responses reflect improving lexical knowledge and organization as language proficiency advances. How do the responses vary according to the different levels of language proficiency of your participants?

Association responses typically have a great deal of commonality among native speakers, especially among the most frequent responses, but also show considerable variability among the less frequent responses. How do your native responses match up with those of the EAT?
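One simple way to compare a participant's responses with a set of native norms is to count how many of them fall among the most frequent norm responses for the same stimulus. The sketch below is one possible operationalization, not an established scoring method: the function names, the cut-off `n`, and the response lists are all hypothetical, and the norm data shown are invented rather than taken from the EAT.

```python
from collections import Counter


def top_responses(norm_responses, n=3):
    """Return the n most frequent responses in a list of norm responses."""
    counts = Counter(r.strip().lower() for r in norm_responses)
    return [word for word, _ in counts.most_common(n)]


def proportion_matching_norms(participant_responses, norm_responses, n=3):
    """Share of a participant's responses found among the top-n norm responses."""
    top = set(top_responses(norm_responses, n))
    cleaned = [r.strip().lower() for r in participant_responses]
    if not cleaned:
        return 0.0
    return sum(r in top for r in cleaned) / len(cleaned)


# Invented norm responses to a single stimulus word.
norms = ["white"] * 5 + ["dark"] * 3 + ["night"] * 2 + ["cat"]
print(top_responses(norms, n=2))
print(proportion_matching_norms(["White", "cat", "dark", "coal"], norms, n=2))
```

Varying `n` shows how sensitive the comparison is to the boundary between the highly shared responses and the more variable, less frequent ones.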


Questions to consider

1. How well do the association responses differentiate between levels of proficiency? Can they differentiate between larger differences in proficiency (beginner versus intermediate learner; native versus nonnative speaker)? Can they differentiate between closer levels of proficiency (lower intermediate versus higher intermediate learners)?

2. How closely do your native responses compare to those from the EAT? If they are different, how much of the difference might be caused by nationality or education level? (The EAT consists of responses from British university students.)

3. Which of the evaluation methods (paradigmatic/syntagmatic/clang; canonical; Fitzpatrick’s categories) worked best in showing the differences between proficiency levels?

4. How did the responses differ according to whether the stimuli were individual words versus phrases?

5. How did the nature of the responses vary depending on the number of responses requested for each stimulus word/phrase?

Research Project 3: Validate a vocabulary test with an interview approach

Goal

To determine the extent to which a target vocabulary test is producing a valid indication of the construct it is purporting to measure.






Methodology

Read Chapter 5 from this volume, and Schmitt, Schmitt, and Clapham (2001). Then choose a vocabulary test to analyze. Taking account of the issues brought up in the Chapter 5 discussion, decide what the test is purporting to measure. Consider whether the test focuses only on the form-meaning link, or whether it measures other types of word knowledge. Also, consider whether the test taps into a receptive or productive level of mastery. Compile a list of the lexical items from the test (or a sample thereof). For each item, write down the element(s) of lexical knowledge which satisfy the criteria of ‘knowing’ the items in this particular test. You will use this list in an interview (see Step 2 below) to judge whether the participants actually ‘know’ the items on the test. The Schmitt et al. article explains how we did this with the Vocabulary Levels Test. We judged the VLT to be a ‘form-recognition’ format (see Section 2.8), and so we focused on the form-meaning link. Our interview list looked something like the one below, and on it the raters indicated whether they thought the form-meaning link was known or not. Because the raters (my wife and I) were educated native speakers, it was not necessary to specify the definitions of the target words on the list. However, you may find it useful to explicitly spell out your knowledge criteria on the list to have it handy for reference during the interview.

Item            Knows meaning    Does not know
 1. birth       ______           ______
 2. choice      ______           ______
 3. cap         ______           ______
 4. attack      ______           ______
 5. cream       ______           ______
 6. adopt       ______           ______
 7. bake        ______           ______
 8. burst       ______           ______
 9. original    ______           ______
10. brave       ______           ______
...

You can do the rating yourself, but it is better to have two raters. In this way, if the raters agree (inter-rater reliability), you can be more confident of the results. For vocabulary tests, the inter-rater reliability should be at least .90, although .95+ is desirable.
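Agreement between two raters can be checked with a short script. The text above does not specify which reliability statistic is intended, so the sketch below computes two common ones, simple proportion of agreement and Cohen's kappa (which corrects for chance agreement); the choice of statistics, the function names, and the ratings are all assumptions for illustration.

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters gave the same judgement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
                   for c in categories)
    if expected == 1.0:  # both raters used a single identical category
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented judgements for ten items: True = 'knows meaning'.
a = [True, True, True, False, False, True, True, False, True, True]
b = [True, True, True, False, False, True, True, False, True, False]
print(percent_agreement(a, b))  # 0.9
print(round(cohens_kappa(a, b), 3))
```

Note that the two statistics can diverge considerably, so it is worth reporting which one was used when comparing a result against the .90 threshold.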

Next, find some participants for whom this test would be suitable. The test will be administered to individual participants in two steps:

Step 1: Allow the participant to take the test under conditions similar to those which would be in place in a normal testing situation (e.g. same amount of time, same instructions). Once the test is completed, take it from the participant, but do not look at it. (This is so that you do not become biased in the second step.)

Step 2: In this step you will interview the participant concerning their knowledge of the lexical items on the test using the list you have developed. Probe the participant until you (and your co-rater) are confident you can make a sound judgement concerning the participant’s knowledge of each lexical item.

Once you have finished the interview, compare the results from the test with the results from the interview. Probably the easiest way to do this is by creating a contingency table, such as Table 7 in Schmitt et al.
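A contingency table of this kind simply cross-tabulates, for each item, whether it was answered correctly on the test and whether the interview judged it to be known. The sketch below is an illustration only: it does not reproduce the layout of Table 7 in Schmitt et al., and the function names and data are hypothetical.

```python
from collections import Counter


def contingency_table(test_correct, interview_known):
    """Count items in each (test result, interview judgement) cell."""
    if len(test_correct) != len(interview_known):
        raise ValueError("both lists must cover the same items")
    return Counter(zip(test_correct, interview_known))


def proportion_consistent(table):
    """Share of items where test and interview point the same way."""
    total = sum(table.values())
    if total == 0:
        raise ValueError("the table is empty")
    return (table[(True, True)] + table[(False, False)]) / total


# Invented results for five items from one participant.
test = [True, True, False, False, True]
interview = [True, False, False, True, True]
table = contingency_table(test, interview)
print(table[(True, False)])  # correct on test, but not known in interview
print(proportion_consistent(table))
```

The two off-diagonal cells are the interesting ones: items correct on the test but not known in the interview suggest guessing, while the reverse suggests that the test format obscured real knowledge.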


Questions to consider

1. How closely do the test and interview results tally? What percentage of test items indicates the participants’ ‘true’ lexical knowledge, as indicated by the in-depth interviews?

2. Is there any pattern of test items which do and do not indicate ‘true’ knowledge? That is, are certain types of items inherently stronger or weaker than other types on this particular test?

3. Does the type of lexical item (e.g. higher frequency vocabulary, individual word versus formulaic sequence) make any difference in how well the test items function?

4. If you had two or more raters, how good was your inter-rater reliability?

5. Based on your results, how valid is the test?

Research Project 4: Create a technical vocabulary list

Goal

To create a list of technical vocabulary for a specialized field

Methodology

First, look at some of the word lists mentioned in Section 7.4 and their documentation, to get a feeling for what they look like, and how they were compiled. This project will follow some of the methodology used by Coxhead (2000) in creating the Academic Word List, so also read her article.

Decide which field you wish to compile a list for. For this small-scale project, choose a field that is relatively specific (e.g. instead of the general field of engineering, it is better to choose the narrower subfield of electrical engineering). Consider the target field, and decide on the main categories of inquiry within it. For example, in electrical engineering, the categories might include power, microelectronics, telecommunications, and computing. Then collect a sample of texts for each of those categories. The sample of texts should be as large as your time allows. You should also sample from as wide a range of texts as possible. For example, instead of including one or two whole books, it is better to sample numerous 2,000-word extracts from a wide range of books. Coxhead describes in detail how she sampled from a wide range of academic texts, and this serves as a good model. She was also careful to sample in a balanced manner, i.e. having the same amount and range of texts from each of her categories.

Scan your texts and extracts into an electronic format, and build a corpus of your specialist field. Then use a concordancing program like WordSmith (see Section 7.3) to compile a frequency-based list of all the words in the corpus. From this list, you will then need to extract the technical words. To do this, you first need to eliminate all of the general English words which are common to all fields. Coxhead did this by eliminating all the words on her initial list that also occurred in the General Service List. You can use the GSL, or use frequency lists from the BNC and eliminate the most frequent 2,000 BNC words from your list.

This should eliminate the high-frequency general English words from your list, and what remains should be the non-general words which still occur frequently in your specialized corpus. To further refine your list, check that the remaining words occur across a range of categories in your corpus, and also across a range of texts. (It is no good including a word which only occurs very frequently in a single text.) Again, Coxhead provides detailed advice on how to do this, although your thresholds (e.g. minimum number of texts a word appears in) will likely be much lower than hers. You will have to decide your own thresholds based on what is sensible for your own data. The words that remain after this further refinement will make up your final technical list.
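The filtering steps described above (frequency count, removal of general words, then a range check) can be sketched in Python for readers who prefer a script to a concordancing program. This is a simplified illustration rather than Coxhead's procedure: it counts individual word forms instead of word families, checks range across texts only (not categories), and the function name, thresholds, and sample texts are all hypothetical.

```python
import re
from collections import Counter


def technical_candidates(texts, general_words, min_freq=2, min_texts=2):
    """Return non-general words that are frequent and spread across texts.

    texts         -- mapping of text identifier to raw text
    general_words -- set of general-service words to exclude (lower case)
    """
    freq = Counter()        # total occurrences of each word form
    text_range = Counter()  # number of texts each word form appears in
    for text in texts.values():
        tokens = re.findall(r"[a-z]+", text.lower())
        freq.update(tokens)
        text_range.update(set(tokens))
    kept = (w for w in freq
            if w not in general_words
            and freq[w] >= min_freq
            and text_range[w] >= min_texts)
    return sorted(kept, key=lambda w: (-freq[w], w))


# Tiny invented corpus; a real one would hold many 2,000-word extracts.
corpus = {
    "a": "The circuit has a resistor. The resistor is small.",
    "b": "A resistor limits current in the circuit.",
    "c": "The transistor transistor transistor amplifies.",
}
general = {"the", "a", "has", "is", "small", "in", "limits"}
print(technical_candidates(corpus, general))  # ['resistor', 'circuit']
```

In this toy example "transistor" is frequent but is dropped because it occurs in only one text, which is exactly the situation the range check is meant to catch.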


Questions to consider

1. How does the final technical list look? Does it seem reasonable? Are there some words included which do not seem to fit? Are there some obvious missing words?

2. If a word list for your target field already exists, how similar is your word list to it?

3. If a dictionary for your target field exists, contrast it to your list. Most dictionaries are compiled by expert intuition, so how does the one in your field compare to your empirically-based list?

4. Was the list-building process straightforward, or did you find the decision-making difficult? For example, was it difficult to decide on the threshold levels which best produced a viable list of technical vocabulary?

5. What are the pedagogical implications of your list? Would it be useful in developing materials to teach the particular field? Would it be a useful reference for students of the field (or even experts)?

Research Project 5: Compare the effectiveness of different vocabulary teaching techniques

Goal

To compare the effectiveness of different vocabulary teaching techniques in terms of long-term acquisition.





Methodology

First, read Chapters 4 and 5 in this volume. Then think about your participant pool at this early stage, because this is a longitudinal study that will require extended cooperation from your participants, and is probably best suited to researchers who already teach an existing class of students, or who have access to one.

Next, think about which teaching techniques you wish to explore. Although you can compare any teaching techniques, there should be some rationale why the ones you choose can be logically compared. For example, it would make sense to compare incidental learning approaches (e.g. learning vocabulary from reading) with explicit approaches (e.g. learning vocabulary through dictionary lookup), or to compare the learning accruing from oral versus written input, or to compare different teaching techniques that are commonly used in your school system.

You then need to decide how you are going to define ‘learning’. This will involve consideration of (1) which word knowledge aspects will be required (just the form-meaning link, or some other aspects like derivation knowledge or collocation), and (2) the level of mastery required (usually conceptualized as receptive versus productive mastery). For this project, it is probably better to find an existing test format to use, rather than developing a new one, as this is time consuming and requires considerable expertise to do well. One way to find such formats is to look up research studies in journals and books which have made a similar comparison to that which you wish to make, and see what measurement instruments were used. You will probably have to adapt those instruments to your study (e.g. in terms of target lexical items, or level of difficulty), but this is much easier than starting from scratch.

The next step is to develop a list of target lexical items. The most ecologically sound items would be those which are useful for your students and would be taught anyway. However, you need to control for previous knowledge, and this is most easily achieved by using either low-frequency items, or nonwords. See Section 5.1.2 for a detailed discussion of pre-existing knowledge and how to address this issue.

Comparisons of methodologies are usually carried out in one of two ways. The first is to use the different techniques on the same group of students. In this case, the students act as their own controls, i.e. the participants are exactly the same for the different methodologies, and so have the same levels of proficiency, aptitude, motivation, etc. The second way is to use different groups of students, but then it is necessary to determine that they are essentially equivalent in learning ability (usually indicated by some proficiency measure, or in the case of lexical acquisition, a vocabulary size measure). (There are also statistical ways of equating groups, but they are too complex to explain here.)

Decide which approach makes the most sense for your research situation, and set up a research design where the competing methodologies are given an equal chance to succeed, often determined by having the same amount of classroom time. The study will look something like this:

Pretest → Treatment → Immediate posttest → Delayed posttest

Pretest: or potentially no test if low-frequency items or nonwords are used
Treatment: same amount of time and attention given to each method
Immediate posttest: optional, but shows whether treatment had an effect
Delayed posttest: shows durable learning

Delayed posttest scores – Pretest scores = acquisition gains
OR
No previous knowledge assumed, so delayed posttest scores = acquisition gains

The vocabulary gains from the different methods can then be compared. While this can be done by looking at descriptive statistics like means or medians, it is normal to check any differences for statistical significance. If two techniques are being compared, it is likely that some form of t-test or nonparametric equivalent will be appropriate. If three or more techniques are being compared, then some form of ANOVA or nonparametric equivalent may be required. This will require either knowledge of statistics, or access to someone who has this knowledge.
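For the design in which the same students experience both techniques, the gain calculation and a paired t statistic can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and scores are hypothetical, only the t statistic and its degrees of freedom are computed (the p-value must still be obtained from a table or a statistics package), and whether a t-test is appropriate at all depends on the distribution of your data.

```python
from math import sqrt
from statistics import mean, stdev


def acquisition_gains(pretest, delayed):
    """Per-student gain: delayed posttest score minus pretest score."""
    return [d - p for p, d in zip(pretest, delayed)]


def paired_t_statistic(gains_a, gains_b):
    """t statistic and degrees of freedom for paired gains from two methods."""
    diffs = [a - b for a, b in zip(gains_a, gains_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two paired observations are needed")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("no variation in the differences; t is undefined")
    return mean(diffs) / (sd / sqrt(n)), n - 1


# Invented scores for four students who experienced both methods.
gains_a = acquisition_gains(pretest=[0, 1, 0, 2], delayed=[5, 6, 4, 8])
gains_b = [3, 4, 2, 3]
t, df = paired_t_statistic(gains_a, gains_b)
print(gains_a, round(t, 3), df)
```

If different groups of students receive the different techniques, an independent-samples test is needed instead, and the pairing step above does not apply.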

This project is one of the more challenging, because it requires considerable expertise of research design and statistical analyses. However, it can also be one of the most rewarding, as it can potentially give tangible answers concerning the teaching methodologies which are more effective for the type of students you are involved with.


1. Was any teaching technique better than any other? If there was a difference in vocabulary gains, was it large enough to be statistically significant?

2. Your results come from your particular students and teaching situation, but would it be reasonable to argue that your results are also generalizable to other students and teaching situations? If so, which ones?

3. If you measured different aspects of word knowledge, or a different level of mastery, do you think the results would have been the same or different?

4. If you found little difference between the techniques, does this suggest that student factors (age, proficiency, motivation, etc.) might be more important to learning than the teaching techniques used?

5. Do your results suggest any implications for teaching change in your classroom or school system?

Research Project 6: Exploring formulaic language

Goal

To describe the larger patterning around collocations.





Methodology

This project explores lexical patterning through corpus analysis. Read Chapter 3 in this volume, with special attention to Section 3.5, and the sections dealing with corpora (1.1.4, 7.2, 7.3). You will be looking for variable expressions similar to the following described in Section 3.5:

Description

[ANIMATE OBJECT] think(s) nothing of [DOING SOME ACTIVITY WHICH IS SURPRISING, UNEXPECTED, OR UNUSUAL]

Usage

It is commonly used to express the meaning ‘someone/something habitually does something which we would not expect’.

Examples

Ron thinks nothing of writing a new book every six months.
She thinks nothing of going out at midnight to begin her night on the town.
The government thinks nothing of spending millions of the taxpayers’ dollars on worthless projects.

The first step in identifying and describing these expressions is accessing a corpus to query. Section 7.2 introduces an extensive range of corpus resources. For this project, it is probably best to use a corpus which represents general English, in order to find the phraseology which is common to English as it is most widely used. You will also need access to a concordancer, either one you can install on your own computer, like WordSmith, or one built into an internet site, like the search engine available to interrogate the corpora on the BYU corpus website (see Section 7.2.1).

Once you have accessed these resources, you begin by choosing a number of target words. You can choose either a ‘variety pack’ of words (e.g. words from different word classes, frequencies, imageabilities, etc.) to see if lexical patterning varies according to the words’ characteristics, or choose a set of words which are similar (e.g. adjectives within the first 500 frequency band) to see if the patterning has any similarities.

Use your concordancer to find the main collocations for the target word. Most words have some collocations which can be extracted through t-score, MI, or other corpus analysis. (But note that this varies, and some words seem to lack identifiable collocates. In this case, simply choose another word to analyze.) Choose the ‘strongest’ collocation and call up concordance lines for it. For most concordancers, you simply type in the two words of the collocation. This should bring up a number of concordance lines with the ‘core collocation’ highlighted in the middle of each. Some examples of concordance lines for think nothing include:

Shirley’s a gal who thinks nothing of spending $3,000 or more
He thinks nothing of doing business with the
Joe thinks nothing of his hurtful remark
has no money but thinks nothing of flying to Hawaii or
She thinks nothing of staying up till three o’clock
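Most concordancers calculate collocation strength for you, but it can help to see what the t-score and MI figures mentioned above are based on. The sketch below uses one common formulation of each measure, computed from raw corpus counts. The counts are invented, the function names are hypothetical, and individual tools may calculate the measures somewhat differently (for example, by adjusting for the size of the collocation window).

```python
from math import log2, sqrt

def expected_pairs(f_node, f_collocate, corpus_size):
    """Co-occurrences expected by chance if the two words were unrelated."""
    return f_node * f_collocate / corpus_size

def mutual_information(f_pair, f_node, f_collocate, corpus_size):
    """MI: log2 of observed over expected co-occurrences."""
    return log2(f_pair / expected_pairs(f_node, f_collocate, corpus_size))

def t_score(f_pair, f_node, f_collocate, corpus_size):
    """t-score: (observed - expected) divided by the square root of observed."""
    return (f_pair - expected_pairs(f_node, f_collocate, corpus_size)) / sqrt(f_pair)

# Invented counts: the pair occurs 50 times in a one-million-word corpus
counts = dict(f_pair=50, f_node=1000, f_collocate=2000, corpus_size=1_000_000)
mi = mutual_information(**counts)
ts = t_score(**counts)
print(f"MI = {mi:.2f}, t-score = {ts:.2f}")
```

Because MI divides by the expected frequency, it tends to favour pairings of infrequent words, while the t-score tends to favour frequent pairings. This is worth keeping in mind when you decide which collocation is the ‘strongest’.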

You should also be able to sort the words to the left and right of the core collocation in most concordancers. Sort to the left (or right), and try to discern any patterning. Things to look for include:

● Any words which consistently appear in a particular position. These might be fixed elements of a larger variable expression, such as of after thinks nothing.

● Any recurring meaning. Variable expressions are typically used to convey particular meanings, and these can only be discovered by a semantic analysis of the text surrounding the core collocation.

● Any recurring grammatical patterns. Do these help you in your semantic analysis?
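If you are curious about what the sorting function of a concordancer is doing, the following Python sketch shows the basic idea: find each occurrence of the core collocation, keep a few words of context on either side, and sort the lines by the nearest word to the right or left. It is a bare-bones illustration with hypothetical function names, not a substitute for a real concordancer; the sample text reuses the examples given earlier in this project.

```python
from collections import Counter

def concordance(tokens, core, width=4):
    """Return (left context, core, right context) for every hit of the core collocation."""
    n = len(core)
    hits = []
    for i in range(len(tokens) - n + 1):
        if [t.lower() for t in tokens[i:i + n]] == core:
            hits.append((tokens[max(0, i - width):i],
                         tokens[i:i + n],
                         tokens[i + n:i + n + width]))
    return hits

def sort_right(hits):
    """Sort by the words to the right, nearest word first."""
    return sorted(hits, key=lambda h: [w.lower() for w in h[2]])

def sort_left(hits):
    """Sort by the words to the left, nearest word first."""
    return sorted(hits, key=lambda h: [w.lower() for w in reversed(h[0])])

text = ("Ron thinks nothing of writing a new book every six months. "
        "She thinks nothing of going out at midnight. "
        "Joe thinks nothing of his hurtful remark.")
hits = concordance(text.split(), ["thinks", "nothing"])

# How often does each word fill the first slot to the right of the core?
first_right = Counter(h[2][0].lower() for h in hits if h[2])

for left, core, right in sort_right(hits):
    print(f"{' '.join(left):>30} [{' '.join(core)}] {' '.join(right)}")
print(first_right.most_common(3))
```

A word which fills the same slot in nearly every line (here, of) is a candidate fixed element. The semantic analysis of the open slots still has to be done by reading the lines.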

Now sort to the other side of the core collocation to find any patterning there.

If you have found a variable expression which forms around the core collocation, write a description of it. Include the fixed elements in italics, and write the semantic constraints for the open ‘slots’ in CAPITALS, as in the following example from Section 3.6:

SOMETHING/SOMEONE (be) bordered/bordering on AN UNDESIRABLE STATE (OFTEN OF MIND)

Do the same type of analysis with a few other collocations of the same target word. Are the variable expressions similar or different?

If you have access to a concordancer which does a ConCgram/kfNgram type of analysis, then also try this approach to open-slot pattern extraction. How do the results compare with your manual analysis?

Once you have finished your analysis of the first target word, move on to the other target words you have chosen and repeat the above analyses.


1. How much extended patterning did you find?

2. Were you able to describe any variable expressions for the core collocations you looked up?

3. Did some kinds of collocations have more patterning than others? For example, did grammatical patterning make a difference (e.g. adjective + noun versus verb + object)? Did MI-type collocations lead to more variable expressions than t-score-type collocations?

4. Are there any similarities or differences in the lexical patterning you found according to the characteristics of the initial target words?

5. Do different core collocations for a target word each have their own identifiable variable expressions?

6. Which approach seems better able to identify variable expressions: a manual analysis (like the one you carried out), or an automated one (like kfNgrams)?

Research Project 7: Exploring the vocabulary task types in language textbooks

Goal

To explore how language textbooks introduce and teach vocabulary.

Methodology

In this project, you will analyze a number of language textbooks, and consider what vocabulary is introduced, what word knowledge aspects are addressed, and how it is practised and recycled. As a start, read the background information on vocabulary knowledge in Section 1.1.5 and Chapter 2.

Next, find a number of textbooks you would like to analyze. You can choose textbooks which are similar in kind (e.g. integrated four-skill textbooks teaching general English to intermediate students) and see if their treatment of vocabulary is similar or not. Alternatively, you could choose textbooks for different proficiency levels, or which teach different types of English (e.g. business English, academic English), and see if the vocabulary introduced and treatments used vary according to level or type. For this project, analyzing between five and ten textbooks should be about right.

Look at each textbook and first make a list of all the vocabulary which is explicitly introduced. Do not include the lexical items which occur as part of general language usage; rather, focus on the vocabulary which is highlighted in some way. Some examples of this highlighting could include items occurring on word lists, items which are defined, items which are in bold format to attract attention, items linked to a picture which illustrates their meaning, and items in ‘vocabulary boxes’. There will be many fuzzy cases, and you will have to develop criteria for your selected textbooks to decide consistently whether certain items are explicitly highlighted or not.

Once you have decided on the target vocabulary, go back and explore how that vocabulary is treated. Some aspects you should consider include the following, but you will no doubt find other aspects you will wish to follow up:

● What word knowledge aspects are addressed in the exercises?

● Is there any logical progression in the way different word knowledge aspects are addressed?

● Is new vocabulary merely highlighted, or is it taught (i.e. is information given which will help the student understand the new items)?

● After vocabulary is initially introduced, is it recycled in any principled way? Does any recycling extend beyond the unit the item was introduced in?

● Is the authors’ intention to teach the lexical items to receptive or productive level of mastery? Or does this vary through the textbook?

If you have the resources, scan the books and then analyze them electronically with a concordancing package or with Lextutor (Section 7.3). If you are able to do this, the amount of repetition and recycling of lexical items will be calculated for you by the software. You will also be able to get an overview of the entire vocabulary content of the textbooks. You can then compare the frequency profiles of each textbook with another, which should provide useful insights into the relative difficulty of each book’s vocabulary content.
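To give a sense of the kind of recycling count such software produces, here is a small Python sketch which tallies how often each target item occurs in a scanned textbook and in which units. The unit texts and target words are invented, and the function names are hypothetical. Note that this simple version counts exact word forms only; it does not group inflections or word family members, which dedicated tools often do.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def recycling_profile(units, targets):
    """For each target word: total occurrences and the units in which it appears."""
    profile = {}
    for word in targets:
        per_unit = {name: tokenize(text).count(word) for name, text in units.items()}
        profile[word] = {
            "total": sum(per_unit.values()),
            "units": [name for name, n in per_unit.items() if n > 0],
        }
    return profile

# Invented textbook extracts, for illustration only
units = {
    "Unit 1": "The market sells fruit. The market opens early.",
    "Unit 2": "We bargain at the market.",
    "Unit 3": "Fruit is cheap today.",
}
profile = recycling_profile(units, ["market", "bargain", "fruit"])
for word, info in profile.items():
    print(word, info["total"], info["units"])
```

An item which appears in only one unit has not been recycled beyond the unit in which it was introduced, which bears directly on the recycling questions listed above.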

When reporting your findings, you will want to put the results in a table format to make it easier for your readers to compare across the different textbooks.


In addition to the questions outlined above, you may wish to consider the following:

1. Are the word knowledge aspects addressed appropriate for the level of student the textbook is written for?

2. Does it seem that there is enough information given and enough practice and recycling generated for the target items to reach the desired receptive or productive level of mastery?

3. What is the balance between individual words highlighted vs. formulaic sequences?

4. Are individual words taught by themselves, or as part of a word family?

5. Does it appear that the target lexical items are presented according to a rationale, which allows the principled selection of items according to level or type of English?

Research Project 8: Exploring the repetition of vocabulary in texts

Goal

To determine whether the repetition of vocabulary in texts is enough to support incidental vocabulary learning.

Methodology

Research suggests that it takes something like eight to ten exposures to establish the form-meaning link from incidental vocabulary learning. Graded readers are written to maximize vocabulary recycling both by limiting the range of vocabulary used, and by using the principle of repeating vocabulary whenever possible. Also, children’s reading books often follow these principles, as do teenage novels to a lesser extent. Conversely, adult authentic texts generally have less repetition. In all of these text types, high-frequency vocabulary will almost always be repeated more often than low-frequency vocabulary.

In this project, you will explore the repetition of vocabulary in several types of texts, and think about the implications of that repetition (or lack thereof) for incidental vocabulary learning for L2 learners. The background reading for this project includes Sections 1.1.6, 1.2, 2.5, and 2.8.

You will analyze four types of text:

● Graded readers meant for L2 learners (it would be interesting to look at several levels of graded reader in the analysis)

● Children’s books meant for L1 beginning readers

● Teenage novels or other literature meant for the developing L1 reader

● Authentic texts meant for adult L1 readers (e.g. newspapers, magazines, novels)

Collect materials from each of these categories. The texts designed for beginning readers (either in the L1 or L2) will generally be relatively short, and so a number of whole texts will be required. The teenage novels/literature and adult texts may be lengthy, and so it might be necessary to sample from the longer texts. However, many of these texts will be shorter (e.g. newspaper stories and magazine articles) and can be used in their entirety.

Scan the texts into an electronic format, and analyze them electronically with a concordancing package or with Lextutor (Section 7.3). These will give you frequency lists and show the amount of repetition and recycling of lexical items within the texts.

Compare the amount of repetition among the four text types, according to the questions below. Next, choose several target words which have been repeated to different degrees (i.e. some frequently repeated, some seldom repeated). How many words of text would a learner need to read in order to be exposed to a target word eight times? Determine (through interview or questionnaire) the number of L2 words the learners you are associated with read in a typical day. Alternatively, assume 1,000 words per day for the purposes of the project. Then calculate how many days of reading it would take to reach the eight-exposure threshold for each target word. Is the period ‘compact’ enough that learning can incrementally accrue, or is it so spread out that words will likely be forgotten between the exposure intervals?
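The calculation just described can be sketched in a few lines of Python. The version below finds how many running words a learner must read before meeting a target word for the eighth time, and converts that figure into days of reading at an assumed 1,000 words per day. The function names and the toy text are hypothetical, and the tokenization is deliberately crude (exact word forms only).

```python
import re
from math import ceil

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def words_until_nth_exposure(tokens, target, n=8):
    """Running words read up to and including the nth occurrence of target (None if never reached)."""
    seen = 0
    for position, token in enumerate(tokens, start=1):
        if token == target:
            seen += 1
            if seen == n:
                return position
    return None

def days_to_threshold(words_needed, words_per_day=1000):
    """Days of reading needed, rounding up to whole days."""
    return ceil(words_needed / words_per_day)

# Toy text in which 'dog' occurs once in every five words
tokens = tokenize("the old dog ran home " * 8)
needed = words_until_nth_exposure(tokens, "dog")
print(needed, days_to_threshold(needed))

# A word needing 12,500 running words for eight exposures, at 1,000 words per day
print(days_to_threshold(12500))
```

If the function returns None for a target word, the text simply does not contain eight occurrences, and the learner would need to read additional texts before the threshold could be reached.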


1. Is there more repetition for the lower-level texts than for the more advanced ones?

2. Do the texts at each level have similar amounts of repetition?

3. If you compared different levels of graded reader, does the amount of repetition vary according to level?

4. What kind of vocabulary is repeated more often, and what kind only occurs once or twice?

5. Given the amount of repetition you find, how suitable is each kind of text for supporting vocabulary learning by second-language learners?

6. How do the different text types vary in their overall frequency profiles? How do these profiles make the different text types suitable for L2 learners of different proficiencies?

7. Given the amount of reading L2 learners do, is there enough repetition to support incidental vocabulary learning? Or is the amount of reading required between exposures simply too great for learning to build up?

Research Project 9: Measuring incidental vocabulary learning from reading

Goal

To measure the amount of incidental vocabulary learning resulting from reading for pleasure.

Methodology

This project is a replication of the classic Clockwork Orange research design which was first used by Saragi, Nation, and Meister (1978). They had native English speakers read the novel A Clockwork Orange, which included Russian slang words called nadsat. Afterwards, they tested for knowledge of the nadsat words. Since none of the readers spoke Russian, or were exposed to it outside the novel, any knowledge of the words must have been acquired incidentally from reading the novel. This design was most recently applied to second-language learners by Pellicer Sánchez and Schmitt (in press). We used an easier novel (Things Fall Apart) which included words from the Nigerian language Ibo, and a more extensive vocabulary test battery which measured recognition of spelling, recall of word class, and both recognition and recall of meaning.

To prepare for this project, first read Section 1.2, and the Saragi et al. and the Pellicer Sánchez and Schmitt articles. In this project, you will use the novel Things Fall Apart and the methodology from Pellicer Sánchez and Schmitt. The article lists the 34 target words to use. It also gives examples of the formats for the spelling, word class, and meaning tests. Using these models, you will need to write tests for each word knowledge aspect for each target word.

Once you have the test battery prepared, give each of your participants a copy of Things Fall Apart to read. Tell them to read it as they normally would, to enjoy it, and that you will discuss it with them after they have finished. Do NOT tell them that you will be focusing on their vocabulary acquisition!





After they have read the novel, administer the vocabulary test battery. Because they will have no previous knowledge of, or exposure to, Ibo, any knowledge they have of the Ibo words must be considered learning gains from the novel.1


1. How much learning of spelling, word class, and meaning occurred incidentally from the pleasure reading?

2. Was there more learning of some word knowledge aspects than others?

3. Did the frequency of occurrence of the target words make any difference? That is, were more-frequently-repeated words learned better than words repeated less often?

4. How do your results compare to those reported by Pellicer Sánchez and Schmitt? If they are different, how much of the variation might have been caused by different participants and how much by different tests?

5. Based on your results, do you feel that reading novels for pleasure is a viable way to learn new vocabulary?

Research Project 10: Exploring the relationship between receptive and productive knowledge of vocabulary

Goal

To explore and compare the different levels of mastery of the form-meaning link.

Methodology

Receptive knowledge of vocabulary is usually thought to precede productive knowledge, and so receptive vocabulary sizes are usually larger than productive ones. However, this is complicated by several factors, as discussed in Section 2.8. Different word knowledge aspects are learned earlier or later, and so some may be at a productive level (e.g. spelling) while others may still be unknown or at a receptive level (e.g. collocation). Also, the degree of receptive and productive knowledge measured is highly dependent on the test instruments used.

In this project, you will compare the receptive and productive levels of mastery of a single word knowledge aspect: the form-meaning link. To prepare, first read Sections 1.1.8, 2.1, 2.2, 2.5, 2.8, 5.2.3, and 7.4, and Laufer and Goldstein (2004) and Laufer, Elder, Hill, and Congdon (2004). You will be using the test format originally developed by Laufer and Goldstein and used in their CATSS test. However, we will use my terminology to label the different test components and discuss the results.

Laufer and Goldstein insightfully realized that testing receptive versus productive knowledge of the form-meaning link involves two elements: which word knowledge aspect is required (form or meaning), and which degree of mastery is required (recognition or recall). This leads to four possible levels of mastery, which I have relabelled as follows:

1. Form recall: d——— hund
2. Meaning recall: dog h———
3. Form recognition: hund a. cat b. dog c. mouse d. bird
4. Meaning recognition: dog a. katze b. hund c. maus d. vogel
(L1 = German [hund]; L2 = English [dog])

You will use these four formats to develop a form-meaning test battery for your participants. Note that the test battery makes extensive use of translation. Therefore, this project is suitable only for researchers who (a) have a pool of participants with the same L1, and (b) are competent in both languages.

To develop the tests, first access a frequency list covering your L2 vocabulary. Sample from this list from the highest frequency levels to the lowest frequency level where you think your participants might know only a few words. Pilot this list with participants similar to those you will use in the study. Find the frequency band between where most learners know the words well and where most learners barely recognize the words. Within this band, sample words at a uniform rate (e.g. ten words per 1,000 frequency band). Once you have selected the target words, write four items for each, following the model of the items illustrated above. Of course, the number of target words you can put on the test will be constrained by the amount of time you have to administer the test battery.
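The uniform sampling step can be illustrated with a short Python sketch. It assumes you have a frequency-ranked word list (most frequent word first) and draws the same number of words at random from each 1,000-word band within the range you identified in piloting. The placeholder word list, the band numbers, and the function name are all hypothetical.

```python
import random

def sample_by_band(ranked_words, first_band, last_band, per_band=10, band_size=1000, seed=0):
    """Randomly draw per_band words from each frequency band (band 1 = ranks 1 to band_size)."""
    rng = random.Random(seed)  # fixed seed so the same sample can be reproduced
    sample = {}
    for band in range(first_band, last_band + 1):
        start = (band - 1) * band_size
        words = ranked_words[start:start + band_size]
        sample[band] = rng.sample(words, min(per_band, len(words)))
    return sample

# Placeholder list standing in for a real frequency list: word1 is the most frequent item
ranked = [f"word{rank}" for rank in range(1, 5001)]

# Suppose piloting suggested that the useful range runs from the 2,000 to the 4,000 band
targets = sample_by_band(ranked, first_band=2, last_band=4)
for band, words in targets.items():
    print(band, words[:3], "...")
```

Random selection within each band is only one option. You may prefer to screen the sampled words by hand, for example to remove cognates or proper nouns, before writing the test items.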

Administer the test battery to your participants. First, give all of the form recall items in frequency order, then all of the meaning recall items, then all of the form recognition items, and finally all of the meaning recognition items. Calculate the means for all of the form-meaning tests. If you have enough participants and the statistical expertise, run a statistical analysis on the results.
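The descriptive part of this analysis is simple enough to sketch in Python. The example below calculates the group mean for each of the four test formats, orders the formats from least to best known on that basis, and then checks whether each individual participant follows the group order. The scores are invented, and the function names are hypothetical; the order that emerges from your own data may well be different.

```python
from statistics import mean

FORMATS = ["form recall", "meaning recall", "form recognition", "meaning recognition"]

def format_means(scores):
    """Group mean for each test format; scores maps participant -> {format: number correct}."""
    return {f: mean(person[f] for person in scores.values()) for f in FORMATS}

def follows_order(person_scores, order):
    """True if this participant's scores never decrease along order (least-known format first)."""
    values = [person_scores[f] for f in order]
    return all(a <= b for a, b in zip(values, values[1:]))

# Invented scores (number correct on each test), for illustration only
scores = {
    "P1": {"form recall": 4, "meaning recall": 6, "form recognition": 8, "meaning recognition": 9},
    "P2": {"form recall": 5, "meaning recall": 5, "form recognition": 7, "meaning recognition": 10},
    "P3": {"form recall": 6, "meaning recall": 4, "form recognition": 9, "meaning recognition": 8},
}

means = format_means(scores)
group_order = sorted(FORMATS, key=means.get)  # least-known format first
individual_fit = {name: follows_order(person, group_order) for name, person in scores.items()}
print(means)
print(group_order)
print(individual_fit)
```

A participant flagged as not fitting the group order is not necessarily an anomaly; with small numbers of items per format, minor reversals can arise by chance.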


1. Is there a hierarchy of form-meaning knowledge? For example, is meaning recognition the best known and form recall the least known?

2. Laufer and Goldstein (2004) and Laufer et al. (2004) found a hierarchy in their studies. How does your hierarchy (if any) compare with theirs?

3. If you find a hierarchy, check and see if all of the individual participants follow it. Or do many of the participants produce a different order of vocabulary knowledge than the rest of the group?

4. Are there frequency bands which consistently map onto certain form-meaning levels of knowledge? For example, is there a frequency level where most of your participants have form-recognition knowledge, and a different one for meaning-recall?

5. Given your results, do you now feel that a typical meaning recognition multiple-choice test is an adequate measure of ‘knowing’ a word?




Part 4

Resources




7 Vocabulary Resources

7.1 Instruments

7.1.1 Vocabulary Levels Test

These two versions of the VLT were developed by Schmitt, Schmitt, and Clapham (2001), where the validation evidence is presented. See Read (2000) for discussion of the test, and Nation and Gu (2007) for additional information on how to use and interpret the results. The test is © Norbert Schmitt, but is freely available for research and pedagogical purposes, as long as they are non-commercial.

This is a vocabulary test. You must choose the right word to go with each meaning. Write the number of that word next to its meaning. Here is an example.

1 business
2 clock           ——— part of a house
3 horse           ——— animal with four legs
4 pencil          ——— something used for writing
5 shoe
6 wall

You answer it in the following way.

1 business
2 clock           6 part of a house
3 horse           3 animal with four legs
4 pencil          4 something used for writing
5 shoe
6 wall

Some words are in the test to make it more difficult. You do not have to find a meaning for these words. In the example above, these words are business, clock, and shoe.

If you have no idea about the meaning of a word, do not guess. But if you think you might know the meaning, then you should try to find the answer.

—————————




Version 1: The 2,000 word level

1 birth
2 dust            ——— game
3 operation       ——— winning
4 row             ——— being born
5 sport
6 victory

1 choice
2 crop            ——— heat
3 flesh           ——— meat
4 salary          ——— money paid regularly for doing a job
5 secret
6 temperature

1 cap
2 education       ——— teaching and learning
3 journey         ——— numbers to measure with
4 parent          ——— going to a far place
5 scale
6 trick

1 attack
2 charm           ——— gold and silver
3 lack            ——— pleasing quality
4 pen             ——— not having something
5 shadow
6 treasure

1 adopt
2 climb           ——— go up
3 examine         ——— look at closely
4 pour            ——— be on every side
5 satisfy
6 surround

1 bake
2 connect         ——— join together
3 inquire         ——— walk without purpose
4 limit           ——— keep within a certain size
5 recognize
6 wander

1 burst
2 concern         ——— break open
3 deliver         ——— make better
4 fold            ——— take something to someone
5 improve
6 urge

1 original
2 private         ——— first
3 royal           ——— not public
4 slow            ——— all added together
5 sorry
6 total




1 cream
2 factory         ——— part of milk
3 nail            ——— a lot of money
4 pupil           ——— person who is studying
5 sacrifice
6 wealth

1 brave
2 electric        ——— commonly done
3 firm            ——— wanting food
4 hungry          ——— having no fear
5 local
6 usual

Version 1: The 3,000 word level

1 belt
2 climate         ——— idea
3 executive       ——— inner surface of your hand
4 notion          ——— strip of leather worn around the waist
5 palm
6 victim

1 acid
2 bishop          ——— cold feeling
3 chill           ——— farm animal
4 ox              ——— organization or framework
5 ridge
6 structure

1 bench
2 charity         ——— long seat
3 jar             ——— help to the poor
4 mate            ——— part of a country
5 mirror
6 province

1 betray
2 dispose         ——— frighten
3 embrace         ——— say publicly
4 injure          ——— hurt seriously
5 proclaim
6 scare

1 encounter
2 illustrate      ——— meet
3 inspire         ——— beg for help
4 plead           ——— close completely
5 seal
6 shift

1 assist
2 bother          ——— help
3 condemn         ——— cut neatly
4 erect           ——— spin around quickly
5 trim
6 whirl




1 boot
2 device          ——— army officer
3 lieutenant      ——— a kind of stone
4 marble          ——— tube through which blood flows
5 phrase
6 vein

1 apartment
2 candle          ——— a place to live
3 draft           ——— chance of something happening
4 horror          ——— first rough form of something written
5 prospect
6 timber

1 annual
2 concealed       ——— wild
3 definite        ——— clear and certain
4 mental          ——— happening once a year
5 previous
6 savage

1 dim
2 junior          ——— strange
3 magnificent     ——— wonderful
4 maternal        ——— not clearly lit
5 odd
6 weary

Version 1: The 5,000 word level

1 balloon
2 federation      ——— bucket
3 novelty         ——— unusual interesting thing
4 pail            ——— rubber bag that is filled with air
5 veteran
6 ward

1 alcohol
2 apron           ——— stage of development
3 hip             ——— state of untidiness or dirtiness
4 lure            ——— cloth worn in front to protect your clothes
5 mess
6 phase

1 blend
2 devise          ——— mix together
3 hug             ——— plan or invent
4 lease           ——— hold tightly in your arms
5 plague
6 reject

1 abolish
2 drip            ——— bring to an end by law
3 insert          ——— guess about the future
4 predict         ——— calm or comfort someone
5 soothe
6 thrive




1 apparatus
2 compliment      ——— expression of admiration
3 ledge           ——— set of instruments or machinery
4 revenue         ——— money received by the Government
5 scrap
6 tile

1 bulb
2 document        ——— female horse
3 legion          ——— large group of soldiers or people
4 mare            ——— a paper that provides information
5 pulse
6 tub

1 concrete
2 era             ——— circular shape
3 fiber           ——— top of a mountain
4 loop            ——— a long period of time
5 plank
6 summit

1 bleed
2 collapse        ——— come before
3 precede         ——— fall down suddenly
4 reject          ——— move with quick steps and jumps
5 skip
6 tease

1 casual
2 desolate        ——— sweet-smelling
3 fragrant        ——— only one of its kind
4 radical         ——— good for your health
5 unique
6 wholesome

1 gloomy
2 gross           ——— empty
3 infinite        ——— dark or sad
4 limp            ——— without end
5 slim
6 vacant

Version 1: The 10,000 word level

1 antics
2 batch           ——— foolish behavior
3 connoisseur     ——— a group of things
4 foreboding      ——— person with a good knowledge of art or music
5 haunch
6 scaffold

1 acquiesce
2 bask            ——— to accept without protest
3 crease          ——— sit or lie enjoying warmth
4 demolish        ——— make a fold on cloth or paper
5 overhaul
6 rape




1 auspices
2 dregs           ——— confused mixture
3 hostage         ——— natural liquid present in the mouth
4 jumble          ——— worst and most useless parts of anything
5 saliva
6 truce

1 casualty
2 flurry          ——— someone killed or injured
3 froth           ——— being away from other people
4 revelry         ——— noisy and happy celebration
5 rut
6 seclusion

1 apparition
2 botany          ——— ghost
3 expulsion       ——— study of plants
4 insolence       ——— small pool of water
5 leash
6 puddle

1 arsenal
2 barracks        ——— happiness
3 deacon          ——— difficult situation
4 felicity        ——— minister in a church
5 predicament
6 spore

1 blaspheme
2 endorse         ——— slip or slide
3 nurture         ——— give care and food to
4 skid            ——— speak badly about God
5 squint
6 straggle

1 clinch
2 jot             ——— move very fast
3 mutilate        ——— injure or damage
4 smolder         ——— burn slowly without flame
5 topple
6 whiz

1 auxiliary
2 candid          ——— bad-tempered
3 luscious        ——— full of self-importance
4 morose          ——— helping, adding support
5 pallid
6 pompous

1 dubious
2 impudent        ——— rude
3 languid         ——— very ancient
4 motley          ——— of many different kinds
5 opaque
6 primeval




Version 1: Academic Vocabulary

1 benefit
2 labor
3 percent
4 principle
5 source
6 survey
_____ work
_____ part of 100
_____ general idea used to guide one’s actions

1 element
2 fund
3 layer
4 philosophy
5 proportion
6 technique
_____ money for a special purpose
_____ skilled way of doing something
_____ study of the meaning of life

1 consent
2 enforcement
3 investigation
4 parameter
5 sum
6 trend
_____ total
_____ agreement or permission
_____ trying to find information about something

1 decade
2 fee
3 file
4 incidence
5 perspective
6 topic
_____ 10 years
_____ subject of a discussion
_____ money paid for services

1 achieve
2 conceive
3 grant
4 link
5 modify
6 offset
_____ change
_____ connect together
_____ finish successfully

1 convert
2 design
3 exclude
4 facilitate
5 indicate
6 survive
_____ keep out
_____ stay alive
_____ change from one thing into another

1 anticipate
2 compile
3 convince
4 denote
5 manipulate
6 publish
_____ control something skillfully
_____ expect something will happen
_____ produce books and newspapers

1 equivalent
2 financial
3 forthcoming
4 primary
5 random
6 visual
_____ most important
_____ concerning sight
_____ concerning money




Version 1: Academic Vocabulary – Continued

1 colleague
2 erosion
3 format
4 inclination
5 panel
6 violation
_____ action against the law
_____ wearing away gradually
_____ shape or size of something

1 alternative
2 ambiguous
3 empirical
4 ethnic
5 mutual
6 ultimate
_____ last or most important
_____ something different that can be chosen
_____ concerning people from a certain nation

Version 2: The 2,000 word level

1 copy
2 event
3 motor
4 pity
5 profit
6 tip
_____ end or highest point
_____ this moves a car
_____ thing made to be like another

1 accident
2 debt
3 fortune
4 pride
5 roar
6 thread
_____ loud deep sound
_____ something you must pay
_____ having a high opinion of yourself

1 coffee
2 disease
3 justice
4 skirt
5 stage
6 wage
_____ money for work
_____ a piece of clothing
_____ using the law in the right way

1 admire
2 complain
3 fix
4 hire
5 introduce
6 stretch
_____ make wider or longer
_____ bring in for the first time
_____ have a high opinion of someone

1 arrange
2 develop
3 lean
4 owe
5 prefer
6 seize
_____ grow
_____ put in order
_____ like more than something else

1 blame
2 elect
3 jump
4 manufacture
5 melt
6 threaten
_____ make
_____ choose by voting
_____ become like water




1 clerk
2 frame
3 noise
4 respect
5 theater
6 wine
_____ a drink
_____ office worker
_____ unwanted sound

1 dozen
2 empire
3 gift
4 opportunity
5 relief
6 tax
_____ chance
_____ twelve
_____ money paid to the government

1 ancient
2 curious
3 difficult
4 entire
5 holy
6 social
_____ not easy
_____ very old
_____ related to God

1 bitter
2 independent
3 lovely
4 merry
5 popular
6 slight
_____ beautiful
_____ small
_____ liked by many people

Version 2: The 3,000 word level

1 bull
2 champion
3 dignity
4 hell
5 museum
6 solution
_____ formal and serious manner
_____ winner of a sporting event
_____ building where valuable objects are shown

1 blanket
2 contest
3 generation
4 merit
5 plot
6 vacation
_____ holiday
_____ good quality
_____ wool covering used on beds

1 abandon
2 dwell
3 oblige
4 pursue
5 quote
6 resolve
_____ live in a place
_____ follow in order to catch
_____ leave something permanently

1 assemble
2 attach
3 peer
4 quit
5 scream
6 toss
_____ look closely
_____ stop doing something
_____ cry out loudly in fear




Version 2: The 3,000 word level – Continued

1 comment
2 gown
3 import
4 nerve
5 pasture
6 tradition
_____ long formal dress
_____ goods from a foreign country
_____ part of the body which carries feeling

1 administration
2 angel
3 frost
4 herd
5 fort
6 pond
_____ group of animals
_____ spirit who serves God
_____ managing business and affairs

1 atmosphere
2 counsel
3 factor
4 hen
5 lawn
6 muscle
_____ advice
_____ a place covered with grass
_____ female chicken

1 drift
2 endure
3 grasp
4 knit
5 register
6 tumble
_____ suffer patiently
_____ join wool threads together
_____ hold firmly with your hands

1 brilliant
2 distinct
3 magic
4 naked
5 slender
6 stable
_____ thin
_____ steady
_____ without clothes

1 aware
2 blank
3 desperate
4 normal
5 striking
6 supreme
_____ usual
_____ best or most important
_____ knowing what is happening

Version 2: The 5,000 word level

1 analysis
2 curb
3 gravel
4 mortgage
5 scar
6 zeal
_____ eagerness
_____ loan to buy a house
_____ small stones mixed with sand

1 contemplate
2 extract
3 gamble
4 launch
5 provoke
6 revive
_____ think about deeply
_____ bring back to health
_____ make someone angry




1 cavalry
2 eve
3 ham
4 mound
5 steak
6 switch
_____ small hill
_____ day or night before a holiday
_____ soldiers who fight from horses

1 circus
2 jungle
3 nomination
4 sermon
5 stool
6 trumpet
_____ musical instrument
_____ seat without a back or arms
_____ speech given by a priest in a church

1 artillery
2 creed
3 hydrogen
4 maple
5 pork
6 streak
_____ a kind of tree
_____ system of belief
_____ large gun on wheels

1 chart
2 forge
3 mansion
4 outfit
5 sample
6 volunteer
_____ map
_____ large beautiful house
_____ place where metals are made and shaped

1 demonstrate
2 embarrass
3 heave
4 obscure
5 relax
6 shatter
_____ have a rest
_____ break suddenly into small pieces
_____ make someone feel shy or nervous

1 correspond
2 embroider
3 lurk
4 penetrate
5 prescribe
6 resent
_____ exchange letters
_____ hide and wait for someone
_____ feel angry about something

1 decent
2 frail
3 harsh
4 incredible
5 municipal
6 specific
_____ weak
_____ concerning a city
_____ difficult to believe

1 adequate
2 internal
3 mature
4 profound
5 solitary
6 tragic
_____ enough
_____ fully grown
_____ alone away from other things




Version 2: The 10,000 word level

1 alabaster
2 chandelier
3 dogma
4 keg
5 rasp
6 tentacle
_____ small barrel
_____ soft white stone
_____ tool for shaping wood

1 benevolence
2 convoy
3 lien
4 octave
5 stint
6 throttle
_____ kindness
_____ set of musical notes
_____ speed control for an engine

1 bourgeois
2 brocade
3 consonant
4 prelude
5 stupor
6 tier
_____ middle class people
_____ row or level of something
_____ cloth with a pattern or gold or silver threads

1 alcove
2 impetus
3 maggot
4 parole
5 salve
6 vicar
_____ priest
_____ release from prison early
_____ medicine to put on wounds

1 dissipate
2 flaunt
3 impede
4 loot
5 squirm
6 vie
_____ steal
_____ scatter or vanish
_____ twist the body about uncomfortably

1 contaminate
2 cringe
3 immerse
4 peek
5 relay
6 scrawl
_____ write carelessly
_____ move back because of fear
_____ put something under water

1 blurt
2 dabble
3 dent
4 pacify
5 strangle
6 swagger
_____ walk in a proud way
_____ kill by squeezing someone’s throat
_____ say suddenly without thinking

1 illicit
2 lewd
3 mammoth
4 slick
5 temporal
6 vindictive
_____ immense
_____ against the law
_____ wanting revenge




1 alkali
2 banter
3 coop
4 mosaic
5 stealth
6 viscount
_____ light joking talk
_____ a rank of British nobility
_____ picture made of small pieces of glass or stone

1 indolent
2 nocturnal
3 obsolete
4 torrid
5 translucent
6 wily
_____ lazy
_____ no longer used
_____ clever and tricky

Version 2: Academic Vocabulary

1 area
2 contract
3 definition
4 evidence
5 method
6 role
_____ written agreement
_____ way of doing something
_____ reason for believing something is or is not true

1 debate
2 exposure
3 integration
4 option
5 scheme
6 stability
_____ plan
_____ choice
_____ joining something into a whole

1 access
2 gender
3 implementation
4 license
5 orientation
6 psychology
_____ male or female
_____ study of the mind
_____ entrance or way in

1 alter
2 coincide
3 deny
4 devote
5 release
6 specify
_____ change
_____ say something is not true
_____ describe clearly and exactly

1 correspond
2 diminish
3 emerge
4 highlight
5 invoke
6 retain
_____ keep
_____ match or be in agreement with
_____ give special attention to something

1 bond
2 channel
3 estimate
4 identify
5 mediate
6 minimize
_____ make smaller
_____ guess the number or size of something
_____ recognizing and naming a person or thing




Version 2: Academic Vocabulary – Continued

1 accumulation
2 edition
3 guarantee
4 media
5 motivation
6 phenomenon
_____ collecting things over time
_____ promise to repair a broken product
_____ feeling a strong reason or need to do something

1 adult
2 exploitation
3 infrastructure
4 schedule
5 termination
6 vehicle
_____ end
_____ machine used to move people or goods
_____ list of things to do at certain times

1 explicit
2 final
3 negative
4 professional
5 rigid
6 sole
_____ last
_____ stiff
_____ meaning ‘no’ or ‘not’

1 abstract
2 adjacent
3 controversial
4 global
5 neutral
6 supplementary
_____ next to
_____ added to
_____ concerning the whole world




7.1.2 Vocabulary Size Test

The Vocabulary Size Test was developed to provide a reliable, accurate and comprehensive measure of a learner’s receptive vocabulary size from the first 1,000 to the fourteenth 1,000 word families of English. Each item in the test represents 100 word families. If a test-taker got every item correct, then it is assumed that that person knows the most frequent 14,000 word families of English. A test-taker’s score needs to be multiplied by 100 to get his/her total vocabulary size up to the fourteenth 1,000 word family level. See Beglar (2010) for validation information.
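The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the function and constant names are hypothetical, the figure of ten items per 1,000-word-family level is inferred from the test sections that follow, and no correction for wrong guesses is applied because the text gives no formula for one.

```python
# Minimal sketch of the scoring arithmetic; all names are illustrative assumptions.
ITEMS_PER_LEVEL = 10      # assumed: ten items per 1,000-word-family level
FAMILIES_PER_ITEM = 100   # stated in the text: each item represents 100 word families
LEVELS = 14               # first to fourteenth 1,000 word families


def estimate_vocabulary_size(correct_items: int) -> int:
    """Return the estimated receptive vocabulary size in word families."""
    max_items = ITEMS_PER_LEVEL * LEVELS
    if not 0 <= correct_items <= max_items:
        raise ValueError(f"correct_items must be between 0 and {max_items}")
    # The raw score is multiplied by 100 to give the total size estimate.
    return correct_items * FAMILIES_PER_ITEM


if __name__ == "__main__":
    # 87 correct items correspond to an estimate of 8,700 word families.
    print(estimate_vocabulary_size(87))
```

A perfect score of 140 items gives 14,000 word families, the ceiling of what the test can measure.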

Instructions: Choose the best meaning for each word. If you do not know the word at all, do not guess. Wrong guesses will be taken away from your correct answers. However, if you think you might know the meaning or part of it, then you should try to find that answer.

First 1,000

1. see: They saw it.
a. cut
b. waited for
c. looked at
d. started

2. time: They have a lot of time.
a. money
b. food
c. hours
d. friends

3. period: It was a difficult period.
a. question
b. time
c. thing to do
d. book

4. figure: Is this the right figure?
a. answer
b. place
c. time
d. number

5. poor: We are poor.
a. have no money
b. feel happy
c. are very interested
d. do not like to work hard

6. drive: He drives fast.
a. swims
b. learns
c. throws balls
d. uses a car

7. jump: She tried to jump.
a. lie on top of the water
b. get off the ground suddenly
c. stop the car at the edge of the road
d. move very fast

8. shoe: Where is your shoe?
a. the person who looks after you
b. the thing you keep your money in
c. the thing you use for writing
d. the thing you wear on your foot

9. standard: Her standards are very high.
a. the bits at the back under her shoes
b. the marks she gets in school
c. the money she asks for
d. the levels she reaches in everything

10. basis: This was used as the basis.
a. answer
b. place to take a rest
c. next step
d. main part

Second 1,000

1. maintain: Can they maintain it?
a. keep it as it is
b. make it larger
c. get a better one than it
d. get it

2. stone: He sat on a stone.
a. hard thing
b. kind of chair
c. soft thing on the floor
d. part of a tree

3. upset: I am upset.
a. tired
b. famous
c. rich
d. unhappy

4. drawer: The drawer was empty.
a. sliding box
b. place where cars are kept
c. cupboard to keep things cold
d. animal house

5. patience: He has no patience.
a. will not wait happily
b. has no free time
c. has no faith
d. does not know what is fair

6. nil: His mark for that question was nil.
a. very bad
b. nothing
c. very good
d. in the middle

7. pub: They went to the pub.
a. place where people drink and talk
b. place that looks after money
c. large building with many shops
d. building for swimming

8. circle: Make a circle.
a. rough picture
b. space with nothing in it
c. round shape
d. large hole

9. microphone: Please use the microphone.
a. machine for making food hot
b. machine that makes sounds louder
c. machine that makes things look bigger
d. small telephone that can be carried around

10. pro: He’s a pro.
a. someone who is employed to find out important secrets
b. a stupid person
c. someone who writes for a newspaper
d. someone who is paid for playing sport etc





Third 1,000

1. soldier: He is a soldier.
a. person in a business
b. student
c. person who uses metal
d. person in the army

2. restore: It has been restored.
a. said again
b. given to a different person
c. given a lower price
d. made like new again

3. jug: He was holding a jug.
a. a container for pouring liquids
b. an informal discussion
c. a soft cap
d. a weapon that explodes

4. scrub: He is scrubbing it.
a. cutting shallow lines into it
b. repairing it
c. rubbing it hard to clean it
d. drawing simple pictures of it

5. dinosaur: The children were pretending to be dinosaurs.
a. robbers who work at sea
b. very small creatures with human form but with wings
c. large creatures with wings that breathe fire
d. animals that lived an extremely long time ago

6. strap: He broke the strap.
a. promise
b. top cover
c. shallow dish for food
d. strip of material for holding things together

7. pave: It was paved.
a. prevented from going through
b. divided
c. given gold edges
d. covered with a hard surface

8. dash: They dashed over it.
a. moved quickly
b. moved slowly
c. fought
d. looked quickly

9. rove: He couldn’t stop roving.
a. getting drunk
b. traveling around
c. making a musical sound through closed lips
d. working hard

10. lonesome: He felt lonesome.
a. ungrateful
b. very tired
c. lonely
d. full of energy

Fourth 1,000

1. compound: They made a new compound.
a. agreement
b. thing made of two or more parts
c. group of people forming a business
d. guess based on past experience

2. latter: I agree with the latter.
a. man from the church
b. reason given
c. last one
d. answer

3. candid: Please be candid.
a. be careful
b. show sympathy
c. show fairness to both sides
d. say what you really think

4. tummy: Look at my tummy.
a. cloth to cover the head
b. stomach
c. small furry animal
d. thumb

5. quiz: We made a quiz.
a. thing to hold arrows
b. serious mistake
c. set of questions
d. box for birds to make nests in

6. input: We need more input.
a. information, power, etc. put into something
b. workers
c. artificial filling for a hole in wood
d. money

7. crab: Do you like crabs?
a. sea creatures that walk sideways
b. very thin small cakes
c. tight, hard collars
d. large black insects that sing at night

8. vocabulary: You will need more vocabulary.
a. words
b. skill
c. money
d. guns

9. remedy: We found a good remedy.
a. way to fix a problem
b. place to eat in public
c. way to prepare food
d. rule about numbers

10. allege: They alleged it.
a. claimed it without proof
b. stole the ideas for it from someone else
c. provided facts to prove it
d. argued against the facts that supported it

Fifth 1,000

1. deficit: The company had a large deficit.
a. spent a lot more money than it earned
b. went down a lot in value
c. had a plan for its spending that used a lot of money
d. had a lot of money stored in the bank

2. weep: He wept.
a. finished his course
b. cried
c. died
d. worried

3. nun: We saw a nun.
a. long thin creature that lives in the earth
b. terrible accident
c. woman following a strict religious life
d. unexplained bright light in the sky

4. haunt: The house is haunted.
a. full of ornaments
b. rented
c. empty
d. full of ghosts

5. compost: We need some compost.
a. strong support
b. help to feel better
c. hard stuff made of stones and sand stuck together
d. rotted plant material

6. cube: I need one more cube.
a. sharp thing used for joining things
b. solid square block
c. tall cup with no saucer
d. piece of stiff paper folded in half

7. miniature: It is a miniature.
a. a very small thing of its kind
b. an instrument for looking at very small objects
c. a very small living creature
d. a small line to join letters in handwriting

8. peel: Shall I peel it?
a. let it sit in water for a long time
b. take the skin off it
c. make it white
d. cut it into thin pieces

9. fracture: They found a fracture.
a. break
b. small piece
c. short coat
d. rare jewel

10. bacterium: They didn’t find a single bacterium.
a. small living thing causing disease
b. plant with red or orange flowers
c. animal that carries water in lumps on its back
d. thing that has been stolen and sold to a shop

Sixth 1,000

1. devious: Your plans are devious.
a. tricky
b. well-developed
c. not well thought out
d. more expensive than necessary

2. premier: The premier spoke for an hour.
a. person who works in a law court
b. university teacher
c. adventurer
d. head of the government

3. butler: They have a butler.
a. man servant
b. machine for cutting up trees
c. private teacher
d. cool dark room under the house

4. accessory: They gave us some accessories.
a. papers giving us the right to enter a country
b. official orders
c. ideas to choose between
d. extra pieces

5. threshold: They raised the threshold.
a. flag
b. point or line where something changes
c. roof inside a building
d. cost of borrowing money

6. thesis: She has completed her thesis.
a. long written report of study carried out for a university degree
b. talk given by a judge at the end of a trial
c. first year of employment after becoming a teacher
d. extended course of hospital treatment

7. strangle: He strangled her.
a. killed her by pressing her throat
b. gave her all the things she wanted
c. took her away by force
d. admired her greatly

8. cavalier: He treated her in a cavalier manner.
a. without care
b. politely
c. awkwardly
d. as a brother would

9. malign: His malign influence is still felt.
a. evil
b. good
c. very important
d. secret

10. veer: The car veered.
a. went suddenly in another direction
b. moved shakily
c. made a very loud noise
d. slid sideways without the wheels turning

Seventh 1,000

1. olive: We bought olives.
a. oily fruit
b. scented pink or red flowers
c. men’s clothes for swimming
d. tools for digging up weeds

2. quilt: They made a quilt.
a. statement about who should get their property when they die
b. firm agreement
c. thick warm cover for a bed
d. feather pen

3. stealth: They did it by stealth.
a. spending a large amount of money
b. hurting someone so much that they agreed to their demands
c. moving secretly with extreme care and quietness
d. taking no notice of problems they met

4. shudder: The boy shuddered.
a. spoke with a low voice
b. almost fell
c. shook
d. called out loudly

5. bristle: The bristles are too hard.
a. questions
b. short stiff hairs
c. folding beds
d. bottoms of the shoes

6. bloc: They have joined this bloc.
a. musical group
b. band of thieves
c. small group of soldiers who are sent ahead of others
d. group of countries with a common purpose

7. demography: This book is about demography.
a. the study of patterns of land use
b. the study of the use of pictures to show facts about numbers
c. the study of the movement of water
d. the study of population

8. gimmick: That’s a good gimmick.
a. thing for standing on to work high above the ground
b. small thing with pockets for holding money
c. attention-getting action or thing
d. clever plan or trick

9. azalea: This azalea is very pretty.
a. small tree with many flowers growing in groups
b. light material made from natural threads
c. long piece of material worn by women in India
d. sea shell shaped like a fan

10. yoghurt: This yoghurt is disgusting.
a. dark grey mud found at the bottom of rivers
b. unhealthy, open sore
c. thick, soured milk, often with sugar and flavouring
d. large purple fruit with soft flesh

Eighth 1,000

1. erratic: He was erratic.
a. without fault
b. very bad
c. very polite
d. unsteady

2. palette: He lost his palette.
a. basket for carrying fish
b. wish to eat food
c. young female companion
d. artist’s board for mixing paints

3. null: His influence was null.
a. had good results
b. was unhelpful
c. had no effect
d. was long-lasting

4. kindergarten: This is a good kindergarten.
a. activity that allows you to forget your worries
b. place of learning for children too young for school
c. strong, deep bag carried on the back
d. place where you may borrow books

5. eclipse: There was an eclipse.
a. a strong wind
b. a loud noise of something hitting the water
c. the killing of a large number of people
d. the sun hidden by a planet

6. marrow: This is the marrow.
a. symbol that brings good luck to a team
b. soft centre of a bone
c. control for guiding a plane
d. increase in salary

7. locust: There were hundreds of locusts.
a. insects with wings
b. unpaid helpers
c. people who do not eat meat
d. brightly coloured wild flowers

8. authentic: It is authentic.
a. real
b. very noisy
c. old
d. like a desert

9. cabaret: We saw the cabaret.
a. painting covering a whole wall
b. song and dance performance
c. small crawling insect
d. person who is half fish, half woman

10. mumble: He started to mumble.
a. think deeply
b. shake uncontrollably
c. stay further behind the others
d. speak in an unclear way

Ninth 1,000

1. hallmark: Does it have a hallmark?
a. stamp to show when it should be used by
b. stamp to show the quality
c. mark to show it is approved by the royal family
d. mark or stain to prevent copying

2. puritan: He is a puritan.
a. person who likes attention
b. person with strict morals
c. person with a moving home
d. person who keeps money and hates spending it

3. monologue: Now he has a monologue.
a. single piece of glass to hold over his eye to help him to see better
b. long turn at talking without being interrupted
c. position with all the power
d. picture made by joining letters together in interesting ways

4. weir: We looked at the weir.
a. person who behaves strangely
b. wet and muddy place with water plants
c. old metal musical instrument played by blowing
d. thing built across a river to control the water

5. whim: He had lots of whims.
a. old gold coins
b. female horses
c. strange ideas with no motive
d. sore red lumps

6. perturb: I was perturbed.
a. made to agree
b. worried
c. very puzzled
d. very wet

7. regent: They chose a regent.
a. an irresponsible person
b. a person to run a meeting for a short time
c. a ruler acting in place of the king or queen
d. a person to represent them

8. octopus: They saw an octopus.
a. a large bird that hunts at night
b. a ship that can go under water
c. a machine that flies by means of turning blades
d. a sea creature with eight legs

9. fen: The story is set in the fens.
a. a piece of low flat land partly covered by water
b. a piece of high, hilly land with few trees
c. a block of poor-quality houses in a city
d. a time long ago

10. lintel: He painted the lintel.
a. beam across the top of a door or window
b. small boat used for getting to land from a big boat
c. beautiful tree with spreading branches and green fruit
d. board which shows the scene in a theatre

Tenth 1,000

1. awe: They looked at the mountain with awe.
a. worry
b. interest
c. wonder
d. respect

2. peasantry: He did a lot for the peasantry.
a. local people
b. place of worship
c. businessmen’s club
d. poor farmers

3. egalitarian: This organization is very egalitarian.
a. does not provide much information about itself to the public
b. dislikes change
c. frequently asks a court of law for a judgement
d. treats everyone who works for it as if they are equal

4. mystique: He has lost his mystique.
a. his healthy body
b. the secret way he makes other people think he has special power or skill
c. the woman who has been his lover while he is married to someone else
d. the hair on his top lip

5. upbeat: I’m feeling really upbeat about it.
a. upset
b. good
c. hurt
d. confused

6. cranny: We found it in the cranny!
a. sale of unwanted objects
b. narrow opening
c. space for storing things under the roof of a house
d. large wooden box

7. pigtail: Does she have a pigtail?
a. a long rope of hair made by twisting bits together
b. a lot of cloth hanging behind a dress
c. a plant with pale pink flowers that hang down in short bunches
d. a lover

8. crowbar: He used a crowbar.
a. heavy iron pole with a curved end
b. false name
c. sharp tool for making holes in leather
d. light metal walking stick

9. ruck: He got hurt in the ruck.
a. hollow between the stomach and the top of the leg
b. noisy street fight
c. group of players gathered round the ball in some ball games
d. race across a field of snow

10. lectern: He stood at the lectern.
a. desk made to hold a book at a good height for reading
b. table or block used for church sacrifices
c. place where you buy drinks
d. very edge

Eleventh 1,000

1. excrete: This was excreted recently.
a. pushed or sent out
b. made clear
c. discovered by a science experiment
d. put on a list of illegal things

2. mussel: They bought mussels.
a. small glass balls for playing a game
b. shellfish
c. large purple fruits
d. pieces of soft paper to keep the clothes clean when eating

3. yoga: She has started yoga.
a. handwork done by knotting thread
b. a form of exercise for the body and mind
c. a game where a cork stuck with feathers is hit between two players
d. a type of dance from eastern countries

4. counterclaim: They made a counterclaim.
a. a demand made by one side in a law case to match the other side’s demand





b. a request for a shop to take back things with faults

c. an agreement between two companies to exchange work

d. a top cover for a bed

5. puma: They saw a puma.
a. small house made of mud bricks
b. tree from hot, dry countries
c. very strong wind that sucks up anything in its path
d. large wild cat

6. pallor: His pallor caused them concern.
a. his unusually high temperature
b. his lack of interest in anything
c. his group of friends
d. the paleness of his skin

7. aperitif: She had an aperitif.
a. a long chair for lying on with just one place to rest an arm
b. a private singing teacher
c. a large hat with tall feathers
d. a drink taken before a meal

8. hutch: Please clean the hutch.
a. thing with metal bars to keep dirt out of water pipes
b. space in the back of a car used for bags etc
c. round metal thing in the middle of a bicycle wheel
d. cage for small animals

9. emir: We saw the emir.
a. bird with two long curved tail feathers
b. woman who cares for other people’s children in Eastern countries
c. Middle Eastern chief with power in his own land
d. house made from blocks of ice

10. hessian: She bought some hessian.
a. oily pinkish fish
b. stuff that produces a happy state of mind
c. coarse cloth
d. strong-tasting root for flavouring food

Twelfth 1,000

1. haze: We looked through the haze.
a. small round window in a ship
b. unclear air
c. cover for a window made of strips of wood or plastic
d. list of names

2. spleen: His spleen was damaged.
a. knee bone
b. organ found near the stomach
c. pipe taking waste water from a house
d. respect for himself

3. soliloquy: That was an excellent soliloquy!
a. song for six people
b. short clever saying with a deep meaning
c. entertainment using lights and music
d. speech in the theatre by a character who is alone




4. reptile: She looked at the reptile.
a. old hand-written book
b. animal with cold blood and a hard outside
c. person who sells things by knocking on doors
d. picture made by sticking many small pieces of different colours together

5. alum: This contains alum.
a. a poisonous substance from a common plant
b. a soft material made of artificial threads
c. a tobacco powder once put in the nose
d. a chemical compound usually involving aluminium

6. refectory: We met in the refectory.
a. room for eating
b. office where legal papers can be signed
c. room for several people to sleep in
d. room with glass walls for growing plants

7. caffeine: This contains a lot of caffeine.
a. a substance that makes you sleepy
b. threads from very tough leaves
c. ideas that are not correct
d. a substance that makes you excited

8. impale: He nearly got impaled.
a. charged with a serious offence
b. put in prison
c. stuck through with a sharp instrument
d. involved in a dispute

9. coven: She is the leader of a coven.
a. a small singing group
b. a business that is owned by the workers
c. a secret society
d. a group of church women who follow a strict religious life

10. trill: He practised the trill.
a. ornament in a piece of music
b. type of stringed instrument
c. way of throwing a ball
d. dance step of turning round very fast on the toes

Thirteenth 1,000

1. ubiquitous: Many weeds are ubiquitous.
a. are difficult to get rid of
b. have long, strong roots
c. are found in most countries
d. die away in the winter

2. talon: Just look at those talons!
a. high points of mountains
b. sharp hooks on the feet of a hunting bird
c. heavy metal coats to protect against weapons
d. people who make fools of themselves without realizing it





3. rouble: He had a lot of roubles.
a. very precious red stones
b. distant members of his family
c. Russian money
d. moral or other difficulties in the mind

4. jovial: He was very jovial.
a. low on the social scale
b. likely to criticize others
c. full of fun
d. friendly

5. communiqué: I saw their communiqué.
a. critical report about an organization
b. garden owned by many members of a community
c. printed material used for advertising
d. official announcement

6. plankton: We saw a lot of plankton.
a. poisonous weeds that spread very quickly
b. very small plants or animals found in water
c. trees producing hard wood
d. grey clay that often causes land to slip

7. skylark: We watched a skylark.
a. show with aeroplanes flying in patterns
b. man-made object going round the earth
c. person who does funny tricks
d. small bird that flies high as it sings

8. beagle: He owns two beagles.
a. fast cars with roofs that fold down
b. large guns that can shoot many people quickly
c. small dogs with long ears
d. houses built at holiday places

9. atoll: The atoll was beautiful.
a. low island made of coral round a sea-water lake
b. work of art created by weaving pictures from fine thread
c. small crown with many precious jewels worn in the evening by women
d. place where a river flows through a narrow place full of large rocks

10. didactic: The story is very didactic.
a. tries hard to teach something
b. is very difficult to believe
c. deals with exciting actions
d. is written in a way which makes the reader unsure of the meaning

Fourteenth 1,000

1. canonical: These are canonical examples.
a. examples which break the usual rules
b. examples taken from a religious book
c. regular and widely accepted examples




d. examples discovered very recently

2. atop: He was atop the hill.
a. at the bottom of
b. at the top of
c. on this side of
d. on the far side of

3. marsupial: It is a marsupial.
a. an animal with hard feet
b. a plant that grows for several years
c. a plant with flowers that turn to face the sun
d. an animal with a pocket for babies

4. augur: It augured well.
a. promised good things for the future
b. agreed well with what was expected
c. had a colour that looked good with something else
d. rang with a clear, beautiful sound

5. bawdy: It was very bawdy.
a. unpredictable
b. enjoyable
c. rushed
d. rude

6. gauche: He was gauche.
a. talkative
b. flexible
c. awkward
d. determined

7. thesaurus: She used a thesaurus.
a. a kind of dictionary
b. a chemical compound
c. a special way of speaking
d. an injection just under the skin

8. erythrocyte: It is an erythrocyte.
a. a medicine to reduce pain
b. a red part of the blood
c. a reddish white metal
d. a member of the whale family

9. cordillera: They were stopped by the cordillera.
a. a special law
b. an armed ship
c. a line of mountains
d. the eldest son of the king

10. limpid: He looked into her limpid eyes.
a. clear
b. tearful
c. deep brown
d. beautiful

7.1.3 Meara’s _lognostics measurement instruments

There are a number of measurement instruments on Paul Meara’s _lognostics website. For details, see the documentation on the lognostics site <http://www.lognostics.co.uk/index.htm>, discussion of Meara’s website in Section 6.5, and related sections in this book.

There are two vocabulary size tests in the Lex family:

● X_Lex: A 5K vocabulary size test





● Y_Lex: A 5–10K vocabulary size test

There are three tests which give a measure of productive vocabulary:

● P_Lex: A program for evaluating the vocabulary used in short texts
● D_Tools: A program that calculates the mean segmental TTR statistic vocd for short texts
● V_Size: A program that estimates the productive vocabulary underlying short texts

The site includes two depth-of-knowledge tests:

● V_Quint: An alternative assessment of vocabulary depth
● Lex_30: Online word association test

There is also a suite of language aptitude tests:

● Llama: language aptitude tests

The site indicates that a number of other measurement instruments will also be added in the future, including a test of short-term memory.

7.2 Corpora

As we have seen throughout this book, corpora have transformed the way we think about and research vocabulary. It is hard to imagine any area of vocabulary research into acquisition, processing, pedagogy, or assessment where the insights available from corpus analysis would not be valuable. In fact, it is probably not too extreme to say that most sound vocabulary research will have some corpus element.

With this in mind, how are researchers to obtain appropriate corpora? In many cases, the research purpose will require the compilation of one or more corpora which are custom-designed to achieve that goal. For example, if a researcher wished to determine the effect of a syllabus change on student writing within a particular school, then it would be necessary to obtain learner writing samples from both before and after the syllabus change. Other corpora built of student writing would probably not be able to indicate how the particular students within that particular school environment would react to the syllabus change.

For other purposes, the use of pre-existing corpora makes good sense for a number of reasons. First and foremost, corpus compilation can be a time-consuming and expensive affair, and only organizations with substantial financial backing will be able to expend the hundreds of thousands of dollars and years of effort to put together larger corpora. For example,




the British National Corpus (100 million words) was put together with the resources of a consortium of several major publishers and universities (Longman Group Ltd (now Addison-Wesley Longman), Oxford University Press, Chambers Harrap, Oxford University Computing Services, the Unit for Computer Research on the English Language (Lancaster University), and the British Library Research and Development Department). Spoken corpora in particular are extremely time- and money-intensive to compile, even though they are typically much smaller: the CANCODE took eight years to collect, transcribe, and code 5 million words.

Besides the financial and time issues, corpus compilation takes a considerable amount of expertise. Corpus linguists working on major corpora are often full-time specialists, who have years of experience, time to keep up with the latest developments in corpus linguistics, and access to the latest technology. Although it is perfectly possible for teachers and novice researchers to put together and use basic corpora, it would take considerable time and effort to build the expertise which a larger corpus project requires. The last reason is simple common sense: why reinvent the wheel? If an existing corpus is appropriate for the research aims, it seems silly not to use it.

This section describes a range of corpora which are available to different degrees (some free, some for a fee, some with open access, some with restricted access). I will give extended coverage to the most accessible and what I consider the most useful corpora, and give briefer annotations of the rest. Obviously, whether a corpus is useful or not will depend on particular research purposes, but hopefully the descriptions here will guide you as to whether any of the corpora might be worth exploring for the vocabulary research you have in mind. There are more corpora than can be summarized in the space available, but the interested reader can find more details from several sources, including:

● David Lee’s comprehensive corpus website (<http://devoted.to/corpora>)
● Richard Xiao’s corpus websites
  ● The original website is (<http://www.lancs.ac.uk/postgrad/xiaoz/papers/corpus%20survey.htm>). This website has also been turned into a written survey chapter for the book Corpus Linguistics (Lüdeling and Kyto, 2008). A companion website to the book contains the updated chapter (<http://www.routledge.com/textbooks/0415286239/resources/corpa.htm>).
● The corpus survey at the back of From Corpus to Classroom (O’Keeffe, McCarthy, and Carter, 2007)

Some of the following descriptions draw heavily on those sources, and others are mainly condensations of the information available on the respective corpus websites at the time of writing (July 2008). Note that many corpora cross over my categories (e.g. the MICASE is both spoken and academic) and that all quotations are drawn from the respective corpus websites.





7.2.1 Corpora representing general English (mainly written)

British National Corpus (BNC)
[90 million written and 10 million spoken British English]
(<http://www.natcorp.ox.ac.uk>) and (<http://www.natcorp.ox.ac.uk/XMLedition/URG>) for full documentation

The BNC has been the ‘gold standard’ corpus of general English since its launch in the early 1990s, and a great deal of vocabulary research has utilized it. This is due to its large size (100 million words), and the fact that it is a balanced corpus, i.e. it was compiled according to predetermined percentages of a wide range of different types of English. This works to avoid bias, and gives some assurance that the corpus represents English reasonably well overall. The corpus also contains a substantial spoken component.

‘The BNC is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English from the later part of the 20th century, both spoken and written.’ The corpus includes ‘many different styles and varieties, and is not limited to any particular subject field, genre or register.’ Its written component makes up about 90% of the corpus, and the orthographically transcribed spoken component about 10%. Although it would have been desirable to have a 50/50 split, the cost prohibited this. Regardless, the 10 million word spoken component is still a considerable achievement.

The written part is made up of many kinds of text, which were selected according to three criteria: domain, medium, and time. Domain indicates the kind of writing. About 75% of the written texts are informative writings, of which roughly equal quantities were chosen from ‘the fields of applied sciences, arts, belief & thought, commerce & finance, leisure, natural & pure science, social science, [and] world affairs.’ About 25% are imaginative, that is, literary and creative works. Medium refers to the kind of publication in which the text occurs. About 60% of written texts come from books, 25% from periodicals (newspapers etc.), 5–10% from other kinds of miscellaneous published material (brochures, advertising leaflets, etc.), 5–10% from unpublished written material such as personal letters and diaries, essays and memoranda, etc., and a small amount (less than 5%) from material written

Concept 7.1 Changeability of the internet

The internet provides a valuable source of information and analysis tools for vocabulary research. However, by its nature, it is constantly being revised and updated. This means that some of the addresses will inevitably change or be withdrawn in the future. All of the web addresses were correct and accessible as we went to press, and I ask the reader’s indulgence when some of the addresses eventually do not work as stated in this book.




to be spoken (for example, political speeches, play texts, broadcast scripts, etc.). In terms of time, the imaginative texts date from 1960, and the informative texts from 1975, and they end in 1993. Overall, the corpus includes samples of 45,000 words taken from various parts of single-author texts. In addition, shorter texts (up to a maximum of 45,000 words), and multi-author material like magazines and newspapers are included in full.

The spoken part contains two different elements: ‘a demographic part, containing transcriptions of spontaneous natural conversations made by members of the public and a context-governed part, containing transcriptions of recordings made at specific types of meeting and event.’ The demographic part was gathered by 124 volunteers with a balanced range of gender, age, social grouping, and location (38 across the UK). They used personal stereos to unobtrusively record all their conversations over two or three days, and after that logged the details of each conversation in a special notebook. The context-governed part contains roughly equal quantities of speech recorded in four broad categories of social context:

● Educational and informative events, such as lectures, news broadcasts, classroom discussion, tutorials.
● Business events such as sales demonstrations, trades union meetings, consultations, interviews.
● Institutional and public events, such as sermons, political speeches, council meetings, parliamentary proceedings.
● Leisure events, such as sports commentaries, after-dinner speeches, club meetings, radio phone-ins.

Overall, the BNC contains 4,054 texts which total 100,467,090 orthographic words. It takes about 1.5 Gb to store on a computer, which taxed university servers when it first came out, but which is not a problem for modern personal computers. Work began on the corpus in 1991, and it was completed in 1994. A slightly revised second edition was released in 2001, called BNC World Edition, and the latest version of the full corpus came out in 2007 (BNC XML Edition). Two smaller sub-corpora drawn from the full BNC have also been made available, the BNC Sampler and the BNC Baby. All of these can be purchased on the BNC website.

BNC XML edition

The XML edition is annotated with word-class information (part-of-speech) and metatextual information (e.g. author, source). The XML edition is a revised version of BNC World (which has been superseded), with the main differences being that it has been transferred to an XML format, and that it has an improved concordance program, XAIRA, which offers more search options and a better user interface than the previous SARA search program. ‘It is available on DVD for installation on a stand-alone PC or on a





Windows, Unix or OSX server. It is delivered with a copy of the XAIRA search program and all necessary XAIRA index files.’

BNC Baby

‘BNC Baby is a subset of BNC World. It consists of four 1-million word samples, each compiled as an example of a particular genre: fiction, newspapers, academic writing and spoken conversation. The texts have the same annotation as the full corpus (part of speech, meta data, etc). The BNC Baby is in XML format and can be searched with the XAIRA Tool. It is distributed on a CD together with the BNC Sampler and an XML version of the American English Brown corpus.’

BNC Sampler

‘The BNC Sampler is a subset of the full BNC. It comprises two samples of written and spoken material of one million words each, compiled to mirror the composition of the full BNC as far as possible. The word-class annotation of the BNC Sampler texts has been carefully checked and manually corrected. The Sampler was first created at Lancaster University during the creation of the BNC. The BNC Sampler is in XML format and can be searched with the XAIRA Tool. It is distributed on the BNC Baby CD together with the BNC Baby and an XML version of the American English Brown corpus.’

The BNC XML costs £75, and the BNC Baby+Sampler costs £21 (July 2008) and can be purchased from the BNC website. Considering the cost involved in compiling the various corpora, this is not expensive. It is also possible to query the BNC through their website using the ‘BNC Simple Search’. It will give the frequency of a word or phrase in the full corpus and up to 50 randomly-selected sentences which contain the target lexical item. The use of wildcards is possible (e.g. bread_butter will bring up bread and butter, bread with butter, bread or butter, etc.). However, the target words/phrases are not aligned in the centre of the screen for easy comparison, as is standard in concordancing software. Also, there is no facility for sorting or for a collocation search. Still, the site is free and is good for providing a number of authentic contexts for target lexemes.
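To illustrate what centre-aligned concordance output looks like, here is a minimal sketch of a KWIC (Key Word In Context) display in Python. It is not the method used by any of the tools named above; the function name `kwic`, the `width` parameter, and the sample sentences are all invented for illustration.

```python
import re

def kwic(text, node, width=30):
    """Return concordance lines with the node word starting in the same column."""
    lines = []
    # Match the node as a whole word, ignoring case.
    pattern = r"\b" + re.escape(node) + r"\b"
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-justify the left context so every node lines up vertically.
        lines.append(f"{left:>{width}}{match.group(0)}{right}")
    return lines

# Invented sample text (not taken from the BNC).
sample = (
    "She bought bread and butter at the market. "
    "Fresh bread is best eaten warm. "
    "They had no bread left by evening."
)

for line in kwic(sample, "bread", width=20):
    print(line)
```

Because the left-hand context is padded to a fixed width, the target word always begins in the same column, which is what makes patterns to its left and right easy to compare by eye.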

Brigham Young University and Mark Davies host a more functional web site based on the BNC called the BYU-BNC: The British National Corpus (Davies, 2004: <http://corpus.byu.edu/bnc>). It shows the frequencies of the queried word or phrase in the spoken, fiction, newspaper, academic, and miscellaneous components of the corpus in a graphic or list format. With a single mouse-click, all of the instances of the word/phrase appear, with the node highlighted in a bold and underlined font. There are also wildcard options, including searches for all words from a single word class. The site allows the comparison of vocabulary in different registers, e.g. nouns near the word chair in academic versus fiction texts. It also allows semantically-oriented searches, which is good for comparing synonyms and other semantically-related words, such as comparing the most frequent




nouns that appear with the adjectives small and little. For more information on the site’s capabilities, see the Corpus of Contemporary American English below, as it shares the same interface.

There is also some important independent BNC support material on the web. One key destination is David Lee’s corpus resources site (<http://clix.to/davidlee00>), which contains his BNC Index. Adam Kilgarriff has a website which provides BNC frequency lists (<http://www.kilgarriff.co.uk/bnc-readme.html>). The Phrases in English (PIE) interface (<http://pie.usna.edu>) allows the free online phraseological interrogation of the BNC (World version) in strings up to eight words long.

Corpus of Contemporary American English (COCA)
[309 million written and 79 million spoken American English]
(<http://www.americancorpus.org>)

The COCA developed by Mark Davies is a very exciting new corpus resource for a number of reasons. First, it represents the American variety of English. This is essential as a counterpart to the other main variety (British English), which is covered by the BNC. (Apologies to other smaller, but still important, varieties like Australian and South African English!)

Second, it is large. The COCA contains more than 385 million words in over 150,000 texts, including 20 million words each year from 1990 to 2008 (as of December 15, 2008). This is nearly four times the size of the BNC, and vastly larger than any other available American English corpus. For example, the COCA compares very favorably with the American National Corpus, the other major American English corpus project, at least at the ANC’s current state of development. (See <http://www.americancorpus.org/american_compare.asp> for a comparison between the COCA and ANC.)

Third, the size has not been achieved at the expense of balance, with the texts being equally divided among five genre/registers. The website gives the following description:

Quote 6.1 BNC website on the size of the BNC

To put these numbers [100 million words] into perspective, the average paperback book has about 250 pages per centimetre of thickness; assuming 400 words a page, we calculate that the whole corpus printed in small type on thin paper would take up about ten metres of shelf space. Reading the whole corpus aloud at a fairly rapid 150 words a minute, eight hours a day, 365 days a year, would take just over four years.

(<http://www.natcorp.ox.ac.uk/corpus/index.xml.ID=numbers>)
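The arithmetic behind the quote can be checked with a short Python sketch. The constants are the round figures given in the quote; the variable names are my own. On these round figures the reading time comes out a little under four years, close to the website's 'just over four years', so the website presumably used slightly different inputs or rounding.

```python
# Figures taken from the quoted BNC website passage.
WORDS = 100_000_000       # corpus size
WORDS_PER_PAGE = 400
PAGES_PER_CM = 250        # pages per centimetre of shelf thickness
WORDS_PER_MINUTE = 150    # reading-aloud speed
HOURS_PER_DAY = 8
DAYS_PER_YEAR = 365

# Shelf space: words -> pages -> centimetres -> metres.
pages = WORDS / WORDS_PER_PAGE
shelf_metres = pages / PAGES_PER_CM / 100

# Reading time: words -> minutes -> hours -> days -> years.
minutes = WORDS / WORDS_PER_MINUTE
reading_years = minutes / 60 / HOURS_PER_DAY / DAYS_PER_YEAR

print(f"pages: {pages:,.0f}")
print(f"shelf space: {shelf_metres:.1f} m")
print(f"reading time: {reading_years:.1f} years")
```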





● Spoken: (79 million words) Transcripts of unscripted conversation from more than 150 different TV and radio programs (examples: All Things Considered (NPR), Newshour (PBS), Good Morning America (ABC), Today Show (NBC), 60 Minutes (CBS), Hannity and Colmes (Fox), Jerry Springer, etc.). [The website has a discussion of the naturalness and authenticity of this ‘unscripted’ language.]
● Fiction: (76 million words) Short stories and plays from literary magazines, children’s magazines, popular magazines, first chapters of first edition books 1990–present, and movie scripts.
● Popular Magazines: (81 million words) Nearly 100 different magazines, with a good mix (overall, and by year) between specific domains (news, health, home and gardening, women, financial, religion, sports, etc.). A few examples are Time, Men’s Health, Good Housekeeping, Cosmopolitan, Fortune, Christian Century, Sports Illustrated.
● Newspapers: (76 million words) Ten newspapers from across the US, including: USA Today, New York Times, Atlanta Journal Constitution, San Francisco Chronicle, etc. In most cases, there is a good mix between different sections of the newspaper, such as local news, opinion, sports, financial, etc.
● Academic Journals: (76 million words) Nearly 100 different peer-reviewed journals. These were selected to cover the entire range of the Library of Congress classification system (e.g. a certain percentage from B (philosophy, psychology, religion), D (world history), K (education), T (technology), etc.), both overall and by number of words per year.

Fourth, as opposed to most corpora, the COCA will not be static. The plan is to update it at least twice each year from this point on (maintaining the balance proportions of the registers already in place). This promises to keep the COCA current, instead of being a ‘snapshot’ of English at a single point in time. This will also make it a useful resource for researching linguistic change in American English. Fifth, as seen above, it contains, like the BNC, a substantial element of unscripted spoken English. Sixth (and everyone’s favorite), the corpus is free to access online.

Another advantage of the corpus is its very powerful search interface (the same one used across the BYU corpus suite). It allows searches for exact words or phrases (linguistics, linguistics professor), by using wildcards or part of speech, or combinations of these. You can look for lemmas (all forms of words, like swim, swam, swum), wildcards (un*ly or r?n*), and more complex searches such as re-X-ing, *term* (in terms of; to terms with). From the ‘frequency results’ window, a simple click on the word or phrase brings up a list of the target word or string in context in a lower window. The searches can be limited by any combination of genre/register that you define (spoken, academic, poetry, medical, etc.). It is possible to compare between registers, e.g., verbs that are more common in academic or fiction texts. The program also allows searches for surrounding words




(collocates) within a ten-word window (e.g. all adjectives within that span for girl).
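As a rough illustration of how wildcard patterns such as un*ly and r?n* select words, the following Python sketch applies shell-style matching to a small invented word list. It assumes that * stands for any run of characters and ? for exactly one character; whether the corpus interface follows exactly these conventions is an assumption, and the helper name `wildcard_search` is hypothetical.

```python
from fnmatch import fnmatchcase

def wildcard_search(words, pattern):
    """Return the sorted set of words matching a shell-style wildcard pattern."""
    return sorted({w for w in words if fnmatchcase(w.lower(), pattern)})

# Invented word list for illustration only.
words = ["unlikely", "unfriendly", "under", "ran", "rung",
         "running", "rain", "runs", "only"]

print(wildcard_search(words, "un*ly"))   # words beginning un- and ending -ly
print(wildcard_search(words, "r?n*"))    # r + any one letter + n + anything
```

In this toy list, under and only fail the first pattern (wrong ending or beginning), and rain fails the second because its third letter is not n.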

The interface also usefully allows comparison ‘between synonyms and other semantically-related words. One simple search, for example, compares the most frequent nouns that appear with sheer, complete, or utter (sheer nonsense, complete account, utter dismay). The interface also allows you to input information from WordNet (a semantically-organized lexicon of English) directly into the search form. This allows you to find the frequency and distribution of words with similar, more general, or more specific meanings.’ Interestingly for researchers focusing on linguistic change, one can compare language from different years from 1990 to the present time. The interface allows the creation and storage of personalized lists, which can be drawn upon as part of subsequent analyses.

Overall, the COCA represents a useful resource, and one that deserves to be consulted in future vocabulary research. It is likely to be the best source for information on American English for some time to come, and even trumps the BNC as a resource for general English on some points, particularly its size and currency. However, it too has limitations. The spoken component of the BNC is probably a better representation of spontaneous speech in informal situations, especially the demographic component. Also, the limited span of the concordance lines may limit some kinds of analysis which require more context to interpret, e.g. when studying some grammatical structures, it may be necessary to look at several contiguous sentences. Finally, the online nature of the corpus limits the analyses to those that the interface supports. Some types of analysis that require a corpus to be downloaded onto one’s own computer to use other software (see the ‘Tools’ section below) are simply not possible.

The TIME Magazine Corpus
[100 million words written American English]
(<http://corpus.byu.edu/time>)

The Time Corpus is another part of the BYU stable of corpora developed by Mark Davies, and uses the same interface. It includes ‘more than 100 million words of text of American English from 1923 to the present, as found in TIME magazine.’ The corpus is taken from 275,000+ texts from the TIME Archive, which is freely available on-line. As with all of the other BYU corpora, clicking on the search word/phrase brings up KWIC (Key Word In Context) contexts. The full original texts are available on the corpus site through a hyperlink to the TIME Archive site (for copyright reasons), but in practice this is seamless, as the hyperlink is easily accessed. There is also the advantage of seeing the texts with the additional magazine features available on the archive site (e.g. related articles, quotes of the day), although with the
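The kind of comparison described here can be sketched in a few lines of Python. This is only a toy under stated assumptions: the text is invented, there is no part-of-speech tagging (so every neighbouring word is counted, not just nouns), and the function name `collocates` and its `left`/`right` span parameters are hypothetical rather than anything offered by the corpus interface.

```python
from collections import Counter

def collocates(tokens, node, left=0, right=1):
    """Count words within `left` tokens before and `right` tokens after each node."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
            counts.update(window)
    return counts

# Invented sample text, already lower-cased and without punctuation.
text = (
    "it was sheer nonsense and utter nonsense to him "
    "she felt utter dismay at the complete account "
    "a complete account of sheer luck"
)
tokens = text.split()

# Compare the words that immediately follow each near-synonym.
for node in ("sheer", "complete", "utter"):
    print(node, dict(collocates(tokens, node)))
```

Widening `left` and `right` (for example to 5 each, giving a ten-word window) turns the same function into a general collocate search; a real study would also filter the results by word class and apply an association measure rather than relying on raw counts.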



Co

pyr

igh

t m

ater

ial f

rom

ww

w.p

alg

rave

con

nec

t.co

m -

lice

nse

d t

o U

niv

ersi

ty o

f S

ou

th F

lori

da

- P

alg

rave

Co

nn

ect

- 20

11-0

5-25

http://corpus.byu.ed/time


disadvantage of the target word/phrase not being highlighted, which may require a bit of scanning to find.
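A KWIC display of the kind described above can be approximated in a few lines of Python. The sketch below only illustrates the idea and is not the BYU interface's own method: the function name `kwic`, the whitespace tokenization, and the sample sentence are all assumptions introduced here.

```python
def kwic(text, node, span=4):
    """Return keyword-in-context lines for each occurrence of `node`.

    Hypothetical helper: tokens are split on whitespace and matched
    case-insensitively after stripping surrounding punctuation.
    """
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        if token.strip(".,;:!?\"'()").lower() == node.lower():
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            # Mark the node word with brackets so it is easy to spot.
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Invented sample text, not taken from any corpus.
sample = ("The plate moved slowly. Geologists study plate tectonics, "
          "and a dinner plate is unrelated.")
for line in kwic(sample, "plate", span=3):
    print(line)
```

Bracketing the node word addresses the scanning problem noted above for unhighlighted texts. A real concordancer would also need proper tokenization and sentence handling.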

The website suggests the following ideas as examples of what could be usefully researched with the corpus:

● The overall frequency over time of words and phrases that were related to changes in society and culture, or historical events, such as flapper (flapper, flappers, flapperdom, etc.), cinemaddict, fascist, rocket, reds, hippy/hippies, impeach, new age, politically correct, e(-)mail, and global warming.

● Changes in the language itself, such as the rise and fall of words and phrases like far-out, famed, wangle, funky, beauteous, nifty, or freak out. You can also search for changes with grammatical constructions like end up V-ing, going to V, phrasal verbs with up (e.g. make up, show up), the use of whom, and the use of preposition stranding (e.g. someone to talk with).

● Parts of words (which show how word roots, prefixes, and suffixes are being used over time in other words), such as the roots -heart-, -home-, counter, and the suffixes -aholic (e.g. chocoholic), and -gate (e.g. Monicagate).

● You can also have the corpus generate a list of words that were used more in one period than another, even when you don’t know what the specified words might be. For example, you can find nouns whose usage increased a lot in the 1960s, verbs that drop off in usage after the 1930s, or adjectives that have been used much more since 2000 than previously.

● The corpus can also help to show how the meaning of words have changed over time, by looking at changes in collocates (co-occurring words). For example, the collocates of chip, engine, or web have changed recently, due to changes in technology. Notice also how this can signal cultural changes over time, such as adjectives used with wife in the 1920s–1930s (which might now be politically incorrect), or adjectives with families (earlier versus later).
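When comparing the frequency of a word across periods, raw counts can mislead, because the amount of text sampled from each period differs. Frequencies are therefore usually normalized, for example to occurrences per million words. The following Python sketch shows the arithmetic only. The helper name `per_million` and all of the figures are invented for illustration and are not taken from the TIME corpus.

```python
def per_million(raw_count, corpus_size):
    """Normalize a raw frequency to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size


# Invented figures for illustration only:
# decade -> (raw hits for a word, words in that decade's subcorpus).
toy_decades = {
    "1920s": (120, 8_000_000),
    "1960s": (30, 16_000_000),
}

for decade, (hits, size) in toy_decades.items():
    print(decade, round(per_million(hits, size), 2))
```

With these toy numbers the word is four times as frequent in raw terms in the first period, but eight times as frequent once subcorpus size is taken into account.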

American National Corpus (ANC) [22 million total, 4 million spoken American English] (<http://americannationalcorpus.org>)

The ANC was envisioned as being the counterpart to the BNC, with a similar size (100 million words) and comparable across genres. Unfortunately, the project seems to have stalled, and there are currently only 22 million words available in its second edition, released in 2005. These have been annotated for lemma, part of speech, noun chunks, and verb chunks. As the corpus is incomplete, the data currently available are not yet balanced.

The ANC 2nd edition can be purchased from the Linguistic Data Consortium catalog (<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T35>) for $75, and comes on two DVDs. There is also a free 14 million word ‘open’ version of the ANC (non-copyright restricted), available for download at <http://americannationalcorpus.org/OANC/index.html#download>, which contains 11.4 million words of the written part and the Charlotte Narratives and Switchboard spoken components. The frequency counts for the corpus are available at <http://americannationalcorpus.org/SecondRelease/frequency2.html>. (Note that the Switchboard corpus can be accessed independently from LDC Online, <http://www.ldc.upenn.edu/catalog>.)

Any corpus can be useful, as the unique composition of each corpus gives somewhat different perspectives of the language contained within. The ANC provides 14 million words for free (open version), or 22 million for a reasonable fee (second edition), which is not inconsiderable, especially if the current corpus content is relevant for your topic of research. However, it must be said that COCA looks to be a better resource for researching general American English at the moment, especially considering the balancing issues.

The ANC does contain a potentially interesting spoken component though, especially if one is interested in informal spontaneous speech, or telephone conversations. The second edition spoken component contains 24 unscripted telephone conversations between native speakers of American English, covering a contiguous 10-minute segment of each call, comprising 50,494 words (the CallHome component). The Switchboard component ‘consists of 2320 spontaneous [telephone] conversations averaging 6 minutes in length and comprising about 3 million words of text, spoken by over 500 speakers of both sexes from every major dialect of American English.’ These speakers come from a variety of American dialects, age groups, and education levels. There is also a set of narratives from residents of North Carolina (the Charlotte Narratives component). Interestingly for EAP researchers, the ANC also contains 50 transcripts from the MICASE corpus (see MICASE section below). However, the open version contains only the Charlotte Narratives and Switchboard components.

If completed, the ANC could be an extremely useful resource for researching American English. However, there seems to have been little progress since the release of the second edition in 2005, and with the free availability of the much larger COCA, there may be little impetus to continue the project. This only goes to show the difficulties involved in creating large language corpora, even if one has the backing of a sizable consortium of partners (<http://americannationalcorpus.org/consortium.html>).

The Brown University Standard Corpus of Present-Day Edited American English (Brown corpus) [1 million written American English]

The Brown corpus was the first major corpus designed with the intent of computerized analysis. It was compiled by Henry Kucera and Nelson Francis at Brown University in Providence, Rhode Island. It contains about 1 million words of written American English, taken from 500 texts of approximately 2,000 words each, all published in 1961. These texts were distributed across 15 categories in rough proportion to the amount published in each of those genres:

PRESS: Reportage (44 texts)
PRESS: Editorial (27 texts)
PRESS: Reviews (17 texts)
RELIGION (17 texts)
SKILL AND HOBBIES (36 texts)
POPULAR LORE (48 texts)
BELLES-LETTRES: Biography, Memoirs, etc. (75 texts)
MISCELLANEOUS: US Government & House Organs (30 texts)
LEARNED (80 texts)
FICTION: General (29 texts)
FICTION: Mystery and Detective Fiction (24 texts)
FICTION: Science (6 texts)
FICTION: Adventure and Western (29 texts)
FICTION: Romance and Love Story (29 texts)
HUMOR (9 texts)
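The composition figures above can be checked with a little arithmetic. The short Python sketch below sums the category counts listed above and multiplies by the approximate figure of 2,000 words per text; the variable names are purely illustrative.

```python
# Text counts per category, copied from the list above.
brown_categories = {
    "PRESS: Reportage": 44,
    "PRESS: Editorial": 27,
    "PRESS: Reviews": 17,
    "RELIGION": 17,
    "SKILL AND HOBBIES": 36,
    "POPULAR LORE": 48,
    "BELLES-LETTRES: Biography, Memoirs, etc.": 75,
    "MISCELLANEOUS: US Government & House Organs": 30,
    "LEARNED": 80,
    "FICTION: General": 29,
    "FICTION: Mystery and Detective Fiction": 24,
    "FICTION: Science": 6,
    "FICTION: Adventure and Western": 29,
    "FICTION: Romance and Love Story": 29,
    "HUMOR": 9,
}

WORDS_PER_TEXT = 2_000  # approximate length of each text sample

total_texts = sum(brown_categories.values())
approx_words = total_texts * WORDS_PER_TEXT

# Texts in the five categories labelled FICTION.
fiction_texts = sum(
    count for name, count in brown_categories.items()
    if name.startswith("FICTION")
)

print(total_texts, approx_words, fiction_texts / total_texts)
```

The counts sum to the 500 texts stated above, giving roughly 1 million words. Small categories such as FICTION: Science (6 texts, or about 12,000 words) show how little data some genres contribute in a corpus of this size.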

The corpus was eventually tagged with about 80 parts of speech, and a few other markers (e.g. compound forms, contractions, foreign words), and formed the model for a series of later corpora which mirrored its size and text selection criteria (the ‘Brown family’ corpora). See Kucera and Francis (1967), and the Brown Corpus Manual (available on the ICAME website, <http://icame.uib.no/brown/bcm.html>) for more information. The corpus itself is available as part of the ICAME CD-ROM Corpus Collection (<http://icame.uib.no/newcd.htm>), or as part of the BNC Baby/Sampler CD-ROM, <http://www.natcorp.ox.ac.uk/getting>.

The Brown corpus was a good first try at computerized corpora, and it has been used in numerous research studies, including some recent ones. However, it (and all of the other ‘Brown family’ corpora) suffers from a number of limitations in comparison to today’s larger, more modern, corpora. First, it is small. This does not rule it out for researching highly frequent linguistic phenomena, particularly grammatical and morphological features, as this kind of feature occurs sufficiently often in 1 million words for much patterning to become evident. It might also be sufficient for exploring some aspects (e.g. most frequent meaning senses) of the most frequent vocabulary items.

However, it is probably not large enough to give good information about the more contextualized aspects of this high-frequency vocabulary (e.g. collocation, register constraints). For example, even for relatively frequent words, it takes many occurrences to establish patterns with lower frequency collocates. A case of this is plate. Plate is highly frequent, but unless a large corpus is consulted, it may not become obvious that tectonic is a relatively infrequent, yet strongly related collocate (as indicated by MI). It also takes a large corpus to show up some of the less common meaning senses, even of frequent words. Take the case of try. Among its many meaning senses, there is one for a score in the game of rugby. In every 100 instances of try in the British English component of the New Longman Corpus, about four carried this meaning sense. This is based on corpus data collected from Great Britain, where rugby is relatively popular. Unsurprisingly, this meaning sense is missing in the Brown corpus, partly because of its small size, and partly because it is based on English from America, where the game is not nearly as popular. Overall, when it comes to corpora, bigger is usually better, because the more texts, the greater the chance for diversity, which usually supplies more rounded information about the contextual types of word knowledge.
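The reasoning about infrequent but strongly associated collocates can be made concrete with the MI (mutual information) score. One common formulation compares the observed number of co-occurrences with the number expected if the two words were distributed independently, and takes the base-2 logarithm of the ratio. The Python sketch below follows that formulation; concordancing tools differ in details such as adjusting for the collocation span, and all of the counts are invented rather than drawn from any corpus.

```python
import math


def mutual_information(pair_count, node_count, collocate_count, corpus_size):
    """MI score (log base 2) for a node word and a collocate.

    Compares the observed co-occurrence count with the count expected
    if the two words were distributed independently of each other.
    """
    expected = node_count * collocate_count / corpus_size
    return math.log2(pair_count / expected)


# Invented counts: a frequent node word and a rare collocate that
# nearly always appears next to it, in a 100-million-word corpus.
mi = mutual_information(
    pair_count=8,
    node_count=4_000,
    collocate_count=20,
    corpus_size=100_000_000,
)
print(round(mi, 2))
```

Scaling these invented counts down to a 1-million-word corpus would leave well under one observed co-occurrence, which illustrates why a pairing of this kind may simply never surface in a corpus the size of Brown.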

There is also a temporal element. The language in the Brown corpus is now over 45 years old, and it is reasonable to expect that some change in usage has occurred since it was compiled. This may not be such a problem for grammar and morphology, which are relatively stable (although they are changing even as you read this; see Schmitt and Marsden (2006: 72–74) for three examples of this). However, vocabulary is much more prone to change, particularly more vernacular types like slang. Unless one is researching the language of the 1960s, the Brown corpus is likely to give a somewhat outdated version of American English.

Lancaster-Oslo/Bergen Corpus (LOB) [1 million written British English]

The LOB corpus was compiled in Europe to be the British English counterpart to the Brown corpus. It was built along the same lines, with 2,000-word text extracts from 1961 being sampled according to the same 15 categories. As such, it has the same limitations as the Brown corpus, but for British English. Perhaps its greatest use is as a comparison corpus to the Brown corpus. For other corpora based on the Brown format, see the ‘Other corpora in the “Brown family”’ section below.

HarperCollins COBUILD Bank of English (Bank of English) [524 million written and spoken] (<http://www.collins.co.uk/books.aspx?group=153>)

The Bank of English is one of the largest (semi-)accessible corpora of (mainly British) English, containing 524 million words as of July 2008, and it continues to be expanded. It was a joint project launched in 1991 by Collins Publishers and the University of Birmingham. It was led by John Sinclair, probably the most influential of the pioneering corpus linguists, and has been the basis of a great amount of research by Birmingham-based scholars like Sinclair, Susan Hunston, and Rosamund Moon. The Bank of English contains written texts from thousands of different sources, including ‘newspapers, magazines, fiction and non-fiction books, brochures, reports, and websites’. It also has a spoken component, but this is largely scripted speech from television and radio broadcasts.

The corpus can be accessed by subscription. Unfortunately, you do not get access to the whole corpus, but rather a much smaller 56 million word subcorpus. Also, the access fee is not cheap. A subscription costs £50 for one month, £300 for six months, and £500 for one year. This is pricey, especially considering the fact that the complete 100 million word BNC XML Edition costs only £75, and the even larger COCA is free on-line. However, there is a free concordance and collocation query page on the website which could be useful for individual inquiries (<http://www.collins.co.uk/corpus/corpusSearch.aspx>). It allows searches for words and phrases, and up to 250 concordance lines. There is also a facility to type in node words or phrases, and the software will identify a list of collocates according to t-score or MI.
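For readers unfamiliar with the t-score, one formulation commonly used in corpus linguistics divides the difference between observed and expected co-occurrences by the square root of the observed count. Whether the Collins query page uses exactly this formula is not stated above, so the Python sketch below should be read as an assumption-laden illustration. The function name and all counts are invented.

```python
import math


def t_score(pair_count, node_count, collocate_count, corpus_size):
    """t-score for a node word and a collocate: (O - E) / sqrt(O).

    O is the observed co-occurrence count; E is the count expected
    if the two words were distributed independently.
    """
    expected = node_count * collocate_count / corpus_size
    return (pair_count - expected) / math.sqrt(pair_count)


# Invented counts in a 100-million-word corpus.
# A very frequent collocate that co-occurs often with the node:
frequent_pair = t_score(400, 4_000, 1_000_000, 100_000_000)
# A rare collocate that co-occurs only a handful of times:
rare_pair = t_score(8, 4_000, 20, 100_000_000)

print(round(frequent_pair, 2), round(rare_pair, 2))
```

With these toy figures the frequent pairing receives the much higher t-score. This reflects a general tendency of the measure to reward well-attested combinations, whereas MI tends to highlight rarer but more exclusive ones.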

Overall, the Bank of English would appear to be a very good research tool, but on the present terms (a high price for only 56 million words), it seems to be mainly reserved for in-house HarperCollins lexicographers and materials writers, and those affiliated with the University of Birmingham.

New Longman Corpus [179 million written and spoken]

Many of the corpus examples in this book are drawn from the New Longman Corpus, and you may be interested in knowing more about its composition. It is a composite of the various corpora in the Longman Corpus Network (<http://www.pearsonlongman.com/dictionaries/corpus>). It amounts to 179 million words in total, including written and spoken British and American English. It is an in-house Longman resource, and is not available to the public.

Other corpora in the ‘Brown family’

There are a number of other corpora which have used the Brown corpus as a model, using the same technique of sampling 500 texts to build a corpus of 1 million words. These corpora, representing various permutations of English (different national varieties, different time settings), are summarized below in Table 7.1, adapted from Richard Xiao’s website.

SUBTLEXus Database [51 million words]

Another approach to frequency is taken by researchers in the Department of Experimental Psychology at the University of Gent. They compiled a frequency measure based on the vocabulary in American movies. They show that their frequency figures are more closely related to reaction times in lexical decision tasks than the frequency information from the Brown corpus. As such, their frequency information may be better to use in setting up psycholinguistic experiments than the frequency information from smaller corpora like the Brown corpus. However, it is unclear whether their frequency information is better than that obtained from larger, balanced corpora like the BNC. The database and its rationale are available at <http://expsy.ugent.be/research/Rdocuments/downloads/SUBTLEXus/index.htm>.

Table 7.1 Written Corpora of the Brown family

| Corpus | Language Variety | Period | More Information |
| --- | --- | --- | --- |
| Freiburg-Brown Corpus of American English (FROWN) | American English | 1991–1992 | <http://khnt.hit.uib.no/icame/manuals/frown/INDEX.HTM> |
| Freiburg-LOB Corpus of British English (FLOB) | British English | 1991–1992 | <http://khnt.hit.uib.no/icame/manuals/flob/INDEX.HTM> |
| Kolhapur Corpus of Indian English | Indian English | 1978 | <http://khnt.hit.uib.no/icame/manuals/kolhapur/INDEX.HTM> |
| Macquarie Corpus of Written Australian English (ACE) | Australian English | 1986 | <http://khnt.hit.uib.no/icame/manuals/ace/INDEX.HTM> |
| Wellington Corpus of Written New Zealand English (WWC) | New Zealand English | 1986–1990 | <http://khnt.hit.uib.no/icame/manuals/wellman/INDEX.HTM> |

7.2.2 Corpora representing spoken English

London-Lund Corpus [500,000 spoken] (<http://khnt.hit.uib.no/icame/manuals/LONLUND/INDEX.htm>)

The London-Lund corpus was the first electronic corpus of spontaneous language. It contains half a million words of spoken British English, resulting from a combination of two projects: the Survey of English Usage (SEU) and the Survey of Spoken English (SSE). It consists of 100 texts, each of 5,000 words, recorded from 1953 to 1987. It distinguishes between dialogues and monologues in its organization. It is notable for being one of the few spoken corpora which have been annotated prosodically, i.e., with features like stress, tones, and pausing marked in the transcript. It is available on the ICAME CD-ROM (see below).

Michigan Corpus of Academic Spoken English (MICASE) [1.8 million spoken] (<http://quod.lib.umich.edu/m/micase>)

The MICASE is one of the most influential of the smaller specialized corpora. It is an on-line corpus consisting of 152 transcripts (1,571 speakers) of spoken academic English recorded at the University of Michigan, including lectures, seminars, labs, dissertation defences, interviews, meetings, and tutorials. The corpus can be searched free on-line, and is highly interactive, allowing filtering for gender, age, academic position, native/nonnative speaker status, L1, and speech event type. The text corpus can be ordered for off-line use (from $50 for an individual licence) via <http://lw.lsa.umich.edu/eli/micase/MICASE_OrderForm.pdf>. The corpus sound files can also be ordered at extra cost, although some 70 of them are accessible on-line in a streamed version. In addition, a comparable written academic corpus is now available for online access (Michigan Corpus of Upper-level Student Papers; MICUSP).

British Academic Spoken Corpus (BASE) [1.6 million spoken]

The British Academic Spoken Corpus was designed as a British counterpart to the MICASE, but is not as broad, including only lectures (160) and seminars (39). These were recorded in a variety of university departments, distributed evenly across four broad disciplinary bands. Moreover, most of the recordings are on digital video rather than audio tape. The lecture portion of the BASE can be accessed through the corpus analysis interface Sketch Engine, which is available for a yearly €55 individual license subscription fee through <http://corpora.sketchengine.co.uk>. This fee also includes access to a number of other corpora, including the BNC and the BASE’s written counterpart, the British Academic Written Corpus (BAWE). There is also a free 30-day trial subscription with full access to all resources. In addition, the text files can be downloaded from the BASE website, <http://www2.warwick.ac.uk/fac/soc/al/research/projects/resources/base>.

Corpus of Spoken Professional American English (CSPAE) [2 million spoken]

The CSPAE contains 2 million words of spoken American English (1994–1998), including 1 million from White House question and answer sessions, and 1 million from academic discussions such as faculty council meetings and committee meetings related to testing. Two versions are available at <http://www.athel.com/cspa.html>: $49 for the raw text version and $79 for the tagged version (both individual user license).

Santa Barbara Corpus of Spoken American English (SBCSAE) [250,000 spoken American English]

The SBCSAE is based on hundreds of recordings of spontaneous speech from across the United States, including speakers from different regions, ages, occupations, and ethnic and social backgrounds. It represents spoken language in daily use, ranging from gossip and bedtime stories to sales pitches and sermons. The corpus might be particularly useful for research into speech recognition, as each speech file is accompanied by a transcript in which phrases are time-stamped, linking directly to the audio. The SBCSAE can be purchased from the LDC website (<http://www.ldc.upenn.edu/Catalog/index.jsp>).

Wellington Corpus of Spoken New Zealand English (WSC) [1 million spoken] (<http://khnt.hit.uib.no/icame/manuals/wsc/INDEX.htm>)

The 1 million spoken words in the WSC (<http://www.vuw.ac.nz/lals/corpora/index.aspx>) were collected between 1988 and 1994. The corpus is made up of 2,000-word extracts of formal, semi-formal, and informal speech. The corpus contains an unusually high proportion of private material (75% of the corpus consists of informal dialogue, and 50% of private conversations), which makes the corpus a good candidate for research into informal spoken registers. It is available as part of the ICAME CD-ROM.

Vienna-Oxford International Corpus of English (VOICE) [1 million spoken] (<http://www.univie.ac.at/voice>)

The VOICE is a spoken corpus of English as a Lingua Franca. It consists of transcripts of a large number (1,250) of mainly nonnative speakers from approximately 50 different L1s (mainly European) using English to communicate with each other in naturally occurring, non-scripted face-to-face interactions. About 10% of the speakers in the corpus are native English speakers. The 1 million words come from about 120 hours of recorded and transcribed lingua franca interactions. The corpus was released in May 2009, and it is freely available via a user-friendly on-line search interface.

Cambridge and Nottingham Corpus of Discourse in English (CANCODE) [5 million spoken] (<http://www.cambridge.org/elt/corpus/cancode.htm>)

Although the CANCODE is unavailable to anyone other than Nottingham staff and research students, it is worth mentioning, simply because a great deal of research into spoken discourse has been based on it, particularly by Ronald Carter, Michael McCarthy, and Svenja Adolphs, with the most prominent output being the Cambridge Grammar of English (Carter and McCarthy, 2006). It also informs the various student textbooks that Cambridge University Press produces. It is made up of 5 million words of unscripted British English, in contexts of use including casual conversation, workplace, and academic settings across different speaker relationships from intimate to professional.

7.2.3 Corpora representing national varieties of English

In addition to the corpora already discussed which focus on either American or British English, and the Kolhapur (Indian English), ACE (Australian English), and WWC (New Zealand English) members of the ‘Brown family’ corpora, there are a number of other corpora which focus on the different national varieties of English. Perhaps the most interesting set belongs to the International Corpus of English (ICE) (<http://www.ucl.ac.uk/English-usage/ice>). The ICE project was begun in 1990, with a goal of collecting material for the comparative study of English worldwide. Each national component of the ICE consists of 1 million words of spoken and written English produced after 1989. They are all compiled following the Brown corpus design (i.e. 500 texts × 2,000 words each) and a common scheme for grammatical annotation. They are available either through download from the ICE website, or on CD-ROMs available from addresses shown on the website.

ICE Great Britain (<http://www.ucl.ac.uk/english-usage/ice/icegb.htm>)
ICE Hong Kong (<http://www.ucl.ac.uk/english-usage/ice/icehk.htm>)
ICE East Africa (<http://www.ucl.ac.uk/english-usage/ice/iceea.htm>)
ICE India (<http://www.ucl.ac.uk/english-usage/ice/iceind.htm>)
ICE New Zealand (<http://www.ucl.ac.uk/english-usage/ice/icenz.htm>)
ICE Philippines (<http://www.ucl.ac.uk/english-usage/ice/icephil.htm>)
ICE Singapore (<http://www.ucl.ac.uk/english-usage/ice/icesin.htm>)

Hong Kong Corpus of Spoken English (HKCSE) [2 million spoken] (<http://engl.polyu.edu.hk/department/academicstaff/Personal/ChengWinnie/HKCorpus_SpokenEnglish.htm>)

The HKCSE has 2 million words representing four spoken genres of naturally occurring speech: academic discourse (e.g. lectures, seminars, workshops), business discourse (service encounters, meetings, job interviews), conversations, and public discourse (speeches, press briefings, discussion forums). The corpus is made up of about 200 hours of speech. All of this has been transcribed orthographically, and 53% is also prosodically transcribed.

Scottish Corpus of Texts and Speech (SCOTS) [3.2 million written and 0.8 million spoken] (<http://scottishcorpus.ac.uk/corpus>)

The SCOTS corpus provides data on the Scottish national variety of English. It is accessed via an on-line interface that allows searches for words or phrases, which can be filtered along a number of parameters. The texts cover the period from 1945 to 2007, with most of the spoken texts dating from 2000. See the website for the on-line search engine and more details.

7.2.4 Corpora representing academic/business English

Various corpora described elsewhere in this section are either wholly or partly made up of academic language (e.g. MICASE, BASE, HKCSE, BNC). In addition to these, the following may be of interest.

Louvain Corpus of Native English Essays (LOCNESS) [324,304 words written]

The LOCNESS corpus was compiled by the Centre for English Corpus Linguistics (CECL) at the Université Catholique de Louvain to provide a native baseline comparison to their ICLE learner corpus (see below). Mirroring the ICLE, it contains argumentative essays written by native university or pre-university students. They include 114 British A-level essays (60,209 words), 90 British university essays (95,695 words), and 232 American university essays (168,400 words). The corpus can be ordered from the CECL at <http://www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Icle/locness1.htm>.

Wolverhampton Business English Corpus [10 million written] (<http://www.elda.org/catalogue/en/text/W0028.html>)

The Wolverhampton Business English Corpus contains business English texts drawn from 23 websites between 1999 and 2000. It is available in original web formatting, plain text, and SGML encoded files. In addition, a much larger corpus of written professional English (currently 17 million words, eventually 100+ million?) is being developed by the Professional English Research Consortium (PERC) (<http://www.corpora.jp/~perc04>).





7.2.5 Corpora representing young native English

The Child Language Data Exchange System (CHILDES) [about 20 million words covering 25 languages] (<http://childes.psy.cmu.edu>)

The CHILDES database is a widely-used resource for L1 developmental language. Transcripts of normal English-speaking children make up about half of the total CHILDES database, with other components including language impairments and bilingual acquisition. The data are transcribed in CHAT format and can be analysed using the CLAN programs available on the CHILDES website. CLAN allows lexical, morpho-syntactic, discourse, and phonological analysis. The database, analysis tools, and documentation are freely available on the CHILDES website. CHILDES is the child language part of the wider TALKBANK system, which also includes databanks for aphasia and conversation analysis (<http://talkbank.org>).

Bergen Corpus of London Teenage Language (COLT) [500,000 words spoken] (<http://torvald.aksis.uib.no/colt>)

The COLT corpus was collected in 1993 and consists of the language of 13- to 17-year-old teenagers from five different boroughs in London. The speakers in the corpus are classified by age (six groups), gender, and social class (three classes), with most of the speech settings either school (48%) or home (32%). A 150,000 word subcorpus has been prosodically annotated. COLT can be ordered as part of the ICAME CD-ROM corpus collection, and holders of the CD-ROM can browse and search the corpus on-line.

7.2.6 Corpora representing learner English

International Corpus of Learner English (ICLE) [3+ million written] (<http://cecl.fltr.ucl.ac.be/Cecl-projects/icle/icle.htm>)

The ICLE is the foremost of a number of learner-oriented corpora to come out of Sylviane Granger’s Centre for English Corpus Linguistics (CECL) at the Université Catholique de Louvain. It contains over 3 million words from advanced learners of English from 21 L1s (although some are incomplete): Bulgarian, Brazilian Portuguese, Chinese, Czech, Dutch, Finnish, French, German, Greek, Italian, Japanese, Lithuanian, Norwegian, Pakistani, Polish, Portuguese, Russian, South African (Setswana), Spanish, Swedish, and Turkish. The data consist of argumentative university essays written on a set of similar topics. The original corpus was not tagged, but a tagged version is forthcoming. The ICLE is available on CD-ROM for €181.50 from the I6doc.com website (<http://www.i6doc.com/I6Doc/WebObjects/I6Doc5.woa/wa/ClientDA/i6doc?language=EN&wosid=IU6GpIybsfVSOed0Ay5piw>).




Louvain International Database of Spoken English Interlanguage (LINDSEI)
[100,000 spoken]
LINDSEI is another Louvain corpus, being the spoken counterpart to ICLE. Each L1 subcorpus will contain transcripts of 50 15-minute interviews with third- and fourth-year university students. There are currently about 100,000 words from French students, and the website reports that other mother tongue components are currently being compiled. The corpus is scheduled for release in the near future.

Japanese EFL Learner Corpus (JEFLL)
[700,000 words written]
(<http://jefll.corpuscobo.net>)
The JEFLL (directed by Yukio Tono) is a collection of 20-minute in-class free compositions written by more than 10,000 Japanese EFL learners, mainly junior and senior high school students. The essays in each subcorpus are comparable across topics, proficiency, school years, school types, and other factors. The corpus should be available for on-line query by the time this book is published.

A number of other learner corpora, each specific to one particular L1, are also available. See the general corpus references at the beginning and end of this section for more information.

7.2.7 Corpora representing languages other than English

Non-English corpora can be either parallel corpora or monolingual corpora. Parallel corpora contain versions of the same text in two or more languages, making it possible to compare those languages. Monolingual corpora are corpora of one non-English national language.

7.2.7.1 Parallel corpora

The Canadian Hansard Corpus
[various sizes]
A good example of a two-language parallel corpus is the Canadian Hansard Corpus. It contains legislative discourse from the country’s parliament published in French and English. One version has 1.3 million pairs of aligned text chunks (i.e. sentences or smaller fragments) from the Hansards (official records) of the 36th Canadian Parliament (1997–2000), making up about 2 million words each of English and French (USC) (<http://www.isi.edu/natural-language/download/hansard>). An on-line version (TransSearch: <http://transsearch.iro.umontreal.ca/help.cgi?topic=FAQ-tb&LJTLanguage=en&username>) includes all the Hansard texts from 1986 to 2006 (about 273 million words). Finally, a CD-ROM version covers the mid-1970s through 1988 (<http://www.ldc.upenn.edu/catalog/catalogentry.jsp?catalogID=LDC95T20>).





Richard Xiao’s survey (2008) describes a number of other two-language parallel corpora: English-Norwegian, English-Swedish, Slovene-English, Chinese-English, as well as several multiple-language parallel corpora.

European Parliament Proceedings Parallel Corpus (EUROPARL)
(<http://www.statmt.org/europarl>)
An interesting example of a multiple-language parallel corpus is EUROPARL. It covers proceedings from the European Parliament, and includes versions in 11 European languages: Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish, and Swedish, with around 40 million words for most languages. There are also ten parallel two-language corpora, in which each of the other languages is matched with English (e.g. Danish-English, German-English), each containing about 35 million English words and their parallel language equivalents. Although the language focuses on political issues, the availability of so many parallel languages offers many research opportunities. These corpus resources can be freely downloaded from the website.

7.2.7.2 Monolingual corpora

There is a welcome trend towards the compilation of more non-English corpora, and Table 7.2, taken from From Corpus to Classroom: Language Use and Language Teaching (O’Keeffe et al., 2007: 294–296), summarizes a number of these.

In addition, Mark Davies and the BYU website offer three Spanish/Portuguese corpora:

The Corpus del Español (CdE)
[100 million written Spanish]
(<http://www.corpusdelespanol.org>)
The Corpus del Español is the more comprehensive of the two Spanish corpora available online on the BYU website. It contains about 20 million words from the 1900s, 20 million from the 1800s, 40 million from the 1500s–1700s, and 20 million from the 1200s–1400s. It uses the same interface as other BYU corpora, which allows, among other things, searches and comparisons by frequency between the different genre/register categories (spoken, fiction, newspaper, and academic), and between different historical periods (the centuries from the 1200s to the 1900s).

Corpus del Español: Registers
[20 million written Spanish]
The Corpus del Español: Registers (Davies, Biber, Jones, and Tracy, 2008, <http://www.corpusdelespanol.org/registers>) is an enhanced version of the 1900s component of the Corpus del Español, which has been equally divided between the spoken, fiction, newspaper, and academic genre/registers.




Table 7.2 Some examples of non-English corpora

Banca dati dell’italiano parlato (BADIP)
▪ 500,000 words of spoken Italian developed at the University of Graz (Austria)
▪ Accessible on-line edition
<http://languageserver.uni-graz.at/badip/>

Basque Spoken Corpus
▪ 42 narratives by native Basque/Euskara speakers, who tell the story of a silent movie they have just watched to someone else
▪ Available with sound files in MP3 format as well as transcripts
<http://www.elda.org/catalogue/en/speech/S0123.html>

Chambers-Rostand Corpus of Journalistic French
▪ Almost 1 million words of journalistic French
▪ Made up of 1,723 articles published in 2002 and 2003, taken from three French daily newspapers: Le Monde, L’Humanité, La Dépêche du Midi
▪ Articles are categorized into types: editorial, cultural, sports, national news, international news, finance
<http://www.ota.ahds.ac.uk/%20texts/2491.html>

Chinese-English Translation Base
▪ More than 100,000 English translation units together with their Chinese translation equivalents and vice versa
<http://www.corpus.bham.ac.uk/ccl/Chinese.htm>

Corpus di Italiano Scritto (CORIS)
▪ 100 million words of written Italian sampled from categories such as press, academic prose, legal and administrative and ephemera
▪ Accessible online
<http://corpus.cilta.unibo.it:8080/CORISCorpQuery.html>

Corpas Náisiúnta na Gaeilge/National Corpus of Irish
▪ Consists of approximately 30 million words of text from a variety of contemporary books, newspapers, periodicals and dialogue
▪ Approximately 8 million words are SGML tagged
Corpas Na Gaeilge 1600–1882: The Irish language Corpus. 2004. Dublin: Royal Irish Academy.





Table 7.2 Continued

Corpus Oral de Referencia del Español Contemporáneo (COREC)
▪ 1,100,000 words of spoken Spanish collected at Universidad Autónoma de Madrid
▪ Administrative, scientists, conversational and familiar, education, humanistic, instructions (megafonía), legal, playful, politicians, journalistic
<http://www.lllf.uam.es/corpus/corpus_oral.html>
Sample of corpus: <http://www.lllf.uam.es/corpus/corpus_lee.html#B4>

The CREA corpus of Spanish
▪ 133 million words
▪ Sampled from a wide range of written (90%) and spoken (10%) text categories produced in all Spanish-speaking countries between 1975 and 1999 (divided into five-year periods). The domains covered in the corpus include science and technology, social sciences, religion and thought, politics and economics, arts, leisure and ordinary life, health, and fiction
▪ The texts in the corpus are distributed evenly between Spain and America
<http://www.rae.es/>
<http://corpus.rae.es/creanet.html>

Czech National Corpus (CNC)
▪ Written component: 100 million words including fiction and non-fiction texts
▪ Spoken component: 800,000 words of transcription of spontaneous spoken language sampled according to four sociolinguistic criteria: speaker sex, age, educational level, and discourse type
<http://ucnk.ff.cuni.cz/English/>

Hungarian National Corpus (HNC)
▪ 153.7 million words of texts produced from the mid-1990s onwards
▪ Divided into five subcorpora, each representing a written text type: media (52.7%), literature (9.43%), scientific texts (13.34%), official documents (12.95%), and informal texts (e.g. electronic forum discussion, 11.58%)
<http://corpus.nytud.hu/mnsz/index_eng.html>

Le corpus BAF (English-French parallel corpus)
▪ Circa 400,000 words per language
▪ Contains four subsets of texts: institutional, scientific articles, technical documentation, Jules Verne’s novel De la terre à la lune in French and English
<http://rali.iro.umontreal.ca/>




‘This site allows you to find the frequency of nearly 150 different grammatical features (pronouns, tense, clauses, etc.) in 20 different registers of Modern Spanish (e.g. conversation, fiction, newspapers, academic). It also lets you see examples of each of these constructions in context, from a 20 million word corpus of Spanish, taken from the 100 million word Corpus del Español.’ You can perform two types of searches: finding the frequency of a given feature in all 20 registers, or finding which features are more common in one register than in another register (e.g. conversation versus newspapers).

The Corpus do Português (CdP)
[45 million written Portuguese]
The Corpus do Português website (Davies and Ferreira, 2006, <http://www.corpusdoportugues.org>) is similar to the CdE site, except that it gives access to Portuguese texts. It contains more than 45 million words in almost 57,000 Portuguese texts. ‘There are 20 million words from the 1900s, 10 million from the 1800s, and 15 million words from the 1300s–1700s. For the 1900s, there are 6 million words from fiction, 6 million from newspapers and magazines, 6 million from academic texts, and 2 million from spoken. For each of these four genres (and therefore overall) the texts from the 1900s are evenly divided between texts from Portugal and texts from Brazil.’ The interface does all of the things mentioned in previous discussions of BYU corpora, including comparing the frequency and distribution of words, phrases, and grammatical constructions across texts, by genre/register, dialect (European and Brazilian Portuguese), and historical period (from the 1300s to the 1900s).

There are a number of other non-English corpora in various stages of compilation. Some of those which were completed by the time of publishing are briefly introduced below.

Lancaster Corpus of Mandarin Chinese (LCMC)
(<http://www.lancs.ac.uk/fass/projects/corpus/LCMC>)
A 1 million word Mandarin corpus which is part of the ‘Brown family’, and so has 500 samples of written text, each of about 2,000 words. The sampling window was 1991 +/– 3 years.

Table 7.2 Continued

TRACTOR archive
▪ Contains monolingual and multilingual language resources available on-line in the following languages: Bulgarian, Croatian, Czech, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Polish, Romanian, Russian, Serbian, Slovak, Slovene, Swedish, Turkish, Ukrainian, and Uzbek
<http://www.corpus.bham.ac.uk/ccl/services.htm#tractor>

(O’Keeffe et al., 2007: 294–296)





The Reference Corpus of Polish (PELCRA)
(<http://korpus.ia.uni.lodz.pl>)
The PELCRA corpus has 93 million words in 81,000 texts searchable on-line with a number of analysis tools.

Russian Reference Corpus (BOKR)
(<http://corpus.leeds.ac.uk/ruscorpora.html>)
The BOKR was designed as a Russian counterpart to the BNC. It contains 150 million words of modern Russian, following the sampling framework of the BNC. It can be searched on-line at <http://ruscorpora.ru/en/index.html>.

Hellenic National Corpus
(<http://hnc.ilsp.gr/en>)
The Hellenic corpus contains 47 million words of written modern Greek, mostly sampled from 1990 onwards. It can be searched on-line for free.

German National Corpus
(<http://www.dwds.de/cgi-bin/rest/loginstart>)
The German National Corpus is made up of two parts. The first is a balanced 100 million word ‘core’ roughly comparable to the BNC, and the second is a much larger opportunistic subcorpus. The website appears to be available in German only for the moment.

Another project is the huge German tracking corpus (currently more than 3 billion words) being compiled by the Institut für Deutsche Sprache in Mannheim. It can be accessed through the COSMAS 2 interface at <http://www.ids-mannheim.de/cosmas2>.

Slovak National Corpus
(<http://korpus.juls.savba.sk/index.en.html>)
The aim is to build the Slovak National Corpus into a 200 million word reference. At the time of writing, it had reached 30 million words and was searchable.

Again, the websites at the end of this section and Xiao’s website/survey list a number of other non-English monolingual corpora, including Chinese, German, Portuguese, Dutch, Welsh, and Czech.

7.2.8 Corpus compilations

If you are an academic researcher, one way of easily obtaining multiple corpus resources is to buy the ICAME (the International Computer Archive of Modern and Medieval English) CD-ROM which includes 21 corpora for Norwegian Kroner 3,500 (US$690/UK£344/€435, as of July 2008). Below is a listing of these with a brief description. See <http://icame.uib.no> for purchasing information, samples from the corpora, and links to their manuals.




Written:
1. Brown Corpus untagged/tagged
2. LOB Corpus untagged/tagged
3. Freiburg-LOB (FLOB)
4. Freiburg-Brown (Frown)
5. Kolhapur Corpus (India)
6. Australian Corpus of English (ACE)
7. Wellington Written Corpus (New Zealand)
8. The International Corpus of English – East African component

Spoken:
9. London Lund Corpus
10. Lancaster/IBM Spoken English Corpus (SEC)
11. Corpus of London Teenage Language (COLT)
12. Wellington Spoken Corpus (New Zealand)
13. The International Corpus of English – East African component

Historical:
14. The Helsinki Corpus of English Texts: Diachronic Part
15. The Helsinki Corpus of Older Scots
16. Corpus of Early English Correspondence, sampler
17. The Newdigate Newsletters
18. Lampeter Corpus
19. Innsbruck Computer-Archive of Machine-Readable English Texts (ICAMET)

Parsed:
20. Polytechnic of Wales Corpus
21. Lancaster Parsed Corpus (LOB)

The Lexical Computing website also offers access to a number of English, Chinese, French, German, Greek, Italian, Japanese, Persian, Portuguese, Russian, Slovenian, and Spanish corpora, including the BNC and the BASE for a yearly €55 site licence subscription fee. See <http://www.sketchengine.co.uk> for details.

Other distributors of corpus resources include:

● Centre for Spoken Language Understanding (<http://cslu.cse.ogi.edu/corpora/corpCurrent.html>)
● European Language Resources Association (<http://www.elra.info>)
● European Network in Language and Speech (<http://www.elsnet.org>)
● European National Activities for Basic Language Resources (<http://www.ilsp.gr/enabler/search_sel.asp>)
● Linguistic Data Consortium (<http://www.ldc.upenn.edu>)
● Oxford Text Archive (<http://ota.ahds.ac.uk>)
● Trans-European Language Resources Infrastructure (<http://www.tractor.de>)

7.2.9 Web-based sources of corpora

The discussion above has covered electronic corpora, but what about using the internet as a source, given that its massive size is something no corpus can match? It definitely has its uses (see below), but first it is useful to highlight its limitations. Although Google (and other search engines) are good at finding internet matches to user queries, Mark Davies (<http://www.americancorpus.org>) outlines a number of advantages corpora have over internet searches. First, internet search engines are not good at semantically-based searches, because they do not identify collocates, and much of the meaning of lexical items depends on the context those items reside in, including their collocations. Also, search engines cannot use collocates to compare word meanings in different genres, or to see how they are changing over time. It is also difficult to search by words that are related in meaning, such as all of the synonyms of a given word.

Second, the search engines do not allow searches by part of speech or lemma (e.g. all of the forms of a word), which makes grammatical analysis difficult. Third, search engines do not really facilitate researching the changes in linguistic elements over time (e.g. is wireless used more or less now than twenty years ago?). Fourth, they are not very handy for looking at differences between different styles or types of English.

Fifth, you need to know the words you are looking for with search engines in order to type in the query. The engines are not designed for you to set up parameters and then let them find relevant words for you. On the other hand, corpus concordancers will produce lists of words for you, e.g. the most frequent words in academic speech. Finally, search engines do not give accurate figures for the frequency of word strings. Davies relates a search he made for the word string might be taken for a. Google showed 92,400 hits. However, when he paged through the hits, they ran out at about 530. According to Davies, ‘Google usually doesn’t “know” the frequency of anything more than single words – it’s usually just guessing.’ In this case, the Google ‘guess’ was about 200 times what it should have been.

David Lee (the Web as a Corpus page of his www.devoted.to/corpora website) adds several more ‘quality control’ warnings: (1) not everything on the web is language of a high standard, as many native speakers write (and type) badly; (2) nonnative speakers of a language (especially English) put up web pages too, and the quality is highly variable (just as with natives); (3) search engines such as Google give different results on different days, and have gaps, omissions and inclusions that are hard to explain (due to copyrighted, proprietary technology). Finally, Ute Römer pointed out to me that corpora are usually compiled in a principled way to answer particular research questions. However, with the web, you do not know what the total ‘corpus’ consists of, which makes reasonable analysis and interpretation difficult.

Nevertheless, given the vast amount of language on the internet, it remains a very interesting alternative. This is particularly true for languages with no established corpora available (still the majority of languages around the world). There are several ways to harness the internet. The most basic is to use a search engine like Google, but this has serious disadvantages as we have seen. A better approach is to use software specially designed to extract specified elements from the vast internet pool. In this approach, software can extract words/phrases/texts from the internet according to pre-determined criteria.

One good way to do this is through Webcorp (<http://www.webcorp.org.uk>). It is a suite of tools which allows access to the internet as a corpus. It works like a normal concordancer, providing frequency lists (in frequency or alphabetical order) of particular web pages. This can be useful for finding words/phrases which are too new or too rare to appear in normal corpora. Webcorp also supplies concordance lines for target lexical items, which can be sorted left or right of the node. You can use wildcards and lemma searches. You can also limit the search to particular site domains. Perhaps the most exciting feature is the ability to call up the collocates of the target search item (something search engines like Google cannot do). Webcorp’s abilities begin to address many of the shortcomings outlined above, and with it, the internet can start to be used as a legitimate corpus-like resource. However, there is one snag limiting Webcorp’s utility, and that is its speed. The searches are slow (they reminded me of my multi-minute corpus searches in the mid-1990s), but the website suggests that increases in speed will be forthcoming.
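For readers who have not seen what a concordancer does with a node word, the following minimal Python sketch builds keyword-in-context (KWIC) lines from a plain string and sorts them to the left or right of the node. It is not Webcorp’s code and makes no claim about how Webcorp works internally; the function names (tokenize, kwic, sort_left, sort_right) and the sample sentence are invented purely for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def kwic(tokens, node, span=4):
    """Return (left context, node, right context) for every hit of node."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            lines.append((left, tok, right))
    return lines

def sort_right(lines):
    """Sort concordance lines by the words to the right of the node."""
    return sorted(lines, key=lambda line: line[2])

def sort_left(lines):
    """Sort by the words to the left of the node, nearest word first."""
    return sorted(lines, key=lambda line: line[0][::-1])

# Hypothetical sample text, standing in for a downloaded web page.
text = ("The end of the war came late. At the end of the day we left. "
        "The end is near.")
tokens = tokenize(text)
hits = kwic(tokens, "end")

for left, node, right in sort_right(hits):
    print(f"{' '.join(left):>20}  {node}  {' '.join(right)}")
```

Sorting on the reversed left context means that lines are grouped by the word immediately before the node, which is the usual reason for sorting a concordance to the left.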

A software program designed to select texts from the Internet according to predefined criteria is REAP (<http://reap.cs.cmu.edu>). It is a pedagogical tool which first tests learners for their vocabulary knowledge, and then selects texts from the internet based on the learner’s indication of interest in a number of topic categories (e.g. science, sports), and other criteria such as reading level, text contiguity, and length. Those texts then have the target vocabulary highlighted with hyperlinks that take the learner to an electronic dictionary entry for the target word. In addition, any other word in the text can be clicked on to call up the dictionary entry for it. REAP tracks the learner’s progress (completed documents, word look-ups, vocabulary exercises completed, comprehension question results, and a list of the learner’s focus vocabulary), and this is summarized for the teacher or researcher in a table format. Although the purpose of the program is pedagogical, it should also be possible to use it for text selection when building a corpus from the internet, in order to achieve some sort of balancing. It is also an exciting research tool for the acquisition of vocabulary from both incidental (reading) and intentional (dictionary glossing and vocabulary activities) approaches.

David Lee’s website also lists a host of other concordancers designed to analyze the web, including KWiCFinder/WebKWiC and WebConc.

7.2.10 Bibliographies concerning corpora

In addition to the corpora references already given, the following have a great deal of useful information:

Stanford Natural Language Processing Group
(<http://nlp.Stanford.edu/links/statnlp.html>)
The NLP list of resources contains references to a large number of corpora, corpus tools, and other corpus-related material.

Yukio Tono’s corpus website
(<http://leo.meikai.ac.jp/~tono/lcorpuslist.html>)
This site gives information on a number of learner corpora in Europe, America, and Asia.

Mike Scott’s Web
(<http://www.lexically.net/wordsmith/corpus_linguistics_links/index.html>)
Mike Scott’s links to other corpus linguistics websites.

Yvonne Breyer’s Gateway to Corpus Linguistics website
(<http://www.corpus-linguistics.de>)
Descriptions and links to corpora, concordancers, markup tools, a bibliography, research centers, and other corpus resources.

7.3 Concordancers/tools

There is now a large and diverse array of language analysis tools available. David Lee’s website has perhaps the most comprehensive listing of these resources. I have selected some of the better-known and more widely used tools for comment below, but see his website for a much, much wider range of possibilities.

Concordancing packages

WordSmith Tools
(<http://www.lexically.net/wordsmith/index.html>)
This is the concordancing package of choice among most of the corpus linguists that I know. It does most of what you want, and more recent versions have been better able to handle the larger corpora, like the BNC. The most recent 5.0 version also has ConcGrams, which allows the investigation of ‘open slot’ formulaic sequences (see below). It is available for download for £50. If you are serious about vocabulary research, it is well worth the money.

MonoConc Pro
(<http://athel.com/product_info.php?products_id=28>)
This is the other major player in commercial concordancers. It is similar to WordSmith, but perhaps not quite as customizable. It costs $85.

AntConc
(<http://www.antlab.sci.waseda.ac.jp/software.html>)
David Lee considers this the best free concordancer available. It does most of the things that commercial concordancers do, including frequencies, concordances, collocations, and clusters/N-Grams.

Web-based concordancers

Wmatrix
(<http://ucrel.lancs.ac.uk/wmatrix>)
Wmatrix is a suite of tools developed by Paul Rayson for corpus annotation and analysis. It is a web-based environment which is accessed via a web browser through a password (after a one-month trial period has expired, a yearly fee of around £50 applies). A ‘tag wizard’ will automatically tag your corpus for grammatical part-of-speech with the CLAWS utility, and for semantic category with the USAS utility. Wmatrix also generates frequency lists (lemmatized option available) and concordances, either all-inclusive or according to the POS and semantic tags. The program also does keyword comparisons and N-Grams. An interesting addition is Collapsed-Grams (C-Grams), which are merged versions of the N-Gram lists. They show you which 2-grams are subsets of 3-grams, which 3-grams are subsets of 4-grams, and so forth:1

at the end of world war ii
the end of world war ii
the end of world war
end of world war
end of world

Phrases in English (PIE)
(<http://pie.usna.edu>)
PIE (by Bill Fletcher) allows the free on-line phraseological interrogation of the BNC (World version) in strings up to eight words long. It incorporates a database of N-Grams from one to six words long occurring three or more times in the BNC. It is possible to explore the N-Gram lists and their frequencies, search for specific N-Grams and/or their collocations, e.g. 2-grams of the pattern ‘ADJ day’, to find the most frequent adjectives describing day. There is a phrase pattern discovery tool, and the ability to see N-Gram concordances from the BNC. In essence, PIE is a kfNgram program (see below) for the BNC.
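The subset relationship behind the Collapsed-Grams described under Wmatrix above can be illustrated with a short Python sketch. It takes the world war ii example, generates its 3-grams to 7-grams, and records for each N-Gram the longer N-Grams (one word longer) that contain it. This is only an illustration of the idea; it is not how Wmatrix itself is implemented, and the function names (ngrams, contains, collapse) are my own.

```python
def ngrams(tokens, n):
    """Return all contiguous n-word sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def contains(longer, shorter):
    """True if shorter occurs as a contiguous run inside longer."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter
               for i in range(len(longer) - k + 1))

def collapse(grams):
    """Map each n-gram to the (n+1)-grams in the list that contain it."""
    grams = set(grams)
    return {
        g: sorted(h for h in grams
                  if len(h) == len(g) + 1 and contains(h, g))
        for g in grams
    }

tokens = "at the end of world war ii".split()
grams = [g for n in range(3, 8) for g in ngrams(tokens, n)]
nested = collapse(grams)

# Which 4-grams is the 3-gram 'end of world' a subset of?
for parent in nested[("end", "of", "world")]:
    print(" ".join(parent))
```

A real C-Gram list would also carry the frequency of each N-Gram, so that a shorter sequence which never occurs outside its longer parent could be folded into it.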

Word Neighbors
(<http://wordneighbors.ust.hk>)
Word Neighbors is a basic pedagogically-oriented web concordancer which might be useful to researchers as a quick reference source. It shows the various derivatives of the target word or phrase and their frequencies, e.g. for stimulate, it shows stimulus (n. 2107), stimuli (n. 1276), stimulation (n. 962), stimulants (n. 85), stimulant (n. 79), stimulate (v. 1235), stimulated (v. 821), stimulates (v. 236), stimulating (v. 130), stimulating (adj. 691), stimulated (adj. 197), stimulant (adj. 23). Word Neighbors also gives collocations up to a +/– 4 word span, and looks for phrases up to 7 words. The program is linked to several dictionaries for definitions. Word Neighbors’ limitations include sometimes questionable part-of-speech classifications. However, this is a problem common to all corpora which have been coded by automatic tagging software, and the relevant caveat is prominently given on the web page. Also, while the available texts total a very respectable 141 million words (divided into seven categories), there is no documentation concerning the source of these texts, and so it is difficult to know how representative they are. Nevertheless, Word Neighbors is likely to be a useful first point of reference when researchers need a quick idea of a word or phrase’s characteristics.
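What a collocation span of +/– 4 words means in practice can be shown with a few lines of Python. The sketch below counts every word that falls within four tokens either side of a node word. It is a generic illustration rather than a description of how Word Neighbors computes its results, and the function name collocates and the toy sentence are invented.

```python
from collections import Counter

def collocates(tokens, node, span=4):
    """Count words within +/- span tokens of each occurrence of node."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Words to the left and right of the node, excluding the node itself.
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        counts.update(window)
    return counts

# Hypothetical toy text; a real study would read tokens from a corpus file.
tokens = ("strong tea is better than weak tea but strong coffee "
          "beats strong tea every day").split()
tea = collocates(tokens, "tea", span=4)

for word, freq in tea.most_common(3):
    print(word, freq)
```

Raw co-occurrence counts like these are normally the input to an association measure, since very frequent words will appear in almost any window.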

Just the Word
(<http://193.133.140.102/JustTheWord>)
A very quick and easy website that directly gives collocations for a search word, without the concordance lines. It shows results by POS, and graph bars give an indication of the t-score strength. Results are based on an 80 million word subset of the BNC.
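For readers unfamiliar with the t-score, the sketch below shows one widely used approximation from the collocation literature: the observed frequency of the word pair minus the frequency expected by chance, divided by the square root of the observed frequency. Whether Just the Word calculates its t-scores in exactly this way is an assumption on my part, and the figures in the example are hypothetical rather than taken from the BNC.

```python
import math

def t_score(pair_freq, node_freq, collocate_freq, corpus_size):
    """Approximate collocation t-score: (O - E) / sqrt(O).

    O is the observed frequency of the pair; E is the frequency expected
    if the two words were distributed independently in the corpus.
    """
    expected = node_freq * collocate_freq / corpus_size
    return (pair_freq - expected) / math.sqrt(pair_freq)

# Hypothetical counts: the pair occurs 100 times, the node 1,000 times and
# the collocate 2,000 times in a corpus of 1,000,000 words.
score = t_score(pair_freq=100, node_freq=1000,
                collocate_freq=2000, corpus_size=1_000_000)
print(round(score, 2))
```

Because the denominator depends on the observed frequency, the t-score tends to favour frequent, well-attested combinations over rare but exclusive ones.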

Concordancers for identifying ‘open slot’ patterns

kfNgram (<http://www.kwicfinder.com/kfNgram/kfNgramHelp.html>)

A number of concordancing programs generate N-Grams, that is, contiguous sequences of words of varying length, for example 2-grams (bigrams), 3-grams (trigrams), 4-grams, and so on. However, as previously explained, some of the most interesting and potentially important patterning in language consists of sequences with some fixed elements and some ‘open slot’ elements (see Section 3.5). There are now software programs available which can identify these flexible sequences. One of these programs is kfNgram, written by Bill Fletcher, and freely available. It generates lists of N-Grams, but also of ‘phrase-frames’ (also known as skipgrams), i.e. groups of N-Grams which are identical but for a single word. An example from the website illustrates this. From BNC written text data, kfNgram identified the open slot sequence as * as the, with a frequency of 4,566, and 5 variants. This information is shown as follows:

as * as the       4566   5
  as well as the  2674
  as far as the    874
  as soon as the   652
  as long as the   316
  as much as the    50
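The grouping behind a phrase-frame list can be sketched in a few lines of Python. This is only a minimal illustration of the idea described above, not kfNgram's own code: the function names, the toy sentence, and the decision to allow the open slot in any position are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """Yield every contiguous n-word sequence as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])

def phrase_frames(tokens, n=4, min_variants=2):
    """Group n-grams that are identical except for one word position.

    Returns {frame: Counter(filler -> frequency)}, where the frame is the
    n-gram with '*' standing in for the open slot.
    """
    counts = Counter(ngrams(tokens, n))
    frames = defaultdict(Counter)
    for gram, freq in counts.items():
        for slot in range(n):  # assumption: the slot may be any position
            frame = gram[:slot] + ("*",) + gram[slot + 1:]
            frames[frame][gram[slot]] += freq
    # Keep only frames whose slot is filled by several different words.
    return {f: v for f, v in frames.items() if len(v) >= min_variants}

# Toy input invented for illustration (not BNC data).
text = ("as well as the first as far as the second "
        "as well as the third as soon as the last")
frames = phrase_frames(text.split())
fillers = frames[("as", "*", "as", "the")]
print(sum(fillers.values()), len(fillers))  # frame frequency, number of variants
for word, freq in fillers.most_common():
    print(f"as {word} as the", freq)
```

The frame frequency is simply the sum of the frequencies of its variants, which is why the five variant counts in the listing above add up to the 4,566 reported for the frame.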

ConcGrams (<http://www.edict.com.hk/concordance/default.htm>)

Another program which can identify open slot sequences is the ConcGram List Builder ($20). It is similar to kfNgram, but has the advantage that the words do not have to be in the same order. This is important, as the following discussion summarized from the website explains.

There are many formulaic sequences which do not occur in one fixed grammatical pattern. The relationships between verbs/adverbs, verbs/nouns, nouns/adverbs, quantifier/noun, and many other sequence components are flexible and may occur in non-fixed patterns. For example, most adjectives can be used both attributively and predicatively. The bigram challenging exercise would show in an N-Gram search, but when the adjective is used predicatively as in the exercise turned out to be quite challenging, it would not. The positions of challenging relative to exercise in these two cases would be –1 and +6 respectively, but in both cases, the collocation is important.

The result is that many co-occurrence patterns that occur in non-contiguous sequences may not be discovered by traditional N-Gram analysis. An additional problem is that user-nominated searches are limited by the requirement that the user must enter (and therefore know) items to enable the search to take place. The automated concgram search provided by ConcGram is able to reveal all formulaic sequence patterns (both contiguous and non-contiguous) in a corpus, with both positional (AB, BA) and constituent (ACB) variation, and, since it is automated, the user does not have to first enter one or more search items.

To do this, it starts by automatically extracting all of the 2-word concgrams in the corpus. It can then use these to build up a list of 3-word, 4-word, and 5-word concgrams. Alternatively, the user can select the initial word to build from. Either way, quite a lot of patterns are identified, and so the program also has statistical tests for determining statistically significant cut-off points to limit the lists to the most important sequences. The t-score test will tend to identify the more frequent patterns (customer service) and the MI score will highlight the strongly associated (Coca Cola).
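The two ideas in this description, order-free co-occurrence within a span and the t-score and MI measures, can be sketched together as follows. This is a rough illustration under stated assumptions rather than a description of how ConcGram itself computes its figures: the function names and toy sentences are invented, pairs are counted within sentences only, and the expected frequency uses the simplest textbook form with no correction for span size.

```python
import math
from collections import Counter

def pair_counts(sentences, span=4):
    """Count unordered word pairs that co-occur within `span` words.

    A frozenset key means AB and BA are counted together, and the window
    means intervening words (A C B) do not block the match.
    """
    pairs = Counter()
    for tokens in sentences:
        for i, left in enumerate(tokens):
            for right in tokens[i + 1:i + 1 + span]:
                if left != right:
                    pairs[frozenset((left, right))] += 1
    return pairs

def association(sentences, a, b, span=4):
    """Return (observed, t_score, mi) for the pair a, b."""
    observed = pair_counts(sentences, span)[frozenset((a, b))]
    if observed == 0:
        return 0, 0.0, float("-inf")
    freq = Counter(tok for tokens in sentences for tok in tokens)
    n = sum(freq.values())
    # Co-occurrences expected if a and b were independent (simplified).
    expected = freq[a] * freq[b] / n
    t_score = (observed - expected) / math.sqrt(observed)
    mi = math.log2(observed / expected)
    return observed, t_score, mi

# Toy sentences invented for illustration.
sentences = [
    "a challenging exercise".split(),
    "the exercise turned out to be quite challenging".split(),
    "the exercise was easy".split(),
]
adjacent = pair_counts(sentences, span=1)[frozenset(("challenging", "exercise"))]
observed, t_score, mi = association(sentences, "challenging", "exercise", span=6)
print(adjacent, observed, round(t_score, 2), round(mi, 2))
```

With a span of 1 only the attributive use is found, while a span of 6 also picks up the predicative use. Because the t-score divides by the square root of the observed count, it tends to favour frequent pairs, whereas MI is a ratio and so favours pairs whose members rarely occur apart.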

One shortcoming of ConcGrams is that it is too slow to be practical for large corpora, e.g. the BNC, but works fine with smaller corpora/subcorpora of 1 or 2 million words. While this may be addressed in future versions of ConcGrams, for the moment, the current versions of kfNgram and Collocate work better with larger corpora.

Collocate (<http://www.athel.com>)

Michael Barlow’s Collocate program ($45) does similar things to the above two programs, and includes frequency and statistical information about collocations and N-Grams found. It also allows the user to specify search words around which N-Gram extractions are made. The program seems to work with larger corpora, with Barlow illustrating an analysis on a 46 million word corpus.

FrameNet/FrameGrapher (<http://framenet.icsi.berkeley.edu>)

The Berkeley FrameNet project has developed a very interesting alternative way of looking at the patterning in language based on Charles Fillmore’s ideas on frame semantics. While the above packages search for lexical strings with both fixed and open components, FrameNet (and its graphic interface FrameGrapher, both free) illustrate the semantic relationships around a word based on case relationships (e.g. agent, goal, circumstances, degree). The program has 825+ ‘frames’ ranging from abandonment, abounding_with, absorb_heat, all the way to within_distance, word_relations, and working_on. A very small sample (only 6 of the 19 case information segments) of the frame ‘Destroying’ from the website illustrates the kind of information that FrameNet provides. The output is in color; in the example below, the colors are rendered as follows:

Red = BOLD CAPS
Light blue = ITALIC CAPS
Sky blue = Bold + Italics CAPS
Dark blue = UNDERLINED CAPS
Black = BOLD + UNDERLINED CAPS

Destroying

Definition: A DESTROYER (a conscious entity) or CAUSE (an event, or an entity involved in such an event) affects the UNDERGOER negatively so that the UNDERGOER no longer exists.




Core:

CAUSE [CAUSE]
The event or entity which is responsible for the destruction of the UNDERGOER.
TORNADOS VAPORIZED this town a few decades back.

DESTROYER [AGT]
Semantic Type: Sentient
The conscious entity, generally a person, that performs the intentional action that results in the UNDERGOER’s destruction.
WHO can UNMAKE the ring?

UNDERGOER [UND]
The entity which is destroyed by the DESTROYER.
Who can UNMAKE THE RING?

Non-Core:

DEGREE [DEGR]
Semantic Type: Degree
The degree to which the destruction is completed.
I DESTROYED all signs of our presence completely.

MEANS [MNS]
Semantic Type: State_of_affairs
An intentional action performed by the DESTROYER that accomplishes the destruction.
Samptu OBLITERATED the land of Abde WITH A GREAT FLOOD, leaving only the sea.

Lexical Units
annihilate.v, annihilation.n, blow up.v, demolish.v, demolition.n, destroy.v, destruction.n, destructive.a, devastate.v, devastation.n, dismantle.v, dismantlement.n, lay_waste.v, level.v, obliterate.v, obliteration.n, raze.v, unmake.v, vaporize.v

In addition, the case relationships are illustrated graphically:

Thus FrameGrapher gives the ‘big picture’ of how words associate with each other to create meaning, rather than just which words sequence together. As such, it would seem to be a very useful complement to N-Gram/ConcGram analyses. In addition to the English version, Spanish FrameNet had just been released at the time of writing.





Compleat Lexical Tutor (Lextutor)

(<http://www.lextutor.ca>)

This website is so good it deserves a category of its own. Created and continuously updated and improved by Tom Cobb in Montreal, Lextutor is the most essential tool in the vocabulary researcher’s toolbox. It has a number of really useful functions, some of which are described below. Fabulous.

● Frequency analysis: Cut and paste a text into the web window (alternatively download larger texts) and Lextutor tells you which frequency band the words in the text belong to, up to the 20,000th level (which will typically be all or nearly all of the words). The results are given in three ways. First, a frequency summary is given, showing what percentage of the text lies in each frequency band (see Table 5.3, p. 209, for an example of this, although it does not do justice to the colorized web output). Second, the text is given, with each word color-coded for frequency. Finally, lists of the words in each frequency band are given, according to token, type, and word family. This tool is excellent for getting an overview of the frequency profile of a text, and for highlighting low-frequency vocabulary that may be a problem for lower-proficiency learners in a study.
● Range analysis: The Range programs tell you about the distribution of words or other lexical units across a set of two or more texts. The texts can be comparable corpora or subdivisions of a corpus, or a set of texts supplied by a user. Lextutor can use its internal corpora to make comparisons between speech and writing in English (using BNC Sampler data), between speech and writing in French (150,000 words of each), and between the Press, Academic, and Fiction components of the Brown Corpus. You can also upload up to 25 of your own texts and see how many of them each word appears in, and in which specific texts it appears.

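[Figure: FrameGrapher diagram of the case relationships around the ‘Destroying’ frame. Recoverable node labels: Event, Objective_influence, Transitive_action, Change_of_state_initial_state, Change_of_state_endstate, Cause_to_fragment, Destroying; ‘38 children total’. (FrameGrapher, accessed July 2008)]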




● Vocabulary tests: There are several vocabulary tests available, including the Vocabulary Levels Test, Vocabulary Size Test, the Word Associates Test, a test of the first 1,000 words, and a checklist test.
● Other tools: A range of other corpus-based research tools includes a concordancer, frequency word lists, an N-Gram extractor, a frequency level-based cloze passage generator and a traditional nth-word cloze builder. There are also tools for helping to build your own corpora.
● Reaction time experiment builder: Lextutor has ventured into the psycholinguistic paradigm with a basic reaction-time experiment builder. You type in the words to be recognized, and the nonword distractors, and the program will build a word-recognition experiment where participants type 1 for ‘real word’, 3 for ‘nonword’, and 2 to move to the next stimulus. It then gives reaction time summaries for each of the real words.
● Pedagogical tools: It is important to note that Lextutor is as useful for pedagogic purposes as research ones, with features such as concordance line builders, spelling information and activities, and cloze builders. Teachers would be well advised to become familiar with these and other Lextutor features.
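The logic behind the frequency and range analyses listed above is simple enough to sketch. The following is only an illustration under loud assumptions, not Lextutor's implementation: the band lists are hypothetical miniatures (real profilers use lists of thousands of word families per band), words are matched as exact forms rather than families, and the function names are invented.

```python
from collections import Counter

# Hypothetical miniature band lists, for illustration only.
BANDS = {
    "1K": {"the", "a", "was", "in", "of", "and", "cat", "house"},
    "2K": {"garden", "quiet"},
}

def frequency_profile(tokens, bands=BANDS):
    """Return {band: percentage of tokens}, with 'off-list' for the rest."""
    if not tokens:
        return {}
    tally = Counter()
    for tok in tokens:
        band = next((name for name, words in bands.items() if tok in words),
                    "off-list")
        tally[band] += 1
    return {band: 100 * n / len(tokens) for band, n in tally.items()}

def word_range(texts):
    """Return {word: number of texts in which it occurs}."""
    spread = Counter()
    for tokens in texts:
        spread.update(set(tokens))  # count each word once per text
    return spread

texts = [
    "the cat was in the quiet garden".split(),
    "a house and a garden".split(),
    "the serendipity of the cat".split(),
]
for tokens in texts:
    print(frequency_profile(tokens))
ranges = word_range(texts)
print(ranges["garden"], ranges["serendipity"])
```

A word with high frequency but low range is concentrated in a few texts, which is the kind of distributional information that a frequency count alone does not show.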

Tools for showing semantic associations

WordNet (<http://wordnet.princeton.edu>)

WordNet is a freely-downloadable program which provides a range of information about queried words. It first acts like a dictionary, giving the various meaning senses, with definitions and examples. It then shows the various derived forms. It also gives thesaurus-like information, providing lists of synonyms, antonyms, hypernyms (X is one way to ...), and troponyms (particular ways to ...), as well as information as to how commonly the word is used. It is a quick and easy resource for obtaining semantically-based information about vocabulary of interest.

WordNet is perhaps most accessible with a graphical interface, so that all of the associative links are more obvious. One free internet site that does this is the Visuwords Online Graphical Dictionary (<http://www.visuwords.com>). You type in a word, and it produces a 3D network of the connections, color-coded for word class (nouns = blue, verbs = green) and connection type (is a part of = turquoise, opposes = red). Rolling the cursor over any of the nodes brings up definitions and examples. A commercial graphical interface, Visual Thesaurus, costs $40 and has more features (<http://www.visualthesaurus.com>). It allows you to rotate the 3D networks in any direction, and when you click on any of the nodes, that node automatically starts its own new network. This makes browsing through the semantic space around a topic area very easy.





Experiment generator packages

E-Prime (<http://www.pstnet.com/products/e-prime>)

This is one of the mainstream commercial programs which facilitate the design, administration, and analysis of psycholinguistic research designs, such as word recognition, reaction time, and a multitude of others. It takes some time to learn, but once mastered, it allows great flexibility in research design, very precise timing of experimental output, and easy randomization of stimuli. It is expensive ($795–995) and so is probably more practical for research groups or university departments than for individual researchers.

DMDX (<http://www.u.arizona.edu/~kforster/dmdx/dmdx.htm>)

DMDX is a Win 32-based display system used in psychological laboratories around the world to measure reaction times to visual and auditory stimuli. It was programmed by Jonathan Forster at the University of Arizona. It is free software, but requires some technical expertise to use.

MiniJudge (<http://www.ccunix.ccu.edu.tw/~lngproc/MiniJudge.htm>)

MiniJudge is a free on-line tool designed to allow researchers to gather participant judgements about linguistic features. Although originally created for syntacticians, it looks as if it can also be used for lexical judgements.

Program for handling speech

PRAAT (<http://www.fon.hum.uva.nl/praat>)

Lexical researchers interested in working with oral vocabulary might want to consider PRAAT, a free, comprehensive speech analysis, synthesis, and manipulation package. It can analyze speech according to a number of parameters, manipulate existing speech (e.g. change pitch and duration contours), synthesize new speech, and set up listening experiments.

Translation software

There are several internet translators available (e.g. iGoogle Translate, and Babel Fish), as well as numerous commercial translation packages, many of them expensive. Babylon is one that gets good reviews, and a free version is available at their website (<http://www.Babylon.com>).

Statistical packages and other analysis tools

SPSS (<http://www.spss.com>)

The most widely-used statistical package in applied linguistics is SPSS (Statistical Package for the Social Sciences). It carries out a wide range of statistical procedures, and is widely supported by after-market instructional material. My students and I have found Andy Field’s (2005) manual to be very useful, explaining the statistics themselves and their underlying assumptions, and also giving clear step-by-step instructions on how to make SPSS work. SPSS comes in modules, and is expensive, but there seem to be discounts for academic staff and students.

AMOS

Part of the SPSS family, AMOS is a popular package for carrying out structural equation modelling.

NVivo (<http://www.qsrinternational.com/products_nvivo,aspx>)

NVivo is a software program that facilitates the organization and exploration of qualitative data, helping to find trends in output like interview data. It allows the user to interrogate the data and create categories and connections, which NVivo can then graphically illustrate, helping to make the underlying patterns more salient. It can thus be useful for organizing and understanding qualitative data, such as learners’ opinions about vocabulary learning which are gathered during open-ended interview sessions or through the collection of their email messages.

ITEMAN (<http://www.assess.com/xcart/product.php?productid=234>)

A very useful program for doing classical test item analysis. This can help to determine the effectiveness of key and distractor options on multiple choice vocabulary tests, as well as analysing survey (e.g. Likert-type rating scale) data. It is produced by Assessment Systems Corporation.
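For readers unfamiliar with classical item analysis, the core statistics can be sketched as below. This is a generic textbook illustration, not ITEMAN's procedure or output format: the function name and the response data are invented, and the discrimination index is an uncorrected point-biserial (the total score still includes the item being analysed).

```python
from collections import Counter
from math import sqrt

def item_analysis(responses, key):
    """Classical statistics for one multiple-choice item.

    responses: non-empty list of (chosen_option, total_test_score) pairs.
    key: the correct option.
    Returns facility (proportion correct), point-biserial discrimination,
    and a count of how often each option was chosen.
    """
    n = len(responses)
    correct = [1 if choice == key else 0 for choice, _ in responses]
    totals = [score for _, score in responses]
    n_right = sum(correct)
    facility = n_right / n

    mean_total = sum(totals) / n
    sd_total = sqrt(sum((t - mean_total) ** 2 for t in totals) / n)
    if 0 < n_right < n and sd_total > 0:
        mean_right = sum(t for c, t in zip(correct, totals) if c) / n_right
        mean_wrong = sum(t for c, t in zip(correct, totals) if not c) / (n - n_right)
        discrimination = ((mean_right - mean_wrong) / sd_total
                          * sqrt(facility * (1 - facility)))
    else:
        discrimination = 0.0  # undefined when everyone or no one is correct
    return facility, discrimination, Counter(choice for choice, _ in responses)

# Invented responses: (option chosen, total score on the whole test).
responses = [("B", 18), ("B", 16), ("A", 9), ("B", 15), ("C", 8), ("D", 10)]
facility, discrimination, options = item_analysis(responses, key="B")
print(facility, round(discrimination, 2), dict(options))
```

A distractor that nobody chooses, or one chosen mainly by high scorers, shows up directly in the option counts and is a signal that the item may need revising.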

ConQuest (<http://shop.acer.edu.au/acer-shop/group/CON2>)

While classical test analyses (e.g. with ITEMAN) can be just as informative as IRT (Item Response Theory) analyses if the testing population and response behaviors are relatively homogeneous (e.g. Cseresznyés, 2008), the more technical IRT approach can be useful with more divergent test output (as is often the case with vocabulary scores). There are three main software programs which do IRT analysis. The first is ConQuest. It is available with manual for AUS$699 from the online ACER website.

WINSTEPS and Facets (<http://www.winsteps.com>)

These two programs are also popular means for doing IRT analyses. WINSTEPS does Rasch analysis for persons and items, while Facets is capable of many-facet analyses for persons, items, judges, tasks, and more. Both are available at the website for $149 each.
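The model at the heart of a Rasch analysis is compact enough to state directly. The sketch below shows only the basic dichotomous Rasch model in its standard form; it says nothing about how WINSTEPS or Facets estimate the parameters, and the function name and the values used are hypothetical.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are on the same logit scale; only their difference
    affects the result.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical person (ability 0.5 logits) meeting items of varying difficulty.
for difficulty in (-1.0, 0.5, 2.0):
    print(difficulty, round(rasch_probability(0.5, difficulty), 3))
```

When ability equals difficulty the probability of success is exactly 0.5, and it rises or falls as the person's ability moves above or below the item's difficulty. Placing persons and items on one common scale in this way is what makes the approach attractive when test takers differ widely.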

7.4 Vocabulary lists

Frequency lists

There are a number of frequency lists based on the BNC and other corpora. Here are a few of them.

BNC

The book Word Frequencies in Written and Spoken English (Leech, Rayson, and Wilson, 2001) gives fairly comprehensive BNC frequency data. The companion website (<http://www.comp.lancs.ac.uk/ucrel/bncfreq/flists.html>) gives frequency lists for the whole BNC, for the spoken versus written components, for the conversational (i.e. demographic) versus task-oriented (i.e. context-governed) parts of the spoken component, and for the imaginative versus informative parts of the written component. There are also ranked frequency word lists according to parts of speech (e.g. all verbs), as well as frequencies for individual part-of-speech tags (e.g. NN1, VDG) based on the BNC Sampler.

Adam Kilgarriff’s BNC website (<http://www.kilgarriff.co.uk/bnc-readme.html>) includes lemmatized and unlemmatized frequency lists in various formats, and variances of word frequencies. However, the above Leech, Rayson, and Wilson book/website uses a newer text classification system, and contains fewer word-tagging errors, and so largely supersedes the Kilgarriff lists.

Brown Corpus

The word list from the Brown Corpus, originally published as the Computational Analysis of Present-Day American English (Kucera and Francis, 1967), is available on the web at the following sites:

MRC Psycholinguistic Database <http://www.psych.rl.ac.uk> (Not disambiguated by parts of speech)
ICAME word lists <http://khnt.hit.uib.no/icame/wordlist/index.htm> (POS-differentiated version)

SUBTLEXus

The Department of Experimental Psychology at the University of Ghent has a web page presenting the SUBTLEXus frequency lists. They are based on a corpus of 51 million words of subtitles from American movies (34.9m) and television series (16.1m). It was found that the frequency lists from this corpus accounted for more of the variance in accuracy and reaction times in psycholinguistic studies than the Kucera and Francis (1967) and CELEX (1993) frequency data which psycholinguists often use. This is partially because the SUBTLEXus corpus is larger than these corpora (Kucera and Francis = 1m; CELEX ≈ 18m), but the SUBTLEXus also better reflects the fact that most people in America watch more television and movies than they read books and newspapers. The lists can be accessed through the University of Ghent website (<http://expsy.ugent.be/subtlexus/>).

Other corpora

The ICAME word list website also contains frequency lists of the POS-tagged LOB (1960s, BrE), untagged FLOB (1990s, BrE), and untagged FROWN (1990s, AmE) corpora.

English trigram frequencies

In addition to the frequency of words, there is a list of the most frequent letter trigrams in English (e.g. the, and, ing, tio), based on frequency per 10,000 words of the Brown Corpus. See <http://home.ccil.org/~cowan/trigrams>.
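A count of this kind is easy to reproduce on one's own texts. The sketch below is an assumption-laden illustration rather than the method behind the cited list: it counts trigrams within words only (not across spaces), lowercases everything, ignores non-alphabetic characters, and uses an invented sample sentence.

```python
import re
from collections import Counter

def letter_trigrams_per_10k(text):
    """Count within-word letter trigrams, scaled to a rate per 10,000 words."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return {}
    counts = Counter(
        word[i:i + 3] for word in words for i in range(len(word) - 2)
    )
    scale = 10_000 / len(words)
    return {tri: n * scale for tri, n in counts.items()}

# Invented sample sentence; a real count would use a full corpus.
sample = "the other thing is that the nation and the station stand there"
rates = letter_trigrams_per_10k(sample)
top = sorted(rates, key=rates.get, reverse=True)[:3]
print(top, round(rates["the"], 1))
```

Scaling to a rate per 10,000 words makes counts from texts of different lengths comparable, which is the point of reporting the figures that way.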

Word/phrase Lists

Academic Word List (AWL) (<http://www.victoria.ac.nz/lals/resources/academicwordlist/default.aspx>)

Averil Coxhead’s website at Victoria University of Wellington provides information on the AWL. The full AWL is given in ten AWL sublists with headwords and derivative forms. There is also a list with headwords only, and one with the most frequent words listed by sublist. The much-cited article outlining the development of the list is in TESOL Quarterly (Coxhead, 2000), but there is some background information about the list and the underlying academic corpus on the site. There are also web links to useful pedagogic sites which make use of the AWL, including Sandra Haywood’s (see below). The list was originally created as a Masters dissertation project, and shows the type of research that postgraduate students can do at the Masters level if they have ambition and good supervision.

The General Service List (GSL)

John Bauman’s General Service List page (<http://jbauman.com/aboutgsl.html>) gives background information about the GSL and frequency/rank listings. The original GSL was modified by Bauman and Brent Culligan to include headwords according to the standard set out in Bauer and Nation (1995), which led to a total of 2,284 headwords. They then attached frequency information to these headwords based on frequency figures from the Brown Corpus. The resultant list is given in rank order. The GSL is also available on Sandra Haywood’s website, divided into 500-word segments.

James Dickins offers an extended version of the GSL downloadable in Excel format (<http://www.languages.Salford.ac.uk/staff/dickins.php>). This allows sorting of the list in a number of ways, including:

● order in which the words appear in the printed GSL
● headwords
● lemmatized headword
● McArthur category
● word class
● word count.

Academic Formulas List (AFL)

Simpson-Vlach and Ellis (in press) developed a list of three- to five-word formulaic sequences (which they term formulas) which are typical of academic discourse. They list 200 formulas which are more common in written academic discourse than in written non-academic discourse. Similarly they list 200 which are more common in spoken academic discourse. They also identify 207 formulas which were relatively more frequent in both written and spoken academic discourse, which they consider core formulas. The formulas are categorized into a number of functional categories: referential expressions (identification and focus, contrast and comparison, deictics and locatives, vagueness markers), stance expressions (hedges, epistemic stance, obligation and directive, ability and possibility, evaluation, intention/volition), and discourse organizing expressions (metadiscourse and textual reference, topic introduction and focus, topic elaboration, discourse markers). Simpson-Vlach and Ellis’s selection procedure for these formulas is interesting in that it combined three main criteria (range, MI, and frequency) in a way that was determined empirically (Ellis, Simpson-Vlach, and Maynard, 2008).

Function Word List

Nation (2001) includes a list of the function words in English in Appendix 6, pp. 430–431.

7.5 Websites

A number of scholars/institutions host language-based websites which include various material useful to the vocabulary researcher and teacher. Below are some of the most notable.

Paul Nation’s LALS Vocabulary website (<http://www.Victoria.ac.nz/lals/staff/paul-nation/nation.aspx>)

The leading specialist in second-language vocabulary pedagogy has a personal website well worth visiting. To start with, his personal publications list is a mini vocabulary bibliography in itself, and many of the publications are downloadable. He also offers his large vocabulary bibliography, sorted alphabetically and by topic. The RANGE program, with either GSL/AWL lists or with BNC lists, is available for download. The website includes the GSL and AWL word lists, but in addition has a very interesting set of lists of survival vocabulary for 19 languages which includes common expressions like greetings and closings, numbers, units of money and weight and size, directions, and conversation gambits (e.g. please speak slowly). There is a list of graded readers divided by difficulty (i.e. vocabulary size) level.

One of the highlights of the website is the multitude of vocabulary tests available:

● one receptive version of the revised Vocabulary Levels Test (VLT)
● two productive versions of the VLT
● bilingual 1,000 and 2,000 receptive versions of the VLT (Chinese, Indonesian, Japanese, Russian, Samoan, Tagalog, Thai, Tongan, Vietnamese)
● a basic True/False VLT version focusing on the first 1,000 word level. It is aimed at beginners, using very simple vocabulary and pictures to define the target words
● a monolingual English version of the Vocabulary Size Test (VST)
● a bilingual Mandarin version of the VST.

Finally, for any researchers or students needing inspiration about vocabulary research topics, Nation offers a multitude grouped according to 11 categories, mirroring the organization of his book Learning Vocabulary in Another Language (2001).

Paul Meara’s _lognostics Vocabulary website (<http://www.lognostics.co.uk>)

Meara’s _lognostics website includes a variety of material focusing on vocabulary acquisition, and features the VARGA (Vocabulary Acquisition Research Group Archive), which contains annotated bibliographies of most of the research on vocabulary acquisition since 1970. You can download the bibliography by individual year, or search the website database through keyword and range of years. This is the best vocabulary bibliography available, especially given that most publications have abstracts and the fact that Meara was the pioneer in collecting vocabulary research, beginning with his CILT publication Vocabulary in a Second Language, in 1983. There is also a selection of downloadable papers from Meara and his colleagues.

Equally notable is an interesting range of innovative vocabulary tests, language aptitude tests, and assessment tools which Meara and his colleagues have developed, all downloadable in ZIP files: X_Lex, Y_Lex, P_Lex, D_Tools, V_Size, V_Quint, and Llama. There is also an online association test (Lex_30). The website also promises some future programs, including WA, a program for handling word association data.

Other information on the site includes entries on the VocabularyWiki page on the Kent-Rosanoff association list, Spanish word frequency lists, and the MacArthur Communicative Development Inventory (an assessment scale for monolingual children’s lexical growth). Finally, there are links to the websites of a number of other prominent vocabulary researchers.

Batia Laufer’s Vocabulary website (<http://english.haifa.ac.il/staff/blaufer.htm>)

Batia Laufer’s university website contains an impressive personal publications bibliography, but is most notable for the CATSS test (Computer Adaptive Test of Size and Strength) available on-line (see Sections 2.8 and 5.2.3).

Rob Waring’s personal website (<http://www1.harenet.ne.jp/~waring>)

Rob Waring’s site has some useful information on vocabulary and reading. For vocabulary, his bibliography is unusual in that it is coded for entry into a database program (such as Filemaker) which makes it much more searchable. Unfortunately, at the time of writing, it only contained references up until 2002. There are also other resources like a listing of word lists, and a page on how to find vocabulary resources on the internet. On the reading front, there is an extensive reading resources page and a link to the Extensive Reading Foundation’s website. There is also something not commonly seen: an extensive listening page.

Andy Gillett’s Vocabulary in EAP website (<http://www.uefap.com/vocab/vocfram.htm>)

This site includes a range of vocabulary material, including information on selecting which words to learn, using the GSL and AWL word lists. It also has particularly useful sets of non-GSL/AWL vocabulary which occurs especially frequently in the fields of criminal law, environmental science, business, science and technology, music, health science, computer science, and mathematics. For example, there are 376 words for criminal law, including the A–Z sample below:

abet, blameworthy, complicity, defendant, exculpatory, felony, grievous, homicide, indecent, judicial, kidnapping, liability, maliciously, negligence, offence, parole, quashed, rape, self-defence, tort, unjustifiable, verdict, warrant, (no X, Y, or Z)

There are numerous exercises to learn new words, and to work on vocabulary learning strategies, such as dictionary use, vocabulary notebooks, and lexical inferencing. Word parts are highlighted, with a useful list of affixes and an online quiz in different word classes to check affix knowledge.

Sandra Haywood’s AWL website (<http://www.nottingham.ac.uk/~alzsh3/acvocab/index.htm>)

This website focuses on pedagogical tools for the AWL, demonstrates a number of vocabulary exercises focusing on AWL vocabulary, and features two AWL tools. The AWL Highlighter marks all of the AWL words in a text in bold, making them more noticeable for learners and helping researchers to see them in context:

Data was collected by the International Labour Office on hourly rates of pay in fifty different occupations, and on consumer prices for a sample of household items in about 100 countries. After analysis, it was shown that the worth of an hour’s work, in terms of purchasing power, varied considerably from one country to another.

The AWL Gapmaker creates cloze tests by replacing AWL words in a text with a gap. Learners can practise filling in the gaps, then check their work by comparing their answers to a list of the deleted words. Researchers may also find this an easy way to create cloze tests focusing on AWL vocabulary for their studies.
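The behaviour of the two tools described above can be approximated in a few lines of Python. This is only a minimal sketch of the general idea, not the code behind Haywood's site: the function names `highlight` and `make_gaps` are hypothetical, `SAMPLE_TARGETS` is a tiny illustrative set rather than the real AWL, and words are matched by exact form rather than by word family.

```python
import re

# Hypothetical stand-in for a target list such as the AWL (illustration only).
SAMPLE_TARGETS = {"data", "analysis", "consumer", "items", "varied"}

# A word is a run of letters, optionally with an internal apostrophe.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def highlight(text, target_words, marker="**"):
    """Wrap every target word in `marker` so it stands out (bold in Markdown)."""
    def repl(match):
        word = match.group(0)
        if word.lower() in target_words:
            return f"{marker}{word}{marker}"
        return word
    return WORD_RE.sub(repl, text)


def make_gaps(text, target_words, gap="_____"):
    """Replace every target word with a gap.

    Returns the cloze text and the deleted words in order of appearance,
    which can serve as the answer list.
    """
    deleted = []

    def repl(match):
        word = match.group(0)
        if word.lower() in target_words:
            deleted.append(word)
            return gap
        return word

    return WORD_RE.sub(repl, text), deleted


if __name__ == "__main__":
    sentence = "Data was collected on consumer prices."
    print(highlight(sentence, SAMPLE_TARGETS))
    cloze, answers = make_gaps(sentence, SAMPLE_TARGETS)
    print(cloze)
    print(answers)
```

A researcher adapting this would most likely swap in a full word list and add family-level matching (e.g. treating analyse, analysis, and analytical as one family), which the sketch deliberately leaves out.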

Hong Kong Polytechnic University’s Vocabulary site (<http://elc.polyu.edu.hk/cill/vocabula.htm>)
The vocabulary section of this learning website has loads of material, including information and exercises on phraseology, word parts and affixes, synonyms, and vocabulary games. I found the academic crossword puzzles and hangman particularly addicting. There is also an English-Chinese bilingual dictionary.

Gerry Luton’s Vocabulary website (<http://web.uvic.ca/~gluton/awl/index.htm>)
This site focuses on academic vocabulary, first presenting the AWL and a rationale for using it. Luton then provides multiple-choice matching exercises for the vocabulary in each sublist, divided into manageable 10-word blocks. The site gives feedback about the correctness of the answers, and provides a score. The exercises were created with Gerry’s Vocabulary Teacher, which is available for purchase from the site for $50. (There is a free demonstration version.) It consists of a vast collection of sentences in context, illustrating over 2,600 words with a minimum of 15 contexts each, for a total of over 50,000 sentences and 750,000 words of data, with the facility to edit, delete, or add sentences. Teachers can select target words for their students to study, and use the program to create example sentences, multiple-choice matching exercises, and cloze exercises. It is an excellent way to promote the introduction and recycling of academic vocabulary for intermediate to advanced learners of English.

Lexxica (<http://www.lexxica.com>)
This site has a number of vocabulary learning activities including word games and flashcards, graded reading materials, a vocabulary test, and an
application for teachers to manage their students’ use of the materials. The compilers of the site are very conversant with vocabulary research, with the result that the materials on the site are theoretically sound. For example, the flashcards are presented using the principle of spaced repetition (also known as expanding rehearsal), i.e. cards are repeated at gradually increasing time intervals as the word becomes better learned. The site is also one of the few to work on visual and listening speed. The target words are selected according to frequency criteria, and students are given feedback about how far their current vocabulary size will take them according to various learning goals (using general English, taking the TOEFL or TOEIC tests, using the Interchange textbook). Overall, the site is a good example of how current vocabulary research can be transformed into high-quality vocabulary learning materials.
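The principle of spaced repetition mentioned above can be illustrated with a very small scheduler. This is a sketch under stated assumptions, not a description of how Lexxica schedules its flashcards: the `Card` class, the `review` function, the doubling factor, and the one-day starting interval are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Card:
    word: str
    interval_days: float = 1.0  # current gap between reviews (assumed start: 1 day)
    due_day: float = 0.0        # day on which the card should next be shown


def review(card, today, correct, factor=2.0, first_interval=1.0):
    """Reschedule a card after a review.

    A correct answer expands the interval by `factor`; an incorrect answer
    resets it to `first_interval`, so poorly known words return sooner.
    """
    if correct:
        card.interval_days *= factor
    else:
        card.interval_days = first_interval
    card.due_day = today + card.interval_days
    return card


if __name__ == "__main__":
    card = Card("abet")
    for day, correct in [(0, True), (2, True), (6, False)]:
        review(card, day, correct)
        print(day, correct, card.interval_days, card.due_day)
```

With these assumed settings, a word answered correctly on each review is seen after 2, 4, 8, ... days, while a single error brings it back the next day.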

Gabriella Nuttall’s Vocabulary Resource Centre (<http://web.scc.losrios.edu/nuttalg/vocabulary>)
This website has links to other vocabulary websites, papers and presentations on vocabulary teaching, and various learner activities.

Dave’s ESL Cafe (<http://www.eslcafe.com>)
This is a wide-ranging ESL website without too much on vocabulary, but it does include useful lists of idioms and phrasal verbs with definitions and examples.

7.6 Bibliographies

In addition to the general vocabulary bibliographies on Nation’s, Meara’s and Waring’s websites, and the corpus bibliographies on David Lee’s and Richard Xiao’s sites, the following bibliographies are useful places to search for lexical references.

Paul Meara published annotated bibliographies on vocabulary research spanning the period 1960–1990 in three volumes:

1. (1983). Vocabulary in a Second Language. London: Centre for Information on Language Teaching and Research. (1960–1980)
2. (1987). Vocabulary in a Second Language, Vol. 2. London: Centre for Information on Language Teaching and Research. (1980–1985)
3. (1992). Vocabulary in a Second Language, Vol. 3. Reading in a Foreign Language 9, 1. (1986–1990)

(Computational) Theories of Contextual Vocabulary Acquisition (<http://www.cse.buffalo.edu/~rapaport/refs-vocab.html#er1>)
This bibliography, covering 1884 to the present, has a large number of references on L1 acquisition, particularly from context.



Phraseology Bibliography (<http://wwweuralex.org/bibweb>)
A bibliography on phraseology with research up until 2003 is available at this website.

Formulaic Language Bibliography
Probably the best bibliography for formulaic sequences is the massive bibliography at the end of Alison Wray’s 2002 book Formulaic Language and the Lexicon.

Stanford Natural Language Processing Group Bibliography (<http://nlp.Stanford.edu/links/statnlp.html>)
The Stanford Natural Language Processing Group has an extensive annotated bibliography on corpus building tools (e.g. parsers and taggers), corpora, and other computational linguistics stuff.

Learner Corpus Bibliography (<http://cecl.fltr.ucl.ac.be/learner%20corpus%20bibliography.html>)
The Centre for English Corpus Linguistics hosts an extensive bibliography on learner corpora, maintained by Magali Paquot. At the time of writing, it had 370 references.

Brian Richards and David Malvern’s Lexical Diversity Bibliography (<http://www.personal.rdg.ac.uk/~ehsrichb/home1.html>)
This bibliography covers publications (up to 1997) concerning type-token type measures of lexical diversity, and was compiled as part of their project to develop their D measurement software (see Section 5.2.4).

7.7 Important personalities in the field of vocabulary studies

A vast number of scholars have contributed to our understanding of vocabulary through the ages. The following is a list of researchers who I feel have made sustained contributions to the field. Of course, the list is not comprehensive, and many worthy personalities do not appear, e.g., corpus linguists. In particular, it reflects my research interest of L2 vocabulary studies, and so does not include many scholars whose primary interest is L1 vocabulary. I offer this personal selection as an initial list of the vocabulary scholars whose work can usefully inform your own research.

Currently active researchers whose primary interest is L2 vocabulary

Each of these scholars has made vocabulary their main area of research endeavor, and has published widely on a variety of lexical topics across a
sustained period of time. They are the leading authorities in the area of second-language vocabulary.

Batia Laufer
Learning and usage difficulties caused by similarity in word form, lexical frequency profile, CATSS test, lexical coverage requirements, vocabulary pedagogy, task-induced involvement load.

Paul Meara
Vocabulary acquisition and processing, the mental lexicon, word associations, matrix models of vocabulary acquisition, vocabulary tests, particularly checklist tests, the Lex family of vocabulary tests/tools, VARGA, created a large and successful distance PhD program at the University of Swansea for students focusing on vocabulary issues.

Paul Nation
Vocabulary pedagogy, word lists, frequency levels of vocabulary, creator of the original Vocabulary Levels Test, Vocabulary Size test, vocabulary and reading, word knowledge taxonomy, word families and affixation, the four-strand approach to vocabulary teaching, author of the landmark Teaching and Learning Vocabulary (1990) and Learning Vocabulary in Another Language (2001) books.

John Read
Vocabulary testing, Word Associates Test, author of Assessing Vocabulary (2000).

Norbert Schmitt
Vocabulary acquisition, acquisition and use of formulaic language, research based on the word knowledge framework, developed the revised versions of the Vocabulary Levels Test, vocabulary learning strategies, vocabulary and reading/listening, explicit versus implicit lexical knowledge, author of Vocabulary in Language Teaching (2000).

Current researchers with an interest in vocabulary studies

This category includes scholars who have made important contributions to the field of second language vocabulary studies, although it is not necessarily their main research area. It also includes developing scholars with a lexical focus who have already made a promising start to their research careers.

Joe Barcroft
Vocabulary acquisition, research methodology, cognitive constraints to acquisition, acquisition of form versus meaning.



Frank Boers
Facilitating factors of vocabulary acquisition, including awareness of metaphor categories, formulaic language, and etymology.

Doug Biber
Corpus linguistics, formulaic language, particularly lexical bundles.

Ronald Carter
Vocabulary in discourse and literature, corpus linguistics, written vs. spoken vocabulary, author of Vocabulary: Applied Linguistic Perspectives (1998, 2nd edn.).

Tom Cobb
The Compleat Lexical Tutor, the use of computers and technology in vocabulary learning, intentional versus incidental vocabulary learning, vocabulary and reading.

Averil Coxhead
Creation and utilization of the Academic Word List, vocabulary pedagogy.

Kees de Bot
The mental lexicon, modelling the multilingual lexicon, language attrition.

Annette de Groot
Word recognition and the mental lexicon, bilingualism and multilingualism.

Nick Ellis
Cognitive factors in vocabulary acquisition, effects of frequency, implicit versus explicit knowledge.

Rod Ellis
Best known for researching and reviewing SLA in general, he has also done considerable work on vocabulary acquisition.

Tess Fitzpatrick
Word associations, vocabulary acquisition, formulaic language, supervises the Swansea distance vocabulary PhD programme.

Keith Folse
Vocabulary pedagogy and teacher education on lexical issues, author of Vocabulary Myths (2004).



Dee Gardner
Vocabulary and reading, corpus-based vocabulary analysis.

Sylviane Granger
Development and analysis of learner corpora, formulaic language.

Kirsten Haastrup
Lexical inferencing, lexical processing, vocabulary acquisition, vocabulary and reading.

Birgit Henriksen
The nature of vocabulary knowledge, vocabulary pedagogy, reading/writing and vocabulary.

Marlise Horst
Incidental vocabulary learning from reading, the use of online tools for learning academic vocabulary.

Jan Hulstijn
Incidental versus intentional vocabulary learning, glossing, task-induced involvement load, cognitive aspects of language learning.

Nan Jiang
Psycholinguistic approaches to vocabulary acquisition and processing, semantic representation and transfer.

Keiko Koda
Effects of L1 word form on the learning of second-language form, second-language reading.

Kon Kuiper
Formulaic language, particularly in spoken contexts under time pressure.

Michael McCarthy
Written versus spoken vocabulary, author of Vocabulary (1990), and several books in the Cambridge University Press English Vocabulary in Use textbook series.

Margaret McKeown
L1 vocabulary instruction, use of dictionaries.

Rosamund Moon
Multi-word units, COBUILD dictionaries and materials.



William Nagy
The relationship between vocabulary knowledge and L1 reading, vocabulary instruction.

Sima Paribakht
Lexical inferencing and vocabulary learning from reading, the Vocabulary Knowledge Scale, vocabulary pedagogy.

Diana Pulido
The influence of vocabulary knowledge in reading, factors affecting lexical development, learner involvement in lexical development tasks.

Paul Rayson
Computational analysis of vocabulary.

Susanna Rott
Incidental learning of vocabulary, number of exposures necessary for learning, formulaic language.

Rob Schoonen
Assessing vocabulary depth of knowledge, automaticity of lexical processing as prerequisite of L2 reading and writing.

Norman Segalowitz
Automaticity of lexical and language processing.

David Singleton
The second-language mental lexicon, author of Language and the Lexicon (2000).

Rob Waring
Vocabulary and reading, extensive reading, editor of the Heinle Cengage graded reader series (Foundations Reading Library), receptive versus productive knowledge.

Stuart Webb
Acquiring depth of vocabulary knowledge from incidental and intentional input, using word knowledge test batteries.

Bert Weltens
Lexical attrition.

Mari Wesche
Lexical inferencing, vocabulary acquisition through reading, L1 influences in initial acquisition, the Vocabulary Knowledge Scale.



Brent Wolter
Word associations, vocabulary acquisition, psycholinguistic approaches to second-language vocabulary acquisition.

Alison Wray
Wrote the seminal overview of formulaic language Formulaic Language and the Lexicon (2002), created FLaRN (Formulaic Language Research Network), and is still the leading specialist in this area.

Cheryl Zimmerman
Pedagogical issues and vocabulary learning, knowledge of derivative forms, teacher training in vocabulary issues.

Past masters

These scholars have made important contributions in the past:

Isabel Beck
L1 vocabulary instruction, vocabulary learning from reading.

John Carroll
L1 vocabulary acquisition, co-author of The American Heritage Word Frequency Book (Carroll, Davies, and Richman, 1971), word parts and affixes.

Hermann Ebbinghaus
One of the first scholars (1885) to systematically research how vocabulary is learned.

Jim Nattinger and Jeanette DeCarrico
Authors of Lexical Phrases and Language Teaching (1992), the seminal book which first highlighted the importance of formulaic language.

Harold Palmer
Although primarily interested in an oral approach to language learning, he collaborated with West in what became known as the ‘vocabulary control movement’.

Charles Ogden and Ivor Richards
Created and promoted Basic English, featuring an 850-word lexicon (<http://ogden.basic-english.org/words.html>), which competed with the concurrent frequency-based approach to vocabulary control.

Håkan Ringbom
The effects of the L1 on L2 and L3 vocabulary acquisition, crosslinguistic influences in the second language lexicon, vocabulary learning.



Steven Stahl
L1 vocabulary acquisition.

John Sinclair
The father of corpus linguistics, the idiom principle, guiding light behind COBUILD and the Bank of English Corpus.

Edward Thorndike and Irving Lorge
Compilers of the influential early word lists, including The Teacher’s Book of 30,000 Words (1944).

Michael West
One of the first scholars to systematically consider the influence of frequency of occurrence on vocabulary learning, compiled the General Service List (1953), used frequency-based approach to writing graded reading materials.

Dave and Jane Willis
Vocabulary in the syllabus, authors of The Lexical Syllabus (<http://www.cels.bham.ac.uk/resources/Lexsyll.htm>) and the COBUILD English Course, formulaic language, task-based instruction.



Notes

1 Vocabulary Use and Acquisition

1. Vocabulary and lexis will be used interchangeably in this book.

2. It is beyond the scope of this book to discuss statistics and how to carry them out. There are numerous textbooks on statistics, and one source which my students have found particularly useful is Field (2005), who shows how to perform statistics with the widely-used statistical program SPSS.

3. DIALANG is a European project for the development of diagnostic language tests in 14 European languages. It offers separate tests for reading, writing, listening, grammatical structures, and vocabulary in each of the languages.

4. The Common European Framework (2007) does not stipulate required vocabulary sizes for the various levels, but rather describes learner performance expectations at each level. The C2 descriptors for reading and vocabulary include the following, for which a 5,000 word family lexicon would appear inadequate (although firm research on this is lacking):

● Can understand and interpret critically virtually all forms of the written language including abstract, structurally complex, or highly colloquial literary and non-literary writings.
● Can understand a wide range of long and complex texts, appreciating subtle distinctions of style and implicit as well as explicit meaning.
● Can exploit a comprehensive and reliable mastery of a very wide range of language to formulate thoughts precisely, give emphasis, differentiate and eliminate ambiguity ... No signs of having to restrict what he/she wants to say.
● Has a good command of a very broad lexical repertoire including idiomatic expressions and colloquialisms; shows awareness of connotative levels of meaning.

5. Terminology is a problem in this area. Wray (2002: 9) found over 50 terms to describe the notion that recurrent multi-word lexical items can have a single meaning or function. Some highlight the characteristic of multiple words (multi-word units, multi-word chunks), others the fixedness of the items (fixed expressions, frozen phrases), some the recurrent phraseology (phrasal vocabulary, routine formulas), while still others focus on the psycholinguistic notion that these multi-word lexical items are stored and processed in the mind as wholes (chunks, prefabricated routines). In this book, I will generally use the term formulaic language as a cover term for this phenomenon, and formulaic sequences for the individual phrasal items. Finer distinctions will be discussed in Chapter 3, which covers formulaic language in more detail. Likewise, I will use the term lexical item to include vocabulary made up of either individual word forms or formulaic sequences.

6. I reserve the term collocation for two-word partnerships, which are a subcategory of the umbrella term formulaic language.

7. Corpus size figures as of November 13, 2008.

2 Issues of Vocabulary Acquisition and Use

1. I know of no association research that used formulaic sequences as prompts.

2. Actually, Michael West does not deserve all of the credit for the GSL. It was the culmination of a long series of conferences and studies in which a number of linguists of the age played a prominent part, including Harold Palmer, Lawrence Faucett, Edward Thorndike, and Irving Lorge. See Howatt (2004) for a discussion of the development of the GSL.

3. Note the classical tradeoff between depth of assessment and sampling rate: Webb’s assessment is extensive, but only allows for the measurement of ten lexical items.

4. The tests were not in this order in the study.

5. Early eye-movement measures (e.g. first-pass reading time) are thought to reflect integration processes. Thus more frequent or more predictable lexical items have faster early measures. Items which are known will be easier to integrate into the unfolding meaning of a text. Unfamiliar items will be harder to integrate and will have longer reading times and fixation counts. In contrast, late measures (e.g. total reading time) reflect recovery when processing is difficult. So if an item is ambiguous or doesn’t fit with a context, long total reading times and fixation counts are found. Thus, if non-natives read kick the bucket and it is clear that this doesn’t mean ‘kick a pail’, there will be recovery time required, and therefore longer total reading time and more total fixations.

6. N400 is also sensitive to a range of lexical properties, including whether a letter string is a word in a language, frequency, phonological priming, morphological context within a string, and the sequential probabilities of the likelihood of words occurring in succession. Thus N400 amplitude reflects a combination of lexical and semantic/conceptual factors (Osterhout et al., 2006: 205).

3 Formulaic Language

1. There is evidence that individual lexical bundles are generally preferred in either spoken or written discourse, but seldom in both, at least in academic discourse (Biber et al., 2004).

2. This statistical section draws heavily on information in Church and Hanks (1990), Dunning (1993), Evert (2004), and Manning and Schütze (1999). I am particularly indebted to my former PhD student Phil Durrant for showing me the calculations and formulas underlying common strength of association measures, and for allowing me to closely shadow his PhD thesis account of those measures. See Durrant (2008) for a fuller discussion of these methodologies.

3. Another less widely-used approach to correcting this problem is adjusting the MI formula to give greater weight to the ‘observed occurrences’ portion of the equation (Evert, 2004). Proposed corrections include local MI ($O \times \log_2 \frac{O}{E}$), MI2 ($\log_2 \frac{O^2}{E}$), and MI3 ($\log_2 \frac{O^3}{E}$).

4. John Sinclair first introduced me to the idea of variable expressions using this example during my visit to the Tuscan Word Centre. This is an extension of his original analysis.

5. A number of these studies have come out of the Centre for English Corpus Linguistics (Université catholique de Louvain) headed by Sylviane Granger. The researchers in this centre usually analyzed the academic prose output of L2 university students in the International Corpus of Learner English (ICLE,
Granger, n.d.), and often compared it to the equivalent(ish) native university student output found in the Louvain Corpus of Native English Essays (LOCNESS, a native corpus compiled to mirror the ICLE). (See Section 6.2 for a more detailed description of these corpora, and how to order them.)

6. Although Nesselhauf finds evidence for extensive erroneous use of collocations, the issue of whether collocations are more problematic than non-collocations is not satisfactorily resolved (Durrant, 2007).

4 Issues in Research Methodology

1. It must be said that little is known about the relationship between explicit/declarative and implicit/procedural lexical knowledge. I see this as a key area for vocabulary research, probably incorporating some of the psycholinguistic and neurolinguistic methodologies discussed in Section 2.11.

5 Measuring Vocabulary

1. It is important to note that Bauer and Nation formed their hierarchy based solely on linguistic criteria, and not on any acquisition evidence. While there is reason to believe that the hierarchy should reflect acquisition order to a considerable degree, this needs to be empirically explored, and would be an excellent research project.

2. Note that the PVLT predates the Academic Word List (Coxhead, 2000) and so uses the older listing of academic vocabulary, the University Word List (Xue and Nation, 1984).

3. There was a theoretical basis for including academic words in the profile analysis. Frequency analysis of academic texts shows that the first 2,000 GSL words plus academic vocabulary typically cover a large percentage of academic texts, e.g. GSL + AWL = 86% (Coxhead, 2000). Thus VocabProfile has been more useful for the analysis of academic writing, rather than general English writing, where the percentage of academic vocabulary is lower, and the AWL words less important.
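A coverage figure of this kind is simply the proportion of running words in a text that appear on each list. The following sketch shows the arithmetic only; it is not VocabProfile's implementation. The function name `lexical_coverage` is hypothetical, the two word sets are tiny placeholders rather than the real GSL and AWL, and matching is by exact word form rather than by word family.

```python
import re
from collections import Counter


def lexical_coverage(text, word_lists):
    """Return the proportion of tokens covered by each list, plus 'off-list'.

    `word_lists` maps a list name to a set of lower-case word forms. Each token
    is credited to the first list that contains it, so list order matters.
    """
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", text)]
    counts = Counter()
    for token in tokens:
        for name, words in word_lists.items():
            if token in words:
                counts[name] += 1
                break
        else:
            counts["off-list"] += 1
    total = len(tokens)
    names = list(word_lists) + ["off-list"]
    return {name: (counts[name] / total if total else 0.0) for name in names}


if __name__ == "__main__":
    # Placeholder lists for illustration only.
    lists = {"GSL": {"the", "show", "of", "was"}, "AWL": {"data", "analysis"}}
    print(lexical_coverage("The data show the analysis", lists))
```

Summing the GSL and AWL proportions for a real academic text, with full lists and family-level matching, is the kind of calculation that lies behind a combined coverage figure such as the one cited above.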

4. Laufer’s (1995) Beyond 2000 also produces a single figure based on an LFP-type analysis.

5. Barcroft (2002) proposes the Lexical Production Scoring Protocol-Written (LPSP-Written) as a way of quantifying the ability to spell words:

0.00 points
None of word is written; this includes:
● nothing is written
● the letters present do not meet any ‘for 0.25’ criteria
● L1 word only is written

0.25 points
1/4 of word is written; this includes:
● any 1 letter is correct
● 25–49.9% of the letters are present
● correct number of syllables

0.50 points
1/2 of word is written; this includes:
● 25–49.9% of letters correct
● 50–74.9% of letters present



0.75 points
3/4 of word written; this includes:
● 50–99.9% of letters correct
● 75–100% of letters present

1 point
Entire word is written; this includes:
● 100% letters correct.

6 Example Research Projects

1. Of course, there may be contexts where this assumption does not hold true, e.g. if the study was carried out in Nigeria.

7 Vocabulary Resources

1. Thanks to Paul Rayson for this example.



References

Adolphs, S. and Durow, V. (2004). Social-cultural integration and the development of formulaic sequences. In Schmitt, N. (ed.), Formulaic Sequences. Amsterdam: John Benjamins. pp. 107–126.

Adolphs, S. and Schmitt, N. (2003). Lexical coverage of spoken discourse. Applied Linguistics 24, 4: 425–438.

Aitchison, J. (2003). Words in the Mind (3rd edn). Oxford: Blackwell.

Albrechtsen, D., Haastrup, K., and Henriksen, B. (2008). Vocabulary and Writing in a First and Second Language: Process and Development. Basingstoke: Palgrave Macmillan.

Alderson, J.C. (2005). Diagnosing Foreign Language Proficiency. London: Continuum.

Alderson, J.C. (2007). Judging the frequency of English words. Applied Linguistics 28, 3: 383–409.

Alderson, J.C., Clapham, C.M., and Steel, D. (1997). Metalinguistic knowledge, language aptitude, and language proficiency. Language Teaching Research 1: 93–121.

Al-Homoud, F. and Schmitt, N. (2009). Extensive reading in a challenging environment: A comparison of extensive and intensive reading approaches in Saudi Arabia. Language Teaching Research 13, 4: 383–402.

Altenberg, B. (1998). On the phraseology of spoken English: The evidence of recurrent word-combinations. In Cowie, A.P. (ed.), Phraseology: Theory, Analysis and Applications. Oxford: Oxford University Press. pp. 101–122.

Altenberg, B. and Granger, S. (2001). The grammatical and lexical patterning of make in native and non-native student writing. Applied Linguistics 22, 2: 173–194.

Altmann, G.T.M. and Kamide, Y. (1999). Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition 73: 247–264.

Anderson, R.C. and Freebody, P. (1981). Vocabulary knowledge. In Guthrie, J.T. (ed.), Comprehension and Teaching: Research Reviews. Newark, DE: International Reading Association.

Anderson, R.C. and Freebody, P. (1983). Reading comprehension and the assessment and acquisition of word knowledge. In Hutson, B.A. (ed.), Advances in Reading/Language Research. Greenwich, CT: JAI Press. pp. 132–255.

Arnaud, P.J.L., Bejoint, H., and Thoiron, P. (1985). A quoi sert le programme lexical? Les Langues Modernes 79, 3/4: 72–85.

Ashby, M. (2006). Prosody and idioms in English. Journal of Pragmatics 38, 10: 1580–1597.

Atay, D. and Kurt, G. (2006). Elementary school EFL learners’ vocabulary learning: The effects of post-reading activities. Canadian Modern Language Review 63, 2: 255–273.

Bachman, L.F. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.

Bachman, L.F. and Palmer, A.S. (1996). Language Testing in Practice. Oxford: Oxford University Press.

Baddeley, A. (1990). Human Memory: Theory and Practice. Needham Heights, MA: Allyn and Bacon.

Bahns, J. and Eldaw, M. (1993). Should we teach EFL students collocations? System 21: 101–114.

Bahrick, H.P. (1984). Fifty years of language attrition: Implications for programmatic research. Modern Language Journal 68: 105–118.

Barcroft, J. (2002). Semantic and structural elaboration in L2 lexical acquisition. Language Learning 52, 2: 323–363.

Bardovi-Harlig, K. (2002). A new starting point? Investigating formulaic use and input in future expression. Studies in Second Language Acquisition 24: 189–198.

Barfield, A. (2003). Collocation Recognition and Production: Research Insights. Chuo University, Japan.

Barrow, J., Nakashimi, Y., and Ishino, H. (1999). Assessing Japanese College students’ vocabulary knowledge with a self-checking familiarity survey. System 27: 223–247.

Bates, E. and MacWhinney, B. (1987). Competition, variation, and language learning. In MacWhinney, B. (ed.), Mechanisms of Language Acquisition. Hillsdale, NJ: Lawrence Erlbaum. pp. 157–193.

Bauer, L. and Nation, I.S.P. (1993). Word families. International Journal of Lexicography 6: 253–279.

Beeckmans, R., Eyckmans, J., Jansens, V., Dufranne, M., and van de Velde, H. (2001). Examining the Yes/No vocabulary test: Some methodological issues in theory and practice. Language Testing 18, 3: 235–274.

Beglar, D. (2010). A Rasch-based validation of the vocabulary size test. Language Testing.

Beglar, D. and Hunt, A. (1999). Revising and validating the 2000 word level and university word level vocabulary tests. Language Testing 16: 131–162.

Beks, B. (2001). Le degré des connaissances lexicales [The degree of lexical knowledge]. Unpublished MA thesis, Vrije Universiteit Amsterdam.

Bell, H. (2002). Using frequency counts to assess L2 texts. PhD thesis, University of Wales.

Bensoussan, M. and Laufer, B. (1984). Lexical guessing in context in EFL reading comprehension. Journal of Research in Reading 7, 1: 15–32.

Bertram, R., Baayen, R., and Schreuder, R. (2000). Effects of family size for complex words. Journal of Memory and Language 42: 390–405.

Bertram, R., Laine, M., and Virkkala, M. (2000). The role of derivational morphology in vocabulary acquisition: Get by with a little help from my morpheme friends. Scandinavian Journal of Psychology 41, 4: 287–296.

Biber, D., Conrad, S., and Cortes, V. (2004). If you look at ... : Lexical bundles in university teaching and textbooks. Applied Linguistics 25, 3: 371–405.

Biber, D., Johansson, S., Leech, G., Conrad, S., and Finegan, E. (1999). Longman Grammar of Spoken and Written English. Harlow: Longman.

Bishop, H. (2004). The effect of typographic salience on the look up and comprehension of unknown formulaic sequences. In Schmitt, N. (ed.), Formulaic Sequences. Amsterdam: John Benjamins.

Bley-Vroman, R. (2002). Frequency in production, comprehension, and acquisition. Studies in Second Language Acquisition 24, 2: 209–13.

Blum, S. and Levenston, E.A. (1978). Universals of lexical simplification. Language Learning 28, 2: 399–416.

Bogaards, P. and Laufer, B. (eds). (2004). Vocabulary in a Second Language. Amsterdam: John Benjamins.

Bonk, W.J. (2001). Testing ESL learners’ knowledge of collocations. In Hudson, T. and Brown, J.D. (eds), A Focus on Language Test Development: Expanding the Language Proficiency Construct Across a Variety of Tests (Technical Report #21). Honolulu: University of Hawai’i, Second Language Teaching and Curriculum Center. pp. 113–142.

Brown, D. (in press). What aspects of vocabulary knowledge do textbooks give attention to? Language Teaching Research.

Brown, F.G. (1983). Principles of Educational and Psychological Testing. New York: Holt, Rinehart and Winston.

Brown, R. (1973). A First Language. London: Allen and Unwin.

Brown, R. and McNeill, D. (1966). The ‘tip of the tongue’ phenomenon. Journal of Verbal Learning and Verbal Behavior 5: 325–337.

Brown, R., Waring, R., and Donkaewbua, S. (2008). Vocabulary acquisition from reading, reading-while-listening, and listening to stories. Reading in a Foreign Language 20, 2: 136–163.

Cameron, L. (2002). Measuring vocabulary size in English as an Additional Language. Language Teaching Research 6, 2: 145–173.

Cain, K., Oakhill, J., and Lemmon, K. (2005). The relation between children’s reading comprehension level and their comprehension of idioms. Journal of Experimental Child Psychology 90, 1: 65–87.

Carey, S. (1978). The child as word learner. In Halle, M., Bresnan, J., and Miller, G.A. (eds), Linguistic Theory and Psychological Reality. Cambridge, MA: MIT Press. pp. 264–293.

Carrell, P.L. and Grabe, W. (2002). Reading. In Schmitt, N. (ed.), An Introduction to Applied Linguistics. London: Arnold.

Carroll, J.B., Davies, P., and Richman, B. (1971). The American Heritage Word Frequency Book. New York: American Heritage Publishing.

Carter, R. (1998). Vocabulary: Applied Linguistic Perspectives (2nd edn). London: Routledge.

Carter, R. and McCarthy, M. (2006). Cambridge Grammar of English. Cambridge: Cambridge University Press.

Chamot, A.U. (1987). The learning strategies of ESL students. In Wenden, A. and Rubin, J. (eds), Learner Strategies in Language Learning. New York: Prentice Hall.

Cheng, W., Greaves, C., and Warren, M. (2006). From n-gram to skipgram to concgram. International Journal of Corpus Linguistics 11, 4: 411–433.

Cho, K-S. and Krashen, S. (1994). Acquisition of vocabulary from Sweet Valley Kids Series: Adult ESL acquisition. Journal of Reading 37: 662–667.

Church, K.W. and Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics 16, 1: 22–29.

Clarke, D.E. and Nation, I.S.P. (1980). Guessing the meanings of words from context: Strategy and techniques. System 8, 3: 211–220.

Clear, J. (1993). Tools for the study of collocation. In Baker, M., Francis, G., and Tognini-Bonelli, E. (eds), Text and Technology: In Honour of John Sinclair. Amsterdam: Benjamins. pp. 271–292.

Coady, J. and Huckin, T. (eds). (1997). Second Language Vocabulary Acquisition. Cambridge: Cambridge University Press.

Cohen, A.D. (1989). Attrition in the productive lexicon of two Portuguese third language speakers. Studies in Second Language Acquisition 11, 2: 135–149.

Cohen, A., Glasman, H., Rosenbaum-Cohen, P.R., Ferrara, J., and Fine, J. (1988). Reading English for specialized purposes: Discourse analysis and the use of student informants. In Carrell, P.L., Devine, J., and Eskey, D. (eds), Interactive Approaches to Second Language Reading. Cambridge: Cambridge University Press. pp. 152–167.

Coltheart, M. (1981). The MRC Psycholinguistic Database. Quarterly Journal of Experimental Psychology, 33A: 497–505.




Conklin, K., Dijkstra, T., and van Heuven, W. (under review). Bilingual processing of grammatical gender information specific to one language: Evidence from eyetracking.

Conklin, K. and Schmitt, N. (2008). Formulaic sequences: Are they processed more quickly than nonformulaic language by native and nonnative speakers? Applied Linguistics 29, 1: 72–89.

Cooper, T. C. (1999). Processing of idioms by L2 learners of English. TESOL Quarterly 33: 233–262.

Coulmas, F. (1979). On the sociolinguistic relevance of routine formulae. Journal of Pragmatics 3: 239–66.

Coulmas, F. (1981). Conversational Routine. The Hague: Mouton.

Cowie, A. (ed.). (1998). Phraseology: Theory, Analysis, and Applications. Oxford: Oxford University Press.

Coxhead, A. (2000). A new academic word list. TESOL Quarterly 34, 2: 213–238.

Craik, F.I.M. and Lockhart, R.S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior 11: 671–684.

Cruttenden, A. (1981). Item-learning and system-learning. Journal of Psycholinguistic Research 10: 79–88.

Crystal, D. (1987). The Cambridge Encyclopedia of Language. Cambridge: Cambridge University Press.

Cseresznyés, M. (2008). The reading tests. In Alderson, J.C., Nagy, E., and Öveges, E. (eds), English Language Education in Hungary Part 2. Accessed July 2008 from <http://www.examsreform.hu/Pages/ELE2.html>.

Cutler, A., Mehler, J., Norris, D. and Segui, J. (1986). Limits on bilingualism. Nature 340: 229–30.

Cutler, A. and Norris, D.G. (1988). The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance 14: 113–121.

Dagut, M.B. and Laufer, B. (1985). Avoidance of phrasal verbs by English learners, speakers of Hebrew – a case for contrastive analysis. Studies in Second Language Acquisition 7: 73–79.

Dale, E. (1965). Vocabulary measurement: Techniques and major findings. Elementary English 42: 895–901.

Daller, H., Milton, J., and Treffers-Daller, J. (2007). Modelling and Assessing Vocabulary Knowledge. Cambridge: Cambridge University Press.

D’Anna, C.A., Zechmeister, E.B., and Hall, J.W. (1991). Toward a meaningful definition of vocabulary size. Journal of Reading Behavior 23, 1: 109–122.

Davies, M. (2002). Corpus del Español (100 million words, 1200s–1900s). Available on-line at <http://www.corpusdelespanol.org>.

Davies, M. (2004). BYU-BNC: The British National Corpus. Available on-line at <http://corpus.byu.edu/bnc>.

Davies, M. (2007). TIME Magazine Corpus (100 million words, 1920s–2000s). Available on-line at <http://corpus.byu.edu/time>.

Davies, M. (accessed 2008) American Corpus of English. Accessed July 2008. Available on-line at <http://americancorpus.org>.

Davies, M., Biber, D., Jones, J., and Tracy, N. (2008). Corpus del Español: Registers. Accessed July 2008 at <http://www.corpusdelespanol.org/registers>.

Davies, M. and Ferreira, M. (2006). Corpus do Português (45 million words, 1300s–1900s). Available on-line at <http://www.corpusdoportugues.org>.

Davis, M.H., Di Betta, A.M., Macdonald, M.J.E., and Gaskell, M.G. (2008). Learning and consolidation of novel spoken words. Journal of Cognitive Neuroscience 21, 4: 803–820.




Davou, M. (2008). Formulaic language in second language oral production. Presentation given at the Formulaic Language Research Network conference, University of Nottingham, June 2008.

Day, R.R. and Bamford, J. (1998). Extensive Reading in the Second Language Classroom. Cambridge: Cambridge University Press.

de Bot, K. (1992). A bilingual production model: Levelt’s speaking model adapted. Applied Linguistics 13, 1: 1–24.

de Bot, K., Lowrie, W., and Verspoor, M. (2005). Second Language Acquisition. London: Routledge.

de Bot, K. and Stoessel, S. (2000). In search of yesterday’s words: Reactivating a long forgotten language. Applied Linguistics 21, 3: 364–384.

Dechert, H. (1983). How a story is done in a second language. In Faerch, C. and Kasper, G. (eds), Strategies in Interlanguage Communication. London: Longman. pp. 175–195.

De Cock, S. (2000). Repetitive phrasal chunkiness and advanced EFL speech and writing. In Mair, C. and Hundt, M. (eds), Corpus Linguistics and Linguistic Theory. Amsterdam: Rodopi. pp. 51–68.

De Cock, S., Granger, S., Leech, G., and McEnery, T. (1998). An automated approach to the phrasicon of EFL learners. In Granger, S. (ed.), Learner English on Computer. London: Addison Wesley Longman. pp. 67–79.

de Groot, A.M.B. (1992). Determinants of word translation. Journal of Experimental Psychology: Learning, Memory, and Cognition 18, 5: 1001–1018.

de Groot, A.M.B. (2006). Effects of stimulus characteristics and background music on foreign language vocabulary learning and forgetting. Language Learning 56, 3: 463–506.

de Groot, A.M.B. and Keijzer, R. (2000). What is hard to learn is easy to forget: The roles of word concreteness, cognate status, and word frequency in foreign-language vocabulary learning and forgetting. Language Learning 50, 1: 1–56.

de Groot, A.M.B. and van Hell, J.G. (2005). The learning of foreign language vocabulary. In Kroll, J.F. and de Groot, A.M.B. (eds), Handbook of Bilingualism. Oxford: Oxford University Press.

DeKeyser, R. (2003). Implicit and explicit learning. In Doughty, C.J. and Long, M.H. (eds), The Handbook of Second Language Acquisition. Malden, MA: Blackwell. pp. 313–348.

Dörnyei, Z. (2001a). Motivational Strategies in the Language Classroom. Cambridge: Cambridge University Press.

Dörnyei, Z. (2001b). Teaching and Researching Motivation. Harlow, UK: Pearson Education.

Dörnyei, Z. (2005). The Psychology of the Language Learner. Mahwah, NJ: Lawrence Erlbaum.

Dörnyei, Z. (2007). Research Methods in Applied Linguistics. Oxford: Oxford University Press.

Dörnyei, Z. (2009). The Psychology of Second Language Acquisition. Oxford: Oxford University Press.

Dörnyei, Z., Durow, V., and Zahran, K. (2004). Individual differences and their effects on formulaic sequence acquisition. In Schmitt, N. (ed.), Formulaic Sequences. Amsterdam: John Benjamins. pp. 87–106.

Doughty, C.J. (2003). Instructed SLA: Constraints, compensation, and enhancement. In Doughty, C.J. and Long, M.H. (eds), The Handbook of Second Language Acquisition. Malden, MA: Blackwell.




Drew, P. and Holt, E. (1998). Figures of speech: Figurative expressions and the management of topic transition in conversation. Language in Society 27: 495–522.

Dumay, N. and Gaskell, M.G. (2007). Sleep-associated changes in the mental representation of spoken words. Psychological Science 18: 35–39.

Dumay, N., Gaskell, M.G., and Feng, X. (2004). A day in the life of a spoken word. In Forbus, K., Gentner, D., and Regier, T. (eds), Proceedings of the Twenty-Sixth Annual Conference of the Cognitive Science Society. Mahwah, NJ: Erlbaum. pp. 339–344.

Durrant, P. (2007). Review of Nadja Nesselhauf’s Collocations in a learner corpus. Functions of Language 14, 2: 251–261.

Durrant, P. (2008). High Frequency Collocations and Second Language Learning. Unpublished PhD dissertation, University of Nottingham.

Durrant, P. and Schmitt, N. (2009). To what extent do native and non-native writers make use of collocations? International Review of Applied Linguistics 47: 157–177.

Durrant, P. and Schmitt, N. (in press). Adult learners’ retention of collocations from exposure. Second Language Research.

Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics 19: 61–74.

Duyck, W., van Assche, E., Drieghe, D., and Hartsuiker, R. (2007). Visual word recognition by bilinguals in a sentence context: Evidence for nonselective lexical access. Journal of Experimental Psychology: Learning, Memory, and Cognition 33, 4: 663–679.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition 18: 91–126.

Ellis, N.C. (1997). Vocabulary acquisition: Word structure, collocation, word-class, and meaning. In Schmitt, N. (ed.), Vocabulary: Description, Acquisition, and Pedagogy. Cambridge: Cambridge University Press.

Ellis, N.C. (2002). Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition 24: 143–188.

Ellis, N.C. (2006a). Language acquisition as rational contingency learning. Applied Linguistics 27, 1: 1–24.

Ellis, N.C. (2006b). Selective attention and transfer phenomena in L2 acquisition: Contingency, cue competition, salience, interference, overshadowing, blocking, and perceptual learning. Applied Linguistics 27, 2: 164–194.

Ellis, N.C. and Beaton, A. (1993). Psycholinguistic determinants of foreign language vocabulary learning. Language Learning 43: 559–617.

Ellis, N.C., Simpson-Vlach, R., and Maynard, C. (2008). Formulaic language in native and second-language speakers: Psycholinguistics, corpus linguistics, and TESOL. TESOL Quarterly 41, 3: 375–396.

Ellis, R. and He, X. (1999). The roles of modified input and output in the incidental acquisition of word meanings. Studies in Second Language Acquisition 21: 285–301.

Ellis, R., Tanaka, Y. and Yamazaki, A. (1994). Classroom interaction, comprehension, and the acquisition of L2 word meanings. Language Learning 44: 449–491.

Entwisle, D.R. (1966). Word Associations of Young Children. Baltimore, MD: Johns Hopkins Press.

Entwisle, D.R., Forsyth, D.F., and Muuss, R. (1964). The syntactic-paradigmatic shift in children’s word associations. Journal of Verbal Learning and Verbal Behavior 3: 19–29.

Ehrman, M.E., Leaver, B.L., and Oxford, R.L. (2003). A brief overview of individual differences in second language learning. System 31: 313–30.




Erman, B. and Warren, B. (2000). The idiom principle and the open choice principle. Text 20, 1: 29–62.

Ervin, S.M. (1961). Changes with age in the verbal determinants of word association. American Journal of Psychology 74: 361–372.

Evert, S. (2004). Computational approaches to collocations. Available on-line at <http://www.collocations.de>.

Evert, S. and Krenn, B. (2001). Methods for the qualitative evaluations of lexical association measures. Paper presented at the 39th Annual Meeting of the Association for Computational Linguistics, Toulouse, France. Available on-line at <http://www.collocations.de>.

Fan, M. (2000). How big is the gap and how to narrow it? An investigation into the active and passive vocabulary knowledge of L2 learners. RELC Journal 31, 2: 105–119.

Fan, M. (2003). Frequency of use, perceived usefulness, and actual usefulness of second language vocabulary strategies: A study of Hong Kong learners. Modern Language Journal 87, 2: 222–240.

Farghal, M. and Obiedat, H. (1995). Collocations: A neglected variable in EFL. International Journal of Applied Linguistics 28, 4: 313–331.

Fellbaum, C. (ed.). (2007). Idioms and Collocations. London: Continuum.

Field, A. (2005). Discovering Statistics Using SPSS. London: Sage.

Firth, J. (1935). The technique of semantics. Transactions of the Philological Society: 36–72.

Fitzpatrick, T. (2006). Habits and rabbits: Word associations and the L2 lexicon. EUROSLA Yearbook 6: 121–145.

Fitzpatrick, T. (2007). Word association patterns: Unpacking the assumptions. International Journal of Applied Linguistics 17: 319–31.

Flowerdew, J. (1992). Definitions in science lectures. Applied Linguistics 13, 2: 201–221.

Folse, K.S. (2004). Vocabulary Myths. Ann Arbor: University of Michigan Press.

Folse, K.S. (2006). The effect of type of written exercise on L2 vocabulary retention. TESOL Quarterly 40, 2: 273–293.

Foster, P. (2001). Rules and routines: A consideration of their role in the task-based language production of native and non-native speakers. In Bygate, M., Skehan, P., and Swain, M. (eds), Researching Pedagogic Tasks: Second Language Learning, Teaching, and Testing. Harlow: Longman. pp. 75–94.

Francis, W.N. and Kucera, H. (1982). Frequency Analysis of English Usage. Boston: Houghton Mifflin.

Fraser, C.A. (1999). Lexical processing strategy use and vocabulary learning through reading. Studies in Second Language Acquisition 21: 225–241.

Frayn, M. (2002). Spies. London: Faber and Faber.

Fukkink, R.G. and De Glopper, K. (1998). Effects of instruction in deriving word meaning from context: A metaanalysis. Review of Educational Research 68, 4: 450–69.

Gaskell, M.G. and Dumay, N. (2003). Lexical competition and the acquisition of novel words. Cognition 89: 105–132.

Gibbs, R., Bogadanovich, J., Sykes, J., and Barr, D. (1997). Metaphor in idiom comprehension. Journal of Memory and Language 37: 141–154.

Gläser, R. (1998). The stylistic potential of phraseological units in the light of genre analysis. In Cowie, A. (ed.), Phraseology: Theory, Analysis and Applications. Oxford: Oxford University Press. pp. 125–143.

Gleason, J.B. (2005). The Development of Language. Boston: Pearson Education.




Goulden, R., Nation, P., and Read, J. (1990). How large can a receptive vocabulary be? Applied Linguistics 11, 4: 341–363.

Grabe, W. and Stoller, F.L. (2002). Teaching and Researching Reading. Harlow: Longman.

Graesser, A.C., McNamara, D.S., Louwerse, M.M., and Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, and Computers 36: 193–202.

Grainger, J. and Dijkstra, T. (1992). On the representation and use of language information in bilinguals. In Harris, R.J. (ed.), Cognitive Processing in Bilinguals. Amsterdam: North Holland. pp. 207–220.

Granger, S. (1993). Cognates: An aid or a barrier to successful L2 vocabulary development. ITL: Review of Applied Linguistics 99–100: 43–56.

Granger, S. (1998). Prefabricated patterns in advanced EFL writing: Collocations and formulae. In Cowie, A.P. (ed.), Phraseology: Theory, Analysis and Applications. Oxford: Oxford University Press. pp. 145–160.

Granger, S. and Meunier, F. (eds). (2008). Phraseology: An Interdisciplinary Perspective. Amsterdam: John Benjamins.

Granger, S., Paquot, M., and Rayson, P. (2006). Extraction of multi-word units from EFL and native English corpora: The phraseology of the verb ‘make’. In Häcki Buhofer, A. and Burger, H. (eds), Phraseology in Motion I: Methoden und Kritik. Akten der Internationalen Tagung zur Phraseologie (Basel, 2004). Baltmannsweiler: Schneider Verlag Hohengehren. pp. 57–68.

Greidanus, T., Bogaards, P., van der Linden, E., Nienhuis, L., and de Wolf, T. (2004). The construction and validation of a deep word knowledge test for advanced learners of French. In Bogaards, P. and Laufer, B. (eds), Vocabulary in a Second Language. Amsterdam: John Benjamins.

Greidanus, T. and Nienhuis, L. (2001). Testing the quality of word knowledge in a second language by means of word associations: Types of distractors and types of associations. Modern Language Journal 85, 4: 567–577.

Grendel, M. (1993). Verlies en herstel van lexicale kennis [Attrition and recovery of lexical knowledge]. Unpublished doctoral dissertation, University of Nijmegen.

Groot, P.J.M. (2000). Computer assisted second language vocabulary acquisition. Language Learning and Technology 4, 1: 60–81.

Gu, Y. and Johnson, R.K. (1996). Vocabulary learning strategies and language learning outcomes. Language Learning 46, 4: 643–679.

Gyllstad, H. (2005). Words that go together well: Developing test formats for measuring learner knowledge of English collocations. In Heinat, F. and Klingvall, E. (eds), The Department of English in Lund: Working Papers in Linguistics 5. pp. 1–31. Available on-line at <http://www.sol.lu.se/engelska/wp.html?expand_menu=14>.

Gyllstad, H. (2007). Testing English collocations. Lund University: PhD thesis.

Haastrup, K. (1991). Lexical Inferencing Procedures. Tübingen: Gunter Narr Verlag.

Haastrup, K. (2008). Lexical inferencing procedures in two languages. In Albrechtsen, D., Haastrup, K., and Henriksen, B., Vocabulary and Writing in a First and Second Language: Process and Development. Basingstoke: Palgrave Macmillan. pp. 67–111.

Haastrup, K. and Henriksen, B. (2000). Vocabulary acquisition: Acquiring depth of knowledge through network building. International Journal of Applied Linguistics 10, 2: 221–240.

Hall, C.J. (2002). The automatic cognate form assumption: Evidence for the parasitic model of vocabulary development. International Review of Applied Linguistics in Language Teaching 40: 69–87.




Hansen, L. and Chen, Y-L. (2001). What counts in the acquisition and attrition of numeral classifiers? JALT (Japan Association for Language Teaching) Journal 23, 1: 90–110.

Hansen, L., Umeda, Y., and McKinney, M. (2002). Savings in the relearning of second language vocabulary: The effects of time and proficiency. Language Learning 52, 4: 653–678.

Harley, T.A. (2008). The Psychology of Language: From Data to Theory (3rd edn). Hove: Psychology Press.

Haynes, M. (1993). Patterns and perils of guessing in second language reading. In Huckin, T., Haynes, M., and Coady, J. (eds), Second Language Reading and Vocabulary Learning. Norwood, NJ: Ablex. pp. 46–65.

Hazenberg, S. and Hulstijn, J.H. (1996). Defining a minimal receptive second-language vocabulary for non-native university students: An empirical investigation. Applied Linguistics 17, 2: 145–163.

Heigham, J. and Croker, R.A. (2009). Qualitative Research in Applied Linguistics: A Practical Introduction. Basingstoke: Palgrave Macmillan.

Hemchua, S. and Schmitt, N. (2006). An analysis of lexical errors in the English compositions of Thai learners. Prospect 21, 3: 3–25.

Henriksen, B. (1999). Three dimensions of vocabulary development. Studies in Second Language Acquisition 21, 2: 303–317.

Henriksen, B. (2008). Declarative lexical knowledge. In Albrechtsen, D., Haastrup, K., and Henriksen, B., Vocabulary and Writing in a First and Second Language. Basingstoke: Palgrave Macmillan.

Hill, M. and Laufer, B. (2003). Type of task, time-on-task and electronic dictionaries in incidental vocabulary acquisition. International Review of Applied Linguistics in Language Teaching 41, 2: 87–106.

Hoey, M. (2005). Lexical Priming: A New Theory of Words and Language. London: Routledge.

Hoffman, S. and Lehmann, H.M. (2000). Collocational evidence from the British National Corpus. In Kirk, J.M. (ed.), Corpora Galore: Analyses and Techniques in Describing English. Papers from the Nineteenth International Conference on English Language Research on Computerised Corpora (ICAME 1998). Amsterdam: Rodopi. pp. 17–32.

Hofland, K. and Johansson, S. (1982). Word Frequencies in British and American English. Bergen: Norwegian Computing Centre for the Humanities.

Holley, F.M. and King, J.K. (1971). Vocabulary glosses in foreign language reading materials. Language Learning 21: 213–219.

Horst, M. (2005). Learning L2 vocabulary through extensive reading: A measurement study. Canadian Modern Language Review 61, 3: 355–382.

Horst, M., Cobb, T., and Meara, P. (1998). Beyond A Clockwork Orange: Acquiring second language vocabulary through reading. Reading in a Foreign Language 11, 2: 207–223.

Horst, M. and Collins, L. (2006). From faible to strong: How does their vocabulary grow? Canadian Modern Language Review 63, 1: 83–106.

Howarth, P. (1998). The phraseology of learners’ academic writing. In Cowie, A. (ed.), Phraseology: Theory, Analysis and Applications. Oxford: Oxford University Press. pp. 161–186.

Howatt, A.P.R. (2004). A History of English Language Teaching (2nd edn). Oxford: Oxford University Press.

Hu, M. and Nation, I.S.P. (2000). Vocabulary density and reading comprehension. Reading in a Foreign Language 13, 1: 403–430.




Hughes, A. (2003). Testing for Language Teachers. Cambridge: Cambridge University Press.

Hughes, G. (2000). A History of English Words. Oxford: Blackwell.

Huibregtse, I., Admiraal, W., and Meara, P. (2002). Scores on a yes–no vocabulary test: Correction for guessing and response style. Language Testing 19: 227–245.

Hulstijn, J.H. (1992). Retention of inferred and given word meanings: Experiments in incidental vocabulary learning. In Arnaud, P.J. and Béjoint, H. (eds), Vocabulary and Applied Linguistics. London: Macmillan. pp. 113–125.

Hulstijn, J.H. (2007). Psycholinguistic perspectives on language and its acquisition. In Cummins, J. and Davison, C. (eds), International Handbook of English Language Teaching. New York: Springer. pp. 783–795.

Hulstijn, J.H., Hollander, M. and Greidanus, T. (1996). Incidental vocabulary learning by advanced foreign language students: The influence of marginal glosses, dictionary use, and reoccurrence of unknown words. Modern Language Journal 80: 327–339.

Hulstijn, J.H. and Laufer, B. (2001). Some empirical evidence for the involvement load hypothesis in vocabulary acquisition. Language Learning 51, 3: 539–558.

Hulstijn, J.H. and Trompetter, P. (1998). Incidental learning of second language vocabulary in computer-assisted reading and writing tasks. In Albrechtsen, D., Hendricksen, B., Mees, M., and Poulsen, E. (eds), Perspectives on Foreign and Second Language Pedagogy. Odense, Denmark: Odense University Press. pp. 191–200.

Hulstijn, J.H., van Gelderen, A., and Schoonen, R. (2009). Automatization in second-language acquisition: What does the coefficient of variation tell us? Applied Psycholinguistics 30, 4: 555–582.

Hunston, S. (2002). Corpora in Applied Linguistics. Cambridge: Cambridge University Press.

Hunston, S. (2007). Semantic prosody revisited. International Journal of Corpus Linguistics 12, 2: 249–268.

Hunston, S. and Francis, G. (2000). Pattern Grammar. Amsterdam: John Benjamins.

Hunt, A. and Beglar, D. (1998). Current research and practice in teaching vocabulary. The Language Teacher. Accessed January 1998. Available on-line at <http://www.jalt.publications.org/tlt/articles/1998/01/hunt>.

Hunt, A. and Beglar, D. (2005). A framework for developing EFL reading vocabulary. Reading in a Foreign Language 17, 1: 23–59.

Hutchison, K.A. (2003). Is semantic priming due to association strength or feature overlap? A micro-analytic review. Psychonomic Bulletin and Review 10, 4: 785–813.

Hyland, K. and Tse, P. (2007). Is there an ‘Academic Vocabulary’? TESOL Quarterly 41, 2: 235–253.

Irujo, S. (1986). A piece of cake: Learning and teaching idioms. ELT Journal 40: 236–242.

Irujo, S. (1993). Steering clear: Avoidance in the production of idioms. International Review of Applied Linguistics in Language Teaching 31: 205–219.

Jackendoff, R. (1995). The boundaries of the lexicon. In Everaert, M., van der Linden, E., Schenk, A., and Schreuder, R. (eds), Idioms: Structural and Psychological Perspectives. Hillsdale, NJ: Erlbaum. pp. 133–166.

Jacobs, G.M., Dufon, P., and Fong, C.H. (1994). L1 and L2 vocabulary glosses in L2 reading passages: Their effectiveness for increasing comprehension and vocabulary knowledge. Journal of Research in Reading 17: 19–28.

Jarvis, S. (2000). Methodological rigor in the study of transfer: Identifying L1 influence in the interlanguage lexicon. Language Learning 50, 2: 245–309.




Jarvis, S. (2002). Short texts, best-fitting curves and new measures of lexical diversity. Language Testing 19, 1: 57–84.

Jezzard, P., Matthews, P.M., and Smith, S.M. (eds). (2003). Functional Magnetic Resonance Imaging: An Introduction to Methods. Oxford: Oxford University Press.

Jiang, N. (2002). Form-meaning mapping in vocabulary acquisition in a second language. Studies in Second Language Acquisition 24: 617–637.

Jiang, N. and Nekrasova, T.M. (2007). The processing of formulaic sequences by second language speakers. Modern Language Journal 91, 3: 433–445.

Joe, A. (1995). Text-based tasks and incidental vocabulary learning: A case study. Second Language Research 11: 149–158.

Joe, A. (1998). What effects do text-based tasks promoting generation have on incidental vocabulary acquisition? Applied Linguistics 19, 3: 357–377.

Joe, A.G. (2006). The nature of encounters with vocabulary and long-term vocabulary acquisition. Unpublished PhD thesis, Victoria University of Wellington.

Johansson, S. and Hofland, K. (1989). Frequency Analysis of English Vocabulary and Grammar Volumes 1 & 2. Oxford: Clarendon Press.

Johnston, M.H. (1974). Word associations of schizophrenic children. Psychological Reports 35: 663–674.

Keller, E. (1981). Gambits: Conversation strategy signals. In Coulmas, F. (ed.), Conversational Routine. The Hague: Mouton. pp. 93–113.

Kilgarriff, A. BNC frequency lists. Available on-line at <http://www.kilgarriff.co.uk/bnc-readme.html>.

Kiss, G.R., Armstrong, C., Milroy, R., and Piper, J. (1973). An associative thesaurus of English and its computer analysis. In Aitken, A.J., Bailey, R.W., and Hamilton-Smith, N. (eds), The Computer and Literary Studies. Edinburgh: Edinburgh University Press.

Knight, S. (1994). Dictionary: The tool of last resort in foreign language reading? A new perspective. Modern Language Journal 78: 285–299.

Koda, K. (1997). Orthographic knowledge in L2 lexical processing. In Coady, J. and Huckin, T. (eds), Second Language Vocabulary Acquisition. Cambridge: Cambridge University Press.

Koda, K. (1998). The role of phonemic awareness in second language reading. Second Language Research 14, 2: 194–215.

Kruse, H., Pankhurst, J., and Sharwood Smith, M. (1987). A multiple word association probe in second language acquisition research. Studies in Second Language Acquisition 9, 2: 141–154.

Kučera, H. and Francis, W.N. (1967). Computational Analysis of Present-Day American English. Providence, Rhode Island: Brown University Press.

Kuhn, M.R. and Stahl, S.A. (1998). Teaching children to learn word meanings from context: A synthesis and some questions. Journal of Literacy Research 30, 1: 119–38.

Kuiper, K. (1996). Smooth Talkers. Hillsdale, NJ: Lawrence Erlbaum.

Kuiper, K. (2004). Formulaic performance in conventionalised varieties of speech. In Schmitt, N. (ed.), Formulaic Sequences. Amsterdam: John Benjamins. pp. 37–54.

Kuiper, K. (2009). Formulaic Genres. Basingstoke: Palgrave Macmillan.

Kuiper, K., Columbus, G., and Schmitt, N. (2009). Acquiring phrasal vocabulary. In Foster-Cohen, S. (ed.), Advances in Language Acquisition. Basingstoke: Palgrave Macmillan.

Kuiper, K. and Flindall, M. (2000). Social rituals, formulaic speech and small talk at the supermarket checkout. In Coupland, J. (ed.), Small Talk. Harlow: Longman. pp. 183–207.




Kuiper, K. and Haggo, D. (1984). Livestock auctions, oral poetry, and ordinary language. Language in Society 13: 205–234.

Kuiper, K., van Egmond, M-E., Kempen, G., and Sprenger, S. (2007). Slipping on superlemmas: Multi-word lexical items in speech production. The Mental Lexicon 2, 3: 313–357.

Kutas, M., Van Petten, C.K., and Kluender, R. (2006). Psycholinguistics electrified II (1994–2005). In Traxler, M.J. and Gernsbacher, M.A. (eds), Handbook of Psycholinguistics (2nd edn). London: Academic Press. pp. 659–724.

Lambert, W.E. and Moore, N. (1966). Word association responses: Comparisons of American and French monolinguals with Canadian monolinguals and bilinguals. Journal of Personality and Social Psychology 3, 3: 313–320.

Larsen-Freeman, D. (1975). The acquisition of grammatical morphemes by adult ESL students. TESOL Quarterly 9: 409–430.

Laufer, B. (1988). The concept of ‘synforms’ (similar lexical forms) in vocabulary acquisition. Language and Education 2, 2: 113–132.

Laufer, B. (1989). What percentage of text-lexis is essential for comprehension? In Lauren, C. and Nordman, M. (eds), Special Language: From Humans Thinking to Thinking Machines. Clevedon: Multilingual Matters.

Laufer, B. (1992). How much lexis is necessary for reading comprehension? In Arnaud, P.J.L. and Béjoint, H. (eds), Vocabulary and Applied Linguistics. London: Macmillan. pp. 126–132.

Laufer, B. (1995). Beyond 2000 – a measure of productive lexicon in a second language. In Eubank, L., Sharwood-Smith, M., and Selinker, L. (eds), The Current State of Interlanguage. Amsterdam: John Benjamins. pp. 265–272.

Laufer, B. (1997). What’s in a word that makes it hard or easy? Intralexical factors affecting the difficulty of vocabulary acquisition. In Schmitt, N. and McCarthy, M. (eds), Vocabulary: Description, Acquisition, and Pedagogy. Cambridge: Cambridge University Press. pp. 140–155.

Laufer, B. (1998). The development of passive and active vocabulary in a second language: Same or different? Applied Linguistics 19, 2: 255–271.

Laufer, B. (2000a). Task effect on instructed vocabulary learning: The hypothesis of ‘involvement’. Selected Papers from AILA ‘99 Tokyo. Tokyo: Waseda University Press. pp. 47–62.

Laufer, B. (2000b). Avoidance of idioms in a second language: The effect of L1-L2 degree of similarity. Studia Linguistica 54: 186–196.

Laufer, B. (2001). Quantitative evaluation of vocabulary: How it can be done and what it is good for. In Elder, C., Hill, K., Brown, A., Iwashita, N., Grove, L., Lumley, T., and McNamara, T. (eds), Experimenting with Uncertainty. Cambridge: Cambridge University Press.

Laufer, B. (2005a). Focus on form in second language vocabulary learning. EUROSLA Yearbook 5: 223–250.

Laufer, B. (2005b). Lexical frequency profiles: From Monte Carlo to the real world. A response to Meara (2005). Applied Linguistics 26, 4: 582–588.

Laufer, B., Elder, C., Hill, K., and Congdon, P. (2004). Size and strength: Do we need both to measure vocabulary knowledge? Language Testing 21, 2: 202–226.

Laufer, B. and Eliasson, S. (1993). What causes avoidance in L2 learning: L1-L2 difference, L1-L2 similarity, or L2 complexity? Studies in Second Language Acquisition 15: 35–48.

Laufer, B. and Goldstein, Z. (2004). Testing vocabulary knowledge: Size, strength, and computer adaptiveness. Language Learning 54, 3: 399–436.




Laufer, B. and Hulstijn, J. (2001). Incidental vocabulary acquisition in a second language: The construct of task-induced involvement. Applied Linguistics 22, 1: 1–26.

Laufer, B. and Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics 16, 3: 307–322.

Laufer, B. and Nation, P. (1999). A vocabulary-size test of controlled productive vocabulary. Language Testing 16, 1: 33–51.

Laufer, B. and Paribakht, T.S. (1998). The relationship between passive and active vocabularies: Effects of language learning context. Language Learning 48, 3: 365–391.

Laufer, B. and Shmueli, K. (1997). Memorizing new words: Does teaching have anything to do with it? RELC Journal 28: 89–108.

Leech, G., Rayson, P., and Wilson, A. (2001). Word Frequencies in Written and Spoken English. Harlow: Longman.

Lennon, P. (2000). The lexical element in spoken second language fluency. In Riggenbach, H. (ed.), Perspectives on Fluency. Ann Arbor: University of Michigan Press. pp. 25–42.

Levelt, W.J.M. (1989). Speaking. From Intention to Articulation. Cambridge, Mass: MIT Press.

Li, J. and Schmitt, N. (2009). The acquisition of lexical phrases in academic writing: A longitudinal case study. Journal of Second Language Writing 18: 85–102.

Li, J. and Schmitt, N. (in press). The development of collocation use in academic texts by advanced L2 learners: A multiple case-study approach. In Woods, D. (ed.), Perspectives on Formulaic Language in Communication and Acquisition. London: Continuum.

Li, P., Farkas, I., and MacWhinney, B. (2004). Early lexical development in a self organising neural network. Neural Networks 17: 1345–1362.

Liao, P. (2006). EFL learners’ beliefs about and strategy use of translation in English learning. RELC Journal 37, 2: 191–215.

Lin, P.M.S. (in preparation). Are formulaic sequences phonologically coherent as we assumed? Unpublished PhD thesis, University of Nottingham.

Liu, N. and Nation, I.S.P. (1985). Factors affecting guessing vocabulary in context. RELC Journal 16: 33–42.

Longman Language Activator. (1993). Harlow: Longman.

Lotto, L. and de Groot, A.M.B. (1998). Effects of learning method and word type on acquiring vocabulary in an unfamiliar language. Language Learning 48, 1: 31–69.

Lüdeling, A. and Kytö, M. (eds). (2009). Corpus Linguistics: An International Handbook. Berlin: Mouton de Gruyter.

Luppescu, S. and Day, R.R. (1993). Reading, dictionaries, and vocabulary learning. Language Learning 43, 2: 263–287.

MacAndrew, R. (2002). Inspector Logan. Cambridge English Readers Series. Cambridge: Cambridge University Press.

Malvern, D., Richards, B.J., Chipere, N., and Durán, P. (2004). Lexical Diversity and Language Development: Quantification and Assessment. Basingstoke: Palgrave Macmillan.

Manning, C.D. and Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.

Mason, B. and Krashen, S. (2004). Is form-focused vocabulary instruction worthwhile? RELC Journal 35, 2: 179–185.

McArthur, T. (ed.). (1992). The Oxford Companion to the English Language. Oxford: Oxford University Press.

McCarthy, M. (1990). Vocabulary. Oxford: Oxford University Press.




McCarthy, M. and Carter, R. (1997). Written and spoken vocabulary. In Schmitt, N. and McCarthy, M. (eds), Vocabulary: Description, Acquisition, and Pedagogy. Cambridge: Cambridge University Press.

McCarthy, M. and Carter, R. (2002). This that and the other: Multi-word clusters in spoken English as visible patterns of interaction. Teanga (Yearbook of the Irish Association for Applied Linguistics) 21: 30–52.

McCarthy, P.M. and Jarvis, S. (2007). vocd: A theoretical and empirical evaluation. Language Testing 24, 4: 459–488.

McCrostie, J. (2007). Investigating the accuracy of teachers’ word frequency intuitions. RELC Journal 38, 1: 53–66.

McDonough, K. and Trofimovich, P. (2008). Using Priming Methods in Second Language Research. London: Routledge/Taylor and Francis.

McGee, I. (2008). Word frequency estimates revisited – A response to Alderson (2007). Applied Linguistics 29, 3: 509–514.

McMillion, A. and Shaw, P. (2008). The balance of speed and accuracy in advanced L2 reading comprehension. Nordic Journal of English Studies, December.

McMillion, A. and Shaw, P. (2009). Comprehension and compensatory processing in advanced L2 readers. In Brantmeier, C. (ed.), Empirical Research on Adult Foreign Language Reading. New York: Information Age Publishing.

McNeill, A. (1996). Vocabulary knowledge profiles: Evidence from Chinese-speaking ESL teachers. Hong Kong Journal of Applied Linguistics 1: 39–63.

Meara, P. (1980). Vocabulary acquisition: A neglected aspect of language learning. Language Teaching & Linguistics: Abstracts 13, 4: 221–246.

Meara, P. (1983). Word associations in a foreign language. Nottingham Linguistic Circular 11, 2: 29–38.

Meara, P. (1987). Vocabulary in a second language, Vol. 2. Specialised Bibliography 4. London: CILT.

Meara, P. (1990). A note on passive vocabulary. Second Language Research 6, 2: 150–154.

Meara, P. (1992). EFL Vocabulary Tests. University College, Swansea: Centre for Applied Language Studies.

Meara, P. (1996a). The classical research in vocabulary acquisition. In Anderman, G. and Rogers, M. (eds), Words, Words, Words. Clevedon: Multilingual Matters. pp. 27–40. Accessed on-line at <http://www.lognostics.co.uk/vlibrary/index.htm>.

Meara, P.M. (1996b). The dimensions of lexical competence. In Brown, G., Malmkjaer, K., and Williams, J. (eds), Performance and Competence in Second Language Acquisition. Cambridge: Cambridge University Press. pp. 35–53.

Meara, P. (1996c). The vocabulary knowledge framework. Available on-line at <http://www.lognostics.co.uk/vlibrary/>.

Meara, P. (1997). Towards a new approach to modelling vocabulary acquisition. In Schmitt, N. and McCarthy, M. (eds), Vocabulary: Description, Acquisition, and Pedagogy. Cambridge: Cambridge University Press. pp. 109–121.

Meara, P. (1999). Lexis: Acquisition. In Spolsky, B. (ed.), Concise Encyclopedia of Educational Linguistics. Amsterdam: Elsevier. pp. 565–567.

Meara, P. (2004). Modelling vocabulary loss. Applied Linguistics 25, 2: 137–155.

Meara, P. (2005). Lexical frequency profiles: A Monte Carlo analysis. Applied Linguistics 26, 1: 32–47.

Meara, P. (2006). Emergent properties of multilingual lexicons. Applied Linguistics 27, 4: 620–644.

Meara, P. (accessed 2008). P_Lex v2.0: The Manual. Available on the _lognostics website, <http://www.lognostics.co.uk/>.




Meara, P. (2009). Connected Words: Word Associations and Second Language Vocabulary Acquisition. Amsterdam: John Benjamins.

Meara, P.M. and Bell, H. (2001). P_Lex: A simple and effective way of describing the lexical characteristics of short L2 texts. Prospect 16, 3: 5–19.

Meara, P.M. (in preparation b). L2 vocabulary profiles and Zipf’s law. Cited in the V_Size: The Manual. Meara, P.M. and Miralpeix, I. Accessed on-line at <http://www.lognostics.co.uk/tools/index.htm>.

Meara, P. and Miralpeix, I. (accessed 2008a). V_Size: The Manual. Available on the _lognostics website, <http://www.lognostics.co.uk/>.

Meara, P. and Miralpeix, I. (accessed 2008b). D_Tools: The Manual. Available on the _lognostics website, <http://www.lognostics.co.uk/>.

Meara, P. and Wolter, B. (2004). V_Links: Beyond vocabulary depth. In Albrechtsen, D., Haastrup, K., and Henriksen, B. (eds), Angles on the English-Speaking World. Copenhagen: Museum Tusculanum/University of Copenhagen Press. pp. 85–96.

Mel’čuk, I.A. (1995). Phrasemes in language and phraseology in linguistics. In Everaert, M., van der Linden, E.-J., Schenk, A., and Schreuder, R. (eds), Idioms: Structural and Psychological Perspectives. Hillsdale, NJ: Lawrence Erlbaum. pp. 167–232.

Melka, F. (1997). Receptive vs. productive aspects of vocabulary. In Schmitt, N. and McCarthy, M. (eds), Vocabulary: Description, Acquisition, and Pedagogy. Cambridge: Cambridge University Press. pp. 84–102.

Milton, J. and Hales, T. (1997). Applying a lexical profiling system to technical English. In Ryan, A. and Wray, A. (eds), Evolving Models of Language. Clevedon: Multilingual Matters. pp. 72–83.

Milton, J. and Hopkins, N. (2006). Comparing phonological and orthographic vocabulary size: Do vocabulary tests underestimate the knowledge of some learners? Canadian Modern Language Review 63, 1: 127–147.

Milton, J. and Meara, P. (1998). Are the British really bad at learning foreign languages? Language Learning Journal 18: 68–76.

Miralpeix, I. (2007). Lexical knowledge in instructed language learning: The effects of age and exposure. International Journal of English Studies 7, 2: 61–83.

Miralpeix, I. (2008). The influence of age on vocabulary acquisition in English as a Foreign Language. PhD thesis, Universitat de Barcelona.

Mochida, A. and Harrington, M. (2006). The yes/no test as a measure of receptive vocabulary knowledge. Language Testing 23: 73–98.

Mochizuki, M. (2002). Exploration of two aspects of vocabulary knowledge: Paradigmatic and collocational. Annual Review of English Language Education in Japan 13: 121–129.

Mondria, J-A. (2003). The effects of inferring, verifying, and memorizing on the retention of L2 word meanings. Studies in Second Language Acquisition 25: 473–499.

Moon, R. (1997). Vocabulary connections: Multi-word items in English. In Schmitt, N. and McCarthy, M. (eds), Vocabulary: Description, Acquisition, and Pedagogy. Cambridge: Cambridge University Press. pp. 40–63.

Moon, R. (1998). Fixed Expressions and Idioms in English: A Corpus-based Approach. Oxford: Oxford University Press.

Morris, L. and Cobb, T. (2004). Vocabulary profiles as predictors of the academic performance of Teaching English as a Second Language trainees. System 32: 75–87.

Moss, H.E. and Older, L.J.E. (1996). Birkbeck Word Association Norms. London: Taylor and Francis.

Myles, F., Hooper, J., and Mitchell, R. (1998). Rote or rule? Exploring the role of formulaic language in classroom foreign language learning. Language Learning 48, 3: 323–363.




Nagy, W.E., Anderson, R., Schommer, M., Scott, J.A., and Stallman, A. (1989). Morphological families in the internal lexicon. Reading Research Quarterly 24, 3: 263–282.

Nagy, W.E., Diakidoy, I.N., and Anderson, R.C. (1993). The acquisition of morphology: Learning the contribution of suffixes to the meanings of derivatives. Journal of Reading Behavior 25: 155–170.

Namei, S. (2004). Bilingual lexical development: A Persian-Swedish word association study. International Journal of Applied Linguistics 14, 3: 363–388.

Nassaji, H. (2003). L2 vocabulary learning from context: Strategies, knowledge sources, and their relationship with success in L2 lexical inferencing. TESOL Quarterly 37, 4: 645–670.

Nation, P. (1983). Testing and teaching vocabulary. Guidelines 5: 12–25.

Nation, I.S.P. (1990). Teaching and Learning Vocabulary. New York: Newbury House.

Nation, I.S.P. (1993). Using dictionaries to estimate vocabulary size: Essential, but rarely followed, procedures. Language Testing 10, 1: 27–40.

Nation, P. (1995). The word on words: An interview with Paul Nation. The Language Teacher 19, 2: 5–7.

Nation, I.S.P. (2001). Learning Vocabulary in Another Language. Cambridge: Cambridge University Press.

Nation, I.S.P. (2006). How large a vocabulary is needed for reading and listening? Canadian Modern Language Review 63, 1: 59–82.

Nation, I.S.P. (2008). Teaching Vocabulary: Strategies and Techniques. Boston: Heinle.

Nation, P. and Gu, P.Y. (2007). Focus on Vocabulary. Sydney: National Centre for English Language Teaching and Research.

Nation, I.S.P. and Hwang, K. (1995). Where would general service vocabulary stop and special purposes vocabulary begin? System 23, 1: 35–41.

Nation, P. and Meara, P. (2002). Vocabulary. In Schmitt, N. (ed.), An Introduction to Applied Linguistics. London: Arnold.

Nation, I.S.P. and Wang, K. (1999). Graded readers and vocabulary. Reading in a Foreign Language 12: 355–380.

Nation, I.S.P. and Waring, R. (1997). Vocabulary size, text coverage, and word lists. In Schmitt, N. and McCarthy, M. (eds), Vocabulary: Description, Acquisition, and Pedagogy. Cambridge: Cambridge University Press. pp. 6–19.

Nattinger, J.R. and DeCarrico, J.S. (1992). Lexical Phrases and Language Teaching. Oxford: Oxford University Press.

Nelson, K. (1973). Structure and Strategy in Learning to Talk. Monographs of the Society for Research in Child Development, Serial no. 149, nos 1–2.

Nelson, K. (1981). Individual differences in language development: Implications for development and language. Developmental Psychology 17: 170–187.

Nelson, D.L., McEvoy, C.L., and Schreiber, T.A. (1998). The University of South Florida word association, rhyme, and word fragment norms. Available on-line at <http://www.usf.edu/FreeAssociation/>.

Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some implications for teaching. Applied Linguistics 24, 2: 223–242.

Nesselhauf, N. (2004). How learner corpus analysis can contribute to language teaching: A study of support verb constructions. In Aston, G., Bernardini, S., and Stewart, D. (eds), Corpora and Language Learners. Amsterdam: John Benjamins. pp. 109–124.

Nesselhauf, N. (2005). Collocations in a Learner Corpus. Amsterdam: John Benjamins.

Newton, J. (1995). Task-based interaction and incidental vocabulary learning: A case study. Second Language Research 11: 159–177.




Nippold, M.A. (1991). Evaluating and enhancing idiom comprehension in language-disordered students. Language, Speech, and Hearing Services in Schools 22: 100–106. Cited in Cain, K., Oakhill, J., and Lemmon, K. (2005). The relation between children’s reading comprehension level and their comprehension of idioms. Journal of Experimental Child Psychology 90, 1: 65–87.

Nissen, H.B. and Henriksen, B. (2006). Word class influence on word association test results. International Journal of Applied Linguistics 16, 3: 389–408.

Nooteboom, S.G. (1999). Sloppiness in uttering stock phrases. International Congress of Phonetic Sciences. San Francisco.

Nurweni, A. and Read, J. (1999). The English vocabulary knowledge of Indonesian university students. English for Specific Purposes 18: 161–175.

O’Keeffe, A., McCarthy, M., and Carter, R. (2007). From Corpus to Classroom: Language Use and Language Teaching. Cambridge: Cambridge University Press.

Olshtain, E. (1989). Is second language attrition the reversal of second language acquisition? Studies in Second Language Acquisition 11, 2: 151–165.

Oppenheim, N. (2000). The importance of recurrent sequences for nonnative speaker fluency and cognition. In Riggenbach, H. (ed.), Perspectives on Fluency. Ann Arbor: University of Michigan Press. pp. 220–240.

Osterhout, L., McLaughlin, J., Pitkänen, I., Frenck-Mestre, C., and Molinaro, N. (2006). Novice learners, longitudinal designs, and event-related potentials: A means for exploring the neurocognition of second language processing. Language Learning 56, Supplement 1: 199–230.

Oxford, R.L. (1990). Language Learning Strategies: What Every Teacher Should Know. New York: Newbury House.

Palmer, H.E., West, M.P., and Faucett, L. (1936). Interim Report on Vocabulary Selection for the Teaching of English as a Foreign Language. Report of the Carnegie Conference, New York 1934, and London 1935. London: P.S. King and Son.

Paribakht, T.S. (2005). The influence of L1 lexicalization on L2 lexical inferencing: A Study of Farsi-speaking EFL learners. Language Learning 55, 4: 701–748.

Paribakht, T.S. and Wesche, M.B. (1993). Reading comprehension and second language development in a comprehension-based ESL program. TESL Canada Journal 11: 9–29.

Paribakht, T.S. and Wesche, M. (1997). Vocabulary enhancement activities and reading for meaning in second language vocabulary acquisition. In Coady, J. and Huckin, T. (eds), Second Language Vocabulary Acquisition. Cambridge: Cambridge University Press. pp. 174–200.

Paribakht, T.S. and Wesche, M. (1999). Reading and ‘incidental’ L2 vocabulary acquisition: An introspective study of lexical inferencing. Studies in Second Language Acquisition 21: 195–224.

Partington, A. (1998). Patterns and Meanings. Amsterdam: John Benjamins.

Pawley, A. and Syder, F.H. (1983). Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In Richards, J.C. and Schmidt, R.W. (eds), Language and Communication. London: Longman. pp. 191–225.

Pellicer Sánchez, A. and Schmitt, N. (in press). Incidental vocabulary acquisition from an authentic novel: A Clockwork Orange/multiple word knowledge approach. Reading in a Foreign Language.

Peters, A. (1983). The Units of Language Acquisition. Cambridge: Cambridge University Press.

Phongphio, T. and Schmitt, N. (2006). Learning English multi-word verbs in Thailand. Thai TESOL Bulletin 19: 122–136.




Pienemann, M. and Johnston, M. (1987). Factors influencing the development of language proficiency. In Nunan, D. (ed.), Applying Second Language Acquisition Research. Adelaide: National Curriculum Resource Centre. pp. 45–141.

Pigada, M. and Schmitt, N. (2006). Vocabulary acquisition from extensive reading: A case study. Reading in a Foreign Language 18, 1: 1–28.

Pintrich, P.R., Smith, D.A.F., Garcia, T., and McKeachie, W.J. (1991). A Manual for the Use of The Motivated Strategies for Learning Questionnaire (MSLQ). Ann Arbor: University of Michigan Press.

Politzer, R.L. (1978). Paradigmatic and syntagmatic associations of first year French students. In Honsa, V. and Hardman-de-Baudents, J. (eds), Papers in Linguistics and Child Language. The Hague: Mouton. pp. 203–210.

Postman, L. and Keppel, G. (1970). Norms of Word Association. New York: Academic Press.

Prince, P. (1996). Second language vocabulary learning: The role of context versus translations as a function of proficiency. Modern Language Journal 80: 478–493.

Pulvermüller, F. (2005). Brain mechanisms linking language and action. Nature Reviews Neuroscience 6, 7: 576–582. Available on-line at <http://www.nature.com/nrn/journal/v6/n7/index.html>.

Qian, D.D. (2002). Investigating the relationship between vocabulary knowledge and academic reading performance: An assessment perspective. Language Learning 52, 3: 513–536.

Ramachandran, S.D. and Rahim, H.A. (2004). Meaning recall and retention: The impact of the translation method on elementary level learners’ vocabulary learning. RELC Journal 35, 2: 161–178.

Rastle, K., Harrington, J., and Coltheart, M. (2002). 358,534 nonwords: The ARC Nonword Database. Quarterly Journal of Experimental Psychology, 55A: 1339–1362.

Rayson, P. (2008). Software demonstration: Identification of multiword expressions with Wmatrix. Presentation given at the Formulaic Language Research Network (FLaRN) conference, University of Nottingham.

Read, J. (1988). Measuring the vocabulary knowledge of second language learners. RELC Journal 19: 12–25.

Read, J. (1993). The development of a new measure of L2 vocabulary knowledge. Language Testing 10: 355–371.

Read, J. (1998). Validating a test to measure depth of vocabulary knowledge. In Kunnan, A. (ed.), Validation in Language Assessment. Mahwah, NJ: Lawrence Erlbaum. pp. 41–60.

Read, J. (2000). Assessing Vocabulary. Cambridge: Cambridge University Press.

Read, J. (2004). Research in teaching vocabulary. Annual Review of Applied Linguistics 24: 146–161.

Read, J. (2005). Applying lexical statistics to the IELTS listening test. Research Notes 20: 12–16.

Renandya, W.A., Rajan, B.R.S., and Jacobs, G.M. (1999). Extensive reading with adult learners of English as a second language. RELC Journal 30: 39–60.

Reppen, R. and Simpson, R. (2002). Corpus linguistics. In Schmitt, N. (ed.), An Introduction to Applied Linguistics. London: Arnold.

Richards, B. and Malvern, D. (2007). Validity and threats to the validity of vocabulary measurement. In Daller, H., Milton, J., and Treffers-Daller, J. (eds), Modelling and Assessing Vocabulary Knowledge. Cambridge: Cambridge University Press. pp. 79–92.

Richards, J.C. (1976). The role of vocabulary teaching. TESOL Quarterly 10, 1: 77–89.




Ringbom, H. (2007). Cross-linguistic Similarity in Foreign Language Learning. Clevedon: Multilingual Matters.

Rosenzweig, M.R. (1961). Comparisons among word-association responses in English, French, German, and Italian. American Journal of Psychology 74: 347–360.

Rosenzweig, M.R. (1964). Word associations of French workmen: Comparisons with associations of French students and American workmen and students. Journal of Verbal Learning and Verbal Behavior 3: 57–69.

Rott, S. (1999). The effect of exposure frequency on intermediate language learners’ incidental vocabulary acquisition through reading. Studies in Second Language Acquisition 21, 1: 589–619.

Rott, S., Williams, J. and Cameron, R. (2002). The effect of multiple-choice L1 glosses and input-output cycles on lexical acquisition and retention. Language Teaching Research 6, 3: 183–222.

Ruke-Dravina, V. (1971). Word associations in monolingual and multilingual individuals. Linguistics 74: 66–85.

Ryan, A. (1997). Learning the orthographic form of L2 vocabulary – A receptive and a productive process. In Schmitt, N. and McCarthy, M. (eds), Vocabulary: Description, Acquisition and Pedagogy. Cambridge: Cambridge University Press.

Saragi, T., Nation, I.S.P., and Meister, F. (1978). Vocabulary learning and reading. System 6, 2: 72–78.

Scarcella, R. and Zimmerman, C.B. (1998). Academic words and gender: ESL student performance on a test of academic lexicon. Studies in Second Language Acquisition 20: 27–49.

Schmidt, R.W. (1983). Interaction, acculturation, and the acquisition of communicative competence: A case study of an adult. In Wolfson, N. and Judd, E. (eds), Sociolinguistics and Language Acquisition. Rowley, MA: Newbury House. pp. 137–174.

Schmitt, N. (1997). Vocabulary learning strategies. In Schmitt, N. and McCarthy, M. (eds), Vocabulary: Description, Acquisition, and Pedagogy. Cambridge: Cambridge University Press.

Schmitt, N. (1998a). Tracking the incidental acquisition of second language vocabulary: A longitudinal study. Language Learning 48, 2: 281–317.

Schmitt, N. (1998b). Measuring collocational knowledge: Key issues and an experimental assessment procedure. ITL Review of Applied Linguistics 119–120: 27–47.

Schmitt, N. (1998c). Quantifying word association responses: What is native-like? System 26: 389–401.

Schmitt, N. (2000). Vocabulary in Language Teaching. Cambridge: Cambridge University Press.

Schmitt, N. (ed.). (2004). Formulaic Sequences: Acquisition, Processing, and Use. Amsterdam: John Benjamins.

Schmitt, N. (2005). Grammar: Rules or patterning? Applied Linguistics Forum 26, 2: 1–2. Available on-line at <https://www.nottingham.ac.uk/English/research/cral/doku.php?id=people:Schmitt>.

Schmitt, N. (2008). Instructed second language vocabulary learning. Language Teaching Research 12, 3: 329–363.

Schmitt, N. and Carter, R. (2004). Formulaic sequences in action: An introduction. In Schmitt, N. (ed.), Formulaic Sequences. Amsterdam: John Benjamins. pp. 1–22.

Schmitt, N., Dörnyei, Z., Adolphs, S., and Durow, V. (2004). Knowledge and acquisition of formulaic sequences: A longitudinal study. In Schmitt, N. (ed.), Formulaic Sequences: Acquisition, Processing, and Use. Amsterdam: John Benjamins. pp. 55–86.

Schmitt, N. and Dunham, B. (1999). Exploring native and non-native intuitions of word frequency. Second Language Research 15, 4: 389–411.




Schmitt, N., Grandage, S., and Adolphs, S. (2004). Are corpus-derived recurrent clusters psycholinguistically valid? In Schmitt, N. (ed.), Formulaic Sequences: Acquisition, Processing, and Use. Amsterdam: John Benjamins. pp. 127–151.

Schmitt, N., Jiang, X., and Grabe, W. (in press). The percentage of words known in a text and reading comprehension. Modern Language Journal.

Schmitt, N. and Marsden, R. (2006). Why is English Like That? Historical Answers to Hard ELT Questions. Ann Arbor: University of Michigan Press.

Schmitt, N. and McCarthy, M. (eds). (1997). Vocabulary: Description, Acquisition and Pedagogy. Cambridge: Cambridge University Press.

Schmitt, N. and Meara, P. (1997). Researching vocabulary through a word knowledge framework: Word associations and verbal suffixes. Studies in Second Language Acquisition 19: 17–36.

Schmitt, N., Schmitt, D., and Clapham, C. (2001). Developing and exploring the behaviour of two new versions of the Vocabulary Levels Test. Language Testing 18, 1: 55–88.

Schmitt, N. and Zimmerman, C.B. (2002). Derivative word forms: What do learners know? TESOL Quarterly 36, 2: 145–171.

Schonell, F.J., Meddleton, I.G., and Shaw, B.A. (1956). A Study of the Oral Vocabulary of Adults. Brisbane: University of Queensland Press.

Schoonen, R., van Gelderen, A., de Glopper, K., Hulstijn, J., Simis, A., Snellings, P., and Stevenson, M. (2003). First language and second language writing: The role of linguistic fluency, linguistic knowledge, and metacognitive knowledge. Language Learning 53, 1: 165–202.

Schoonen, R. and Verhallen, M. (2008). The assessment of deep word knowledge in young first and second language learners. Language Testing 25, 2: 211–236.

Scrivener, J. (2005). Learning Teaching: A Guidebook for English Language Teachers. Oxford: Macmillan.

Segalowitz, N. and Hulstijn, J. (2005). Automaticity in bilingualism and second language learning. In Kroll, J.F. and de Groot, A.M.B. (eds), Handbook of Bilingualism: Psycholinguistic Approaches. Oxford: Oxford University Press. pp. 371–388.

Segalowitz, N.S., Poulsen, C., and Segalowitz, S.J. (1999). RT coefficient of variation is differentially sensitive to executive control involvement in an attention switching task. Brain and Cognition 38: 255–258.

Segalowitz, N.S. and Segalowitz, S.J. (1993). Skilled performance, practice, and the differentiation of speed-up from automatization effects: Evidence from second language word recognition. Applied Psycholinguistics 14: 369–385.

Segalowitz, S.J., Segalowitz, N.S., and Wood, A.G. (1998). Assessing the development of automaticity in second language word recognition. Applied Psycholinguistics 19: 53–67.

Shapiro, B.J. (1969). The subjective estimation of relative word frequency. Journal of Verbal Learning and Verbal Behavior 8: 248–51.

Sharp, D. and Cole, M. (1972). Patterns of responding in the word associations of West African children. Child Development 43: 55–65.


Sinclair, J. (2004). Trust the Text: Lexis, Corpus, Discourse. London: Routledge.

Singleton, D. (1999). Exploring the Second Language Mental Lexicon. Cambridge: Cambridge University Press.

Singleton, D. (2000). Language and the Lexicon. London: Arnold.

Siyanova, A., Conklin, K., and Schmitt, N. (under review). Processing of idioms by native speakers and proficient L2 learners: An eye-tracking study.

Siyanova, A. and Schmitt, N. (2007). Native and nonnative use of multi-word vs. one-word verbs. International Review of Applied Linguistics 45: 119–139.

Siyanova, A. and Schmitt, N. (2008). L2 learner production and processing of collocation: A multi-study perspective. Canadian Modern Language Review 64, 3: 429–458.

Söderman, T. (1993). Word associations of foreign language learners and native speakers – different response types and their relevance to lexical development. In Hammarberg, B. (ed.), Problems, Process and Product in Language Learning. Abo: AfinLA.

Sorhus, H. (1977). To hear ourselves – Implications for teaching English as a second language. English Language Teaching Journal 31: 211–21.

Staehr, L. (2009). Vocabulary knowledge and advanced listening comprehension in English as a Foreign Language. Studies in Second Language Acquisition 31: 1–31.

Stubbs, M. (1995). Collocations and semantic profiles: On the cause of the trouble with quantitative methods. Functions of Language 2: 1–33.

Stubbs, M. (2002). Words and Phrases: Corpus Studies of Lexical Semantics. Oxford: Blackwell.

Sunderman, G. and Kroll, J.F. (2006). First language activation during second language lexical processing. Studies in Second Language Acquisition 28: 387–422.

Sutarsyah, C., Nation, P., and Kennedy, G. (1994). How useful is EAP vocabulary for ESP? A corpus based study. RELC Journal 25, 2: 34–50.

Swan, M. (1997). The influence of the mother tongue on second language vocabulary acquisition and use. In Schmitt, N. and McCarthy, M. (eds), Vocabulary: Description, Acquisition, and Pedagogy. Cambridge: Cambridge University Press.

Takala, S. (1984). Evaluation of Students’ Knowledge of English Vocabulary in the Finnish Comprehensive School. (Rep. No. 350). Jyväskylä, Finland: Institute of Educational Research.

Teliya, V., Bragina, N., Oparina, E., and Sandomirskaya, I. (1998). Phraseology as a language of culture: Its role in the representation of a collective mentality. In Cowie, A. (ed.), Phraseology: Theory, Analysis, and Applications. Oxford: Oxford University Press. pp. 55–75.

Thorndike, E.L. and Lorge, I. (1944). The Teacher’s Word Book of 30,000 Words. New York: Teachers College, Columbia University.

Tomasello, M. (2003). Constructing a Language: A Usage-based Theory of Language Acquisition. Cambridge, MA: Harvard University Press.

Tomiyama, M. (1999). The first stage of second language attrition: A case study of a Japanese returnee. In Hansen, L. (ed.), Second Language Attrition in Japanese Contexts. Oxford: Oxford University Press. pp. 59–79.

Tremblay, A., Baayen, H., Derwing, B., and Libben, G. (2008). Lexical bundles and working memory: An ERP study. Presentation given at the (FLaRN) Formulaic Language Research Network Conference. University of Nottingham, June 19–20, 2008.

Tseng, W-T., Dörnyei, Z., and Schmitt, N. (2006). A new approach to assessing strategic learning: The case of self-regulation in vocabulary acquisition. Applied Linguistics 27, 1: 78–102.




Tseng, W-T. and Schmitt, N. (2008). Towards a self-regulating model of vocabulary learning: A structural equation modeling approach. Language Learning 58, 2: 357–400.

Underwood, G., Schmitt, N., and Galpin, A. (2004). The eyes have it: An eye-movement study into the processing of formulaic sequences. In Schmitt, N. (ed.), Formulaic Sequences. Amsterdam: John Benjamins.

van Gelderen, A., Schoonen, R., de Glopper, K., Hulstijn, J., Simis, A., Snellings, P., and Stevenson, M. (2004). Linguistic knowledge, processing speed, and metacognitive knowledge in first- and second-language reading comprehension: A componential analysis. Journal of Educational Psychology 96, 1: 19–30.

van Hout, R. and Vermeer, A. (2007). Comparing measures of lexical richness. In Daller, H., Milton, J., and Treffers-Daller, J. (eds), Modelling and Assessing Vocabulary Knowledge. Cambridge: Cambridge University Press. pp. 93–115.

van Lancker, D., Canter, G.J., and Terbeek, D. (1981). Disambiguation of ditropic sentences: Acoustic and phonetic cues. Journal of Speech and Hearing Research 24: 330–335.

Walters, J.M. (2004). Teaching the use of context to infer meaning: A longitudinal survey of L1 and L2 vocabulary research. Language Teaching 37, 4: 243–52.

Walters, J. (2006). Methods of teaching inferring meaning from context. RELC Journal 37, 2: 176–190.

Wang, M. and Koda, K. (2005). Commonalities and differences in word identification skills among learners of English as a second language. Language Learning 55, 1: 71–98.

Waring, R. (1999). Tasks for assessing second language receptive and productive vocabulary. Unpublished PhD thesis, University of Wales, Swansea. Available on-line at <http://www1.harenet.ne.jp/~waring/papers/phd/title.html>.

Waring, R. and Takaki, M. (2003). At what rate do learners learn and retain new vocabulary from reading a graded reader? Reading in a Foreign Language 15: 130–163.

Watanabe, Y. (1997). Input, intake, and retention: Effects of increased processing on incidental learning of foreign vocabulary. Studies in Second Language Acquisition 19: 287–307.

Webb, S. (2005). Receptive and productive vocabulary learning: The effects of reading and writing on word knowledge. Studies in Second Language Acquisition 27: 33–52.

Webb, S. (2007a). Learning word pairs and glossed sentences: The effects of a single context on vocabulary knowledge. Language Teaching Research 11, 1: 63–81.

Webb, S. (2007b). The effects of repetition on vocabulary knowledge. Applied Linguistics 28, 1: 46–65.

Weltens, B. (1989). The Attrition of French as a Foreign Language. Dordrecht: Foris.

Weltens, B. and Grendel, M. (1993). Attrition of vocabulary knowledge. In Schreuder, R. and Weltens, B. (eds), The Bilingual Lexicon. Amsterdam: John Benjamins.

Weltens, B., Van Els, T.J.M., and Schils, E. (1989). The long-term retention of French by Dutch students. Studies in Second Language Acquisition 11, 2: 205–216.

Wesche, M. and Paribakht, T.S. (1996). Assessing L2 vocabulary knowledge: Depth versus breadth. Canadian Modern Language Review 53, 1: 13–40.

Wesche, M. and Paribakht, T.S. (2009). Lexical Inferencing in a First and Second Language: Cross-linguistic Dimensions. Clevedon: Multilingual Matters.

West, M. (1953). A General Service List of English Words. London: Longman, Green and Co.




Wilks, C. and Meara, P. (2002). Untangling word webs: Graph theory and the notion of density in second language word association networks. Second Language Research 18, 4: 303–324.

Wilkins, D.A. (1972). Linguistics in Language Teaching. London: Arnold.

Winne, P.H. and Perry, N.E. (2000). Measuring self-regulated learning. In Boekaerts, M., Pintrich, P.R., and Zeidner, M. (eds), Handbook of Self-Regulation. San Diego, CA: Academic Press.

Wong Fillmore, L. (1976). The second time around. Doctoral dissertation, Stanford University.

Wood, D. (2002). Formulaic language in acquisition and production: Implications for teaching. TESL Canada Journal 20, 1: 1–15.

Woodrow, H. and Lowell, F. (1916). Children’s association frequency tables. Psychology Monographs 22, 5, No. 97.

Wray, A. (2000). Formulaic sequences in second language teaching: Principle and practice. Applied Linguistics 21, 4: 463–489.

Wray, A. (2002). Formulaic Language and the Lexicon. Cambridge: Cambridge University Press.

Wray, A. (2008). Formulaic Language: Pushing the Boundaries. Oxford: Oxford University Press.

Wray, A. and Bloomer, A. (2006). Projects in Linguistics: A Practical Guide to Researching Language. London: Hodder Arnold.

Wray, A. and Perkins, M.R. (2000). The functions of formulaic language: An integrated model. Language and Communication 20: 1–28.

Xiao, R. (2008). Well-known and influential corpora. In Lüdeling, A. and Kytö, M. (eds), Corpus Linguistics: An International Handbook. Berlin: Mouton de Gruyter. pp. 383–457.

Xue, G. and Nation, I.S.P. (1984). A university word list. Language Learning and Communication 3, 2: 215–229.

Yorio, C.A. (1980). Conventionalized language forms and the development of communicative competence. TESOL Quarterly 14, 4: 433–442.

Yoshii, M. (2006). L1 and L2 glosses: Their effects on incidental vocabulary learning. Language Learning and Technology 10, 3: 85–101.

Zahar, R., Cobb, T., and Spada, N. (2001). Acquiring vocabulary through reading: Effects of frequency and contextual richness. Canadian Modern Language Review 57, 3: 541–572.

Zareva, A. (2007). Structure of the second language mental lexicon: How does it compare to native speakers’ lexical organization? Second Language Research 23, 2: 123–153.

Zechmeister, E.B., Chronis, A.M., Cull, W.L., D’Anna, C.A. and Healy, N.A. (1995). Growth of a functionally important lexicon. Journal of Reading Behavior 27, 2: 201–212.

Zechmeister, E.B., D’Anna, C.A., Hall, J.W., Paus, C.H., and Smith, J.A. (1993). Metacognitive and other knowledge about the mental lexicon: Do we know how many words we know? Applied Linguistics 14, 2: 188–206.

Zeidner, M., Boekaerts, M., and Pintrich, P.R. (2000). Self-regulation: Directions and challenges for future research. In Boekaerts, M., Pintrich, P.R., and Zeidner, M. (eds), Handbook of Self-Regulation. San Diego, CA: Academic Press.

Zimmerman, C.B. (1997). Historical trends in second language vocabulary instruction. In Coady, J. and Huckin, T. (eds), Second Language Vocabulary Acquisition. Cambridge: Cambridge University Press.




Index

absolute results, 167
academic vocabulary, 78–9
Academic Word List (AWL), 79
AKC Nonword Database, 164, 180
attrition, 23
  measuring, 256–9
automaticity, 17, 106–7
  measurement of, 242

British National Corpus, 13, 309–12

characteristics of lexical items, 48–9
coefficient of variance, 245
cognates, 73–4
computer simulations of vocabulary, 97–105
corpora, 307–35
  American National Corpus, 315–36
  British Academic Spoken Corpus, 321
  British National Corpus, 309–12
  Brown Corpus, 316–18
  ‘Brown family’ corpora, 319–20
  CANCODE, 322–3
  CHILDES Corpus, 325
  COBUILD Corpus (Bank of English), 318–19
  COLT Corpus, 325
  Corpus of Contemporary American English, 312–14
  ICAME Corpus Compilation, 331–2
  International Corpus of English, 323
  International Corpus of Learner English, 325
  Lancaster–Oslo/Bergen Corpus, 318
  London–Lund Corpus, 320
  MICASE / MICUSP, 321
  Non-English corpora, 327–31
  SUBTLEXUS, 319–20
  Time Corpus, 314–15
  VOICE Corpus, 322
corpus analysis, 12–15
  and frequency, 13–14
  and formulaic language, 123–32
  concordancers / tools, 335–43
content vs. function words, 54–5
counting units of vocabulary size, 188–93
cross-association, 56
cut-points, 187

definitions in vocabulary items, 174–7
delayed posttests, 155–8
depth of knowledge, 15–17
  measurement of, 216–42

effects size, 166–7
engagement, 26–8
equivalence of tests, 177
exposures, number necessary for learning, 30–1
extensive reading, 32

form, 24–5
form-meaning link, 49–52
formulaic language
  acquisition of, 136–41
  amount in language, 9–10, 40, 117–18
  and fluency, 11–12
  identification, 120–32
  nonnative use of, 142
  processing of, 134–6
  project, 268
  psycholinguistic reality of, 141–2
  types of, 10–11, 119–20
  with open slots, 132–4
frequency, 13, 63–71

General Service List, 13, 75–6
general vocabulary, 75–6
glosses, 34
Graph Theory, 254
guessing from context
  see lexical inferencing

Involvement Load Hypothesis, 27
importance of vocabulary, 3–5
incidental vocabulary learning, 29–31
  project, 274


incremental nature of vocabulary learning, 19–22
internet, using as a corpus, 333–5
interviews, in validating measurement, 182–3
intrinsic difficulty of lexical items, 55–7

L1 influence, 25–6, 71–5
learning strategies, 89–97
Lexical Frequency Profile, 104
lexical inferencing, 32–4
Lextutor, 39, 71, 199–200, 209, 341–2
lists of vocabulary
  description of, 345–7
  project, 265

Mason and Krashen critique, 169–72
meaning, 52–5
MRC Psycholinguistic Database, 163–4
multiple measures of vocabulary, 152–5
mutual information score, 130–1

network connections, 58–62

organization, measuring lexical, 247–56

participants, 150–2
pre-existing knowledge, controlling for, 179–80
productive vs. receptive mastery
  see receptive vs. productive mastery
psycholinguistic approaches to vocabulary research, 105–16

qualitative research, 149–50

receptive vs. productive mastery, 21, 36, 79–89
  project, 275
reliability, 183–7
repetition of vocabulary in texts
  project, 272

sampling from dictionaries, 193–6
sample size, 164–5
size of vocabulary
  measurement of, 187–216
  of native English speakers, 6
  of non native English speakers, 9, 19
  project, 260
  requirements for using English, 7
spoken discourse and vocabulary, 38
synonymy, 52

target items, selection of, 158–64
task types
  project, 271
teaching pedagogy
  project, 266
technical vocabulary, 77–8
tests and measures of vocabulary
  BNC 20,000 Profile, 208
  CATSS Test, 84–7, 202, 216
  checklist tests, 199–202
  Coh-Metrix, 215–16
  collocation measures, 229–36
  Lexical Frequency Profile, 205–7
  Peabody Picture Vocabulary Test, 196
  P_Lex, 208–10
  Productive Vocabulary Levels Test, 203–5
  project, 263
  Schmitt and Zimmerman scale, 221–3
  Test of English Derivatives, 228–9
  Type-token measures, 212–15
  Vocabulary Levels Test, 197–8, 279–92
  Vocabulary Knowledge Scale, 218–21
  Vocabulary Size Test, 198–9, 293–306
  vocd (D_Tools), 213–14
  V_Links, 254–6
  V_Size, 210–12
  V_Quint, 255
  Word Associates Format, 226–8
theories of vocabulary acquisition, 36
t-score, 126

validity, 181–3
variable expressions, 132–5
vocabulary and other language proficiencies, 4–5
vocabulary control movement, 13

websites, 347–51
word associations, 18, 59–62, 248–56
  word association norms, 252
  project, 262
word families
  number of members, 8
word knowledge framework, 16–17, 79–80, 211

Zipf’s Law, 64, 211
