The Corpus in the Classroom
Colin Graham
THT Seminar – Manila 2008
What is a corpus?
• A corpus is, broadly, a body of knowledge or information of some kind. In linguistics, it usually means a collection of texts chosen to represent some aspect of language, for example fictional writing, radio broadcasts or newspaper editorials. By carrying out research on a corpus (or several corpora), the researcher hopes to make generalizations about that aspect of language as a whole.
Why is a corpus useful?
• Many of the ideas we have about our own language are based on linguistic intuition, and many of them are not correct.
* “Language users cannot accurately report language usage, even their own” [Sinclair, J. (1987) Introduction, in the Collins Cobuild English Language Dictionary, London: Collins]
* “There are many facts about language that cannot be discovered by just thinking about it, or even reading and listening very intently” [Sinclair, J. (1995) Introduction, in the Collins Cobuild English Dictionary, London: HarperCollins]
* “Using a language is a skill that most people are not conscious of; they cannot examine it in detail, but simply use it to communicate” [Sinclair, J. (1995) Introduction, in the Collins Cobuild English Dictionary, London: HarperCollins]
• People used to think that the Earth was the centre of the universe, with everything else revolving around it. Galileo’s discovery of moons orbiting Jupiter, made with better technology (a telescope), forced astronomers and other scientists to think again about their theories and assumptions. We can think of a corpus as a telescope that gives a more clearly focused view of the language we are investigating.
What do my intuitions tell me?
• The meanings of words in isolation
• Whether or not a sentence in isolation is well formed
But…!
How can I test my intuitions?
• How do you say ‘come’ in Tagalog?
• Bad Company! [You shall know a word by the company it keeps – John Firth]
Come mai mangia come un coniglio ogni giorno?
Como vindo come como um coelho cada dia?
How come she eats like a rabbit every day?
• Now, how do you say ‘come’ in Tagalog!
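Firth's idea can be made concrete by counting the words that occur next to a target word. The Python sketch below is a minimal illustration only, assuming a crude lower-case tokenizer and a few invented sentences; it is not how any particular concordancer works. With a window of one word on either side, how turns up among the most frequent neighbours of come, pointing to the fixed phrase how come.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Crude tokenizer: lower-case, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def collocates(tokens: list[str], node: str, window: int = 2) -> Counter:
    """Count the words within `window` tokens on either side of each `node`."""
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node word itself at position i
                counts[tokens[j]] += 1
    return counts


# Invented sample sentences, for illustration only.
sample = "How come she eats like a rabbit? Come here. Come on, how come you are late?"
print(collocates(tokenize(sample), "come", window=1).most_common(3))
```

On a real corpus the same count would be run over thousands of lines, and a wider window would pick up looser associations.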
The corpus as a bridge
• Long-term exposure
• Many thousands of examples
• Wide variety of contexts and usage
• Self-generated rules, patterns and meanings
• Inductive
compared with
• Short-term exposure
• Perhaps only hundreds of examples
• Limited variety of context and usage
• Teacher-provided grammar rules and dictionary definitions
• Deductive
Corpora can make up for a lack of exposure to sufficiently varied examples by providing a wide variety of examples in a concentrated form.
They often offer more motivating, interesting or exciting approaches to teaching and learning foreign languages.
Starting a language investigation
• What do you see as being the core or primary meaning(s) of the words see and keep?
• How are these words used in English?
• These words have a high frequency of use, mostly because they so often appear as part of a phrase, where the phrase rather than their core or primary meaning determines what they mean:
I’ll keep it in mind.
I see.
You’ll have to keep an eye on her!...
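Recurrent phrases like these can be surfaced by counting the word sequences (often called clusters or n-grams) that begin with a given word. The Python sketch below assumes a simple tokenizer and uses invented sentences modelled on the examples above; a real investigation would run it over a much larger corpus.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Crude tokenizer: lower-case, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def clusters(tokens: list[str], node: str, length: int = 3) -> Counter:
    """Count sequences of `length` tokens that start with `node`."""
    counts: Counter = Counter()
    for i in range(len(tokens) - length + 1):
        if tokens[i] == node:
            counts[" ".join(tokens[i:i + length])] += 1
    return counts


# Invented sample text, for illustration only.
sample = ("I'll keep it in mind. You'll have to keep an eye on her! "
          "Keep an eye on the time. Please keep it in the fridge.")
print(clusters(tokenize(sample), "keep").most_common())
```

The cluster length is a design choice: with three words, keep it in mind and keep it in the fridge fall together, while raising length to 4 separates them.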
The investigation continues
• What do you see as the main facts about the meaning and use of the word listen?
• Corpus research makes us consider pragmatics, in that listen is used, amongst other things, to gain the floor in conversations or discussions
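One rough way to look for this floor-taking use is to check how often listen opens a sentence. The Python sketch below is only a heuristic under stated assumptions: it splits text on . ! and ? and treats the first word of each piece as sentence-initial, so every hit still needs to be read in context before drawing conclusions about pragmatics.

```python
import re
from collections import Counter


def position_counts(text: str, word: str = "listen") -> Counter:
    """Count occurrences of `word` that open a sentence versus all other positions.

    Sentences are approximated by splitting on runs of . ! ? (a crude heuristic).
    """
    counts: Counter = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z']+", sentence.lower())
        for i, w in enumerate(words):
            if w == word:
                counts["initial" if i == 0 else "other"] += 1
    return counts


# Invented sample text, for illustration only.
sample = "Listen, we need to talk. I like to listen to music. Listen! Did you listen?"
print(position_counts(sample))
```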
Another step in the process…
• When do you use the word which and when that in a sentence?
• British and American usage differs, and usage questions like this are more a matter of personal idiolect (the variety of a language used by an individual speaker)
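Corpora of different sizes cannot be compared on raw counts, so a common first step is to normalise frequencies, for example to occurrences per million words. The Python sketch below assumes two plain-text samples; the strings are invented placeholders, not real British or American data. A raw count of that also includes its demonstrative and conjunction uses, so the concordance lines would still need checking by hand.

```python
import re


def tokenize(text: str) -> list[str]:
    # Crude tokenizer: lower-case, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def per_million(tokens: list[str], word: str) -> float:
    """Occurrences of `word` per million tokens (0.0 for an empty corpus)."""
    if not tokens:
        return 0.0
    return tokens.count(word) / len(tokens) * 1_000_000


def compare(corpus_a: str, corpus_b: str, words: list[str]) -> dict[str, tuple[float, float]]:
    """Map each word to its normalised frequency in corpus A and in corpus B."""
    tokens_a, tokens_b = tokenize(corpus_a), tokenize(corpus_b)
    return {w: (per_million(tokens_a, w), per_million(tokens_b, w)) for w in words}


# Invented placeholder texts; substitute real British and American samples.
british_sample = "The book which I bought is on the table."
american_sample = "The book that I bought is on the table. It is the one that I like."
for word, (a, b) in compare(british_sample, american_sample, ["which", "that"]).items():
    print(f"{word}: {a:,.0f} vs {b:,.0f} per million words")
```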
Making a corpus – considerations
• What do you want to know?
• What information do you need access to in your corpus?
• How much text do you need to have a representative sample on which to make confident generalizations?
• Do I need to work with the data or can I present it as-is to students?
• Are there copyright restrictions?
• Do I need sophisticated software to do what I want?
Making a corpus – Can I build it?
Corpus design is an art in itself. However, you can build useful corpora in the classroom. Basically you need a collection of writings or transcriptions as simple text files and some concordancing software as a minimum tool for analysis.
You also need to consider copyright and the type of language investigation you want to carry out.
[Wichmann, Fligelstone, McEnery and Knowles (1997) Teaching and Language Corpora, Longman] is a good starting point for most of the further questions you may have.
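As a rough illustration of that minimum toolkit, the Python sketch below reads every .txt file in a folder and produces keyword-in-context (KWIC) lines for a search word. It is a sketch under assumptions, not a replacement for proper concordancing software: the folder name my_corpus is hypothetical, the files are assumed to be UTF-8, and only exact word forms are matched (listen, not listened).

```python
import re
from pathlib import Path


def load_corpus(folder: str) -> str:
    """Read every .txt file in `folder` (assumed UTF-8) into one string."""
    paths = sorted(Path(folder).glob("*.txt"))
    return "\n".join(p.read_text(encoding="utf-8") for p in paths)


def kwic(text: str, node: str, width: int = 30) -> list[str]:
    """Return one keyword-in-context line per case-insensitive match of `node`."""
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in one column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


if __name__ == "__main__":
    # "my_corpus" is a hypothetical folder of plain-text files.
    for line in kwic(load_corpus("my_corpus"), "listen"):
        print(line)
```

Lining the keywords up in a single column is what makes KWIC printouts easy for students to scan for patterns to the left and right of the search word.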
Using a corpus for materials
• Use the online corpora for grammar investigations (links on final slide)
• Extract examples of real use from online corpora and build them into Tim Johns’ programs (link on final slide)
• Get the students to do pre-designed investigations
• Get students to write sentences using selected vocabulary and then check online to see if there are similar examples (this can highlight usage problems)
• Use printouts from concordance packages to give students examples of areas where they make consistent or 'fixable' errors (prepositions, for example)
• Use a point from a grammar reference and check it out online or using your classroom corpus.
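Some of these activities, such as gap-filling with real examples or targeting fixable preposition errors, can be prepared semi-automatically. The Python sketch below, loosely in the spirit of data-driven learning exercises, blanks out the first target word in each example sentence and keeps an answer key. It is not Tim Johns’ CONTEXTS program; the function name, target list and sample sentences are invented for illustration.

```python
import re


def gap_exercise(sentences: list[str], targets: list[str]) -> tuple[list[str], list[str]]:
    """Blank out the first target word in each sentence.

    Returns the gapped sentences and a matching answer key; sentences
    containing none of the targets are skipped.
    """
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, targets)) + r")\b", re.IGNORECASE)
    gapped, answers = [], []
    for sentence in sentences:
        m = pattern.search(sentence)
        if m is None:
            continue
        gapped.append(sentence[:m.start()] + "_____" + sentence[m.end():])
        answers.append(m.group(0).lower())
    return gapped, answers


# Invented examples; in class these would come from corpus concordance lines.
examples = [
    "She is interested in modern music.",
    "We arrived at the station late.",
    "The keys are on the kitchen table.",
]
exercise, key = gap_exercise(examples, ["in", "on", "at"])
for number, line in enumerate(exercise, start=1):
    print(number, line)
print("Key:", ", ".join(key))
```

Only the first match in each sentence is gapped, so a sentence containing two prepositions still yields a single, unambiguous item.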
Useful Resources
• HarperCollins: http://www.collins.co.uk/Corpus/CorpusSearch.aspx [Collins - English only; search written, spoken, British, American separately; KWIC format; max 40 examples per search; can specify word class, and a few other features; collocation lists also available; 56 Million Words]
• Oxford-BNC (British National Corpus): http://sara.natcorp.ox.ac.uk/lookup.html [British English only; from c 1994; sentence-length examples only, not KWIC format; max 50 examples per search; 100 Million Words]
• Brigham Young University-BNC: http://corpus.byu.edu/bnc/x.asp [a better place to access BNC; KWIC format concordances, etc]
• Corpus of Contemporary American English: http://www.americancorpus.org/
• David Lee’s Corpora Bookmarks: http://devoted.to/corpora
• Tim Johns’ website has many exercises and useful links, including his “CONTEXTS” program: http://www.eisu2.bham.ac.uk/johnstf/index.html