The Armchair and the Machine
Corpus-Assisted Discourse Studies
Alan Partington Lorient 14/09/07
Corpus-Assisted Discourse Studies (CADS)
• What does CADS do?
• Examples (politics & media)
• Types of research questions / methodologies
• Teaching material?
“two types of linguist”
the Armchair linguist …
“sits in a deep soft comfortable armchair, with his eyes closed and his hands clasped behind his head.
Once in a while he opens his eyes, sits up abruptly shouting, “Wow, what a neat fact!”, grabs his pencil, and writes something down.
Then he paces around for a few hours in the excitement of having come still closer to knowing what language is really like.”
Introspection
“two types of linguist”
the Corpus linguist …
“has all the primary facts that he needs, in the form of approximately one zillion running words, and he sees his job as that of deriving secondary facts from his primary facts.
At the moment he is busy determining the relative frequencies of the eleven parts of speech as the first word of a sentence”
Data observation
“two types of linguist”
however
“These two don’t speak to each other very often,
but when they do the corpus linguist says to the armchair linguist, ‘Why should I think that what you tell me is true?’,
and the armchair linguist says to the corpus linguist, ‘Why should I think that what you tell me is interesting?’”
(Fillmore)
Four stages of science
• respect for authority (generally Scripture and Aristotle)
• rationalist introspection (Descartes: cogito ergo sum - I introspect therefore I am)
• “observationism” and distrust of theory (Bacon: ‘The intellect, left to itself, ought always to be suspected’)
• the mutually reinforcing hermeneutic interaction of theory and observation
Psycho- & Socio-
…corpus linguists have so far contributed little to answering classic questions of cognitive and social theory; they have hardly considered the relevance of corpus evidence to questions about the mental lexicon and the construction of the social world (though one of Halliday’s central topics)
(Stubbs 2006: 15)
Data observation
Intuition & contemplation
Speculation
Stubbs 2006:
…could be related …may be reducible… may also be internally related … seems to show … might also provide … show how we could do real ‘ordinary language philosophy’ …
Interdependence: technology & theory of machine and mind
New instruments
lead to
New ways of observing
lead to
New ways of thinking
New instruments = grinding of lenses (Galileo, Spinoza)
lead to
New ways of observing = astronomy
lead to
New ways of thinking = model of universe
New instruments = radio transmitter, receiver
lead to
New ways of observing = radio-telescopy
lead to
New ways of thinking = theory of creation
New instruments = corpora
lead to
New ways of observing = inductive data-driven
lead to
New ways of thinking = lexical grammar
What does CADS do?
Investigate (and compare) discourse types (DTs):
‘Non-obvious’ meanings
to “not get caught in using corpora just to tell you more about what you know already”
(Sinclair 2004: 183)
It combines
Corpus Linguistics
Data crunching: Statistical OVERVIEW (very quickly)
“Quantitative” approach (“general” language dictionaries, grammars)
Discourse analysis
DETAILED analysis, even single texts
“Qualitative” approach
“Traditional” Corpus Linguistics
vs
CADS
Traditional Corpus Linguistics:
• Very large ‘general’ (heterogeneric) corpora: BNC, BoE
CADS:
• Compile your own ‘specialized’ corpus/corpora
• Comparison: Particular features of a discourse type, DT(a)?
Compare DT(a) – DT(b) – DT(n)
Compare DT(a) – BNC / BoE
Traditional CL:
Corpus: “Black box” – Keep out!
CADS: Make friends with our corpus
Detailed knowledge of DT:
• Frequency Information > Concordancing
• Reading / watching / listening to corpus-held DT tokens
• Intuitions
• “External” data (esp. in political and media discourse): interviews with protagonists; official documents
Beginnings
Hardt-Mautner (1995); Stubbs (1996; 2001); Teubert; Mahlberg
ITALY:
Newspool: Partington, Morley & Haarman (eds) 2004
CorDis: Morley & Bayley (eds) forthcoming
Intune
FRANCE
“I’ve been doing CADS for years and never knew it”
(Geoffrey Williams, Siena 2006)
What’s been done?
Berlusconi’s election speeches (Garzone & Santulli 2004)
Word lists (WordSmith):
Italia; stato; libertà
Concordanced
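A word list of this kind is essentially a frequency count over the tokens of a corpus. WordSmith's WordList tool is a desktop program, so the Python below is only a minimal sketch of the idea, not its implementation; the function name word_list, the stoplist and the toy Italian sentence are hypothetical. A raw list ranks all word forms, so identifying libertà as the third most frequent noun additionally requires part-of-speech information or manual filtering.

```python
import re
from collections import Counter


def word_list(text: str, stopwords: frozenset[str] = frozenset()) -> list[tuple[str, int]]:
    """Return (word, frequency) pairs, most frequent first, ties broken alphabetically."""
    # \w matches accented letters such as "à" because Python str patterns are Unicode-aware
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy text, not taken from the election-speech corpus
speech = "Lo stato amico. La libertà e lo stato. Italia, libertà, libertà!"
print(word_list(speech, frozenset({"lo", "la", "e"}))[:3])
# → [('libertà', 3), ('stato', 2), ('amico', 1)]
```

The top items of such a list are then the natural candidates to concordance.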
What’s been done?
Lo stato when it is run by the Left:
autoritario, burocratico, invasivo, moloch, padrone, stato-partito (authoritarian, bureaucratic, invasive, moloch, bossy, a party-state)
What’s been done?
Lo stato when treated to the Forza Italia cure becomes:
amico, civile, di diritto, liberale, moderno (friend, civilised, lawful, liberal, modern)
What’s been done?
Libertà is the third most frequent noun;
but it is rarely attached to an individual in the co-text. Whose liberty?
Research question type 1
How does P achieve G with language?
What does this tell us about P?
Comparative: how do P1 and P2 differ?
September 11th
C2001
• Sept 11-18 2001
• 150,000 words
• Times, Independent, Telegraph, Guardian
C2002
• Sept 11-18 2002
• 150,000 words
• Times, Independent, Telegraph, Guardian
WordSmith Keywords
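A keywords comparison ranks words by how unexpectedly frequent they are in one corpus relative to another. One widely used keyness statistic is Dunning's log-likelihood (G2); the sketch below computes it, but whether these slides used that measure, and with which WordSmith settings, is not stated here. The paired figures on the following slides (e.g. world 468 - 136) are read as raw frequencies in C2001 and C2002 respectively, which is an assumption; the names log_likelihood and keywords are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for a word with frequency a in a corpus of c tokens and b in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # 0 * ln(0) is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keywords(study: Counter, reference: Counter, threshold: float = 6.63) -> list[tuple[str, float]]:
    """Words relatively more frequent in `study` than in `reference`, ranked by G2.

    6.63 is the chi-square critical value for p < 0.01 with one degree of freedom.
    """
    c, d = sum(study.values()), sum(reference.values())
    results = []
    for word, a in study.items():
        b = reference.get(word, 0)
        if a / c > b / d:  # keep only positive keywords (overused in the study corpus)
            g2 = log_likelihood(a, b, c, d)
            if g2 >= threshold:
                results.append((word, g2))
    return sorted(results, key=lambda item: -item[1])


# Assumes "world (468 - 136)" means 468 hits in C2001 and 136 in C2002
print(round(log_likelihood(468, 136, 150_000, 150_000), 1))  # → 193.0
```

At roughly 193, world sits far above the p < 0.01 threshold; in practice the corpora would be tokenised into Counters and passed to keywords to obtain the whole ranked list that Step 5 then sorts into sets.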
September 11th
world (468 - 136):
• an attack on the whole civilised world
• convinced the world is its enemy
• the world will never be the same
Global dimension: an attack on the international community, not just the USA
September 11th
war (351 - 60)
• a totally new kind of war, acts of war, the first war of the 21st century, (or simply) this war
Reaction must be: declare war on terrorism, launch an international war
September 11th
enemy (106 - 20)
• ghostlike global enemy, shadowy enemy, not a clearly defined enemy, absence of a tangible enemy
Collocates: semantic preference for the unknown
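Collocates are the words that occur within a fixed span of the node word. The minimal sketch below counts them in a window either side of each hit; the function name, the default span of 4 and the toy text (stitched together from the phrases quoted on this slide, not drawn from the corpus) are assumptions. Raw counts only surface candidates, and grammatical words such as "a" tend to top them, which is why studies usually rank collocates with an association measure such as MI or log-likelihood. Recognising a semantic preference for the unknown remains the analyst's interpretive step.

```python
import re
from collections import Counter


def collocates(text: str, node: str, span: int = 4) -> Counter:
    """Count the words found within `span` tokens to the left and right of each `node`."""
    tokens = re.findall(r"\w+", text.lower())
    counts: Counter = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts


# Toy text assembled from phrases quoted on the slide
sample = "a shadowy enemy and a ghostlike global enemy; not a clearly defined enemy"
print(collocates(sample, "enemy", span=2).most_common(1))
# → [('a', 3)]
```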
September 11th
in- and un- words:
inconceivability:
• what was once thought inconceivable
• an unimaginable tragedy
• the unthinkable has happened
inexpressibility:
• unspeakable horror of today’s inhuman terrorist attacks, unspeakable sadness
• untold hundreds ... of dead and injured
September 11th
• incalculable, unfathomable
• incredible, incredulity
• unbearable, intolerable
• “…surpassing the collective ability to understand and feel” (Blair)
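Sets of prefixed words like these can be gathered by a pattern search over the corpus. The regular expression below is a hypothetical sketch: it looks only for in- and un- (not im- or other negative prefixes) and over-generates (e.g. inside, include), so the candidate list still has to be weeded by eye before grouping items into sets such as inconceivability or inexpressibility.

```python
import re
from collections import Counter

# in- or un- followed by at least four more letters; this over-generates
# (e.g. "inside", "include"), so the output is a candidate list for manual checking
PREFIXED = re.compile(r"\b(?:in|un)[a-z]{4,}\b")


def prefixed_candidates(text: str) -> Counter:
    """Count candidate in-/un- words in `text` (case-insensitive)."""
    return Counter(PREFIXED.findall(text.lower()))


sample = ("The unthinkable has happened: an unimaginable tragedy, "
          "unspeakable horror, inside the city.")
print(sorted(prefixed_candidates(sample)))
# → ['inside', 'unimaginable', 'unspeakable', 'unthinkable']
```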
TYPICAL CADS METHODOLOGY
• Step 1: Design, unearth or stumble upon a research question
• Step 2: Choose, edit or compile an appropriate corpus
• Step 3: Choose, edit or compile an appropriate reference corpus / corpora
TYPICAL CADS METHODOLOGY
• Step 4: Run a Keywords comparison of the corpora
• Step 5: Determine the existence of sets of key items (by eye and brain)
• Step 6: Concordance interesting key items (varying quantities of co-text: sentence, ‘chunk’)
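Step 6 can be sketched as a key-word-in-context (KWIC) concordancer offering two amounts of co-text: a fixed window of words either side of the node, or the whole sentence containing it. This is a minimal illustration with hypothetical function names and deliberately naive tokenisation and sentence splitting; larger ‘chunk’ co-text is not covered.

```python
import re


def concordance(text: str, node: str, width: int = 5) -> list[str]:
    """KWIC lines with `width` tokens of co-text either side of each hit."""
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        # Compare without trailing/leading punctuation so "war." still matches "war"
        if token.lower().strip(".,;:!?\"'()") == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>40} [{token}] {right}")
    return lines


def sentence_concordance(text: str, node: str) -> list[str]:
    """Sentence co-text: every sentence containing `node` (naive split after . ! ?)."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    pattern = re.compile(rf"\b{re.escape(node)}\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]


text = "This is a new kind of war. The war will be long! Peace."
for line in concordance(text, "war", width=2):
    print(line)
print(sentence_concordance(text, "war"))
```

The right-aligned left context keeps every node word in the same column, the conventional KWIC layout that makes repeated patterns easy to spot by eye.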