James Wilson University of Leeds [email protected].
ReadingCorp: a corpus-based approach to teaching Russian for Research
James Wilson, University of Leeds
Part 1: “The Problem” (How do we teach ab-initio students to read authentic Russian texts in a year?)
Part 2: “A potential corpus-based solution”
The use of corpora and corpus tools to train ab-initio students to read authentic academic texts
ReadingCorp project: motivated by the demand for specialist PG language training in Russian and by the findings of previous research (Russian for Research 2008)
Structure of presentation
6-month project funded by the Centre for East European Language Based Area Studies (CEELBAS) and carried out at the University of Sheffield in 2008
The project aimed to:
◦ build up a profile of what PG language training was offered at CEELBAS institutions and to identify the methods of and problems in teaching languages for research;
◦ identify the demand for language training for research purposes at member departments and to establish what such language training should include;
◦ look at new modes of delivery such as distance- and computer-aided learning and the possibility of sharing of resources.
Russian for Research project
Departments of Russian and Slavonic Studies are attracting more PG students who do not know Russian and whose research is therefore restricted (the same situation is true of other languages)
Students are unable to read primary sources, use archives and work with some online packages without Russian
“You simply can’t do Russian-related economic research without Russian”; “Without language skills research is much impaired”
Background information
There is a “massive” demand for PG language training across CEELBAS institutions
Potentially good researchers are being lost due to the lack of adequate PG language training
Conventional PG-focused intensive courses are effective but impractical at most institutions; they are not financially sustainable at any institution in the long term
Other methods (“piggy-backing”, non-intensive reading modules, following UG programmes) do not work
It is not possible to offer specialist tuition to the individual student or to cover all research areas
Texts are out-dated and/or more suited to some disciplines than others; their content is determined subjectively by linguists
A cost-effective way of delivering shared PG language programmes is necessary
Conclusions
Corpora are well suited to LSP learning and teaching for several reasons:
◦ they can inform us of key items of vocabulary and grammar points that require instruction in specific domains;
◦ frequency data shape materials and syllabus design;
◦ breadth of topics: a corpus can be created on any topic, no matter how specialist, for which there is enough available material;
◦ needs of the individual: a corpus can be created from articles directly relevant to an individual student’s research topic;
◦ there is no printing/publication lag: corpora can be created on current events, yesterday’s news stories, etc.;
◦ they can be built within hours.
A corpus-based solution???
Corpora can be used directly or indirectly
Corpora can be used in combination with traditional teaching practices (blended learning)
Corpora have been used successfully for language for research projects in the past: German for Chemists (Butler) and on the Warwick course of Italian Language for PG students of Renaissance Studies
A corpus-based solution??? (2)
ReadingCorp
2-year project funded by the AHRC (Collaborative Language Skills Training project)
Run at the Department of Russian and Slavonic Studies (Sheffield), GRASS and CTS (Leeds)
Combines knowledge and practice of PG language teaching methods (Sheffield / Leeds) with technological expertise in creating corpus tools for language learning purposes (Leeds)
Project description
To explore possibilities for using corpora to achieve reading competence in Russian
To create tools, reference materials (keyword lists, annotated readers, a grammar for researchers) and exercises to support the acquisition of vocabulary from specific and varied domains
To actively engage students in “vocabulary identification” exercises
Aims
It may seem “ridiculous” to suggest that a complete beginner with no formal training in linguistics or experience in learning a foreign language can learn Russian in a year
We focus solely on reading skills
Our aim is for students to read authentic texts with the help of dictionaries and our tools and materials; we do not expect them to pick up a text and read it as someone with years of training would
Why within a year?
Putting our goals into perspective
Corpus
◦ The Russian Academic Corpus (RAC)
Technology (additions to the IntelliText Interface)
◦ Keyword list generator (single- and multi-words; POS-specific)
◦ Grammar frequency
◦ Advanced options for navigating texts
◦ Vocabulary highlights (general academic, discipline-specific keywords)
◦ Automatic grammar classification
Pedagogy
◦ Readers from 13 academic disciplines
◦ “Cleaned” keyword lists from 13 academic disciplines
◦ Transferable teaching materials
◦ A PG-focused grammar
Corpora, tools and materials
Contains approximately 5 million words
Used for compiling frequency lists and in teaching
Made up of 13 sub-corpora (art, criminology, culture, ecology, economics, geography, history, international relations, linguistics, medicine, politics, religion, sociology)
The sub-corpora are roughly equal in size and each contains 50 texts
The “main” corpus is freely available via the IntelliText Interface
Individual sub-corpora are available on demand
The Russian Academic Corpus (RAC)
“General academic” and “discipline-specific” keywords were extracted
Single words (discipline-specific) and multi-words (general academic and discipline-specific)
“cleaned”: anomalies removed; lemmas changed to original form (то не менее > тем не менее, по отношение к > по отношению к)
100 keywords for each subject area
Translations (all lists) and collocations (single words)
Keyword lists
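The slides do not say which statistic the keyword list generator uses. As a rough illustration of how "discipline-specific" keywords can be pulled out of a sub-corpus, the sketch below assumes a log-likelihood (G2) comparison against a reference corpus, which is a common choice in corpus linguistics but is an assumption here. The function names `log_likelihood` and `keywords` are hypothetical, and the input is assumed to be two non-empty lists of already lemmatised tokens.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 score for a word seen `a` times in a target corpus of `c` tokens
    and `b` times in a reference corpus of `d` tokens."""
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_target)
    if b > 0:
        g2 += b * math.log(b / expected_reference)
    return 2 * g2


def keywords(target_tokens, reference_tokens, top_n=100):
    """Return up to `top_n` (word, score) pairs, highest score first.

    Only words that are relatively more frequent in the target corpus
    than in the reference corpus are kept (assumed design choice)."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c, d = sum(target.values()), sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy data only; real lists would come from a sub-corpus and a reference corpus.
ecology = "вода загрязнение вода отходы вода и в".split()
reference = "и в на и в рынок спрос и в на".split()
top = keywords(ecology, reference, top_n=3)
```

A list produced this way would still need the manual "cleaning" described above (removing anomalies, restoring multi-word phrases to their original form).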
Phrase Translation
вместе с тем moreover; that said
тем не менее nevertheless
в зависимости от depending on
состоит в том is
заключается в том is
в это время at the (this / that) time
по отношению к with regard to
список используемой литературы bibliography
может привести к may lead to
один из важных an important
включает в себя includes
Academic phrases (three-word keywords)
Keyword Translation
вода water
загрязнение pollution
отходы waste
вещество substance
атмосфера atmosphere
энергия energy
воздух air
почва soil
среда environment
газ gas
Top 10 one-word keywords from the “Ecology” sub-corpus
Keyword | Translation | Key collocations
вода | water | сточные воды "waste water"; пресная вода "fresh water"; морская вода "sea water"; грунтовые воды "ground waters"; качество воды "water quality"
отходы | waste | бытовые отходы "domestic waste"; промышленные отходы "industrial waste"; твёрдые отходы "solid waste"; переработка отходов "waste processing"; размещение отходов "waste disposal"
Keywords and their collocates
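One simple way to find candidate collocates for a keyword is to count the words that occur next to it. The sketch below is illustrative only and is not the project's own method: the function name `collocates`, the window size and the frequency threshold are all assumptions, and the input is assumed to be a list of lemmas.

```python
from collections import Counter


def collocates(lemmas, keyword, window=1, min_count=2):
    """Count lemmas found within `window` positions of `keyword`
    and return (lemma, count) pairs seen at least `min_count` times."""
    counts = Counter()
    for i, lemma in enumerate(lemmas):
        if lemma != keyword:
            continue
        start = max(0, i - window)
        end = min(len(lemmas), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[lemmas[j]] += 1
    return [(w, n) for w, n in counts.most_common() if n >= min_count]


# Toy lemmatised input (hypothetical, not taken from the RAC).
sample = "сточный вода и пресный вода и сточный вода качество вода".split()
result = dict(collocates(sample, "вода"))
```

Raw neighbour counts also pick up function words such as и, so a stop list or an association measure would be needed before the results resemble the hand-checked collocations shown above.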
Lexical bundle Translation
рынок труда labour market
национальная экономика national economy
оплата труда remuneration of labour
на рынке on the market
спрос на demand for
социальная политика social policy
рабочая сила work force
цена на price of
предпринимательский риск entrepreneurial risk
предпринимательская деятельность entrepreneurship
Two-word keywords from the “Economics” sub-corpus
10 readers from each of the 13 sub-corpora
Each text contains approximately 200 words
The readers may be used to train general academic vocabulary or discipline-specific vocabulary
Manually annotated
Freely available
Readers
Криминогенность личности представляет собой качественное выражение соотношения негативной и позитивной направленности личности. А преступление является объективным, реальным показателем криминогенности личности. Криминогенность можно рассматривать с двух позиций. Исходя из первой, «криминогенность рождается и умирает вместе с преступлением». Однако криминогенность можно рассматривать не только как результат, но и как процесс ее становления. Таким образом, можно выделить три стадии генезиса криминогенности личности преступника: Формирование криминогенности личности, которая в этот период совершает аморальные поступки и правонарушения неуголовного характера.
Sample reader
Focus on “receptive” not “productive” language skills
Grammar identification: our aim is for users to identify and understand the use of grammatical features, with our notes and tools, not to be able to construct them
Grammar forms were selected on the basis of their frequency in academic texts: participles, gerunds and passive constructions were introduced early; some points of grammar commonly covered in the first year of UG programmes were not included.
Grammar
The following information is included for each point of grammar:
◦ an English-language commentary on how and for what purpose it is used;
◦ information on what the form looks like (identification);
◦ lists of other points of grammar that have the same form and notes on how to tell them apart (disambiguation);
◦ an annotated list of common words within the category;
◦ corpus examples and translations.
Grammar 2
Use: -ing forms: judging by his comments, I’d say that ...
Looks like: принимая, судя, опираясь
Common exceptions: будучи
Can be confused with: soft feminine nouns (Nom. Sing.) = неделя; hard feminine adjectives (Nom. Sing.) = интересная; soft masculine nouns (Gen. Sing.) = трамвая
Disambiguation: gerunds are very unlikely to be directly preceded by words ending in –ая or –ого; words ending in –a rarely follow gerunds (BUT принимая лекарства)
Example (imperfective gerunds)
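The identification and disambiguation notes above can be read as a small rule set. The sketch below encodes only the rules stated on the slide (the typical endings, the exception будучи, and the cue that a gerund is unlikely to follow a word ending in -ая or -ого). It is not the ReadingCorp classifier: the function name `gerund_candidate` is hypothetical, and the weaker cue about following words ending in -а is left out because the slide itself notes exceptions to it.

```python
def gerund_candidate(tokens, i):
    """Return True if tokens[i] could be an imperfective gerund
    according to the surface cues listed on the slide."""
    word = tokens[i].lower()
    if word == "будучи":  # common exception named on the slide
        return True
    if not word.endswith(("я", "ясь")):  # e.g. принимая, судя, опираясь
        return False
    # Gerunds are very unlikely to be directly preceded by -ая or -ого.
    if i > 0 and tokens[i - 1].lower().endswith(("ая", "ого")):
        return False
    return True


first = gerund_candidate("исходя из первой позиции".split(), 0)
second = gerund_candidate("интересная неделя".split(), 1)
third = gerund_candidate("будучи студентом".split(), 0)
```

This only filters candidates: a noun such as неделя at the start of a sentence would still pass, so the result is a prompt for the reader to check the form rather than a final classification.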
Gerund | Translation | Notes
говоря | speaking, talking | о "about" + Prep.; не говоря уже о "not to mention"; по-иному / иначе говоря "put another way, in other words"; строго говоря "strictly speaking"
исходя | on the basis of, on the strength of, based on the assumption that | из "from" + Gen.; исходя из этого "on this basis"; исходя из того, что "on the basis of" (+ verb)
начиная | starting | с "from" + Gen.
будучи | being | Instr.
учитывая | considering | Acc.
имея | having | в виду
считая | considering | что "that"; Acc.
опираясь | based, drawing; relying | на "on" + Acc.
рассматривая | viewing, considering | Acc.
стремясь | trying, in an attempt to | with verb infinitives; к + Dat.
Common forms (imperfective gerunds)
For texts that are available online or that have been digitised
The ReadingCorp tools allow users to annotate their texts according to vocabulary and grammar
Vocabulary highlights work for any text uploaded to the system, as the list of academic words is stable and our tools automatically classify texts and corpora according to keywords
Automatic grammar classification helps users identify or disambiguate parts of speech
Demo with “Space” corpus
Reading texts with our tools
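To give a sense of what vocabulary highlighting involves, the sketch below marks every word in a text whose lemma appears in a keyword list. It is a minimal illustration and not the IntelliText implementation: the function name `highlight`, the asterisk markers and the tiny `lemma_of` lookup table are all assumptions, and a real system would rely on a full Russian lemmatiser.

```python
import re


def highlight(text, keyword_lemmas, lemma_of, mark="*"):
    """Wrap each word whose lemma is in `keyword_lemmas` in `mark`.

    `lemma_of` maps lower-cased word forms to lemmas; forms missing
    from it are treated as their own lemma (assumed fallback)."""
    def replace(match):
        token = match.group(0)
        lemma = lemma_of.get(token.lower(), token.lower())
        if lemma in keyword_lemmas:
            return f"{mark}{token}{mark}"
        return token

    # [^\W\d_]+ matches runs of letters, including Cyrillic.
    return re.sub(r"[^\W\d_]+", replace, text)


lemma_of = {"воды": "вода"}  # hypothetical lookup table
marked = highlight("Качество воды и бытовые отходы.", {"вода", "отходы"}, lemma_of)
```

Because the keyword list is fixed in advance, the same function can be applied to any uploaded text, which matches the point made above about the academic word list being stable.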
Automatic grammar classification
Initial corpus training (either one session over an afternoon or two shorter sessions)
Introduction to the Cyrillic alphabet (if necessary)
1 class a week focusing on (1) guided reading and (2) hands-on vocabulary building exercises
Exercises are based around keywords
Teaching methodology and materials
verb рынок noun
adj. рынок verb
noun рынок verb
prep. рынок noun
Sample materials 1
verb регулировать рынок труда noun
adj. внутренний рынок характеризуется verb
noun сегментация рынок является verb
prep. на рынок сбыта noun
Adjective спрос
Verb Adjective спрос
Verb Adjective спрос Preposition
Verb Adjective спрос Preposition Noun
Verb спрос Preposition
Verb спрос Preposition (Adjective) Noun
Sample materials 2
Combination | Lexical bundle | Translation
Adj. + Search Word (SW) | совокупный спрос | aggregate demand
Verb + Adj. + SW + Noun | отражать платежеспособный спрос населения | to reflect the population’s purchasing power
Verb + Adj. + SW | пользоваться большим спросом | to be in high demand
Verb + SW + Noun | удовлетворить спрос покупателей | to meet customers’ demands
Noun + SW + Prep. | увеличение спроса на | rise in demand for
SW + Verb | спрос падает | demand is decreasing
Results
Tutors working with students whose research is in an area other than those covered by ReadingCorp may:
◦ use our interface to create keyword lists and analyse texts
◦ use the readers for general reading practice
◦ access the RAC
◦ use the grammar
◦ use the keyword lists from the RAC
They will need to:
◦ create keyword lists for the subject by building a small corpus
◦ add their own examples to the material templates
“Transferability” of resources
Is/Does a corpus-based approach:
◦ suitable for distance learning?
◦ cover contemporary research topics?
◦ cost-effective and sustainable?
◦ transferable to other languages and domains?
◦ cater for the needs of the individual student?
◦ help structure syllabi?
◦ allow ab-initio students to acquire the necessary reading skills to be able to effectively carry out their research?
How does a corpus approach address the CEELBAS issues?
Corpora go beyond the traditional course book and offer exciting possibilities for LSP learning and teaching
A corpus-based approach is particularly well-suited to training reading competence in specific domains
◦ It makes the goal of reading and understanding authentic academic texts in Russian within a year a realistic objective
BUT will advances in machine translation and optical character recognition make specialised reading courses redundant? As machine translation becomes more reliable, as more material is digitised and made available online and as OCR technology becomes more accurate, will students need anything other than a scanner and Google Translate?
Conclusion