Mc Call Presentation Lancaster.07

Introducing Mini-McCALL: A pilot version of the Mid-Sweden Corpus of Computer-Assisted Language Learning. Mats Deutschmann, Gregory Garretson, Annelie Ädel, Terry Walker. Mid Sweden University.

McCall Presentation ICAME 2009: Deutschmann, Ädel, Garretson and Walker

Transcript of Mc Call Presentation Lancaster.07

Page 1: Mc Call Presentation Lancaster.07

Introducing Mini-McCALL: A pilot version of the Mid-Sweden Corpus of Computer-Assisted Language Learning

Mats Deutschmann, Gregory Garretson, Annelie Ädel, Terry Walker

Mid Sweden University

Page 2: Mc Call Presentation Lancaster.07

Overview

Part 1: A corpus of online student learning
Part 2: Building the corpus
Part 3: Studies underway on the corpus
Part 4: Conclusion and outlook

Page 3: Mc Call Presentation Lancaster.07

Part 1 A corpus of online student learning

Page 4: Mc Call Presentation Lancaster.07

Background: Collaborative e-learning

• CMC: Communication with computers Communication with others via computers (Kern & Warschauer 2000)
• CSCL (computer-supported collaborative learning; Salmon 2004)
• Used at Mid-Sweden University since 2004
• Learning Management System = WebCT
• Interactive methods
  • peer-reviewing
  • group discussions
  • group problem-solving
  • group productions
  • self-reflection

Page 5: Mc Call Presentation Lancaster.07

The call for Mini-McCALL

• CMC research growing
• But… relatively little linguistic research
• Language is the ‘oil of the collaborative machinery’
• Investigation needs:
  • the role played by language in online education
  • the efficiency of online communication
  • language and learning processes
• With a few exceptions (see for example the LETEC corpus - Chanier, Thierry and Lamy), there are hardly any CMC corpora based on learner data.

Page 6: Mc Call Presentation Lancaster.07

A typical assignment model in the course

[Diagram with three stages: Exercise, Discussion, Reflection]

• Exercise: individual solutions deposited in discussion group room forums (student exercises)
  • For example: Decide on article use (definite/indefinite/zero article) and motivate. "The/a/zero _ Italian food is healthy."
• Discussion: Participants 1 to 4; general discussion boards; e-mail correspondence
• Reflection: individual reflection (student memos)
  • Students reflect on how discussions changed their original answers, the group process, etc.

Page 7: Mc Call Presentation Lancaster.07

Negotiating meaning in McCALL

Message HT04N.D.118
Time: Thursday, September 16, 2004 20:03
Author: Tina Kock
Subject: Re: Discussion 2 - Jonna Östeberg

Hello Jonna!

I have an opinion about 1d: "I'll have done it by tonight." You wrote that will + base form is used here, but "have done" is not the base form of the verb. That's why I think the tense is Future Perfect. The sentence is about an activity that will be completed before a specific time. Am I right?

/Tina

Message HT04N.D.147
Time: Saturday, September 18, 2004 10:30
Author: Melinda Jensen
Subject: Re: Discussion 2 - Jonna Östeberg

OK.... I think you're right! (I wonder why I thought it was will + base form?), now when you have explained it to me everything seems so clear.... :-)

bye
Melinda

Page 8: Mc Call Presentation Lancaster.07

Part 2 Building the corpus

Page 9: Mc Call Presentation Lancaster.07

Overview of Mini-McCALL

• 1-million-word CMC corpus
• From online undergraduate English course
• Written (students-students/teachers)
• 4 types of text
• Metadata: sociolinguistic, pedagogical, and textual information

Page 10: Mc Call Presentation Lancaster.07

Who’s in the corpus

• 235 students
  • First-term undergraduates
  • Age ranges from 18 to 57
  • 87% L1 Swedish speakers
  • 79% female, 21% male
• 3 teachers
  • Highly proficient L2 speakers of English
  • 2 male, 1 female
  • 1 teacher does the bulk of the teaching

Page 11: Mc Call Presentation Lancaster.07

Age of the students

Page 12: Mc Call Presentation Lancaster.07

Composition of the corpus by role and sex

Note: duplicate e-mails excluded from counts

Page 13: Mc Call Presentation Lancaster.07

What’s in the corpus

• 1,008,000 words of student and teacher writing
  • Plus 470,000 words of repeated e-mails (marked as such)
• 8 sections of a full-time 5-week course (Grammar A)
  • 2 per term from autumn 2004 to spring 2006
• 4 types of text
  • E-mail messages
  • Discussion forum messages
  • Student exercises
  • Student memos (reflections)
• We have marked up:
  • repeated e-mails with attribute: <email duplicate="yes">
  • assignment text with tags: <exer></exer>
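The markup described on this slide lets a user count only original student and teacher writing. The following is a minimal sketch of that idea, not the project's own tooling: only the `duplicate="yes"` attribute on `<email>` and the `<exer>` tag come from the slide, while the `<corpus>` and `<text>` wrapper elements and the sample content are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical file layout: only <email duplicate="yes"> and <exer> are taken
# from the slide; the <corpus> and <text> wrappers are assumed for this sketch.
SAMPLE = """<corpus>
  <email duplicate="yes">Hello George! I need some tips.</email>
  <email>Dear Lilja, get back to me if this is unclear.</email>
  <text><exer>Discuss the use of the article.</exer> Life takes a zero article here.</text>
</corpus>"""


def count_words(element):
    """Count whitespace-separated tokens under element, skipping <exer> content."""
    if element.tag == "exer":
        return 0
    total = len((element.text or "").split())
    for child in element:
        total += count_words(child)
        # Text that follows a child's closing tag belongs to the parent.
        total += len((child.tail or "").split())
    return total


def corpus_word_count(root):
    """Sum words over top-level items, leaving out e-mails marked as duplicates."""
    return sum(
        count_words(item)
        for item in root
        if not (item.tag == "email" and item.get("duplicate") == "yes")
    )


root = ET.fromstring(SAMPLE)
print(corpus_word_count(root))
```

In this sample, the duplicate e-mail and the assignment text inside `<exer>` are left out of the total, which mirrors the note "duplicate e-mails excluded from counts" on the composition slides.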

Page 14: Mc Call Presentation Lancaster.07

Composition of the corpus by text type

Note: duplicate e-mails excluded from counts

Page 15: Mc Call Presentation Lancaster.07

3 kinds of metadata

• Participant information
  • Group membership, sex, age, language background, activity level, role, final grade (student), teaching experience (teacher)
• Pedagogical information
  • Course descriptions, task descriptions, course evaluations
• Textual information
  • Type of text, date of creation, sequencing information

Page 16: Mc Call Presentation Lancaster.07

Anonymisation of the data

• Consent from all participants, provided real names withheld
  • Includes first names as well as last names (unlike BNC)
• Didn’t want to use placeholders like <NAME> or <F21>
  • Corpus is particularly useful for studies of discourse, group collaboration, ultimate student achievement, etc.
  • Need to distinguish and track individuals

1c. Jonna and Josephine found that "life" is uncountable and that it therefore shouldn't have an article, whilst I and Regina answered that it was a general statement and that it didn't required an article because of that. I agree with the uncountable-statement. Josephine also took up that the word "surprises" is an abstract noun and therefore shouldn't have an article.

Page 17: Mc Call Presentation Lancaster.07

Anonymisation of the data

• Solution: systematic anonymisation of first and last names using Swedish census data
  • Sven Enqvist → Jörgen Stenström
  • Hedda Friberg → Tindra Skoog
• Student IDs are also anonymised.
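Systematic replacement of this kind can be sketched as a stable lookup: each real name always maps to the same substitute, and first names are drawn from a pool of the same gender. The sketch below is an illustration under stated assumptions and not the procedure the authors used. The name pools are small hypothetical lists (the slides mention Swedish census data but do not give the lists), and the `Pseudonymiser` class and its methods are invented for this example.

```python
import random

# Hypothetical name pools; the real project drew on Swedish census data,
# but the actual lists and procedure are not given in the slides.
FIRST_NAMES = {
    "female": ["Tindra", "Lilja", "Hedvig", "Alva"],
    "male": ["Jörgen", "Göran", "Natanael", "Sixten"],
}
LAST_NAMES = ["Stenström", "Skoog", "Lindgren", "Sjögren"]


class Pseudonymiser:
    """Map each real name to one stable replacement, keeping gender."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Shuffled copies, so the module-level pools stay untouched.
        self._first = {g: rng.sample(names, len(names)) for g, names in FIRST_NAMES.items()}
        self._last = rng.sample(LAST_NAMES, len(LAST_NAMES))
        self._first_map = {}
        self._last_map = {}

    def first_name(self, real, gender):
        if real not in self._first_map:
            # pop() ensures no two people share a replacement first name;
            # it raises IndexError if the (small) pool runs out.
            self._first_map[real] = self._first[gender].pop()
        return self._first_map[real]

    def last_name(self, real):
        if real not in self._last_map:
            self._last_map[real] = self._last.pop()
        return self._last_map[real]

    def full_name(self, first, last, gender):
        return f"{self.first_name(first, gender)} {self.last_name(last)}"


anon = Pseudonymiser(seed=1)
print(anon.full_name("Nigel", "Jones", "male"))
print(anon.full_name("Nigel", "Jones", "male"))  # same replacement as above
```

Because the mapping is stored rather than recomputed, every mention of a participant receives the same substitute name, which is what keeps individuals trackable across messages.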

Page 18: Mc Call Presentation Lancaster.07

(Dis-)advantages of anonymisation process

• Advantages:
  • Anonymisation is almost totally transparent.
  • Text remains highly legible and just as easy to process.
  • Individuals still trackable with both name and ID.
  • Gender stays the same.
• Disadvantages:
  • Takes a lot of work (computational and manual).
  • Easy to forget it’s anonymised.
  • Character of names may change (e.g., apparent nationality)
    • Nigel Jones → Göran Lindgren
    • But N.B.: Metadata on individuals is available.
• The corpus is rife with misspellings.
• Nicknames and initials are tricky.

Page 19: Mc Call Presentation Lancaster.07

3 versions of Mini-McCALL

• XML version
  • best for processing on your own computer
• HTML version
  • best for reading the files
• CQPWeb online searchable version
  • best for online searching

See handout for the URLs!

Page 20: Mc Call Presentation Lancaster.07

Part 3 Studies underway on the corpus

Page 21: Mc Call Presentation Lancaster.07

Research based on Mini-McCALL

• Initial exploration of three lines of research:
  • Adaptation of language to new technologies
    • CMC address phrases in discussion boards (Walker & Anglemark, WIP)
    • Cultural comparison of Italian and Swedish CMC discourse (Deutschmann & Helm, WIP)
  • Communicative strategies in a collaborative learning environment
    • Rapport-building in discussion boards (Ädel, WIP)
    • “But” and hedging in written peer-review dialogue (Popaditch, WIP)
  • Pedagogical aspects of teacher-student communication
    • Effect of teacher input (formal versus informal style) on student activity (Deutschmann & Lundmark 2008)

Page 22: Mc Call Presentation Lancaster.07

CMC address phrases in discussion boards

• Walker & Anglemark, WIP
• A CMC address phrase is defined as a salutation (e.g. Dear Birgitta) or a vocative (e.g. I agree with you Natanael)
• The study focuses on the use of CMC address phrases in discussion board messages, compared with their use in e-mail and chatroom data from Walker and Anglemark’s Corpus of CMC
• Hypotheses:
  1. use of CMC address phrases in discussion boards will resemble e-mail (as both types of communication are asynchronous) but also resemble chatroom communication (as both have messages which may be read by the whole group)
  2. use of CMC address phrases in discussion boards will be affected by participants’ role and native language, and teacher input

Page 23: Mc Call Presentation Lancaster.07

CMC address phrases

• Results so far:
  • discussion boards and e-mail favour the same structure of CMC address phrase, i.e. first names (e.g. Dear Birgitta)
  • discussion boards and e-mail favour the same function of CMC address phrase, i.e. greetings (e.g. Hello fellow students)
  • discussion boards and chatrooms favour the same position of CMC address phrase, i.e. final position (e.g. Have a nice weekend everybody!)
  • a group of students who are native speakers of English, and the teacher, are responsible for there being a larger number of CMC address phrases in one class compared with the other

Page 24: Mc Call Presentation Lancaster.07

Rapport-building in discussion boards

• Ädel, WIP
• Most research into classroom discourse is based on teacher talk or teacher-student talk, but Mini-McCALL offers an opportunity to examine student-student communication.
• Research question: What linguistic strategies for social interaction are used
  • in a collaborative learning environment
  • which is written and asynchronous (discussion boards)?
• Model for ‘social interaction’
  • Partly based on Tannen (1990:77) and Spencer-Oatey (2000:3), the present study defines rapport-building as communicative acts promoting social concord.

Page 25: Mc Call Presentation Lancaster.07

Rapport-building

• Qualitative aim of study: to create a taxonomy of rapport-building linguistic units based on naturally-occurring data.
  • This taxonomy will be used in quantitative comparisons of rapport-building
    • across different populations and across different genres
• Starting point: frequency word list with a cut-off point of 100
  • Produced almost 700 types for analysis
• Concordancing used as an aid in finding which of the most frequent expressions function as rapport-building
• The expressions which these high-frequency words are part of were categorised and then fed into the taxonomy.
  • See handout for the preliminary taxonomy
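The two steps named on this slide, a frequency word list with a cut-off followed by concordancing, can be sketched in a few lines. This is an illustrative sketch only and not the tool used in the study. It assumes that the cut-off is a minimum frequency (the slide does not say so explicitly), and the tokenisation pattern, the function names and the sample text are all invented for the example.

```python
import re
from collections import Counter


def frequency_list(text, cutoff):
    """Return (word, count) pairs for types occurring at least `cutoff` times."""
    # Simple tokenisation assumed here: lower-cased runs of letters and apostrophes.
    tokens = re.findall(r"[a-zåäö']+", text.lower())
    counts = Counter(tokens)
    return [(word, n) for word, n in counts.most_common() if n >= cutoff]


def concordance(text, keyword, width=30):
    """Yield keyword-in-context lines for case-insensitive whole-word matches."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        yield f"{left:>{width}}[{match.group()}]{right}"


sample = "Thank you! Great job! Thank you for the comments. I agree with you."
print(frequency_list(sample, cutoff=2))
for line in concordance(sample, "thank"):
    print(line)
```

On a real corpus the cut-off would be 100 rather than 2, and each concordance line would then be read by the analyst to decide whether the expression functions as rapport-building.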

Page 26: Mc Call Presentation Lancaster.07

Preliminary taxonomy of rapport

Type of unit | Function | Example
Message-structuring units | Greeting | Hi there
Message-structuring units | Closing | Have a good week-end; Hugs Yasmin!!!
Intertextual units | Referring to in-group discourse | So yes, just just like someone else mentioned...; I still think it should be "She smells" like Klara writes.
Face-saving units | Apologising | Here are my answers. A little late - sorry for that!.
Face-saving units | Mitigating criticism | I just have some small comments to your answers, hopfully it might be useful:)
Bonding units | Aligning with in-group | I also found myself unsure on 2L. In the end, I settled for…; It seems like our group is a little bit small but I think we will manage anyway.
Bonding units | Agreeing | Looks like I agree with you on most answers.
Bonding units | Commiserating | …and just like you I would appreciate a key for the old exam.
Bonding units | Complimenting | Very good indeed! I have no comments to add really. =)
Bonding units | Soliciting feedback | am i the only one who found this really hard??; This is the way I try to think of it...Does it make sense?
Bonding units | Offering encouragement | Great Job!!!!!!; GL with everything !; We can only do our best and try.
Bonding units | Thanking | Thank you!
Bonding units | Responding to thanks | no problem, glad to be at your service :)

Page 27: Mc Call Presentation Lancaster.07

Part 4 Conclusion and outlook

Page 28: Mc Call Presentation Lancaster.07

The more, the merrier

• Mini-McCALL will be one of the first of many corpora of online learning to come.
  • Needed: more corpus resources that enable empirical and systematic studies of both linguistic and pedagogical aspects of online learning environments
• To promote such research, Mini-McCALL will be made freely available to the research community.
  • See handout for contact information.
• Mini-McCALL is the 1st stage of the proposed Mid-Sweden Corpus of Computer-Assisted Language Learning (McCALL)

Page 29: Mc Call Presentation Lancaster.07

Future plans: the real McCALL

• A comprehensive four-year snapshot of the various types of communication that take place in an online learning environment
• ALL the online English courses from four years, including courses in language, cultural studies, literature, and linguistics
  • Over 100 courses involving 16 teachers and over 900 students
  • Will also include spoken data, from both teachers and students, at various levels, and of various genres
  • Large amount of material: more than 10 million words

Page 30: Mc Call Presentation Lancaster.07

References

• Hot-off-the-press paper with more information:
  Deutschmann, M., A. Ädel, G. Garretson & T. Walker. 2009. Introducing Mini-McCALL: A pilot version of the Mid-Sweden Corpus of Computer-Assisted Language Learning. ICAME Journal 33:21-44.
• Further references on handout
• http://cqpweb.lancs.ac.uk/mccall/

Page 31: Mc Call Presentation Lancaster.07

Thanks very much!

Page 32: Mc Call Presentation Lancaster.07

Extra slides

Page 33: Mc Call Presentation Lancaster.07

Discussion forum messages

Message HT04N.D.118
Time: Thursday, September 16, 2004 20:03
Author: Tina Kock
Subject: Re: Discussion 2 - Jonna Östeberg

Hello Jonna!

I have an opinion about 1d: "I'll have done it by tonight." You wrote that will + base form is used here, but "have done" is not the base form of the verb. That's why I think the tense is Future Perfect. The sentence is about an activity that will be completed before a specific time. Am I right?

/Tina

Message HT04N.D.147
Time: Saturday, September 18, 2004 10:30
Author: Melinda Jensen
Subject: Re: Discussion 2 - Jonna Östeberg

OK.... I think you're right! (I wonder why I thought it was will + base form?), now when you have explained it to me everything seems so clear.... :-)

bye
Melinda

Page 34: Mc Call Presentation Lancaster.07

E-mail messages (1)

Message HT04N.E.3
Time: Tuesday, September 7, 2004 19:06
Author: Lilja Sjögren
Recipient: George Sederstedt
Subject: Assignments!

Hello George!

I have been in and out of the discussion room a couple of times and it seems that noone is there to discuss the assignment that is due on friday. I need some tips how to start the discussion. I have never done this before. And what do I do if noone is discussing before friday?

Lilja Sjögren

Page 35: Mc Call Presentation Lancaster.07

E-mail messages (2)

Message HT04N.E.5
Time: Wednesday, September 8, 2004 11:50
Author: George Sederstedt
Recipient: Lilja Sjögren
Subject: Re: Assignments!

Dear Lilja,

For the discussion assignments you actually make your comments in the discussion forum, which is not a simultanous chat. You simply mail your contribution and wait for someone to respond. The mails can be read by all. In the chat you can talk more informally. Get back to me if this is still unclear.

George

Page 36: Mc Call Presentation Lancaster.07

Attachments

Document ID: VT06S.D.16.1
Attached to message: VT06S.D.16
Time: Tuesday, January 31, 2006 15:17
Author: Lina Holmström
Subject: Determiners and pronouns

Discussion 1: Determiners and Pronouns

1. Discuss the use or absence of the article in the following sentences:

a. I met an interesting chap at a party last night.
Comment: The speaker uses indefinite articles to "an interesting chap" and "a party last night". This can mean that the referents (chap and party) are not known to the hearer.

b. Why are you still in bed? You should be at school.
Comment: We can say "the bed" when we mean a particular piece of furniture. Otherwise it is not combined with the definite article "the". We say at school when the hearer goes there as a pupil.

c. Life is full of surprises.
Comment: Life (U) in this case required a zero article because here (c) it is a generic reference. All (c) is widely generic.