Personalised statistical writing analysis
John Blake Japan Advanced Institute of Science and Technology
Overview
• Introduction
  – context, impetus
  – focus, process
• Five aspects
  – statistical analysis
• Personalised writing analysis
  – sample extracts
• Interview survey
• Future direction
Context
• Proofreading for faculty
• Writing assistance for PhD candidates
70% 50% science
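Impetus
21 email exchanges on various points, including:
• "I would like to standardise on 'minor scary incident'."
• "I would like to standardise on 'minor scary incident' rather than 'near miss'."
• "I asked the submission venue. 'Near accident' seems to be the usual term, so I have revised the paper accordingly."
• "I changed it to 'near-miss incident'. … My professor suggested that I follow the instructions."
• "I have changed every 'Near miss incident' to 'Near miss incidents'."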
From one research article (RA): minor scary incident, near-miss incident, ヒヤリ・ハット (hiyari-hatto, "near miss")
Focus
Enable research articles to meet generic expectations of:
• Accuracy by being factually correct
• Clarity by avoiding ambiguity
• Formality by adopting appropriate style
Rhetorical structure, logic, originality, flawed method, etc. = important, but…
Five aspects of generic integrity
1. Vocabulary fit
2. Readability
3. Word type balance
4. Style and usage
5. Lexicogrammatical errors
Summary statistics
Bhatia, V. K. (1993). Analysing genre: Language use in professional settings. London: Longman.
Process for each research article
• Create target corpus (TC)
• Analyse RA and TC
• Identify errors in RA
• Compile ratios where possible
• Create feedback document
Five aspects
• Vocabulary fit: keyness of RA & TC
• Readability: readability statistics of RA & TC
• Word type balance: ratio of GSL, AWL and off-list words for RA & TC
• Style and usage: markedness, modality, register
• Lexico-grammar: vocabulary & grammatical errors
1. Vocabulary fit
Scott & Tribble (2006, p. 56): "keyness [is what a text] boils down to"
Hyland (2011): paper-journal fit
Hyland, K. (2011). Welcome to the Machine: Thoughts on writing for scholarly publication. Journal of Second Language Teaching and Research, 1 (1), 58–68.
Scott, M., & Tribble, C. (2006). Textual Patterns: Key Words and Corpus Analysis in Language Education. Amsterdam, Philadelphia: John Benjamins.
TC key words: firm, knowledge, market, international, foreign, performance, research, variables, markets, countries, export, country, relationship, business, model
RA key words: organizational, TMSs, coordination, DOPPO, expertise, interactions, mechanisms, BLOCK, employee, leader, team, coordinate, informal, information, management
Prepared using AntConc 3.2.4w with Brown Corpus as reference
TC = 243 RAs, c. 2.1 million words; RA = 10k words
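To make the idea of keyness concrete, here is a minimal sketch of one widely used keyness statistic, log-likelihood. The slides do not say which statistic or settings were used in AntConc, so the choice of log-likelihood, the function names and the toy token lists below are all assumptions for illustration only.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score for one word.

    a = frequency in the study text, b = frequency in the reference corpus,
    c = size of the study text, d = size of the reference corpus (in tokens).
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    total = 0.0
    if a > 0:
        total += a * math.log(a / expected_study)
    if b > 0:
        total += b * math.log(b / expected_ref)
    return 2 * total


def keywords(study_tokens, reference_tokens, top_n=5):
    """Return the top_n words that are unusually frequent in the study text."""
    study = Counter(study_tokens)
    ref = Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        # Keep only words that are relatively more frequent in the study text.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Toy data (hypothetical, not taken from the RA or the Brown Corpus).
study_text = "team leader team coordination the of".split()
reference_text = "the of the a in the of market".split()
top_words = keywords(study_text, reference_text)
```

In this toy run, "team" ranks first because it occurs twice in the study text and never in the reference text, while "the" and "of" are filtered out as relatively more frequent in the reference.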
Prepared using Wordle with RA, 10k words
2. Readability
[Bar chart comparing Draft and Target on Gunning fog index, Flesch-Kincaid grade level and mean sentence length; axis from 0 to 25]
Bogert (1985) & McClure (1987): factors affecting readability
Gilquin & Paquot (2008): learner academic writing is rather "chatty"
Research articles tend to have a higher reading difficulty.
Bogert, J. (1985). In Defense of the Fog Index. Business Communication Quarterly, 48 (2), 9-12.
Gilquin, G., & Paquot, M. (2008). Too chatty: Learner academic writing and register variation. English Text Construction, 1 (1), 41-61.
McClure, G. (1987). Readability Formulas: Useful or Useless, Professional Communication, IEEE Transactions on, 30 (1), 12-15.
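The three readability measures named on this slide can be sketched as follows. The Gunning fog and Flesch-Kincaid formulas are the standard published ones; the syllable counter is a rough vowel-group heuristic and the sentence splitter is naive, both assumptions of this sketch, so the scores will not exactly match those from the tool used for the slides.

```python
import re


def count_syllables(word):
    """Rough heuristic (assumption): one syllable per group of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability(text):
    """Return mean sentence length, Gunning fog index and Flesch-Kincaid grade."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = [count_syllables(w) for w in words]
    # Gunning fog treats words of three or more syllables as complex.
    complex_words = sum(1 for n in syllables if n >= 3)
    mean_sentence_length = len(words) / len(sentences)
    fog = 0.4 * (mean_sentence_length + 100 * complex_words / len(words))
    fk_grade = (0.39 * mean_sentence_length
                + 11.8 * sum(syllables) / len(words)
                - 15.59)
    return {
        "mean_sentence_length": mean_sentence_length,
        "gunning_fog": fog,
        "flesch_kincaid_grade": fk_grade,
    }


scores = readability("The cat sat. The dog ran.")
```

Running the same function on a draft and on the target corpus gives the kind of Draft versus Target comparison shown in the chart.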
3. Word type balance
Levels in academic text (Nation, 2001, p. 17):
• 1st 1000: 73.5%
• 2nd 1000: 4.6%
• AWL: 8.5%
• Other: 13.3%
[Pie chart] First 2k words: 69%; AWL: 16%; Off-list: 15%
RA analysed by WebVP classic v4 (Cobb, 2013)
Used in EAP courses at PolyU and CityU in Hong Kong
Cobb, T. (2013). Web Vocabprofile. www.lextutor.ca/vp/
Nation, I.S.P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
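A word type balance of this kind can be approximated with a few lines of code. This is only a sketch: the band names and the tiny word sets below are hypothetical placeholders rather than the real GSL or AWL, and it matches exact tokens, whereas vocabulary profilers typically work with word families.

```python
import re
from collections import Counter


def word_type_balance(text, bands):
    """Return the percentage of tokens falling in each band, plus 'Off-list'.

    bands maps a band name to a set of lower-case words; a token is assigned
    to the first band that contains it.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    counts = Counter()
    for token in tokens:
        for name, words in bands.items():
            if token in words:
                counts[name] += 1
                break
        else:
            # No band matched this token.
            counts["Off-list"] += 1
    names = list(bands) + ["Off-list"]
    return {name: round(100 * counts[name] / len(tokens), 2) for name in names}


# Placeholder word lists (hypothetical, for illustration only).
toy_bands = {
    "First 2k": {"the", "of", "we", "a"},
    "AWL": {"analyse", "data"},
}
profile = word_type_balance("We analyse the data of DOPPO", toy_bands)
```

With the toy lists, three of the six tokens fall in the first band, two in the AWL band and one ("doppo") is off-list.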
4. Style and usage errors
Marked usage | Ratio | Suggestion
People provide first | 0:9 COCA | People first provide
Hyland (1998): hedging
Robb (2003): "Google as a quick 'n' dirty corpus tool"
Corpora: IS, KS, MS, BNC, COCA, WAC
Hyland, K. (1998). Hedging in scientific research articles. Amsterdam: John Benjamins.
Robb, T. (2003). Google as a quick 'n' dirty corpus tool. TESL-EJ, 7(2).
5. Lexicogrammatical errors
Grammatical or vocabulary errors
No. | Incorrect form | Correct form | Comment
1 | Taking account differences | Taking account of differences | preposition
2 | this study answers to two questions | this study answers two questions | answer to s.b. / answer s.th.
3 | former employee | a former employee | employee [singular]
4 | to participate to this study | to participate in this study | collocation (participate in)
5 | emphasis is given on XX | emphasis is placed on XX | collocation (give to / place on)
6 | for being responsible | to be responsible | general vs. specific purpose
Summary statistics
Based on requests for a simple-to-understand evaluation
Caveat: subjective evaluations disguised as statistics
Personalised writing analysis
Selected statistics for subject 1
Readability | Yours | Target
Gunning fog index | 13.2 | 13.2
Mean sentence length | 15.49 | 19.37
Mean number of clauses/sentence | 1.19 | 1.54
Lexical density | 0.63 | 0.57
Word type balance | Yours % | Target %
1k words | 68.58 | 74.39
2k words | 6.69 | 5.29
AWL | 16.36 | 7.67
Off-list words | 8.36 | 12.65
Personalised writing analysis
Selected statistics for subject 4
Style and usage
No. | Sentence | Ratio | Comment or correction
1 | minor scary incidents | 1:58,700 WAC | near-miss incidents
2 | falling-accident | 0:19 COCA | slips, trips and falls OR falling objects
3 | a medical examination by interview | 1:525 WAC; 0:1 COCA | a medical consultation
4 | According to sex | 1:18 WAC | According to the gender
5 | 175 indoor workers | n/a | Use One hundred and ….
6 | Tomio,T. (1995) proposes | n/a | Omit initials in in-text citations unless …
Personalised writing analysis
Selected statistics for subject 7
Style and usage
No. | Sentence | Ratio | Comment or correction
1 | people provide first their expertise … | 0:9 COCA | people first provide their expertise …
2 | XX also engage into XX | 1:9000 COCA | XX also engage in XX
3 | The XX structure limits become | n/a | Use limits for boundaries and limitations for restrictions/inabilities
4 | future studies are able to | n/a | Use "may be" to show uncertainty
5 | employee simultaneous participation | 0:5 WAC | simultaneous participation of employees
Interview survey
• Interviewer = me
• Subjects = 4 faculty, 1 PhD candidate
• Nationalities = 3 Japanese, 2 non-Japanese
• Number = 5 participants
• Interview time = 30 minutes
• Location = private office on campus
• Dates of interview = Jun-Jul 2013
Semi-structured interviews, e.g.
"What revisions did you make to your paper since…..?"
"How can I make the feedback more useful?"
Survey results
• Explanatory notes – too long
• Key word lists – couldn't understand
• Three readability scores – too complex
• Raw ratios – too difficult, e.g. 47:211,120, 1:4500
• Lexico-grammatical errors
• Word type balance
• Ratios for style and usage
Incremental improvements (made)
1. Create summary statistic scorecard
2. Use word tag cloud for vocabulary fit
3. Shorten explanatory notes
4. Simplify and approximate ratios
5. Show word type balance graphically with percentages
6. Select 'most useful' readability measure(s) – mean sentence and word length?
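Improvement 4 (simplify and approximate ratios) could be automated along the following lines. The slides do not state how the ratios were approximated, so scaling to 1:N and rounding N to two significant figures is an assumption of this sketch, as is the function name.

```python
import math


def simplify_ratio(marked, suggested, sig_figs=2):
    """Approximate raw corpus counts marked:suggested as a '1:N' string."""
    if marked == 0 or suggested < marked:
        # Nothing sensible to scale to 1:N; report the raw counts instead.
        return f"{marked}:{suggested:,}"
    n = suggested / marked
    magnitude = math.floor(math.log10(n))
    # Round N to sig_figs significant figures (never finer than whole numbers).
    factor = 10 ** max(0, magnitude - sig_figs + 1)
    approx = round(n / factor) * factor
    return f"1:{approx:,}"


raw_example = simplify_ratio(47, 211120)
zero_example = simplify_ratio(0, 9)
```

Applied to the raw ratio 47:211,120 quoted in the survey results, this yields 1:4,500, while a ratio with no hits for the marked form, such as 0:9, is left unchanged.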
Future developments
• Integration of metrics into one-stop online portal (thanks to reviewer for idea) for researchers to submit drafts
• Statistical comparison of draft and published versions to evaluate success of feedback
Any questions, suggestions or comments?
John Blake