Measuring Linguistic Complexity Kristopher Kyle 3-5-2015.
Measuring Linguistic Complexity
Kristopher Kyle, 3-5-2015
Who is this guy?
Interested in:
L2 Writing Quality/Development
Assessment
Natural Language Processing
Productive Vocabulary
Productive Syntax
Outline of Workshop
Why measure linguistic complexity?
How can linguistic complexity measures be conceptualized?
How do we actually measure linguistic complexity?
Hands-on workshop I: Measuring syntactic complexity
Hands-on workshop II: From raw data to findings (if time)
Why measure linguistic complexity?
In the 1970s, SLA researchers (e.g., Larsen-Freeman, 1978) wanted to measure language development
Larsen-Freeman proposed three constructs of development: complexity, accuracy, and fluency (CAF)
The general hypothesis (with regard to complexity) has been: As language learners develop, their language will become more complex.
How complexity is measured has been the subject of much debate (e.g., Bulté & Housen, 2012)
How can linguistic complexity measures be conceptualized?
Wolfe-Quintero et al. (1998) provide a compendium of CAF (complexity, accuracy, fluency) measures up until the late 1990s
Lexical Complexity: a variety of general and part-of-speech-specific type/token ratio counts
Syntactic Complexity: a variety of clause, sentence, and T-unit measures that focus on clausal complexity.
How can linguistic complexity measures be conceptualized?
Most syntactic complexity indices are ratio scores: (Structure A)/(Structure B).
The denominator (Structure B) is either:
clause: a main verb and its dependents (I eat pizza.)
T-unit: an independent clause and any attached dependent clauses (I eat pizza because it is delicious.)
sentence: A string of words that starts with a capital letter and ends with sentence-ending punctuation (I think you know what a sentence is.)
How can linguistic complexity measures be conceptualized?
The numerator (Structure A) has included many structures:
clauses
dependent clauses
adverbial clauses
T-units
complex T-units
coordinate phrases
complex nominals
verb phrases
passives
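As a rough illustration of the ratio idea, the sketch below computes a few (Structure A)/(Structure B) indices from hand counts. The counts, the index names, and the function name ratio_index are all hypothetical and invented for this example; they are not taken from any published tool.

```python
def ratio_index(numerator_count, denominator_count):
    """Return (Structure A)/(Structure B), or None if Structure B never occurs."""
    if denominator_count == 0:
        return None
    return numerator_count / denominator_count


# Hypothetical hand counts for one short essay (illustration only).
counts = {"clauses": 12, "dependent_clauses": 4, "t_units": 8, "sentences": 6}

indices = {
    "clauses_per_t_unit": ratio_index(counts["clauses"], counts["t_units"]),
    "dependent_clauses_per_clause": ratio_index(
        counts["dependent_clauses"], counts["clauses"]
    ),
    "t_units_per_sentence": ratio_index(counts["t_units"], counts["sentences"]),
}

for name, value in indices.items():
    print(name, round(value, 3))
```

Returning None for a zero denominator is one possible design choice; very short texts may contain no instance of the denominator structure at all.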
How can linguistic complexity measures be conceptualized?
Length of unit measures have also been prominent (e.g., Ortega, 2003; Lu, 2011).
Mean length of clause (MLC)
Mean length of T-unit (MLTU)
Mean length of sentence (MLS)
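A minimal sketch of how length-of-unit measures could be computed once a text has been segmented by hand. The segmentation below reuses the pizza example from the earlier slide plus one invented sentence, and words are counted naively by splitting on whitespace, which is an assumption rather than a standard.

```python
def mean_length(units):
    """Mean number of whitespace-separated words per unit."""
    if not units:
        return 0.0
    return sum(len(unit.split()) for unit in units) / len(units)


# Hand-segmented toy text (how to segment is the analyst's decision).
sentences = ["I eat pizza because it is delicious.", "I like it."]
t_units = ["I eat pizza because it is delicious.", "I like it."]
clauses = ["I eat pizza", "because it is delicious.", "I like it."]

mls = mean_length(sentences)  # 10 words over 2 sentences
mltu = mean_length(t_units)   # 10 words over 2 T-units
mlc = mean_length(clauses)    # 10 words over 3 clauses

print(mls, mltu, round(mlc, 2))
```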
How can linguistic complexity measures be conceptualized?
The rise of phrasal complexity:
Biber, Gray, and Poonpon (2011) suggested that clausal subordination (i.e., what most syntactic complexity indices measure) is NOT a prominent feature of academic writing
Informal speech includes many dependent clauses, but academic writing includes many dependent phrases (especially noun phrases).
How can linguistic complexity measures be conceptualized?
Some important issues:
Definition of measures: What counts as a clause?
Prominence of broad indices: What does MLC really tell us about development?
Often only a limited range of measures is used.
How do we actually measure linguistic complexity?
To measure linguistic complexity, we have two options.
Option #1: Count features by hand
Option #2: Count features using a computer
How do we actually measure linguistic complexity?
Advantages of Option 1:
Researcher has full control over how syntactic complexity is measured
Human counts may be more accurate
Disadvantages of Option 1:
Expensive!
Intra-rater reliability
Inter-rater reliability (who is qualified?)
How do we actually measure linguistic complexity?
Advantages of Option 2:
Very cheap
Reliable (same results every time)
Usually accurate: Biber (e.g., 2004) and Lu (2010, 2011) report accuracies above 90%
Can analyze a broad range of indices at once
Disadvantages of Option 2:
Researcher has less control (is at the mercy of available programs)
Some data is not well suited to automatic analysis
Some linguistic features cannot be reliably captured
Hands-on workshop I: Measuring syntactic complexity
Go to www.kristopherkyle.com/workshop/ and download the “short_samples.zip” file.
Without talking with your neighbor(s), fill in the included Excel sheet for examples 1-5.
What were your answers?
Any issues with example 5?
Now do the same for example 6…
Hands-on workshop I: Measuring syntactic complexity
Tool for the Automatic Analysis of Syntactic Complexity (TAASC) Prototype!!!
Includes indices created by Xiaofei Lu (Syntactic Complexity Analyzer; Lu, 2011)
Also includes some replications of the Biber Tagger
Hands-on workshop I: Measuring syntactic complexity
How TAASC works:
Reads file
Splits file into sentences
Parses each sentence (uses Stanford Parser)
Uses regular expressions (a way to search for patterns) to identify particular structures in the parse tree (uses Stanford Tregex: regular expressions for parse trees)
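The first two steps (reading a file and splitting it into sentences) can be sketched in plain Python. This is not TAASC's actual code: the function names are hypothetical, and the splitter is a naive regular expression that simply follows the "capital letter ... sentence-ending punctuation" definition of a sentence given earlier. Parsing and Tregex matching need the Stanford tools and are not reproduced here.

```python
import re
from pathlib import Path


def read_file(path):
    """Return the full text of a UTF-8 text file."""
    return Path(path).read_text(encoding="utf-8")


def split_sentences(text):
    """Naively split after ., ! or ? when the next word starts with a capital."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [part for part in parts if part]


text = "I eat pizza. I eat pizza because it is delicious! Do you?"
sentences = split_sentences(text)

for sentence in sentences:
    print(sentence)
```

A splitter this simple will break on abbreviations such as "e.g." followed by a capitalized word, which is one reason some data is not well suited to automatic analysis.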
Hands-on workshop I: Measuring syntactic complexity
Now, let's check to see if your computer is set up correctly.
First, search for Terminal (Mac) or Command Prompt (Windows)
Then type: java -version
Then type: python
Go to www.kristopherkyle.com/workshop/ and download the appropriate version of TAASC (Windows or Mac).
Extract it to your Desktop
Copy the example files to the “to_process_2” folder
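The same setup check can be approximated from inside Python. This is a hypothetical helper, not part of TAASC; it only reports whether a java executable is on the PATH and which Python version is running, and it does not verify that the versions are the ones TAASC needs.

```python
import shutil
import sys


def check_setup():
    """Report whether java is on the PATH and which Python is running."""
    java_path = shutil.which("java")  # None if no java executable is found
    return {
        "java_found": java_path is not None,
        "java_path": java_path,
        "python_version": sys.version.split()[0],
    }


status = check_setup()
print(status)
```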
Hands-on workshop I: Measuring syntactic complexity
Now, in Terminal/Command Prompt type:
cd [location of TAASC folder] (then press “return”)
python [name of the appropriate TAASC program] (then press “return”)
Your results should now be in a file called “results.csv”
If you want to examine the accuracy of the parse trees, look in the folder “parsed_files” using Tregex
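One way to inspect a results file is with Python's csv module. The sketch below uses an in-memory stand-in for "results.csv" so that it runs anywhere; the column names (filename, index_a, index_b) are invented, because the actual columns written by TAASC are not listed on the slides.

```python
import csv
import io


def load_results(handle):
    """Read an open CSV file into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))


# Stand-in for results.csv; these column names and values are hypothetical.
sample = io.StringIO(
    "filename,index_a,index_b\n"
    "essay1.txt,1.5,0.33\n"
    "essay2.txt,2.0,0.5\n"
)
rows = load_results(sample)

for row in rows:
    print(row["filename"], float(row["index_a"]))

# With the real file, something like this should work:
# with open("results.csv", newline="", encoding="utf-8") as handle:
#     rows = load_results(handle)
```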
Hands-on workshop I: Measuring syntactic complexity
Some simple patterns:
VP
VP<S
Some important patterns:
clause: S|SINV|SQ <<# MD|VBP|VBZ|VBD
T-unit: S|SBARQ|SINV|SQ > ROOT | [$-- S|SBARQ|SINV|SQ !>> SBAR|VP]
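To give a feel for what the two simple patterns look for, here is a pure-Python approximation that works on a Penn-style bracketed parse. It is not Tregex and does not use the Stanford tools: the functions are hypothetical, the example tree was written by hand, and the code counts matching nodes, which may differ from how Tregex enumerates matches.

```python
import re


def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def subtrees(tree):
    """Return the child nodes of a tree that are themselves trees (not words)."""
    return [child for child in tree[1] if isinstance(child, tuple)]


def count_label(tree, label):
    """Rough analogue of the pattern VP: count nodes with the given label."""
    own = int(tree[0] == label)
    return own + sum(count_label(child, label) for child in subtrees(tree))


def count_immediate(tree, parent, child):
    """Rough analogue of VP<S: count `parent` nodes with a direct `child` node."""
    hit = tree[0] == parent and any(sub[0] == child for sub in subtrees(tree))
    return int(hit) + sum(
        count_immediate(sub, parent, child) for sub in subtrees(tree)
    )


# Hand-written parse of "I want to eat pizza." (illustration only).
tree = parse_tree(
    "(ROOT (S (NP (PRP I)) (VP (VBP want) (S (VP (TO to) "
    "(VP (VB eat) (NP (NN pizza)))))) (. .)))"
)

vp_count = count_label(tree, "VP")
vp_over_s = count_immediate(tree, "VP", "S")
print(vp_count, vp_over_s)
```

The clause and T-unit patterns above rely on Tregex relations (such as heads and sisters) that this sketch does not attempt to imitate.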
Hands-on workshop II: From raw data to findings
Go to www.kristopherkyle.com/workshop/ and download the “Workshop_Data.zip” file.
58 participants, three timed essays over 1 year.
IEP Levels 3-4 (Intermediate/Advanced)
Now let’s analyze some data!
NOTE: We didn’t get to this in class…