CSA2050 Introduction to Computational Linguistics Lecture 1 What is Computational Linguistics?
(and computational linguistics)

Computational linguistics: computer science, theory, cognition, algorithms
Natural language processing: software development, application, practical techniques

• Computer methods and their usefulness (or uselessness) for human language processing (textual, spoken, gestural, etc.)
• Implementation of techniques, procedures, algorithms for language computation
• Enabling human-machine communication
• Enhancing human-human communication

[Diagram: NLP shown alongside computer science, psychology/cognitive science, linguistics, math/statistics, philosophy, and communication]

• Tokenization
• Part-of-speech tagging
• Computational morphology
• Syntactic parsing
• Lexical relations
• Dialogue move engines

• Dialectizer
• Speech recognition (speech to text)
• Speech synthesis (text to speech)
• Diacritization, Romanization
• Corpus annotation (Syriac)
• Thought identification

• Question answering
• Summarization
• Natural language generation
• Machine translation
• Spoken language identification
• Spoken language translation

• Humanities, natural and behavioral sciences, and engineering
• Linguistics, computer science, psychology, and mathematics
• Theory and practice, science and art
• Models, foundations vs. corpora, data (top-down vs. bottom-up)

• Math: statistics, calculus, algebra, modelling
• Computational paradigms: connectionist, rule-based, cognitively plausible
• Linguistics: LFG, HPSG, GB, OT, CG, etc.
• Architectures: stacks, automata, networks, compilers

• Several approaches implemented, taught here
• Homegrown: analogical modeling (AM)
• State-of-the-art performance in various applications for various languages:
  ▪ Written language identification
  ▪ Part-of-speech tagging
  ▪ Morpheme boundary detection
  ▪ Named entity recognition
  ▪ Word sense disambiguation
  ▪ Shallow parsing
  ▪ Semantic role labeling
  ▪ Spoken language identification

[Diagram: Car ontology. Object sets: Car, Year, Price, Make, Mileage, Model, Feature, PhoneNr, Extension. Relationships are labeled "has" and "is for", with participation constraints 0..1, 0..*, and 1..*.]

• Work on information extraction (data-rich text, web)
• Recognition and extraction of low-level data elements
• Ontology-based
• Related applications: ontology generation, text similarity and classification, information integration, etc.
• NSF-funded
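
As a rough illustration of recognizing low-level data elements against an ontology, the sketch below attaches one regular-expression recognizer to each of a few object sets named in the car diagram (Year, Price, Mileage, PhoneNr). The patterns, the `RECOGNIZERS` table, the `extract` function, and the sample ad are all hypothetical; the slides do not show the project's actual recognizers.

```python
import re

# Hypothetical recognizers: one regular expression per ontology object set.
RECOGNIZERS = {
    "Year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "Price": re.compile(r"\$\d{1,3}(?:,\d{3})*"),
    "Mileage": re.compile(r"\b\d{1,3}(?:,\d{3})*\s*(?:miles|mi)\b", re.IGNORECASE),
    "PhoneNr": re.compile(r"\b\d{3}-\d{4}\b"),
}

def extract(text):
    """Return {object_set: [matched strings]} for every recognizer that fires."""
    found = {}
    for name, pattern in RECOGNIZERS.items():
        matches = pattern.findall(text)
        if matches:
            found[name] = matches
    return found

# Invented sample text, used only to exercise the recognizers.
ad = "1998 Honda Civic, 82,000 miles, $4,500. Call 555-1234."
facts = extract(ad)
```

A real system would also need to decide which matches belong to the same Car instance, which is where the relationships and participation constraints of the ontology come in.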
Results and issues
• Corpus of 1500 obituaries, 500 hand-annotated
• Preliminary evaluation on a few features: name, age, title, birth date, death date, death place, funeral time/location
• Results: around 80% precision, a little less on recall
• Lexicon coverage (especially place names)
• Occasional typos
• Deceased sometimes not named
• Factored lists: Pierre et Marie, son fils et belle-fille ("Pierre and Marie, his/her son and daughter-in-law")
• Anaphora resolution: Né à Paris et y décédé… ("Born in Paris and died there…")
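
The precision and recall figures on this slide compare extracted values against hand annotations. A minimal sketch of that computation follows, assuming extracted and gold facts are represented as sets of (feature, value) pairs; the function name and the toy data are hypothetical and are not taken from the evaluation.

```python
def precision_recall(extracted, gold):
    """Precision = correct / extracted; recall = correct / gold."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Toy example (invented data): 4 of 5 extracted facts are right, 6 gold facts exist.
gold = {("name", "A"), ("age", "81"), ("title", "Dr"), ("birth date", "1920"),
        ("death date", "2001"), ("death place", "Paris")}
extracted = {("name", "A"), ("age", "81"), ("title", "Dr"),
             ("death place", "Paris"), ("death date", "2002")}
p, r = precision_recall(extracted, gold)
```

Here precision is 4/5 and recall is 4/6: a wrong value hurts precision, while a feature that was never extracted hurts only recall.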

[Slides 15-21: figures not reproduced; the only surviving text is the repeated label "grandchildren of Mary Ely".]

• Number of facts extracted: 22,251
  ▪ 8,740 Person-BirthDate facts
  ▪ 3,803 Person-DeathDate facts
  ▪ 9,708 children facts, including
    ▪ 5,020 Child-has-parent-Person facts
    ▪ 2,394 Son-of-Person facts
    ▪ 2,294 Daughter-of-Person facts
• Number of implied grandchild facts inferred: 5,277
• Processing time: ~18 seconds per page
• CPU time: ~4 hours
• Precision: .52 (spot-checking 100 of the 22,251 facts)
• Recall: .33 & Precision: .40 (spot-checking 2 fact-filled family pages)
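
Implied grandchild facts can be derived by joining child-parent facts with themselves: if X is a child of Y and Y is a child of Z, then X is a grandchild of Z. A minimal sketch of that inference follows, assuming facts are stored as (child, parent) pairs. The representation, the function name, and every person name other than Mary Ely are hypothetical; the slides do not describe how the system itself performs the inference.

```python
from collections import defaultdict

def infer_grandchildren(child_parent_facts):
    """Join (child, parent) facts with themselves to get (grandchild, grandparent) pairs."""
    children_of = defaultdict(set)
    for child, parent in child_parent_facts:
        children_of[parent].add(child)
    inferred = set()
    for grandparent, kids in children_of.items():
        for kid in kids:
            # .get avoids inserting new keys while iterating over the dict
            for grandchild in children_of.get(kid, ()):
                inferred.add((grandchild, grandparent))
    return inferred

# Invented family, used only to exercise the join.
facts = [
    ("Child A", "Mary Ely"),
    ("Child B", "Mary Ely"),
    ("Grandchild 1", "Child A"),
    ("Grandchild 2", "Child B"),
]
grandchildren = infer_grandchildren(facts)
```

Because the inferred facts are only as good as the extracted child facts they are built from, extraction errors propagate directly into the grandchild set.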
“Find a BBQ restaurant near the Umeda station, with typical prices under $40”
Language-Agnostic Ontology

Oral proficiency testing for language learners
• Sentences presented aurally, repeated back
• Carefully engineered for vocabulary level, grammatical complexity, length in syllables
• Score responses with forced alignment
• Correlate to standard testing methods
• English, French, Spanish, Japanese
• In use at language training facilities, universities, industry

• Too short: just a working-memory (WM) task with parroting
• Too long: impossible to repeat
• Too complex: even native speakers (NS) can't repeat
• Too simple: can't discriminate non-native speaker (NNS) levels
• Elicited imitation (EI) item design is a linguistic engineering task!
  ▪ Sentence length
  ▪ Sentence complexity
  ▪ Vocabulary levels
  ▪ Breadth of sampling of grammatical structures, constructions
681,925 annotated sentences of length 5-20 words
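
A pool of candidate sentences restricted to 5-20 words could be drawn with a simple length filter. The sketch below assumes whitespace tokenization, which is a simplification; the function name and the sample sentences are hypothetical.

```python
def within_length(sentences, min_words=5, max_words=20):
    """Keep sentences whose whitespace-delimited word count is in [min_words, max_words]."""
    return [s for s in sentences if min_words <= len(s.split()) <= max_words]

# Invented candidates: 2 words, 8 words, and 5 words respectively.
candidates = [
    "Too short.",
    "The children walked to school together this morning.",
    "She said that he left.",
]
kept = within_length(candidates)
```

Length is only the first of the design criteria; complexity and vocabulary level would need their own annotations on each sentence.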

• NLP in a cognitive modeling framework
• Goal-directed, incremental
• Machine learning
• Trying to model/mimic human performance in language tasks
• Several modalities
  ▪ Parsing
  ▪ Generation
  ▪ Translation
  ▪ Dialogue

Cognitive modeling
• Model human behavior: agent-based, goal-directed, representation of world, decomposable actions, learned skills, behaviors, expertise, memory
• Fatigue, emotion, attention, overload, confusion
• Plausible: processes, time course, constraints
• Robots: explore control, agency, interaction
• Language: cognition, acquisition, modeling, agency, incrementality, discourse/dialogue, process (parsing, lexical access, generation, translation, …)

• Develop NLP capability in Soar
• Parsing, generation, discourse/dialogue, translation, speech
• Fit models of human performance data
• Incremental, learning, agent-based
• WordNet, other resources for lexical info
• English, French, Japanese
• Use in HCI, modeling (reading, acquisition), task interactions, emotion, attention, ambiguity resolution, parser breakdown, etc.

[Diagram: Dialogue, Comprehension, and Generation components, each shown twice]

• Operationalize language processing of all kinds (mostly for DoD)
• Machine translation, sentiment analysis, dialect recognition, prevarication detection, etc.
• Beyond the current paradigms, language resources (cf. trained on newswire)
• MT and CLIR (A), HCI English+Arabic (B), ST English+Arabic (C), Arabic dialects (D)
• Activity E: language, agents, and robotics

• Grounded language acquisition by robots
• Deep semantics, visual+tactile input, experiential learning of objects, actions, and consequences
• Acquires language via grounding, hypothesizing, automated reasoning
• Human guides acquisition via situated, interactive instruction
• Robot demonstrates understanding via performance

• Social band (10^5 to 10^7 s: days to months)
• Rational band (10^2 to 10^4 s: minutes to hours)
• Cognitive band (10^-1 to 10^1 s: 100 ms to 10 secs)
• Biological band (10^-4 to 10^-2 s: 100 μs to 10 ms)
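
To make the banding concrete, the sketch below maps a duration in seconds to one of the four bands by its order of magnitude. Treating each band as covering whole orders of magnitude (so that 50 s, with exponent 1, still counts as cognitive) is an assumption of this sketch, as are the names `BANDS` and `band_for`.

```python
import math

# Bands as (name, lowest exponent, highest exponent) on a time scale in seconds.
BANDS = [
    ("Biological", -4, -2),
    ("Cognitive", -1, 1),
    ("Rational", 2, 4),
    ("Social", 5, 7),
]

def band_for(seconds):
    """Return the band whose order-of-magnitude range contains the duration, else None."""
    if seconds <= 0:
        return None
    exponent = math.floor(math.log10(seconds))
    for name, low, high in BANDS:
        if low <= exponent <= high:
            return name
    return None
```

For example, `band_for(0.005)` falls in the biological band and `band_for(600)` in the rational band, while durations outside 10^-4 to 10^7 seconds return `None`.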

Put <object> in <location>
• Includes moving to <object>, picking it up, moving to <location>, opening <location> if necessary, depositing <object>, closing <location> if necessary
• Fails if another object is already in the location (or can extend to put second object in work area?)
Cook <object>
• Clears the location where the object will be cooked.
• Turns on location to correct temperature (background knowledge in semantic memory!)
• If preheating is needed (oven), waits for it to preheat.
• Puts object in location.
• Waits.
• Tests temperature or other appropriate sensor (toothpick for cake?).
• Removes object from oven/stove and places it on workspace.
• Turns off oven/stove.
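
The decomposition of "Put <object> in <location>" can be sketched as a function that expands the command into primitive steps. This is plain Python for illustration, not Soar; the `put` function, the layout of the `world` dictionary, and the `has_door` flag are all hypothetical.

```python
def put(obj, location, world):
    """Expand 'put obj in location' into primitive steps.

    `world` maps a location name to {"contents": object or None, "has_door": bool}.
    Raises ValueError if the location already holds another object.
    """
    slot = world[location]
    if slot["contents"] is not None:
        raise ValueError(f"{location} already contains {slot['contents']}")
    steps = [f"move to {obj}", f"pick up {obj}", f"move to {location}"]
    if slot["has_door"]:          # open only if necessary
        steps.append(f"open {location}")
    steps.append(f"deposit {obj}")
    if slot["has_door"]:          # close only if it was opened
        steps.append(f"close {location}")
    slot["contents"] = obj        # record the new state of the world
    return steps

world = {
    "oven": {"contents": None, "has_door": True},
    "workspace": {"contents": None, "has_door": False},
}
plan = put("cake", "oven", world)
```

A second `put` into the oven now raises `ValueError`, matching the failure condition on the slide; the suggested extension would redirect the second object to the work area instead.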