Hans Uszkoreit
German Research Center for Artificial Intelligence
and Saarland University at Saarbruecken
The Rôle of Linguistics for the Future of Language Processing
LEITLINIEN FÜR DIE HEIDELBERGER COMPUTERLINGUISTIK (Guidelines for Computational Linguistics in Heidelberg), © 2003 H. Uszkoreit
Outline
• The development of linguistics
• Linguistics and the computer
• The relevance of CL for theoretical linguistics
• The role of linguistics for language technology
• Current trends and outlook
IT in Science
Data-Gathering and Maintenance
• automatic handling of large volumes of data
Scientific Computing
• data and model visualization
• data exploitation
• simulation
• modelling
Electronic scientific information
• data on research (centers, people, resources, projects, literature)
Electronic scientific content
• reports, articles, books, e-journals, e-print archives
Development of Linguistics
• first half of the 20th century: linguistics becomes concrete
  - structuralist linguistics: ontological concepts (entities and structures)
• second half of the 20th century: linguistics becomes formal
  - generative linguistics: formalisms for syntax and semantics
• first half of the 21st century: linguistics becomes empirical
  - empirical linguistics: quantitative models, graded grammaticality
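To make "quantitative models" and "graded grammaticality" concrete, here is a minimal sketch that assumes an add-one-smoothed bigram model as a stand-in for a quantitative model of language; the toy corpus and function names are hypothetical. Instead of a binary grammatical/ungrammatical verdict, every word sequence receives a log probability, so acceptability becomes a matter of degree.

```python
import math
from collections import Counter

# Hypothetical toy corpus; real models are estimated from large corpora.
CORPUS = [
    "sue gave paul an old penny",
    "paul gave sue a penny",
    "sue gave paul a book",
]

def train_bigrams(sentences):
    """Count context unigrams and bigrams, padding sentences with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])            # every token that serves as a context
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return unigrams, bigrams, vocab

def log_prob(sentence, unigrams, bigrams, vocab):
    """Add-one smoothed log probability: a graded score, not a yes/no verdict."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    v = len(vocab)                              # number of predictable word types
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + v))
        for prev, word in zip(tokens, tokens[1:])
    )

model = train_bigrams(CORPUS)
for s in ["sue gave paul an old penny", "penny old an paul gave sue"]:
    print(f"{log_prob(s, *model):8.3f}  {s}")
```

A bigram model is only the simplest possible choice: it captures local word-order statistics, not syntactic structure, which is why richer hybrid models are needed.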
The Rôle of Computation
• formalization led to highly complex systems of formal rules, principles or constraints that cannot be tested, validated and modified without sophisticated information processing
• language data of sufficient size can no longer be gathered, searched and maintained without powerful computing
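Searching language data is the most basic of these tasks. A minimal keyword-in-context (KWIC) concordance, the classic corpus query, might be sketched as follows; the function name and the sample text are hypothetical, and a real corpus tool would add indexing, annotation layers and a query language.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines with `width` tokens of left/right context."""
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{tok}] {right}")  # align on the keyword
    return lines

sample = ("Sue gave Paul an old penny. Paul gave the penny back, "
          "and Sue kept an old coin instead.")
for line in kwic(sample, "old"):
    print(line)
```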
Empirical Linguistics
• discrete findings
• statistical findings
• replicability
• shared interpretations of data
• connection with data and results
Empirical Linguistics
[Diagram: corpus data, experimental psycholinguistic data and introspective data; DB of relevant data; research]
Driving Forces of CL
• Cognition: models of human language processing
• Engineering: language technology applications
• Linguistics: linguistic theory
Role of Computing in Linguistics
[Diagram: theoretical linguistics vs. applied linguistics; linguistics without the computer vs. linguistics with the computer]
Until 1980
[Diagram: Linguistics and Computational Linguistics]
1980-1990
[Diagram: Linguistics and Computational Linguistics]
1990-2000
[Diagram: Linguistics and Computational Linguistics]
LT Methods
[Diagram: language technology methods arranged as discrete / non-discrete / hybrid against shallow / deep; example shown: HMM-based POS tagger]
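The HMM-based POS tagger is the textbook example of a shallow, non-discrete method. A minimal sketch of Viterbi decoding follows; the tag set, the lexicon and all probabilities are invented for illustration and are not estimated from any corpus.

```python
import math

# Hypothetical toy HMM; a real tagger estimates these from an annotated corpus.
TAGS = ["N", "V", "Det", "A"]
START = {"N": 0.6, "V": 0.1, "Det": 0.25, "A": 0.05}
TRANS = {  # P(tag | previous tag)
    "N":   {"N": 0.2, "V": 0.6, "Det": 0.1, "A": 0.1},
    "V":   {"N": 0.5, "V": 0.05, "Det": 0.35, "A": 0.1},
    "Det": {"N": 0.6, "A": 0.4},
    "A":   {"N": 0.9, "A": 0.1},
}
EMIT = {  # P(word | tag)
    "N":   {"sue": 0.3, "paul": 0.3, "penny": 0.3, "old": 0.1},
    "V":   {"gave": 1.0},
    "Det": {"an": 1.0},
    "A":   {"old": 1.0},
}
FLOOR = 1e-12  # replaces zero probabilities so that log() stays defined

def logp(p):
    return math.log(p if p > 0 else FLOOR)

def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    # best[t] = (log score of the best path ending in tag t, that path)
    best = {t: (logp(START[t]) + logp(EMIT[t].get(words[0], 0)), [t]) for t in TAGS}
    for word in words[1:]:
        best = {
            t: max(
                (score + logp(TRANS[prev].get(t, 0)) + logp(EMIT[t].get(word, 0)),
                 path + [t])
                for prev, (score, path) in best.items()
            )
            for t in TAGS
        }
    return max(best.values())[1]

print(viterbi("sue gave paul an old penny".split()))
```

In this toy lexicon "old" is ambiguous between N and A; the transition probabilities after Det and before "penny" make A the more probable tag.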
LT Methods
[Diagram: language technology methods arranged as discrete / non-discrete / hybrid against shallow / deep; example shown: HPSG parser with MRS (Minimal Recursion Semantics)]
LT Methods
[Diagram: language technology methods arranged as discrete / non-discrete / hybrid against shallow / deep; example shown: PCF (probabilistic context-free) parser]
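A probabilistic context-free parser can be sketched as Viterbi CKY over a grammar in Chomsky normal form. The grammar below is a hypothetical toy covering "Sue gave Paul an old penny"; its rule probabilities are invented, and the helper categories Vbar and Nbar are assumptions introduced only to keep every rule binary.

```python
import math
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (probabilities invented).
BINARY = [  # (parent, left child, right child, probability)
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 0.5),
    ("VP", "Vbar", "NP", 0.5),   # ditransitive: [[gave Paul] [an old penny]]
    ("Vbar", "V", "NP", 1.0),
    ("NP", "Det", "Nbar", 0.3),
    ("Nbar", "A", "N", 1.0),
]
LEXICAL = {  # word -> [(category, probability)]
    "sue": [("NP", 0.35)], "paul": [("NP", 0.35)],
    "gave": [("V", 1.0)], "an": [("Det", 1.0)],
    "old": [("A", 1.0)], "penny": [("N", 1.0)],
}

def cky(words):
    """Viterbi CKY: keep the best (log prob, tree) per span and category."""
    n = len(words)
    chart = defaultdict(dict)  # (i, j) -> {category: (logprob, tree)}
    for i, w in enumerate(words):
        for cat, p in LEXICAL.get(w, []):
            chart[(i, i + 1)][cat] = (math.log(p), (cat, w))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[(i, j)]
            for k in range(i + 1, j):              # split point
                for parent, left, right, p in BINARY:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        lp_left, t_left = chart[(i, k)][left]
                        lp_right, t_right = chart[(k, j)][right]
                        score = math.log(p) + lp_left + lp_right
                        if parent not in cell or score > cell[parent][0]:
                            cell[parent] = (score, (parent, t_left, t_right))
    return chart[(0, n)].get("S")

score, tree = cky("sue gave paul an old penny".split())
print(math.exp(score), tree)
```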
LT Methods
[Diagram: language technology methods arranged as discrete / non-discrete / hybrid against shallow / deep; example shown: syntactic LFG parser with ME (maximum entropy) selection]
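Maximum entropy (ME) selection ranks the analyses that a symbolic parser produces with a log-linear model, P(analysis) = exp(w · f(analysis)) / Z. The sketch below assumes two hypothetical candidate analyses reduced to hand-picked feature counts, with invented weights; in practice the features are read off the parser's output and the weights are estimated from a treebank.

```python
import math

# Hypothetical feature weights; an ME model would learn these from a treebank.
WEIGHTS = {"pp_attaches_to_verb": 0.8, "pp_attaches_to_noun": 0.3, "num_nodes": -0.1}

# Candidate analyses from a symbolic parser, reduced to feature counts (invented).
CANDIDATES = {
    "reading_1": {"pp_attaches_to_verb": 1, "num_nodes": 11},
    "reading_2": {"pp_attaches_to_noun": 1, "num_nodes": 12},
}

def me_distribution(candidates, weights):
    """Log-linear (maximum entropy) model: P(y) = exp(w . f(y)) / Z."""
    scores = {
        name: sum(weights.get(f, 0.0) * v for f, v in feats.items())
        for name, feats in candidates.items()
    }
    z = sum(math.exp(s) for s in scores.values())   # normalizing constant
    return {name: math.exp(s) / z for name, s in scores.items()}

probs = me_distribution(CANDIDATES, WEIGHTS)
best = max(probs, key=probs.get)
print(best, {k: round(v, 3) for k, v in probs.items()})
```

The symbolic grammar still decides which analyses exist; the statistical model only ranks them, which is one way discrete and non-discrete methods combine in a hybrid system.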
LT Methods (Trends)
[Diagram: language technology methods arranged as discrete / non-discrete / hybrid against shallow / deep, showing trends]
Simulation and Modelling
[Figure: phrase-structure tree for "Sue gave Paul an old penny." with nodes S, NP, VP, V, NP, Det, A, N]
Feature structure of the NP "an old penny":
  PHON  /an old penny/
  SYN   CAT      NP
        HEAD     CASE    objective
                 NUMBER  sing
                 PERSON  third
        VALENCE  vstruc
  SEM   QUANT    exist
        VAR      X1
        RESTR    REL  old'
                 VAR  X1
                 ARG  penny'
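Feature structures like this one are combined by unification. A minimal sketch, assuming feature structures are encoded as nested Python dicts with atomic string values; it deliberately omits structure sharing (reentrancy) and the type hierarchy, both of which real HPSG systems need. The verb requirements below are hypothetical.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts or atoms.

    Returns the merged structure, or None if the structures clash.
    Simplification: no structure sharing (reentrancy) and no types.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)                      # copy, so inputs are not modified
        for feat, b_val in b.items():
            if feat in result:
                merged = unify(result[feat], b_val)
                if merged is None:
                    return None               # clash somewhere below this feature
                result[feat] = merged
            else:
                result[feat] = b_val          # feature only present in b
        return result
    return a if a == b else None              # atoms must be identical

# HEAD values of the NP "an old penny", as in the feature structure above
np_head = {"CASE": "objective", "NUMBER": "sing", "PERSON": "third"}
# Hypothetical requirements a verb might place on its object
obj_slot = {"CASE": "objective", "NUMBER": "sing"}
plural_slot = {"NUMBER": "plur"}

print(unify(np_head, obj_slot))     # compatible: returns the merged structure
print(unify(np_head, plural_slot))  # NUMBER clash: returns None
```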
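[Figure: phrase-structure trees for the German sentence "Sue gab Paul einen alten Pfennig." ("Sue gave Paul an old penny."; nodes S, S/NP, NP, V, Det, A, N) and for the English sentence "Sue gave Paul an old penny." (nodes S, NP, VP, V, Det, A, N), followed by their logical form]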
$\exists x[(old'(penny'))(x) \wedge Past(give'(sue', paul', x))]$
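The logical form ∃x[(old'(penny'))(x) ∧ Past(give'(sue', paul', x))] can be checked against a model. The sketch below assumes a hypothetical toy model (a small domain, a set of pennies, a set of old things and a record of past giving events) and treats old'(penny') as simple set intersection, an intersective simplification of the predicate modifier.

```python
# Hypothetical toy model; individuals are plain strings.
DOMAIN = {"sue", "paul", "p1", "p2", "book1"}
PENNY = {"p1", "p2"}
OLD = {"p1", "book1"}
PAST_GIVE = {("sue", "paul", "p1")}  # past events: (giver, recipient, theme)

def old_(pred):
    """old' as a predicate modifier, simplified here to intersection with OLD."""
    return pred & OLD

def past_give(giver, recipient, theme):
    """Past(give'(giver, recipient, theme)): true iff such an event is recorded."""
    return (giver, recipient, theme) in PAST_GIVE

def sentence_true():
    # ∃x [ (old'(penny'))(x) ∧ Past(give'(sue', paul', x)) ]
    return any(x in old_(PENNY) and past_give("sue", "paul", x) for x in DOMAIN)

print(sentence_true())
```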
Applications
• Machine translation, e.g. Systran, Logos, METAL-Comprendium, IBM PT
• Access to databases, e.g. Core Language Engine
• New: information extraction and text enrichment, e.g. WHITEBOARD, DEEP THOUGHT
Problems with Deep Analysis
• Coverage (development time)
• Robustness (coping with out-of-grammar input)
• Efficiency (runtime and space efficiency)
• Specificity (selection among readings)
Outlook
• Linguistics will develop hybrid discrete and non-discrete models of language
• More subareas of linguistics will employ computational modelling
• Computational linguistics will play a central role in the empirical branch of linguistic research
• Computational linguistics methods and results do have a future in language technology
• Language technology will have to go more deeply into semantics
• The field poses some grand challenges
Grand Challenges
• hybrid models of language processing and learning
• models of language change
• empirical methodology of language science: large multilevel, linguistically interpreted data collections
• ambient computing: ubiquitous natural access to information and assistance
• turning the WWW as well as personal and collective digital information repositories into digital memories and knowledge bases