HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)
description
Transcript of HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)
![Page 1: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/1.jpg)
1(21)
HLT, Data Sparsity and Semantic Tagging
Louise Guthrie (University of Sheffield)Roberto Basili (University of Tor Vergata, Rome)
Hamish Cunningham (University of Sheffield)
![Page 2: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/2.jpg)
2(21)
Outline
– A ubiquitous problem: data sparsity
– The approach:• coarse-grained semantic tagging• learning by combining multiple evidence
– The evaluation: intrinsic and extrinsic measures
– The expected outcomes: architectures, tools, development support
![Page 3: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/3.jpg)
3(21)
Applications
PresentWe’ve seen growing interest in a range of HLT tasks:
e.g. IE, MT
Trends– Fully portable IE, unsupervised learning– Content Extraction vs. IE
![Page 4: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/4.jpg)
4(21)
Data Sparsity
• Language Processing depends on a model of the features important to an application.– MT - Trigrams and frequencies– Extraction - Word patterns
• New texts always seem to have lots of phenomena we haven’t seen before
![Page 5: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/5.jpg)
5(21)
Different kinds of patterns
Person was appointed as post of company
Company named person to post
• Almost all extraction systems tried to find patterns of mixed words and entities.– People, Locations, Organizations, dates,
times, currencies
![Page 6: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/6.jpg)
6(21)
Can we do more?
Astronauts aboard the space shuttle Endeavor were forced to dodge a derelict Air Force satellite Friday
Humans aboard space_vehicle dodge satellite timeref.
![Page 7: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/7.jpg)
7(21)
Could we know these are the same?
The IRA bombed a family owned shop in Belfast yesterday.
FMLN set off a series of explosions in central Bogota today.
ORGANIZATION ATTACKED LOCATION DATE
![Page 8: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/8.jpg)
8(21)
Machine translation
• Ambiguity of words often means that a word can translate several ways.
• Would knowing the semantic class of a word, help us to know the translation?
![Page 9: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/9.jpg)
9(21)
Sometimes . . .
• Crane the bird vs crane the machine
• Bat the animal vs bat for cricket and baseball
• Seal on a letter vs the animal
![Page 10: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/10.jpg)
10(21)
SO ..
P(translation(crane) = grulla | animal) > P(translation(crane) = grulla)
P(translation(crane) = grua | machine) > P(translation(crane) = grua)
Can we show the overall effect lowers entropy?
![Page 11: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/11.jpg)
11(21)
Language Modeling – Data Sparseness again ..
• We need to estimate Pr (w3 | w1 w2)
• If we have never seen w1w2 w3 before
• Can we instead develop a model and estimate Pr (w3 | C1 C2) or Pr (C3 | C1 C2)
![Page 12: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/12.jpg)
12(21)
A Semantic Tagging technology. How?
• We will exploit similarity with NE tagging, ...– Development of pattern matching rules as
incremental wrapper induction
• ... with semantic (sense) disambiguation– Use as much evidence as possible– Exploit existing resources like MRD or LKBs
• ... and with machine learning tasks– Generalize from positive examples in training data
![Page 13: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/13.jpg)
13(21)
Multiple Sources of Evidence
• Lexical information (priming effects)
• Distributional information from general and training texts
• Syntactic features– SVO patterns or Adjectival modifiers
• Semantic features– Structural information in LKBs– (LKB-based) similarity measures
![Page 14: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/14.jpg)
14(21)
Machine Learning for ST
• Similarity estimation– among contexts (texts overlaps, …)– among lexical items wrt MRD/LKBs
• We will experiment– Decision tree learning (e.g. C4.5)– Support Vector Machines (e.g. SVM light)– Memory-based Learning (TiMBL)– Bayesian learning
![Page 15: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/15.jpg)
15(21)
What’s New?
• Granularity– Semantic categories are coarser than word senses
(cfr. homograph level in MRD)
• Integration of existing ML methods– Pattern induction is combined with probabilistic
description of word semantic classes
• Co-training– Annotated data are used to drive the sampling of
further evidence from unannotated material (active learning)
![Page 16: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/16.jpg)
16(21)
How we know what we’ve done: measurement, the corpus
Hand-annotated corpus - from the BNC, 100-million word balanced corpus- 1 million words annotated - a little under ½ million categorised noun phrases
Extrinsic evaluationPerplexity of lexical choice in Machine Translation
Intrinsic evaluationStandard measures or precision, recall, false positives(baseline: tag with most common category = 33%)
![Page 17: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/17.jpg)
17(21)
Ambiguity levels in the training data
distribution of NPs to categories
0.0%
5.0%
10.0%
15.0%
20.0%
25.0%
30.0%
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
number of senses
per
cen
tag
e o
f N
Ps
NPs by semantic categories:
0 104824 23.1%1 119228 26.3%2 96852 21.4%3 44385 9.8%4 35671 7.9%5 15499 3.4%6 13555 3.0%7 7635 1.7%8 6000 1.3%9 2191 0.5%10 3920 0.9%11 1028 0.2%12 606 0.1%13 183 0.0%14 450 0.1%15 919 0.2%17 414 0.1%
Total NPs (interim) 453360
![Page 18: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/18.jpg)
18(21)
Maximising project outputs:software infrastructure for HLT
Three outputs from the project:
1. A new resource Automatical annotation of the whole corpus
2. Experimental evidence re. 1.- how accurate the final results are- how accurate the various methods employed are
3. Component tools for doing 1., based on GATE(a General Architecture for Text Engineering)
![Page 19: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/19.jpg)
19(21)
What is GATE?• An architectureA macro-level organisational picture for LE software systems. • A frameworkFor programmers, GATE is an object-oriented class library that implements the architecture. • A development environmentFor language engineers, computational linguists et al, GATE is a graphical development environment bundled with a set of tools for doing e.g. Information Extraction. • Some free components... ...and wrappers for other people's components • Tools for: evaluation; visualise/edit; persistence; IR; IE; dialogue; ontologies; etc.• Free software (LGPL). Download at http://gate.ac.uk/download/
![Page 20: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/20.jpg)
20(21)
Where did GATE come from?
A number of researchers realised in the early- mid-1990s (e.g. in TIPSTER):
• Increasing trend towards multi-site collaborative projects• Role of engineering in scalable, reusable, and portable HLT solutions• Support for large data, in multiple media, languages, formats, and
locations• Lower the cost of creation of new language processing components • Promote quantitative evaluation metrics via tools and a level playing field
History:
• 1996 – 2002: GATE version 1, proof of concept
• March 2002: version 2, rewritten in Java, component based, LGPL, more
users
• Fall 2003: new development cycle
![Page 21: HLT, Data Sparsity and Semantic Tagging Louise Guthrie (University of Sheffield)](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813adb550346895da32533/html5/thumbnails/21.jpg)
21(21)
Role of GATE in the project• Productivity
- reuse some baseline components for simple tasks- development environment support for implementors (MATLAB for HLT?)- reduce integration overhead (standard interfaces between components)- system takes care of persistency, visualisation, multilingual edit, ...
• Quantification- tool support for metrics generation - visualisation of key/response differences- regression test tool for nightly progress verification
• Repeatability- open source supported, maintained, documented software- cross-platform (Linux, Windows, Solaris, others)- easy install and proven useability (thousands of people, hundreds of sites)- mobile code if you write in Java; web services otherwise