Automatic Assignment of
Domain Labels to WordNet
Mauro Castillo V.
Francis Real V.
German Rigau C.
GWC 2004
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
Outline
•Introduction
•WordNet
•WN Domains
•Experimentation
•Evaluation and results
•Discussion
•Conclusions
Introduction
• To semantically enrich any WN version with the semantic domain labels of MultiWordNet Domains
• WN is a standard resource for semantic processing
• Effectiveness of Word Domain Disambiguation
• The work presented explores the automatic and systematic assignment of domain labels to glosses
• The proposed method can be used to correct and verify the suggested labeling
WordNet
• The version WN1.6 was used because of the availability of WN Domains
WN Domains
TOP
pure_science
biology
botany
zoology
entomology
anatomy
mathematics
geometry
statistics
... ... ...
WordNet Domain hierarchy
developed at IRST (Magnini and Cavagliá, 2000)
WN Domains
• The synsets have been annotated semiautomatically with one or more labels
• Most synsets have a single label
Distribution of domain labels per synset
#    noun    verb    adj     adv    %
1    56458   11287   16681   3460   88.2020
2    8104    743     1113    109    10.1050
3    1251    88      113     6      1.4632
4    210     8       8       0      0.2268
5    2       1       0       0      0.0030
Average number of labels per synset
noun = 1.170, verb = 1.078, adj = 1.076, adv = 1.033
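The per-category averages can be recomputed from the distribution of labels per synset. The following is a minimal sketch, assuming the counts are read row by row from the distribution table; the names `label_counts` and `average_labels` are illustrative and not part of the original work.

```python
# Number of synsets with 1, 2, 3, 4 and 5 domain labels, per part of speech
# (figures copied from the distribution table on this slide).
label_counts = {
    "noun": [56458, 8104, 1251, 210, 2],
    "verb": [11287, 743, 88, 8, 1],
    "adj": [16681, 1113, 113, 8, 0],
    "adv": [3460, 109, 6, 0, 0],
}


def average_labels(counts):
    """Weighted mean of the number of labels per synset (position 0 = 1 label)."""
    total_synsets = sum(counts)
    total_labels = sum(n * c for n, c in enumerate(counts, start=1))
    return total_labels / total_synsets


averages = {pos: average_labels(counts) for pos, counts in label_counts.items()}
for pos, avg in averages.items():
    print(f"{pos}: {avg:.4f}")
```

The computed values agree with the averages reported on the slide to within 0.001.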
WN Domains
• A domain may include synsets of different syntactic categories, e.g. MEDICINE
doctor#1 (n)
operate#7 (v)
medical#1 (a)
clinically#1 (r)
• A domain label may also contain senses from different WN subhierarchies, e.g. SPORT
athlete#1           life-form#1
game-equipment#1    physical-object#1
sport#1             act#2
playing-field#1     location#1
WN Domains
• Synsets that have more than one label do not seem to follow any pattern
• sultana#n#1 (pale yellow seedless grape used for raisins and wine): Botany, Gastronomy
• morocco#n#2 (a soft pebble-grained leather made from goatskin; used for shoes and book bindings etc.): Anatomy, Zoology
• canicola_fever#n#1 (an acute feverish disease in people and in dogs marked by gastroenteritis and mild jaundice): Medicine, Physiology, Zoology
• blue#n#1, blueness#n#1 (the color of the clear sky in the daytime; "he had eyes of bright blue"): Color, Quality
WN Domains
• FACTOTUM : Used to mark the senses of WN that do not have a specific domain
• STOP Senses: The synsets that appear frequently in different contexts, for instance: numbers, colours, etc.
Applications of WN Domains
• Word Sense Disambiguation
• Word Domain Disambiguation
• Text Categorization, etc.
Experimentation
• Process to automatically assign domain labels to WN1.6 glosses
• Validation procedures for the consistency of the domain assignment in WN1.6 and, especially, the automatic assignment of the factotum labels
Distribution of synsets with and without the domain label factotum in WN1.6
POS    FAC     no FAC   %FAC
noun   66025   58252    11.77
verb   12127   4425     63.51
adj    17915   6910     61.42
adv    3575    1039     70.93
Experimentation
The test set was randomly selected (around 1%) and the remaining synsets were used as the training set
Test corpus for nouns and verbs
POS    FAC   no FAC   %FAC
noun   572   647      11.90
verb   43    121      60.33
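A random hold-out of roughly 1% can be sketched as follows. This is only an illustration: the function name `split_synsets`, the fixed seed, and sampling from a plain list of identifiers are assumptions, and the slides do not say how the sampling was actually carried out.

```python
import random


def split_synsets(synset_ids, test_fraction=0.01, seed=0):
    """Randomly hold out a fraction of synsets for testing; the rest is training."""
    rng = random.Random(seed)          # fixed seed only to make the sketch repeatable
    shuffled = list(synset_ids)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical identifiers standing in for the 66025 noun synsets of WN1.6.
train_ids, test_ids = split_synsets(range(66025))
print(len(train_ids), len(test_ids))
```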
Experimentation
castle#n#4, castling#n#1 CHESS SPORT
castle castling | interchanging the positions of the king and a rook
castle chess
castle sport
castling chess
castling sport
interchanging chess
interchanging sport
king chess
king sport
rook chess
rook sport
Calculation of frequency
castle chess 68
castle sport 27
castle history 18
castle architecture 57
castle law 12
castle tourism 24
…
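The pairing and counting step illustrated with castle#n#4 can be sketched as below. This is a minimal sketch under assumptions: the whitespace tokenization, the small stop-word list and the name `word_domain_pairs` are hypothetical, since the slides do not describe the preprocessing.

```python
from collections import Counter
from itertools import product

# Hypothetical stop-word list; the actual filtering used is not given in the slides.
STOP_WORDS = {"the", "of", "and", "a"}


def word_domain_pairs(variants, gloss, domains):
    """Pair every variant and every content word of the gloss with every domain label."""
    words = [w.lower() for w in variants + gloss.split() if w.lower() not in STOP_WORDS]
    return list(product(words, domains))


# One training synset, taken from the castle#n#4 example.
training = [
    (["castle", "castling"],
     "interchanging the positions of the king and a rook",
     ["chess", "sport"]),
]

pair_counts = Counter()
for variants, gloss, domains in training:
    pair_counts.update(word_domain_pairs(variants, gloss, domains))

for (word, domain), count in pair_counts.items():
    print(word, domain, count)
```

Run over the whole training set, counts of this kind give the co-occurrence frequencies c(w,D) of the sort listed for "castle".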
Experimentation
Measures
M1: Square root formula
$$\frac{c(w,D) - \frac{1}{N}\,c(w)\,c(D)}{\sqrt{c(w,D)}}$$
M2: Association Ratio
$$Ar(w,D) = \Pr(w \mid D)\,\log_2\frac{\Pr(w \mid D)}{\Pr(w)}$$
M3: Logarithm formula
$$\log_2\frac{N\,c(w,D)}{c(w)\,c(D)}$$
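The three measures can be written as small functions of the counts. This is a hedged sketch, not the authors' code: it assumes that the denominator of M1 is the square root of c(w,D) (as the name "square root formula" suggests), that Pr(w|D) = c(w,D)/c(D) and Pr(w) = c(w)/N are estimated by relative frequency, and that N is the total number of counted pairs. The function names and the toy counts are illustrative.

```python
import math


def m1_square_root(c_wd, c_w, c_d, n):
    """M1: (c(w,D) - c(w)c(D)/N) / sqrt(c(w,D)); defined as 0 when c(w,D) = 0."""
    if c_wd == 0:
        return 0.0
    return (c_wd - c_w * c_d / n) / math.sqrt(c_wd)


def m2_association_ratio(c_wd, c_w, c_d, n):
    """M2: Pr(w|D) * log2(Pr(w|D) / Pr(w)), with relative-frequency estimates."""
    if c_wd == 0:
        return 0.0
    p_w_given_d = c_wd / c_d
    p_w = c_w / n
    return p_w_given_d * math.log2(p_w_given_d / p_w)


def m3_logarithm(c_wd, c_w, c_d, n):
    """M3: log2(N * c(w,D) / (c(w) * c(D))); minus infinity when c(w,D) = 0."""
    if c_wd == 0:
        return float("-inf")
    return math.log2(n * c_wd / (c_w * c_d))


# Toy counts (hypothetical): a word seen 10 times, a domain seen 100 times,
# 1000 pairs in total, and 1 co-occurrence, which is exactly the expected count.
independent = (1, 10, 100, 1000)
print(m1_square_root(*independent), m2_association_ratio(*independent), m3_logarithm(*independent))

# A word that co-occurs with the domain more often than expected scores above zero.
associated = (8, 10, 100, 1000)
print(m1_square_root(*associated), m2_association_ratio(*associated), m3_logarithm(*associated))
```

All three measures are zero when the observed co-occurrence equals the count expected under independence, and positive when the word is over-represented in the domain.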
Experimentation
Process stages: TRAINING, CALCULATION, MATRIX OF WEIGHTS, VALIDATION
Example rows of the matrix of weights (word "orange"):
orange botany 10.1739451057135
orange gastronomy 4.98225066954225
orange color 3.28232334801756
orange jewellery 1.49369255002054
orange entomology 1.23243498322359
orange quality 1.17822271128967
orange hunting 0.412524764820793
orange geology 0.293707167933641
orange chemistry 0.166183492890361
orange biology 0.110492358490017
Experimentation
variant | gloss
leader | a person who rules or guides or inspires others
06950891 leader#n#1 PERSON
VD = weight(wi,dj) * percentage
Domain weights shown for the individual words:
politics 4.30, history 3.33, religion 2.19, person 1.78, mythology 1.17, commerce 1.11
person 19.94, law 8.01, economy 4.74, religion 4.24, anthropology 3.74, sexuality 3.53, politics 3.49
law 2.70, factotum 2.09, computer_science 2.05, mathematics 1.83, grammar 1.68, play 1.57, linguistics 1.54, politics 1.35
tourism 1.64, industry 1.54, person 1.46, mechanics 1.26, factotum 1.24, occultism 0.98, pedagogy 0.93
psychology 0.96, factotum 0.82
Resulting ranking:
POSITION 1: person = 30.23
POSITION 2: politics = 13.40
POSITION 3: law = 11.08
......
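The combination step VD = weight(wi,dj) * percentage can be sketched as a weighted sum per domain followed by a sort. This is only a sketch: the weights and percentages below are toy values chosen for illustration, not the figures on the slide, and treating the percentage as a per-word multiplier is an assumption, as are the names `rank_domains`, `word_weights` and `percentages`.

```python
from collections import defaultdict


def rank_domains(word_weights, percentages):
    """Sum weight(w, d) * percentage(w) over all words and rank domains by score."""
    scores = defaultdict(float)
    for word, weights in word_weights.items():
        pct = percentages.get(word, 1.0)   # words without an entry count once
        for domain, weight in weights.items():
            scores[domain] += weight * pct
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy weights (hypothetical) for a variant ("leader") and two gloss words.
word_weights = {
    "leader": {"politics": 4.0, "person": 2.0},
    "person": {"person": 20.0, "law": 8.0},
    "rules": {"law": 3.0, "politics": 1.0},
}
# Hypothetical percentages: the variant counts twice as much as a gloss word.
percentages = {"leader": 2.0, "person": 1.0, "rules": 1.0}

ranking = rank_domains(word_weights, percentages)
for position, (domain, score) in enumerate(ranking, start=1):
    print(f"POSITION {position}: {domain} = {score:.2f}")
```

The top positions of such a ranking play the role of the proposed labels, ordered by priority.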
Evaluation and Results: nouns
Results for nouns with factotum CF
N     AP      AT      P       R       F1
M1A   70.94   79.75   64.74   68.25   66.45
M1D   74.50   84.85   68.88   72.62   70.70
M2A   45.75   50.39   42.73   43.12   42.92
M2D   52.09   57.50   48.75   49.21   48.98
M3A   66.77   74.50   60.86   63.76   62.27
M3D   71.56   81.45   66.54   69.71   68.09
AP: Accuracy first label
AT: Accuracy all labels
P : Precision
R : Recall
F1 : 2PR/(P+R)
MiA : Measures the success of each formula (M1, M2 or M3) when the first proposed label is correct
MiD : Measures the success of each formula (M1, M2 or M3) when the first proposed label is correct (or is subsumed by a correct one in the domain hierarchy).
Results for nouns without factotum SF
N     AP      AT      P       R       F1
M1A   73.95   81.82   66.81   68.68   67.73
M1D   78.50   87.24   71.24   73.24   72.23
M2A   52.45   57.52   49.32   48.24   48.77
M2D   59.44   65.21   55.94   54.71   55.32
M3A   74.48   82.69   68.41   69.41   68.91
M3D   78.85   88.64   73.33   74.41   73.87
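The F1 column follows from the P and R columns through F1 = 2PR/(P+R). A short check on three rows of the noun results with factotum (CF) is sketched below; the names `f1_score` and `cf_rows` are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) for three rows of the noun results with factotum (CF).
cf_rows = {
    "M1A": (64.74, 68.25, 66.45),
    "M1D": (68.88, 72.62, 70.70),
    "M3D": (66.54, 69.71, 68.09),
}

for name, (p, r, reported) in cf_rows.items():
    print(f"{name}: computed {f1_score(p, r):.2f}, reported {reported:.2f}")
```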
Evaluation and Results: verbs
Results for verbs with factotum CF
V     AP      AT      P       R       F1
M1A   51.24   57.02   47.26   50.74   48.94
M1D   51.24   57.02   47.26   50.74   48.94
M2A   13.22   14.88   12.68   13.24   12.95
M2D   16.53   19.83   16.90   17.65   17.27
M3A   23.14   28.10   21.94   25.00   23.37
M3D   24.79   29.75   23.23   26.47   24.74
Results for verbs without factotum SF
V     AP      AT      P       R       F1
M1A   69.77   76.74   64.71   55.93   60.00
M1D   74.72   83.72   69.23   61.02   64.86
M2A   20.93   25.58   19.64   18.64   19.13
M2D   41.86   51.16   38.60   37.29   37.93
M3A   41.86   55.81   39.34   40.68   40.00
M3D   53.49   67.44   46.77   49.15   47.93
Evaluation and Results
• On average, the method assigns (WN Domains average in parentheses):
Noun: 1.23 domain labels (1.170)
Verb: 1.20 domain labels (1.078)
• We obtain better results with nouns
• The best average results were obtained with the M1 measure
• The first proposed label (noun): 70% accuracy
• The results for verbs are worse than for nouns; one reason may be the high number of verbal synsets labeled with the factotum domain
Discussion
Monosemic words:
credit application#n#1 (an application for a line of credit)
Domains: SCHOOL
Proposal 1. Banking
Proposal 2. Economy Banking
economy
banking
Discussion
Relation between labels:
Academic_program#n#1 (a program of education in liberal arts and sciences (usually in preparation for higher education))
Domains: PEDAGOGY
Proposal 1. School
Proposal 2. University
pedagogy
school university
Discussion
Relation between labels:
shopping#n#1 (searching for or buying goods or services: "went shopping for a reliable plumber"; "does her shopping at the mall rather than down town")
Domains: ECONOMY
Proposal 1. Commerce
social_science
commerce economy
Discussion
Relation between labels:
Fire_control_radar#n#1 (radar that controls the delivery of fire on a military target)
Domains: MERCHANT_NAVY
Proposal 1. Military
social_science
transport
merchant_navy
military
Discussion
Uncertain cases:
birthmark#n#1 (a blemish on the skin formed before birth)
Domains: QUALITY
Proposal 1. Medicine
bardolatry#n#1 (idolization of William Shakespeare)
Domains: RELIGION
Proposal 1. History
Proposal 2. Literature
Conclusions
• Automatically assigning domain labels to WN glosses appears to be a difficult task
• The proposed process is very reliable for the first proposed labels
• The proposed labels are ordered by priority
• It is possible to add new correct labels or validate the old ones
Discussion
Relations WN:
bowling#n#2 (a game in which balls are rolled at an object or group of objects with the aim of knocking them over)
Domains: BOWLING
Proposal 1. Play
[Diagram: domain hierarchy nodes free_time, play, sport, bowling; WN synsets game#n#2, bowling#n#2, play#n#16; relation labels hyp, hol]
WN Domains
• Example (B. Magnini et al., 2001)
Bank
Sense   Synset & Gloss   Domains
#1   Depository financial institution, bank, banking concern, banking company (a financial institution…)   ECONOMY
#2   Bank (sloping land…)   GEOGRAPHY, GEOLOGY
#3   Bank (a supply or stock held in reserve…)   ECONOMY
#4   Bank, bank building (a building…)   ARCHITECTURE, ECONOMY
#5   Bank (an arrangement of similar objects…)   FACTOTUM
#6   Savings bank, coin bank, money box, bank (a container…)   ECONOMY
#7   Bank (a long ridge or pile…)   GEOGRAPHY, GEOLOGY
#8   Bank (the funds held by a gambling house…)   ECONOMY, PLAY
#9   Bank, cant, camber (a slope in the turn of a road…)   ARCHITECTURE
#10  Bank (a flight maneuver…)   TRANSPORT
WN Domains
N     SF           DOMAINS                 SUMO               TOP ONTOLOGY
#1    Group        Economy                 Corporation        Function Group Human
#2    Object       Geography Geology       Land-area          Natural Place Substance
#3    Possession   Economy                 Keeping            Function Moneyrepresentation Part
#4    Artifact     Architecture Economy    Building           Artifact Function Object
#5    Group        Factotum                Collection         Group
#6    Artifact     Economy                 Artifact           Artifact Container Instrument Object
#7    Object       Geography Geology       Land-area          Natural Place Solid Substance
#8    Possession   Economy Play            Currency-measure   Function
#9    Object       Architecture            Land-area          Natural Place Substance
#10   Act          Transport               Motion             Agentive Boundedevent Cause Condition Dynamic Purpose