The Descent of Hierarchy, and Selection in Relational Semantics*

Barbara Rosario, Marti Hearst, Charles Fillmore

UC Berkeley

*with apologies to Charles Darwin

Noun Compounds (NCs)

Technical text is rich with NCs:

Open-labeled long-term study of the subcutaneous sumatriptan efficacy and tolerability in acute migraine treatment.

An NC is any sequence of nouns that itself functions as a noun, e.g., asthma hospitalizations, health care personnel, hand wash.

NCs: 3 computational tasks

1. Identification
2. Syntactic analysis (attachments): [Baseline [headache frequency]], [[Tension headache] patient]
3. Semantic analysis (our goal):
   headache treatment: treatment for headache
   corticosteroid treatment: treatment that uses corticosteroid

Descent of Hierarchy

Idea: Use the top levels of a lexical hierarchy to identify semantic relations.

Hypothesis: A particular semantic relation holds between all two-word NCs that can be categorized by a lexical category pair.

Outline
- Related work
- Linguistic motivation
- Lexical hierarchy: MeSH
- Labeling NC relations: method and results
- Discussion of ambiguity

Related work (Semantic analysis of NCs)

Rule-based
- Finin (1980): detailed AI analysis, hand-coded
- Vanderwende (1994): automatically extracts semantic information from an online dictionary and manipulates a set of handwritten rules; 13 classes, 52% accuracy

Probabilistic
- Lauer (1995): probabilistic model; 8 classes, 47% accuracy
- Lapata (2000): classifies nominalizations into subject/object; 2 classes, 80% accuracy

Related work (Semantic analysis of NCs)

Lexical hierarchy
- Barrett et al. (2001): WordNet; heuristics to classify an NC given its similarity to a known NC
- Rosario and Hearst (2001): MeSH, neural network; 18 classes, 60% accuracy

In these approaches, the relations are pre-defined.

Linguistic Motivation

- The semantics of NCs rests on the head-modifier relationship.
- The head noun has argument structure.
- The meaning of the head noun determines what kinds of things can be done to it, what it is made of, what it is a part of…

Linguistic Motivation (cont.)

Material + Cutlery (made of): steel knife, plastic fork, wooden spoon
Food + Cutlery (used on): meat knife, dessert spoon, salad fork
Profession + Cutlery (used by): chef's knife, butcher's knife

The Lexical Hierarchy: MeSH

Tree Structures:
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]

Descending the Hierarchy

Anatomy [A]
  Body Regions [A01] (homogeneous)
    Abdomen [A01.047]
    Back [A01.176]
    Breast [A01.236]
    Extremities [A01.378]
    Head [A01.456]
    Neck [A01.598]
    …
  Musculoskeletal System [A02]
  Digestive System [A03]
  Respiratory System [A04]
  Urogenital System [A05]
  …
[B] … [G]
Physical Sciences [H] (heterogeneous)
  Electronics
    Amplifiers
    Electronics, Medical
    Transducers
  Astronomy
  Nature
  Time
  Weights and Measures
    Calibration
    Metric System
    Reference Standard
  …
[I] … [M]

Mapping Nouns to MeSH Concepts

headache recurrence: C23.888.592.612.441 C23.550.291.937
headache pain: C23.888.592.612.441 G11.561.796.444
breast cancer cells: A01.236 C04 A11

Levels of Description

headache pain

Level 0: C23 G11
Level 1: C23.888 G11.561
Level 2: C23.888.592 G11.561.796
…
Original: C23.888.592.612.441 G11.561.796.444
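Describing an NC at a given level amounts to keeping a prefix of each noun's dot-separated tree number. A minimal Python sketch, assuming level 0 is the first segment (C23, G11) as in the headache pain example; `truncate` and `describe_nc` are hypothetical names, not part of any MeSH tool:

```python
def truncate(tree_number: str, level: int) -> str:
    """Keep the first level+1 dot-separated segments of a MeSH tree number."""
    parts = tree_number.split(".")
    # Slicing past the end simply returns the full code
    return ".".join(parts[: level + 1])


def describe_nc(codes: list[str], level: int) -> tuple[str, ...]:
    """Describe a noun compound by the truncated codes of its nouns."""
    return tuple(truncate(code, level) for code in codes)


headache_pain = ["C23.888.592.612.441", "G11.561.796.444"]
for level in range(3):
    print(f"Level {level}:", " ".join(describe_nc(headache_pain, level)))
# Level 0: C23 G11
# Level 1: C23.888 G11.561
# Level 2: C23.888.592 G11.561.796
```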

Descent of Hierarchy

Idea: Words falling in homogeneous MeSH subhierarchies behave “similarly” with respect to relation assignment.

Hypothesis: A particular semantic relation holds between all two-word NCs that can be categorized by a MeSH category pair.

Grouping the NCs by category pair (CP)

CP: A02 C04 (Musculoskeletal System, Neoplasms): skull tumors, bone cysts, bone metastases, skull osteosarcoma…

CP: C04 M01 (Neoplasms, Person): leukemia survivor, lymphoma patients, cancer physician, cancer nurses…
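Grouping NCs by CP can be sketched in a few lines of Python. This is a toy example: the lexicon below is hypothetical and holds only the level-0 codes implied by the two CPs on this slide, whereas a real mapping would give full MeSH tree numbers, possibly several per noun.

```python
from collections import defaultdict

# Hypothetical toy lexicon: noun -> level-0 MeSH code, read off the two CPs above.
LEXICON = {
    "skull": "A02", "bone": "A02",
    "tumors": "C04", "cysts": "C04", "metastases": "C04", "osteosarcoma": "C04",
    "leukemia": "C04", "lymphoma": "C04", "cancer": "C04",
    "survivor": "M01", "patients": "M01", "physician": "M01", "nurses": "M01",
}


def group_by_cp(ncs):
    """Group two-word NCs by the (modifier, head) category pair."""
    groups = defaultdict(list)
    for nc in ncs:
        modifier, head = nc.split()
        if modifier in LEXICON and head in LEXICON:  # skip NCs with unmapped nouns
            groups[(LEXICON[modifier], LEXICON[head])].append(nc)
    return dict(groups)


ncs = ["skull tumors", "bone cysts", "bone metastases", "skull osteosarcoma",
       "leukemia survivor", "lymphoma patients", "cancer physician", "cancer nurses"]
for cp, members in group_by_cp(ncs).items():
    print(" ".join(cp), members)
```

Each key is a CP and each value is the list of NCs it groups; the collection described next keeps only CPs with enough unique NCs.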

[Figure: Distribution of Category Pairs]

Collection
- ~70,000 NCs extracted from titles and abstracts of Medline
- 2,627 CPs at level 0 (with at least 10 unique NCs)

We analyzed:
- 250 CPs with Anatomy (A)
- 21 CPs with Natural Science (H01)
- 3 CPs with Neoplasm (C04)

This represents 10% of total CPs and 20% of total NCs.
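As a quick arithmetic check of the CP share, using only the counts on this slide:

```latex
\frac{250 + 21 + 3}{2627} = \frac{274}{2627} \approx 0.104 \approx 10\%
```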

Classification Method

For each CP:
- Divide its NCs into “training” and “testing” sets.
- “Training”: inspect the NCs by hand.
  - Start from level 0.
  - While the NCs are not all similar, descend one level of the hierarchy; repeat until all NCs for that CP are similar.
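The training loop can be sketched as a recursive refinement. This is a hedged sketch, not the authors' procedure: the hand inspection is replaced by a hypothetical `all_similar` check against a made-up relation labeling, the lexicon holds codes only as deep as the slides show, and only the head noun is refined, as in the C04 M01 example on the following slides.

```python
from collections import defaultdict

# Hypothetical toy lexicon: MeSH codes only as deep as the slides show them.
CODES = {
    "leukemia": "C04", "lymphoma": "C04", "cancer": "C04",
    "survivor": "M01.643", "patients": "M01.643",   # Patients
    "physician": "M01.526", "nurses": "M01.526",    # Occupational Groups
}

# Stand-in for inspecting NCs by hand: a hypothetical relation label per NC.
RELATION = {
    ("leukemia", "survivor"): "Person afflicted by Disease",
    ("lymphoma", "patients"): "Person afflicted by Disease",
    ("cancer", "physician"): "Person who treats Disease",
    ("cancer", "nurses"): "Person who treats Disease",
}


def truncate(code, level):
    """Keep the first level+1 dot-separated segments of a tree number."""
    return ".".join(code.split(".")[: level + 1])


def cp_of(nc, levels):
    """Category pair of a (modifier, head) NC at (modifier_level, head_level)."""
    return tuple(truncate(CODES[word], lv) for word, lv in zip(nc, levels))


def all_similar(ncs):
    """True if every NC in the group carries the same relation label."""
    return len({RELATION[nc] for nc in ncs}) == 1


def descend(ncs, levels=(0, 0), decisions=None):
    """Record the group's CP; if its NCs differ, refine the head noun one level."""
    if decisions is None:
        decisions = []
    decisions.append(cp_of(ncs[0], levels))
    head_level = levels[1]
    # Stop when the group is homogeneous or no head code can be made more specific
    can_refine = any(CODES[head].count(".") > head_level for _, head in ncs)
    if all_similar(ncs) or not can_refine:
        return decisions
    next_levels = (levels[0], head_level + 1)
    subgroups = defaultdict(list)
    for nc in ncs:
        subgroups[cp_of(nc, next_levels)].append(nc)
    for group in subgroups.values():
        descend(group, next_levels, decisions)
    return decisions


for cp in descend(list(RELATION)):
    print(" ".join(cp))
# C04 M01
# C04 M01.643
# C04 M01.526
```

Each CP is recorded before it is refined, which mirrors how the decisions list on the slides keeps C04 M01 alongside its refinements C04 M01.643 and C04 M01.526.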

Using the CPs for classification

CP: A02 C04 (Musculoskeletal System, Neoplasms): skull tumors, bone cysts, bone metastases, skull osteosarcoma

Similar NCs: all NCs under CP A02 C04 have the same semantic relationship. Location of disease? Disease in anatomy?

Add CP A02 C04 to the list of classification decisions.

Classification decisions: A02 C04

Using the CPs for classification

CP: B06 B06 (Plants, Plants): eucalyptus trees, apple fruits, rice grains, potato plants

Similar NCs with the same relationship: add CP B06 B06.

Classification decisions: A02 C04, B06 B06

Using the CPs for classification

CP: C04 M01 (Neoplasms, Person): leukemia survivor, lymphoma patients, cancer physician, cancer nurses…

Person afflicted by Disease? Person who treats Disease? Too different! The second noun (Person) needs to be more specific: descend one level for the second noun.

Classification decisions: A02 C04, B06 B06

Using the CPs for classification

CP: C04 M01 (Neoplasms, Person): leukemia survivor, lymphoma patients, cancer physician, cancer nurses… Too different!

CP: C04 M01.643 (Neoplasms, Patients): leukemia survivor, lymphoma patients → Person afflicted by Disease
CP: C04 M01.526 (Neoplasms, Occupational Groups): cancer physician, cancer nurses… → Person who treats Disease

OK: each refined CP now groups NCs with the same relationship.

Classification decisions: A02 C04, B06 B06, C04 M01, C04 M01.643, C04 M01.526

Classification Decisions

A02 C04; B06 B06; C04 M01; C04 M01.643; C04 M01.526
A01 H01; A01 H01.770; A01 H01.671; A01 H01.671.538; A01 H01.671.868
A01 M01; A01 M01.643; A01 M01.526; A01 M01.898
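The slides do not spell out how these decisions are applied to a new NC. One plausible reading, sketched in Python under that assumption: a decision covers an NC when each of its codes is a segment-wise prefix of the corresponding noun's tree number, and the most specific covering decision wins. `match_decision` and its helpers are hypothetical.

```python
# Classification decisions as listed on the slide (modifier CP, head CP).
DECISIONS = [
    ("A02", "C04"), ("B06", "B06"), ("C04", "M01"),
    ("C04", "M01.643"), ("C04", "M01.526"),
    ("A01", "H01"), ("A01", "H01.770"), ("A01", "H01.671"),
    ("A01", "H01.671.538"), ("A01", "H01.671.868"),
    ("A01", "M01"), ("A01", "M01.643"), ("A01", "M01.526"), ("A01", "M01.898"),
]


def is_prefix(prefix, code):
    """True if prefix equals the leading dot-separated segments of code."""
    p, c = prefix.split("."), code.split(".")
    return c[: len(p)] == p


def specificity(cp):
    """Total number of segments in a CP; larger means deeper in MeSH."""
    return sum(len(part.split(".")) for part in cp)


def match_decision(codes, decisions=DECISIONS):
    """Return the most specific decision covering the NC's codes, or None."""
    covering = [
        cp for cp in decisions
        if all(is_prefix(part, code) for part, code in zip(cp, codes))
    ]
    return max(covering, key=specificity) if covering else None


print(match_decision(("A01.236", "M01.643")))              # ('A01', 'M01.643')
print(match_decision(("A01.456", "C23.888.592.612.441")))  # None
```

Matching on prefixes is what lets a short list of decisions stand for every CP that lies beneath them in the hierarchy.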

Classification Decisions + Relations (future work)

A02 C04: Location of Disease
B06 B06: Kind of Plants
C04 M01
C04 M01.643: Person afflicted by Disease
C04 M01.526: Person who treats Disease
A01 H01; A01 H01.770; A01 H01.671; A01 H01.671.538; A01 H01.671.868
A01 M01
A01 M01.643: Person afflicted by Disease
A01 M01.526; A01 M01.898

Classification Decision Levels

Anatomy (A): 250 CPs
- 187 (75%) remain at the first level
- 56 (22%) descend one level
- 7 (3%) descend two levels

Natural Science (H01): 21 CPs
- 1 (4%) remains at the first level
- 8 (39%) descend one level
- 12 (57%) descend two levels

Neoplasms (C04): 3 CPs
- 3 (100%) descend one level

Evaluation
- Test the decisions on the “testing” set.
- Count how many of the NCs that fall in the groups defined by the classification decisions are similar to each other.

Accuracy:
- Anatomy: 91%
- Natural Science: 79%
- Neoplasm: 100%
- Total: 90.8%

Generalization: our 415 classification decisions cover ~46,000 possible CP pairs.

Ambiguity – Two Types

Lexical ambiguity: mortality
- state of being mortal
- death rate

Relationship ambiguity: bacteria mortality
- death of bacteria
- death caused by bacteria

Lexical Ambiguity vs. Multiple MeSH Senses

Lexical ambiguity is different from having multiple MeSH senses. Ex: mortality has 4 MeSH senses, all ending in Vital Statistics > Mortality:
- Public Health (G) > Data Collection > Vital Statistics > Mortality
- Investigative Techniques (E) > Data Collection > Vital Statistics > Mortality
- Information Science (L) > Data Collection > Vital Statistics > Mortality
- Population Characteristics (N) > Demography > Vital Statistics > Mortality

On average, there are 1.5 MeSH senses per word for the nouns in our collection.
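Because a noun can occupy several places in MeSH, one NC can fall into several CPs. A sketch, assuming every combination of senses is kept (the slides do not say how the method chooses among them); only the top-level letters from the mortality example are used, and a single Organisms [B] sense for bacteria is an assumption for illustration.

```python
from itertools import product

# Top-level MeSH letters per sense. The four "mortality" senses come from the slide;
# a single Organisms [B] sense for "bacteria" is an assumption for illustration.
SENSES = {
    "mortality": ["G", "E", "L", "N"],
    "bacteria": ["B"],
}


def candidate_cps(modifier, head, senses=SENSES):
    """Every category pair the NC could fall into: one per combination of senses."""
    return list(product(senses[modifier], senses[head]))


def mean_senses(words, senses=SENSES):
    """Average number of MeSH senses per word."""
    return sum(len(senses[word]) for word in words) / len(words)


print(candidate_cps("bacteria", "mortality"))
# [('B', 'G'), ('B', 'E'), ('B', 'L'), ('B', 'N')]
print(mean_senses(["bacteria", "mortality"]))  # 2.5
```

All four candidate CPs here stem from a single meaning of mortality (death rate), which is why multiple MeSH senses are not the same thing as lexical ambiguity.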

Four Cases

- Single MeSH sense, only one possible relationship: abdomen radiography, aciclovir treatment
- Single MeSH sense, multiple relationships: hospital databases, education efforts, kidney metabolism
- Multiple MeSH senses, only one possible relationship: alcoholism treatment
- Multiple MeSH senses, multiple relationships (ambiguity of relationship): bacteria mortality

Most problematic cases… but rare!

Conclusions
- A very simple method for assigning semantic relations to two-word technical NCs: 90.8% accuracy
- Grouping the NCs with respect to their semantic descriptors
- A lexical resource (MeSH) is useful for this task
- Using the upper levels of the lexical hierarchy gives accurate classification while reducing the space of the problem

Future work
- Analyze the full spectrum of the hierarchy
- NCs with more than 2 terms, e.g., [[growth hormone] deficiency]
- Other syntactic structures
- Non-biomedical words
- Other ontologies (e.g., WordNet)?

And given enough data…

skull character, jaw depression, nose resuscitation, cadaver motion

Thanks!

For more information: http://bailando.sims.berkeley.edu/lindi/