Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon...
![Page 1: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/1.jpg)
ONTOLOGY LEARNING
![Page 2: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/2.jpg)
A LITTLE BIT OF CONTEXT: THE LANGUAGE TECHNOLOGIES INSTITUTE
Carnegie Mellon University’s School of Computer Science:
1 undergraduate program
7 graduate departments (CSD, HCI, LTI, RI, SEI, MLD, ETC)
The Language Technologies Institute is a graduate department in the School of Computer Science:
About 25 faculty
About 125 graduate students (~85 PhD, ~40 MS)
About 30 visitors, post-docs, staff programmers, …
© 2005 JAMIE CALLAN
![Page 3: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/3.jpg)
A LITTLE BIT OF CONTEXT: THE LANGUAGE TECHNOLOGIES INSTITUTE
LTI courses and research focus on
Machine translation, especially high-accuracy MT
Natural language processing & computational linguistics
Information retrieval & text mining
Speech recognition & synthesis
Computer-assisted language learning & intelligent tutoring
Computational biology (“the language of the human genome”)
… and combinations of the above
Speech-to-speech MT
Open domain question answering
…
![Page 4: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/4.jpg)
A LITTLE BIT ABOUT ME
My Research Interests
Text Mining
Information Retrieval
Natural Language Processing
Statistical Machine Learning
My Earlier Work
Question Answering
Multimedia Information Retrieval
Near-duplicate Detection
Opinion Detection
![Page 5: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/5.jpg)
TODAY’S TALK
ONTOLOGY LEARNING
![Page 6: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/6.jpg)
ROADMAP
Introduction
Subtasks in Ontology Learning
Human-Guided Ontology Learning
User Study
Metric-Based Ontology Learning
Experimental Results
Conclusions
![Page 7: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/7.jpg)
INFORMATION RETRIEVAL TECHNOLOGIES
Web Search Engines have changed our lives
Google’s great achievement
But, have Search Engines fulfilled Information Needs?
Some, only some
What does search bring to us?
Overwhelming Information in Search Results
Tedious manual judgment still needed
![Page 8: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/8.jpg)
FIND A GOOD KINDERGARTEN IN THE PITTSBURGH AREA
![Page 9: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/9.jpg)
BUY A USED CAR IN THE PITTSBURGH AREA
![Page 10: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/10.jpg)
IT WOULD BE GREAT TO HAVE
A process to
Crawl related documents
Sort through relevant documents
Identify important concepts/topics
Organize materials
![Page 11: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/11.jpg)
THIS IS EXACTLY THE TASK OF INFORMATION TRIAGE, OR PERSONAL ONTOLOGY LEARNING
![Page 12: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/12.jpg)
INTRODUCTION
An ontology is a data model that represents a set of concepts within a domain and the set of pairwise relationships between those concepts.
![Page 13: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/13.jpg)
EXAMPLE: A SIMPLE ONTOLOGY
ball table
Game Equipment
![Page 14: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/14.jpg)
EXAMPLE: WORDNET
![Page 15: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/15.jpg)
EXAMPLE: ODP
ODP
![Page 16: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/16.jpg)
INTRODUCTION
Ontology Learning is the task of constructing a well-defined ontology given
a text corpus or
a set of concept terms
![Page 17: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/17.jpg)
INTRODUCTION
Ontology offers a nice way to summarize the important topics in a domain/collection
Ontology facilitates knowledge sharing and reuse
Ontology offers relational associations for reasoning and inference
![Page 18: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/18.jpg)
ROADMAP
Introduction
Subtasks in Ontology Learning
Human-Guided Ontology Learning
User Study
Metric-Based Ontology Learning
Experimental Results
Conclusions
![Page 19: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/19.jpg)
SUBTASKS IN ONTOLOGY LEARNING
Concept Extraction
Synonym Detection
Relationship Formulation by Clustering
Cluster Labeling
![Page 20: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/20.jpg)
SUBTASKS IN ONTOLOGY LEARNING
Concept Extraction
Synonym Detection
Relationship Formulation by Clustering
Cluster Labeling
![Page 21: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/21.jpg)
CONCEPT EXTRACTION
Two Steps:
Noun N-gram and Named Entity Mining
Web-based Concept Filtering
![Page 22: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/22.jpg)
NOUN N-GRAM MINING
I/PRP strongly/RB urge/VBP you/PRP to/TO cut/VB mercury/NN emissions/NNS from/IN power/NN plants/NNS by/IN 90/CD percent/NN by/IN 2008/CD ./.
![Page 23: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/23.jpg)
NOUN N-GRAM MINING
I/PRP strongly/RB urge/VBP you/PRP to/TO cut/VB mercury/NN emissions/NNS from/IN power/NN plants/NNS by/IN 90/CD percent/NN by/IN 2008/CD ./.
Extracted Bi-grams
Mercury emissions
Power plants
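The bigram extraction above can be sketched in a few lines. This is a minimal illustration, assuming Penn Treebank style `word/TAG` tokens and treating any run of adjacent noun tags as a candidate; the function name `noun_ngrams` and the tag set `NOUN_TAGS` are hypothetical and not taken from the talk.

```python
# Tags treated as nouns (assumption: Penn Treebank noun tags).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def noun_ngrams(tagged_sentence, n=2):
    """Return every run of n adjacent noun-tagged words as a candidate concept."""
    # Split each "word/TAG" token on its last slash so "./." parses correctly.
    tokens = [tok.rsplit("/", 1) for tok in tagged_sentence.split()]
    ngrams = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(tag in NOUN_TAGS for _, tag in window):
            ngrams.append(" ".join(word for word, _ in window))
    return ngrams


sentence = (
    "I/PRP strongly/RB urge/VBP you/PRP to/TO cut/VB mercury/NN "
    "emissions/NNS from/IN power/NN plants/NNS by/IN 90/CD percent/NN "
    "by/IN 2008/CD ./."
)
print(noun_ngrams(sentence))
```

On the example sentence this yields the two bigrams listed on the slide, in lowercase.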
![Page 24: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/24.jpg)
CONCEPT FILTERING
Web-based POS error detection
Assumption: among the first 10 Google snippets, a valid concept appears more than a threshold number of times (4 in our case)
Remove POS errors: protect/NN polar/NN bear/NN
Remove spelling errors: Pullution, polor bear
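A minimal sketch of the snippet-count filter follows. It assumes the snippets have already been fetched by some search client (no real search API is called here) and that "appears" means case-insensitive substring occurrences summed over the snippets; both are assumptions, and `is_valid_concept` is a hypothetical name.

```python
def is_valid_concept(candidate, snippets, threshold=4, top_k=10):
    """Keep a candidate only if it occurs more than `threshold` times
    in the first `top_k` snippets (counting rule is an assumption)."""
    needle = candidate.lower()
    hits = sum(snippet.lower().count(needle) for snippet in snippets[:top_k])
    return hits > threshold


# Toy snippets standing in for real search results.
toy_snippets = [
    "Polar bear habitat is shrinking",
    "The polar bear is a marine mammal",
    "Polar bear facts: the polar bear hunts seals",
    "A polar bear cub was born",
]
print(is_valid_concept("polar bear", toy_snippets))
print(is_valid_concept("polor bear", toy_snippets))
```

A misspelled candidate such as "polor bear" rarely shows up in snippets, so it falls below the threshold and is dropped.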
![Page 25: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/25.jpg)
CONCEPT EXTRACTION
![Page 26: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/26.jpg)
SUBTASKS IN ONTOLOGY LEARNING
Concept Extraction
Synonym Detection
Relationship Formulation by Clustering
Cluster Labeling
![Page 27: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/27.jpg)
CLUSTERING
Hierarchical Clustering
Different Strategies for Concepts at Different Abstraction Levels
![Page 28: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/28.jpg)
EXAMPLE: A SIMPLE ONTOLOGY
ball table
Game Equipment
Abstract Level
Concrete Level
![Page 29: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/29.jpg)
BOTTOM-UP HIERARCHICAL CLUSTERING
Concept candidates are organized into groups based on the 1st sense of the head noun in WordNet
One of their common head nouns is selected as the parent concept for the group
For example, pollution subsumes water pollution and air pollution.
This creates high-accuracy concept forests at the lower levels of the ontology
Start from Concrete Concepts
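The head-noun grouping can be sketched as follows. This is a simplification: it takes the last word of each candidate as its head noun and groups on the surface form, skipping the WordNet first-sense lookup described above. `group_by_head_noun` is a hypothetical name.

```python
from collections import defaultdict


def group_by_head_noun(concepts):
    """Group multi-word concepts under their shared head noun (last word).

    Simplification: no WordNet sense lookup, only string matching."""
    groups = defaultdict(list)
    for concept in concepts:
        head = concept.split()[-1].lower()
        groups[head].append(concept)
    forest = {}
    for head, members in groups.items():
        # The head noun itself becomes the parent, not one of its own children.
        children = [m for m in members if m.lower() != head]
        if children:
            forest[head] = children
    return forest


forest = group_by_head_noun(
    ["water pollution", "air pollution", "pollution", "polar bear"]
)
print(forest)
```

Each key of the result is a parent concept and each value is the list of concepts it subsumes, giving one small tree per head noun.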
![Page 30: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/30.jpg)
ONTOLOGY FRAGMENTS
Different fragments are grouped
![Page 31: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/31.jpg)
CONTINUE TO BE BOTTOM-UP
Problem
Still a forest
Many concepts at the top level are not grouped
Solution: Clustering
In any clustering algorithm, we need a metric
Hard to know the metric to measure distance for those top-level nodes
![Page 32: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/32.jpg)
HUMAN-GUIDED ONTOLOGY LEARNING
Learn what?
A distance metric function
Learn from what?
Concepts at lower levels, since they are highly accurate
User feedback
After learning, then what?
Apply the distance metric function to concepts at the higher level to get distance scores for them
Then use any clustering algorithm to group them based on the distance scores
![Page 33: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/33.jpg)
TRAINING DATA FROM LOWER LEVELS
A set of concepts $x^{(i)}$ on the $i$-th level of the ontology hierarchy
Distance matrix $y^{(i)}$
The matrix entry corresponding to concepts $x^{(i)}_j$ and $x^{(i)}_k$ is $y^{(i)}_{jk} \in \{0, 1\}$:
$y^{(i)}_{jk} = 0$ if $x^{(i)}_j$ and $x^{(i)}_k$ are in the same group;
$y^{(i)}_{jk} = 1$ otherwise.
![Page 34: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/34.jpg)
TRAINING DATA FROM LOWER LEVELS
$$
y^{(i)} =
\begin{pmatrix}
0 & 0 & 1 & 1 \\
0 & 0 & 1 & 1 \\
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0
\end{pmatrix}
$$
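The 0/1 training matrix can be produced directly from group assignments. The sketch below assumes each concept on a level carries a group identifier; the name `distance_matrix` and the identifiers `g1`, `g2` are hypothetical.

```python
def distance_matrix(group_ids):
    """Entry [j][k] is 0 when concepts j and k share a group, else 1."""
    return [[0 if a == b else 1 for b in group_ids] for a in group_ids]


# Four concepts: the first two in one group, the last two in another.
y = distance_matrix(["g1", "g1", "g2", "g2"])
for row in y:
    print(row)
```

With two groups of two concepts this reproduces the 4 x 4 block matrix shown on the slide.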
![Page 35: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/35.jpg)
LEARNING THE DISTANCE METRIC
Distance metric represented as a Mahalanobis distance
$\Phi(x^{(i)}_j, x^{(i)}_k)$ represents a set of pairwise underlying feature functions
A is a positive semi-definite matrix, the parameter we need to learn
Parameter estimation by Minimizing Squared Errors
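As a hedged sketch only, one common way to write a Mahalanobis-style distance over pairwise features, together with a squared-error fit against the training matrices $y^{(i)}$, is given below. The exact form used in the talk is not reproduced here; in particular, whether $A$ or its inverse appears inside the quadratic form is an assumption.

$$
d\big(x^{(i)}_j, x^{(i)}_k\big) = \sqrt{\Phi\big(x^{(i)}_j, x^{(i)}_k\big)^{\top} A \, \Phi\big(x^{(i)}_j, x^{(i)}_k\big)},
\qquad
\min_{A \succeq 0} \; \sum_{i} \sum_{j,k} \Big( y^{(i)}_{jk} - d\big(x^{(i)}_j, x^{(i)}_k\big) \Big)^{2}
$$

The constraint $A \succeq 0$ states that $A$ is positive semi-definite, which keeps the quantity under the square root non-negative.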
![Page 36: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/36.jpg)
SOLVE THE OPTIMIZATION PROBLEM
Optimization can be done by
Newton’s Method
Interior-Point Method
Any standard semidefinite programming (SDP) solver, e.g. SeDuMi, YALMIP
![Page 37: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/37.jpg)
GENERATE DISTANCE SCORES
We have learned A!
For any pair of concepts at the higher level, $(x^{(i+1)}_l, x^{(i+1)}_m)$, the corresponding entry in the distance matrix $y^{(i+1)}$ is obtained by applying the learned metric to the pair
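A minimal sketch of scoring one higher-level pair with a learned matrix follows. It assumes the distance takes the quadratic form sqrt(phi^T A phi), where `phi` is the pairwise feature vector; that form, the function name `metric_distance`, and the toy numbers are assumptions rather than the talk's exact definition.

```python
import math


def metric_distance(phi, A):
    """Distance score sqrt(phi^T A phi) for one concept pair.

    `phi` is the pairwise feature vector, `A` the learned matrix
    given as a list of rows (assumed positive semi-definite)."""
    size = len(phi)
    quad = sum(phi[r] * A[r][c] * phi[c] for r in range(size) for c in range(size))
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(quad, 0.0))


identity = [[1.0, 0.0], [0.0, 1.0]]
weighted = [[2.0, 0.0], [0.0, 0.5]]
print(metric_distance([3.0, 4.0], identity))
print(metric_distance([1.0, 2.0], weighted))
```

With the identity matrix this reduces to the Euclidean norm of the feature vector; a learned `A` reweights the features before the distance is taken.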
![Page 38: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/38.jpg)
K-MEDOIDS CLUSTERING
Flat clustering at a level
Use one of the concepts as the cluster center
Estimate the number of clusters by Gap statistics [Tibshirani et al. 2000]
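The flat clustering step can be sketched with a small alternating k-medoids loop over a precomputed distance matrix. This is an illustrative sketch, not the talk's implementation: the number of clusters `k` is passed in directly (the gap statistic is not implemented here), and the first `k` items serve as initial medoids, which is an arbitrary choice.

```python
def k_medoids(dist, k, max_iter=100):
    """Cluster items 0..n-1 given a full distance matrix `dist`.

    Returns a dict mapping each medoid index to its member indices."""
    n = len(dist)
    medoids = list(range(k))  # assumption: first k items as starting medoids
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for p in range(n):
            nearest = min(medoids, key=lambda m: dist[p][m])
            clusters[nearest].append(p)
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:
                new_medoids.append(m)  # keep an empty cluster's medoid unchanged
                continue
            best = min(members, key=lambda c: sum(dist[c][o] for o in members))
            new_medoids.append(best)
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return clusters


values = [0, 1, 2, 10, 11, 12]
toy_dist = [[abs(a - b) for b in values] for a in values]
result = k_medoids(toy_dist, k=2)
print(result)
```

Because every cluster center is itself one of the concepts, the method needs only pairwise distance scores and never has to average concepts.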
![Page 39: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/39.jpg)
HUMAN COMPUTER INTERACTION
![Page 40: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/40.jpg)
SUBTASKS IN ONTOLOGY LEARNING
Concept Extraction
Synonym Detection
Relationship Formulation by Clustering
Cluster Labeling
![Page 41: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/41.jpg)
CLUSTER LABELING
Problem: Concepts are grouped together, but nameless
Solution: A web-based approach
Send a query formed by concatenating the child concepts to Google
Parse the top 10 snippets
The most frequent word is selected to be the parent of this group
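The labeling steps above can be sketched as follows. The snippets are assumed to be supplied by some search client (none is called here), and excluding stopwords and the child concepts' own words from the count is an added assumption; `build_query`, `label_cluster`, and `STOPWORDS` are hypothetical names.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption, not from the talk).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}


def build_query(children):
    """Concatenate the child concepts into one search query string."""
    return " ".join(children)


def label_cluster(children, snippets, top_k=10):
    """Pick the most frequent non-excluded word in the top snippets."""
    excluded = STOPWORDS | {w for c in children for w in c.lower().split()}
    counts = Counter(
        word
        for snippet in snippets[:top_k]
        for word in re.findall(r"[a-z]+", snippet.lower())
        if word not in excluded
    )
    return counts.most_common(1)[0][0] if counts else None


children = ["ball", "table"]
toy_snippets = [
    "Ball and table are game equipment",
    "Equipment for table games",
    "Buy equipment and a ball",
]
print(build_query(children))
print(label_cluster(children, toy_snippets))
```

In this toy run the word "equipment" occurs most often and becomes the parent label for the group.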
![Page 42: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/42.jpg)
ROADMAP
Introduction
Subtasks in Ontology Learning
Human-Guided Ontology Learning
User Study
Metric-Based Ontology Learning
Experimental Results
Conclusions
![Page 43: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/43.jpg)
USER STUDY
12 graduate students from political science at the University of Pittsburgh
Divided into two groups
Manual group: 4
Interactive group: 8
Task: Construct an ontology hierarchy for 4 datasets
Mercury
Polar bear
Wolf
Toxics Release Inventory (TRI)
90-minute limit, or until the user is satisfied
![Page 44: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/44.jpg)
DATASETS
![Page 45: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/45.jpg)
SOFTWARE USED FOR USER STUDY
![Page 46: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/46.jpg)
QUALITY OF MANUAL VS. INTERACTIVE RUNS
Manual users show moderate agreement (0.4 to 0.6)
Interactive runs produce similar quality results
Difference between manual and interactive runs is NOT statistically significant
![Page 47: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/47.jpg)
COSTS OF MANUAL VS. INTERACTIVE RUNS
Interactive users make 40% fewer edits (statistically significant)
Interactive runs save 30-60 minutes per ontology
Within interactive runs, a human spends 64% less time than in manual runs
![Page 48: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/48.jpg)
CONTRIBUTIONS OF HUMAN-GUIDED ONTOLOGY LEARNING
Effectively combine the strengths of automatic systems and human knowledge
Combine many techniques into a unified framework
Pattern-based (concept mining)
Knowledge-based (use of WordNet)
Web-based (concept filtering and cluster naming)
Machine Learning
A detailed independent user study
![Page 49: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/49.jpg)
WHAT TO IMPROVE?
Is bottom-up the best way to go? Maybe not
Incremental clustering saves the most effort
We have used different techniques for concepts at different levels; how can this be formally generalized?
Model concept abstractness explicitly
We have tested on domain-specific corpora; what about more general-purpose corpora?
Can we reconstruct WordNet or ODP?
![Page 50: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/50.jpg)
ROADMAP
Introduction
Subtasks in Ontology Learning
Human-Guided Ontology Learning
User Study
Metric-Based Ontology Learning
Experimental Results
Conclusions
![Page 51: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/51.jpg)
CHALLENGES
Hard to find a good name for a new group in a bottom-up clustering framework
Formally model concept abstractness
Intelligently use different techniques for concepts at different abstraction levels
Flexibly incorporate heterogeneous features
State-of-the-art approaches either use one type of semantic evidence to infer all relationships, or
use one type of feature for a particular subtask
![Page 52: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/52.jpg)
CHALLENGES
Hard to find a good name for a new group in a bottom-up clustering framework
Formally model concept abstractness
Intelligently use different techniques for concepts at different abstraction levels
Flexibly incorporate heterogeneous features
State-of-the-art approaches either use one type of semantic evidence to infer all relationships, or
use one type of feature for a particular subtask
Solution: Incremental clustering
![Page 53: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/53.jpg)
CHALLENGES
Hard to find a good name for a new group in a bottom-up clustering framework
Formally model concept abstractness
Intelligently use different techniques for concepts at different abstraction levels
Flexibly incorporate heterogeneous features
State-of-the-art approaches either use one type of semantic evidence to infer all relationships, or
use one type of feature for a particular subtask
Solution: Learn statistical models for each abstraction level
![Page 54: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/54.jpg)
CHALLENGES
Hard to find a good name for a new group in a bottom-up clustering framework
Formally model concept abstractness
Intelligently use different techniques for concepts at different abstraction levels
Flexibly incorporate heterogeneous features
State-of-the-art approaches either use one type of semantic evidence to infer all relationships, or
use one type of feature for a particular subtask
Solution: Separate metric learning and ontology construction
![Page 55: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/55.jpg)
A UNIFIED SOLUTION
Metric-based Ontology Learning
![Page 56: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/56.jpg)
LET’S BEGIN WITH SOME IMPORTANT DEFINITIONS
An ontology is a data model
Concept Set
Relationship Set
Domain
![Page 57: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/57.jpg)
MORE DEFINITIONS
ball table
Game Equipment
A Full Ontology
![Page 58: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/58.jpg)
MORE DEFINITIONS
ball
Game Equipment
A Partial Ontology
table
![Page 59: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/59.jpg)
MORE DEFINITIONS
Ontology Metric
[Figure: ontology tree with edge weights 1.5, 2, 1, and 1 and example distances d = 2, d = 1, d = 4.5; nodes include ball and table]
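The numbers on this slide are consistent with a distance defined as the sum of edge weights along the path between two nodes (1.5 + 2 + 1 = 4.5). The sketch below illustrates that reading; the tree layout and the node names `root`, `a`, and `b` are hypothetical, since the slide's figure is not available, and `path_distance` is an illustrative name.

```python
def path_distance(edges, source, target):
    """Sum of edge weights on the unique path between two tree nodes.

    `edges` is a list of (node_u, node_v, weight) tuples describing a tree."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    # Depth-first walk; tracking the parent is enough because a tree has no cycles.
    stack = [(source, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == target:
            return dist
        for nxt, w in graph.get(node, []):
            if nxt != parent:
                stack.append((nxt, node, dist + w))
    return None


# Hypothetical layout that reproduces the weights shown on the slide.
toy_edges = [
    ("root", "a", 1.5),
    ("root", "b", 2.0),
    ("b", "ball", 1.0),
    ("b", "table", 1.0),
]
print(path_distance(toy_edges, "root", "b"))
print(path_distance(toy_edges, "b", "ball"))
print(path_distance(toy_edges, "a", "ball"))
```

Under this layout the three printed distances match the three values shown on the slide.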
![Page 60: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/60.jpg)
MORE DEFINITIONS
Information in an Ontology T
[Figure: ontology tree with edge weights 1.5, 2, 1, and 1 and example distances d = 2, d = 1, d = 4.5; nodes include ball and table]
![Page 61: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/61.jpg)
MORE DEFINITIONS
Information in a Level L
[Figure: one level of the ontology with example distances d = 2, d = 1, d = 1; nodes include ball]
![Page 62: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/62.jpg)
ASSUMPTIONS OF ONTOLOGY
Minimum Evolution Assumption: the optimal ontology is the one that introduces the least information change!
![Page 63: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/63.jpg)
ASSUMPTIONS OF ONTOLOGY
Minimum Evolution Assumption
![Page 64: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/64.jpg)
ASSUMPTIONS OF ONTOLOGY
Minimum Evolution Assumption
![Page 65: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/65.jpg)
ASSUMPTIONS OF ONTOLOGY
Minimum Evolution Assumption
![Page 66: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/66.jpg)
ASSUMPTIONS OF ONTOLOGY
Minimum Evolution Assumption
ball
![Page 67: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/67.jpg)
ASSUMPTIONS OF ONTOLOGY
Minimum Evolution Assumption
ball
table
![Page 68: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/68.jpg)
ASSUMPTIONS OF ONTOLOGY
Minimum Evolution Assumption
ball table
Game Equipment
![Page 69: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/69.jpg)
ASSUMPTIONS OF ONTOLOGY
Minimum Evolution Assumption
ball table
Game Equipment
![Page 70: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/70.jpg)
ASSUMPTIONS OF ONTOLOGY
Minimum Evolution Assumption
ball table
Game Equipment
![Page 71: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/71.jpg)
ASSUMPTIONS OF ONTOLOGY
Abstractness Assumption: Each abstraction level has its own information function
![Page 72: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/72.jpg)
ASSUMPTIONS OF ONTOLOGY
Abstractness Assumption
ball table
Game Equipment
![Page 73: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/73.jpg)
FORMAL FORMULATION OF ONTOLOGY LEARNING
The task of ontology learning is defined as
The construction of a full ontology T, given a set of concepts C and an initial partial ontology T0
Keep adding concepts from C into T0 until a full ontology is formed
Note that T0 may be empty
![Page 74: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/74.jpg)
GOAL OF ONTOLOGY LEARNING
Find the optimal full ontology such that the information change since T0 is minimized
This follows from the Minimum Evolution Assumption
![Page 75: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/75.jpg)
GET TO THE GOAL
Goal:
Since the optimal set of concepts is always C
Concepts are added incrementally
![Page 76: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/76.jpg)
GET TO THE GOAL
Plug in the definition of information change
Transform into a minimization problem
Minimum Evolution objective function
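As a rough illustration of the incremental, minimum-evolution idea, the sketch below adds concepts one at a time and attaches each one where it changes the ontology's total information the least. It is not the objective function from the slides, whose formula is not preserved here. It assumes, for illustration only, that the information in an ontology is the sum of distances over its parent-child edges; `make_dist`, `info`, `add_concept`, `build_ontology`, and the distance values are all hypothetical.

```python
def make_dist(pairs):
    """Build a symmetric distance lookup from {(a, b): distance} entries."""
    return {frozenset(pair): value for pair, value in pairs.items()}


def info(parent, dist):
    """Assumed definition: total distance over all parent-child edges."""
    return sum(
        dist[frozenset((child, par))]
        for child, par in parent.items()
        if par is not None
    )


def add_concept(parent, concept, dist):
    """Attach `concept` under the existing node that changes info() the least."""
    before = info(parent, dist)
    best_parent, best_change = None, float("inf")
    for candidate in parent:
        trial = {**parent, concept: candidate}
        change = abs(info(trial, dist) - before)
        if change < best_change:
            best_parent, best_change = candidate, change
    return {**parent, concept: best_parent}


def build_ontology(concepts, dist, initial=None):
    """Grow a child -> parent map from an initial (possibly empty) ontology."""
    parent = dict(initial or {})
    for concept in concepts:
        if concept in parent:
            continue
        if not parent:
            parent[concept] = None  # the first concept becomes the root
        else:
            parent = add_concept(parent, concept, dist)
    return parent


# Hypothetical distances for the ball/table example used in the slides.
dist = make_dist({
    ("game equipment", "ball"): 1.0,
    ("game equipment", "table"): 1.2,
    ("ball", "table"): 2.0,
})
tree = build_ontology(["ball", "table"], dist, initial={"game equipment": None})
print(tree)
```

With these made-up distances, both "ball" and "table" end up under "game equipment", because attaching "table" under "ball" would add more information than attaching it under the shared parent.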
![Page 77: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/77.jpg)
EXPLICITLY MODEL ABSTRACTNESS
Model abstractness for each level by least-squares fit
Plug in the definition of the amount of information for an abstraction level
Abstractness objective function
![Page 78: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/78.jpg)
MULTIPLE CRITERION OPTIMIZATION FUNCTION
Minimum Evolution objective function
Abstractness objective function
Scalarization variable
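One common way to combine two objectives with a scalarization variable is a weighted sum. The sketch below assumes that form, lam * u + (1 - lam) * v, where u stands for the Minimum Evolution objective value and v for the Abstractness objective value of a candidate placement. The exact formula on the slide is not preserved, so the function names, the candidate tuples, and the numbers are illustrative assumptions.

```python
def scalarized_objective(min_evolution, abstractness, lam=0.5):
    """Weighted-sum scalarization (assumed form) of the two objective values."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * min_evolution + (1.0 - lam) * abstractness


def pick_best(candidates, lam=0.5):
    """Return the name of the candidate with the smallest combined objective.

    `candidates` is a list of (name, min_evolution_value, abstractness_value).
    """
    best = min(candidates, key=lambda c: scalarized_objective(c[1], c[2], lam))
    return best[0]


# Hypothetical objective values for two candidate placements of a concept.
candidates = [
    ("under ball", 2.0, 0.5),
    ("under game equipment", 1.0, 1.0),
]
print(pick_best(candidates, lam=0.5))
print(pick_best(candidates, lam=0.0))
```

Setting `lam` to 1.0 uses only the first objective and 0.0 only the second, so the chosen placement can change as the weight moves between the two.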
![Page 79: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/79.jpg)
THE OPTIMIZATION ALGORITHM
![Page 80: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/80.jpg)
ESTIMATING ONTOLOGY METRIC
Assume the ontology metric is a linear interpolation of underlying feature functions
Use ridge regression to estimate and predict the ontology metric
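A minimal sketch of this step, assuming the metric for a concept pair is a weighted sum of that pair's feature values and that the weights are fit by standard ridge regression on pairs with known distances. It solves the usual regularized normal equations (X^T X + alpha I) w = X^T y in plain Python; `solve_linear`, `ridge_fit`, `predict_distance`, and the toy data are hypothetical names and values, not taken from the talk.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_fit(features, targets, alpha=1.0):
    """Fit weights w minimizing ||X w - y||^2 + alpha * ||w||^2."""
    k = len(features[0])
    xtx = [
        [
            sum(row[i] * row[j] for row in features) + (alpha if i == j else 0.0)
            for j in range(k)
        ]
        for i in range(k)
    ]
    xty = [sum(row[i] * y for row, y in zip(features, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict_distance(weights, feature_vector):
    """Predicted metric value: weighted sum of the pair's feature values."""
    return sum(w * f for w, f in zip(weights, feature_vector))


# Toy training data: one feature vector per concept pair, with a known distance.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [1.0, 2.0, 3.0]
weights = ridge_fit(features, targets, alpha=1.0)
print(weights, predict_distance(weights, [1.0, 1.0]))
```

The `alpha` penalty shrinks the weights toward zero, which helps when features are correlated or training pairs are few; with `alpha=0.0` the fit reduces to ordinary least squares.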
![Page 81: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/81.jpg)
FEATURES
Google KL-Divergence
Wikipedia KL-Divergence
Google Minipar Syntactic distance
Lexico-Syntactic Patterns
Term Co-occurrence
Word Length Difference
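To make a few of these features concrete, here is a hedged sketch of three of them. The exact definitions used in the talk are not given on the slide, so the code assumes character-count length difference, document-level co-occurrence counts by substring match, and KL divergence between add-one-smoothed unigram distributions of two context texts (for example, text retrieved for each term). All function names and sample texts are illustrative.

```python
import math
from collections import Counter


def word_length_difference(term_a, term_b):
    """Assumed definition: absolute difference in character count."""
    return abs(len(term_a) - len(term_b))


def cooccurrence_count(term_a, term_b, documents):
    """Number of documents in which both terms appear (substring match)."""
    return sum(1 for doc in documents if term_a in doc and term_b in doc)


def kl_divergence(text_p, text_q, smoothing=1.0):
    """KL(P || Q) between smoothed unigram distributions of two texts."""
    p_counts = Counter(text_p.split())
    q_counts = Counter(text_q.split())
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + smoothing * len(vocab)
    q_total = sum(q_counts.values()) + smoothing * len(vocab)
    kl = 0.0
    for word in vocab:
        p = (p_counts[word] + smoothing) / p_total
        q = (q_counts[word] + smoothing) / q_total
        kl += p * math.log(p / q)
    return kl


documents = [
    "the ball rolled off the table",
    "a table tennis ball",
    "a red ball",
]
print(word_length_difference("ball", "table"))
print(cooccurrence_count("ball", "table", documents))
print(kl_divergence("ball game sport", "table furniture wood"))
```

Each function returns one number per concept pair, so their outputs can be stacked into a feature vector for that pair.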
![Page 82: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/82.jpg)
EVALUATION
Reconstruct subdirectories from WordNet and ODP
50 WordNet subdirectories drawn from 12 topics: gathering, professional, people, building, place, milk, meal, water, beverage, alcohol, dish and herb
50 ODP subdirectories drawn from 16 topics: computers, robotics, intranet, mobile computing, database, operating system, Linux, TeX, software, computer science, data communication, algorithms, data formats, security, multimedia and artificial intelligence
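One plausible way to score a reconstructed subdirectory is to compare its parent-child links against the original hierarchy. The sketch below assumes that scoring scheme and computes precision, recall, and F1 over (child, parent) pairs; the talk's exact evaluation protocol is not spelled out on the slide, and the small beverage hierarchy is made up for illustration.

```python
def edge_set(parent):
    """Collect (child, parent) pairs from a child -> parent map, skipping the root."""
    return {(child, par) for child, par in parent.items() if par is not None}


def precision_recall_f1(predicted, gold):
    """Score predicted parent-child links against the gold hierarchy."""
    pred, ref = edge_set(predicted), edge_set(gold)
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical gold and reconstructed hierarchies (child -> parent).
gold = {
    "beverage": None,
    "milk": "beverage",
    "alcohol": "beverage",
    "wine": "alcohol",
}
predicted = {
    "beverage": None,
    "milk": "beverage",
    "alcohol": "beverage",
    "wine": "beverage",
}
print(precision_recall_f1(predicted, gold))
```

Here two of the three predicted links match the gold hierarchy, so a misplaced concept such as "wine" lowers both precision and recall.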
![Page 83: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/83.jpg)
DATASETS
![Page 84: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/84.jpg)
ONTOLOGY RECONSTRUCTION
An absolute gain of 10% over the state-of-the-art system developed at Stanford University (ACL 2006 Best Paper)
![Page 85: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/85.jpg)
INTERACTION OF ABSTRACTION LEVELS AND FEATURES
Abstract concepts are sensitive to explicit modeling: modeling abstract concepts well greatly boosts performance
Feature contributions vary for abstract concepts; for concrete concepts they differ little
Simple features (term co-occurrence, word length) work best
A combination of heterogeneous features works better than any individual feature
![Page 86: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/86.jpg)
CONTRIBUTIONS OF METRIC-BASED ONTOLOGY LEARNING
Avoids, and hence solves, the problem of unknown group names
Tackles the lack of control over concept abstractness
Experiments show that concepts at different abstraction levels behave differently and are sensitive to different features
Provides a way to incorporate heterogeneous features
An absolute gain of 10% in precision over a state-of-the-art system on both WordNet and ODP
![Page 87: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/87.jpg)
WE HAVE TALKED ABOUT
The Task of Information Triage and Personal Ontology Learning
Human-guided Ontology Learning
Metric-based Ontology Learning
![Page 88: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/88.jpg)
AT THE BEGINNING, WE SAID:
IT WILL BE GREAT TO HAVE
A process to
Crawl related documents
Sort through relevant documents
Identify important concepts/topics
Organize materials
![Page 89: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/89.jpg)
FIND A GOOD KINDERGARTEN IN THE PITTSBURGH AREA
Are we there yet?
![Page 90: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/90.jpg)
THE KINDERGARTEN EXAMPLE
We are done with the organization!
However, does it support further inference for sound decision making?
Maybe not
Future work!
More future work
Model multiple relationships simultaneously
More efficient distance metric learning
![Page 91: Hui Yang ( 杨慧 ) Language Technologies Institute School of Computer Science Carnegie Mellon University huiyang@cs.cmu.edu 5 Sep 2008 @ Xi’an Jiaotong University.](https://reader036.fdocuments.net/reader036/viewer/2022062300/56649d405503460f94a1b005/html5/thumbnails/91.jpg)
THANK YOU AND QUESTIONS