Ontology learning and population from text
![Page 1: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/1.jpg)
ONTOLOGY LEARNING AND POPULATION FROM TEXT
Ch. 8 Population
![Page 2: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/2.jpg)
Population
• Population of an ontology:
  • Finding instances of relations as well as of concepts
  • Requires full understanding of natural language
• More modest target:
  • The extraction of a set of predefined relations
• In this chapter:
  • No acquisition of instances of relations
  • The detection of instances of concepts
![Page 3: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/3.jpg)
Population
• Common Approaches
• Corpus-based Population
  • A standard similarity-based approach
• Learning by Googling
  • Semi-supervised approach
  • PANKOW
  • C-PANKOW
![Page 4: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/4.jpg)
Common Approaches
• Lexico-syntactic Patterns
  • Hearst patterns
• Similarity-based Classification
  • Algorithm 12
  • Data sparseness problem
• Supervised Approaches
  • Predict the category of a certain instance with a model
  • Requires thousands of training examples to train the model
  • Not feasible when hundreds of concepts are possible tags
![Page 5: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/5.jpg)
![Page 6: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/6.jpg)
Similarity-based Classification of Named Entities
• Using different similarity measures
  • Cosine, Jaccard, L1 norm, Jensen-Shannon, Skew
• Using different feature weighting measures
  • Conditional, PMI, Resnik
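The measures listed above can be sketched over sparse context vectors. This is a minimal illustration using the textbook definitions, not the exact formulation used in the experiments. The function names, the dictionary representation, and the skew parameter `alpha=0.99` are assumptions made for the example. Note that L1, Jensen-Shannon, and skew are distances or divergences, so smaller values mean more similar.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(v * q.get(f, 0.0) for f, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def jaccard(p, q):
    """Jaccard coefficient over the sets of features that occur at all."""
    a, b = set(p), set(q)
    return len(a & b) / len(a | b) if a | b else 0.0


def to_distribution(vec):
    """Turn raw counts into a probability distribution over features."""
    total = sum(vec.values())
    return {f: v / total for f, v in vec.items()} if total else {}


def l1_distance(p, q):
    """L1 norm of the difference between the two distributions."""
    p, q = to_distribution(p), to_distribution(q)
    return sum(abs(p.get(f, 0.0) - q.get(f, 0.0)) for f in set(p) | set(q))


def kl(p, q):
    """Kullback-Leibler divergence D(p || q); q must cover p's support."""
    return sum(pv * math.log2(pv / q[f]) for f, pv in p.items() if pv > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence: average KL to the midpoint distribution."""
    p, q = to_distribution(p), to_distribution(q)
    mid = {f: 0.5 * (p.get(f, 0.0) + q.get(f, 0.0)) for f in set(p) | set(q)}
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def skew(p, q, alpha=0.99):
    """Skew divergence: KL from p to a mixture of q smoothed with p."""
    p, q = to_distribution(p), to_distribution(q)
    mix = {f: alpha * q.get(f, 0.0) + (1 - alpha) * p.get(f, 0.0)
           for f in set(p) | set(q)}
    return kl(p, mix)


# Toy vectors, made up for illustration.
mopti = {"city": 1, "traditional": 1}
niger = {"city": 1, "delta": 1}
print(cosine(mopti, niger), jaccard(mopti, niger), jensen_shannon(mopti, niger))
```

The feature weighting measures (conditional probability, PMI, Resnik) would be applied to the counts before these functions are called; they are not shown here.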
![Page 7: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/7.jpg)
Evaluation
• Goal: learn a function fs
  • fa and fb: specified by two annotators
  • Functions as sets:
• Measurement
  • Precision, Recall, F-measure, learning accuracy
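Treating the functions as sets of (instance, concept) pairs, precision, recall, and F-measure can be sketched as below. The formulas on the original slide are not preserved in this transcript, so the code uses the standard set-based definitions as an assumption. The function name and the toy assignments are hypothetical. Learning accuracy needs the taxonomy and is not covered.

```python
def precision_recall_f(system, gold):
    """Compare a system assignment with one annotator's assignment.

    Both arguments map an instance to its concept; each mapping is
    viewed as a set of (instance, concept) pairs.
    """
    system_pairs = set(system.items())
    gold_pairs = set(gold.items())
    correct = len(system_pairs & gold_pairs)
    precision = correct / len(system_pairs) if system_pairs else 0.0
    recall = correct / len(gold_pairs) if gold_pairs else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy data, made up for illustration.
f_s = {"Mopti": "city", "Niger": "river", "Gao": "country"}
f_a = {"Mopti": "city", "Niger": "river", "Gao": "city", "San": "city"}
p, r, f = precision_recall_f(f_s, f_a)
print(p, r, f)
```

Running the same function against fb and averaging the two results gives a single score per measure.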
![Page 8: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/8.jpg)
Experiments
• Using Word Windows
  • n words to the left and right of a word of interest
  • Excluding stopwords, without crossing sentence boundaries
• Mopti is the biggest city along the Niger with one of the most vibrant ports and a large bustling market. Mopti has a traditional ambience that other towns seem to have lost. It is also the center of the local tourist industry and suffers from hard-sell overload. The nearby junction towns of Gao and San offer nice views over the Niger's delta.
• Mopti: traditional(1), biggest(1)
  Niger: city(1), delta(1), view(1)
  Gao: San(1), offer(1), town(1), junction(1)
  San: offer(1), view(1), Gao(1), nice(1)
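The word-window extraction described above could look roughly like this. It is a sketch under stated assumptions: the stopword list is a small hypothetical one, sentences are split on end punctuation, and no lemmatization is applied, so the counts will not match the slide's example exactly (which shows lemmas such as "view" and "town").

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list, chosen only for this example.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "with", "to", "it",
             "has", "have", "that", "from", "over", "along", "also"}


def window_features(text, targets, n=2):
    """Count the words within n positions of each target word.

    Stopwords are dropped before the window is taken, and the window
    never extends past the sentence that contains the target.
    """
    vectors = defaultdict(Counter)
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = [t for t in re.findall(r"[A-Za-z]+", sentence)
                  if len(t) > 1 and t.lower() not in STOPWORDS]
        for i, token in enumerate(tokens):
            if token in targets:
                context = tokens[max(0, i - n):i] + tokens[i + 1:i + 1 + n]
                for word in context:
                    vectors[token][word.lower()] += 1
    return vectors


sample = "Mopti is the biggest city along the Niger. Gao is nice. San is big."
features = window_features(sample, {"Mopti", "Niger", "Gao", "San"}, n=1)
for entity, counts in features.items():
    print(entity, dict(counts))
```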
![Page 9: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/9.jpg)
Experiments
• Result:
![Page 10: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/10.jpg)
Experiments
• Result:
![Page 11: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/11.jpg)
Experiments
• Using Pseudo-syntactic Dependencies
  • Object-attribute pair
• Mopti is the biggest city along the Niger with one of the most vibrant ports and a large bustling market. Mopti has a traditional ambience that other towns seem to have lost. It is also the center of the local tourist industry and suffers from hard-sell overload. The nearby junction towns of Gao and San offer nice views over the Niger's delta.
• Mopti: is_city(1), has_ambience(1)
  Niger: has_delta(1)
  Gao: junction_of(1)
  San: offer_subj(1)
• Result:
![Page 12: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/12.jpg)
Experiments
• Dealing with Data Sparseness
  • Using Conjunctions
    • When two named entities are linked by a conjunction
  • Result:
![Page 13: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/13.jpg)
Experiments
• Dealing with Data Sparseness
  • Exploiting the Taxonomy
    • Compute the context vector of a term by also considering the context vectors of its subconcepts
    • Take into account only the context vectors of direct subconcepts
    • Normalizing aggregated vectors:
      • Standard normalization of the vector
      • Calculating its centroid
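The aggregation over direct subconcepts and the two normalization options can be sketched as follows. The slide does not define "standard normalization", so scaling to unit length is an assumption here, as are the function names and the toy taxonomy.

```python
import math
from collections import Counter


def aggregate(concept, children, vectors):
    """Add the context vectors of a concept's direct subconcepts to its own."""
    total = Counter(vectors.get(concept, {}))
    for child in children.get(concept, []):
        total.update(vectors.get(child, {}))  # update() adds counts
    return total


def normalize(vec):
    """Scale a vector to unit length (one reading of 'standard normalization')."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {f: v / norm for f, v in vec.items()} if norm else {}


def centroid(concept, children, vectors):
    """Average the vectors of the concept and its direct subconcepts."""
    names = [concept] + list(children.get(concept, []))
    members = [vectors[name] for name in names if name in vectors]
    if not members:
        return {}
    total = Counter()
    for member in members:
        total.update(member)
    return {f: v / len(members) for f, v in total.items()}


# Toy taxonomy and counts, made up for illustration.
children = {"settlement": ["city", "town"], "town": ["hamlet"]}
vectors = {"city": {"big": 2}, "town": {"big": 1, "small": 1},
           "hamlet": {"small": 3}}
print(dict(aggregate("settlement", children, vectors)))
print(centroid("settlement", children, vectors))
```

Because only direct subconcepts are used, the vector for "hamlet" does not contribute to "settlement" in this example.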
![Page 14: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/14.jpg)
Experiments
• Dealing with Data Sparseness
  • Exploiting the Taxonomy
  • Result:
![Page 15: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/15.jpg)
Experiments
• Dealing with Data Sparseness
  • Anaphora Resolution
    • Replace each anaphoric reference with the corresponding antecedent
    • The port capital of Vathy is dominated by its fortified Venetian harbor.
    • The port capital of Vathy is dominated by Vathy's fortified Venetian harbor.
  • Result:
![Page 16: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/16.jpg)
Experiments
• Dealing with Data Sparseness
  • Downloading Documents from the Web
    • Download 20 additional documents Di for each named entity i
    • Keep a document d only if its similarity is above a threshold of 0.2
  • Result:
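The filtering step above might be sketched like this. The slide does not say what each downloaded document is compared with, so the sketch assumes a reference text for the entity (for example its contexts in the corpus) and a bag-of-words cosine similarity. Both are assumptions, as are the function names.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(p, q):
    """Cosine similarity between two word-count vectors."""
    dot = sum(v * q.get(w, 0) for w, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def filter_documents(reference_text, downloaded, threshold=0.2):
    """Keep only the downloaded documents similar enough to the reference."""
    reference = bag_of_words(reference_text)
    return [doc for doc in downloaded
            if cosine(reference, bag_of_words(doc)) > threshold]


# Toy documents, made up for illustration.
reference = "Mopti is a city on the Niger"
candidates = ["Mopti city Niger port", "stock prices fell sharply"]
kept = filter_documents(reference, candidates)
print(kept)
```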
![Page 17: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/17.jpg)
Experiments
• Dealing with Data Sparseness
  • Post-processing
    • The k best answers of the system are checked for their statistical plausibility on the web
  • Result:
![Page 18: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/18.jpg)
PANKOW
• Pattern-based Annotation through Knowledge on the Web
  • Certain lexico-syntactic patterns, as defined by Hearst, can be matched in a corpus and on the World Wide Web
![Page 19: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/19.jpg)
PANKOW
• The Process of PANKOW
  • Step 1: iterate over the set of entities to be classified and generate instances of patterns, one for each concept in the ontology.
    • For example: for the instance South Africa and the concepts country and hotel, the resulting pattern instances are "South Africa is a country" and "South Africa is a hotel", or "countries such as South Africa" and "hotels such as South Africa".
    • Result 1: a set of pattern instances
  • Step 2: Google is queried for the pattern instances through its Web service API.
    • Result 2: the counts for each pattern instance
  • Step 3: sum up the query results to a total for each concept.
    • Result: the statistical web fingerprint for each entity, that is, the result of aggregating, for each entity, the number of Google counts for all pattern instances conveying the relation of interest.
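The three steps above can be sketched as follows. This is an illustration, not the original implementation. Only the two patterns from the slide's example are used, the pluralization is deliberately naive, and `hit_count` is a hypothetical callable standing in for the web search count lookup. The counts in the usage example are made up.

```python
def pluralize(noun):
    """Naive English plural, sufficient only for this example."""
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"


def pattern_instances(instance, concept):
    """Step 1: instantiate each pattern for one instance/concept pair."""
    return [
        f"{instance} is a {concept}",
        f"{pluralize(concept)} such as {instance}",
    ]


def web_fingerprint(instance, concepts, hit_count):
    """Steps 2 and 3: look up each pattern instance and sum per concept."""
    return {
        concept: sum(hit_count(query)
                     for query in pattern_instances(instance, concept))
        for concept in concepts
    }


def best_concept(fingerprint):
    """Pick the concept with the highest aggregated count."""
    return max(fingerprint, key=fingerprint.get)


# Made-up counts standing in for real search results.
fake_counts = {
    "South Africa is a country": 100,
    "countries such as South Africa": 50,
    "South Africa is a hotel": 2,
}
fingerprint = web_fingerprint("South Africa", ["country", "hotel"],
                              lambda query: fake_counts.get(query, 0))
print(fingerprint, best_concept(fingerprint))
```

Passing the lookup in as a function keeps the sketch independent of any particular search service, and it also makes clear that the number of queries grows with the number of concepts times the number of patterns.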
![Page 20: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/20.jpg)
PANKOW
• The Process of PANKOW
![Page 21: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/21.jpg)
PANKOW
• Evaluation
  • From the two annotators
    • Reference standards for subject A and B
  • Measurement:
    • Precision, recall, and F-measure
![Page 22: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/22.jpg)
PANKOW
• Evaluation
  • Measurement:
    • Average the results for both annotators
![Page 23: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/23.jpg)
PANKOW
• Result:
![Page 24: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/24.jpg)
C-PANKOW
• Shortcomings of PANKOW
  • Many actual instances of the pattern schema are not found
  • A large number of queries is sent to the Google Web API
  • Does not scale to larger ontologies
![Page 25: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/25.jpg)
C-PANKOW
• C-PANKOW Process
  • The web page to be annotated is scanned for candidate instances.
  • For each instance i discovered and for each clue-pattern pair in the pattern library P, an automatically generated query is issued to Google, and the abstracts or snippets of the first n hits are downloaded.
  • Then the similarity between the document to be annotated and the downloaded abstract is calculated. If the similarity is above a given threshold t, the actual pattern found in the abstract reveals a phrase that may describe the concept the instance belongs to in the context in question.
  • The pattern matched in a given Google abstract is considered only if the similarity between the original page and this abstract is above the threshold. In this way the pattern-matching process is contextualized.
  • Finally, the instance i is annotated with the concept c that has the largest number of hits as well as the most contextually relevant hits.
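The contextualized matching described above could be sketched as below. Several choices are assumptions made for the example: snippets are passed in directly instead of being downloaded, similarity is a bag-of-words cosine, only the "is a" pattern is matched, and accepted snippets are combined by summing their similarities so that both the number and the relevance of hits count. All names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(p, q):
    """Cosine similarity between two word-count vectors."""
    dot = sum(v * q.get(w, 0) for w, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def extract_concept(instance, snippet):
    """Return the noun after '<instance> is a/an', or None if absent."""
    match = re.search(re.escape(instance) + r" is an? (\w+)", snippet,
                      re.IGNORECASE)
    return match.group(1).lower() if match else None


def annotate(instance, page_text, snippets, threshold):
    """Choose a concept using only snippets similar enough to the page."""
    page = bag_of_words(page_text)
    scores = Counter()
    for snippet in snippets:
        similarity = cosine(page, bag_of_words(snippet))
        if similarity <= threshold:
            continue  # snippet comes from a different context, so skip it
        concept = extract_concept(instance, snippet)
        if concept:
            scores[concept] += similarity
    return scores.most_common(1)[0][0] if scores else None


# Toy page and snippets, made up for illustration.
page = ("Our hotel guide lists every hotel near the beach, "
        "with rooms and prices.")
hits = ["Niagara is a hotel near the beach with rooms",
        "Niagara is a waterfall between Canada and the United States"]
print(annotate("Niagara", page, hits, threshold=0.2))
```

In this toy case the second snippet matches the pattern but falls below the threshold, so only the reading that fits the page's context contributes to the result.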
![Page 26: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/26.jpg)
C-PANKOW
• C-PANKOW Process
![Page 27: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/27.jpg)
C-PANKOW
• Evaluation
  • Same dataset and evaluation measures as PANKOW
  • But C-PANKOW uses the 682 concepts of the pruned Tourism ontology as possible tags
  • Added learning accuracy
![Page 28: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/28.jpg)
C-PANKOW
• Result:
![Page 29: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/29.jpg)
C-PANKOW
• Result:
![Page 30: Ontology learning and population from from text](https://reader033.fdocuments.net/reader033/viewer/2022051421/568165f7550346895dd92021/html5/thumbnails/30.jpg)
C-PANKOW
• Result: