Ontology Learning Shalini Gupta - 07305R02 Apoorv Sharma - 07305913 Chirag Patel - 07305909 Shitanshu Verma - 07305037

Page 1:

Ontology Learning

Shalini Gupta - 07305R02
Apoorv Sharma - 07305913
Chirag Patel - 07305909
Shitanshu Verma - 07305037

Page 2:

Issue
There is a lot of information, but its current representation renders it uninterpretable for machines.
Consequences
  Most of the information remains undiscovered.
  Big, popular search engines are able to search only 3-4% of the total information on the web.

Page 3:

What is needed?
Improved machine intelligence.
Make machines read, understand, use, and modify information.
With minimal human intervention.

Page 4:

To achieve it?
Enable machines to
  populate
  enrich
  evaluate
  maintain
their knowledge representation.

Page 5:

What is an ontology?
A representation format that conceptualizes a domain.
Captures classes, instances, attributes, and relationships.
Provides a sound semantic ground for machine-understandable descriptions of digital content.
Is used in various fields, e.g. SE and AI.
Is represented using languages such as OWL.

Page 6:

What is ontology learning?
The process of preparing and updating ontologies from sources such as documents in natural language, with the help of dictionaries, thesauri, etc.

Page 7:

Environment

Page 8:

The flow
An initial ontology is given.
Information sources are given.
Machines work over the data sources to enrich the ontology.
Once the ontology is enriched:
  a consistency check is done
  evaluation

Page 9:

Terms related to the process
Ontology enrichment
  Improving an existing ontology
Ontology population
  Creating a new ontology or adding new concepts to it
Inconsistency resolution
  Resolving inconsistencies that come up while acquiring ontologies

Page 10:

Enrichment of Ontology
Term Identification
Taxonomy Extraction
Non-taxonomic relationship extraction

Page 11:

Enrichment of Ontology
Term Identification
  identify important terms in the text
Taxonomy Extraction
  identify taxonomic relationships between the terms identified
Non-taxonomic relationship extraction
  identify other relationships

Page 12:

Review
Ontology learning
  ontology enrichment
    term identification
    taxonomy extraction
    non-taxonomic relationship extraction

Page 13:

Term Identification: Basics
Everything is a concept: an object, an idea, or a thing.
A term lexicalizes a concept.
  A word or multi-word string that conveys 'a single meaning' within a given community.
  E.g. company, Paris, man, cellphone, Red Hat, car parking
Goal: find the representative concepts.

Page 14:

Term Identification: Steps
Term Recognition: find the terms.
Term Classification: cluster the terms that are the same.
Term Mapping: link the terms to well-defined concepts of referent data sources.
Various techniques exist for every step.

Page 15:

Term Identification: Tokenizing
Different combinations of linguistic techniques can be used to accomplish this step.
Tokenizing
  Scan the text in order to identify the boundaries of words and complex expressions.

Page 16:

Term Identification: Tokenizing
Remove stop words like 'a', 'the', 'of', 'with'.
E.g. 'Check of the Electrical Bonding of External Composite Panels with a CORAS Resistivity-Continuity Test Set'
  Terms: Check, Electrical Bonding, External Composite Panels, CORAS Resistivity-Continuity Test Set
Generally, nouns are considered as candidate concepts.
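
A minimal Python sketch of the stop-word step, assuming that splitting the text at stop words is an acceptable stand-in for full linguistic tokenizing. The function name `candidate_terms` and the stop-word set (only the four words listed on the slide) are illustrative, not part of the original slides.

```python
import re

# Illustrative stop-word set: only the four words named on the slide.
STOP_WORDS = {"a", "the", "of", "with"}

def candidate_terms(text):
    """Split text into candidate terms, breaking at every stop word."""
    # Words may contain internal hyphens, e.g. "Resistivity-Continuity".
    tokens = re.findall(r"[A-Za-z][A-Za-z-]*", text)
    terms, current = [], []
    for token in tokens:
        if token.lower() in STOP_WORDS:
            # A stop word closes the term collected so far.
            if current:
                terms.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        terms.append(" ".join(current))
    return terms

sentence = ("Check of the Electrical Bonding of External Composite Panels "
            "with a CORAS Resistivity-Continuity Test Set")
print(candidate_terms(sentence))
```

A real system would follow this with part-of-speech tagging to keep only noun phrases; this sketch does not attempt that.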

Page 17:

Term Identification: Importance of a term
The TF-IDF technique can be used to find the important keywords [6].
  A balanced measure: a word is more important if it appears several times in a target document and, at the same time, appears rarely in other documents.
Seed concepts can be taken from existing ontologies.
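
The TF-IDF idea can be sketched in a few lines of Python. This assumes one common variant (term frequency normalized by document length, inverse document frequency as log(N / df)); the slides do not say which variant was intended, and the toy corpus below is hypothetical.

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """TF-IDF of `term` in `doc`, where `corpus` is a list of token lists."""
    # Term frequency: share of the document's tokens that are `term`.
    tf = Counter(doc)[term] / len(doc)
    # Document frequency: number of documents containing `term`.
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0
    # Rare terms get a high idf; a term found in every document gets 0.
    idf = math.log(len(corpus) / df)
    return tf * idf

# Hypothetical toy corpus of tokenized documents.
corpus = [
    ["hotel", "cost", "hotel", "room"],
    ["cost", "price"],
    ["cost", "travel"],
]
print(tf_idf("hotel", corpus[0], corpus))  # frequent here, rare elsewhere
print(tf_idf("cost", corpus[0], corpus))   # appears in every document
```

Here 'cost' scores zero because it occurs in every document, while 'hotel' scores higher because it is concentrated in the target document.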

Page 18:

Term Identification: Importance of a term
Multi-word terms
The C/NC-value method [5] considers:
  (1) the frequency of occurrence of the candidate term,
  (2) its frequency of occurrence as a sub-string of other candidate terms,
  (3) the number of candidate terms containing the given term as a sub-string,
  (4) the number of words contained in the candidate term.
The relevant terms can also be determined by their mutual cohesiveness, using Mutual Expectation.
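
For reference, the C-value part of the method is commonly stated as below. This is given as a sketch of how the four factors combine, not as a quotation from the slides; the symbol names are assumptions: $|a|$ is the number of words in candidate term $a$, $f(a)$ its frequency, $T_a$ the set of longer candidate terms that contain $a$, and $P(T_a)$ the number of such terms.

```latex
\[
\text{C-value}(a) =
\begin{cases}
  \log_2 |a| \cdot f(a), & \text{if } a \text{ is not nested in any longer candidate term},\\[6pt]
  \log_2 |a| \cdot \Bigl( f(a) - \dfrac{1}{P(T_a)} \displaystyle\sum_{b \in T_a} f(b) \Bigr), & \text{otherwise.}
\end{cases}
\]
```

Factor (1) is $f(a)$, factor (2) is the sum over $T_a$, factor (3) is $P(T_a)$, and factor (4) is $|a|$.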

Page 19:

Term Identification: Morphological Analysis
Uses morphological knowledge of a word [9].
A technique that identifies a word stem from a full word form.
To identify small domain-specific units, it studies patterns of word formation and attempts to formulate rules using the word structure.
  E.g. in the biomedical domain, a word ending in “-ofilous” or “-itis” is very probably a bio-molecule or a medical term.
Advantage: can identify “background terms” even with a low frequency of appearance.

Page 20:

Term Identification: Named Entity Recognition
Recognition of
  person, location, and organization names as single complex entities
  complex date and time expressions
  percentages and monetary values
  E.g. 'Merrill Lynch'
The next step associates single words or complex expressions with concepts.
  E.g. 'Merrill Lynch' is related to the concept organization.

Page 21:

Identifying Relationships
More information for later steps
Dependency relations:
  Between the word and its neighbours, the mind perceives connections, the totality of which forms the structure of the sentence.
  Structural connections establish dependency relations between the words.

Page 22:

Deriving Relationships from Dependency Relations
Syntactic dependency relations coincide closely with semantic relations [3].
E.g. 'France Telecom in Paris offers the new DSL technology.'
  Dependency relations give a linkage between France Telecom (organization) and Paris (city).
  From this we can derive a semantic relationship between organization and city.

Page 23:

Term Identification
Identifying Relationships
  Taxonomic Relationships
  Non-Taxonomic Relationships

Page 24:

Taxonomy Construction
Hierarchy of concepts
Inclusion relations provide a tree view of the ontology and imply inheritance between super-concepts and sub-concepts.
  E.g. 'living being' is a super-concept and 'mammal' is a sub-concept.
In an ontology, the root node is the most general concept for the domain of interest.

Page 25:

Discovering taxonomic relations
Based on lexico-syntactic patterns
Can find inclusion relations between concepts through simple pattern matching on a set of documents.
E.g. NP such as NP, NP, ..., and NP
  '... works by authors such as Herrick, Goldsmith, and Shakespeare'
  hyponym(“author”, Herrick)
  hyponym(“author”, Goldsmith)
  hyponym(“author”, Shakespeare)
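
The 'such as' pattern can be approximated with a regular expression. The sketch below makes two loud simplifications that are not in the slides: every NP is a single word, and no lemmatization is done, so the hypernym comes out as 'authors' rather than 'author'. The function name `hyponyms_such_as` is illustrative.

```python
import re

# "X such as A, B, and C", "X such as A and B", or "X such as A".
# Assumption: each noun phrase is a single word.
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+, )*\w+,? and \w+|\w+)")

def hyponyms_such_as(sentence):
    """Return (hypernym, hyponym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        # Split the enumeration at commas and at the final "and".
        items = re.split(r",\s*(?:and\s+)?|\s+and\s+", match.group(2))
        pairs.extend((hypernym, item) for item in items if item)
    return pairs

text = "works by authors such as Herrick, Goldsmith, and Shakespeare"
for pair in hyponyms_such_as(text):
    print(pair)
```

A fuller implementation would run a noun-phrase chunker first and match the pattern over chunks instead of raw words.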

Page 26:

Discovering new patterns
The idea is to use a pattern learner to generate new patterns.
Generated patterns can then be used to generate new information (new inclusion relations), as well as to assess the validity of extracted information.
E.g. starting from the pattern
  NP such as NP, NP, ..., and NP
we can generate new patterns like
  NP is NP
  NP, NP, ..., and other NP
  NP, especially NP, NP, ..., and NP

Page 27:

Algorithm for finding new patterns
1. Decide on a lexical relation, R, that is of interest, e.g. "group/member" or a hyponym relation like (author, Shakespeare).
2. Gather a list of terms/instances for which this relation holds.
3. Find places in the corpus where these terms/instances occur syntactically near one another and record the environment.
4. Find new patterns from what the recorded environments have in common.
5. Once a new pattern has been positively identified, use it to gather more instances of the target relation and go to Step 2.
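
Step 3 of this loop can be sketched as follows, assuming that "the environment" is simply the text between the two terms of a known pair and that plain substring search is good enough for illustration. The function name `learn_contexts`, the seed pairs, and the sentences are all hypothetical.

```python
from collections import Counter

def learn_contexts(seed_pairs, sentences):
    """Count the text found between known (hypernym, hyponym) pairs."""
    contexts = Counter()
    for hypernym, hyponym in seed_pairs:
        for sentence in sentences:
            start = sentence.find(hypernym)
            end = sentence.find(hyponym)
            # Keep only sentences where the hypernym precedes the hyponym.
            if start != -1 and end != -1 and start < end:
                between = sentence[start + len(hypernym):end].strip()
                contexts[between] += 1
    return contexts

# Hypothetical seeds (step 2) and corpus sentences.
seeds = [("author", "Shakespeare"), ("city", "Paris")]
sentences = [
    "an author such as Shakespeare",
    "a city such as Paris",
    "Paris is a city",
]
print(learn_contexts(seeds, sentences).most_common())
```

Contexts that recur across many seed pairs are the candidates for step 4; accepting one as a pattern and harvesting new pairs with it closes the loop of step 5.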

Page 28:

Multi-word concepts
A concept may be represented by a multi-word term.
A concept 'A' is a hyponym of a concept 'B' if
  A has more tokens than B,
  all the tokens of B are present in A,
  both terms have the same head.
E.g. the concepts 'private customer' and 'business customer' are hyponyms of the concept 'customer'.
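
The three conditions translate directly into code. This sketch assumes that the head of an English noun phrase is its last token, which is a simplification not stated in the slides; `is_hyponym` is an illustrative name.

```python
def is_hyponym(a, b):
    """True if multi-word term `a` is a hyponym of term `b` by the head rule."""
    a_tokens = a.lower().split()
    b_tokens = b.lower().split()
    if not a_tokens or not b_tokens:
        return False
    return (
        len(a_tokens) > len(b_tokens)                # A has more tokens than B
        and all(t in a_tokens for t in b_tokens)     # all tokens of B are in A
        and a_tokens[-1] == b_tokens[-1]             # same head (assumed: last token)
    )

print(is_hyponym("private customer", "customer"))
print(is_hyponym("business customer", "customer"))
print(is_hyponym("customer service", "customer"))
```

The third call shows why the head condition matters: 'customer service' contains 'customer' but is a kind of service, not a kind of customer.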

Page 29:

Mining non-taxonomic relations
Relationships other than is-a relationships
E.g. linguistic processing may find that the word 'cost' occurs frequently with the words 'hotel', 'guest house', and 'youth hostel' in sentences like 'Costs at the youth hostel are $20 per night'.
  The relations (cost, hotel), (cost, guest house), and (cost, youth hostel) exist.
The discovery algorithm finds support and confidence measures for these pairs, as well as for relationships at higher levels of abstraction, such as between accommodation and cost.

Page 30:

Finding non-taxonomic relations
Based on the basic association rule algorithm [3]
Basic association rule algorithm
  Given a set of transactions, T
  Each transaction has a set of items, i1, i2, ..., in
  Goal: compute association rules of the form i1 → i2
  Trick: exploits the fact that many items appear together, so the occurrence of one implies the occurrence of another with a high probability (confidence).

Page 31:

Association Rule Mining
E.g. consider the transactions
  (bread, butter, jam, chips)
  (bread, butter, jam, ketchup)
  (ketchup, chips)
  (bread, butter, jam, chips)
  (bread, rice)
E.g. the rule X → Y with X = {bread} and Y = {butter, jam}
Support = n(X ∪ Y) / N, where n(X ∪ Y) is the number of transactions containing all items of X and Y, and N is the total number of transactions
  E.g. Support = 3/5
Confidence = n(X ∪ Y) / n(X)
  E.g. Confidence = 3/4
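
The two measures can be checked on the slide's five transactions with a short Python sketch. The helper names `count`, `support`, and `confidence` are illustrative.

```python
def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)

def support(transactions, x, y):
    """Share of all transactions that contain both X and Y."""
    return count(transactions, x | y) / len(transactions)

def confidence(transactions, x, y):
    """Share of the transactions containing X that also contain Y."""
    return count(transactions, x | y) / count(transactions, x)

# The five transactions from the slide.
transactions = [
    {"bread", "butter", "jam", "chips"},
    {"bread", "butter", "jam", "ketchup"},
    {"ketchup", "chips"},
    {"bread", "butter", "jam", "chips"},
    {"bread", "rice"},
]
x, y = {"bread"}, {"butter", "jam"}
print(support(transactions, x, y))     # 3 of 5 transactions
print(confidence(transactions, x, y))  # 3 of the 4 transactions with bread
```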

Page 32:

Algorithm
1. Extend each transaction to include the ancestors of its items.
   E.g. include the word 'accommodation' in the transactions containing the word 'guest house'.
2. Determine association rules of the form Xk → Yk, where |Xk| = 1 and |Yk| = 1.
3. Determine the confidence for all rules that exceed the user-determined support.
4. Prune the rules subsumed by ancestral rules.
   E.g. if we found two rules, (cost, accommodation) and (cost, hotel), we prune the latter rule, (cost, hotel).
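
The four steps can be sketched in Python as below. The taxonomy (`parent`), the transactions, and the support threshold are hypothetical, and the pruning follows the slide's wording literally: a rule is dropped whenever a rule over ancestors of its items was also found.

```python
from itertools import permutations

def ancestors(item, parent):
    """All ancestors of `item`, following the parent links upward."""
    found = []
    while item in parent:
        item = parent[item]
        found.append(item)
    return found

def generalized_rules(transactions, parent, min_support):
    # Step 1: extend each transaction with the ancestors of its items.
    extended = [set(t) | {a for i in t for a in ancestors(i, parent)}
                for t in transactions]
    n = len(extended)
    items = set().union(*extended)
    # Steps 2-3: single-item rules X -> Y that reach the support threshold,
    # stored together with their confidence.
    rules = {}
    for x, y in permutations(items, 2):
        if x in ancestors(y, parent) or y in ancestors(x, parent):
            continue  # a concept always co-occurs with its own ancestor
        n_xy = sum(1 for t in extended if x in t and y in t)
        if n_xy / n >= min_support:
            n_x = sum(1 for t in extended if x in t)
            rules[(x, y)] = n_xy / n_x
    # Step 4: prune rules subsumed by a rule over ancestors.
    return {
        (x, y): conf for (x, y), conf in rules.items()
        if not any((ax, ay) in rules
                   for ax in [x] + ancestors(x, parent)
                   for ay in [y] + ancestors(y, parent)
                   if (ax, ay) != (x, y))
    }

# Hypothetical taxonomy and transactions (concepts found per sentence).
parent = {"hotel": "accommodation", "guest house": "accommodation",
          "youth hostel": "accommodation"}
transactions = [{"cost", "hotel"}, {"cost", "guest house"},
                {"cost", "youth hostel"}, {"hotel"}]
print(generalized_rules(transactions, parent, min_support=0.25))
```

With these inputs, only the rules between 'cost' and 'accommodation' survive; the more specific rules such as (cost, hotel) are pruned.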

Page 33:

Statistics-based Extraction of Taxonomic Relations [12][13]
Uses hierarchical clustering.
Groups similar terms in a bottom-up fashion.
Uses the cosine similarity function.
The cosine measure, or normalized correlation coefficient, between two vectors x and y is given by
  \cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert} = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2} \, \sqrt{\sum_i y_i^2}}

Page 34:

Algorithm

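Page 35:

Computation of similarity function
The similarity matrix (figure not reproduced here) gives these rows:
Hotel vector = (0, 14, 7, 4, 6)
Accommodation vector = (14, 0, 11, 2, 5)
The first two components are the two terms' entries for themselves and for each other, and contribute nothing to the dot product. Over the remaining three components:
cos(Hotel, Accommodation) = (7*11 + 4*2 + 6*5) / sqrt((49 + 16 + 36) * (121 + 4 + 25)) = 115 / sqrt(101 * 150) ≈ 0.93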
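
A short Python sketch of the cosine measure, applied to the two vectors from the slide. The function name `cosine` is illustrative; whether the measure should run over all five components or only over the three shared context components is an assumption the slides leave open, so the sketch lets the caller decide by slicing.

```python
import math

def cosine(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

hotel = (0, 14, 7, 4, 6)
accommodation = (14, 0, 11, 2, 5)

# All five components.
print(cosine(hotel, accommodation))
# Only the last three components (dropping the terms' own entries).
print(cosine(hotel[2:], accommodation[2:]))
```

Both calls share the same numerator, 115; only the norms in the denominator differ.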

Page 36:

Case study: Web-based Ontology Learning with ISOLDE
ISOLDE (Information System for Ontology Learning and Domain Exploration) produces a domain ontology from a base ontology.
Uses the following:
  an unsupervised named entity recognition system
  web resources like DWDS, Wikipedia, and Wiktionary

Page 37:

Analysis steps used by ISOLDE
1. Named-entity recognition (NER)
   Uses a domain-specific corpus, a base ontology, and a general-purpose NER system (SProUT, see Drozdzynski et al. 2004) to find instances for the classes in the base ontology.
2. Linguistic pattern analysis
   Extracts class candidates from the context of the instances extracted in step 1, by use of lexico-syntactic patterns.
3. Collecting web-based knowledge
   Collects information on and between the extracted class candidates from online resources and integrates it into a new or extended taxonomy/ontology.

Page 38:

Architecture

Page 39:

Stage-wise Examples
After step 1, we get Ballack and Munich as named entities from the soccer corpus.
In the second step, we find the class candidates for the named entities from sentences in the corpus and then filter the domain-specific candidates using the χ² method.
  'Ballack, the best midfielder in the German national team' gives 'midfielder' as the class candidate of Ballack.
In the third step, we search the web for the class candidates. The Wikipedia definition of midfielder is:
  'A midfielder is a player whose position of play is midway between the attacking strikers and the defenders.'

Page 40:

Example contd.
We learn the relation 'midfielder is a player' (a taxonomic relationship).
Relevance factor χ²
The standard chi-square statistic over observed counts O and expected counts E:
  \chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
O matrix for striker (table not reproduced here)
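
A hedged Python sketch of a chi-square relevance score computed from an observed-count matrix. The slides' O matrix is not available, so the counts below are hypothetical, and the 2x2 layout (the term versus all other terms, in the domain corpus versus a reference corpus) is an assumption. Expected counts are derived from the row and column totals in the usual way.

```python
def chi_square(observed):
    """Chi-square statistic of a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count if rows and columns were independent.
            e = row_totals[i] * col_totals[j] / total
            chi2 += (o - e) ** 2 / e
    return chi2

# Hypothetical counts: rows = (term, all other terms),
# columns = (domain corpus, reference corpus).
observed = [[10, 20],
            [30, 40]]
print(chi_square(observed))
```

A higher value means that the term's distribution across the two corpora departs more strongly from independence, which is the signal used to keep domain-specific class candidates.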

Page 41:

Issues in Learning
human-understandable vs machine-understandable
learning higher-degree relations
mapping to a high-level ontology
evaluation benchmark
incremental ontology learning
multi-agent learning

Page 42:

Application of ontology
is ubiquitous in information systems [2]
improving the performance of information retrieval and reasoning
making data between different applications interoperable
ontology-type semantic descriptions of behaviors and services allow software agents in a multi-agent system to better coordinate themselves

Page 43:

References
[1] Elias Zavitsanos, Georgios Paliouras, and George Vouros. Ontology Learning and Evaluation: A Survey. Technical Report, 2006.
[2] Nicolas Weber and Paul Buitelaar. Web-based Ontology Learning with ISOLDE. DFKI GmbH - Language Technology Lab, Saarbrücken, Germany, 2006.
[3] Alexander Maedche and Steffen Staab. Mining Ontologies from Text. 2000.
[4] Alexander Maedche, Viktor Pekar, and Steffen Staab. Ontology Learning Part One - On Discovering Taxonomic Relations from the Web. 2003.

Page 44:

References
[5] K. Frantzi, S. Ananiadou, and H. Mima. Automatic recognition of multi-word terms: The C-value/NC-value method. 3(2):115–130, 2000.
[6] G. Salton, A. Wong, and C.S. Yang. A vector space model for automatic indexing. Communications of the ACM, 18(11):613–620, 1975.
[7] D.I. Moldovan and R.C. Girju. An interactive tool for the rapid development of knowledge bases. International Journal on Artificial Intelligence Tools (IJAIT), 10(1-2), 2001.

Page 45:

References
[8] J.D. Cohen. Highlights: Language and domain independent automatic indexing terms for abstracting. Journal of the American Society for Information Science, 46(3):162–174, 1995.
[9] U. Heid. A linguistic bootstrapping approach to the extraction of term candidates from German text. Terminology, 5(2):161–181, 1998.
[10] L.M. Iwanska, N. Mata, and K. Kruger. Fully Automatic Acquisition of Taxonomic Knowledge from Large Corpora of Texts, pages 335–345. MIT/AAAI Press, 2000.

Page 46:

References
[11] J.U. Kietz, A. Maedche, and R. Volz. A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet. Juan-Les-Pins, France, 2000.
[12] A. Maedche, V. Pekar, and S. Staab. Ontology learning part one - on discovering taxonomic relations from the web. In Proceedings of the Web Intelligence conference. Springer Verlag, 2002.
[13] Vincent Schickel-Zuber and Boi Faltings. Using hierarchical clustering for learning the ontologies used in recommendation systems. KDD 2007: 599-608.
[14] A. Maedche and S. Staab. Discovering Conceptual Relations from Text. In Proceedings of ECAI 2000, IOS Press, Amsterdam, 2000.

Page 47:

Thank You