Presented by: AKHIL GADA CSCI 572 University of Southern California
Full Text Indexing Based On Lexical Relations, An Application: Software Libraries
by Y. S. Maarek and F. A. Smadja
July 15th, 2010
REQUIREMENT FOR SEARCH IN SOFTWARE LIBRARY
SEARCH FOR FUNCTIONALLY SIMILAR COMPONENTS
E.g. for the query “I want to search pages”, both the Yahoo Search API and the Google Search API should be found.
A.I. OR KNOWLEDGE BASE APPROACH | I.R. OR FREE TEXT BASED APPROACH
ENTER DOMAIN KNOWLEDGE | NO PRIOR KNOWLEDGE REQUIRED
MANUAL OR SEMI-AUTOMATIC | COMPLETELY AUTOMATIC
SPECIFIC AND DIFFICULT TO SCALE TO A NEW DOMAIN | GENERIC AND VERY EASY TO SCALE TO A NEW DOMAIN
SEMANTIC UNDERSTANDING OF DOCUMENTS | NO SEMANTIC UNDERSTANDING OF DOCUMENTS
SINGLE KEYWORD | LEXICAL RELATION
CONTEXT INFORMATION IS LOST, e.g. Apple (fruit) vs. Apple (computers) | REVEALS CONTEXT INFORMATION
HIGH-FREQUENCY GENERIC TERMS MIGHT INTRODUCE NOISE, e.g. the word “file” in the UNIX manual does not characterize the functionality of any command | A HIGH FREQUENCY OF A LEXICAL RELATION PROVIDES STRONG INFORMATION ABOUT THE FUNCTIONALITY OF THE DOCUMENT, e.g. “copy file” in UNIX
LINEAR IR USING INVERTED INDEX
CLUSTERING IR USING HAC (HIERARCHICAL AGGLOMERATIVE CLUSTERING)
LEXICAL RELATION: TWO WORDS IN A SENTENCE THAT HAVE A SYNTACTIC RELATIONSHIP BETWEEN THEM, e.g. Subject-Verb, Verb-Direct object, Verb-Indirect object, etc.
OPEN CLASS WORDS – NOUNS, VERBS, ADJECTIVES, ADVERBS: THESE ARE MEANING BEARING.
CLOSED CLASS WORDS – Conjunctions (and, or), Articles (the, a), Demonstratives (this, that) and Prepositions (to, from, at, with): these carry little meaning on their own.
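As a minimal sketch of the open/closed-class distinction, assuming closed-class words can be recognized with a fixed stop list (the slide's examples plus "an") instead of a part-of-speech tagger, the filtering step could look like this. `CLOSED_CLASS` and `open_class_tokens` are hypothetical names, not part of the paper.

```python
# Closed-class words taken from the slide's examples (plus "an").
# Assumption: a fixed stop list stands in for a real part-of-speech tagger.
CLOSED_CLASS = {
    "and", "or",                 # conjunctions
    "the", "a", "an",            # articles
    "this", "that",              # demonstratives
    "to", "from", "at", "with",  # prepositions
}


def open_class_tokens(sentence: str) -> list:
    """Lowercase the words, strip surrounding punctuation, drop closed-class words."""
    tokens = [t.strip(".,;:!?()\"'").lower() for t in sentence.split()]
    return [t for t in tokens if t and t not in CLOSED_CLASS]


print(open_class_tokens("Copy the file to a directory."))  # ['copy', 'file', 'directory']
```

Only the meaning-bearing words survive, which is what a lexical relation is built from.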
EXTRACT [1] LEXICAL RELATIONS ALGORITHM [2]
5-WORD WINDOW
[Figure: a window spanning five consecutive words W1 W2 W3 W4 W5]
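The window idea can be sketched as counting word pairs that co-occur within five consecutive words. This is a much simplified illustration: the real EXTRACT algorithm [1] also filters candidate pairs statistically and keeps syntactic information, which this sketch does not attempt, and pairs are treated as unordered (an assumption). `window_pairs` is a hypothetical name.

```python
from collections import Counter


def window_pairs(tokens: list, window: int = 5) -> Counter:
    """Count unordered pairs of words that co-occur within a sliding window
    of `window` consecutive words (W1..W5 on the slide when window=5)."""
    pairs = Counter()
    for i, w1 in enumerate(tokens):
        # Pair W1 with each of the following window-1 words (W2..W5).
        for w2 in tokens[i + 1 : i + window]:
            if w1 != w2:
                pairs[tuple(sorted((w1, w2)))] += 1
    return pairs


tokens = ["copy", "file", "directory", "copy", "file"]
print(window_pairs(tokens).most_common(1))  # [(('copy', 'file'), 4)]
```

Pairs that recur across many windows, like ("copy", "file"), are the candidates for informative lexical relations.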
RESOLVING POWER
OUTPUT FROM THE EXTRACT [1] ALGORITHM [0]
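The slides do not give the formula for resolving power. As an assumption (check [2] for the exact definition), the sketch below weights a relation by its frequency in the document times its information content in the whole corpus, a weighting in the spirit of tf-idf. `resolving_power` is a hypothetical name.

```python
import math
from collections import Counter


def resolving_power(doc_counts: Counter, corpus_counts: Counter) -> dict:
    """Weight each lexical relation of one document.

    Assumed form: frequency in the document times its information content,
    -log2(P(rel)), where P(rel) is the relation's relative frequency in the
    whole corpus. Relations that are frequent everywhere get low weight.
    """
    total = sum(corpus_counts.values())
    return {
        rel: freq * -math.log2(corpus_counts[rel] / total)
        for rel, freq in doc_counts.items()
        if corpus_counts[rel] > 0  # skip relations missing from the corpus counts
    }


corpus = Counter({("copy", "file"): 2, ("list", "file"): 6})
doc = Counter({("copy", "file"): 2})
print(resolving_power(doc, corpus))  # {('copy', 'file'): 4.0}
```

A rare relation such as ("copy", "file") scores higher than a relation that appears in most manual pages, which matches the slide's point that generic terms add noise.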
SELECT THE TOP N MOST INFORMATIVE LEXICAL RELATIONS (BY RESOLVING POWER) FOR EACH DOCUMENT; THESE FORM THE DOCUMENT'S PROFILE.
CREATE AN INVERTED INDEX [2]
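A minimal sketch of profile selection and index construction, assuming a profile is a dict from relation to resolving power. `make_profile` and `InvertedIndex` are hypothetical names, not GURU's code; the incremental `add_document` also shows why inserting a new component is cheap.

```python
from collections import defaultdict


def make_profile(weights: dict, n: int) -> dict:
    """Keep the n lexical relations with the highest resolving power."""
    top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return dict(top)


class InvertedIndex:
    """Maps each lexical relation to the documents whose profile contains it."""

    def __init__(self):
        self.postings = defaultdict(dict)  # relation -> {doc_id: resolving power}

    def add_document(self, doc_id, profile: dict) -> None:
        # Only the postings of this document's relations are touched, so a new
        # component can be inserted without rebuilding the whole index.
        for rel, weight in profile.items():
            self.postings[rel][doc_id] = weight

    def candidates(self, query_relations) -> set:
        """Documents that share at least one relation with the query."""
        found = set()
        for rel in query_relations:
            found.update(self.postings.get(rel, {}))
        return found


index = InvertedIndex()
index.add_document("cp", make_profile({"copy file": 3.0, "file name": 0.5}, n=1))
index.add_document("ls", make_profile({"list directory": 2.0}, n=1))
print(index.candidates(["copy file"]))  # {'cp'}
```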
SIMILARITY MEASURE BETWEEN TWO DOCUMENTS [2]
• LET X = set of the top N resolving-power lexical relations for document dx
  Y = set of the top N resolving-power lexical relations for document dy
  X ∩ Y = set of lexical relations common to dx and dy
  ∂(dx, dy) = similarity between dx and dy, computed from the relations in X ∩ Y
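The exact formula for ∂ is not recoverable from these slides. Assuming it is a weighted overlap of the two profiles (a dot product restricted to X ∩ Y), it could be sketched as follows; a normalized variant such as cosine similarity would be an equally plausible reading. `similarity` is a hypothetical name.

```python
def similarity(px: dict, py: dict) -> float:
    """Assumed form of ∂(dx, dy): sum, over the shared relations X ∩ Y, of the
    product of each relation's resolving power in the two profiles."""
    common = px.keys() & py.keys()  # X ∩ Y
    return sum((px[r] * py[r] for r in common), 0.0)


dx = {"copy file": 2.0, "file name": 1.0}
dy = {"copy file": 3.0, "move file": 5.0}
print(similarity(dx, dy))  # 6.0
```

Documents with no common relation get 0.0, and the measure is symmetric in dx and dy.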
CLUSTER FUNCTIONALLY SIMILAR COMPONENTS USING HIERARCHICAL AGGLOMERATIVE CLUSTERING [2]
[Figure: dendrogram over {d1}, {d2}, {d3}, {d4}, {d5}; merges labelled ∂({d1},{d2}), ∂({d3},{d4}) and ∂({d3,d4},{d5})]
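The dendrogram is built bottom-up by repeatedly merging the two most similar clusters. The sketch below assumes single-link merging (cluster similarity = similarity of the closest pair of members); GURU may use a different linkage, and the similarity function is passed in as a parameter. `hac` and `overlap` are hypothetical names, and the naive pair search is fine only for small examples.

```python
def hac(profiles: dict, sim) -> list:
    """Agglomerative clustering: start with one cluster per document and
    repeatedly merge the two most similar clusters until one remains.

    Linkage is single-link (an assumption). Returns the merge history as
    (cluster_a, cluster_b, similarity) tuples.
    """
    clusters = [frozenset([d]) for d in profiles]

    def link(a, b):
        return max(sim(profiles[x], profiles[y]) for x in a for y in b)

    merges = []
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = max(pairs, key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        merges.append((set(a), set(b), link(a, b)))
        # Replace the two merged clusters by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [a | b]
    return merges


def overlap(px, py):
    # Toy similarity: weighted overlap over shared relations.
    return sum((px[r] * py[r] for r in px.keys() & py.keys()), 0.0)


docs = {
    "d1": {"copy file": 1.0},
    "d2": {"copy file": 1.0, "move file": 1.0},
    "d3": {"list directory": 1.0},
    "d4": {"list directory": 1.0, "make directory": 1.0},
}
for a, b, s in hac(docs, overlap):
    print(sorted(a), "+", sorted(b), "at", s)
```

With these toy profiles, {d1} and {d2} merge first, then {d3} and {d4}, and the two pairs join last at similarity 0.0.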
INFORMATION RETRIEVAL [2]
USER SPECIFIES A FREE-TEXT QUERY
→ SEARCH AND RETURN RESULTS: LINEAR I.R. USING THE INVERTED INDEX
→ USER SATISFIED? IF NO, ALLOW THE USER TO TRAVERSE THE CLUSTERED HIERARCHY
LINEAR INFORMATION RETRIEVAL [2]
[Figure: the query dq is compared with each document d1, d2, …, dn, giving ∂(dq,d1), ∂(dq,d2), …, ∂(dq,dn)]
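Linear retrieval treats the query as one more profile dq and ranks documents by ∂(dq, di). A minimal sketch, assuming the same weighted-overlap form of ∂ as a stand-in for the paper's measure; in a full system the inverted index would first restrict the comparison to documents sharing at least one relation with the query, while this sketch scores every document for brevity. `linear_retrieval` is a hypothetical name.

```python
def linear_retrieval(query_profile: dict, doc_profiles: dict, top_k: int = 10) -> list:
    """Compare the query profile dq with every document profile d1..dn and
    return the top_k (doc_id, score) pairs, best first."""

    def sim(px, py):
        # Assumed ∂: weighted overlap over the shared relations.
        return sum((px[r] * py[r] for r in px.keys() & py.keys()), 0.0)

    scored = [(doc_id, sim(query_profile, p)) for doc_id, p in doc_profiles.items()]
    scored = [(d, s) for d, s in scored if s > 0]  # drop documents sharing nothing
    scored.sort(key=lambda ds: ds[1], reverse=True)
    return scored[:top_k]


library = {
    "cp": {"copy file": 2.0},
    "mv": {"copy file": 1.0, "move file": 1.0},
    "ls": {"list directory": 1.0},
}
print(linear_retrieval({"copy file": 1.0}, library))  # [('cp', 2.0), ('mv', 1.0)]
```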
GURU: WORKING SYSTEM SNAPSHOT [2]
EVALUATION [2]
MAINTENANCE COST: INCREMENTAL INSERTION OF NEW COMPONENTS IS EASY [3]
EFFICIENCY: 2.5 secs on an RT; 0.15 secs on an IBM RISC for a query containing 5 to 15 lexical relations
RETRIEVAL EFFECTIVENESS: Contd…
EVALUATION: PRECISION-RECALL CURVE [2]
Let c = total number of records retrieved after executing query q,
R = total number of expected correct results (determined before the query is executed),
r = total number of correct results retrieved after executing query q.
Then Recall = r/R and Precision = r/c.
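The definitions above translate directly into code. The sketch also produces precision-recall curve points by cutting a ranked result list at every position, which is one common way to draw such a curve (an assumption about how [2] plots it). Function names and the example component names are hypothetical.

```python
def precision_recall(retrieved: list, relevant: set) -> tuple:
    """Return (precision, recall) with c = len(retrieved), R = len(relevant),
    r = number of retrieved results that are correct."""
    c, R = len(retrieved), len(relevant)
    r = sum(1 for d in retrieved if d in relevant)
    precision = r / c if c else 0.0
    recall = r / R if R else 0.0
    return precision, recall


def pr_curve(ranked: list, relevant: set) -> list:
    """(recall, precision) points after each cut-off 1..len(ranked) of a ranked list."""
    points = []
    for k in range(1, len(ranked) + 1):
        p, rec = precision_recall(ranked[:k], relevant)
        points.append((rec, p))
    return points


print(precision_recall(["bc", "dc", "ls", "xcalc"], {"bc", "dc", "xcalc"}))  # (0.75, 1.0)
```

Here c = 4, R = 3 and r = 3, so Precision = 3/4 and Recall = 3/3.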
PROS:
EASY TO EXTEND TO ANY DOMAIN, i.e. A GENERIC APPROACH
VERY SIMPLE AND ELEGANT APPROACH
THE PAPER PROVIDES ADEQUATE BACKGROUND BY DESCRIBING PAST RESEARCH
CONS: MAY FAIL WHEN FUNCTIONALLY SIMILAR COMPONENTS ARE DESCRIBED WITH DIFFERENT WORDS
E.g. ‘xcalc’ and ‘bc’ (both calculators)
FURTHER RESEARCH: COMBINE THE KNOWLEDGE BASE APPROACH WITH THIS TECHNIQUE, e.g. the knowledge “bc = calculator” can be added to GURU to increase recall.
IMPROVED ALGORITHMS FOR INCREMENTAL UPDATING OF INDICES.
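One way to read the knowledge-base item above, as a sketch under the assumption that knowledge is entered as a hypothetical `KNOWLEDGE` mapping from component names to concepts: add those concepts to a component's profile before indexing, so that ‘bc’ and ‘xcalc’ share an entry even if their manuals share no lexical relation. The profiles below are made up for illustration, and `expand_profile` is a hypothetical name.

```python
# Hypothetical hand-entered knowledge: component name -> concepts it implements.
KNOWLEDGE = {"bc": {"calculator"}, "xcalc": {"calculator"}}


def expand_profile(doc_id: str, profile: dict, weight: float = 1.0) -> dict:
    """Return a copy of the profile with knowledge-base concepts added, so that
    components described with different words can still share an entry."""
    expanded = dict(profile)
    for concept in KNOWLEDGE.get(doc_id, set()):
        expanded.setdefault(concept, weight)  # keep any existing weight
    return expanded


bc = expand_profile("bc", {"arbitrary precision": 2.0})
xcalc = expand_profile("xcalc", {"scientific calculator": 1.5})
print(bc.keys() & xcalc.keys())  # {'calculator'}
```

The shared "calculator" entry lets either component be retrieved or clustered with the other, which is the recall gain the slide anticipates.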
References
• 0 – Full Text Indexing Based On Lexical Relations; An Application: Software Libraries, by Yoelle S. Maarek and Frank A. Smadja.
• 1 – F. de Saussure, Cours de Linguistique Générale, Quatrième édition. Librairie Payot, Paris, France, 1949.
• 2 – GURU: Information Retrieval for Reuse, by Y. S. Maarek, Daniel M. Berry and Gail E. Kaiser.
• 3 – Kaplan and Maarek, 1990: Incremental maintenance of semantic links in dynamically changing hypertext systems. Interacting with Computers.
Q & A