Dr. TV. Geetha

  • 8/2/2019 Dr. TV. Geetha

    1/176

    A Special Talk

    Tamil Computing

    Dr. T.V. Geetha,

    Tamil Computing Lab (TACOLA), Dept. of CSE & IST

    College of Engineering Guindy, Anna University Chennai

    Team Co-ordinators: Ranjani Parthasarathy & Dr. Madhan Karky

    27th January 2012

  • 8/2/2019 Dr. TV. Geetha

    2/176

  • 8/2/2019 Dr. TV. Geetha

    3/176

    Morphologically rich language

    Morphological suffixes convey most of the roles played in a sentence

    Ambiguity at morphological level

    Ambiguity at semantic level

  • 8/2/2019 Dr. TV. Geetha

    4/176

    Our Basis

    Linguistics

    Use of POS Tags

    Use of Word Based Semantics with well defined semantic constraints (primitives): UNL

    Rule based Approach & FSA for Tamil

    Clustering Approaches

    Probabilistic Approaches: Naive Bayes, Conditional Random Field, HMM

    Machine Learning: Bootstrapping & Unsupervised Approach

  • 8/2/2019 Dr. TV. Geetha

    5/176

    Tamil Computing 5

  • 8/2/2019 Dr. TV. Geetha

    6/176

    Language Processing

    Morphological Analyzer

    POS Tagging

    Chunking

    Named Entity Recognition

    Word Sense Disambiguation

    Anaphora Resolution

    Semantic Interpretation

  • 8/2/2019 Dr. TV. Geetha

    7/176

    7Tamil Computing

  • 8/2/2019 Dr. TV. Geetha

    8/176

    Morphological Analyser - Introduction

    Most textual data contains compound, numeral and colloquial words.

    Due to its morphological richness, the Tamil language needs handling of those words.

    Development of an Integrated Morphological Analyser (Compound, Numeral, Colloquial)

    Needed to tackle News & Lyrics analysis

  • 8/2/2019 Dr. TV. Geetha

    9/176

    Morphological Analyser - Word processing

    Morphological suffix stripping (conventional)

    The word W is processed by the compound analyser

    The word W is processed by the numeral analyser

    (Right to left, by concatenating vowels and consonants, iteratively checking alphabets in the Morph Dictionary and applying Tamil grammar rules)

  • 8/2/2019 Dr. TV. Geetha

    10/176

    Morphological Analyser

    Compound Word Representation

  • 8/2/2019 Dr. TV. Geetha

    11/176

    Morphological Analyser

  • 8/2/2019 Dr. TV. Geetha

    12/176

    Compound Analyser Rules Classification

  • 8/2/2019 Dr. TV. Geetha

    13/176

    Morphological Analyser

    Compound Analyser

    Based on Finite State Transducer (FST)

    Handles not only simple compounding, but also compounding between two words that may cause inflectional variations during the compounding process

    Ex: (Golden statue)

    Rule (No Modification): if the second constituent's first alphabet is a hard consonant, then no modification

  • 8/2/2019 Dr. TV. Geetha

    14/176

    Morphological Analyser - Compound Analyser

    Ex: (Root tree)

    Rule (Insertion): if the second constituent's first alphabet is a consonant, an alphabet is inserted

    (Root) +

  • 8/2/2019 Dr. TV. Geetha

    15/176

    Morphological Analyser - Compound Analyser

    Ex: (Sand pot)

    Rule (Replacement): if the second constituent's first alphabet is a hard consonant, then the first constituent's last alphabet is replaced

    (Pot) +

  • 8/2/2019 Dr. TV. Geetha

    16/176

    Morphological Analyser - Compound Analyser

    Ex: (Banana)

    Rule (Deletion): if the second constituent's first alphabet is a hard consonant and the first constituent's last alphabet is the same hard consonant, then it is deleted

    (Fruit)

  • 8/2/2019 Dr. TV. Geetha

    17/176

    Compound Analyser Rules Classification

  • 8/2/2019 Dr. TV. Geetha

    18/176

    Compound Word Analyser - FST

    Finite State Transducer

    Two tapes, which describe the input (lexical form) and output (surface form) sequences

    It has seven tuples

    Σ1 represents the finite alphabet, namely the input alphabet

    Σ2 represents the finite alphabet, namely the output alphabet

    (bi1,......bik)

    Q is a finite set of states (S0, S1, S2, S3, S4, S5, S6)

    i ∈ Q is the initial state (S0)

    F is a subset of Q, the set of final states (S6)

    Here a:b represents the replacement of a in the surface form by b in the lexical form.
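    A transducer of this kind can be sketched in a few lines. The block below is only an illustration under stated assumptions: the class name `FST`, the dictionary encoding of arcs and the toy two-arc machine are hypothetical and are not the analyser's actual transition table, which relies on Tamil characters not preserved here.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# (state, input symbol) -> (output symbol, next state)
Transitions = Dict[Tuple[str, str], Tuple[str, str]]


class FST:
    """Deterministic finite state transducer reading one tape and writing another."""

    def __init__(self, initial: str, finals: Set[str], transitions: Transitions):
        self.initial = initial
        self.finals = finals
        self.transitions = transitions

    def transduce(self, symbols: Iterable[str]) -> Optional[str]:
        """Return the output tape, or None if the input is rejected."""
        state = self.initial
        output = []
        for symbol in symbols:
            step = self.transitions.get((state, symbol))
            if step is None:
                return None  # no arc for this symbol in this state
            out_symbol, state = step
            output.append(out_symbol)
        return "".join(output) if state in self.finals else None


# Hypothetical toy machine: copies "a" (arc a:a) and rewrites "b" as "x" (arc b:x).
toy = FST(
    initial="S0",
    finals={"S2"},
    transitions={
        ("S0", "a"): ("a", "S1"),
        ("S1", "b"): ("x", "S2"),
    },
)
```

    With this toy machine, `toy.transduce("ab")` walks S0, S1, S2 and yields the rewritten string, while an input that stops in a non-final state is rejected.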

  • 8/2/2019 Dr. TV. Geetha

    19/176

    Compound Analyser FST

  • 8/2/2019 Dr. TV. Geetha

    20/176

    Morphological Analyzer

    Numeral analyzer

    Based on Finite State Transducer (FST)

    Number words (one to ten, hundred, thousand, lakh and crore) can be directly converted into numbers

    Ex: (Ten)

    Rule: No modification

    10

  • 8/2/2019 Dr. TV. Geetha

    21/176

    Morphological Analyser - Numeral Analyser

    Ex: (Five Thousand)

    Rule (Insertion): if the second constituent's first alphabet is a vowel, insert ''

    * 5000

  • 8/2/2019 Dr. TV. Geetha

    22/176

    Morphological Analyser - Numeral Analyser

    Ex: (Twenty Five)

    Rule (Replacement): if the second constituent's first alphabet is a vowel, then replace that with

    + 25

  • 8/2/2019 Dr. TV. Geetha

    23/176

    Morphological Analyser - Numeral Analyser

    Ex: (Twenty Three)

    Rule (Deletion): if the second constituent's first alphabet is a soft consonant, then delete the hard consonant

    (Three) +

  • 8/2/2019 Dr. TV. Geetha

    24/176

  • 8/2/2019 Dr. TV. Geetha

    25/176

    Morphological Analyser

    Colloquial Analyser

    Based on pattern mapping approach

    To the best of our knowledge, no previous work transforms a colloquial word into a formal word.

    Pattern mapping for transforming an informal (colloquial) written word into a formal written word.

  • 8/2/2019 Dr. TV. Geetha

    26/176

    Colloquial Analyser

    Pattern based Approach based on spelling variation rules

    Word processing: Right to left

    List of Spelling variation rules

    Suffix Mapping of ending patterns

    Suffix Mapping of ending patterns with Morphographemic changes

    Suffix mapping of ending patterns with checking of one/two preceding characters

    In all the rules, pattern p1 of colloquial form is converted into pattern p2 of normal form

  • 8/2/2019 Dr. TV. Geetha

    27/176

    Suffix Mapping of ending patterns

  • 8/2/2019 Dr. TV. Geetha

    28/176

    Colloquial Analyser

    Suffix Mapping of ending patterns

    Ending pattern 1 is replaced with pattern 2

    Ex: (irukean) → irukirean

    Pattern 1 is replaced by Pattern 2

  • 8/2/2019 Dr. TV. Geetha

    29/176

    Colloquial Analyser

    Suffix Mapping of ending patterns with Morphographemic changes

    Pattern 1 is replaced by Pattern 2, and the word is then passed for morphographemic change

    Ex: (thambi kittey) → (thambi yidam)

  • 8/2/2019 Dr. TV. Geetha

    31/176

    Colloquial Analyser

    Suffix mapping of ending patterns with checking of one/two preceding characters

    Ending pattern p1 is replaced with pattern p2, after checking one or two preceding characters.

    Ex: (kaa nju) → (kaa yndhu)

    Check one preceding character
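    The suffix mapping idea can be sketched as a small rule table applied at the end of the word. This is a minimal sketch, not the analyser's rule list: the two rules are read off the transliterated examples (irukean, kaanju), and the exact suffix boundaries and the preceding-character condition `("a",)` are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (colloquial ending p1, formal ending p2, allowed preceding characters or None)
Rule = Tuple[str, str, Optional[Tuple[str, ...]]]

# Hypothetical rules in Latin transliteration.
RULES: List[Rule] = [
    ("kean", "kirean", None),   # plain suffix mapping
    ("nju", "yndhu", ("a",)),   # mapping that checks one preceding character
]


def to_formal(word: str, rules: List[Rule] = RULES) -> str:
    """Replace the first matching colloquial ending p1 with its formal ending p2."""
    for p1, p2, preceding in rules:
        if not word.endswith(p1):
            continue
        stem = word[: -len(p1)]
        # Apply the rule only if the preceding-character condition holds.
        if preceding is None or stem.endswith(preceding):
            return stem + p2
    return word  # no rule applies: the word is left unchanged
```

    A word whose ending matches p1 but whose preceding character does not satisfy the condition is returned unchanged, which is the point of the third rule class.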

  • 8/2/2019 Dr. TV. Geetha

    33/176

    Compound Word - Semantic Relation

    Identifying the metaphor words

    Identifying the characteristics of the components

    Identifying the comparison relation between the components

    These relations are extracted by using the Tamil grammar of compound words (thogai), the part of speech tag and semantic constraints

  • 8/2/2019 Dr. TV. Geetha

    34/176

    Compound Word - Semantic Relation

    14 relations are identified

    Ex: (black hair)

    Concept    Semantic Constraint
    Black      noun + icl>color
    hair       noun + pof>head

  • 8/2/2019 Dr. TV. Geetha

    35/176

    35Tamil Computing

  • 8/2/2019 Dr. TV. Geetha

    36/176

    POS Tagging

    What is POS tagging?

    Part-of-speech tagging is the process of assigning a part of speech (noun, verb, pronoun, preposition, adverb, adjective or other lexical class marker) to each word in a sentence.

    Tag sets for different languages

    For Tamil, a tag set was formulated through a literature survey, in view of standard tag sets for English such as the Penn Treebank (Wall Street Journal) tag set.

  • 8/2/2019 Dr. TV. Geetha

    37/176

    Noun category

    N       Noun
    NP      Noun Phrase
    NN      Noun + noun
    NNP     Noun + Noun Phrase
    IN      Interrogative noun
    INP     Interrogative noun phrase
    PN      Pronominal Noun
    VN      Verbal Noun
    VNP     Verbal Noun Phrase
    Pn      Pronoun
    PnP     Pronoun Phrase
    Nn      Nominal noun
    NnP     Nominal noun Phrase

    Verb category

    V       Verb
    VP      Verbal phrase
    Vinf    Verb Infinitive
    Vvp     Verb verbal participle
    Vrp     Verbal Relative participle
            Auxiliary verb
    FV      Finite Verb
    NFV     Negative Finite Verb

    Other category

    adv     Adverb
    adj     Adjective
    Iadj    Interrogative adjective
    Dadj    Demonstrative adjective
    Par     Particle
    Inter   Intersection
    Int     Intensifier
    CNum    Character number
    Num     Number
    DT      Date time
    SP      Sub-ordinate clause conjunction Phrase
    SCC     Sub-ordinate clause conjunction

  • 8/2/2019 Dr. TV. Geetha

    38/176

    Characteristics found by analyzing 4,70,000 words

    Tamil words take on more than one morphological suffix, going up to 13.

    The sequence of the morphological suffixes plays a role in determining the part-of-speech tag.

    79 morpheme components were identified, which combine into integrated suffixes

    Using these morpheme properties we design a POS tagger

    Analysis of morphology of words and design of morpheme components

  • 8/2/2019 Dr. TV. Geetha

    39/176

    39Tamil Computing

  • 8/2/2019 Dr. TV. Geetha

    40/176

    Chunking

    What is Chunking?

    Chunking is the task of identifying and segmenting the text into syntactically related, non-overlapping groups of words.

    Need for chunking

    One of the important preprocessing steps for all other language processing tasks

    An aid to extract the crux of information from sentences and documents

    The chunk types are ADJP, ADVP, CONJP, INTJ, NP, PP and VP.

  • 8/2/2019 Dr. TV. Geetha

    41/176

    Our Approach

    The morpheme features of words contribute in identifying the boundaries of chunks.

    Using the features, a CRF model is designed.

    Conditional Random Fields models for identifying chunking boundaries

  • 8/2/2019 Dr. TV. Geetha

    42/176

    State features

    F(word(-2), Ctag)
    F(word(-1), Ctag)
    F(word(0), Ctag)
    F(word(1), Ctag)
    F(word(2), Ctag)
    F(POS(-2), Ctag)
    F(POS(-1), Ctag)
    F(POS(0), Ctag)
    F(POS(1), Ctag)
    F(POS(2), Ctag)

    Transition features

    F(word(-2), word(-1), Ctag)
    F(word(-1), word(0), Ctag)
    F(word(0), word(1), Ctag)
    F(word(1), word(2), Ctag)
    F(POS(-2), POS(-1), Ctag)
    F(POS(-1), POS(0), Ctag)
    F(POS(0), POS(1), Ctag)
    F(POS(1), POS(2), Ctag)

    Example (columns: words, transliteration, POS, chunk)

    Transliteration     Chunk
    [intha]             B-NP
    [thakavalin]        I-NP
    [atippataiyil]      I-NP
    [pOlicAr]           B-NP
    [andthandtha]       B-NP
    [mAvatta]           I-NP
    [pOlish]            I-NP
    [cOthanaic]         I-NP
    [cAvati]            I-NP
    [vAkanac]           B-NP
    [cOthanaiyil]       I-NP
    [Itupattanar]       B-VP
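    The feature templates on this slide (single words and POS tags in a window of two positions either side, plus adjacent pairs) can be generated mechanically. The sketch below is an assumption-laden illustration: the function name, the `<PAD>` symbol and the POS tags in the usage lines are hypothetical, and the chunk tag Ctag is left to whichever CRF toolkit conjoins it with each feature.

```python
from typing import Dict, List

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def at(seq: List[str], i: int) -> str:
    """Return seq[i], or the padding symbol when i falls outside the sentence."""
    return seq[i] if 0 <= i < len(seq) else PAD


def chunk_features(words: List[str], pos: List[str], i: int) -> Dict[str, str]:
    """Window features for token i: word(k), POS(k) and adjacent pairs, k in -2..2."""
    feats: Dict[str, str] = {}
    for name, seq in (("word", words), ("POS", pos)):
        for k in range(-2, 3):  # single-position templates
            feats[f"{name}({k})"] = at(seq, i + k)
        for k in range(-2, 2):  # adjacent-pair templates
            feats[f"{name}({k}),{name}({k + 1})"] = at(seq, i + k) + "|" + at(seq, i + k + 1)
    return feats


words = ["intha", "thakavalin", "atippataiyil", "pOlicAr"]
pos = ["Dadj", "N", "N", "N"]  # hypothetical tags, for illustration only
feats = chunk_features(words, pos, 0)
```

    Each token therefore gets 18 observation features (9 over words, 9 over POS tags), matching the two columns of templates listed on the slide.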

  • 8/2/2019 Dr. TV. Geetha

    43/176

  • 8/2/2019 Dr. TV. Geetha

    44/176

    Named Entity Recognition (NER)

    Locate and classify atomic elements in text into predefined categories

    Proper names (people, organizations, locations)

    expressions of time

    monetary values

    percentages

    Need for NER

    Robust handling of proper names is essential for many applications

    Pre-processing for different classification levels

    Key part of Information Extraction system

    Information filtering

    Information linking

  • 8/2/2019 Dr. TV. Geetha

    45/176

    Indian language NER

    Use two levels of linguistic evidence to perform modelling:

    Context cues

    Attributes to identify entities.

    A standard list of attributes is maintained initially and is updated by a suitable learning algorithm.

    Attributes are thus extracted and used to identify NEs within the framework of the same system: an NER and its associated attribute extractor.

  • 8/2/2019 Dr. TV. Geetha

    46/176

    Named Entity Recognition

    Challenges

    Presence of a free word order

    Lemmatization is difficult

    Features

    Postpositions

    Case markers

    PNG marker in Verb

  • 8/2/2019 Dr. TV. Geetha

    47/176

    Named Entity Recognition

    For Persons:

    Presence of titles and honorifics like [thiru], [thalaivar]

    Presence of suffixes like [Ar], [Al] in the corresponding noun phrase.

    For Locations:

    Presence of post-positions

    Presence of adjacent words like [ndakar], [ndathi], [mAvattam]

    For Organizations:

    Presence of adjacent words like [ndiRuvanam], [thuRai]

    For Time/Date:

    Presence of adjacent words like [thEthi], [ANtu], [mAtham]
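    The adjacent-word clues can be illustrated with a simple lookup. This sketch only shows how such clue lists might be consulted; it is not the statistical system described in the talk. The grouping of [ndakar], [ndathi], [mAvattam] under a location type, the type names and the one-token window are assumptions.

```python
from typing import Dict, List, Set

# Clue words in Latin transliteration, taken from the slide; the type labels are assumed.
CLUES: Dict[str, Set[str]] = {
    "PERSON": {"thiru", "thalaivar"},
    "LOCATION": {"ndakar", "ndathi", "mAvattam"},
    "ORGANIZATION": {"ndiRuvanam", "thuRai"},
    "TIME": {"thEthi", "ANtu", "mAtham"},
}


def clue_types(tokens: List[str], i: int, window: int = 1) -> List[str]:
    """Entity types suggested for token i by clue words within `window` tokens of it."""
    lo = max(0, i - window)
    hi = min(len(tokens), i + window + 1)
    neighbours = set(tokens[lo:i] + tokens[i + 1:hi])
    return [etype for etype, clues in CLUES.items() if neighbours & clues]
```

    A token with no clue word nearby gets an empty list, so the clues act as evidence for a later classifier rather than as a decision on their own.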

  • 8/2/2019 Dr. TV. Geetha

    48/176

    [Figure: NER system architecture. Components: training data, shallow parsing, semantic parsing, statistical processing, NE table, dictionary, dictionary entries, clue extraction, verb rules.]

  • 8/2/2019 Dr. TV. Geetha

    49/176


  • 8/2/2019 Dr. TV. Geetha

    50/176

    Steps in Expectation Maximization

    Seed probability estimates

    Related words ordering

    Smoothen the seed probability

    Perform ambiguity resolution

    Maximized probability values

  • 8/2/2019 Dr. TV. Geetha

    51/176

    Modified EM algorithm

    Two problems were encountered with the traditional E-M algorithm:

    It performed only positional analysis, and a modification was required for free word order languages like Tamil

    It was syntactically oriented, and a modification was required

    The modification process, called Quantum entanglement, solves both the above problems.

    Example

  • 8/2/2019 Dr. TV. Geetha

    52/176

    En oc 0.49 . 0.01 0.01.

    .

    0.01

    .

    0.01

    0.01

    0.86

    0.01.

    Verb 0.01 0.01 0.12 0.92

    52Tamil Computing

  • 8/2/2019 Dr. TV. Geetha

    53/176

    53Tamil Computing

    Vaanavil Tamil parserVaanavil Tamil parser

  • 8/2/2019 Dr. TV. Geetha

    54/176

    Vaanavil Tamil parser

    Free word order

    Simple sentences: any number of nouns, ...

    Clausal sentences: identification using cue words or suffixes

    Nested clauses

  • 8/2/2019 Dr. TV. Geetha

    55/176

    3. Constituent Formation

    Two main components are the noun constituent and the verb constituent

    A noun constituent can contain only a noun, or can be of the following form:

    (adjective)* (adjective clause)* (adjective)* (adjective clause)* (adjective)* noun (case marker) (post position)

    Ex.

    or

    noun clause Ex.

  • 8/2/2019 Dr. TV. Geetha

    56/176

    Constituent Formation (cont.)

    Verb Constituent:

    (adverb clause)* (adverb)* verb (suffix)*

    Ex.
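    The constituent patterns read like regular expressions over part-of-speech tags, so they can be checked with an ordinary regex. The sketch below is one reading of the slides, not the parser's implementation: the tag names (ADJ, ADJC, N, CM, PP, ADVC, ADV, V, SUF) are hypothetical, and the case marker and post position are assumed to be optional.

```python
import re
from typing import List

# Tags are joined with single spaces before matching.
# Noun constituent: adjectives and adjective clauses, then a noun,
# then an optional case marker and an optional post position.
NOUN_CONSTITUENT = re.compile(
    r"(?:ADJ )*(?:ADJC )*(?:ADJ )*(?:ADJC )*(?:ADJ )*N(?: CM)?(?: PP)?"
)
# Verb constituent: (adverb clause)* (adverb)* verb (suffix)*
VERB_CONSTITUENT = re.compile(r"(?:ADVC )*(?:ADV )*V(?: SUF)*")


def is_noun_constituent(tags: List[str]) -> bool:
    """True if the whole tag sequence matches the noun constituent pattern."""
    return NOUN_CONSTITUENT.fullmatch(" ".join(tags)) is not None


def is_verb_constituent(tags: List[str]) -> bool:
    """True if the whole tag sequence matches the verb constituent pattern."""
    return VERB_CONSTITUENT.fullmatch(" ".join(tags)) is not None
```

    Using `fullmatch` means a sequence with trailing material, such as an adjective after the noun, is rejected rather than partially accepted.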

    Constituent Formation in Simple Sentences

  • 8/2/2019 Dr. TV. Geetha

    57/176

    Words are grouped based on the function they perform.

    Adjectives are grouped with their nouns.

    - Adjectives are adjacent to their noun.

    - Adverbs can occur anywhere in the sentence prior to their verb

    Ex.

  • 8/2/2019 Dr. TV. Geetha

    58/176

    Grouping of Clauses

  • 8/2/2019 Dr. TV. Geetha

    59/176

    Grouping of Clauses

    Distinguishing feature of the parser

    Clauses are identified by cue suffixes or cue phrases (Ex. Verbal ...)

    Grouping is done by position of the cues

  • 8/2/2019 Dr. TV. Geetha

    60/176

    Ex...

    .

    Noun clause :

    Adjective clause :

    Adverb clause :

    Converted minimal simple sentence: .

    Tree Generation

  • 8/2/2019 Dr. TV. Geetha

    61/176

    Position of each word in the sentence is also

    shown to take care of free word order

    First the converted minimal simple sentence is

    considered to generate the tree.

    Ex.

    The NCs and VC are expanded to generate the

    tree for the actual input sentence.

    61Tamil Computing

    W lk th h f 4 h ith l

  • 8/2/2019 Dr. TV. Geetha

    62/176

    Walk through of 4 phases with an example:

    .

    (After phase 1)

    un en e a v (con) (N) (adv) (rpl)

    (N) (N) (N) (vpl)(V).

    (After phase 2)

    a v con(N) (adv) (rpl) N N N v l(V).

    62Tamil Computing

    (After phase 3)

  • 8/2/2019 Dr. TV. Geetha

    63/176

    (After phase 3)

    ( )

    ( ) .

    ( After phase 4)

    After expanding with the three clauses

  • 8/2/2019 Dr. TV. Geetha

    64/176

    64Tamil Computing

  • 8/2/2019 Dr. TV. Geetha

    65/176

    Word Sense Disambiguation

  • 8/2/2019 Dr. TV. Geetha

    66/176

    The process of identifying which sense of a word is used in a sentence, when the word has multiple meanings

    Noun and Verb sense Disambiguation

    Bootstrapping uses Morphological Suffixes, POS, Semantic constraints and UNL relations (for verbs)

    Noun

    Verb

    Example - Noun Disambiguation

  • 8/2/2019 Dr. TV. Geetha

    67/176

    Word 1: showing use of context for noun sense disambiguation

    Example 1: aaru - river, number, get cold, heal

    Sense     POS     Example
    number    Noun    Example 1.1: pandiyan aaru padaikal kondu por thoduthaan
    river     Noun    Example 1.2: Tamilnattin periya aaru kaveri aagum

    Example - Verb Disambiguation

  • 8/2/2019 Dr. TV. Geetha

    68/176

    Word 2: showing use of context for verb sense disambiguation

    Example 2: padai - arm, disease, offer, create

    Sense     POS     Example
    offer     Verb    Example 2.1: bakthan iraivanukku pazangalai padaiththaan
    create    Verb    Example 2.2: iraivan makkalai padaithaan

  • 8/2/2019 Dr. TV. Geetha

    69/176

    69Tamil Computing

    Anaphora Resolution

  • 8/2/2019 Dr. TV. Geetha

    70/176

    The problem of resolving what a pronoun, or a noun phrase, refers to

    Approaches

    Centering Theory (Brennan et al.)

    Hobbs algorithm (Hobbs, 1978)

    Applications

    Summarization

    Question Answering

    Information Retrieval

    Anaphora Resolution in Tamil

  • 8/2/2019 Dr. TV. Geetha

    71/176

    Resolving Anaphora in Tamil Text

    Morphologically rich language

    Morphological suffixes convey most of the semantic roles played in a sentence

    Use of UNL as the basis for semantic representation

    Our Approach

  • 8/2/2019 Dr. TV. Geetha

    72/176

    Classification of Anaphora: Persons, Places and Events

    Centering Theory, modified by incorporating word level semantics (UNL semantic constraints)

    Graph based approach: sentence level semantics (UNL graphs)

    Absence of case suffixes has been handled using UNL graphs

    Plural and Event pronouns associated with multiple antecedents are tackled using UNL graphs

    Classification of Anaphor

  • 8/2/2019 Dr. TV. Geetha

    73/176

    Anaphora representing Persons

    Person Anaphora - Nouns, Noun phrases

    avan, avaL, avar, ivan, ivaL, ivar and plural pronouns avarkaL and ivarkaL

    Raju nandraaka padiththaan. avan thervu ezuthinaan

    Maanavarkal nandraaka padiththaarkal. avarkaL thervu ezuthinarkaL

    Anaphora representing Places

    Place Anaphora - Nouns, Noun phrases - athu, ithu

    Adverbs such as angu and ingu can also act as pronouns representing places

    Examples

    tiruchy tamilnaattin periya nagarangalil ondru. Ingu amman kovil uLLathu. ithil aayiram thooNkal uLLana.

    Anaphora representing Events

  • 8/2/2019 Dr. TV. Geetha

    74/176

    Event Anaphora - Verb phrases, clauses and segments of sentences

    Pronouns such as athu, ithu with dative case, accusative case represent events

    swami aanmeega soRppozhivu nikazhthinaar. athai kaaNa makkaL koodiyirunthanar.

    Ambiguous Pronouns

  • 8/2/2019 Dr. TV. Geetha

    75/176

    Pronouns such as athu, ithu can represent places or events

    Higher level of semantics and verb semantics is needed

    Examples

    maduraiyil meenatchi kovil ullathu. Ithil aayiram thoonkal uLLana.

    madurayil ulla meenatchi kovilil aanmeeka sorpozhivu nadaipeRRathu. Ithil eeraLamaana makkaL pangeRRanar.

    Semantics Integrated Centering

  • 8/2/2019 Dr. TV. Geetha

    76/176

    Word level semantics: UNL Semantic Constraints

    Anaphora classification

    Filter out the non-anaphoric expressions

    Filter out the non-referring expressions

    Plural pronouns have been tackled to a certain extent

    Anaphora Resolution using UNL

  • 8/2/2019 Dr. TV. Geetha

    77/176

    Plural Pronouns having multiple concepts as antecedents

    Event Pronouns

    Two components

    Use of UNL relations to extract the concepts for anaphora resolution

    Co-ordinating UNL Relations

    Sub-ordinating UNL Relations

    Use of UNL subgraphs for anaphora resolution

    Use of UNL Relations to extract the concepts

  • 8/2/2019 Dr. TV. Geetha

    78/176

    Co-ordinating UNL Relations

    Relations obtained for referring expressions exactly match the relations obtained for the anaphor

    Sub-ordinating UNL Relations

    Relations obtained for the anaphor depend on the relations of referring expressions

    Rules (relation pairs): (agt, obj), (ben, agt), (plc, aoj)

    Co-ordinating UNL Relations

  • 8/2/2019 Dr. TV. Geetha

    79/176

    raju ramuvai mirattinaan. avan payanthaan

    [Figure: UNL graphs for the two sentences. Mirattu, threaten (agt>thing, obj>thing) has agt Raju and obj Ramu (iof>person); Payam, scare (obj>thing) has obj Avan, he (pronoun).]
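    For the co-ordinating case, the pronoun is resolved to the concept that holds the same UNL relation in the preceding sentence. A minimal sketch of that lookup follows; the function name and the dictionary encoding of a verb's arguments are assumptions for illustration, and sub-ordinating rules are not covered.

```python
from typing import Dict, Optional


def resolve_coordinating(antecedent_args: Dict[str, str], anaphor_relation: str) -> Optional[str]:
    """Return the concept that fills the same UNL relation as the anaphor, if any."""
    return antecedent_args.get(anaphor_relation)


# "raju ramuvai mirattinaan": threaten(agt=Raju, obj=Ramu)
threaten = {"agt": "Raju", "obj": "Ramu"}
# "avan payanthaan": the pronoun avan is the obj of scare
resolved = resolve_coordinating(threaten, "obj")
```

    Here the pronoun's obj relation matches the obj of the first sentence, so avan resolves to Ramu rather than to the agent Raju.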

    Subordinating UNL Relations

  • 8/2/2019 Dr. TV. Geetha

    80/176

    raamalingaththai annan sabapathi kaNdiththaar. Anaal avar avarukku kattuppadavillai.

    [Figure: UNL graphs for the two sentences. The first verb concept (agt>thing, obj>thing) has agt Annan Sabapathi and obj Ramalingam (icl>person); kattuppadu, abide (agt>thing, obj>thing) has agt and ben arguments that are both the pronoun Avar, he (iof>person).]

    Event Pronouns - Example

  • 8/2/2019 Dr. TV. Geetha

    81/176

    [Figure: UNL graph for the event pronoun example. Nodes: Happen (icl>action), agree, Ramalingam (iof>person), Spiritual (aoj>thing), Speech (icl>talk), Avar, Athu(kku). Relations shown: agt, to, aoj, mod.]

  • 8/2/2019 Dr. TV. Geetha

    82/176

    Semantic Interpretation

  • 8/2/2019 Dr. TV. Geetha

    83/176

    ,

    concepts that the system can understand

    The process of mapping a syntactically analysed text of natural language to a representation of its meaning

    Semantic Interpretation - Aspects

    Word meaning & Word Sense Disambiguation

    Lexical Disambiguation

    Structural Disambiguation

    Semantic Relations

    Issues

    Coreference and Anaphora

    Lexical Semantics

    Logical Semantics

    Applications of Semantic Interpretation

  • 8/2/2019 Dr. TV. Geetha

    84/176

    Information Extraction

    extracting meaningful templates from text

    Summarization

    important inter-relations between concepts

    Question Answering

    Extracting semantically similar sentences that are answers to questions

    Multilingual Generation & Machine Translation

    Intermediate semantic representation

    Proposed Work

  • 8/2/2019 Dr. TV. Geetha

    85/176

    Semantic Interpretation of Tamil Text

    Use of UNL as the basis for semantic representation

    Use of UNL based information for NLP processing

    Use of UNL graph for Summarization and Question Answering

    Semantic Relation (UNL) Extraction

  • 8/2/2019 Dr. TV. Geetha

    86/176

    Morpho-Semantic Rule Based Approach

    Existing Approaches

    Most UNL Enconverters use a syntactic parser or morpho-syntactic features

    Use of rich Morphological features of Tamil for semantic relation extraction

    Design of Rules based on Morpho-Semantic

    Use of semantic constraint information from

    86Tamil Computing

  • 8/2/2019 Dr. TV. Geetha

    87/176

    Enconversion Process

    Identify possible UNL relations of a word Wi

    Disambiguate the relations, if multiple UNL relations are assigned for a word

    Identify the connected concepts with the word Wi

    87Tamil Computing

  • 8/2/2019 Dr. TV. Geetha

    88/176

    Morphology - Case suffixes associated with the word

    Connective - Natural Language word

    Co-occurrence

    aamanaa seyyappa a u

    POS - Part Of Speech tag of the word

    Noun, Verb, Adjective, Adverb

    Semantics

    icl>person, iof>place, icl>time etc.
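    A morpho-semantic rule can be pictured as a case suffix paired with an optional semantic constraint that together propose a UNL relation for the word. The sketch below is hypothetical throughout: the rule table, the transliterated suffixes and the relation assignments are illustrative assumptions, not the rules used in the enconverter.

```python
from typing import List, Optional, Tuple

# (case suffix, required semantic constraint or None, proposed UNL relation)
MorphoSemanticRule = Tuple[str, Optional[str], str]

# Hypothetical rules in Latin transliteration.
RULES: List[MorphoSemanticRule] = [
    ("ukku", "icl>person", "ben"),
    ("il", "iof>place", "plc"),
    ("il", "icl>time", "tim"),
    ("ai", None, "obj"),
]


def candidate_relations(word: str, constraint: str) -> List[str]:
    """All UNL relations proposed for a word by its suffix and semantic constraint."""
    return [
        relation
        for suffix, required, relation in RULES
        if word.endswith(suffix) and required in (None, constraint)
    ]
```

    When more than one relation is returned, a later disambiguation step would have to choose among them, as the enconversion steps describe.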

    88Tamil Computing

  • 8/2/2019 Dr. TV. Geetha

    89/176

    Construction of UNL Graph using Two Passes

    89Tamil Computing

  • 8/2/2019 Dr. TV. Geetha

    90/176

    Pass 1 Pass 2

    90Tamil Computing

    Our Approach - Bootstrapping

  • 8/2/2019 Dr. TV. Geetha

    91/176

    Pattern representation Generic pattern to tackle

    Features

    POS

    Matching

    Partial matching

    Ignore tuples which do not take part in identifying semantic relations

  • 8/2/2019 Dr. TV. Geetha

    92/176

    Features used for Probability estimation

    Morphological Suffix

    POS

    Semantic Constraints

    Starting and Ending symbols

    Relation between concept pairs can occur

    Semantic Similarity based on UNL ontology

    Feature tagged corpus tagged using rule-based approach

  • 8/2/2019 Dr. TV. Geetha

    93/176

    93Tamil Computing

    Question Classification

  • 8/2/2019 Dr. TV. Geetha

    94/176

    Need for QC & Answer types

    Identify the question type and then map it to an expected answer type

    Ex: What is the biggest city in the United States?

    Question Type: Q_LOCATION_CITY

    Affects the overall accuracy of a question answering system

    Morpheme based CRF approach to Question Classification and Expected Answer Type Detection

  • 8/2/2019 Dr. TV. Geetha

    95/176

    Factoid type

    Who - ? [Who is India's prime minister?]

    When - ? [When did India become an independent country?]

    Where - ? [Where was Gandhiji born?]

    Which - ?

    Abbreviation - ... ? [What is the expansion of IAS?]

    How - ? [How does DC generator operate?]

    Who - ? [Who is Manmohan Singh?]

    Define - [Define Kirchoff's Law]

    List type

    Enumerate - [Enumerate districts in Tamil Nadu]

    List - [List out states in India]

    95Tamil Computing

    Question Classification

    ..., REASON, OTHER

    NUM: AGE, AREA, CODE, COUNT, DISTANCE, FREQUENCY, ..., PRICE, RANGE, SPEED, TELCODE, TEMPERATURE, WEIGHT, LIST, OTHER

    HUM: ALIAS, DESCRIPTION, ORGANIZATION, PERSON, LIST, OTHER

    OBJ: ANIMAL, CITY, COLOR, CURRENCY, ENTERTAIN, FOOD, INSTRUMENT, LANGUAGE, PLANT, RELIGION, SUBSTANCE, VEHICLE, LIST, OTHER

    LOC: ADDRESS, CITY, CONTINENT, COUNTRY, ISLAND, LAKE, MOUNTAIN, OCEAN, PLANET, PROVINCE, RIVER, LIST, OTHER

    TIME: DAY, MONTH, RANGE, TIME, YEAR, LIST, OTHER

    By TREC

  • 8/2/2019 Dr. TV. Geetha

    97/176

    Factoid Type Question Answering

    97Tamil Computing

    Our Approach

    Bag of key words matching.

    In the extracted passage, the terms that are in the question are removed.

    The remaining concept or entity terms may be answers.

    Person - Named Entity, Possible case marker, Question word case marker

    Location - Considering possible case markers

    Time - Temporal word database, number range

    Quantity - Possible words in database

    () - After question term as definition term -
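    The bag-of-keywords step can be sketched as plain set operations on tokens. This is an illustration only: the function names are hypothetical, tokens are assumed to be already segmented, and the English example stands in for Tamil text. The type-specific filters (case markers, temporal word database) are not modelled.

```python
from typing import List


def keyword_overlap(question: List[str], passage: List[str]) -> int:
    """Number of distinct question terms that also occur in the passage."""
    return len({t.lower() for t in question} & {t.lower() for t in passage})


def candidate_answers(question: List[str], passage: List[str]) -> List[str]:
    """Passage terms left after removing every term that appears in the question."""
    question_terms = {t.lower() for t in question}
    return [t for t in passage if t.lower() not in question_terms]


question = ["Where", "was", "Gandhiji", "born"]
passage = ["Gandhiji", "was", "born", "in", "Porbandar"]
candidates = candidate_answers(question, passage)
```

    The overlap count can rank passages, and the remaining terms are then filtered by the expected answer type, for example keeping only terms that pass the location checks.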

    Sentence to extract predicate relation

    [Figure: predicate relation extraction pipeline. Components: sentences, Wordnet, preprocessing, predicate extraction, predicate relation extraction A(x, y), rule dictionary, tagged training document, predicate rule learning.]

    The relation graph gives the semantic relation

    This semantic information helps filter out the required answer part

  • 8/2/2019 Dr. TV. Geetha

    100/176

    Definitional Type Question Answering

    Definitional QA Process

    Due to the free word order nature of Tamil, the ranked sentences will not be the precise answer to the question.

    So the definition terms from the sentences are extracted using some short patterns (K. Soo Han, 2007; Jinxi Xu, 2003), as given below.

    lpiRanthAr

    Am Andu

    Am mAtham

    ivarathu thanthaiyAr < father Name>, thAyAr

    Am Andu maRainthAr

    The leaf nodes of the answer graph give the details presented in the sentence.

    The definition answer has been created using the definition templates.

    Use of statistically processed seed information for classification and scoring of sentences for inclusion in the answer graph representing the definitional answer to who questions

    [Figure: definitional QA pipeline. Components: target term (name of person), web document retrieval, sentence classification, statistical processing, sentence tagging based on seed information, definitional sentence corpus, seed information, sentence ranking, term probability, definition term extractor, definition.]

    Sentence Classification

    Term weight: $TW(t) = \frac{n_d}{N}$, where n_d = no. of documents in which the term t occurred and N = total number of documents.

    Classifier: linear function $w^{T}x + b$, where w is the weights vector of features and b is the intercept.

    S. No   Category     Features
    1       Birth        (piRappu), (piRanthAr), (thOnRiNar)
    2       Parent       (peRROr), (thanthaiyAr)
    3       Education    (kalvi), (padippu), (pa
    4       Work         (paNi), (vElai)
    5       [lost]       viruthu, (parisu)
    6       Death        (iRanthAr), (maRainthAr)
    7       General      (pErasiriyar), (vinjAni), (arasiyalvAthi)
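The surviving definitions on this slide (n_d, N, a weights vector w and an intercept b) suggest a document-frequency term weight combined with a linear classifier. The sketch below is one reading of that, not the authors' implementation: the function names, the sample documents and the numeric weights are all assumptions.

```python
def term_weight(term, documents):
    """TW(t) = n_d / N: fraction of documents that contain the term."""
    n_d = sum(1 for doc in documents if term in doc)
    return n_d / len(documents)


def linear_score(features, weights, intercept):
    """Linear decision value w . x + b for one sentence's feature vector."""
    return sum(w * x for w, x in zip(weights, features)) + intercept


# Hypothetical corpus: each document is the set of feature terms it contains.
docs = [
    {"piRanthAr", "thanthaiyAr"},
    {"piRanthAr", "kalvi"},
    {"paNi"},
    {"piRanthAr"},
]

tw = term_weight("piRanthAr", docs)                   # 3 of 4 documents
score = linear_score([tw, 1.0], [2.0, -0.5], 0.25)    # illustrative weights
print(tw, score)
```

A sentence would be assigned to a category (Birth, Parent, and so on) when its score for that category's classifier is highest; that decision rule is also an assumption.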

    [Slides 104-107: Tamil example text lost in extraction; only the numbers 1931, 15, 1981, 1990 and 1997 survive]

    Tamil Document Summarization Using Semantic Graph Method

    Our Work

    Capturing semantic features of the.

    Identifying key concepts and relations for.

    Using machine learning model to identify

    semantic graph.

    Detailed Design

    [Figure: detailed design. Linguistic analysis: syntactic and semantic analysis, logical form parsing, co-reference resolution for named entities. Semantic graph generation: identification of named entities, coreference resolution, semantic normalization with WordNet, construction of semantic graph. Sub graph identification: feature set of linguistic and semantic graph attributes and document discourse structure, learning algorithm SVM. Extracting summary sentences.]

    After training, the learned model is used to predict the important nodes of the given document's semantic graph.

    Identification of Sub Graph

    The sub graph of the large semantic graph is generated

    Extraction of Summary Sentences

    graph are extracted from the input document.
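The last step, pulling summary sentences back out of the document once the important nodes are known, can be sketched as below. It assumes the classifier's output is already available as a set of node labels; `extract_summary`, the `min_hits` threshold and the sample data are hypothetical.

```python
def extract_summary(sentences, sentence_nodes, important_nodes, min_hits=1):
    """Keep sentences whose graph nodes overlap the predicted sub graph."""
    summary = []
    for sentence, nodes in zip(sentences, sentence_nodes):
        if len(set(nodes) & important_nodes) >= min_hits:
            summary.append(sentence)
    return summary


# Hypothetical document: three sentences and the concepts each one contributes.
sentences = ["s1", "s2", "s3"]
sentence_nodes = [["river", "flood"], ["minister", "visit"], ["flood", "relief"]]

# Nodes the trained model marked as important (assumed output).
important = {"flood", "relief"}

summary = extract_summary(sentences, sentence_nodes, important)
print(summary)
```

Sentences are returned in document order, so the summary keeps the original flow of the text.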

    Sample Input Document

    Morphological Analyzer Output

    Graph Generation

    Identification of Sub-Graph

    Extraction of Summary Sentences

    Tamil Summary Generation for a Cricket Match

    To propose a framework for automatic analysis and summary

    generation for a cricket match in Tamil, with the scorecard of.

    The framework proposes a method to evaluate the

    interestingness of a cricket match.

    The framework proposes a customization model for the

    summary.

    The framework also proposes methods for evaluating the humanness of the generated summary.

    Data Mining and Analytics

    Modified version of Apriori algorithm is used to find the

    association rules from the feature vectors.

    performed, CoV is plotted against average to give an idea

    about how consistent the player is.
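A small sketch of the consistency measure, reading CoV as the coefficient of variation (standard deviation divided by mean); that expansion of the abbreviation and the sample innings scores are assumptions.

```python
from statistics import mean, pstdev


def coefficient_of_variation(scores):
    """Population standard deviation divided by the mean of the scores."""
    average = mean(scores)
    if average == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(scores) / average


# Hypothetical runs scored by two players over three innings.
steady = coefficient_of_variation([50, 50, 50])
erratic = coefficient_of_variation([40, 50, 60])
print(steady, erratic)
```

Both players average 50, but the lower CoV marks the first one as the more consistent, which is the kind of reading the CoV-versus-average plot supports.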

    The interestingness of the match is calculated based on the weighted average of the scores assigned to factors such as individual records made, high run rate, series state, relative position in international ranking, reaction in social networks, etc.
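The weighted average can be written out as below. The factor names follow the list above, while the individual scores and weights are made-up values for illustration.

```python
def interestingness(scores, weights):
    """Weighted average of the factor scores."""
    total_weight = sum(weights[factor] for factor in scores)
    weighted_sum = sum(scores[factor] * weights[factor] for factor in scores)
    return weighted_sum / total_weight


# Hypothetical factor scores in [0, 1] and hypothetical weights.
scores = {
    "individual_records": 0.8,
    "run_rate": 0.6,
    "series_state": 0.5,
    "ranking_position": 0.4,
    "social_reaction": 0.9,
}
weights = {
    "individual_records": 2,
    "run_rate": 1,
    "series_state": 1,
    "ranking_position": 1,
    "social_reaction": 1,
}

match_score = interestingness(scores, weights)
print(round(match_score, 3))
```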

    Sentence Generation

    The sentence which is the most apt to the current event under consideration is selected.

    The vocabulary used in the sentence and the depth to which an event is discussed are also varied based on the expert level of the user.

    generator along with the desired case endings and the generated

    variants are added to the sentences.

    The system uses the morphological generator developed at TaCoLa


    Clustering for news event detection

    Why UNL context Cluster?

    Identifying semantic coherence between two sentences is based on overlapping of terms between the sentences.

    In newspaper articles, the term overlap between two sentences is minimal.

    Each sentence can have more than one

    -

    sentences.

    Tamil Computing 124

    UNL based context clustering for news event detection - Event Analysis

    In natural language text, event analysis involves discovering the portion of text in a sentence that describes an event, the participants of the event, the actual event occurrence and the time of the event.

    All these event-specific properties are obtained from the UNL (Universal Networking Language) representation.

    These properties help in separating the event sentences with

    - .

    Contribution

    the concept distance score as well as the TF/IDF score.

  • 8/2/2019 Dr. TV. Geetha

    126/176

    Snapshot for Tamil News Event Search

    Lyric Generation

    Lyric Mining

    Processing was carried out using 2,000 lyrics

    Word level analysis

    Concept co-occurrence analysis

    Pleasantness score

    This analysis has been mainly used in lyric generation and in computing the freshness score for lyrics.

    Lyric Mining

    Word Level Analysis

    The frequency of words is used to associate a popularity score with each word.

    The popularity score of a word has been identified from lyrics.

    In lyrics, the words are attached with suffixes.

    Root words - determine their frequency count.
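The word-level analysis can be sketched as a root-word frequency count. The slide does not say how frequency is turned into a popularity score, so the normalisation by the most frequent root is an assumption, and the `ROOTS` lookup table is a toy stand-in for the morphological analyser.

```python
from collections import Counter

# Hypothetical stand-in for the morphological analyser: inflected form -> root.
ROOTS = {"nilavil": "nilavu", "nilavai": "nilavu", "malarkaL": "malar"}


def root_of(word):
    """Return the root word, or the word itself when no entry is known."""
    return ROOTS.get(word, word)


def popularity_scores(lyrics):
    """Count root words over all lyrics and scale by the most frequent root."""
    counts = Counter(root_of(word) for lyric in lyrics for word in lyric.split())
    if not counts:
        return {}
    top = max(counts.values())
    return {root: count / top for root, count in counts.items()}


# Hypothetical romanised lyric lines.
lyrics = ["nilavil malar", "malarkaL nilavai", "nilavu"]
popularity = popularity_scores(lyrics)
print(popularity)
```

Counting roots rather than surface forms is what lets suffixed variants of the same word contribute to a single score.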

    Lyric Mining

    Word Level Analysis - Results

    A lyric corpus of two thousand songs was analysed for word, rhyme and co-occurrence concept usage.

    [Chart: WORDS USAGE, list of top 5 usage words in lyrics; word labels lost, surviving counts 1153, 793, 965, 857]

    Lyric Mining

    Rhyme Level Analysis

    Adapted Apriori Algorithm

    Frequency count of rhyme, alliteration and end rhyme pairs of Tamil lyrics

    [Charts: a) top usage word rhyme (EDHUGAI USAGE), b) top 5 usage word alliteration; word labels lost, surviving counts 2291, 3338, 2028, 1973, 2947, 1952, 2480]

    Lyric Mining

    Concept Co-occurrence Analysis

    from a lyric corpus Agaraadhi, an on-

    Cancelling the ambiguity and polysemy of words to improve the accuracy of the entire system.

    Example: The word which has the concept , , , ,, ,

    Lyric Mining

    Identify the pleasantness of a word based on 5 models

    3 models: language independent

    2 models: language dependent

    In all the models, first the given grapheme

    word is converted into phoneme form using

    . Models

    Language Dependent Model I

    Language Dependent Model II

    Manner of articulation based model

    Manner and place of articulation based model

    Lyric Mining

    Meaning based Model

    Maintain the pleasant and unpleasant word list

    Calculate the frequency of each phoneme in the pleasant and unpleasant word lists
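A minimal sketch of the meaning based model, assuming words are already converted to phoneme sequences. The slide only says that phoneme frequencies are calculated from the two lists; scoring a word by the average difference between the two relative frequencies is an assumption, as are the sample word lists.

```python
from collections import Counter


def phoneme_frequencies(words):
    """Relative frequency of each phoneme over a list of phoneme sequences."""
    counts = Counter(phoneme for word in words for phoneme in word)
    total = sum(counts.values())
    return {phoneme: count / total for phoneme, count in counts.items()}


def pleasantness(word, pleasant_freq, unpleasant_freq):
    """Average of (pleasant frequency - unpleasant frequency) over the phonemes."""
    differences = [
        pleasant_freq.get(p, 0.0) - unpleasant_freq.get(p, 0.0) for p in word
    ]
    return sum(differences) / len(differences)


# Hypothetical word lists, each word given as a list of romanised phonemes.
pleasant_words = [["m", "a", "l", "a", "r"], ["n", "i", "l", "a"]]
unpleasant_words = [["k", "a", "t", "t", "u"]]

pleasant_freq = phoneme_frequencies(pleasant_words)
unpleasant_freq = phoneme_frequencies(unpleasant_words)

soft = pleasantness(["m", "a", "l"], pleasant_freq, unpleasant_freq)
harsh = pleasantness(["k", "t", "u"], pleasant_freq, unpleasant_freq)
print(soft, harsh)
```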

    Language Dependent Model I

    Judge the pleasantness based on the Vallinam, Mellinam, Idaiyinam classification, Maathirai and kurukkams except

    Language Dependent Model II

    Lyric Mining

    Manner of articulation based model

    Category        Manner of Articulation        Phoneme
    Greater Rough   Retroflex, Trill              , , , , , ,
    Intermediate    Semivowels, Approximants      ,,,,
    Soft            Nasal                         ,,, ,,

    Lyric Mining

    Manner and place of articulation based model

    For the place of articulation score, categories which arise from the parts near the oral cavity are considered pleasanter than those which go deeper.

    Taking manner of articulation into consideration, nasals are given the highest sweetness score, followed by laterals, fricatives, stops and trills.
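The manner-of-articulation ordering can be turned into a score as sketched below. Only the ordering (nasals, laterals, fricatives, stops, trills) comes from the slide; the numeric values, the romanised phoneme-to-manner table and the averaging are assumptions, and the place-of-articulation component is left out.

```python
# Ordering from the slide; the numbers themselves are illustrative.
MANNER_SCORE = {"nasal": 5, "lateral": 4, "fricative": 3, "stop": 2, "trill": 1}

# Hypothetical romanised consonant table; vowels are not scored here.
PHONEME_MANNER = {
    "m": "nasal",
    "n": "nasal",
    "l": "lateral",
    "s": "fricative",
    "k": "stop",
    "t": "stop",
    "p": "stop",
    "r": "trill",
}


def sweetness(phonemes):
    """Average manner score over the phonemes that appear in the table."""
    scored = [
        MANNER_SCORE[PHONEME_MANNER[p]] for p in phonemes if p in PHONEME_MANNER
    ]
    return sum(scored) / len(scored) if scored else 0.0


malar = sweetness(["m", "a", "l", "a", "r"])   # nasal, lateral, trill
kattu = sweetness(["k", "a", "t", "t", "u"])   # three stops
print(malar, kattu)
```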

    COREE - The Concept based Search Engine

    Components of a Search Engine

    Crawler (or Worm or Spider)

    collects pages

    checks for page changes

    Indexer

    constructs a sophisticated file structure to enable fast page retrieval

    Searcher

    searches the indexed information that satisfies user queries

    ranks output

    Search Engine Architecture

    The Concept based Search

    Indexes concepts instead of words

    Indexes concepts and relations between concepts

    In this work

    Representing the document: use of UNL (Universal Networking Language) as intermediate structure

    consists of concepts and relations

    Three indices: Concept-Relation-Concept, Concept-Relation, Concept

    Query converted into UNL representation

    Searching and ranking based on concepts &

    COREE - Architecture

    [Figure: COREE architecture. Input processing: IL query, morphological processing, parsed query, query expansion, query translation (UNL encoding). Document processing: WWW, focused web crawler using semantic approach, selection of documents, Tamil to UNL converter with light weight WSD, NER, MWE list and UW list, UNL expressions / UNL graph for indexing. Indexing: UNL index, thesaurus. Searching: matching with UNL expressions, UNL based ranking, ranked list of documents.]

    Modules of COREE

    Focussed Crawling

    UNL based Document Processing

    Sentence Extraction

    Enconversion

    UNL based Input Processing

    Query Expansion

    Query Translation

    UNL based Searching

    Matching and Ranking

    UNL based Output Processing

    Information Extraction

    Summary Generation

    Document Processing

    [Figure: document processing. Tamil document, extraction of sentences (TF based), extraction of components with WSD and NER, enconversion using the Tamil UW list and Tamil enconversion rules, UNL expression / graph (multi-list)]

    UNL Lists

    UW List: Universal Word List

    UW concepts are

    iof>city

    the semantics of the concepts (iof>city)

    MULTILIST

    [Figure: multilist structure with a head node, concept nodes and "to concept" nodes]

    Example

    [Figure: example entries labelled Concept Only, Concept - Relation and Concept - Relation - Concept; Tamil labels lost in extraction]

    The Index Structure

    Input - set of UNL graphs as a MultiList data structure

    Output - UNL indices stored in three Binary Search Trees.

    an inverted list on the indices.

    The indices are categorized into three different types to aid retrieval of semantically relevant documents

    CRC (Concept-Relation-Concept) indices

    CR (Concept-Relation) indices

    C (Concept only) indices
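The three index types can be sketched as inverted lists from index keys to document identifiers. This is a simplification: plain dictionaries stand in for the binary search trees mentioned on the slide, and the exact key layout (in particular which concept a CR entry is keyed on) and the sample triples are assumptions.

```python
from collections import defaultdict


def build_indices(doc_graphs):
    """Build CRC, CR and C inverted lists from (concept, relation, concept) triples."""
    crc = defaultdict(set)
    cr = defaultdict(set)
    c = defaultdict(set)
    for doc_id, triples in doc_graphs.items():
        for source, relation, target in triples:
            crc[(source, relation, target)].add(doc_id)
            cr[(source, relation)].add(doc_id)
            c[source].add(doc_id)
            c[target].add(doc_id)
    return crc, cr, c


# Hypothetical UNL graphs, one list of triples per document.
doc_graphs = {
    "d1": [("lecture(icl>action)", "pos", "vivekanandar(iof>person)")],
    "d2": [("lecture(icl>action)", "plc", "chennai(iof>city)")],
}

crc_index, cr_index, c_index = build_indices(doc_graphs)
print(sorted(c_index["lecture(icl>action)"]))
```

Keeping all three levels is what later allows a query to fall back from an exact triple match to a looser concept-only match.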

    Query Translation

    [s][w]; vivekanandar; iof>person; Entity; 1

    pos; lecture; icl>action; Noun; 2[/w][r]

    [r]2 pos 1[/r][/s]

    152Tamil Computing

    Query Expansion

    [Figure: query expansion. Input query, morphological processing with NER and WSD, parsed query, query expansion (index based / verb-noun pairs) using the index table and domain specific noun-verb pairs, expanded query. Other tools to be integrated.]

    Query Expansion

    [Table: Query Word, Query Word With Relation, Expanded Word; Tamil entries lost in extraction, every surviving entry carries the pos relation]

    Search

    [Figure: search. The expanded query is converted into a UNL query and matched against the UNL based index (Concept, Concept-Relation and Concept-Relation-Concept) at various levels: exact match (CRC), Concept-Relation match, Concept only match. UNL based ranking produces the ranked set of documents (output).]
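The multi-level matching can be sketched as a fallback from the most specific index to the least specific one. The index contents are hypothetical, and ordering documents purely by the level at which they first match is a simplification of the UNL based ranking.

```python
def search(query, crc, cr, c):
    """Return (document, level) pairs: CRC matches first, then CR, then C."""
    source, relation, target = query
    tiers = [
        ("CRC", crc.get((source, relation, target), set())),
        ("CR", cr.get((source, relation), set())),
        ("C", c.get(source, set()) | c.get(target, set())),
    ]
    ranked, seen = [], set()
    for level, docs in tiers:
        # A document is reported once, at the strictest level it matches.
        for doc in sorted(docs - seen):
            ranked.append((doc, level))
            seen.add(doc)
    return ranked


# Hypothetical indices: key -> set of document ids.
crc = {("lecture", "pos", "vivekanandar"): {"d1"}}
cr = {("lecture", "pos"): {"d1", "d2"}}
c = {"lecture": {"d1", "d2", "d3"}, "vivekanandar": {"d1", "d4"}}

results = search(("lecture", "pos", "vivekanandar"), crc, cr, c)
print(results)
```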

    Example Result Snap Shot in 5 Window

    Actual query term match

    Actual query term + concept match

    Conceptual results

    Single term match

    Expanded term match results

    DICTIONARY FRAMEWORK

    OBJECTIVES

    indexing and retrieving Tamil words, their

    Framework to incorporate various unique

    information to the user regarding the word that

    AGARAADHI FRAMEWORK

    Agaraadhi - Features

    Features                        Features
    1. Morphological Analyzer       1. Lyric related
    2. Morphological Generator      2. Kural related
    3. Spelling suggestion          3.
    4. Equivalent word              4. Pleasantness score
    5. Picture Dictionary           5. Bharathiyar Songs

    Agaraadhi: Meaning for the Word pookkal (example for case ending word)

    Kuralagam - Concept Relation based Search Engine for Thirukkural

    Objectives

    Thirukkural based on UNL Framework.

    Searching with keywords in kurals and interpretations

    Concept based search based on CoReX conceptual

    Bilingual search: English and Tamil

    Kuralagam Framework

    Online Processing

    Search and Ranking fetches the Thirukkural number and its details.

    Thirukkurals for a given query are fetched using the two types of concept relation indices, namely CRC and C.

    The query concept is expanded using related CRC indices pointing to the query concept.

    the query not possible with key word Thirukkural search engines.

    The ranking is based on

    priority to the indices in the order CRC > C

    usage score

    frequency of occurrence of the query concept
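The three ranking criteria can be combined as a sort key, as sketched below. Applying them lexicographically (index priority first, then usage score, then frequency) is an assumption, since the slide lists the criteria without saying how they are combined; the kural numbers and scores are made up.

```python
INDEX_PRIORITY = {"CRC": 0, "C": 1}  # CRC matches rank ahead of C matches


def rank_kurals(hits):
    """Sort hits by index priority, then usage score, then concept frequency."""
    return sorted(
        hits,
        key=lambda hit: (
            INDEX_PRIORITY[hit["index"]],
            -hit["usage"],
            -hit["frequency"],
        ),
    )


# Hypothetical search hits for one query concept.
hits = [
    {"kural": 10, "index": "C", "usage": 0.9, "frequency": 3},
    {"kural": 20, "index": "CRC", "usage": 0.4, "frequency": 1},
    {"kural": 30, "index": "CRC", "usage": 0.7, "frequency": 2},
    {"kural": 40, "index": "CRC", "usage": 0.7, "frequency": 5},
]

ranked = rank_kurals(hits)
print([hit["kural"] for hit in ranked])
```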

    Kuralagam: results for the query word paNam

    Kuralagam: conceptual results for the query word paNam

    Kuralagam: meaning for a particular kural

    Tamil Word Game: Miruginajumbo (Jumble words)

    Tamil Word Game: Kattaboman (Scramble words)

    Tamil Word Game: Thookku Thookki (Hang man)

    COREE

    Agaraadhi

    Tamil Language Based Games