java_lect_35.pdf

25 August, 2005 "NLP-AI" 1 Language Modeling for Information Retrieval Ananthakrishnan R (anand@cse)

Transcript of java_lect_35.pdf

  • 1 Language Modeling for Information Retrieval

    Ananthakrishnan R (anand@cse)

    25 August, 2005, "NLP-AI"

  • 2 Outline of the Seminar

    What is Language Modeling?

    Language Modeling in NLP: Speech Recognition, Statistical MT, Classification, IR

    The Language Modeling approach to IR: the conceptual model, smoothing, semantic smoothing

    IR as Statistical Machine Translation

  • 3 Language Modeling

    A language model is a probabilistic mechanism for generating text.

    Language models estimate the probability distribution of various natural language phenomena: sentences, utterances, queries.

  • 4 Language Modeling Techniques

    N-grams, class-based N-grams, probabilistic CFGs, decision trees

    Claude Shannon was probably the first language modeler: how well can we predict or compress natural language text through simple n-gram models?
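    As a rough illustration of the n-gram technique listed above, the following sketch estimates a bigram model by maximum likelihood and scores two sentences. The toy corpus, the function names, and the `<s>` / `</s>` boundary markers are assumptions made for this example; they do not come from the seminar.

```python
from collections import Counter


def train_bigram(sentences):
    """Return MLE bigram probabilities p(w2 | w1) from tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    for tokens in sentences:
        # Pad with assumed sentence-boundary markers.
        padded = ["<s>"] + tokens + ["</s>"]
        for w1, w2 in zip(padded, padded[1:]):
            context_counts[w1] += 1
            bigram_counts[(w1, w2)] += 1
    return {bg: c / context_counts[bg[0]] for bg, c in bigram_counts.items()}


def sentence_probability(model, tokens):
    """Multiply bigram probabilities; unseen bigrams get probability 0."""
    padded = ["<s>"] + tokens + ["</s>"]
    prob = 1.0
    for w1, w2 in zip(padded, padded[1:]):
        prob *= model.get((w1, w2), 0.0)
    return prob


# Hypothetical two-sentence corpus.
corpus = [
    "australia lost the ashes".split(),
    "england won the ashes".split(),
]
model = train_bigram(corpus)
p_seen = sentence_probability(model, "england won the ashes".split())
p_unseen = sentence_probability(model, "england lost the match".split())
print(p_seen, p_unseen)
```

    A sentence containing any bigram absent from the training data receives probability zero under this unsmoothed estimate.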

  • 5 Language Modeling for Other NLP Tasks

  • 6 Applications

    Speech Recognition, Statistical Machine Translation, Classification, Spell Checking

    Language models play the role of either the prior or the likelihood

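  • 7 Language Models in Speech Recognition

    [Figure: Source -> word $w$ -> Noisy Channel -> noisy word $O$ -> Decoder -> $\hat{w}$]

    $$\hat{w} = \arg\max_{w \in \text{Vocabulary}} p(w \mid O) = \arg\max_{w \in \text{Vocabulary}} p(O \mid w)\, p(w)$$

    $p(w)$: source model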
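    The decoding rule above can be sketched in a few lines. All probabilities below are made-up numbers, and the names `decode`, `channel_model`, and `source_model` are hypothetical; the point is only that the source model p(w) can overturn the channel model's preference.

```python
def decode(observation, vocabulary, channel_model, source_model):
    """Return argmax over w of p(O | w) * p(w)."""
    return max(
        vocabulary,
        key=lambda w: channel_model.get((observation, w), 0.0)
        * source_model.get(w, 0.0),
    )


# Hypothetical source model p(w) and channel model p(O | w).
source_model = {"two": 0.5, "too": 0.4, "ten": 0.1}
channel_model = {
    ("tu", "two"): 0.8,
    ("tu", "too"): 0.9,
    ("tu", "ten"): 0.01,
}

best = decode("tu", list(source_model), channel_model, source_model)
print(best)
```

    Here the channel model alone slightly prefers "too" (0.9 against 0.8), but the product with the source model favours "two" (0.40 against 0.36).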

  • 8 Language Models in Statistical Machine Translation

    [Figure: Source -> original sentence $e$ -> Noisy Channel -> noisy sentence $f$ -> Decoder -> $\hat{e}$]

    $$\hat{e} = \arg\max_{e \in E} p(e \mid f) = \arg\max_{e \in E} p(f \mid e)\, p(e)$$

    $p(e)$: source model

    $p(f \mid e)$: translation model

  • 9 Language Models in Text Classification

    $$\hat{c} = \arg\max_{c \in C} p(c \mid d) = \arg\max_{c \in C} p(d \mid c)\, p(c)$$

    $p(d \mid c)$: a language model for each class

    Bayesian Classifier
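    A minimal sketch of such a Bayesian classifier, with one unigram language model per class, might look as follows. The training data, the function names, and the use of add-one smoothing (chosen here only to keep the logarithms finite) are assumptions of this example, not details taken from the slides.

```python
import math
from collections import Counter


def train(labelled_docs):
    """Build class priors and per-class unigram counts from (label, tokens) pairs."""
    class_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for label, tokens in labelled_docs:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocabulary.update(tokens)
    total = sum(class_counts.values())
    priors = {c: n / total for c, n in class_counts.items()}
    return priors, word_counts, vocabulary


def classify(tokens, priors, word_counts, vocabulary):
    """Return argmax over c of log p(c) + sum of log p(w | c), add-one smoothed."""
    def score(c):
        counts = word_counts[c]
        denom = sum(counts.values()) + len(vocabulary)
        return math.log(priors[c]) + sum(
            math.log((counts[w] + 1) / denom) for w in tokens
        )
    return max(priors, key=score)


# Hypothetical labelled documents.
training = [
    ("cricket", "ashes test match batting".split()),
    ("cricket", "australia england ashes".split()),
    ("football", "goal striker penalty match".split()),
]
priors, word_counts, vocabulary = train(training)
predicted = classify("ashes batting".split(), priors, word_counts, vocabulary)
print(predicted)
```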

  • 10 The Language Modeling Approach to Information Retrieval

  • 11 The Conceptual Model of IR

    The user has an information need $\mathcal{I}$.

    From this need, he generates an ideal document fragment $d_\mathcal{I}$.

    He selects a set of key terms from $d_\mathcal{I}$, and generates a query $q$ from this set.

  • 12

    Information need $\mathcal{I}$: Analysis of Australia's defeat to England in the Ashes. Reason for defeat: overconfidence?

    Ideal document $d_\mathcal{I}$: "Eng bt. Aus: Analysis". Australia's defeat to England in the Ashes is largely due to their overconfidence. They don't seem to have figured out that they are facing a far superior team.

    A candidate document $d$: "The Ashes Review". Brash batting by Australia has lost them a test match. Their opposition is not Zimbabwe, and not many of the Aussies seem to remember that!

    Query $q$: australia ashes defeat analysis

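  • 13

    [Figure: information need $\mathcal{I}$ -> document generation model -> ideal document fragment $d_\mathcal{I}$ -> document-query translation model -> query $q$ -> retrieval engine over the documents $\{d\}$ -> retrieved documents, ranked by $p(d \mid q, \mathcal{U})$, where $\mathcal{U}$ is the user]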

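  • 14 The Language Modeling Approach

    Using Bayes' law,

    $$p(d \mid q, \mathcal{U}) = \frac{p(q \mid d, \mathcal{U})\, p(d \mid \mathcal{U})}{p(q \mid \mathcal{U})}$$

    For the purpose of ranking, the denominator can be dropped because it is the same for every document:

    $$p_q(d) = p(q \mid d, \mathcal{U})\, p(d \mid \mathcal{U})$$

    $$p_q(d) = p(q \mid d)\, p(d)$$ (user independent)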

  • 15

    $$p_q(d) = p(q \mid d)\, p(d)$$

    $p(q \mid d)$: query likelihood, comes from the language model

    $p(d)$: document prior (query-independent), e.g., PageRank

    Language Modeling for IR: using document language models to assign likelihood scores to queries (Ponte and Croft, 1998)
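    To make the role of the document prior concrete, the sketch below ranks two documents by log p(q|d) + log p(d) under two different priors. The likelihood values, the priors, and the function name `rank` are all hypothetical; the "link-based" prior merely stands in for a PageRank-like score.

```python
import math


def rank(query_likelihood, prior):
    """Sort document ids by log p(q|d) + log p(d), best first."""
    scores = {
        d: math.log(query_likelihood[d]) + math.log(prior[d])
        for d in query_likelihood
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical query likelihoods p(q|d) for two documents.
query_likelihood = {"d1": 0.020, "d2": 0.015}

uniform_prior = {"d1": 0.5, "d2": 0.5}
link_based_prior = {"d1": 0.2, "d2": 0.8}  # assumed PageRank-like values

ranking_uniform = rank(query_likelihood, uniform_prior)
ranking_link = rank(query_likelihood, link_based_prior)
print(ranking_uniform, ranking_link)
```

    With a uniform prior the ranking follows the query likelihood alone; a non-uniform prior can reverse it.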

  • 16

    Information need $\mathcal{I}$: Analysis of Australia's defeat to England in the Ashes. Reason for defeat: overconfidence?

    Ideal document $d_\mathcal{I}$: "Eng bt. Aus: Analysis". Australia's defeat to England in the Ashes is largely due to their overconfidence. They don't seem to have figured out that they are facing a far superior team.

    A candidate document $d$: "The Ashes Review". Brash batting by Australia has lost them a test match. Their opposition is not Zimbabwe, and not many of the Aussies seem to remember that!

    Query $q$: australia ashes defeat analysis

    $p(q \mid d, \mathcal{U})$ = ?

  • 17 The Language Model

    For each document $d$ in the collection $C$, for each term $w$, estimate:

    $$p(w \mid d)$$

    To find the query likelihood, assume query terms to be independent:

    $$p(q \mid d) = \prod_{i=1}^{m} p(q_i \mid d)$$
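    A minimal sketch of this computation, assuming a maximum likelihood unigram estimate of p(w|d) (relative term frequency in the document). The example document and the function names are hypothetical.

```python
from collections import Counter


def mle_model(doc_tokens):
    """Estimate p(w | d) as the relative frequency of w in the document."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    return {w: c / total for w, c in counts.items()}


def query_likelihood(query_tokens, model):
    """p(q | d) as the product of p(q_i | d), assuming independent query terms."""
    prob = 1.0
    for term in query_tokens:
        prob *= model.get(term, 0.0)
    return prob


# Hypothetical six-token document.
doc = "australia lost the ashes australia lost".split()
model = mle_model(doc)
p = query_likelihood("australia ashes".split(), model)
print(p)  # (2/6) * (1/6)
```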

  • 18 The Need for Smoothing

    To eliminate zero probabilities

    To correct the error in the MLE due to the problem of data sparsity

    Query $q$: australia ashes defeat analysis

    Document $d$: "The Ashes Review". Brash batting by Australia has lost them a test match. Their opposition is not Zimbabwe, and not many of the Aussies seem to remember that!

    $p(\text{defeat} \mid d) = 0$, so $p(q \mid d) = 0$

  • 19 Smoothing

    Jelinek-Mercer method: linear interpolation of the maximum likelihood model with the collection model

    $$p_\lambda(w \mid d) = (1 - \lambda)\, p(w \mid d) + \lambda\, p(w \mid C)$$

    (Zhai & Lafferty, 2003) compares the Jelinek-Mercer, Dirichlet, and absolute discounting methods for language models applied to IR.
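    The interpolation can be sketched as follows. The two toy documents, the value λ = 0.5, and all function names are assumptions for illustration; the collection model is taken here to be the relative term frequency over all documents.

```python
from collections import Counter


def mle(tokens):
    """Relative-frequency estimate of a unigram model."""
    counts = Counter(tokens)
    return {w: c / len(tokens) for w, c in counts.items()}


def jelinek_mercer(term, doc_model, collection_model, lam):
    """p_lambda(w|d) = (1 - lam) * p(w|d) + lam * p(w|C)."""
    return (1 - lam) * doc_model.get(term, 0.0) + lam * collection_model.get(term, 0.0)


def query_likelihood(query, doc_model, collection_model, lam=0.5):
    """Product of smoothed term probabilities over the query terms."""
    prob = 1.0
    for term in query:
        prob *= jelinek_mercer(term, doc_model, collection_model, lam)
    return prob


# Hypothetical two-document collection.
docs = {
    "review": "australia lost ashes test".split(),
    "analysis": "australia defeat ashes analysis".split(),
}
collection_model = mle([w for tokens in docs.values() for w in tokens])
query = "australia ashes defeat analysis".split()
scores = {
    name: query_likelihood(query, mle(tokens), collection_model)
    for name, tokens in docs.items()
}
print(scores)
```

    The "review" document contains neither "defeat" nor "analysis", yet its smoothed score is non-zero because the collection model supplies some probability mass for both terms.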

  • 20 The Need for Semantic Smoothing

    Information need $\mathcal{I}$: Analysis of Australia's defeat to England in the Ashes. Reason for defeat: overconfidence?

    Ideal document $d_\mathcal{I}$: "Eng bt. Aus: Analysis". Australia's defeat to England in the Ashes is largely due to their overconfidence. They don't seem to have figured out that they are facing a far superior team.

    A candidate document $d$: "The Ashes Review". Brash batting by Australia has lost them a test match. Their opposition is not Zimbabwe, and not many of the Aussies seem to remember that!

    Query $q$: australia ashes defeat overconfidence

  • 21 Document-Query Translation Model

    Ideal document $d_\mathcal{I}$: "Eng bt. Aus: Analysis". Australia's defeat to England in the Ashes is largely due to their overconfidence. They don't seem to have figured out that they are facing a far superior team.

    Query: australia ashes defeat overconfidence

    [Figure: ideal document $d_\mathcal{I}$ -> document-query translation model -> query]

    $p(q \mid d)$ = ? The model must be sophisticated enough to handle polysemy and synonymy.

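  • 22 Semantic Smoothing

    $$p(q \mid d) = \prod_{i=1}^{m} \sum_{w} t(q_i \mid w)\, p(w \mid d)$$

    Estimate translation probabilities $t(w_q \mid w_d)$ for mapping a document term $w_d$ to a query term $w_q$

    Example: $p(\text{analysis} \mid d)$ includes the term $p(\text{analysis} \mid \text{review})\, p(\text{review} \mid d)$

    (Berger and Lafferty, 1999) (Lafferty and Zhai, 2001)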
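    A small sketch of the translation-based query likelihood follows. The document model, the translation table, and the function name are invented for illustration; in practice the translation probabilities would have to be estimated from data, which this example does not attempt.

```python
def translated_query_likelihood(query, doc_model, translation):
    """p(q|d) as the product over query terms of sum_w t(q_i | w) * p(w | d)."""
    prob = 1.0
    for q_term in query:
        prob *= sum(
            translation.get((q_term, w), 0.0) * p_w
            for w, p_w in doc_model.items()
        )
    return prob


# Hypothetical document model p(w | d).
doc_model = {"ashes": 0.5, "review": 0.5}

# Hypothetical translation table t(query term | document term).
translation = {
    ("ashes", "ashes"): 1.0,
    ("review", "review"): 0.7,
    ("analysis", "review"): 0.3,
}

p = translated_query_likelihood(["ashes", "analysis"], doc_model, translation)
print(p)  # (1.0 * 0.5) * (0.3 * 0.5)
```

    The query term "analysis" never occurs in the document, but it still receives probability through the document term "review".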

  • 23 Compare with the Statistical MT Model

    [Figure: Source -> original sentence $e$ -> Noisy Channel -> noisy sentence $f$ -> Decoder -> $\hat{e}$; figure labels: Language model, Semantic Smoothing]

    $$\hat{e} = \arg\max_{e \in E} p(e \mid f) = \arg\max_{e \in E} p(f \mid e)\, p(e)$$

    $p(e)$: source model

    $p(f \mid e)$: translation model

  • 24 Language Modeling vis-à-vis the traditional probabilistic model

    The Probability Ranking Principle (Robertson, 1977): $p(r \mid q, d)$ = ?

    The Robertson-Sparck Jones Model (Sparck Jones et al., 2000):

    $$\log \frac{p(d \mid q, r)}{p(d \mid q, \bar{r})}$$

    Language Modeling approach: Where's the relevance? Is the ranking optimal?

    (Lafferty & Zhai, 2002) (Lavrenko and Croft, 2001)

  • 25 Discussion: Advantages of the Language Modeling Approach

    Conditioning on $d$ provides a larger foothold for estimation

    $p(d)$: an explicit notion of the importance of a document

    Document normalization is not an issue

    When designing a statistical model, the most natural route is to apply a generative model which builds up the output step-by-step. The source channel perspective suggests a different approach: turn the search problem around to predict the input. In speech recognition, NLP, and MT, researchers have time and again found that predicting what is known from competing hypotheses can be easier than directly predicting all of the hypotheses.

  • 26 Discussion: Disadvantage of the Language Modeling Approach

    The Robertson-Sparck Jones model can directly use relevance judgements to improve the estimation of $p(A_i \mid q, r)$

    This is not possible in the Language Modeling approach

  • 27 References

    (Robertson, 1977) The probability ranking principle in IR

    (Sparck Jones et al., 2000) A probabilistic model of information retrieval: development and comparative experiments

    (Ponte and Croft, 1998) A language modeling approach to information retrieval

    (Zhai and Lafferty, 2001) A study of smoothing methods for language models applied to ad hoc information retrieval

  • 28 References

    (Berger and Lafferty, 1999) Information retrieval as statistical translation

    (Lafferty and Zhai, 2001) Risk minimization and language modeling in information retrieval

    (Lavrenko and Croft, 2001) Relevance-based language models

    (Lafferty and Zhai, 2001) Probabilistic IR models based on document and query generation