Collocation Extraction Using Monolingual Word Alignment Method

  • Collocation Extraction Using Monolingual Word Alignment Method
    Zhanyi Liu, Haifeng Wang, Hua Wu, Sheng Li
    EMNLP 2009

  • Collocation
    Two words
      Consecutive ("by accident")
      Interrupted ("take ... advice")
    Other examples
      Proper noun ("New York")
      Compound noun ("ice cream")
      Correlative conjunction ("either ... or")

  • Previous Work
    Co-occurring word pairs
      Word pairs within a given window size
    Association measures
      Frequency, log-likelihood, mutual information ...
    Disadvantages
      Long-span collocations ("either ... or", "because ... so")
        Limited by window size
      False collocations
        Any word pair within the window size
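
The window-based approach described on this slide can be sketched as follows. This is a minimal illustration, not the baseline implementation used in the paper: the function names, the toy sentences, the window size of 3, and the simple relative-frequency estimate of mutual information are all assumptions made for the example.

```python
import math
from collections import Counter

def window_pairs(tokens, window=3):
    """Yield unordered word pairs whose positions are at most `window` apart."""
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                yield tuple(sorted((left, right)))

def pmi_scores(sentences, window=3):
    """Score each candidate pair with pointwise mutual information.

    Probabilities are plain relative frequencies (an assumption of this sketch).
    """
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        pair_counts.update(window_pairs(tokens, window))
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), n in pair_counts.items():
        p_pair = n / total_pairs
        p_a = word_counts[a] / total_words
        p_b = word_counts[b] / total_words
        scores[(a, b)] = math.log(p_pair / (p_a * p_b))
    return scores, pair_counts

# Hypothetical toy corpus.
sentences = [
    "either you take my advice or you leave".split(),
    "you should take his advice".split(),
]
scores, pair_counts = pmi_scores(sentences, window=3)
```

With a window of 3, "take ... advice" is captured in both sentences, while "either ... or" (five positions apart) never becomes a candidate, which is the long-span limitation named above. Every other pair inside the window is still counted, which is where false collocations come from.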

  • Monolingual Word Alignment
    Bilingual word alignment (BWA)
      Source-target sentence pairs
    Monolingual word alignment (MWA)
      Source-source sentence pairs
      Replicate the corpus

  • Monolingual Word Alignment (2)
    Bilingual
    Monolingual
    A word never collocates with itself

  • MWA Model
    Sentence with l words: S = {w1, ..., wl}
    Alignment: A = {(i, ai) | i ∈ [1, l]}
      Example: A = {(2,3), (3,2), (4,7), (6,7), ...}
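
To make the alignment notation concrete, the sketch below turns a set of 1-based links (i, ai) into unordered word pairs and rejects any link from a position to itself. The function name, the example sentence, and the example links are hypothetical and chosen only for illustration.

```python
def alignment_to_pairs(tokens, alignment):
    """Map 1-based alignment links (i, a_i) to unordered word pairs."""
    pairs = set()
    for i, a_i in alignment:
        if i == a_i:
            # A word never collocates with itself.
            raise ValueError(f"position {i} is aligned to itself")
        pairs.add(tuple(sorted((tokens[i - 1], tokens[a_i - 1]))))
    return pairs

# Hypothetical sentence and alignment links.
tokens = "he will take my sincere advice soon".split()
alignment = {(3, 6), (6, 3), (5, 6)}
pairs = alignment_to_pairs(tokens, alignment)
```

The links (3,6) and (6,3) both yield the pair ("advice", "take"), so symmetric links collapse into one candidate.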

  • MWA Model (2)
    Adapt IBM Model 3 to MWA

    EM training produces three probabilities
      Word collocation probability
      Position collocation probability
        d(4|7,12): probability that the 4th word collocates with the 7th word in a 12-word sentence
      Fertility probability
        Probability of the number of words that wi collocates with
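
The EM idea can be illustrated with a much simpler model than the one on the slide. The sketch below is a Model-1-style simplification, not the IBM Model 3 adaptation: it estimates only a word collocation probability t(w | v) and omits the position and fertility probabilities. The function name, the toy corpus, and the iteration count are assumptions. The one MWA-specific detail it keeps is that a position is never aligned to itself.

```python
from collections import defaultdict

def train_mwa_model1(sentences, iterations=10):
    """EM estimate of t(w | v): probability that w is generated by collocate v.

    Simplified Model-1-style sketch; position and fertility terms are omitted.
    """
    vocab = {w for s in sentences for w in s}
    t = defaultdict(lambda: 1.0 / len(vocab))  # uniform initialisation
    for _ in range(iterations):
        counts = defaultdict(float)
        totals = defaultdict(float)
        for s in sentences:
            for j, w in enumerate(s):
                # Candidate collocates: every other position in the same sentence.
                others = [v for i, v in enumerate(s) if i != j]
                norm = sum(t[(w, v)] for v in others)
                if norm == 0.0:
                    continue
                for v in others:
                    frac = t[(w, v)] / norm  # E-step: expected link count
                    counts[(w, v)] += frac
                    totals[v] += frac
        # M-step: renormalise expected counts per collocate v.
        t = defaultdict(float, {(w, v): c / totals[v] for (w, v), c in counts.items()})
    return t

# Hypothetical toy corpus; each sentence is aligned with a copy of itself.
sentences = [
    "take my advice".split(),
    "take his advice".split(),
    "take the advice".split(),
    "read my book".split(),
    "read his book".split(),
]
t = train_mwa_model1(sentences)
```

On this toy corpus, "advice" ends up more strongly tied to "take" than to "my", because "take" and "advice" always appear together while "my" is shared with other contexts.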

  • Collocation Extraction
    Extract and rank
    Filter when freq(wi,wj)
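
The extract-and-rank step might look like the sketch below. Two things here are assumptions rather than details taken from the slide: the score averages the collocation probability in both directions, and `min_freq` is a hypothetical placeholder for the frequency filter. The probability table and frequencies are made-up values.

```python
def rank_collocations(t, pair_freq, min_freq=2):
    """Filter rare pairs, then rank the rest by a symmetric probability score."""
    scored = []
    for (a, b), freq in pair_freq.items():
        if freq < min_freq:  # hypothetical threshold
            continue
        # Assumption: average the two directional probabilities.
        score = (t.get((a, b), 0.0) + t.get((b, a), 0.0)) / 2.0
        scored.append(((a, b), score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored

# Made-up probabilities and frequencies for illustration.
t = {
    ("advice", "take"): 0.6, ("take", "advice"): 0.4,
    ("my", "take"): 0.1, ("take", "my"): 0.2,
    ("soon", "take"): 0.9, ("take", "soon"): 0.9,
}
pair_freq = {("advice", "take"): 12, ("my", "take"): 8, ("soon", "take"): 1}
ranked = rank_collocations(t, pair_freq, min_freq=2)
```

The pair ("soon", "take") has the highest probability but is dropped by the frequency filter before ranking.
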
  • Initial Experiment
    Language: Chinese
    Training data
      LDC2007T03 Tagged Chinese Gigaword
      Xinhua portion, 28M words
    Gold set
      Handcrafted collocation dictionaries
      56,888 collocations

  • Initial Experiment (2)
    Precision

    Baselines
      Frequency, log-likelihood, mutual information
      Log-likelihood achieves the best performance

  • Initial Experiment (3)
    Observations
      Precision is low
        Small gold set (57K/200K = 28%)
      Low precision when N < 20K
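
Precision over the top N candidates, and the ceiling that a small gold set places on it, can be sketched as follows. The function name and the toy ranked list are hypothetical; the gold-set size of 56,888 comes from the experiment setup, and 200K is the candidate count used in the slide's 57K/200K ratio.

```python
def precision_at_n(ranked_pairs, gold, n):
    """Fraction of the top-n ranked pairs that appear in the gold set."""
    top = ranked_pairs[:n]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in gold)
    return hits / len(top)

# Hypothetical ranked output and gold set.
ranked = [("take", "advice"), ("of", "the"), ("either", "or"), ("ice", "cream")]
gold = {("take", "advice"), ("either", "or"), ("ice", "cream")}
p_at_2 = precision_at_n(ranked, gold, 2)
p_at_4 = precision_at_n(ranked, gold, 4)

# Ceiling on precision at N = 200K when the gold set holds 56,888 entries.
upper_bound = 56888 / 200000
```

Even a perfect ranker cannot exceed this ceiling at N = 200K, since at most 56,888 of the 200K candidates can match the gold set.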

  • Observation
    Frequency vs. probability vs. precision
    Precision curve
      Lower frequency --> lower precision
    Alignment probability curve
      Lower frequency --> higher probability

  • Observation (2)
    Conclusion
      What causes the lower precision of the top 20K?
      Collocations with low frequency but high probability

  • Improved MWA Method
    Add a penalization function y = f(x), x = freq(w1,w2)
      When x is small, y approaches 0 (penalize)
      When x is large, y approaches 1 (do not penalize)
      y = e^(-b/x) (b is tuned to 25)
    New ranking score
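
The behaviour of the penalty y = e^(-b/x) with b = 25 can be checked numerically. In the sketch below, `penalty` follows the formula on the slide; `ranking_score` assumes the penalty is multiplied with the alignment probability, which is an assumption about how the new score combines the two. The example probabilities and frequencies are made up.

```python
import math

def penalty(freq, b=25.0):
    """Exponential penalty y = exp(-b / x); freq must be positive."""
    return math.exp(-b / freq)

def ranking_score(align_prob, freq, b=25.0):
    # Assumption: the penalty scales the alignment probability multiplicatively.
    return align_prob * penalty(freq, b)

# Made-up values: a rare pair with high probability vs. a frequent pair.
rare = ranking_score(align_prob=0.9, freq=5)
frequent = ranking_score(align_prob=0.3, freq=500)
```

At freq = 5 the penalty is below 0.01, at freq = 25 it equals e^(-1) (about 0.37), and at freq = 1000 it is above 0.97, so the rare high-probability pair drops below the frequent one.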

  • Further Evaluation
    Automatic evaluation
      Greatly outperforms the best baseline
      For the top 1K: 20.6% vs. 11.7%
      The exponential function plays a key role

  • Further Evaluation (2)
    Human evaluation
      Top 1K collocations
      For each collocation, tag "True" or "False"
    4 "False" cases
      A: two semantically related words
      B: part of a multi-word collocation (>= 3 words)
      C: high-frequency bigram
      D: two words co-occurring frequently

  • Further Evaluation (3)
    True collocations: many more than the baseline
    False collocations
      A: semantically related, not distinguishable by MWA
      B: only two-word collocations are extracted; few collocations have >= 3 words
      C: frequent bigram, not distinguishable by MWA
      D: far fewer than the baseline

  • Further Evaluation (3) cont.
    MWA is able to produce long-span collocations
      48 extracted collocations with span > 6
      33 are tagged "True"
      69% precision

  • Fertility vs. Precision
    Manually label 100 sentences and observe fertility
      78% of words collocate with 1 word
      17% of words collocate with 2 words
      95% of words have fertility
  • Conclusion
    Main contributions
      Successfully adapt BWA to MWA
      Propose a ranking method
        Alignment probability + exponential penalty function
      Initial failures are discussed in detail
    Future work
      "Improving Statistical Machine Translation with Monolingual Collocation", ACL 2010
      Improve alignment, phrase table