You Can’t Beat Frequency (Unless You Use Linguistic Knowledge) – A Qualitative
Evaluation of Association Measures for Collocation and Term Extraction
Joachim Wermter and Udo Hahn
Jena University
ACL 2006 Regular Conference Paper
Objective
• Compare the performance of frequency, t-test, LSM and LPM methods on collocation extraction and domain-specific automatic term recognition
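The two purely statistical rankings can be sketched as follows. This is a minimal illustration that assumes the common textbook form of the t-test for word pairs; the slides do not show which formulation the authors used, and the function names `t_score` and `rank_candidates` are hypothetical.

```python
import math

def t_score(f_xy, f_x, f_y, n):
    """Textbook t-test association score for a word pair (x, y).

    f_xy: co-occurrence frequency of the pair
    f_x, f_y: frequencies of the individual words
    n: total number of tokens (or candidate slots) in the corpus
    Assumption: the sample variance is approximated by the observed mean.
    """
    observed = f_xy / n                 # observed probability of the pair
    expected = (f_x / n) * (f_y / n)    # probability under independence
    return (observed - expected) / math.sqrt(observed / n)

def rank_candidates(scores):
    """Return candidates sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

# Frequency baseline: the score of a candidate is simply its frequency.
frequencies = {"kick bucket": 12, "kick ball": 30, "red bucket": 5}
frequency_ranking = rank_candidates(frequencies)
```

Ranking by raw frequency needs no statistics beyond counting, which is why it serves as the baseline that the other measures are compared against.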
Collocation Extraction
• Extract idioms
• “kick the bucket”
Domain-Specific Term Extraction
• Extract domain-specific phrases
• “mitochondrial inheritance”
Corpus
LSM
• A “linguistic knowledge-based” method for collocation extraction proposed by the same authors in another paper
• Assumes that idioms are less modifiable by supplements
  – e.g. a modified form such as “kick the beautiful bucket” is rare
• Probability of a PNV triple having supplement Supp_k:
• f(x): frequency of x
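The formula itself is not preserved in this transcript. One plausible reading, given that f(x) denotes a frequency, is a relative-frequency estimate: the count of a triple with a given supplement divided by the total count of that triple. The sketch below implements only that assumed reading; the function name and the data layout are hypothetical and not taken from the authors' paper.

```python
from collections import Counter

def supplement_probabilities(observations):
    """Estimate p(triple, supplement) as f(triple, supplement) / f(triple).

    observations: iterable of (triple, supplement) pairs, one per corpus
    occurrence; use "" when the triple occurs without any supplement.
    Assumption: f(triple) is the total count over all of its supplements.
    """
    observations = list(observations)
    pair_counts = Counter(observations)
    triple_counts = Counter(triple for triple, _ in observations)
    return {
        (triple, supp): count / triple_counts[triple]
        for (triple, supp), count in pair_counts.items()
    }

# Toy data: the idiom almost never takes a supplement.
toy = [("kick the bucket", "")] * 9 + [("kick the bucket", "beautiful")]
probs = supplement_probabilities(toy)
```

Under this reading, an idiom concentrates nearly all of its probability mass on a single supplement configuration, which is the intuition behind treating low modifiability as evidence for a collocation.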
LSM
• Modifiability of a PNVtriple
• Probability of a PNVtriple
• Collocation Score
LPM
• A “linguistic knowledge-based” method for automatic term recognition proposed by the same authors in another paper
• Assumes that the words in a domain-specific phrase are less interchangeable
  – e.g. “mitochondrion inheritance” vs. “money inheritance”
• Modifiability of a phrase:
• mod_k(n-gram): replace k words
• sel_i: a particular replacement
LPM
• Phrase Score:
Evaluation Criteria
• Compared to the baseline frequency ranking, a good ranking function should have the following four characteristics:
1. Keep the true positives in the upper portion of the list
2. Keep the true negatives in the lower portion of the list
3. Demote true negatives from the upper portion
4. Promote true positives from the lower portion
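The four criteria above can be counted mechanically once a candidate list has been ranked twice, once by frequency and once by the measure under test. The sketch below is an illustration under stated assumptions: the function name `compare_to_baseline`, the single cut-off between "upper" and "lower" portions, and the default split at one half are all hypothetical choices, not the authors' evaluation code.

```python
def compare_to_baseline(baseline, ranked, true_positives, upper_fraction=0.5):
    """Count how a ranking moves candidates relative to the frequency baseline.

    baseline: candidates ordered by frequency (best first)
    ranked: the same candidates ordered by the measure under test
    true_positives: set of candidates judged to be real collocations/terms
    upper_fraction: assumed share of the list treated as the "upper portion"
    """
    cut = int(len(baseline) * upper_fraction)
    base_upper = set(baseline[:cut])
    new_upper = set(ranked[:cut])
    counts = {"tp_kept_upper": 0, "tn_kept_lower": 0,
              "tn_demoted": 0, "tp_promoted": 0}
    for cand in baseline:
        is_tp = cand in true_positives
        was_upper = cand in base_upper
        now_upper = cand in new_upper
        if is_tp and was_upper and now_upper:
            counts["tp_kept_upper"] += 1      # criterion 1
        elif not is_tp and not was_upper and not now_upper:
            counts["tn_kept_lower"] += 1      # criterion 2
        elif not is_tp and was_upper and not now_upper:
            counts["tn_demoted"] += 1         # criterion 3
        elif is_tp and not was_upper and now_upper:
            counts["tp_promoted"] += 1        # criterion 4
    return counts

result = compare_to_baseline(
    baseline=["a", "b", "c", "d"],
    ranked=["a", "d", "c", "b"],
    true_positives={"a", "d"},
)
```

Candidates that move in the wrong direction (true positives demoted, true negatives promoted) fall into none of the four counters, so a full evaluation would likely track those cases as well.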
Collocation Extraction Results
Automatic Term Recognition Results
Observations
• CE Criterion 1
  – t-test and frequency methods have similar performance
  – LSM promotes some TPs to the top 1/6
• ATR Criterion 1
  – t-test and frequency methods have similar performance
  – LPM promotes a few TPs to the top 1/6
Observations
• CE Criterion 2
  – LSM promotes many more TNs to the upper portion than the t-test does (undesirable)
• ATR Criterion 2
  – Same as above
Observations
• CE Criterion 3
  – LSM demotes many more TNs to the lower portion than the t-test
• ATR Criterion 3
  – Same as above
Observations
• CE Criterion 4
  – LSM promotes more TPs to the upper portion than the t-test
• ATR Criterion 4
  – Same as above
Conclusion
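• LSM and LPM rank candidates better than the t-test and frequency methods on their respective tasks (collocation extraction and automatic term recognition)
• Purely statistical methods perform worse than methods that use linguistic knowledge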