VNLP: An Open Source Framework for Vietnamese Natural Language Processing
Ngoc Minh Le - ePi Technology
Bich Ngoc Do - ePi Technology
Vi Duong Nguyen - ePi Technology
Thi Dam Nguyen - ePi Technology
Major tasks in Natural Language Processing
High-level applications:
Automatic summarization
Machine translation
Sentiment analysis
...
Fundamental tasks:
Word segmentation
Part-of-speech tagging
...
Fundamental Tasks
Word segmentation
Part-of-speech tagging
Syntactic parsing
Named-entity recognition (NER)
Coreference resolution
Framework for Vietnamese NLP?
Stanford CoreNLP: a framework for English
Framework for Vietnamese Natural Language Processing
JVnTextPro
JVnTextPro provides a tokenizer and POS tagging.
Is that enough? What is the solution?
Word segmentation
vnTokenizer reaches an accuracy of 96% to 98%.
Improvements made to speed up vnTokenizer:
• Reading XML-encoded data via SAX
• Tokenizing a document with an LL parser
• Using an automaton with default transitions
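The idea of an automaton with default transitions can be sketched as follows. This is only an illustration of the general technique, not vnTokenizer's actual code: the class name `DefaultDFA`, the state numbers, and the tiny two-word lexicon are all assumptions made for the example. Each state stores a few explicit transitions plus one optional default target that is taken when no explicit label matches, which keeps the transition table small.

```python
class DefaultDFA:
    """Deterministic automaton whose states may carry a default transition.

    Hypothetical sketch: names and structure are illustrative only.
    """

    def __init__(self, start, accepting):
        self.start = start
        self.accepting = set(accepting)
        self.explicit = {}  # (state, symbol) -> next state
        self.default = {}   # state -> next state used when no explicit arc matches

    def add(self, src, symbol, dst):
        self.explicit[(src, symbol)] = dst

    def set_default(self, src, dst):
        self.default[src] = dst

    def step(self, state, symbol):
        # Try the explicit arc first, then fall back to the default arc.
        nxt = self.explicit.get((state, symbol))
        if nxt is None:
            nxt = self.default.get(state)
        return nxt

    def accepts(self, symbols):
        state = self.start
        for symbol in symbols:
            state = self.step(state, symbol)
            if state is None:  # no explicit and no default arc: reject
                return False
        return state in self.accepting


# Toy automaton over syllables: "học" and "học sinh" are lexicon words,
# and any other single syllable is accepted through the default arc.
dfa = DefaultDFA(start=0, accepting={1, 2, 3})
dfa.add(0, "học", 1)
dfa.add(1, "sinh", 2)
dfa.set_default(0, 3)

print(dfa.accepts(["học", "sinh"]))
print(dfa.accepts(["nhà"]))
print(dfa.accepts(["nhà", "sinh"]))
```

Without the default arc, state 0 would need one explicit transition per syllable in the vocabulary; with it, only the syllables that start a multi-syllable word need their own entries.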
Part-of-speech tagging
VnTagger: 95% accuracy
JVnTagger: 91.3% accuracy
VnQTag: 85.57% accuracy
Syntactic parsing
Tree-adjoining grammar, head-driven phrase structure grammar, ...: no software deliverable.
MaltParser: open source and language independent.
Acceptable accuracy: 70%
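The slides do not say how the parsing accuracy was measured. One common choice for dependency parsers such as MaltParser is the unlabeled attachment score: the fraction of tokens whose predicted head matches the gold head. The sketch below assumes that metric and assumes parser output in the 10-column CoNLL-X format (head index in the seventh column); the example sentence, tags, and relation labels are invented for illustration.

```python
def read_heads(conll_text):
    """Return the HEAD column (7th field) of each token line in CoNLL-X text."""
    heads = []
    for line in conll_text.strip().splitlines():
        columns = line.split("\t")
        heads.append(int(columns[6]))
    return heads


def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Fraction of tokens whose predicted head equals the gold head."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted sentences differ in length")
    if not gold_heads:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# Hypothetical three-token sentence; tags and labels are made up.
GOLD = "\n".join([
    "1\tTôi\t_\tP\tP\t_\t2\tsub\t_\t_",
    "2\thọc\t_\tV\tV\t_\t0\troot\t_\t_",
    "3\ttiếng_Việt\t_\tN\tN\t_\t2\tobj\t_\t_",
])
PREDICTED = "\n".join([
    "1\tTôi\t_\tP\tP\t_\t2\tsub\t_\t_",
    "2\thọc\t_\tV\tV\t_\t0\troot\t_\t_",
    "3\ttiếng_Việt\t_\tN\tN\t_\t1\tobj\t_\t_",
])

score = unlabeled_attachment_score(read_heads(GOLD), read_heads(PREDICTED))
print(f"UAS = {score:.2f}")
```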
Named-Entity Recognition
NER uses a rule-based method with two parts:
• a word-searching component, called a gazetteer in GATE's terminology
• a pattern-matching component, called a transducer
Accuracy: 59%
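The two parts can be illustrated with a small sketch. It does not use GATE's API or the framework's real rules: the gazetteer entries, the title list, and the single "title followed by capitalized tokens" pattern are assumptions chosen only to show how a list lookup and a pattern rule complement each other.

```python
# Hypothetical gazetteer: token sequences mapped to entity types.
GAZETTEER = {
    ("Hà", "Nội"): "Location",
    ("ePi", "Technology"): "Organization",
}
# Hypothetical trigger words for the pattern rule.
PERSON_TITLES = {"ông", "bà"}


def gazetteer_lookup(tokens):
    """Longest-match lookup of token sequences in the gazetteer."""
    annotations = []
    max_len = max(len(key) for key in GAZETTEER)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in GAZETTEER:
                annotations.append((i, i + n, GAZETTEER[key]))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return annotations


def title_rule(tokens):
    """Pattern rule: a title followed by capitalized tokens marks a Person."""
    annotations = []
    i = 0
    while i < len(tokens):
        if tokens[i].lower() in PERSON_TITLES:
            j = i + 1
            while j < len(tokens) and tokens[j][:1].isupper():
                j += 1
            if j > i + 1:
                annotations.append((i + 1, j, "Person"))
            i = j
        else:
            i += 1
    return annotations


tokens = "ông Nguyễn Văn An sống ở Hà Nội".split()
entities = sorted(gazetteer_lookup(tokens) + title_rule(tokens))
for start, end, label in entities:
    print(label, " ".join(tokens[start:end]))
```

The gazetteer finds names it already knows, while the pattern rule can label a name that appears in no list as long as its context matches.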
Coreference resolution
The approach uses heuristic rules and consists of two components:
• Orthographic matcher (orthomatcher) with 17 rules
• Co-referencer, which performs pronominal co-referencing and integrates everything into co-reference lists
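As a rough illustration of orthographic matching feeding co-reference lists, the sketch below implements just two made-up rules (case-insensitive equality, and a shorter mention matching the final tokens of a longer one). These are not the framework's 17 rules, pronominal co-referencing is left out, and the function names are hypothetical.

```python
def orth_match(a, b):
    """Two illustrative rules: same tokens ignoring case, or suffix match."""
    tokens_a, tokens_b = a.lower().split(), b.lower().split()
    if tokens_a == tokens_b:
        return True
    shorter, longer = sorted((tokens_a, tokens_b), key=len)
    # e.g. "An" matches the final token of "Nguyễn Văn An"
    return len(shorter) > 0 and longer[-len(shorter):] == shorter


def build_chains(mentions):
    """Greedily group mentions into co-reference lists using orth_match."""
    chains = []
    for mention in mentions:
        for chain in chains:
            if any(orth_match(mention, member) for member in chain):
                chain.append(mention)
                break
        else:
            chains.append([mention])  # start a new list for an unmatched mention
    return chains


mentions = ["Nguyễn Văn An", "ePi Technology", "An", "epi technology"]
chains = build_chains(mentions)
for chain in chains:
    print(chain)
```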
Open Source Framework for Vietnamese NLP
[Diagram: components of VNLP: Document Reset PR, Sentence splitter, VnTokenizer, VnTagger, MaltParser (syntactic parsing), Vn-Ner (named-entity recognition), Co-reference]
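One way to picture how such components fit together is a sequential pipeline in which every stage reads and extends a shared document. The sketch below assumes that design; the slides do not show the execution order or any API, so the stage functions here are hypothetical placeholders (a period-based sentence splitter and a whitespace tokenizer), not the real VnTokenizer or VnTagger.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Stage = Callable[[Document], None]


def reset(doc: Document) -> None:
    """Drop annotations left by an earlier run, keeping only the raw text."""
    text = doc["text"]
    doc.clear()
    doc["text"] = text


def split_sentences(doc: Document) -> None:
    """Placeholder splitter: break the text on periods."""
    doc["sentences"] = [s.strip() for s in doc["text"].split(".") if s.strip()]


def tokenize(doc: Document) -> None:
    """Placeholder tokenizer: split each sentence on whitespace."""
    doc["tokens"] = [sentence.split() for sentence in doc["sentences"]]


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Apply each stage in order to the same document."""
    for stage in stages:
        stage(doc)
    return doc


doc = run_pipeline(
    {"text": "Tôi học. Tôi đọc sách.", "stale": 1},
    [reset, split_sentences, tokenize],
)
print(doc["sentences"])
print(doc["tokens"])
```

Later stages such as tagging, parsing, named-entity recognition, and co-reference would be added to the list in the same way, each relying on the annotations written by the stages before it.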
Application of VNLP
Online Reputation Management - noti5.vn
• an application of sentiment analysis
• all mentions of a brand
• determines positive and negative opinions
Automatic synthesis and classification of webpages
PART 5 – CONCLUSION AND FUTURE WORK
Thank you for your attention!
Q & A