Biomedical Word Sense Disambiguation presentation
05/03/2023 1
Biomedical Word Sense Disambiguation with Neural Word and Concept Embedding
Department of Computer Science, University of Kentucky
Oct 7, 2016
AKM Sabbir
Advisor: Dr. Ramakanth Kavuluru
Outline
• Introduction
• Application of Word Sense Disambiguation (WSD)
• Motivation
• Related Methods to Solve WSD
• Our Method
• Word Vectors
• Tools Used
• Our Method in Detail
• Experiment and Analysis
• Conclusion
Introduction
• WSD is the task of detecting the correct sense of an ambiguous word, i.e., assigning it the proper sense in context
– The air in the center of the vortex of a cyclone is generally very cold
– I could not come to the office last week because I had a cold
• Retrieving information by machine is not an easy task
• A number of Natural Language Processing (NLP) tasks require WSD
Application
• Text to Speech Conversion
– "Bass" can be pronounced like "base" (the musical sense) or to rhyme with "pass" (the fish)
• Machine Translation
– The French word "grille" can be translated as "gate" or "bar"
• Information Retrieval
• Named Entity Recognition
• Document Summary Generation
Motivation
• Generalized WSD is a difficult problem
• A more tractable approach is to solve it separately for each domain
• The biomedical domain contains a large number of ambiguous words
• Medical report summary generation
• Drug side effect prediction
Related Methods
• Supervised Methods
– Support Vector Machines, Convolutional Neural Nets
• Unsupervised Methods
– Clustering, generative models
– If the vocabulary has four words w1, w2, w3, w4
[Table: rows and columns indexed by w1 … w4; the w1 row begins 1/5, 2/5, 0, …]
• Knowledge Based Methods
– WordNet, UMLS (Unified Medical Language System)
Our Method
• We build a semi-supervised model
• The model uses concept/sense/CUI vectors in the same way people use word vectors (more later)
• MetaMap is a knowledge-based NER tool. We use its decisions to generate concept vectors
• The model also uses P(w|c), where c is a concept or sense generated using other knowledge-based approaches
• Word vectors are generated from an unstructured data source
What is a Word Vector
• A distributed representation of words
• The representation of a word is spread across all dimensions of the vector
• This differs from representations such as one-hot encoding, where the vector length equals the vocabulary size. Here we choose a small dimension, say d = 200, and generate dense vectors
• Each element of the vector contributes to the definition of many different words
King: 0.07 0.05 0.8 0.002 0.1 0.3
Queen: 0.7 0.05 0.67 0.002 0.2 0.3
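The contrast between a vocabulary-sized representation and a dense one can be illustrated with a small sketch. The three-word vocabulary and the helper names (`one_hot`, `cosine`) are assumptions made for illustration; only the six King and Queen values come from the slide.

```python
import math

# Hypothetical three-word vocabulary, used only for the one-hot illustration.
VOCAB = ["king", "queen", "cold"]

def one_hot(word):
    """Sparse representation: the vector length equals the vocabulary size."""
    return [1.0 if w == word else 0.0 for w in VOCAB]

# Dense six-dimensional vectors; the values are the King and Queen rows of the slide.
DENSE = {
    "king": [0.07, 0.05, 0.8, 0.002, 0.1, 0.3],
    "queen": [0.7, 0.05, 0.67, 0.002, 0.2, 0.3],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# One-hot vectors of two different words share no dimension, so similarity is 0.
one_hot_sim = cosine(one_hot("king"), one_hot("queen"))
# Dense vectors share dimensions, so related words can score above 0.
dense_sim = cosine(DENSE["king"], DENSE["queen"])
print(one_hot_sim, round(dense_sim, 2))
```

With one-hot vectors every pair of distinct words looks equally unrelated, whereas dense vectors let similarity vary from pair to pair.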
What is a Word Vector
• It is a numerical representation of a word
• Each dimension captures some semantic and syntactic information related to that word
• Using a similar idea, we can generate concept/sense/CUI vectors.
Why Word Vectors Work?
• Learned word vectors capture the syntactic and semantic information that exists in text
– vector("king") - vector("man") + vector("woman") ≈ vector("queen")
Fig 5: resultant queen vector and other vectors [5]
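The analogy arithmetic can be tried on toy data. The two-dimensional vectors below are invented for illustration (one axis loosely standing for "royalty", the other for "maleness"); real learned vectors have hundreds of dimensions and only satisfy the relation approximately.

```python
import math

# Hypothetical 2-D vectors: first axis ~ "royalty", second axis ~ "maleness".
VECTORS = {
    "king": [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man": [0.1, 0.9],
    "woman": [0.1, 0.1],
    "apple": [0.0, 0.5],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word whose vector is closest to vector(a) - vector(b) + vector(c)."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    # The three query words are excluded from the candidates, as is customary.
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))

result = analogy("king", "man", "woman")
print(result)
```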
Required Tools
• Language model
Required Tools Contd.
• MetaMap
Step 1 Parsing: text is parsed into noun phrases using the Xerox POS tagger to perform syntactic analysis [4].
Step 2 Variant Generation: variants for each input phrase are generated using knowledge from the SPECIALIST lexicon and a supplementary database of synonyms.
Step 3 Candidate Retrieval: the candidate set retrieved from the UMLS Metathesaurus contains at least one of the variants generated in Step 2.
Step 4 Candidate Evaluation
Fig 2: Variants for the word "ocular"
Fig 3: Evaluated candidates for "ocular complication"
Our Method in Detail
• Text preprocessing
– Remove English stop words
– NLTK word tokenization
– Keep words with frequency greater than five
– Lowercase everything
• The word context window is ten words long
• Generated word vectors have 300 dimensions
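The preprocessing steps can be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: the regular-expression tokenizer stands in for NLTK word tokenization, and the tiny `STOP_WORDS` set stands in for a full English stop word list; both are assumptions.

```python
import re
from collections import Counter

# Tiny stand-in for an English stop word list (assumption, for illustration only).
STOP_WORDS = {"the", "a", "is", "had", "of", "in"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (stand-in tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def preprocess(sentences, min_freq=5):
    """Tokenize, drop stop words, then keep only words seen more than min_freq times."""
    tokenized = [[t for t in tokenize(s) if t not in STOP_WORDS] for s in sentences]
    counts = Counter(t for sent in tokenized for t in sent)
    return [[t for t in sent if counts[t] > min_freq] for sent in tokenized]

sentences = ["The patient had a cold"] * 6 + ["Cyclone air is cold"]
cleaned = preprocess(sentences)
print(cleaned[0], cleaned[-1])
```

In the toy corpus, "cyclone" and "air" occur only once and are dropped by the frequency filter, while "patient" and "cold" survive.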
Generating word and concept vectors
• Generate word and concept vectors
• 20 million citations from PubMed for training word vectors
• Randomly chosen 5 million citations
• Retrieved 7.1 million sentences containing target ambiguous words
• Each sentence is 16-17 words long
• The combined sentences are used to generate bigrams
• Each bigram is fed into MetaMap with the WSD option turned on
• Each bigram is replaced with its corresponding concepts
• The data is then fed to the language model to generate concept vectors
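The bigram-to-concept step can be pictured with a small sketch. MetaMap itself is not called here: `FAKE_METAMAP` is a hypothetical lookup table standing in for its output, the concept identifiers are made-up placeholders rather than real CUIs, and dropping bigrams with no mapping is an assumption.

```python
def bigrams(tokens):
    """Return the consecutive word pairs of a token list."""
    return [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]

# Hypothetical stand-in for MetaMap output: bigram -> list of concept identifiers.
# The identifiers are placeholders, not real UMLS CUIs.
FAKE_METAMAP = {
    ("common", "cold"): ["CUI_COLD_ILLNESS"],
    ("cold", "air"): ["CUI_COLD_TEMPERATURE"],
}

def to_concept_sequence(tokens, mapper=FAKE_METAMAP):
    """Replace each bigram with its mapped concepts; unmapped bigrams are dropped (assumption)."""
    sequence = []
    for pair in bigrams(tokens):
        sequence.extend(mapper.get(pair, []))
    return sequence

concepts = to_concept_sequence(["the", "common", "cold", "air"])
print(concepts)
```

Sequences of concept identifiers produced this way would then play the role that word sequences play when training word vectors.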
Estimate P(D|c) [Yepes et al.]
• Following Jimeno-Yepes and Berlanga [3], a Markov chain model is used to calculate P(D|c)
• To get P(D|c), we first need to calculate P(w|c)
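To show why P(w|c) is the building block, here is a sketch under a simplifying assumption that is not taken from the slides: the words of the context D are treated as independent given the concept (a unigram model), so log P(D|c) is the sum of log P(w|c). The model of [3] may combine these terms differently. The probability tables, sense names, and the `floor` value for unseen words are all hypothetical.

```python
import math

def log_p_context_given_concept(context_words, p_word_given_concept, floor=1e-6):
    """Unigram assumption: log P(D|c) = sum over w in D of log P(w|c).

    Words missing from the table receive a small floor probability (assumption).
    """
    return sum(math.log(p_word_given_concept.get(w, floor)) for w in context_words)

# Hypothetical P(w|c) tables for two senses of "cold".
P_W_GIVEN_C = {
    "common_cold": {"fever": 0.2, "cough": 0.3, "temperature": 0.05},
    "low_temperature": {"air": 0.3, "temperature": 0.3, "fever": 0.01},
}

context = ["fever", "cough"]
scores = {c: log_p_context_given_concept(context, table) for c, table in P_W_GIVEN_C.items()}
best = max(scores, key=scores.get)
print(best)
```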
Biomedical MSH WSD
• A dataset with 203 ambiguous words
• 424 unique concept identifiers (senses)
• 38,495 test context instances, an average of about 190 test instances per ambiguous word
• Goal: correctly identify the sense for each test instance
Model I Cosine Similarity
$$f^{c}(T, w, C(w)) = \arg\max_{c \in C(w)} \cos(\vec{T}_{avg}, \vec{c})$$
• w is the ambiguous word
• T is the test instance context containing the ambiguous word w
• C(w) is the set of concepts (senses) that w can assume
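Model I can be sketched in a few lines. The sketch assumes that T_avg is the element-wise average of the word vectors in the context T; the function names and the two-dimensional toy vectors are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def model_one(context_word_vectors, concept_vectors):
    """Pick the concept whose vector has the highest cosine with the averaged context."""
    t_avg = average(context_word_vectors)
    return max(concept_vectors, key=lambda c: cosine(t_avg, concept_vectors[c]))

# Toy data: two context word vectors and two candidate concept vectors.
context = [[0.9, 0.1], [0.7, 0.3]]
candidates = {"c_1": [1.0, 0.0], "c_2": [0.0, 1.0]}
predicted = model_one(context, candidates)
print(predicted)
```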
Model II projection magnitude
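$$f^{c}(T, w, C(w)) = \arg\max_{c \in C(w)} \left[ \rho\left(\cos(\vec{T}_{avg}, \vec{c})\right) \cdot \frac{\lVert \mathrm{Pr}(\vec{T}_{avg}, \vec{c}) \rVert}{\lVert \vec{c} \rVert} \right]$$
• Take the projection of the context vector along the concept vector, then consider its Euclidean norm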
Model III
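$$f^{c}(T, w, C(w)) = \arg\max_{c \in C(w)} \left[ \cos(\vec{T}_{avg}, \vec{c}) \cdot \frac{\lVert \mathrm{Pr}(\vec{T}_{avg}, \vec{c}) \rVert}{\lVert \vec{c} \rVert} \right]$$
• Combines both the angular and the magnitude components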
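The Model II and Model III scores can be compared on toy vectors. Two assumptions are made here: Pr(T_avg, c) is read as the orthogonal projection of the averaged context vector onto the concept vector, and ρ is read as the sign function (the slide does not define it). All names and vectors below are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def projection_norm(t, c):
    """Euclidean norm of the orthogonal projection of t onto c."""
    return abs(dot(t, c)) / norm(c)

def sign(x):
    """Assumed meaning of rho: +1, 0 or -1 depending on the sign of x."""
    return (x > 0) - (x < 0)

def score_model_two(t, c):
    return sign(cosine(t, c)) * projection_norm(t, c) / norm(c)

def score_model_three(t, c):
    return cosine(t, c) * projection_norm(t, c) / norm(c)

def pick(t_avg, concept_vectors, score):
    """Return the concept name with the highest score for the given context vector."""
    return max(concept_vectors, key=lambda name: score(t_avg, concept_vectors[name]))

t_avg = [1.0, 1.0]
candidates = {"c_1": [0.4, 0.0], "c_2": [0.5, 0.5]}
choice_two = pick(t_avg, candidates, score_model_two)
choice_three = pick(t_avg, candidates, score_model_three)
print(choice_two, choice_three)
```

On this toy input the two scores disagree: the short concept vector c_1 wins on projection magnitude alone, while weighting by the cosine favours c_2, which points in the same direction as the context.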
Model IV
Model V KNN
• We now have multiple ways to resolve the sense of an ambiguous term
• Built a distantly supervised dataset by collecting data from biomedical citations
• For each ambiguous word there are on average 40,000 sentences
• Resolved the sense of each sentence using Model IV
KNN in Pseudo Code
KNN contd.
$$f^{k\text{-}NN}(T, w, C(w)) = \arg\max_{c \in C(w)} \left[ \sum_{(D, w, c) \in R_k(D_w)} \cos(\vec{T}_{avg}, \vec{D}_{avg}) \right]$$
Training instances (with their assigned senses) and their cosine similarity to Test Instance 1 (sense unknown):
Training instance 1 (c_1): 0.7
Training instance 2 (c_1): 0.9
Training instance 3 (c_2): 0.1
Training instance 4 (c_1): 0.03
Training instance 5 (c_2): 0.02
…
Training instance n (c_2): 0.12
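A minimal sketch of the k-NN decision rule: rank the training instances by cosine similarity to the test context, keep the top k, sum the similarities per sense, and return the sense with the largest sum. The function name `knn_sense`, the `(vector, sense)` pair layout, and the toy vectors are assumptions; each vector stands for an already averaged context.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn_sense(test_vector, training, k):
    """training is a list of (averaged_context_vector, sense) pairs."""
    scored = sorted(
        ((cosine(test_vector, vec), sense) for vec, sense in training),
        reverse=True,
    )
    totals = defaultdict(float)
    for similarity, sense in scored[:k]:
        totals[sense] += similarity  # sum of cosines per sense over the k nearest
    return max(totals, key=totals.get)

# Toy training set with two senses.
training = [
    ([1.0, 0.1], "c_1"),
    ([0.9, 0.2], "c_1"),
    ([0.0, 1.0], "c_2"),
    ([0.1, 1.0], "c_2"),
    ([1.0, 1.0], "c_2"),
]
predicted = knn_sense([1.0, 0.0], training, k=3)
print(predicted)
```

Summing similarities rather than counting votes means that a few very close neighbours can outweigh many distant ones.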
KNN Accuracy graph
Distant Supervision with CNN
• Used the refined assignment of CUIs to sentences as a training set
• Then used MSH WSD data as a test data set
• Trained 203 convolutional neural nets, one per ambiguous word
• Each with one convolutional layer and one hidden layer
• Used 900 filters of 3 different sizes
• Used the MSH WSD test instances for testing
Distant Supervision Using CNN
Ensembling of CNNs
• Train and test five CNNs for each ambiguous word
• Average their outputs and take the best-scoring sense
• Tends to improve the result at the cost of computation
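The averaging step can be sketched independently of any neural network library. The five output lists below are invented numbers standing in for the per-sense scores of five trained CNNs on one test instance; `ensemble_predict` is a hypothetical helper name.

```python
def ensemble_predict(model_outputs, senses):
    """Average per-sense scores across models and return the best sense with the averages."""
    n_models = len(model_outputs)
    averaged = [
        sum(output[i] for output in model_outputs) / n_models
        for i in range(len(senses))
    ]
    best_index = max(range(len(senses)), key=averaged.__getitem__)
    return senses[best_index], averaged

senses = ["c_1", "c_2"]
# Hypothetical outputs of five CNNs for one test instance (one score per sense).
model_outputs = [
    [0.6, 0.4],
    [0.4, 0.6],
    [0.7, 0.3],
    [0.45, 0.55],
    [0.8, 0.2],
]
best_sense, averaged = ensemble_predict(model_outputs, senses)
print(best_sense, averaged)
```

Two of the five toy models prefer c_2 on their own, but the averaged scores favour c_1, which is the kind of smoothing the ensemble is meant to provide.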
Results and Analysis
Methods                                   Results
Jimeno-Yepes and Berlanga [3]             89.10%
Cosine similarity (Model I)               85.54%
Projection length proportion (Model II)   88.68%
Combining Model I and II                  89.26%
Combining Model I, II and [3]             92.24%
Convolutional Neural Net                  86.17%
Ensembling CNN                            87.78%
K-NN with k = 3500                        94.34%
Conclusion
• The developed model is highly accurate, beating the previous best result (94.34% vs. 89.10%)
• It is unsupervised: no hand-labeled data is required
• It is scalable; however, the accuracy at larger scale is uncertain
– By increasing the number of training sentences and the context length, more information may be extractable
• Graph-based algorithms need to be explored
• Tools: HPC, Theano, NLTK, Gensim Word2Vec
Questions
References
1. Eneko Agirre and Philip Edmonds. Word sense disambiguation: Algorithms and applications, volume 33. Springer Science & Business Media, 2007.
2. Yoshua Bengio, Rejean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137-1155, 2003.
3. Antonio Jimeno Yepes and Rafael Berlanga. Knowledge based word-concept model estimation and refinement for biomedical text mining. Journal of Biomedical Informatics, 53:300-307, 2015.
4. Alan R. Aronson. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proceedings of the AMIA Symposium. American Medical Informatics Association, 2001.
5. https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
6. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097-1105, 2012.