Biomedical Word Sense Disambiguation presentation [Autosaved]

Page 1: Biomedical Word Sense Disambiguation presentation [Autosaved]

05/03/2023 1

Biomedical Word Sense Disambiguation with Neural Word and Concept Embedding

Department of Computer Science University of Kentucky

Oct 7, 2016

AKM Sabbir

Advisor: Dr. Ramakanth Kavuluru

Page 2: Biomedical Word Sense Disambiguation presentation [Autosaved]

Outline

• Introduction

• Application of Word Sense Disambiguation (WSD)

• Motivation

• Related Methods to Solve WSD

• Our Method

• Word Vectors

• Tools Used

• Our Method in Detail

• Experiment and Analysis

• Conclusion

Page 3: Biomedical Word Sense Disambiguation presentation [Autosaved]

Introduction

• WSD is the task of detecting the correct sense of an ambiguous word, that is, assigning it the proper sense in its context

– The air in the center of the vortex of a cyclone is generally very cold

– I could not come to the office last week because I had a cold

• Retrieving information with a machine is not an easy task

• A number of Natural Language Processing (NLP) tasks require WSD

Page 5: Biomedical Word Sense Disambiguation presentation [Autosaved]

Application

• Text to Speech Conversion

– "Bass" can be pronounced either like "base" (the musical sense) or to rhyme with "mass" (the fish)

• Machine Translation

– The French word "grille" can be translated as "gate" or "bar"

• Information Retrieval

• Named Entity Recognition

• Document Summary Generation

Page 7: Biomedical Word Sense Disambiguation presentation [Autosaved]

Motivation

• Generalized WSD is a difficult problem

• A more practical approach is to solve it separately for each domain

• The biomedical domain contains a large number of ambiguous words

• Medical report summary generation

• Drug side effect prediction

Page 9: Biomedical Word Sense Disambiguation presentation [Autosaved]

Related Methods

• Supervised Methods

– Support Vector Machines, Convolutional Neural Nets

• Unsupervised Methods

– Clustering, generative models

– If the vocabulary has four words w1, w2, w3, w4:

      w1 … w4
w1    1/5  2/5  0
w2    …

• Knowledge Based Methods

– WordNet, UMLS (Unified Medical Language System)

Page 11: Biomedical Word Sense Disambiguation presentation [Autosaved]

Our Method

• We build a semi-supervised model

• The model uses concept/sense/CUI vectors in the same way people use word vectors (more later)

• MetaMap is a knowledge-based NER tool; we use its decisions to generate concept vectors

• The model also uses P(w|c), where c is a concept or sense, generated using other knowledge-based approaches

• Word vectors are generated from an unstructured data source

Page 13: Biomedical Word Sense Disambiguation presentation [Autosaved]

What is a Word Vector

• Distributed representation of words

• The representation of a word is spread across all dimensions of the vector

• The idea differs from other representations where the vector length equals the vocabulary size. Here we choose a small dimension, say d = 200, and generate dense vectors

• Each element of the vector contributes to the definition of many different words

King: 0.07 0.05 0.8 0.002 0.1 0.3

Queen: 0.7 0.05 0.67 0.002 0.2 0.3
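The contrast between a vocabulary-sized representation and a dense one can be illustrated with a minimal sketch. The four-word vocabulary and the `one_hot` helper are assumptions made for illustration; the two dense vectors are the six values shown on the slide.

```python
import math

# Toy vocabulary (an assumption for illustration only).
vocab = ["king", "queen", "man", "woman"]

def one_hot(word):
    """Sparse representation whose length equals the vocabulary size."""
    return [1.0 if w == word else 0.0 for w in vocab]

# Dense six-dimensional vectors copied from the slide.
dense = {
    "king":  [0.07, 0.05, 0.8, 0.002, 0.1, 0.3],
    "queen": [0.7, 0.05, 0.67, 0.002, 0.2, 0.3],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Distinct one-hot vectors never overlap; dense vectors can share information.
one_hot_sim = cosine(one_hot("king"), one_hot("queen"))
dense_sim = cosine(dense["king"], dense["queen"])
print(one_hot_sim, dense_sim)
```

Any two different one-hot vectors have similarity zero, whereas the dense vectors can express that two words are related.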

Page 14: Biomedical Word Sense Disambiguation presentation [Autosaved]

What is a Word Vector

• It is a numerical way of representing a word

• Each dimension captures some semantic and syntactic information related to that word

• Using a similar idea, we can generate concept/sense/CUI vectors

Page 15: Biomedical Word Sense Disambiguation presentation [Autosaved]

Why Word Vectors Work?

• Learned word vectors capture the syntactic and semantic information that exists in text

– vector(“king”) - vector(“man”) + vector(“woman”) ≈ vector(“queen”)

Fig 5: Resultant queen vector and other vectors [5]
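The king/queen relation can be tried out on toy data. This is only a sketch: the two-dimensional vectors below are hypothetical values chosen so that one axis loosely tracks "royalty" and the other "maleness"; real learned vectors have hundreds of dimensions and satisfy the relation only approximately.

```python
import math

# Hypothetical 2-d vectors: dimension 0 ~ "royalty", dimension 1 ~ "maleness".
vectors = {
    "king":  [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man":   [0.1, 0.9],
    "woman": [0.1, 0.1],
    "apple": [0.05, 0.5],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c):
    """Return the word closest to vector(a) - vector(b) + vector(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words are excluded from the candidates.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

result = analogy("king", "man", "woman")
print(result)
```

With these toy values the nearest remaining word is "queen".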

Page 17: Biomedical Word Sense Disambiguation presentation [Autosaved]

Required Tools

• Language model

Page 18: Biomedical Word Sense Disambiguation presentation [Autosaved]

Required Tools Contd.

• MetaMap

Step 1 Parsing: the text is parsed into noun phrases, using the Xerox POS tagger to perform syntactic analysis [4].

Step 2 Variant Generation: variants of each input phrase are generated using knowledge from the SPECIALIST lexicon and a supplementary database of synonyms.

Step 3 Candidate Retrieval: the candidate set retrieved from the UMLS Metathesaurus consists of strings containing at least one of the variants generated in Step 2.

Step 4 Candidate Evaluation

Fig 2: Variants for the word "ocular"

Fig 3: Evaluated candidates for "ocular complication"

Page 20: Biomedical Word Sense Disambiguation presentation [Autosaved]

Our Method in Detail

• Text preprocessing

– Remove English stop words

– NLTK word tokenization

– Keep only words with frequency greater than five

– Lowercase everything

• The word context window is ten words long

• Generated word vectors have 300 dimensions
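The preprocessing steps listed above can be sketched as follows. This is a minimal, standard-library-only illustration: the regular expression is a stand-in for the NLTK tokenizer, the stop list is a tiny hypothetical subset of an English stop-word list, and `min_count=6` is one reading of "frequency greater than five".

```python
import re
from collections import Counter

# Tiny stand-in stop list (assumption); the slides use a full English list.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "to", "and", "i", "had"}

def tokenize(text):
    """Lowercase and split into alphanumeric tokens (regex stand-in for NLTK)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def preprocess(sentences, min_count=6):
    """Tokenize, drop stop words, and keep words seen at least min_count times."""
    tokenized = [[t for t in tokenize(s) if t not in STOP_WORDS] for s in sentences]
    counts = Counter(t for sent in tokenized for t in sent)
    return [[t for t in sent if counts[t] >= min_count] for sent in tokenized]

# Toy corpus: one sentence repeated six times plus one rare sentence.
corpus = ["The patient had a cold."] * 6 + ["Cold air in the vortex."]
cleaned = preprocess(corpus)
print(cleaned[0], cleaned[-1])
```

The resulting token lists are the kind of input a Word2Vec-style trainer expects; the context window of ten and the 300 dimensions would be set on that trainer rather than in this step.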

Page 21: Biomedical Word Sense Disambiguation presentation [Autosaved]

Generating word and concept vectors

• 20 million citations from PubMed are used for training word vectors

• 5 million citations are chosen at random

• 7.1 million sentences containing the target ambiguous words are retrieved

• Each sentence is about 16-17 words long

• The combined sentences are used to generate bigrams

• Each bigram is fed into MetaMap with the WSD option turned on

• Each bigram is replaced with its corresponding concepts

• The resulting data is then fed to the language model to generate concept vectors
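The bigram-to-concept replacement step can be sketched roughly as below. MetaMap itself is not called here: `CONCEPT_LOOKUP` is a hypothetical table standing in for its output, the identifiers are made-up placeholders rather than real UMLS CUIs, and the greedy left-to-right replacement is an assumption about how overlapping bigrams are handled.

```python
def bigrams(tokens):
    """All adjacent token pairs in a sentence."""
    return list(zip(tokens, tokens[1:]))

# Hypothetical bigram-to-concept table standing in for MetaMap output.
# The identifiers are made-up placeholders, not real UMLS CUIs.
CONCEPT_LOOKUP = {
    ("common", "cold"): "CUI_COMMON_COLD",
    ("cold", "temperature"): "CUI_COLD_TEMPERATURE",
}

def replace_with_concepts(tokens, lookup):
    """Greedy left-to-right replacement of known bigrams by concept IDs."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in lookup:
            out.append(lookup[pair])
            i += 2  # both words of the bigram are consumed
        else:
            out.append(tokens[i])
            i += 1
    return out

sentence = ["patient", "reported", "common", "cold", "symptoms"]
concept_sentence = replace_with_concepts(sentence, CONCEPT_LOOKUP)
print(bigrams(sentence))
print(concept_sentence)
```

Sentences rewritten this way mix words and concept identifiers, so a language model trained on them learns a vector for each identifier just as it does for each word.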

Page 22: Biomedical Word Sense Disambiguation presentation [Autosaved]

Estimate P(D|c) [Jimeno-Yepes et al.]

• We use the model of Jimeno-Yepes and Berlanga [3], which uses a Markov chain to calculate P(D|c)

• In order to get P(D|c), we need to calculate P(w|c)
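To show why P(w|c) is the building block for P(D|c), here is a deliberately simplified sketch. It assumes words are conditionally independent given the concept (a naive bag-of-words product), which is not necessarily how the model in [3] combines them. The probability table, the concept names, and the `FLOOR` smoothing value are all invented for illustration.

```python
import math

# Hypothetical word-given-concept probabilities P(w|c); all values are invented.
P_W_GIVEN_C = {
    "cold_illness":     {"fever": 0.20, "cough": 0.25, "wind": 0.01},
    "cold_temperature": {"fever": 0.02, "cough": 0.01, "wind": 0.30},
}
FLOOR = 1e-6  # assumed smoothing value for words unseen with a concept

def log_p_doc_given_concept(words, concept):
    """log P(D|c) under the independence assumption: sum of log P(w|c)."""
    table = P_W_GIVEN_C[concept]
    return sum(math.log(table.get(w, FLOOR)) for w in words)

doc = ["fever", "cough"]
scores = {c: log_p_doc_given_concept(doc, c) for c in P_W_GIVEN_C}
best = max(scores, key=scores.get)
print(best, scores)
```

Working in log space avoids numerical underflow when a context contains many words.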

Page 23: Biomedical Word Sense Disambiguation presentation [Autosaved]

Biomedical MSH WSD

• A dataset with 203 ambiguous words

• 424 unique concept identifiers (senses)

• 38,495 test context instances, with an average of about 200 test instances for each ambiguous word

• Goal: to correctly identify the sense for each test instance

Page 24: Biomedical Word Sense Disambiguation presentation [Autosaved]

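Model I: Cosine Similarity

$$f_c(T, w, C(w)) = \arg\max_{c \in C(w)} \cos\left(\vec{T}_{avg}, \vec{c}\right)$$

• w is the ambiguous word

• T is the test instance context containing the ambiguous word w

• C(w) is the set of concepts that w can assume

• T_avg is the average of the word vectors in T, and c is compared through its concept vector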
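Model I can be written down in a few lines. The sketch below assumes the context vector is the plain average of the word vectors in T; the vectors and the concept names `c_1` and `c_2` are toy values, not data from the experiments.

```python
import math

def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def model_one(context_vectors, concept_vectors):
    """Pick the concept whose vector has the highest cosine with T_avg."""
    t_avg = average(context_vectors)
    return max(concept_vectors, key=lambda c: cosine(t_avg, concept_vectors[c]))

# Toy word vectors for the context T and toy concept vectors for C(w).
context = [[0.9, 0.1], [0.7, 0.3]]
concepts = {"c_1": [1.0, 0.0], "c_2": [0.0, 1.0]}
predicted = model_one(context, concepts)
print(predicted)
```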

Page 25: Biomedical Word Sense Disambiguation presentation [Autosaved]

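Model II: Projection magnitude

$$f_c(T, w, C(w)) = \arg\max_{c \in C(w)} \left[ \rho\left(\cos\left(\vec{T}_{avg}, \vec{c}\right)\right) \cdot \frac{\lVert \mathrm{Pr}(\vec{T}_{avg}, \vec{c}) \rVert}{\lVert \vec{c} \rVert} \right]$$

• Take the projection of the context vector along the concept vector and then consider its Euclidean norm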

Page 26: Biomedical Word Sense Disambiguation presentation [Autosaved]

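Model III

$$f_c(T, w, C(w)) = \arg\max_{c \in C(w)} \left[ \cos\left(\vec{T}_{avg}, \vec{c}\right) \cdot \frac{\lVert \mathrm{Pr}(\vec{T}_{avg}, \vec{c}) \rVert}{\lVert \vec{c} \rVert} \right]$$

• Combines both the angular and the magnitude components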
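The scores of Model II and Model III differ only in the factor applied to the projection ratio, which a short sketch makes visible. Two assumptions are made here that the slides do not state: ρ is taken to be the sign function, and Pr(T_avg, c) is the orthogonal projection of T_avg onto c. The vectors are toy values.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))

def projection_ratio(t_avg, c):
    """||Pr(T_avg, c)|| / ||c||, with Pr the projection of T_avg onto c."""
    projection_length = abs(dot(t_avg, c)) / norm(c)
    return projection_length / norm(c)

def sign(x):
    """Assumed form of rho: +1, 0, or -1."""
    return (x > 0) - (x < 0)

def score_model_two(t_avg, c):
    return sign(cosine(t_avg, c)) * projection_ratio(t_avg, c)

def score_model_three(t_avg, c):
    return cosine(t_avg, c) * projection_ratio(t_avg, c)

def predict(t_avg, concepts, score):
    """Return the concept name with the highest score."""
    return max(concepts, key=lambda name: score(t_avg, concepts[name]))

t_avg = [1.0, 0.5]
concepts = {"c_1": [1.0, 0.0], "c_2": [0.0, 1.0]}
print(predict(t_avg, concepts, score_model_two))
print(predict(t_avg, concepts, score_model_three))
```

In Model II the cosine contributes only its sign, so the ranking is driven by the projection ratio; in Model III the angle also scales the score.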

Page 27: Biomedical Word Sense Disambiguation presentation [Autosaved]

Model IV

Page 28: Biomedical Word Sense Disambiguation presentation [Autosaved]

Model V KNN

• Now we have multiple ways to resolve the sense of ambiguous terms

• We built a distantly supervised dataset by collecting data from biomedical citations

• For each ambiguous word there are on average 40,000 sentences

• The sense of each sentence is resolved using Model IV

Page 29: Biomedical Word Sense Disambiguation presentation [Autosaved]

KNN in Pseudo Code

Page 30: Biomedical Word Sense Disambiguation presentation [Autosaved]

KNN contd.

$$f_{k\text{-}NN}(T, w, C(w)) = \arg\max_{c \in C(w)} \left[ \sum_{(D, w, c) \in R_k(D_w)} \cos\left(\vec{T}_{avg}, \vec{D}_{avg}\right) \right]$$

Figure: cosine similarity is computed between Test Instance 1 (sense unknown) and each labeled training instance.

Training instance   Sense   Cosine similarity
1                   c_1     0.7
2                   c_1     0.9
3                   c_2     0.1
4                   c_1     0.03
5                   c_2     0.02
…                   …       …
n                   c_2     0.12
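The k-NN decision rule can be sketched directly from the formula: keep the k training instances most similar to the test instance, sum their cosine similarities per sense, and return the sense with the largest sum. The function name `knn_sense` is hypothetical, and the sketch takes precomputed similarities as input; the six (sense, similarity) pairs are the ones listed on the slide, with the elided instances left out.

```python
from collections import defaultdict

def knn_sense(scored_instances, k):
    """scored_instances: (sense, cosine similarity) pairs for one test instance."""
    # R_k: the k training instances most similar to the test instance.
    top_k = sorted(scored_instances, key=lambda pair: pair[1], reverse=True)[:k]
    totals = defaultdict(float)
    for sense, similarity in top_k:
        totals[sense] += similarity
    # argmax over senses of the summed similarities.
    return max(totals, key=totals.get)

# Senses and similarities from the slide (instances 1 to 5 and n).
scored = [("c_1", 0.7), ("c_1", 0.9), ("c_2", 0.1),
          ("c_1", 0.03), ("c_2", 0.02), ("c_2", 0.12)]
predicted = knn_sense(scored, k=3)
print(predicted)
```

With k = 3 the neighbors are the instances scoring 0.9, 0.7, and 0.12, so c_1 wins with a summed similarity of 1.6 against 0.12.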

Page 31: Biomedical Word Sense Disambiguation presentation [Autosaved]

KNN Accuracy graph

Page 32: Biomedical Word Sense Disambiguation presentation [Autosaved]

Distant Supervision with CNN

• The refined assignment of CUIs to sentences is used as the training set

• The MSH WSD data is used as the test set

• 203 convolutional neural networks are trained, one per ambiguous word

• Each has one convolutional layer and one hidden layer

• 900 filters of 3 different sizes are used

• Evaluation is done on the test instances

Page 33: Biomedical Word Sense Disambiguation presentation [Autosaved]

Distant Supervision Using CNN

Page 34: Biomedical Word Sense Disambiguation presentation [Autosaved]

Ensembling of CNNs

• Five CNNs are trained and tested for each ambiguous word

• Their outputs are averaged and the best-scoring sense is taken

• This tends to improve the result at the cost of computation
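The averaging step can be sketched as follows, under the assumption that each CNN outputs one probability per candidate sense. The five output rows are invented numbers, not results from the experiments, and `ensemble_predict` is a hypothetical helper name.

```python
def ensemble_predict(per_model_probs):
    """Average class probabilities across models and return the argmax index."""
    n_models = len(per_model_probs)
    averaged = [sum(col) / n_models for col in zip(*per_model_probs)]
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return best, averaged

# Invented outputs of five CNNs over two candidate senses.
outputs = [
    [0.60, 0.40],
    [0.40, 0.60],
    [0.70, 0.30],
    [0.55, 0.45],
    [0.45, 0.55],
]
best_sense, mean_probs = ensemble_predict(outputs)
print(best_sense, mean_probs)
```

Two of the five toy models prefer the second sense, but the averaged output still favors the first, which is the kind of smoothing an ensemble provides.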

Page 36: Biomedical Word Sense Disambiguation presentation [Autosaved]

Results and Analysis

Methods                                   Results
Jimeno-Yepes and Berlanga [3]             89.10%
Cosine similarity (Model I)               85.54%
Projection length proportion (Model II)   88.68%
Combining Model I and II                  89.26%
Combining Model I, II and [3]             92.24%
Convolutional Neural Net                  86.17%
Ensembling CNN                            87.78%
K-NN with k = 3500                        94.34%

Page 38: Biomedical Word Sense Disambiguation presentation [Autosaved]

Conclusion

• The developed model is highly accurate, beating the previous best result

• It is unsupervised: no hand-labeled information is required

• It is scalable; however, the accuracy level will be uncertain

– By increasing the number of training sentences and the sentence context, more information may be extractable

• Graph-based algorithms need to be explored

• Tools used: HPC, Theano, NLTK, Gensim Word2Vec

Page 39: Biomedical Word Sense Disambiguation presentation [Autosaved]

Questions

Page 40: Biomedical Word Sense Disambiguation presentation [Autosaved]

References

1. Eneko Agirre and Philip Edmonds. Word sense disambiguation: Algorithms and applications, volume 33. Springer Science & Business Media, 2007.

2. Yoshua Bengio, Rejean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137-1155, 2003.

3. Antonio Jimeno Yepes and Rafael Berlanga. Knowledge based word-concept model estimation and refinement for biomedical text mining. Journal of Biomedical Informatics, 53:300-307, 2015.

4. Alan R. Aronson. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In Proceedings of the AMIA Symposium. American Medical Informatics Association, 2001.

5. https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/

Page 41: Biomedical Word Sense Disambiguation presentation [Autosaved]

References

6. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097-1105, 2012.