Text mining lab (summer 2017) - Word Vector Representation
Summer 2017
Elvis Saravia
PhD, Information Systems and [email protected]
Github username: omarsar
Questions: sli.do (#Z217)
● Knowledge Discovery (KDD) Process
ConceptNet
Motel = [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
Hotel = [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]
One-hot representation
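The following is a minimal plain-Python sketch of one-hot encoding. It assumes a toy 16-word vocabulary in which "hotel" sits at index 5 and "motel" at index 8, matching the vectors above. The `word_index` mapping and the helper names are hypothetical. Each vector has a single 1 in a different position, so the dot product of any two distinct words is 0. As a result, one-hot vectors cannot express that "hotel" and "motel" are related.

```python
VOCAB_SIZE = 16

def one_hot(index, size=VOCAB_SIZE):
    """Return a vector of `size` zeros with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical word-to-index mapping that reproduces the slide's vectors.
word_index = {"hotel": 5, "motel": 8}

hotel = one_hot(word_index["hotel"])
motel = one_hot(word_index["motel"])

print("hotel =", hotel)
print("motel =", motel)
# Distinct one-hot vectors are orthogonal, so their similarity is always 0.
print("hotel . motel =", dot(hotel, motel))  # 0
```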
hotel = [0.728 0.234 -0.23 0.223]
Distributed representation (low-dimension vector)
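Dense vectors make similarity measurable. The sketch below computes cosine similarity, which always lies in [-1, 1]. The "hotel" vector is the example given above. The "motel" and "banana" vectors are made-up values chosen only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b; always in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

embeddings = {
    "hotel": [0.728, 0.234, -0.23, 0.223],  # example vector from the slide
    "motel": [0.71, 0.25, -0.20, 0.19],     # hypothetical
    "banana": [-0.5, 0.6, 0.4, -0.3],       # hypothetical
}

for other in ("motel", "banana"):
    sim = cosine_similarity(embeddings["hotel"], embeddings[other])
    print(f"cos(hotel, {other}) = {sim:.3f}")
```

With these toy values, "motel" scores close to 1 and "banana" scores below 0. In a trained model, this kind of gap is what lets nearby vectors stand for related words.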
Paper source: https://arxiv.org/pdf/1301.3781.pdf
Feedforward Neural Net Language Model (NNLM)
Slide annotations: "variables to optimize"; "denotes window range".
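Below is a minimal sketch of the NNLM forward pass as the cited paper describes it: input, projection, hidden, and output layers. It uses tiny randomly initialized parameters and a window of N = 3 previous words, and it does no training. The sizes and the parameter names `C`, `W_h`, `b_h`, `W_o`, `b_o` are hypothetical. The direct input-to-output connections of Bengio et al.'s original model are omitted.

```python
import math
import random

random.seed(0)

V, D, H, N = 10, 4, 8, 3  # vocabulary size, embedding dim, hidden units, window range

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Variables to optimize (hypothetical, randomly initialized here):
C = rand_matrix(V, D)        # shared projection matrix: one D-dim row per word
W_h = rand_matrix(H, N * D)  # projection -> hidden weights
b_h = [0.0] * H
W_o = rand_matrix(V, H)      # hidden -> output weights
b_o = [0.0] * V

def matvec(M, x, bias):
    """Compute M @ x + bias with plain lists."""
    return [sum(m * xi for m, xi in zip(row, x)) + b for row, b in zip(M, bias)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def nnlm_forward(context_ids):
    """Return P(w_t | previous N words) for every word in the vocabulary."""
    assert len(context_ids) == N
    # Projection layer: look up and concatenate the N context embeddings.
    x = [value for idx in context_ids for value in C[idx]]
    # Hidden layer with a tanh non-linearity.
    h = [math.tanh(a) for a in matvec(W_h, x, b_h)]
    # Output layer: one score per vocabulary word, normalized by softmax.
    return softmax(matvec(W_o, h, b_o))

probs = nnlm_forward([1, 4, 7])
print(len(probs), round(sum(probs), 6))
print("most likely next word id:", max(range(V), key=probs.__getitem__))
```

The paper gives the per-example training complexity of this architecture as Q = N×D + N×D×H + H×V. The H×V output term dominates unless a cheaper output layer such as hierarchical softmax is used.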
P(the|over), P(fox|over), P(jumped|over), P(the|over), P(lazy|over), P(dog|over)
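The terms above are the (center, context) pairs a skip-gram model learns from. This sketch enumerates them, assuming the example sentence is "the fox jumped over the lazy dog" with a window of 3 words on each side, which reproduces the six terms. Real word2vec implementations add refinements such as subsampling of frequent words, which are not shown here. The function name is hypothetical.

```python
def skipgram_pairs(tokens, window):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

tokens = "the fox jumped over the lazy dog".split()
pairs = skipgram_pairs(tokens, window=3)

# Keep only the pairs whose center word is "over".
over_pairs = [p for p in pairs if p[0] == "over"]
for center, context in over_pairs:
    print(f"P({context}|{center})")
# Prints P(the|over), P(fox|over), P(jumped|over), P(the|over), P(lazy|over), P(dog|over)
```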
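P(v_OUT | v_IN): how should this probability distribution be defined? The skip-gram model uses a softmax over the vocabulary $V$:

$$P(v_{OUT} \mid v_{IN}) = \frac{\exp(v_{OUT}^\top v_{IN})}{\sum_{w \in V} \exp(v_w^\top v_{IN})}$$

The dot product $v_{OUT}^\top v_{IN}$ scores how similar the two vectors are. It lies in [-1, 1] only when both vectors have unit length, in which case it equals the cosine similarity.
The softmax turns these similarity scores into probabilities in [0, 1] that sum to 1 over the vocabulary.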
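Here is a minimal sketch of the softmax that turns dot-product similarities into P(v_OUT | v_IN). The three-word vocabulary and all vector values are hypothetical. The sketch uses one vector per word on the output side and a single input vector for the center word "over".

```python
import math

def softmax_prob(v_in, out_index, output_vectors):
    """P(v_OUT | v_IN) as a softmax over dot-product scores for every vocabulary word."""
    # Similarity score of the input vector with every output vector.
    scores = [sum(a * b for a, b in zip(v_in, v_w)) for v_w in output_vectors]
    # Subtract the max score before exponentiating to avoid overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    return exps[out_index] / sum(exps)

vocab = ["fox", "dog", "lazy"]
output_vectors = [
    [0.9, 0.1, 0.0],   # fox  (hypothetical)
    [0.8, 0.3, 0.1],   # dog  (hypothetical)
    [-0.5, 0.2, 0.7],  # lazy (hypothetical)
]
v_in = [1.0, 0.2, -0.1]  # hypothetical input vector for the center word "over"

probs = {w: softmax_prob(v_in, i, output_vectors) for i, w in enumerate(vocab)}
for word, p in probs.items():
    print(f"P({word}|over) = {p:.3f}")
```

The full softmax needs one score per vocabulary word, so its cost grows with the vocabulary size. The word2vec papers avoid this with hierarchical softmax or negative sampling.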
https://www.healthvault.com/en-us/health-bot/
● https://goo.gl/ppHX65
  ○ Gensim guide for word2vec: https://goo.gl/i2UrdH
● https://goo.gl/7b72S9
● https://goo.gl/uNJDrs
● https://goo.gl/KYacjz
● https://goo.gl/JezgYg
a. Build API: (Flask/Django recommended)
b. Pretrained models: (Guide: https://goo.gl/5qt2Ki)
c. Visualization: d3js / plotly / tensorboard
a. LSTM - (Guide: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
b. CNN - (Guide: https://goo.gl/PgLUs7)
c. RNN - (Guide: https://goo.gl/5L9kci)
a. Starting point: https://rare-technologies.com/word2vec-tutorial#app
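One possible starting point for the pretrained-models idea is sketched below using only the standard library. It assumes the vectors are stored in word2vec's text format: a header line with the vocabulary size and vector dimension, followed by one `word v1 ... vD` line per word. gensim reads such files with `KeyedVectors.load_word2vec_format(path, binary=False)`. The `load_word2vec_text` and `most_similar` helpers here are hypothetical stand-ins, and the four sample vectors are made up.

```python
import io
import math

def load_word2vec_text(stream):
    """Parse word2vec text format: a 'count dim' header, then 'word v1 ... vdim' lines."""
    _count, dim = map(int, stream.readline().split())
    vectors = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim

def most_similar(word, vectors, topn=3):
    """Rank the other words by cosine similarity to `word`."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
    query = vectors[word]
    scored = [(other, cos(query, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:topn]

# Hypothetical sample file contents (4 words, 3 dimensions).
SAMPLE = """4 3
hotel 0.73 0.23 -0.23
motel 0.70 0.25 -0.20
inn 0.65 0.30 -0.10
banana -0.50 0.60 0.40
"""

vectors, dim = load_word2vec_text(io.StringIO(SAMPLE))
print(most_similar("hotel", vectors, topn=2))
```

Serving `most_similar` behind a small web endpoint, or plotting the vectors, would connect this sketch to the API and visualization ideas listed above.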