Towards Understanding
Christopher Manning
Stanford University
1980s Natural Language Processing
VP →{ V (NP:(↑ OBJ)=↓ (NP:(↑ OBJ2)=↓) )
(XP:(↑ XCOMP)=↓)
|@(COORD VP VP)}.
salmon N IRR @(CN SALMON)
(↑ PERSON)=3
{ (↑ NUM)=SG|(↑ NUM)=PL}.
Learning language
How/WRB does/VBZ a/DT project/NN get/VB to/TO be/VB a/DT year/NN late/JJ ?/. …/: One/CD day/NN at/IN a/DT time/NN ./.
P(late|a, year) = 0.0087
P(NN|DT, a, project) = 0.9
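The two probabilities on this slide are conditional estimates of the kind a statistical model learns from text. The following is a minimal sketch, assuming plain relative-frequency estimation; the toy corpus and the helper `relative_frequency` are hypothetical, and the values 0.0087 and 0.9 on the slide come from far larger data than this.

```python
# Hypothetical toy corpus of (word, tag) pairs, for illustration only.
tagged = [("a", "DT"), ("project", "NN"), ("a", "DT"), ("year", "NN"),
          ("late", "JJ"), ("a", "DT"), ("year", "NN"), ("ago", "RB")]
words = [w for w, _ in tagged]


def relative_frequency(pairs, context, event):
    """Estimate P(event | context) as count(context, event) / count(context)."""
    context_total = sum(1 for c, _ in pairs if c == context)
    joint = sum(1 for c, e in pairs if c == context and e == event)
    return joint / context_total if context_total else 0.0


# Language-model view: predict a word from the two preceding words.
trigram_pairs = [((w1, w2), w3)
                 for w1, w2, w3 in zip(words, words[1:], words[2:])]

# Tagging view: predict a tag from the previous tag, previous word and current word.
tag_pairs = [((t_prev, w_prev, w), t)
             for (w_prev, t_prev), (w, t) in zip(tagged, tagged[1:])]

p_late = relative_frequency(trigram_pairs, ("a", "year"), "late")
p_nn = relative_frequency(tag_pairs, ("DT", "a", "project"), "NN")
print(p_late, p_nn)
```

On this toy corpus "a year" is followed once by "late" and once by "ago", so the first estimate is one half; real systems smooth such counts rather than using them raw.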
The traditional word representation
motel
[0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]
Dimensionality: 50K (small domain – speech/PTB) – 13M (web – Google 1T)
motel [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0] AND
hotel [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0] = 0
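The point of the slide is that one-hot vectors for different words never overlap, so they carry no notion of similarity. A minimal sketch, assuming the 15-dimensional toy vocabulary shown above (the helper `one_hot` is a hypothetical name):

```python
def one_hot(index, size):
    """Return a vector of zeros with a single 1 at the given position."""
    vec = [0] * size
    vec[index] = 1
    return vec


VOCAB_SIZE = 15                   # toy size; the slide cites 50K to 13M in practice
motel = one_hot(10, VOCAB_SIZE)   # positions taken from the slide's two vectors
hotel = one_hot(7, VOCAB_SIZE)

# Element-wise AND and the dot product are both zero for any two distinct words.
overlap = [m & h for m, h in zip(motel, hotel)]
dot = sum(m * h for m, h in zip(motel, hotel))
print(overlap, dot)
```

However related "motel" and "hotel" are in meaning, the result is the same as for any other pair of distinct words.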
Word distributions word representations
linguistics = [0.286, 0.792, −0.177, −0.107, 0.109, −0.542, 0.349, 0.271, 0.487]
[Bengio et al. 2003, Collobert & Weston 2008, Turian 2010, Mikolov 2013, etc.]
Through corpus linguistics, large chunks
the study of language and linguistics.
The field of linguistics is concerned
Written like a linguistics text book
Phonology is the branch of linguistics that
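The context fragments on this slide illustrate the distributional idea: a word is characterised by the words that appear around it. The sketch below counts neighbours of "linguistics" within a two-word window over those five fragments. It is a simple count-based illustration, not the learned dense vectors of the cited papers; the window size and the function name `context_counts` are assumptions.

```python
import re
from collections import Counter

# The five context fragments shown on the slide.
contexts = [
    "Through corpus linguistics, large chunks",
    "the study of language and linguistics.",
    "The field of linguistics is concerned",
    "Written like a linguistics text book",
    "Phonology is the branch of linguistics that",
]


def context_counts(target, sentences, window=2):
    """Count words occurring within `window` tokens of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, token in enumerate(tokens):
            if token == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts


counts = context_counts("linguistics", contexts)
print(counts.most_common(3))
```

Such sparse count vectors are the raw material that models like those cited above compress into short dense vectors.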
Encoding meaning in vector differences [Pennington et al., to appear 2014]
Crucial insight: Ratios of co-occurrence probabilities can encode meaning components

| | x = solid | x = gas | x = water | x = random |
|---|---|---|---|---|
| P(x \| ice) | large | small | large | small |
| P(x \| steam) | small | large | large | small |
| P(x \| ice) / P(x \| steam) | large | small | ~1 | ~1 |
Encoding meaning in vector differences [Pennington et al., to appear 2014]
Crucial insight: Ratios of co-occurrence probabilities can encode meaning components

| | x = solid | x = gas | x = water | x = fashion |
|---|---|---|---|---|
| P(x \| ice) | 1.9 × 10⁻⁴ | 6.6 × 10⁻⁵ | 3.0 × 10⁻³ | 1.7 × 10⁻⁵ |
| P(x \| steam) | 2.2 × 10⁻⁵ | 7.8 × 10⁻⁴ | 2.2 × 10⁻³ | 1.8 × 10⁻⁵ |
| P(x \| ice) / P(x \| steam) | 8.9 | 8.5 × 10⁻² | 1.36 | 0.96 |
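The ratios can be recomputed from the probabilities on this slide. The sketch assumes, as in the published GloVe paper, that the two probability rows are P(x | ice) and P(x | steam). Because the displayed probabilities are rounded, the recomputed ratios differ slightly from the slide's 8.9 and 0.96.

```python
# Probabilities as displayed on the slide (rounded to two significant digits).
p_ice = {"solid": 1.9e-4, "gas": 6.6e-5, "water": 3.0e-3, "fashion": 1.7e-5}
p_steam = {"solid": 2.2e-5, "gas": 7.8e-4, "water": 2.2e-3, "fashion": 1.8e-5}

# A ratio far above 1 marks a property of ice, far below 1 a property of steam,
# and near 1 a word that does not discriminate between the two.
ratios = {x: p_ice[x] / p_steam[x] for x in p_ice}

for x, ratio in ratios.items():
    print(f"P({x}|ice) / P({x}|steam) = {ratio:.3g}")
```

The raw probabilities for "water" are two orders of magnitude larger than for "fashion", yet both ratios are close to 1, which is why the ratio rather than the probability isolates the solid versus gas distinction.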
GloVe: A new model for learning word representations [Pennington et al., to appear 2014]
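The slide text names the model without stating its objective. As a hedged sketch based on the published GloVe paper rather than on this slide, the model fits word and context vectors so that their dot product plus two biases approximates the log co-occurrence count, with a weighting function that damps rare pairs. The function names and the dictionary-based data layout are illustrative assumptions; the defaults `x_max=100` and `alpha=0.75` are the values reported in the paper.

```python
import math


def weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(x): grows as (x / x_max)^alpha and is capped at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def glove_loss(cooc, w, w_ctx, b, b_ctx):
    """Weighted least-squares loss over nonzero co-occurrence counts.

    cooc maps (i, j) -> X_ij > 0; w and w_ctx are lists of vectors;
    b and b_ctx are lists of scalar biases.
    """
    total = 0.0
    for (i, j), x in cooc.items():
        diff = dot(w[i], w_ctx[j]) + b[i] + b_ctx[j] - math.log(x)
        total += weight(x) * diff ** 2
    return total


# Toy check: one pair whose dot product already equals log X_ij gives zero loss.
loss = glove_loss({(0, 0): math.e}, [[1.0]], [[1.0]], [0.0], [0.0])
print(loss)
```

Training would minimise this quantity by gradient descent over all vectors and biases; that step is omitted here.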
Word similarities
Nearest words to frog:
1. frogs
2. toad
3. litoria
4. leptodactylidae
5. rana
6. lizard
7. eleutherodactylus
[Images: litoria, leptodactylidae, rana, eleutherodactylus]
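A nearest-words list like the one for "frog" is typically obtained by ranking the vocabulary by cosine similarity to the query vector. The sketch below assumes cosine similarity and uses made-up three-dimensional vectors; real word vectors have hundreds of dimensions, and none of these numbers come from the talk.

```python
import math

# Hypothetical toy vectors, chosen only to make the ranking easy to follow.
vectors = {
    "frog": [0.9, 0.8, 0.1],
    "toad": [0.85, 0.75, 0.2],
    "lizard": [0.6, 0.9, 0.3],
    "motel": [0.1, 0.0, 0.9],
}


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, vectors, k=3):
    """Return the k words most similar to `word`, excluding the word itself."""
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


neighbours = nearest("frog", vectors)
print([name for name, _ in neighbours])
```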
Word analogy task [Mikolov, Yih & Zweig 2013a]

| Model | Dimensions | Corpus size | Performance (Syn + Sem) |
|---|---|---|---|
| CBOW (Mikolov et al. 2013b) | 300 | 1.6 billion | 36.1 |
| CBOW (Mikolov et al. 2013b) | 1000 | 6 billion | 63.7 |
| GloVe (this work) | 300 | 6 billion | 71.7 |
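In the word analogy task a question "a is to b as c is to ?" is commonly answered by vector arithmetic: take b − a + c and return the closest word other than the three given. The sketch below assumes that procedure with cosine similarity; the two-dimensional vectors and the name `analogy` are hypothetical and chosen so the arithmetic is easy to check by hand.

```python
import math

# Hypothetical toy vectors: first axis roughly "royalty", second axis "female".
vectors = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [0.0, 3.0],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with the word nearest to b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


answer = analogy("man", "woman", "king", vectors)
print(answer)
```

A score on such a benchmark is then the percentage of questions answered correctly, which is the kind of figure reported in the performance column.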
Machine translation with bilingual neural language models [Devlin et al., ACL 2014]
NIST 2012 Open MT Arabic Results

| System | Arabic |
|---|---|
| 1st Place (BBN) | 49.5 |
| 2nd Place | 47.5 |
| … | … |
| 9th Place | 44.0 |
| 10th Place | 41.2 |

NNJM on best system

| System | Arabic |
|---|---|
| Previous best BBN system | 49.8 |
| + NNJM | 52.8 |

Gain: +3.0 BLEU

NNJM on “Baseline”

| System | Arabic |
|---|---|
| “Baseline Hiero” | 43.4 |
| + NNJM | 49.7 |

Gain: +6.3 BLEU

“Baseline Hiero” Features: (1) Rule probs, (2) lexical smoothing, (3) KN LM, (4) word penalty, (5) concat penalty
Sentence structure: Dependency parsing
Universal Stanford Dependencies [de Marneffe et al., LREC 2014]
A common dependency representation and label set applicable across languages – http://universaldependencies.github.io/docs/
Sentence structure: Dependency parsing
Deep Learning Dependency Parser [Chen & Manning, forthcoming 2014]

| Parser type | Parser | LAS (Label & Attach) | Sentences / sec |
|---|---|---|---|
| Transition-based | MaltParser (stackproj) | 86.9 | 469 |
| Transition-based | Our parser | 89.6 | 654 |
| Graph-based | MSTParser | 87.6 | 10 |
| Graph-based | TurboParser (full) | 89.7 | 8 |
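LAS, the accuracy measure in this comparison, counts a token as correct only when both its head and its dependency label match the gold annotation. A minimal sketch of that computation follows, together with the unlabeled variant (UAS) for contrast; the four-token example and the function name `attachment_scores` are hypothetical, and details such as punctuation handling in the actual evaluation are not modelled.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages.

    gold and predicted are lists with one (head_index, label) pair per token.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    total = len(gold)
    heads_ok = sum(1 for g, p in zip(gold, predicted) if g[0] == p[0])
    both_ok = sum(1 for g, p in zip(gold, predicted) if g == p)
    return 100.0 * heads_ok / total, 100.0 * both_ok / total


# Hypothetical four-token sentence; head index 0 denotes the root.
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "dobj")]
pred = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "iobj")]

uas, las = attachment_scores(gold, pred)
print(uas, las)
```

Here every head is right but one label is wrong, so the unlabeled score is perfect while the labeled score drops.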
Grounding language meaning with images [Socher, Karpathy, Le, Manning & Ng, TACL 2014]
Example dependency tree and image
Recursive computation of dependency tree
Evaluation
Data of [Rashtchian, Young, Hodosh & Hockenmaier 2010]
1000 images, 5 descriptions each; used as 800 train, 100 dev, 100 test
Results for image search

| Model | Mean rank |
|---|---|
| Random | 52.1 |
| Recurrent NN | 19.2 |
| Constituency Tree Recursive NN | 16.1 |
| kCCA | 15.9 |
| Bag of Words | 14.6 |
| Dependency Tree Recursive NN | 12.5 |

Lower is better!
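Mean rank measures how far down the retrieved list the correct image appears, averaged over queries, which is why lower is better. The sketch below assumes a simplified setup with one correct image per query sentence and a precomputed similarity matrix; the scores and the function name `mean_rank` are hypothetical, and the paper's exact protocol may differ.

```python
def mean_rank(scores):
    """Average rank of the correct image across queries.

    scores[q][i] is the similarity of query sentence q to image i;
    the correct image for query q is assumed to be image q.
    """
    ranks = []
    for q, row in enumerate(scores):
        correct = row[q]
        # Rank 1 means no other image scored higher than the correct one.
        ranks.append(1 + sum(1 for s in row if s > correct))
    return sum(ranks) / len(ranks)


# Hypothetical similarity scores for three queries and three images.
scores = [
    [0.9, 0.1, 0.3],  # correct image ranked 1st
    [0.4, 0.2, 0.8],  # correct image ranked 3rd
    [0.1, 0.2, 0.7],  # correct image ranked 1st
]

result = mean_rank(scores)
print(result)
```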
How to represent the meaning of texts [Le and Mikolov, ICML 2014, Paragraph Vector]
Political Ideology Detection Using Recursive Neural Networks [Iyyer, Enns, Boyd-Graber & Resnik 2014]
Extracting Semantic Relationships [Socher, Huval, Manning & Ng, EMNLP 2012]
My [apartment]e1 has a pretty large [kitchen]e2
component-whole relationship (e2,e1)
http://www.freegreatpicture.com/city-impression/trinity-college-dublin-the-old-library-14885
http://nlp.stanford.edu/manning/papers/romance.pdf
http://commons.wikimedia.org/wiki/File:PR2_Robot_reads_the_Mythical_Man-Month_2.jpg
http://en.wikipedia.org/wiki/Litoria#mediaviewer/File:Caerulea_cropped(2).jpg
http://en.wikipedia.org/wiki/Pristimantis_cruentus#mediaviewer/File:Pristimantis_cruentus_studio.jpg
http://en.wikipedia.org/wiki/Edible_frog#mediaviewer/File:Rana_esculenta_on_Nymphaea_edit.JPG
http://en.wikipedia.org/wiki/Eleutherodactylus#mediaviewer/File:Eleutherodactylus_mimus.jpg
http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2008/