Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives...
Transcript of Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives...
![Page 1: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/1.jpg)
Slides credited from Richard Socher
![Page 2: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/2.jpg)
Sequence ModelingIdea: aggregate the meaning from all words into a vector
Compositionality
Method:◦ Basic combination: average, sum
◦ Neural combination: ✓Recursive neural network (RvNN)
✓Recurrent neural network (RNN)
✓Convolutional neural network (CNN)
2
How to compute
這(this)
規格(specification)
有(have)
誠意(sincerity)
N-dim
![Page 3: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/3.jpg)
Recursive Neural NetworkFrom Words to Phrases
3
![Page 4: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/4.jpg)
Recursive Neural NetworkIdea: leverage the linguistic knowledge (syntax) for combining multiple words into phrases
Assumption: language is described recursively
4
![Page 5: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/5.jpg)
Related Work for RvNNPollack (1990): Recursive auto-associative memories
Previous Recursive Neural Networks work by Goller & Küchler(1996), Costa et al. (2003) assumed fixed tree structure and used one-hot vectors.
Hinton (1990) and Bottou (2011): Related ideas about recursive models and recursive operators as smooth versions of logic operations
5
![Page 6: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/6.jpg)
OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption
Network Architecture and Definition◦ Standard Recursive Neural Network
◦ Weight-Tied◦ Weight-Untied
◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network
Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis
6
![Page 7: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/7.jpg)
OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption
Network Architecture and Definition◦ Standard Recursive Neural Network
◦ Weight-Tied◦ Weight-Untied
◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network
Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis
7
![Page 8: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/8.jpg)
Phrase MappingPrinciple of “Compositionality”◦ The meaning (vector) of a sentence is determined by
1) the meanings of its words and
2) the rules that combine them
8
Idea: jointly learn parse trees and compositional vector representations
the country of my birth
0.40.3
2.13.3
77
44.5
2.33.6
13.5
2.53.8
5.56.1
15
the country of my birth
the place where I was born
![Page 9: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/9.jpg)
Sentence Syntactic ParsingParsing is a process of analyzing a string of symbols
Parsing tree conveys1) Part-of-speech for each word
2) Phrases
3) Relationships
9
(NN = noun, VB = verb, DT = determiner, IN = Preposition)
The cat sat on the mat.
DT NN VB IN DT NN
NP
PP
VP
NP
S
![Page 10: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/10.jpg)
Sentence Syntactic ParsingParsing is a process of analyzing a string of symbols
Parsing tree conveys1) Part-of-speech for each word
2) Phrases
3) Relationships
10
The cat sat on the mat.
DT NN VB
(NN = noun, VB = verb, DT = determiner, IN = Preposition)
IN DT NN
NP
PP
VP
NP
S
![Page 11: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/11.jpg)
Sentence Syntactic ParsingParsing is a process of analyzing a string of symbols
Parsing tree conveys1) Part-of-speech for each word
2) Phrases• Noun phrase (NP): “the cat”, “the mat”
• Preposition phrase (PP): “on the mat”
• Verb phrase (VP): “sat on the mat”
• Sentence: “the cat sat on the mat”
3) Relationships
11
The cat sat on the mat.
DT NN VB
(NN = noun, VB = verb, DT = determiner, IN = Preposition)
IN DT NN
NP
PP
VP
NP
S
![Page 12: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/12.jpg)
Sentence Syntactic ParsingParsing is a process of analyzing a string of symbols
Parsing tree conveys1) Part-of-speech for each word
2) Phrases
3) Relationships
12
The cat sat on the mat.
DT NN VB
(NN = noun, VB = verb, DT = determiner, IN = Preposition)
IN DT NN
NP
PP
VP
NP
S
subject verb modifier_of_place• “the cat” is the subject of “sat”• “on the mat” is the place
modifier of “sat”
![Page 13: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/13.jpg)
Learning Structure & RepresentationVector representations incorporate the meaning of wordsand their compositional structures
13
The cat sat on the mat.
NP
PP
VP
NP
S
![Page 14: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/14.jpg)
OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption
Network Architecture and Definition◦ Standard Recursive Neural Network
◦ Weight-Tied◦ Weight-Untied
◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network
Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis
14
![Page 15: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/15.jpg)
Recursion AssumptionAre languages recursive?
Recursion helps describe natural language◦ Ex. “the church which has nice windows”, a noun phrase containing a
relative clause that contains a noun phrases
◦ NP NP PP
15
debatable
![Page 16: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/16.jpg)
Recursion AssumptionCharacteristics of recursion1. Helpful in disambiguation
2. Helpful for some tasks to refer to specific phrases:◦ John and Jane went to a big festival. They enjoyed the trip and the music there.
◦ “they”: John and Jane; “the trip”: went to a big festival; “there”: big festival
3. Works better for some tasks to use grammatical tree structure
16
Language recursion is still up to debate
![Page 17: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/17.jpg)
OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption
Network Architecture and Definition◦ Standard Recursive Neural Network
◦ Weight-Tied◦ Weight-Untied
◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network
Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis
17
![Page 18: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/18.jpg)
Recursive Neural Network ArchitectureA network is to predict the vectors along with the structure◦ Input: two candidate children’s vector representations
◦ Output:
1) vector representations for the merged node
2) score of how plausible the new node would be
18
Neural Network
on the mat.
NP
PPscore
![Page 19: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/19.jpg)
Recursive Neural Network Definition1) vector representations for the merged node
2) score of how plausible the new node would be
19
Neural Network
same W parameters at all nodes of the tree weight-tied
![Page 20: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/20.jpg)
3.1 0.3 0.1 0.4 2.3
Sentence Parsing via RvNN
20
Neural Network
Neural Network
Neural Network
Neural Network
Neural Network
![Page 21: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/21.jpg)
1.1
0.1 0.4 2.3
Sentence Parsing via RvNN
21
Neural Network
Neural Network
Neural Network
Neural Network
![Page 22: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/22.jpg)
1.1
0.1
3.6
Sentence Parsing via RvNN
22
Neural Network
Neural Network
Neural Network
![Page 23: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/23.jpg)
1.1
3.8
Sentence Parsing via RvNN
23
Neural Network
Neural Network
![Page 24: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/24.jpg)
Sentence parsing score
Sentence Parsing via RvNN
24
Neural Network
Sentence vector embeddings
![Page 25: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/25.jpg)
Backpropagation through StructurePrincipally the same as general backpropagation (Goller& Küchler, 1996)
25
1
11
lx
la
j
l
jl
i
Backward Pass
⋮
⋮
Forward Pass
⋮
⋮
Three differences Sum derivatives of W from all nodes Split derivatives at each node Add error messages from parent + node
itself
![Page 26: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/26.jpg)
1) Sum derivatives of W from all nodes
26
Neural Network
Neural Network
![Page 27: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/27.jpg)
2) Split derivatives at each node During forward propagation, the parent node is computed based on two children
During backward propagation, the errors should be computed wrt each of them
27
Neural Network Neural Network
![Page 28: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/28.jpg)
3) Add error messagesFor each node, the error message is compose of◦ Error propagated from parent
◦ Error from the current node
28
Neural Network
![Page 29: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/29.jpg)
OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption
Network Architecture and Definition◦ Standard Recursive Neural Network
◦ Weight-Tied◦ Weight-Untied
◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network
Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis
29
![Page 30: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/30.jpg)
Composition Matrix W
30
Neural Network
Issue: using the same network Wfor different compositions
![Page 31: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/31.jpg)
Syntactically Untied RvNN
31
Idea: the composition function is conditioned on the syntactic categories
Neural Network
Benefit• Composition function are syntax-dependent• Allows different composition functions for word pairs,
e.g. Adv + AdjP, VP + NP
Issue: speed due to many candidates
![Page 32: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/32.jpg)
Compositional Vector GrammarCompute score only for a subset of trees coming from a simpler, faster model (Socher et al, 2013)
◦ Prunes very unlikely candidates for speed
◦ Provides coarse syntactic categories of the children for each beam candidate
32Socher et al., “Parsing with Compositional Vector Grammars,” in ACL, 2013.
Probability context-free grammar (PCFG) helps decrease the search space
![Page 33: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/33.jpg)
Labels for RvNNThe score can be passed through a softmax function to compute the probability of each category
33
x1
x2
x3
x4
y1
y2
y3Neural Network
softmax
NP
Socher et al., “Parsing Natural Scenes and Natural Language with Recursive Neural Networks,” in ICML, 2011.
Softmax loss cross-entropy error for optimization
![Page 34: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/34.jpg)
OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption
Network Architecture and Definition◦ Standard Recursive Neural Network
◦ Weight-Tied◦ Weight-Untied
◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network
Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis
34
![Page 35: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/35.jpg)
Recursive Neural Network
35
Neural Network
Issue: some words act mostly as an operator, e.g. “very” in “very good”
![Page 36: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/36.jpg)
Matrix-Vector Recursive Neural Network
36
Neural Network
Idea: each word can additionally serve as an operator
![Page 37: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/37.jpg)
OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption
Network Architecture and Definition◦ Standard Recursive Neural Network
◦ Weight-Tied◦ Weight-Untied
◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network
Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis
37
![Page 38: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/38.jpg)
Recursive Neural Tensor Network
38
Idea: allow more interactions of vectors
![Page 39: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/39.jpg)
OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption
Network Architecture and Definition◦ Standard Recursive Neural Network
◦ Weight-Tied◦ Weight-Untied
◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network
Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis
39
![Page 40: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/40.jpg)
Language Compositionality
40
![Page 41: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/41.jpg)
Image Compositionality
41
Idea: image can be composed by the visual segments (same as natural language parsing)
![Page 42: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/42.jpg)
OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption
Network Architecture and Definition◦ Standard Recursive Neural Network
◦ Weight-Tied◦ Weight-Untied
◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network
Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis
42
![Page 43: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/43.jpg)
Paraphrase for Learning Sentence Vectors
43
A pair-wise sentence comparison of nodes in parsed trees for learning sentence embeddings
![Page 44: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/44.jpg)
OutlineProperty◦ Syntactic Compositionality◦ Recursion Assumption
Network Architecture and Definition◦ Standard Recursive Neural Network
◦ Weight-Tied◦ Weight-Untied
◦ Matrix-Vector Recursive Neural Network◦ Recursive Neural Tensor Network
Applications◦ Parsing◦ Paraphrase Detection◦ Sentiment Analysis
44
![Page 45: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/45.jpg)
Sentiment Analysis
45
Sentiment analysis for sentences with negation words can benefit from RvNN
![Page 46: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/46.jpg)
Sentiment AnalysisSentiment Treebank with richer annotations
46
Phrase-level sentiment labels indeed improve the performance
Socher et al., “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank,” in EMNLP, 2013.
![Page 47: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/47.jpg)
Sentiment Tree Illustration Stanford live demo: http://nlp.stanford.edu/sentiment/
47Socher et al., “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank,” in EMNLP, 2013.
Phrase-level annotations learn the specific compositional functions for sentiment
![Page 48: Slides credited from Richard Socher - NTUyvchen/f106-adl/doc/171002+171005... · Split derivatives at each node Add error messages from parent + node itself. 1) Sum derivatives of](https://reader031.fdocuments.net/reader031/viewer/2022030405/5a7d88fa7f8b9a66798dafb7/html5/thumbnails/48.jpg)
Concluding RemarksRecursive Neural Network◦ Idea: syntactic compositionality
& language recursion
Network Variants◦ Standard Recursive Neural
Network◦ Weight-Tied
◦ Weight-Untied
◦ Matrix-Vector Recursive Neural Network
◦ Recursive Neural Tensor Network
48