Recursive Neural Network
(Slide credit: Richard Socher)
Sequence Modeling
Idea: aggregate the meaning from all words into a vector
Compositionality
Method:
◦ Basic combination: average, sum
◦ Neural combination: recursive neural network (RvNN), recurrent neural network (RNN), convolutional neural network (CNN)
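The basic combinations are just element-wise aggregations of the word vectors. A minimal sketch (the 2-dim toy vectors are illustrative):

```python
import numpy as np

# Toy word vectors for a 4-word sentence (illustrative values)
words = np.array([[0.4, 0.3],
                  [2.1, 3.3],
                  [0.1, 0.9],
                  [1.4, 0.5]])

sum_repr = words.sum(axis=0)    # sum combination -> one vector for the sentence
avg_repr = words.mean(axis=0)   # average combination
```

Both produce a single fixed-size vector regardless of sentence length, but they ignore word order and syntax, which motivates the neural combinations below.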
How to compute an N-dim vector for a whole phrase, e.g. 這 (this) 規格 (specification) 有 (have) 誠意 (sincerity)?

Recursive Neural Network: From Words to Phrases
Recursive Neural Network
Idea: leverage linguistic knowledge (syntax) to combine multiple words into phrases
Assumption: language is described recursively
Related Work for RvNN
◦ Pollack (1990): recursive auto-associative memories
◦ Goller & Küchler (1996), Costa et al. (2003): earlier recursive neural network work that assumed a fixed tree structure and used one-hot vectors
◦ Hinton (1990), Bottou (2011): related ideas about recursive models and recursive operators as smooth versions of logic operations
Outline
Property
◦ Syntactic Compositionality
◦ Recursion Assumption
Network Architecture and Definition
◦ Standard Recursive Neural Network (weight-tied / weight-untied)
◦ Matrix-Vector Recursive Neural Network
◦ Recursive Neural Tensor Network
Applications
◦ Parsing
◦ Paraphrase Detection
◦ Sentiment Analysis
Phrase Mapping
Principle of “Compositionality”: the meaning (vector) of a sentence is determined by
1) the meanings of its words, and
2) the rules that combine them
Idea: jointly learn parse trees and compositional vector representations
[Figure: phrase vectors computed over parse trees for “the country of my birth” and “the place where I was born”, showing that the two phrases map to nearby vectors]
Sentence Syntactic Parsing
Parsing is the process of analyzing a string of symbols. The parse tree conveys:
1) the part-of-speech of each word
2) phrases
3) relationships

Example: “The cat sat on the mat.”
POS tags: DT NN VB IN DT NN (NN = noun, VB = verb, DT = determiner, IN = preposition)
[Parse tree: an NP and a PP combine into a VP, which joins the subject NP under S]
Phrases in the example:
• Noun phrase (NP): “the cat”, “the mat”
• Prepositional phrase (PP): “on the mat”
• Verb phrase (VP): “sat on the mat”
• Sentence (S): “the cat sat on the mat”
Relationships in the example:
• “the cat” is the subject of “sat”
• “on the mat” is the place modifier of “sat”
Learning Structure & Representation
Vector representations incorporate the meaning of words and their compositional structures.
Recursion Assumption
Are languages recursive? This is debatable.
Recursion helps describe natural language:
◦ Ex. “the church which has nice windows” is a noun phrase containing a relative clause that itself contains a noun phrase
◦ e.g. the rule NP → NP PP applies recursively
Recursion Assumption
Characteristics of recursion:
1. Helpful in disambiguation
2. Helpful for tasks that refer back to specific phrases:
◦ John and Jane went to a big festival. They enjoyed the trip and the music there.
◦ “they” = John and Jane; “the trip” = went to a big festival; “there” = big festival
3. Some tasks work better with grammatical tree structure
Whether language is truly recursive is still up for debate.
Recursive Neural Network Architecture
The network predicts vectors along with the structure:
◦ Input: the vector representations of two candidate children
◦ Output:
1) a vector representation for the merged node
2) a score of how plausible the new node would be
[Figure: a neural network merges the preposition “on” and the NP “the mat” into a PP node with a score]
Recursive Neural Network Definition
Given two children vectors, the network computes
1) a vector representation for the merged node
2) a score of how plausible the new node would be
The same W parameters are used at all nodes of the tree (weight-tied).
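In the standard weight-tied formulation, the parent vector is p = tanh(W[c1; c2] + b) and the plausibility score is s = uᵀp. A minimal NumPy sketch (the dimension d and the random parameters are illustrative, not a trained model):

```python
import numpy as np

d = 4                                       # word/phrase vector dimension (illustrative)
rng = np.random.default_rng(0)
W = rng.standard_normal((d, 2 * d)) * 0.1   # composition matrix, shared at all nodes
b = np.zeros(d)                             # bias
u = rng.standard_normal(d) * 0.1            # scoring vector

def compose(c1, c2):
    """Merge two children vectors into a parent vector and a plausibility score."""
    p = np.tanh(W @ np.concatenate([c1, c2]) + b)   # parent vector, same dimension d
    score = float(u @ p)                            # scalar plausibility score
    return p, score

p, s = compose(rng.standard_normal(d), rng.standard_normal(d))
```

Because the parent has the same dimension as each child, the same `compose` function can be applied recursively all the way up the tree.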
Sentence Parsing via RvNN
[Figure sequence over five slides: at each step the network scores every adjacent pair of nodes, merges the highest-scoring pair into a new node, and repeats until a single node remains. The accumulated score is the sentence parsing score, and the root vector is the sentence embedding.]
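The step-by-step slides illustrate a greedy parsing loop: repeatedly score all adjacent pairs, merge the best one, and stop when a single node remains. A self-contained sketch (the tanh composition and random parameters are illustrative, not the trained model):

```python
import numpy as np

d = 4
rng = np.random.default_rng(1)
W = rng.standard_normal((d, 2 * d)) * 0.1   # shared composition matrix
u = rng.standard_normal(d) * 0.1            # scoring vector

def compose(c1, c2):
    p = np.tanh(W @ np.concatenate([c1, c2]))
    return p, float(u @ p)

def greedy_parse(word_vectors):
    """Greedily merge the highest-scoring adjacent pair until one node remains."""
    nodes = list(word_vectors)
    total_score = 0.0
    while len(nodes) > 1:
        candidates = [compose(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
        best = max(range(len(candidates)), key=lambda i: candidates[i][1])
        parent, score = candidates[best]
        total_score += score
        nodes[best:best + 2] = [parent]   # replace the merged pair with its parent
    return nodes[0], total_score          # sentence embedding, sentence parsing score

emb, score = greedy_parse([rng.standard_normal(d) for _ in range(6)])
```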
Backpropagation through Structure
Principally the same as general backpropagation (Goller & Küchler, 1996).
[Figure: forward pass and backward pass through the tree-structured network]
Three differences:
1. Sum derivatives of W from all nodes
2. Split derivatives at each node
3. Add error messages from the parent and the node itself
1) Sum derivatives of W from all nodes
Because the same W is shared across the tree, its total gradient is the sum of the gradients computed at every node.
2) Split derivatives at each node
During forward propagation, the parent node is computed from its two children; during backward propagation, the error must therefore be split and propagated to each child separately.
3) Add error messages
For each node, the error message is composed of:
◦ the error propagated from its parent
◦ the error from the current node itself
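The three points above can be sketched by hand for a tiny two-merge tree: gradients of the shared W are summed across nodes, and the root's error is split so that only the slice belonging to the inner child is propagated down. A minimal manual-backprop sketch (illustrative parameters, squared-error loss at the root), checked against a numerical gradient:

```python
import numpy as np

d = 3
rng = np.random.default_rng(2)
W = rng.standard_normal((d, 2 * d)) * 0.1
c1, c2, c3, t = (rng.standard_normal(d) for _ in range(4))

def forward(W):
    p1 = np.tanh(W @ np.concatenate([c1, c2]))   # first merge (inner node)
    p2 = np.tanh(W @ np.concatenate([p1, c3]))   # second merge (root)
    return p1, p2, 0.5 * np.sum((p2 - t) ** 2)   # squared-error loss at the root

p1, p2, loss = forward(W)

# Backward pass ("backpropagation through structure"):
delta2 = (p2 - t) * (1 - p2 ** 2)                    # error at the root node
gradW = np.outer(delta2, np.concatenate([p1, c3]))   # W gradient from the root...
err_p1 = (W.T @ delta2)[:d]                          # ...split: only the slice for child p1
delta1 = err_p1 * (1 - p1 ** 2)                      # error at the inner node
gradW += np.outer(delta1, np.concatenate([c1, c2]))  # ...summed with the inner node's gradient

# Sanity check one entry against a finite-difference gradient
eps = 1e-6
Wp = W.copy(); Wp[0, 0] += eps
num = (forward(Wp)[2] - loss) / eps
```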
Composition Matrix W
Issue: the same network W is used for all different compositions.
Syntactically Untied RvNN
Idea: the composition function is conditioned on the syntactic categories of the children.
Benefits:
• The composition function is syntax-dependent
• Allows different composition functions for different category pairs, e.g. Adv + AdjP, VP + NP
Issue: slow, because many candidate compositions must be scored.
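Syntactic untying can be sketched as a lookup from the children's category pair to a dedicated composition matrix, falling back to a shared one (the category pairs and dimensions are illustrative):

```python
import numpy as np

d = 4
rng = np.random.default_rng(3)

# One composition matrix per (left category, right category) pair, plus a shared fallback
pairs = [("Adv", "AdjP"), ("VP", "NP"), ("DT", "NN")]
W_by_pair = {p: rng.standard_normal((d, 2 * d)) * 0.1 for p in pairs}
W_shared = rng.standard_normal((d, 2 * d)) * 0.1

def compose_untied(c1, cat1, c2, cat2):
    """Pick the composition matrix based on the children's syntactic categories."""
    W = W_by_pair.get((cat1, cat2), W_shared)
    return np.tanh(W @ np.concatenate([c1, c2]))

p = compose_untied(rng.standard_normal(d), "Adv", rng.standard_normal(d), "AdjP")
```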
Compositional Vector Grammar
Compute scores only for a subset of trees coming from a simpler, faster model (Socher et al., 2013):
◦ prunes very unlikely candidates for speed
◦ provides coarse syntactic categories of the children for each beam candidate
A probabilistic context-free grammar (PCFG) helps decrease the search space.
Socher et al., “Parsing with Compositional Vector Grammars,” in ACL, 2013.
Labels for RvNN
Each node's category scores can be passed through a softmax function to compute the probability of each category (e.g. NP).
[Figure: words x1..x4 are merged into nodes y1..y3; each node feeds a softmax classifier]
The softmax outputs are optimized with the cross-entropy error.
Socher et al., “Parsing Natural Scenes and Natural Language with Recursive Neural Networks,” in ICML, 2011.
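A node classifier in this style is just a linear layer plus softmax applied to the node vector, trained with cross-entropy (the label count and parameters are illustrative):

```python
import numpy as np

d, n_labels = 4, 3                              # node dimension, label count (illustrative)
rng = np.random.default_rng(4)
Ws = rng.standard_normal((n_labels, d)) * 0.1   # per-node label classifier

def label_probs(p):
    """Softmax over category scores for a node vector p."""
    z = Ws @ p
    e = np.exp(z - z.max())                     # numerically stable softmax
    return e / e.sum()

def cross_entropy(p, gold):
    """Cross-entropy error of the softmax output against the gold label index."""
    return -np.log(label_probs(p)[gold])

probs = label_probs(rng.standard_normal(d))
```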
Recursive Neural Network
Issue: some words act mostly as operators, e.g. “very” in “very good”.
Matrix-Vector Recursive Neural Network
Idea: each word can additionally serve as an operator, so every word carries both a vector (its meaning) and a matrix (how it modifies neighboring words).
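In the MV-RNN, each child's matrix is applied to the other child's vector before the usual composition, and a second matrix combines the children's matrices into the parent's matrix. A sketch of this composition (parameters illustrative):

```python
import numpy as np

d = 3
rng = np.random.default_rng(5)
W  = rng.standard_normal((d, 2 * d)) * 0.1   # vector composition
WM = rng.standard_normal((d, 2 * d)) * 0.1   # matrix composition

def mv_compose(a, A, b, B):
    """Matrix-Vector composition: each child's matrix modifies the other child's vector."""
    p = np.tanh(W @ np.concatenate([B @ a, A @ b]))  # parent vector
    P = WM @ np.vstack([A, B])                       # parent matrix (d x d)
    return p, P

a, b = rng.standard_normal(d), rng.standard_normal(d)
A, B = np.eye(d), np.eye(d)   # identity matrices: "plain" words that don't modify neighbors
p, P = mv_compose(a, A, b, B)
```

An operator word like “very” would learn a non-identity matrix that scales or shifts its neighbor's meaning.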
Recursive Neural Tensor Network
Idea: allow more (multiplicative) interactions between the children's vectors.
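In the RNTN, the children's concatenation c = [c1; c2] interacts with itself through a tensor V in addition to the standard linear term: p = tanh(cᵀV⁽ᵏ⁾c + Wc), one tensor slice per output dimension. A sketch (parameters illustrative):

```python
import numpy as np

d = 3
rng = np.random.default_rng(6)
W = rng.standard_normal((d, 2 * d)) * 0.1
V = rng.standard_normal((d, 2 * d, 2 * d)) * 0.1   # one 2d x 2d slice per output dim

def rntn_compose(c1, c2):
    c = np.concatenate([c1, c2])
    bilinear = np.einsum('i,kij,j->k', c, V, c)    # c^T V[k] c for each slice k
    return np.tanh(bilinear + W @ c)               # tensor term + standard linear term

p = rntn_compose(rng.standard_normal(d), rng.standard_normal(d))
```

Setting V to zero recovers the standard RvNN composition, so the tensor strictly adds capacity for pairwise feature interactions.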
Language Compositionality

Image Compositionality
Idea: an image can be composed from its visual segments, in the same way a sentence is parsed from its words.
Paraphrase for Learning Sentence Vectors
Pair-wise comparison between the nodes of two sentences' parse trees is used to detect paraphrases and to learn sentence embeddings.
Sentiment Analysis
Sentiment analysis of sentences containing negation words can benefit from RvNN.
Sentiment Analysis
The Sentiment Treebank provides richer, phrase-level annotations; phrase-level sentiment labels indeed improve performance.
Socher et al., “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank,” in EMNLP, 2013.
Sentiment Tree Illustration
Stanford live demo: http://nlp.stanford.edu/sentiment/
Phrase-level annotations let the model learn sentiment-specific compositional functions.
Concluding Remarks
Recursive Neural Network
◦ Idea: syntactic compositionality & language recursion
Network Variants
◦ Standard Recursive Neural Network (weight-tied / weight-untied)
◦ Matrix-Vector Recursive Neural Network
◦ Recursive Neural Tensor Network