Support Vector Machines and Kernel Methods for Robust
Semantic NL Processing
Roberto Basili(1), Alessandro Moschitti(2)
(1) DISP, Università di Roma Tor Vergata, (2) Università di Trento
Overview
• Theory and practice of Support Vector Machines
• Kernels for HLTs
– Tree Kernels
• Semantic Role Labeling
– Linear Features
– The role of Syntax
Predicate and Arguments

[Parse tree of "Paul gives a lecture in Rome": the predicate is gives; Arg. 0 = Paul, Arg. 1 = a lecture, Arg. M = in Rome]
• The syntax-semantic mapping
• Different semantic annotations (e.g. PropBank vs. FrameNet)
Linking syntax to semantics
[Parse tree of "Police arrested the man for shoplifting", annotated with the Arrest frame: Authority = Police, Suspect = the man, Offense = shoplifting]

• Police arrested the man for shoplifting
Semantics in NLP: Resources
• Lexicalized Models
– PropBank
– NomBank
• FrameNet
– Inspired by frame semantics
– Frames are lexicalized prototypes of real-world situations
– Participants are called frame elements (roles)
Generative vs. Discriminative Learning in NLP
• Generative models (e.g. HMMs) require:
– The design of a model of visible and hidden variables
– The definition of laws of association between hidden and visible variables
– Robust estimation methods from the available samples
• Limitations:
– Lack of precise generative models for language phenomena
– Data sparseness: most language phenomena are simply too rare for robust estimation, even in large samples
Generative vs. Discriminative Learning
• Discriminative models are not tied to any model (i.e., to a specific association among the problem variables)
• They learn to discriminate negative from positive evidence without building an explicit model of the target property
• They derive useful evidence from training data only through observed individual features, by optimizing some function of the recognition task (e.g., the error)
• Examples of discriminative models are perceptrons (i.e., linear classifiers)
Linear Classifiers (1)

A hyperplane has the equation:

   f(x) = ⟨w, x⟩ + b,   with w ∈ Rⁿ, x ∈ Rⁿ, b ∈ R

where
• x is the vector of the instance to be classified
• w is the hyperplane gradient (its normal vector)

Classification function:

   h(x) = sign(f(x))
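The classification rule above can be sketched in a few lines of Python (an illustration, not part of the original slides):

```python
def f(x, w, b):
    # f(x) = <w, x> + b : the hyperplane equation evaluated at x
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def h(x, w, b):
    # classification function: h(x) = sign(f(x))
    return 1 if f(x, w, b) >= 0 else -1
```

For instance, with w = (1, 1) and b = -1, the point (1, 1) falls on the positive side and the origin on the negative side.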
Linear Classifiers (2)
• Computationally simple.
• Basic idea: select a hypothesis that makes no mistakes on the training set.
• The separating function is equivalent to a neural net with just one neuron (a perceptron)
A neuron
Perceptron
   h(x) = sgn( Σ_{i=1..n} wᵢ xᵢ + b )
Geometric Margin
Geometric margin vs. training set margin

[Figure: the geometric margin of single points xᵢ, xⱼ vs. the margin of the whole training set (x and o classes)]
Maximal margin vs other margins
Perceptron: on-line algorithm

   w₀ = 0; b₀ = 0; k = 0; R = max_{1 ≤ i ≤ l} ||xᵢ||
   Repeat
      for i = 1 to l
         if yᵢ(⟨wₖ, xᵢ⟩ + bₖ) ≤ 0 then        // classification error
            wₖ₊₁ = wₖ + η yᵢ xᵢ               // adjustments
            bₖ₊₁ = bₖ + η yᵢ R²
            k = k + 1
         endif
      endfor
   until no error is found
   return k, (wₖ, bₖ)
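The on-line algorithm above translates directly into Python; this is a minimal sketch (with the learning rate η fixed to 1 and a hypothetical epoch cap, an assumption added to guarantee termination on non-separable data):

```python
import math

def perceptron(X, y, eta=1.0, max_epochs=100):
    """On-line perceptron; returns (k, w, b), where k counts the adjustments."""
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    k = 0
    R = max(math.sqrt(sum(xi * xi for xi in x)) for x in X)  # R = max_i ||x_i||
    for _ in range(max_epochs):
        errors = 0
        for x, yi in zip(X, y):
            if yi * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:  # mistake
                w = [wi + eta * yi * xi for wi, xi in zip(w, x)]  # w += eta*y*x
                b = b + eta * yi * R * R                          # b += eta*y*R^2
                k += 1
                errors += 1
        if errors == 0:
            break
    return k, w, b
```

On linearly separable data the loop provably stops after a finite number of adjustments (Novikoff's theorem).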
Perceptron: the hyperplane coefficients
The on-line perceptron algorithm: mistakes and adjustments (1)
The on-line perceptron algorithm: mistakes and adjustments (2)
Duality

The decision function of linear classifiers can be written as follows:

   h(x) = sgn(⟨w, x⟩ + b) = sgn( Σ_{j=1..l} αⱼ yⱼ ⟨xⱼ, x⟩ + b )

as well as the adjustment function:

   if yᵢ( Σ_{j=1..l} αⱼ yⱼ ⟨xⱼ, xᵢ⟩ + b ) ≤ 0 then αᵢ = αᵢ + η

The learning rate η impacts only the re-scaling of the hyperplanes and does not influence the algorithm (we can set η = 1).

Training data only appear in the scalar products!!
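The dual form can likewise be sketched: training examples enter only through the kernel function (here a plain dot product, but any kernel can be plugged in):

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def dual_perceptron(X, y, kernel=dot, eta=1.0, max_epochs=100):
    # dual form: training data appear only inside scalar products kernel(x_j, x_i)
    n = len(X)
    alpha = [0.0] * n   # alpha_j grows with the mistakes made on example j
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for i in range(n):
            s = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(n)) + b
            if y[i] * s <= 0:       # mistake on example i
                alpha[i] += eta     # alpha_i <- alpha_i + eta
                b += eta * y[i]
                errors += 1
        if errors == 0:
            break
    return alpha, b

def h(x, X, y, alpha, b, kernel=dot):
    # h(x) = sgn( sum_j alpha_j y_j <x_j, x> + b )
    s = sum(alpha[j] * y[j] * kernel(X[j], x) for j in range(len(X))) + b
    return 1 if s >= 0 else -1
```

Replacing `dot` with another kernel changes the feature space without touching the algorithm, which is the point of the dual formulation.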
Which hyperplane?
Maximum Margin Hyperplanes
[Figure: two classes and a separating hyperplane with its margin]

IDEA: select the hyperplane that maximizes the margin
Support Vectors
[Figure: the maximum-margin hyperplane; the data points lying on the margin are the support vectors]
How to get the maximum margin?

[Figure: the separator w · x + b = 0 and the two supporting hyperplanes w · x + b = k and w · x + b = −k]

The geometric margin is:

   2k / ||w||

Optimization problem:

   max 2k / ||w||, subject to:
      ⟨w, x⟩ + b ≥ k, if x is a positive example
      ⟨w, x⟩ + b ≤ −k, if x is a negative example
The optimization problem

• Fixing k = 1, the optimal hyperplane satisfies:
– Minimize τ(w) = ½ ||w||²
– Subject to: yᵢ(⟨w, xᵢ⟩ + b) ≥ 1, i = 1, …, l
• The dual problem is simpler
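For a candidate (w, b), the geometric margin on a training set can be computed directly (a small helper added for illustration, not from the slides):

```python
import math

def geometric_margin(w, b, X, y):
    # min_i y_i(<w, x_i> + b) / ||w|| : distance of the closest point
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return min(yi * (sum(wi * xi for wi, xi in zip(w, x)) + b)
               for x, yi in zip(X, y)) / norm_w
```

For a canonical hyperplane, where the closest points satisfy yᵢ(⟨w, xᵢ⟩ + b) = 1, this returns 1/||w||, i.e. half of the full margin 2/||w|| between the two supporting hyperplanes; note the value is invariant to rescaling (w, b).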
Dual optimization problem
Some consequences

• Lagrange constraints:

   w = Σ_{i=1..l} αᵢ yᵢ xᵢ,   Σ_{i=1..l} αᵢ yᵢ = 0

• Karush-Kuhn-Tucker constraints:

   αᵢ [ yᵢ(⟨w, xᵢ⟩ + b) − 1 ] = 0,  i = 1, …, l

• The support vectors are the xᵢ having non-null αᵢ, i.e. such that yᵢ(⟨w, xᵢ⟩ + b) = 1. They lie on the frontier (the margin hyperplanes)
• b is derived from any support vector: b = yᵢ − ⟨w, xᵢ⟩
Non-linearly separable training data

[Figure: overlapping classes; the hyperplanes w · x + b = 1, w · x + b = −1 and w · x + b = 0, with margin 2/||w||]

• Slack variables ξᵢ are introduced
• Mistakes are allowed and the optimization function is penalized
Soft Margin SVMs

[Figure: as before, with each margin violation measured by a slack ξᵢ]

Objective function:

   min ½ ||w||² + C Σᵢ ξᵢ

New constraints:

   yᵢ(⟨w, xᵢ⟩ + b) ≥ 1 − ξᵢ, ∀ xᵢ
   ξᵢ ≥ 0

• C is the trade-off between margin and errors
Dual optimization problem
Soft Margin Support Vector Machines

• The algorithm tries to keep the ξᵢ at 0 while maximizing the margin
• NB: the algorithm does not minimize the number of errors (an NP-complete problem) but minimizes the sum of the slacks, i.e. of the distances from the margin hyperplane:

   min ½ ||w||² + C Σᵢ ξᵢ,  subject to  yᵢ(⟨w, xᵢ⟩ + b) ≥ 1 − ξᵢ,  ξᵢ ≥ 0

• If C → ∞, the solution tends to the hard-margin one
• If C = 0, we get w = 0: in fact it is then always possible to satisfy yᵢ · b ≥ 1 − ξᵢ by taking ξᵢ large enough
• As C grows, the number of errors decreases; the errors reach 0 when C → ∞ (i.e., the hard-margin formulation)
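The objective and the slack variables can be made concrete with a short helper (illustrative only):

```python
def soft_margin_objective(w, b, X, y, C):
    # slack of example i: xi_i = max(0, 1 - y_i(<w, x_i> + b))
    xi = [max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, x)) + b))
          for x, yi in zip(X, y)]
    # objective: 1/2 ||w||^2 + C * sum_i xi_i
    return 0.5 * sum(wj * wj for wj in w) + C * sum(xi), xi
```

Points beyond their margin hyperplane get ξᵢ = 0; points inside the margin or misclassified pay a positive ξᵢ, weighted by C.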
Robustness: Soft vs. Hard Margin SVMs

[Figure: the same data with an outlier; the soft-margin SVM absorbs it through a slack ξᵢ, while the hard-margin SVM hyperplane is dragged by it]
Soft vs Hard Margin SVMs
• A soft-margin SVM always has a solution
• A soft-margin SVM is more robust to odd training examples, e.g.:
– insufficient vocabularies
– highly ambiguous linguistic features
• A hard-margin SVM requires no parameter (there is no C to tune)
References
• R. Basili and A. Moschitti, Automatic Text Categorization: From Information Retrieval to Support Vector Learning, Aracne Editrice, ISBN 88-548-0292-1, 2005
• C. J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition — http://www.umiacs.umd.edu/~joseph/support-vector-machines4.pdf
• E. D. Sontag, The Vapnik-Chervonenkis Dimension and the Learning Capability of Neural Nets — http://www.math.rutgers.edu/~sontag/FTP_DIR/vc-expo.pdf
• S. A. Goldman (Washington University, St. Louis, Missouri), Computational Learning Theory — http://www.learningtheory.org/articles/COLTSurveyArticle.ps
• N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines (and Other Kernel-Based Learning Methods), Cambridge University Press
• V. N. Vapnik, The Nature of Statistical Learning Theory, Springer Verlag, 1999
Semantic Role Labeling @ UTV
• An important application of SVMs is Semantic Role Labeling wrt PropBank or FrameNet
• In the UTV system, a cascade of classification steps is applied:
– Predicate detection
– Boundary recognition
– Argument categorization (Local models)
– Reranking (Joint models)
• Input: a sentence and its parse trees
Linking syntax to semantics
[Parse tree of "Police arrested the man for shoplifting", annotated with the Arrest frame: Authority = Police, Suspect = the man, Offense = shoplifting]

• Police arrested the man for shoplifting
Motivations
• Modeling syntax in natural language learning tasks is complex, e.g.:
– semantic role relations within predicate-argument structures, and
– question classification
• Tree kernels are a natural way to exploit syntactic information from sentence parse trees
– useful to engineer novel and complex features
• How do different tree kernels impact different parsing paradigms and different tasks?
• Are they efficient in practical applications?
Tree kernels: Outline
• Tree kernel types
– The Subset Tree (SST) kernel
– The Partial Tree (PT) kernel
The Collins and Duffy Tree Kernel (called SST in [Vishwanathan and Smola, 2002])

[Figure: the parse tree [VP [V gives] [NP [D a] [N talk]]] and some of its subset-tree fragments, e.g. [VP [V gives] [NP [D N]]], [VP [V gives] [NP]], [VP [V] [NP]]]
The overall fragment set
[Figure: the complete fragment set of [VP [V gives] [NP [D a] [N talk]]]; in each fragment every node expands its production either completely or not at all]
Explicit feature space
Each tree is mapped to a vector x whose components indicate which fragments occur in it:

   x = (0, …, 0, 1, 1, 1, 0, …, 0, 1, 1, 1, 0, …, 0)

• x₁ · x₂ counts the number of common substructures

[Figure: the fragments shared by the two trees, e.g. [NP [D a] [N talk]], [NP [D N]], [VP [V] [NP]]]
Implicit Representation
   φ(T₁) · φ(T₂) = K(T₁, T₂) = Σ_{n₁ ∈ T₁} Σ_{n₂ ∈ T₂} Δ(n₁, n₂)

where Δ(n₁, n₂) counts the common fragments rooted at nodes n₁ and n₂.
Implicit Representation
• [Collins and Duffy, ACL 2002] evaluate K(T₁, T₂) in O(n²) through the recursive Δ:

   Δ(n₁, n₂) = 0, if the productions at n₁ and n₂ are different
   Δ(n₁, n₂) = 1, if n₁ and n₂ are pre-terminals (with the same production)
   Δ(n₁, n₂) = Π_{j=1..nc(n₁)} (1 + Δ(ch(n₁, j), ch(n₂, j))), otherwise

   φ(T₁) · φ(T₂) = K(T₁, T₂) = Σ_{n₁ ∈ T₁} Σ_{n₂ ∈ T₂} Δ(n₁, n₂)
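The recursion can be implemented in a few lines; trees are encoded here as (label, children) tuples, an illustrative encoding rather than the slides' own:

```python
# A tree node is (label, children); a leaf word is (word, []).
def production(n):
    return (n[0], tuple(c[0] for c in n[1]))

def nodes(t):
    yield t
    for c in t[1]:
        yield from nodes(c)

def is_preterminal(n):
    return len(n[1]) > 0 and all(c[1] == [] for c in n[1])

def delta(n1, n2, lam=1.0):
    # Delta(n1, n2) = 0 if the productions differ
    if n1[1] == [] or n2[1] == [] or production(n1) != production(n2):
        return 0.0
    # Delta = lambda if n1, n2 are pre-terminals (lam=1.0 gives the unweighted kernel)
    if is_preterminal(n1):
        return lam
    # otherwise: lambda * prod_j (1 + Delta(ch(n1, j), ch(n2, j)))
    p = lam
    for c1, c2 in zip(n1[1], n2[1]):
        p *= 1.0 + delta(c1, c2, lam)
    return p

def sst_kernel(t1, t2, lam=1.0):
    # K(T1, T2) = sum over all node pairs of Delta(n1, n2)
    return sum(delta(n1, n2, lam) for n1 in nodes(t1) for n2 in nodes(t2))
```

For the tree [NP [D a] [N talk]], K(T, T) = 6 with λ = 1: the tree has exactly six subset-tree fragments.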
Weighting
• A decay factor λ penalizes larger fragments:

   Δ(n₁, n₂) = λ, if n₁ and n₂ are pre-terminals
   Δ(n₁, n₂) = λ Π_{j=1..nc(n₁)} (1 + Δ(ch(n₁, j), ch(n₂, j))), otherwise

• Normalization:

   K′(T₁, T₂) = K(T₁, T₂) / √( K(T₁, T₁) · K(T₂, T₂) )
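Normalization keeps K′(T, T) = 1 regardless of tree size; as a generic helper (K here stands for any kernel function):

```python
import math

def normalized_kernel(K, a, b):
    # K'(T1, T2) = K(T1, T2) / sqrt(K(T1, T1) * K(T2, T2))
    return K(a, b) / math.sqrt(K(a, a) * K(b, b))
```

The same formula works for tree kernels and for plain dot products alike, since it only calls K.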
Partial Tree Kernel
• By adding two decay factors (one for the tree depth and one for the length of the child subsequences), we obtain the Partial Tree kernel, which also matches partial productions
SRL Demo
• Kernel-based system for SRL over raw texts
• … based on the Charniak parser
• Adopts the PropBank standard, but has also been applied to FrameNet
Automatic Predicate Argument Extraction
• Boundary Detection
– One binary classifier
• Argument Type Classification
– Multi-classification problem
– n binary classifiers (ONE-vs-ALL)
– Select the argument with maximum score
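The ONE-vs-ALL selection step reduces to an argmax over the per-role classifier scores (the role names and scores below are hypothetical):

```python
def classify_argument(scores):
    # scores: {role: binary-classifier score}; pick the maximum-scoring role
    return max(scores, key=scores.get)
```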
Typical standard flat features (Gildea & Jurafsky, 2002)
• Phrase Type of the argument
• Parse Tree Path, between the predicate and the argument
• Head word
• Predicate Word
• Position
• Voice
Features for the linear kernel in SRL
An example
[Parse tree of "Paul delivers a talk in Rome": the predicate is delivers and Arg. 1 is "a talk". The arrows in the figure mark the feature values for Arg. 1: Phrase Type (NP), Predicate Word (delivers), Head Word (talk), the Parse Tree Path between predicate and argument, Position (Right), Voice (Active)]
Flat features (Linear Kernel)
• Each example is associated with a vector over 6 feature types: Voice (V), Position (P), Predicate Word (PW), Head Word (HW), Parse Tree Path (PTP), Phrase Type (PT)
• The dot product x · z counts the number of features in common:

   x = (0, …, 1, … | …, 1, … | …, 1, … | …, 1, … | …, 1, … | …, 1, …), one one-hot block per feature type
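A sketch of the flat-feature encoding and its linear kernel; the mini-vocabularies and path values below are invented for illustration:

```python
# Hypothetical mini-vocabularies for the 6 feature types (assumption).
FEATURES = {
    "PhraseType": ["NP", "VP", "PP"],
    "PredicateWord": ["give", "deliver", "arrest"],
    "HeadWord": ["talk", "lecture", "man"],
    "Path": ["path-A", "path-B"],
    "Position": ["left", "right"],
    "Voice": ["active", "passive"],
}

def encode(example):
    # one-hot encode each feature type; the concatenation is the vector x
    vec = []
    for ftype, values in FEATURES.items():
        vec.extend(1 if example.get(ftype) == v else 0 for v in values)
    return vec

def dot(x, z):
    # the linear kernel: counts the features two examples share
    return sum(a * b for a, b in zip(x, z))
```

Two examples that agree on, say, Phrase Type and Voice but differ everywhere else have a dot product of 2.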
Automatic Predicate Argument Extraction
Deriving Positive/Negative Examples

Given a sentence and a predicate p:
1. Derive the sentence parse tree
2. For each node pair <Np, Nx>:
   a. Extract a feature representation set F
   b. If Nx exactly covers Arg-i, F is one of its positive examples
   c. Otherwise, F is a negative example
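The steps above can be sketched as follows; the node encoding (label, span, children) and the feature stand-in are assumptions for illustration, not the system's actual representation:

```python
# A parse-tree node: (label, (start, end), children), span in token indices.
def all_nodes(n):
    yield n
    for c in n[2]:
        yield from all_nodes(c)

def derive_examples(tree, predicate, gold_arg_spans):
    # For each candidate node Nx: positive iff its span exactly covers an Arg-i
    examples = []
    for nx in all_nodes(tree):
        F = {"predicate": predicate[0], "node": nx[0], "span": nx[1]}  # stand-in for F
        label = "positive" if nx[1] in gold_arg_spans else "negative"
        examples.append((F, label))
    return examples
```

Note that every node that does not exactly cover a gold argument becomes a negative example, which is why boundary detection is a heavily imbalanced classification problem.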
Argument Classification Accuracy
[Figure: accuracy curves (from 0.75 to 0.88) as a function of the % of training data (0–100) for the ST, SST, Linear and PT kernels]
SRL in Framenet: Results
FrameNet SRL: best results

• Best system [Erk & Pado, 2006]:
– 0.855 Precision, 0.669 Recall
– 0.751 F1
• Trento (+RTV) system (Coppola, PhD thesis, 2009)
Conclusions
• Kernel-based learning is very useful in NLP, thanks to the possibility of embedding similarity measures for highly structured data:
– sequences
– trees
• Tree kernels are a natural way to introduce syntactic information in natural language learning
– Very useful when little knowledge is available about the proposed problem
– They alleviate manual feature engineering (predicate knowledge)
• Different forms of syntactic information require different tree kernels:
– Collins and Duffy's kernel (SST) is useful for constituent parsing
– The new Partial Tree kernel is useful for dependency parsing
Conclusions (2)
• Experiments on SRL and QC show that:
– PT and SST are efficient and very fast
– Accuracy is higher when the proper kernel is used for the target task
• Open research issues are:
– Proper kernel design for the different tasks
– Combination of syntagmatic kernels with semantic ones
• An example is the WordNet-based kernel in (Basili et al., CoNLL 2005)
Tree-kernel: References
• Available over the Web:
– A. Moschitti, A Study on Convolution Kernels for Shallow Semantic Parsing. In Proceedings of the 42nd Conference of the Association for Computational Linguistics (ACL 2004), Barcelona, Spain, 2004.
– A. Moschitti, Efficient Convolution Kernels for Dependency and Constituent Syntactic Trees. In Proceedings of the 17th European Conference on Machine Learning, Berlin, Germany, 2006.
– M. Collins and N. Duffy, New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron. In Proceedings of ACL 2002.
– S.V.N. Vishwanathan and A.J. Smola, Fast Kernels on Strings and Trees. In Proceedings of Neural Information Processing Systems, 2002.