Label Propagation - Semi-supervised Learning with Applications to NLP


Page 1

Label Propagation

David Przybilla, [email protected]

Seminar: Semi-supervised and unsupervised learning with Applications to NLP

Page 2

Outline

● What is Label Propagation

● The Algorithm

● The motivation behind the algorithm

● Parameters of Label Propagation

● Relation Extraction with Label Propagation

Page 3

Label Propagation

● Semi-supervised

● Shows good results when the amount of annotated data is small compared to what supervised methods require

● Similar to kNN

Page 4

K-Nearest Neighbors (kNN)

● Shares similar ideas with Label Propagation

● Unlike kNN, Label Propagation (LP) also uses the unlabeled instances while inferring the labels

Page 5

Idea of the Problem

We want to find a labeling function f such that:

● L = set of labeled instances, U = set of unlabeled instances

● f respects the given labels on L, and unlabeled instances that are near each other receive similar labels

Page 6

The Model

● A complete graph

● Each node is an instance

● Each arc has a weight $T_{xy}$

● $T_{xy}$ is high if nodes x and y are similar

Page 7

The Model

● Inside each node: soft labels, i.e. a probability distribution over the possible labels

Page 8

Variables - Model

● T is a matrix holding all the weights of the graph

● $N_1 ... N_l$ = labeled data, $N_{l+1} ... N_n$ = unlabeled data

● T can be partitioned into blocks according to labeled/unlabeled rows and columns:

$T = \begin{pmatrix} T_{ll} & T_{lu} \\ T_{ul} & T_{uu} \end{pmatrix}$

Page 9

Variables - Model

● Y is a matrix holding the soft label probabilities of each instance

● $N_1, N_2 ... N_n$ are the instances to label, $R_1, R_2 ... R_k$ are the possible labels

● $Y_{N_a, R_b}$ is the probability of instance $N_a$ being labeled as $R_b$

● Y splits into $Y_L$ (rows of the labeled instances) and $Y_U$ (rows of the unlabeled instances)

● The problem to solve: estimate $Y_U$

Page 10

Algorithm

● Y will change in each iteration

Page 11

How to Measure T?

● Based on the Euclidean distance $d_{xy}$ between instances (a sketch follows below):

$T_{xy} = \exp\left(-\dfrac{d_{xy}^2}{\sigma^2}\right)$

● $\sigma$ is an important parameter (ignore it at the moment, we will talk about it later)
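
To make this concrete, here is a minimal NumPy sketch (not part of the original slides) that builds the weight matrix from a feature matrix X; the function name and interface are illustrative assumptions.

```python
import numpy as np

def build_T(X, sigma=1.0):
    """Weights of the complete graph: T[x, y] = exp(-d_xy^2 / sigma^2),
    where d_xy is the Euclidean distance between instances x and y.
    X is assumed to be an (n, d) array of feature vectors."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean distances
    return np.exp(-sq_dists / sigma ** 2)
```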

Page 12

How to Initialize Y?

● How do we correctly set the values of $Y^0$?

● Fill in the known values (of the labeled data)

● How to fill in the values of the unlabeled data? → The initialization of these values can be arbitrary

● Transform T into T' by row normalization: $T'_{ij} = T_{ij} / \sum_k T_{ik}$ (sketch below)
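
A small sketch of these two steps (illustrative names, assuming the labeled instances come first and labels are integers 0..k-1); the uniform filling of the unlabeled rows is just one arbitrary choice.

```python
import numpy as np

def init_Y(labels_l, n, k):
    """Y0: one-hot rows for the l labeled instances, arbitrary (here uniform)
    rows for the remaining n - l unlabeled instances."""
    l = len(labels_l)
    Y0 = np.full((n, k), 1.0 / k)           # arbitrary values for the unlabeled rows
    Y0[:l] = 0.0
    Y0[np.arange(l), labels_l] = 1.0        # known labels of the labeled data
    return Y0

def row_normalize(T):
    """T -> T': divide each row by its sum so every row is a probability distribution."""
    return T / T.sum(axis=1, keepdims=True)
```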

Page 13

Propagation Step

● Update Y during each iteration: $Y \leftarrow T' Y$

● $Y^0 \rightarrow Y^1 \rightarrow ... \rightarrow Y^k$

● During the process Y will change (a sketch of the loop follows below)
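
A minimal sketch of the propagation loop under the same assumptions as above (labeled instances in the first l rows); the clamping of the labeled rows anticipates the convergence discussion on the next slides.

```python
import numpy as np

def propagate(T_norm, Y0, l, n_iter=1000, tol=1e-6):
    """Iterate Y <- T' Y, clamping the first l rows (the labeled data) back to
    their known values after every step; stop when Y no longer changes."""
    Y = Y0.copy()
    Y_l = Y0[:l].copy()                      # known labels, kept fixed
    for _ in range(n_iter):
        Y_new = T_norm @ Y                   # propagation step
        Y_new[:l] = Y_l                      # clamp the labeled instances
        if np.abs(Y_new - Y).max() < tol:    # converged
            return Y_new
        Y = Y_new
    return Y
```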

Page 14

Convergence

● Split the row-normalized matrix $\bar{T}$ and Y into labeled and unlabeled blocks:

$\bar{T} = \begin{pmatrix} \bar{T}_{ll} & \bar{T}_{lu} \\ \bar{T}_{ul} & \bar{T}_{uu} \end{pmatrix}, \quad Y = \begin{pmatrix} Y_L \\ Y_U \end{pmatrix}$

● During the iterations $Y_L$ is clamped (reset to the known labels); only $Y_U$ changes

● Assuming we iterate infinitely many times:

$Y_U^{1} = \bar{T}_{uu} Y_U^{0} + \bar{T}_{ul} Y_L$

$Y_U^{2} = \bar{T}_{uu}(\bar{T}_{uu} Y_U^{0} + \bar{T}_{ul} Y_L) + \bar{T}_{ul} Y_L$

...

Page 15

Convergence

● Since $\bar{T}$ is row-normalized and $\bar{T}_{uu}$ is a submatrix of $\bar{T}$, the row sums of $\bar{T}_{uu}$ are less than 1

● Doing the update n times leads to:

$Y_U^{n} = \bar{T}_{uu}^{\,n}\, Y_U^{0} + \left(\sum_{i=1}^{n} \bar{T}_{uu}^{\,i-1}\right) \bar{T}_{ul}\, Y_L$

● The first term $\bar{T}_{uu}^{\,n}\, Y_U^{0}$ converges to zero, so the initial values of $Y_U$ do not matter

Page 16

After convergence

● At the fixed point: $Y_U = \bar{T}_{uu}\, Y_U + \bar{T}_{ul}\, Y_L$

● So after convergence one can find $Y_U$ directly by solving:

$Y_U = (I - \bar{T}_{uu})^{-1}\, \bar{T}_{ul}\, Y_L$

(a sketch of this direct solve follows below)
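
The fixed point can therefore be computed directly instead of iterating; a sketch, assuming the labeled instances occupy the first l rows and columns of the row-normalized matrix.

```python
import numpy as np

def solve_Y_u(T_norm, Y_l, l):
    """Closed-form solution Y_U = (I - T_uu)^{-1} T_ul Y_L, written as a linear
    solve rather than an explicit matrix inversion."""
    T_uu = T_norm[l:, l:]
    T_ul = T_norm[l:, :l]
    n_u = T_uu.shape[0]
    return np.linalg.solve(np.eye(n_u) - T_uu, T_ul @ Y_l)
```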

Page 17

Optimization Problem

● f should minimize the energy function:

$E(f) = \frac{1}{2}\sum_{i,j} w_{ij}\,(f(i) - f(j))^2$

● $f(i)$ and $f(j)$ should be similar for a high $w_{ij}$ in order to minimize the energy

● $w_{ij}$: similarity between i and j

Page 18

The Graph Laplacian

● Let D be a diagonal matrix where $D_{ii} = \sum_j \bar{T}_{ij}$

● Rows of $\bar{T}$ are normalized, so $D = I$

● The graph Laplacian is defined as: $\Delta = D - \bar{T} = I - \bar{T}$

● For a function $f: V \rightarrow \mathbb{R}$ on the nodes, the Laplacian can act on it, and the energy function can be rewritten in terms of $\Delta$:

$E(f) = \frac{1}{2}\sum_{i,j} \bar{T}_{ij}\,(f(i) - f(j))^2 = f^{T} \Delta\, f$
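
A quick numerical check of this identity (not from the slides): for a symmetric weight matrix the quadratic form $f^T \Delta f$ equals the pairwise energy; the random matrix below is only an illustration.

```python
import numpy as np

# Check that f' Delta f = 1/2 * sum_ij W_ij (f_i - f_j)^2 for symmetric weights W,
# with D = diag(row sums) and Delta = D - W (the graph Laplacian).
rng = np.random.default_rng(0)
W = rng.random((6, 6))
W = (W + W.T) / 2                      # symmetric similarity weights
np.fill_diagonal(W, 0.0)               # no self-loops
Delta = np.diag(W.sum(axis=1)) - W     # graph Laplacian
f = rng.random(6)

quadratic = f @ Delta @ f
pairwise = 0.5 * ((f[:, None] - f[None, :]) ** 2 * W).sum()
assert np.isclose(quadratic, pairwise)
```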

Page 19

Back to the Optimization Problem

● The energy can be rewritten using the Laplacian; f should minimize this energy function

● $\Delta$ is partitioned into blocks like $\bar{T}$:

$\Delta_{uu} = (D_{uu} - \bar{T}_{uu}) = (I - \bar{T}_{uu})$

$\Delta_{ul} = (D_{ul} - \bar{T}_{ul}) = -\bar{T}_{ul}$

Page 20

Optimization Problem

● $\Delta$ can be rewritten in terms of $\bar{T}$:

$\Delta_{uu} = (I - \bar{T}_{uu}), \quad \Delta_{ul} = -\bar{T}_{ul}$

● Minimizing the energy with $f_l$ fixed gives:

$f_u = (I - \bar{T}_{uu})^{-1}\, \bar{T}_{ul}\, f_l$

● This is exactly the solution the algorithm converges to, so LP converges to the minimizer of the energy function

Page 21

Sigma Parameter

● Remember the σ parameter?

● It strongly influences the behavior of LP

● There can be:
  ● just one σ for the whole feature vector
  ● one σ per dimension

Page 22

Sigma Parameter

● What happens when σ tends to:
  – 0: the label of an unknown instance is given by just the nearest labeled instance
  – infinity: all the unlabeled instances receive the same influence from all labeled instances; the soft probabilities of each unlabeled instance are then given by the class frequencies in the labeled data

● There are heuristics for finding an appropriate value of σ

Page 23

Sigma Parameter - MST

(Figure: a minimum spanning tree over the data, with one cluster per label, Label1 and Label2)

● Build a minimum spanning tree over all instances

● Find the minimum-weight arc that connects two components with different labels

$\sigma = \dfrac{\min weight(arc)}{3}$

● (dividing by 3 means the weight of this boundary-crossing arc is already close to zero; a sketch of this heuristic follows below)
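
A sketch of this heuristic (assumed interface: y holds class ids for labeled points and -1 for unlabeled ones); it grows the MST Kruskal-style and stops at the first edge that would join two components containing different labels.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_sigma(X, y):
    """sigma = d_min / 3, where d_min is the weight of the first MST edge
    (in increasing order) joining two components with different labels."""
    n = len(X)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))   # Euclidean distances
    mst = minimum_spanning_tree(d).tocoo()                         # sparse MST
    edges = sorted(zip(mst.data, mst.row, mst.col))                # increasing weight

    parent = list(range(n))
    def find(i):                                                   # union-find root
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    labels = [{y[i]} if y[i] != -1 else set() for i in range(n)]   # labels seen per component
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if labels[ri] and labels[rj] and labels[ri] != labels[rj]:
            return w / 3.0          # first arc bridging two differently labeled components
        parent[ri] = rj
        labels[rj] |= labels[ri]
    return None                     # labeled data contains only one class
```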

Page 24

Sigma Parameter – Learning it

How to learn sigma?
● Assumption: a good sigma will classify with confidence and thus minimize H, the entropy of the predicted soft labels

How to do it?
● Smooth the transition matrix T
● Find the derivative of H (the entropy) w.r.t. sigma and minimize

When to do it?
● When using one sigma per dimension, this can be used to determine irrelevant dimensions

Page 25

Labeling Approach

● Once $Y_U$ is computed, how do we assign labels to the instances?

● Take the most likely class

● Class mass normalization

● Label bidding

Page 26

Labeling Approach

● Take the most likely class

● Simply look at each row of $Y_U$ and choose for each instance the label with the highest probability

● Problem: no control over the proportion of classes

Page 27

Labeling Approach

● Class mass normalization (sketch below)
  ● Given some class proportions $P_1, P_2 ... P_k$
  ● Scale each column c of $Y_U$ so that its mass matches the proportion $P_c$
  ● Then simply look at each row and choose for each instance the label with the highest probability
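
A minimal sketch of class mass normalization (the function name and interface are assumptions): each column of $Y_U$ is rescaled so that its total mass matches the desired class proportion, and labels are then read off row-wise.

```python
import numpy as np

def class_mass_normalization(Y_u, proportions):
    """Scale column c of Y_u so its total mass becomes P_c, then pick the
    highest-probability label per row. proportions is the vector P_1 ... P_k."""
    proportions = np.asarray(proportions, dtype=float)
    scaled = Y_u * (proportions / Y_u.sum(axis=0))   # per-column rescaling
    return scaled.argmax(axis=1)
```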

Page 28

Labeling Approach

● Label bidding (see the sketch after this list)
  ● Given some class proportions $P_1, P_2 ... P_k$
  1. Estimate the number of items $C_k$ per label
  2. Choose the label with the greatest number of items, take the $C_k$ items whose probability of belonging to that label is highest, and assign them that label
  3. Iterate through all the possible labels
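
A sketch of label bidding under the same assumed interface; counts are estimated by rounding, and any instances left over from rounding simply fall back to their most likely class.

```python
import numpy as np

def label_bidding(Y_u, proportions):
    """Labels 'bid' for instances: process labels by estimated count (largest
    first) and give each its C_k most confident, still unassigned instances."""
    n_u, k = Y_u.shape
    counts = np.round(np.asarray(proportions, dtype=float) * n_u).astype(int)
    assigned = np.full(n_u, -1)
    for c in np.argsort(-counts):                    # label with most items first
        free = np.where(assigned == -1)[0]           # still unassigned instances
        take = free[np.argsort(-Y_u[free, c])][:counts[c]]
        assigned[take] = c
    leftover = assigned == -1                        # rounding may leave a few unassigned
    assigned[leftover] = Y_u[leftover].argmax(axis=1)
    return assigned
```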

Page 29

Experiment Setup

● Artificial data
  ● Comparison of LP vs kNN (k=1)

● Character recognition
  ● Recognize handwritten digits
  ● Images of 16x16 pixels, gray scale
  ● Recognizing the digits 1, 2, 3
  ● Each image is a 256-dimensional vector

Page 30

Results using LP on artificial data

Page 31

Results using LP on artificial data

● LP finds the structure in the data while kNN fails

Page 32

P1NN

● P1NN is a baseline for comparisons

● Simplified version of LP (see the sketch below):
  1. During each iteration, find the unlabeled instance nearest to a labeled instance and give it that label
  2. Iterate until all instances are labeled
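
A small sketch of the P1NN baseline with the same assumed label convention (-1 marks unlabeled instances); it is quadratic in the number of instances, which is fine for an illustration.

```python
import numpy as np

def p1nn(X, y):
    """Repeatedly pick the unlabeled instance closest to any labeled instance
    and copy that labeled instance's class, until everything is labeled."""
    y = np.asarray(y).copy()
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))   # pairwise distances
    while (y == -1).any():
        lab = np.where(y != -1)[0]
        unl = np.where(y == -1)[0]
        sub = d[np.ix_(unl, lab)]                    # distances unlabeled -> labeled
        i, j = np.unravel_index(sub.argmin(), sub.shape)
        y[unl[i]] = y[lab[j]]                        # copy the nearest neighbor's label
    return y
```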

Page 33

Results using LP on the handwritten data set

● P1NN (baseline), 1NN (kNN)

● Cne: class mass normalization, proportions estimated from the labeled data

● Lbo: label bidding with oracle class proportions

● ML: most likely labels

Page 34

Relation Extraction?

● From natural language text, detect semantic relations among entities

Example: "B. Gates married Melinda French on January 1, 1994"
→ spouse(B. Gates, Melinda French)

Page 35

Why LP for RE?

Problems of the alternatives:

● Supervised: needs a lot of annotated data

● Unsupervised: retrieves clusters of relations, but with no label

Page 36

RE - Problem Definition

● Find an appropriate label for an occurrence of two entities in a context

Example: "... B. Gates married Melinda French on January 1, 1994 ..."
Entity 1 (e1) = B. Gates, Entity 2 (e2) = Melinda French, with contexts Cpre (before e1), Cmid (between e1 and e2), Cpost (after e2)

● Idea: if two occurrences of entity pairs have similar contexts, then they have the same relation type

Page 37

RE Problem Definition - Features

● Words: in the contexts
● Entity types: Person, Location, Org, ...
● POS tags: of the words in the contexts
● Chunking tags: mark which words in the contexts are inside chunks
● Grammatical function of the words in the contexts, e.g. NP-SBJ (subject)
● Position of words:
  ● first word of e1, second word of e1, ...
  ● is there any word in Cmid
  ● first word in Cpre, Cmid, Cpost, ...
  ● second word in Cpre, ...

Page 38

RE Problem Definition - Labels

Page 39

Experiment

● ACE 2003 data, a corpus of newspaper texts

● Assume all entities have already been identified

● Comparison between:
  – different amounts of labeled samples: 1%, 10%, 25%, 50%, 75%, 100%
  – different similarity functions
  – LP, SVM and Bootstrapping

● LP settings:
  ● similarity function: Cosine, Jensen-Shannon
  ● labeling approach: take the most likely class
  ● sigma: average similarity between labeled classes

Page 40

Jensen-Shannon Similarity Measure

● Measures the distance between two probability distributions

● JS is a smoothed, symmetric version of the Kullback-Leibler divergence $D_{KL}$:

$JS(p, q) = \frac{1}{2} D_{KL}(p \,\|\, m) + \frac{1}{2} D_{KL}(q \,\|\, m), \quad m = \frac{1}{2}(p + q)$

● $D_{KL}$ itself is not symmetric and does not always have a finite value (a sketch follows below)
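
A small sketch of the Jensen-Shannon divergence between two discrete distributions, using its standard definition (not taken from the slides); unlike $D_{KL}$ it is symmetric and always finite.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence D_KL(p || q) for discrete distributions."""
    mask = p > 0                                     # convention: 0 * log(0/q) = 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def jensen_shannon(p, q):
    """JS(p, q) = 1/2 D_KL(p || m) + 1/2 D_KL(q || m), with m = (p + q) / 2."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```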

Page 41

Results

Page 42

Classifying Relation Subtypes - SVM vs LP

(SVM with a linear kernel)

Page 43

Bootstrapping

(Diagram: a loop) Seeds → train a classifier → add to the seed set the instances whose confidence is high enough → repeat

Page 44

Classifying Relation Types - Bootstrapping vs LP

Starting with 100 random seeds

Page 45

Results

● LP performs well when there is little annotated data, compared to SVM and kNN

● Irrelevant dimensions can be identified by using LP (one σ per dimension)

● Looking at the structure of the unlabeled data helps when there is little annotated data

Page 46

Thank you