Transductive Inference for Text Classification using Support Vector Machines

Thorsten Joachims
Universität Dortmund, LS VIII, 44221 Dortmund, Germany
joachims@ls8.cs.uni-dortmund.de

Presented by Yueng-Tien Lo


Outline

• Introduction
• Text Classification
• Transductive Support Vector Machines
• What Makes TSVMs Especially well Suited for Text Classification
• Conclusions and Outlook


Introduction

• In recent years, text classification has become one of the key techniques for organizing online information, e.g. filtering spam from people's email or learning users' news-reading preferences.

• The work presented here tackles the problem of learning from small training samples by taking a transductive instead of an inductive approach.


Introduction

• In the inductive setting, the learner tries to induce a decision function which has a low error rate on the whole distribution of examples for the particular learning task.

• In many situations we do not care about the particular decision function, but rather that we classify a given set of examples (i.e. a test set) with as few errors as possible.


Introduction

• Relevance feedback:
  – The user is interested in a good classification of the test set into those documents relevant or irrelevant to the query.

• Netnews filtering:
  – Given the few training examples the user labeled on previous days, he or she wants today's most interesting articles.

• Reorganizing a document collection


Text Classification

• The goal of text classification is the automatic assignment of documents to a fixed number of semantic categories.

• Each category is treated as a separate binary classification problem. Each such problem answers the question of whether or not a document should be assigned to a particular category.


• Documents, which typically are strings of characters, have to be transformed into a representation suitable for the learning algorithm and the classification task.
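One common way to carry out this transformation is a bag-of-words vector with one feature per word. The sketch below assumes TF-IDF weighting and length normalization, which the slide itself does not specify, and it skips stemming. The function names `tokenize` and `tfidf_vectors` are illustrative only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Map each document to a sparse {word: weight} dictionary.

    Assumption: weight = term frequency * log(N / document frequency),
    followed by scaling each vector to unit Euclidean length.
    """
    term_counts = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(documents)

    # Number of documents in which each word occurs at least once.
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())

    vectors = []
    for counts in term_counts:
        vec = {w: tf * math.log(n_docs / doc_freq[w]) for w, tf in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        if norm > 0:
            vec = {w: v / norm for w, v in vec.items()}
        vectors.append(vec)
    return vectors


docs = ["salt and pepper", "pepper and physics"]
for doc, vec in zip(docs, tfidf_vectors(docs)):
    print(doc, "->", vec)
```

Only the words that actually occur in a document are stored, which keeps the representation compact when the vocabulary is large.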


Transductive Support Vector Machines


• For a learning task $P(\vec{x}, y) = P(y \mid \vec{x})\,P(\vec{x})$, the learner $L$ is given a hypothesis space $H$ of functions $h: X \to \{-1, 1\}$ and an i.i.d. sample $S_{train}$ of $n$ training examples

$$(\vec{x}_1, y_1), (\vec{x}_2, y_2), \ldots, (\vec{x}_n, y_n) \qquad (2)$$

• Each training example consists of a document vector $\vec{x} \in X$ and a binary label $y \in \{-1, +1\}$. In contrast to the inductive setting, the learner is also given an i.i.d. sample $S_{test}$ of $k$ test examples

$$\vec{x}^*_1, \vec{x}^*_2, \ldots, \vec{x}^*_k \qquad (3)$$

from the same distribution.


Transductive Support Vector Machines

• The transductive learner $L$ aims to select a function $h_L = L(S_{train}, S_{test})$ from $H$ using $S_{train}$ and $S_{test}$ so that the expected number of erroneous predictions

$$R(L) = \int \frac{1}{k} \sum_{i=1}^{k} \Theta\big(h_L(\vec{x}^*_i), y^*_i\big) \, dP(\vec{x}_1, y_1) \cdots dP(\vec{x}^*_k, y^*_k)$$

on the test examples is minimized.

• $\Theta(a, b)$ is zero if $a = b$; otherwise it is one.


Transductive Support Vector Machines

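• Bounds on the relative uniform deviation of training error

$$R_{train}(h) = \frac{1}{n} \sum_{i=1}^{n} \Theta\big(h(\vec{x}_i), y_i\big) \qquad (4)$$

and test error

$$R_{test}(h) = \frac{1}{k} \sum_{i=1}^{k} \Theta\big(h(\vec{x}^*_i), y^*_i\big) \qquad (5)$$

where $y^*_i$ is the true label of test example $\vec{x}^*_i$.

• With probability $1 - \eta$,

$$R_{test}(h) \le R_{train}(h) + \Omega(n, k, d, \eta) \qquad (6)$$

• where the confidence interval $\Omega(n, k, d, \eta)$ depends on the number of training examples $n$, the number of test examples $k$, and the VC-Dimension $d$ of $H$.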
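As a small illustration, the training and test error are both averages of a zero-one loss that is 0 when prediction and label agree and 1 otherwise. The sketch below computes such an average for an arbitrary classifier. The names `theta` and `error_rate` and the toy threshold classifier are assumptions made for this example.

```python
def theta(a, b):
    """Zero-one loss: 0 if the two labels agree, 1 otherwise."""
    return 0 if a == b else 1


def error_rate(h, examples):
    """Average zero-one loss of classifier h over a list of (x, y) pairs."""
    return sum(theta(h(x), y) for x, y in examples) / len(examples)


def threshold_classifier(x):
    """Toy hypothesis on one-dimensional inputs: sign of x."""
    return 1 if x > 0 else -1


train = [(2.0, 1), (-1.0, -1), (0.5, -1), (3.0, 1)]
test = [(1.5, 1), (-2.0, -1)]

print("R_train:", error_rate(threshold_classifier, train))
print("R_test: ", error_rate(threshold_classifier, test))
```

In practice the true test labels are unknown, so the test error can only be bounded, not computed this way.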


Transductive Support Vector Machines

• What information do we get from studying the test sample (3) and how can we use it?

• The training and the test sample split the hypothesis space $H$ into a finite number of equivalence classes $H'$. Two functions from $H$ belong to the same equivalence class if they classify the training and the test sample in the same way.

• This reduces the learning problem from finding a function in the possibly infinite set $H$ to finding one of finitely many equivalence classes $H'$.

• We can use these equivalence classes to build a structure of increasing VC-Dimension for structural risk minimization:

$$H'_1 \subset H'_2 \subset \cdots \subset H'$$
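To make the idea of equivalence classes concrete: two hyperplanes fall into the same class when they assign identical labels to every training and test point. The sketch below samples random hyperplanes over a tiny point set and collects the distinct labelings they produce. Random sampling is only an assumption made for illustration (it may miss some classes), and the function names are hypothetical.

```python
import random


def labeling(w, b, points):
    """Labels in {-1, +1} that the hyperplane (w, b) assigns to the points."""
    return tuple(
        1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
        for x in points
    )


def sample_equivalence_classes(points, n_samples=20000, seed=0):
    """Collect the distinct labelings produced by randomly drawn hyperplanes."""
    rng = random.Random(seed)
    dim = len(points[0])
    classes = set()
    for _ in range(n_samples):
        w = [rng.gauss(0, 1) for _ in range(dim)]
        b = rng.gauss(0, 1)
        classes.add(labeling(w, b, points))
    return classes


# Training and test points pooled together; only their positions matter here.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
classes = sample_equivalence_classes(points)
print(len(classes), "distinct labelings found out of", 2 ** len(points), "possible")
```

Although there are infinitely many hyperplanes, they can only realize a finite set of labelings on a fixed sample. For these four points, the two XOR-style labelings can never appear because no hyperplane separates them.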


Transductive Support Vector Machines

• Vapnik shows that with the size of the margin we can control the maximum number of equivalence classes (i.e. the VC-Dimension).


Theorem 1

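• Consider hyperplanes $h(\vec{x}) = \mathrm{sign}\{\vec{x} \cdot \vec{w} + b\}$ as hypothesis space $H$. If the attribute vectors of a training sample (2) and a test sample (3) are contained in a ball of diameter $D$, then there are at most

$$N_r < \exp\left(d \left(\ln\frac{n+k}{d} + 1\right)\right), \qquad d = \min\left(a, \left\lfloor \frac{D^2}{\rho^2} \right\rfloor + 1\right)$$

equivalence classes which contain a separating hyperplane with

$$\forall_{i=1}^{n}: \left| \frac{\vec{w}}{\|\vec{w}\|} \cdot \vec{x}_i + b \right| \ge \rho, \qquad \forall_{j=1}^{k}: \left| \frac{\vec{w}}{\|\vec{w}\|} \cdot \vec{x}^*_j + b \right| \ge \rho$$

where $a$ is the dimensionality of the space.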
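The bound in Theorem 1 can be evaluated numerically. The sketch below assumes the form $d = \min(a, \lfloor D^2/\rho^2 \rfloor + 1)$ and $\ln N_r < d\,(\ln\frac{n+k}{d} + 1)$, with $\rho$ the margin and $a$ the dimensionality of the space. It works with the logarithm of the bound to avoid overflow. The function names and the numbers in the example are made up for illustration.

```python
import math


def vc_dimension_bound(diameter, margin, dim):
    """d = min(a, floor(D^2 / rho^2) + 1)."""
    return min(dim, math.floor(diameter ** 2 / margin ** 2) + 1)


def log_equivalence_class_bound(n, k, diameter, margin, dim):
    """Natural log of the bound on N_r: d * (ln((n + k) / d) + 1)."""
    d = vc_dimension_bound(diameter, margin, dim)
    return d * (math.log((n + k) / d) + 1)


# Hypothetical setting: 10 training and 90 test documents in a
# 10,000-dimensional space, all inside a ball of diameter 2.
for margin in (0.5, 1.0):
    d = vc_dimension_bound(2.0, margin, 10_000)
    log_bound = log_equivalence_class_bound(10, 90, 2.0, margin, 10_000)
    print(f"margin={margin}: d={d}, ln(bound on N_r)={log_bound:.2f}")
```

Doubling the margin shrinks $d$ from 17 to 5 here, even though the number of features stays at 10,000.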


Transductive Support Vector Machines

• Note that the VC-Dimension does not necessarily depend on the number of features, but can be much lower than the dimensionality of the space.

• Structural risk minimization tells us that we get the smallest bound on the test error if we select the equivalence class from the structure element $H'_i$ which minimizes (6).


Transductive SVM (lin. sep. case) Optimization Problem 1

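• Minimize over $(y^*_1, \ldots, y^*_k, \vec{w}, b)$:

$$\frac{1}{2} \|\vec{w}\|^2$$

subject to:

$$\forall_{i=1}^{n}: \; y_i \left[\vec{w} \cdot \vec{x}_i + b\right] \ge 1$$

$$\forall_{j=1}^{k}: \; y^*_j \left[\vec{w} \cdot \vec{x}^*_j + b\right] \ge 1$$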


Transductive SVM (non-sep. case) Optimization Problem 2

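• Minimize over $(y^*_1, \ldots, y^*_k, \vec{w}, b, \xi_1, \ldots, \xi_n, \xi^*_1, \ldots, \xi^*_k)$:

$$\frac{1}{2} \|\vec{w}\|^2 + C \sum_{i=1}^{n} \xi_i + C^* \sum_{j=1}^{k} \xi^*_j$$

subject to:

$$\forall_{i=1}^{n}: \; y_i \left[\vec{w} \cdot \vec{x}_i + b\right] \ge 1 - \xi_i$$

$$\forall_{j=1}^{k}: \; y^*_j \left[\vec{w} \cdot \vec{x}^*_j + b\right] \ge 1 - \xi^*_j$$

$$\forall_{i=1}^{n}: \; \xi_i \ge 0$$

$$\forall_{j=1}^{k}: \; \xi^*_j \ge 0$$

$C$ and $C^*$ are parameters set by the user.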
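For a fixed hyperplane and a fixed labeling of the test examples, the objective of the non-separable problem can be evaluated directly: each slack takes the smallest non-negative value that satisfies its margin constraint. The sketch below does this in plain Python. The function names and the one-dimensional toy data are assumptions for illustration, and no optimization over $\vec{w}$, $b$, or the labels is performed.

```python
def hinge_slack(w, b, x, y):
    """Smallest slack xi >= 0 such that y * (w . x + b) >= 1 - xi."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)


def op2_objective(w, b, train, test_x, test_y, C, C_star):
    """0.5 * ||w||^2 + C * sum(xi_i) + C_star * sum(xi*_j) for fixed w, b, labels."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    train_slack = sum(hinge_slack(w, b, x, y) for x, y in train)
    test_slack = sum(hinge_slack(w, b, x, y) for x, y in zip(test_x, test_y))
    return regularizer + C * train_slack + C_star * test_slack


w, b = [1.0], 0.0
train = [([2.0], 1), ([-2.0], -1)]
test_x = [[0.5], [-3.0]]

# Compare two candidate labelings of the same test examples.
for test_y in ([1, -1], [-1, -1]):
    value = op2_objective(w, b, train, test_x, test_y, C=1.0, C_star=2.0)
    print("test labels", test_y, "-> objective", value)
```

Comparing objective values across labelings like this is what makes the problem partly combinatorial: the test labels are themselves optimization variables.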


What Makes TSVMs Especially well Suited for Text Classification

• The text classification task is characterized by a special set of properties.
  – High-dimensional input space:
    • Each (stemmed) word is a feature.
  – Document vectors are sparse:
    • For each document, the corresponding document vector contains few entries that are not zero.
  – Few irrelevant features:
    • Experiments in [Joachims, 1998] suggest that most words are relevant.


What Makes TSVMs Especially well Suited for Text Classification

• But how can TSVMs be any better?

• When asked for all documents containing the words pepper and salt, the search engine AltaVista returns 327,180 web pages.

• When asking for the documents with the words pepper and physics, we get only 4,220 hits, although physics is a more popular word on the web than salt.

• And it is this co-occurrence information that TSVMs exploit as prior knowledge about the learning task.
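Co-occurrence of this kind can be measured on any collection, including an unlabeled test set. The sketch below counts how many documents contain both of two words. The four-document corpus is a made-up toy example and has nothing to do with the search engine counts quoted above.

```python
def cooccurrence_count(documents, word_a, word_b):
    """Number of documents (given as sets of words) containing both words."""
    return sum(1 for doc in documents if word_a in doc and word_b in doc)


# Hypothetical toy corpus; each document is reduced to its set of words.
corpus = [
    {"pepper", "salt", "soup"},
    {"salt", "pepper"},
    {"physics", "quantum"},
    {"pepper", "plant"},
]

print("pepper & salt:   ", cooccurrence_count(corpus, "pepper", "salt"))
print("pepper & physics:", cooccurrence_count(corpus, "pepper", "physics"))
```

No labels are needed for this count, which is why the test documents can contribute such information.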


What Makes TSVMs Especially well Suited for Text Classification

• Imagine document D1 was given as a training example for class A and document D6 was given as a training example for class B.

• How should we classify documents D2 to D4 (the test set)?


What Makes TSVMs Especially well Suited for Text Classification

• The reason we choose this classification of the test data over the others stems from our prior knowledge about the properties of text and common text classification tasks.


What Makes TSVMs Especially well Suited for Text Classification

• Note again that we got to this classification by studying the location of the test examples, which is not possible for an inductive learner.

• We see that the maximum margin bias reflects our prior knowledge about text classification well.

• By analyzing the test set, we can exploit this prior knowledge for learning.


Solving the Optimization Problem

• Training a transductive SVM means solving the (partly) combinatorial optimization problem OP2.

• The key idea of the algorithm is that it begins with a labeling of the test data based on the classification of an inductive SVM.

• Then it improves the solution by switching the labels of test examples so that the objective function decreases.


Solving the Optimization Problem

• It starts with training an inductive SVM on the training data and classifying the test data accordingly.

• Then it uniformly increases the influence of the test examples by incrementing the cost-factors $C^*_-$ and $C^*_+$ up to the user-defined value of $C^*$ (loop 1).

• While the criterion in the condition of loop 2 identifies two examples for which changing the class labels leads to a decrease in the current objective function, these examples are switched.


Algorithm for training Transductive Support Vector Machines

• Input:
  – training examples $(\vec{x}_1, y_1), \ldots, (\vec{x}_n, y_n)$
  – test examples $\vec{x}^*_1, \ldots, \vec{x}^*_k$

• Parameters:
  – $C, C^*$: parameters from OP2
  – $num_+$: number of test examples to be assigned to class $+$

• Output: predicted labels of the test examples $y^*_1, \ldots, y^*_k$

$(\vec{w}, b, \vec{\xi}, \_) := solve\_svm\_qp([(\vec{x}_1, y_1) \ldots (\vec{x}_n, y_n)], [\,], C, 0, 0)$;

Classify the test examples using $\langle \vec{w}, b \rangle$. The $num_+$ test examples with the highest value of $\vec{w} \cdot \vec{x}^*_j + b$ are assigned to the class $+$ ($y^*_j := 1$); the remaining test examples are assigned to class $-$ ($y^*_j := -1$).

$C^*_- := 10^{-5}$;

$C^*_+ := 10^{-5} \cdot \frac{num_+}{k - num_+}$;
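The initialization can be sketched in a few lines: rank the test examples by their inductive SVM score, give the top `num_plus` the label +1, and start both test cost factors at a small value. The sketch assumes the scores $\vec{w} \cdot \vec{x}^*_j + b$ have already been computed by some SVM solver. The function names and the sample scores are hypothetical.

```python
def initial_labels(scores, num_plus):
    """Assign +1 to the num_plus highest-scoring test examples, -1 to the rest."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    labels = [-1] * len(scores)
    for j in order[:num_plus]:
        labels[j] = 1
    return labels


def initial_costs(num_plus, k, small=1e-5):
    """Starting values for the cost factors of negative and positive test examples."""
    c_minus = small
    c_plus = small * num_plus / (k - num_plus)
    return c_minus, c_plus


# Hypothetical scores w . x*_j + b from an inductive SVM on four test examples.
scores = [0.2, -1.0, 3.0, 0.1]
print(initial_labels(scores, num_plus=2))
print(initial_costs(num_plus=2, k=len(scores)))
```

Fixing the number of positive test examples up front keeps the solution from assigning every test example to one class.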


Algorithm for training Transductive Support Vector Machines

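• while $(C^*_- < C^*) \,\|\, (C^*_+ < C^*)$ {   (loop 1)

    $(\vec{w}, b, \vec{\xi}, \vec{\xi}^*) := solve\_svm\_qp([(\vec{x}_1, y_1) \ldots (\vec{x}_n, y_n)], [(\vec{x}^*_1, y^*_1) \ldots (\vec{x}^*_k, y^*_k)], C, C^*_-, C^*_+)$;

    while $\exists m, l: (y^*_m \cdot y^*_l < 0) \,\&\, (\xi^*_m > 0) \,\&\, (\xi^*_l > 0) \,\&\, (\xi^*_m + \xi^*_l > 2)$ {   (loop 2)

        $y^*_m := -y^*_m$;

        $y^*_l := -y^*_l$;

        $(\vec{w}, b, \vec{\xi}, \vec{\xi}^*) := solve\_svm\_qp([(\vec{x}_1, y_1) \ldots (\vec{x}_n, y_n)], [(\vec{x}^*_1, y^*_1) \ldots (\vec{x}^*_k, y^*_k)], C, C^*_-, C^*_+)$;

    }

    $C^*_- := \min(C^*_- \cdot 2, C^*)$;

    $C^*_+ := \min(C^*_+ \cdot 2, C^*)$;

}

return $(y^*_1, \ldots, y^*_k)$;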
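The two nested loops can be sketched in Python with the quadratic program solver left as a plug-in. Here `solve` is a hypothetical callable standing in for `solve_svm_qp`: it takes the training data, the test vectors, the current test labels and the three cost factors, and returns only the list of test slacks. The guard on `num_plus` is an addition of this sketch, needed because a zero starting cost would never grow by doubling.

```python
def find_switch_pair(labels, slacks):
    """Return indices (m, l) meeting the loop-2 condition, or None if none exist."""
    for m in range(len(labels)):
        for l in range(m + 1, len(labels)):
            if (labels[m] * labels[l] < 0
                    and slacks[m] > 0 and slacks[l] > 0
                    and slacks[m] + slacks[l] > 2):
                return m, l
    return None


def tsvm_loops(train, test_x, labels, C, C_star, num_plus, solve):
    """Run loop 1 (cost annealing) and loop 2 (label switching).

    `solve(train, test_x, labels, C, c_minus, c_plus)` is a caller-supplied
    stand-in for solve_svm_qp that returns the slacks of the test examples.
    """
    k = len(test_x)
    if not 0 < num_plus < k:
        raise ValueError("num_plus must lie strictly between 0 and k")
    labels = list(labels)
    c_minus = 1e-5
    c_plus = 1e-5 * num_plus / (k - num_plus)

    while c_minus < C_star or c_plus < C_star:            # loop 1
        slacks = solve(train, test_x, labels, C, c_minus, c_plus)
        pair = find_switch_pair(labels, slacks)
        while pair is not None:                            # loop 2
            m, l = pair
            labels[m], labels[l] = -labels[m], -labels[l]  # switch both labels
            slacks = solve(train, test_x, labels, C, c_minus, c_plus)
            pair = find_switch_pair(labels, slacks)
        c_minus = min(c_minus * 2, C_star)
        c_plus = min(c_plus * 2, C_star)
    return labels
```

Because labels are always switched in pairs of opposite sign, the number of test examples labeled +1 stays at its initial value throughout.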


Inductive SVM (primal) Optimization Problem 3

• The function solve_svm_qp refers to quadratic programs of the following type.

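• Minimize over $(\vec{w}, b, \vec{\xi}, \vec{\xi}^*)$:

$$\frac{1}{2} \|\vec{w}\|^2 + C \sum_{i=1}^{n} \xi_i + C^*_- \sum_{j: y^*_j = -1} \xi^*_j + C^*_+ \sum_{j: y^*_j = 1} \xi^*_j$$

subject to:

$$\forall_{i=1}^{n}: \; y_i \left[\vec{w} \cdot \vec{x}_i + b\right] \ge 1 - \xi_i$$

$$\forall_{j=1}^{k}: \; y^*_j \left[\vec{w} \cdot \vec{x}^*_j + b\right] \ge 1 - \xi^*_j$$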


Theorem 2

• Algorithm 1 converges in a finite number of steps.

• The condition $y^*_m \cdot y^*_l < 0$ in loop 2 requires that the examples to be switched have different class labels.


Theorem 2

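• Let $y^*_m = 1$ so that we can write

$$\begin{aligned}
&\frac{1}{2}\|\vec{w}\|^2 + C\sum_{i=1}^{n}\xi_i + C^*_-\sum_{j: y^*_j=-1}\xi^*_j + C^*_+\sum_{j: y^*_j=1}\xi^*_j \\
&= \frac{1}{2}\|\vec{w}\|^2 + C\sum_{i=1}^{n}\xi_i + \ldots + C^*_+\,\xi^*_m + \ldots + C^*_-\,\xi^*_l + \ldots \\
&> \frac{1}{2}\|\vec{w}\|^2 + C\sum_{i=1}^{n}\xi_i + \ldots + C^*_-\max(2-\xi^*_m, 0) + \ldots + C^*_+\max(2-\xi^*_l, 0) + \ldots \\
&= \frac{1}{2}\|\vec{w}\|^2 + C\sum_{i=1}^{n}\xi_i + \ldots + C^*_-\,{\xi^*_m}' + \ldots + C^*_+\,{\xi^*_l}' + \ldots
\end{aligned}$$

where ${\xi^*_m}'$ and ${\xi^*_l}'$ are the slacks of the two examples after their labels have been switched.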


Theorem 2

• The inequality holds due to the selection criterion in loop 2, since $\xi^*_m > \max(2 - \xi^*_l, 0) = {\xi^*_l}'$ and $\xi^*_l > \max(2 - \xi^*_m, 0) = {\xi^*_m}'$.

• This means that loop 2 is exited after a finite number of iterations, since there is only a finite number of permutations of the test examples.

• Loop 1 also terminates after a finite number of iterations, since $C^*_-$ and $C^*_+$ are bounded by $C^*$.
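The decrease of the objective under a label switch can be checked numerically. For a fixed hyperplane, an example with slack $\xi > 0$ has slack $\max(2 - \xi, 0)$ once its label is flipped. The sketch below compares the cost contribution of a pair before and after switching, for a positive example $m$ and a negative example $l$. The function names and the numbers are illustrative assumptions.

```python
def switched_slack(slack):
    """Slack of an example after its label is flipped, for a fixed hyperplane."""
    return max(2.0 - slack, 0.0)


def cost_before(xi_m, xi_l, c_minus, c_plus):
    """Contribution of the pair while m is labeled +1 and l is labeled -1."""
    return c_plus * xi_m + c_minus * xi_l


def cost_after(xi_m, xi_l, c_minus, c_plus):
    """Contribution of the pair after both labels have been switched."""
    return c_minus * switched_slack(xi_m) + c_plus * switched_slack(xi_l)


# Both pairs satisfy the loop-2 condition: positive slacks that sum to more than 2.
for xi_m, xi_l in [(1.5, 0.8), (3.0, 0.1)]:
    before = cost_before(xi_m, xi_l, c_minus=1.0, c_plus=3.0)
    after = cost_after(xi_m, xi_l, c_minus=1.0, c_plus=3.0)
    print(f"xi_m={xi_m}, xi_l={xi_l}: before={before:.2f}, after={after:.2f}")
```

The comparison holds even when the two cost factors differ, because each old slack is compared with the new slack that carries the same cost factor.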


Conclusions and Outlook

• By taking a transductive instead of an inductive approach, the test set can be used as an additional source of information about margins.

• On all data sets, the transductive approach showed improvements over the currently best performing method, most substantially for small training samples and large test sets.