Page 1

http://lamda.nju.edu.cn

Towards Bridging the Gap between Theory and Practice in Semi-supervised Learning

Zhi-Hua Zhou

http://cs.nju.edu.cn/zhouzh/

Email: [email protected]

LAMDA Group

National Key Laboratory for Novel Software Technology, Nanjing University, China

Page 2: Labeled vs. Unlabeled

In many practical applications, unlabeled training data are readily available, but labeled data are fairly expensive to obtain because labeling unlabeled examples requires human effort

[Figure: a web page labeled class = “war”, next to the (almost) infinite number of web pages on the Internet whose classes are unknown (?)]

Page 3: SSL: Why can unlabeled data be helpful?

[Figure: blue or red? Intuitively, once the unlabeled data are taken into account: Blue!]

Page 4: Multiple views

Each view is a feature set

Many real tasks involve more than one feature set, e.g.,

Video features

Audio features

Image features

Text features

……

Ideally, the views should be

Sufficient: Each view contains sufficient information for training a good learner

Redundant: The views are conditionally independent given the class label

Page 5: Co-training

[Figure: learner1 is trained on the X1 view and learner2 on the X2 view of the labeled training data; each learner labels some unlabeled instances, which are then added to the labeled training data]

A representative semi-supervised learning approach: motivated by the use of multiple views, simple but effective

[Blum & Mitchell, COLT’98]
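
The co-training loop in the figure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the exact procedure of [Blum & Mitchell, COLT’98]: the per-view learners are hypothetical nearest-centroid classifiers, "confidence" is taken to be the gap between the two nearest class centroids, and in every round each learner adds its `per_round` most confident pseudo-labeled instances to the shared labeled set.

```python
from collections import Counter


def centroid_fit(X, y):
    """Hypothetical per-view learner: the mean feature vector of each class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def centroid_predict(model, x):
    """Return (label, confidence); confidence = gap between the two nearest centroids."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, mu)), c) for c, mu in model.items()
    )
    conf = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], conf


def co_train(L1, L2, y, U1, U2, rounds=5, per_round=1):
    """L1/L2: labeled data in view X1/X2; U1/U2: the same unlabeled instances in X1/X2."""
    L1, L2, y = list(L1), list(L2), list(y)
    pool = list(range(len(U1)))  # indices of still-unlabeled instances
    for _ in range(rounds):
        if not pool:
            break
        h1, h2 = centroid_fit(L1, y), centroid_fit(L2, y)
        # Each learner pseudo-labels the instances it is most confident about,
        # using only its own view; the new labels are shared with the other learner.
        for model, U in ((h1, U1), (h2, U2)):
            ranked = sorted(pool, key=lambda j: -centroid_predict(model, U[j])[1])
            for j in ranked[:per_round]:
                L1.append(U1[j])
                L2.append(U2[j])
                y.append(centroid_predict(model, U[j])[0])
                pool.remove(j)
    return centroid_fit(L1, y), centroid_fit(L2, y)


# Toy data: one-dimensional views, class 0 near 0 and class 1 near 10.
h1, h2 = co_train(
    L1=[[0.0], [10.0]], L2=[[0.0], [10.0]], y=[0, 1],
    U1=[[1.0], [9.0], [2.0], [8.0]], U2=[[0.5], [9.5], [1.0], [9.0]],
)
print(centroid_predict(h1, [2.0])[0], centroid_predict(h2, [9.5])[0])  # prints: 0 1
```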

Page 6: Wide applications

Widely applied to many domains, e.g.,

• Statistical parsing [Sarkar, NAACL01; Steedman et al., EACL03; Hwa et al., ICML03w]

• Noun phrase identification [Pierce & Cardie, EMNLP01]

• Image retrieval [Zhou et al., ECML’04, TOIS06]

• … …

See more in:

Zhou & Li, Semi-supervised learning by disagreement, Knowledge and Information Systems, 2010, vol.24, no.3, pp.415-439

Page 7: Theoretical studies

[Blum & Mitchell, COLT’98]: Given a conditional independence assumption on the distribution D, if the target class is learnable from random classification noise in the standard PAC model, then any initial weak predictor can be boosted to arbitrarily high accuracy by co-training

[Dasgupta, Littman & McAllester, NIPS’01]: When the requirement of sufficient and redundant views is met, the co-trained classifiers could make few generalization errors by maximizing their agreement over the unlabeled data

[Balcan, Blum & Yang, NIPS’04]: Given appropriately strong PAC-learners on each view, a weaker “expansion” assumption on the underlying data distribution is sufficient for iterative co-training to succeed

… …

All assumed “multi-views”

Page 8: Effective single-view algorithms

• Using two different types of decision trees [Goldman & Zhou, ICML’00]

• Using classifiers constructed from different data samples [Zhou & Li, TKDE04]

• Using regression learners constructed with different distance metrics and/or k values [Zhou & Li, IJCAI’05]

• … …

See more in:

Zhou & Li, Semi-supervised learning by disagreement, Knowledge and Information Systems, 2010, vol.24, no.3, pp.415-439

Page 9

[Wang & Zhou, ECML’07]

A theoretical result:

“Large diversity” is sufficient!

Roughly speaking, the key requirement of co-training is that the two learners have large diversity and that each learner is better than a weak learner
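
One simple way to quantify the diversity between two learners, assumed here for illustration, is their disagreement rate on the unlabeled data, i.e. an empirical estimate of Pr[h1(x) ≠ h2(x)]. The function name is hypothetical.

```python
def disagreement(preds_a, preds_b):
    """Fraction of unlabeled instances on which two learners predict different labels."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("expected two non-empty prediction lists of equal length")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


# Two learners that disagree on 2 of 4 unlabeled instances.
print(disagreement([1, 1, 0, 0], [1, 0, 0, 1]))  # prints: 0.5
```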

Page 10: Performance of Co-training

The empirical/theoretical gap:

Performance of the learners observed in experiments: the performance could not be improved further after a number of rounds

Previous theoretical studies indicated that the performance could always be improved

Page 11

[Wang & Zhou, ECML’07]

A theoretical result:

The gap will definitely occur

Roughly speaking, as the co-training process continues, the learners become more and more similar; it is therefore a “must” phenomenon that co-training cannot improve the performance further after a number of iterations

Based on this result, we derive a method for roughly estimating an adequate round at which to terminate the co-training process
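
The estimation method itself is not spelled out here, so the following is only a hypothetical heuristic in the same spirit: since the learners become more and more similar, record their disagreement on the unlabeled data after every round and stop once that curve has flattened out. The tolerance `tol`, the `patience` window and the function name are assumptions, not taken from [Wang & Zhou, ECML’07].

```python
def estimate_stop_round(disagreements, tol=0.01, patience=2):
    """Return the round at which the per-round disagreement curve flattens.

    disagreements[r] is the learners' disagreement rate after round r. The curve
    counts as flat once it changes by less than `tol` for `patience` consecutive
    rounds; the round where that flat stretch began is returned. If it never
    flattens, the last round is returned.
    """
    flat = 0
    for r in range(1, len(disagreements)):
        if abs(disagreements[r] - disagreements[r - 1]) < tol:
            flat += 1
            if flat >= patience:
                return r - patience
        else:
            flat = 0
    return len(disagreements) - 1


print(estimate_stop_round([0.30, 0.20, 0.12, 0.115, 0.112, 0.111]))  # prints: 2
```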

Page 12: Estimating the round for termination

[Figure panels: data with two views; data with a single view]

In either case, the performance at the estimated round is close to that at the last learning round

Page 13: Theoretical status

Several sufficient conditions for co-training:

• Conditional independence [Blum & Mitchell, COLT’98]

• Weak dependence [Abney, ACL’02]

• Expansion [Balcan, Blum & Yang, NIPS’04]

• Large diversity [Wang & Zhou, ECML’07]

Necessary condition, and sufficient and necessary condition: proved by [Wang & Zhou, ICML’10]

Page 14

Our theoretical results disclose that “multi-views” are not really needed!

Can we really do … without “multi-views”?

… an application example -->

If we really have two sufficient and redundant views, we can even do SSL with a single labeled example. See: Zhou, Zhan & Yang. Semi-supervised learning with very few labeled training examples. AAAI'07.

Page 15: Microprocessor design space exploration

Design Space Exploration (DSE) aims to determine a promising processor architecture that meets the design specification

Many design parameters

• Core number, Frequency, Cache Size, Buffer Size

• Cache Replacement Policy

The design space consists of billions of design configurations (i.e., combinations of design parameter values)

Page 16: To use unlabeled data for DSE

There are no “two views”

Comprehensibility is important

COMT (CO-training Model Trees), which uses two model trees without two views

[Guo, Chen, Chen, Zhou, Hu & Xu, IJCAI’11]

Page 17

[Figure: MSEs of ANNs, M5P and COMT on the SPEC CPU2000 benchmarks applu, art, bzip2, crafty, equake, galgel, gcc, lucas, mcf, swim, twolf and vpr]

Compared to the DSE state of the art: 30% to 84% improvement

[Guo, Chen, Chen, Zhou, Hu & Xu, IJCAI’11]

Page 18: Industry application: Godson-3B

Vendor | Chip | No. of Cores | Frequency (GHz) | Peak Performance (GFlops) | Power (W) | Performance/Power (GFlops/W)
ICT | Godson-3B | 8 | 1.0 | 128 | 40 | 3.2
Intel | Xeon X5600 | 6 | 3.0 | 72 | 95 | 0.76
Intel | Xeon X5570 | 4 | 2.9 | 46.4 | 95 | 0.5
AMD | Opteron 8384 | 4 | 2.7 | 43.2 | 75 | 0.6
IBM | Power6+ | 2 | 4.7 | 37.6 | 120 | 0.3
IBM | Power7 | 8 | 3.3 | 211.2 | 140 | 1.5

IEEE Spectrum (May 2011) reports: “Chinese Chip Wins Energy-Efficiency Crown”
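
The last column of the table is peak performance divided by power. The short check below recomputes it from the other two columns (the dictionary layout is just for illustration). The table's figures appear to be rounded: for example, 46.4/95 ≈ 0.49 is listed as 0.5.

```python
# (peak performance in GFlops, power in W), copied from the table above
chips = {
    "ICT Godson-3B": (128.0, 40.0),
    "Intel Xeon X5600": (72.0, 95.0),
    "Intel Xeon X5570": (46.4, 95.0),
    "AMD Opteron 8384": (43.2, 75.0),
    "IBM Power6+": (37.6, 120.0),
    "IBM Power7": (211.2, 140.0),
}


def perf_per_watt(peak_gflops, power_w):
    """Energy efficiency in GFlops/W."""
    return peak_gflops / power_w


# Print the chips from most to least energy-efficient.
for name, (peak, power) in sorted(chips.items(), key=lambda kv: -perf_per_watt(*kv[1])):
    print(f"{name:18s} {perf_per_watt(peak, power):.2f} GFlops/W")
```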

Page 19

What is crucial?

Diversity!

Diversity is well known to be crucial for ensemble methods

See Chap. 5 (Diversity) of: Z.-H. Zhou. Ensemble Methods: Foundations and Algorithms. Boca Raton, FL: Chapman & Hall/CRC, Jun. 2012.

Page 20

So nice to have semi-supervised learning!

However, nothing can be perfect

Page 21: An observation

The use of unlabeled data may degrade performance

[Cozman, Cohen & Cirelo, ICML’03]

Similar observations have been reported in many studies [Nigam et al., MLJ00; Zhang & Oles, ICML’00; Blum & Chawla, ICML’01; Zhou & Li, TKDE05; Balcan et al., ICMLw’05; Chapelle et al., JMLR08; Jebara et al., ICML’09]

Page 22: Why degeneration?

[Figure: performance of Naïve Bayes classifiers [Cozman, Cohen & Cirelo, ICML’03], in two panels: trained from data generated by a Naïve Bayes model, and trained from data generated by a TAN model]

Generative methods: a mismatch of the model assumption is one of the apparent reasons

Page 23

What about general SSL approaches?

Safe SSL – the holy grail of current SSL research:

by using unlabeled data, the performance will not be statistically significantly worse than that obtained by using only the original (limited) labeled data

Page 24: For disagreement-based SSL

During disagreement-based SSL process, the learners will provide pseudo-labels to some unlabeled instances

Performance degeneration with incorrect pseudo-labels

How about editing the pseudo-labeled data to avoid unreliable instances?

Data editing: a technique that attempts to improve the quality of the training set by identifying and eliminating training examples that were incorrectly labeled in the human labeling process

Some effective methods: [Wilson, TSMC72; Koplowitz & Brown, PR81; Sánchez et al., PRL03; Jiang & Zhou, ISNN’05]
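
As an illustration of data editing, here is a minimal sketch of Wilson's edited nearest-neighbor rule [Wilson, TSMC72]: an example is removed when its label disagrees with the majority label of its k nearest neighbors. Squared Euclidean distance and k = 3 are assumptions of this sketch.

```python
from collections import Counter


def wilson_edit(X, y, k=3):
    """Keep only examples whose label matches the majority of their k nearest neighbors."""
    kept_X, kept_y = [], []
    for i, xi in enumerate(X):
        # k nearest neighbors of X[i], excluding X[i] itself (squared Euclidean distance)
        neighbors = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), j)
            for j, xj in enumerate(X)
            if j != i
        )[:k]
        majority = Counter(y[j] for _, j in neighbors).most_common(1)[0][0]
        if majority == y[i]:
            kept_X.append(xi)
            kept_y.append(y[i])
    return kept_X, kept_y


# [0.15] carries label 1 but sits among class-0 examples, so it is edited out.
X = [[0.0], [0.1], [0.2], [0.15], [5.0], [5.1], [5.2]]
y = [0, 0, 0, 1, 1, 1, 1]
print(wilson_edit(X, y))
```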

Page 25: Editing pseudo-labeled data

The self-training approach [Nigam & Ghani, CIKM’00] suffers seriously from incorrect pseudo-labels

[Figure: a learner trained on the labeled data labels the unlabeled data; the self-labeled data are added back to its training data]

Page 26: Editing pseudo-labeled data

[Figure: the learner labels the unlabeled data, but the self-labeled data first pass through data editing, and only the cleaned self-labeled data are added to the labeled data]

SETRED (SElf-TRaining with Editing)

[Li & Zhou, PAKDD’05]
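
The self-training-with-editing loop in the diagram can be sketched as follows. This is a simplified illustration under stated assumptions, not SETRED itself: SETRED decides which self-labeled instances to keep with its own statistical editing test on each instance's neighborhood, whereas this sketch substitutes a plain k-nearest-neighbor majority vote. The base learner is 1-NN, "most confident" is taken to mean "closest to the current labeled set", and rejected instances are simply dropped; all three are assumptions.

```python
from collections import Counter


def dist2(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def knn_vote(X, y, x, k):
    """Majority label among the k labeled examples nearest to x (nearest label wins ties)."""
    nearest = sorted(zip((dist2(x, xi) for xi in X), y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def self_train_with_editing(XL, yL, XU, k=3, rounds=10):
    XL, yL, pool = list(XL), list(yL), list(XU)
    for _ in range(rounds):
        if not pool:
            break
        # Self-label the unlabeled instance closest to the labeled set (1-NN learner).
        x = min(pool, key=lambda u: min(dist2(u, v) for v in XL))
        pool.remove(x)
        label = knn_vote(XL, yL, x, 1)
        # Editing step: keep the pseudo-label only if its k-NN neighborhood agrees;
        # otherwise the instance is dropped in this sketch.
        if knn_vote(XL, yL, x, k) == label:
            XL.append(x)
            yL.append(label)
    return XL, yL


# [1.0] (label 1) is surrounded by class-0 data, so the label it would pass on
# to [1.1] is rejected by the editing step.
XL, yL = self_train_with_editing([[0.0], [0.2], [0.4], [1.0]], [0, 0, 0, 1], [[1.1]])
print(len(XL))  # prints: 4
```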

Page 27: SETRED performance

[Li & Zhou, PAKDD’05]

UCI datasets, 25% of the data for testing and 75% for training (unlabel rate 90%, i.e., 90% of the training examples are unlabeled)

Nearest neighbor (NN) classifiers are used

Baselines:

• NN-L: NN trained from the labeled set L only

• NN-A: NN trained from L+U, i.e., with the labels of the unlabeled set U also given

Page 28: For S3VMs

Large margin separator

Low-density separator [Chapelle & Zien, ICML’05]

[Equation: the S3VM objective over the classifier f, given the labeled examples and the unlabeled examples: loss on labeled examples + loss on unlabeled examples]
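
For reference, the S3VM objective is commonly written as below. The notation is assumed: labeled examples (x_i, y_i), i = 1, …, l; unlabeled examples x̂_j, j = 1, …, u; a loss ℓ such as the hinge loss ℓ(y, f(x)) = max{0, 1 - y f(x)}; and trade-off parameters C_1 and C_2. Implementations usually also add a class-balance constraint on ŷ, which is omitted in this sketch.

```latex
\min_{f,\ \hat{y} \in \{\pm 1\}^{u}} \;
  \frac{1}{2}\|f\|_{\mathcal{H}}^{2}
  + C_1 \sum_{i=1}^{l} \ell\bigl(y_i, f(x_i)\bigr)
  + C_2 \sum_{j=1}^{u} \ell\bigl(\hat{y}_j, f(\hat{x}_j)\bigr)
```

With the hinge loss, the best ŷ_j for a fixed f is sign f(x̂_j), so the unlabeled term reduces to max{0, 1 - |f(x̂_j)|}. It penalizes separators that pass close to unlabeled points, which is how the objective encodes the low-density assumption.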

Page 29: Low-density separation

Low-density separation is the fundamental assumption of S3VMs

There is often more than one low-density separator

Performance degeneration with incorrect selection of low-density separators

[Li & Zhou, ICML’11]

Page 30: S4VM: Safe S3VM

In contrast to generating a single low-density separator, S4VM:

• Generates multiple diverse low-density separators

• Makes the optimal prediction

[Li & Zhou, ICML’11]

Page 31: Make the optimal prediction

Given a set of candidate low-density separators and the predictions of an inductive SVM trained from labeled data only,

the S4VM prediction must not be worse than that of the inductive SVM

Page 32: Make the optimal prediction (cont’d)

Gains against inductive SVM

Losses against inductive SVM

However, the ground-truth is unknown!

A simple idea: get as close to the ground truth as possible

How about considering the worst case among the candidate separators?
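
One way to make the gain/loss idea precise, with assumed notation: y ∈ {±1}^u is the prediction being chosen for the u unlabeled instances, ŷ a candidate ground-truth labeling, y^svm the inductive SVM's predictions, 𝓜 the set of candidate low-density separators, and λ > 0 a trade-off parameter. Gain counts instances that y gets right and the SVM gets wrong; loss counts the reverse. Since the ground truth is unknown, the worst case over 𝓜 is maximized:

```latex
\begin{aligned}
\mathrm{gain}(y,\hat{y},y^{\mathrm{svm}}) &= \sum_{j=1}^{u} \tfrac{1+y_j\hat{y}_j}{2}\cdot\tfrac{1-y^{\mathrm{svm}}_j\hat{y}_j}{2},\\
\mathrm{loss}(y,\hat{y},y^{\mathrm{svm}}) &= \sum_{j=1}^{u} \tfrac{1-y_j\hat{y}_j}{2}\cdot\tfrac{1+y^{\mathrm{svm}}_j\hat{y}_j}{2},\\
\bar{y} &= \arg\max_{y\in\{\pm1\}^{u}}\ \min_{\hat{y}\in\mathcal{M}}\ \bigl[\mathrm{gain}(y,\hat{y},y^{\mathrm{svm}}) - \lambda\,\mathrm{loss}(y,\hat{y},y^{\mathrm{svm}})\bigr].
\end{aligned}
```

Choosing y = y^svm makes every gain and loss term zero, so the max-min value is at least 0. If the ground truth lies in 𝓜, then gain ≥ λ · loss for ȳ against it; with λ ≥ 1, ȳ is therefore at least as accurate as the SVM on the unlabeled instances.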

Page 33: S4VM objective

[Equation: S4VM objective; its gain and loss terms are linear functions of the predicted labels]

Note: S4VM does not rely on a single low-density separator
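
For a handful of unlabeled instances, the max-min prediction can be found by brute force, which makes the worst-case reasoning easy to check. This is an illustrative sketch only: the gain/loss definitions follow the reading assumed here (instances the prediction gets right while the SVM gets them wrong, and vice versa), exhaustive search over all 2^u labelings is infeasible at realistic sizes, λ = 3 is an arbitrary choice, and the candidate labelings stand in for the separators S4VM would generate.

```python
from itertools import product


def gain(y, y_hat, y_svm):
    """Instances that y labels like y_hat while the inductive SVM does not."""
    return sum((1 + a * g) / 2 * (1 - s * g) / 2 for a, g, s in zip(y, y_hat, y_svm))


def loss(y, y_hat, y_svm):
    """Instances that the inductive SVM labels like y_hat while y does not."""
    return sum((1 - a * g) / 2 * (1 + s * g) / 2 for a, g, s in zip(y, y_hat, y_svm))


def worst_case(y, candidates, y_svm, lam):
    """Objective of y against the least favorable candidate separator."""
    return min(gain(y, c, y_svm) - lam * loss(y, c, y_svm) for c in candidates)


def s4vm_predict(candidates, y_svm, lam=3.0):
    """Exhaustive max-min over all labelings in {-1, +1}^u (tiny u only)."""
    return max(
        product((-1, 1), repeat=len(y_svm)),
        key=lambda y: worst_case(y, candidates, y_svm, lam),
    )


# Two candidate separators that disagree only on the third instance.
candidates = [(1, 1, -1, -1), (1, 1, 1, -1)]
y_svm = (1, -1, -1, -1)
y_bar = s4vm_predict(candidates, y_svm)
print(y_bar, worst_case(y_bar, candidates, y_svm, 3.0))  # prints: (1, 1, -1, -1) 1.0
```

Both candidates say the second instance is +1, so the prediction overrides the SVM there; on the third instance they disagree, so the prediction keeps the SVM's label.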

Page 34: Theoretical property

S4VM is safe under the same assumption as S3VM (that is, the ground truth is a low-density separator)

[Li & Zhou, ICML’11]

Page 35: Multiple diverse separators

Generating multiple diverse low-density separators:

[Equation: the sum of the S3VM objective functionals of the separators, plus a penalty quantifying the (lack of) diversity among the separators]
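
One plausible form of such a diversity penalty, assumed here for illustration, counts pairs of separators whose label vectors on the unlabeled data are nearly identical, i.e. whose normalized agreement ŷ_s · ŷ_t / u reaches 1 - ε. Adding a large multiple of this count to the summed S3VM objectives discourages duplicate separators. The threshold ε and the function names are hypothetical.

```python
def agreement(y_a, y_b):
    """Normalized agreement y_a . y_b / u of two labelings in {-1, +1}^u; 1 means identical."""
    return sum(a * b for a, b in zip(y_a, y_b)) / len(y_a)


def diversity_penalty(labelings, eps=0.1):
    """Number of separator pairs whose labelings agree at level 1 - eps or higher."""
    return sum(
        1
        for s in range(len(labelings))
        for t in range(s + 1, len(labelings))
        if agreement(labelings[s], labelings[t]) >= 1 - eps
    )


print(diversity_penalty([(1, 1, 1, 1), (1, 1, 1, 1), (-1, -1, 1, 1)]))  # prints: 1
```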

Page 36: Multiple diverse separators (cont’d)

Non-convex task. We provide two implementations, resulting in two versions of S4VMs:

• S4VMa: global simulated annealing search

• S4VMs: sampling

Thus, we get:

[Li & Zhou, ICML’11]

Page 37: S4VMa

Simulated annealing (SA) is a popular probabilistic method for approaching the global optimum

S4VMa is based almost entirely on pure SA

Local search is added for speedup
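
A generic simulated annealing search over label vectors, of the kind S4VMa builds on, can be sketched as below. The objective, the single-label-flip move, the geometric cooling schedule and all parameter values are assumptions; S4VMa's actual objective and its local-search speedup are not reproduced here.

```python
import math
import random


def anneal_labels(objective, u, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Minimize objective(y) over y in {-1, +1}^u by simulated annealing."""
    rng = random.Random(seed)
    y = [rng.choice((-1, 1)) for _ in range(u)]
    current = objective(y)
    best, best_val = y[:], current
    temp = t0
    for _ in range(steps):
        j = rng.randrange(u)
        y[j] = -y[j]  # propose: flip one label
        proposed = objective(y)
        # Always accept improvements; accept a worse state with prob. exp(-delta / temp).
        if proposed <= current or rng.random() < math.exp((current - proposed) / temp):
            current = proposed
            if current < best_val:
                best, best_val = y[:], current
        else:
            y[j] = -y[j]  # reject: undo the flip
        temp *= cooling  # geometric cooling
    return best, best_val


# Toy objective: Hamming distance to a hidden target labeling (minimum 0).
target = [1, -1, 1, 1, -1]
best, best_val = anneal_labels(lambda y: sum(a != b for a, b in zip(y, target)), u=len(target))
print(best, best_val)
```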

Page 38: S4VMs

Find N > T low-density separators

Select T representative separators with large diversity
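
The selection step can be illustrated with a greedy farthest-point rule on the Hamming distance between label vectors. This is a hypothetical stand-in chosen for brevity, not necessarily the selection procedure used by S4VMs; it assumes the N candidates are sorted so that the first one has the best S3VM objective value.

```python
def hamming(a, b):
    """Number of unlabeled instances on which two labelings differ."""
    return sum(x != y for x, y in zip(a, b))


def select_diverse(candidates, T):
    """Greedily pick T of the N candidate labelings, each far from those already picked."""
    if not candidates:
        return []
    chosen = [0]  # assumption: candidates[0] has the best S3VM objective
    while len(chosen) < min(T, len(candidates)):
        rest = [i for i in range(len(candidates)) if i not in chosen]
        nxt = max(rest, key=lambda i: min(hamming(candidates[i], candidates[s]) for s in chosen))
        chosen.append(nxt)
    return [candidates[i] for i in chosen]


candidates = [(1, 1, 1, 1), (1, 1, 1, -1), (-1, -1, -1, -1), (-1, -1, -1, 1)]
print(select_diverse(candidates, T=2))  # prints: [(1, 1, 1, 1), (-1, -1, -1, -1)]
```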

Page 39: S4VM performance

TSVM often significantly degrades performance; S4VM never significantly degrades performance

S4VM accuracy is highly competitive with TSVM

S4VM beats TSVM in 28/46 cases with 10 labeled examples, and in 28/46 cases with 100 labeled examples

12 splits for benchmark data sets, 30 splits for UCI data sets; 10 separators; S4VMa gives similar observations (but is less efficient than S4VMs)

[Li & Zhou, ICML’11]

Page 40: Some comparisons

• S3VMmin: selecting the separator with the smallest objective value

• S3VMcom: combining all separators using uniform weights

S3VMmin and S3VMcom often degrade performance
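
The two baselines are easy to state in code. In this sketch, `objectives` holds each separator's S3VM objective value, and ties in the uniform vote are broken toward +1, an arbitrary assumption.

```python
def s3vm_min(labelings, objectives):
    """S3VMmin: the labeling of the separator with the smallest objective value."""
    best = min(range(len(labelings)), key=lambda t: objectives[t])
    return list(labelings[best])


def s3vm_com(labelings):
    """S3VMcom: uniform-weight vote of all separators (ties broken toward +1)."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*labelings)]


labelings = [(1, -1, 1), (1, 1, -1), (-1, 1, 1)]
print(s3vm_min(labelings, objectives=[0.5, 0.2, 0.9]))  # prints: [1, 1, -1]
print(s3vm_com(labelings))  # prints: [1, 1, 1]
```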

Page 41: The “best single”

• S3VMbest: selecting the best separator using the test set (i.e., by cheating)

The “best singles” are far from the ground truth

Selecting the “best single” is not safe

S4VM does not rely on a single separator

It is quite robust with respect to the assumption that the ground truth is among the low-density separators

Page 42: The talk involves joint work with

My students: Ming Li, Yu-Feng Li, Wei Wang

My collaborators: Tianshi Chen, Yunji Chen, Qi Guo, Weiwu Hu, Ling Li, Zhiwei Xu, … …

Page 43: For details

Theoretical studies on disagreement-based SSL:

W. Wang and Z.-H. Zhou. Analyzing co-training style algorithms. In: Proceedings of the 18th European Conference on Machine Learning (ECML'07), Warsaw, Poland, 2007, pp.454-465.

W. Wang and Z.-H. Zhou. A new analysis of co-training. In: Proceedings of the 27th International Conference on Machine Learning (ICML'10), Haifa, Israel, 2010, pp.1135-1142.

SSL for CPU design:

Q. Guo, T. Chen, Y. Chen, Z.-H. Zhou, W. Hu, and Z. Xu. Effective and efficient microprocessor design space exploration using unlabeled design configurations. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI'11), Barcelona, Spain, 2011, pp.1671-1677.

T. Chen, Y. Chen, Q. Guo, Z.-H. Zhou, L. Li, and Z. Xu. Effective and efficient microprocessor design space exploration using unlabeled design configurations. ACM Transactions on Intelligent Systems and Technology, in press.

Page 44: For details

Towards Safe SSL:

M. Li and Z.-H. Zhou. SETRED: Self-training with editing. In: Proceedings of the 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'05), Hanoi, Vietnam, 2005, pp.611-621.

Y.-F. Li and Z.-H. Zhou. Improving semi-supervised support vector machines through unlabeled instances selection. In: Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI'11), San Francisco, CA, 2011, pp.386-391.

Y.-F. Li and Z.-H. Zhou. Towards making unlabeled data never hurt. In: Proceedings of the 28th International Conference on Machine Learning (ICML'11), Bellevue, WA, 2011, pp.1081-1088.

Code: http://lamda.nju.edu.cn/code_S4VM.ashx

Thanks!