Unsupervised and Semi-Supervised Learning

Machine Learning, Fall 2010

L. Mihalkova, CSMC498F, Fall2010

Announcement

Information Session on Going to Graduate School in Computer Science

• Thursday, November 4, 2010

• In CSIC Room 1122

• From [time illegible in source] (light snacks provided)

Hear UMD CS faculty answer these and other questions:

Why should I go to graduate school?

What do admissions committees look for?

What should I expect from graduate school?

How do I choose a program?

How do I apply?

Administrivia

This week: continuing with unsupervised learning

More clustering, of a different flavor

Semi-supervised learning

Connections among ensembles, active learning, unsupervised learning, and semi-supervised learning

May start Reinforcement Learning

Reading

Optional: Chapter 17 of Manning, Raghavan, and Schuetze, “Introduction to Information Retrieval”: http://nlp.stanford.edu/IR-book/pdf/17hier.pdf

Hierarchical Clustering

Unlike k-means, which produces a “flat” clustering, hierarchical clustering produces a hierarchy of clusters

[Figure 17.5 from Ch. 17 of the IR book (online edition (c) 2009 Cambridge UP, Section 17.2, "Single-link and complete-link clustering"): a dendrogram of a complete-link clustering. The leaves are 30 news documents (for example "Fed holds interest rates steady", "Oil prices slip", "Lawsuit against tobacco companies"), and the axis ticks run from 1.0 to 0.0. The same 30 documents were clustered with single-link clustering in Figure 17.1.]

Bottom-Up Clustering

Initially each instance is in its own cluster

Clusters are continually merged

Will discuss the HAC Algorithm (Hierarchical Agglomerative Clustering)

HAC

Input: $\{\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_n\}$, real-number vectors

Initialize clusters: each $\bar{X}_i$ becomes its own cluster

Iterate:

Find the two most similar clusters $c_i$ and $c_j$

Replace $c_i$ and $c_j$ with $c_i \cup c_j$
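The HAC loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's reference implementation: the names `similarity`, `single_link`, and `hac` are made up here, instance similarity is assumed to be negative Euclidean distance, and single-link is used as a default cluster similarity. The search over all cluster pairs is deliberately naive (roughly cubic in the number of instances).

```python
import math
from itertools import combinations


def similarity(a, b):
    """Assumed instance similarity: negative Euclidean distance."""
    return -math.dist(a, b)


def single_link(ci, cj):
    """Cluster similarity = similarity of the two most similar members."""
    return max(similarity(a, b) for a in ci for b in cj)


def hac(vectors, cluster_sim=single_link, num_clusters=1):
    """Naive HAC: merge the two most similar clusters until num_clusters remain.

    Returns the remaining clusters and the ordered list of merges,
    which records the hierarchy from the bottom up.
    """
    clusters = [[tuple(v)] for v in vectors]  # each instance is its own cluster
    merges = []
    while len(clusters) > num_clusters:
        # Find the pair (i, j), i < j, with the highest cluster similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_sim(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # Replace c_i and c_j with their union (delete j first since j > i).
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return clusters, merges


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
clusters, merges = hac(points, num_clusters=2)
```

Stopping at `num_clusters=1` produces the full hierarchy; stopping earlier corresponds to cutting the dendrogram at some level.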

Another Look at the Dendrogram

[Figure 17.5 from Ch. 17 of the IR book, shown again: a dendrogram of a complete-link clustering of 30 news documents, with axis ticks running from 1.0 to 0.0.]

Computing Similarity Between Clusters

Single-link: the similarity between two clusters $c_i$ and $c_j$ is the similarity between their most similar members

Complete-link: the similarity between two clusters is the similarity between their most dissimilar members

Group-average: the similarity between two clusters is the average similarity over all possible pairs of instances in the clusters
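The three cluster-similarity choices differ only in how they aggregate pairwise instance similarities, which a small sketch makes concrete. The function names and the use of negative Euclidean distance as the instance similarity are assumptions for illustration. The group-average version here averages over cross-cluster pairs only; the variant in Chapter 17 of the IR book also counts pairs within the merged cluster, so treat this as one possible reading.

```python
import math


def similarity(a, b):
    """Assumed instance similarity: negative Euclidean distance."""
    return -math.dist(a, b)


def single_link(ci, cj, sim=similarity):
    """Similarity of the most similar pair of members."""
    return max(sim(a, b) for a in ci for b in cj)


def complete_link(ci, cj, sim=similarity):
    """Similarity of the most dissimilar pair of members."""
    return min(sim(a, b) for a in ci for b in cj)


def group_average(ci, cj, sim=similarity):
    """Average similarity over all cross-cluster pairs (an assumed reading)."""
    pairs = [(a, b) for a in ci for b in cj]
    return sum(sim(a, b) for a, b in pairs) / len(pairs)


ci = [(0.0,), (1.0,)]
cj = [(3.0,), (6.0,)]
scores = (single_link(ci, cj), complete_link(ci, cj), group_average(ci, cj))
```

For these two clusters the pairwise distances are 3, 6, 2, and 5, so single-link looks only at the closest pair, complete-link only at the farthest pair, and group-average at their mean.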

Top-Down Clustering

Also called “Divisive”

Start with a single cluster containing all instances

Use a flat clustering algorithm as a sub-routine to split clusters
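As a rough sketch of the top-down idea, the code below recursively splits a cluster with 2-means as the flat subroutine. Everything specific is an assumption made for illustration: the names `two_means` and `divisive`, the deterministic initialization (first point plus the point farthest from it), and the `min_size` stopping rule.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def two_means(points, iters=20):
    """Flat clustering subroutine: 2-means with a deterministic start."""
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))  # farthest point from c0
    left, right = list(points), []
    for _ in range(iters):
        left = [p for p in points if math.dist(p, c0) <= math.dist(p, c1)]
        right = [p for p in points if math.dist(p, c0) > math.dist(p, c1)]
        if not left or not right:
            break
        c0, c1 = mean(left), mean(right)
    return left, right


def divisive(points, min_size=1):
    """Start from one cluster and split recursively.

    Returns a nested tuple tree whose leaves are lists of points.
    Clusters with at most min_size points are not split further.
    """
    if len(points) <= min_size or len(set(points)) == 1:
        return points
    left, right = two_means(points)
    if not left or not right:
        return points
    return (divisive(left, min_size), divisive(right, min_size))


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
tree = divisive(pts, min_size=2)
```

Any flat clustering algorithm could replace `two_means`; the recursion only needs a way to split one cluster into two non-empty parts.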

Semi-Supervised Learning

Lots of different algorithms

e.g. EM

We’ll discuss a classic: co-training (Blum & Mitchell 98)

Received a 10-year best paper award in 2008

Co-Training Algorithm

Input:

Set of labeled instances L

Set of unlabeled instances U

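The attributes of each instance $\bar{X}_i$ can be split into two “views”, $\bar{X}^1_i$ and $\bar{X}^2_i$, that satisfy the following requirements:

Each view is sufficient for classification

The views are independent (in Blum & Mitchell's formulation, conditionally independent given the class label)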

Co-Training Algorithm (continued)

Create a pool U’ consisting of u randomly chosen examples from U

Loop for k iterations:

Use L to train $h_1$ only on features $\bar{X}^1_i$

Use L to train $h_2$ only on features $\bar{X}^2_i$

Use $h_1$ to label p positive and n negative examples from U’

Use $h_2$ to label p positive and n negative examples from U’

Add self-labeled examples to L

Add 2p + 2n examples from U to U’
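The co-training loop can be sketched as follows. This is a minimal illustration under several assumptions that are not in the slides: `CentroidClassifier` is a hypothetical stand-in learner (any classifier with a confidence score would do), labels are 1 for positive and 0 for negative, and each classifier labels the p pool examples it scores highest as positive and the n it scores lowest as negative.

```python
import math
import random


class CentroidClassifier:
    """Tiny stand-in learner (hypothetical): nearest class centroid."""

    def fit(self, X, y):
        pos = [x for x, label in zip(X, y) if label == 1]
        neg = [x for x, label in zip(X, y) if label == 0]
        self.pos = tuple(sum(c) / len(pos) for c in zip(*pos))
        self.neg = tuple(sum(c) / len(neg) for c in zip(*neg))
        return self

    def prob_pos(self, x):
        """Score in [0, 1] that x is positive, from centroid distances."""
        d_pos, d_neg = math.dist(x, self.pos), math.dist(x, self.neg)
        return 0.5 if d_pos + d_neg == 0 else d_neg / (d_pos + d_neg)


def co_train(L, U, k=10, u=6, p=1, n=1, seed=0):
    """L: list of ((view1, view2), label) containing both labels.
    U: list of unlabeled (view1, view2) pairs."""
    rng = random.Random(seed)
    L, U = list(L), list(U)
    rng.shuffle(U)
    pool, U = U[:u], U[u:]  # U': u randomly chosen examples from U
    h1 = h2 = None
    for _ in range(k):
        labels = [y for _, y in L]
        h1 = CentroidClassifier().fit([x[0] for x, _ in L], labels)  # view 1 only
        h2 = CentroidClassifier().fit([x[1] for x, _ in L], labels)  # view 2 only
        newly_labeled = []
        for h, view in ((h1, 0), (h2, 1)):
            ranked = sorted(pool, key=lambda x: h.prob_pos(x[view]))
            negatives, rest = ranked[:n], ranked[n:]
            cut = max(0, len(rest) - p)
            positives, pool = rest[cut:], rest[:cut]  # labeled items leave U'
            newly_labeled += [(x, 0) for x in negatives]
            newly_labeled += [(x, 1) for x in positives]
        L += newly_labeled  # add self-labeled examples to L
        refill = 2 * p + 2 * n
        pool, U = pool + U[:refill], U[refill:]  # replenish U' from U
    return h1, h2, L


L0 = [(((1.0,), (1.0,)), 1), (((-1.0,), (-1.0,)), 0)]
U0 = [((0.9,), (1.1,)), ((-0.8,), (-1.2,)), ((1.2,), (0.7,)),
      ((-1.1,), (-0.9,)), ((0.8,), (1.3,)), ((-1.3,), (-0.7,))]
h1, h2, L1 = co_train(L0, U0, k=3, u=6, p=1, n=1)
```

Note that each classifier is forced to label n negatives and p positives per round, so a badly unbalanced pool can introduce label noise; the toy data above keeps the pool balanced.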

Co-Training Prediction

Given a new instance $\bar{X}_i = \langle \bar{X}^1_i, \bar{X}^2_i \rangle$,

$$P(Y = y \mid \bar{X}_i) \propto P_{h_1}(Y = y \mid \bar{X}^1_i)\, P_{h_2}(Y = y \mid \bar{X}^2_i)$$
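The proportionality means the two per-view probabilities are multiplied and then renormalized over the labels. A small sketch, assuming each classifier's output is available as a dictionary from label to probability (the function name `combine_views` and that representation are illustrative choices):

```python
def combine_views(p_h1, p_h2):
    """Combine per-view class probabilities by product, then normalize.

    p_h1, p_h2: dicts mapping each label y to P_h1(Y=y | view 1)
    and P_h2(Y=y | view 2) for the same instance.
    """
    scores = {y: p_h1[y] * p_h2[y] for y in p_h1}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("both classifiers assign zero probability to every label")
    return {y: s / total for y, s in scores.items()}


posterior = combine_views({1: 0.8, 0: 0.2}, {1: 0.6, 0: 0.4})
prediction = max(posterior, key=posterior.get)
```

Here the unnormalized scores are 0.48 for label 1 and 0.08 for label 0, so the combined prediction is more confident than either view alone.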

Example: Document classification

View 1: Text on document

View 2: Anchor text in hyperlinks to document

Task: classify page as course webpage or not

Co-Testing

Co-training-like idea for active learning

Loop for k iterations

Use L to train $h_1$ only on features $\bar{X}^1_i$

Use L to train $h_2$ only on features $\bar{X}^2_i$

Treat $h_1$ and $h_2$ as a committee of size 2 and request labels of unlabeled instances on which they disagree
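The co-testing loop can be sketched as below. The specifics are assumptions for illustration only: each view is a single number, `train_threshold` is a hypothetical stand-in learner that thresholds at the midpoint of the class means, `oracle` stands for the human labeler, and one disagreement is queried per round in list order (the slides do not say which disagreements to query first).

```python
def train_threshold(examples):
    """Hypothetical 1-D learner: threshold midway between the class means.

    examples: list of (value, label) with labels 0 and 1 both present.
    Returns a classifier mapping a value to 0 or 1.
    """
    pos = [v for v, y in examples if y == 1]
    neg = [v for v, y in examples if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda v: 1 if v > threshold else 0


def contention_points(h1, h2, U):
    """Unlabeled instances (view1, view2) on which the two classifiers disagree."""
    return [x for x in U if h1(x[0]) != h2(x[1])]


def co_test(L, U, oracle, train=train_threshold, k=5, queries_per_round=1):
    """L: list of ((view1, view2), label); U: list of (view1, view2)."""
    L, U = list(L), list(U)
    for _ in range(k):
        h1 = train([(x[0], y) for x, y in L])  # view 1 only
        h2 = train([(x[1], y) for x, y in L])  # view 2 only
        disagreements = contention_points(h1, h2, U)
        if not disagreements:
            break  # the committee agrees everywhere; nothing useful to ask
        for x in disagreements[:queries_per_round]:
            L.append((x, oracle(x)))  # request the true label
            U.remove(x)
    return L, U


L0 = [((0.0, 0.0), 0), ((10.0, 10.0), 1)]
U0 = [(1.0, 1.0), (9.0, 9.0), (4.0, 6.0)]
L1, U1 = co_test(L0, U0, oracle=lambda x: 1)
```

In this toy run only (4.0, 6.0) falls on opposite sides of the two thresholds, so it is the single instance sent to the oracle.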