Unsupervised and Semi-Supervised Learning
Machine Learning, Fall 2010
L. Mihalkova, CSMC498F, Fall2010
Announcement
Information Session on Going to Graduate School in Computer Science
• Thursday, November 4, 2010
• In CSIC Room 1122
• From [start time illegible] pm to [end time illegible] pm (light snacks provided)
Hear UMD CS faculty answer these and other questions:
Why should I go to graduate school?
What do admissions committees look for?
What should I expect from graduate school?
How do I choose a program?
How do I apply?
Administrivia
This week: continuing on unsupervised learning
More clustering, of a different flavor
Semi-supervised learning
Intersections between ensembles, active, un-sup and semi-sup learning
May start Reinforcement Learning
Reading
Optional: Chapter 17 of Manning, Raghavan, and Schütze, “Introduction to Information Retrieval”: http://nlp.stanford.edu/IR-book/pdf/17hier.pdf
Hierarchical Clustering
As opposed to k-means, which produces a “flat” clustering, here we produce a hierarchy of clusters
[Figure 17.5: A dendrogram of a complete-link clustering. The same 30 documents were clustered with single-link clustering in Figure 17.1. The similarity axis runs from 1.0 to 0.0; the leaves are news headlines.]
From Ch. 17 of the IR Book (online edition, (c) 2009 Cambridge UP; Section 17.2, Single-link and complete-link clustering, p. 383)
Bottom-Up Clustering
Initially each instance is in its own cluster
Clusters are continually merged
Will discuss the HAC Algorithm (Hierarchical Agglomerative Clustering)
HAC
Input: $\{\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_n\}$, real-number vectors
Initialize clusters: each $\bar{X}_i$ becomes its own cluster
Iterate:
  Find two most similar clusters ci and cj
  Replace ci and cj with ci ∪ cj
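The HAC loop can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the names `similarity`, `single_link`, and `hac` are made up here, similarity is assumed to be negative Euclidean distance, and single-link is assumed as the cluster similarity. Clusters are stored as sets of indices into the input list.

```python
import math
from itertools import combinations


def similarity(x, y):
    """Assumed instance similarity: negative Euclidean distance (larger = more similar)."""
    return -math.dist(x, y)


def single_link(ci, cj, points):
    """Assumed cluster similarity: similarity of the two most similar members."""
    return max(similarity(points[a], points[b]) for a in ci for b in cj)


def hac(points, cluster_sim=single_link):
    """Merge the two most similar clusters until one remains; return the merge history."""
    clusters = [frozenset([i]) for i in range(len(points))]  # each instance on its own
    merges = []
    while len(clusters) > 1:
        # Find the two most similar clusters ci and cj.
        ci, cj = max(
            combinations(clusters, 2),
            key=lambda pair: cluster_sim(pair[0], pair[1], points),
        )
        # Replace ci and cj with their union.
        clusters = [c for c in clusters if c not in (ci, cj)] + [ci | cj]
        merges.append((sorted(ci), sorted(cj)))
    return merges


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
merges = hac(points)
for left, right in merges:
    print("merge", left, "with", right)
```

The merge history read bottom-up is the dendrogram. This naive version recomputes all pairwise cluster similarities on every iteration, which is fine for a handful of points but not for large inputs.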
Another Look at the Dendrogram
[Figure 17.5 from Ch. 17 of the IR Book, shown again: dendrogram of a complete-link clustering of 30 documents, with the similarity axis running from 1.0 to 0.0.]
Computing Similarity Between Clusters
Single-link: Similarity between two clusters ci and cj is computed as similarity between their most similar members
Complete-link: Computed as similarity between most dissimilar members
Group-average clustering: similarity of two clusters computed as the average similarity over all possible pairs of instances in the clusters
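The three cluster-similarity definitions differ only in how they aggregate the pairwise similarities. The sketch below makes that concrete; the function names and the toy similarity `sim_1d` are assumptions for illustration. Group-average is read here as the average over cross-cluster pairs only, which is one common reading; some formulations also count pairs inside the merged cluster.

```python
from itertools import product


def pair_sims(ci, cj, sim):
    """Similarities of all cross-cluster pairs (one member from each cluster)."""
    return [sim(a, b) for a, b in product(ci, cj)]


def single_link(ci, cj, sim):
    return max(pair_sims(ci, cj, sim))  # most similar members


def complete_link(ci, cj, sim):
    return min(pair_sims(ci, cj, sim))  # most dissimilar members


def group_average(ci, cj, sim):
    sims = pair_sims(ci, cj, sim)
    return sum(sims) / len(sims)        # average over cross-cluster pairs (assumed reading)


def sim_1d(a, b):
    """Toy similarity for numbers in [0, 1]: one minus the absolute difference."""
    return 1.0 - abs(a - b)


ci, cj = [0.0, 0.25], [0.5, 1.0]
print(single_link(ci, cj, sim_1d))    # 0.75
print(complete_link(ci, cj, sim_1d))  # 0.0
print(group_average(ci, cj, sim_1d))  # 0.375
```

On the same pair of clusters, single-link is the most optimistic score and complete-link the most pessimistic, with group-average in between.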
Top-Down Clustering
Also called “Divisive”
Start with a single cluster containing all instances
Use a flat clustering algorithm as a sub-routine to split clusters
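A minimal sketch of the top-down idea, assuming 2-means as the flat clustering sub-routine. The slides do not say which flat algorithm to use, so `two_means`, its deterministic seeding, and the `min_size` stopping rule are all illustrative choices.

```python
import math


def two_means(points, iters=10):
    """Assumed flat sub-routine: 2-means seeded with the first point and the
    point farthest from it (deterministic, for illustration only)."""
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))
    for _ in range(iters):
        left = [p for p in points if math.dist(p, c0) <= math.dist(p, c1)]
        right = [p for p in points if math.dist(p, c0) > math.dist(p, c1)]
        if not left or not right:
            break
        # Move each center to the mean of its assigned points.
        c0 = tuple(sum(v) / len(left) for v in zip(*left))
        c1 = tuple(sum(v) / len(right) for v in zip(*right))
    return left, right


def divisive(points, min_size=1):
    """Start from one cluster with all instances and split recursively.
    Leaves are lists of points; internal nodes are (left, right) tuples."""
    if len(points) <= min_size:
        return points
    left, right = two_means(points)
    if not left or not right:  # could not split, e.g. all points identical
        return points
    return (divisive(left, min_size), divisive(right, min_size))


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
tree = divisive(points, min_size=2)
print(tree)
```

The nested tuples play the same role as the dendrogram in the bottom-up case, only built from the root downward.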
Semi-Supervised Learning
Lots of different algorithms
e.g. EM
We’ll discuss a classic: co-training (Blum & Mitchell 98)
Awarded 10-Year best paper award in 2008
Co-Training Algorithm
Input:
Set of labeled instances L
Set of unlabeled instances U
The attributes of each instance $\bar{X}_i$ can be split into two “views”, $\bar{X}_i^1$ and $\bar{X}_i^2$, that satisfy the following requirements:
  Each view is sufficient for classification on its own
  The views are independent given the class label
Co-Training Algorithm
Create a pool U’ consisting of u randomly chosen examples from U
Loop for k iterations:
  Use L to train h1 only on features $\bar{X}_i^1$
  Use L to train h2 only on features $\bar{X}_i^2$
  Use h1 to label p positive and n negative examples from U’
  Use h2 to label p positive and n negative examples from U’
  Add self-labeled examples to L
  Add 2p + 2n examples from U to U’
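The loop above can be sketched as follows. Everything specific here is an assumption made for illustration: each view is a single number, `CentroidClassifier` is a toy stand-in for h1 and h2, each classifier labels the examples it is most confident about, and examples labeled by h1 are removed from the pool before h2 chooses.

```python
import math
import random


class CentroidClassifier:
    """Toy stand-in for h1/h2 (an assumption): compares distances to class means."""

    def fit(self, xs, ys):
        self.mu = {c: sum(x for x, y in zip(xs, ys) if y == c) / ys.count(c)
                   for c in (0, 1)}
        return self

    def prob_pos(self, x):
        d0, d1 = abs(x - self.mu[0]), abs(x - self.mu[1])
        return 1.0 / (1.0 + math.exp(d1 - d0))  # > 0.5 when nearer the positive mean


def train_views(L):
    """Train h1 on view 1 only and h2 on view 2 only."""
    ys = [y for _, y in L]
    h1 = CentroidClassifier().fit([x[0] for x, _ in L], ys)
    h2 = CentroidClassifier().fit([x[1] for x, _ in L], ys)
    return h1, h2


def co_train(L, U, k, u, p=1, n=1, seed=0):
    """L: list of ((x1, x2), y) with y in {0, 1}; U: list of (x1, x2); p, n >= 1."""
    rng = random.Random(seed)
    L, U = list(L), list(U)
    rng.shuffle(U)
    pool, U = U[:u], U[u:]                      # U': u randomly chosen examples
    for _ in range(k):
        for view, h in enumerate(train_views(L)):
            if len(pool) < p + n:
                break
            ranked = sorted(pool, key=lambda x: h.prob_pos(x[view]))
            L += [(x, 0) for x in ranked[:n]]   # n most confidently negative
            L += [(x, 1) for x in ranked[-p:]]  # p most confidently positive
            pool = ranked[n:-p]
        pool, U = pool + U[:2 * p + 2 * n], U[2 * p + 2 * n:]  # replenish U'
    return train_views(L), L


labeled = [((0.0, 0.1), 0), ((1.0, 0.9), 1)]
unlabeled = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.0), (0.8, 1.0),
             (0.05, 0.15), (0.95, 0.85), (0.15, 0.1), (0.85, 0.9)]
(h1, h2), grown = co_train(labeled, unlabeled, k=2, u=8)
print(len(grown), "labeled examples after co-training")
```

In this toy run the two views agree with the class, so every self-assigned label is correct. With noisier views, forcing p positive and n negative labels per round can add mislabeled examples to L.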
Co-Training Prediction
Given a new instance $\bar{X}_i = \langle \bar{X}_i^1, \bar{X}_i^2 \rangle$,
$$P(Y = y \mid \bar{X}_i) \propto P_{h_1}(Y = y \mid \bar{X}_i^1)\, P_{h_2}(Y = y \mid \bar{X}_i^2)$$
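The prediction rule multiplies the two per-view posteriors and renormalises so the result sums to one. A small sketch, where the function name `co_training_predict` and the example probabilities are hypothetical:

```python
def co_training_predict(p1, p2):
    """Combine per-view class posteriors by multiplying and renormalising.

    p1, p2: dicts mapping each label y to P_h1(Y=y | view 1) and P_h2(Y=y | view 2).
    """
    scores = {y: p1[y] * p2[y] for y in p1}
    total = sum(scores.values())
    posterior = {y: s / total for y, s in scores.items()}
    return max(posterior, key=posterior.get), posterior


label, posterior = co_training_predict(
    {"course": 0.6, "other": 0.4},  # hypothetical output of h1
    {"course": 0.9, "other": 0.1},  # hypothetical output of h2
)
print(label, posterior)
```

Here the unnormalised scores are 0.54 and 0.04, so the combined posterior for "course" is 0.54 / 0.58, higher than either classifier reports alone because both views point the same way.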
Example: Document classification
View 1: Text on document
View 2: Anchor text in hyperlinks to document
Task: classify page as course webpage or not
Co-Testing
Co-training-like idea for active learning
Loop for k iterations:
  Use L to train h1 only on features $\bar{X}_i^1$
  Use L to train h2 only on features $\bar{X}_i^2$
  Treat h1 and h2 as a committee of size 2 and request labels of unlabeled instances on which they disagree
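A minimal sketch of the co-testing loop, under several assumptions: each view is a single number, `train_threshold` is a toy learner, `oracle` stands in for the human labeler, and every disagreement found in an iteration is queried at once. All names are illustrative.

```python
def train_threshold(examples):
    """Toy learner (assumed): threshold halfway between the two class means."""
    def mean(c):
        values = [v for v, y in examples if y == c]
        return sum(values) / len(values)
    t = (mean(0) + mean(1)) / 2
    return lambda v: int(v > t)


def co_test(L, U, oracle, k):
    """L: list of ((x1, x2), y); U: list of (x1, x2); oracle(x) returns the true label."""
    L, U = list(L), list(U)
    for _ in range(k):
        h1 = train_threshold([(x[0], y) for x, y in L])  # view 1 only
        h2 = train_threshold([(x[1], y) for x, y in L])  # view 2 only
        # Committee of size 2: query the instances on which h1 and h2 disagree.
        queries = [x for x in U if h1(x[0]) != h2(x[1])]
        if not queries:
            break
        L += [(x, oracle(x)) for x in queries]
        U = [x for x in U if x not in queries]
    return L


labeled = [((0.0, 0.0), 0), ((1.0, 1.0), 1)]
unlabeled = [(0.2, 0.1), (0.9, 0.8), (0.4, 0.7), (0.6, 0.3)]
result = co_test(labeled, unlabeled, oracle=lambda x: int(x[0] + x[1] > 1.0), k=3)
print(result[2:])  # the instances that were sent to the oracle, with their labels
```

Only the two instances where the views conflict are sent for labeling; the instances on which both classifiers agree cost no labeling effort.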