Multi-Label Prediction via Compressed Sensing
By
Daniel Hsu, Sham M. Kakade, John Langford, Tong Zhang
(NIPS 2009)
Presented by: Lingbo Li, ECE, Duke University
01-22-2010
* Some notes are directly copied from the original paper.
Outline
• Introduction
• Preliminaries
• Learning Reduction
• Compression and Reconstruction
• Empirical Results
• Conclusion
Introduction
• Large database of images;
• Goal: predict who or what is in a given image.
Samples: images $x$ with corresponding label vectors $y = (y_1, y_2, \ldots, y_d) \in \{0,1\}^d$;
$d$ is the total number of entities in the whole database.
• One-against-all algorithm:
Learn a binary predictor for each label (class).
Computation is expensive when $d$ is large, e.g., $d = 10^3$ or $10^4$.
• Assume the output vector $y$ is sparse.
Introduction
[Figure: an example image $x$ and candidate entities {Mike, James, Julie, Nick, Joe, Linda, ...}; the label vector $y \in \{0,1\}^d$ has 1s only at the entries of the entities actually present (here $y_5, y_8, y_{17}, y_{31}, y_{56}, y_{97}$), so $y$ is sparse.]
Main idea: “Learn to predict compressed label vectors, and then use a sparse reconstruction algorithm to recover uncompressed labels from these predictions.”
Compressed sensing: for any sparse vector $y \in \mathbb{R}^d$, it is possible (with high probability) to compress $y$ down to a dimension logarithmic in $d$ and still reconstruct $y$ perfectly.
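As a concrete illustration of this compress-then-reconstruct idea, here is a minimal sketch (not the authors' code): the Gaussian sensing matrix, the dimensions, and scikit-learn's OMP solver are all illustrative choices.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
d, k = 1000, 4    # label dimension and sparsity (hypothetical sizes)
m = 64            # compressed dimension, on the order of k*log(d) << d

# A k-sparse binary label vector y in {0,1}^d
y = np.zeros(d)
y[rng.choice(d, size=k, replace=False)] = 1.0

# Compress: z = A @ y with a random Gaussian matrix A (m x d)
A = rng.normal(size=(m, d)) / np.sqrt(m)
z = A @ y

# Reconstruct with OMP, assuming the sparsity level k is known
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k).fit(A, z)
print("true support:     ", sorted(np.flatnonzero(y)))
print("recovered support:", sorted(np.flatnonzero(np.abs(omp.coef_) > 0.5)))
```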
Preliminaries
• $\mathcal{X}$: input space;
• $\mathcal{Y} \subseteq \mathbb{R}^d$: output (label) space, where $d$ is the number of labels;
• Training data: samples $(x_1, y_1), \ldots, (x_n, y_n)$ from a distribution over $\mathcal{X} \times \mathcal{Y}$;
• Goal: learn a predictor $F : \mathcal{X} \to \mathbb{R}^d$ with low mean-squared error $\mathbb{E}_x \| F(x) - \mathbb{E}[y \mid x] \|_2^2$.
Assume:
• $d$ is very large;
• the expected value $\mathbb{E}[y \mid x]$ is sparse, with only a few non-zero entries.
Learning reduction
• Linear compression function $A : \mathbb{R}^d \to \mathbb{R}^m$, where $m \ll d$;
• Goal: learn a predictor $H : \mathcal{X} \to \mathbb{R}^m$ of the compressed labels.

Original problem: from samples $(x, y)$, predict the label $y$ with the predictor $F$, to minimize $\mathbb{E}_x \| F(x) - \mathbb{E}[y \mid x] \|_2^2$.
Reduced problem: from compressed samples $(x, Ay)$, predict the compressed label $Ay$ with the predictor $H$, to minimize $\mathbb{E}_x \| H(x) - A\,\mathbb{E}[y \mid x] \|_2^2$.

Reduction: training and prediction
Reconstruction algorithm $R$: the overall predictor is $F(x) = R(H(x))$.
If $H(x)$ is close to $A\,\mathbb{E}[y \mid x]$, then $R(H(x))$ should be close to $\mathbb{E}[y \mid x]$.
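A compact sketch of this training/prediction pipeline, assuming ridge regression for $H$ and OMP for $R$; both are stand-ins, since the reduction works with any regression method and any valid reconstruction algorithm:

```python
import numpy as np
from sklearn.linear_model import Ridge, OrthogonalMatchingPursuit

def train_reduction(X, Y, m, rng):
    """Compress the labels, then fit one regression per compressed coordinate."""
    d = Y.shape[1]
    A = rng.normal(size=(m, d)) / np.sqrt(m)  # random compression matrix A
    Z = Y @ A.T                               # compressed labels Ay, shape (n, m)
    H = Ridge(alpha=1.0).fit(X, Z)            # H: X -> R^m
    return A, H

def predict_reduction(X, A, H, k):
    """F(x) = R(H(x)): predict compressed labels, then reconstruct k-sparse y."""
    Z_hat = H.predict(X)
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k)
    return np.array([omp.fit(A, z).coef_ for z in Z_hat])
```

The key point is that only $m$ regression problems are trained instead of $d$.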
Compression Functions
Examples of valid compression functions (random matrices satisfying a restricted-isometry-style property):
• Gaussian matrices: $A_{ij} \sim N(0, 1/m)$;
• Bernoulli matrices: $A_{ij} \in \{\pm 1/\sqrt{m}\}$ uniformly at random;
• Random rows of Fourier or Hadamard matrices.
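A sketch of constructing the first two (the sizes $m$ and $d$ are illustrative; random rows of a Hadamard matrix appear in the experiments sketch later):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 128, 1024   # compressed and original label dimensions (illustrative)

# Gaussian compression matrix: entries drawn i.i.d. from N(0, 1/m)
A_gaussian = rng.normal(scale=1.0 / np.sqrt(m), size=(m, d))

# Bernoulli compression matrix: entries +-1/sqrt(m) with equal probability
A_bernoulli = rng.choice([-1.0, 1.0], size=(m, d)) / np.sqrt(m)
```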
Reconstruction Algorithms
Examples of valid reconstruction algorithms (iterative and greedy methods):
• Orthogonal Matching Pursuit (OMP)
• Forward-Backward Greedy (FoBa)
• Compressive Sampling Matching Pursuit (CoSaMP)
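For concreteness, a minimal NumPy sketch of OMP in its textbook form (not the implementation benchmarked in the paper):

```python
import numpy as np

def omp(A, z, k):
    """Orthogonal Matching Pursuit: recover a k-sparse y from z ~ A @ y."""
    m, d = A.shape
    support = []
    residual = z.astype(float).copy()
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit least squares on the selected support and update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], z, rcond=None)
        residual = z - A[:, support] @ coef
    y_hat = np.zeros(d)
    y_hat[support] = coef
    return y_hat
```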
General Robustness Guarantees
What if the reduction creates a problem that is harder to solve than the original one?
Sparsity error is defined as $\mathrm{sperr}_k(y) = \| y - y_{(k)} \|_2^2$,
where $y_{(k)}$ is the best $k$-sparse approximation of $y$ (the $k$ largest-magnitude entries kept, the rest zeroed).
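A direct translation of this definition (the function name is mine):

```python
import numpy as np

def sparsity_error(y, k):
    """||y - y_(k)||_2^2, where y_(k) keeps the k largest-magnitude entries of y."""
    y = np.asarray(y, dtype=float)
    y_k = np.zeros_like(y)
    top = np.argsort(np.abs(y))[-k:]   # indices of the k largest |y_i|
    y_k[top] = y[top]
    return float(np.sum((y - y_k) ** 2))
```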
Linear Prediction
• If there is a perfect linear predictor of $y$, then there is a perfect linear predictor of $Ay$:
• if $\mathbb{E}[y \mid x] = Bx$ for some matrix $B$, then $\mathbb{E}[Ay \mid x] = ABx$;
• so guarantees for the compressed linear problem transfer back to the original problem in this setting.
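A quick numerical check of this composition claim, in a noiseless linear setup with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d, m = 500, 10, 100, 16
B = rng.normal(size=(d, p))               # true linear map: E[y|x] = B x
X = rng.normal(size=(n, p))
Y = X @ B.T                               # noiseless labels
A = rng.normal(size=(m, d)) / np.sqrt(m)  # compression matrix

# Least-squares regression onto the compressed labels recovers A @ B
W, *_ = np.linalg.lstsq(X, Y @ A.T, rcond=None)  # W has shape (p, m)
print(np.allclose(W.T, A @ B))                    # True: AB perfectly predicts Ay
```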
Experimental Results
• Experiment 1: Image data (collected by the ESP Game)
65k images, 22k unique labels; keep the 1k most frequent labels;
the least frequent occurs 39 times while the most frequent occurs about 12k times, 4 labels on average per image;
Half of the data as training and half as testing.
• Experiment 2: Text data (collected from http://delicious.com/)
16k labeled web pages, 983 unique labels;
the least frequent occurs 21 times, the most frequent occurs about 6500 times, 19 labels on average per web page;
Half of the data as training and half as testing.
• Compression function A: select m random rows of the 1024 × 1024 Hadamard matrix (see the sketch below).
• Test the greedy and iterative reconstruction algorithms: OMP, FoBa, CoSaMP, and Lasso.
• Use correlation decoding (CD) as a baseline method for comparisons.
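A sketch of this compression-matrix construction; the normalization and the choice of $m$ are mine, and scipy.linalg.hadamard requires a power-of-two size:

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
d, m = 1024, 128   # 1024 x 1024 Hadamard matrix; m is illustrative

H = hadamard(d)                              # orthogonal +-1 matrix
rows = rng.choice(d, size=m, replace=False)  # pick m distinct random rows
A = H[rows] / np.sqrt(m)                     # compression matrix A: R^d -> R^m
```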
Experimental Results
• Measures: the precision of the predicted (top-$k$) labels, and the squared $\ell_2$ distance to the true label vector $y$.
[Figure: precision and squared $\ell_2$ distance results. Top two panels: image data; bottom: text data.]
Conclusion
• Application of compressed sensing to the multi-label prediction problem with output sparsity;
• Efficient reduction: the number of regression problems is logarithmic in the number of labels, rather than equal to it;
• Robustness guarantees carry over from the compressed problem to the original one, and vice versa in the linear prediction setting.