LEARNING AND APPLICATIONS
REGULARIZATION METHODS FOR HIGH DIMENSIONAL LEARNING
Francesca Odone and Lorenzo Rosasco
[email protected] - [email protected]
Regularization Methods for High Dimensional Learning - Learning and applications
PLAN
Learning and engineering applications: why?
Examples of in-house applications
- Face and object detection
- Medical image analysis
- Microarray data analysis
LET’S GO BACK TO THE BEGINNING
The goal is not to memorize but to generalize (or to predict)
Given a set of data
(x1, y1), . . . , (xn, yn)
find a function f which is a good predictor of y for a future input x

f(x) = y
WHAT IS IT USEFUL FOR?
The learning paradigm is useful whenever the underlying process is
- partially unknown,
- too complex, or
- too noisy
to be modeled as a sequence of instructions.
THE APPLICATIONS WE DEAL WITH
Computer vision
- Face detection and recognition
- Object detection
- Image annotation
- Dynamic events and actions analysis
Medical Image Analysis
- Automatic MR annotation
- Dictionary learning
Computational biology
- Gene selection
PLAN
Learning and engineering applications: why?
Examples of in-house applications
- Face and object detection
- Medical image analysis
- Microarray data analysis
LEARNING FROM IMAGES
Object detection, image categorization and, more in general, image understanding are difficult problems.
Learning from examples has been accepted as a viable way to deal with such problems, addressing noise and intra-class variability by collecting appropriate data and finding suitable descriptions.
Images are relatively easy to gather
IMAGE DESCRIPTIONS
WITH OVERCOMPLETE FEATURE SETS
Overcomplete, general-purpose sets of features are effective for modeling visual information.
Many object classes have peculiar intrinsic structures that can be better appreciated if one looks for symmetries or local geometries.
Examples of features: wavelets, ranklets, chirplets, rectangle features, ...
Examples of problems: face detection [Heisele et al., Viola & Jones, Destrero et al.], pedestrian detection [Oren et al.], car detection [Papageorgiou & Poggio]
The approach is inspired by biological systems. See, for instance, B. A. Olshausen and D. J. Field, "Sparse coding with an over-complete basis set: a strategy employed by V1?", 1997.
FACE DETECTION
DESTRERO ET AL., 2009
THE CLASSIFICATION PROBLEM
It is a (binary) classification problem:
→ each image region can either be a face or not
We start from a training set of face and non-face images:
{(x1, y1), . . . , (xn, yn)}
xi is a raw vector encoding the gray levels of image Ii, and yi ∈ {−1, 1} according to whether the image is a face or not
IMAGE REPRESENTATION
We represent images as rectangle feature vectors:
xi → (φ1(xi), . . . , φp(xi))
FACE DETECTION
ASSUMPTION
We assume

Φβ = Y

where Φ = {Φij} is the data matrix, β = (β1, ..., βp)^T is the vector of unknown weights to be estimated, and Y = (y1, ..., yn)^T the vector of output labels
Usually p is large; existence of a solution is ensured, but uniqueness is not
The overcomplete set contains many correlated features
Thus, the problem is ill-posed. We resort to regularization.
SELECT FACE FEATURES
L1 regularization allows us to select a sparse subset of meaningful features for the problem, with the aim of discarding correlated ones:

min_{β ∈ IR^p} ‖Y − Φβ‖² + λ‖β‖₁
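As a minimal sketch (not the authors' implementation: scikit-learn's Lasso stands in for their solver, and the data are synthetic placeholders for the rectangle-feature matrix), L1-based selection keeps the features with nonzero coefficients:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 1000                          # n examples, p overcomplete features
Phi = rng.standard_normal((n, p))         # stand-in for the feature matrix
beta_true = np.zeros(p)
beta_true[:5] = 1.0                       # only 5 truly relevant features
Y = np.sign(Phi @ beta_true + 0.1 * rng.standard_normal(n))  # labels in {-1, +1}

# min_beta ||Y - Phi beta||^2 + lambda ||beta||_1
lasso = Lasso(alpha=0.1).fit(Phi, Y)
selected = np.flatnonzero(lasso.coef_)    # indices of the selected features
print(f"{len(selected)} features selected out of {p}")
```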
A SAMPLED VERSION OF THE ALGORITHM
Applying the algorithm starting from the entire set of features is not computationally feasible (Φ: 4000x64000 ≃ 1 GB)
We create many subsets of features, randomly sampled with repetition
We run the algorithm separately on each subset
We keep only features selected in every run in which they were present
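The subset-based procedure can be sketched as follows (a simplified stand-in: scikit-learn's Lasso replaces the thresholded Landweber solver, and the subset counts are scaled down from the 200 subsets of the slide):

```python
import numpy as np
from sklearn.linear_model import Lasso

def subsampled_selection(Phi, Y, n_subsets=20, frac=0.1, alpha=0.1, seed=0):
    """Run L1 selection on random feature subsets and keep a feature only
    if it was selected in every run in which it appeared."""
    rng = np.random.default_rng(seed)
    p = Phi.shape[1]
    appeared = np.zeros(p, dtype=int)     # runs in which each feature appeared
    kept = np.zeros(p, dtype=int)         # runs in which it was selected
    for _ in range(n_subsets):
        idx = rng.choice(p, size=int(frac * p), replace=True)  # with repetition
        idx = np.unique(idx)
        coef = Lasso(alpha=alpha).fit(Phi[:, idx], Y).coef_
        appeared[idx] += 1
        kept[idx[coef != 0]] += 1
    return np.flatnonzero((appeared > 0) & (kept == appeared))

rng = np.random.default_rng(1)
Phi = rng.standard_normal((100, 400))
Y = Phi[:, :3].sum(axis=1) + 0.1 * rng.standard_normal(100)
stable = subsampled_selection(Phi, Y, n_subsets=10, frac=0.2)
```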
[Diagram: from the initial feature set S0, 200 subsets of 10% of the features are randomly extracted with repetition; thresholded Landweber selection is run on each subset; the final set S1 keeps only the features selected in every run in which they were present.]
THE FINAL SET OF FACE FEATURES
Positive and negative samples from the training set
Notice how vertical symmetries are not captured by selected features
THE SOLUTION DEPENDS ON THE TRAINING DATA
In the MIT+CMU training set all images are registered and well cropped
Vertical symmetries are captured by selected features
FACE DETECTION
FACE CLASSIFICATION
Elastic net regularization embeds both feature selection and prediction functionalities.
As suggested in (Candes & Tao, 2007), in order to improve the classification performance one could use L2 regularization on the reduced data representation.
Since a main requirement of our application is real-time performance, we adopt a linear SVM for classification:
L1 + SVM gives us sparsity both on the representation and on the dataset, and thus fewer computations
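A minimal sketch of this two-stage L1 + SVM pipeline (with synthetic data in place of the face features; the solvers and parameters are illustrative, not those of the paper):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n, p = 300, 500
X = rng.standard_normal((n, p))
w = np.zeros(p)
w[:10] = 1.0
y = np.sign(X @ w + 0.2 * rng.standard_normal(n))

# Stage 1: L1 regularization gives a sparse representation
sel = np.flatnonzero(Lasso(alpha=0.05).fit(X, y).coef_)

# Stage 2: a linear SVM on the reduced representation (fast at test time)
clf = LinearSVC(max_iter=10000).fit(X[:, sel], y)
print(f"{len(sel)} features kept, train accuracy {clf.score(X[:, sel], y):.2f}")
```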
FACE CLASSIFICATION RESULTS
[ROC curves, hit rate 0.7 to 1 vs false positive rate 0 to 0.02, comparing: 2-stage feature selection; 2-stage feature selection + correlation; Viola & Jones feature selection using our same data; Viola & Jones cascade performance.]
Our strategy for feature selection outperforms the one by Viola and Jones using the same dataset
AdaBoost seems to need a large number of examples to be trained effectively (we used just 4000 examples)
FROM FACE CLASSIFICATION TO FACE DETECTION
WHY IS IT DIFFICULT?
Any given region of a real image is very unlikely to contain a face
→ even a small per-window error rate yields a high number of false positives
Image dimensions: 384x222 px
∼ 6.5 · 10^5 tests in a multi-scale search with a base window of 19x19 px
Only 11 faces!
FACE DETECTION: A CASCADE OF CLASSIFIERS
For each image we have many tests to do
→ few positive examples and many negative examples
We build a coarse-to-fine classification architecture:
→ Simpler classifiers are used to reject the majority of sub-windows
→ More complex classifiers allow us to achieve low false positive rates
FACE DETECTION: A CASCADE OF CLASSIFIERS
1. Start from the set S of selected features
2. Choose at least 3 mutually distant features
3. Train a linear SVM classifier using those features and test it on a validation set
4. Do we reach target performance (h = 99.5%; f = 50%)?
YES: finalize the classifier, remove the used features from S, and go to (2).
NO: add a feature from S and go to (3).

F = ∏_{i=1}^{K} f_i and H = ∏_{i=1}^{K} h_i
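To see how the per-stage rates compose, here is a small numeric check with a hypothetical number of stages K (the per-stage targets h and f are those from the slide; K = 10 is an assumption for illustration):

```python
# Overall cascade rates: F = product of the f_i, H = product of the h_i
K = 10                      # hypothetical number of cascade stages
h, f = 0.995, 0.5           # per-stage hit rate and false positive rate
H = h ** K                  # overall hit rate
F = f ** K                  # overall false positive rate
print(f"H = {H:.4f}, F = {F:.6f}")  # H ≈ 0.9511, F ≈ 0.000977
```

Rejecting half of the sub-windows at each stage drives the overall false positive rate down geometrically, while the per-stage hit rate must stay very high for the overall detection rate to remain usable.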
A PIPELINE FOR FACE AUTHENTICATION
DESTRERO ET AL., 2009
RESULTS
PLAN
Learning and engineering applications: why?
Examples of in-house applications
- Face and object detection
- Medical image analysis
- Microarray data analysis
AUTOMATIC ANNOTATION OF MR IMAGES: SYNOVITIS ASSESSMENT
BASSO ET AL., 2010
Setting: children under 16 affected by Juvenile Idiopathic Arthritis
Goal: to measure the volume of the inflamed synovia in 3D MR images
Our problem: to classify each voxel of the MR volume
The approach is supervised; for training we use the manual annotations performed by experts
VOXEL-BASED IMAGE DESCRIPTION
Each voxel is represented with a set of cues chosen among the ones commonly used for voxel classification
They include the intensity of the voxel and its neighbors, the position of the voxel, the multiscale 2-jets, and the vesselness measures

x → φ(x) = (ϕ1, . . . , ϕk)
MULTI-CUE VOXEL CLASSIFIER
THE DISCRIMINANT FUNCTION
We look for a more flexible discriminant function
f(φ) = Σ_{(i,j)∈I} α_i^j K_i^j(φ) + b

ASSUMPTION

The k × n basis functions

K_i^j(φ) = exp( −‖ϕ^j − ϕ_i^j‖² / (2σ²) )

measure the similarity of φ with an example voxel i with respect to a specific cue j
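A sketch of these per-cue Gaussian basis functions (the cue values are synthetic and each cue is a scalar for simplicity; the names are illustrative):

```python
import numpy as np

def multicue_basis(phi, train_cues, sigma=1.0):
    """K[i, j] = exp(-||phi^j - phi_i^j||^2 / (2 sigma^2)): similarity of the
    voxel description phi (shape (k,)) to training voxel i with respect to
    cue j; train_cues has shape (n, k), so K has one entry per (i, j)."""
    d2 = (train_cues - phi) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

phi = np.array([0.2, 1.0, -0.5])      # one voxel described by k = 3 cues
train_cues = np.zeros((4, 3))         # n = 4 training voxels
K = multicue_basis(phi, train_cues)   # shape (4, 3)
```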
MULTI-CUE VOXEL CLASSIFIER
MODEL SELECTION
The optimal subset I of basis functions, on which f depends, may be inferred from the data by means of feature selection.
Starting from a manually annotated training set of n voxels we compute the n × kn matrix K

K = (K^1, . . . , K^k)

and look for a sparse vector α so that

y = Kα
MULTI-CUE VOXEL CLASSIFIER
LEARNING ALGORITHM
The goal of learning is to find the optimal affine combination defined by the coefficients α_i^j and b. This is achieved with L2 regularization on the restricted matrix K̂
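The two steps (sparse selection of basis functions via y = Kα, then L2-regularized refitting on the restricted matrix K̂) can be sketched as below; Lasso and Ridge are stand-ins for the specific solvers used in the work, and the kernel matrix is synthetic:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, kn = 150, 400                          # n voxels, k*n basis functions
K = rng.standard_normal((n, kn))          # stand-in for the multi-cue matrix
alpha_true = np.zeros(kn)
alpha_true[:6] = 1.0
y = K @ alpha_true + 0.1 * rng.standard_normal(n)

# Step 1: feature selection -> sparse alpha in y = K alpha
support = np.flatnonzero(Lasso(alpha=0.1).fit(K, y).coef_)

# Step 2: L2 regularization on the restricted matrix K_hat
K_hat = K[:, support]
refit = Ridge(alpha=1.0).fit(K_hat, y)    # coefficients alpha and offset b
```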
RESULTS
The multi-cue classifier is 15 times sparser than SVM and approximately 40 times faster.
PLAN
Learning and engineering applications: why?
Examples of in-house applications
- Face and object detection
- Medical image analysis
- Microarray data analysis
MACHINE LEARNING AND THE ANALYSIS OF MICROARRAYS
GOALS
Design methods able to identify a gene signature, i.e., a panel of genes potentially interesting for further screening
Learn the gene signatures, i.e., select the most discriminant subset of genes on the available data
MACHINE LEARNING AND THE ANALYSIS OF MICROARRAYS
A TYPICAL "-OMICS" SCENARIO
High dimensional data - few samples per class
tens of samples - tens of thousands of genes
→ Variable selection
High risk of selection bias:
data distortion arising from the way the data are collected, due to the small amount of data available
→ Model assessment needed
Possibly find ways to incorporate prior knowledge
Deal with data visualization
GENE SELECTION
THE PROBLEM
Select a small subset of input variables (genes) which are used for building classifiers

ADVANTAGES:
- it is cheaper to measure fewer variables
- the resulting classifier is simpler and potentially faster
- prediction accuracy may improve by discarding irrelevant variables
- identifying relevant variables gives useful information about the nature of the corresponding classification problem (biomarker detection)
VARIABLE SELECTION IN BIOINFORMATICS
MOTIVATIONS
Ease computational burden:
discard the (apparently) less significant features and train in a simplified space: alleviate the curse of dimensionality
Enhance information:
highlight (and rank) the most important features and improve the knowledge of the underlying process.
COMMONLY ADOPTED METHODS
Statistical filters (t-test, S/N ratio, ...)
Learning techniques (embedded methods, wrapper methods, stepwise feature elimination, ...)
Mapping methods ("metagenes": simplified models for pathways, even though biological suggestions require caution)
STATISTICAL FILTERS
These approaches are well established in the gene selection literature. One considers the various measurements associated to each gene (a column of the data matrix X)
T TEST
For each column of X we compute

t = (μ1 − μ2) / √(σ1²/n1 + σ2²/n2)

where subscripts 1 and 2 stand for positive and negative examples
Genes are ranked with respect to the t value
A threshold is set to perform gene selection
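A minimal numpy implementation of this filter (the data are synthetic; one gene is given an artificial mean shift so the ranking has something to find):

```python
import numpy as np

def t_scores(X, y):
    """Per-gene t statistic: (mu1 - mu2) / sqrt(s1^2/n1 + s2^2/n2),
    where 1 and 2 index the positive and negative samples."""
    pos, neg = X[y == 1], X[y == -1]
    num = pos.mean(axis=0) - neg.mean(axis=0)
    den = np.sqrt(pos.var(axis=0, ddof=1) / len(pos)
                  + neg.var(axis=0, ddof=1) / len(neg))
    return num / den

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 100))        # 40 samples, 100 genes
y = np.repeat([1, -1], 20)
X[y == 1, 0] += 3.0                       # gene 0 is made discriminant
ranking = np.argsort(-np.abs(t_scores(X, y)))   # genes ranked by |t|
```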
GENE SELECTION WITH L1-L2 REGULARIZATION
MOSCI ET AL., 2008
min_{β ∈ IR^p} ‖Y − Xβ‖² + τ(‖β‖₁ + ε‖β‖₂²)
Consistency guaranteed - the more samples available, the better the estimator
Multivariate - it takes into account many genes at once
OUTPUT
a one-parameter (ε) family of nested lists with equivalent prediction ability and increasing correlation among genes
ε → 0: minimal list of prototype genes
ε1 < ε2 < ε3 < . . .: longer lists including correlated genes
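The ε-controlled trade-off can be sketched with scikit-learn's ElasticNet (its (alpha, l1_ratio) parametrization differs from the (τ, ε) above, so this only illustrates the grouping behavior, on synthetic data with one pair of highly correlated genes):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n, p = 50, 300                            # few samples, many genes
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(n)   # gene 1 ~ gene 0
y = X[:, 0] + 0.1 * rng.standard_normal(n)

# A stronger l2 term (lower l1_ratio) plays the role of a larger epsilon:
# it tends to bring in correlated genes rather than a single prototype.
counts = {}
for l1_ratio in (0.99, 0.5):
    coef = ElasticNet(alpha=0.1, l1_ratio=l1_ratio, max_iter=10000).fit(X, y).coef_
    counts[l1_ratio] = int(np.count_nonzero(coef))
print(counts)
```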
DOUBLE OPTIMIZATION APPROACH
MOSCI ET AL., 2008
VARIABLE SELECTION + CLASSIFICATION:
Variable selection step (L1-L2)
min_{β ∈ IR^p} ‖Y − Xβ‖² + τ(‖β‖₁ + ε‖β‖₂²)

Classification step (RLS)

min_{β ∈ IR^p} ‖Y − Xβ‖₂² + λ‖β‖₂²

for each ε we have to choose λ and τ
A SELECTION BIAS AWARE FRAMEWORK
BARLA ET AL., 2008
λ → (λ1, . . . , λA)
τ → (τ1, . . . , τB)
the optimal pair (λ∗, τ∗) is one of the possible A · B pairs (λ, τ)
ALGORITHMIC AND COMPUTATIONAL ISSUES
FROM MANY LISTS TO ONE FINAL LIST
Criterion based on frequency, i.e., occurrences of a gene across all the lists
Since we have a correlation parameter, we can tune and vary the list length
FROM 1 WEEK COMPUTATION TO...?
Computational time for LOO (for one task):
time_1-optim = 2.5 s to 25 s, depending on the correlation parameter
total time = A · B · N_samples · time_1-optim ∼ 20 · 20 · 30 · time_1-optim ∼ 3 · 10^4 s to 3 · 10^5 s
6 tasks → 1 week!!
COMPUTATION OVER A GRID
Grid middleware: OurGrid, a multiplatform grid that can deal with hosts not directly connected to the Internet.
Used by the ShareGrid project, which involves several universities in Northern Italy.
Cheap solution: 60 PCs (student labs)
GENE SELECTION WITH L1-L2 REGULARIZATION
DE MOL, MOSCI, TRASKINE, VERRI, 2008
FINDING STRUCTURED GENE SIGNATURES
How do we estimate groups of correlated genes?
We may rely on the nested structure obtained by varying the correlation parameter
We consider the minimal list list0 as a starting point of an agglomerative clustering technique, based on the Pearson distance:

d(Xi, Xj) = corr(Xi, Xj) / √(var(Xi) var(Xj))

evaluating the normalized correlation between two columns Xi and Xj of the data matrix X
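A sketch of this step using scipy's hierarchical clustering, where the built-in 'correlation' metric (1 − Pearson correlation) serves as the Pearson-style distance; the gene matrix is synthetic, with pairs of correlated columns:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n = 30
base = rng.standard_normal((n, 3))              # 3 prototype expression profiles
# 6 genes: each prototype duplicated with small noise -> correlated pairs
X = np.hstack([base + 0.05 * rng.standard_normal((n, 3)) for _ in range(2)])

D = pdist(X.T, metric="correlation")            # 1 - corr between columns
Z = linkage(D, method="average")                # agglomerative clustering
labels = fcluster(Z, t=0.5, criterion="distance")
```

Columns built from the same prototype end up in the same cluster, which is the behavior the slide exploits to grow a minimal list into structured groups of correlated genes.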
AN EXAMPLE APPLICATION
IDENTIFYING THE HYPOXIA SIGNATURE OF NEUROBLASTOMA VIA REGULARIZATION
joint research with IGG Molecular Biology lab
Dataset: 9 neuroblastoma (NB) cell lines cultured under normoxic and hypoxic conditions. Technology: Affymetrix GeneChip U133 plus 2.0.
t-test: no genes selected!
l1l2 protocol: 11 genes for the minimal list (frequency > 30%)
REFERENCES
A. Destrero, C. De Mol, F. Odone, A. Verri. "A Regularized Framework for Feature Selection in Face Detection and Authentication". IJCV (2009).
A. Destrero, C. De Mol, F. Odone, A. Verri. "A Sparsity-Enforcing Method for Learning Face Features". IEEE Transactions on Image Processing 18 (2009): 188-201.
C. Basso, M. Santoro, A. Verri, M. Esposito. "Segmentation of Inflamed Synovia in Multi-Modal MRI". In Proc. of IEEE ISBI 2009, June 28 - July 1, 2009.
P. Fardin, A. Cornero, A. Barla, S. Mosci, M. Acquaviva, L. Rosasco, C. Gambini, A. Verri, L. Varesio. "Identification of multiple hypoxia signatures in neuroblastoma cell lines by l1-l2 regularization and data reduction". Journal of Biomedicine and Biotechnology (2010).
A. Barla, S. Mosci, L. Rosasco, A. Verri. "A Method for Robust Variable Selection with Significance Assessment". In Proc. of ESANN, European Symposium on Artificial Neural Networks (2008).
C. De Mol, S. Mosci, M. Traskine, A. Verri. "A Regularized Method for Selecting Nested Groups of Relevant Genes from Microarray Data". Journal of Computational Biology (2008).