TASK2VEC: Task Embedding for Model Recommendation Subhransu Maji College of Information and Computer Sciences University of Massachusetts, Amherst http://people.cs.umass.edu/smaji February 19, 2019 @ ICERM, Brown University https://arxiv.org/abs/1902.03545



Page 2:

Task Embedding for Model Recommendation

Alessandro, Michael, Rahul, Avinash, Subhransu, Charless, Stefano, Pietro

Task = {dataset, labels, loss}

• What are similar tasks?
• What architecture should I use?
• What pre-training dataset?
• What hyperparameters?
• Do I need more training data?
• How difficult is this task?
• ...

If we have a universal vectorial representation of tasks, we can frame all sorts of interesting CV application engineering problems as meta-learning problems.

Page 3:

Model recommendation

[Diagram labels: Feature Extractor Zoo, X, L(y), recommender]

Brute force:

Input: Task = (dataset, loss)

For each feature extractor architecture F:
1. Train classifier on F(dataset)
2. Compute validation performance

Output: best performing model

Task recommendation:

Input: Task = (dataset, loss)

1. Compute task embedding t = E(Task)
2. Predict best extractor F = M(t)
3. Train classifier on F(dataset)
4. Compute validation performance

Output: best performing model
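
The two procedures on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `extractors` zoo, the nearest-class-mean classifier standing in for "train classifier", and the nearest-neighbor rule standing in for M are all hypothetical choices made for the example.

```python
import numpy as np

def nearest_mean_error(train_x, train_y, val_x, val_y):
    """Validation error of a nearest-class-mean classifier (stand-in for 'train classifier')."""
    classes = np.unique(train_y)
    means = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = ((val_x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred != val_y).mean())

def brute_force(extractors, task):
    """Evaluate every extractor F on the task and return the best one."""
    train_x, train_y, val_x, val_y = task
    errors = {name: nearest_mean_error(F(train_x), train_y, F(val_x), val_y)
              for name, F in extractors.items()}
    return min(errors, key=errors.get), errors

def recommend(task_embedding, known_embeddings, known_best):
    """Hypothetical M(t): reuse the best extractor of the nearest known task."""
    dists = np.linalg.norm(known_embeddings - task_embedding, axis=1)
    return known_best[int(dists.argmin())]

# Toy task: the label depends only on the first input coordinate.
rng = np.random.default_rng(0)
x = rng.normal(size=(400, 2))
y = (x[:, 0] > 0).astype(int)
task = (x[:200], y[:200], x[200:], y[200:])

# Toy zoo: each "extractor" keeps a single coordinate.
extractors = {"first": lambda a: a[:, :1], "second": lambda a: a[:, 1:]}
best, errors = brute_force(extractors, task)

# Recommendation skips the loop and looks up the nearest embedded task.
known_embeddings = np.array([[0.0, 0.0], [1.0, 1.0]])
suggested = recommend(np.array([0.9, 0.8]), known_embeddings, ["first", "second"])
```

Brute force costs one training run per extractor, whereas the recommender needs a single embedding computation followed by one training run on the suggested extractor.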

Page 4:

Task embedding using Fisher Information

1. Given a task, train a classifier with the task loss on features from a generic “probe network”
2. Compute gradients of probe network parameters w.r.t. task loss
3. Use statistics of the probe parameter gradients as the fixed dimensional task embedding

Intuition: F provides information about the sensitivity of the task performance to small perturbations of parameters in the probe network

$$\mathbb{E}_{x \sim \hat{p}}\, \mathrm{KL}\left(p_{\theta'}(y|x) \,\|\, p_{\theta}(y|x)\right) = \delta\theta \cdot F \cdot \delta\theta + o(\delta\theta^2),$$

where $\theta' = \theta + \delta\theta$ is a small perturbation of the parameters.
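
A minimal sketch of the three steps, assuming the probe features are already computed and restricting the gradients to a logistic last layer for brevity (the slide refers to the probe network parameters in general). The names `train_head` and `gradient_embedding` and the random stand-in features are illustrative, not from the talk.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_head(features, labels, steps=500, lr=0.5):
    """Step 1: fit a logistic classifier on fixed probe features by gradient descent."""
    w = np.zeros(features.shape[1])
    for _ in range(steps):
        p = sigmoid(features @ w)
        w -= lr * features.T @ (p - labels) / len(labels)
    return w

def gradient_embedding(features, w):
    """Steps 2 and 3: average squared per-sample gradient of the log-likelihood.

    The expectation over y ~ p_w(y|x) is taken exactly by weighting the
    gradients for y = 1 and y = 0 with their model probabilities.
    """
    p = sigmoid(features @ w)                    # shape (N,)
    grad_y1 = (1.0 - p)[:, None] * features      # gradient if y = 1
    grad_y0 = (0.0 - p)[:, None] * features      # gradient if y = 0
    sq = p[:, None] * grad_y1 ** 2 + (1.0 - p)[:, None] * grad_y0 ** 2
    return sq.mean(axis=0)                       # one entry per weight

rng = np.random.default_rng(0)
features = rng.normal(size=(300, 5))             # stand-in for probe-network features
labels = (features[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(float)

w = train_head(features, labels)
embedding = gradient_embedding(features, w)      # fixed-size task embedding
```

The embedding length depends only on the number of probe parameters considered, not on the number of samples or classes in the task.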

Page 5:

Properties of TASK2VEC embedding

Dataset: $(x_i, y_i),\ i = 1 \ldots n,\ y_i \in \{0, 1\}$

Two-layer network: $x \to \phi(x)$

Classifier: $p_i = \sigma\left(w^T \phi(x_i)\right)$

FIM for cross-entropy loss for the last layer:

$$F_w = \frac{1}{N} \sum_i p_i (1 - p_i)\, \phi(x_i) \phi(x_i)^T$$

Page 6:

Properties of TASK2VEC embedding

1. Invariance to label space
2. Encodes task difficulty
3. Encodes task domain
4. Encodes useful features for the task

Dataset: $(x_i, y_i),\ i = 1 \ldots n,\ y_i \in \{0, 1\}$

Classifier: $p_i = \sigma\left(w^T \phi(x_i)\right)$

FIM for cross-entropy loss for the last layer:

$$F_w = \frac{1}{N} \sum_i p_i (1 - p_i)\, \phi(x_i) \phi(x_i)^T$$

Representative domain embedding:

$$D = \frac{1}{N} \sum_i \phi(x_i) \phi(x_i)^T$$
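
The two matrices on this slide can be compared numerically. The sketch below uses random stand-ins for the features φ(x_i) and for the trained weights w; both are assumptions made only to exercise the formulas.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def last_layer_fim(phi, w):
    """F_w = (1/N) sum_i p_i (1 - p_i) phi(x_i) phi(x_i)^T."""
    p = sigmoid(phi @ w)
    return (phi * (p * (1.0 - p))[:, None]).T @ phi / len(phi)

def domain_embedding(phi):
    """D = (1/N) sum_i phi(x_i) phi(x_i)^T, which ignores the labels entirely."""
    return phi.T @ phi / len(phi)

rng = np.random.default_rng(1)
phi = rng.normal(size=(200, 4))   # hypothetical features phi(x_i)
w = rng.normal(size=4)            # hypothetical trained weights

F = last_layer_fim(phi, w)
F_flipped = last_layer_fim(phi, -w)   # swapping the two labels maps w to -w
D = domain_embedding(phi)
```

Here `F` equals `F_flipped` because p_i(1 − p_i) is unchanged when p_i becomes 1 − p_i, a small instance of invariance to the label space. Each term of F is the matching term of D weighted by p_i(1 − p_i) ≤ 1/4, so confidently classified samples contribute little to F while D treats all samples alike.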

Page 7:

Properties of TASK2VEC embedding

1. Binary tasks on unit square, i.e., each tile is a task
2. 10 random ReLU features, i.e., φᵢ = max(0, aᵢx + bᵢy + cᵢ)
3. t-SNE to map 10x10 FIM to 2D

Page 8:

Properties of TASK2VEC embedding

1. Binary tasks on unit square, i.e., each tile is a task
2. 10 random ReLU features, i.e., φᵢ = max(0, aᵢx + bᵢy + cᵢ)
3. t-SNE to map 10x10 FIM to 2D

Polynomial degree 3
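
A rough sketch of the first two steps of this toy setup, stopping before t-SNE. The slide does not say how each binary task is defined, so the half-plane labelings below, the Gaussian ReLU parameters, and the helper `task_fim` are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    # Clip the argument so that exp never overflows.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def task_fim(phi, labels, steps=2000, lr=0.05):
    """Fit logistic weights on phi for one binary task, then return its d x d FIM."""
    w = np.zeros(phi.shape[1])
    for _ in range(steps):
        w -= lr * phi.T @ (sigmoid(phi @ w) - labels) / len(labels)
    p = sigmoid(phi @ w)
    return (phi * (p * (1.0 - p))[:, None]).T @ phi / len(phi)

rng = np.random.default_rng(0)
points = rng.uniform(size=(500, 2))              # samples (x, y) on the unit square
a, b, c = rng.normal(size=(3, 10))               # parameters of 10 random ReLU features
phi = np.maximum(0.0, points[:, :1] * a + points[:, 1:] * b + c)   # shape (500, 10)

# Hypothetical binary tasks: three half-plane labelings of the square.
tasks = {
    "x > 0.3": points[:, 0] > 0.3,
    "x > 0.5": points[:, 0] > 0.5,
    "y > 0.5": points[:, 1] > 0.5,
}
fims = {name: task_fim(phi, labels.astype(float)) for name, labels in tasks.items()}

# One flattened 10x10 FIM per task, ready for a 2D projection such as t-SNE.
flat = np.stack([f.ravel() for f in fims.values()])
```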

Page 9:

Robust Fisher Computation

1. For realistic CV tasks we want to use deep CNNs (e.g., ResNet) and estimate FIM for all the parameters.
2. Challenge: FIM can be hard to estimate (noisy loss landscape; high dimensions; small training set)
3. Robust FIM
   1. Restrict it to a diagonal
   2. Restrict it to a single value per filter (CNN layer)
   3. Robust estimation via perturbation

Estimate Λ of a Gaussian perturbation:

Optimal Λ satisfies:

“Trivial Embedding”
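
The first two restrictions can be sketched as a simple reduction over a diagonal Fisher estimate. Averaging within each filter is assumed here as the way to get "a single value per filter"; the layer shapes and random values are placeholders, and the perturbation-based estimate of Λ is not covered.

```python
import numpy as np

def per_filter_fisher(diag_fisher):
    """Average a diagonal Fisher estimate over each convolutional filter.

    diag_fisher has the same shape as the layer's weights,
    (out_channels, in_channels, kernel_h, kernel_w); the result has one
    value per output filter.
    """
    return diag_fisher.reshape(diag_fisher.shape[0], -1).mean(axis=1)

def embed(layers):
    """Concatenate per-filter values across layers into one fixed-size vector."""
    return np.concatenate([per_filter_fisher(f) for f in layers])

rng = np.random.default_rng(0)
# Hypothetical diagonal Fisher estimates for two small conv layers.
layers = [rng.random((8, 3, 3, 3)), rng.random((16, 8, 3, 3))]
embedding = embed(layers)   # 8 + 16 = 24 entries
```

The reduction shrinks the embedding from one entry per weight to one entry per filter, which also averages out some of the estimation noise named in the challenge above.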

Page 10:

Similarity measures on the space of tasks

Task A    Task B

Task = {dataset, labels, loss}

Page 11:

Similarity measures on the space of tasks

Domain similarity

Unbiased Look at Dataset Bias, Torralba and Efros, CVPR 2011

Page 12:

Similarity measures on the space of tasks

Domain similarity

Range/label similarity
• e.g., taxonomic distance

https://www.pinterest.com/pin/520799144386337065/

Page 13:

Similarity measures on the space of tasks

Domain similarity

Range/label similarity
• e.g., taxonomic distance

Transfer “distance”
• Fine-tune on a followed by b

Taskonomy: Disentangling Task Transfer Learning, Amir Zamir, Alexander Sax, William Shen, Leonidas Guibas, Jitendra Malik, Silvio Savarese, CVPR 2018

Page 14:

Distance measures on TASK2VEC embedding

Symmetric distance

Asymmetric “distance”
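
A minimal sketch of one symmetric and one asymmetric distance on diagonal-FIM embeddings. It assumes the normalized cosine form described in the Task2Vec paper; the function names, the `eps` guard, the example vectors, and the value of `alpha` are illustrative choices, not values taken from the slide.

```python
import numpy as np

def cosine_distance(u, v):
    return 1.0 - float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def symmetric_distance(fa, fb, eps=1e-12):
    """Cosine distance between embeddings normalized elementwise by their sum."""
    total = fa + fb + eps
    return cosine_distance(fa / total, fb / total)

def asymmetric_distance(fa, fb, f_trivial, alpha=0.15):
    """Symmetric distance from a to b, minus alpha times a's distance to the trivial embedding."""
    return symmetric_distance(fa, fb) - alpha * symmetric_distance(fa, f_trivial)

fa = np.array([1.0, 2.0, 3.0])   # hypothetical embedding of task a
fb = np.array([2.0, 2.0, 5.0])   # hypothetical embedding of task b
f0 = np.array([1.0, 1.0, 1.0])   # hypothetical trivial embedding

d_sym = symmetric_distance(fa, fb)
d_ab = asymmetric_distance(fa, fb, f0)
d_ba = asymmetric_distance(fb, fa, f0)
```

The elementwise normalization keeps parameters with very different Fisher magnitudes from dominating the comparison, and the correction term makes the measure depend on which task is the source.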

Page 15:

MODEL2VEC: Joint embedding of tasks and models

1. So far we have been associating models (feature extractors) with the tasks they are trained on.
2. How about
   1. legacy/black-box feature extractors? E.g., SIFT, HOG, Fisher vector
   2. models of different complexity trained on the same dataset
3. MODEL2VEC: Jointly embed feature extractors (encoded as one-hot vectors) and tasks such that similarity reflects a meta-task objective.
   1. Needs training data
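
A toy sketch of a joint embedding in this spirit. Models are one-hot indices, so each model's embedding is a learned row of a matrix, while task embeddings stay fixed. The meta-task objective assumed here is a softmax cross-entropy over negative squared distances that predicts the best model for each training task; the slide does not specify the objective, so this choice and all names below are hypothetical.

```python
import numpy as np

def train_model_embeddings(task_emb, best_model, num_models, steps=300, lr=0.1, seed=0):
    """Learn one vector per model so that each task lies closest to its best model."""
    rng = np.random.default_rng(seed)
    model_emb = 0.01 * rng.normal(size=(num_models, task_emb.shape[1]))
    onehot = np.eye(num_models)[best_model]
    for _ in range(steps):
        diff = task_emb[:, None, :] - model_emb[None, :, :]   # (tasks, models, dim)
        logits = -(diff ** 2).sum(axis=2)
        logits -= logits.max(axis=1, keepdims=True)           # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # d loss / d model_emb[j] = mean over tasks of (probs - onehot)[:, j] * 2 * diff[:, j]
        grad = 2.0 * ((probs - onehot)[:, :, None] * diff).sum(axis=0) / len(task_emb)
        model_emb -= lr * grad
    return model_emb

def predict_best(task_emb, model_emb):
    """Recommend, for each task, the model with the nearest embedding."""
    diff = task_emb[:, None, :] - model_emb[None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)

# Training data for the meta-task: two groups of tasks, each with a known best model.
rng = np.random.default_rng(1)
task_emb = np.concatenate([rng.normal(0.0, 0.1, size=(20, 2)),
                           rng.normal(3.0, 0.1, size=(20, 2))])
best_model = np.array([0] * 20 + [1] * 20)

model_emb = train_model_embeddings(task_emb, best_model, num_models=2)
predicted = predict_best(task_emb, model_emb)
```

Because the model vectors are learned from (task, best model) pairs, this works for black-box extractors that have no task of their own to embed, at the cost of needing that training data.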

Page 16:

Task Zoo

• Tasks [1460]
  • iNaturalist [207]
  • CUB200 [25]
  • iMaterialist [228]
  • DeepFashion [1000]


Page 19:

Task Zoo

• Tasks [1460]
  • iNaturalist [207]
  • CUB200 [25]
  • iMaterialist [228]
  • DeepFashion [1000]

• Few tasks have >10K training samples, but most have 100-1000 samples

Page 20:

Experiment: TASK2VEC recapitulates iNaturalist taxonomy

[Figure: task embedding cosine similarity; groups labeled plants, reptiles, birds, mammals, insects]

ResNet trained on ImageNet as probe network

Page 21:

Experiment: TASK2VEC norm encodes task difficulty

ResNet trained on ImageNet as probe network

Page 22:

Experiment: TASK2VEC vs DOMAIN2VEC

[Figure: two panels, Task Embeddings and Domain Embeddings. Legend: Actinopterygii (n), Amphibia (n), Arachnida (n), Aves (n), Fungi (n), Insecta (n), Mammalia (n), Mollusca (n), Plantae (n), Protozoa (n), Reptilia (n), Category (m), Color (m), Gender (m), Material (m), Neckline (m), Pants (m), Pattern (m), Shoes (m)]

Page 23:

Task and Feature Zoo

• Tasks [1460]
  • iNaturalist [207]
  • CUB200 [25]
  • iMaterialist [228]
  • DeepFashion [1000]

• Feature Zoo [156 experts]
  • ResNet-34 pretrained on ImageNet
  • Followed by fine-tuning on tasks with enough examples

[Diagram labels: Feature Extractor Zoo, X, L(y), recommender]

Page 24:

The Matrix

[Figure: matrix with axes Tasks and Feature extractors]

Page 25:

The Matrix

iNaturalist + CUB

[Figure: matrix with axes Tasks and Experts]

Page 26:

The ImageNet expert is usually good, but on many tasks the best expert handily outperforms it.

Page 27:

Data efficiency of TASK2VEC

[Figure legend: ImageNet fixed, ImageNet finetune, Task2vec finetune, Task2vec fixed, Brute force fixed]

Page 28:

Choice of distance for TASK2VEC

Relative error increase over the oracle (best choice)
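
One plausible reading of this metric, sketched with a made-up error matrix: for each task, the error of the recommended model minus the error of the best model, divided by the error of the best model. The exact normalization used in the talk is not stated, so this definition and the numbers below are assumptions.

```python
import numpy as np

def relative_error_increase(errors, chosen):
    """errors[t, m] is the test error of model m on task t; chosen[t] is the selected model."""
    oracle = errors.min(axis=1)                          # best achievable error per task
    selected = errors[np.arange(len(chosen)), chosen]    # error of the recommended model
    return (selected - oracle) / oracle

errors = np.array([[0.10, 0.20, 0.40],
                   [0.30, 0.15, 0.25]])   # hypothetical error matrix (2 tasks, 3 models)
chosen = np.array([1, 1])                 # hypothetical recommendations
increase = relative_error_increase(errors, chosen)
```

A value of 0 means the recommendation matched the oracle, and a value of 1 means the recommended model made twice as many errors as the best one.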

Page 29:

Choice of the probe network on TASK2VEC

Relative error increase over the oracle (best choice)

Page 30:

Thank you!

Task2Vec: Task Embedding for Meta-Learning, Alessandro Achille, Michael Lam, Rahul Tewari, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Stefano Soatto, Pietro Perona (https://arxiv.org/abs/1902.03545)