Self-supervised Learning for Generalizable Out-of...
We propose a new technique that relies on self-supervision for generalizable out-of-distribution (OOD) feature learning and rejects OOD samples at inference time. In addition:
✓ It does not need prior knowledge of the distribution of targeted OOD samples for tuning.
✓ It incurs no extra computation or memory overhead, unlike methods such as DNN ensembles and MC-dropout.
✓ Our technique performs favorably against state-of-the-art OOD detection methods, for example:
Train setup: $D^{in}_{train}$: CIFAR, $D^{out}_{train}$: Tiny Images dataset
Test setup: $D^{out}_{test}$: equal mix of five outlier test sets
✓ Generalizable OOD Detection: Our technique does not need to know the distribution of targeted OOD samples. We test OOD detection performance when $D^{out}_{test}$ is a mix of five different datasets (equal mix, random sampling) and show that our technique outperforms the state of the art in both OOD detection AUROC and $D^{in}_{test}$ coverage.
✓ Synthesized OOD Training Set: Our experimental results support the use of a synthesized training set for OOD features; however, in all experiments we observed superior results when using real OOD samples from outlier datasets.
Table 1. OOD detection results for Baseline, OE [4], and Our Method.

                                      FPR @ 0.95 TPR              AUROC                       AUPR
D_in       D_train^out  D_test^out    Baseline  OE [4]  Ours      Baseline  OE [4]  Ours      Baseline  OE [4]  Ours
MNIST      E-MNIST      not-MNIST     17.11     0.25    0         95.98     99.86   99.99     95.75     99.86   99.99
MNIST      E-MNIST      F-MNIST       2.96      0.99    0         99.3      99.83   100       99.19     99.83   100
MNIST      E-MNIST      k-MNIST       10.54     0.03    0.35      97.11     97.60   99.91     96.46     97.05   99.91
SVHN       Tiny Images  Texture       4.7       1.04    2.28      98.4      99.75   99.37     93.07     99.09   98.16
SVHN       Tiny Images  Places365     2.55      0.02    0.05      99.27     99.99   99.94     99.1      99.99   99.93
SVHN       Tiny Images  LSUN          2.75      0.05    0.04      99.18     99.98   99.94     97.57     99.95   99.98
SVHN       Tiny Images  CIFAR10       5.88      3.11    0.31      98.04     99.26   99.83     94.91     97.88   99.60
SVHN       Tiny Images  CIFAR100      7.74      4.01    0.07      97.48     99      99.93     93.92     97.19   99.81
CIFAR-10   Tiny Images  SVHN          28.49     8.41    3.62      90.05     98.2    99.18     60.27     97.97   99.13
CIFAR-10   Tiny Images  Texture       43.27     14.9    3.07      88.42     96.7    99.19     78.65     94.39   98.78
CIFAR-10   Tiny Images  Places365     44.78     19.07   10.86     88.23     95.41   97.57     86.33     95.32   97.77
CIFAR-10   Tiny Images  LSUN          38.31     15.2    4.27      89.11     96.43   98.92     86.61     96.01   98.74
CIFAR-10   Tiny Images  CIFAR100      43.12     26.59   30.07     87.83     92.93   93.83     85.21     92.31   94.23
CIFAR-100  Tiny Images  SVHN          69.33     52.61   18.22     71.33     82.86   95.82     67.81     80.21   95.03
CIFAR-100  Tiny Images  Texture       71.83     55.97   40.30     73.59     84.23   89.76     57.41     75.76   83.55
CIFAR-100  Tiny Images  Places365     70.26     57.77   39.96     73.97     82.65   89.08     70.46     81.47   88.00
CIFAR-100  Tiny Images  LSUN          73.92     63.56   41.24     70.64     79.51   88.88     66.35     77.85   87.59
CIFAR-100  Tiny Images  CIFAR10       65.12     59.96   57.79     75.33     77.53   77.70     71.29     72.82   72.31
➢ 1) Architecture: Our method imposes minimal change on the model architecture: it only adds extra nodes to the last layer of the network to learn outlier features and detect OOD samples. We use two-step training, which starts with learning the normal training set and then continues with an OOD clustering step.
➢ 3) Self-Supervised Out-of-Distribution Learning: We train the auxiliary head for OOD features on the unlabeled OOD training set, for which we generate pseudo-random labels. A two-term loss function ($\mathcal{L}_{total} = \mathcal{L}_{in} + \lambda \mathcal{L}_{out}$) is used for both in- and out-of-distribution feature learning.
✓ Robust in-distribution classification: We tested the effect of our technique on the normal error rate and on coverage, both of which are affected by false-negative (FN) and false-positive (FP) detections. Compared with OE and the baseline, our technique shows higher normal test set coverage when rejecting OOD samples.
✓ Number of reject classes: In our experiments, we found that the number of reject classes has only a mild impact on OOD detection performance. We used five reject classes for the CIFAR-10, MNIST, and SVHN experiments and 10 reject classes for the CIFAR-100 experiment.
Introduction
Detection Method
Empirical Results
Self-supervised Learning for Generalizable Out-of-Distribution Detection
Authors: Sina Mohseni (1,2), Mandar Pitale (1), JBS Yadawa (1), Zhangyang Wang (2)
(1) NVIDIA, (2) Texas A&M University
➢ 2) Supervised In-distribution Training: We first train the model on the normal distribution to reach the desired classification performance. We use cross-entropy loss ($\mathcal{L}_{in}$) for the normal training.
Algorithm: Two-step training for In- and Out-of-distribution Training Sets
Step 1: Supervised In-Distribution Learning
Input: Batch of $D^{in}_{train}$ samples in $c$ different classes.
Training the in-distribution set by solving: $\min \mathbb{E}_{P_{in}(\hat{x},\hat{y})}\left[-\log P_\theta(y=\hat{y} \mid \hat{x})\right]$
Step 2: Self-Supervised Out-of-Distribution Learning
Input: Batch of mixed $D^{in}_{train}$ samples, $D^{out}_{train}$ unlabeled samples, set of OOD classes $k$.
Training the mixed set by solving:
$\min \mathbb{E}_{P_{in}(\hat{x},\hat{y})}\left[-\log P_\theta(y=\hat{y} \mid \hat{x})\right] + \lambda\, \mathbb{E}_{P_{out}(\hat{x},\mathrm{rand}(k))}\left[-\log P_\theta(y=\mathrm{rand}(k) \mid \hat{x})\right]$
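The two-term objective above can be illustrated with a minimal sketch in plain Python. It assumes the network outputs one logit per class, with the $c$ in-distribution classes first and the $k$ reject classes last; that layout, the function names, and the batch averaging are illustrative assumptions, not details given on the poster.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability of `label` under softmax(logits)."""
    return -math.log(softmax(logits)[label])

def two_term_loss(logits_in, labels_in, logits_out, num_in, num_reject, lam, rng):
    """L_total = L_in + lam * L_out, each term averaged over its batch.

    Assumption: class indices [0, num_in) are in-distribution classes and
    [num_in, num_in + num_reject) are the extra reject classes.
    """
    loss_in = sum(cross_entropy(z, y) for z, y in zip(logits_in, labels_in))
    loss_in /= len(logits_in)
    # Unlabeled OOD samples receive pseudo-random labels among the reject classes.
    pseudo = [num_in + rng.randrange(num_reject) for _ in logits_out]
    loss_out = sum(cross_entropy(z, y) for z, y in zip(logits_out, pseudo))
    loss_out /= len(logits_out)
    return loss_in + lam * loss_out, pseudo

rng = random.Random(0)
loss, pseudo = two_term_loss(
    logits_in=[[0.0] * 5], labels_in=[1],
    logits_out=[[0.0] * 5, [0.0] * 5],
    num_in=3, num_reject=2, lam=0.5, rng=rng,
)
print(loss, pseudo)
```

Setting `lam` to zero recovers the Step 1 objective, so the same function can serve both steps of this sketch.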
Motivation
The real-world deployment of Deep Neural Network (DNN) algorithms in safety-critical applications such as autonomous vehicles needs to address a variety of DNN vulnerabilities, such as 1) generalization error, 2) out-of-distribution samples, and 3) adversarial attacks.
For instance, examples of OOD samples in a traffic sign recognition application include:
[Figure: example images in two groups: samples from the training set distribution; samples outside of the training set distribution]
Solution Overview
[Figure: network diagram. Labels: in-distribution training set; OOD training set; output layer; supervised learning for $D_{in}$ samples; self-supervised learning for $D_{out}$ samples]
✓ The problem we consider in this paper is detecting OOD outliers ($D_{out}$) using the same classifier $P_\theta(y \mid x)$ trained on the normal distribution ($D_{in}$).
✓ We add an auxiliary head to the network and use two-step training for the $D_{in}$ and $D_{out}$ distributions.
✓ We first use supervised training for $D_{in}$, followed by self-supervised training for the unlabeled $D_{out}$ set.
➢ 4) Inference: We use only one softmax function for all output classes. We take the sum of the softmax outputs of the OOD classes as the OOD-detection signal. Thus, OOD detection takes only one forward pass with no memory overhead.
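As a rough illustration of this inference rule, the sketch below computes the OOD-detection signal as the summed softmax mass on the reject classes. The logit layout (in-distribution classes first, reject classes last), the function names, and the rejection threshold of 0.5 are assumptions made for the example; the poster does not specify them.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ood_score(logits, num_in):
    """Sum of softmax outputs over the reject (OOD) classes.

    Assumption: indices >= num_in are the reject classes.
    """
    return sum(softmax(logits)[num_in:])

def predict_or_reject(logits, num_in, threshold=0.5):
    """Return (predicted in-distribution class or None, OOD score).

    `threshold` is a hypothetical operating point, not a value from the poster.
    """
    score = ood_score(logits, num_in)
    if score > threshold:
        return None, score  # rejected as OOD
    in_logits = logits[:num_in]
    return in_logits.index(max(in_logits)), score

print(predict_or_reject([10.0, 0.0, 0.0, 0.0], num_in=2))
print(predict_or_reject([0.0, 0.0, 10.0, 0.0], num_in=2))
```

Both the class prediction and the OOD score come from the same softmax vector, which is why a single forward pass suffices.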
References:
[1] Hendrycks et al. “A baseline for detecting misclassified and out-of-distribution examples in neural networks.” ICLR 2017.
[2] Liang et al. “Enhancing the reliability of out-of-distribution image detection in neural networks.” ICLR 2018.
[3] Pidhorskyi et al. “Generative probabilistic novelty detection with adversarial autoencoders.” NeurIPS 2018.
[4] Hendrycks et al. “Deep anomaly detection with outlier exposure.” ICLR 2019.
[Figure: Risk-coverage curves in the presence of mixed $D^{out}_{test}$, one panel for $D_{in}$: CIFAR-10 and one for $D_{in}$: CIFAR-100. x-axis: Test Coverage (in); y-axis: Total Classification Error; curves: Baseline, OE, Our Method.]
➢ OOD Detection Performance: To evaluate our method, we train and test our technique on multiple image datasets. Note that in all experiments we used different unlabeled OOD training and test sets. Table 1 compares our OOD detection performance with state-of-the-art methods.
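For readers unfamiliar with the metrics in Table 1, the following sketch shows one common way to compute AUROC and FPR at 0.95 TPR from detection scores. It assumes OOD samples are the positive class and that higher scores mean "more likely OOD"; this convention and the function names are assumptions, since the poster does not state how the metrics were computed. The functions return fractions in [0, 1], whereas Table 1 appears to report percentages.

```python
import math

def auroc(pos_scores, neg_scores):
    """Probability that a random positive scores above a random negative.

    Ties count as one half (rank-based definition of AUROC).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def fpr_at_tpr(pos_scores, neg_scores, tpr=0.95):
    """False-positive rate at the loosest threshold that detects at least
    a `tpr` fraction of the positives (score >= threshold means detected)."""
    ranked = sorted(pos_scores, reverse=True)
    needed = math.ceil(tpr * len(ranked))  # positives that must be detected
    threshold = ranked[needed - 1]
    false_pos = sum(1 for n in neg_scores if n >= threshold)
    return false_pos / len(neg_scores)

# Toy scores: OOD samples (positives) and in-distribution samples (negatives).
ood = [0.9, 0.8, 0.7, 0.6]
ind = [0.1, 0.2, 0.65, 0.3]
print(auroc(ood, ind), fpr_at_tpr(ood, ind))
```

The pairwise AUROC loop is quadratic in the number of samples, which is acceptable for a small illustration but would normally be replaced by a sort-based computation on full test sets.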