Simple baselines can fool 360º saliency metrics.
Yasser Dahou, Kevin McGuinness, Noel E. O’Connor
Dublin City University
The importance of visual attention
Why don’t we see the changes?
We don’t really see the whole image
We only focus on small specific regions: the salient parts
Human beings reliably attend to the same regions of images when shown
What we perceive
Where we look
What we actually see
Saliency prediction
Produce a computational model of visual attention: predict where humans will look.
Often want to map an image to a heatmap (saliency map).
How do we measure the quality of the predicted map against the ground-truth map?
Bylinskii, Zoya, et al. "What do different evaluation metrics tell us about saliency models?." IEEE transactions on pattern analysis and machine intelligence 41.3 (2018): 740-757.
Saliency metrics: functions that take two inputs representing eye fixations (ground truth and predicted) and then output a number assessing the similarity or dissimilarity between them.
Location-based metrics:
• Area under ROC Curve (AUC-J, AUC-Borji)
• Normalized Scanpath Saliency (NSS)
• Information Gain (IG)
Distribution-based metrics:
• Similarity (SIM)
• Pearson’s Correlation Coefficient (CC)
• Kullback-Leibler divergence (KL)
• Earth Mover’s Distance (EMD)
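To make the two metric families concrete, here is a minimal sketch of one location-based metric (NSS) and two distribution-based metrics (CC, KL). It assumes maps are small nested lists of non-negative floats and fixations are (row, column) pairs; the function names, the `eps` regularizer, and the handling of constant maps are illustrative assumptions, not the implementations used in the talk or in Bylinskii et al.

```python
import math

def flatten(m):
    """Flatten a 2D nested list into a 1D list."""
    return [v for row in m for v in row]

def nss(pred, fixations):
    """Normalized Scanpath Saliency: mean of the standardized
    prediction sampled at the fixated (row, col) locations."""
    vals = flatten(pred)
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    if std == 0:  # assumption: a constant map scores 0
        return 0.0
    return sum((pred[r][c] - mean) / std for r, c in fixations) / len(fixations)

def cc(pred, gt):
    """Pearson's correlation coefficient between two maps."""
    p, g = flatten(pred), flatten(gt)
    mp, mg = sum(p) / len(p), sum(g) / len(g)
    cov = sum((a - mp) * (b - mg) for a, b in zip(p, g))
    var_p = sum((a - mp) ** 2 for a in p)
    var_g = sum((b - mg) ** 2 for b in g)
    denom = math.sqrt(var_p * var_g)
    return cov / denom if denom > 0 else 0.0

def kl(pred, gt, eps=1e-12):
    """KL divergence of the prediction from the ground truth.
    Both maps are first normalized to sum to 1 (assumes non-zero sums);
    eps is an assumed regularizer that avoids division by zero."""
    p, g = flatten(pred), flatten(gt)
    sp, sg = sum(p), sum(g)
    p = [v / sp for v in p]
    g = [v / sg for v in g]
    return sum(q * math.log(eps + q / (eps + pi)) for pi, q in zip(p, g))
```

NSS and CC are similarity scores (higher is better), while KL is a dissimilarity (lower is better), which matters when ranking models by each metric.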
TABLE 4: Properties of the 8 evaluation metrics (with our specific implementations) considered in this paper.
ANALYSIS OF METRIC BEHAVIOR
Saliency formulation
Definition: The ground-truth saliency maps are computed by convolving the fixation or trajectory points (from all observers of one image), defined as:
with a Gaussian or Kent kernel. The resulting saliency map P ∈ [0,1] can be treated as a multivariate Bernoulli distribution in which each pixel is Bernoulli distributed, with probability p of being attended and (1−p) of being discarded.
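The construction described above can be sketched as follows, under simplifying assumptions: a planar isotropic Gaussian on the equirectangular grid (not a Kent kernel on the sphere), horizontal wrap-around because longitude is periodic, and division by the peak so that values land in [0, 1]. The function name, `sigma`, and the normalization are illustrative choices, not taken from the talk.

```python
import math

def fixation_density(fixations, height, width, sigma=2.0):
    """Sum one Gaussian bump per fixation (row, col) and scale to [0, 1]."""
    sal = [[0.0] * width for _ in range(height)]
    for fr, fc in fixations:
        for r in range(height):
            for c in range(width):
                # Assumption: columns wrap around (longitude is periodic).
                dc = min(abs(c - fc), width - abs(c - fc))
                d2 = (r - fr) ** 2 + dc ** 2
                sal[r][c] += math.exp(-d2 / (2 * sigma ** 2))
    peak = max(max(row) for row in sal)
    if peak == 0:  # no fixations: return the all-zero map unchanged
        return sal
    return [[v / peak for v in row] for row in sal]
```

Each value of the returned map can then be read as the per-pixel Bernoulli probability p of being attended. A planar Gaussian ignores the distortion near the poles of an equirectangular image, which is one reason a spherical kernel such as Kent may be preferred.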
Equirectangular saliency-map visualization of the four baselines. The upper row shows the density maps, whereas the bottom row applies a color map to the saliency maps to make them easier to read.
We sort the saliency models listed in Table 1 individually by each metric, and compute the Spearman rank correlation between every pair of metrics. The first 5 metrics listed are highly correlated. JSD is the least correlated with the other metrics, due to its high sensitivity to zero-valued predictions at fixated locations.
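The pairwise comparison described above can be sketched as below. It assumes each metric has one score per model, stored in the same model order; the dictionary layout and the function names are hypothetical. Spearman correlation is computed as Pearson correlation on ranks, with tied values sharing their average rank.

```python
import math

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    denom = math.sqrt(vx * vy)
    return cov / denom if denom > 0 else 0.0

def metric_rank_correlations(scores):
    """scores maps metric name -> list of per-model scores.
    Returns the Spearman correlation for every pair of metrics."""
    names = list(scores)
    return {
        (a, b): spearman(scores[a], scores[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

Dissimilarity metrics such as KL rank models in the opposite direction, so their scores would need to be negated (or the sign of the correlation flipped) before comparing them with similarity metrics.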
Thanks
Email: [email protected]