Multi-Label Remote Sensing Image Retrieval By Using Deep Features
Student: Michele Compri
Thesis advisors: Begüm Demir (Unitn), Xavier Girò-i-Nieto (UPC)
Remote Sensing Laboratory, Dept. of Information Engineering and Computer Science
University of Trento, Via Sommarive, 14, I-38123 Povo, Trento, Italy
E-mail: [email protected]
University of Trento, Italy
Outline
1. Introduction
2. Aim of the Thesis
3. Proposed Approach to Multi-Label RS Image Retrieval
4. Experimental Results
5. Conclusion
Introduction
✓ During the last decade, advances in remote sensing (RS) technology have led to an increased volume of RS images.
✓ Earth observation (EO) data archives are growing rapidly, motivating the need for efficient and effective content-based image retrieval (CBIR) methods.
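[Figure: CBIR pipeline with three stages: image representation, image matching, ranking. The query is represented by a feature vector v = (v_1, ..., v_n) and the k archive images by v_1 = (v_11, ..., v_1n), ..., v_k = (v_k1, ..., v_kn); matching uses a similarity metric (Euclidean distance, cosine similarity).]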
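The matching and ranking stages can be illustrated with a minimal sketch. It assumes the feature vectors are already available as NumPy arrays; the function name `rank_archive`, the toy values and the default `top_n` are illustrative and not taken from the thesis.

```python
import numpy as np

def rank_archive(query, archive, metric="cosine", top_n=20):
    """Return indices of the top_n archive vectors most similar to the query."""
    query = np.asarray(query, dtype=float)
    archive = np.asarray(archive, dtype=float)
    if metric == "euclidean":
        # Smaller distance means more similar.
        scores = np.linalg.norm(archive - query, axis=1)
        order = np.argsort(scores, kind="stable")
    elif metric == "cosine":
        # Larger cosine similarity means more similar.
        eps = 1e-12  # guards against division by zero for all-zero vectors
        norms = np.linalg.norm(archive, axis=1) * np.linalg.norm(query)
        scores = archive @ query / np.maximum(norms, eps)
        order = np.argsort(-scores, kind="stable")
    else:
        raise ValueError(f"unknown metric: {metric}")
    return order[:top_n]

# Toy archive of k = 4 images with n = 3 features each (made-up values).
archive = np.array([[1.0, 0.0, 0.0],
                    [0.9, 0.1, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])
top2_cosine = rank_archive(query, archive, metric="cosine", top_n=2)
top2_euclid = rank_archive(query, archive, metric="euclidean", top_n=2)
```

Both metrics return the same two nearest images for this toy query; on real features the two rankings can differ, because cosine similarity ignores vector length.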
Aim of the Thesis
✓ CBIR systems in RS usually assume, for both image representation and image matching, that each image is categorized under a single label.
✓ This strategy does not fit the complexity of RS images, each of which may be associated with multiple labels.
[Figure: example images with their single label and multi-labels]
Parking Lot: Cars, Pavement
Tennis Court: Bare-soil, Court, Grass, Trees
Airplane: Airplane, Cars, Grass, Trees
Proposed solution: investigate the effectiveness of different deep learning architectures in the framework of multi-label RS image retrieval.
Proposed Approach: General View
[Figure: system block diagram. Blocks: XTr; training set TTr; fine-tuning; pretrained deep CNN; fine-tuned deep CNN; feature extraction; retrieval; N retrieved images.]
The system is composed of three main stages:
● Pretrained architecture
● Fine-tuning
● Retrieval
Proposed Approach: Pretrained Architectures
✓ Since training a CNN takes a long time and a huge amount of data, models pretrained on ImageNet are considered.
✓ In particular, three different architectures pretrained on ImageNet are considered:
➢ VGG16: CNN with 16 weight layers, with intermediate max-pooling layers and 3 fully connected (FC) layers
➢ Inception V3: improved version of GoogLeNet, with more layers but fewer parameters, obtained by removing FC layers and using global average pooling
➢ ResNet50: deeper CNN characterized by residual layers that allow data to flow by skipping convolutional blocks
✓ Since RS images differ from the images in ImageNet, a fine-tuning approach is used to adapt the learned features to RS data.
Proposed Approach: Fine-Tuning
[Figure: fine-tuning diagram. Using the training set TTr, the pretrained deep CNN (architecture + classifier) becomes the fine-tuned deep CNN (architecture + new classifier). The new classifier and the high-level layers are trainable; the remaining layers are frozen.]
➢ Fine-tuning is a transfer learning strategy that reuses the generic features of a pretrained architecture while training only the top of the fine-tuned architecture.
➢ Fine-tuning consists of two phases:
■ Replace the classifier
■ Train only the top of the architecture
➢ Since multi-label annotations are considered, binary cross-entropy is used as the cost function, with sigmoid activations at the output.
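The choice of sigmoid outputs with binary cross-entropy can be illustrated with a small NumPy sketch. The logits and targets below are made-up values, and the function names are illustrative; this is not the thesis code, only a demonstration that each label is scored independently.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function, one independent probability per label."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all labels and samples."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    per_label = -(y_true * np.log(y_prob)
                  + (1.0 - y_true) * np.log(1.0 - y_prob))
    return per_label.mean()

# One image with 4 hypothetical labels; labels 0 and 2 are present.
logits = np.array([[2.0, -1.0, 0.5, -3.0]])
y_true = np.array([[1.0, 0.0, 1.0, 0.0]])
probs = sigmoid(logits)
loss = binary_cross_entropy(y_true, probs)
```

Unlike a softmax, the sigmoid probabilities do not have to sum to one, so several labels can be active for the same image.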
Proposed Approach: Feature Extraction
[Figure: feature extraction. Test-set images are passed through the fine-tuned deep CNN (VGG16 shown: Block 1 to Block 5, classifier, output) and a feature vector v = (v_1, ..., v_n) is extracted for retrieval.]
Proposed Approach: Retrieval
[Figure: image retrieval. The feature vector v = (v_1, ..., v_n) extracted by the fine-tuned deep CNN (VGG16 shown: Block 1 to Block 5, classifier, output) is matched against the image dataset (image matching) to retrieve similar images.]
Data Set Description
UC Merced Land Use benchmark archive: 2100 images categorized under 21 land-cover classes (broad categories, single label) and annotated with 17 primitive classes (multi-labels).
Multi-labels (primitive classes): Field, Trees, Airplane, Bare-soil, Chaparral, Buildings, Grass, Sea, Sand, Pavement, Mobile-home, Cars, Ship, Dock, Water, Tanks, Court
Single labels (broad categories): Agricultural, Airplane, Baseball Diamond, Beach, Buildings, Chaparral, Dense Residential, Forest, Freeway, Golf Course, Harbor, Intersection, Medium Residential, Mobile Home Park, Overpass, Parking Lot, River, Runway, Sparse Residential, Storage Tanks, Tennis Court
Examples (single label: multi-labels):
Airplane: Airplane, Cars, Grass, Trees
Parking Lot: Cars, Pavement
Tennis Court: Bare-soil, Court, Grass, Trees
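To train with multi-labels, each image's label set is typically turned into a binary vector with one entry per primitive class. The sketch below shows one way to do this for the 17 classes listed on this slide; the alphabetical class order and the helper name `encode_labels` are assumptions made for illustration.

```python
# The 17 primitive classes, in an assumed (alphabetical) order.
PRIMITIVE_CLASSES = [
    "Airplane", "Bare-soil", "Buildings", "Cars", "Chaparral", "Court",
    "Dock", "Field", "Grass", "Mobile-home", "Pavement", "Sand", "Sea",
    "Ship", "Tanks", "Trees", "Water",
]

def encode_labels(labels, classes=PRIMITIVE_CLASSES):
    """Return a multi-hot vector: 1 where the class is present, else 0."""
    index = {name: i for i, name in enumerate(classes)}
    vector = [0] * len(classes)
    for label in labels:
        vector[index[label]] = 1  # raises KeyError for an unknown label
    return vector

# The two example images from the slide.
parking_lot = encode_labels(["Cars", "Pavement"])
tennis_court = encode_labels(["Bare-soil", "Court", "Grass", "Trees"])
```

Each vector has length 17 and can be used directly as the target of a sigmoid output layer.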
Experimental Setup
✓ The framework used is Keras, a deep learning Python library running on top of Theano, a numerical computation library.
✓ The dataset is split into an 80% training set and a 20% test set.
✓ Different values of each hyperparameter were tested during fine-tuning.
Name            Initial   Final
Optimizer       SGD       Adam
Learning rate   0.001     0.01
Weight decay    0         0.3678
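The 80%/20% split mentioned on this slide can be sketched as follows. The slides do not say how the split was drawn, so a uniformly random shuffle with a fixed seed is assumed here; the function name and the seed are illustrative.

```python
import numpy as np

def split_indices(num_images, train_fraction=0.8, seed=0):
    """Shuffle image indices and split them into training and test parts."""
    rng = np.random.default_rng(seed)  # fixed seed for reproducibility
    shuffled = rng.permutation(num_images)
    num_train = int(round(train_fraction * num_images))
    return shuffled[:num_train], shuffled[num_train:]

# The archive contains 2100 images.
train_idx, test_idx = split_indices(2100)
```

With 2100 images this yields 1680 training images and 420 test images.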
Experimental Setup
✓ For fine-tuning, each architecture is split into fine-tuned layers and frozen layers.
✓ Fine-tuned layers: during the training phase, the weights of these layers are updated according to the considered archive.
✓ Frozen layers: the part of the architecture whose weights do not change (generic features).
Architecture    Fine-tuned layers (top)
VGG-16          14-18
Inception V3    172-217
ResNet 50       152-174
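The frozen/fine-tuned split in the table above can be expressed as one boolean flag per layer. The sketch below is framework-free; it assumes the layer ranges are zero-based and inclusive, which the slides do not state, and the layer count used in the example is hypothetical.

```python
# Fine-tuned layer ranges from the table above; indices are assumed
# to be zero-based and inclusive, which the slides do not state.
FINE_TUNED_RANGES = {
    "VGG-16": (14, 18),
    "Inception V3": (172, 217),
    "ResNet 50": (152, 174),
}

def trainable_flags(num_layers, first, last):
    """Return one boolean per layer: True if fine-tuned, False if frozen."""
    if not 0 <= first <= last < num_layers:
        raise ValueError("fine-tuned range must lie inside the network")
    return [first <= i <= last for i in range(num_layers)]

# Hypothetical layer count, chosen so that the range ends at the last layer.
first, last = FINE_TUNED_RANGES["VGG-16"]
flags = trainable_flags(num_layers=last + 1, first=first, last=last)
num_trainable = sum(flags)
num_frozen = len(flags) - num_trainable
```

With a Keras model, the same flags could be applied by assigning each flag to the `trainable` attribute of the corresponding entry of `model.layers` before compiling.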
[Figure: the new classifier and the high-level layers are trainable; the remaining layers are frozen.]
Experimental Results
Architecture    Accuracy   Precision   Recall
VGG-16          58.22%     69.40%      69.95%
Inception V3    52.15%     63.08%      62.64%
ResNet 50       66.89%     76.27%      78.06%
✓ Baseline experiment: performance of the original pretrained deep architectures when retrieving the 20 most similar images.
✓ Three metrics are used to evaluate performance: accuracy, precision and recall.
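The slides do not define the three metrics. The sketch below assumes a label-set definition commonly used for multi-label retrieval: for a query label set Q and a retrieved image's label set R, accuracy is |Q ∩ R| / |Q ∪ R|, precision is |Q ∩ R| / |R| and recall is |Q ∩ R| / |Q|, each averaged over the retrieved images. The function names and the example label sets are illustrative.

```python
def multilabel_scores(query_labels, retrieved_labels):
    """Accuracy, precision and recall between two label sets (assumed definition)."""
    q, r = set(query_labels), set(retrieved_labels)
    shared = len(q & r)
    accuracy = shared / len(q | r) if q | r else 0.0
    precision = shared / len(r) if r else 0.0
    recall = shared / len(q) if q else 0.0
    return accuracy, precision, recall

def mean_scores(query_labels, retrieved_list):
    """Average the three scores over all retrieved images."""
    if not retrieved_list:
        raise ValueError("at least one retrieved image is required")
    scores = [multilabel_scores(query_labels, labels) for labels in retrieved_list]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))

# Hypothetical query and two retrieved images.
query = ["Bare-soil", "Buildings", "Cars", "Grass", "Pavement", "Trees"]
retrieved = [
    ["Buildings", "Cars", "Grass", "Pavement", "Trees"],
    ["Buildings", "Cars", "Court", "Pavement", "Trees"],
]
accuracy, precision, recall = mean_scores(query, retrieved)
```

Under this definition accuracy can never exceed precision or recall, which agrees with the ordering of the values reported in the tables.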
Experimental Results
✓ Performance of the fine-tuned architectures on the top 20 retrieved images:
Architecture    Accuracy   Precision   Recall
VGG-16          70.97%     80.54%      81.61%
Inception V3    66.97%     76.69%      77.53%
ResNet50        72.51%     82.18%      83.05%
✓ Gain of fine-tuning with respect to the models pretrained on ImageNet (absolute difference, in percentage points):
Architecture    Accuracy   Precision   Recall
VGG-16          +12.75%    +11.14%     +11.66%
Inception V3    +14.82%    +13.61%     +14.89%
ResNet50        +5.62%     +5.91%      +4.99%
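The gain figures can be reproduced by subtracting the baseline results from the fine-tuned results. The short sketch below does this with the values reported on the slides; the dictionary layout and function name are illustrative.

```python
# (accuracy, precision, recall) in percent, copied from the slides.
baseline = {
    "VGG-16": (58.22, 69.40, 69.95),
    "Inception V3": (52.15, 63.08, 62.64),
    "ResNet50": (66.89, 76.27, 78.06),
}
fine_tuned = {
    "VGG-16": (70.97, 80.54, 81.61),
    "Inception V3": (66.97, 76.69, 77.53),
    "ResNet50": (72.51, 82.18, 83.05),
}

def gains(before, after):
    """Absolute gain in percentage points, rounded to two decimals."""
    return {
        name: tuple(round(a - b, 2) for b, a in zip(before[name], after[name]))
        for name in before
    }

gain = gains(baseline, fine_tuned)
```

The differences match the reported gains, and they show that the architecture with the weakest baseline (Inception V3) benefits most from fine-tuning.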
Experimental Results
✓ Performance of an SVM using SIFT features compared with the fine-tuned ResNet50 architecture:
Method      Accuracy   Precision   Recall
SVM         70.39%     80.32%      76.08%
ResNet50    72.51%     82.18%      83.05%
Experimental Results
[Figure: retrieval example showing the images retrieved at ranks 1, 10 and 20 by VGG16, Inception V3 and ResNet50]
Query: Intersection (Bare-soil, Buildings, Cars, Grass, Pavement, Tree)
Retrieved images (category: multi-labels):
Intersection: Buildings, Cars, Grass, Pavement, Tree
Intersection: Buildings, Cars, Grass, Pavement, Tree
Intersection: Bare-soil, Buildings, Cars, Grass, Pavement, Tree
Tennis Court: Buildings, Cars, Court, Pavement, Tree
Sparse Residential: Buildings, Grass, Pavement, Tree
Medium Residential: Buildings, Cars, Grass, Tree
Intersection: Bare-soil, Buildings, Cars, Grass, Pavement, Tree
Intersection: Bare-soil, Buildings, Cars, Grass, Pavement, Tree
Intersection: Bare-soil, Buildings, Cars, Grass, Pavement, Tree
Conclusion
✓ Unlike existing CBIR systems, the proposed approach retrieves multi-label RS images, and the effectiveness of different deep learning architectures for this task is investigated.
✓ Three architectures pretrained on ImageNet are considered: VGG16, Inception V3 and ResNet50.
✓ These off-the-shelf models are fine-tuned with a subset of RS images and their multi-label information.
✓ The retrieval experiments show that both the architectures and the fine-tuning strategy are effective for multi-label RS image retrieval.
✓ Future developments:
▪ Analyze different architectures
▪ Consider data augmentation
▪ Collect more data to train the architectures from scratch
THANKS FOR YOUR ATTENTION !