
Airbus Ship Detection - Traditional vs. Convolutional Neural Network Approach

Junwen Zheng, Ying Chen, Zhengqing Zhou

CS 229 Machine Learning Project, Professor Andrew Ng

Introduction

Fast-growing shipping traffic increases the chance of infractions at sea. Comprehensive maritime monitoring services help support the maritime industry by increasing knowledge, anticipating threats, and improving efficiency at sea. This challenge originates in part from the Airbus Ship Detection Challenge on Kaggle. We developed classifiers that efficiently determine whether a satellite image contains any ship, using both machine learning and deep learning approaches. Various scenes, including open water, wharves, buildings, and clouds, appear in the dataset.

Figure 1: Example Images from Dataset

[1] Kevin Mader, Transfer Learning For Boat or No-Boat, Kaggle, 2018.

[2] Gogul09, Image Classification using Python and Machine Learning, GitHub repository, 2017.

[3] Gao Huang, Zhuang Liu, Laurens van der Maaten and Kilian Q. Weinberger, Densely Connected Convolutional Networks, CVPR, 2017.

Methodology

Experiment

Results & Discussion

Future Work

References

Linear Discriminant Analysis (LDA): The algorithm finds a linear combination of features that characterizes and separates the two classes, estimating the mean and variance of each class.

K-Nearest Neighbors (KNN): The algorithm classifies an object by a majority vote of its neighbors, assigning the object to the class most common among its k nearest neighbors.

Naive Bayes (NB): The algorithm is a probabilistic model that applies Bayes' theorem with a strong (naive) independence assumption between features.

Random Forest (RF): The algorithm is an ensemble learning method that constructs a multitude of decision trees and outputs the class chosen by the majority of the individual trees. We use 70 trees in the forest.

Support Vector Machine (SVM): The algorithm finds the maximum margin between different classes by determining the weights and bias of the separating hyperplane; we use an RBF kernel.
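As a reference, the sketch below shows one way to set up these five classifiers with scikit-learn and score them with 10-fold cross-validation (cf. Figure 6). Apart from the 70 trees and the RBF kernel stated above, the hyperparameters are assumptions, and X/y stand for a precomputed global feature matrix and ship/no-ship labels.

```python
# Hypothetical scikit-learn sketch of the five traditional classifiers.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def evaluate_traditional_models(X, y):
    """X: (n_images, n_features) global feature matrix; y: 0/1 ship labels."""
    models = {
        "LDA": LinearDiscriminantAnalysis(),
        "KNN": KNeighborsClassifier(),
        "NB": GaussianNB(),
        "RF": RandomForestClassifier(n_estimators=70),  # 70 trees, as stated above
        "SVM": SVC(kernel="rbf"),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
        print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```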

(a) Image labeled by “has Boats”

(b) Image labeled by “No Boats”

(c) Image labeled by “No Boats”

Traditional Machine Learning Approach

Convolutional Neural Network (CNN) Approach

Figure 4: Feature Engineering Examples

Input Image (256x256x3) → Conv. & Max Pool (128x128x16) → Conv. & Max Pool (64x64x32) → Conv. & Max Pool (32x32x64) → Conv. & Max Pool (16x16x128) → FC (32768) → FC (128) → Sigmoid (1)
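The following is a minimal Keras sketch of this SIM-CNN stack (Figure 3). Only the layer shapes come from the figure; the 3x3 kernels, ReLU activations, and Adam optimizer are assumptions.

```python
# Sketch of the SIM-CNN architecture; unspecified hyperparameters are assumptions.
from tensorflow.keras import Input, layers, models

def build_sim_cnn(input_shape=(256, 256, 3)):
    model = models.Sequential([Input(shape=input_shape)])
    for filters in (16, 32, 64, 128):               # 4 conv + max-pooling blocks
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(2))           # halves the spatial size
    model.add(layers.Flatten())                     # 16*16*128 = 32768 units ("FC (32768)")
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(1, activation="sigmoid"))  # P(image contains a ship)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```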

The CNN approach can efficiently capture relevant features from different locations of an image. It takes an image as input, passes it through hidden layers such as convolutional, pooling, and fully connected layers, and outputs the predicted probability that the image contains a ship.

Figure 2: Transfer Learning CNN (TL-CNN) Model Framework

Input Image (256x256x3) → Pretrained ImageNet DenseNet169 (frozen) (9x9x1664) → Batch Norm. + Dropout + Max Pool → FC (1664) → FC (128) → Sigmoid (1)
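Below is a hedged Keras sketch of one possible reading of this TL-CNN head, with a frozen ImageNet-pretrained DenseNet169 backbone. The dropout rate, the use of global max pooling to obtain the 1664-d vector, and the optimizer are assumptions.

```python
# Sketch of the TL-CNN: frozen DenseNet169 features + small classification head.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet169

backbone = DenseNet169(include_top=False, weights="imagenet",
                       input_shape=(256, 256, 3))
backbone.trainable = False                      # freeze the pretrained weights

tl_cnn = models.Sequential([
    backbone,                                   # 1664-channel feature maps
    layers.BatchNormalization(),
    layers.Dropout(0.5),                        # dropout rate is an assumption
    layers.GlobalMaxPooling2D(),                # -> 1664-d vector (the "FC (1664)" box)
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),      # P(image contains a ship)
])
tl_cnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```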

Dataset: We use the public dataset provided on the Kaggle Airbus Ship Detection Challenge website. We initially implemented the methods on a subset of 10k training images and 5k test images. All images have been resized to 256 x 256 x 3 using the cv2 package in Python.
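A minimal sketch of this preprocessing step with OpenCV (the file path handling is hypothetical):

```python
# Load a satellite image and resize it to 256 x 256 x 3.
import cv2

def load_and_resize(path, size=(256, 256)):
    image = cv2.imread(path)        # BGR image as a NumPy array
    return cv2.resize(image, size)  # shape (256, 256, 3)
```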

Table 1: Traditional ML Approach Comparison (w/ Feature Engineering)

Result Analysis:
1. Among all the ML algorithms, Random Forest achieves the highest test accuracy.
2. In general, feature engineering improves the performance of the traditional ML algorithms.
3. “More is less”: some algorithms perform significantly better with only certain combinations of features than with all of them.

We implemented two different CNN models. To save training time and memory, we first reproduced the TL-CNN model (using DenseNet [3]) from [1]; see Figure 2 for more details. We also designed a simple CNN model, which consists of 4 convolutional layers and 4 max-pooling layers; see Figure 3 for more details.

Accu.   His   Ha    Hu    His+Ha   Ha+Hu   Hu+His   w/ All   w/o
LDA     83%   86%   83%   87%      86%     83%      87%      74%
KNN     83%   79%   83%   79%      79%     83%      79%      78%
NB      49%   84%   49%   75%      84%     49%      75%      42%
RF      94%   90%   94%   95%      91%     94%      94%      85%
SVM     85%   84%   85%   83%      84%     85%      83%      65%

Result Analysis:
1. Both training processes converge after 30 epochs.
2. The Simple CNN model gives a lower training loss (0.2) than the TL-CNN model (0.4).
3. The Simple CNN model gives a higher training accuracy (93.24%) than the pretrained TL-CNN model (83.59%).

Result Analysis:
1. Among the traditional methods, Random Forest has the smallest variance and the highest mean accuracy.
2. The Simple CNN model outperforms all the traditional methods, as expected.

• We plan to extract global features along with local features such as SIFT, SURF or DENSE, which could be used with the Bag of Visual Words (BOVW) technique.
• For the traditional methods, we can apply data augmentation.
• Implement different networks (e.g., deeper networks) to train the classifier.
• Come up with a smart input data sampling method that can balance images of different scenes/backgrounds.
• Apply segmentation techniques to identify the locations of all ships in an image.

[Figure 5 plot: binary cross-entropy loss (0-1.4) and accuracy (0-100%) versus epoch (1-30) for the TL-CNN and SIM-CNN models.]

Figure 6: Cross Validation (k = 10) Accuracy of Traditional Methods Using All Extracted Features and Test Accuracy of CNN Models

Figure 5: Training Loss and Accuracy of CNN Models for 30 Epochs

Feature Engineering for Traditional ML Algorithms:
We used hand-engineered feature extraction methods [2] to obtain three different global features for the traditional ML algorithms. The images were converted to grayscale for Hu and Ha, and to HSV color space for His, before extraction, as shown in Figure 4.
- Hu Moments (Hu) features were used to capture the general shape information.
- Color Histogram (His) features were applied to quantify the color information.
- Haralick Textures (Ha) features were extracted to describe the texture.
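A sketch of these three extractors, following the approach in [2], is given below; the histogram bin counts and the normalization step are assumptions.

```python
# Extract the three global features (Hu, Ha, His) from one BGR image.
import cv2
import mahotas
import numpy as np

def extract_global_features(image_bgr):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)

    hu = cv2.HuMoments(cv2.moments(gray)).flatten()       # 7 shape moments
    ha = mahotas.features.haralick(gray).mean(axis=0)     # 13 texture statistics
    hist = cv2.calcHist([hsv], [0, 1, 2], None, [8, 8, 8],
                        [0, 180, 0, 256, 0, 256])
    cv2.normalize(hist, hist)
    his = hist.flatten()                                   # 512-bin color histogram

    return np.hstack([hu, ha, his])                        # the "w/ All" feature vector
```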

Image Augmentation for CNN:
To improve the robustness of our network, we augmented the training data by rotating, flipping, shifting, and zooming the training images.
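A minimal sketch of such augmentation with Keras' ImageDataGenerator; the specific rotation, shift, and zoom ranges are assumptions.

```python
# Randomly rotate, flip, shift, and zoom training images on the fly.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_augmenter = ImageDataGenerator(
    rotation_range=15,         # random rotations
    horizontal_flip=True,      # random flips
    vertical_flip=True,
    width_shift_range=0.1,     # random shifts
    height_shift_range=0.1,
    zoom_range=0.1,            # random zooms
    rescale=1.0 / 255,
)
```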

(a) Original Image (b) Grayscale Image (c) Original Image (d) HSV Image

Figure 3: Simple CNN (SIM-CNN) Model Framework