Conv. Network for Segmentation of Mixed Crops
Aarhus University
A. K. Mortensen (1), M. Dyrmann (2), H. Karstoft (3), R. N. Jørgensen (3) and R. Gislum (1)
(1) Aarhus University, Department of Agroecology
(2) University of Southern Denmark, Maersk Mc-Kinney Moller Institute
(3) Aarhus University, Department of Engineering
5. References:
[1] Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. Intl. Conf. on Learning Representations (ICLR), 1–14.
[2] Long, J., Shelhamer, E., & Darrell, T. (2015). Fully Convolutional Networks for Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3431–3440.
[3] Sukhbaatar, S., Bruna, J., Paluri, M., Bourdev, L., & Fergus, R. (2014). Training Convolutional Networks with Noisy Labels, 1–10. Retrieved from http://arxiv.org/abs/1406.2080

1. Problem:
· What?
· Pixel-wise classification of top-view images of mixed crops
· Why?
· Yield estimation
· Nitrogen-uptake estimation
· Variable nitrogen application to fields
· Optimal harvest time

2. Method:
· Fine-tuning of a pre-trained deep convolutional neural network
· Based on VGG-16D for image classification [1]
· Adapted to pixel-wise classification [2]:
· Fully connected layers converted to convolutional layers
· A deconvolutional layer inserted
· First fine-tuned on PASCAL-Context [2]
· Then fine-tuned on our own data
· Data set
· 48 images of 400 x 400 pixels
· 75 % used for training
· 7 classes: Unknown, Radish, Barley/Grass, Weed, Stump, Soil and Equipment
· Data augmentation to increase the data set
· Flip left/right, flip up/down and transpose
· Learning parameters
· Learning rate: 10^-9
· Per-pixel learning rate: ~1.6 x 10^-4
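The two rates are consistent if the loss is summed, not averaged, over the 400 x 400 pixels of an image; this is a plausible reading of the poster, not something it states explicitly:

```python
import math

# Hypothetical reconciliation (assumption, not from the poster): with a
# loss summed over all pixels of a 400 x 400 image, a global learning
# rate of 1e-9 acts like a per-pixel rate of 1e-9 * 160000.
pixels_per_image = 400 * 400
global_lr = 1e-9
per_pixel_lr = global_lr * pixels_per_image

# per_pixel_lr is ~1.6e-4, matching the poster's stated per-pixel rate
assert math.isclose(per_pixel_lr, 1.6e-4, rel_tol=1e-9)
```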
[Figure: augmented image examples — Original; Flip left/right; Flip up/down; Flip l/r + u/d; Transpose; T + flip l/r; T + flip u/d; T + flip l/r + u/d]
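The two flips and the transpose combine into the eight orientations listed above; a minimal sketch in plain Python, using a hypothetical 2 x 2 "image":

```python
def flip_lr(img):
    """Mirror each row (flip left/right)."""
    return [row[::-1] for row in img]

def flip_ud(img):
    """Reverse the row order (flip up/down)."""
    return img[::-1]

def transpose(img):
    """Swap rows and columns."""
    return [list(row) for row in zip(*img)]

img = [[1, 2],
       [3, 4]]

# Original, flip l/r, flip u/d, flip l/r + u/d, then the same four after
# a transpose: 8 distinct orientations, i.e. an 8x larger training set.
variants = []
for base in (img, transpose(img)):
    variants += [base, flip_lr(base), flip_ud(base), flip_ud(flip_lr(base))]

assert len({tuple(map(tuple, v)) for v in variants}) == 8
```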
[Figure: network architecture — VGG-16 convolution layers with 3 x 3 kernels (2 x 64, 2 x 128, 3 x 256, 3 x 512 and 3 x 512), each group followed by a 2 x 2 max pool with stride 2; two layers of 4096 1 x 1 kernels; one layer of 1000 1 x 1 kernels; and a deconvolution layer with 7 64 x 64 kernels, stride 32]
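The "fully connected layers converted to convolutional layers" step of the method can be illustrated with a small numpy sketch. The shapes here are hypothetical stand-ins (in VGG-16 the first dense layer has 4096 outputs over a 7 x 7 x 512 feature window), but the equivalence is the same:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small shapes standing in for VGG-16's first dense layer.
c_in, k, n_out = 3, 2, 5
W_fc = rng.standard_normal((n_out, c_in * k * k))  # dense weight matrix
feat = rng.standard_normal((c_in, k, k))           # one spatial window

# Dense layer applied to the flattened window.
dense_out = W_fc @ feat.reshape(-1)

# Same weights viewed as n_out convolution kernels of shape (c_in, k, k),
# applied at a single spatial position: identical result, but as a
# convolution it slides over inputs larger than the training crop.
W_conv = W_fc.reshape(n_out, c_in, k, k)
conv_out = np.tensordot(W_conv, feat, axes=([1, 2, 3], [0, 1, 2]))

assert np.allclose(dense_out, conv_out)
```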
3. Results:
· Pixel-wise classification:
· Pixel accuracy: 79 %
· Frequency-weighted IoU: 66 %
· Ensemble classification:
· Pixel accuracy: 80 %
· Frequency-weighted IoU: 67 %
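For reference, both metrics follow directly from a confusion matrix; a sketch with made-up numbers (a hypothetical 3-class matrix, not the poster's results):

```python
import numpy as np

# Hypothetical confusion matrix: C[i, j] = pixels of true class i
# predicted as class j (made-up values for illustration only).
C = np.array([[50, 5, 5],
              [4, 30, 6],
              [1, 2, 17]])

total = C.sum()
pixel_accuracy = np.trace(C) / total          # correct pixels / all pixels

tp = np.diag(C)
iou = tp / (C.sum(1) + C.sum(0) - tp)         # per-class intersection/union
freq = C.sum(1) / total                       # fraction of pixels per class
fw_iou = (freq * iou).sum()                   # frequency-weighted IoU
```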
4. Discussion and conclusion:
· Works relatively well, but only tested on a small labelled data set
· Large unlabelled data set + small labelled data set → semi-supervised learning?
· How?
· Auto-encoders?
· Learning a noise model? [3]
[Figure: noise-layer training setup. Source: Sukhbaatar et al. 2014 [3]]
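The noise-model idea of [3] places a learned stochastic matrix Q on top of the network's softmax, so predictions over clean classes are mapped to probabilities over the noisy labels actually observed. A sketch of that forward mapping with hypothetical numbers:

```python
import numpy as np

# Hypothetical 3-class example (values made up for illustration).
p_clean = np.array([0.7, 0.2, 0.1])   # network softmax over true classes

# Q[i, j] = P(noisy label j | true class i); rows sum to 1. In [3] this
# matrix is learned jointly with the network.
Q = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])

p_noisy = p_clean @ Q                 # probabilities over noisy labels
assert np.isclose(p_noisy.sum(), 1.0)
```

Training against `p_noisy` lets the cheap, unlabelled or weakly labelled part of the data set carry the noise, which is why it fits the semi-supervised question raised above.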