Conv. Network for Segmentation of Mixed Crops




1 Aarhus University, Department of Agroecology

2 University of Southern Denmark, Maersk Mc-Kinney Moller Institute

3 Aarhus University, Department of Engineering

A. K. Mortensen¹, M. Dyrmann², H. Karstoft³, R. N. Jørgensen³ and R. Gislum¹

5. References:

[1] Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. Intl. Conf. on Learning Representations (ICLR), 1–14.

[2] Long, J., Shelhamer, E., & Darrell, T. (2015). Fully Convolutional Networks for Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3431–3440.

[3] Sukhbaatar, S., Bruna, J., Paluri, M., Bourdev, L., & Fergus, R. (2014). Training Convolutional Networks with Noisy Labels, 1–10. Retrieved from http://arxiv.org/abs/1406.2080

1. Problem:

· What?
  · Pixel-wise classification of top-view images of mixed crops
· Why?
  · Yield estimation
  · Nitrogen-uptake estimation
  · Variable nitrogen application to fields
  · Optimal harvest time

2. Method:

· Fine-tuning of a pre-trained deep convolutional neural network
  · Based on VGG-16D for image classification [1]
  · Adapted to pixel-wise classification [2]:
    · Convert fully connected layers to convolutional layers
    · Insert a deconvolutional layer
  · First, fine-tuned on PASCAL-Context [2]
  · Then, fine-tuned on own data
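The fully-connected-to-convolutional conversion above can be checked numerically: an FC layer applied to a flattened feature map gives exactly the same output as a convolution whose kernels span the whole map, which is what lets the network accept inputs of any size. A minimal NumPy sketch (illustrative only, not the authors' implementation; toy sizes stand in for VGG's 512 x 7 x 7 → 4096 layers):

```python
import numpy as np

# Illustrative check of the FCN conversion [2]: a fully connected layer
# acting on a flattened C x H x W feature map equals a convolution whose
# kernels cover the entire map, evaluated at a single position.
rng = np.random.default_rng(0)
C, H, W, OUT = 3, 4, 4, 5            # toy sizes (assumed for illustration)
x = rng.standard_normal((C, H, W))   # feature map from the conv stack
fc_weights = rng.standard_normal((OUT, C * H * W))

fc_out = fc_weights @ x.ravel()      # ordinary fully connected layer

kernels = fc_weights.reshape(OUT, C, H, W)   # same weights, viewed as kernels
conv_out = np.array([(k * x).sum() for k in kernels])  # conv at one position

assert np.allclose(fc_out, conv_out)
```

On larger inputs the convolutional form simply slides these kernels, producing a coarse score map instead of a single vector.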

· Data set
  · 48 images
  · 400 x 400 pixels
  · 75% used for training
  · 7 classes
    · Unknown, Radish, Barley/Grass, Weed, Stump, Soil, Equipment

· Data augmentation to increase the data set
  · Flip left/right, flip up/down and transpose
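Combining the three transforms above yields the eight symmetries of a square image, so each labelled image becomes eight training samples; the same transforms must be applied to the label masks so pixels and labels stay aligned. A sketch of the scheme (illustrative, not the authors' code):

```python
import numpy as np

# Generate the 8 square symmetries used for augmentation: original and
# transpose, each also flipped left/right, up/down, and both.
def augment(img):
    variants = []
    for base in (img, img.T):                    # original and transpose
        variants.append(base)
        variants.append(np.fliplr(base))         # flip left/right
        variants.append(np.flipud(base))         # flip up/down
        variants.append(np.flipud(np.fliplr(base)))  # both flips
    return variants

img = np.arange(16).reshape(4, 4)
aug = augment(img)
# All 8 variants are distinct for a generic (asymmetric) image.
assert len({a.tobytes() for a in aug}) == 8
```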

· Learning parameters
  · Learning rate: 10⁻⁹
  · Per-pixel learning rate: ~1.6 × 10⁻⁴
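The two rates are presumably linked by the image size: a global rate applied to a loss summed over all 400 x 400 pixels corresponds to an effective per-pixel rate larger by the pixel count. A quick check of that reading (my inference; the poster lists both values without stating the relation):

```python
# Per-pixel rate as the global rate scaled by pixels per image
# (inferred relation, not stated explicitly on the poster).
global_lr = 1e-9
pixels = 400 * 400              # 160,000 pixels per image
per_pixel_lr = global_lr * pixels
assert abs(per_pixel_lr - 1.6e-4) < 1e-12
```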

[Figure: the eight augmentation variants: Original; flip left/right; flip up/down; flip l/r + u/d; transpose; transpose + flip l/r; transpose + flip u/d; transpose + flip l/r + u/d]

[Figure: network architecture. VGG-16D convolutional stack: 64, 64 (3 x 3 kernels), 2 x 2 max pool (stride 2); 128, 128 (3 x 3), pool; 256, 256, 256 (3 x 3), pool; 512, 512, 512 (3 x 3), pool; 512, 512, 512 (3 x 3), pool; converted fully connected layers as 4096, 4096 and 1000 1 x 1 kernels; final deconvolutional layer with 7 64 x 64 kernels, stride 32.]

3. Results:
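The stride-32 deconvolution mirrors the cumulative downsampling of the five pooling stages, which halve each spatial dimension five times (2⁵ = 32). A quick size check, assuming floor division at each pool (exact border sizes depend on padding, which the poster does not state):

```python
# Five 2x2/stride-2 max pools shrink each dimension by 2**5 = 32, so a
# stride-32 transposed convolution brings the coarse score map back near
# input resolution (FCNs crop/pad to match exactly [2]).
size = 400                      # input images are 400 x 400 pixels
for _ in range(5):              # five pooling stages in VGG-16
    size //= 2                  # each pool halves the spatial size
assert size == 12               # 400 -> 200 -> 100 -> 50 -> 25 -> 12
upsampled = size * 32           # stride-32 deconv scales back up by 32
assert upsampled == 384         # close to 400; the remainder is cropped/padded
```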

· Pixel-wise classification:
  · Pixel accuracy: 79%
  · Frequency-weighted IoU: 66%
· Ensemble classification:
  · Pixel accuracy: 80%
  · Frequency-weighted IoU: 67%
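Both reported metrics can be computed from a per-class confusion matrix; a minimal NumPy sketch of the standard definitions (following the metric definitions in the FCN paper [2], not the authors' evaluation code):

```python
import numpy as np

# Pixel accuracy and frequency-weighted IoU from a confusion matrix C,
# where C[i, j] counts pixels of true class i predicted as class j.
def pixel_accuracy(C):
    return np.trace(C) / C.sum()

def frequency_weighted_iou(C):
    true_per_class = C.sum(axis=1)          # pixels of each true class
    pred_per_class = C.sum(axis=0)          # pixels predicted as each class
    tp = np.diag(C)                         # correctly classified pixels
    union = true_per_class + pred_per_class - tp
    iou = tp / np.maximum(union, 1)         # per-class intersection over union
    freq = true_per_class / C.sum()         # class frequency weights
    return (freq * iou).sum()

# Toy 2-class example: 80 + 15 of 100 pixels correct.
C = np.array([[80, 5],
              [0, 15]])
assert pixel_accuracy(C) == 0.95
```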

4. Discussion and conclusion:

· Works relatively well, but only tested on a small labelled data set
· Large unlabelled data set + small labelled data set → semi-supervised learning?
  · How?
    · Auto-encoders?
    · Learning a noise model? [3]

[Figure: noise-model architecture. Source: Sukhbaatar et al. 2014 [3]]
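The noise-model idea from [3] adds a linear layer that passes the network's clean-label predictions through a label-confusion matrix Q, so training can use noisy labels while Q absorbs the systematic labelling noise. A minimal NumPy illustration of that forward pass (toy numbers of my choosing, not from the paper):

```python
import numpy as np

# Linear noise layer sketch after Sukhbaatar et al. [3]: the network
# predicts a distribution p over true classes; a row-stochastic matrix Q,
# with Q[i, j] = P(noisy label j | true label i), maps it to the
# distribution over observed (noisy) labels.
p = np.array([0.7, 0.2, 0.1])          # softmax output over 3 true classes
Q = np.array([[0.8, 0.1, 0.1],         # e.g. each class mislabelled 20% of the time
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
noisy = p @ Q                          # distribution over noisy labels
assert np.isclose(noisy.sum(), 1.0)    # still a valid distribution
```

At test time the noise layer is dropped and the clean predictions p are used directly.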