Automatic markup of neural cell membranes using boosted ...

A U T O M A T I C M A R K U P O F N E U R A L C E L L

M E M B R A N E S U S I N G B O O S T E D D E C I S I O N

S T U M P S

by

Kannan Umadevi Venkataraju

A thesis submitted to the faculty of The University of Utah

in partial fulfillment of the requirements for the degree of

Master of Science

in

Computer Science

School of Computing

The University of Utah

May 2010

AUTOMATIC MARKUP OF NEURAL CELL

MEMBRANES USING BOOSTED DECISION

STUMPS

by


A thesis submitted to the faculty of The University of Utah

in partial fulfillment of the requirements for the degree of

Master of Science

In

Computer Science

School of Computing

The University of Utah

May 2010

Copyright © K a n n a n Umadevi Venkataraju 2010

All Rights Reserved

Copyright @Kannan Umadevi Venkataraju 2010

All Rights Reserved

THE UNIVERSITY OF UTAH GRADUATE SCHOOL

SUPERVISORY COMMITTEE APPROVAL

of a thesis submitted by


This thesis has been read by each member of the following supervisory committee and by majority vote has been found to be satisfactory.

Ross Whitake

4bH�Dab?=41

u0109130

Text Box

u0109130

Text Box

u0109130

Text Box

THE UNIVERSITY OF UTAH GRADUATE SCHOOL

FINAL READING APPROVAL

To the Graduate Council of the University of Utah:

I have read the thesis of Kannan Umadevi Venkataraju in its final form and have

found that (1) its format, citations, and bibliographic style are consistent and acceptable; (2) its illustrative materials including figures, tables, and charts are in place; and (3) the final manuscript is satisfactory to the Supervisory Committee and is ready for submission to T he Graduate School.

Date Tolga Tasdizen Chair, Supervisory Committee

Approved for the Major Department

Martin Berzins Chair/Dean

Approved for the Graduate Council

so. cQ. f--.

David S. Chapman Dean of The Graduate School

u0109130

Text Box

u0109130

Text Box

u0109130

Text Box

A B S T R A C T

For neuro-biologists to understand the working of the central nervous system,

they need to reconstruct the underlying neural circuitry. The neural circuit, which

consists of the neuron cells and synapses in a three-dimensional (3D) volume of

tissue are scanned slice-by-slice at very high magnifications using an electron mi

croscope. From the electron microscopy images, the neurons and their connec

tions (synapses) are identified to lay out the connections of the neural circuitry.

One of the necessary tasks in this process is to segment the individual neurons

in the images of the sliced volume. To effectively carry out this segmentation

we need to delineate the cell membranes of the neurons. For this purpose, we

propose a supervised learning approach to detect the cell membranes. The classifier

was trained using decision stumps boosted using AdaBoost, on local and context

features. The features were selected to highlight the curve like characteristics

of cell membranes. It is also shown that using features from context positions

allows for more information to be utilized in the classification. Together with

the nonlinear discrimination ability of the AdaBoost classifier there are clearly

noticeable improvements over previously used methods. We also detail several

experiments conducted for identification of synapse structures in the microscopy

images.

ABSTRACT

For neuro-biologists to understand the working of the central nervous system,

they need to reconstruct the underlying neural circuitry. The neural circuit , which

consists of the neuron cells and synapses in a three-dimensional (3D) volume of

t issue are scanned slice-by-slice at very high magnifications using an electron mi

croscope. From the electron microscopy images, the neurons and their connec

tions(synapses) are identified to layout the connections of the neural circuitry.

One of the necessary tasks in this process is to segment the individual neurons

in the images of the sliced volume. To effectively carry out this segmentation

we need to delineate the cell membranes of the neurons. For this purpose, we

propose a supervised learning approach to detect the cell membranes. The classifier

was trained using decision stumps boosted using AdaBoost, on local and context

features. The features were selected to highlight the curve like characteristics

of cell membranes. It is also shown that using features from context positions

allows for more information to be utilized in the classification. Together with

the nonlinear discrimination ability of the AdaBoost classifier there are clearly

noticeable improvements over previously used methods. Vife also detail several

experiments conducted for identification of synapse structures in the microscopy

images.

C O N T E N T S

A B S T R A C T iv

L I S T O F F I G U R E S vii

L I S T O F T A B L E S ix

A C K N O W L E D G E M E N T S x

C H A P T E R S

1. I N T R O D U C T I O N 1

1.1 Motivation 1 1.1.1 Previous work 2 1.1.2 Our work 2

1.2 Outline 3

2. B A C K G R O U N D 4

2.1 Neural circuitry 4 2.2 Transmission electron microscopy 5

2.2.1 Components and working 6 2.2.2 Challenges 6 2.2.3 Data preparation 6

2.3 Decision stumps 7 2.3.1 Decision stump learning 9 2.3.2 Decision stump classification 9

2.4 AdaBoost 9 2.5 Classifier on artificial datasets 9

2.5.1 Crescent dataset 10 2.5.2 Concentric circles dataset 10 2.5.3 Star dataset 10 2.5.4 Nonaxis aligned dataset 12 2.5.5 Comparison with other classifiers 13

3 . M E T H O D S 17

3.1 Image enhancement 18 3.2 Cell membrane detection 19

3.2.1 Features 19 3.2.2 Classifier 21

3.3 Synapse detection 24

CONTENTS

A B STRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. IV

LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. VII

LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IX

ACKNOWLEDGEMENTS X

CHAPTERS

1. INTRODUCTION 1

1.1 Motivation ............... .... .... . ... . . .. ....... ....... 1 1.1.1 Previous work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 2 1.1.2 Our work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Outline......... .... . . . .... ..... . . ...... . .. . .. .. ....... 3

2. BACKGROUND .. . .... . .. . ... .. . . ..... . .. . .. ............ 4

2.1 Neural circuitry .... .. .................... . . . ... . ..... .. . 2.2 Transmission electron microscopy .. .. . .. ... . ..... . ... . ... .. .

2.2.1 Components and working ....... .. ... . ..... . . ........ . 2.2.2 Challenges.. . . ... . ... .. .. . .... . .. . . . ...... .. . .. . ... 2.2.3 Data preparation . . ................................. .

2.3 Decision stumps .............. . .. . . . . . ... . .. . ........... . 2.3 .1 Decision stump learning ... .. .. .. . .. ...... . ........ . . . 2.3.2 Decision stump classification ... . .... . . . . .. . ..... . ..... .

2.4 AdaBoost. .. . . . .. ... .. . . . . .. . ....... . . . . . ........ . . . . ..

4 5 6 6 6 7 9 9 9

2.5 Classifier on artificial datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 9 2.5.1 Crescent dataset .................................... 10 2.5.2 Concentric circles dataset .. ... . .. .... ..... . ...... ..... 10 2.5.3 Star dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.5.4 Nonaxis aligned dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.5.5 Comparison with other classifiers ....................... 13

3. METHODS ..... . ......... .. ............ . .. ........ . .. . . 17

3.1 Image enhancement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 18 3.2 Cell membrane detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 19

3.2.1 Features . . . ... . . . ... . ..... .. . .. .................... 19 3.2.2 Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 21

3.3 Synapse detection ................. . . . .. . . ..... . .. . . .. . . . 24

3.3.1 Features 24 3.3.2 Scale and rotation invariant moments 25 3.3.3 Cumulative histogram and area features 27 3.3.4 Classifier 27 3.3.5 Orientation estimation 29

4. R E S U L T S 31

4.1 Cell membrane detection 31 4.2 Synapse detection 37

5. C O N C L U S I O N 39

5.1 Contribution 39 5.2 Future work 39

A P P E N D I C E S

A . A S S E M B L I N G L A R G E M O S A I C S O F E L E C T R O N M I C R O S C O P E I M A G E S U S I N G G P U S 40

B . S Y N A P S E V I E W E R 44

R E F E R E N C E S 46

vi

3.3.1 Features.... . .......... . ........ . ... . ... .......... . 24 3.3.2 Scale and rotation invariant moments. . . . . . . . . . . . . . . . . . .. 25 3.3.3 Cumulative histogram and area features . . . . . . . . . . . . . . . . .. 27 3.3.4 Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.3.5 Orientation estimation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 29

4. RESULTS .. . ..... . ........ .. . . . ..... .. . . . ............... 31

4.1 Cell membrane detection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 31 4.2 Synapse detection . ...... .. ......... . ..... . ......... . .... 37

5. CONCLUSION ................................... . ...... 39

5.1 Contribution .... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 39 5.2 Future work. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

APPENDICES

A. ASSEMBLING LARGE MOSAICS OF ELECTRON MICROSCOPE IMAGES USING GPUS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 40

B. SYNAPSE VIEWER ..... . .. . .. . ......................... 44

REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 46

vi

L I S T O F F I G U R E S

2.1 Neural tissue slices of C. elegans worm and rabbit retina 5

2.2 Neural tissue slices of rabbit retina showing synapses 5

2.3 Decision stump 7

2.4 Decision tree 8

2.5 Crescent dataset 11

2.6 Classifier performance at every round 11

2.7 Concentric circles dataset 12


2.9 Star dataset 14


2.11 Nonaxis aligned dataset 15


3.1 Block diagram of the proposed methods in the overall reconstruction pipeline 17

3.2 Comparison between original and CLAHE enhanced image 18

3.3 Stencil neighborhood 21

3.4 Plot of different functions in the decision stump learning algorithm . . 24

3.5 Cascade architecture 28

4.1 Semilog plot of number of boosting rounds versus the area under the ROC curve for that boosting round 32

4.2 ROC curves of the classifiers trained with AdaBoost at boosting round 3000. For comparison, the ROC for the method by Jurrus et al. [14] is also shown 33

4.3 Membrane detection results of the fold-1 test images in the 5-fold cross-validation: original images (left), and detected membranes (right). 34



LIST OF FIGURES

2.1 eural tissue slices of C. elegans worm and rabbit retina

2.2 Neural tissue slices of rabbit retina showing synapses ..... . . .. . . . .

2.3 Decision stump ... . . ... . ... . .......... .......... .. . . ..... .

2.4 Decision tree ......................................... . .. .

5

5

7

8

2.5 Crescent dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.6 Classifier performance at every round .. . .. . ... ............ . ... 11

2.7 Concentric circles dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 12

2.8 Classifier performance at every round .. .. . . .... .. ...... . ..... . 13

2.9 Star dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.10 Classifier performance at every round . . ....... .. ...... .... . ... 14

2.11 Nonaxis aligned dataset. . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . .. 15

2. 12 Classifier performance at every round ..... .. . ................. 16

3.1 Block diagram of the proposed methods in the overall reconstruction pipeline. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.2 Comparison between original and CLARE enhanced image. . . . . . . .. 18

3.3 Stencil neighborhood. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 21

3.4 P lot of different functions in the decision stump learning algorithm .. 24

3.5 Cascade architecture ............................ .. .... . . . . 28

4.1 Semilog plot of number of boosting rounds versus the area under the ROC curve for t hat boosting round. . ......... . .. ............. 32

4.2 ROC curves of the classifiers trained with AdaBoost at boosting round 3000. For comparison , the ROC for the method by Jurrus et al. [14] is also shown. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 33

4.3 Membrane detection results of the fold-l test images in the 5-fold cross-validation: original images (left) , and detected membranes (right) . 34

4.4 Membrane detection results of the fold-2 test image in the 5-fold cross-validation: original images (left ), and detected membranes (right). 34

4.5 Membrane detection results of the fold- 3 test images in the 5-fold cross-validation: original images (left) , and detected membranes (right). 35



A.l Mice neuronal image tile slice and triangle mesh overlay 41

A.2 Sixteen tile mosaic showing 6 of the tiles and position 42

B.l Synapse viewer 45

viii

4.6 Membrane detection results of the fold-4 test images in t he 5-fold cross-validation: original images (left) and detected membranes (right). 35

4.7 Membrane detection results of the fold-5 test images in t he 5-fold cross-validation: original images (left) and detected membranes (right) . 36

A.l Mice neuronal image tile slice and triangle mesh overlay. . . . . . . . . . . 41

A.2 Sixteen tile mosaic showing 6 of t he t iles and position . . . . . . . . . . . .. 42

B.l Synapse viewer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 45

V lll

L I S T O F T A B L E S

2.1 Crescent dataset properties 10

2.2 Concentric circles dataset properties 12

2.3 Star dataset properties 13

2.4 Nonaxis aligned dataset properties 15

2.5 Comparison of classifiers 16

3.1 Eigenvalue of ellipses of different parameters 20

4.1 Statistics of samples in various experiments (average of all folds) . . . . 33

A.l Comparison of classifiers 43

LIST OF TABLES

2.1 Crescent dataset properties ................................. 10

2.2 Concentric circles dataset properties ............. . . . .. .. ...... 12

2.3 Star dataset properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 13

2.4 Nonaxis aligned dataset properties. . . . . . . . . . . . . . . . . . . . . . . . . . .. 15

2.5 Comparison of classifiers ................................... 16

3.1 Eigenvalue of ellipses of different parameters. . . . . . . . . . . . . . . . . . .. 20

4.1 Statistics of samples in various experiments (average of all folds) . . .. 33

A.l Comparison of classifiers .......... ....... .. . .. . ............ 43

A C K N O W L E D G E M E N T S

I am grateful to my advisor Dr. Tolga Tasdizen for being my mentor throughout

my graduate school life. I thank my committee members Dr. Ross T. Whitaker and

Dr. Hal Daume III for their valuable guidance on my work. I thank Dr. Antonio

R. C. Paiva and Elizabeth Jurrus for their collaboration in parts of this work and

support. I thank all fellow CRCNS group members for making my research work

in graduate school successful and thoroughly enjoyable. I would like to thank my

parents and extended family for their unconditional love and support through all

my endeavors. Finally, my special thanks to my friends Abigail M., Maheshwar D.,

Siddharth Shankar and others for all their support.

ACKNOWLEDGEMENTS

I am grateful to my advisor Dr. Tolga Tasdizen for being my mentor throughout

my graduate school life. I thank my committee members Dr. Ross T. Whitaker and

Dr. Hal Daume III for their valuable guidance on my work. I thank Dr. Antonio

R. C. Paiva and Elizabeth Jurrus for their collaboration in parts of this work and

support . I thank all fellow CRCNS group members for making my research work

in graduate school successful and t horoughly enjoyable. I would like to t hank my

parents and extended family for their unconditional love and support through all

my endeavors. Finally, my special thanks to my friends Abigail M., Maheshwar D.,

Siddharth Shankar and others for all their support.

C H A P T E R 1

I N T R O D U C T I O N

This chapter descibes the motivation for automating the detection of membranes

and synapses. Futher, it explains the organization of remaining chapters of this

thesis.

1 .1 M o t i v a t i o n

Neuro-scientists are currently developing new imaging techniques to better un

derstand the complex structure of the central nervous system. In particular, re

searchers are making efforts to map the connectivity of large volumes of individual

neurons in order to understand how signals are communicated across processes.

Mapping the connectivity of large volumes helps in identification of network mo

tifs [1]. The most extensive study undertaken thus far uses electron microscopy

to create detailed diagrams of neuronal structure [8] and connectivity [26, 4]. The

most well-known example of neural circuit reconstruction is of the 302 neurons

in the C. elegans worm. Even though this is one of the simplest organisms with

a nervous system, the manual reconstruction process took 10 years [4]. Human

interpretation of data over large volumes of neural anatomy is so labor intensive

that very little ground t ru th exists. For this reason, image processing and machine

learning algorithms are needed to automate the process and allow analysis of large

datasets by neural circuit reconstruction.

Serial-section transmission electron microscopy (TEM) is the preferred data

acquisition technology for capturing images of large sections of neuronal tissue.

Images from TEM span a wide field of view, capturing processes that may wander

through a specimen and have an in-plane resolution useful for identifying cellular

features such as synapses. These structures are critical in understanding neuron

CHAPTER 1

INTRODUCTION

This chapter descibes the motivation for automating the detection of membranes

and synapses. Futher, it explains the organization of remaining chapters of this

thesis.

1.1 Motivation

Neuro-scientists are currently developing new imaging techniques to better un

derstand the complex structure of the central nervous system. In particular, re

searchers are making efforts to map the connectivity of large volumes of individual

neurons in order to understand how signals are communicated across processes.

Mapping the connectivity of large volumes helps in identification of network mo

tifs [1]. The most extensive study undertaken thus far uses electron microscopy

to create detailed diagrams of neuronal structure [8] and connectivity [26 , 4]. The

most well-known example of neural circuit reconstruction is of the 302 neurons

in the C. elegans worm. Even though this is one of the simplest organisms with

a nervous system, the manual reconstruction process took 10 years [4]. Human

interpretation of data over large volumes of neural anatomy is so labor intensive

that very little ground truth exists . For t his reason, image processing and machine

learning algorithms are needed to automate the process and allow analysis of large

datasets by neural circuit reconstruction.

Serial-section transmission electron microscopy (TEM) is the preferred data

acquisition technology for capturing images of large sections of neuronal tissue.

Images from TEM span a wide field of view, capt uring processes t hat may wander

t hrough a specimen and have an in-plane resolution useful for identifying cellular

features such as synapses. These structures are critical in underst anding neuron

2

activity and function. Images from serial-section TEM are captured by cutting a

section from the specimen and suspending it over an electron beam which passes

through the section creating a projection that is captured as a digital image. See

figure 3.2(a) for an example serial-section TEM image corresponding to a cross-

section of the nematode C. elegans with a resolution of 6nmx6nmx33nm.

1.1.1 P r e v i o u s w o r k

An accurate mapping of neuron features begins with the segmentation of the

neuron boundaries. Jurrus et al. [14] use these boundaries to extract the three-

dimensional (3D) connectivity present in similar image volumes. In their method,

a contrast enhancing filter followed by a directional diffusion filter is applied to

the raw images to enhance and connect cellular membranes. The images are then

thresholded and neuron cell bodies are identified using a watershed segmentation

method. This method fails when membranes are weak or there are too many intra

cellular features. This indicates that more adaptive algorithms need to be developed

to segment these structures. For this reason, machine learning algorithms have been

shown as a successful alternative for identifying membranes in TEM data. In related

work, Jain et al. [13] use a multilayer convolution neural network to classify pixels as

membrane and nonmembrane. However, the stain used on the specimen highlights

cell boundaries, attenuating intracellular structures, simplifying the segmentation

task. Another successful application of learning applied to TEM is the use of a

perceptron trained with a set of predefined image features [17]. However, extensive

postprocessing is required to close the detected cell membranes and remove internal

cellular structures.

1.1.2 O u r w o r k

Our method described in this thesis improves upon previous work by utilizing

context information for discrimination of membrane pixels from the nonmembrane

pixels. By including the features of neighboring pixels as inputs to the classifier, the

classifier can utilize the context to deal with membrane disconnectivities. The fea

tures were designed to improve the classification accuracy of elongated structures,

2

activity and function . Images from serial-section TEM are captured by cutting a

section from the specimen and suspending it over an electron beam which passes

through the section creating a projection that is captured as a digital image. See

figure 3.2 (a) for an example serial-section TEM image corresponding to a cross

section of the nematode C. elegans with a resolution of 6nmx6nmx33nm.

1.1.1 Previous work

An accurate mapping of neuron features begins with the segmentation of the

neuron boundaries. Jurrus et al. [14] use these boundaries to extract the three

dimensional (3D) connectivity present in similar image volumes. In their method,

a contrast enhancing filter followed by a directional diffusion filter is applied to

the raw images to enhance and connect cellular membranes. The images are then

t hresholded and neuron cell bodies are identified using a watershed segmentation

method. This method fails when membranes are weak or there are too many intra

cellular features. This indicates that more adaptive algorithms need to be developed

to segment t hese structures. For this reason, machine learning algorithms have been

shown as a successful alternative for ident ifying membranes in TEM data. In related

work, J ain et al. [13] use a multilayer convolution neural network to classify pixels as

membrane and nonmembrane. However, the stain used on the specimen highlights

cell boundaries, attenuating intracellular structures, simplifying the segmentation

task. Another successful application of learning applied to TEM is the use of a

percept ron trained with a set of predefined image features [17]. However , extensive

postprocessing is required to close the detected cell membranes and remove internal

cellular structures.

1.1.2 Our work

Our method described in this thesis improves upon previous work by ut ilizing

context information for discrimination of membrane pixels from the nonmembrane

pixels. By including the features of neighboring pixels as inputs to the classifier , the

classifier can utilize the context to deal with membrane disconnectivities. The fea

tures were designed to improve the classification accuracy of elongated structures,

3

like the membranes. The nonlinear decision boundary is learned by a classifier

trained under the AdaBoost framework.

We also attempted identification of the synapses in the images. Region-based

attributes were used to quantify the shape of the regions. A framework capable

of detecting rare events effectively was used to detect synapses since synapses are

sparsely distributed across the image mosaic.

The following section gives the overview of the entire thesis.

1 .2 O u t l i n e

The remaining chapters of this thesis are organized as follows:

• Chapter 2 provides background material for the rest of the thesis. Section 2.1

gives a brief overview of neural circuitry, TEM and image acquisition using TEM

are described in section 2.2. Section 2.3 and section 2.4 give a brief introduction

to decision stumps and AdaBoost. Lastly in section 2.5 we discuss about our

classifier's performance on artificial datasets and its limitations.

• Chapter 3 is divided into two major sections. Section 3.2 and section 3.3

are dedicated to cell membrane detection and synspase detection respectively. Sec

tion 3.1 describes about the preprocessing of images before setting up the classifi

cation experiment. Subsections 3.2.1 and 3.3.1 describe the features used by the

classifier. The last subsections (section 3.2.2 and section 3.3.4) explain the machine

learning classifier setup in detail.

• Chapter 4 presents a detailed evaluation of the proposed methods of cell

membrane detection and synapse detection. Section 4.1 discusses the cell membrane

detection classifier training and testing on a C. elegans worm dataset, whereas the

synapse detection algorithm is run over a rabbit retina dataset. It is discussed in

section 4.2.

• Chapter 5 concludes the thesis and discusses the contributions of this thesis

(section 5.1) and proposed future work (section 5.2).

3

like the membranes . The nonlinear decision boundary is learned by a classifier

trained under the AdaBoost framework.

We also attempted identification of the synapses in the images. Region-based

attributes were used to quantify the shape of the regions. A framework capable

of detecting rare events effectively was used to detect synapses since synapses are

sparsely distributed across the image mosaic.

The following section gives the overview of the entire thesis.

1.2 Outline

The remaining chapters of this thesis are organized as follows:

• Chapter 2 provides background material for the rest of the thesis . Section 2.1

gives a brief overview of neural circuitry, TEM and image acquisition using TEM

are described in section 2.2. Section 2.3 and section 2.4 give a brief introduction

to decision stumps and AdaBoost . Lastly in section 2.5 we discuss about our

classifier's performance on artificial datasets and its limitations.

• Chapter 3 is divided into two major sections. Section 3.2 and section 3.3

are dedicated to cell membrane detection and synspase detection respectively. Sec

tion 3.1 describes about the preprocessing of images before setting up the classifi

cation experiment. Subsections 3.2.1 and 3.3.1 describe the features used by the

classifier. The last subsections (section 3.2.2 and section 3.3.4) explain the machine

learning classifier setup in detail.

• Chapter 4 presents a detailed evaluation of the proposed methods of cell

membrane detection and synapse detection. Section 4.1 discusses the cell membrane

detection classifier training and testing on a C. elegans worm dataset, whereas the

synapse detection algorithm is run over a rabbit retina dataset. It is discussed in

section 4.2.

• Chapter 5 concludes the thesis and discusses the contributions of this thesis

(section 5.1) and proposed future work (section 5.2).

C H A P T E R 2

B A C K G R O U N D

This chapter provides the necessary background information about the data used

in the experiments. It also gives the basic information necessary to understand the

experiment setup.

2 . 1 N e u r a l c i r c u i t r y

The nervous systems of animals are made up of numerous neural circuits that

transmit and process the sensory perception signals, motor activity control signals of

these organisms [16]. The neural circuits are a collection of neurons interconnected

to each other communicating through electrochemical reactions and electrical im

pulses. There are various types of neuronal cells and equally numerous types of

interconnections between them. The complexity of the neural circuit is dependent

on the types of neurons and their interconnectivity. The communication between

these neurons is through synapses that through the electrically excitable membranes

of neurons propagate the electrical signals. The physical structure of the neural

circuit can be observed using high magnification electron microscopes. Sample

images of neural circuits of C. elegans and rabbit mouse retina are shown in

Figure 2.1. In both the images of Figure 2.1 we can see cell membranes separating

one neuron from the other.

The images in Figure 2.2 show the synapse structures near the cell membranes.

The bulged shape is because of accumulation of vesicles near the region of transmis

sion. This bulged dark region is called the presynaptic density of the synapse [6].

There are various other types of synapses that result in just darkening of the

membrane structure. There is a type of synapse called as the "Gap junctions",

CHAPTER 2

BACKGROUND

This chapter provides the necessary background information about the data used

in the experiments. It also gives the basic information necessary to understand the

experiment setup.

2.1 Neural circuitry

The nervous systems of animals are made up of numerous neural circuits that

transmit and process the sensory perception signals, motor activity control signals of

these organisms [16]. The neural circuits are a collection of neurons interconnected

to each other communicating through electrochemical reactions and electrical im

pulses. There are various types of neuronal cells and equally numerous types of

interconnections between them. The complexity of the neural circuit is dependent

on the types of neurons and their interconnectivity. The communication between

these neurons is through synapses that through the electrically excitable membranes

of neurons propagate the electrical signals. The physical structure of the neural

circuit can be observed using high magnification electron microscopes. Sample

images of neural circuits of C. elegans and rabbit mouse retina are shown in

Figure 2.1. In both the images of Figure 2.1 we can see cell membranes separating

one neuron from the other.

The images in Figure 2.2 show the synapse structures near the cell membranes.

The bulged shape is because of accumulation of vesicles near the region of transmis

sion. This bulged dark region is called the presynaptic density of the synapse [6].

There are various other types of synapses that result in just darkening of the

membrane structure. There is a type of synapse called as the "Gap junctions",

(a) Synapse 1 (b) Synapse 2 (c) Synapse 3 (d) Synapse 4

F i g u r e 2.2. Neural tissue slices of rabbit retina showing synapses

which is a rod like structure spanning across neurons. This synapse structure

shows clumping of the vesicles even in the interior of the neurons.

2 . 2 T r a n s m i s s i o n e l e c t r o n m i c r o s c o p y

Transmission electron microscopy(TEM) is a microscopy technique in which a

beam of electrons is transmitted through a very thin sliced specimen (in order of mi

crometers or nanometers) interacting with the specimen as they pass through. This

interaction of electrons with the specimen results in an image of the specimen [27],

which is then magnified and focused onto an imaging devices like a fluorescent

screen or a photographic film. Modern microscopes have CCD cameras to directly

fa) Synapse 1 (b) Synapse 2 (c) Synapse 3 (d) Synapse 4

5

(a) C. elegans worm neural tissue (b) Rabbit retina neural tissue

Figure 2.1. Neural tissue slices of C. elegans worm and rabbit retina

(a) Synapse 1 (b) Synapse 2 (c) Synapse 3 (d) Synapse 4

Figure 2.2 . Neural tissue slices of rabbit retina showing synapses

which is a rod like structure spanning across neurons. This synapse structure

shows clumping of the vesicles even in the interior of the neurons.

2.2 Transmission electron microscopy

Transmission electron microscopy(TEM) is a microscopy technique in which a

beam of electrons is transmitted through a very thin sliced specimen (in order of mi

crometers or nanometers) interacting with the specimen as they pass through. This

interaction of electrons with the specimen results in an image of the specimen [27],

which is then magnified and focused onto an imaging devices like a fluorescent

screen or a photographic film. Modern microscopes have CCD cameras to directly

6

digitize these high resolution images.

Since electrons are equivalent to electromagnetic waves of very small wavelength,

TEMs are capable of very high resolution (in order of nanometer/pixel) images

compared to light microscopes, which operate with electromagnetic waves of much

higher wavelength. This enables the neural tissues to be scanned at very high

magnifications.

2 .2 .1 C o m p o n e n t s a n d w o r k i n g

The TEM is vacuumized system where the electrons travel and interact with

the specimen. One end of the TEM has an electron gun, which is the source of the

beam of electrons. The other end of the TEM has the imaging device to capture the

image of the specimen. A series of electromagnetic lenses and electrostatic plates

guide the electron beam through the specimen and focus on the imaging device.

The imaging device provides the observer with the image of the specimen under

observation.

2.2.2 C h a l l e n g e s

The usage of TEM for imaging the neural tissue has the following challenges:

Since the TEM operates at very high magnifications, it can scan only a small area

of the specimen. However, the neural tissue spans much more than a few microns;

thus a single specimen has to be scanned as multiple tiles. Since electron beams are

equivalent to high energy beams impacting the specimen, bombardment of these

beams on the specimen causes nonuniform heating of the tissue and subsequent

distortion of the sample.

2.2.3 D a t a p r e p a r a t i o n

The section of neural tissue that is the region of interest is pigmented and frozen.

Then thin specimens are sliced out using a microtome. Once ultrathin specimens

are prepared, as discussed in the section 2.2.2, the specimens are imaged as tiles.

The ir-tool chain [15] is used to assemble the tiles to a mosaic and further the

slices are registered together to reconstruct the final volume. A faster method to

6

digitize these high resolution images.

Since electrons are equivalent to electromagnetic waves of very small wavelength,

TEMs are capable of very high resolution (in order of nanometer/ pixel) images

compared to light microscopes, which operate with electromagnetic waves of much

higher wavelength. This enables the neural t issues to be scanned at very high

magnifications.

2.2.1 Components and working

The TEM is vacuumized system where the electrons travel and interact with

the specimen. One end of the TEM has an electron gun , which is t he source of the

beam of electrons. The ot her end of the TEM has t he imaging device to capture the

image of the specimen. A series of electromagnetic lenses and electrostatic plates

guide the electron beam through the specimen and focus on the imaging device.

The imaging device provides the observer with t he image of the specimen under

observation.

2.2.2 Challenges

The usage of TEM for imaging the neural tissue has the following challenges:

Since the TEM operates at very high magnifications, it can scan only a small area

of the specimen. However, the neural tissue spans much more than a few microns;

thus a single specimen has to be scanned as multiple t iles . Since electron beams are

equivalent to high energy beams impacting the specimen, bombardment of these

beams on the specimen causes nonuniform heating of the tissue and subsequent

distortion of the sample.

2 .2 .3 Data preparation

The section of neural tissue that is the region of interest is pigmented and frozen.

Then thin specimens are sliced out using a microtome. Once ultrathin specimens

are prepared, as discussed in the section 2.2.2, the specimens are imaged as t iles.

The ir- tool chain [15] is used to assemble t he t iles to a mosaic and further the

slices are registered together to reconstruct the final volume. A faster method to

7

assemble such tiles to a mosaic is explained in chapter A.

2 . 3 D e c i s i o n s t u m p s

The decision stump is a special case of decision tree [18], [12], which is a class

of supervised learning algorithms frequently used in data mining and machine

learning.

Figure 2.3 and Figure 2.4 show an example of a decision stump and a decision

tree, respectively. The algorithm constructs a decision tree with just one decision

node and two classification leaves during training based on a given set of training

samples. Decision stump is the weak learner, i.e., it cannot give the best classifica

tion for the samples but a rather simple and fast classifier with accuracy at least

just greater than 50% where possible. The following section explains the learning

and classification of samples using decision stumps.

F i g u r e 2 .3 . Decision stump

7

assemble such tiles to a mosaic is explained in chapter A.

2.3 Decision stumps

The decision stump is a special case of decision tree [18], [12], which is a class

of supervised learning algorithms frequent ly used in data mining and machine

learning.

Figure 2.3 and Figure 2.4 show an example of a decision stump and a decision

tree , respectively. The algorithm constructs a decision tree with just one decision

node and two classification leaves during training based on a given set of training

samples . Decision stump is the weak learner , i. e. , it cannot give the best classifica

tion for the samples but a rather simple and fast classifier with accuracy at least

just greater than 50% where possible. T he following section explains the learning

and classification of samples using decision stumps.

Yes

Class 0

Figure 2.3 . Decision stump

Sample X (N attributes)

Is (X[attributenl > threshold)

No

Class 1

F i g u r e 2.4. Decision tree

Yes

Is (X[attributem] >

thresholdm)

Yes

8 Figure 2.4. Decision tree

Yes,---<

Is (X[attributep] >

thresholdp)

No

Is (X[attributen] >

threshold n)

No

Yes

Is

NA Is

(X[attributeq] > threshold q )

(X[attribute,] > ~ thre/

Class 1

00

9

2.3.1 Dec i s ion s t u m p l e a r n i n g

Like the rest of the supervised learning algorithms, the learning algorithm takes

the following as the input:

1. Attributes

2. Attribute values

3. Sample classification

4. Sample weights

The learning algorithm chooses the at tr ibute and a threshold value that gives the

best classification performance and margin for the decision stump as classifier.

2.3.2 Dec i s ion s t u m p class i f icat ion

The classification of a test sample based on the learned model is trivial. The

model gives the attribute, threshold value and the inequality relation of the at

tribute value to the threshold. The evaluation of the inequality equation gives the

classification of the sample.

2 . 4 A d a B o o s t

AdaBoost [20] is a type of boosting algorithm. Boosting is an ensemble learning

technique where weak learners such as decision stumps are used as components to

create strong classifiers. The AdaBoost meta-algorithm at each round learns a weak

classifier with accuracy at least greater than 50% for a set of weighted samples. This

weak classifier adds to the set of weak classifiers learned in the previous rounds.

The final classification depends on the classification of weak learners in each round.

2 . 5 C l a s s i f i e r o n a r t i f i c i a l d a t a s e t s

The boosted decision stumps were tested for their classification performance on

few artificial datasets before applying to the real image datasets. This was done to

access the performance of the classifier on certain specific nature of these datasets.

The datasets and their unique traits are listed below:

9

2.3.1 Decision stump learning

Like the rest of the supervised learning algorithms, the learning algorithm takes

the following as the input:

1. Attributes

2. Attribute values

3. Sample classification

4. Sample weights

The learning algorithm chooses the attribute and a threshold value that gives the

best classification performance and margin for the decision stump as classifier.

2.3.2 Decision stump classification

The classification of a test sample based on the learned model is trivial. The

model gives the attribute, threshold value and the inequality relation of the at

tribute value to the threshold . The evaluation of the inequality equation gives the

classification of the sample.

2.4 AdaBoost

AdaBoost [20] is a type of boosting algorithm. Boosting is an ensemble learning

technique where weak learners such as decision stumps are used as components to

create strong classifiers. The AdaBoost meta-algorithm at each round learns a weak

classifier with accuracy at least greater than 50% for a set of weighted samples. This

weak classifier adds to t he set of weak classifiers learned in the previous rounds.

The final classification depends on the classification of weak learners in each round.

2.5 Classifier on artificial datasets

The boosted decision stumps were tested for t heir classification performance on

few artificial datasets before applying to the real image datasets. This was done to

access the performance of the classifier on certain specific nature of these datasets .

The datasets and their unique traits are listed below:

10

1. Crescent dataset - Linearly nonseparable

2. Concentric circles dataset - Linearly nonseparable and means of the classes

are very close to each other if not the same

3. Star dataset - Linearly nonseparable and multiple clustering of samples of

same dataset

4. Nonaxis aligned dataset - Decision boundary is not aligned to any of the

dimension's axes

The results of such experiments are discussed in the following sections.

2 .5 .1 C r e s c e n t d a t a s e t

Table 2.1 describes the dataset properties. Figure 2.5 shows a typical crescent

dataset and Figure 2.6 shows the performance of the classifier at various rounds of

the learning.

2.5.2 C o n c e n t r i c c i rc les d a t a s e t

Table 2.2 describes the dataset properties. Figure 2.7 shows a typical concentric

circles dataset, and Figure 2.8 shows the performance of the classifier at various

rounds of the learning.

2 .5.3 S t a r d a t a s e t

Table 2.3 describes the dataset properties. Figure 2.9 shows a typical star

dataset, and Figure 2.10 shows the performance of the classifier at various rounds

of the learning.

T a b l e 2 . 1 . Crescent dataset properties Attribute Attribute nature Data, dimensionality 2 Number of classes 2 Separability Separable but linearly nonseparable Description The samples of one class are distributed as a crescent. The

mean of the samples are clearly separated.

10

1. Crescent dataset - Linearly nonseparable

2. Concentric circles dataset - Linearly nonseparable and means of the classes

are very close to each other if not the same

3. Star dataset - Linearly nonseparable and multiple clustering of samples of

same dat aset

4. Nonaxis aligned dataset - Decision boundary is not aligned to any of the

dimension 's axes

The results of such experiments are discussed in the following sections.

2.5.1 Crescent dataset

Table 2.1 describes the dataset properties. Figure 2.5 shows a typical crescent

dataset and Figure 2.6 shows the performance of the classifier at various rounds of

the learning.

2.5.2 Concentric circles dataset

Table 2.2 describes the dataset properties. Figure 2.7 shows a typical concentric

circles dataset , and Figure 2.8 shows the performance of the classifier at various

rounds of the learning.

2.5.3 Star dataset

Table 2.3 describes the dataset properties. Figure 2.9 shows a typical star

dataset , and Figure 2.10 shows the performance of the classifier at various rounds

of the learning.

Table 2.1. Crescent dataset properties Attribute Attribute nature Data dimensionality 2 N umber of classes 2 Separability Separable but linearly nonseparable Description The samples of one class are distributed as a crescent. The

mean of the samples are clearly separated .

6

5

4

3

< 2

1

0

-1

"-2i

Class 0 Class 1

* ' ~ « :%*

1 2 X-Axis

F i g u r e 2 .5 . Crescent dataset

0.8

D (0 > 0.6 "D 0) N

I 0.4

0.2

Accuracy True positive rate False positive rate

4 6 8 Boost ing Round

10 12

F i g u r e 2.6. Classifier performance at every round

6

5

4

3 .~ ><

2 -< I

>-1

0

- 1

- 2 -1 o

Figure 2 .5. Crescent dataset

Q) :::l ctJ

0.8

> 0.6 '0 Q)

.~ ctJ E 0.4 ... o z

0.2

.• -----'t" //'

2

1 2 X-Axis

,." /

" //

/ __ .+-_---"!r----- ¥

I ' Class 01

Class 1 t

3 4

/" / " /

" '/

- Accuracy --a-- True positive rate

False positive rate

...... __ ..

468 Boosting Round

10 12

Figure 2.6. Classifier performance at every round

11

12

T a b l e 2.2. Concentric circles dataset properties Attribute Attribute nature Data dimensionality 2 Number of classes to

Separability Separable but linearly nonseparable Description The samples of one class are distributed within a circle.

The other class is a ring surrounding the other class. The mean of the samples are very close to each other.

3

2

1 CD ">< n < 0 >•

-1

C lass 0 C la s s 1

- 2

- 3 - 3 - 2 0

X-Axis

F i g u r e 2.7. Concentric circles dataset

2.5.4 N o n a x i s a l i gned d a t a s e t

We have seen that the classifier is able to learn all the above artificial datasets to

100 percent accuracy in training. One of the main disadvantages of using decision

stumps as weak classifiers is that the model cannot learn classifications based on

multiple attributes or it can learn based on distribution of just one attr ibute.

This statement is corroborated in the following experiment with nonaxis aligned

dataset. Table 2.4 describes the dataset properties. Figure 2.11 shows a typical

nonaxis aligned dataset and Figure 2.12 shows the performance of the classifier at

various rounds of the learning. We can see from Figure 2.12 that the classifier takes

Table 2 .2. Concentric circles dataset properties Attribute Data dimensionality N umber of classes Separability Description

3

2

1

1/1 .;;: 0 «

I >-

- 1

- 2

-3 - 3 -2

Attribute nature 2 2 Separable but linearly nonseparable The samples of one class are distributed within a circle. The other class is a ring surrounding the other class. The mean of the samples are very close to each other.

-1 0 X-Axis

, Class 0 Class 1

2 3

Figure 2.7. Concentric circles dataset

2.5 .4 Nonaxis aligned dataset

12

We have seen that the classifier is able to learn all the above artificial datasets to

100 percent accuracy in training. One of the main disadvantages of using decision

stumps as weak classifiers is that the model cannot learn classifications based on

multiple attributes or it can learn based on distribution of just one attribute.

This statement is corroborated in the following experiment with nonaxis aligned

dataset . Table 2.4 describes the dataset properties. Figure 2.11 shows a typical

nonaxis aligned dataset and Figure 2.12 shows the performance of the classifier at

various rounds of the learning. We can see from Figure 2.12 that the classifier takes

13

0.2


10 20 30 Boosting Round

40


T a b l e 2 .3 . Star dataset properties Attribute Attribute nature Data dimensionality 2 Number of classes to

Separability Separable but linearly nonseparable Description The samples of one class are distributed as four distinct

clusters with cluster centers on the axis and equidistant from the origin. The samples of the other classes are distributed as clusters with centers exactly in between the clusters of the other class. The means of the samples are very close to each other.

many rounds to learn the 2D linear decision boundary. Section 2.5.5 compares the

classification performance of boosted decision stump classifier with other types of

classifiers.

2.5.5 C o m p a r i s o n w i t h o t h e r classifiers

More complex learning algorithms, such as a perceptron, can learn these kind

of distinct linear decision boundaries in just one round of learning. Support vector

machines(SVM) [23, 2, 5] are capable of projecting the data into infinite dimen-

Q) :J nJ > 0.6 '0 Q)

.!::! nJ E 0.4 ~

o z 0.2

, "

~Accuracy

-e-True positive rate False positive rate

10 20 30 40 Boosting Round


Table 2.3. Star dataset properties Attribute Attribute nature Data dimensionali ty 2 Number of classes 2 Separability Separable but linearly nonseparable Description The samples of one class are distributed as four distinct

clusters with cluster centers on the axis and equidistant from the origin. The samples of the other classes are distributed as clusters with centers exactly in between the clusters of the other class. The means of the samples are very close to each other.

13

many rounds to learn the 2D linear decision boundary. Section 2.5.5 compares the

classification performance of boosted decision stump classifier with other types of

classifiers.

2.5.5 Comparison with other classifiers

More complex learning algorithms, such as a perceptron , can learn these kind

of distinct linear decision boundaries in just one round of learning. Support vector

machines(SVM) [23, 2, 5] are capable of projecting the data into infinite dimen-

3

2

1

< 0

- 1

- 2

"-33 - 2 -1 0 1 X-Axis

Class 0 Class 1

F i g u r e 2.9. Star dataset

0.8 o (0 > 0.6 "D CD N E 0.4

0.2


4 6 8 Boost ing Round

10 12


3

2

1 III "x

0 oCt I

>-

'~' . ~ .

, .Jfi v

-1

-2

- 3 - 3 -2

F igure 2.9. Star dataset

1

0.8 Q) :::J (ij > 0.6 " Q)

.!::! IV E 0.4 ~

0 z 0.2

00 2

-1 o X-Axis

/

/

/

1

- .- Accuracy

2

-e- True positive rate

3

False positive rate

468 Boosting Round

10 12

F igure 2.10. Classifier performance at every round

14

15

T a b l e 2.4. Nonaxis aligned dataset properties Attribute Attribute nature Data dimensionality 2 Number of classes 2 Separability Linearly nonseparable Description The samples of both classes are Gaussian distributions

with variance along one direction much larger than the other variance in direction in the perpendicular direction. The two classes are separated by a linear decision boundary that is not parallel to any of the axis. The mean of the samples are well separated from each other.

20 r

10

(A <

•10-

Class 0 Class 1

-20

- 3 -20 -10 0 10

X-Axis 20 30

F i g u r e 2 .11 . Nonaxis aligned dataset

sions using the kernel trick and learning nonlinear decision boundaries. Here we

have conducted experiments with SVMs which are capable of learning decision

boundaries with maximum margin. The test results of few such algorithms are

shown in Table 2.5.

The problem with these models is that their training time increases with the

increase in dimensionality and number of samples of the training dataset. We tried

training the SVM classifiers on microscopy datasets on subset of its samples and

learned classifiers did not yield the same classification performance as the boosted

Table 2.4. Nonaxis aligned dataset properties Attribute Attribute nature Data dimensionali ty 2 N umber of classes 2 Separabili ty Linearly nonseparable Description The samples of both classes are Gaussian distribut ions

wi th variance along one direction much larger than the other variance in direction in the perpendicular direction . The two classes are separated by a linear decision boundary that is not parallel to any of the axis. The mean of the samples are well separated from each other .

20'---~----~--~----~r=======~

I

. Class 0

1/1 'x « I

10

o

>- -10

-20 -.

- 30 -30 -20

~. ". '---_C_la_s_s_1--" ·t-.

~-. ., "' . ..., .. jt;'''''''~

'"'-i;/f~ .Y

, l.';~':. ~ •• ;fll' •

r.',;' / .... "

- 10 0 X- Axis

10 20 30

Figure 2.11 . Nonaxis aligned dataset

15

sions using the kernel trick and learning nonlinear decision boundaries. Here we

have conducted experiments with SVMs which are capable of learning decision

boundaries with maximum margin. The test results of few such algorithms are

shown in Table 2.5.

The problem with t hese models is that their t raining time increases with the

increase in dimensionality and number of samples of the training dataset. We tried

training the SVM classifiers on microscopy datasets on subset of its samples and

learned classifiers did not yield the same classification performance as the boosted

16

1

0.8 o 3 re > 0.6 ~o o N re . . E 0.4 >_ o z

0.2

0 5 10 15 20 25 Boost ing Round


T a b l e 2.5. Comparison of classifiers Classifier Boosting

rounds AdaBoost and decision >6 stump AdaBoost and percep 1 tion AdaBoost and SVM with 1 linear kernel AdaBoost and SVM with 1 RBF kernel

rf 1 /

- ^ A c c u r a c y ^ T r u e positive rate

False positive rate

i Y

decision stumps trained on the whole dataset. Thus we chose the decision stump

based classifier for our problem.

Q) ~

ca

0.8

> 0.6 "0 Q)

.~

---+- Accuracy --<3- True positive rate

ca E 0.4

- False positive rate ... o z

0.2

5 10 15 Boosting Round


Table 2.5. Comparison of classifiers Classifier Boost ing

rounds AdaBoost and decision 26 stump AdaBoost and percep- 1 tron AdaBoost and SVM with 1 linear kernel AdaBoost and SVM with 1 RBF kernel

20 25

16

decision stumps trained on the whole dataset. Thus we chose the decision stump

based classifier for our problem.

C H A P T E R 3

M E T H O D S

This chapter describes the proposed methods for membrane detection and synapse

detection in detail. Figure 3.1 provides an overview of the fundamental steps.

Initially, the contrast of the gray scale images is normalized using CLAHE [11].

Then the feature values for the individual pixels of the enhanced images are gen

erated. Using the ground t ru th markup of cell membranes and synapses for these

images, a supervised learning experiment is set up. The decision stumps (weak

AdaBoost & Decision Stumps

Generate Hessian features

Predicted membrane

images

Segmentation

Input image CLAHE

enhanced image

Generate Moment features

Cascade of AdaBoost &

Decision stumps

Predict neuron interconnects

Orientation of predicted synapses

Connection Graph XML

F i g u r e 3 .1 . Block diagram of the proposed methods in the overall reconstruction pipeline

CHAPTER 3

METHODS

This chapter describes the proposed methods for membrane detection and synapse

detection in detail. Figure 3.1 provides an overview of the fundamental steps.

Initially, the contrast of the gray scale images is normalized using CLARE [11] .

Then the feature values for the individual pixels of the enhanced images are gen

erated. Using the ground truth markup of cell membranes and synapses for these

images, a supervised learning experiment is set up. The decision stumps (weak

AdaBoost & Predicted Decision f- membrane Stumps images

t • Generate

I

Hessian Segmentation features

CLAHE Connection

Input image f- enhanced Graph XML

image

I Generate

Predict neuron Moment features

interconnects

~ t Cascade of

Orientation of AdaBoost& f- predicted

Decision stumps

synapses

Figure 3.1. Block diagram of the proposed methods in the overall reconstruction pipeline

18

classifiers) are boosted [9] for several rounds until a high accuracy classifier is ob

tained in case of the membrane detection experiment. In case of synapse detection,

the same classifier is used in a cascaded architecture. The following sections review

these steps in greater detail.

3 . 1 I m a g e e n h a n c e m e n t

Before the images are used for feature extraction step, a contrast limited adap

tive histogram equalization (CLAHE) [11] is applied to the raw electron microscopy

images. This method changes the grey value of the pixels depending upon the

pixel values of neighboring pixels in the image, thus improving the local contrast.

This improves the contrast of the cell membranes locally against the contents

inside and outside the neuron cell, and also fixes overall brightness variability

between images [14]. The decrease in variability greatly helps the classifier since

it reduces the difference between training images, between training and testing

images. An example of such CLAHE enhancement is shown in Figure 3.2. The

CLAHE algorithm is shown in Algorithm 1.

(a) Original image (b) CLAHE Enhanced Image

F i g u r e 3.2. Comparison between original and CLAHE enhanced image

18

classifiers) are boosted [9] for several rounds until a high accuracy classifier is ob

tained in case of the membrane detection experiment. In case of synapse detection,

the same classifier is used in a cascaded architecture. The following sections review

these steps in greater detail.

3.1 Image enhancem ent

Before the images are used for feature extraction step, a contrast limited adap

tive histogram equalization (CLARE) [11] is applied to the raw electron microscopy

images. This method changes the grey value of t he pixels depending upon the

pixel values of neighboring pixels in the image, thus improving t he local contrast .

This improves the contrast of the cell membranes locally against the contents

inside and outside the neuron cell , and also fixes overall brightness variability

between images [14] . The decrease in variability greatly helps the classifier since

it reduces the difference between training images, between training and testing

images . An example of such CLARE enhancement is shown in Figure 3.2. The

CLARE algorithm is shown in Algorithm 1.

(a) Original image (b) CLARE Enhanced Image

Figure 3.2 . Comparison between original and CLARE enhanced image

19

A l g o r i t h m 1 CLAHE 7 <— Image to be enhanced O <— Enhanced Image W *— Moving window s <— Maximum contrast limit (n, n) <— Height and width of W Pad image 7 with (n — l ) / 2 pixels on all sides for For every pixel p in 7 d o

Construct window W around pixel p Pw P D F of grey values of pixels in W cw <— CDF of grey values of pixels in W such that the max difference between consecutive bins in s Op <— cw{pixelp)

e n d for

3 . 2 C e l l m e m b r a n e d e t e c t i o n

3.2.1 F e a t u r e s

Four features were computed for each pixel in the image: the pixel intensity,

and eigenvalues and orientation of the first eigenvector of the Gaussian smoothed

Hessian matrix. The gray value of the pixel is utilized since membranes are usually

dark and therefore are useful for segmentation, as verified in previous works [14, 13,

17]. The other three features are properties derived from the Gaussian smoothed

Hessian matrix,

H(x,y) = G„ * d2i o2i dx2 dxdy d2I d2I (3.1)

dydx dy2

where 7 is the (CLAHE enhanced) image, and Ga is the Gaussian blurring kernel

with standard deviation a. The Hessian matrix was used in the context of fil

tering [21] and segmenting [17] electron microscopy images. Since membranes are

elongated structures, the eigenvalues of the smoothened Hessian matrix represent

the anisotropic nature of the region around the pixel. The eigenvalue of the

principal eigenvector of the Hessian is proportional to the gradient orthogonal to

the membrane and the smaller eigenvalue is proportional to the gradient along

the cell membrane. The ability of this feature to measure the anisotropic nature

of the shape can be seen when we compare the eigenvalues of ellipses of different

eccentricities.

Algorithm 1 CLAHE I +- Image to be enhanced o +- Enhanced Image W +- Moving window s +- Maximum contrast limit (n, n) +- Height and width of W Pad image I with (n - 1) /2 pixels on all sides for For every pixel p in I do

Construct window W around pixel p Pw +- PDF of grey values of pixels in W

19

Cw +- CDF of grey values of pixels in W such that the max difference between consecutive bins in s Op +- cw(pixelp )

end for

3.2 Cell membrane detection

3.2.1 Features

Four features were computed for each pixel in the image: the pixel intensity,

and eigenvalues and orientation of the first eigenvector of the Gaussian smoothed

Hessian matrix. The gray value of the pixel is utilized since membranes are usually

dark and therefore are useful for segmentation , as verified in previous works [14, 13,

17]. The other three features are properties derived from the Gaussian smoothed

Hessian matrix,

[

[P I

H(x, y) = Ga * ~~~ oyox

(3.1 )

where I is the (CLAHE enhanced) image, and Ga is t he Gaussian blurring kernel

with standard deviation cr. The Hessian matrix was used in the context of fil

tering [21] and segmenting [17] electron microscopy images. Since membranes are

elongated structures, the eigenvalues of the smoothened Hessian matrix represent

the anisotropic nature of the region around the pixel. The eigenvalue of the

principal eigenvector of the Hessian is proportional to the gradient orthogonal to

the membrane and the smaller eigenvalue is proportional to the gradient along

the cell membrane. The ability of this feature to measure the anisotropic nature

of the shape can be seen when we compare the eigenvalues of ellipses of different

eccentricit ies .

20

The eigenvalues of ellipse of different eccentricities are shown in Table 3.1. As we

see, the eigenvalue of the principal eigenvector increases with increase in eccentricity

of the ellipse as expected. Thus when the ratio of the major axis to the minor axis is

near one, the shape is more circular and the blob region is more likely to represent

a vesicle than a membrane.

The fourth feature is the orientation of the principal eigenvector at that point.

The inclusion of this feature gains significance during learning of the classifier

because the neighboring pixels features are also considered.

The feature vector for every pixel in the image consists of the feature values

of that pixel and of its neighbors. The neighborhood is defined by a star shaped

stencil with its 8 arms forking out every 45 degrees (Figure 3.3). We show in the

results section that the neighboring pixel features adds relevant information for the

T a b l e 3 .1 . Eigenvalue of ellipses of different parameters Ellipse Parameters Ratio of eigenvalues

• Major axis = 20, Minor axis = 20

1:1

• Major axis = 20, Minor axis = 40

2:1

Major axis =100, Minor axis =10

10:1

20

The eigenvalues of ellipse of different eccentricities are shown in Table 3.1. As we

see, the eigenvalue of the principal eigenvector increases with increase in eccentricity

of the ellipse as expected. Thus when the ratio of the major axis to the minor axis is

near one , the shape is more circular and the blob region is more likely to represent

a vesicle than a membrane.

The fourth feature is the orientation of the principal eigenvector at that point.

The inclusion of this feature gains significance during learning of the classifier

because the neighboring pixels features are also considered .

The feature vector for every pixel in the image consists of the feature values

of that pixel and of its neighbors. The neighborhood is defined by a star shaped

stencil with its 8 arms forking out every 45 degrees (Figure 3.3). We show in the

results section that the neighboring pixel features adds relevant information for the

Table 3.1. Eigenvalue of ellipses of different parameters Ellipse Parameters Ratio of eigenvalues

Major axis = 20, Minor 1:1 axis = 20



21

F i g u r e 3.3. Stencil neighborhood.

classification. The context helps to identify membranes at regions were there are

minor discontinuities, as it allows for the classifier to utilize the context information

to "interpolate" the cell membrane. In this regard, the orientation feature plays

an important role by imposing a smoothness constraint on the curvature of the

membrane.

3.2.2 Classifier

We propose to utilize a classifier trained with AdaBoost [9] since such a classifier

can model a nonlinear decision boundary. AdaBoost is a meta-algorithm that builds

the classifier from "weak" classifiers, such as a decision stump. At each round,

AdaBoost adds a weak classifier to the set of weak classifiers by training for best

classification performance according to samples weights. The sample weights are

varied depending on the classification result of the previous round, by increasing

the weights of incorrectly classified samples and decreasing the weights of correctly

classified samples. The final classifier is a weighted sum of the weak classifiers

according to their accuracy in the training rounds. It has been observed empirically

in previous experiments that the obtained classifiers generally do not over fit [10],

[3], [7], [19]. The algorithm for AdaBoost is given in Algorithm 2.

In this paper, decision stumps are used for the weak classifier. Decision stumps

are the simplest form of binary decision trees with just one decision node. The deci

sion stump makes the classification decision based on just the value of a particular

feature with respect to a threshold. Given the feature set, desired classification

and prior of the samples, the threshold for a particular feature can be chosen

21

o

o

C -G

c

o

Figure 3.3. Stencil neighborhood.

classification. The context helps to identify membranes at regions were there are

minor discontinuit ies, as it allows for t he classifier to utilize t he context information

to "interpolate" the cell membrane. In this regard, t he orientation feature plays

an important role by imposing a smoothness constraint on the curvature of the

membrane.

3.2.2 Classifier

We propose to utilize a classifier trained wi th AdaBoost [9] since such a classifier

can model a nonlinear decision boundary. AdaBoost is a meta-algorithm that builds

the classifier from "weak" classifiers, such as a decision stump. At each round ,

AdaBoost adds a weak classifier to t he set of weak classifiers by t raining for best

classification performance according to samples weights. The sample weights are

varied depending on t he classification result of the previous round , by increasing

t he weights of incorrectly classified samples and decreasing the weights of correctly

classified samples. The final classifier is a weighted sum of the weak classifiers

according to their accuracy in the t raining rounds. It has been observed empirically

in previous experiments that t he obtained classifiers generally do not over fit [10],

[3], [7], [19]. The algorithm for AdaBoost is given in Algorithm 2.

In this paper , decision stumps are used for the weak classifier. Decision stumps

are the simplest form of binary decision trees with just one decision node. The deci

sion stump makes the classification decision based on just the value of a part icular

feature with respect to a threshold. Given t he feature set , desired classification

and prior of the samples , the t hreshold for a part icular feature can be chosen

22

A l g o r i t h m 2 AdaBoost X <— Samples Y <— Classification N <r— Number of samples D <— Number of dimensions T <— Number of rounds of boosting Wi(n) <— 4 , where n = 1 . . . N {Initializing weight distribution of samples} for t = 1 . . . T d o

ht <— Train weak learner using weight distribution Wt

N et <— 2_\Wt{n)[yn 7̂ ht{xn)\{Calculate weighted error rate}

based on the probability distribution functions of membrane and nonmembrane

classes over the feature values without making any underlying assumption about

the distribution of the feature. This gives the stump of best accuracy compared to

the ones built using other metrics like information gain. The AdaBoost mechanism

along with the decision stump classifier acts as a feature selection mechanism [25].

Algorithm 3 shown the algorithm for building the decision stump with maximum

classification performance. The algorithm is modified to get maximum performance

by vectorizing the operations. This algorithm also takes care of maximizing the

margin for the decision boundary.

The various functions specified in the algorithm are specified in the following

Wt+i(n) +- Wt(n)eatyMxn){Calculate new weight distribution} Wf+i(n) < Wi+i(n)—{Normalize weights}

t=i

figures:

1. fpos, P D F of class 1 samples-*— Class 1 function of Figure 3.4(a)

2. fneg, PDF of class 0 s a m p l e s ^ Class 0 function of Figure 3.4(a)

3. Cpos, CDF of class 1 samples*— Class 1 function of Figure 3.4(b)

Algorithm 2 AdaBoost X f- Samples Y f- Classification N f- Number of samples D f- Number of dimensions T f- Number of rounds of boosting WI (n) f- iJ , where n = 1 ... N {Initializing weight distribution of samples} for t = 1 ... T do

ht f- Train weak learner using weight distribution Wt N

Et f- L Wt(n)[Yn =1= ht(xn)]{Calculate weighted error rate} n=1

a f- l.in I- ft. t 2 f t.

WH1 (n) f- Wt(n )eCl: tYn h t(xn) {Calculate new weight distribution} Wt+1(n) f- N W t+ l(n) {Normalize weights}

L Wt+1(n) n = 1

end for T

Final boosted classifier H (x) f- sign(L atht(x)) t=l

22

based on the probability distribution functions of membrane and nonmembrane

classes over the feature values without making any underlying assumption about

the distribution of the feature . This gives the stump of best accuracy compared to

the ones built using other metrics like information gain. The AdaBoost mechanism

along with the decision stump classifier acts as a feature selection mechanism [25].

Algorithm 3 shown the algorithm for building the decision stump with maximum

classification performance. The algorithm is modified to get maximum performance

by vectorizing t he operations. This algorithm also takes care of maximizing the

margin for the decision boundary.

The various functions specified in the algorithm are specified in the following

figures:

1. jpos, PDF of class 1 samplesf- Class 1 function of Figure 3.4(a)

2. fneg, PDF of class 0 samplesf- Class 0 function of Figure 3.4(a)

3. cpos, CDF of class 1 samplesf- Class 1 function of Figure 3.4(b)

23

A l g o r i t h m 3 Decision stump learning (Vectorized) X *— Samples Y <— Classification N *— Number of samples D *— Number of dimensions W «— Weight distribution of samples for d = 1 . . . D d o

[X^ s o r t e d , sortlndices] <— sort(Xa){Sort at tr ibute values of one dimension} /pos PDF of sample of class 1 weighted by distribution W fneg <— PDF of sample of class 0 weighted by distribution W Cpos *— CDF of sample of class 1 weighted by distribution W cneg <— CDF of sample of class 0 weighted by distribution W ^pos * Cpos ineg * TYICLX (Cneg) Cneg for t = Every value of X^ d o

Accuracyi(t) <— ipos(t) + [maa:(«n e 9) — i n e f f ( t )]{Accuracy of Decision stump 1 at all values of t} Accuracy 2{t) *— ineg(t) + [max(ipo8) — ipos{t)]{Accuracy of Decision stump 2 at all values of t}

e n d for Accuracy(d) <— max(max(Accuracyi),max(Accuracy2)) Threshold(d) <— Threshold corresponding to Accuracy(d) I inequality (d) <— Inequality corresponding to Accuracy(d)

e n d for Accuracy.max *— max (Accuracy) ^Threshold ^~ Threshold corresponding to Accuracymax

hAttribute <— Attribute d corresponding to Accuracymax

^equality <- Inequality corresponding to A c e u r a q / m a x

4. c n e f l , CDF of class 0 samples*— Class 0 function of Figure 3.4(b)

5. ipos, CDF of class 1 samples normalized over the entire sample set*— Class 1

function of Figure 3.4(c)

6. i n e g , CDF of class 0 samples normalized over the entire sample set*— Class 0


7. Accuracy \, Accuracy of decision stump where the inequality is attr ibute

valuegethreshold *— Accuracy 1 function of Figure 3.4(d)

Algorithm 3 Decision stump learning (Vectorized) X i- Samples Y i- Classification N i- Number of samples D i- Number of dimensions Wi-Weight distribution of samples for d = 1 .. . D do

[XdsQI·ted, sartI ndices] i- sort(Xd){Sort attribute values of one dimension} Ipos i- PDF of sample of class 1 weighted by distribution W j~,eg i- PDF of sample of class 0 weighted by distribution W cpos i- CDF of sample of class 1 weighted by distribution W cneg i- CDF of sample of class 0 weighted by distribution W 2pos i- cpos ineg i- max ( cneg) - cneg for t = Every value of X d do

23

AccuracYl(t) i- ipos(t) + [max(ineg) - ineg(t)]{Accuracy of Decision stump 1 at all values of t } AccuracY2(t ) i- ineg(t) + [max(ipos) - ipos(t)]{Accuracy of Decision stump 2 at all values of t }

end for Accuracy(d) i- max(max(AccuracYl) , max (AccuracY2)) Threshold(d) i- Threshold corresponding to Accuracy(d) I nequality( d) i- Inequality corresponding to Accuracy ( d)

end for AccuracYma3; i- max (Accuracy) hTlu'eshold i- Threshold corresponding to AccuracYmax hAttl';,/J'/lle i- Attribute d corresponding to AccuracYmax hinequality i- Inequality corresponding to AccuracYmax

4, cneg' CDF of class 0 samplesi- Class 0 function of Figure 3.4(b)

5, ipos , CDF of class 1 samples normalized over the entire sample seti- Class 1


6. ineg, CDF of class 0 samples normalized over t he ent ire sample seti- Class 0

function of Figure 3.4( c)

7. AccuracYl , Accuracy of decision stump where the inequality is attribute

valuegethreshold i- Accuracy 1 function of Figure 3.4(d)

24

LL Q Q-

Q

C l a s s 0 C l a s s 1

1 0

S 5

0 1 0 S a m p l e v a l u e

2 0 Q 0 1 0

S a m p l e v a l u e

C l a s s 0 C l a s s 1

2 0

(a)

0 _3

> C o

^3 o c 3

(c)

Probability distribution of the two classes (b) Cumulative Probability distribution of the two classes

2 0

0 . 5

Q

F u n c t i o n 1 F u n c t i o n 2 o

(0 3 1 0 o o

<


2 0 Q

A c c u r a c y 1 A c c u r a c y 2


2 0

Intermediate functions in the algorithm (d) Accuracies of the two stumps at the attribute value

F i g u r e 3.4. Plot of different functions in the decision stump learning algorithm

8. Accuracy2, Accuracy of decision stump where the inequality is at tr ibute

value/tthreshold <— Accuracy 2 function of Figure 3.4(d)

3 . 3 S y n a p s e d e t e c t i o n

3.3.1 F e a t u r e s

The features used in this setup are properties of regions of interest extracted

from the enhanced image. The enhanced image is thresholded and the thresholded

regions are used as masks to extract regions of interest. Due to the huge size of

the image, calculation of features for each and every pixel of the image would be

very time consuming in both the training and testing phases. Thus the image is

4~--~--~r=====~

LL 02 a.

0'--o

Class 0 Class 1

10 Sample value

20

1 Or------~·

LL o 5 o

Class 0 Class 1

10 20 Sample value

24

(a) P robabili ty distribution of the two classes (b) Cumulative Probability distribut ion ofthe two classes

1 '---~r=======~ (1) :::J nJ > c: o 0.5 ... (.) c: :::J

LL 0-o

Function 1 Function 2

10 20 Sample value

~ (.) nJ :s 10 (.) (.)

« Accuracy 1 Accuracy 2 10 20

Sample value (c) Intermediate functions in the algorithm (d) Accuracies of the two stumps at the at

tribute value

Figure 3.4. Plot of different functions in t he decision stump learning algorithm

8. AccuracY2, Accuracy of decision stump where the inequality is attribute

valueltthreshold <- Accuracy 2 function of Figure 3.4( d)

3.3 Synapse detection

3.3.1 Features

The features used in this setup are propert ies of regions of interest extracted

from the enhanced image. The enhanced image is thresholded and the thresholded

regions are used as masks to extract regions of interest . Due to the huge size of

t he image, calculation of features for each and every pixel of the image would be

very t ime consuming in both the training and testing phases. Thus the image is

25

first down sampled so that we have a reasonable training dataset size. Even then,

the number of data points was huge. By visual examination, it was seen that the

synapses are darker structures. Thus an optimum grey threshold value is learned

such that the thresholded regions are around the synapse location or its vicinity.

For all the connected component regions extracted from the thresholded region, the

following features are calculated:

1. 7 Rotation, translation and scale invariant moments

2. 30 bin cumulative histogram bin values

3. Area of the region

These features are explained in detail in the following sections.

3.3.2 Scale a n d r o t a t i o n inva r i an t m o m e n t s

The raw moment of any discrete region is given by the following equation

My = £ 5 > y / ( * , y ) (3.2) x y

In our case since we are calculating moments for regions of various sizes. We

normalize them by dividing any moment by the following value.

x y

We chose the centroid of the region as the point around with the moments are

calculated. The centroid of the region is given by the following equation

{ic, y} = { M 1 0 / M 0 0 , MQI/MOO} (3.4)

To introduce translation invariance, all the moments are calculated around the

centroid of the region as follows

25

first down sampled so that we have a reasonable training dataset size. Even then,

the number of data points was huge . By visual examination, it was seen that the

synapses are darker structures. Thus an optimum grey threshold value is learned

such that the thresholded regions are around the synapse location or its vicinity.

For all the connected component regions extracted from the thresholded region, the

following features are calculated:

1. 7 Rotation, translation and scale invariant moments

2. 30 bin cumulative histogram bin values

3. Area of the region

These features are explained in detail in the following sections.

3.3.2 Scale and rotation invariant moments

The raw moment of any discrete region is given by the following equation

(3.2) x y

In our case since we are calculating moments for regions of various sizes. We

normalize them by dividing any moment by the following value.

LL I(x ,y) (3.3) x y

We chose t he centroid of the region as the point around with the moments are

calculated. The centroid of the region is given by the following equation

{x , ]]} = {MlO/Moo,MoI/Moo} (3.4)

To introduce translation invariance, all the moments are calculated around the

centroid of the region as follows

26

x y We calculate the central moments (up to an order of 3) to determine the seven

scale, rotation moments The central moments are given by the following equations:

Moo — MDO (3.6)

MOI = 0 (3.7)

Mio = 0 (3.8)

Mil = Mn - xM01 = Mn - xM01 (3.9)

Mao = M 2 0 - xMio (3.10)

M02 = M02 - yM01 (3.11)

M21 = M 2 i - 2 z M n - j / M 2 0 + 2 x 2 M 0 i (3.12)

M12 = M12 - 2 y M u - z M 0 2 + 2 £ 2 M 1 0 (3.13)

M3 0 = M 3 0 - 3xM 2 ( ) + 2a : 2 M 1 0 (3.14)

/ i 0 3 = M 0 3 - 3 z M 0 2 + 2 x 2 M 0 i (3.15)

The scale invariant moments rjij can be constructed from the central moments

by dividing by the properly scaled 0th moments as shown below.

m = (3.i6) Moo

The scale invariant seven moments used as input features for the classifier are

given by the following equations. These moments are also called Hu invariant

moments.

26

(3.5) x y

We calculate the central moments (up to an order of 3) to determine the seven

scale, rotation moments The central moments are given by the following equations:

/-loa = Moo

/-lOI = 0

/-lIO = 0

(3.6)

(3.7)

(3.8)

(3.9)

(3.10)

(3. 11)

(3.12)

(3 .13)

(3.14)

(3.15)

The scale invariant moments 'r}'ij can be constructed from the central moments

by dividing by the properly scaled oth moments as shown below.

/-lij 'r}'ij = 1+( i¥)

/-loa

(3.16)

The scale invariant seven moments used as input features for the classifier are

given by the following equations, These moments are also called Hu invariant

moments.

27

h = ??20 + 7]Q2 (3.17)

h = (7720 - V02)2 + (2T7H) 2 (3.18)

^3 = (7/30 - 37712)2 + (3r/2i - 7?o3)2 (3.19)

h = (7/30 + flu)2 + (T?21 + ^7o3)2 (3.20)

h = {mo - 37712X7730 + 7712) [(7730 + 7/12)2 - 3(7721 + 7703)2]

+ (3r/2i - 7703)(mi + 7703)[3(7730 + 7712)2 - (7721 + m)2} (3.21)

h = (mo ~ 77o2)[(7730 + 7712)2 - (7/21 + 7/so)2] + 47711(7730 + 7712)(7721 + 7703) (3.22)

^7 = (37721 - 7703)(7730 + 7712)[(r/30 + 7/12)2 - 3(7721 + 7703)2]

Circular regions of interest are extracted with synapse point as the center. The

Hu invariant features are calculated for these circular regions.

3.3.3 C u m u l a t i v e h i s t o g r a m a n d a r e a f ea tu re s

The grey values of the region are rescaled to a normalized scale between the grey

scale minimum to threshold value. The rescaled range is divided into 30 equally

sized bins and a cumulative histogram is estimated. The 30 values of the 30 bins

are used as additional input features. The entire thresholded region is used for

estimating the bin values rather than using just the circular regions which were

used to calculate the moment features. Since the regions are extracted by masking

with a threshold, the range of grey scale is from zero (minimum grey value) to the

threshold value instead on the entire range of grey values.

The area of the region is the number of pixels in the extracted region.

3.3.4 Classif ier

In this problem we have an unbalanced training dataset, i.e., the number of

examples of synapse regions are far less than the number of examples of nonsynapses

+ (7730 - 37712)(7721 + 7703)13(7730 + 7712)2 - (7/21 + 7703)2] (3.23)

h = rJ20 + rJ02

12 = (rJ20 - rJ02)2 + (2rJll)2

13 = (rJ30 - 3rJ12)2 + (3rJ21 - 1703)2

14 = (rJ30 + rJ12)2 + (rJ21 + rJ03)2

h (rJ30 - 3rJ12) (rJ30 + rJ12) [(rJ30 + rJ12) 2 - 3(rJ21 + rJ03)2]

27

(3.17)

(3.18)

(3.19)

(3.20)

+ (31721 - rJ03)( rJ21 + rJ03) [3 (rJ30 + rJ12)2 - (rJ21 + rJ03)2] (3 .21)

h = (rJ20 - rJ02) [( rJ30 + rJ12)2 - (rJ21 + rJ30)2] + 4rJll(rJ30 + rJ12) (rJ21 + rJ03) (3.22)

17 (3rJ21 - rJ03)(rJ30 + rJ12) [(rJ30 + rJ12) 2 - 3(rJ21 + rJ03) 2]

+ (rJ30 - 3rJ12)(rJ21 + rJ03)[3(rJ30 + rJ12)2 - (rJ21 + rJ03)2] (3.23)

Circular regions of interest are extracted with synapse point as the center. The

Bu invariant features are calculated for these circular regions.

3.3.3 Cumulative histogram and area features

The grey values of the region are rescaled to a normalized scale between the grey

scale minimum to threshold value. The rescaled range is divided into 30 equally

sized bins and a cumulative histogram is estimated. The 30 values of the 30 bins

are used as additional input features. The entire thresholded region is used for

estimating the bin values rather than using just t he circular regions which were

used to calculate the moment features. Since the regions are extracted by masking

with a threshold , the range of grey scale is from zero (minimum grey value) to the

threshold value instead on the entire range of grey values.

The area of the region is the number of pixels in the extracted region.

3.3.4 Classifier

In this problem we have an unbalanced training dataset, i.e., the number of

examples of synapse regions are far less than the number of examples of nonsynapses

28

regions. In the testing phase also we observe that detection of synapses is like

finding a needle in a haystack. Thus we use a cascading architecture that works

well with other kinds of such rare event detection problem, such as face detection

in photographs [24]. In this architecture, results of several high prediction rate

classifiers are cascaded. The output of the the final classifier of the cascade gives

the prediction for synapses with few false negatives. The architecture is shown

in Figure 3.5. As shown in the figure, every classifier is an ensemble of weighted

decision stumps. The final boosted classifiers prediction is based on equation 3.24.

N Class = sgn(y^ wnHn(x) — threshold) (3.24)

n=0

AdaBoost + Decision Stump, adjusted threshold for very high prediction rate

Negatives

Negatives

Classifies

Positives

Negatives Positives

F i g u r e 3.5. Cascade architecture

28

regions. In the testing phase also we observe t hat detection of synapses is like

finding a needle in a haystack. Thus we use a cascading architecture that works

well with other kinds of such rare event detection problem, such as face detection

in photographs [24]. In this architecture, results of several high prediction rate

classifiers are cascaded. The output of the the final classifier of the cascade gives

the predict ion for synapses with few false negatives. The architecture is shown

in Figure 3.5. As shown in the figure, every classifier is an ensemble of weighted

decision stumps. The final boosted classifiers prediction is based on equation 3.24.

N

Class = sgn(L wnHn(x) - threshold) n=O

Figure 3.5. Cascade architecture

Classifier =

AdaBoost + Decision Stump, adjusted threshold for very high prediction rate

(3.24)

29

From equation 3.24 we can infer that the classification of a sample by the

ensemble can be varied by adjusting the threshold. This property is used at

every node of the ensemble, such that by varying the threshold for the ensemble,

we improve the prediction rate (approximately 100%). For the first stage of the

cascade, all the positive examples and an equal number of negative examples are

chosen as the training set. The ensemble is trained to a point where adding more

weak classifier does not improve the prediction rate of the ensemble. Then the

threshold of the ensemble is varied such that ensemble has a very high prediction

rate. After adjusting the threshold, the ensemble will predict a few samples as

nonsynapses. These samples will not be used for training in the following levels

of cascade. At the next stage of the cascade, we take all the samples predicted as

synapses in the previous stage and add a set negative samples such that the training

set is balanced. The classifier is trained just like the previous stage. The cascade

is trained until we run out of training examples or the rate of rejection negative

examples drops down.

3.3.5 O r i e n t a t i o n e s t i m a t i o n

Once the synapses regions are detected by the cascaded classifier, we estimate

the orientation of the synapses so that the predicted synapses can be aesthetically

viewed on the markup viewer. The original image is down sampled to 25% of its

size and blurred using perona-malik smoothening. This takes care that the blurring

occurs within the synapse alone and does not blur the entire area. A circular region

around the synapse is extracted and thresholded at the median grey value. From

this thresholded image, the orientation of the synapse is calculated as follows.

The orientation of principal axis of the binary image patch can be calculated

from the covariance matrix constructed from the 2nd order central moments as

follows.

The covariance matrix is given by

cov[I(x,y)] =

where

M20 M11 M11 M02

(3.25)

29

From equation 3.24 we can infer that the classification of a sample by the

ensemble can be varied by adjusting the threshold. This property is used at

every node of the ensemble, such that by varying the threshold for t he ensemble,

we improve the prediction rate (approximately 100%). For the first stage of the

cascade, all t he positive examples and an equal number of negative examples are

chosen as the training set. The ensemble is trained to a point where adding more

weak classifier does not improve the prediction rate of the ensemble. Then the

threshold of the ensemble is varied such that ensemble has a very high prediction

rate . After adjusting t he threshold, the ensemble will predict a few samples as

nonsynapses. These samples will not be used for training in the following levels

of cascade. At the next stage of the cascade, we take all t he samples predicted as

synapses in the previous stage and add a set negative samples such that the training

set is balanced . The classifier is trained just like the previous stage. The cascade

is trained until we run out of training examples or the rate of rejection negative

examples drops down.

3 .3 .5 Orientation estimation

Once the synapses regions are detected by the cascaded classifier , we estimate

the orientation of the synapses so that the predicted synapses can be aesthetically

viewed on the markup viewer. The original image is down sampled to 25% of its

size and blurred using perona-malik smoothening. This takes care that the blurring

occurs within the synapse alone and does not blur the entire area. A circular region

around the synapse is extracted and thresholded at t he median grey value. From

this thresholded image, the orientation of the synapse is calculated as follows.

The orientation of principal axis of the binary image patch can be calculated

from the covariance matrix constructed from the 2nd order central moments as

follows.

The covariance matrix is given by

(3.25)

where

30

1 M20 M 2 0

M20 —

Moo ~ M 0 0

/ M02 M 0 2

Mo2 —

Moo ~ M 0 0

Mil = Mn M n

Mil = Moo ~ Moo

(3.26)

f (3-27)

^ (3.28)

The orientation of the image patch is given by the angle of the principal eigen

vector. The angle of the principal eigenvector is given by

9 = \arctan{ ^ f ) (3.29) 2 M20 ~~ M02

The orientation of the principal axis corresponds to the orientation of the

membrane. Thus the orientation of the synapses corresponds to the the minor

eigenvector which is orthogonal to the principal eigenvector on the 2D plane. Thus

the orientation of the synapses is given by,

Bsynapse = 6 ± 90° (3.30)

30

, J-l 20 11120 - 2 J-l 20 = - = --- X

J-loo Moo (3.26)

, J-l02 11102 - 2 J-l02 = - = -- - Y

J-loo Moo (3.27)

, J-lll Mll -- (3 ?8) J-lll = - = - - xy .~

J-loo Moo

The orientation of the image patch is given by the angle of the principal eigen-

vector. The angle of the principal eigenvector is given by

1 2J-l' 8 = -arctan( , II , )

2 J-l 20 - J-l02 (3.29)

The orientation of the principal axis corresponds to the orientation of the

membrane. Thus the orientation of the synapses corresponds to the the minor

eigenvector which is orthogonal to the principal eigenvector on the 2D plane. Thus

the orientation of the synapses is given by,

8 synapse = 8 ± 90 0 (3.30)

C H A P T E R 4

R E S U L T S

This chapter presents a detailed evaluation of the proposed methods of cell

membrane detection and synapse detection on a C. elegans worm dataset and a

rabbit retina dataset, respectively.

4 . 1 C e l l m e m b r a n e d e t e c t i o n

The proposed method for cell membrane detection was tested on a C. elegans

dataset. The entire volume is made of 149 slices of 662x697 gray scale images. Out

of this stack, 5 image slices where chosen at random from the first 50 slices and the

accuracy of the method was assessed using 5-fold cross-validation. In each case, the

training was done using four of the five images and tested on the image that was left

out of training. The ratio of membrane/nonmembrane pixels is unbalanced in the

order of 1:10 and thus affects the performance of the classifier. The classifier trained

with a balanced dataset (1:1 ratio) had the best accuracy compared to classifiers

trained with various ratios of positive (membrane) and negative (nonmembrane)

samples, with results shown for this case. The negative samples were chosen at

random.

The feature vectors were generated as described in Chapter 3, with a 7 x 7

neighborhood and Gaussian standard deviation a = 5. At any location, these

parameters yielded 100 features (25 points in the neighborhood x 4 features for

every pixel). Initially, the decision stumps were boosted for 10000 rounds and the

area under the ROC (averaged over the 5 folds) computed after each round. We

can observe from Figure 4.1 that the area under the ROC curve flattens out after

around 3000 rounds of boosting. The corresponding ROCs are shown in Figure 4.2,

and the test images results in Figure 4.3, 4.4, 4.5, 4.6, 4.7. Table 4.1 shows the

CHAPTER 4

RESULTS

This chapter presents a detailed evaluation of the proposed methods of cell

membrane detection and synapse detection on a C. elegans worm dataset and a

rabbit retina dataset, respectively.

4.1 Cell membrane detection

The proposed method for cell membrane detection was tested on a C. elegans

dataset. The entire volume is made of 149 slices of 662 x 697 gray scale images . Out

of this stack, 5 image slices where chosen at random from the first 50 slices and the

accuracy of the method was assessed using 5-fold cross-validation. In each case, the

training was done using four of the five images and tested on the image that was left

out of training. The ratio of membrane/nonmembrane pixels is unbalanced in the

order of 1: 10 and thus affects the performance of the classifier. The classifier trained

with a balanced dataset (1: 1 ratio) had the best accuracy compared to classifiers

trained with various ratios of positive (membrane) and negative (nonmembrane)

samples, with results shown for this case. The negative samples were chosen at

random.

The feature vectors were generated as described in Chapter 3, with a 7 x 7

neighborhood and Gaussian standard deviation (J = 5. At any location , these

parameters yielded 100 features (25 points in the neighborhood x 4 features for

every pixel). Initially, the decision stumps were boosted for 10000 rounds and the

area under the ROC (averaged over the 5 folds) computed after each round. We

can observe from Figure 4.1 that the area under the ROC curve flattens out after

around 3000 rounds of boosting. The corresponding ROCs are shown in Figure 4.2 ,

and the test images results in Figure 4.3, 4.4, 4.5, 4.6, 4.7. Table 4.1 shows the

32

0.94

0.92 -

10° 101 102 103 104

Boosting Rounds

F i g u r e 4 . 1 . Semilog plot of number of boosting rounds versus the area under the ROC curve for that boosting round.

sample statistics (average of all folds) used in the experiments. At the knee of the

testing ROC curve, the false positive rate = 0.15 (60000 pixels) and true positive

rate = 0.85 (31000). The high false positive rate is because the pixels neighboring

the membranes and pixels of vescicles are classified as membrane pixels. Figure 4.2

clearly shows that the use of neighborhood context combined with the proposed

feature set yields significantly better results than thresholding of the diffusion filter

image [14]. Moreover, comparing with the results without context information

underlines the importance of using neighborhood for membrane detection.

32

0.94

0.92

0.9

Q)

2: 0.88 :::J u 0 0 0.86 0::: ~

Q) "0 0.84 c :::J

ro Q)

0.82 ~ «

0.8

0.78

0.76 10° 10

1 10

2 10

3 10

4

Boosting Rounds

Figure 4. 1. Semilog plot of number of boosting rounds versus the area under the ROC curve for that boosting round.

sample statistics (average of all folds) used in the experiments. At the knee of the

testing ROC curve, the false positive rate = 0.15 (60000 pixels) and true positive

rate = 0.85 (31000) . The high false positive rate is because the pixels neighboring

the membranes and' pixels of vescicles are classified as membrane pixels. Figure 4.2

clearly shows that the use of neighbor hood context combined wi th the proposed

feature set yields significantly better results than thresholding of the diffusion fi lter

image [14]. Moreover , comparing with the results without context information

underlines the importance of using neighborhood for membrane detection.

33

T a b l e 4 . 1 . Statistics of samples in various experiments (average of all folds) Mode Total Samples Number of

Positives Number of Negatives

Training 296380 148190 148190 (randomly chosen from 408320 pixels)

Testing 461414 (all pixels in the test image)

37047 408320 (approx. 10 times number of positives

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 False Positive rate

F i g u r e 4.2. ROC curves of the classifiers trained with AdaBoost at boosting round 3000. For comparison, the ROC for the method by Jurrus et al. [14] is also shown.

33

Table 4.1. Statistics of samples in various experiments (average of all folds) Mode Total Samples Number of Number of Negatives

Positives Training 296380 148190 148190 (randomly chosen from

408320 pixels) Testing 461414 (all pix- 37047 408320 (approx. 10 times number

els in the test of positives image)

0.9

0.8

0.7

Q) -CO 0.6 L-

Q) > :;::::; . iii 0.5 0

0... Q)

0.4 :J L-

~

0.3

0.2 -Training

- - - Testing

0.1 -e- Testing without neighborhood

-- Jurrus 2008

o 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

False Positive rate

Figure 4.2 . ROC curves of the classifiers trained with AdaBoost at boosting round 3000. For comparison, the ROC for t he method by Jurrus et al. [14] is also shown.

34

F i g u r e 4.4. Membrane detection results of the fold-2 test images in the 5-fold cross-validation: original images (left), and detected membranes (right).

34

Figure 4.3 . Membrane detection results of the fold-l test images in the 5-fold cross-validation: original images (left), and detected membranes (right).

Figure 4.4. Membrane detection results of the fold-2 test images in the 5-fold cross-validation: original images (left), and detected membranes (right).

35


35

Figure 4.5. Membrane detection results of the fold-3 test images in the 5-fold cross-validation: original images (left) , and detected membranes (right).

~

III b.O C'd S

1--4 ..., f (/J

~ I.

Figure 4.6. Membrane detection results of the fold-4 test images in the 5-fold cross-validation: original images (left) , and detected membranes (right).

36


36

Figure 4 .7. Membrane detection results of the fold-5 test images in the 5-fold cross-validation: original images (left) , and detected membranes (right).

37

4 . 2 S y n a p s e d e t e c t i o n

The proposed method for synapse detection was tested on a single slice cross-

section of a rabbit retina dataset. The image is scanned under a 5000x magnifi

cation. One pixel in the digital image represents 2.18 nm at this magnification.

The full resolution image was 16720 x 16750 pixel sized grey scale image. The

image was first down sampled four times, to a size of 4180 x 4188 pixels. The

accuracy of the proposed method was assessed using four-fold cross-validation. The

image was divided into four quadrants and in each fold, the training was done on

3 quadrants and tested on the third quadrant. In the entire image, the ratio of

synapse regions/nonsynapses region was in the order of 1:100.

The initial problem was the reduction in the number of testing samples in a

image so as to reduce the testing time. The masking of thresholded regions provided

the initial dataset reduction. The next problem was to choose a representative point

around which the region attributes will be calculated. Many ways of choosing the

representative point were tried. A few significant ones are listed below:

1. The centroid of the binary thresholded region was chosen.

2. The weighted centroid was chosen, where the weight of individual pixel was

directly proportional to the darkness of the pixels in the CLAHE image.

3. SIFT key points, which are representative of corners and peaks were calculated

for the regions of interest. Based on darkness of each key point in a particular

region, a representative point was chosen.

4. The SIFT key points were allowed to converge towards the darkest pixels in

the region and the converged point was chosen.

In all the above methods, only a lesser percentage of the representative points

was near the actual marked up synapses, thus, the region around the synapses was

not even in the test dataset. For the few representative points that were near the

synapses, few were classified as nonsynapses regions since the features calculated

did not have the required separability in this dataset. Because of the above reasons,

37

4.2 Synapse detection

The proposed method for synapse detection was tested on a single slice cross

section of a rabbit retina dataset. The image is scanned under a 5000x magnifi

cation. One pixel in the digital image represents 2.18 nm at this magnification.

The full resolution image was 16720 x 16750 pixel sized grey scale image. The

image was first down sampled four times, to a size of 4180 x 4188 pixels. The

accuracy of the proposed method was assessed using four-fold cross-validation. The

image was divided into four quadrants and in each fold, the training was done on

3 quadrants and tested on the third quadrant. In the ent ire image, the ratio of

synapse regions/nonsynapses region was in the order of 1:100.

The initial problem was the reduction in the number of testing samples in a

image so as to reduce the testing time. The masking of thresholded regions provided

the initial dataset reduction. The next problem was to choose a representative point

around which the region attributes will be calculated. Many ways of choosing the

representative point were tried. A few significant ones are listed below:

1. The centroid of the binary thresholded region was chosen.

2. The weighted centroid was chosen, where the weight of individual pixel was

directly proportional to the darkness of the pixels in the CLARE image .

3. SIFT key points, which are representative of corners and peaks were calculated

for the regions of interest. Based on darkness of each key point in a particular

region, a representative point was chosen.

4. The SIFT key points were allowed to converge towards the darkest pixels in

the region and the converged point was chosen.

In all the above methods, only a lesser percentage of the representative points

was near t he actual marked up synapses, thus, the region around the synapses was

not even in the test dataset. For the few representative points that were near the

synapses, few were classified as nonsynapses regions since the features calculated

did not have the required separability in this dataset. Because of the above reasons,

38

the classifier either had unreliable detection rate necessitating more user guidance

to identify the synapses.

38

the classifier either had unreliable detection rate necessitating more user guidance

to identify the synapses.

C H A P T E R 5

C O N C L U S I O N

This chapter discusses the contributions of this thesis and proposes future

research directions.

5 . 1 C o n t r i b u t i o n

The proposed method utilizes neighborhood context information to improve the

accuracy of membrane detection. Along with the nonlinear discrimination ability

of the AdaBoost classifier and the Hessian feature set, this results in improved

membrane detection compared to previous methods. Thus one can expect a more

robust segmentation of the individual neurons.

5 . 2 F u t u r e w o r k

Even though the classifier does good work in classification of the membranes,

the classifier fails to discern certain structures like vesicles from membranes, which

may result in over-segmentation of individual neurons. Utilizing additional fea

tures that discriminate these regions from membranes may prevent these false

positives. Moreover, recent work suggests that cascading the classifier predictions

and additional feature set onto another classifier may help connect discontinuities in

membranes and thereby avoid under segmentation [22]. Future work would address

these problems in membrane detection to improve the segmentation accuracy of the

individual neurons. The segmentation of one slice of the volume could be used in

a more robust segmentation of successive slices where membranes are weak. As far

as the problem of detection of synapses, new features that can quantify the shape

of the region have to be developed. The feature has to account for the variablity of

the synapse shapes because of their types.

CHAPTER 5

CONCLUSION

This chapter discusses the contributions of this thesis and proposes future

research directions.

5.1 Contribution

The proposed method utilizes neighborhood context information to improve the

accuracy of membrane detection. Along with the nonlinear discrimination ability

of the AdaBoost classifier and the Hessian feature set, this results in improved

membrane detection compared to previous methods. Thus one can expect a more

robust segmentation of the individual neurons.

5.2 Future work

Even though the classifier does good work in classification of the membranes,

the classifier fails to discern certain structures like vesicles from membranes, which

may result in over-segmentation of individual neurons. Utilizing additional fea

tures that discriminate these regions from membranes may prevent these false

positives. Moreover, recent work suggests that cascading the classifier predictions

and additional feature set onto another classifier may help connect discontinuities in

membranes and thereby avoid under segmentation [22]. Future work would address

these problems in membrane detection to improve the segmentation accuracy of the

individual neurons. The segmentation of one slice of the volume could be used in

a more robust segmentation of successive slices where membranes are weak. As far

as the problem of detection of synapses, new features that can quantify the shape

of the region have to be developed. The feature has to account for the variablity of

the synapse shapes because of their types.

A P P E N D I X A

A S S E M B L I N G L A R G E M O S A I C S O F

E L E C T R O N M I C R O S C O P E I M A G E S

U S I N G G P U S

Novel imaging techniques are being used to map the connectivity of individual

neurons in large neuronal tissue sections, to understand the neural circuitry of the

retina, and particularly how signals are communicated across processes. Extensive

studies have been undertaken using electron microscopy to create detailed diagrams

of general neuronal structures [8] and their connectivities [1, 26, 4]. The entire

volume of neuronal tissue is scanned as ultra thin slices (approx. 70nm) sliced using

a micro tome. The thin slices are assembled together to reconstruct the volume.

The neuronal tissue has to be scanned at very high resolutions (around 2nm/pixel)

to unambiguously identify the neurons and synapses in the scanned volume and

create detailed maps. The serial-section Transmission Electron Microscope (TEM)

is the preferred imaging modality for capturing large sections of neuronal tissues at

this magnification level. The section to be scanned spans a few millimeters. Rarely

do we find an electron microscope that can capture such a wide field of view and

at the required nanoscale resolution. Thus the sample of interest is imaged as a

sequence of tiles with some overlap. Figure A.l shows sample neuronal tissue image

tiles of mice scanned using TEM.

The imaging of these tiles using TEM requires the sample to be suspended over a

beam of electrons. The passage of high energy electron beams through the specimen

causes it to heat up and subsequently distort. The distortion is not uniform among

tiles and thus has to be unwarped individually. Thus, reconstructing the image

from the set of tiles, called image mosaicing, involves significant computation to

APPENDIX A

ASSEMBLING LARGE MOSAICS OF

ELECTRON MICROSCOPE IMAGES

USING GPUS

Novel imaging techniques are being used to map the connectivity of individual

neurons in large neuronal tissue sections , to understand the neural circuitry of the

retina, and particularly how signals are communicated across processes. Extensive

studies have been undertaken using electron microscopy to create detailed diagrams

of general neuronal structures [8] and their connectivities [1 , 26 , 4]. The entire

volume of neuronal tissue is scanned as ultra thin slices (approx. 70nm) sliced using

a micro tome. The thin slices are assembled together to reconstruct the volume.

The neuronal tissue has to be scanned at very high resolutions (around 2nm/ pixel)

to unambiguously identify the neurons and synapses in the scanned volume and

create detailed maps. The serial-section Transmission Electron Microscope (TEM)

is t he preferred imaging modality for capturing large sections of neuronal tissues at

this magnification level. The section to be scanned spans a few millimeters. Rarely

do we find an electron microscope that can capture such a wide field of view and

at the required nanoscale resolution. Thus the sample of interest is imaged as a

sequence of tiles with some overlap. Figure A.l shows sample neuronal tissue image

tiles of mice scanned using TEM.

The imaging of these tiles using TEM requires the sample to be suspended over a

beam of electrons. The passage of high energy electron beams through the specimen

causes it to heat up and subsequently distort. The distortion is not uniform among

tiles and thus has to be unwarped individually. Thus, reconstructing the image

from the set of tiles, called image mosaicing, involves significant computation to

11

(a) Sample neuronal image tile scanned using (b) Triangle mesh over a regular grid of con-an Transmission Electron Microscope trol points

F i g u r e A . l . Mice neuronal image tile slice and triangle mesh overlay

identify and handle overlapping portions of the image, and correct nonuniform

distortions over a very large number of tiles. A typical section of neuronal tissue

is 2500 microns in diameter and is scanned as 1000 tiles of 4080x4080 pixels

each. Currently, researchers assemble the volume from the scanned tiles using a

multithreaded tool chain [15], but this computation is one of the bottlenecks in

the critical path to reconstruct the volume since it is estimated to take around

90 days to assemble a volume made of 270 mosaics with each mosaic made of

approximately 1000 tiles. In this appendix, we describe our experiences using

GPUs to accelerate this computation. Because of the inherent parallelism of the

computation, the roughly identical computation at each pixel, and the data locality

across neighboring tiles, our initial observation was that this computation ought to

achieve high speedup on a GPU if we can effectively manage the streaming of

data. In the current method [15], every image tile contributing to a region of

mosaic is unwarped to calculate the value of the pixels. The warping is modeled

as a discontinuous transform. Every tile is sampled as a uniform triangle mesh

as shown in Figure A.l . The vertices of the triangles in the mesh are control

points whose positions are known in tile space and mosaic space. The location

41

(a) Sample neuronal image t ile scanned using (b) Tr iangle mesh over a regular grid of con-an Transmission Electron Microscope trol points

Figure A.!. Mice neuronal image tile slice and triangle mesh overlay

identify and handle overlapping portions of the image, and correct nonuniform

distortions over a very large number of tiles . A typical section of neuronal t issue

is 2500 microns in diameter and is scanned as 1000 tiles of 4080 x 4080 pixels

each. Currently, researchers assemble the volume from the scanned tiles using a

multithreaded tool chain [15], but this computation is one of the bottlenecks in

t he critical path to reconstruct the volume since it is estimated to take around

90 days to assemble a volume made of 270 mosaics with each mosaic made of

approximately 1000 t iles . In this appendix, we describe our experiences using

GPUs to accelerate this computation. Because of the inherent parallelism of t he

computation, the roughly identical computation at each pixel, and the data locality

across neighboring tiles, our init ial observation was t hat this computation ought to

achieve high speedup on a GPU if we can effectively manage the streaming of

data. In the current method [15] , every image tile contributing to a region of

mosaic is unwarped to calculate the value of t he pixels. The warping is modeled

as a discontinuous transform. Every tile is sampled as a uniform triangle mesh

as shown in Figure A.I. The vertices of the triangles in the mesh are control

points whose positions are known in tile space and mosaic space. The location

42

of every point on the mosaic can be mapped to a triangle in one or more tiles

using a barycentric coordinate system. Thus with this position information we can

find the value of every pixel in the mosaic. We have implemented the mosaicing

application in CUDA for the NVIDIA platforms below and compare its performance

with existing sequential and parallel library implementations. A 13783 x 13686 pixel

mosaic of mice neural tissue shown in Figure A.2 was reconstructed from 16 image

tiles using the CUDA implementation.

Salient features of the implementation are:

1. Mosaic is calculated as equally sized tiles.

2. The scanned tiles are stored as textures, and pixel values of the mosaic are

calculated using the texture look up. The textures are stored as unsigned

character images. We get significant performance gain doing nearest neighbor

interpolation because of hardware-accelerated lookups.

F i g u r e A.2 . Sixteen tile mosaic showing 6 of the tiles and position

42

of every point on the mosaic can be mapped to a triangle in one or more tiles

using a barycentric coordinate system. Thus with this position information we can

find the value of every pixel in the mosaic. We have implemented the mosaicing

application in CUDA for the NVIDIA platforms below and compare its performance

with existing sequential and parallel library implementations. A 13783 x 13686 pixel

mosaic of mice neural tissue shown in Figure A.2 was reconstructed from 16 image

tiles using the CUDA implementation.

Salient features of the implementation are:

1. Mosaic is calculated as equally sized tiles .

2. The scanned tiles are stored as textures, and pixel values of the mosaic are

calculated using the texture look up. The textures are stored as unsigned

character images. We get significant performance gain doing nearest neighbor

interpolation because of hardware-accelerated lookups.

Figure A.2. Sixteen tile mosaic showing 6 of the t iles and posit ion

43

3. Smooth blending of tiles is possible at the tile transitions of the mosaic since

the kernel has access to all overlapping tile textures for any point in the

mosaic.

Table A.l compares the performance of the different implementations of the mo-

saicing application used to reconstruct the above test mosaic. We can see tha t the

NVIDIA CUDA based gives 12x speedup compared to the current multithreaded

ITK based CPU application [15] without the use of acceleration data structures.

Tab le A . l . Comparison of classifiers Programming model machine details Time

Elapsed (in seconds)

Speed Up

Single Threaded C Intel Core 2 Quad CPU Q9550 @ 2.83 GHz

2022.3 N/A

OpenMP Multi-threaded (16 threads)

Intel Core 2 Quad CPU Q9550 @ 2.83 GHz

1140.46 1.77x

ITK based and multithreaded ir-assemble

Intel Core 2 Quad CPU Q9550 @ 2.83 GHz

120 16x

NVIDIA CUDA Intel Core 2 Quad CPU Q9550 @ 2.83 GHz

10.8 187.23x

43

3. Smooth blending of tiles is possible at the tile transitions of the mosaic since

the kernel has access to all overlapping tile textures for any point in the

mosaic.

Table A.l compares the performance of the different implementations of the mo

saicing application used to reconstruct the above test mosaic. We can see that the

NVIDIA CUDA based gives 12x speedup compared to the current multithreaded

ITK based CPU application [15] without the use of acceleration data structures.

Table A.!, Comparison of classifiers Programming model machine details Time Speed

Elapsed(in Up seconds)

Single Threaded C Intel Core 2 Quad CPU Q9550 @ 2022.3 N/ A 2.83 GHz

OpenMP Multi-threaded Intel Core 2 Quad CPU Q9550 @ 1140.46 1.77x (16 threads) 2.83 GHz ITK based and multi- Intel Core 2 Quad CPU Q9550 @ 120 16x threaded ir-assemble 2.83 GHz NVIDIA CUDA Intel Core 2 Quad CPU Q9550 @ 10.8 187.23x

2.83 GHz

A P P E N D I X B

S Y N A P S E V I E W E R

A synapse viewer is a cross platform viewer of image data and markups. This

software is built on Nokia Q t / C + + . It works for subsampled datasets. The features

of the software are described below:

• The 2D image data in JPEG, PNG, BMP formats can be browsed in the

image viewer.

• Sea ling: The image viewer has controls to scale down the data and view the

entire image within the window. The scaling can be achieved using the scaling

bar or by clicking the zoom in or zoom out widgets. The image can be scaled

in steps 10% of the original size.

• Scrolling: The image can be scrolled by simple click and drag operation on

the image.

• Synapse markup overlay: The application can read an XML file listing the

position and orientation of synapses. Multiple markup files can be opened

up. This feature is useful when comparing the ground t ruth and predicted

datasets. The viewer has the ability to change the color, transperancy and

visibility of a particular synapse group.

• Membrane markup overlay: The application can show overlay of predicted

membranes. Multiple membrane overlays can be seen simultaneously.

Figure B.l shows a snapshot of the synapse viewer.

APPENDIX B

SYNAPSE VIEWER

A synapse viewer is a cross platform viewer of image data and markups. This

software is built on Nokia QtjC++. It works for subsampled datasets. The features

of the software are described below:

• The 2D image data in JPEG, PNG , BMP formats can be browsed in the

image viewer.

• Scaling:The image viewer has controls to scale down the data and view the

entire image within the window. The scaling can be achieved using the scaling

bar or by clicking the zoom in or zoom out widgets. The image can be scaled

in steps 10% of the original size.

• Scrolling: The image can be scrolled by simple click and drag operation on

the image.

• Synapse markup overl ay: The application can read an XML file listing the

position and orientation of synapses. Multiple markup files can be opened

up. This feature is useful when comparing the ground truth and predicted

datasets. The viewer has the ability to change the color , transperancy and

visibility of a particular synapse group.

• Membrane markup overlay: The application can show overlay of predicted

membranes. Multiple membrane overlays can be seen simultaneously.

Figure B.1 shows a snapshot of the synapse viewer.

45

Synapses \ ie wei File View Tools

F i g u r e B . l . Synapse viewer

45

Eile ~iew !ools

Figure B .1. Synapse viewer

R E F E R E N C E S

J. R, Anderson, B. W. Jones, J.-H. Yang, M. V. Shaw, C. B. Watt , R Ko-shevoy, J. Spaltenstein, E. Jurrus, K. UV, R. T. Whitaker, D. Mastronarde, T. Tasdizen, and R. E. Marc. A computational framework for ultrastructural mapping of neural circuitry. Public Library of Science Biology, 7(3), 2009.

B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual ACM Conference on Computational Learning, pages 144-152. ACM Press, 1992.

L. Breiman. Arcing classifiers. Annals of Statistics, pages 801-849, 1998.

K. L. Briggman and W. Denk. Towards neural circuit reconstruction with volume electron microscopy techniques. Current Opinion in Neurobiology, 16(5):562-570, Oct. 2006.

C. Cortes and V. Vapnik. Support vector networks. In Machine Learning, volume 20, pages 273-297. Kluwer Academic Publishers, 1995.

W. M. Cowan, T. C. Sdhof, C. F. Stevens, and K. Davies. Synapses. JHU Press, illustrated edition, 2003.

H. Drucker and C. Cortes. Boosting decision trees. In Advances in Neural Information Processing Systems 8, pages 479-485, 1996.

J. C. Fiala and K. M. Harris. Extending unbiased stereology of brain ul-trastructure to three-dimensional volumes. Journal of the American Medical Informatics Association, 8(1): 1-16, 2001.

Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings in European Conference on Computational Learning Theory (EuroCOLT), Barcelona, Spain, 1995.

Y. Freund and R. E. Schapire. A short introduction to boosting. Journal of the Japanese Society for Artificial Intelligence, 14:771-780, Sept. 1999.

R. C. Gonzalez and R. E. Woods. Digital Image Processing. Prentice-Hall, 2nd edition, 2002.

T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, illustrated edition, 2001.

REFERENCES

[1] J . R. Anderson, B. W. Jones, J.-H. Yang, M. V. Shaw, C. B. Watt, P. Koshevoy, J . Spaltenstein, E. Jurrus , K. UV, R. T. Whitaker , D. Mastronarde, T. Tasdizen, and R. E. Marc. A computational framework for ultrastructural mapping of neural circuitry. Public Library of Science Biology, 7(3) , 2009.

[2] B. E. Boser, 1. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual ACM Conference on Computational Learning, pages 144- 152. ACM Press , 1992.

[3] L. Breiman. Arcing classifiers. Annals of Statistics, pages 801- 849, 1998.

[4] K. L. Briggman and W. Denk. Towards neural circuit reconstruction with volume electron microscopy techniques. Current Opinion in Neurobiology, 16(5) :562- 570, Oct. 2006.

[5] C. Cortes and V. Vapnik. Support vector networks. In Machine Learning, volume 20, pages 273- 297. Kluwer Academic Publishers, 1995.

[6] W. M. Cowan, T. C. Sdhof, C. F . Stevens, and K. Davies. Synapses. JHU Press, illustrated edition, 2003.

[7] H. Drucker and C. Cortes. Boosting decision trees. In Advances in Neural Information Processing Systems 8, pages 479- 485, 1996.

[8] J. C. Fiala and K. M. Harris. Extending unbiased stereology of brain ultrastructure to three-dimensional volumes. Journal of the American Medical Informatics Association, 8(1):1- 16, 2001.

[9] Y. Freund and R. E. Schapire. A decision- theoretic generalization of on- line learning and an application to boosting. In Proceedings in European Conference on Computational Learning Th eory (EuroCOLT) , Barcelona, Spain, 1995.

[10] Y. Freund and R. E. Schapire. A short introduction to boosting. Journal of the Japanese Society for Artificial Intelligence, 14:771- 780, Sept. 1999.

[11] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Prentice-Hall, 2nd edition, 2002.

[12] T. Hastie, R. Tibshirani , and J . H. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, illustrated edition, 2001.

47

V. Jain, J. Murray, F. Roth, S. Turaga, V. Zhigulin, K. Briggman, M. Helm-staedter, W. Denk, and H. Seung. Supervised learning of image restoration with convolutional networks. In Proceedings of International Conference on Computer Vision(ICCV), pages 1-8, Oct. 2007.

E. Jurrus, R. Whitaker, B. Jones, R. Marc, and T. Tasdizen. An optimal-path approach for neural circuit reconstruction. In Proceedings of International Symposium on Biomedical Imaging (ISBI), pages 1609-1612, 2008.

P. A. Koshevoy, T. Tasdizen, and R. T. Whitaker. Automatic assembly of TEM mosaics and mosaic stacks using phase correlation. SCI Technical Report, April 2007.

I. B. Levitan and L. K. Kaczmarek. The Neuron. Oxford University Press, US, 3rd edition, 2002.

Y. Mishchenko. Automation of 3D reconstruction of neural tissue from large volume of conventional serial section transmission electron micrographs. Journal of N euro science Methods, Sept. 2008.

J. R. Quinlan. C4-5: Programs for Machine Learning. Morgan Kaufmann, 1993.

J. R. Quinlan. Bagging, boosting, and C4.5. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 725-730, 1996.

R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics, 1998.

T. Tasdizen, R. Whitaker, R. Marc, and B. Jones. Enhancement of cell boundaries in transmission electron microscopy images. In Proceedings in International Conference on Image Processing (ICIP), pages 642-645, 2005.

Z. Tu. Auto-context and its application to high-level vision tasks. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1-8, jun 2008.

V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 2nd illustrated edition, 2000.

P. Viola and M. Jones. Robust real-time object detection. International Journal of Computer Vision, 2001.

A. R. Webb. Statistical Pattern Recognition. Oxford University Press, 2nd edition, 1999.

J. White, E. Southgate, J. Thomson, and F. Brenner. The structure of the nervous system of the nematode caenorhabditis elegans. Philosophical Transactions of the Royal Society B: Biological Sciences, 314:1-340, 1986.

47

[13] V. Jain, J. Murray, F. Roth, S. Thraga, V. Zhigulin, K. Briggman, M. Helmstaedter, W. Denk, and H. Seung. Supervised learning of image restoration with convolutional networks. In Proceedings of International Conference on Computer Vision{ICCV), pages 1- 8, Oct. 2007.

[14] E. Jurrus, R. Whitaker , B. Jones, R. Marc, and T. Tasdizen. An optimal-path approach for neural circuit reconstruction. In Proceedings of International Symposium on Biomedical Imaging (ISBI), pages 1609- 1612, 2008.

[15] P. A. Koshevoy, T . Tasdizen , and R. T. Whitaker. Automatic assembly of TEM mosaics and mosaic stacks using phase correlation. SCI Technical Report, April 2007.

[16] I. B. Levitan and L. K. Kaczmarek. The Neuron. Oxford University Press, US, 3rd edition, 2002.

[17] Y. Mishchenko. Automation of 3D reconstruction of neural tissue from large volume of conventional serial section transmission electron micrographs. Journal of Neuroscience Methods, Sept . 2008.

[18] J. R. Quinlan. C4 .5: Programs for Machine Learning. Morgan Kaufmann, 1993.

[19] J. R. Quinlan. Bagging, boosting, and C4.5. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 725- 730, 1996.

[20] R. E. Schapire, Y. Freund, P. Bartlett , and W . S. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods . Annals of Statistics, 1998.

[2 1] T. Tasdizen, R. Whitaker , R. Marc, and B. Jones . Enhancement of cell boundaries in transmission electron microscopy images . In Proceedings in International Conference on Image Processing (ICIP) , pages 642- 645, 2005.

[22] Z. Th. Auto-context and its application to high-level vision tasks. In Proceedings of IEEE Conference on Computer Vis ion and Pattern Recognition (CVPR), pages 1- 8, jun 2008.

[23] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer , 2nd illustrated edition, 2000.

[24] P. Viola and M. Jones . Robust real-time object detection. International Journal of Computer Vision, 200l.

[25] A. R. Webb. Statistical Pattern Recognition. Oxford University Press, 2nd edition, 1999.

[26] J. White, E. Southgate, J. Thomson, and F . Brenner. The structure of the nervous system of t he nematode caenorhabditis elegans. Philosophical Transactions of the Royal Society B: Biological Sciences, 314:1- 340, 1986.

48

[27] D. B. Williams and C. B. Carter. Transmission Electron Microscopy: A Textbook for Materials Science. Springer, 2nd edition, 1996.

48

[27] D. B. Williams and C. B. Carter. Transmission Electron Microscopy: A Textbook for Materials Science. Springer , 2nd edit ion, 1996.

Automatic markup of neural cell membranes using boosted ...

Documents

Transcript of Automatic markup of neural cell membranes using boosted ...