Automatic markup of neural cell membranes using boosted ...
Transcript of Automatic markup of neural cell membranes using boosted ...
A U T O M A T I C M A R K U P O F N E U R A L C E L L
M E M B R A N E S U S I N G B O O S T E D D E C I S I O N
S T U M P S
by
Kannan Umadevi Venkataraju
A thesis submitted to the faculty of The University of Utah
in partial fulfillment of the requirements for the degree of
Master of Science
in
Computer Science
School of Computing
The University of Utah
May 2010
AUTOMATIC MARKUP OF NEURAL CELL
MEMBRANES USING BOOSTED DECISION
STUMPS
by
Kannan Umadevi Venkataraju
A thesis submitted to the faculty of The University of Utah
in partial fulfillment of the requirements for the degree of
Master of Science
In
Computer Science
School of Computing
The University of Utah
May 2010
Copyright © K a n n a n Umadevi Venkataraju 2010
All Rights Reserved
Copyright @Kannan Umadevi Venkataraju 2010
All Rights Reserved
THE UNIVERSITY OF UTAH GRADUATE SCHOOL
SUPERVISORY COMMITTEE APPROVAL
of a thesis submitted by
Kannan Umadevi Venkataraju
This thesis has been read by each member of the following supervisory committee and by majority vote has been found to be satisfactory.
Ross Whitake
4bH�Dab?=41
THE UNIVERSITY OF UTAH GRADUATE SCHOOL
FINAL READING APPROVAL
To the Graduate Council of the University of Utah:
I have read the thesis of Kannan Umadevi Venkataraju in its final form and have
found that (1) its format, citations, and bibliographic style are consistent and acceptable; (2) its illustrative materials including figures, tables, and charts are in place; and (3) the final manuscript is satisfactory to the Supervisory Committee and is ready for submission to T he Graduate School.
Date Tolga Tasdizen Chair, Supervisory Committee
Approved for the Major Department
Martin Berzins Chair/Dean
Approved for the Graduate Council
so. cQ. f--.
David S. Chapman Dean of The Graduate School
A B S T R A C T
For neuro-biologists to understand the working of the central nervous system,
they need to reconstruct the underlying neural circuitry. The neural circuit, which
consists of the neuron cells and synapses in a three-dimensional (3D) volume of
tissue are scanned slice-by-slice at very high magnifications using an electron mi
croscope. From the electron microscopy images, the neurons and their connec
tions (synapses) are identified to lay out the connections of the neural circuitry.
One of the necessary tasks in this process is to segment the individual neurons
in the images of the sliced volume. To effectively carry out this segmentation
we need to delineate the cell membranes of the neurons. For this purpose, we
propose a supervised learning approach to detect the cell membranes. The classifier
was trained using decision stumps boosted using AdaBoost, on local and context
features. The features were selected to highlight the curve like characteristics
of cell membranes. It is also shown that using features from context positions
allows for more information to be utilized in the classification. Together with
the nonlinear discrimination ability of the AdaBoost classifier there are clearly
noticeable improvements over previously used methods. We also detail several
experiments conducted for identification of synapse structures in the microscopy
images.
ABSTRACT
For neuro-biologists to understand the working of the central nervous system,
they need to reconstruct the underlying neural circuitry. The neural circuit , which
consists of the neuron cells and synapses in a three-dimensional (3D) volume of
t issue are scanned slice-by-slice at very high magnifications using an electron mi
croscope. From the electron microscopy images, the neurons and their connec
tions(synapses) are identified to layout the connections of the neural circuitry.
One of the necessary tasks in this process is to segment the individual neurons
in the images of the sliced volume. To effectively carry out this segmentation
we need to delineate the cell membranes of the neurons. For this purpose, we
propose a supervised learning approach to detect the cell membranes. The classifier
was trained using decision stumps boosted using AdaBoost, on local and context
features. The features were selected to highlight the curve like characteristics
of cell membranes. It is also shown that using features from context positions
allows for more information to be utilized in the classification. Together with
the nonlinear discrimination ability of the AdaBoost classifier there are clearly
noticeable improvements over previously used methods. Vife also detail several
experiments conducted for identification of synapse structures in the microscopy
images.
C O N T E N T S
A B S T R A C T iv
L I S T O F F I G U R E S vii
L I S T O F T A B L E S ix
A C K N O W L E D G E M E N T S x
C H A P T E R S
1. I N T R O D U C T I O N 1
1.1 Motivation 1 1.1.1 Previous work 2 1.1.2 Our work 2
1.2 Outline 3
2. B A C K G R O U N D 4
2.1 Neural circuitry 4 2.2 Transmission electron microscopy 5
2.2.1 Components and working 6 2.2.2 Challenges 6 2.2.3 Data preparation 6
2.3 Decision stumps 7 2.3.1 Decision stump learning 9 2.3.2 Decision stump classification 9
2.4 AdaBoost 9 2.5 Classifier on artificial datasets 9
2.5.1 Crescent dataset 10 2.5.2 Concentric circles dataset 10 2.5.3 Star dataset 10 2.5.4 Nonaxis aligned dataset 12 2.5.5 Comparison with other classifiers 13
3 . M E T H O D S 17
3.1 Image enhancement 18 3.2 Cell membrane detection 19
3.2.1 Features 19 3.2.2 Classifier 21
3.3 Synapse detection 24
CONTENTS
A B STRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. IV
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. VII
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IX
ACKNOWLEDGEMENTS X
CHAPTERS
1. INTRODUCTION 1
1.1 Motivation ............... .... .... . ... . . .. ....... ....... 1 1.1.1 Previous work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 2 1.1.2 Our work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Outline......... .... . . . .... ..... . . ...... . .. . .. .. ....... 3
2. BACKGROUND .. . .... . .. . ... .. . . ..... . .. . .. ............ 4
2.1 Neural circuitry .... .. .................... . . . ... . ..... .. . 2.2 Transmission electron microscopy .. .. . .. ... . ..... . ... . ... .. .
2.2.1 Components and working ....... .. ... . ..... . . ........ . 2.2.2 Challenges.. . . ... . ... .. .. . .... . .. . . . ...... .. . .. . ... 2.2.3 Data preparation . . ................................. .
2.3 Decision stumps .............. . .. . . . . . ... . .. . ........... . 2.3 .1 Decision stump learning ... .. .. .. . .. ...... . ........ . . . 2.3.2 Decision stump classification ... . .... . . . . .. . ..... . ..... .
2.4 AdaBoost. .. . . . .. ... .. . . . . .. . ....... . . . . . ........ . . . . ..
4 5 6 6 6 7 9 9 9
2.5 Classifier on artificial datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 9 2.5.1 Crescent dataset .................................... 10 2.5.2 Concentric circles dataset .. ... . .. .... ..... . ...... ..... 10 2.5.3 Star dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.5.4 Nonaxis aligned dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.5.5 Comparison with other classifiers ....................... 13
3. METHODS ..... . ......... .. ............ . .. ........ . .. . . 17
3.1 Image enhancement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 18 3.2 Cell membrane detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 19
3.2.1 Features . . . ... . . . ... . ..... .. . .. .................... 19 3.2.2 Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 21
3.3 Synapse detection ................. . . . .. . . ..... . .. . . .. . . . 24
3.3.1 Features 24 3.3.2 Scale and rotation invariant moments 25 3.3.3 Cumulative histogram and area features 27 3.3.4 Classifier 27 3.3.5 Orientation estimation 29
4. R E S U L T S 31
4.1 Cell membrane detection 31 4.2 Synapse detection 37
5. C O N C L U S I O N 39
5.1 Contribution 39 5.2 Future work 39
A P P E N D I C E S
A . A S S E M B L I N G L A R G E M O S A I C S O F E L E C T R O N M I C R O S C O P E I M A G E S U S I N G G P U S 40
B . S Y N A P S E V I E W E R 44
R E F E R E N C E S 46
vi
3.3.1 Features.... . .......... . ........ . ... . ... .......... . 24 3.3.2 Scale and rotation invariant moments. . . . . . . . . . . . . . . . . . .. 25 3.3.3 Cumulative histogram and area features . . . . . . . . . . . . . . . . .. 27 3.3.4 Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.3.5 Orientation estimation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 29
4. RESULTS .. . ..... . ........ .. . . . ..... .. . . . ............... 31
4.1 Cell membrane detection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 31 4.2 Synapse detection . ...... .. ......... . ..... . ......... . .... 37
5. CONCLUSION ................................... . ...... 39
5.1 Contribution .... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 39 5.2 Future work. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
APPENDICES
A. ASSEMBLING LARGE MOSAICS OF ELECTRON MICROSCOPE IMAGES USING GPUS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 40
B. SYNAPSE VIEWER ..... . .. . .. . ......................... 44
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 46
vi
L I S T O F F I G U R E S
2.1 Neural tissue slices of C. elegans worm and rabbit retina 5
2.2 Neural tissue slices of rabbit retina showing synapses 5
2.3 Decision stump 7
2.4 Decision tree 8
2.5 Crescent dataset 11
2.6 Classifier performance at every round 11
2.7 Concentric circles dataset 12
2.8 Classifier performance at every round 13
2.9 Star dataset 14
2.10 Classifier performance at every round 14
2.11 Nonaxis aligned dataset 15
2.12 Classifier performance at every round 16
3.1 Block diagram of the proposed methods in the overall reconstruction pipeline 17
3.2 Comparison between original and CLAHE enhanced image 18
3.3 Stencil neighborhood 21
3.4 Plot of different functions in the decision stump learning algorithm . . 24
3.5 Cascade architecture 28
4.1 Semilog plot of number of boosting rounds versus the area under the ROC curve for that boosting round 32
4.2 ROC curves of the classifiers trained with AdaBoost at boosting round 3000. For comparison, the ROC for the method by Jurrus et al. [14] is also shown 33
4.3 Membrane detection results of the fold-1 test images in the 5-fold cross-validation: original images (left), and detected membranes (right). 34
4.4 Membrane detection results of the fold-2 test images in the 5-fold cross-validation: original images (left), and detected membranes (right). 34
4.5 Membrane detection results of the fold-3 test images in the 5-fold cross-validation: original images (left), and detected membranes (right). 35
LIST OF FIGURES
2.1 eural tissue slices of C. elegans worm and rabbit retina
2.2 Neural tissue slices of rabbit retina showing synapses ..... . . .. . . . .
2.3 Decision stump ... . . ... . ... . .......... .......... .. . . ..... .
2.4 Decision tree ......................................... . .. .
5
5
7
8
2.5 Crescent dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.6 Classifier performance at every round .. . .. . ... ............ . ... 11
2.7 Concentric circles dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 12
2.8 Classifier performance at every round .. .. . . .... .. ...... . ..... . 13
2.9 Star dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.10 Classifier performance at every round . . ....... .. ...... .... . ... 14
2.11 Nonaxis aligned dataset. . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . .. 15
2. 12 Classifier performance at every round ..... .. . ................. 16
3.1 Block diagram of the proposed methods in the overall reconstruction pipeline. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2 Comparison between original and CLARE enhanced image. . . . . . . .. 18
3.3 Stencil neighborhood. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 21
3.4 P lot of different functions in the decision stump learning algorithm .. 24
3.5 Cascade architecture ............................ .. .... . . . . 28
4.1 Semilog plot of number of boosting rounds versus the area under the ROC curve for t hat boosting round. . ......... . .. ............. 32
4.2 ROC curves of the classifiers trained with AdaBoost at boosting round 3000. For comparison , the ROC for the method by Jurrus et al. [14] is also shown. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 33
4.3 Membrane detection results of the fold-l test images in the 5-fold cross-validation: original images (left) , and detected membranes (right) . 34
4.4 Membrane detection results of the fold-2 test image in the 5-fold cross-validation: original images (left ), and detected membranes (right). 34
4.5 Membrane detection results of the fold- 3 test images in the 5-fold cross-validation: original images (left) , and detected membranes (right). 35
4.6 Membrane detection results of the fold-4 test images in the 5-fold cross-validation: original images (left), and detected membranes (right). 35
4.7 Membrane detection results of the fold-5 test images in the 5-fold cross-validation: original images (left), and detected membranes (right). 36
A.l Mice neuronal image tile slice and triangle mesh overlay 41
A.2 Sixteen tile mosaic showing 6 of the tiles and position 42
B.l Synapse viewer 45
viii
4.6 Membrane detection results of the fold-4 test images in t he 5-fold cross-validation: original images (left) and detected membranes (right). 35
4.7 Membrane detection results of the fold-5 test images in t he 5-fold cross-validation: original images (left) and detected membranes (right) . 36
A.l Mice neuronal image tile slice and triangle mesh overlay. . . . . . . . . . . 41
A.2 Sixteen tile mosaic showing 6 of t he t iles and position . . . . . . . . . . . .. 42
B.l Synapse viewer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 45
V lll
L I S T O F T A B L E S
2.1 Crescent dataset properties 10
2.2 Concentric circles dataset properties 12
2.3 Star dataset properties 13
2.4 Nonaxis aligned dataset properties 15
2.5 Comparison of classifiers 16
3.1 Eigenvalue of ellipses of different parameters 20
4.1 Statistics of samples in various experiments (average of all folds) . . . . 33
A.l Comparison of classifiers 43
LIST OF TABLES
2.1 Crescent dataset properties ................................. 10
2.2 Concentric circles dataset properties ............. . . . .. .. ...... 12
2.3 Star dataset properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 13
2.4 Nonaxis aligned dataset properties. . . . . . . . . . . . . . . . . . . . . . . . . . .. 15
2.5 Comparison of classifiers ................................... 16
3.1 Eigenvalue of ellipses of different parameters. . . . . . . . . . . . . . . . . . .. 20
4.1 Statistics of samples in various experiments (average of all folds) . . .. 33
A.l Comparison of classifiers .......... ....... .. . .. . ............ 43
A C K N O W L E D G E M E N T S
I am grateful to my advisor Dr. Tolga Tasdizen for being my mentor throughout
my graduate school life. I thank my committee members Dr. Ross T. Whitaker and
Dr. Hal Daume III for their valuable guidance on my work. I thank Dr. Antonio
R. C. Paiva and Elizabeth Jurrus for their collaboration in parts of this work and
support. I thank all fellow CRCNS group members for making my research work
in graduate school successful and thoroughly enjoyable. I would like to thank my
parents and extended family for their unconditional love and support through all
my endeavors. Finally, my special thanks to my friends Abigail M., Maheshwar D.,
Siddharth Shankar and others for all their support.
ACKNOWLEDGEMENTS
I am grateful to my advisor Dr. Tolga Tasdizen for being my mentor throughout
my graduate school life. I thank my committee members Dr. Ross T. Whitaker and
Dr. Hal Daume III for their valuable guidance on my work. I thank Dr. Antonio
R. C. Paiva and Elizabeth Jurrus for their collaboration in parts of this work and
support . I thank all fellow CRCNS group members for making my research work
in graduate school successful and t horoughly enjoyable. I would like to t hank my
parents and extended family for their unconditional love and support through all
my endeavors. Finally, my special thanks to my friends Abigail M., Maheshwar D.,
Siddharth Shankar and others for all their support.
C H A P T E R 1
I N T R O D U C T I O N
This chapter descibes the motivation for automating the detection of membranes
and synapses. Futher, it explains the organization of remaining chapters of this
thesis.
1 .1 M o t i v a t i o n
Neuro-scientists are currently developing new imaging techniques to better un
derstand the complex structure of the central nervous system. In particular, re
searchers are making efforts to map the connectivity of large volumes of individual
neurons in order to understand how signals are communicated across processes.
Mapping the connectivity of large volumes helps in identification of network mo
tifs [1]. The most extensive study undertaken thus far uses electron microscopy
to create detailed diagrams of neuronal structure [8] and connectivity [26, 4]. The
most well-known example of neural circuit reconstruction is of the 302 neurons
in the C. elegans worm. Even though this is one of the simplest organisms with
a nervous system, the manual reconstruction process took 10 years [4]. Human
interpretation of data over large volumes of neural anatomy is so labor intensive
that very little ground t ru th exists. For this reason, image processing and machine
learning algorithms are needed to automate the process and allow analysis of large
datasets by neural circuit reconstruction.
Serial-section transmission electron microscopy (TEM) is the preferred data
acquisition technology for capturing images of large sections of neuronal tissue.
Images from TEM span a wide field of view, capturing processes that may wander
through a specimen and have an in-plane resolution useful for identifying cellular
features such as synapses. These structures are critical in understanding neuron
CHAPTER 1
INTRODUCTION
This chapter descibes the motivation for automating the detection of membranes
and synapses. Futher, it explains the organization of remaining chapters of this
thesis.
1.1 Motivation
Neuro-scientists are currently developing new imaging techniques to better un
derstand the complex structure of the central nervous system. In particular, re
searchers are making efforts to map the connectivity of large volumes of individual
neurons in order to understand how signals are communicated across processes.
Mapping the connectivity of large volumes helps in identification of network mo
tifs [1]. The most extensive study undertaken thus far uses electron microscopy
to create detailed diagrams of neuronal structure [8] and connectivity [26 , 4]. The
most well-known example of neural circuit reconstruction is of the 302 neurons
in the C. elegans worm. Even though this is one of the simplest organisms with
a nervous system, the manual reconstruction process took 10 years [4]. Human
interpretation of data over large volumes of neural anatomy is so labor intensive
that very little ground truth exists . For t his reason, image processing and machine
learning algorithms are needed to automate the process and allow analysis of large
datasets by neural circuit reconstruction.
Serial-section transmission electron microscopy (TEM) is the preferred data
acquisition technology for capturing images of large sections of neuronal tissue.
Images from TEM span a wide field of view, capt uring processes t hat may wander
t hrough a specimen and have an in-plane resolution useful for identifying cellular
features such as synapses. These structures are critical in underst anding neuron
2
activity and function. Images from serial-section TEM are captured by cutting a
section from the specimen and suspending it over an electron beam which passes
through the section creating a projection that is captured as a digital image. See
figure 3.2(a) for an example serial-section TEM image corresponding to a cross-
section of the nematode C. elegans with a resolution of 6nmx6nmx33nm.
1.1.1 P r e v i o u s w o r k
An accurate mapping of neuron features begins with the segmentation of the
neuron boundaries. Jurrus et al. [14] use these boundaries to extract the three-
dimensional (3D) connectivity present in similar image volumes. In their method,
a contrast enhancing filter followed by a directional diffusion filter is applied to
the raw images to enhance and connect cellular membranes. The images are then
thresholded and neuron cell bodies are identified using a watershed segmentation
method. This method fails when membranes are weak or there are too many intra
cellular features. This indicates that more adaptive algorithms need to be developed
to segment these structures. For this reason, machine learning algorithms have been
shown as a successful alternative for identifying membranes in TEM data. In related
work, Jain et al. [13] use a multilayer convolution neural network to classify pixels as
membrane and nonmembrane. However, the stain used on the specimen highlights
cell boundaries, attenuating intracellular structures, simplifying the segmentation
task. Another successful application of learning applied to TEM is the use of a
perceptron trained with a set of predefined image features [17]. However, extensive
postprocessing is required to close the detected cell membranes and remove internal
cellular structures.
1.1.2 O u r w o r k
Our method described in this thesis improves upon previous work by utilizing
context information for discrimination of membrane pixels from the nonmembrane
pixels. By including the features of neighboring pixels as inputs to the classifier, the
classifier can utilize the context to deal with membrane disconnectivities. The fea
tures were designed to improve the classification accuracy of elongated structures,
2
activity and function . Images from serial-section TEM are captured by cutting a
section from the specimen and suspending it over an electron beam which passes
through the section creating a projection that is captured as a digital image. See
figure 3.2 (a) for an example serial-section TEM image corresponding to a cross
section of the nematode C. elegans with a resolution of 6nmx6nmx33nm.
1.1.1 Previous work
An accurate mapping of neuron features begins with the segmentation of the
neuron boundaries. Jurrus et al. [14] use these boundaries to extract the three
dimensional (3D) connectivity present in similar image volumes. In their method,
a contrast enhancing filter followed by a directional diffusion filter is applied to
the raw images to enhance and connect cellular membranes. The images are then
t hresholded and neuron cell bodies are identified using a watershed segmentation
method. This method fails when membranes are weak or there are too many intra
cellular features. This indicates that more adaptive algorithms need to be developed
to segment t hese structures. For this reason, machine learning algorithms have been
shown as a successful alternative for ident ifying membranes in TEM data. In related
work, J ain et al. [13] use a multilayer convolution neural network to classify pixels as
membrane and nonmembrane. However, the stain used on the specimen highlights
cell boundaries, attenuating intracellular structures, simplifying the segmentation
task. Another successful application of learning applied to TEM is the use of a
percept ron trained with a set of predefined image features [17]. However , extensive
postprocessing is required to close the detected cell membranes and remove internal
cellular structures.
1.1.2 Our work
Our method described in this thesis improves upon previous work by ut ilizing
context information for discrimination of membrane pixels from the nonmembrane
pixels. By including the features of neighboring pixels as inputs to the classifier , the
classifier can utilize the context to deal with membrane disconnectivities. The fea
tures were designed to improve the classification accuracy of elongated structures,
3
like the membranes. The nonlinear decision boundary is learned by a classifier
trained under the AdaBoost framework.
We also attempted identification of the synapses in the images. Region-based
attributes were used to quantify the shape of the regions. A framework capable
of detecting rare events effectively was used to detect synapses since synapses are
sparsely distributed across the image mosaic.
The following section gives the overview of the entire thesis.
1 .2 O u t l i n e
The remaining chapters of this thesis are organized as follows:
• Chapter 2 provides background material for the rest of the thesis. Section 2.1
gives a brief overview of neural circuitry, TEM and image acquisition using TEM
are described in section 2.2. Section 2.3 and section 2.4 give a brief introduction
to decision stumps and AdaBoost. Lastly in section 2.5 we discuss about our
classifier's performance on artificial datasets and its limitations.
• Chapter 3 is divided into two major sections. Section 3.2 and section 3.3
are dedicated to cell membrane detection and synspase detection respectively. Sec
tion 3.1 describes about the preprocessing of images before setting up the classifi
cation experiment. Subsections 3.2.1 and 3.3.1 describe the features used by the
classifier. The last subsections (section 3.2.2 and section 3.3.4) explain the machine
learning classifier setup in detail.
• Chapter 4 presents a detailed evaluation of the proposed methods of cell
membrane detection and synapse detection. Section 4.1 discusses the cell membrane
detection classifier training and testing on a C. elegans worm dataset, whereas the
synapse detection algorithm is run over a rabbit retina dataset. It is discussed in
section 4.2.
• Chapter 5 concludes the thesis and discusses the contributions of this thesis
(section 5.1) and proposed future work (section 5.2).
3
like the membranes . The nonlinear decision boundary is learned by a classifier
trained under the AdaBoost framework.
We also attempted identification of the synapses in the images. Region-based
attributes were used to quantify the shape of the regions. A framework capable
of detecting rare events effectively was used to detect synapses since synapses are
sparsely distributed across the image mosaic.
The following section gives the overview of the entire thesis.
1.2 Outline
The remaining chapters of this thesis are organized as follows:
• Chapter 2 provides background material for the rest of the thesis . Section 2.1
gives a brief overview of neural circuitry, TEM and image acquisition using TEM
are described in section 2.2. Section 2.3 and section 2.4 give a brief introduction
to decision stumps and AdaBoost . Lastly in section 2.5 we discuss about our
classifier's performance on artificial datasets and its limitations.
• Chapter 3 is divided into two major sections. Section 3.2 and section 3.3
are dedicated to cell membrane detection and synspase detection respectively. Sec
tion 3.1 describes about the preprocessing of images before setting up the classifi
cation experiment. Subsections 3.2.1 and 3.3.1 describe the features used by the
classifier. The last subsections (section 3.2.2 and section 3.3.4) explain the machine
learning classifier setup in detail.
• Chapter 4 presents a detailed evaluation of the proposed methods of cell
membrane detection and synapse detection. Section 4.1 discusses the cell membrane
detection classifier training and testing on a C. elegans worm dataset, whereas the
synapse detection algorithm is run over a rabbit retina dataset. It is discussed in
section 4.2.
• Chapter 5 concludes the thesis and discusses the contributions of this thesis
(section 5.1) and proposed future work (section 5.2).
C H A P T E R 2
B A C K G R O U N D
This chapter provides the necessary background information about the data used
in the experiments. It also gives the basic information necessary to understand the
experiment setup.
2 . 1 N e u r a l c i r c u i t r y
The nervous systems of animals are made up of numerous neural circuits that
transmit and process the sensory perception signals, motor activity control signals of
these organisms [16]. The neural circuits are a collection of neurons interconnected
to each other communicating through electrochemical reactions and electrical im
pulses. There are various types of neuronal cells and equally numerous types of
interconnections between them. The complexity of the neural circuit is dependent
on the types of neurons and their interconnectivity. The communication between
these neurons is through synapses that through the electrically excitable membranes
of neurons propagate the electrical signals. The physical structure of the neural
circuit can be observed using high magnification electron microscopes. Sample
images of neural circuits of C. elegans and rabbit mouse retina are shown in
Figure 2.1. In both the images of Figure 2.1 we can see cell membranes separating
one neuron from the other.
The images in Figure 2.2 show the synapse structures near the cell membranes.
The bulged shape is because of accumulation of vesicles near the region of transmis
sion. This bulged dark region is called the presynaptic density of the synapse [6].
There are various other types of synapses that result in just darkening of the
membrane structure. There is a type of synapse called as the "Gap junctions",
CHAPTER 2
BACKGROUND
This chapter provides the necessary background information about the data used
in the experiments. It also gives the basic information necessary to understand the
experiment setup.
2.1 Neural circuitry
The nervous systems of animals are made up of numerous neural circuits that
transmit and process the sensory perception signals, motor activity control signals of
these organisms [16]. The neural circuits are a collection of neurons interconnected
to each other communicating through electrochemical reactions and electrical im
pulses. There are various types of neuronal cells and equally numerous types of
interconnections between them. The complexity of the neural circuit is dependent
on the types of neurons and their interconnectivity. The communication between
these neurons is through synapses that through the electrically excitable membranes
of neurons propagate the electrical signals. The physical structure of the neural
circuit can be observed using high magnification electron microscopes. Sample
images of neural circuits of C. elegans and rabbit mouse retina are shown in
Figure 2.1. In both the images of Figure 2.1 we can see cell membranes separating
one neuron from the other.
The images in Figure 2.2 show the synapse structures near the cell membranes.
The bulged shape is because of accumulation of vesicles near the region of transmis
sion. This bulged dark region is called the presynaptic density of the synapse [6].
There are various other types of synapses that result in just darkening of the
membrane structure. There is a type of synapse called as the "Gap junctions",
(a) Synapse 1 (b) Synapse 2 (c) Synapse 3 (d) Synapse 4
F i g u r e 2.2. Neural tissue slices of rabbit retina showing synapses
which is a rod like structure spanning across neurons. This synapse structure
shows clumping of the vesicles even in the interior of the neurons.
2 . 2 T r a n s m i s s i o n e l e c t r o n m i c r o s c o p y
Transmission electron microscopy(TEM) is a microscopy technique in which a
beam of electrons is transmitted through a very thin sliced specimen (in order of mi
crometers or nanometers) interacting with the specimen as they pass through. This
interaction of electrons with the specimen results in an image of the specimen [27],
which is then magnified and focused onto an imaging devices like a fluorescent
screen or a photographic film. Modern microscopes have CCD cameras to directly
fa) Synapse 1 (b) Synapse 2 (c) Synapse 3 (d) Synapse 4
5
(a) C. elegans worm neural tissue (b) Rabbit retina neural tissue
Figure 2.1. Neural tissue slices of C. elegans worm and rabbit retina
(a) Synapse 1 (b) Synapse 2 (c) Synapse 3 (d) Synapse 4
Figure 2.2 . Neural tissue slices of rabbit retina showing synapses
which is a rod like structure spanning across neurons. This synapse structure
shows clumping of the vesicles even in the interior of the neurons.
2.2 Transmission electron microscopy
Transmission electron microscopy(TEM) is a microscopy technique in which a
beam of electrons is transmitted through a very thin sliced specimen (in order of mi
crometers or nanometers) interacting with the specimen as they pass through. This
interaction of electrons with the specimen results in an image of the specimen [27],
which is then magnified and focused onto an imaging devices like a fluorescent
screen or a photographic film. Modern microscopes have CCD cameras to directly
6
digitize these high resolution images.
Since electrons are equivalent to electromagnetic waves of very small wavelength,
TEMs are capable of very high resolution (in order of nanometer/pixel) images
compared to light microscopes, which operate with electromagnetic waves of much
higher wavelength. This enables the neural tissues to be scanned at very high
magnifications.
2 .2 .1 C o m p o n e n t s a n d w o r k i n g
The TEM is vacuumized system where the electrons travel and interact with
the specimen. One end of the TEM has an electron gun, which is the source of the
beam of electrons. The other end of the TEM has the imaging device to capture the
image of the specimen. A series of electromagnetic lenses and electrostatic plates
guide the electron beam through the specimen and focus on the imaging device.
The imaging device provides the observer with the image of the specimen under
observation.
2.2.2 C h a l l e n g e s
The usage of TEM for imaging the neural tissue has the following challenges:
Since the TEM operates at very high magnifications, it can scan only a small area
of the specimen. However, the neural tissue spans much more than a few microns;
thus a single specimen has to be scanned as multiple tiles. Since electron beams are
equivalent to high energy beams impacting the specimen, bombardment of these
beams on the specimen causes nonuniform heating of the tissue and subsequent
distortion of the sample.
2.2.3 D a t a p r e p a r a t i o n
The section of neural tissue that is the region of interest is pigmented and frozen.
Then thin specimens are sliced out using a microtome. Once ultrathin specimens
are prepared, as discussed in the section 2.2.2, the specimens are imaged as tiles.
The ir-tool chain [15] is used to assemble the tiles to a mosaic and further the
slices are registered together to reconstruct the final volume. A faster method to
6
digitize these high resolution images.
Since electrons are equivalent to electromagnetic waves of very small wavelength,
TEMs are capable of very high resolution (in order of nanometer/ pixel) images
compared to light microscopes, which operate with electromagnetic waves of much
higher wavelength. This enables the neural t issues to be scanned at very high
magnifications.
2.2.1 Components and working
The TEM is vacuumized system where the electrons travel and interact with
the specimen. One end of the TEM has an electron gun , which is t he source of the
beam of electrons. The ot her end of the TEM has t he imaging device to capture the
image of the specimen. A series of electromagnetic lenses and electrostatic plates
guide the electron beam through the specimen and focus on the imaging device.
The imaging device provides the observer with t he image of the specimen under
observation.
2.2.2 Challenges
The usage of TEM for imaging the neural tissue has the following challenges:
Since the TEM operates at very high magnifications, it can scan only a small area
of the specimen. However, the neural tissue spans much more than a few microns;
thus a single specimen has to be scanned as multiple t iles . Since electron beams are
equivalent to high energy beams impacting the specimen, bombardment of these
beams on the specimen causes nonuniform heating of the tissue and subsequent
distortion of the sample.
2 .2 .3 Data preparation
The section of neural tissue that is the region of interest is pigmented and frozen.
Then thin specimens are sliced out using a microtome. Once ultrathin specimens
are prepared, as discussed in the section 2.2.2, the specimens are imaged as t iles.
The ir- tool chain [15] is used to assemble t he t iles to a mosaic and further the
slices are registered together to reconstruct the final volume. A faster method to
7
assemble such tiles to a mosaic is explained in chapter A.
2 . 3 D e c i s i o n s t u m p s
The decision stump is a special case of decision tree [18], [12], which is a class
of supervised learning algorithms frequently used in data mining and machine
learning.
Figure 2.3 and Figure 2.4 show an example of a decision stump and a decision
tree, respectively. The algorithm constructs a decision tree with just one decision
node and two classification leaves during training based on a given set of training
samples. Decision stump is the weak learner, i.e., it cannot give the best classifica
tion for the samples but a rather simple and fast classifier with accuracy at least
just greater than 50% where possible. The following section explains the learning
and classification of samples using decision stumps.
F i g u r e 2 .3 . Decision stump
7
assemble such tiles to a mosaic is explained in chapter A.
2.3 Decision stumps
The decision stump is a special case of decision tree [18], [12], which is a class
of supervised learning algorithms frequent ly used in data mining and machine
learning.
Figure 2.3 and Figure 2.4 show an example of a decision stump and a decision
tree , respectively. The algorithm constructs a decision tree with just one decision
node and two classification leaves during training based on a given set of training
samples . Decision stump is the weak learner , i. e. , it cannot give the best classifica
tion for the samples but a rather simple and fast classifier with accuracy at least
just greater than 50% where possible. T he following section explains the learning
and classification of samples using decision stumps.
Yes
Class 0
Figure 2.3 . Decision stump
Sample X (N attributes)
Is (X[attributenl > threshold)
No
Class 1
F i g u r e 2.4. Decision tree
Yes
Is (X[attributem] >
thresholdm)
Yes
8 Figure 2.4. Decision tree
Yes,---<
Is (X[attributep] >
thresholdp)
No
Is (X[attributen] >
threshold n)
No
Yes
Is
NA Is
(X[attributeq] > threshold q )
(X[attribute,] > ~ thre/
Class 1
00
9
2.3.1 Dec i s ion s t u m p l e a r n i n g
Like the rest of the supervised learning algorithms, the learning algorithm takes
the following as the input:
1. Attributes
2. Attribute values
3. Sample classification
4. Sample weights
The learning algorithm chooses the at tr ibute and a threshold value that gives the
best classification performance and margin for the decision stump as classifier.
2.3.2 Dec i s ion s t u m p class i f icat ion
The classification of a test sample based on the learned model is trivial. The
model gives the attribute, threshold value and the inequality relation of the at
tribute value to the threshold. The evaluation of the inequality equation gives the
classification of the sample.
2 . 4 A d a B o o s t
AdaBoost [20] is a type of boosting algorithm. Boosting is an ensemble learning
technique where weak learners such as decision stumps are used as components to
create strong classifiers. The AdaBoost meta-algorithm at each round learns a weak
classifier with accuracy at least greater than 50% for a set of weighted samples. This
weak classifier adds to the set of weak classifiers learned in the previous rounds.
The final classification depends on the classification of weak learners in each round.
2 . 5 C l a s s i f i e r o n a r t i f i c i a l d a t a s e t s
The boosted decision stumps were tested for their classification performance on
few artificial datasets before applying to the real image datasets. This was done to
access the performance of the classifier on certain specific nature of these datasets.
The datasets and their unique traits are listed below:
9
2.3.1 Decision stump learning
Like the rest of the supervised learning algorithms, the learning algorithm takes
the following as the input:
1. Attributes
2. Attribute values
3. Sample classification
4. Sample weights
The learning algorithm chooses the attribute and a threshold value that gives the
best classification performance and margin for the decision stump as classifier.
2.3.2 Decision stump classification
The classification of a test sample based on the learned model is trivial. The
model gives the attribute, threshold value and the inequality relation of the at
tribute value to the threshold . The evaluation of the inequality equation gives the
classification of the sample.
2.4 AdaBoost
AdaBoost [20] is a type of boosting algorithm. Boosting is an ensemble learning
technique where weak learners such as decision stumps are used as components to
create strong classifiers. The AdaBoost meta-algorithm at each round learns a weak
classifier with accuracy at least greater than 50% for a set of weighted samples. This
weak classifier adds to t he set of weak classifiers learned in the previous rounds.
The final classification depends on the classification of weak learners in each round.
2.5 Classifier on artificial datasets
The boosted decision stumps were tested for t heir classification performance on
few artificial datasets before applying to the real image datasets. This was done to
access the performance of the classifier on certain specific nature of these datasets .
The datasets and their unique traits are listed below:
10
1. Crescent dataset - Linearly nonseparable
2. Concentric circles dataset - Linearly nonseparable and means of the classes
are very close to each other if not the same
3. Star dataset - Linearly nonseparable and multiple clustering of samples of
same dataset
4. Nonaxis aligned dataset - Decision boundary is not aligned to any of the
dimension's axes
The results of such experiments are discussed in the following sections.
2 .5 .1 C r e s c e n t d a t a s e t
Table 2.1 describes the dataset properties. Figure 2.5 shows a typical crescent
dataset and Figure 2.6 shows the performance of the classifier at various rounds of
the learning.
2.5.2 C o n c e n t r i c c i rc les d a t a s e t
Table 2.2 describes the dataset properties. Figure 2.7 shows a typical concentric
circles dataset, and Figure 2.8 shows the performance of the classifier at various
rounds of the learning.
2 .5.3 S t a r d a t a s e t
Table 2.3 describes the dataset properties. Figure 2.9 shows a typical star
dataset, and Figure 2.10 shows the performance of the classifier at various rounds
of the learning.
T a b l e 2 . 1 . Crescent dataset properties Attribute Attribute nature Data, dimensionality 2 Number of classes 2 Separability Separable but linearly nonseparable Description The samples of one class are distributed as a crescent. The
mean of the samples are clearly separated.
10
1. Crescent dataset - Linearly nonseparable
2. Concentric circles dataset - Linearly nonseparable and means of the classes
are very close to each other if not the same
3. Star dataset - Linearly nonseparable and multiple clustering of samples of
same dat aset
4. Nonaxis aligned dataset - Decision boundary is not aligned to any of the
dimension 's axes
The results of such experiments are discussed in the following sections.
2.5.1 Crescent dataset
Table 2.1 describes the dataset properties. Figure 2.5 shows a typical crescent
dataset and Figure 2.6 shows the performance of the classifier at various rounds of
the learning.
2.5.2 Concentric circles dataset
Table 2.2 describes the dataset properties. Figure 2.7 shows a typical concentric
circles dataset , and Figure 2.8 shows the performance of the classifier at various
rounds of the learning.
2.5.3 Star dataset
Table 2.3 describes the dataset properties. Figure 2.9 shows a typical star
dataset , and Figure 2.10 shows the performance of the classifier at various rounds
of the learning.
Table 2.1. Crescent dataset properties Attribute Attribute nature Data dimensionality 2 N umber of classes 2 Separability Separable but linearly nonseparable Description The samples of one class are distributed as a crescent. The
mean of the samples are clearly separated .
6
5
4
3
< 2
1
0
-1
"-2i
Class 0 Class 1
* ' ~ « :%*
1 2 X-Axis
F i g u r e 2 .5 . Crescent dataset
0.8
D (0 > 0.6 "D 0) N
I 0.4
0.2
Accuracy True positive rate False positive rate
4 6 8 Boost ing Round
10 12
F i g u r e 2.6. Classifier performance at every round
6
5
4
3 .~ ><
2 -< I
>-1
0
- 1
- 2 -1 o
Figure 2 .5. Crescent dataset
Q) :::l ctJ
0.8
> 0.6 '0 Q)
.~ ctJ E 0.4 ... o z
0.2
.• -----'t" //'
2
1 2 X-Axis
,." /
" //
/ __ .+-_---"!r----- ¥
I ' Class 01
Class 1 t
3 4
/" / " /
" '/
- Accuracy --a-- True positive rate
False positive rate
...... __ ..
468 Boosting Round
10 12
Figure 2.6. Classifier performance at every round
11
12
T a b l e 2.2. Concentric circles dataset properties Attribute Attribute nature Data dimensionality 2 Number of classes to
Separability Separable but linearly nonseparable Description The samples of one class are distributed within a circle.
The other class is a ring surrounding the other class. The mean of the samples are very close to each other.
3
2
1 CD ">< n < 0 >•
-1
C lass 0 C la s s 1
- 2
- 3 - 3 - 2 0
X-Axis
F i g u r e 2.7. Concentric circles dataset
2.5.4 N o n a x i s a l i gned d a t a s e t
We have seen that the classifier is able to learn all the above artificial datasets to
100 percent accuracy in training. One of the main disadvantages of using decision
stumps as weak classifiers is that the model cannot learn classifications based on
multiple attributes or it can learn based on distribution of just one attr ibute.
This statement is corroborated in the following experiment with nonaxis aligned
dataset. Table 2.4 describes the dataset properties. Figure 2.11 shows a typical
nonaxis aligned dataset and Figure 2.12 shows the performance of the classifier at
various rounds of the learning. We can see from Figure 2.12 that the classifier takes
Table 2 .2. Concentric circles dataset properties Attribute Data dimensionality N umber of classes Separability Description
3
2
1
1/1 .;;: 0 «
I >-
- 1
- 2
-3 - 3 -2
Attribute nature 2 2 Separable but linearly nonseparable The samples of one class are distributed within a circle. The other class is a ring surrounding the other class. The mean of the samples are very close to each other.
-1 0 X-Axis
, Class 0 Class 1
2 3
Figure 2.7. Concentric circles dataset
2.5 .4 Nonaxis aligned dataset
12
We have seen that the classifier is able to learn all the above artificial datasets to
100 percent accuracy in training. One of the main disadvantages of using decision
stumps as weak classifiers is that the model cannot learn classifications based on
multiple attributes or it can learn based on distribution of just one attribute.
This statement is corroborated in the following experiment with nonaxis aligned
dataset . Table 2.4 describes the dataset properties. Figure 2.11 shows a typical
nonaxis aligned dataset and Figure 2.12 shows the performance of the classifier at
various rounds of the learning. We can see from Figure 2.12 that the classifier takes
13
0.2
Accuracy True positive rate False positive rate
10 20 30 Boosting Round
40
F i g u r e 2.8. Classifier performance at every round
T a b l e 2 .3 . Star dataset properties Attribute Attribute nature Data dimensionality 2 Number of classes to
Separability Separable but linearly nonseparable Description The samples of one class are distributed as four distinct
clusters with cluster centers on the axis and equidistant from the origin. The samples of the other classes are distributed as clusters with centers exactly in between the clusters of the other class. The means of the samples are very close to each other.
many rounds to learn the 2D linear decision boundary. Section 2.5.5 compares the
classification performance of boosted decision stump classifier with other types of
classifiers.
2.5.5 C o m p a r i s o n w i t h o t h e r classifiers
More complex learning algorithms, such as a perceptron, can learn these kind
of distinct linear decision boundaries in just one round of learning. Support vector
machines(SVM) [23, 2, 5] are capable of projecting the data into infinite dimen-
Q) :J nJ > 0.6 '0 Q)
.!::! nJ E 0.4 ~
o z 0.2
, "
~Accuracy
-e-True positive rate False positive rate
10 20 30 40 Boosting Round
Figure 2.8. Classifier performance at every round
Table 2.3. Star dataset properties Attribute Attribute nature Data dimensionali ty 2 Number of classes 2 Separability Separable but linearly nonseparable Description The samples of one class are distributed as four distinct
clusters with cluster centers on the axis and equidistant from the origin. The samples of the other classes are distributed as clusters with centers exactly in between the clusters of the other class. The means of the samples are very close to each other.
13
many rounds to learn the 2D linear decision boundary. Section 2.5.5 compares the
classification performance of boosted decision stump classifier with other types of
classifiers.
2.5.5 Comparison with other classifiers
More complex learning algorithms, such as a perceptron , can learn these kind
of distinct linear decision boundaries in just one round of learning. Support vector
machines(SVM) [23, 2, 5] are capable of projecting the data into infinite dimen-
3
2
1
< 0
- 1
- 2
"-33 - 2 -1 0 1 X-Axis
Class 0 Class 1
F i g u r e 2.9. Star dataset
0.8 o (0 > 0.6 "D CD N E 0.4
0.2
Accuracy True positive rate False positive rate
4 6 8 Boost ing Round
10 12
F i g u r e 2.10. Classifier performance at every round
3
2
1 III "x
0 oCt I
>-
'~' . ~ .
, .Jfi v
-1
-2
- 3 - 3 -2
F igure 2.9. Star dataset
1
0.8 Q) :::J (ij > 0.6 " Q)
.!::! IV E 0.4 ~
0 z 0.2
00 2
-1 o X-Axis
/
/
/
1
- .- Accuracy
2
-e- True positive rate
3
False positive rate
468 Boosting Round
10 12
F igure 2.10. Classifier performance at every round
14
15
T a b l e 2.4. Nonaxis aligned dataset properties Attribute Attribute nature Data dimensionality 2 Number of classes 2 Separability Linearly nonseparable Description The samples of both classes are Gaussian distributions
with variance along one direction much larger than the other variance in direction in the perpendicular direction. The two classes are separated by a linear decision boundary that is not parallel to any of the axis. The mean of the samples are well separated from each other.
20 r
10
(A <
•10-
Class 0 Class 1
-20
- 3 -20 -10 0 10
X-Axis 20 30
F i g u r e 2 .11 . Nonaxis aligned dataset
sions using the kernel trick and learning nonlinear decision boundaries. Here we
have conducted experiments with SVMs which are capable of learning decision
boundaries with maximum margin. The test results of few such algorithms are
shown in Table 2.5.
The problem with these models is that their training time increases with the
increase in dimensionality and number of samples of the training dataset. We tried
training the SVM classifiers on microscopy datasets on subset of its samples and
learned classifiers did not yield the same classification performance as the boosted
Table 2.4. Nonaxis aligned dataset properties Attribute Attribute nature Data dimensionali ty 2 N umber of classes 2 Separabili ty Linearly nonseparable Description The samples of both classes are Gaussian distribut ions
wi th variance along one direction much larger than the other variance in direction in the perpendicular direction . The two classes are separated by a linear decision boundary that is not parallel to any of the axis. The mean of the samples are well separated from each other .
20'---~----~--~----~r=======~
I
. Class 0
1/1 'x « I
10
o
>- -10
-20 -.
- 30 -30 -20
~. ". '---_C_la_s_s_1--" ·t-.
~-. ., "' . ..., .. jt;'''''''~
'"'-i;/f~ .Y
, l.';~':. ~ •• ;fll' •
r.',;' / .... "
- 10 0 X- Axis
10 20 30
Figure 2.11 . Nonaxis aligned dataset
15
sions using the kernel trick and learning nonlinear decision boundaries. Here we
have conducted experiments with SVMs which are capable of learning decision
boundaries with maximum margin. The test results of few such algorithms are
shown in Table 2.5.
The problem with t hese models is that their t raining time increases with the
increase in dimensionality and number of samples of the training dataset. We tried
training the SVM classifiers on microscopy datasets on subset of its samples and
learned classifiers did not yield the same classification performance as the boosted
16
1
0.8 o 3 re > 0.6 ~o o N re . . E 0.4 >_ o z
0.2
0 5 10 15 20 25 Boost ing Round
F i g u r e 2.12. Classifier performance at every round
T a b l e 2.5. Comparison of classifiers Classifier Boosting
rounds AdaBoost and decision >6 stump AdaBoost and percep 1 tion AdaBoost and SVM with 1 linear kernel AdaBoost and SVM with 1 RBF kernel
rf 1 /
- ^ A c c u r a c y ^ T r u e positive rate
False positive rate
i Y
decision stumps trained on the whole dataset. Thus we chose the decision stump
based classifier for our problem.
Q) ~
ca
0.8
> 0.6 "0 Q)
.~
---+- Accuracy --<3- True positive rate
ca E 0.4
- False positive rate ... o z
0.2
5 10 15 Boosting Round
Figure 2.12. Classifier performance at every round
Table 2.5. Comparison of classifiers Classifier Boost ing
rounds AdaBoost and decision 26 stump AdaBoost and percep- 1 tron AdaBoost and SVM with 1 linear kernel AdaBoost and SVM with 1 RBF kernel
20 25
16
decision stumps trained on the whole dataset. Thus we chose the decision stump
based classifier for our problem.
C H A P T E R 3
M E T H O D S
This chapter describes the proposed methods for membrane detection and synapse
detection in detail. Figure 3.1 provides an overview of the fundamental steps.
Initially, the contrast of the gray scale images is normalized using CLAHE [11].
Then the feature values for the individual pixels of the enhanced images are gen
erated. Using the ground t ru th markup of cell membranes and synapses for these
images, a supervised learning experiment is set up. The decision stumps (weak
AdaBoost & Decision Stumps
Generate Hessian features
Predicted membrane
images
Segmentation
Input image CLAHE
enhanced image
Generate Moment features
Cascade of AdaBoost &
Decision stumps
Predict neuron interconnects
Orientation of predicted synapses
Connection Graph XML
F i g u r e 3 .1 . Block diagram of the proposed methods in the overall reconstruction pipeline
CHAPTER 3
METHODS
This chapter describes the proposed methods for membrane detection and synapse
detection in detail. Figure 3.1 provides an overview of the fundamental steps.
Initially, the contrast of the gray scale images is normalized using CLARE [11] .
Then the feature values for the individual pixels of the enhanced images are gen
erated. Using the ground truth markup of cell membranes and synapses for these
images, a supervised learning experiment is set up. The decision stumps (weak
AdaBoost & Predicted Decision f- membrane Stumps images
t • Generate
I
Hessian Segmentation features
CLAHE Connection
Input image f- enhanced Graph XML
image
I Generate
Predict neuron Moment features
interconnects
~ t Cascade of
Orientation of AdaBoost& f- predicted
Decision stumps
synapses
Figure 3.1. Block diagram of the proposed methods in the overall reconstruction pipeline
18
classifiers) are boosted [9] for several rounds until a high accuracy classifier is ob
tained in case of the membrane detection experiment. In case of synapse detection,
the same classifier is used in a cascaded architecture. The following sections review
these steps in greater detail.
3 . 1 I m a g e e n h a n c e m e n t
Before the images are used for feature extraction step, a contrast limited adap
tive histogram equalization (CLAHE) [11] is applied to the raw electron microscopy
images. This method changes the grey value of the pixels depending upon the
pixel values of neighboring pixels in the image, thus improving the local contrast.
This improves the contrast of the cell membranes locally against the contents
inside and outside the neuron cell, and also fixes overall brightness variability
between images [14]. The decrease in variability greatly helps the classifier since
it reduces the difference between training images, between training and testing
images. An example of such CLAHE enhancement is shown in Figure 3.2. The
CLAHE algorithm is shown in Algorithm 1.
(a) Original image (b) CLAHE Enhanced Image
F i g u r e 3.2. Comparison between original and CLAHE enhanced image
18
classifiers) are boosted [9] for several rounds until a high accuracy classifier is ob
tained in case of the membrane detection experiment. In case of synapse detection,
the same classifier is used in a cascaded architecture. The following sections review
these steps in greater detail.
3.1 Image enhancem ent
Before the images are used for feature extraction step, a contrast limited adap
tive histogram equalization (CLARE) [11] is applied to the raw electron microscopy
images. This method changes the grey value of t he pixels depending upon the
pixel values of neighboring pixels in the image, thus improving t he local contrast .
This improves the contrast of the cell membranes locally against the contents
inside and outside the neuron cell , and also fixes overall brightness variability
between images [14] . The decrease in variability greatly helps the classifier since
it reduces the difference between training images, between training and testing
images . An example of such CLARE enhancement is shown in Figure 3.2. The
CLARE algorithm is shown in Algorithm 1.
(a) Original image (b) CLARE Enhanced Image
Figure 3.2 . Comparison between original and CLARE enhanced image
19
A l g o r i t h m 1 CLAHE 7 <— Image to be enhanced O <— Enhanced Image W *— Moving window s <— Maximum contrast limit (n, n) <— Height and width of W Pad image 7 with (n — l ) / 2 pixels on all sides for For every pixel p in 7 d o
Construct window W around pixel p Pw P D F of grey values of pixels in W cw <— CDF of grey values of pixels in W such that the max difference between consecutive bins in s Op <— cw{pixelp)
e n d for
3 . 2 C e l l m e m b r a n e d e t e c t i o n
3.2.1 F e a t u r e s
Four features were computed for each pixel in the image: the pixel intensity,
and eigenvalues and orientation of the first eigenvector of the Gaussian smoothed
Hessian matrix. The gray value of the pixel is utilized since membranes are usually
dark and therefore are useful for segmentation, as verified in previous works [14, 13,
17]. The other three features are properties derived from the Gaussian smoothed
Hessian matrix,
H(x,y) = G„ * d2i o2i dx2 dxdy d2I d2I (3.1)
dydx dy2
where 7 is the (CLAHE enhanced) image, and Ga is the Gaussian blurring kernel
with standard deviation a. The Hessian matrix was used in the context of fil
tering [21] and segmenting [17] electron microscopy images. Since membranes are
elongated structures, the eigenvalues of the smoothened Hessian matrix represent
the anisotropic nature of the region around the pixel. The eigenvalue of the
principal eigenvector of the Hessian is proportional to the gradient orthogonal to
the membrane and the smaller eigenvalue is proportional to the gradient along
the cell membrane. The ability of this feature to measure the anisotropic nature
of the shape can be seen when we compare the eigenvalues of ellipses of different
eccentricities.
Algorithm 1 CLAHE I +- Image to be enhanced o +- Enhanced Image W +- Moving window s +- Maximum contrast limit (n, n) +- Height and width of W Pad image I with (n - 1) /2 pixels on all sides for For every pixel p in I do
Construct window W around pixel p Pw +- PDF of grey values of pixels in W
19
Cw +- CDF of grey values of pixels in W such that the max difference between consecutive bins in s Op +- cw(pixelp )
end for
3.2 Cell membrane detection
3.2.1 Features
Four features were computed for each pixel in the image: the pixel intensity,
and eigenvalues and orientation of the first eigenvector of the Gaussian smoothed
Hessian matrix. The gray value of the pixel is utilized since membranes are usually
dark and therefore are useful for segmentation , as verified in previous works [14, 13,
17]. The other three features are properties derived from the Gaussian smoothed
Hessian matrix,
[
[P I
H(x, y) = Ga * ~~~ oyox
(3.1 )
where I is the (CLAHE enhanced) image, and Ga is t he Gaussian blurring kernel
with standard deviation cr. The Hessian matrix was used in the context of fil
tering [21] and segmenting [17] electron microscopy images. Since membranes are
elongated structures, the eigenvalues of the smoothened Hessian matrix represent
the anisotropic nature of the region around the pixel. The eigenvalue of the
principal eigenvector of the Hessian is proportional to the gradient orthogonal to
the membrane and the smaller eigenvalue is proportional to the gradient along
the cell membrane. The ability of this feature to measure the anisotropic nature
of the shape can be seen when we compare the eigenvalues of ellipses of different
eccentricit ies .
20
The eigenvalues of ellipse of different eccentricities are shown in Table 3.1. As we
see, the eigenvalue of the principal eigenvector increases with increase in eccentricity
of the ellipse as expected. Thus when the ratio of the major axis to the minor axis is
near one, the shape is more circular and the blob region is more likely to represent
a vesicle than a membrane.
The fourth feature is the orientation of the principal eigenvector at that point.
The inclusion of this feature gains significance during learning of the classifier
because the neighboring pixels features are also considered.
The feature vector for every pixel in the image consists of the feature values
of that pixel and of its neighbors. The neighborhood is defined by a star shaped
stencil with its 8 arms forking out every 45 degrees (Figure 3.3). We show in the
results section that the neighboring pixel features adds relevant information for the
T a b l e 3 .1 . Eigenvalue of ellipses of different parameters Ellipse Parameters Ratio of eigenvalues
• Major axis = 20, Minor axis = 20
1:1
• Major axis = 20, Minor axis = 40
2:1
Major axis =100, Minor axis =10
10:1
20
The eigenvalues of ellipse of different eccentricities are shown in Table 3.1. As we
see, the eigenvalue of the principal eigenvector increases with increase in eccentricity
of the ellipse as expected. Thus when the ratio of the major axis to the minor axis is
near one , the shape is more circular and the blob region is more likely to represent
a vesicle than a membrane.
The fourth feature is the orientation of the principal eigenvector at that point.
The inclusion of this feature gains significance during learning of the classifier
because the neighboring pixels features are also considered .
The feature vector for every pixel in the image consists of the feature values
of that pixel and of its neighbors. The neighborhood is defined by a star shaped
stencil with its 8 arms forking out every 45 degrees (Figure 3.3). We show in the
results section that the neighboring pixel features adds relevant information for the
Table 3.1. Eigenvalue of ellipses of different parameters Ellipse Parameters Ratio of eigenvalues
Major axis = 20, Minor 1:1 axis = 20
Major axis = 20, Minor 2:1 axis = 40
Major axis = 100, Minor 10:1 axis = 10
21
F i g u r e 3.3. Stencil neighborhood.
classification. The context helps to identify membranes at regions were there are
minor discontinuities, as it allows for the classifier to utilize the context information
to "interpolate" the cell membrane. In this regard, the orientation feature plays
an important role by imposing a smoothness constraint on the curvature of the
membrane.
3.2.2 Classifier
We propose to utilize a classifier trained with AdaBoost [9] since such a classifier
can model a nonlinear decision boundary. AdaBoost is a meta-algorithm that builds
the classifier from "weak" classifiers, such as a decision stump. At each round,
AdaBoost adds a weak classifier to the set of weak classifiers by training for best
classification performance according to samples weights. The sample weights are
varied depending on the classification result of the previous round, by increasing
the weights of incorrectly classified samples and decreasing the weights of correctly
classified samples. The final classifier is a weighted sum of the weak classifiers
according to their accuracy in the training rounds. It has been observed empirically
in previous experiments that the obtained classifiers generally do not over fit [10],
[3], [7], [19]. The algorithm for AdaBoost is given in Algorithm 2.
In this paper, decision stumps are used for the weak classifier. Decision stumps
are the simplest form of binary decision trees with just one decision node. The deci
sion stump makes the classification decision based on just the value of a particular
feature with respect to a threshold. Given the feature set, desired classification
and prior of the samples, the threshold for a particular feature can be chosen
21
o
o
C -G
c
o
Figure 3.3. Stencil neighborhood.
classification. The context helps to identify membranes at regions were there are
minor discontinuit ies, as it allows for t he classifier to utilize t he context information
to "interpolate" the cell membrane. In this regard, t he orientation feature plays
an important role by imposing a smoothness constraint on the curvature of the
membrane.
3.2.2 Classifier
We propose to utilize a classifier trained wi th AdaBoost [9] since such a classifier
can model a nonlinear decision boundary. AdaBoost is a meta-algorithm that builds
the classifier from "weak" classifiers, such as a decision stump. At each round ,
AdaBoost adds a weak classifier to t he set of weak classifiers by t raining for best
classification performance according to samples weights. The sample weights are
varied depending on t he classification result of the previous round , by increasing
t he weights of incorrectly classified samples and decreasing the weights of correctly
classified samples. The final classifier is a weighted sum of the weak classifiers
according to their accuracy in the t raining rounds. It has been observed empirically
in previous experiments that t he obtained classifiers generally do not over fit [10],
[3], [7], [19]. The algorithm for AdaBoost is given in Algorithm 2.
In this paper , decision stumps are used for the weak classifier. Decision stumps
are the simplest form of binary decision trees with just one decision node. The deci
sion stump makes the classification decision based on just the value of a part icular
feature with respect to a threshold. Given t he feature set , desired classification
and prior of the samples , the t hreshold for a part icular feature can be chosen
22
A l g o r i t h m 2 AdaBoost X <— Samples Y <— Classification N <r— Number of samples D <— Number of dimensions T <— Number of rounds of boosting Wi(n) <— 4 , where n = 1 . . . N {Initializing weight distribution of samples} for t = 1 . . . T d o
ht <— Train weak learner using weight distribution Wt
N et <— 2_\Wt{n)[yn 7̂ ht{xn)\{Calculate weighted error rate}
based on the probability distribution functions of membrane and nonmembrane
classes over the feature values without making any underlying assumption about
the distribution of the feature. This gives the stump of best accuracy compared to
the ones built using other metrics like information gain. The AdaBoost mechanism
along with the decision stump classifier acts as a feature selection mechanism [25].
Algorithm 3 shown the algorithm for building the decision stump with maximum
classification performance. The algorithm is modified to get maximum performance
by vectorizing the operations. This algorithm also takes care of maximizing the
margin for the decision boundary.
The various functions specified in the algorithm are specified in the following
Wt+i(n) +- Wt(n)eatyMxn){Calculate new weight distribution} Wf+i(n) < Wi+i(n)—{Normalize weights}
t=i
figures:
1. fpos, P D F of class 1 samples-*— Class 1 function of Figure 3.4(a)
2. fneg, PDF of class 0 s a m p l e s ^ Class 0 function of Figure 3.4(a)
3. Cpos, CDF of class 1 samples*— Class 1 function of Figure 3.4(b)
Algorithm 2 AdaBoost X f- Samples Y f- Classification N f- Number of samples D f- Number of dimensions T f- Number of rounds of boosting WI (n) f- iJ , where n = 1 ... N {Initializing weight distribution of samples} for t = 1 ... T do
ht f- Train weak learner using weight distribution Wt N
Et f- L Wt(n)[Yn =1= ht(xn)]{Calculate weighted error rate} n=1
a f- l.in I- ft. t 2 f t.
WH1 (n) f- Wt(n )eCl: tYn h t(xn) {Calculate new weight distribution} Wt+1(n) f- N W t+ l(n) {Normalize weights}
L Wt+1(n) n = 1
end for T
Final boosted classifier H (x) f- sign(L atht(x)) t=l
22
based on the probability distribution functions of membrane and nonmembrane
classes over the feature values without making any underlying assumption about
the distribution of the feature . This gives the stump of best accuracy compared to
the ones built using other metrics like information gain. The AdaBoost mechanism
along with the decision stump classifier acts as a feature selection mechanism [25].
Algorithm 3 shown the algorithm for building the decision stump with maximum
classification performance. The algorithm is modified to get maximum performance
by vectorizing t he operations. This algorithm also takes care of maximizing the
margin for the decision boundary.
The various functions specified in the algorithm are specified in the following
figures:
1. jpos, PDF of class 1 samplesf- Class 1 function of Figure 3.4(a)
2. fneg, PDF of class 0 samplesf- Class 0 function of Figure 3.4(a)
3. cpos, CDF of class 1 samplesf- Class 1 function of Figure 3.4(b)
23
A l g o r i t h m 3 Decision stump learning (Vectorized) X *— Samples Y <— Classification N *— Number of samples D *— Number of dimensions W «— Weight distribution of samples for d = 1 . . . D d o
[X^ s o r t e d , sortlndices] <— sort(Xa){Sort at tr ibute values of one dimension} /pos PDF of sample of class 1 weighted by distribution W fneg <— PDF of sample of class 0 weighted by distribution W Cpos *— CDF of sample of class 1 weighted by distribution W cneg <— CDF of sample of class 0 weighted by distribution W ^pos * Cpos ineg * TYICLX (Cneg) Cneg for t = Every value of X^ d o
Accuracyi(t) <— ipos(t) + [maa:(«n e 9) — i n e f f ( t )]{Accuracy of Decision stump 1 at all values of t} Accuracy 2{t) *— ineg(t) + [max(ipo8) — ipos{t)]{Accuracy of Decision stump 2 at all values of t}
e n d for Accuracy(d) <— max(max(Accuracyi),max(Accuracy2)) Threshold(d) <— Threshold corresponding to Accuracy(d) I inequality (d) <— Inequality corresponding to Accuracy(d)
e n d for Accuracy.max *— max (Accuracy) ^Threshold ^~ Threshold corresponding to Accuracymax
hAttribute <— Attribute d corresponding to Accuracymax
^equality <- Inequality corresponding to A c e u r a q / m a x
4. c n e f l , CDF of class 0 samples*— Class 0 function of Figure 3.4(b)
5. ipos, CDF of class 1 samples normalized over the entire sample set*— Class 1
function of Figure 3.4(c)
6. i n e g , CDF of class 0 samples normalized over the entire sample set*— Class 0
function of Figure 3.4(c)
7. Accuracy \, Accuracy of decision stump where the inequality is attr ibute
valuegethreshold *— Accuracy 1 function of Figure 3.4(d)
Algorithm 3 Decision stump learning (Vectorized) X i- Samples Y i- Classification N i- Number of samples D i- Number of dimensions Wi-Weight distribution of samples for d = 1 .. . D do
[XdsQI·ted, sartI ndices] i- sort(Xd){Sort attribute values of one dimension} Ipos i- PDF of sample of class 1 weighted by distribution W j~,eg i- PDF of sample of class 0 weighted by distribution W cpos i- CDF of sample of class 1 weighted by distribution W cneg i- CDF of sample of class 0 weighted by distribution W 2pos i- cpos ineg i- max ( cneg) - cneg for t = Every value of X d do
23
AccuracYl(t) i- ipos(t) + [max(ineg) - ineg(t)]{Accuracy of Decision stump 1 at all values of t } AccuracY2(t ) i- ineg(t) + [max(ipos) - ipos(t)]{Accuracy of Decision stump 2 at all values of t }
end for Accuracy(d) i- max(max(AccuracYl) , max (AccuracY2)) Threshold(d) i- Threshold corresponding to Accuracy(d) I nequality( d) i- Inequality corresponding to Accuracy ( d)
end for AccuracYma3; i- max (Accuracy) hTlu'eshold i- Threshold corresponding to AccuracYmax hAttl';,/J'/lle i- Attribute d corresponding to AccuracYmax hinequality i- Inequality corresponding to AccuracYmax
4, cneg' CDF of class 0 samplesi- Class 0 function of Figure 3.4(b)
5, ipos , CDF of class 1 samples normalized over the entire sample seti- Class 1
function of Figure 3.4(c)
6. ineg, CDF of class 0 samples normalized over t he ent ire sample seti- Class 0
function of Figure 3.4( c)
7. AccuracYl , Accuracy of decision stump where the inequality is attribute
valuegethreshold i- Accuracy 1 function of Figure 3.4(d)
24
LL Q Q-
Q
C l a s s 0 C l a s s 1
1 0
S 5
0 1 0 S a m p l e v a l u e
2 0 Q 0 1 0
S a m p l e v a l u e
C l a s s 0 C l a s s 1
2 0
(a)
0 _3
> C o
^3 o c 3
(c)
Probability distribution of the two classes (b) Cumulative Probability distribution of the two classes
2 0
0 . 5
Q
F u n c t i o n 1 F u n c t i o n 2 o
(0 3 1 0 o o
<
0 1 0 S a m p l e v a l u e
2 0 Q
A c c u r a c y 1 A c c u r a c y 2
0 1 0 S a m p l e v a l u e
2 0
Intermediate functions in the algorithm (d) Accuracies of the two stumps at the attribute value
F i g u r e 3.4. Plot of different functions in the decision stump learning algorithm
8. Accuracy2, Accuracy of decision stump where the inequality is at tr ibute
value/tthreshold <— Accuracy 2 function of Figure 3.4(d)
3 . 3 S y n a p s e d e t e c t i o n
3.3.1 F e a t u r e s
The features used in this setup are properties of regions of interest extracted
from the enhanced image. The enhanced image is thresholded and the thresholded
regions are used as masks to extract regions of interest. Due to the huge size of
the image, calculation of features for each and every pixel of the image would be
very time consuming in both the training and testing phases. Thus the image is
4~--~--~r=====~
LL 02 a.
0'--o
Class 0 Class 1
10 Sample value
20
1 Or------~·
LL o 5 o
Class 0 Class 1
10 20 Sample value
24
(a) P robabili ty distribution of the two classes (b) Cumulative Probability distribut ion ofthe two classes
1 '---~r=======~ (1) :::J nJ > c: o 0.5 ... (.) c: :::J
LL 0-o
Function 1 Function 2
10 20 Sample value
~ (.) nJ :s 10 (.) (.)
« Accuracy 1 Accuracy 2 10 20
Sample value (c) Intermediate functions in the algorithm (d) Accuracies of the two stumps at the at
tribute value
Figure 3.4. Plot of different functions in t he decision stump learning algorithm
8. AccuracY2, Accuracy of decision stump where the inequality is attribute
valueltthreshold <- Accuracy 2 function of Figure 3.4( d)
3.3 Synapse detection
3.3.1 Features
The features used in this setup are propert ies of regions of interest extracted
from the enhanced image. The enhanced image is thresholded and the thresholded
regions are used as masks to extract regions of interest . Due to the huge size of
t he image, calculation of features for each and every pixel of the image would be
very t ime consuming in both the training and testing phases. Thus the image is
25
first down sampled so that we have a reasonable training dataset size. Even then,
the number of data points was huge. By visual examination, it was seen that the
synapses are darker structures. Thus an optimum grey threshold value is learned
such that the thresholded regions are around the synapse location or its vicinity.
For all the connected component regions extracted from the thresholded region, the
following features are calculated:
1. 7 Rotation, translation and scale invariant moments
2. 30 bin cumulative histogram bin values
3. Area of the region
These features are explained in detail in the following sections.
3.3.2 Scale a n d r o t a t i o n inva r i an t m o m e n t s
The raw moment of any discrete region is given by the following equation
My = £ 5 > y / ( * , y ) (3.2) x y
In our case since we are calculating moments for regions of various sizes. We
normalize them by dividing any moment by the following value.
x y
We chose the centroid of the region as the point around with the moments are
calculated. The centroid of the region is given by the following equation
{ic, y} = { M 1 0 / M 0 0 , MQI/MOO} (3.4)
To introduce translation invariance, all the moments are calculated around the
centroid of the region as follows
25
first down sampled so that we have a reasonable training dataset size. Even then,
the number of data points was huge . By visual examination, it was seen that the
synapses are darker structures. Thus an optimum grey threshold value is learned
such that the thresholded regions are around the synapse location or its vicinity.
For all the connected component regions extracted from the thresholded region, the
following features are calculated:
1. 7 Rotation, translation and scale invariant moments
2. 30 bin cumulative histogram bin values
3. Area of the region
These features are explained in detail in the following sections.
3.3.2 Scale and rotation invariant moments
The raw moment of any discrete region is given by the following equation
(3.2) x y
In our case since we are calculating moments for regions of various sizes. We
normalize them by dividing any moment by the following value.
LL I(x ,y) (3.3) x y
We chose t he centroid of the region as the point around with the moments are
calculated. The centroid of the region is given by the following equation
{x , ]]} = {MlO/Moo,MoI/Moo} (3.4)
To introduce translation invariance, all the moments are calculated around the
centroid of the region as follows
26
x y We calculate the central moments (up to an order of 3) to determine the seven
scale, rotation moments The central moments are given by the following equations:
Moo — MDO (3.6)
MOI = 0 (3.7)
Mio = 0 (3.8)
Mil = Mn - xM01 = Mn - xM01 (3.9)
Mao = M 2 0 - xMio (3.10)
M02 = M02 - yM01 (3.11)
M21 = M 2 i - 2 z M n - j / M 2 0 + 2 x 2 M 0 i (3.12)
M12 = M12 - 2 y M u - z M 0 2 + 2 £ 2 M 1 0 (3.13)
M3 0 = M 3 0 - 3xM 2 ( ) + 2a : 2 M 1 0 (3.14)
/ i 0 3 = M 0 3 - 3 z M 0 2 + 2 x 2 M 0 i (3.15)
The scale invariant moments rjij can be constructed from the central moments
by dividing by the properly scaled 0th moments as shown below.
m = (3.i6) Moo
The scale invariant seven moments used as input features for the classifier are
given by the following equations. These moments are also called Hu invariant
moments.
26
(3.5) x y
We calculate the central moments (up to an order of 3) to determine the seven
scale, rotation moments The central moments are given by the following equations:
/-loa = Moo
/-lOI = 0
/-lIO = 0
(3.6)
(3.7)
(3.8)
(3.9)
(3.10)
(3. 11)
(3.12)
(3 .13)
(3.14)
(3.15)
The scale invariant moments 'r}'ij can be constructed from the central moments
by dividing by the properly scaled oth moments as shown below.
/-lij 'r}'ij = 1+( i¥)
/-loa
(3.16)
The scale invariant seven moments used as input features for the classifier are
given by the following equations, These moments are also called Hu invariant
moments.
27
h = ??20 + 7]Q2 (3.17)
h = (7720 - V02)2 + (2T7H) 2 (3.18)
^3 = (7/30 - 37712)2 + (3r/2i - 7?o3)2 (3.19)
h = (7/30 + flu)2 + (T?21 + ^7o3)2 (3.20)
h = {mo - 37712X7730 + 7712) [(7730 + 7/12)2 - 3(7721 + 7703)2]
+ (3r/2i - 7703)(mi + 7703)[3(7730 + 7712)2 - (7721 + m)2} (3.21)
h = (mo ~ 77o2)[(7730 + 7712)2 - (7/21 + 7/so)2] + 47711(7730 + 7712)(7721 + 7703) (3.22)
^7 = (37721 - 7703)(7730 + 7712)[(r/30 + 7/12)2 - 3(7721 + 7703)2]
Circular regions of interest are extracted with synapse point as the center. The
Hu invariant features are calculated for these circular regions.
3.3.3 C u m u l a t i v e h i s t o g r a m a n d a r e a f ea tu re s
The grey values of the region are rescaled to a normalized scale between the grey
scale minimum to threshold value. The rescaled range is divided into 30 equally
sized bins and a cumulative histogram is estimated. The 30 values of the 30 bins
are used as additional input features. The entire thresholded region is used for
estimating the bin values rather than using just the circular regions which were
used to calculate the moment features. Since the regions are extracted by masking
with a threshold, the range of grey scale is from zero (minimum grey value) to the
threshold value instead on the entire range of grey values.
The area of the region is the number of pixels in the extracted region.
3.3.4 Classif ier
In this problem we have an unbalanced training dataset, i.e., the number of
examples of synapse regions are far less than the number of examples of nonsynapses
+ (7730 - 37712)(7721 + 7703)13(7730 + 7712)2 - (7/21 + 7703)2] (3.23)
h = rJ20 + rJ02
12 = (rJ20 - rJ02)2 + (2rJll)2
13 = (rJ30 - 3rJ12)2 + (3rJ21 - 1703)2
14 = (rJ30 + rJ12)2 + (rJ21 + rJ03)2
h (rJ30 - 3rJ12) (rJ30 + rJ12) [(rJ30 + rJ12) 2 - 3(rJ21 + rJ03)2]
27
(3.17)
(3.18)
(3.19)
(3.20)
+ (31721 - rJ03)( rJ21 + rJ03) [3 (rJ30 + rJ12)2 - (rJ21 + rJ03)2] (3 .21)
h = (rJ20 - rJ02) [( rJ30 + rJ12)2 - (rJ21 + rJ30)2] + 4rJll(rJ30 + rJ12) (rJ21 + rJ03) (3.22)
17 (3rJ21 - rJ03)(rJ30 + rJ12) [(rJ30 + rJ12) 2 - 3(rJ21 + rJ03) 2]
+ (rJ30 - 3rJ12)(rJ21 + rJ03)[3(rJ30 + rJ12)2 - (rJ21 + rJ03)2] (3.23)
Circular regions of interest are extracted with synapse point as the center. The
Bu invariant features are calculated for these circular regions.
3.3.3 Cumulative histogram and area features
The grey values of the region are rescaled to a normalized scale between the grey
scale minimum to threshold value. The rescaled range is divided into 30 equally
sized bins and a cumulative histogram is estimated. The 30 values of the 30 bins
are used as additional input features. The entire thresholded region is used for
estimating the bin values rather than using just t he circular regions which were
used to calculate the moment features. Since the regions are extracted by masking
with a threshold , the range of grey scale is from zero (minimum grey value) to the
threshold value instead on the entire range of grey values.
The area of the region is the number of pixels in the extracted region.
3.3.4 Classifier
In this problem we have an unbalanced training dataset, i.e., the number of
examples of synapse regions are far less than the number of examples of nonsynapses
28
regions. In the testing phase also we observe that detection of synapses is like
finding a needle in a haystack. Thus we use a cascading architecture that works
well with other kinds of such rare event detection problem, such as face detection
in photographs [24]. In this architecture, results of several high prediction rate
classifiers are cascaded. The output of the the final classifier of the cascade gives
the prediction for synapses with few false negatives. The architecture is shown
in Figure 3.5. As shown in the figure, every classifier is an ensemble of weighted
decision stumps. The final boosted classifiers prediction is based on equation 3.24.
N Class = sgn(y^ wnHn(x) — threshold) (3.24)
n=0
AdaBoost + Decision Stump, adjusted threshold for very high prediction rate
Negatives
Negatives
Classifies
Positives
Negatives Positives
F i g u r e 3.5. Cascade architecture
28
regions. In the testing phase also we observe t hat detection of synapses is like
finding a needle in a haystack. Thus we use a cascading architecture that works
well with other kinds of such rare event detection problem, such as face detection
in photographs [24]. In this architecture, results of several high prediction rate
classifiers are cascaded. The output of the the final classifier of the cascade gives
the predict ion for synapses with few false negatives. The architecture is shown
in Figure 3.5. As shown in the figure, every classifier is an ensemble of weighted
decision stumps. The final boosted classifiers prediction is based on equation 3.24.
N
Class = sgn(L wnHn(x) - threshold) n=O
Figure 3.5. Cascade architecture
Classifier =
AdaBoost + Decision Stump, adjusted threshold for very high prediction rate
(3.24)
29
From equation 3.24 we can infer that the classification of a sample by the
ensemble can be varied by adjusting the threshold. This property is used at
every node of the ensemble, such that by varying the threshold for the ensemble,
we improve the prediction rate (approximately 100%). For the first stage of the
cascade, all the positive examples and an equal number of negative examples are
chosen as the training set. The ensemble is trained to a point where adding more
weak classifier does not improve the prediction rate of the ensemble. Then the
threshold of the ensemble is varied such that ensemble has a very high prediction
rate. After adjusting the threshold, the ensemble will predict a few samples as
nonsynapses. These samples will not be used for training in the following levels
of cascade. At the next stage of the cascade, we take all the samples predicted as
synapses in the previous stage and add a set negative samples such that the training
set is balanced. The classifier is trained just like the previous stage. The cascade
is trained until we run out of training examples or the rate of rejection negative
examples drops down.
3.3.5 O r i e n t a t i o n e s t i m a t i o n
Once the synapses regions are detected by the cascaded classifier, we estimate
the orientation of the synapses so that the predicted synapses can be aesthetically
viewed on the markup viewer. The original image is down sampled to 25% of its
size and blurred using perona-malik smoothening. This takes care that the blurring
occurs within the synapse alone and does not blur the entire area. A circular region
around the synapse is extracted and thresholded at the median grey value. From
this thresholded image, the orientation of the synapse is calculated as follows.
The orientation of principal axis of the binary image patch can be calculated
from the covariance matrix constructed from the 2nd order central moments as
follows.
The covariance matrix is given by
cov[I(x,y)] =
where
M20 M11 M11 M02
(3.25)
29
From equation 3.24 we can infer that the classification of a sample by the
ensemble can be varied by adjusting the threshold. This property is used at
every node of the ensemble, such that by varying the threshold for t he ensemble,
we improve the prediction rate (approximately 100%). For the first stage of the
cascade, all t he positive examples and an equal number of negative examples are
chosen as the training set. The ensemble is trained to a point where adding more
weak classifier does not improve the prediction rate of the ensemble. Then the
threshold of the ensemble is varied such that ensemble has a very high prediction
rate . After adjusting t he threshold, the ensemble will predict a few samples as
nonsynapses. These samples will not be used for training in the following levels
of cascade. At the next stage of the cascade, we take all t he samples predicted as
synapses in the previous stage and add a set negative samples such that the training
set is balanced . The classifier is trained just like the previous stage. The cascade
is trained until we run out of training examples or the rate of rejection negative
examples drops down.
3 .3 .5 Orientation estimation
Once the synapses regions are detected by the cascaded classifier , we estimate
the orientation of the synapses so that the predicted synapses can be aesthetically
viewed on the markup viewer. The original image is down sampled to 25% of its
size and blurred using perona-malik smoothening. This takes care that the blurring
occurs within the synapse alone and does not blur the entire area. A circular region
around the synapse is extracted and thresholded at t he median grey value. From
this thresholded image, the orientation of the synapse is calculated as follows.
The orientation of principal axis of the binary image patch can be calculated
from the covariance matrix constructed from the 2nd order central moments as
follows.
The covariance matrix is given by
(3.25)
where
30
1 M20 M 2 0
M20 —
Moo ~ M 0 0
/ M02 M 0 2
Mo2 —
Moo ~ M 0 0
Mil = Mn M n
Mil = Moo ~ Moo
(3.26)
f (3-27)
^ (3.28)
The orientation of the image patch is given by the angle of the principal eigen
vector. The angle of the principal eigenvector is given by
9 = \arctan{ ^ f ) (3.29) 2 M20 ~~ M02
The orientation of the principal axis corresponds to the orientation of the
membrane. Thus the orientation of the synapses corresponds to the the minor
eigenvector which is orthogonal to the principal eigenvector on the 2D plane. Thus
the orientation of the synapses is given by,
Bsynapse = 6 ± 90° (3.30)
30
, J-l 20 11120 - 2 J-l 20 = - = --- X
J-loo Moo (3.26)
, J-l02 11102 - 2 J-l02 = - = -- - Y
J-loo Moo (3.27)
, J-lll Mll -- (3 ?8) J-lll = - = - - xy .~
J-loo Moo
The orientation of the image patch is given by the angle of the principal eigen-
vector. The angle of the principal eigenvector is given by
1 2J-l' 8 = -arctan( , II , )
2 J-l 20 - J-l02 (3.29)
The orientation of the principal axis corresponds to the orientation of the
membrane. Thus the orientation of the synapses corresponds to the the minor
eigenvector which is orthogonal to the principal eigenvector on the 2D plane. Thus
the orientation of the synapses is given by,
8 synapse = 8 ± 90 0 (3.30)
C H A P T E R 4
R E S U L T S
This chapter presents a detailed evaluation of the proposed methods of cell
membrane detection and synapse detection on a C. elegans worm dataset and a
rabbit retina dataset, respectively.
4 . 1 C e l l m e m b r a n e d e t e c t i o n
The proposed method for cell membrane detection was tested on a C. elegans
dataset. The entire volume is made of 149 slices of 662x697 gray scale images. Out
of this stack, 5 image slices where chosen at random from the first 50 slices and the
accuracy of the method was assessed using 5-fold cross-validation. In each case, the
training was done using four of the five images and tested on the image that was left
out of training. The ratio of membrane/nonmembrane pixels is unbalanced in the
order of 1:10 and thus affects the performance of the classifier. The classifier trained
with a balanced dataset (1:1 ratio) had the best accuracy compared to classifiers
trained with various ratios of positive (membrane) and negative (nonmembrane)
samples, with results shown for this case. The negative samples were chosen at
random.
The feature vectors were generated as described in Chapter 3, with a 7 x 7
neighborhood and Gaussian standard deviation a = 5. At any location, these
parameters yielded 100 features (25 points in the neighborhood x 4 features for
every pixel). Initially, the decision stumps were boosted for 10000 rounds and the
area under the ROC (averaged over the 5 folds) computed after each round. We
can observe from Figure 4.1 that the area under the ROC curve flattens out after
around 3000 rounds of boosting. The corresponding ROCs are shown in Figure 4.2,
and the test images results in Figure 4.3, 4.4, 4.5, 4.6, 4.7. Table 4.1 shows the
CHAPTER 4
RESULTS
This chapter presents a detailed evaluation of the proposed methods of cell
membrane detection and synapse detection on a C. elegans worm dataset and a
rabbit retina dataset, respectively.
4.1 Cell membrane detection
The proposed method for cell membrane detection was tested on a C. elegans
dataset. The entire volume is made of 149 slices of 662 x 697 gray scale images . Out
of this stack, 5 image slices where chosen at random from the first 50 slices and the
accuracy of the method was assessed using 5-fold cross-validation. In each case, the
training was done using four of the five images and tested on the image that was left
out of training. The ratio of membrane/nonmembrane pixels is unbalanced in the
order of 1: 10 and thus affects the performance of the classifier. The classifier trained
with a balanced dataset (1: 1 ratio) had the best accuracy compared to classifiers
trained with various ratios of positive (membrane) and negative (nonmembrane)
samples, with results shown for this case. The negative samples were chosen at
random.
The feature vectors were generated as described in Chapter 3, with a 7 x 7
neighborhood and Gaussian standard deviation (J = 5. At any location , these
parameters yielded 100 features (25 points in the neighborhood x 4 features for
every pixel). Initially, the decision stumps were boosted for 10000 rounds and the
area under the ROC (averaged over the 5 folds) computed after each round. We
can observe from Figure 4.1 that the area under the ROC curve flattens out after
around 3000 rounds of boosting. The corresponding ROCs are shown in Figure 4.2 ,
and the test images results in Figure 4.3, 4.4, 4.5, 4.6, 4.7. Table 4.1 shows the
32
0.94
0.92 -
10° 101 102 103 104
Boosting Rounds
F i g u r e 4 . 1 . Semilog plot of number of boosting rounds versus the area under the ROC curve for that boosting round.
sample statistics (average of all folds) used in the experiments. At the knee of the
testing ROC curve, the false positive rate = 0.15 (60000 pixels) and true positive
rate = 0.85 (31000). The high false positive rate is because the pixels neighboring
the membranes and pixels of vescicles are classified as membrane pixels. Figure 4.2
clearly shows that the use of neighborhood context combined with the proposed
feature set yields significantly better results than thresholding of the diffusion filter
image [14]. Moreover, comparing with the results without context information
underlines the importance of using neighborhood for membrane detection.
32
0.94
0.92
0.9
Q)
2: 0.88 :::J u 0 0 0.86 0::: ~
Q) "0 0.84 c :::J
ro Q)
0.82 ~ «
0.8
0.78
0.76 10° 10
1 10
2 10
3 10
4
Boosting Rounds
Figure 4. 1. Semilog plot of number of boosting rounds versus the area under the ROC curve for that boosting round.
sample statistics (average of all folds) used in the experiments. At the knee of the
testing ROC curve, the false positive rate = 0.15 (60000 pixels) and true positive
rate = 0.85 (31000) . The high false positive rate is because the pixels neighboring
the membranes and' pixels of vescicles are classified as membrane pixels. Figure 4.2
clearly shows that the use of neighbor hood context combined wi th the proposed
feature set yields significantly better results than thresholding of the diffusion fi lter
image [14]. Moreover , comparing with the results without context information
underlines the importance of using neighborhood for membrane detection.
33
T a b l e 4 . 1 . Statistics of samples in various experiments (average of all folds) Mode Total Samples Number of
Positives Number of Negatives
Training 296380 148190 148190 (randomly chosen from 408320 pixels)
Testing 461414 (all pixels in the test image)
37047 408320 (approx. 10 times number of positives
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 False Positive rate
F i g u r e 4.2. ROC curves of the classifiers trained with AdaBoost at boosting round 3000. For comparison, the ROC for the method by Jurrus et al. [14] is also shown.
33
Table 4.1. Statistics of samples in various experiments (average of all folds) Mode Total Samples Number of Number of Negatives
Positives Training 296380 148190 148190 (randomly chosen from
408320 pixels) Testing 461414 (all pix- 37047 408320 (approx. 10 times number
els in the test of positives image)
0.9
0.8
0.7
Q) -CO 0.6 L-
Q) > :;::::; . iii 0.5 0
0... Q)
0.4 :J L-
~
0.3
0.2 -Training
- - - Testing
0.1 -e- Testing without neighborhood
-- Jurrus 2008
o 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
False Positive rate
Figure 4.2 . ROC curves of the classifiers trained with AdaBoost at boosting round 3000. For comparison, the ROC for t he method by Jurrus et al. [14] is also shown.
34
F i g u r e 4.4. Membrane detection results of the fold-2 test images in the 5-fold cross-validation: original images (left), and detected membranes (right).
34
Figure 4.3 . Membrane detection results of the fold-l test images in the 5-fold cross-validation: original images (left), and detected membranes (right).
Figure 4.4. Membrane detection results of the fold-2 test images in the 5-fold cross-validation: original images (left), and detected membranes (right).
35
F i g u r e 4.6. Membrane detection results of the fold-4 test images in the 5-fold cross-validation: original images (left), and detected membranes (right).
35
Figure 4.5. Membrane detection results of the fold-3 test images in the 5-fold cross-validation: original images (left) , and detected membranes (right).
~
III b.O C'd S
1--4 ..., f (/J
~ I.
Figure 4.6. Membrane detection results of the fold-4 test images in the 5-fold cross-validation: original images (left) , and detected membranes (right).
36
F i g u r e 4.7. Membrane detection results of the fold-5 test images in the 5-fold cross-validation: original images (left), and detected membranes (right).
36
Figure 4 .7. Membrane detection results of the fold-5 test images in the 5-fold cross-validation: original images (left) , and detected membranes (right).
37
4 . 2 S y n a p s e d e t e c t i o n
The proposed method for synapse detection was tested on a single slice cross-
section of a rabbit retina dataset. The image is scanned under a 5000x magnifi
cation. One pixel in the digital image represents 2.18 nm at this magnification.
The full resolution image was 16720 x 16750 pixel sized grey scale image. The
image was first down sampled four times, to a size of 4180 x 4188 pixels. The
accuracy of the proposed method was assessed using four-fold cross-validation. The
image was divided into four quadrants and in each fold, the training was done on
3 quadrants and tested on the third quadrant. In the entire image, the ratio of
synapse regions/nonsynapses region was in the order of 1:100.
The initial problem was the reduction in the number of testing samples in a
image so as to reduce the testing time. The masking of thresholded regions provided
the initial dataset reduction. The next problem was to choose a representative point
around which the region attributes will be calculated. Many ways of choosing the
representative point were tried. A few significant ones are listed below:
1. The centroid of the binary thresholded region was chosen.
2. The weighted centroid was chosen, where the weight of individual pixel was
directly proportional to the darkness of the pixels in the CLAHE image.
3. SIFT key points, which are representative of corners and peaks were calculated
for the regions of interest. Based on darkness of each key point in a particular
region, a representative point was chosen.
4. The SIFT key points were allowed to converge towards the darkest pixels in
the region and the converged point was chosen.
In all the above methods, only a lesser percentage of the representative points
was near the actual marked up synapses, thus, the region around the synapses was
not even in the test dataset. For the few representative points that were near the
synapses, few were classified as nonsynapses regions since the features calculated
did not have the required separability in this dataset. Because of the above reasons,
37
4.2 Synapse detection
The proposed method for synapse detection was tested on a single slice cross
section of a rabbit retina dataset. The image is scanned under a 5000x magnifi
cation. One pixel in the digital image represents 2.18 nm at this magnification.
The full resolution image was 16720 x 16750 pixel sized grey scale image. The
image was first down sampled four times, to a size of 4180 x 4188 pixels. The
accuracy of the proposed method was assessed using four-fold cross-validation. The
image was divided into four quadrants and in each fold, the training was done on
3 quadrants and tested on the third quadrant. In the ent ire image, the ratio of
synapse regions/nonsynapses region was in the order of 1:100.
The initial problem was the reduction in the number of testing samples in a
image so as to reduce the testing time. The masking of thresholded regions provided
the initial dataset reduction. The next problem was to choose a representative point
around which the region attributes will be calculated. Many ways of choosing the
representative point were tried. A few significant ones are listed below:
1. The centroid of the binary thresholded region was chosen.
2. The weighted centroid was chosen, where the weight of individual pixel was
directly proportional to the darkness of the pixels in the CLARE image .
3. SIFT key points, which are representative of corners and peaks were calculated
for the regions of interest. Based on darkness of each key point in a particular
region, a representative point was chosen.
4. The SIFT key points were allowed to converge towards the darkest pixels in
the region and the converged point was chosen.
In all the above methods, only a lesser percentage of the representative points
was near t he actual marked up synapses, thus, the region around the synapses was
not even in the test dataset. For the few representative points that were near the
synapses, few were classified as nonsynapses regions since the features calculated
did not have the required separability in this dataset. Because of the above reasons,
38
the classifier either had unreliable detection rate necessitating more user guidance
to identify the synapses.
38
the classifier either had unreliable detection rate necessitating more user guidance
to identify the synapses.
C H A P T E R 5
C O N C L U S I O N
This chapter discusses the contributions of this thesis and proposes future
research directions.
5 . 1 C o n t r i b u t i o n
The proposed method utilizes neighborhood context information to improve the
accuracy of membrane detection. Along with the nonlinear discrimination ability
of the AdaBoost classifier and the Hessian feature set, this results in improved
membrane detection compared to previous methods. Thus one can expect a more
robust segmentation of the individual neurons.
5 . 2 F u t u r e w o r k
Even though the classifier does good work in classification of the membranes,
the classifier fails to discern certain structures like vesicles from membranes, which
may result in over-segmentation of individual neurons. Utilizing additional fea
tures that discriminate these regions from membranes may prevent these false
positives. Moreover, recent work suggests that cascading the classifier predictions
and additional feature set onto another classifier may help connect discontinuities in
membranes and thereby avoid under segmentation [22]. Future work would address
these problems in membrane detection to improve the segmentation accuracy of the
individual neurons. The segmentation of one slice of the volume could be used in
a more robust segmentation of successive slices where membranes are weak. As far
as the problem of detection of synapses, new features that can quantify the shape
of the region have to be developed. The feature has to account for the variablity of
the synapse shapes because of their types.
CHAPTER 5
CONCLUSION
This chapter discusses the contributions of this thesis and proposes future
research directions.
5.1 Contribution
The proposed method utilizes neighborhood context information to improve the
accuracy of membrane detection. Along with the nonlinear discrimination ability
of the AdaBoost classifier and the Hessian feature set, this results in improved
membrane detection compared to previous methods. Thus one can expect a more
robust segmentation of the individual neurons.
5.2 Future work
Even though the classifier does good work in classification of the membranes,
the classifier fails to discern certain structures like vesicles from membranes, which
may result in over-segmentation of individual neurons. Utilizing additional fea
tures that discriminate these regions from membranes may prevent these false
positives. Moreover, recent work suggests that cascading the classifier predictions
and additional feature set onto another classifier may help connect discontinuities in
membranes and thereby avoid under segmentation [22]. Future work would address
these problems in membrane detection to improve the segmentation accuracy of the
individual neurons. The segmentation of one slice of the volume could be used in
a more robust segmentation of successive slices where membranes are weak. As far
as the problem of detection of synapses, new features that can quantify the shape
of the region have to be developed. The feature has to account for the variablity of
the synapse shapes because of their types.
A P P E N D I X A
A S S E M B L I N G L A R G E M O S A I C S O F
E L E C T R O N M I C R O S C O P E I M A G E S
U S I N G G P U S
Novel imaging techniques are being used to map the connectivity of individual
neurons in large neuronal tissue sections, to understand the neural circuitry of the
retina, and particularly how signals are communicated across processes. Extensive
studies have been undertaken using electron microscopy to create detailed diagrams
of general neuronal structures [8] and their connectivities [1, 26, 4]. The entire
volume of neuronal tissue is scanned as ultra thin slices (approx. 70nm) sliced using
a micro tome. The thin slices are assembled together to reconstruct the volume.
The neuronal tissue has to be scanned at very high resolutions (around 2nm/pixel)
to unambiguously identify the neurons and synapses in the scanned volume and
create detailed maps. The serial-section Transmission Electron Microscope (TEM)
is the preferred imaging modality for capturing large sections of neuronal tissues at
this magnification level. The section to be scanned spans a few millimeters. Rarely
do we find an electron microscope that can capture such a wide field of view and
at the required nanoscale resolution. Thus the sample of interest is imaged as a
sequence of tiles with some overlap. Figure A.l shows sample neuronal tissue image
tiles of mice scanned using TEM.
The imaging of these tiles using TEM requires the sample to be suspended over a
beam of electrons. The passage of high energy electron beams through the specimen
causes it to heat up and subsequently distort. The distortion is not uniform among
tiles and thus has to be unwarped individually. Thus, reconstructing the image
from the set of tiles, called image mosaicing, involves significant computation to
APPENDIX A
ASSEMBLING LARGE MOSAICS OF
ELECTRON MICROSCOPE IMAGES
USING GPUS
Novel imaging techniques are being used to map the connectivity of individual
neurons in large neuronal tissue sections , to understand the neural circuitry of the
retina, and particularly how signals are communicated across processes. Extensive
studies have been undertaken using electron microscopy to create detailed diagrams
of general neuronal structures [8] and their connectivities [1 , 26 , 4]. The entire
volume of neuronal tissue is scanned as ultra thin slices (approx. 70nm) sliced using
a micro tome. The thin slices are assembled together to reconstruct the volume.
The neuronal tissue has to be scanned at very high resolutions (around 2nm/ pixel)
to unambiguously identify the neurons and synapses in the scanned volume and
create detailed maps. The serial-section Transmission Electron Microscope (TEM)
is t he preferred imaging modality for capturing large sections of neuronal tissues at
this magnification level. The section to be scanned spans a few millimeters. Rarely
do we find an electron microscope that can capture such a wide field of view and
at the required nanoscale resolution. Thus the sample of interest is imaged as a
sequence of tiles with some overlap. Figure A.l shows sample neuronal tissue image
tiles of mice scanned using TEM.
The imaging of these tiles using TEM requires the sample to be suspended over a
beam of electrons. The passage of high energy electron beams through the specimen
causes it to heat up and subsequently distort. The distortion is not uniform among
tiles and thus has to be unwarped individually. Thus, reconstructing the image
from the set of tiles, called image mosaicing, involves significant computation to
11
(a) Sample neuronal image tile scanned using (b) Triangle mesh over a regular grid of con-an Transmission Electron Microscope trol points
F i g u r e A . l . Mice neuronal image tile slice and triangle mesh overlay
identify and handle overlapping portions of the image, and correct nonuniform
distortions over a very large number of tiles. A typical section of neuronal tissue
is 2500 microns in diameter and is scanned as 1000 tiles of 4080x4080 pixels
each. Currently, researchers assemble the volume from the scanned tiles using a
multithreaded tool chain [15], but this computation is one of the bottlenecks in
the critical path to reconstruct the volume since it is estimated to take around
90 days to assemble a volume made of 270 mosaics with each mosaic made of
approximately 1000 tiles. In this appendix, we describe our experiences using
GPUs to accelerate this computation. Because of the inherent parallelism of the
computation, the roughly identical computation at each pixel, and the data locality
across neighboring tiles, our initial observation was that this computation ought to
achieve high speedup on a GPU if we can effectively manage the streaming of
data. In the current method [15], every image tile contributing to a region of
mosaic is unwarped to calculate the value of the pixels. The warping is modeled
as a discontinuous transform. Every tile is sampled as a uniform triangle mesh
as shown in Figure A.l . The vertices of the triangles in the mesh are control
points whose positions are known in tile space and mosaic space. The location
41
(a) Sample neuronal image t ile scanned using (b) Tr iangle mesh over a regular grid of con-an Transmission Electron Microscope trol points
Figure A.!. Mice neuronal image tile slice and triangle mesh overlay
identify and handle overlapping portions of the image, and correct nonuniform
distortions over a very large number of tiles . A typical section of neuronal t issue
is 2500 microns in diameter and is scanned as 1000 tiles of 4080 x 4080 pixels
each. Currently, researchers assemble the volume from the scanned tiles using a
multithreaded tool chain [15], but this computation is one of the bottlenecks in
t he critical path to reconstruct the volume since it is estimated to take around
90 days to assemble a volume made of 270 mosaics with each mosaic made of
approximately 1000 t iles . In this appendix, we describe our experiences using
GPUs to accelerate this computation. Because of the inherent parallelism of t he
computation, the roughly identical computation at each pixel, and the data locality
across neighboring tiles, our init ial observation was t hat this computation ought to
achieve high speedup on a GPU if we can effectively manage the streaming of
data. In the current method [15] , every image tile contributing to a region of
mosaic is unwarped to calculate the value of t he pixels. The warping is modeled
as a discontinuous transform. Every tile is sampled as a uniform triangle mesh
as shown in Figure A.I. The vertices of the triangles in the mesh are control
points whose positions are known in tile space and mosaic space. The location
42
of every point on the mosaic can be mapped to a triangle in one or more tiles
using a barycentric coordinate system. Thus with this position information we can
find the value of every pixel in the mosaic. We have implemented the mosaicing
application in CUDA for the NVIDIA platforms below and compare its performance
with existing sequential and parallel library implementations. A 13783 x 13686 pixel
mosaic of mice neural tissue shown in Figure A.2 was reconstructed from 16 image
tiles using the CUDA implementation.
Salient features of the implementation are:
1. Mosaic is calculated as equally sized tiles.
2. The scanned tiles are stored as textures, and pixel values of the mosaic are
calculated using the texture look up. The textures are stored as unsigned
character images. We get significant performance gain doing nearest neighbor
interpolation because of hardware-accelerated lookups.
F i g u r e A.2 . Sixteen tile mosaic showing 6 of the tiles and position
42
of every point on the mosaic can be mapped to a triangle in one or more tiles
using a barycentric coordinate system. Thus with this position information we can
find the value of every pixel in the mosaic. We have implemented the mosaicing
application in CUDA for the NVIDIA platforms below and compare its performance
with existing sequential and parallel library implementations. A 13783 x 13686 pixel
mosaic of mice neural tissue shown in Figure A.2 was reconstructed from 16 image
tiles using the CUDA implementation.
Salient features of the implementation are:
1. Mosaic is calculated as equally sized tiles .
2. The scanned tiles are stored as textures, and pixel values of the mosaic are
calculated using the texture look up. The textures are stored as unsigned
character images. We get significant performance gain doing nearest neighbor
interpolation because of hardware-accelerated lookups.
Figure A.2. Sixteen tile mosaic showing 6 of the t iles and posit ion
43
3. Smooth blending of tiles is possible at the tile transitions of the mosaic since
the kernel has access to all overlapping tile textures for any point in the
mosaic.
Table A.l compares the performance of the different implementations of the mo-
saicing application used to reconstruct the above test mosaic. We can see tha t the
NVIDIA CUDA based gives 12x speedup compared to the current multithreaded
ITK based CPU application [15] without the use of acceleration data structures.
Tab le A . l . Comparison of classifiers Programming model machine details Time
Elapsed (in seconds)
Speed Up
Single Threaded C Intel Core 2 Quad CPU Q9550 @ 2.83 GHz
2022.3 N/A
OpenMP Multi-threaded (16 threads)
Intel Core 2 Quad CPU Q9550 @ 2.83 GHz
1140.46 1.77x
ITK based and multithreaded ir-assemble
Intel Core 2 Quad CPU Q9550 @ 2.83 GHz
120 16x
NVIDIA CUDA Intel Core 2 Quad CPU Q9550 @ 2.83 GHz
10.8 187.23x
43
3. Smooth blending of tiles is possible at the tile transitions of the mosaic since
the kernel has access to all overlapping tile textures for any point in the
mosaic.
Table A.l compares the performance of the different implementations of the mo
saicing application used to reconstruct the above test mosaic. We can see that the
NVIDIA CUDA based gives 12x speedup compared to the current multithreaded
ITK based CPU application [15] without the use of acceleration data structures.
Table A.!, Comparison of classifiers Programming model machine details Time Speed
Elapsed(in Up seconds)
Single Threaded C Intel Core 2 Quad CPU Q9550 @ 2022.3 N/ A 2.83 GHz
OpenMP Multi-threaded Intel Core 2 Quad CPU Q9550 @ 1140.46 1.77x (16 threads) 2.83 GHz ITK based and multi- Intel Core 2 Quad CPU Q9550 @ 120 16x threaded ir-assemble 2.83 GHz NVIDIA CUDA Intel Core 2 Quad CPU Q9550 @ 10.8 187.23x
2.83 GHz
A P P E N D I X B
S Y N A P S E V I E W E R
A synapse viewer is a cross platform viewer of image data and markups. This
software is built on Nokia Q t / C + + . It works for subsampled datasets. The features
of the software are described below:
• The 2D image data in JPEG, PNG, BMP formats can be browsed in the
image viewer.
• Sea ling: The image viewer has controls to scale down the data and view the
entire image within the window. The scaling can be achieved using the scaling
bar or by clicking the zoom in or zoom out widgets. The image can be scaled
in steps 10% of the original size.
• Scrolling: The image can be scrolled by simple click and drag operation on
the image.
• Synapse markup overlay: The application can read an XML file listing the
position and orientation of synapses. Multiple markup files can be opened
up. This feature is useful when comparing the ground t ruth and predicted
datasets. The viewer has the ability to change the color, transperancy and
visibility of a particular synapse group.
• Membrane markup overlay: The application can show overlay of predicted
membranes. Multiple membrane overlays can be seen simultaneously.
Figure B.l shows a snapshot of the synapse viewer.
APPENDIX B
SYNAPSE VIEWER
A synapse viewer is a cross platform viewer of image data and markups. This
software is built on Nokia QtjC++. It works for subsampled datasets. The features
of the software are described below:
• The 2D image data in JPEG, PNG , BMP formats can be browsed in the
image viewer.
• Scaling:The image viewer has controls to scale down the data and view the
entire image within the window. The scaling can be achieved using the scaling
bar or by clicking the zoom in or zoom out widgets. The image can be scaled
in steps 10% of the original size.
• Scrolling: The image can be scrolled by simple click and drag operation on
the image.
• Synapse markup overl ay: The application can read an XML file listing the
position and orientation of synapses. Multiple markup files can be opened
up. This feature is useful when comparing the ground truth and predicted
datasets. The viewer has the ability to change the color , transperancy and
visibility of a particular synapse group.
• Membrane markup overlay: The application can show overlay of predicted
membranes. Multiple membrane overlays can be seen simultaneously.
Figure B.1 shows a snapshot of the synapse viewer.
45
Synapses \ ie wei File View Tools
F i g u r e B . l . Synapse viewer
45
Eile ~iew !ools
Figure B .1. Synapse viewer
R E F E R E N C E S
J. R, Anderson, B. W. Jones, J.-H. Yang, M. V. Shaw, C. B. Watt , R Ko-shevoy, J. Spaltenstein, E. Jurrus, K. UV, R. T. Whitaker, D. Mastronarde, T. Tasdizen, and R. E. Marc. A computational framework for ultrastructural mapping of neural circuitry. Public Library of Science Biology, 7(3), 2009.
B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual ACM Conference on Computational Learning, pages 144-152. ACM Press, 1992.
L. Breiman. Arcing classifiers. Annals of Statistics, pages 801-849, 1998.
K. L. Briggman and W. Denk. Towards neural circuit reconstruction with volume electron microscopy techniques. Current Opinion in Neurobiology, 16(5):562-570, Oct. 2006.
C. Cortes and V. Vapnik. Support vector networks. In Machine Learning, volume 20, pages 273-297. Kluwer Academic Publishers, 1995.
W. M. Cowan, T. C. Sdhof, C. F. Stevens, and K. Davies. Synapses. JHU Press, illustrated edition, 2003.
H. Drucker and C. Cortes. Boosting decision trees. In Advances in Neural Information Processing Systems 8, pages 479-485, 1996.
J. C. Fiala and K. M. Harris. Extending unbiased stereology of brain ul-trastructure to three-dimensional volumes. Journal of the American Medical Informatics Association, 8(1): 1-16, 2001.
Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings in European Conference on Computational Learning Theory (EuroCOLT), Barcelona, Spain, 1995.
Y. Freund and R. E. Schapire. A short introduction to boosting. Journal of the Japanese Society for Artificial Intelligence, 14:771-780, Sept. 1999.
R. C. Gonzalez and R. E. Woods. Digital Image Processing. Prentice-Hall, 2nd edition, 2002.
T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, illustrated edition, 2001.
REFERENCES
[1] J . R. Anderson, B. W. Jones, J.-H. Yang, M. V. Shaw, C. B. Watt, P. Koshevoy, J . Spaltenstein, E. Jurrus , K. UV, R. T. Whitaker , D. Mastronarde, T. Tasdizen, and R. E. Marc. A computational framework for ultrastructural mapping of neural circuitry. Public Library of Science Biology, 7(3) , 2009.
[2] B. E. Boser, 1. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual ACM Conference on Computational Learning, pages 144- 152. ACM Press , 1992.
[3] L. Breiman. Arcing classifiers. Annals of Statistics, pages 801- 849, 1998.
[4] K. L. Briggman and W. Denk. Towards neural circuit reconstruction with volume electron microscopy techniques. Current Opinion in Neurobiology, 16(5) :562- 570, Oct. 2006.
[5] C. Cortes and V. Vapnik. Support vector networks. In Machine Learning, volume 20, pages 273- 297. Kluwer Academic Publishers, 1995.
[6] W. M. Cowan, T. C. Sdhof, C. F . Stevens, and K. Davies. Synapses. JHU Press, illustrated edition, 2003.
[7] H. Drucker and C. Cortes. Boosting decision trees. In Advances in Neural Information Processing Systems 8, pages 479- 485, 1996.
[8] J. C. Fiala and K. M. Harris. Extending unbiased stereology of brain ultrastructure to three-dimensional volumes. Journal of the American Medical Informatics Association, 8(1):1- 16, 2001.
[9] Y. Freund and R. E. Schapire. A decision- theoretic generalization of on- line learning and an application to boosting. In Proceedings in European Conference on Computational Learning Th eory (EuroCOLT) , Barcelona, Spain, 1995.
[10] Y. Freund and R. E. Schapire. A short introduction to boosting. Journal of the Japanese Society for Artificial Intelligence, 14:771- 780, Sept. 1999.
[11] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Prentice-Hall, 2nd edition, 2002.
[12] T. Hastie, R. Tibshirani , and J . H. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, illustrated edition, 2001.
47
V. Jain, J. Murray, F. Roth, S. Turaga, V. Zhigulin, K. Briggman, M. Helm-staedter, W. Denk, and H. Seung. Supervised learning of image restoration with convolutional networks. In Proceedings of International Conference on Computer Vision(ICCV), pages 1-8, Oct. 2007.
E. Jurrus, R. Whitaker, B. Jones, R. Marc, and T. Tasdizen. An optimal-path approach for neural circuit reconstruction. In Proceedings of International Symposium on Biomedical Imaging (ISBI), pages 1609-1612, 2008.
P. A. Koshevoy, T. Tasdizen, and R. T. Whitaker. Automatic assembly of TEM mosaics and mosaic stacks using phase correlation. SCI Technical Report, April 2007.
I. B. Levitan and L. K. Kaczmarek. The Neuron. Oxford University Press, US, 3rd edition, 2002.
Y. Mishchenko. Automation of 3D reconstruction of neural tissue from large volume of conventional serial section transmission electron micrographs. Journal of N euro science Methods, Sept. 2008.
J. R. Quinlan. C4-5: Programs for Machine Learning. Morgan Kaufmann, 1993.
J. R. Quinlan. Bagging, boosting, and C4.5. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 725-730, 1996.
R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics, 1998.
T. Tasdizen, R. Whitaker, R. Marc, and B. Jones. Enhancement of cell boundaries in transmission electron microscopy images. In Proceedings in International Conference on Image Processing (ICIP), pages 642-645, 2005.
Z. Tu. Auto-context and its application to high-level vision tasks. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1-8, jun 2008.
V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 2nd illustrated edition, 2000.
P. Viola and M. Jones. Robust real-time object detection. International Journal of Computer Vision, 2001.
A. R. Webb. Statistical Pattern Recognition. Oxford University Press, 2nd edition, 1999.
J. White, E. Southgate, J. Thomson, and F. Brenner. The structure of the nervous system of the nematode caenorhabditis elegans. Philosophical Transactions of the Royal Society B: Biological Sciences, 314:1-340, 1986.
47
[13] V. Jain, J. Murray, F. Roth, S. Thraga, V. Zhigulin, K. Briggman, M. Helmstaedter, W. Denk, and H. Seung. Supervised learning of image restoration with convolutional networks. In Proceedings of International Conference on Computer Vision{ICCV), pages 1- 8, Oct. 2007.
[14] E. Jurrus, R. Whitaker , B. Jones, R. Marc, and T. Tasdizen. An optimal-path approach for neural circuit reconstruction. In Proceedings of International Symposium on Biomedical Imaging (ISBI), pages 1609- 1612, 2008.
[15] P. A. Koshevoy, T . Tasdizen , and R. T. Whitaker. Automatic assembly of TEM mosaics and mosaic stacks using phase correlation. SCI Technical Report, April 2007.
[16] I. B. Levitan and L. K. Kaczmarek. The Neuron. Oxford University Press, US, 3rd edition, 2002.
[17] Y. Mishchenko. Automation of 3D reconstruction of neural tissue from large volume of conventional serial section transmission electron micrographs. Journal of Neuroscience Methods, Sept . 2008.
[18] J. R. Quinlan. C4 .5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[19] J. R. Quinlan. Bagging, boosting, and C4.5. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 725- 730, 1996.
[20] R. E. Schapire, Y. Freund, P. Bartlett , and W . S. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods . Annals of Statistics, 1998.
[2 1] T. Tasdizen, R. Whitaker , R. Marc, and B. Jones . Enhancement of cell boundaries in transmission electron microscopy images . In Proceedings in International Conference on Image Processing (ICIP) , pages 642- 645, 2005.
[22] Z. Th. Auto-context and its application to high-level vision tasks. In Proceedings of IEEE Conference on Computer Vis ion and Pattern Recognition (CVPR), pages 1- 8, jun 2008.
[23] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer , 2nd illustrated edition, 2000.
[24] P. Viola and M. Jones . Robust real-time object detection. International Journal of Computer Vision, 200l.
[25] A. R. Webb. Statistical Pattern Recognition. Oxford University Press, 2nd edition, 1999.
[26] J. White, E. Southgate, J. Thomson, and F . Brenner. The structure of the nervous system of t he nematode caenorhabditis elegans. Philosophical Transactions of the Royal Society B: Biological Sciences, 314:1- 340, 1986.
48
[27] D. B. Williams and C. B. Carter. Transmission Electron Microscopy: A Textbook for Materials Science. Springer, 2nd edition, 1996.
48
[27] D. B. Williams and C. B. Carter. Transmission Electron Microscopy: A Textbook for Materials Science. Springer , 2nd edit ion, 1996.