Post on 24-Jun-2020
1
Published in: J. Rittscher, R. Machiraju, S.T.C. Wong;
Microscopic Image Analysis for Lifescience Applications
TECHNIQUES FOR CELLULAR and TISSUE-BASED IMAGE
QUANTITATION OF PROTEIN BIOMARKERS
Ali Can, Musodiq Bello, Xiaodong Tao, Michael Gerdes, Anup Sood, Michael C.
Montalto, Fiona Ginty
Molecular Imaging and Diagnostics Advanced Technology Program,
GE Global Research Center, Niskayuna, NY, 12309.
Current methods for histological and tissue-based biomarker analysis
Hematoxylin-Eosin (H&E) staining of thin (5–7 micron) tissue sections has been used by
pathologists for well over one hundred years and is widely accepted as the foundation of
disease classification. Hematoxylin stains cell nuclei blue, while eosin, as a counter-stain,
stains cytoplasm and connective tissue pink. Due to its long history, as well as low cost,
fast preparation, and easy image acquisition, there is a strong belief that H&E will
continue to be the common practice for the foreseeable future [1, 2].
Pathologists typically make diagnostic decisions from H&E stained tissue sections
based on the attributes of cell size, shape, texture and color contrast of various fine
features as viewed under a microscope [3]. Although a pathologist is well trained to
decipher fine differences in tissue features, the analysis is inherently subjective, and use
of objective quantitative analysis is limited in current clinical practice. With the
advancement of digital microscopy, high quality microscopic images of specimens are
becoming digitally available in large quantities [4]. Digital technology will likely lend
itself to quantititive objective analysis of H&E stained tissue sections [5, 6]. However,
the complex interpretations that are learned through years of training and that rely on
2
the complexities of the human mind will be difficult to fully recapitulate, even with
current and future computational power.
One area in clinical pathology practice that lends itself more readily to
quantitative image analysis is immunohistochemistry (IHC) and flourescence in situ
hybridization (FISH), collectively known as “molecular pathology”. The advent of IHC in
the 1970s enabled the visualization of specific protien biomarkers using antibodies
tagged with chromagenic subsrates. This technique co-exists with and augments H&E,
and is becoming increasingly favored for prognostic purposes. Routine clinical tests
include detection of estrogen and progesterone receptors (ER, PR) by
immunohistochemistry (IHC) and determination of the ERBB2 (HER-2) receptor level
based on IHC or gene copy number by fluorescent in situ hybridization (FISH). These
tests are critically important for determination of appropriate therapy. ER positive
[ER(+)] patients are offered anti-estrogen therapy and patients with HER-2 over
expression or amplification are offered Herceptin [7]. The only systemic alternative
currently available for ER-negative [ER(-)] patients is chemotherapy.
Like H&E, IHC succumbs to the limitation of subjective quantitation and is
typcially anlayzed in a semi-quantitive manner by visual inspection. This limitation is
inherent to chromagenic staining methods [3]. For this reason, image analysis methods
that can objectively assess intensity of chromagenic substrates are gaining traction in
clinical practive and several standard IHC tests have recently become FDA approved [8].
More recently, immunofluorescent methods that lend themselves to more
senstive and linearly quantitative techniques are beginging to emerge [9]. The clinical
value of automated fluorescence-based image analysis of protein biomarkers has been
demonstrated in breast cancer [9] and lung cancer [10]. Although such methods are not
yet approved for routine clinical use, it is likely that they will lead to new diagnostic
approaches in clinical pathology.
Multiplexing
Gene and protein arrays are commonly used for measuring multiple targets
(multiplexing) at the molecular level. However, gene expression may not represent
actual protein expression, nor does it provide information on the cellular localization
within the context of the tissue specimen. Multiplexing directly on tissue or cells, without
3
the need for extraction and dispersion on chips, preserves the spatial integrity of
proteins while still allowing to assess multiple interactions between those proteins. This
provides an entirely new way of examining biomarkers and could shed light on
previously un-examined relationships between spatial location and protein-protein
interactions.
For example, prediction of disease outcome or therapeutic response will likely require
the analysis of multiple components from several biological pathways. In oncology,
many cancer types are very heterogeneous in phenotype, and require complex
screening. Breast cancer is a typical example whereby a single patient may be screened
for expression of different hormone receptors (estrogen and progesterone), keratin
profile to determine tumor subtype (luminal or basal), and over-expression of the
oncogene Her2 for which an antibody therapy exists as an adjuvant therapy [11].
Through multiplexing analysis, it will become increasingly possible to provide customized
medicine.
Fluorescence microscopy
An important advantage of immunofluorescent techniques for tissue-based
biomarker interrogation is the ability to multiplex proteins in a singe tissue section.
Fluorescence lends itself to multiplexing for two main reasons, 1) there is a wide range
of flourophore dyes and/or nanocrystals with non-overlapping emission spectra; 2)
sequential staining and imaging is possible with methods such as photobleaching.
Detection agents can take the form of small organic molecules (such as Cye and Alexa
dyes, FITC, Rhodamine, etc), nanocrystals (“quantum dots”) [12], proteins with inherent
fluorescent properties [13], and genetic tags that have been coupled with fluorochromes
[14]. These molecules are frequently functionalized to couple efficiently with other
biological agents such as antibodies or nucleic acids. The main forms of multiplexing
include 3-4 “channels” through standard fluorescence microscopy, 5-6 channel analysis
with quantum dots, “spectral bar-coding” in which a finite number of dyes are mixed in
combination and precise spectral characteristics are determined on a per-pixel basis, and
repeated use of a few fluorescent channels through regulating the dyes fluorescent
properties such as photo-bleaching or antibody denaturation/stripping. A common
requirement of all these methods is the ability to digitally reconstruct the patterns seen
4
for the different biomarkers. This is accomplished through either digital “layering” in
which multiple images are overlapped on top of one another [15, 16], or spectral
unmixing in which numerous agents each with unique spectral properties are detected
simultaneously [17].
Fluorescent dyes
There are numerous fluorescent dyes available for use as labeling agents to tag
detection agents in biological samples [14]. These dyes each have unique spectral
properties, in particular differing wavelengths of light for excitation and the resulting
fluorescent emission. Detection is accomplished by the use of band pass and dichroic
filters allowing specific wavelengths of light to pass such that for a given filter
combination a single fluorochrome is seen [18]. It is typical to examine blue, green,
orange, and red fluorochromes in this manner whereby the spectrum of each dye is
distinct that there is no cross-talk between the different fluorochromes. Using
combinations of dyes together allows a higher level of multiplexing to be obtained
through the generation of unique spectral signatures. These methods have been utilized
in human genetic analysis in a process known as spectral karyotyping to detect each
individual human chromosome and to determine rearrangements or fragmentation in
cytogenetic analysis [19].
One potential drawback to fluorescent dyes is the impermanence of signal and
loss of activity with time. A second potential limitation, which has been overcome to a
large degree with advances in CCD camera technology, is the limits of detection of low
signal abundance. A third challenge is the auto-fluorescence generated by the tissues.
The longer the exposure required to detect the signal, the more auto-fluorescence is
also accumulated in the resultant image. Thus the amount of light generated
specifically by the fluorochromes must be a certain degree above the noise in the
system, which largely stems from tissue auto-fluorescence. Several hardware
applications such as laser scanning confocal microscopy and the use of structured light
with multiple image acquisition (such as the Zeiss Apotome) allow a mechanical
mechanism for removing the auto-fluorescence [20]. Through the use of spectral
acquisition and deconvolution, the analysis of individual pixels based on wavelengths of
5
light passing through the detector (such as liquid crystal tunable filters) can be used in
numerous multiplexing platforms for signal detection [17].
Quantum Dots
Quantum dots have been used for multispectral analysis of biological specimens
[21, 22]. These are nanostructures containing a photo-activatable metal core (such as
CdSe/ZnS) surrounded by a shell. Fluorescence can be generated from these molecules
by excitation in the near UV range which produces a fluorescence with very distinct peak
spectral output spanning as low as 10nm allowing for spectral separation between the
different nanostructures. Through coupling of the nanospheres to avidin and binding to
biotinylated antibodies, five distinct molecules were visualized along with nuclear
staining with 4'6-DiAmidino-2-PhenylIndole (DAPI) [23]. Quantum dots have also been
used for fluorescent in situ hybridization (FISH) analysis for multiplexed analysis of
cellular mRNAs and cytogenetic analysis of human chromosomes [24-27] The main
limitation to the level of multiplexing achievable with quantum dots are the size of the
nanoparticles and full application of the use of these particles for molecular analysis will
be accomplished through ongoing development.
Photobleaching
Fluorescent dyes are inherently unstable to photo bleaching and can be
inactivated by exposure to proper wavelengths to photo-excite the fluorescent dyes
rendering them incapable of further fluorescence. However, for some dyes, their
susceptibility to photobleaching has enabled multiplexing. Using this approach, a dye-
labeled antibody can be localized to the sample of interest, imaged to record the
localization pattern, photo-bleached to clear the localization pattern, then re-incubated
with a second dye-labeled antibody and the process is sequentially repeated. Methods
for measuring multiple protein networks in tissue (the “toponome”) have relied on the
use of photo bleaching to localize over 100 antigens in a single sample [28]. This
method has a limitation in that the fluorescent dyes must be directly coupled to the
antibodies to achieve high level multiplexing. For complex tissue samples and low
abundance proteins, signal amplification may be needed to generated sufficient signal.
6
Image Analysis
Tissue micro-arrays (TMAs) provide the means for simultaneous analysis of tissue
from large numbers of patients. Tissue microarrays contain hundreds of tissue spots
(approximately 0.6 mm or greater in diameter) originating from cores of tissue from
regions of interest in paraffin embedded tumor samples. The tissue cores are transferred
to a recipient paraffin block, in a precisely spaced array pattern. Each block can
generate up from 100-500 sections analyzed using independent IHC tests and can serve
as a valuable discovery tool for tissue based biomarkers [29, 30]. However, the large
amount of information in TMAs and the high dimensional nature of the data makes
automated image analysis algorithms essential for high throughput segmentation of TMA
images into sub-cellular compartments (i.e. cytoplasm, nucleus, membrane) and
quantitation of biomarkers in these compartments [9]. In the following sections we
describe algorithms for registering and segmenting multi-channel microscopic images of
breast TMAs into sub-cellular compartments for automated quantitation of multiple
target proteins. The multi-channel images of different biomarkers are obtained by a
multi-step image acquisition procedure. These images are corrected for non-uniform
illumination. A robust, multi-resolution image registration algorithm is applied to
transform images into the same coordinate system. A multi-channel segmentation
method is then applied to segment the registered images into sub-cellular
compartments. The algorithms are general and able to handle with multiple images with
arbitrary number of channels that are acquired in multiple steps. Finally, multiple target
proteins are superimposed on individual compartments to calculate metrics associated
with subcellular protein expressions and translocation. The overall workflow is show in
Figure 1. In the following sections we describe the details of each of the preprocessing,
registration, segmentation, and quantitation steps.
Image preprocessing
In the preprocessing step, raw images are smoothed by a Gaussian filter to remove
noise. Non-uniform illumination is also corrected. The illumination pattern can be
estimated from the images, or directly computed by using calibration targets. Most filter-
cube and microscope manufacturers carry fluorescent plastic that can be used for
7
calibration. If the calibration images are not taken during the acqusition the illumination
pattern can be estimated from a series of images. The observed image, ),( yxI , can be
modeled as a product of the excitation pattern, ),( yxE , and the emission pattern,
),( yxM . While the emission pattern captures the tissue dependent staining, the
excitation pattern captures the illumination.
),(),(),( yxMyxEyxI = (1)
In the logarithm domain, the equation above can be transformed to a linear form:
)),(log()),(log()),(log( yxMyxEyxI += (2)
From a set of N images, let ),( yxI n denote ordered set of pixels. In other words, the
pixels are sorted for any given ),( yx location such that
),(),(),(),( yxIyxIyxIyxI Nn ≤≤≤ LL21 . (3)
Assuming that a certain percentage (p) of the image is formed from stained tissue (non-
zero backgound), then a trimmed average of the brightest pixels can be used to
estimate the excitation pattern:
,)),(log(1
1=),('
=
AVE yxIKN
yxE n
N
Kn
∑+− (4)
where K is set to an integer closest to 11 +− )( pN . In our experiments, we set p to 0.1
(10%). In the above equation, the average emission pattern of the tissue is assumed to
be uniform. Since the images are recovered upto a scale factor, we can drop the
constant term introducecd by the uniform emission pattern. This approximation holds if
a large number of images are used in the averaging process. However a large
precentage of pixels (90%) are already excluded to eliminate the non-tissue pixels in the
images. To overcome the limited sampling size, we approximate the log of the excitation
pattern with polynomials;
8
.=),(;,0
ji
ij
pjipji
AVE yxayxE ∑≤+≤≤
' (5)
The parameters ija are solved by minimizing the mean squared error [31]. The surface
generated by the polynomial coefficients are then used to correct individual images. A
sample average excitation pattern and the estimated polynomial illumination surface are
shown in Figures 2a and 2b, respectively. Figures 3a and 3b show a fluorescent image
before correction and after correction, respectively. Use of ploynomial surface to
estimate the illimunation pattern is not limited to fluorescent microscopy; brightfield
image can be corrected similary. If each color channel is corrected separetly, this
corrects for the color temprature of the light source as well (See Figures 3c and 3d).
Image Registration
Image registration techniques, such as mutual information or correlation-based
techniques can be used to register images from each step of a sequentially multiplexed
study and align them accurately. The experiments are designed such that at each step
of the squential staining, images of the nuclei are acquired. Figure 4 shows and example
sequence of images where the first image is the image of the nuclei. Similary, the nuclei
can be stained by Hematoxylin in the H&E stain. The first nuclei image is set as the
reference image and each of the subsequent nuclei images are registered to the
reference. Once the transformation parameters are estimated, then all the channels can
be mapped onto the reference coordinate system.
Given two set of nuclei images, one being the reference image from the first step,
),((1)
N yxI , and the second being either from the subsequent fluorescent acqusitions, or
the nuclei channel from the final H&E step, ),()(
N yxI k , we find a transformation 1,kT ,
such that the image similarity measure between ),((1)
N yxI and )),(( 1)(
N yxI kk ,T is
maximized. Different combinations of transformations and image similarity measures
have different performance computation requirements, and are suitable for different
applications. In this work, we use a rigid transformation and a mutual information based
image similarity measure [32-42]:
9
)()|(
)|,(log)|,(=)(
κικι
κικι FM pp
ppS
t
ttt ∑∑− (6)
where p , Mp , and Fp are the joint, marginal moving, and marginal fixed probability
distribution of the image intensities; t is the parameter vector of the transform; ι and
κ are the intensity values in the respective images. In order to improve the robustness
of the algorithm, we use a multi-resolution strategy to find the transform that aligns the
two images. Figure 5 illustrates the registration of a breast cancer tissue image using
the DAPI channel and subsequent transformation of the CFP channel image using the
registration results of the DAPI channel.
While for fluorescent acqusitions, the nuclei images are part of the experiment (DAPI
acqusition in our case), computing the nuclei image from color images of the H&E needs
an additional step. The red, green and blue components of the H&E image are used to
generate nuclei image using the following equation;
( )γ),(),(),(),( GREENREDBLUEN yxIyxIyxIcyxI HEHEHEHE ⋅⋅= / (7)
where c and γ are tuning parameters for contrast and gamma correction, respectively.
Since the hematoxylin stains the nuclei to blue, the highest contrast is achived when the
blue channel is normalized by the geometric mean of the red and green channels.
Figures 6a and 6b show an H&E image and its estimated nuclei component computed by
the above equation, respectively. A two-channel fluorescent imaging of the same tissue
stained with molecular biomarkers with green representing a membrane related marker
(beta-catenin) and blue representing a nuclei related stain (DAPI) are shown in Figures
6c and 6d, respectively. The registered DAPI image in the H&E coordinate system is
shown in Figure 6e. The registration parameters estimated from the DAPI and H&E
images are then used to map the beta-catenin channel into the H&E coordinate system
(shown in green color in Figure 6f).
10
Image Segmentation
Subcellular quantification of target proteins requires segmentation of the subcellular
compartments, such as nuclei, membrane, and cytoplasm. The segmentation can be
achieved either by segmenting each channel separately or segmenting all channels at
once. Segmentation of ridge-like and blob-like structures is one of the most common
segmentation tasks in medical and life sciences imaging applications. Commonly, such
applications require detecting vessels [43], bronchial tree [44], bones [45, 46], nodules
[47-49] in medical applications, and detecting neurons [42, 50], nuclei [51], and
membrane [52] structures in microscopy applications. Partitioning of a multiple channel
digital image into multiple segments (regions/compartments) is one of the most critical
steps for quantifying one or more biomarkers in molecular cell biology, molecular
pathology, and pharmaceutical research.
Subcellular segmentation can be achieved in most cases using a single channel. For
many of the approaches in the literature, pre-processing of the images is required to
reduce noise or to extract features from the images. Common pre-processing steps
include thresholding, edge detection, and smoothing. A number of cellular-level
segmentation tasks have been accomplished by combining only these “pre-processing”
steps. In [53], median filtering and morphological operations of erosion and dilation
were adapted to segment melanomas and lymphocytes. Binary thresholding was
similarly combined with morphological operations and image thinning to segment cells in
histology images in [54]. However, more intelligent pre-processing is needed for non-
trivial applications. In [55, 56], cellular images of breast tissue were denoised using a
non-linear diffusion filter followed by a directional coherence filter to enhance the
boundaries of the nucleus. A non-linear illumination correction method was described
earlier.
Region-based segmentation, in which the image is divided into regions based on some
homogeneity criteria, is commonly used. Typically, segmentation starts from a seed pixel
or region (e.g., center of the nucleus) determined automatically or specified by some
user interaction, and then grows to include neighboring pixels that meet a specified
intensity, texture, or shape criteria. In the connected-threshold region-growing
11
algorithm [57] for example, all neighboring pixels between an upper and lower threshold
values are included recursively. This is after smoothing with an edge-preserving
smoothing filter such as anisotropic diffusion [52], curvature flow [53], and bilateral [53]
filters. In fluorescence cellular microscopy image, the seed points may be obtained by
initial thresholding, using a top-hat filter [58], or by computing distance transforms, to
obtain pixels in the cells, which are then grown to the cellular boundaries. In the
confidence-connected region-growing approach [34, 59-62], the mean and standard
deviation of the seed region, and all pixels around the region that fall within a range of
the mean are recursively accepted as part of the region. When there are no more pixels
to include, the mean and standard deviation of the newly obtained region is computed
and the process iterated a few times. The isolated-connected algorithm [34, 35] in
which two seed points are given from two different regions, e.g., nucleus and
cytoplasm, can be used to segment the whole cell. The goal of the algorithm is to grow
a region that is connected to the first region but not connected to the second. A binary
search is used to find the optimal separating intensity value. In [63], region growing
was employed after binary thresholding to segment the nuclei in FISH images. In Figure
7b, the result of membrane segmentation is overlaid on the tissue image. A membrane-
stained channel image was smoothed with curvature flow filter to remove the noise
while preserving the membrane markings. A non-maximal suppression algorithm was
then employed to remove spurious signals and retain only those that form a continuous
boundary.
A Unified Segmentation Algorithm
While different segmentation algorithms can be used for each of the nuclei and
membrane compartments, an alternative is to use the same algorithm in different
modes. For example the curvature metric derived from nuclei and membrane images
can be used as metrics to classify segments using supervised or unsupervised
(parametric or non-parametric) algorithms [64]. A commonly used approach to compute
curvature-based metrics is to extract them from the eigenvalues of the Hessian matrix.
Due to their invariance to rigid transformations, these metrics can be used for a broad
class of ridge-like and blob-like structures [65]. The Hessian of an image ),( yxI is
defined as
12
∂∂
∂∂∂
∂∂∂
∂∂
=Η
2
22
2
2
2
y
yxI
xy
yxI
yx
yxI
x
yxI
yxI),(),(
),(),(
)),(( . (8)
The eigenvalues ( ),(),( yxyx 21 λλ ≤ ) of the Hessian matrix can either be numerically
calculated or analytically written in terms of the elements of the Hessian Matrix:
∂∂
∂+
∂
∂−
∂
∂
∂
∂+
∂
∂=
22
2
2
2
2
2
2
2
2
2
12
),(4
),(),(),(),(
2
1),(
yx
yxI
y
yxI
x
yxI
y
yxI
x
yxIyx mλ . (9)
The eigenvalues encode the curvature information of the image, and provide useful cues
for detecting ridge-like membrane structures, or blob-like nuclei structures. However,
the eigenvalues are dependent on image brightness. We define the following two
curvature-based features that are independent of image brightness:
= −
),(
),(tan),(
yx
yxyx
2
11
λλ
θ , (10)
),(
)),(),((tan),(
/
yxI
yxyxyx
212
2
2
11 λλφ
+= −
, (11)
and refer to them as shape index, and normalized-curvature index respectively. This is
essentially the same as defining the eigenvalues in a polar coordinate system. This
transformation also results in bounded features, 44
3 πθ
π≤≤− ),( yx , and
20 /),( πφ ≤≤ yx .
A general likelihood function estimator that calculates the probability maps of vessel,
membrane and nuclei-like structures in images can be formulated by exploring the
expected values of the features. For example for bright membrane and vessel like
structures, the shape index is close to 2/π− , whereas for blob-like nuclei structures,
the shape index is close to 43 /π− . These constraints are used to compute the initial
13
foreground set for membrane and nuclei structures. Also for bright structures, it is less
likely for objects to have non-negative shape index values compared to noise where
non-negative values can easily occur. An initial segmentation based on the shape index
and the normalized-curvature index separates the image pixels into three subsets:
background, foreground, and indeterminate. Indeterminate subset comprises all the
pixels that are not included in the background or foreground subsets. From these
subsets, the background and foreground intensity distributions, as well as the intensity
log-likelihood functions are estimated. The algorithm keeps iterating by using two out of
the three features at a time to estimate the distribution of the feature that is left out.
Usually three iterations are sufficient for a convergence. In the final step these log-
likelihood functions are combined to determine the overall likelihood function. A
probability map that represents the probability of a pixel being a foreground is
calculated.
This non-parametric method is different from existing parametric approaches, because it
can handle arbitrary mixtures of blob and ridge like structures. This is essential in
applications such as in tissue imaging where a nuclei image in an epithelial tissue
comprises both ridge- and blob-like structures. The network of membrane structures in
tissue images is another example where the intersection of ridges can form structures
that are partially blobs. Accurate segmentation of membrane and nuclei structures forms
the base for higher level scoring and statistical analysis applications. For example,
distribution of a target protein on each of the segmented compartments can be
quantified and related to clinical outcomes. Scores measuring translocation of a protein
between segmented compartments can reveal protein specific pathways, and response
to drug therapy. Using the spatial and intensity interrelation of nuclei and membrane
biomarkers, it is possible to separate the epithelial nuclei from the stromal nuclei. Figure
8 shows sample images of membrane and nuclei and their segmentation results.
Segmentation of Cytoplasm and Epithelial Regions
Cytoplasm can be detected either by using a specific marker, or detecting it
computationally. Based on binary image morphological operations, Ding [63] describes a
method that identifies the cytoplasm as the region around the nuclei. While this works
14
well for sparse cell images, defining these regions in tissue is more complicated.
Whenever we have membrane compartment in addition to nuclei we can determine the
cytoplasm as the region between membrane and nuclei.
Let us denote the thresholded nuclei, and membrane sets with ),( yxM , and ),( yxN ,
respectively. Cytoplasm, denoted by ),( yxC , is defined as the union of sets of small
regions circumscribed by membrane alone or membrane and nuclei pixels (See green
regions in Figures 8b and 8d). Only pixels that are not defined as ),( yxM or ),( yxN
can be defined as ),( yxC .
Epithelial and stromal regions are morphologically different. Let ),( yxU defined as
),(),(),(),( yxNyxMyxCyxU ∪∪= (12)
denote the superset union of the nuclei, cytoplasm, and membrane sets. Since the
stromal nuclei are not connected through membrane structures, and are sparsely
distributed, they can be detected by a connected component analysis of ),( yxU . An
epithelial mask, ),( yxE , is generated as a union of large connected components of
),( yxU . For the sample images, any connected component larger than 800 pixels is
accepted as a part of the epithelial mask. The nuclei set is then separated into epithelial
nuclei ( ),( yxN e ) and stromal nuclei ( ),( yxN s ) by masking,
),(),(),( yxEyxNyxN e ⋅= , (13)
)),((),(),( yxEyxNyxN s −⋅= 1 . (14)
Figures 8b and 8d show the separated epithelial nuclei (blue) from the stromal nuclei
(grey).
Multi-channel Segmentation Techniques
After images are registered, points with the same coordinates in different channels
correspond to the same structural location. Each pixel in the image has a vector of
features, i.e., intensity values of images from different channels. These features are
then used to segment the vector valued image into different sub-cellular compartments.
15
Many segmentation algorithms have been developed for digital microscopic images. In
this work, we use one of the most widely used segmentation algorithms, k-means
clustering , for its simplicity and robustness. k-means clustering algorithm divides
feature space into clusters and maximizes the distances between the cluster centers. An
iterative procedure is used to find these cluster centers. Each pixel is given the same
label as the cluster center that is closest to it in the feature space.
Suppose [ ] Ω∈),(,),(,),,(=),( 1 yxyxIyxIyx cLI is a c -channel image of one tissue
sample. The k -means segmentation algorithm proceed as follows:
Step 0. Initialize cluster centers kµµ ,,1 L ;
Step 1. Assign a label ),( yxl to each pixel based on its distance in the feature space to
the cluster centers:
jj yxargminyxl µI −),(=),( (15)
Step 2. Update cluster centers as:
),(1
==),(
yxN jyxlj
j Iµ ∑ (16)
where the summation is over all image pixels with label j , and jN is the total number
of pixels with label j .
Step 3. Go to step 1 if not convergent.
Convergence is achieved when the changes in clusters centers are smaller than a preset
threshold. K-means segmentation result of the multiplexed images shown in Figure 4 is
shown in Figure 7a.
Two rounds of images were acquired for a TMA with 60 samples including both normal
breast tissues and breast cancer tissues. Each round consisted of multiple biomarkers.
Figure 4 shows four images of these markers for one tissue sample. k -means
segmentation algorithm is applied to segment multi-channel images into five
compartments Figure 7a shows one typical segmentation result): nucleus (blue),
16
membrane (red), smooth muscle (violet), cytoplasm (green), and background (black).
Figure 7b shows the overlay of the boundaries of the membrane class on one
membrane-stained channel.
Quantitation of Subcellular Biomarkers
A number of steps are required for quantitation of subcellular biomarkers. The
membrane compartment of tumor cells can be identified by pan-cadherin staining and
4',6-Diamidino-2-phenylindole (DAPI) is used to identify nuclei. A nonlinear non-
parametric mapping function with multiple inputs, including the local geometry of the
pixel distributions as well as the intensity values, maps each of these two channels to
probability values indicating the likelihood of pan-cadherin pixels being membrane, and
likelihood of DAPI pixels being nuclei. To define cytoplasm, a definite decision for each
pixel can be determined by thresholding the probability maps at 50% rate. For example,
a 50% probability of a pixel on a DAPI image implies that the pixel is equally likely to be
background and nucleus.
The distribution of biomarkers in each of these regions can be represented by a
probability distribution function (PDF). For example the PDF of the biomarker on the
membrane corresponds to its weighted empirical distribution, where the membrane
probability map determines the weights. We denote the mean and the standard
deviation of the biomarker distribution on each of the regions as Rµ , and Rσ ,
respectively, where R can be any of the nuclei, membrane, cytoplasm or extra cellular
matrix (ECM) regions. ECM is defined as all the non-background pixels not classified as
nuclei, membrane or cytoplasm. Compartmental biomarker distribution can then be
related to disease outcome or response to therapy.
Summary
Quantitation of multiple tissue based biomarkers requires several sequential steps
including tissue staining with target specific antibodies labeled with fluorescent reagents,
image capture, preprocessing, registration, segmentation and subcellular quantitation.
This provides the flexiblity to quantify biomarkers in more than one cellular
compartment, thus maximizing the amount of data present in a tissue image, and
17
enabling more extensive analysis of the role of biomarkers in predicting response to
therapy and patient survival.
Acknowledgment
The authors would like to thank the significant contributions of Maximilian Seel, Harvey
Cline, Melinda Larsen to this work.
18
Images of
Biomarkers
Pre-processing &
Registration
Vector Image
Segmentation
Compartment
Labels
Statistical
Analysis
Cancer Score &
Survival Analysis
Image Analysis
Cancer Normal
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
1.1
1.2
Images of
Biomarkers
Pre-processing &
Registration
Vector Image
Segmentation
Compartment
Labels
Statistical
Analysis
Cancer Score &
Survival Analysis
Image Analysis
Cancer Normal
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
1.1
1.2
Figure 1: Schematic representation of image processing procedures. In this paper, we focus on the steps in the shaded box.
19
( a )
( b )
Figure 2: (a) Trimmed average image of one channel; (b) A third order polynomial approximation of the illumination pattern.
20
( a )
( b )
( c )
( d )
Figure 3: Fluorescent image showing DAPI staining; (a) before correction, (b) after illumination correction. A bright field DAB staining image before illumination correction, (c) before correction, (d) after illumination correction.
21
Figure 4: Four channels of one tissue sample. The top row shows images from the first imaging round with a nucleus-stain(left) and a membrane-stain (right). The bottom row shows images from the second imaging round with cyan fluorescent protein (CFP) stain (left) and smooth muscle actin stain (right).
22
(a)
(b)
(c)
(d)
Figure 5: Overlay of DAPI images before registration (a) and after registration (b) and overlay of CFP images before transformation (c) and after transformation (d). The color channels of the images are from sebsequent steps of the sequential staining. The reference image is in red component, and the subsequent step is in blue channel, and the green is set to average of the two images. Any color shifts are due to misregistration.
23
( a )
( b )
( c )
( d )
( e )
( f )
Figure 6: (a) A three-channel (red, green, blue) color image of an H&E stained breast tissue section. (b) The nuclei component of the H&E color image. (c) A two-channel fluorescent imaging of the same tissue stained with molecular biomarkers. Green and blue colors are used to visualize the beta-catenin, and DAPI images, respectively. (d) DAPI channel of the molecular markers. (e) Registered DAPI image in the H&E coordinate system. (f) The beta-catenin image (shown in green color) superimposed with the H&E image.
24
( a )
( b )
Figure 7: a) K-means segmentation result of the multiplexed images shown in Figure 4. b) Curvature flow smoothing based membrane detection from Figure 4 overlaid on the membrane stained image.
25
( a ) ( b )
( c ) ( d ) Figure 8: (a) Raw image intensities (Breast cancer tissue), Red-Membrane, Blue-Nuclei, Green-Estrogen Receptor biomarkers. (b) Detected Compartments; Red-Membrane, Blue-Epithelial Nuclei, Green-Cytoplasm. (c) Raw image Intensities (Colon Tissue), Red-Membrane, Blue-Nuclei, Green-cMet markers. (d) Detected Compartments; Red-Membrane, Blue-Epithelial Nuclei, Gray-Stromal Nuclei, Green-Cytoplasm. Pink regions are excluded from quantitation. Both the background and the Extra Cellular Matrix (ECM) are presented with black color.
26
References
1. Fox, H., Is H&E morphology coming to an end? Journal of Clinical Pathology,
2000. 53: p. 38-40. 2. Kirkham, N., The pathologist in the 21st century generalist or specialist? Journal
of Clinical Pathology, 2000. 53(Millennial reviews): p. 7-9. 3. Dolled-Filhart, M., et al., Quantitative in situ analysis of beta-catenin expression
in breast cancer shows decreased expression is associated with poor outcome. Cancer Res, 2006. 66(10): p. 5487-94.
4. Schubert, W., et al., Analyzing proteome topology and function by automated multidimensional fluorescence microscopy. Nature Biotechnology, 2006. 24(10): p. 1270-1278.
5. Kayser, K., et al., Digitized pathology: theory and experiences in automated tissue-based virtual diagnosis. Rom J Morphol Embryol, 2006. 47(1): p. 21-8.
6. Schrader, T., et al., The diagnostic path, a useful visualisation tool in virtual microscopy. Diagn Pathol, 2006. 1: p. 40.
7. Pauletti, G., et al., Assessment of Methods for Tissue-Based Detection of the HER-2/neu Alteration in Human Breast Cancer: A Direct Comparison of Fluorescence In Situ Hybridization and Immunohistochemistry. Journal of Clinical Oncology, 2000. 18(21): p. 3651.
8. Cregger, M., A.J. Berger, and D.L. Rimm, Immunohistochemistry and quantitative analysis of protein expression. Arch Pathol Lab Med, 2006. 130(7): p. 1026-30.
9. Camp, R.L., G.D. Chung, and D.L. Rimm, Automated subcellular localization and quantification of protein expression in tissue microarrays. Nature Medicine, 2002. 11: p. 1323-1327.
10. Zheng, Z., et al., DNA synthesis and repair genes RRM1 and ERCC1 in lung cancer. N Engl J Med, 2007. 356(8): p. 800-8.
11. Pusztai, L., et al., Molecular classification of breast cancer: limitations and potential. Oncologist, 2006. 11(8): p. 868-77.
12. Smith, A.M. and S. Nie, Chemical analysis and cellular imaging with quantum dots. Analyst, 2004. 129(8): p. 672-7.
13. Hoffman, R.M., Advantages of multi-color fluorescent proteins for whole-body and in vivo cellular imaging. J Biomed Opt, 2005. 10(4): p. 41202.
14. Giepmans, B.N., et al., The fluorescent toolbox for assessing protein location and function. Science, 2006. 312(5771): p. 217-24.
15. McCabe, A., et al., Automated quantitative analysis (AQUA) of in situ protein expression, antibody concentration, and prognosis. J Natl Cancer Inst, 2005. 97(24): p. 1808-15.
16. Schubert, W., Topological proteomics, toponomics, MELK-technology. Adv Biochem Eng Biotechnol, 2003. 83: p. 189-209.
17. Levenson, R.M. and J.R. Mansfield, Multispectral imaging in biology and medicine: slices of life. Cytometry A, 2006. 69(8): p. 748-58.
18. Petty, H.R., Fluorescence microscopy: established and emerging methods, experimental strategies, and applications in immunology. Microsc Res Tech, 2007. 70(8): p. 687-709.
19. Liehr, T., et al., Multicolor FISH probe sets and their applications. Histol Histopathol, 2004. 19(1): p. 229-37.
27
20. Conchello, J.A. and J.W. Lichtman, Optical sectioning microscopy. Nat Methods, 2005. 2(12): p. 920-31.
21. Xing, Y., et al., Molecular profiling of single cancer cells and clinical tissue specimens with semiconductor quantum dots. Int J Nanomedicine, 2006. 1(4): p. 473-81.
22. Zhang, Y., et al., Nanobiotechnology: quantum dots in bioimaging. Expert Rev Proteomics, 2007. 4(4): p. 565-72.
23. Fountaine, T.J., et al., Multispectral imaging of clinically relevant cellular targets in tonsil and lymphoid tissue using semiconductor quantum dots. Mod Pathol, 2006. 19(9): p. 1181-91.
24. Bentolila, L.A. and S. Weiss, Single-step multicolor fluorescence in situ hybridization using semiconductor quantum dot-DNA conjugates. Cell Biochem Biophys, 2006. 45(1): p. 59-70.
25. Chan, P., et al., Method for multiplex cellular detection of mRNAs using quantum dot fluorescent in situ hybridization. Nucleic Acids Res, 2005. 33(18): p. e161.
26. Knoll, J.H., Human Metaphase Chromosome FISH Using Quantum Dot Conjugates. Methods Mol Biol, 2007. 374: p. 55-66.
27. Uen, Y.H., et al., Clinical significance of MUC1 and c-Met RT-PCR detection of circulating tumor cells in patients with gastric carcinoma. Clin Chim Acta, 2006. 367(1-2): p. 55-61.
28. Schubert, W., Exploring molecular networks directly in the cell. Cytometry A, 2006. 69(3): p. 109-12.
29. Rimm, D.L., et al., Tissue microarray: a new technology for amplification of tissue resources. Cancer J, 2001. 7(1): p. 24-31.
30. Seligson, D.B., The tissue micro-array as a translational research tool for biomarker profiling and validation. Biomarkers, 2005. 10: p. S77-82.
31. Mattes, D., et al., PET-CT Image Registration in the Chest Using Free-form Deformations. IEEE Trans. Med. Imag., 2003. 22(1): p. 120-128.
32. Abdul-Karim, M.A., et al., Automated tracing and change analysis of angiogenic vasculature from in vivo multiphoton confocal image time series. Microvasc. Res, 2003. 66(2): p. 113–125.
33. Can, A., et al., Rapid automated tracing and feature extraction from retinal fundusimages using direct exploratory algorithms. Information Technology in Biomedicine, IEEE Transactions on, 1999. 3(2): p. 125-138.
34. Frangi, A.F., et al., Multiscale vessel enhancement filtering. 1998. 35. Kirbas, C. and F. Quek, A review of vessel extraction techniques and algorithms.
ACM Computing Surveys, 2004. 36(2): p. 81-121. 36. Krissian, K., et al., Model-based detection of tubular structures in 3 D images.
Computer Vision and Image Understanding, 2000. 80(2): p. 130-171. 37. Nain, D., A. Yezzi, and G. Turk, Vessel segmentation using a shape driven flow.
Medical Imaging Copmuting and Computer-Assisted Intervention, 2004. 3216: p. 51-59.
38. Niemeijer, M., et al., Comparative study of retinal vessel segmentation methods on a new publicly available database. Proceedings of SPIE, 2004. 5370: p. 648.
39. Pham, D., Blind separation of instantaneous mixture of sources via an independent component analysis. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 1996. 44(11): p. 2768-2779.
40. Staal, J., et al., Ridge-based vessel segmentation in color images of the retina.
28
Medical Imaging, IEEE Transactions on, 2004. 23(4): p. 501-509. 41. Zana, F. and J.C. Klein, A Multimodal Registration Algorithm of Eye Fundus
Images Using Vessels Detection and Hough Transform. IEEE Transactions onMedical Engineering, 1999. 18(5): p. 419-428.
42. Lin, G., et al., Automated Image Analysis Methods for 3-D Quantification of the Neurovascular Unit From Multichannel Confocal Microscope Images. CytometryPart A, 2005. 66: p. 9-23.
43. Shi, L., E.A. Hoffman, and J.M. Reinhardt, Segmentation of the ovine lung in 3D CT images. SPIE Medical Imaging Proc of SPIE, 2004. 5369.
44. Descoteaux, M., et al., Bone enhancement filtering: application to sinus bone segmentation and simulation of pituitary surgery. Medical Image Computing andComputer-Assisted Intervention, 2005: p. 9–16.
45. Mendonca, P.R., et al. Model-based analysis of local shape for lesion detection in CT scans. in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention 2005. Palm Springs, CA, USA.
46. Rahul Bhotika, et al. Part-based Local Shape Models for Colon Polyp Detection. inProceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention 2006. Copenhagen, Denmark.
47. Al-Kofahi, K.A., et al., Median-based robust algorithms for tracing neurons from noisy confocal microscope images. Information Technology in Biomedicine, IEEETransactions on, 2003. 7(4): p. 302-317.
48. Al-Kofahi, K.A., et al., Rapid automated three-dimensional tracing of neurons from confocalimage stacks. Information Technology in Biomedicine, IEEETransactions on, 2002. 6(2): p. 171-187.
49. Meijering, E., et al., Neurite tracing in fluorescence microscopy images using ridge filtering and graph searching: principles and validation. BiomedicalImaging: Macro to Nano, 2004. IEEE International Symposium on, 2004: p.1219-1222.
50. Anoraganingrum, D., Cell segmentation with median filter and mathematical morphology operation. Proceedings of the 10th International Conference onImage Analysis and Processing, 1999: p. 1043.
51. Nedzved, A., S. Ablameyko, and I. Pitas. Morphological segmentation of histology cell images. in Pattern Recognition, 2000. Proceedings. 15th International Conference on. 2000.
52. Adiga, U., et al., High-Throughput Analysis of Multispectral Images of Breast Cancer Tissue. IEEE Trans. Imag. Proc., 2006. 15(8): p. 2259-2268.
53. Ibanez, L., et al., The ITK Software Guide: The Insight Segmentation and Registration Toolkit. 2005: Kitware Inc.
54. Perona, P. and J. Malik, Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1990.12(7): p. 629-639.
55. Malladi, R. and J.A. Sethian, Image Processing Via Level Set Curvature Flow. Proceedings of the National Academy of Sciences, 1995. 92(15): p. 7046-7050.
56. Malladi, R. and J.A. Sethian, Image Processing: Flows under Min/Max Curvature and Mean Curvature. Graphical Models and Image Processing, 1996. 58(2): p.127-141.
57. Shi, J. and C. Tomasi. Good Features to Track. in 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 1994.
29
58. Clocksin, W.F. and B. Lerner, Automatic Analysis of Fluorescence In-Situ Hybridisation Images. Proceedings of the 11th British Machine Vision Conference,2000: p. 666-674.
59. McCulloch, C.C., et al., Model-based detection of lung nodules in computed tomography exams1 Thoracic computer-aided diagnosis. Academic Radiology,2004. 11(3): p. 258-266.
60. Melonakos, J., et al., A Probabilistic Model for Haustral Curvatures with Applications to Colon CAD.
61. Mendonça, P.R.S., et al., Lung Nodule Detection Via Bayesian Voxel Labeling. Information Processing in Medical Imaging: p. 134–145.
62. Sofka, M. and C.V. Stewart, Retinal Vessel Centerline Extraction Using Multiscale Matched Filters, Confidence and Edge Measures. IEEE Trans. Med. Imag, 2006.25: p. 1531-1546.
63. Ding, G., et al., Characterization and quantitation of NFkB nuclear translocation induced by interleukin-1 and tumor necrosis factor-a. J Biol Chem, 1998. 273: p.28897-28905.
64. Nattkemper, T.W., Automatic segmentation of digital micrographs: A survey. Proc. 11th World Congr. Medical Informatics (MEDINFO), Pt, 2004. 2: p. 847-851.
65. Hoos, A., et al., Validation of tissue microarrays for immunohistochemicalprofiling of cancer specimens using the example of human fibroblastic tumors. Am J Pathol, 2001. 158(4): p. 1245-51.
JB26798US