CHAPTER 6
CONTENT BASED ECHO IMAGE RETRIEVAL
(CBEIR) FRAMEWORK
6.1 INTRODUCTION
Research in Content Based Image Retrieval (CBIR) is today an important discipline with
applications in many domains. CBIR has emerged during the last several years as a powerful tool
to efficiently retrieve images visually similar to a query image. The main idea is to represent
each image as a feature vector and to measure the similarity between images with distance
between their corresponding feature vectors according to some metric. Finding the correct
features to represent images with, as well as the similarity metric that groups visually similar
images together, are important steps in the construction of any CBIR system [Hiremath, 2007].
Medical images have become a key investigation tool for medical diagnosis and pathology
follow-ups. Medical CBIR systems are different from the general purpose ones in several ways.
For one, the retrieval has to take place with respect to pathology bearing regions (PBR) that
tend to be highly localized. This means that retrieval on the basis of global signatures would
make no sense at all for medical databases. Consider the following scenario [Chia-Hung,
2006]:
John Doe, a radiologist in a university hospital, takes X-rays and MRI scans for patients
producing hundreds of digital images each day. In order to facilitate easy access in the future,
he registers each image in a medical image database based on the modality, region, and
orientation of the image. One day Alice Smith, a surgeon, comes to discuss a case with John
Doe as she suspects there is a tumor on the patient’s brain according to the brain MRI.
However, she cannot easily judge if it is a benign or malignant tumor from the MRI scan, and
would like to compare with previous cases to decide if this patient requires a dangerous
operation. Understanding Alice’s needs, John helps Alice find similar-looking tumors from the
previous MRI images. He uses the query-by-example mode of the medical image database,
delineates the tumor area in the MRI image, and then requests the database to return the brain
MRI images most similar to this one. Alice finds eleven similar images and their accompanying
reports after reviewing the search results. Alice compares those cases and verifies the pattern
of the tumor. Later on, she tells her patient that it is a benign tumor and the operation is
unnecessary unless the tumor grows.
With more and more patient records now containing multimodal imaging data, an exciting
application of image and video retrieval is emerging in the area of clinical decision support.
Cardiologists in particular, routinely use multiple imaging modalities including X-ray imaging,
ultrasound imaging, and CT imaging for their decision making. However, their diagnosis
methodology is still single sample-guided in that only the data from the given patient is used
along with their prior knowledge to make decisions [Syeda-Mahmood, 2010]. If content-based
retrieval techniques could be used to retrieve similar case data, and hence similar patients,
they would offer the following advantages:
- Enhanced decision making for physicians. For example, using similar case data,
  physicians can validate their current hypothesis.
- By examining the diseases associated with the retrieved similar patient cases, physicians
  can check for any overlooked possibilities or alternate interpretations.
- Physicians can learn of statistical correlations (or co-morbidities) between diseases,
  treatments, and outcomes, thus paving the way for a whole new way of practicing medicine.
- Epidemiological studies can be carried out over a group or population in a particular
  region, city, or country.
- Junior doctors can be trained in echocardiography by selecting specific diseases or cases.
- Manual annotation can be avoided. It is time-consuming and expensive because of the large
  number of patient records (10000 patients undergo echo examination every month in
  Jayadeva Cardiology Hospital, Bangalore, India, and this number grows every year).
- The contents or semantics of echo images are difficult to describe concretely in words.
In this research work a novel Content Based Echo Image Retrieval (CBEIR) System is
proposed which can be used to retrieve 2D and Doppler images from a large echo image
database based on quantitative and qualitative feature descriptors. What makes two X-ray
images, or two echocardiogram images similar is not their color or texture, but the underlying
disease they depict. However, if the color or texture can be mapped to a particular disease
(for example, in Doppler images, where color represents blood flow velocity), it can be included
in the feature vector. Thus, image and video retrieval methods need to focus on disease-specific
patterns for finding similar cases.
The retrieved images are then ranked and displayed to the physician for further decision
making. The ranking strategy used by many CBIR systems is followed here: image content
descriptors are employed so that the returned images most similar to the query image are
placed higher in the rank.
6.1.1 CBIR
CBIR is a technique that uses visual contents (features) to search for images in large-scale
image databases according to users’ requests. Here, features mean color, texture, spatial
layout, shape, etc.; in this respect CBIR differs from text-based search [Johan, 2007].
A typical CBIR solution requires the construction of an image descriptor, which is
characterized by (1) an extraction algorithm that encodes image features into feature vectors
and (2) a similarity measure that compares two images. The similarity measure, also called the
matching function, is defined as an inverse function of the Euclidean distance (the larger the
distance, the poorer the match).
Fig. 6.1 A typical CBIR System
The images are stored in a database called “image database”. To compare the query image
with these images, we need to encode the images into some form based upon the features. The
features may be color, texture, shape, or any other domain specific features. For instance, the
image histogram of each image can be stored in the “feature database” which in turn is
compared with the histogram of the query image as shown in Figure 6.1.
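The histogram example above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the helper names `gray_histogram` and `match_score`, the choice of 8 bins, and the particular inverse function 1 / (1 + d) are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def gray_histogram(pixels: Sequence[Sequence[int]], bins: int = 8) -> List[float]:
    """Normalized histogram of 8-bit gray levels (hypothetical feature extractor)."""
    counts = [0] * bins
    total = 0
    for row in pixels:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        raise ValueError("image has no pixels")
    return [c / total for c in counts]


def match_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Matching function that falls as the Euclidean distance grows.

    1 / (1 + d) is only one possible inverse function (an assumption here);
    it equals 1.0 for identical feature vectors.
    """
    return 1.0 / (1.0 + math.dist(a, b))


# Toy images: each stored histogram plays the role of a "feature database" entry.
dark = [[0, 10], [20, 30]]
mixed = [[0, 10], [250, 240]]
bright = [[250, 240], [230, 255]]

query_hist = gray_histogram(dark)
score_mixed = match_score(query_hist, gray_histogram(mixed))
score_bright = match_score(query_hist, gray_histogram(bright))
```

With these toy images, the half-dark image scores higher against the dark query than the fully bright one does, which is the ordering a histogram-based matcher is expected to produce.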
Image Indexing and Query
Unlike traditional data, images are complex. When images are represented as feature vectors,
each image becomes a point in a k-dimensional space, where k is the number of features in each
vector. Such a feature vector can be created for any image by applying a feature extraction
algorithm.
When developing indexing methods for image data, researchers therefore often assume that all
images are represented as feature vectors in the same multidimensional space. A query on data
of this type may involve any of the dimensions. So, for an indexing technique for CBIR to
perform efficiently, it must be designed to search across all dimensions of the data, and it
should be able to satisfy several different types of queries efficiently.
The CBIR system will usually pre-process the images stored in its database, by extracting
and indexing the feature vectors. This process is usually performed off-line, once per image.
Once the database is ready, the CBIR system allows the user to specify the queries by means of
a query pattern (which can be a sample image). The query is also processed by the feature
vector extractor, and the similarity function is used to evaluate its similarity to the database
images. Then, the database images will be ranked in decreasing order of similarity to the query,
and shown to the user in that order [Fabio, 2010].
6.1.2 EXISTING MEDICAL CBIR SYSTEMS
Although content-based image retrieval has frequently been proposed for use in medical
image management, only a few content-based retrieval systems have been developed
specifically for medical images. These research-oriented systems are usually constructed in
research institutes and continue to be improved, developed, and evaluated over time. This
section will introduce several major medical content-based retrieval systems [Henning Müller,
2004] [Chia-Hung, 2006].
1. ASSERT [Shyu, 1998] (Automatic Search and Selection Engine with Retrieval Tools):
   Developed by Purdue University, Indiana University, and University of Wisconsin
   Hospital, USA.
   http://rvl2.ecn.purdue.edu/~cbirdev/WWW/CBIRmain.html
   Comments: Restricted access
2. CasImage: Developed by University Hospital of Geneva, Switzerland.
   http://www.casimage.com/
   Comments: Web site does not open
3. IRMA (Image Retrieval in Medical Applications): Developed by Aachen University of
   Technology, Germany.
   http://libra.imib.rwth-aachen.de/irma/
   Comments: Not in English
4. NHANES II (The Second National Health and Nutrition Examination Survey):
   Developed by National Library of Medicine, USA.
   http://archive.nlm.nih.gov/proj/webmirs/
   Comments: Mainly for radiological images
Because so few medical CBIR systems exist, there is a strong need to develop medical image
retrieval systems, in particular for echo images.
6.2 UNIVERSAL MODEL FOR CBIR
As part of the research work, a general CBIR framework was first developed under the
name "A Universal Model for CBIR". The aim of this framework is to provide a flexible
environment in which the user inputs a query image and all similar images are displayed after
ranking. The novel idea here is that the user can select individual features or a combination
of features, such as color, texture, shape, edge frequency, Haar wavelets, and edge density
[Phung, 2007] [Henning Müller, 2004] [Alberto, 2003]. By observing the output, the user can
modify the feature selection and obtain better output. This resembles the 'relevance feedback'
concept described in the literature. Figure 6.2 shows one such model. The shaded box
"Other features" allows a number of other features to be included in the model in a
plug-n-play manner. This means that any feature can be plugged in or plugged out at run-time
once the feature modules are added to the system. In this way the user can decide at any point
of time which feature or features are relevant to the application.
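The plug-n-play idea can be illustrated with a small feature registry. The sketch below is only an assumption about how such a mechanism might look: the names `plug_in`, `plug_out`, and `build_feature_vector`, and the two toy extractors, are hypothetical and are not the feature modules of the proposed system.

```python
from typing import Callable, Dict, List, Sequence

Image = Sequence[Sequence[int]]
Extractor = Callable[[Image], List[float]]

# Registry of feature modules currently available to the system.
FEATURES: Dict[str, Extractor] = {}


def plug_in(name: str, extractor: Extractor) -> None:
    """Make a feature module available at run-time."""
    FEATURES[name] = extractor


def plug_out(name: str) -> None:
    """Remove a feature module at run-time."""
    FEATURES.pop(name, None)


def mean_intensity(image: Image) -> List[float]:
    """Toy global feature: average gray level."""
    values = [v for row in image for v in row]
    return [sum(values) / len(values)]


def edge_density(image: Image, threshold: int = 50) -> List[float]:
    """Toy feature: fraction of horizontal neighbor pairs with a large jump."""
    pairs = [(a, b) for row in image for a, b in zip(row, row[1:])]
    if not pairs:
        return [0.0]
    return [sum(abs(a - b) > threshold for a, b in pairs) / len(pairs)]


def build_feature_vector(image: Image, selected: Sequence[str]) -> List[float]:
    """Concatenate the user-selected features into a single feature vector."""
    vector: List[float] = []
    for name in selected:
        vector.extend(FEATURES[name](image))
    return vector


plug_in("mean_intensity", mean_intensity)
plug_in("edge_density", edge_density)

image = [[0, 0, 255], [0, 0, 255]]
both = build_feature_vector(image, ["mean_intensity", "edge_density"])
edges_only = build_feature_vector(image, ["edge_density"])
```

Because the feature vector is built from whatever names the user selects, changing the selection after inspecting the ranked output requires no change to the comparison code.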
Fig. 6.2 Proposed Universal Model for CBIR
However, this model cannot be used directly for echo images, because segmentation and the
qualitative and quantitative features are specific to this type of medical image. Therefore,
the "Segmentation" block must handle this issue and include appropriate features in the
feature vector. In addition, a technique called the "greedy strategy" is employed to enhance
the speed and accuracy of retrieval.
6.2.1 SIMILARITY COMPARISON USING GREEDY METHOD
The main issue in image retrieval systems is the number of dimensions of the feature vector,
which is normally large. For example, the QBIC system reduces a 20-dimensional feature vector
to two or three dimensions using Principal Component Analysis (PCA). The cost of searching an
index grows exponentially with the dimensionality, and the search eventually degrades to
sequential searching. To overcome these problems, a simple method based on a greedy strategy
is followed.
Consider three database images and their corresponding segments as I1(S1, S2, S4), I2(S2, S5,
S8, S7), and I3(S1). The segments of I1, I2, and I3 are listed in descending order of the
size/area of each segment. Similarly, let QI(S7, S2) denote the segments of the query image.
The algorithm shown in Figure 6.3 follows the greedy strategy to compare the similarity
between the query image and the database images.
[Fig. 6.2 block labels: Query Image; Image Database; Image Enhancements; Image Segmentation;
Segmentation/Subdivision; Image Subdivision; Extract Color, Texture, and Edge Density Features
(Color Histogram, Texture, Edge Density, Other Features); Build single Feature Vector;
Feature Database; Similarity Comparison; Indexing & Retrieval; Output Ranked Images]
Algorithm ImageSimilarity
// I[N] : image database with N images
// QI   : query image
foreach (Image I in I[N]) do
    foreach (Segment s in SegmentSet) do
        if (Euclidean(QI[s], I[s]) < threshold)
            continue    // segment matches; continue to check other segments
        else
            break       // segment differs; no need to check other segments
    end
end.
Fig. 6.3 Algorithm for Similarity comparison based on greedy strategy
Suppose 20 features are fixed for each segment and there are, on average, five segments per
image. A full comparison then evaluates all five segments (100 feature values) for every
database image. With the proposed approach, the comparison of an image stops at the first
segment that fails the threshold test, so a reasonable improvement in performance can be
obtained when the number of segments is large.
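The greedy comparison of Figure 6.3 might be realized as follows. This is a sketch under stated assumptions, not the exact procedure of this work: segments are assumed to be stored as feature vectors in descending order of area and paired by position, pairs beyond the shorter list are ignored, and the names `greedy_similar` and `retrieve` as well as all numeric values are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def greedy_similar(query: List[Vector], candidate: List[Vector], threshold: float) -> bool:
    """Compare segments largest-first and stop at the first segment that fails."""
    for q_seg, c_seg in zip(query, candidate):
        if math.dist(q_seg, c_seg) >= threshold:
            return False  # no need to check the remaining segments
    return True


def retrieve(query: List[Vector], database: Dict[str, List[Vector]], threshold: float) -> List[str]:
    """Names of the database images whose compared segments all pass the test."""
    return [name for name, segments in database.items()
            if greedy_similar(query, segments, threshold)]


# Toy data: two features per segment instead of the 20 mentioned in the text.
database = {
    "I1": [[1.0, 1.0], [5.0, 5.0]],
    "I2": [[9.0, 9.0], [5.0, 5.0]],  # rejected after its first segment
    "I3": [[1.2, 0.9]],
}
query = [[1.0, 1.1], [5.2, 5.0]]
matches = retrieve(query, database, threshold=0.5)
```

In this toy run, I2 is discarded after a single distance computation, which is where the saving comes from when images contain many segments.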
6.3 FEATURE VECTOR FORMULATION
Most image retrieval systems follow the paradigm of representing images using a set of
features, such as color, texture [Johan, 2007], shape, and edge orientation [Tristan, 2004]
[Manjunath, 2001]. Among these features, color is the most frequently used visual property in
content-based image retrieval because it is relatively robust and invariant with respect to
image size and orientation [Missaoui, 2004]. However, in medical images color alone cannot be
considered the most prominent feature. In fact, to increase the accuracy of retrieval, some
local features of the pathological regions must be obtained. The current research includes
both local and global features so that the most relevant echo images are retrieved. This
section discusses these features and how they are integrated and formulated as a single
feature vector. Here, the "universal CBIR model" is adopted for image retrieval; it can even
be used for natural images.
As already explained in Figure 1 of Chapter 1, the feature extraction methods for 2D echo
and color Doppler images are different. The feature extraction methods themselves were
discussed in Chapters 3, 4, and 5. These features are systematically concatenated and used for
image retrieval.
6.3.1 2D ECHO IMAGE FEATURES
Two separate feature databases are used: (1) one for 2D echo images and (2) one for color
Doppler images. When a query image is submitted by the user, the modality is checked first and
the search is then initiated in the appropriate database. Typically, for any 2D echo image the
following features are sufficient for clinical purposes:

Sl. No.  2D Echo Image Feature      Sl. No.  2D Echo Image Feature
1        LVHeightED                 18       LAVolumeES
2        LVDiameterED               19       RVHeightED
3        LVAreaED                   20       RVDiameterED
4        LVVolumeED                 21       RVAreaED
5        LVHeightES                 22       RVVolumeED
6        LVDiameterES               23       RVHeightES
7        LVAreaES                   24       RVDiameterES
8        LVVolumeES                 25       RVAreaES
9        EF                         26       RVVolumeES
10       FS                         27       RAHeightED
11       LAHeightED                 28       RADiameterED
12       LADiameterED               29       RAAreaED
13       LAAreaED                   30       RAVolumeED
14       LAVolumeED                 31       RAHeightES
15       LAHeightES                 32       RADiameterES
16       LADiameterES               33       RAAreaES
17       LAAreaES                   34       RAVolumeES

The process of obtaining these features is as follows. All images in the 2D echo image
database are preprocessed and segmented (KMEP), the boundaries of the cardiac chambers are
traced, quantitative computations are carried out, and the results are formulated as a feature
vector and stored in the feature database. The submitted query image undergoes the same
process, and its feature vector is stored in a separate data structure. The query feature
vector is then compared with each database feature vector by calculating the Euclidean
distance. The distances are sorted so that the smallest distance receives the highest rank,
which signifies the most similar image. Finally, the top k images are displayed, where k is
supplied by the user.
6.3.2 COLOR DOPPLER IMAGE FEATURES
The second image database consists of color Doppler images of both normal and abnormal
categories. Chapter 5 explained the method to extract different features from these images. To
extract the salient features, the Doppler image is first segmented and the color portion is
extracted. A pixel classification method is suitable for this task. This color portion is then
analyzed to extract various features as follows:

Sl. No.  Color Doppler Image Feature
1        RedMean
2        GreenMean
3        BlueMean
4        RedSD
5        GreenSD
6        BlueSD
7        RedContrast
8        GreenContrast
9        BlueContrast
10       Energy
11       Entropy
12       Homogeneity
13       Skewness
14       Kurtosis
15       RedEG
16       GreenEG
17       BlueEG
18       ED

Here, features 1 to 6 represent the histogram mean and standard deviation of each color
channel [Rishav, 2009] [Sangoh, 2002]. Features 7 to 12 are texture features; except for the
contrast feature, they are computed after the image is converted into grayscale [Mihran,
1998]. Features 13 and 14 belong to the statistical category. To extract features 15 to 18,
the image undergoes image processing techniques.
Edge Gradient Feature
Except for the edge gradient (EG) feature, all of the features have been discussed
earlier [Minyoung, 2005]. A mosaic color pattern in the image suggests that the patient has
some form of heart disease. For normal patients the color pattern is generally uniform (red or
blue), as explained in Chapter 3. The novel EG feature detects this by computing the gradient
of each pixel with respect to its 4-neighbors (i.e. d = 1) for each color channel R, G, and B.
The following pseudo code is used for this:
Step 1: Compute the gradient g(d) of each pixel in the image matrix I(x, y).
Step 2: The Edge Gradient feature is the average value of the gradients at the specified
distance d.
Fig. 6.4 Image matrix to illustrate the computation of Edge Gradient
To illustrate the method, consider the image matrix shown in Figure 6.4. For the center pixel,
whose value is 3 and whose 4-neighbors are 1, 2, 1, and 5, the gradients for d = 1 are 2, 1, 2,
and 2. The next step is to increase the value of d by 1 and repeat the same procedure for all
pixels and all color channels. The value of d is varied from 1 to 50, and the average of all
these values is the final edge gradient value of a particular color channel. The remaining
feature extraction methodologies have already been explained in Chapter 5.
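The edge gradient computation can be sketched for a single color channel as follows. Several details are assumptions made for this example and are not specified in the text: the gradient is taken as the absolute difference between a pixel and each 4-neighbor at distance d, neighbors falling outside the image are skipped, and distances for which no neighbor pair exists are ignored. The function names are hypothetical; the 3 x 3 matrix is the one from Figure 6.4.

```python
from typing import List, Sequence

Channel = Sequence[Sequence[int]]


def pixel_gradients(channel: Channel, row: int, col: int, d: int = 1) -> List[int]:
    """Absolute differences between a pixel and its 4-neighbors at distance d."""
    n_rows, n_cols = len(channel), len(channel[0])
    center = channel[row][col]
    gradients = []
    for d_row, d_col in ((-d, 0), (0, -d), (0, d), (d, 0)):  # up, left, right, down
        r, c = row + d_row, col + d_col
        if 0 <= r < n_rows and 0 <= c < n_cols:  # skip neighbors outside the image
            gradients.append(abs(center - channel[r][c]))
    return gradients


def edge_gradient(channel: Channel, max_d: int = 50) -> float:
    """Average, over d = 1..max_d, of the mean gradient of all pixels at distance d."""
    per_distance = []
    for d in range(1, max_d + 1):
        gradients = [g
                     for row in range(len(channel))
                     for col in range(len(channel[0]))
                     for g in pixel_gradients(channel, row, col, d)]
        if not gradients:  # d exceeds the image size
            break
        per_distance.append(sum(gradients) / len(gradients))
    return sum(per_distance) / len(per_distance) if per_distance else 0.0


figure_6_4 = [[5, 1, 6],
              [2, 3, 1],
              [4, 5, 8]]
center_gradients = pixel_gradients(figure_6_4, 1, 1)  # [2, 1, 2, 2]

uniform = [[7, 7], [7, 7]]
mosaic = [[0, 255], [255, 0]]
```

A uniform channel yields an edge gradient of zero, while a mosaic-like channel yields a large value, which is the contrast the EG feature is intended to capture. For an RGB image the function would be applied once per channel to obtain RedEG, GreenEG, and BlueEG.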
6.4 SIMILARITY MEASURE
Selection of similarity metrics has a direct impact on the performance of content-based
image retrieval. The kind of feature vectors selected determines the kind of measurement that
will be used to compare their similarity (Smeulders, Worring, Santini, Gupta, & Jain, 2000). In
this work, Euclidean distance as shown in equation 6.1 is found suitable, because it is the most
common metric used to measure the distance between two points in multi-dimensional space
(Qian, Sural, Gu, and Pramanik, 2004). A number of other metrics, such as Mahalanobis
Distance, Minkowski-Form Distance, Earth Mover’s Distance, and Proportional Transportation
Distance, have been proposed for specific purposes.
\[ d = \sqrt{\sum_{i=1}^{N} \left( F_Q[i] - F_{DB}[i] \right)^2} \tag{6.1} \]
where $F_Q[i]$ is the i-th feature of the query image, $F_{DB}[i]$ is the i-th feature of the
database image, and N is the number of features (the size of the feature vector). In our case,
the detection of similarity should account for variations in the heart chamber dimensions,
stenosis, or regurgitation levels. In addition, the similarity measure should be robust to
inter-patient variations in the shape profile within the same disease class.
[Image matrix of Fig. 6.4 (3 x 3):
5 1 6
2 3 1
4 5 8]
6.5 RETRIEVAL PROCESS
The process of retrieving the 2D echo or color Doppler images from the image database
consists of several steps. For 2D echo images, cardiac chambers are segmented and quantified
to build the feature vector, whereas in the case of color Doppler images features such as color
histogram, texture, edge density, etc., are extracted. The proposed CBIR model would consider
appropriate features selected by the user and retrieve the most similar images.
Consider the retrieval of similar images for a given 2D echo query image. The feature
database consists of the feature vectors of all the 2D echo images, both normal and abnormal.
The query image is segmented, and the cardiac image features are extracted and formulated as a
feature vector. The Euclidean distance between the query feature vector and each database
feature vector is computed. These distances are arranged in ascending order, and the top k
images are displayed as the most relevant with respect to the query image.
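The retrieval step just described (compute the Euclidean distance to every database feature vector, sort in ascending order, keep the top k) can be sketched as below, together with the modality check that selects the appropriate feature database. The function names, database keys, image identifiers, and two-element feature vectors are all hypothetical placeholders, not measurements from this study.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

FeatureDB = Dict[str, Sequence[float]]


def top_k(query: Sequence[float], feature_db: FeatureDB, k: int) -> List[Tuple[float, str]]:
    """Return the k (distance, image id) pairs with the smallest Euclidean distance."""
    scored = ((math.dist(query, vector), image_id)
              for image_id, vector in feature_db.items())
    return heapq.nsmallest(k, scored)  # ascending order of distance


def retrieve(modality: str, query: Sequence[float],
             databases: Dict[str, FeatureDB], k: int) -> List[Tuple[float, str]]:
    """Check the modality first, then search only the matching feature database."""
    return top_k(query, databases[modality], k)


# Toy feature databases, one per modality.
databases = {
    "2d_echo": {"echo_a": [1.0, 2.0], "echo_b": [4.0, 6.0], "echo_c": [1.0, 2.5]},
    "color_doppler": {"doppler_a": [0.2, 0.4]},
}
results = retrieve("2d_echo", [1.0, 2.0], databases, k=2)
```

Here `results` lists `echo_a` at distance 0.0 followed by `echo_c` at distance 0.5; `echo_b`, at distance 5.0, falls outside the top two.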
The retrieval process for color Doppler images is similar. The feature extraction process for
these images is different from that for 2D echo images. As explained earlier, the
segmentation of the color Doppler images yields several features such as texture, histogram,
etc. The users have the flexibility to select the required features from these, and the
distances are then computed for the selected features in the same way as for 2D echo images.
The proposed CBEIR system, therefore, provides a framework to retrieve all images that
are similar to a given query image in terms of clinical features, along with the flexibility
of using a few or all of the features.
6.6 RESULTS AND DISCUSSIONS
This section primarily shows the retrieval efficiency of the proposed CBEIR system for
various query images. The study used the following image database:
Total # of patients : 60 (42 Men and 18 Women with 42 ± 16 years of age)
Total # of images : 623 (Abnormal : 423 and Normal : 200)
# of image categories : 5
a) Normal : 200
b) Aortic Regurgitation (AR) : 53
c) Mitral Regurgitation (MR) : 51
d) Mitral Stenosis : 266
e) Aortic Stenosis : 53
The above list includes both 2D echo and color Doppler images.
6.6.1 RETRIEVAL EFFICIENCY
Figures 6.5 and 6.6 show the retrieval efficiency, in terms of recall-precision graphs, for
the color Doppler images.
Fig. 6.5 Recall – Precision curve for the query image based on image ranking (top k images).
Abnormal Query Image shown at the right-top corner.
Fig. 6.6 Recall – Precision curve for the query image based on image ranking (top k images).
Normal Query Image shown at the right-top corner.
[Plots for Figs. 6.5 and 6.6: Precision (vertical axis, 0 to 1) versus Recall (horizontal axis,
0 to 1); curves: Color Doppler Features, Color Histogram Features]
Each graph shows two curves: the dotted curve is obtained by considering only the color
histogram feature, and the thick curve is obtained with the color Doppler image features. The
latter is superior in terms of precision, which is not surprising because it includes
domain-specific features rather than standard global features such as the color histogram. It
can be observed that when the proposed color Doppler feature is combined with the traditional
texture properties, all but one of the top 30 retrieved images are similar to the query image.
The CBEIR system was tested with abnormal and normal query images, and the
corresponding recall-precision curves appear in Figure 6.5 and Figure 6.6, respectively. It is
observed that color Doppler feature based retrieval offers better performance than the color
histogram method.
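For reference, the points of a recall-precision curve based on image ranking can be computed from a ranked result list as sketched below. It uses the standard definitions (precision is the fraction of the top k images that are relevant; recall is the fraction of all relevant database images that appear in the top k). The function names and the toy ranking are assumptions for illustration and do not reproduce the experimental data.

```python
from typing import List, Sequence, Tuple


def precision_recall_at_k(ranked_labels: Sequence[str], target: str,
                          total_relevant: int, k: int) -> Tuple[float, float]:
    """Precision and recall when only the top k ranked images are returned."""
    top = ranked_labels[:k]
    hits = sum(1 for label in top if label == target)
    precision = hits / len(top) if top else 0.0
    recall = hits / total_relevant if total_relevant else 0.0
    return precision, recall


def recall_precision_curve(ranked_labels: Sequence[str], target: str,
                           total_relevant: int) -> List[Tuple[float, float]]:
    """One (precision, recall) point for every cutoff k = 1..len(ranked_labels)."""
    return [precision_recall_at_k(ranked_labels, target, total_relevant, k)
            for k in range(1, len(ranked_labels) + 1)]


# Toy ranking for an abnormal query image (the query itself is excluded);
# the database is assumed to contain 3 relevant (abnormal) images in total.
ranked = ["abnormal", "abnormal", "normal", "abnormal", "normal"]
curve = recall_precision_curve(ranked, "abnormal", total_relevant=3)
```

Plotting the recall values against the precision values in `curve` gives one curve of the kind shown in the figures; repeating the computation for each feature set allows the feature sets to be compared.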
6.6.2 IMAGE RANKING: 2D ECHO AND COLOR DOPPLER IMAGES
Two experiments were conducted to study the performance of the CBEIR system.
(1) Query Image d = 0 (2) d = 2.3 (3) d = 8.0 (4) d = 19.8
(5) d = 24.0 (6) d = 25.5 (7) d = 26.5 (8) d = 27.2
(9) d = 53.6 (10) d = 61.2
Fig. 6.7 Rankings of 2D echo images. Image 1 is the query image. Numbers in parentheses
represent ranks, and d is the distance with respect to the query image.
The first experiment shows the rankings of the 2D echo images retrieved from the image
database for the query image shown as the first image in Figure 6.7. The image database is
populated with both normal and abnormal images.
As per the rankings shown, it can be inferred that the first 3 retrieved images, i.e. ranks 2
to 4, are similar to the query image. Images ranked 5 to 10 have larger distances and are
therefore dissimilar to the query image, which indicates that they are not normal images.
This result can be verified visually, as these images belong to patients affected by mitral
stenosis with a dilated LA.
The second experiment tests the ranking of color Doppler images. With an abnormal image as
the query image, as shown in Figure 6.8, the retrieved images are ranked from 2 to 16.
(1) d = 0.0 (2) d = 5.0 (3) d = 5.5 (4) d = 6.0 (5) d = 6.7
(6) d = 7.8 (7) d = 8.2 (8) d = 8.3 (9) d = 9.0 (10) d = 9.5
(11) d = 9.8 (12) d = 9.9 (13) d = 9.9 (14) d = 10.0 (15) d = 10.2
(16) d = 10.9
Fig. 6.8 Ranking of images. Image (1) is the query image and the distance, d is specified for each image
and numbers in the brackets represent corresponding rank.
The image with distance 0 is the query image used for this experiment. The images with
distance d of at most 9.0 are ranked from 1 to 9 and are abnormal, with a mosaic color
pattern. The images with d greater than 9.0 are identified as normal patient images.
It can be observed that the color pattern of the (abnormal) query image matches that of the
other abnormal images in the image database when the top 9 images are retrieved. This implies
that both recall and precision values are 100% for this query. A detailed discussion on the
performance of the CBEIR system is given in Chapter 10.
6.7 SUMMARY
Medical CBIR systems differ from conventional retrieval engines because retrieval is based
on pathology bearing regions (PBR) that tend to be highly localized. With more and more
patient records now containing multimodal imaging data, an exciting application of image and
video retrieval is emerging in the area of clinical decision support.
A novel Content Based Echo Image Retrieval (CBEIR) System is proposed which can be
used to retrieve 2D and Doppler images from a large echo image database based on quantitative
and qualitative feature descriptors. The design is based on a multifeature universal model
that could even be used for natural images with minimal changes.
Although content-based image retrieval has frequently been proposed for use in medical
image management, only a few content-based retrieval systems have been developed
specifically for medical images. In tests on a database of 623 images from 60 patients, the
proposed domain-specific color Doppler features gave better retrieval efficiency than color
histogram features.