CONTENT BASED ECHO IMAGE RETRIEVAL (CBEIR)...

CHAPTER 6

CONTENT BASED ECHO IMAGE RETRIEVAL

(CBEIR) FRAMEWORK

6.1 INTRODUCTION

Research in Content Based Image Retrieval (CBIR) is today an important discipline,

relevant to a wide range of applications. CBIR has emerged during the last several years as a powerful tool

to efficiently retrieve images visually similar to a query image. The main idea is to represent

each image as a feature vector and to measure the similarity between images with distance

between their corresponding feature vectors according to some metric. Finding the correct

features to represent images with, as well as the similarity metric that groups visually similar

images together, are important steps in the construction of any CBIR system [Hiremath, 2007].

Medical images have become a key investigation tool for medical diagnosis and pathology

follow-ups. Medical CBIR systems are different from the general purpose ones in several ways.

For one, the retrieval has to take place with respect to pathology bearing regions (PBR) that

tend to be highly localized. This means that retrieval on the basis of global signatures alone

is poorly suited to medical databases. Consider the following scenario [Chia-Hung,

2006]:

John Doe, a radiologist in a university hospital, takes X-rays and MRI scans for patients

producing hundreds of digital images each day. In order to facilitate easy access in the future,

he registers each image in a medical image database based on the modality, region, and

orientation of the image. One day Alice Smith, a surgeon, comes to discuss a case with John

Doe as she suspects there is a tumor in the patient’s brain according to the brain MRI.

However, she cannot easily judge if it is a benign or malignant tumor from the MRI scan, and

would like to compare with previous cases to decide if this patient requires a dangerous

operation. Understanding Alice’s needs, John helps Alice find similar-looking tumors from the

previous MRI images. He uses the query-by-example mode of the medical image database,


delineates the tumor area in the MRI image, and then requests the database to return the brain

MRI images most similar to this one. Alice finds eleven similar images and their accompanying

reports after reviewing the search results. Alice compares those cases and verifies the pattern

of the tumor. Later on, she tells her patient that it is a benign tumor and the operation is

unnecessary unless the tumor grows.

With more and more patient records now containing multimodal imaging data, an exciting

application of image and video retrieval is emerging in the area of clinical decision support.

Cardiologists in particular, routinely use multiple imaging modalities including X-ray imaging,

ultrasound imaging, and CT imaging for their decision making. However, their diagnosis

methodology is still single sample-guided in that only the data from the given patient is used

along with their prior knowledge to make decisions [Syeda-Mahmood, 2010]. If content-based retrieval techniques could be used to retrieve similar case data, and hence similar patients, the following advantages would follow:

- Enhanced decision making for physicians. For example, using similar case data, physicians can validate their current hypothesis.

- By examining the diseases associated with the similar patient cases retrieved, they can check for any overlooked possibilities or alternate interpretations.

- They can learn of statistical correlations (or co-morbidities) between diseases, treatments, and outcomes, thus paving the way for a whole new way of practicing medicine.

- Epidemiological studies can be carried out over a group or population in a particular region, city, or country.

- Junior doctors can be trained in the field of echocardiography by selecting specific diseases or cases.

- Manual annotation can be avoided; it requires too much time and is expensive to implement because of the large number of patient records (10000 patients undergo echo examination every month in Jayadeva Cardiology Hospital, Bangalore, India, and this number grows every year).

- The contents or the semantics of echo images are difficult to describe concretely in words.

In this research work a novel Content Based Echo Image Retrieval (CBEIR) System is

proposed which can be used to retrieve 2D and Doppler images from a large echo image

database based on quantitative and qualitative feature descriptors. What makes two X-ray images, or two echocardiogram images, similar is not their color or texture, but the underlying disease they depict. However, if the color or texture can be mapped to a particular disease (for example, in Doppler images, where color encodes blood flow velocity), it can be included in the feature vector. Thus, image and video retrieval methods need to focus on disease-specific patterns for finding similar cases.

The retrieved images are then ranked and displayed to support the physician's further decision making. The ranking strategy used by many CBIR systems is followed here: image content descriptors are employed so that the returned images most similar to the query image are placed higher in the rank.

6.1.1 CBIR

CBIR is a technique which uses visual contents (features) to search images from large scale image databases according to users’ requests. Here, features mean color, texture, spatial layout, shape, etc., which distinguishes CBIR from text based search [Johan, 2007].

A typical CBIR solution requires the construction of an image descriptor, characterized by an extraction algorithm that encodes image features into feature vectors and a similarity measure that compares two images. The latter, called the matching function, is defined as an inverse function of the Euclidean distance (the larger the distance, the lesser the match).

Fig. 6.1 A typical CBIR System

The images are stored in a database called “image database”. To compare the query image

with these images, we need to encode the images into some form based upon the features. The



features may be color, texture, shape, or any other domain specific features. For instance, the

image histogram of each image can be stored in the “feature database” which in turn is

compared with the histogram of the query image as shown in Figure 6.1.
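The histogram example above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system's actual implementation: it assumes 8-bit grayscale images held as NumPy arrays, a 16-bin histogram, and hypothetical helper names (`gray_histogram`, `histogram_distance`); the two "database images" are synthetic.

```python
import numpy as np

def gray_histogram(image, bins=16):
    """Return a normalised intensity histogram of an 8-bit grayscale image."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / hist.sum()

def histogram_distance(h1, h2):
    """Euclidean distance between two histograms of equal length."""
    return float(np.linalg.norm(h1 - h2))

# Synthetic stand-ins for database images (assumption: real images would be loaded from disk).
dark = np.arange(64).reshape(8, 8)   # intensities 0..63
bright = dark + 192                  # intensities 192..255
query = dark[::-1]                   # same intensities as "dark", rows flipped

# The "feature database" stores one histogram per image, computed once.
feature_db = {"dark": gray_histogram(dark), "bright": gray_histogram(bright)}

query_hist = gray_histogram(query)
best_match = min(feature_db, key=lambda name: histogram_distance(query_hist, feature_db[name]))
```

Because the histogram ignores pixel positions, the flipped query is matched to the "dark" image. This also illustrates why a purely global signature cannot distinguish images that differ only in where a region is located.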

Image Indexing and Query

Unlike traditional data, however, images are complex. When images are represented as feature vectors, each image becomes a point in a k-dimensional space, where k is the number of features in each vector. Such a feature vector can be created for any image by applying some feature extraction algorithm.

When developing indexing methods for image data, researchers therefore often operate under the assumption that the images are represented as feature vectors in the same multidimensional space. A query on data of this type may involve any of the dimensions. So, for an indexing technique for CBIR to perform efficiently, it must be designed to search all dimensions of the data. The indexing technique used in CBIR should be able to efficiently satisfy several different types of queries.

The CBIR system will usually pre-process the images stored in its database, by extracting

and indexing the feature vectors. This process is usually performed off-line, once per image.

Once the database is ready, the CBIR system allows the user to specify the queries by means of

a query pattern (which can be a sample image). The query is also processed by the feature

vector extractor, and the similarity function is used to evaluate its similarity to the database

images. Then, the database images will be ranked in decreasing order of similarity to the query,

and shown to the user in that order [Fabio, 2010].

6.1.2 EXISTING MEDICAL CBIR SYSTEMS

Although content-based image retrieval has frequently been proposed for use in medical

image management, only a few content-based retrieval systems have been developed

specifically for medical images. These research-oriented systems are usually constructed in

research institutes and continue to be improved, developed, and evaluated over time. This

section will introduce several major medical content-based retrieval systems [Henning Müller,

2004] [Chia-Hung, 2006].



1. ASSERT [Shyu, 1998] (Automatic Search and Selection Engine with Retrieval Tools):

Developed by Purdue University, Indiana University, and University of Wisconsin Hospital, USA.

http://rvl2.ecn.purdue.edu/~cbirdev/WWW/CBIRmain.html

Comments: Restricted access

2. CasImage: Developed by University Hospital of Geneva, Switzerland.

http://www.casimage.com/

Comments: Web site does not open!

3. IRMA (Image Retrieval in Medical Applications): Developed by Aachen University of

Technology, Germany.

http://libra.imib.rwth-aachen.de/irma/

Comments: Not in English

4. NHANES II (The Second National Health and Nutrition Examination Survey):

Developed by National Library of Medicine, USA.

http://archive.nlm.nih.gov/proj/webmirs/

Comments: Mainly for radiological images

As there are few medical CBIR systems, there is a strong requirement for development of

medical image retrieval systems; in particular for echo images.

6.2 UNIVERSAL MODEL FOR CBIR

As part of the research work, a general CBIR framework was first developed under the name "A Universal Model for CBIR". The aim of this framework is to provide a flexible environment in which users input a query image and all similar images are displayed after ranking. The novel idea here is that the user can select individual features or a combination of features such as color, texture, shape, edge frequency, Haar wavelets, edge density [Phung, 2007] [Henning Müller, 2004] [Alberto, 2003], etc. By observing the output, one can modify the feature selection and get better output. This is close to the 'relevance feedback' concept described in the literature. Figure 6.2 shows one such model. The shaded box "Other features" allows any number of further features to be included in the model in a plug-n-play manner. This means that any feature can be plugged in or out at run-time once the feature modules are added to the system. This way the user can decide at any point of time which feature or features

are relevant to his application.
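One way to picture the plug-n-play idea is a registry of feature extractors from which the user picks a subset at run-time. The sketch below is only an illustration under assumptions: the registry, the decorator, and the two example extractors (`color_mean`, `color_sd`) are hypothetical names, and images are assumed to be NumPy arrays of shape (height, width, channels).

```python
import numpy as np

# Hypothetical registry: feature name -> function mapping an image to a 1-D array.
FEATURE_EXTRACTORS = {}

def register(name):
    """Decorator that plugs a feature extractor into the registry."""
    def decorator(func):
        FEATURE_EXTRACTORS[name] = func
        return func
    return decorator

@register("color_mean")
def color_mean(image):
    # Mean of each color channel over all pixels.
    return image.reshape(-1, image.shape[-1]).mean(axis=0)

@register("color_sd")
def color_sd(image):
    # Standard deviation of each color channel over all pixels.
    return image.reshape(-1, image.shape[-1]).std(axis=0)

def build_feature_vector(image, selected):
    """Concatenate the outputs of the selected extractors, in the given order."""
    parts = [np.atleast_1d(FEATURE_EXTRACTORS[name](image)) for name in selected]
    return np.concatenate(parts)

# A pure-red 4x4 test image: only the first channel is non-zero.
image = np.zeros((4, 4, 3))
image[..., 0] = 255.0
vec = build_feature_vector(image, ["color_mean", "color_sd"])
```

Adding a new feature then only requires registering one more function; changing the `selected` list changes the feature vector without touching the rest of the pipeline.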


Fig. 6.2 Proposed Universal Model for CBIR

However, this model cannot be directly used for echo images. This is because

segmentation, qualitative, and quantitative features are more specific to these types of medical

images. Therefore, the "Segmentation" block must handle this issue and include appropriate

features in the feature vector. Another technique called "greedy strategy" is employed to

enhance the performance in terms of speed and accuracy of retrieval.

6.2.1 SIMILARITY COMPARISON USING GREEDY METHOD

The main issue in image retrieval systems is the number of dimensions of the feature vector, which is normally large. For example, the QBIC system reduces the 20-dimension feature vector to two or three dimensions using Principal Component Analysis (PCA). The search cost grows exponentially with increasing dimensionality and eventually degrades to sequential searching. To overcome these problems, a simple method based on a greedy strategy is followed.

Consider three database images and their corresponding segments as I1(S1, S2, S4), I2(S2, S5, S8, S7), and I3(S1). The sequence of the segments shown in I1, I2, and I3 is based on descending order of the size/area of each segment. Similarly, let QI(S7, S2) denote the segments of the query image. The algorithm shown in Figure 6.3 follows the greedy strategy to compare the similarity between the query image and the database images.

[Fig. 6.2 block labels: Query Image; Image Database; Image Enhancements; Image Segmentation; Segmentation/Subdivision; Extract Color, Texture, and Edge Density Features; Build single Feature Vector; Feature Database; Similarity Comparison; Indexing & Retrieval; Output Ranked Images; selectable feature blocks: Color Histogram, Texture, Edge Density, Image Subdivision, Other Features]

Algorithm ImageSimilarity

    // I[N] : Image DB with N images
    // QI   : Query Image

    foreach (Image I in I[N]) do
        foreach (Segment s in SegmentSet) do
            if (Euclidean(QI[s], I[s]) < threshold)
                continue    // continue to check other segments
            else
                break       // no need to check other segments of this image
    end.

Fig. 6.3 Algorithm for Similarity comparison based on greedy strategy

Suppose 20 features are fixed for each segment and there are, on average, five segments per image; an exhaustive comparison then covers all 5 × 20 = 100 feature values of every database image. With the proposed approach the comparison for an image stops at the first segment that fails the threshold, so a reasonable improvement in performance can be obtained when the number of

segments is large.
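The greedy comparison of Figure 6.3 might be sketched as below. Several details are assumptions made for illustration only: each image is a list of per-segment feature vectors sorted by descending segment area, segments are compared position by position, an image with fewer segments than the query is compared only on the shared leading segments, and the function and variable names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_similar_images(query_segments, database, threshold):
    """Return the names of database images whose compared segments all match the query.

    The comparison for an image stops at the first segment pair whose distance
    is not below the threshold, so its remaining segments are never examined.
    """
    matches = []
    for name, segments in database.items():
        is_match = True
        for q_seg, db_seg in zip(query_segments, segments):
            if euclidean(q_seg, db_seg) >= threshold:
                is_match = False  # greedy early exit for this image
                break
        if is_match:
            matches.append(name)
    return matches

# Hypothetical two-feature segments, largest segment first.
database = {
    "I1": [[1.0, 2.0], [3.0, 3.0]],
    "I2": [[9.0, 9.0], [3.0, 3.0]],
}
query = [[1.1, 2.0], [3.0, 3.2]]
similar = greedy_similar_images(query, database, threshold=0.5)
```

Here `I2` is rejected after its first (largest) segment, so its second segment is never compared; the saving grows with the number of segments per image.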

6.3 FEATURE VECTOR FORMULATION

Most image retrieval systems follow the paradigm of representing images using a set of

features, such as color, texture [Johan, 2007], shape, and edge orientation [Tristan, 2004]

[Manjunath, 2001]. Among these features, color is the most frequently used visual property in

content-based image retrieval because it is relatively robust, and invariant with respect to image

size and orientation [Missaoui, 2004]. However, in medical images color alone cannot be considered the most prominent feature. In fact, to increase the accuracy of retrieval, some of the local features of pathological regions have to be obtained. The current research includes both local and global features so that the most relevant echo images are retrieved. This section discusses these features and how they are integrated and formulated as a single feature vector. Here, the "universal CBIR model" is adopted for image retrieval; it can even be used for natural images.

As already explained in Figure 1 of Chapter 1, the feature extraction methods for 2D echo and color Doppler images are different. The feature extraction methods themselves were discussed in Chapters 3, 4, and 5. These features are systematically concatenated and used for

image retrieval.



6.3.1 2D ECHO IMAGE FEATURES

Two different feature databases are used: (1) one for 2D echo images and (2) one for color Doppler images. When a query image is submitted by the user, the modality is checked first and the search is then initiated in the appropriate database. Typically, for any 2D echo image, the following features are sufficient for clinical purposes:

Sl. No   2D Echo Image Feature      Sl. No   2D Echo Image Feature
1        LVHeightED                 21       RVAreaED
2        LVDiameterED               22       RVVolumeED
3        LVAreaED                   23       RVHeightES
4        LVVolumeED                 24       RVDiameterES
5        LVHeightES                 26       RVAreaES
6        LVDiameterES               27       RVVolumeES
7        LVAreaES                   28       RAHeightED
8        LVVolumeES                 29       RADiameterED
9        EF                         30       RAAreaED
10       FS                         31       RAVolumeED
11       LAHeightED                 32       RAHeightES
12       LADiameterED               33       RADiameterES
13       LAAreaED                   34       RAAreaES
14       LAVolumeED                 35       RAVolumeES
15       LAHeightES
16       LADiameterES
17       LAAreaES
18       LAVolumeES
19       RVHeightED
20       RVDiameterED

(LV, LA, RV, RA: left ventricle, left atrium, right ventricle, right atrium; ED, ES: end-diastole, end-systole; EF: ejection fraction; FS: fractional shortening.)

The process of obtaining these features is as follows. All images in the 2D echo image database are preprocessed and segmented (KMEP), the boundaries of the cardiac chambers are traced, quantitative computations are carried out, and the results are finally formulated as a feature vector and stored in the feature database. Next, the submitted query image undergoes a similar process and its feature vector is stored in a separate data structure. This query feature vector is then compared with each database feature vector (image) by calculating the Euclidean distance. The distances are ordered so that the smallest one receives the highest rank, which signifies the most similar image. Finally, the top k images are displayed, where k is supplied by the user.

6.3.2 COLOR DOPPLER IMAGE FEATURES

The second image database consists of color Doppler images of both normal and abnormal categories. Chapter 5 explained the method to extract different features from these images. To extract the salient features, the Doppler image is first segmented and the color portion is extracted; a pixel classification method is suitable for this task. This color portion is then analyzed to extract various features as follows:

Sl. No.  Color Doppler Image Feature
1        RedMean
2        GreenMean
3        BlueMean
4        RedSD
5        GreenSD
6        BlueSD
7        RedContrast
8        GreenContrast
9        BlueContrast
10       Energy
11       Entropy
12       Homogeneity
13       Skewness
14       Kurtosis
15       RedEG
16       GreenEG
17       BlueEG
18       ED

Here, features 1 to 6 represent the histogram mean and standard deviation for each color channel [Rishav, 2009] [Sangoh, 2002]. Features 7 to 12 represent texture features (except for the contrast feature, the image is converted into grayscale before the remaining texture features are computed) [Mihran, 1998]. Features 13 and 14 belong to the statistical category. To extract features 15 to 18, the image undergoes image processing techniques.

Edge Gradient Feature

Except for the edge gradient (EG) feature, all the other features have been discussed earlier [Minyoung, 2005]. Whenever the image has a mosaic color pattern, it suggests that the patient is affected by some kind of heart disease. For normal patients the color pattern is generally uniform (red or blue), as explained in Chapter 3. The novel feature EG can detect this by computing the gradient of each pixel with respect to its 4-neighbors (i.e. d = 1) for each color channel R, G, and B. The following pseudo code is used for this:

Step 1: Compute the gradient g(d) of each pixel in the image matrix I(x, y).

Step 2: The Edge Gradient feature is the average value of the gradients at the specified distance d.

Fig. 6.4 Image matrix to illustrate the computation of Edge Gradient

To illustrate the method, consider the image matrix shown in Figure 6.4. For the centre pixel (value 3) and d = 1, the absolute differences with its 4-neighbors (1, 2, 1, and 5) are 2, 1, 2, and 2, i.e. the gradient values 2, 2, 2, 1 in some order. The next step is to increase the value of d by 1 and repeat the same procedure for all pixels and all color channels. The value of d is varied from 1 to 50 and the average of all these values is the final edge gradient value of a particular color channel. The rest of the feature extraction methodologies

have already been explained in Chapter 5.
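The edge gradient computation might look like the following for a single color channel. This is a sketch under assumptions the text does not settle: gradients are taken as absolute differences, neighbors that fall outside the image are skipped, and the final value is the mean of the per-distance means. The function names are hypothetical.

```python
import numpy as np

def pixel_gradients(channel, row, col, d=1):
    """Absolute differences between one pixel and its in-bounds 4-neighbors at distance d."""
    channel = np.asarray(channel, dtype=float)
    rows, cols = channel.shape
    neighbours = [(row - d, col), (row + d, col), (row, col - d), (row, col + d)]
    return [abs(channel[row, col] - channel[r, c])
            for r, c in neighbours if 0 <= r < rows and 0 <= c < cols]

def edge_gradient(channel, max_d=50):
    """Average absolute difference between 4-neighbor pixel pairs at distances 1..max_d."""
    channel = np.asarray(channel, dtype=float)
    per_distance = []
    for d in range(1, max_d + 1):
        diffs = []
        if d < channel.shape[0]:  # vertical pairs that are d rows apart
            diffs.append(np.abs(channel[d:, :] - channel[:-d, :]).ravel())
        if d < channel.shape[1]:  # horizontal pairs that are d columns apart
            diffs.append(np.abs(channel[:, d:] - channel[:, :-d]).ravel())
        if diffs:
            per_distance.append(np.concatenate(diffs).mean())
    return float(np.mean(per_distance)) if per_distance else 0.0

# The 3x3 matrix used in the text's illustration.
matrix = [[5, 1, 6],
          [2, 3, 1],
          [4, 5, 8]]
centre = pixel_gradients(matrix, 1, 1, d=1)   # differences for the centre pixel 3
eg_d1 = edge_gradient(matrix, max_d=1)        # mean over all neighbor pairs at d = 1
```

Each neighbor pair is counted once here rather than once per pixel, which leaves the average unchanged. A uniform channel gives an edge gradient of 0, while a mosaic pattern gives a large value, which is the behavior the feature relies on.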

6.4 SIMILARITY MEASURE

Selection of similarity metrics has a direct impact on the performance of content-based

image retrieval. The kind of feature vectors selected determines the kind of measurement that

will be used to compare their similarity (Smeulders, Worring, Santini, Gupta, & Jain, 2000). In

this work, Euclidean distance as shown in equation 6.1 is found suitable, because it is the most

common metric used to measure the distance between two points in multi-dimensional space

(Qian, Sural, Gu, and Pramanik, 2004). A number of other metrics, such as Mahalanobis

Distance, Minkowski-Form Distance, Earth Mover’s Distance, and Proportional Transportation

Distance, have been proposed for specific purposes.

\[ d = \sqrt{\sum_{i=1}^{N} \left( F_Q[i] - F_{DB}[i] \right)^2} \qquad (6.1) \]

where $F_Q[i]$ is the i-th feature of the query image, $F_{DB}[i]$ is the i-th feature of the database image, and N is the number of features (size of the feature vector). In our case, the detection of similarity

should account for variations in the heart chamber dimensions, or stenosis, or regurgitation

levels. In addition to this the similarity measure should be robust to individual inter-patient

variations in the shape profile within the same disease class.
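As a small worked illustration of the Euclidean distance between a query feature vector and a database feature vector, the sketch below uses hypothetical two-element vectors; the function name is an assumption, and real feature vectors would hold the chamber or Doppler features listed earlier.

```python
import math

def euclidean_distance(f_query, f_db):
    """Euclidean distance between two feature vectors of the same length N."""
    if len(f_query) != len(f_db):
        raise ValueError("feature vectors must have the same length N")
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(f_query, f_db)))

# Hypothetical vectors: the differences are 3 and -4, so d = sqrt(9 + 16) = 5.
d = euclidean_distance([3.0, 0.0], [0.0, 4.0])
```

Identical vectors give a distance of 0, which is why the query image itself always appears at rank 1 when it is part of the database.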

[Fig. 6.4 image matrix, rows from top to bottom: (5 1 6), (2 3 1), (4 5 8)]

6.5 RETRIEVAL PROCESS

The process of retrieving the 2D echo or color Doppler images from the image database

consists of several steps. For 2D echo images, cardiac chambers are segmented and quantified

to build the feature vector, whereas in the case of color Doppler images features such as color

histogram, texture, edge density, etc., are extracted. The proposed CBIR model would consider

appropriate features selected by the user and retrieve the most similar images.

Consider the retrieval process of similar images for a given 2D echo query image. The

feature database consists of feature vectors of all the 2D echo images consisting of normal and

abnormal images. The query image is segmented and the cardiac image features are extracted

and formulated as a feature vector. The Euclidean distance between the query image feature

vector and database feature vector is computed. These distances are arranged in the ascending

order and top k images are displayed as the most relevant with respect to the query image.

Similarly, the retrieval process of color Doppler images is explained here. The feature

extraction process for these types of images is different from 2D echo images. As explained

earlier, the segmentation of the color Doppler images yield several features such as texture,

histogram, etc. From these features the users have the flexibility in selecting required features

and for which the distances are computed similar to the processing of 2D echo images.

The proposed CBEIR system, therefore, provides a framework to retrieve all images that

are similar to a given query image in terms of clinical features along with the flexibility in

setting few or all features.
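The retrieval steps described in this section can be sketched as follows. This is an illustrative outline, not the system's code: the feature database is assumed to be a NumPy array with one row per image, the optional `selected` argument stands in for the user's choice of features, and all names and numbers are hypothetical.

```python
import numpy as np

def retrieve_top_k(query_vec, feature_db, k, selected=None):
    """Return (indices, distances) of the k database vectors closest to the query.

    feature_db: array of shape (num_images, num_features).
    selected:   optional list of feature column indices to compare on.
    """
    db = np.asarray(feature_db, dtype=float)
    q = np.asarray(query_vec, dtype=float)
    if selected is not None:
        db, q = db[:, selected], q[selected]
    distances = np.sqrt(((db - q) ** 2).sum(axis=1))   # Euclidean distance per image
    order = np.argsort(distances, kind="stable")[:k]   # ascending: most similar first
    return order.tolist(), distances[order].tolist()

# Hypothetical database of three images with two features each.
feature_db = [[50.0, 60.0],
              [52.0, 61.0],
              [90.0, 30.0]]
indices, dists = retrieve_top_k([51.0, 60.0], feature_db, k=2)
```

Passing `selected=[1]` would rank the same images on the second feature only, which corresponds to the user restricting the comparison to a subset of features.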

6.6 RESULTS AND DISCUSSIONS

This section primarily shows the retrieval efficiency of the proposed CBEIR system for

various query images. The study used the following image database:

Total # of patients : 60 (42 Men and 18 Women with 42 ± 16 years of age)

Total # of images : 623 (Abnormal : 423 and Normal : 200)

# of image categories : 5

a) Normal : 200

b) Aortic Regurgitation (AR) : 53

c) Mitral Regurgitation (MR) : 51

d) Mitral Stenosis : 266

e) Aortic Stenosis : 53



The above list includes both 2D echo and color Doppler images.

6.6.1 RETRIEVAL EFFICIENCY

Figures 6.5 and 6.6 show the retrieval efficiency in terms of recall-precision graphs for the

color Doppler images.

Fig. 6.5 Recall – Precision curve for the query image based on image ranking (top k images).

Abnormal Query Image shown at the right-top corner.

Fig. 6.6 Recall – Precision curve for the query image based on image ranking (top k images).

Normal Query Image shown at the right-top corner.

[Plots for Figs. 6.5 and 6.6: x-axis Recall (0 to 1), y-axis Precision (0 to 1); curves: Color Doppler Features and Color Histogram Features]

The graph shows two curves: the dotted curve is drawn by considering only the color histogram feature, and the thick curve by considering the color Doppler image features. The latter is superior in terms of precision, which is not surprising because it includes domain specific features rather than standard global features such as the color histogram. It can be observed that when the proposed color Doppler feature is combined with the traditional texture properties, all but one of the top 30 retrieved images are similar to the query image.

The CBEIR system was tested with abnormal and normal query images and the corresponding recall-precision curves appear in Figure 6.5 and Figure 6.6 respectively. It is observed that the color Doppler feature based retrieval offers better performance than the color

histogram method.
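A recall-precision curve of this kind can be computed from a ranked result list as sketched below. The sketch assumes that a retrieved image counts as relevant when it belongs to the same category as the query, which the text does not state explicitly; the function name and the example labels are hypothetical.

```python
def precision_recall_at_k(ranked_labels, query_label, total_relevant):
    """Return (recall, precision) pairs for the top k results, k = 1..len(ranked_labels).

    ranked_labels:  category of each retrieved image, most similar first.
    total_relevant: number of database images sharing the query's category.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    points = []
    hits = 0
    for k, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
        points.append((hits / total_relevant, hits / k))
    return points

# Hypothetical ranking for an "AR" query with three relevant images in the database.
ranking = ["AR", "AR", "Normal", "AR"]
curve = precision_recall_at_k(ranking, "AR", total_relevant=3)
```

Plotting the pairs in `curve` with recall on the x-axis and precision on the y-axis gives one curve per feature set, so two feature sets can be compared on the same query.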

6.6.2 IMAGE RANKING: 2D ECHO AND COLOR DOPPLER IMAGES

Two experiments were conducted to study the performance of the CBEIR system.

(1) Query Image d = 0 (2) d = 2.3 (3) d = 8.0 (4) d = 19.8

(5) d = 24.0 (6) d = 25.5 (7) d = 26.5 (8) d = 27.2

(9) d = 53.6 (10) d = 61.2

Fig. 6.7 Rankings of 2D Echo images. Image 1 is the query image. Numbers in the parentheses represent

ranks and d is the distance with respect to the query image.



The first experiment shows the rankings of the 2D echo images retrieved from the image database for the query image shown as the first image in Figure 6.7. The image database is populated with both normal and abnormal images.

As per the rankings shown, it can be inferred that the first 3 retrieved images, i.e. ranks 2 to 4, are similar to the query image. Images with ranks 5 to 10 have larger distances and are therefore dissimilar to the query image. This indicates that these images are not normal images. This result can be verified visually, as these images belong to patients affected by mitral stenosis having a dilated LA.

The second experiment tests the ranking of color Doppler images. With an abnormal image as the query image, as shown in Figure 6.8, the retrieved images are ranked from 2 to 16.

(1) d = 0.0 (2) d = 5.0 (3) d = 5.5 (4) d = 6.0 (5) d = 6.7

(6) d = 7.8 (7) d = 8.2 (8) d = 8.3 (9) d = 9.0 (10) d = 9.5

(11) d = 9.8 (12) d = 9.9 (13) d = 9.9 (14) d = 10.0 (15) d = 10.2

(16) d = 10.9

Fig. 6.8 Ranking of images. Image (1) is the query image and the distance, d is specified for each image

and numbers in the brackets represent corresponding rank.



The image with distance 0 is the query image used for this experiment. The images whose distance d does not exceed 9.0 are ranked from 1 to 9 and are abnormal, with a mosaic color pattern. The images with d greater than 9.0 are identified as normal patient images.

It can be observed that the color pattern of the query image (abnormal) matches that of the other abnormal images in the image database when the top 9 images are retrieved. This implies that both recall and precision values are 100%. A detailed discussion on the performance of the CBEIR system is given in Chapter 10.

6.7 SUMMARY

Medical CBIR systems are different from the conventional retrieval engines, because here

retrieval is based on pathology bearing regions (PBR) that tend to be highly localized. With

more and more patient records now containing multimodal imaging data, an exciting

application of image and video retrieval is emerging in the area of clinical decision support.

A novel Content Based Echo Image Retrieval (CBEIR) System is proposed which can be

used to retrieve 2D and Doppler images from a large echo image database based on quantitative

and qualitative feature descriptors. The design is based on a multifeature universal model

that could even be used for natural images with minimal changes.

Although content-based image retrieval has frequently been proposed for use in medical

image management, only a few content-based retrieval systems have been developed

specifically for medical images. In tests on a database of 623 images from 60 patients, the proposed model showed better

retrieval efficiency than retrieval based on the color histogram alone.
