An Open Data Mining Framework for the Analysis of Medical Images: Application on Obstructive Nephropathy Microscopy Images

Charalampos Doukas, Student Member, IEEE, Theodosis Goudas, Simon Fischer, Ingo Mierswa, Aristotle Chatziioannou and Ilias Maglogiannis, Member, IEEE

32nd Annual International Conference of the IEEE EMBS, Buenos Aires, Argentina, August 31 - September 4, 2010

978-1-4244-4124-2/10/$25.00 ©2010 IEEE

Abstract—This paper presents an open image-mining framework that provides access to tools and methods for the characterization of medical images. Several image processing and feature extraction operators have been implemented and exposed through Web Services. RapidMiner, an open source data mining system, has been utilized for applying classification operators and creating the essential processing workflows. The proposed framework has been applied to the detection of salient objects in Obstructive Nephropathy microscopy images. Initial classification results are quite promising, demonstrating the feasibility of automated characterization of kidney biopsy images.

I. INTRODUCTION

The kidney is a multicellular, multistructure organ that is responsible for part of the complex process of blood purification. Kidneys are affected by many chronic diseases, such as Obstructive Nephropathy [1], which is the main cause of renal failure. It is caused by obstruction of the urinary tract, with hydronephrosis, i.e., dilation of the renal pelvis and calyces resulting from obstruction of the flow of urine. Considering that Obstructive Nephropathy is not a rare disease [1], automated detection of the pathogenic areas in a kidney biopsy image is very useful, especially in cases where discriminating healthy from pathogenic biopsy samples is difficult and complex (see Fig. 1).

The automated characterization of such images requires proper preprocessing (e.g., image enhancement, color processing), feature extraction and classification. The current trend in image processing, and in software engineering in general, is towards the development of algorithms and tools that provide each of these functions individually [2], especially in the form of Web Services, a technology that enables developers to programmatically access heterogeneous, distributed resources, providing easier integration and interoperability between data and applications.

This paper presents an open framework based on Web Services that provides access to complete tools for data mining of biomedical image data. The tools implemented as Web Services can be directly integrated into workflow management platforms (e.g., TAVERNA [3]), allowing their use in several workflows corresponding to different image processing pipelines. Proper authentication and encryption mechanisms have been utilized in order to guarantee the appropriate security.

The rest of the paper is organized as follows: Section II discusses related work in image mining through Web Services. Section III presents the tools and methods that enable the functionality of the proposed platform, whereas the architecture scheme is described in Section IV. Section V describes the Obstructive Nephropathy image characterization process, followed by initial evaluation results in Section VI. Finally, Section VII concludes the paper.

Manuscript received April 1, 2010. This work is partly funded by the EU via the e-LICO FP7 Collaborative Project (grant agreement 231519). C. Doukas is with the University of the Aegean, Samos, Greece (e-mail: [email protected]). T. Goudas and I. Maglogiannis are with the University of Central Greece, Lamia, Greece (e-mail: {goudas, imaglo}@ucg.gr). S. Fischer and I. Mierswa are with Rapid-I GmbH (e-mail: {fischer, ingo.mierswa}@rapid-i.com). A. Chatziioannou is with the National Hellenic Research Foundation, Athens, Greece (e-mail: [email protected]).

Fig. 1. Samples of obstructive nephropathy images: (a) healthy biopsy sample, (b) pathogenic biopsy sample.

II. RELATED WORK

Despite the great impact of Web Services on the development and deployment of web applications, their exploitation in the domain of biomedical data mining is still quite limited. Only a few systems exist in the literature that utilize Web Services in order to provide functionality and access to computational resources for processing, annotating and mining biomedical data, and especially medical images. The MIAKT system [4] provides knowledge management of data generated by screening processes and means for medical staff to investigate, annotate, and analyze the data using web and GRID services. In [5], the authors have proposed a model for implementing SOA (Service Oriented Architecture) based image processing systems. The proposed architecture consists of a programming model, a service model and a messaging model. The authors focused on the concept of service. The INBIOMED platform [6] is a Web Services oriented architecture that provides a framework for sharing resources and medical image processing algorithms. The Web Services integrated into the platform provide morphological operators and filters (e.g., erosion, dilation, opening, closing) and segmentation methods.

The aforementioned works focus mostly on the provision of specific functionalities for analyzing and processing medical images of a narrow range of modalities through Web Services. To the best of our knowledge, there is no work that enables the mining of biomedical images based on a number of tools and frameworks providing complete functionality through a single Web Service framework. The major benefits of this approach can be summarized as follows:

• Open access to biomedical image mining functionality without specific requirements
• Total interoperability
• Provision of a complete framework
• Single access point
• Image mining consistency and reliability

III. TOOLS AND METHODS

This section provides more details regarding the tools and methods utilized for developing the presented image mining framework.

A. Image Acquisition and Processing

The following aspects of image processing are considered:

• Acquisition. The framework provides support for the acquisition, sampling and writing of image files, complete program control and easy integration into image-enabled applications that utilize databases.

• Transformations. Functions are included for transforming an image from the spatial to the frequency domain (Fourier, DCT), as well as wavelet transforms.

• Image Enhancement. Functionalities for image enhancement such as subtraction, background correction, denoising, smoothing, spatial and median filtering, histogram equalization, etc., are provided.

• Color Processing. This refers to pixel-based processing that allows users to gather color information (e.g., counting the number of unique colors within an image or finding the dominant color, both of which are particularly useful when analyzing an image).

• Image Analysis and Feature Extraction. The toolbox supports the export of the extracted quantitative features to XML files that may be directly stored into a database for further processing and exploitation (e.g., classification using the RapidMiner tool).
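As an illustration of the color processing functions mentioned above, the following minimal sketch counts the unique colors in an image and finds the dominant one. It is not the framework's implementation: the function name `color_statistics` and the representation of an image as a flat list of RGB tuples are assumptions made for this example.

```python
from collections import Counter


def color_statistics(pixels):
    """Return (number of unique colors, dominant color).

    `pixels` is assumed to be an iterable of (R, G, B) tuples; this
    representation is chosen for the example, not taken from the paper.
    """
    counts = Counter(pixels)
    dominant, _ = counts.most_common(1)[0]
    return len(counts), dominant


# Toy 2x3 image flattened into a list of RGB tuples.
image = [(255, 0, 0), (255, 0, 0), (0, 0, 255),
         (255, 0, 0), (0, 255, 0), (0, 0, 255)]
unique, dominant = color_statistics(image)
print(unique, dominant)
```

Red occurs three times in the toy image, so it is reported as the dominant color among three unique colors.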

B. Image Mining through Web Services & Secure Access

Web Services are emerging as a promising technology for building distributed applications. They are an implementation of SOA [7] that supports the concept of loosely-coupled, open-standard, language- and platform-independent systems. The loosely-coupled features allow service providers to modify backend functions while maintaining the same interface to clients. Web Services are accessed through the HTTP/HTTPS protocols and utilize XML (eXtensible Markup Language) for data exchange. This in turn implies that Web Services are independent of platform, programming language, tool and network infrastructure. Services can be assembled and composed in such a way as to foster the reuse of existing back-end infrastructure. The WS-Security kit (Rampart) [8] has been utilized for user authentication. WS-Security is a standard for adding security to SOAP Web Service message exchanges. It uses a SOAP message-header element to attach the security information to messages, in the form of tokens conveying different types of claims (which can include names, identities, keys, groups, privileges, capabilities, etc.) along with encryption and digital-signature information. On top of the WS-Security kit, the SSL [9] protocol has been used for the proper encryption of the data during transmission between the service consumer and the Web Service itself.
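To make the role of the SOAP security header concrete, the sketch below builds a SOAP 1.1 envelope whose header carries a WS-Security UsernameToken. The paper does not state which token type the framework uses, so the plain-text UsernameToken is an assumption, as are the operation name `listOperators` and the namespace `urn:example:image-mining`; such a token would only be reasonable over the SSL-protected channel described above.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
WSSE_NS = ("http://docs.oasis-open.org/wss/2004/01/"
           "oasis-200401-wss-wssecurity-secext-1.0.xsd")


def build_envelope(username, password, operation, service_ns):
    """Build a SOAP envelope with a WS-Security UsernameToken header.

    `operation` and `service_ns` are hypothetical placeholders for a
    service operation; they are not taken from the paper.
    """
    ET.register_namespace("soapenv", SOAP_NS)
    ET.register_namespace("wsse", WSSE_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    security = ET.SubElement(header, f"{{{WSSE_NS}}}Security")
    token = ET.SubElement(security, f"{{{WSSE_NS}}}UsernameToken")
    ET.SubElement(token, f"{{{WSSE_NS}}}Username").text = username
    ET.SubElement(token, f"{{{WSSE_NS}}}Password").text = password
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    ET.SubElement(body, f"{{{service_ns}}}{operation}")
    return ET.tostring(envelope, encoding="unicode")


message = build_envelope("alice", "secret", "listOperators",
                         "urn:example:image-mining")
print(message)
```

The security claims travel in the header, separate from the operation in the body, which is why the same authentication mechanism can be applied to every service operation.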

C. The Rapid-Miner Classification and Workflow Management Tool

RapidMiner [13] is a flexible, modular, and extensible data mining and data processing solution. Being an open source project, it is available to the scientific community and has a large user base. Written entirely in Java, it provides an open API and can be easily integrated into any existing application. In its standard version, RapidMiner comes with hundreds of operators for machine learning, data manipulation, filtering, format conversion, etc. As of version 5.0, the user interface and process execution engine have been completely rewritten as a generic workflow execution engine. Based on this, a set of new operators that integrate transparently with the Web Services presented in the preceding section has been developed. Using these operators, it is possible to design data mining workflows that connect seamlessly to the image mining tools described above (see Fig. 3). The image mining extension encompasses the following (groups of) operators:

• List Images: This operator simply lists all image files found in a particular directory.

• Upload Images: The listed images are uploaded to the image mining server, and a reference to the images is obtained and stored. Subsequently, only this reference is used and no intermediate upload or download of image data is necessary.

• Visualize Image: If desired, intermediate results can be visualized within RapidMiner. This requires the download of the processed images.

• Image Transformation: Using the image references, various image transformation algorithms are performed. The transformed images are stored on the server.

• Feature Extraction: Using image references, various features can be extracted and transformed into a tabular format which can be further used by RapidMiner for data mining and processing.

The groups of image transformation and feature extraction operators constitute the most relevant part. Due to self-description methods of the Web Service, RapidMiner is able to detect the set of provided algorithms automatically. Hence, new image mining algorithms can be deployed automatically without a need to update the RapidMiner components. Whereas RapidMiner is not aware of image files, formats, etc., the image mining operators can be combined flexibly to transform images into a tabular format well-suited for data mining and further processing.
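The reference-based design of these operators can be sketched with an in-memory stand-in for the image mining server. Everything in this example is hypothetical: the class `FakeImageServer`, its method names and the two toy features are illustrative only and do not reproduce the framework's Web Service interface. The point is that image data is uploaded once, transformations return new references, and only a small feature table comes back to the client.

```python
import itertools


class FakeImageServer:
    """In-memory stand-in for the image mining server (hypothetical API)."""

    def __init__(self):
        self._images = {}
        self._ids = itertools.count(1)

    def upload(self, pixels):
        """Store pixel data on the server and return a reference to it."""
        ref = f"img-{next(self._ids)}"
        self._images[ref] = list(pixels)
        return ref

    def transform(self, ref, func):
        """Apply func to every pixel; the result stays on the server."""
        return self.upload(func(p) for p in self._images[ref])

    def extract_features(self, ref):
        """Return one table row (a dict) of toy features for an image."""
        pixels = self._images[ref]
        return {"image": ref,
                "mean": sum(pixels) / len(pixels),
                "max": max(pixels)}


server = FakeImageServer()
ref = server.upload([10, 20, 30, 40])               # upload once
inverted = server.transform(ref, lambda p: 255 - p)  # only references move
table = [server.extract_features(r) for r in (ref, inverted)]
print(table)
```

Each dictionary in `table` corresponds to one row of the tabular format that a data mining tool can consume without knowing anything about image files.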

IV. THE FRAMEWORK ARCHITECTURE

The proposed framework is based on the service-oriented architecture model, as illustrated in Fig. 2. The main component is the Image Processing Web Services Core, which hosts all the functionality exposed to clients through SOAP messages over the HTTP/HTTPS protocols. Appropriate classes and functions implement the aforementioned functionality, utilizing any essential application programming interfaces (APIs) that provide access to advanced functionality (e.g., data management, image processing, etc.) or to data repositories and computational resources. The Web Services Core is hosted by an appropriate service container (usually an application server). The latter usually resides along with additional resources (e.g., databases) in the framework container (physically, this can be a framework server). The communication with the RapidMiner tool is performed through SOAP calls. This type of architecture is modular and allows the easy integration of new services.

V. SALIENT OBJECTS DETECTION IN OBSTRUCTIVE NEPHROPATHY IMAGES

A. Image Processing and Feature Extraction

The characterization of obstructive nephropathy images initially requires appropriate image processing and feature extraction. Images are first enhanced by histogram equalization and then converted to 8-bit. A segmentation window of 40x40 pixels is applied to divide the initial image into a grid of smaller parts. This step is performed in order to increase the resolution of the dataset. Texture analysis follows, calculating features such as:

The Mean, which gives the average value of the segment’s pixels, along with the Standard Deviation, which gives the value of the dispersion of the values around the Mean.

The Angular Second Moment (ASM), which gives a measurement related to orderliness. ASM is calculated using the Grey Level Co-occurrence Matrix (GLCM) of the segment of the specific ROI.

The GLCM is a tabulation of how often different combinations of pixel brightness values (grey levels) occur in an image. In the equations that follow, i is the row number, j is the column number, P_{i,j} is the element (i, j) of the normalized symmetrical GLCM, and N is the number of grey levels.

$$\mathrm{ASM} = \sum_{i,j=0}^{N-1} P_{i,j}^{2} \tag{1}$$

The Contrast, which gives a measure of how sharp the structural variations in the image are.

$$\mathrm{Contrast} = \sum_{i,j=0}^{N-1} P_{i,j}\,(i-j)^{2} \tag{2}$$

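The Correlation, which is a measure of the grey-level linear dependency of the image's segment. Here \mu_i, \mu_j and \sigma_i^2, \sigma_j^2 denote the means and variances of the GLCM along its rows and columns.

$$\mathrm{Correlation} = \sum_{i,j=0}^{N-1} P_{i,j}\,\frac{(i-\mu_i)(j-\mu_j)}{\sqrt{\sigma_i^{2}\,\sigma_j^{2}}} \tag{3}$$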

The Inverse Difference Moment, which gives a measure of the local homogeneity of the segment of the image.

$$\mathrm{IDM} = \sum_{i,j=0}^{N-1} \frac{P_{i,j}}{1+(i-j)^{2}} \tag{4}$$

The Entropy, which is a measurement of randomness.

$$\mathrm{Entropy} = -\sum_{i,j=0}^{N-1} P_{i,j}\,\log(P_{i,j}) \tag{5}$$
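The texture features above can be computed directly from a grey-level segment, as in the following minimal sketch. It is an illustration under stated assumptions rather than the operators used in the framework: the pixel offset (one pixel to the right), the natural logarithm in the entropy, and the function names `glcm` and `texture_features` are all choices made for this example.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized symmetrical GLCM of a 2-D list of grey levels.

    The offset (dx, dy) = (1, 0) counts horizontally adjacent pixel
    pairs; the paper does not state which offset was used.
    """
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < height and 0 <= x2 < width:
                a, b = image[y][x], image[y2][x2]
                counts[a][b] += 1
                counts[b][a] += 1  # count both directions: symmetry
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("image too small for the chosen offset")
    return [[c / total for c in row] for row in counts]


def texture_features(p):
    """Features of Eqs. (1)-(5) from a normalized GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum(v * (i - mu_i) ** 2 for i, _, v in cells)
    var_j = sum(v * (j - mu_j) ** 2 for _, j, v in cells)
    denom = math.sqrt(var_i * var_j)
    covariance = sum(v * (i - mu_i) * (j - mu_j) for i, j, v in cells)
    return {
        "asm": sum(v * v for _, _, v in cells),
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        # Undefined for a constant segment (zero variance).
        "correlation": covariance / denom if denom > 0 else float("nan"),
        "idm": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        # Zero entries contribute nothing; natural log is an assumption.
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }


# Toy 4x4 segment with four grey levels.
segment = [[0, 0, 1, 1],
           [0, 0, 1, 1],
           [0, 2, 2, 2],
           [2, 2, 3, 3]]
features = texture_features(glcm(segment, levels=4))
print(features)
```

In the framework the same features would be computed for every 40x40 segment of an 8-bit image, i.e., with 256 grey levels instead of the four used in this toy segment.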

The above features have been selected as the most appropriate in order to characterize and classify complex biomedical images, such as the kidney biopsy images ([10], [12]). All processing steps are available as Web Service operators and can be inserted into the RapidMiner tool for creating an automated workflow procedure.
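The preprocessing steps described at the start of this subsection (histogram equalization followed by a 40x40 segmentation window) can be sketched as follows. This is a minimal illustration, not the Web Service operators themselves: it assumes the image is already a 2-D list of integer grey levels, that windows do not overlap, and that partial windows at the image border are discarded, none of which is specified in the paper.

```python
def equalize(image, levels=256):
    """Histogram-equalize a 2-D list of integer grey levels in [0, levels)."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # constant image: nothing to spread out
        return [row[:] for row in image]
    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[v] - cdf_min) * scale) for v in row]
            for row in image]


def tiles(image, size=40):
    """Yield (row, col, tile) for every full size x size window."""
    height, width = len(image), len(image[0])
    for r in range(0, height - size + 1, size):
        for c in range(0, width - size + 1, size):
            yield r, c, [row[c:c + size] for row in image[r:r + size]]


# Synthetic 80x120 grey-level image used only for demonstration.
image = [[(r + c) % 256 for c in range(120)] for r in range(80)]
segments = list(tiles(equalize(image)))
print(len(segments))
```

An 80x120 image yields a 2x3 grid, i.e., six segments of 40x40 pixels, each of which becomes one sample for feature extraction.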

B. Data Classification

The features described in the previous section are obtained through the appropriate workflow process designed and implemented in RapidMiner. Classification has been performed using the k-Nearest Neighbor, the Naïve Bayes and the Support Vector Machines (SVM) [11] classifiers. Ten-fold cross validation has then been applied for evaluation purposes.
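The evaluation procedure can be illustrated with a small, self-contained k-Nearest Neighbor classifier evaluated by ten-fold cross validation. This sketch stands in for the RapidMiner operators and is not their implementation; the value k = 3, the Euclidean distance, the interleaved fold assignment and the synthetic two-cluster data are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training points nearest to query."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validate(xs, ys, folds=10, k=3):
    """Accuracy of k-NN under folds-fold cross validation."""
    correct = 0
    for fold in range(folds):
        # Every folds-th sample, starting at `fold`, is held out.
        test_idx = set(range(fold, len(xs), folds))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        for i in test_idx:
            correct += knn_predict(train_x, train_y, xs[i], k) == ys[i]
    return correct / len(xs)


# Synthetic, well-separated feature vectors (not data from the paper).
xs = ([(i * 0.1, 0.0) for i in range(20)]
      + [(10 + i * 0.1, 0.0) for i in range(20)])
ys = ["healthy"] * 20 + ["pathogenic"] * 20
accuracy = cross_validate(xs, ys)
print(accuracy)
```

Every sample is used for testing exactly once, so the reported accuracy is an average over all ten held-out folds.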

Fig. 2. Illustration of the Framework Architecture


Fig. 3. Screenshot of the RapidMiner [13] interface illustrating a workflow for processing, feature extraction and classification of Obstructive Nephropathy Images.

VI. INITIAL EVALUATION RESULTS

In order to evaluate the accuracy of the image mining framework for the characterization of obstructive nephropathy images, an initial dataset of six kidney biopsy images has been utilized. The images have been provided by INSERM (France). They have been obtained from healthy and pathogenic kidney biopsies of mice, and have been treated following Masson's trichrome staining technique in order to disclose the most important structures (see Fig. 4). A magnification of 200, an aperture of 0.5, an exposure of 10 ms and a gain of 1.0 have been used as shooting settings. In order to overcome the issue of the small dataset, the images have been divided into a grid (see Section V), resulting in a considerably larger dataset.

After applying the aforementioned image processing and the k-Nearest Neighbor classifier, 260 of 283 non-pathogenic glomeruli (see Fig. 4) have been successfully recognized (i.e., 91.87% accuracy), whereas all pathogenic glomeruli were successfully classified.
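The reported percentage follows directly from the two counts given above, as this short check shows; the variable names are chosen for the example.

```python
recognized, total = 260, 283  # counts reported for non-pathogenic glomeruli
accuracy = 100 * recognized / total
print(f"{accuracy:.2f}%")  # prints 91.87%
```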

Fig. 4. Annotation of important structures that determine pathogenesis in a Kidney biopsy image. Strong line: Glomerulus, Dashed line: Tubulus

The Bayesian classifier achieved 67.82% accuracy in predicting pathogenic areas. The SVM reached an accuracy of 76.87% and, in addition, all non-pathogenic glomeruli were successfully predicted.

VII. CONCLUSION

This paper has presented an open image mining framework for the characterization of Obstructive Nephropathy images. It is based on image processing and feature extraction operators available as Web Services and their integration in RapidMiner, an open data mining workflow tool. The framework enables experts to utilize image mining techniques without any requirements for specific image processing or data mining knowledge. Initial evaluation results are quite promising. Future work includes more extensive evaluation of the platform using new datasets.

ACKNOWLEDGMENT

This work is funded by the Information Society Technology program of the European Commission “e-Laboratory for Interdisciplinary Collaborative Research in Data Mining and Data-Intensive Sciences (e-LICO)” (IST-2007.4.4-231519). The authors would also like to thank Joost Schanstra and Julie Klein from INSERM for the provision and annotation of the biopsy images.

REFERENCES

[1] Klahr S., “Obstructive nephropathy”, Internal Medicine, vol. 39, no. 5, pp. 355-361, 2000.

[2] Biological Web Services, June 2009, available online at: http://maurobio.infobio.net/bws/biows.htm.

[3] Tom Oinn, Matthew Addis, Justin Ferris, Darren Marvin, Martin Senger, Mark Greenwood, Tim Carver, Kevin Glover, Matthew R. Pocock, Anil Wipat and Peter Li, “Taverna: a tool for the composition and enactment of bioinformatics workflows”, Bioinformatics, vol. 20, no. 17, pp. 3045-3054, June 2004.

[4] Shadbolt N., Lewis P., Dasmahapatra S., Dupplaw D., Hu B. and Lewis H., “MIAKT: Combining Grid and Web Services for Collaborative Medical Decision Making”, in Proc. of AHM2004 UK eScience All Hands Meeting, September 2004, Nottingham, UK.

[5] Todica V., Vaida M.F., “SOA-based medical image processing platform”, in Proc. of IEEE International Conference on Automation, Quality and Testing, Robotics, 2008, vol. 1, pp. 398-403, May 2008.

[6] D. Perez, J. Crespo, A. Anguita, J. Ordonez, J. Dorado, G. Bueno, V. Feliu, A. Estruch, J. Heredia, “Biomedical Image Processing integration through INBIOMED: A Web Services-based Platform”, presented at the 6th International Symposium on Biological and Medical Data Analysis (ISBMDA 2005).

[7] Newcomer Eric, Lomow Greg, Understanding SOA with Web Services, Addison Wesley, 2005, ISBN 0-321-18086-0.

[8] Kyle Gabhart, “Secure, Reliable Web Services with Apache”, available online at: http://www.xml.com/pub/a/2007/05/02/sure-reliable-web-services-with-apache.html.

[9] The OpenSSL Project, information available online at: http://www.openssl.org/.

[10] Ilias Maglogiannis, Charalampos Doukas, “Overview of Advanced Computer Vision Systems for Skin Lesions Characterization”, IEEE Transactions on Information Technology in Biomedicine, vol. 13, no. 5, pp. 721-733, Sept. 2009, DOI: 10.1109/TITB.2009.2017529.

[11] I. Maglogiannis, E. Zafiropoulos, “Utilizing Support Vector Machines for the Characterization of Digital Medical Images”, BMC Medical Informatics and Decision Making, 2004.

[12] Haralick R. M. et al., “Textural Features for Image Classification”, IEEE Transactions on Systems, Man and Cybernetics, vol. SMC-3, pp. 610-621, 1973.

[13] Mierswa Ingo, Wurst Michael, Klinkenberg Ralf, Scholz Martin and Euler Timm, “YALE: Rapid Prototyping for Complex Data Mining Tasks”, in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-06), 2006.