INFSO-RI-508833 Enabling Grids for E-sciencE NA4/Biomed Demonstration Medical Data Management and...

22
INFSO-RI-508833 Enabling Grids for E-sciencE www.eu- egee.org NA4/Biomed Demonstration Medical Data Management and processing EGEE 3 rd review rehearsal, May 4 th , 2006 Johan Montagnat Tristan Glatard

Transcript of INFSO-RI-508833 Enabling Grids for E-sciencE NA4/Biomed Demonstration Medical Data Management and...

INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

NA4/Biomed DemonstrationMedical Data Management and processing

EGEE 3rd review rehearsal, May 4th, 2006

Johan Montagnat

Tristan Glatard

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 2

Enabling Grids for E-sciencE

INFSO-RI-508833

Demonstration content

• Medical Data Manager– Interface to clinical data storage (DICOM)

– Integrated to gLite 1.5 middleware

– Tackling data security and privacy needs

– Result of MDM TCG Working Group

• Application to medical images registration assessment– Data intensive workflow-based application

– Scientific results in the medical image processing area with consequences for clinical use

• Immediate scheduling of jobs submitted for the demonstration– Torque+MAUI configuration for efficient handling of short jobs

– Result of the SDJ TCG Working Group

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 3

Enabling Grids for E-sciencE

INFSO-RI-508833

Medical Data Manager• Objectives

– Expose an standard grid interface (SRM) for medical image servers (DICOM)

– Fulfill application security requirements without interfering with clinical practice

DICOM server

gLiteIOserver

SR

M-D

ICO

Min

terf

ace

AMGA Metadata

User Interface

Worker Node

HydraKey store

The MDM componentsDICOM clients

FiremanFile

Catalog

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 4

Enabling Grids for E-sciencE

INFSO-RI-508833

Interfacing sensitive medical data

• Computing interface to medical DICOM storage– Data are acquired from the hospital imagers in native DICOM

format

– Standard SRM interface exposed to the grid

– DICOM slices are assembled in 3D images

• Privacy– Fireman provides file level ACLs

– gLiteIO provides transparent access control

– AMGA provides metadata secured communication and ACLs

– SRM-DICOM provides on-the-fly data anonimization It is based on the dCache implementation (SRM v1.1)

• Data protection– Hydra provides encryption/decryption transparently

gLite 1.5 servicegLite 1.5 service

gLite 1.5 service

ARDA service

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 5

Enabling Grids for E-sciencE

INFSO-RI-508833

Medical Data Registration

DICOM server

AMGA Metadata

HydraKey store

gLiteIOserver

1. Image is acquired

2. Image is stored in DICOM server

3. glite-eds-put

3a. Image is registered (a GUID is associated)

3b. Image keyis produced andregistered

4. image m

etadataare registered

FiremanFile

Catalog

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 6

Enabling Grids for E-sciencE

INFSO-RI-508833

Medical Data Retrieval

DICOM serverS

RM

-DIC

OM

inte

rfac

e

AMGA Metadata

UserInterface

Worker Node

HydraKey store

gLiteIOclient 2. glite-eds-get

3. get SURL from GUID

4. request file

5. get file key

6. on-the-fly encryption and anonimyzation

return encrypted file

7. get file key and decrypt file locally

Metadata ACL control

Key ACL control

Anonimization & encryption

In-memorydecryption

1. get GUID from metadata

gLiteIOserver

FiremanFile

CatalogFile ACL control

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 7

Enabling Grids for E-sciencE

INFSO-RI-508833

Data replication and retrieval usecase

gLiteIOserver

DICOM server

SRM-DICOMinterface

User Interface Worker Node

HydraKey store

gLiteIOclient

gLiteIOclient

Any SE

SRM interface

Anyfile

1. Requestreplication 6. Request file

2. Get SURL

3. Get file 8. Get file

4. Get file key

5. (encrypted) File replication

7. Get SURL

9. Return (encrypted) file

gLiteIOserver

FiremanFile

CatalogFile ACL control

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 8

Enabling Grids for E-sciencE

INFSO-RI-508833

Bronze Standard application

• Medical image registration algorithms assessment– Registration needed in many clinical procedures

– Real clinical impact

• Interfaced to the medical data manager– To retrieve suitable input images

• Compute intensive– Medical image registration algorithms: minutes to hours of

computations on PCs

• Data intensive– Hundreds to thousands of image pairs

• Workflow-based– Using the MOTEUR service-based workflow manager

– Developed in the French ACI “Masse de données” AGIR project

No execution infrastructuregLite 1.5 phased out

May 4, 2:30pm updateB-plan should work:

- pre-install glite 1.5 DMS on prod.- Use production infrastructure for the

demo

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 9

Enabling Grids for E-sciencE

INFSO-RI-508833

After registrationBefore registration

Image Registration

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 10

Enabling Grids for E-sciencE

INFSO-RI-508833

Variability of a registration algorithm

Registration algorithm

Final transformation

External parameters

• Data (image) 1

• Data (image) 2

• Acquisition noise

• Patient effects

Varying internal parameters

• Initial transformation

• (…)

• Robustness: ability to find the right transformation (success/failure)

• Repeatability: w.r.t. some parameters (e.g. initialization)

• Accuracy: Variability w.r.t. the ground truth for typical data

Fixed internal parameters

• Multiscale resolution

• (Typical variance…)

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 11

Enabling Grids for E-sciencE

INFSO-RI-508833

Performance Evaluation without Gold Std

• Bronze standard: The exact result is an unknown variable

• Unbiased estimation: use redundant information

– use many different registration algorithms

(average biases, so that precision ~ accuracy)

– Use many different data (redundant information to ensure precision)

– Average transformations (maximal consistency)

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 12

Enabling Grids for E-sciencE

INFSO-RI-508833

Bronze Standard workflow

transrotjiT ,,,

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 13

Enabling Grids for E-sciencE

INFSO-RI-508833

Data-intensive service-based applications

• Service-based approach versus task-based approach

Service0

Service1 Service2

Service3

input0

4 instances Job0

Job1 Job2

Job3

Graph of services DAG of tasks

Job0

Job1 Job2

Job3

Job0

Job1 Job2

Job3

Job0

Job1 Job2

Job3

Job0

Job1 Job2

Job3

Job0

Job1 Job2

Job3

Job0

Job1 Job2

Job3

Job0

Job1 Job2

Job3

Job0

Job1 Job2

Job3

Job0

Job1 Job2

Job3

Job0

Job1 Job2

Job3

Job0

Job1 Job2

Job3

Job0

Job1 Job2

Job3

Job0

Job1 Job2

Job3

Job0

Job1 Job2

Job3

Job0

Job1 Job2

Job3

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 14

Enabling Grids for E-sciencE

INFSO-RI-508833

Data composition strategies

• Data composition patterns : data intensive applications– One-to-one All-to-all

– In our case: register all images of

the same patient

the same modality

A different exam date

Set 0 Set 1

I0

J0

I1

J1

I2

J2

Set 0 Set 1

I0

J0

I1

J1

I2

J2

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 15

Enabling Grids for E-sciencE

INFSO-RI-508833

workflowmanager

MOTEUR services orchestration

EGEEUser Interface

EGEEResources

Input 0

Service B

Output 0

Input 0 Input 1

Service A

Output 0

Data 0

Img Ref 0

Img Ref 0Img Ref 0 Img Ref 0

Img Ref 0Data 1

Img Ref 1

Img Ref 1

Img Ref 1Img Ref 1

Img Ref 1

Img Ref 1Img Ref 1

Img Ref 1Data 2

Img Ref 2

Img Ref 2Img Ref 1Img Ref 2Img Ref 2

Img Ref 2

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 16

Enabling Grids for E-sciencE

INFSO-RI-508833

workflowmanager

Legacy code encapsulation

Input 0 Input 1

Generic service

Output 0

Input 0 Input 1

Generic service

Output 0

Algorithm parametersdescription

Algorithm parametersdescription

Legacy code 2

Legacy code 1

<description> <executable name="CrestLines.pl"> <access type="URL"> <path value="http://colors.unice.fr:80/"/> </access> <value value="CrestLines.pl"/> <input name="image" option="-im1"> <access type="LFN" /> </input> <input name="scale" option="-s"/> <output name="crest_lines" option="-c2"> <access type="LFN" /> </output> <sandbox name="convert8bits"> <access type="URL"> <path value="http://colors.unice.fr:80/"/> </access> <value value="Convert8bits.pl"/> </sandbox> </executable></description>

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 17

Enabling Grids for E-sciencE

INFSO-RI-508833

Short Deadline Jobs

• Torque + MAUI specific configuration– Virtual processors allocation

– Does not interfere with normal batch scheduling (shared processor time)

– Enables efficient processing of short tasks on the production infrastructure

– Ersatz for lack of jobs prioritization

• Special submission queues– Three SDJ queues deployed on biomed-compliant sites

– Time-limited queues

• Submit-or-reject paradigm– Jobs are immediately executed or rejected if a too high number of

short jobs are already executing.

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 18

Enabling Grids for E-sciencE

INFSO-RI-508833

Workflow execution

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 19

Enabling Grids for E-sciencE

INFSO-RI-508833

Post-mortem trace

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 20

Enabling Grids for E-sciencE

INFSO-RI-508833

Scientific production

• 4 rigid-registration algorithms precision estimated on brain image database

• To be published in [HealthGrid’06]

Algorithm reg (deg) trans (mm)

CrestMatch

0.150 0.424

PFRegister

0.180 0.416

Baladin 0.139 0.395

Yasmina 0.137 0.445

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 21

Enabling Grids for E-sciencE

INFSO-RI-508833

Why grids?

• From days to hours– 10s to 100s of algorithms

To adapt to many clinical cases

– Virtually illimited parameterization

– Virtually illimited number of image databases Different modalities, different body regions

• Complex computation procedure– Difficult experimental set up

– Future plan: application portal

• Data federation– Obtain data sources needed for validation

• Algorithms sharing– Use registration services developed in different research groups

– Reproducible results

NA4/biomed demonstration, 3rd EGEE review rehearsal, May 4th 2006 22

Enabling Grids for E-sciencE

INFSO-RI-508833

Conclusions

• Medical data management– Advanced Data management functionalities

– Application area-level layer on top of foundation middleware

– Dependent on the deployment of gLite 1.5 services

• Bronze Standard application– Complex, workflow-based application

– Data intensive Non-trivial parallel computations Data federation using grid data management services

– Production of scientific results

• Short deadline jobs– Immediate scheduling of short tasks

– Submit-or-reject paradigm