1

A National Virtual Specimen Database for Early Cancer Detection

June 26, 2003

Daniel Crichton, NASA Jet Propulsion Laboratory

Sean Kelly, NASA Jet Propulsion Laboratory

Mark Thornquist, Fred Hutchinson Cancer Research Center

Sudhir Srivastava, National Cancer Institute

Heather Kincaid, Fred Hutchinson Cancer Research Center

Donald Johnsey, National Cancer Institute

Marcy Winget, Fred Hutchinson Cancer Research Center

2

Vision

Development of a world-wide knowledge and informatics environment for sharing cancer specimen data across repositories

Data and computers interconnected to form a virtual database

Integrated Cancer Resources
• Specimens
• Images
• Assays
• Biomarkers
• etc.

3

Early Detection Research Network (EDRN)

5-year collaboration supported by NCI

Goal: Identify, evaluate, and validate promising biomarkers to support the early detection of cancer

Comprised of:
• 18 Biomarker Laboratories
• 9 Clinical and Epidemiology Centers
• 3 Biomarker Validation Laboratories
• Data Management and Coordinating Center

4

EDRN Resource Network Exchange (ERNE)

Virtual Specimen Repository (real-time access to distributed repositories)

Informatics infrastructure created for EDRN

Existing sites' specimen databases maintained locally

Uses EDRN Common Data Elements (CDEs)

Maps institutions' local data definitions to EDRN CDEs

Secure and confidential

Secure Dynamic Portal

5

Informatics Deployment

6

Information Infrastructure Progress

Initiation (10/00 - 3/01)
• Discuss Informatics at 2nd EDRN S.C. Meeting
• Present Mock Knowledge System at EDRN S.C. Meeting

Feasibility (4/01 - 10/01)
• Connect Moffitt and San Antonio
• Finalize EDRN CDEs used in knowledge system
• Create Dynamic Portal
• Present Feasibility at EDRN S.C. Meeting

Pilot (10/01 - 9/02)
• Implement four sites
• Finalize IRB Protocol template
• Create Online Mapping Tool
• Present at EDRN S.C. Meeting

Implementation (9/02 - 6/03)
• Implement three additional sites
• Present at EDRN S.C. Meeting

7

EDRN Bioinformatics Architecture

1. Bioinformatics tools and applications use an “API”

2. Middleware creates the informatics infrastructure connecting systems and data

3. Repositories for storing and retrieving many types of data

[Diagram labels: Visualization Tools, Analysis Tools, Web Search Tools; API; “OODT” Middleware; API; EDRN Data Repositories, SPORE Data Repositories, Other Data Repositories; Metadata Mediation; Standard Metadata]

8

Informatics Infrastructure

Connect local databases via the Internet

Query multiple institutional databases concurrently

Metadata-based distributed framework

Object Oriented Data Technology (OODT) framework (JPL)
• Combines semantic data model with distributed services to create a “grid” architecture

9

OODT Framework

Developed by NASA to support science data management for the robotic planetary program

Defines a reusable architectural pattern that enables
• information clustering and retrieval across distributed data resources
• intelligent query algorithm for scalability
• interoperability between disparate data models
• reusable software components
• domain independence
• plug-in for various distributed computing implementations

[Diagram labels: OODT/Science Web Tools; Local Client; Lab-wide Component Framework; Query Service; Product Service; Profile Service; Archive Service; Profile XML Data; Data System 1; Data System 2; Bridge to External Services]

10

Critical OODT Components

Query Server – Manages and routes concurrent queries to distributed resources. Combines results.

Profile Server – Enables resource discovery providing information about what data resources are available (a resource is really an electronic object)

Product Server – Enables access and retrieval of data products from an online data source

Servers are written in Java and supported on Windows, Linux, Solaris, Mac OS X, etc.
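
The Query Server role described above (route one query to several distributed resources concurrently, then combine the results) can be illustrated with a minimal Python sketch. This is not OODT code: the `SITES` data, `query_site`, and `query_all` are hypothetical stand-ins, and each "product server" is just an in-memory list rather than a remote service.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for per-site product servers.
SITES = {
    "Moffitt": [
        {"specimen": "Blood", "gender": 1},
        {"specimen": "Tissue", "gender": 2},
    ],
    "San Antonio": [
        {"specimen": "Blood", "gender": 2},
    ],
}


def query_site(site, predicate):
    """Stand-in for a remote product-server call: filter one site's records."""
    return [dict(rec, site=site) for rec in SITES[site] if predicate(rec)]


def query_all(predicate):
    """Send the same query to every site concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=len(SITES)) as pool:
        futures = [pool.submit(query_site, site, predicate) for site in SITES]
        merged = []
        for future in futures:
            merged.extend(future.result())
    return merged


results = query_all(lambda rec: rec["specimen"] == "Blood")
```

Each returned record is tagged with its originating site, so a combined result set can still be traced back to the repository that holds the specimen.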

11

Software Component Deployment

[Diagram labels: User query; EDRN Secure Website (Web server, search.jsp, QueryClient); EDRN Profile Server; EDRN CDE Mapping Database; DMCC – Fred Hutchinson Cancer Research Center; Science Tools; Product Servers at Moffitt, San Antonio, MD Anderson, Colorado, Creighton, GLNE, Pittsburgh, New York, and Brigham and Women's; Specimen Databases]

12

Semantic Architecture

Define a common data model for EDRN
• Common Data Elements
• Relationships between elements

Institutions have existing specimen repositories with locally defined data models
• Map local data elements to CDEs using EDRN CDE mapping and repository tools
• 39 CDEs shared

Use standards
• ISO/IEC 11179
• Resource Description Framework (RDF)

Use standard definitions for data exchange
• Communicate using a standard XML schema
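
As a rough illustration of exchanging data keyed by CDE names in XML, the sketch below serializes one record with Python's standard library. The element names `specimenRecord` and `element` are assumptions made for this example; the slides do not show the actual EDRN XML schema.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record):
    """Serialize a dict of CDE name -> value into a small XML document.

    The tag names used here are hypothetical, not the EDRN schema.
    """
    root = ET.Element("specimenRecord")
    for cde_name, value in record.items():
        child = ET.SubElement(root, "element", name=cde_name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = record_to_xml({"BASELINE_DEMOGRAPHICS-GENDER_CODE": 1})
```

Because every site emits the same element names, a receiving service can parse responses from any repository with one parser.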

13

Gender Mapping Example

                    EDRN CDE                              Institution DE
Table Name                                                M_Sput_Subject
Name                BASELINE_DEMOGRAPHICS-GENDER_CODE     SEX
Version             1.0
Data Type           Integer                               Character
Document Text       Gender (What is your gender?)         Gender
Permissible Values  1 Male; 2 Female; 9 Unknown/Refused   M; F; U
Mapping Type        Match by Query
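
A value mapping of this kind can be sketched in a few lines of Python. The pairing M to 1, F to 2, U to 9 is inferred from the order of the permissible values on the slide, and the fallback to 9 for unrecognized input is an assumption; `map_gender` and `LOCAL_TO_CDE_GENDER` are hypothetical names, not part of the EDRN mapping tool.

```python
# Assumed pairing of local character codes to EDRN CDE integer codes.
LOCAL_TO_CDE_GENDER = {"M": 1, "F": 2, "U": 9}


def map_gender(local_value):
    """Translate a local SEX value into the CDE gender code.

    Missing or unrecognized values fall back to 9 (Unknown/Refused),
    which is an assumption made for this sketch.
    """
    key = (local_value or "").strip().upper()
    return LOCAL_TO_CDE_GENDER.get(key, 9)
```

For example, `map_gender("F")` yields the CDE code 2, so a query phrased in CDE terms can be answered from a database that stores `F`.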

14

Security and Confidentiality

Highly Sensitive Information

Health Insurance Portability and Accountability Act (HIPAA)
• Removed Protected Health Information (PHI)

Security Measures
• 128-bit strong encryption using Secure Sockets Layer (SSL)
• Access limited to remote connections from specific IP(s) on specific ports
• Firewalls augmented with rule set

Institutions' IRBs
• Common Protocol
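
The "specific IP(s) on specific ports" rule amounts to an allow-list check, sketched below in Python. The address and port are placeholders (192.0.2.0/24 is reserved for documentation), and `ALLOWED` and `is_allowed` are hypothetical names; in the deployment described, this filtering is done by firewalls rather than application code.

```python
import ipaddress

# Placeholder allow-list: (network, port) pairs. Values are illustrative only.
ALLOWED = [(ipaddress.ip_network("192.0.2.10/32"), 8443)]


def is_allowed(remote_ip, port):
    """Return True only if the caller's address and port match an allowed pair."""
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net and port == allowed_port for net, allowed_port in ALLOWED)
```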

15

Dynamic Portal

16

Advanced Search

17

Results

18

Number of Participants by Specimen Type

[Chart: number of participants by specimen type. Categories: Blood, Bone Marrow, Tissue, Bronchial Washings / Brushings, Sputum, Urine. Data labels as extracted: 19255, 186, 9751, 443616 253]

19

ERNE Achievements

Deployed Software Infrastructure to 10 institutions
• Process of connecting new sites well understood

Software Infrastructure Maturing
• Extensive nightly testing and monitoring of infrastructure

Team Maturing and Growing

Policy Challenges

Institutional Access

Science Support

20

More Information

EDRN – http://www.cancer.gov/edrn

OODT – http://www.jpl.nasa.gov

Contact:
• Heather Kincaid: [email protected]
• Dan Crichton: [email protected]
• Don Johnsey: [email protected]

21

Quick Search

22

Dynamic Portal

JSP-based implementation that queries informatics infrastructure
• Uses CDE terms for constructing query expression

Shows available servers

Limits available choices based on selected criteria

Quick Search

Advanced Search
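
Constructing a query expression from CDE terms can be sketched as joining name/value criteria into one string. The `NAME = value AND ...` syntax is an assumption for illustration, `SPECIMEN_TYPE` is a hypothetical CDE name, and `build_query` is not a portal function; the slides do not show the portal's actual query syntax.

```python
def build_query(criteria):
    """Join (CDE name, value) pairs into a single AND-ed expression string.

    String values are quoted; numeric values are written as-is.
    """
    parts = []
    for name, value in criteria:
        rendered = f"'{value}'" if isinstance(value, str) else str(value)
        parts.append(f"{name} = {rendered}")
    return " AND ".join(parts)


expr = build_query([
    ("BASELINE_DEMOGRAPHICS-GENDER_CODE", 2),
    ("SPECIMEN_TYPE", "Blood"),  # hypothetical CDE name
])
```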

23

Quick Search Results