ISPIDER – A Pilot Grid for Integrative Proteomics

ISPIDER – A Pilot Grid for Integrative Proteomics

BEP-II grantholders meeting,Edinburgh 24th Nov 2004

Diversity of proteome data

sequences>A01562MAPKATYLIGAADKFHW>A01567MAQQPKEMLNILADKFHWFLYC

gels

mass specStructures/folds

Other data:Species, PTMS, pathways, functional annotation, transcriptome data

Integration problems

• Lack of specific middleware– Existing resources not wrapped

• Lack of data standards– Standards for proteomics, incl. MS and protein

identification are emerging

• Data not modelled– New challenges from proteomics– Data not captured/modelled

• Data not captured– No mature repositories/databases for some proteome

data

• But there is lots of data …

Aims

• To develop an integrated platform of proteomic data resources enabled as Grid/Web services

• Integrate existing proteome resources, enabling them as Grid/Web services.

• To develop novel, proteome-specific databases as part of ISPIDER delivered as Grid/Web and browser-based services:– A repository for experimental proteome data– A proteome protein identification server and database– A phosphoproteome specific database

• To develop middleware & support for distributed querying, workflows and other integrated data analysis tasks

• Demonstrate effectiveness of the resulting infrastructure studies in proteomics, including:– Visualisation clients for proteomic data e.g. LRF data– Analyses for fungal species of industrial interest– Protein structural/functional trends in experimental proteomics

e.g. linking domain structural patterns

Existing Resources

PS

WS

PF

WS

TR

WS

GS

WS

FA

WS

PPI

WS

PID

WS

PRIDE

WS

PEDRo

WS

ISPIDER Resources

Integrated Proteomics Informatics Platform - Architecture

VanillaQuery Client

2D GelVisualisation

Client + Aspergil.Extensions

+ Phosph.Extensions

PPI Validation + Analysis

Client

Protein ID Client

ExistingE-ScienceInfrastructure

ISPIDERProteomics GridInfrastructure

ISPIDERProteomics Clients

PublicProteomicResources

myGridOntologyServices

myGridDQP

DASAutoMedmyGrid

Workflows

ProteomeRequestHandler

InstanceIdent/Mapping

Services

ProteomicOntologies/

Vocabularies

SourceSelectionServices

DataCleaningServices

Phos

WS

WP1 WP2

WP3

WP4 WP5

WP6

WP6

WP3

KEY: WS = Web services, GS = Genome sequence, TR = transcriptomic data, PS = protein structure, PF = protein family, FA = functional annotation, PPI = protein-protein interaction data, WP = Work PackageKEY: WS = Web services, GS = Genome sequence, TR = transcriptomic data, PS = protein structure, PF = protein family, FA = functional annotation, PPI = protein-protein interaction data, WP = Work Package

Web services

WP1

RA1-6

RA1

RA5 &6

RA2

RA2

RA6

RA3&4

RA3&4 RA2RA1

Work packages

• WP1 – A Skeleton Integrated Proteomics Grid

• WP2 - Integration of gel-based data with structural and functional annotation

• WP3 - Data mining tools for the phosphoproteome

• WP4 - Structural and functional proteomics for the Aspergilli

• WP5 - Integration of protein:protein interaction data with structural & functional annotations

• WP6 - A protein identification server and database

Personnel

RA1

RA2

RA3

RA4

RA5

RA6

Manchester: Khalid Belhajjame

Manchester: Jennifer Siepen

UCL: TBA

Birkbeck: Lucas Zamboulis / Hao Fan

EBI: Nishia Vinod

EBI: TBA

WP1

WP1

WP1

WP1

WP2

WP2 WP4 WP6

WP6 WP4

WP2 WP3 WP4 WP5 WP6

WP3

WP3

WP1 WP2 WP3 WP4 WP5 WP6

WP2 WP5

Deliverables

1. PRIDE db

2. Protein ID server

3. Phosphoproteome db

4. Extended isoform model

5. Integrated generic workflows/DQP/etc

6. “2D”-DAS clients

7. Grid wrapped BIOMAP

8. Integrated Protein-protein workflows

RA5

RA2

RA2

RA6

RA1

RA3

RA4

RA3

RA6 RA2

RA6

RA5 RA6

RA3RA4

RA1RA4

RA1 RA6

Primary RA Also involved

Existing infrastructure and skills

• myGRID• OGSA-DQP• AutoMed• PSI/Pedro infrastructure/standards • Protein id tools at Manchester

• 3 primary data integration strategies– Workflows– DQP using OGSA-DAI– Heterogenous schema integration technologies

Scufl Simple Conceptual Unified Flow LanguageTaverna Writing, running workflows & examining resultsSOAPLAB Makes applications available

Freefluo Workflow engine to run workflows

Freefluo

SOAPLABWeb Service

Any Application

Web Service e.g. DDBJ BLAST

Workflow Components

OGSA-DQP

• Used in Grave’s Disease• Uses OGSA-DAI data

access services to access individual data resources.

• A single query to access and join data from more than one OGSA-DAI wrapped data resource.

• Supports orchestration of computational as well as data access services.

• Interactive interface for integrating resources and executing requests.

• Implicit, pipelined and partitioned parallelism and optimisation

http://www.ogsa-dai.org.uk/dqp

AutoMed infrastructure

• Bidirectional mappings between schemas• Available in global and local views• Transformations between schemas

Potential clients and outputs

• A Vanilla client

Markup with:

• Identified peptides

•Across different tissues

•Different species

•PTMs

•etc

2D gel visualisation client

Potential annotations

Comparative proteomics

Real vs virtual

Add/subtract PTMs

Display pathways

Functional annotation

PPIs

Folds

Summary

• in silico Proteome Integrated Data Resource Environment

• Simon Hubbard• Suzanne Embury• Steve Oliver• Norman Paton• Carole Goble• Robert Stevens• Jennifer Siepen• Khalid Bellhajjame

• Rolf Apweiler• Weimin Zhu• Henning Hermjakob• Chris Taylor• Nishia Vinod• TBA

• Alex Poulovassilis• Nigel Martin• Lucas Zamboulis• Hao Fan

• David Jones• Christine Orengo• TBA

ISPIDER – A Pilot Grid for Integrative Proteomics

Documents

Transcript of ISPIDER – A Pilot Grid for Integrative Proteomics