ISPIDER – A Pilot Grid for Integrative Proteomics
-
Upload
nathaniel-carney -
Category
Documents
-
view
18 -
download
0
description
Transcript of ISPIDER – A Pilot Grid for Integrative Proteomics
ISPIDER – A Pilot Grid for Integrative Proteomics
BEP-II grantholders meeting,Edinburgh 24th Nov 2004
Diversity of proteome data
sequences>A01562MAPKATYLIGAADKFHW>A01567MAQQPKEMLNILADKFHWFLYC
gels
mass specStructures/folds
Other data:Species, PTMS, pathways, functional annotation, transcriptome data
Integration problems
• Lack of specific middleware– Existing resources not wrapped
• Lack of data standards– Standards for proteomics, incl. MS and protein
identification are emerging
• Data not modelled– New challenges from proteomics– Data not captured/modelled
• Data not captured– No mature repositories/databases for some proteome
data
• But there is lots of data …
Aims
• To develop an integrated platform of proteomic data resources enabled as Grid/Web services
• Integrate existing proteome resources, enabling them as Grid/Web services.
• To develop novel, proteome-specific databases as part of ISPIDER delivered as Grid/Web and browser-based services:– A repository for experimental proteome data– A proteome protein identification server and database– A phosphoproteome specific database
• To develop middleware & support for distributed querying, workflows and other integrated data analysis tasks
• Demonstrate effectiveness of the resulting infrastructure studies in proteomics, including:– Visualisation clients for proteomic data e.g. LRF data– Analyses for fungal species of industrial interest– Protein structural/functional trends in experimental proteomics
e.g. linking domain structural patterns
Existing Resources
PS
WS
PF
WS
TR
WS
GS
WS
FA
WS
PPI
WS
PID
WS
PRIDE
WS
PEDRo
WS
ISPIDER Resources
Integrated Proteomics Informatics Platform - Architecture
VanillaQuery Client
2D GelVisualisation
Client + Aspergil.Extensions
+ Phosph.Extensions
PPI Validation + Analysis
Client
Protein ID Client
ExistingE-ScienceInfrastructure
ISPIDERProteomics GridInfrastructure
ISPIDERProteomics Clients
PublicProteomicResources
myGridOntologyServices
myGridDQP
DASAutoMedmyGrid
Workflows
ProteomeRequestHandler
InstanceIdent/Mapping
Services
ProteomicOntologies/
Vocabularies
SourceSelectionServices
DataCleaningServices
Phos
WS
WP1 WP2
WP3
WP4 WP5
WP6
WP6
WP3
KEY: WS = Web services, GS = Genome sequence, TR = transcriptomic data, PS = protein structure, PF = protein family, FA = functional annotation, PPI = protein-protein interaction data, WP = Work PackageKEY: WS = Web services, GS = Genome sequence, TR = transcriptomic data, PS = protein structure, PF = protein family, FA = functional annotation, PPI = protein-protein interaction data, WP = Work Package
Web services
WP1
RA1-6
RA1
RA5 &6
RA2
RA2
RA6
RA3&4
RA3&4 RA2RA1
Work packages
• WP1 – A Skeleton Integrated Proteomics Grid
• WP2 - Integration of gel-based data with structural and functional annotation
• WP3 - Data mining tools for the phosphoproteome
• WP4 - Structural and functional proteomics for the Aspergilli
• WP5 - Integration of protein:protein interaction data with structural & functional annotations
• WP6 - A protein identification server and database
Personnel
RA1
RA2
RA3
RA4
RA5
RA6
Manchester: Khalid Belhajjame
Manchester: Jennifer Siepen
UCL: TBA
Birkbeck: Lucas Zamboulis / Hao Fan
EBI: Nishia Vinod
EBI: TBA
WP1
WP1
WP1
WP1
WP2
WP2 WP4 WP6
WP6 WP4
WP2 WP3 WP4 WP5 WP6
WP3
WP3
WP1 WP2 WP3 WP4 WP5 WP6
WP2 WP5
Deliverables
1. PRIDE db
2. Protein ID server
3. Phosphoproteome db
4. Extended isoform model
5. Integrated generic workflows/DQP/etc
6. “2D”-DAS clients
7. Grid wrapped BIOMAP
8. Integrated Protein-protein workflows
RA5
RA2
RA2
RA6
RA1
RA3
RA4
RA3
RA6 RA2
RA6
RA5 RA6
RA3RA4
RA1RA4
RA1 RA6
Primary RA Also involved
Existing infrastructure and skills
• myGRID• OGSA-DQP• AutoMed• PSI/Pedro infrastructure/standards • Protein id tools at Manchester
• 3 primary data integration strategies– Workflows– DQP using OGSA-DAI– Heterogenous schema integration technologies
Scufl Simple Conceptual Unified Flow LanguageTaverna Writing, running workflows & examining resultsSOAPLAB Makes applications available
Freefluo Workflow engine to run workflows
Freefluo
SOAPLABWeb Service
Any Application
Web Service e.g. DDBJ BLAST
Workflow Components
OGSA-DQP
• Used in Grave’s Disease• Uses OGSA-DAI data
access services to access individual data resources.
• A single query to access and join data from more than one OGSA-DAI wrapped data resource.
• Supports orchestration of computational as well as data access services.
• Interactive interface for integrating resources and executing requests.
• Implicit, pipelined and partitioned parallelism and optimisation
http://www.ogsa-dai.org.uk/dqp
AutoMed infrastructure
• Bidirectional mappings between schemas• Available in global and local views• Transformations between schemas
Potential clients and outputs
• A Vanilla client
Markup with:
• Identified peptides
•Across different tissues
•Different species
•PTMs
•etc
2D gel visualisation client
Potential annotations
Comparative proteomics
Real vs virtual
Add/subtract PTMs
Display pathways
Functional annotation
PPIs
Folds
Summary
• in silico Proteome Integrated Data Resource Environment
• Simon Hubbard• Suzanne Embury• Steve Oliver• Norman Paton• Carole Goble• Robert Stevens• Jennifer Siepen• Khalid Bellhajjame
• Rolf Apweiler• Weimin Zhu• Henning Hermjakob• Chris Taylor• Nishia Vinod• TBA
• Alex Poulovassilis• Nigel Martin• Lucas Zamboulis• Hao Fan
• David Jones• Christine Orengo• TBA