E-Science Tools For The Genomic Scale Characterisation Of Bacterial Secreted Proteins Tracy...

Posted on 01-Jan-2016

e-Science Tools For The Genomic Scale Characterisation Of Bacterial Secreted Proteins

Tracy Craddock, Phillip Lord, Colin Harwood and Anil Wipat

Newcastle University

Outline

Computational challenges of bioinformatics

Secretion in Bacillus

Classification and analysis workflows

Results and discussion

Computational Challenges of Bioinformatics

New requirements from bioinformatics: 3 major problems

Heterogeneity

Distribution

Autonomy

Experiments as a series of workflows

myGrid and Taverna

Scufl: Simple Conceptual Unified Flow Language

Taverna: writing and running workflows & examining results

SOAPLAB: makes applications available as Web services

Freefluo: workflow engine to run workflows

[Figure: Freefluo invoking a SOAPLAB Web Service that wraps any application, and a third-party Web Service, e.g. DDBJ BLAST]
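To make the workflow engine's role concrete, here is a toy sketch assuming a workflow is simply a set of named processors plus their dependencies. It is not Scufl and does not use the Freefluo or Taverna APIs; `run_workflow` and the processor names are hypothetical. It runs each processor only after its upstream processors have finished and passes their outputs along.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List

# Hypothetical processor type: receives the outputs of its upstream
# processors (keyed by processor name) and returns its own output.
Processor = Callable[[Dict[str, object]], object]


def run_workflow(processors: Dict[str, Processor],
                 depends_on: Dict[str, List[str]]) -> Dict[str, object]:
    """Run every processor once, in an order that respects its dependencies."""
    graph = {name: depends_on.get(name, []) for name in processors}
    results: Dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():
        # Collect only the outputs this processor declared it needs.
        upstream = {dep: results[dep] for dep in depends_on.get(name, [])}
        results[name] = processors[name](upstream)
    return results


# Usage: a three-step chain standing in for fetch -> analyse -> report.
steps: Dict[str, Processor] = {
    "fetch": lambda _: "MKKLL",
    "length": lambda up: len(up["fetch"]),
    "report": lambda up: f"{up['length']} residues",
}
deps = {"length": ["fetch"], "report": ["length"]}
result = run_workflow(steps, deps)
```

A real engine also handles remote service calls, failures and iteration over lists of inputs; this sketch only shows the dependency-ordered execution.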

Microbase

Grid-based system for microbial genome comparison and analysis

Information repository (and execution environment)

Pre-computed data
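The idea of an information repository that serves pre-computed data can be sketched as a store keyed by genome and analysis, where an analysis runs only the first time it is requested. This is a hypothetical design for illustration, not Microbase's actual schema or interface; `ResultRepository` and its method names are invented here.

```python
from typing import Callable, Dict, Tuple


class ResultRepository:
    """Toy repository of pre-computed results (hypothetical design).

    Results are keyed by (genome_id, tool_name). An analysis is executed
    on the first request and served from the store on later requests.
    """

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], object] = {}
        self.runs = 0  # how many times an analysis was actually executed

    def get(self, genome_id: str, tool_name: str,
            analysis: Callable[[str], object]) -> object:
        key = (genome_id, tool_name)
        if key not in self._store:
            self._store[key] = analysis(genome_id)
            self.runs += 1
        return self._store[key]


# Usage: the second request is answered from the store without recomputing.
repo = ResultRepository()
first = repo.get("genome_A", "length", lambda g: len(g))
second = repo.get("genome_A", "length", lambda g: len(g))
```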

Secretion in Bacillus

Predict characteristics & behaviour of bacteria

Identify secreted proteins

Bacillus species: diverse behaviour

Soil inhabitants

Harmful bacteria

Importance of Secretion

Mechanism of interaction with environment

Reveal capabilities of an organism

Pathogens are of great interest

Secretory Proteins

[Figure: locations of secretory proteins (Cytoplasm, Membrane, Cell Wall, Medium) and protein features (Signal Peptide, Lipoprotein, Cell wall binding, Transmembrane, LPXTG)]
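One way to picture how predicted features map onto these locations is a small ordered rule set. The sketch below is a simplified assumption, not the authors' classification workflow: the field names and the order in which the rules are checked are hypothetical. It encodes common Gram-positive conventions: LPXTG or cell wall binding proteins with a signal peptide go to the cell wall, lipoproteins and proteins with transmembrane segments to the membrane, proteins with only a signal peptide to the medium, and everything else stays in the cytoplasm.

```python
from dataclasses import dataclass


@dataclass
class Features:
    """Predicted features for one protein (hypothetical field names)."""
    signal_peptide: bool = False
    lipoprotein: bool = False
    lpxtg: bool = False
    cell_wall_binding: bool = False
    transmembrane_helices: int = 0


def predict_location(f: Features) -> str:
    """Return a location; rules are tried in order and the first match wins."""
    # Cell wall first: an LPXTG protein's C-terminal hydrophobic stretch
    # might otherwise be counted as a transmembrane helix.
    if f.signal_peptide and (f.lpxtg or f.cell_wall_binding):
        return "Cell Wall"
    if f.lipoprotein:
        return "Membrane"  # lipid-anchored to the membrane
    if f.transmembrane_helices > 0:
        return "Membrane"
    if f.signal_peptide:
        return "Medium"    # exported with no retention signal
    return "Cytoplasm"
```

Putting the cell wall rules first is a design choice of this sketch; a different precedence would change how proteins with overlapping features are classified.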

Bioinformatic Tools

[Figure: locations (Cytoplasm, Membrane, Cell Wall, Medium) and protein features (Signal Peptide, Lipoprotein, Cell wall binding, Transmembrane, LPXTG) annotated with the prediction tools SignalP, TMHMM, tmap, MEMSAT, LipoP and ps_scan]
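For illustration only, the sketch below shows two much simpler techniques in the spirit of some of these tools: a Kyte-Doolittle hydropathy sliding window (a crude stand-in for transmembrane predictors such as TMHMM) and a regular-expression search for the LPXTG motif (a crude stand-in for ps_scan). The 19-residue window and 1.6 threshold are the classic Kyte-Doolittle settings; neither function reproduces the listed tools' methods, and the function names are hypothetical.

```python
import re
from typing import List

# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(seq: str, window: int = 19,
                        threshold: float = 1.6) -> List[int]:
    """0-based start positions of windows whose mean hydropathy exceeds threshold."""
    # Unknown residue codes score 0.0 (neutral) in this sketch.
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    hits = []
    for i in range(len(scores) - window + 1):
        if sum(scores[i:i + window]) / window > threshold:
            hits.append(i)
    return hits


# L, P, any residue (X), T, G.
LPXTG = re.compile(r"LP[A-Z]TG")


def lpxtg_sites(seq: str) -> List[int]:
    """0-based start positions of LPXTG motifs in the sequence."""
    return [m.start() for m in LPXTG.finditer(seq.upper())]
```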

Classification Workflow

Process of Analysis

[Figure: 12 genomes (accessions CP000001, AE017355, AE017225, AE017334, AE016879, AE017194, AE016877, AL009126, CP000002, AE017333, AP006627, BA000004) are analysed to identify putative secreted proteins, which are grouped into protein families for functional classification and analysis of relations]
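The slides do not say how putative secreted proteins were grouped into families. As a hypothetical illustration, assuming families are formed by single-linkage clustering of pairwise similarity hits (for example, sequence-similarity hits passing some cutoff), a union-find sketch looks like this; the protein identifiers and pairs are made up, and every pair must refer to a listed protein.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_families(proteins: Iterable[str],
                     similar_pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Single linkage: proteins joined by any chain of hits share a family."""
    parent: Dict[str, str] = {p: p for p in proteins}

    def find(p: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    families: Dict[str, List[str]] = {}
    for p in parent:
        families.setdefault(find(p), []).append(p)
    # Sort members and families so the output is deterministic.
    return sorted(sorted(members) for members in families.values())


# Usage: p1-p2 and p2-p3 hits chain p1, p2, p3 together; p4 stays alone.
families = cluster_families(["p1", "p2", "p3", "p4"],
                            [("p1", "p2"), ("p2", "p3")])
```

Single linkage is the simplest choice; it can merge distantly related proteins through intermediate hits, which is why real family-building methods often use stricter criteria.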

Analysis Workflow

Architecture

Custom-designed database

Provenance tracking

Analysis is computationally intensive

Architecture differs from other systems
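Provenance tracking means recording, for each result, which tool produced it and from which inputs. The sketch below is a hypothetical minimal log, not the system's actual database design; `ProvenanceLog`, its fields and the tool name in the usage are assumptions. It stores a SHA-256 digest of each output so a stored result can later be checked against its record.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class ProvenanceRecord:
    """Which tool (and version) produced which output from which inputs."""
    tool: str
    tool_version: str
    input_ids: Tuple[str, ...]
    output_digest: str
    timestamp: str


@dataclass
class ProvenanceLog:
    records: List[ProvenanceRecord] = field(default_factory=list)

    def record(self, tool: str, version: str,
               input_ids: List[str], output: str) -> ProvenanceRecord:
        digest = hashlib.sha256(output.encode("utf-8")).hexdigest()
        rec = ProvenanceRecord(tool, version, tuple(input_ids), digest,
                               datetime.now(timezone.utc).isoformat())
        self.records.append(rec)
        return rec

    def derived_from(self, input_id: str) -> List[ProvenanceRecord]:
        """All records whose inputs include input_id."""
        return [r for r in self.records if input_id in r.input_ids]


# Usage: log one (hypothetical) tool run and query it by input.
log = ProvenanceLog()
rec = log.record("toolA", "1.0", ["protein_1"], "result text")
```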

Web Portal

Classification Results

[Figure: Functions of the Clusters. Number of families (axis 0 to 12) in each functional category: Similar to unknown proteins; Transport/binding proteins and lipoproteins; Cell wall; Membrane bioenergetics; Germination; Protein secretion; Sporulation; Metabolism of carbohydrates and related molecules; Specific pathways; Transformation/competence; Metabolism of lipids; Metabolism of phosphate; Transcription regulation; Metabolism of amino acids and related molecules]
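Assuming each family has been assigned a single functional category, the per-category counts in such a chart are a simple tally. The family identifiers and assignments in the usage are made up for illustration; `families_per_category` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def families_per_category(family_category: Dict[str, str]) -> List[Tuple[str, int]]:
    """Count families per functional category, most frequent first."""
    # most_common() orders ties by first appearance in the input.
    return Counter(family_category.values()).most_common()


# Usage with made-up assignments.
counts = families_per_category({
    "fam1": "Sporulation",
    "fam2": "Germination",
    "fam3": "Sporulation",
})
```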

Biologist’s Outlook

Results available for subsequent analysis

Data and results are of great interest

eScientist’s Outlook

Microbase simplified data analysis

But…

Autonomy: most services were originally provided by external parties

Licensing: limits exposure of services

Distribution: difficulties arose from the relatively large datasets

Future Enhancements

Use notification to automatically analyse recently annotated genomes

Migrate workflows to a remote enclosed environment?
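Notification-driven analysis can be sketched as publish/subscribe: a component announces each newly annotated genome, and a subscribed handler launches the analysis. This is a hypothetical minimal design, not an existing Microbase or myGrid notification API; `GenomeNotifier` and `run_secretome_workflow` are invented names, and the handler only records the genome identifier in place of running a workflow.

```python
from typing import Callable, List


class GenomeNotifier:
    """Minimal publish/subscribe hub announcing newly annotated genomes."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, genome_id: str) -> None:
        # Deliver the announcement to every subscriber in registration order.
        for callback in self._subscribers:
            callback(genome_id)


analysed: List[str] = []


def run_secretome_workflow(genome_id: str) -> None:
    # Placeholder: a real handler would launch the classification workflow.
    analysed.append(genome_id)


# Usage: publishing a new genome triggers the subscribed analysis.
notifier = GenomeNotifier()
notifier.subscribe(run_secretome_workflow)
notifier.publish("new_genome_1")
```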

Acknowledgments

Phillip Lord Colin Harwood Anil Wipat

myGrid Carole Goble Tom Oinn

… and the rest of the myGrid team

Microbase Yudong Sun Anil Wipat Matthew Pocock Pete A. Lee Paul Watson Keith Flanagan James T. Worthington