e-Science Tools For The Genomic Scale Characterisation Of Bacterial Secreted Proteins Tracy Craddock, Phillip Lord, Colin Harwood and Anil Wipat Newcastle University

Page 1

e-Science Tools For The Genomic Scale Characterisation Of Bacterial Secreted Proteins

Tracy Craddock, Phillip Lord, Colin Harwood and Anil Wipat

Newcastle University

Page 2

Outline

Computational challenges of bioinformatics

Secretion in Bacillus

Classification and analysis workflows

Results and discussion

Page 3

Computational Challenges of Bioinformatics

New requirements from bioinformatics

3 major problems:

Heterogeneity

Distribution

Autonomy

Experiments - series of workflows

Page 4

myGrid and Taverna

Scufl: Simple Conceptual Unified Flow Language

Taverna: writing, running workflows & examining results

SOAPLAB: makes applications available

Freefluo: workflow engine to run workflows

Diagram labels: Freefluo; SOAPLAB Web Service (any application); Web Service (e.g. DDBJ BLAST)
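
The idea of a workflow engine running a series of services can be sketched in a few lines. This is only an illustration under stated assumptions: it does not use Scufl, Taverna, Freefluo or SOAPLAB, and `run_workflow` and the step functions are hypothetical stand-ins for remote service calls.

```python
from typing import Callable

# A step takes the current data and returns the transformed data.
Step = Callable[[str], str]


def run_workflow(steps: list[tuple[str, Step]], data: str) -> tuple[str, list[str]]:
    """Run named steps in order, feeding each output into the next step.

    Returns the final data and the list of step names that were executed.
    """
    trace: list[str] = []
    for name, step in steps:
        data = step(data)
        trace.append(name)
    return data, trace


# Hypothetical local stand-ins for remote services.
def strip_whitespace(seq: str) -> str:
    return "".join(seq.split())


def to_upper(seq: str) -> str:
    return seq.upper()


result, trace = run_workflow(
    [("clean", strip_whitespace), ("normalise", to_upper)],
    "mkk lpe tg\n",
)
```

In a real deployment each step would wrap a web service invocation, and the trace would feed into provenance records.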

Page 5

Microbase

Grid-based system for microbial genome comparison and analysis

Information repository (and execution environment)

Pre-computed data

Page 6

Outline

Computational challenges of bioinformatics

Secretion in Bacillus

Classification and analysis workflows

Results and discussion

Page 7

Secretion in Bacillus

Predict characteristics & behaviour of bacteria

Identify secreted proteins

Bacillus species: diverse behaviour

Soil inhabitants

Harmful bacteria

Page 8

Importance of Secretion

Mechanism of interaction with environment

Reveal capabilities of an organism

Pathogens are of great interest

Page 9

Secretory Proteins

Figure: cell compartments (cytoplasm, membrane, cell wall, medium) and secretory protein classes (signal peptide, lipoprotein, cell wall binding, transmembrane, LPXTG)
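
LPXTG names a short sequence motif: leucine, proline, any residue, threonine, glycine. A minimal sketch of locating it in a protein sequence with a regular expression follows. The function name and example sequences are made up for illustration; production pipelines normally rely on curated patterns and extra checks on the surrounding sequence, which this sketch omits.

```python
import re

# L-P-x-T-G, where x is any of the 20 standard amino acids.
LPXTG = re.compile(r"LP[ACDEFGHIKLMNPQRSTVWY]TG")


def find_lpxtg(sequence: str) -> list[int]:
    """Return the 0-based start position of every LPXTG match."""
    return [m.start() for m in LPXTG.finditer(sequence.upper())]


positions = find_lpxtg("MKKLPETGAAA")  # hypothetical example sequence
```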

Page 10

Outline

Computational challenges of bioinformatics

Secretion in Bacillus

Classification and analysis workflows

Results and discussion

Page 11

Bioinformatic Tools

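Figure: cell compartments (cytoplasm, membrane, cell wall, medium) and secretory protein classes (signal peptide, lipoprotein, cell wall binding, transmembrane, LPXTG), annotated with the prediction tools used: SignalP, TMHMM, tmap, MEMSAT, LipoP, ps_scan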
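
Each tool contributes one kind of prediction, and a classification step has to combine them into a single label per protein. The sketch below shows one way such a rule set could look. The field names, the label strings and the precedence of the rules are all assumptions made for illustration; they are not the rules used in the workflow described here.

```python
from dataclasses import dataclass


@dataclass
class Predictions:
    """Per-protein predictions, as might be collected from the tools (hypothetical schema)."""
    signal_peptide: bool      # a signal peptide was predicted
    lipoprotein: bool         # a lipoprotein signal was predicted
    tm_helices: int           # transmembrane helices, not counting the signal peptide
    lpxtg: bool               # an LPXTG motif was found
    cell_wall_binding: bool   # a cell wall binding pattern was found


def classify(p: Predictions) -> str:
    """Assign one location label using an assumed rule precedence."""
    if p.lipoprotein:
        return "lipoprotein"
    if not p.signal_peptide:
        return "membrane" if p.tm_helices > 0 else "cytoplasm"
    if p.tm_helices > 0:
        return "membrane"
    if p.lpxtg:
        return "cell wall (LPXTG)"
    if p.cell_wall_binding:
        return "cell wall (binding)"
    return "medium"


label = classify(Predictions(True, False, 0, False, False))
```

Keeping the rules in one small function makes the precedence explicit and easy to revise when tools disagree.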

Page 12

Classification Workflow

Page 13

Process of Analysis

Genome accessions: CP000001, AE017355, AE017225, AE017334, AE016879, AE017194, AE016877, AL009126, CP000002, AE017333, AP006627, BA000004

Putative secreted proteins

Protein families

Functional classification

Relations

Page 14

Analysis Workflow

Page 15

Architecture

Custom-designed database

Provenance tracking

Analysis – computationally intensive

Architecture differs from other systems
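
Provenance tracking means storing, next to each result, enough information to say how it was produced. A minimal sketch of such a record follows; the field names and the use of SHA-256 digests are assumptions for illustration and do not describe the schema of the database mentioned above.

```python
import hashlib
from datetime import datetime, timezone


def sha256_text(text: str) -> str:
    """Hex digest identifying a piece of input or output text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def provenance_record(tool: str, version: str, input_text: str, output_text: str) -> dict:
    """Build a record linking a result to the tool run that produced it."""
    return {
        "tool": tool,
        "version": version,
        "input_sha256": sha256_text(input_text),
        "output_sha256": sha256_text(output_text),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical tool name, version and data.
record = provenance_record("example-tool", "1.0", ">p1\nMKK\n", "p1\tsecreted\n")
```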

Page 16

Web Portal

Page 17

Outline

Computational challenges of bioinformatics

Secretion in Bacillus

Classification and analysis workflows

Results and discussion

Page 18

Classification Results

Page 19

Functions of the Clusters

Bar chart: number of families (axis 0 to 12) per functional category:

Similar to unknown proteins

Transport/binding proteins and lipoproteins

Cell wall

Membrane bioenergetics

Germination

Protein secretion

Sporulation

Metabolism of carbohydrates and related molecules

Specific pathways

Transformation/competence

Metabolism of lipids

Metabolism of phosphate

Transcription regulation

Metabolism of amino acids and related molecules

Page 20

Biologist’s Outlook

Results available for subsequent analysis

Data and results are of great interest

Page 21

eScientist’s Outlook

Microbase simplified data analysis

But …

Autonomy - most services originally provided by external parties

Licensing - limits exposure of services

Distribution - difficulty came from the relatively large datasets

Page 22

Future Enhancements

Use notification to automatically analyse recently annotated genomes

Migrate workflows to a remote enclosed environment?
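
The notification idea can be pictured as a small publish/subscribe mechanism: an analysis registers interest once and is then called for every newly annotated genome. The sketch below is purely illustrative; `GenomeNotifier` is a hypothetical class and does not correspond to any Microbase or myGrid interface.

```python
from typing import Callable


class GenomeNotifier:
    """Calls every registered handler when a new genome is announced."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, accession: str) -> None:
        for handler in self._subscribers:
            handler(accession)


# A stand-in "analysis" that only records which genomes it was asked to process.
queued: list[str] = []
notifier = GenomeNotifier()
notifier.subscribe(queued.append)
notifier.publish("AL009126")
notifier.publish("BA000004")
```

In practice the handler would start the classification workflow for the announced genome instead of appending to a list.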

Page 23

Acknowledgments

Phillip Lord, Colin Harwood, Anil Wipat

myGrid: Carole Goble, Tom Oinn … and the rest of the myGrid team

Microbase: Yudong Sun, Anil Wipat, Matthew Pocock, Pete A. Lee, Paul Watson, Keith Flanagan, James T. Worthington