High Throughput Computational Sequence Analysis Rob Edwards redwards@salmonella.org Argonne National...

Post on 19-Dec-2015

High Throughput Computational Sequence Analysis

Rob Edwards
redwards@salmonella.org

Argonne National Laboratory
San Diego State University

[Figure: number of known sequences by year, with milestones marked for the first bacterial genome, 100 bacterial genomes, and 1,000 bacterial genomes]

How much has been sequenced

Environmental sequencing

Everybody in San Diego

Everybody in USA

All cultured Bacteria

100 people

How much will be sequenced

One genome from every species

Most major microbial environments

High Performance Computing

TeraGrid

The TeraGrid National Resource

Life Sciences Gateway to TeraGrid

Subsystems

Subsystems make up metabolism

Wikipedia Metabolism: http://en.wikipedia.org/wiki/Portal:Metabolism

Subsystems are not just metabolism

http://aig.cs.man.ac.uk/gallery/Utopia/

Enzyme complex

http://webdeptos.uma.es/

Cell Machinery

http://www.brown.edu/

Cell Processes

http://www.theseed.org

Growth in generation of subsystems

Microbial Genomics Annotation Platform

• Goal 1: Automate the generation of high quality annotations by leveraging the information contained in SubSystems and FIGfams.

• Goal 2: Minimize turnaround time. Initial target 48 hours

• Automated process consisting of:
– Gene calling
– Initial annotation of function
– Initial metabolic reconstruction

• Process takes 1-7 hours depending on size and complexity of the genome

• ~20 genomes per day

• Password protected, secure, private

• Release to public databases if required
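The two figures above (1-7 hours per genome, ~20 genomes per day) imply some parallelism. The following is only a back-of-the-envelope sketch, assuming that the ~20 genomes per day is a sustained rate and that genomes are processed independently; the function name `concurrent_jobs_needed` is hypothetical and does not come from the slides.

```python
import math


def concurrent_jobs_needed(genomes_per_day: float, hours_per_genome: float) -> int:
    """Smallest number of parallel workers that finishes the daily load in 24 hours.

    Assumes every genome takes the same time and workers are never idle.
    """
    compute_hours = genomes_per_day * hours_per_genome
    return math.ceil(compute_hours / 24)


# Bounds taken from the slide: 1 to 7 hours per genome, ~20 genomes per day.
best_case = concurrent_jobs_needed(20, 1)   # every genome takes 1 hour
worst_case = concurrent_jobs_needed(20, 7)  # every genome takes 7 hours
print(best_case, worst_case)
```

Under these assumptions the load fits on one worker in the best case and needs about six in the worst case, since 20 x 7 = 140 compute hours must fit into a 24-hour day.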

Freely available annotation service

http://www.nmpdr.org/anno-server/index48.cgi

Some estimate of annotation quality

[Figure: chart with axis ticks from 0 to 50 in steps of 5, one group per genome. Series: % in SS SEED, % in SS SP1Ke, % hypothetical SP1Ke, % hypothetical SEED]

Genomes shown, with their IDs: Bacillus anthracis str. Sterne (260799), Mycobacterium tuberculosis CDC1551 (83331), Listeria monocytogenes EGD-e (169963), Streptococcus pyogenes M1 GAS (160490), Staphylococcus aureus subsp. aureus MW2 (196620)
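The two kinds of percentage named in the chart legend can be computed from a list of annotated genes. This is a minimal sketch, not the SEED's own code: it assumes each gene is represented as a (function, in_subsystem) pair, treats any function containing the word "hypothetical" as hypothetical, and uses invented toy data. The name `annotation_summary` is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def annotation_summary(genes: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the percentage of genes in a subsystem and the percentage hypothetical.

    Each gene is a (function, in_subsystem) pair; this layout is an assumption.
    """
    genes = list(genes)
    total = len(genes)
    if total == 0:
        return {"pct_in_subsystem": 0.0, "pct_hypothetical": 0.0}
    in_subsystem_count = sum(1 for _, in_subsystem in genes if in_subsystem)
    hypothetical_count = sum(
        1 for function, _ in genes if "hypothetical" in function.lower()
    )
    return {
        "pct_in_subsystem": 100.0 * in_subsystem_count / total,
        "pct_hypothetical": 100.0 * hypothetical_count / total,
    }


# Toy data, invented for illustration only.
example_genes = [
    ("DNA gyrase subunit A", True),
    ("hypothetical protein", False),
    ("Conserved hypothetical protein", False),
    ("Transcriptional regulator", False),
]
summary = annotation_summary(example_genes)
print(summary)
```

A higher percentage in subsystems together with a lower percentage of hypothetical proteins is the direction the chart treats as better annotation.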

Evaluation / Viewing

Download results

• We provide a number of export formats:
– Genbank, Fasta, GFF3, Excel
– can easily be extended to all formats supported by BioPerl

• Genomes can be deleted by the user at any time (we keep them for max. 120 days)

• Genomes can be directly imported into the SEED if the user wishes

• All genomes are password protected
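To give a feel for two of the export formats listed above, here is a minimal sketch that writes a FASTA record and a GFF3 file by hand. It is not the service's BioPerl-based exporter; the function names, the dictionary layout of a feature, and the example gene are all assumptions made for illustration.

```python
from urllib.parse import quote


def to_fasta(seq_id: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as FASTA, wrapping the sequence at `width` characters."""
    lines = [f">{seq_id}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


def to_gff3(features) -> str:
    """Format CDS features as GFF3: a version header plus 9 tab-separated columns."""
    rows = ["##gff-version 3"]
    for feature in features:
        # Reserved characters (; = & ,) in attribute values must be percent-encoded.
        attributes = "ID={};product={}".format(
            quote(feature["id"], safe=" "), quote(feature["product"], safe=" ")
        )
        rows.append("\t".join([
            feature["seqid"],       # column 1: sequence ID
            feature["source"],      # column 2: program or database that made the call
            "CDS",                  # column 3: feature type
            str(feature["start"]),  # column 4: start, 1-based
            str(feature["end"]),    # column 5: end, inclusive
            ".",                    # column 6: score (none)
            feature["strand"],      # column 7: + or -
            "0",                    # column 8: phase, required for CDS
            attributes,             # column 9: attributes
        ]))
    return "\n".join(rows) + "\n"


# Invented example data.
fasta_text = to_fasta("contig1", "ATGC" * 20)
gff_text = to_gff3([{
    "id": "gene1", "product": "DNA gyrase subunit A", "seqid": "contig1",
    "source": "example", "start": 1, "end": 80, "strand": "+",
}])
print(fasta_text + gff_text)
```

In practice a library such as BioPerl (which the slide mentions) or a comparable toolkit handles the edge cases that this sketch ignores, such as multi-exon features and the Genbank format.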

Metagenomics SEED

http://metagenomics.theseed.org

Metagenome Metabolic Reconstruction

Starch utilization in cow rumens

Metabolic potential in environments

Everybody in San Diego

Everybody in USA

All cultured Bacteria

100 people

Too much will be sequenced

One genome from every species

Most major microbial environments

Acknowledgements

Argonne National Laboratory: Rick Stevens, Bob Olson, Folker Meyer

San Diego State University: Forest Rohwer

Fellowship for Interpretation of Genomes: Ross Overbeek, Veronika Vonstein, The Annotators