Databases for Microbes and Man - University of Minnesota


Databases for Microbes and Man

Lynda B.M. Ellis

Laboratory Medicine and Pathology
University of Minnesota

Encoding Metabolic Logic: Predicting Biodegradation


The University of Minnesota Biocatalysis and Biodegradation Database (UM-BBD)

Freely available on the World Wide Web http://umbbd.ahc.umn.edu/

Focus: microbial specialized catabolism, emphasizing environmental pollutants

Represents metabolism in the form of metabolic pathways

From 1,2-Dichloroethane to 2-Chloroethanol (graphic of the reaction)

Medline reference: Verschueren KH, Seljee F, Rozeboom HJ, Kalk KH, Dijkstra BW. Nature (1993) 363(6431): 693-8.

Search Medline titles for haloalkane dehalogenase. 64 citations found on August 14, 2005.

1,2-Dichloroethane + H2O → 2-Chloroethanol + HCl
Enzyme: haloalkane dehalogenase, EC 3.8.1.5 (links: Search GenBank, 89 hits on Aug. 14, 2005; Kyoto; ExPASy)

Generate a pathway starting from this reaction.

UM-BBD biotransformation rules in accord with this reaction: Alkyl halide -----> Alcohol (bt0022)


Page Author(s): Renhao Li, August 14, 2005. This is the UM-BBD reaction, reacID# r0001. It was generated on August 24, 2005 2:42:28 PM CDT.

© 2005, University of Minnesota.

Reaction Page


MySQL Relational Database

Metapathway Map: http://umbbd.ahc.umn.edu/meta/meta_map.html


Hypothesis

~10^7 compounds are in the environment. The UM-BBD will never contain the metabolism of all of them, but UM-BBD data might be used to predict reasonable biodegradation schemes for compounds it does not contain.

Prediction Overview

http://umbbd.ahc.umn.edu/predict/


Biotransformation Rule

Rule bt0003

Description: bt0003: Aldehyde -> Carboxylate

UM-BBD Reaction(s):
Chloroacetaldehyde -----> Chloroacetate (reacID# r0003)
3-Chloroallyl aldehyde -----> trans-3-Chloroacrylic acid (reacID# r0693)
3-Chloroallyl aldehyde -----> cis-3-Chloroacrylic acid (reacID# r0692)
Dodecanal -----> Lauric acid (reacID# r0605)
3-Hydroxybenzaldehyde -----> 3-Hydroxybenzoate (reacID# r0401)
3-Methylbenzaldehyde -----> m-Methylbenzoate (reacID# r0214)
2-Methylbenzaldehyde -----> o-Methylbenzoate (reacID# r0222)
1-Octanal -----> Octanoate (reacID# r0003)
Perillyl aldehyde -----> Perillic acid (reacID# r0730)
p-Tolualdehyde -----> p-Toluate (reacID# r0177)
…


Biotransformation Rule-base

bt0001: primary alcohol → aldehyde
bt0002: secondary alcohol → ketone
bt0003: aldehyde → carboxylate
bt0005: vic-unsubstituted aromatic → cis-dihydroxydihydroaromatic
bt0008: vic-dihydroxybenzenoid → extradiol ring cleavage
…
bt0255: vic-dihydrodihydroxyaromatic → vic-dihydroxyaromatic
bt0259: vic-dihydroxyaromatic → intradiol ring cleavage
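Each rule in the rule-base rewrites one functional group into another, which is what allows predictions for compounds not already in the UM-BBD. The Python sketch below illustrates that idea under a strong simplifying assumption: a compound is reduced to a count of functional-group labels, whereas real rule application operates on chemical structures. The three rules are taken from the list above; the representation and the function `apply_rules` are hypothetical.

```python
from collections import Counter

# Three rules from the rule-base, reduced to (reactant group, product group).
# Describing a compound only by group counts is a simplifying assumption.
RULES = {
    "bt0001": ("primary alcohol", "aldehyde"),
    "bt0002": ("secondary alcohol", "ketone"),
    "bt0003": ("aldehyde", "carboxylate"),
}


def apply_rules(groups: Counter) -> list[tuple[str, Counter]]:
    """Return (rule ID, product) for every rule whose reactant group is present.

    Each applicable rule converts one occurrence of its reactant group and
    leaves the rest of the compound unchanged.
    """
    products = []
    for rule_id, (reactant, product) in sorted(RULES.items()):
        if groups[reactant] > 0:  # missing groups count as 0
            new_groups = groups.copy()
            new_groups[reactant] -= 1
            new_groups[product] += 1
            products.append((rule_id, +new_groups))  # unary + drops zero counts
    return products
```

For example, `apply_rules(Counter({"primary alcohol": 1, "aldehyde": 1}))` returns one product for bt0001 and one for bt0003, since bt0002 finds no secondary alcohol to act on.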


Pathway Prediction Example

(Figure panels: Start; After five steps)
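Predicting a pathway means applying rules again to each product. One way to sketch this, assuming the same toy functional-group representation and a breadth-first search capped at a fixed number of steps (five, as in the example), is shown below. It is not the PPS implementation; `expand`, `transform`, and the two-rule set are hypothetical.

```python
# Two rules from the rule-base (bt0001, bt0003) as (reactant group, product group).
RULES = {
    "bt0001": ("primary alcohol", "aldehyde"),
    "bt0003": ("aldehyde", "carboxylate"),
}

Compound = tuple[tuple[str, int], ...]  # sorted (group, count) pairs, hashable


def transform(compound: Compound, reactant: str, product: str) -> Compound:
    """Convert one reactant group into one product group."""
    counts = dict(compound)
    counts[reactant] -= 1
    counts[product] = counts.get(product, 0) + 1
    return tuple(sorted((g, n) for g, n in counts.items() if n > 0))


def expand(start: Compound, steps: int) -> list[set[Compound]]:
    """Breadth-first expansion: level k holds compounds first reached in k steps."""
    seen = {start}
    levels = [{start}]
    for _ in range(steps):
        frontier = set()
        for compound in levels[-1]:
            counts = dict(compound)
            for reactant, product in RULES.values():
                if counts.get(reactant, 0) > 0:
                    child = transform(compound, reactant, product)
                    if child not in seen:
                        seen.add(child)
                        frontier.add(child)
        if not frontier:  # no rule applies any more
            break
        levels.append(frontier)
    return levels
```

Starting from `(("primary alcohol", 2),)` with `steps=5`, the search stops after four steps, once both groups have been oxidized to carboxylates and no rule applies.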

Evaluation

Can the PPS (Pathway Prediction System) predict known UM-BBD pathways?

Used 117 UM-BBD compounds that contain only C, H, O, N, S, P and/or X (halogen) atoms and initiate pathways of 2 or more reactions, with 629 pathway branches in total

• The present system can duplicate at least one known biodegradation pathway for 115 of these compounds
• It can predict 503 (80%) of the pathway branches
• Most predicted pathways that did not completely duplicate known metabolism were plausible
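The evaluation percentages follow directly from the counts on this slide. A quick Python check, with variable names chosen here for illustration:

```python
# Counts reported on the evaluation slide.
compounds_tested = 117
compounds_reproduced = 115  # at least one known pathway duplicated
branches_total = 629
branches_predicted = 503

compound_rate = compounds_reproduced / compounds_tested
branch_rate = branches_predicted / branches_total

print(f"compounds: {compound_rate:.1%}")  # 98.3%
print(f"branches:  {branch_rate:.1%}")    # 80.0%
```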


Complexity: 100 g of soil

• 100,000,000,000 bacteria
• 100,000 natural chemicals
• 10,000 bacterial species
• 1 new chemical

Guidance for PPS Users

PPS Aerobic Likelihood

Standard conditions:
• Aerobic
• Soil (moderate moisture) or water
• 25 °C
• No other toxic or competing chemicals

Two or more experts prioritize rules as: Very Likely, Likely, Neutral, Unlikely, Very Unlikely, Unknown
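Expert likelihood ratings give a natural way to rank or prune predicted transformations. The sketch below assumes a simple policy that the slide does not state: keep rules rated at or above a chosen threshold, and keep unrated ("Unknown") rules unless told otherwise. The rule IDs and ratings are hypothetical placeholders, not UM-BBD data.

```python
from enum import IntEnum
from typing import Optional


class Likelihood(IntEnum):
    """Aerobic likelihood categories, ordered from least to most likely.

    "Unknown" is not part of this ordering; it is represented as None below.
    """
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    NEUTRAL = 3
    LIKELY = 4
    VERY_LIKELY = 5


# Hypothetical ratings for placeholder rule IDs (None means "Unknown").
RATINGS: dict[str, Optional[Likelihood]] = {
    "rule_A": Likelihood.VERY_LIKELY,
    "rule_B": Likelihood.LIKELY,
    "rule_C": Likelihood.NEUTRAL,
    "rule_D": Likelihood.UNLIKELY,
    "rule_E": None,
}


def prune(rule_ids, min_level=Likelihood.NEUTRAL, keep_unknown=True):
    """Keep rules rated at least min_level; unrated rules follow keep_unknown.

    Rule IDs missing from RATINGS are treated as "Unknown".
    """
    kept = []
    for rule_id in rule_ids:
        rating = RATINGS.get(rule_id)
        if rating is None:
            if keep_unknown:
                kept.append(rule_id)
        elif rating >= min_level:
            kept.append(rule_id)
    return kept
```

Raising `min_level` to `Likelihood.LIKELY` and setting `keep_unknown=False` yields a more conservative prediction that follows only the rules experts consider likely under the standard conditions.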


Aerobic Likelihood Added

Future Work

Rule development: 250 rules, and growing
New pathways = new rules

Collaboration with LHASA/METEOR

REACTOR GUI for rule generation


People

Top: Philip Judson, Larry Wackett, Yogesh Kale, Dave Roe, Jack Richman
Bottom: Lynda Ellis, Carla Essenberg
Not shown: John Carlis, John Schrom

Acknowledgements/Reference

US Department of Energy, DOE DE-FG02-01ER63268

LHASA, Ltd. (METEOR): http://www.lhasalimited.org/

Ellis, L.B.M., Roe, D., Wackett, L.P. (2006) “UM-BBD: The First Decade” Nucleic Acids Research, in press.


Seeking the Vertebrate Secretome

Secreted Proteins

• Proteins synthesized within the cell and actively transported out of the cell
• Have an active role in cell-cell interactions
• Make up ≈10% of the proteome
• For two decades, attractive targets for computational identification


Signal-Mediated Cotranslational Translocation

Robert J. Huskey, University of Virginia

http://www.people.virginia.edu/~rjh9u/secrprotsyn.html

Signal Peptide Structure

(Figure: schematic of signal peptide structure, starting at the initial methionine, M)


Just compute and solve?

Expressed Sequence Tags (ESTs)

• Short coding nucleotide sequence fragments from cDNA clones
• Expressed sequences
• High throughput; sequencing coverage mostly 1x
• Estimated 2% error rate
• Sequences are often incomplete

ESTs often start at the 3’ end and are 5’-truncated; the 5’ end translates to the N-terminus of the protein

EST consensus sequences combine multiple overlapping ESTs, increasing sequence quality
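Building a consensus from overlapping ESTs starts by finding reads whose ends overlap. The sketch below is a minimal greedy suffix-prefix merge in Python, offered only as an illustration: it assumes error-free reads, ignores reads contained in other reads, and does not model the ~2% error rate noted above, all of which real EST clustering and assembly must handle. The function names and the `min_len` cutoff are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_consensus(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap.

    Returns the resulting contigs; reads that overlap nothing stay separate.
    """
    contigs = list(reads)
    while True:
        best = (0, -1, -1)  # (overlap length, left index, right index)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
```

With the reads `"ATGGCC"`, `"GCCTTA"` and `"TTAGGA"`, each adjacent pair overlaps by three bases, and the loop merges them into the single contig `"ATGGCCTTAGGA"`.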


N-terminal truncation and SP prediction

Signal peptide prediction programs assume N-terminally complete sequences
• N-terminal transmembrane domains: false positives

Problems with N-terminal truncation
• Signal peptides may be truncated: more false negatives
• Internal transmembrane domains may appear to be at the N-terminus: more false positives

Signal peptide prediction programs are less accurate when used on EST databases
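Why truncation causes false negatives can be seen with a toy detector that looks for a hydrophobic stretch near the N-terminus, a hallmark of signal peptides. The sketch uses the Kyte-Doolittle hydropathy scale; the window size, search length, threshold, and test sequence are illustrative assumptions, and this is not how TargetP or other trained predictors work. Such a check would also flag N-terminal transmembrane domains, the false-positive problem listed above.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def has_hydrophobic_core(protein: str, search_len: int = 30, window: int = 8,
                         threshold: float = 2.0) -> bool:
    """Toy check: does any window within the first search_len residues
    have a mean hydropathy >= threshold? Parameter values are guesses."""
    region = protein[:search_len]
    for start in range(len(region) - window + 1):
        segment = region[start:start + window]
        if sum(KD[aa] for aa in segment) / window >= threshold:
            return True
    return False


# Made-up sequence: a hydrophobic, signal-like N-terminus followed by a polar segment.
signal_like = "MKKLLLLLLALLAAGSA" + "DESKNRQEDSKTPEGDNKSE"
print(has_hydrophobic_core(signal_like))       # True
print(has_hydrophobic_core(signal_like[17:]))  # False
```

Dropping the first 17 residues, as an N-terminally truncated EST might, removes the hydrophobic stretch, so the check misses a protein that is in fact secreted.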

Objectives

Identify secreted proteins in vertebrate Expressed Sequence Tag databases

Supply targets supporting embryonic growth and development studies in zebrafish, cattle and swine


Methods

Reference secretome construction: secreted proteins identified by localization prediction software

Homology modeling: comparison of query ESTs to reference secretomes

Signal peptide prediction: query sequences and their reference homologs

Alignment analysis

Study-specific filtering
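Read as successive filters, these methods reduce a set of query ESTs to candidates that pass every criterion. The sketch below assumes the result of each step (homology search, start-site check, N-terminal alignment, signal peptide prediction) has already been computed by external tools and stored as booleans; the `Candidate` fields and `summarize` are hypothetical, and study-specific filtering is omitted.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One query EST with the outcome of each upstream analysis (hypothetical fields)."""
    est_id: str
    has_homolog: bool      # hit in a reference secretome
    starts_with_atg: bool  # translation start site found at the 5' end
    n_term_aligned: bool   # query N-terminus aligns with the homolog's N-terminus
    signal_peptide: bool   # signal peptide predicted for query and homolog


def summarize(candidates: list[Candidate]) -> dict[str, int]:
    """Count homologous candidates passing each criterion, and all three together."""
    homologs = [c for c in candidates if c.has_homolog]
    return {
        "homologs": len(homologs),
        "ATG": sum(c.starts_with_atg for c in homologs),
        "aligned": sum(c.n_term_aligned for c in homologs),
        "signal peptide": sum(c.signal_peptide for c in homologs),
        "ATG, aligned, signal peptide": sum(
            c.starts_with_atg and c.n_term_aligned and c.signal_peptide
            for c in homologs
        ),
    }
```

Counting each criterion separately, as well as their conjunction, shows which step removes the most candidates.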

Materials

Reference proteomes: obtained from NCBI, IPI, DOE JGI, and PEDANT (H. sapiens, Fugu, D. melanogaster)

TargetP: subcellular localization prediction
• Predictions based on N-terminal sequence
• Predicts secretory pathway signal peptides


Vertebrate Secretome Database
• MySQL relational database
• Interfaces with analysis software
• Allows complex queries

www.secretomes.umn.edu

Partial Porcine Secretome

Reference secreted set   Homologs   ATG    Aligned   Signal peptide   ATG, aligned, signal peptide
H. sapiens IPI           2792       2442   1007      522              294
H. sapiens RefSeq        2379       2077   820       450              228
H. sapiens GenScan       2646       2291   888       462              247
T. rubripes              2121       1824   570       333              148
Non-redundant            3934       3422   1487      626              352
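To compare reference sets, one can compute what fraction of homologous sequences met all three criteria (ATG, aligned, signal peptide). A short Python calculation using the counts in the table; the dictionary layout and `pass_rates` are illustrative choices.

```python
# Counts from the table: (homologs, ATG + aligned + signal peptide).
COUNTS = {
    "H. sapiens IPI": (2792, 294),
    "H. sapiens RefSeq": (2379, 228),
    "H. sapiens GenScan": (2646, 247),
    "T. rubripes": (2121, 148),
    "Non-redundant": (3934, 352),
}


def pass_rates(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Percentage of homologous sequences that meet all three criteria."""
    return {name: 100 * passed / homologs for name, (homologs, passed) in counts.items()}


for name, pct in pass_rates(COUNTS).items():
    print(f"{name:<20} {pct:.1f}%")
```

The rates range from about 7.0% (T. rubripes) to about 10.5% (H. sapiens IPI), with 8.9% for the non-redundant set.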


Validation Targets

352 putative secreted porcine sequences with complete N-termini
190 had N-terminal MARC clones
46 of these chosen for validation

Enriched with sequences of unknown function

Validation Summary

46 sequences selected for CTT validation
40 could be translated
34 of these (85%) were secreted
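The 85% figure is secreted sequences out of those that could be translated; the same count can also be expressed relative to all 46 sequences selected. Variable names are illustrative.

```python
selected = 46    # sequences selected for CTT validation
translated = 40  # sequences that could be translated
secreted = 34    # translated sequences shown to be secreted

print(f"secreted / translated: {secreted / translated:.0%}")  # 85%
print(f"secreted / selected:   {secreted / selected:.0%}")    # 74%
```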


Future Directions

• Improve translation start site identification
• Add reference secretomes (protochordates?)
• Differentiate between secreted receptors and ligands
• Improve homology selection criteria
• Explore secretion prediction without using signal sequences

Acknowledgements

Eric Klee (now Mayo Clinic)
Liz Saftalov (2003 BSI intern)
Kyong-Jin Shim
Stephen Ekker

Michael Pickart (now UW-Stout)Michelle Knowlton

Scott Fahrenkrug
Dan Carlson

NIH R01-GM63904, NLM TG-0704l, NIH T32-DE07288-07, NSF/NIBIB EEC-0234112


References

Klee EW, Carlson DF, Fahrenkrug SC, Ekker SC, Ellis LB. (2004) “Identifying secretomes in people, pufferfish and pigs.” Nucleic Acids Res. 32: 1414-21.

Klee EW, Ellis LB. (2005) “Evaluating eukaryotic secreted protein prediction.” BMC Bioinformatics 6: 256.