Annotation of the Kytococcus sedentarius Genome from DNA...



www.buffalo.edu

Abstract

Introduction

Results

References

Conclusion

Annotation of the Kytococcus sedentarius Genome from DNA Coordinates 364717 to 370640

Nidhish Gokhale¹, Par Iang², Jessica Phillips³ and Sangeeta Gokhale

¹Williamsville East High School, ²McKinley High School, ³East Aurora High School (participating in the BEAM - Buffalo Engineering Awareness for Minorities Program) and the Western New York Genetics in Research Partnership

Kytococcus sedentarius is the only bacterium known to us that produces the antibiotics monensin A and B, which are used in TMR (total mixed ration) to increase milk production efficiency in dairy cows (Whitman et al., 2012). However, Kytococcus is an opportunistic pathogen. It has been isolated from varying environments, including human skin, groundwater, and even airline cabins. Kytococcus sedentarius can cause dermatological infections such as pitted keratolysis (a foot infection). It is a Gram-positive organism that grows as spherical/coccoid cells in tetrads, which can be arranged in cubical packets. It is non-encapsulated and does not form endospores. It is strictly aerobic and chemoorganotrophic, requires methionine and other amino acids for growth, grows well in NaCl, and grows optimally at 25-37°C.

More studies need to be done on the commercial uses of the antibiotics and enzymes made by Kytococcus. For example, the antibiotics monensin A and B can be used in fodder, and the serine protease enzymes may be of commercial use in the biodegradation of a range of keratin polymers, in biological washing powders, and in the treatment of unwanted callus on human skin (Longshaw et al., 2002).

The goal of this study is to examine the genome of Kytococcus sedentarius, to compare the gene products proposed by GenBank with the data given by the genome database, and to compare the GenBank results with amino acid sequences found in other common organisms and bacteria, using programs such as BLAST, CDD, T-Coffee, WebLogo, phylogenetic trees, KEGG, and MetaCyc pathways. The gene products and sequences were also studied for the possibility of duplication, horizontal gene transfer, or being a pseudogene. This project used crowdsourcing to determine protein structure and function. Crowdsourcing compares computer-generated results with human-based input, which allows for more collaboration and fast-paced evidence gathering.

Modules of GENI-ACT (http://www.geni-act.org/) were used to complete the Kytococcus sedentarius genome annotation, as described below.

Kytococcus sedentarius 03790: The top BLAST hits were Escherichia coli O157 and Serinicoccus profundi, and the COG names were Maltose binding periplasmic protein MalE [Carbohydrate transport and metabolism] and PotD: Spermidine/putrescine-binding periplasmic protein [Amino acid transport and metabolism]. The only Pfam name found was SBP_bac_8. Two accepted names, provided by BLAST, are maltose binding periplasmic protein and ABC transporter substrate-binding protein [Serinicoccus profundi]. Taken together, this evidence leads to the conclusion that the sequence codes for a carbohydrate (specifically maltose) binding periplasmic protein.

According to the phylogenetic tree, Kytococcus sedentarius appears to have evolved independently from other bacteria; thus, this bacterium is diverging from the other species. Its closest relative is Serinicoccus marinus. It is likely that the maltose binding ATPase periplasmic protein of Kytococcus sedentarius has homologs in other organisms. Other species such as Ornithinimicrobium pekingense and Serinicoccus profundi seem to be closely related, as they have evolved from similar pathways.

When the raw translation of Ksed_03790 was aligned with the amino acid sequence for this gene in GenBank, no frameshift was observed in the sequence, as shown in the alignment. These findings suggest that Ksed_03790 is most likely not a pseudogene, since the sequence is an identical, perfect match and does not have any other start and stop codon with its Shine-Dalgarno sequence.

In the SignalP results, the probability of a signal peptide is 0.616, which is above the 0.45 cutoff. This indicates that a signal peptide is likely present and that the protein may be located in the extracellular/non-cytoplasmic region.
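The frameshift check described for Ksed_03790 can be sketched in Python. This is only an illustration under stated assumptions, not the GENI-ACT module itself: the function names `translate` and `matches_deposited` are hypothetical, the short sequences are placeholders rather than the real Ksed_03790 data, and the standard codon table is applied to every codon (so an alternative bacterial start codon such as GTG would be read as valine, not methionine).

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (the third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA coding sequence in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)


def matches_deposited(dna, deposited_protein):
    """Return True when the raw translation is identical to the deposited protein."""
    return translate(dna) == deposited_protein


if __name__ == "__main__":
    # Placeholder sequences (hypothetical, not the real gene).
    print(matches_deposited("ATGAAATGA", "MK"))    # in-frame copy
    print(matches_deposited("ATGAAAATGA", "MK"))   # one inserted base shifts the frame
```

An identical match between the raw translation and the deposited protein is the kind of evidence used above to argue against a frameshift; a single inserted or deleted base changes every downstream codon, so the comparison fails.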

Fig 4: The Ksed_03790 gene neighborhood looks similar to the neighborhoods of related and unrelated bacteria.

Annotation is a part of genome analysis that can be done by computational means before a genome sequence is deposited in GenBank (Koonin et al., 2003). GENI-ACT (www.geni-science.org/), an annotation collaboration tool that provides access to genomes, bioinformatics, and project design resources to facilitate genomics research, was used to annotate 5 consecutive genes of Kytococcus sedentarius (Ksed_03790-Ksed_03830). The GenBank proposed gene product name for each gene was assessed. The following steps were taken: assessment of amino acid sequence similarity data, which accounts for the redundancy in the genetic code; structure-based evidence from the amino acid sequence, to find similarity in the structures of functional domains; cellular localization data, to determine where in the cell the protein is located; potential alternative open reading frames; and the possibility of horizontal gene transfer. The data obtained manually matched the computer's data for the corresponding gene product names.

Figure 5 - Ksed_03790: The SignalP Gram-positive prediction graph shows that the cleavage site is between positions 28 and 29.
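The SignalP reading for Ksed_03790 can be expressed as a small sketch. The 0.616 score, the 0.45 cutoff, and the cleavage site between positions 28 and 29 come from the poster; the function names and the placeholder protein sequence are hypothetical and serve only to illustrate the logic.

```python
SIGNALP_CUTOFF = 0.45  # cutoff quoted in the poster


def has_signal_peptide(score, cutoff=SIGNALP_CUTOFF):
    """A signal peptide is predicted when the score exceeds the cutoff."""
    return score > cutoff


def split_at_cleavage(protein, last_signal_pos):
    """Split a protein after the 1-based position `last_signal_pos`.

    Returns (signal_peptide, mature_protein).
    """
    if not 0 < last_signal_pos < len(protein):
        raise ValueError("cleavage position must fall inside the protein")
    return protein[:last_signal_pos], protein[last_signal_pos:]


if __name__ == "__main__":
    # Placeholder sequence (hypothetical): 28 residues, then the mature part.
    placeholder = "M" + "A" * 27 + "CDEFG"
    if has_signal_peptide(0.616):
        signal, mature = split_at_cleavage(placeholder, 28)
        print(len(signal), mature)
```

A cleavage site between positions 28 and 29 means the first 28 residues form the signal peptide and the mature protein begins at residue 29.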

The GENI-ACT proposed gene product did not differ significantly from the proposed gene annotation for each of the genes in the group; therefore, the genes appear to be correctly annotated by the computer database.

Figure 6 - Ksed_03790: The phylogenetic tree shows that Kytococcus sedentarius has evolved independently due to some change at point 0.85; its closest relative is Serinicoccus marinus. The Kytococcus sedentarius protein may have homologs in other organisms, such as Ornithinimicrobium pekingense and Serinicoccus profundi, which may have evolved from similar pathways.

Kytococcus sedentarius 03800: The GenBank proposed gene product name is permease component of ABC-type sugar transporter. The top BLAST hits suggest ABC-type sugar transporter proteins. The top COG hits are sugar transporter permease components, confirming that the protein product is a sugar transporter protein.

Kytococcus sedentarius 03810: BLAST top hit: Maltose transport system permease protein MalG. COG top hits: ABC-type maltose transport system, permease, and ABC-type glycerol-3-phosphate transport system, permease component [Carbohydrate transport and metabolism]. TIGRFAM name: BPD_transp_1: ABC transporter, permease. Pfam top hit: Binding-protein-dependent transport system inner membrane component. This leads to the conclusion that the protein is a transporter membrane protein.

Kytococcus sedentarius 03820: BLAST top hit: HTH-type transcriptional regulator MalR; AltName: Full=Maltose operon transcriptional repressor. Second hit: Transcriptional regulator. TIGRFAM name: sccpA catabolite control protein A. Pfam top hit name: Bacterial regulatory proteins, lacI family. Pfam second hit: Periplasmic binding protein-like domain. All of the above suggests that the protein is a transcriptional regulator.

Kytococcus sedentarius 03830: The initial product proposed for this gene by GENI-ACT was a trehalose-phosphatase. This proposal was supported by the top BLAST hits for the amino acid sequence, the presence of well-curated protein functional domains within the amino acid sequence, the cellular location of the amino acid sequence, and the enzymatic function of the amino acid sequence. Therefore, the proposed annotation is trehalose-phosphatase.

Bergey's Manual of Systematic Bacteriology: Volume 5: The Actinobacteria. Whitman et al. (2012)

Kytococcus sedentarius, the organism associated with pitted keratolysis, produces two keratin-degrading enzymes. Longshaw et al. (2002)

GENI-ACT: Guiding Education through Novel Investigation, Developing Next-generation Academic Scientists. http://www.geni-science.org/

Genome annotation: data flow and performance, 5.1.1. Sequence - Evolution - Function: Computational Approaches in Comparative Genomics. Koonin et al. (2003)

GENI-ACT, http://www.geni-act.org/

Acknowledgments

Supported by NSF ITEST Strategies Award Number 1311902. Special thanks to Dr. Rama Dey-Rao and Dr. Stephen Koury (WNYGiRP).

Figure 7 - Data from the Phobius prediction graph indicate an extremely low probability of the protein appearing in the cytoplasmic region and a high probability of it appearing in the non-cytoplasmic region.

Figure 3 - The Kytococcus sedentarius 03830 neighborhood is delineated in both more closely and more distantly related microorganisms.

Figures 1 and 2 - Thermotoga maritima maltotriose binding protein bound with maltotriose, protein ribbon diagram (right), and Kytococcus sedentarius tetrad

Figure 8 - Eight transmembrane helices in Ksed_03800

Figure 9 - HMM logo of Ksed_03830

Figure 10 - Pairwise alignment of Ksed_03830

| Gene Locus | GENI-ACT gene product | Proposed Annotation |
|---|---|---|
| 03790 | Maltose/maltodextrin ABC transporter, substrate binding periplasmic protein MalE | Maltose-binding periplasmic protein MalE |
| 03800 | Carbohydrate ABC transporter membrane protein 1 | Permease component of ABC-type sugar transporter |
| 03810 | Carbohydrate ABC transporter membrane protein 2 | Maltose transport system permease protein |
| 03820 | Transcriptional regulator, LacI family | HTH-type transcriptional regulator MalR |
| 03830 | Trehalose 6 phosphatase | Trehalose-phosphatase |