Genomic sequence analysis tools and a genotype-phenotype … · 2020. 4. 24. · Dengue Virus...
Genomic sequence analysis tools and a genotype-phenotype association platform in the Virus Pathogen Resource
Yun Zhang1, Brett Pickett1, Eva Rab1, Jyothi Noronha1, R. Burke Squires1, Victoria Hunt1, Mengya Liu2, Liwei Zhou3, Chris Larson4, Jonathan Dietrich3, Edward B. Klem3, Richard H. Scheuermann1,5
1Department of Pathology, 5Division of Biomedical Informatics, Univ. of Texas Southwestern Medical Center, Dallas, TX; 2Southern Methodist Univ., Dallas, TX; 3Northrop Grumman Health Solutions, Rockville, MD; 4Vecna Technologies, Greenbelt, MD.
Introduction
Figure 2: A screenshot of the Sequence Feature Details page. The details page displays strain information, Sequence Feature information, available 3D protein structures, and a table containing all Variant Types for the selected Sequence Feature.
References
1. UniProt Consortium (2011) Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Research, 39, D214-219.
2. Sayers, E.W., Barrett, T., Benson, D.A., Bolton, E., Bryant, S.H., Canese, K., Chetvernin, V., Church, D.M., DiCuccio, M., Federhen, S. et al. (2011) Database resources of the National Center for Biotechnology Information. Nucleic Acids Research, 39, D38-51.
3. Vita, R., Zarebski, L., Greenbaum, J.A., Emami, H., Hoof, I., Salimi, N., Damle, R., Sette, A. and Peters, B. (2010) The immune epitope database 2.0. Nucleic Acids Research, 38, D854-862.
4. Edgar, R.C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, 5, 113.
5. Waterhouse, A.M., Procter, J.B., Martin, D.M., Clamp, M. and Barton, G.J. (2009) Jalview Version 2--a multiple sequence alignment editor and analysis workbench. Bioinformatics, 25, 1189-1191.
6. Zmasek, C.M. and Eddy, S.R. (2001) ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics, 17, 383-384.
7. Hanson, R. (2010) Jmol - a paradigm shift in crystallographic visualization. Journal of Applied Crystallography, 43, 1250-1260.
Acknowledgements
We would like to thank Elliot Lefkowitz, Carla Kuiken, Bernard Moss, and R. Pad Padmanabhan for reviewing and validating the SFVT definitions, as well as the primary data providers for the sequence data that were used throughout this study. We also recognize the scientific and technical personnel responsible for supporting and developing ViPR, which has been wholly supported with federal funds from the NIAID, NIH, Department of Health and Human Services (N01AI2008038 to R.H.S.).
Figure 4: 3D Protein Structure Viewer in the Virus Pathogen Database and Analysis Resource (ViPR). An example sequence feature is highlighted on the 3D structure of the Hepatitis C virus NS5B protein (PDB ID: 1CSJ).
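Highlighting a sequence feature on a structure requires translating the feature's positions in a reference protein into residue numbers in the PDB chain, which often starts at a different residue or has unresolved segments. The poster does not describe how ViPR performs this step. The following is a minimal sketch under stated assumptions: a gapped pairwise alignment between the reference protein and the chain sequence is already available, and the chain is numbered contiguously from a hypothetical `struct_start` (real PDB entries can have numbering jumps and insertion codes). The function name `map_positions` is illustrative only.

```python
def map_positions(ref_aligned, struct_aligned, ref_positions, struct_start=1):
    """Map 1-based reference-protein positions onto structure residue numbers.

    ref_aligned / struct_aligned: the two sequences as a gapped pairwise
        alignment of equal length ('-' marks a gap).
    ref_positions: 1-based positions of the sequence feature in the
        ungapped reference protein.
    struct_start: residue number of the first residue in the chain
        (assumes contiguous numbering; hypothetical simplification).
    Returns dict ref_position -> structure residue number, or None when the
    structure has no residue aligned to that position.
    """
    if len(ref_aligned) != len(struct_aligned):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    ref_i = 0
    struct_i = struct_start - 1
    for a, b in zip(ref_aligned, struct_aligned):
        # Advance each coordinate only when that sequence has a residue here.
        if a != "-":
            ref_i += 1
        if b != "-":
            struct_i += 1
        if a != "-":
            mapping[ref_i] = struct_i if b != "-" else None
    return {p: mapping.get(p) for p in ref_positions}


# Usage example with a toy alignment (hypothetical data).
print(map_positions("MKT-AYIAK", "--TGAYI-K", [3, 4, 7, 8], struct_start=10))
# {3: 10, 4: 12, 7: None, 8: 15}
```

Positions mapped to `None` fall in regions missing from the structure, so a viewer would have nothing to highlight there.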
ViPR combines the strength of a relational database with a suite of integrated bioinformatics tools to support everything from basic sequence and structural analyses to more advanced genotype-phenotype studies. ViPR is distinguished by:
• integrating data from various sources
• encouraging analysis of the comprehensive data contained within the system
• combining the available tools to quickly perform complex analytical workflows
• facilitating rapid hypothesis generation using bioinformatics methods for subsequent experimental testing
• allowing data to be stored and shared with collaborators in personal workbenches
Figure 1: A screenshot of the ViPR homepage. The ViPR homepage is the portal used to access the various types of data and advanced functionality within the system.
The Virus Pathogen Database and Analysis Resource (ViPR, www.viprbrc.org), sponsored by the National Institute of Allergy and Infectious Diseases (NIAID), serves as a single, publicly accessible repository of integrated datasets and analysis tools for 14 virus families. It supports wet-bench virology researchers developing diagnostics, prophylactics, vaccines, and treatments for these pathogens.
ViPR Supports 14 Virus Families
Arenaviridae, Bunyaviridae, Caliciviridae, Coronaviridae, Filoviridae, Flaviviridae, Hepeviridae, Herpesviridae, Paramyxoviridae, Picornaviridae, Poxviridae, Reoviridae, Rhabdoviridae, and Togaviridae.
ViPR Integrates Data from Many Sources
• GenBank sequence records, gene annotations, and strain metadata
• Gene Ontology (GO) classifications
• UniProtKB protein annotations
• Protein Data Bank (PDB) 3D protein structures
• Immune epitopes from the Immune Epitope Database (IEDB)
• Clinical data
• Additional data derived from computational algorithms
• Host-pathogen interactions (coming soon)
ViPR Provides Analysis and Visualization Tools
• Genome Annotator
• BLAST Sequence Similarity Search
• Multiple Sequence Alignment
• Phylogenetic Tree Construction
• 3D Protein Structures with Sequence Feature or Epitope Highlights
• Sequence Feature Variant Type (SFVT) Analysis
• Metadata-driven Comparative Genomics Analysis
• SNP Analysis
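SFVT analysis groups strains by the residues they carry within a defined sequence feature, and each distinct residue pattern is reported as a variant type. The poster does not spell out the procedure, so this is a minimal sketch under assumptions: a feature is given as a list of 1-based positions in a shared protein alignment, every distinct residue string at those positions (gaps included) counts as one variant type, and types are labeled `VT-1`, `VT-2`, ... in order of decreasing frequency. The function name and the labeling order are hypothetical, not ViPR's documented convention.

```python
from collections import Counter


def assign_variant_types(aligned_seqs, feature_positions):
    """Group aligned protein sequences into variant types for one sequence feature.

    aligned_seqs: dict mapping strain name -> aligned protein sequence
    feature_positions: 1-based alignment positions that define the feature
    Returns (variant_types, membership):
        variant_types: dict "VT-n" -> residue string at the feature positions
        membership: dict strain name -> "VT-n"
    """
    # Pull out the residues at the feature positions (gaps '-' are kept as residues).
    feature_strings = {}
    for strain, seq in aligned_seqs.items():
        if any(p < 1 or p > len(seq) for p in feature_positions):
            raise ValueError(f"feature position outside sequence for {strain}")
        feature_strings[strain] = "".join(seq[p - 1] for p in feature_positions)

    # Number distinct variants by decreasing frequency; most_common() keeps
    # first-encountered order for ties.
    counts = Counter(feature_strings.values())
    variant_types = {
        f"VT-{i}": variant for i, (variant, _) in enumerate(counts.most_common(), start=1)
    }
    label_of = {variant: label for label, variant in variant_types.items()}
    membership = {strain: label_of[s] for strain, s in feature_strings.items()}
    return variant_types, membership


# Usage example with toy sequences (hypothetical data).
seqs = {"A": "MKTAYIAK", "B": "MKTGYIAK", "C": "MKTAYIVK", "D": "MKTGYIAK"}
variant_types, membership = assign_variant_types(seqs, [4, 7])
print(variant_types)  # {'VT-1': 'GA', 'VT-2': 'AA', 'VT-3': 'AV'}
print(membership)     # {'A': 'VT-2', 'B': 'VT-1', 'C': 'VT-3', 'D': 'VT-1'}
```

Because the feature positions need not be contiguous, the same function covers linear epitopes and discontinuous structural sites alike.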
ViPR enables you to store and share data and results through the ViPR Workbench
Figure 3: Analytical tools available for SFVT data. (A) A multiple sequence alignment calculated with MUSCLE [4] and visualized with Jalview [5] in ViPR. (B) A metadata-driven comparative genomics analysis tool that identifies individual positions correlating with a metadata attribute. (C) A phylogenetic tree showing the relationship between HCV-1 genomes, automatically colored according to country of isolation using the Archaeopteryx [6] tool.
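Panel (B) scores each alignment position for association with a metadata attribute. The statistical test ViPR uses is not stated here, so this sketch swaps in a plain chi-square statistic on a residue-by-group contingency table, reported together with Cramér's V (0 = no association, 1 = residue perfectly predicts group). The function name `column_association` and the choice of statistic are assumptions for illustration; a real analysis would also compute p-values (for example with Fisher's exact test on small samples) and correct for testing many columns.

```python
import math
from collections import Counter


def column_association(aligned_seqs, metadata):
    """Score each alignment column by how strongly its residues track a metadata attribute.

    aligned_seqs: dict strain -> aligned sequence (all the same length)
    metadata: dict strain -> attribute value (e.g. host, country, disease severity)
    Returns a list of (1-based column, chi-square statistic, Cramer's V),
    sorted by Cramer's V, highest first. Invariant columns are skipped.
    """
    strains = [s for s in aligned_seqs if s in metadata]
    if not strains:
        return []
    n = len(strains)
    length = len(aligned_seqs[strains[0]])
    results = []
    for col in range(length):
        pairs = Counter((aligned_seqs[s][col], metadata[s]) for s in strains)
        residue_totals = Counter(aligned_seqs[s][col] for s in strains)
        group_totals = Counter(metadata[s] for s in strains)
        k = min(len(residue_totals), len(group_totals))
        if k < 2:
            continue  # a single residue or a single group: nothing to associate
        chi2 = 0.0
        for residue, r_tot in residue_totals.items():
            for group, g_tot in group_totals.items():
                expected = r_tot * g_tot / n
                chi2 += (pairs[(residue, group)] - expected) ** 2 / expected
        cramers_v = math.sqrt(chi2 / (n * (k - 1)))
        results.append((col + 1, chi2, cramers_v))
    results.sort(key=lambda item: item[2], reverse=True)
    return results


# Usage example with toy data (hypothetical sequences and host labels).
seqs = {"s1": "ACGT", "s2": "ACGA", "s3": "TCGT", "s4": "TCGA"}
hosts = {"s1": "human", "s2": "human", "s3": "mosquito", "s4": "mosquito"}
print(column_association(seqs, hosts))  # [(1, 4.0, 1.0), (4, 0.0, 0.0)]
```

Column 1 separates the two hosts perfectly, while column 4 varies independently of host, which is the contrast the panel (B) tool is meant to surface.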
[Figure 1 image: ViPR homepage screenshot showing the Search, Analyze, and Save to Workbench panels; Genome Statistics for Virus Families (14 families, 70 genera, 912 species, 50,196 strains, 64,367 segments; data summary updated April 26, 2011); Featured Viruses (Dengue, Hepatitis C virus); an announcement of clinical data for ~2600 Dengue virus isolates; the Virus Taxonomy Browser; and ViPR Workbench highlights.]