![Page 1: Manual Annotation of Features in DNA sequences - Genome …genomecuration.github.io/genometrain/c-structural... · 2015-05-23 · Genome Project (part of the FlyBase consortium) and](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f020f2e7e708231d4025f93/html5/thumbnails/1.jpg)
Manual Annotation of Features in DNA sequences
Eukaryotic Genome Annotation and Analysis Course
Overview
• What is manual annotation?
• Why do manual annotation?
• Tools for manual annotation: GBrowse, Apollo, Artemis
• Critical evaluation of evidence
What is manual annotation?
• Automated gene structure annotations are often imperfect and can benefit from manual refinement.
• A manual annotator uses different tools to view and curate the gene structure.
Why do manual annotation?
• Finding the genes is a primary goal of genome sequencing, and the gene set is the foundation for further studies. Accuracy is essential!
• Scientists can make informed decisions to correct faulty automated annotations.
• Model organisms have to be annotated to gold standards.
Manual annotation tools
• Annotation tools allow you to visualize the location of features with respect to each other.
• They can also display evidence alignments and other information relevant to the process of annotation.
• Annotation editors allow the user to make changes in the annotation of a sequence and save these changes to a database.
• This talk will mainly focus on some open source annotation viewers and editors; however, many commercial solutions are also available.
Open source annotation tools/viewers
GBROWSE
APOLLO
ARTEMIS
GBrowse
The Generic Genome Browser (also referred to as GBrowse) is a combination of database and interactive web page for manipulating and displaying annotations on genomes. It is similar to the genome browsers used at UCSC and Ensembl.
http://www.gmod.org/gbrowse
Hands on this afternoon
Important Features of GBrowse
• Web-based annotation viewer
• Simultaneous bird's-eye and detailed views of the genome
• Scroll, zoom and center functions
• Attach arbitrary URLs to any annotation
• Order and appearance of tracks are customizable by administrator and end-user
• Search by annotation ID, name, or comment
• Supports third-party annotation using GFF format
• Settings persist across sessions
• Allows DNA and GFF dumps
• DNA sequence can be loaded into the database and displayed in a track along with the annotation
• Annotation provided by the user can also be displayed in a track. The annotation data can be in a file on a remote server.
It is mainly used to view and critically evaluate the available evidence.
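Because GBrowse takes third-party annotation in GFF format, it can help to see what such a file holds. The sketch below reads GFF3-style lines (nine tab-separated columns) with the Python standard library only. The `Feature` tuple, the `parse_gff3` function and the sample records are hypothetical names and data made up for illustration; they are not part of GBrowse.

```python
from typing import NamedTuple
from urllib.parse import unquote


class Feature(NamedTuple):
    seqid: str
    source: str
    kind: str        # feature type column, e.g. gene, mRNA, exon
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    strand: str
    attributes: dict


def parse_gff3(text):
    """Yield a Feature for each nine-column GFF3 line in `text`."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 columns, got {len(cols)}: {line!r}")
        attrs = {}
        if cols[8] != ".":
            for pair in cols[8].split(";"):
                if pair:
                    key, _, value = pair.partition("=")
                    attrs[key] = unquote(value)  # GFF3 values are URL-escaped
        yield Feature(cols[0], cols[1], cols[2],
                      int(cols[3]), int(cols[4]), cols[6], attrs)


# Hypothetical example records (not real data).
EXAMPLE = "\n".join([
    "##gff-version 3",
    "chr1\tcurator\tgene\t1000\t5000\t.\t+\t.\tID=gene1;Name=demo",
    "chr1\tcurator\tmRNA\t1000\t5000\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\tcurator\texon\t1000\t1500\t.\t+\t.\tParent=mrna1",
])

features = list(parse_gff3(EXAMPLE))
for f in features:
    print(f.kind, f.start, f.end, f.attributes.get("ID", "-"))
```

A real loader would also handle multi-valued attributes and embedded FASTA sections; this sketch leaves those out to keep the column layout visible.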
GBrowse needs:
• Data files: DNA and feature files to load into the local database, e.g. GFF files
• Conf files: GBrowse configuration files for you to take and modify
GBrowse: an example (100 kbp)
Configure Tracks
Detailed information about a gene
Clickable links
Load and compare annotations
Visually compare annotations by different groups
Repeat region information
Highlight Regions
Highlight Features
Display options
Links to Manatee
Rama and Linda will talk about Manatee tomorrow, and Mathangi will demo Manatee.
APOLLO
http://www.fruitfly.org/annot/apollo/
About Apollo
• Apollo is a genome annotation viewer and editor.
• It was developed as a collaboration between the Berkeley Drosophila Genome Project (part of the FlyBase consortium) and The Sanger Institute in Cambridge, UK.
• Apollo allows researchers to explore genomic annotations at many levels of detail, and to perform expert annotation curation, all in a graphical environment.
• The Generic Model Organism Database (GMOD) project, which aims to provide a complete ready-to-use toolkit for analyzing whole genomes, has adopted Apollo as its annotation workbench.
• Apollo is a Java application that can be downloaded and run on Windows, Mac OS X, or any Unix-type system (including Linux).
Important features of Apollo
• Apollo provides 'semantic zooming'
• You can drag and drop features from the evidence panel
• Simple interface for editing gene models (merging, splitting, deleting, adding exons/transcripts, setting the 5' or 3' end, or calculating the longest open reading frame)
• By default it can display both strands at the same time
• Sequence-level adjustments are possible with Apollo's exon editor
• Apollo allows you to tag results with a comment
• Apollo automatically creates alternatively spliced transcripts for a gene whenever the open reading frames (ORFs) of transcripts overlap
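To make "calculate longest open reading frame" concrete, here is a minimal sketch of the idea on a single spliced transcript sequence. It is not Apollo's implementation: the function name `longest_orf` is hypothetical, only the given strand is scanned, and the standard stop codons (TAA, TAG, TGA) with an ATG start are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF on the given strand.

    Coordinates are 0-based and end-exclusive, and the span includes the
    stop codon. Returns None when no complete ORF is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        orf_start = None
        # Walk this reading frame one codon at a time.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == "ATG":
                orf_start = i  # first start codon opens the ORF
            elif orf_start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - orf_start) > (best[1] - best[0]):
                    best = (orf_start, i + 3)
                orf_start = None  # look for the next ORF in this frame
    return best


# Made-up transcript sequence for illustration.
transcript = "CCATGAAATTTGGGTAACC"
span = longest_orf(transcript)
print(span, transcript[span[0]:span[1]])
```

An annotation editor would additionally map these transcript coordinates back through the exon structure to genomic coordinates; that step is omitted here.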
Reading Data
• Apollo can read and load data directly from a Chado database
• Apollo can also read Chado XML or GAME XML files
• It can read Ensembl GFF files
• It can read Ensembl schema databases
• It can read GenBank or EMBL format
Apollo can transparently access data across the network from remote machines, as well as reading files that reside locally.
Apollo – Main window
[Screenshot of the Apollo main window. Panel labels: Evidence, Forward Strand, Reverse Strand, Evidence. Evidence types shown: Gene Finders, Database Searches.]
Types Panel
Finding more about annotation and features
Right click on a gene
Annotation info editor
FIND Function
Exon detail editor
Sliding window
An example
Artemis
• Developed by the Sanger Institute, UK
• Reads GenBank, EMBL and GFF3 files and sequence in FASTA format
• Displays annotation derived from a feature table
• Has a selection of tools such as a G/C graph
• Allows the direct display of BLAST results
• Artemis is Java based (cross-platform), easy to install and run
• Allows editing the annotation and exporting it in GenBank, EMBL, and GFF3 format
http://www.sanger.ac.uk/Software/Artemis/
Hands on this afternoon
Sequence Viewing
[Screenshot of the Artemis window. Panel labels: GC Plot; Sequence and Feature Overview; Feature Table; DNA View and 6-Frame Translation.]
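A GC plot of the kind shown in the viewer can be approximated by computing the G+C fraction in a sliding window along the sequence. The sketch below illustrates that calculation only; `gc_windows`, the window size and the step are hypothetical choices for the example, not Artemis's defaults or code.

```python
def gc_windows(seq, window=100, step=50):
    """Return a list of (window_start, gc_fraction) for each full window.

    window_start is 0-based; trailing bases that do not fill a whole
    window are ignored.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        values.append((start, gc / window))
    return values


# Made-up sequence: an AT-rich half followed by a GC-rich half.
demo = "ATATATATGCGCGCGC"
for start, frac in gc_windows(demo, window=8, step=4):
    print(start, f"{frac:.2f}")
```

A sharp change in the plotted value, like the jump between the two halves of the demo sequence, is the sort of signal an annotator looks for when judging where coding regions begin and end.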
Annotating with Artemis
Evidence shows that this region should be annotated as a 5' UTR (hypothetically)
Annotating with Artemis
5' UTR inserted in gene model
Comparing the viewers

| | Genome Browser | Apollo | Artemis |
|---|---|---|---|
| Data source | Flat file or database | Flat file or database | Flat file |
| Sharing | Convenient via the web, possible to share databases | Users can share databases | Not designed for sharing |
| Annotating | Users may add "private" annotation | Designed for annotating sequence | Designed for annotating sequence |
Weighing the Evidence
full-length cDNA
EST
protein
gene prediction
Full-length cDNAs are the most reliable pieces of evidence, followed by EST, protein, and finally gene prediction evidence. Evidence of the highest reliability is used for structural annotation.
Critical evaluation of evidence
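The reliability order above can be written down as a simple ranking. The sketch below picks, for one locus, the evidence items of the most reliable type present. The type names, the `most_reliable` function and the example labels are hypothetical and chosen for illustration; a real curation decision also weighs alignment quality, which this does not model.

```python
# Lower rank = more reliable, following the order stated on the slide.
EVIDENCE_RANK = {
    "full_length_cdna": 0,
    "est": 1,
    "protein": 2,
    "gene_prediction": 3,
}


def most_reliable(evidence):
    """Return the items whose evidence type has the best (lowest) rank.

    `evidence` is a list of (evidence_type, label) pairs.
    """
    if not evidence:
        return []
    best = min(EVIDENCE_RANK[etype] for etype, _ in evidence)
    return [item for item in evidence if EVIDENCE_RANK[item[0]] == best]


# Made-up evidence for a single locus.
locus = [
    ("gene_prediction", "predictor_model_1"),
    ("est", "est_alignment_A"),
    ("protein", "protein_hit_B"),
    ("est", "est_alignment_C"),
]
print(most_reliable(locus))
```

With no full-length cDNA at this locus, the two EST alignments are returned and would drive the structural annotation.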
full-length cDNA Evidence
EST Evidence
When encountering EST evidence, one can examine information about the EST, or the ESTs used to make a Tentative Consensus (TC) sequence, at the Gene Indices web pages:
http://compbio.dfci.harvard.edu/tgi/
EST Evidence
Protein Evidence
Questions?