An introduction to Web Apollo for the Biomphalaria glabrata research community.

An introduction to Web Apollo. A webinar for the Biomphalaria glabrata research community. Monica Munoz-Torres, PhD | @monimunozto Berkeley Bioinformatics Open-Source Projects (BBOP) Genomics Division, Lawrence Berkeley National Laboratory 18 June, 2014 UNIVERSITY OF CALIFORNIA

description

Web Apollo is a web-based, collaborative genomic annotation editing platform. We need annotation editing tools to modify and refine the precise location and structure of the genome elements that predictive algorithms cannot yet resolve automatically. This presentation introduces how the manual annotation process takes place using Web Apollo. It is addressed to the members of the Biomphalaria glabrata research community.

Transcript of An introduction to Web Apollo for the Biomphalaria glabrata research community.

Page 1

An introduction to Web Apollo. A webinar for the Biomphalaria glabrata research community.

Monica Munoz-Torres, PhD | @monimunozto Berkeley Bioinformatics Open-Source Projects (BBOP)

Genomics Division, Lawrence Berkeley National Laboratory 18 June, 2014

UNIVERSITY OF CALIFORNIA

Page 2

Outline

1.  What is Web Apollo?

• Definition & working concept.

2.  Our Experience With Community Based Curation.

3.  The Manual Annotation Process.

4.  Becoming acquainted with Web Apollo.

Page 3

What is Web Apollo?

•  Web Apollo is a web-based, collaborative genomic annotation editing platform.

We need annotation editing tools to modify and refine the precise location and structure of the genome elements that predictive algorithms cannot yet resolve automatically.

1. What is Web Apollo?

Find more about Web Apollo at http://GenomeArchitect.org and in Genome Biol 14:R93 (2013).

Page 4

Brief history of Apollo:

a. Desktop: one person at a time editing a specific region, annotations saved in local files; slowed down collaboration.

b. Java Web Start: users saved annotations directly to a centralized database; potential issues with stale annotation data remained.

Biologists could finally visualize computational analyses and experimental evidence from genomic features and build manually-curated consensus gene structures. Apollo became a very popular, open source tool (insects, fish, mammals, birds, etc.).

Page 5

Web Apollo

•  Browser-based tool integrated with JBrowse.

•  Two new tracks: “Annotation” and “DNA Sequence”.

•  Allows for intuitive annotation creation and editing, with gestures and pull-down menus to create and modify transcript and exon structures, insert comments (CV, freeform text), etc.

•  Customizable look & feel.

•  Edits in one client are instantly pushed to all other clients: Collaborative!

Page 6

Working Concept

In the context of manual gene annotation, curation tries to find the best examples and/or eliminate most errors.

To conduct manual annotation efforts:

Gather and evaluate all available evidence using quality-control metrics to corroborate or modify automated annotation predictions.

Perform sequence similarity searches (phylogenetic framework) and use literature and public databases to:

• Predict functional assignments from experimental data.

• Distinguish orthologs from paralogs, and classify gene membership in families and networks.

2. In our experience.

Automated gene models

Evidence: cDNAs, HMM domain searches, alignments with assemblies or genes from other species.

Manual annotation & curation

Page 7

Dispersed, community-based manual gene annotation efforts.

We continuously train and support hundreds of geographically dispersed scientists from many research communities to perform biologically supported manual annotations using Web Apollo.

– Gate keepers and monitoring.

– Written tutorials.

– Training workshops and geneborees.

– Personalized user support.

Page 8

What we have learned.

By harvesting expertise from dispersed researchers who assigned functions to predicted and curated peptides, we have developed more interactive and responsive tools, as well as better visualization, editing, and analysis capabilities.

http://people.csail.mit.edu/fredo/PUBLI/Drawing/

Page 9

Collaborative Efforts Improved Automated Annotations

In many cases, automated annotations have been improved (e.g., Apis mellifera; Elsik et al., BMC Genomics 2014, 15:86).

We also learned of the challenges of newer sequencing technologies, e.g.:

– Frameshifts and indel errors

– Split genes across scaffolds

– Highly repetitive sequences

To face these challenges, we train annotators in recovering coding sequences in agreement with all available biological evidence.

Page 10

It is helpful to work together. Scientific community efforts bring together domain-specific and natural history expertise that would otherwise remain disconnected.

Breaking down large amounts of data into manageable portions and mobilizing groups of researchers to extract the most accurate representation of the biology from all available data distills invaluable knowledge from genome analysis.

Page 11

Understanding the evolution of sociality

Comparing the genomes of 7 species of ants contributed to a better understanding of the evolution and organization of insect societies at the molecular level.

Insights drawn mainly from six core aspects of ant biology:

1.  Alternative morphological castes

2.  Division of labor

3.  Chemical communication

4.  Alternative social organization

5.  Social immunity

6.  Mutualism

Libbrecht et al. Genome Biology 2013, 14:212

Atta cephalotes (above) and Harpegnathos saltator. ©alexanderwild.com

Groups of communities continue to guide our efforts.

Page 12

A little training goes a long way!

With the right tools, wet lab scientists make exceptional curators who can easily learn to maximize the generation of accurate, biologically supported gene models.

Page 13

Manual Annotation

How do we get there?

In a genome sequencing project…

Assembly, Automated Annotation, Manual annotation, Experimental validation

3. How do we get there?

Page 14

Gene Prediction

Identification of protein-coding genes, tRNAs, rRNAs, regulatory motifs, repetitive elements (masked), etc.

- Ab initio (DNA composition): Augustus, GENSCAN, geneid, fgenesh

- Homology-based: e.g., SGP2, fgenesh++

Nucleic Acids Research 2003, vol. 31, no. 13, 3738-3741

Page 15

Gene Annotation

Integration of data from prediction tools to generate a consensus set of predictions or gene models.

•  Models may be organized using:

-  automatic integration of predicted sets; e.g., GLEAN

-  packaging necessary tools into a pipeline; e.g., MAKER

•  All available biological evidence (e.g. transcriptomes) further informs the annotation process.

In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation; in such cases it is usually better to use an ab initio model to create a new annotation.

Page 16

Manual Genome Annotation

•  Identifies elements that best represent the underlying biology.

•  Eliminates elements that reflect the systemic errors of automated genome analyses.

•  Determines functional roles through comparative analysis of well-studied, phylogenetically similar genome elements using literature, databases, and the researcher’s experience.

Page 17

Curation Process is Necessary

1.  A computationally predicted consensus gene set is generated using multiple lines of evidence.

2.  Manual annotation takes place.

3.  Ideally, consensus computational predictions will be integrated with manual annotations to produce an updated Official Gene Set (OGS).

Otherwise, “incorrect and incomplete genome annotations will poison every experiment that uses them”.

- M. Yandell.

Page 18

Web Apollo

Page 19

The Sequence Selection Window

Sort

4. Becoming Acquainted with Web Apollo.

Page 20

Navigation tools: pan and zoom.

Search box: go to a scaffold or a gene model.

Grey bar of coordinates indicates location. You can also select here in order to zoom to a sub-region.

‘View’: change color by CDS, toggle strands, set highlight.

‘File’: Upload your own evidence: GFF3, BAM, BigWig, VCF*. Add combination and sequence search tracks.

‘Tools’: Use BLAT to query the genome with a protein or DNA sequence.

Available Tracks

Evidence Tracks Area

‘User-created Annotations’ Track

Login


Graphical User Interface (GUI) for editing annotations
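
Of the formats accepted through ‘File’, GFF3 is plain text and easy to inspect before uploading. As a minimal sketch in Python (not part of Web Apollo; the function name, the record class, and the scaffold and gene identifiers in the usage note are hypothetical), the following parses one GFF3 line into its nine tab-separated columns, assuming the standard GFF3 layout with 1-based, inclusive coordinates:

```python
from dataclasses import dataclass
from urllib.parse import unquote


@dataclass
class Gff3Feature:
    """One GFF3 record; coordinates are 1-based and inclusive."""
    seqid: str
    source: str
    feature_type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: dict


def parse_gff3_line(line):
    """Parse a single GFF3 line; return None for blank lines and comments."""
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, found {len(cols)}")
    # Column 9 holds semicolon-separated key=value pairs, URL-escaped.
    attributes = {}
    for pair in cols[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[unquote(key)] = unquote(value)
    return Gff3Feature(
        seqid=cols[0],
        source=cols[1],
        feature_type=cols[2],
        start=int(cols[3]),
        end=int(cols[4]),
        score=cols[5],
        strand=cols[6],
        phase=cols[7],
        attributes=attributes,
    )
```

For example, parse_gff3_line("scaffold1\tmaker\tgene\t100\t250\t.\t+\t.\tID=gene1;Name=ND5") yields a record whose start is 100 and whose attributes map ID to gene1.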

Page 21

Flags non-canonical splice sites.

Selection of features and sub-features

Edge-matching

Evidence Tracks Area

‘User-created Annotations’ Track

The editing logic in the server:

§  selects longest ORF as CDS

§  flags non-canonical splice sites
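
The two server-side rules can be illustrated with a short Python sketch. It is not Web Apollo's implementation: it assumes a forward-strand sequence, the standard stop codons, complete ORFs that begin with ATG, canonical introns that start with GT and end with AG, and 0-based half-open coordinates. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript):
    """Return (start, end) of the longest ATG..stop ORF, or None if there is none."""
    seq = transcript.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon in this open stretch
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


def noncanonical_splice_sites(genome, exons):
    """Flag introns between consecutive exons that do not follow the GT..AG rule.

    `exons` is a list of (start, end) pairs on the forward strand.
    """
    flagged = []
    ordered = sorted(exons)
    for (_, left_end), (right_start, _) in zip(ordered, ordered[1:]):
        donor = genome[left_end:left_end + 2].upper()
        acceptor = genome[right_start - 2:right_start].upper()
        if donor != "GT":
            flagged.append(("donor", left_end, donor))
        if acceptor != "AG":
            flagged.append(("acceptor", right_start - 2, acceptor))
    return flagged
```

A real editor also has to handle the reverse strand and partial ORFs at scaffold edges; those cases are left out here to keep the sketch short.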

Page 22

DNA Track

‘User-created Annotations’ Track

There are two new kinds of tracks for:

§  annotation editing

§  sequence alteration editing

Page 23

Annotations, annotation edits, and History: stored in a centralized database.

Page 24

The Information Editor

•  DBXRefs

•  PubMed IDs

•  GO terms

•  Comments

Page 25

Additional Functionality

In addition to the protein-coding gene annotation that you know and love:

•  Non-coding genes: ncRNAs, miRNAs, repeat regions, and TEs

•  Sequence alterations (less coverage = more fragmentation)

•  Visualization of stage and cell-type specific transcription data as coverage plots, heat maps, and alignments

Page 26

General Process of Curation

1.  Select a chromosomal region of interest, e.g. scaffold.

2.  Select appropriate evidence tracks.

3.  Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working.

-  If yes: select and drag the feature to the ‘User-created Annotations’ area, creating an initial gene model. If necessary use editing functions to adjust the gene model.

-  If not: let’s talk.

4.  Check your edited gene model for integrity and accuracy by comparing it with available homologs.

Always remember: when annotating gene models using Web Apollo, you are looking at a ‘frozen’ version of the genome assembly and you will not be able to modify the assembly itself.
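
Part of the integrity check in step 4 can be automated before comparing a model against homologs. A minimal sketch in Python, assuming the coding sequence has already been exported as a forward-strand DNA string and that the standard genetic code applies (mitochondrial genes, for instance, use other codon tables). The function name is hypothetical and this is not a Web Apollo feature:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_integrity_problems(cds):
    """Return a list of basic structural problems found in a coding sequence."""
    seq = cds.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if not seq.startswith("ATG"):
        problems.append("does not start with ATG")
    # Split into complete codons only; a trailing partial codon is ignored here.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("contains an internal stop codon")
    return problems
```

An empty list means only that the model is structurally plausible; agreement with homologs and with the evidence tracks still has to be judged by the curator.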

Page 27

Example: NADH dehydrogenase subunit 5

Live Demonstration using the Apis mellifera and Biomphalaria glabrata genomes.

A public Honey Bee Web Apollo Demo is available at http://genomearchitect.org/WebApolloDemo

Page 28

Arthropod-centric Thanks!

AgriPest Base, FlyBase, Hymenoptera Genome Database, VectorBase

Acromyrmex echinatior, Acyrthosiphon pisum, Apis mellifera, Atta cephalotes, Bombus terrestris, Camponotus floridanus, Helicoverpa armigera, Linepithema humile, Manduca sexta, Mayetiola destructor, Nasonia vitripennis, Pogonomyrmex barbatus, Solenopsis invicta, Tribolium castaneum… and many more!

Thank you.

Page 29

Thanks!

•  Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Web Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).

•  Elsik Lab. § University of Missouri. Christine G. Elsik (PI).

•  Ian Holmes (PI). * University of California Berkeley.

•  Arthropod genomics community: i5K Steering Committee (http://www.arthropodgenomes.org/wiki/i5K), teams at USDA/NAL, HGSC-BCM, BGI, and 1KITE (http://www.1kite.org/).

•  Web Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI, and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

•  Insect images used with permission: http://AlexanderWild.com

•  For your attention, thank you!


Web Apollo: Ed Lee, Gregg Helt, Colin Diesh §, Deepak Unni §, Rob Buels *

Gene Ontology: Chris Mungall, Seth Carbon, Heiko Dietze

BBOP

Web Apollo: http://GenomeArchitect.org

GO: http://GeneOntology.org

i5K: http://arthropodgenomes.org/wiki/i5K