Web Apollo at Genome Informatics 2014

APOLLO Collaborative Curation and Interactive Analysis of Genomes Monica Munoz-Torres, PhD | @monimunozto Suzanna Lewis, Ian Holmes, Colin Diesh, Deepak Unni, Christine Elsik. Berkeley Bioinformatics Open-Source Projects (BBOP) Genomics Division, Lawrence Berkeley National Laboratory Genome Informatics. Cambridge, UK. September, 2014

description

Precise elucidation of the many different biological features encoded in any genome requires careful examination and review by researchers, who gather and evaluate the available evidence to corroborate or modify gene predictions and other biological elements. This curation process allows them to resolve discrepancies and validate automated gene model hypotheses and alignments. It is the well-established practice for well-studied genomes such as human, mouse, zebrafish, and Drosophila, and Desktop Apollo was originally developed to meet these needs. The cost of sequencing a genome has fallen by several orders of magnitude in the last decade, and as a natural consequence more researchers are sequencing more new genomes, both within populations and across species. Because individual researchers can now readily sequence many genomes of interest, the need for a universally accessible genomic curation tool logically follows. Each new exome or genome sequenced requires visualization and curation to obtain biologically accurate sets of genomic features, even for a limited set of genes, because computational genome analysis remains an imperfect art. Additionally, unlike earlier genome projects, which had the advantage of more highly polished genomes, recent projects usually have lower coverage. Researchers therefore face additional work correcting for more frequent assembly errors and annotating genes split across multiple contigs. Genome annotation is an inherently collaborative task; researchers only very rarely work in isolation, turning to colleagues for second opinions and insights from those with expertise in particular domains and gene families. The new JavaScript-based Apollo allows researchers real-time interactivity, breaking down large amounts of data into manageable portions to mobilize groups of researchers with shared interests.
We are also focused on training the next generation of researchers by reaching out to educators to make these tools available as part of curricula via workshops and webinars, and through widely applied systems such as iPlant and DNA Subway. Here we offer details of our progress. Presentation at Genome Informatics, Session (3) on Databases, Data Mining, Visualization, Ontologies and Curation. Authors: Monica C Munoz-Torres, Suzanna E. Lewis, Ian Holmes, Colin Diesh, Deepak Unni, Christine Elsik.

Transcript of Web Apollo at Genome Informatics 2014

Page 1: Web Apollo at Genome Informatics 2014

APOLLO: Collaborative Curation and Interactive Analysis of Genomes
Monica Munoz-Torres, PhD | @monimunozto
Suzanna Lewis, Ian Holmes, Colin Diesh, Deepak Unni, Christine Elsik.
Berkeley Bioinformatics Open-Source Projects (BBOP), Genomics Division, Lawrence Berkeley National Laboratory
Genome Informatics. Cambridge, UK. September, 2014

Page 2: Web Apollo at Genome Informatics 2014

OUTLINE

• MANUAL CURATION is necessary, but does not always scale
• EMPOWER CURATORS collaborative genome annotation
• WEB APOLLO architecture, implementation, plans
• BBOP PROJECTS future plans


Page 3: Web Apollo at Genome Informatics 2014

MANUAL ANNOTATION is necessary

• Automated genome analyses remain an imperfect art.
• Precise elucidation of biological features encoded in the genome requires careful examination and review.
  • Evaluate all available evidence and corroborate or modify genome element predictions.
  • Resolve discrepancies and validate automated gene model hypotheses.
• Desktop version of Apollo was designed to fit the manual annotation needs of genome projects such as Human, Mouse, Fruit fly, Zebrafish, etc.

Schiex et al. 2003. Nucleic Acids Res. 31(13): 3738-3741

[Figure labels: Automated Predictions; Experimental Evidence]

Page 4: Web Apollo at Genome Informatics 2014

CURATION in this context

1. Identifies elements that best represent the underlying biology (including missing genes) and eliminates elements that reflect systemic errors of automated analyses.
2. Assigns function through comparative analysis of similar genome elements from closely related species using literature, databases, and researchers’ lab data.

Examples  

Comparing 7 ant genomes contributed to a better understanding of the evolution and organization of insect societies at the molecular level; e.g. division of labor, mutualism, chemical communication, etc.
Libbrecht et al. 2013. Genome Biology 14:212

Insect Methylome [Figure labels: Larva; Dnmt RNAi; Royal jelly; Queen Bee; Worker Bee; Castes]
Kucharski et al. 2008. Science 319(5871): 1827-1830

Anchoring molecular markers to the reference genome pointed to chromosomal rearrangements and helped detect signals of adaptive radiation in Heliconius butterflies.
Joron et al. 2011. Nature 477: 203-206

Page 5: Web Apollo at Genome Informatics 2014

BUT, MANUAL CURATION does not always scale

1. Museum: A small group of highly trained experts; e.g. GO.
2. Jamboree: A few very good biologists and a few very good bioinformaticians camp together, during intense but short periods of time.
3. Cottage: Researchers work by themselves, then may or may not publicize results; … may be a dead-end with very few people ever aware of these results.

Elsik et al. 2006. Genome Res. 16(11): 1329-33.

Too many sequences and not enough hands to approach curation.

Page 6: Web Apollo at Genome Informatics 2014

POWER TO THE CURATORS: augment existing tools

Fill in the gap for all the things that won’t be easy to cover with these approaches; this will allow researchers to better contribute their efforts.

Give more people the power to curate!

Big data are not a substitute for, but a supplement to traditional data collection and analysis. (The Parable of Google Flu. Lazer et al. 2014. Science 343(6176): 1203-1205.)

• Enable more curators to work
• Enable better scientific publishing
• Credit curators for their work


Page 7: Web Apollo at Genome Informatics 2014

IMPROVING TOOLS FOR MANUAL ANNOTATION: our plan

“More and more sequences”: more genomes, within populations and across species, are now being sequenced.

This calls for a universally accessible genome curation tool:
• To produce accurate sets of genomic features.
• To address the need to correct for more frequent assembly and automated prediction errors due to new sequencing technologies.

Page 8: Web Apollo at Genome Informatics 2014

GENOME ANNOTATION: an inherently collaborative task

Researchers often turn to colleagues for second opinions and insight from those with expertise in particular areas (e.g., domains, families). To facilitate and encourage this, we continue to improve Apollo. The new JavaScript-based Apollo:

• Web based for easy access.
• Concurrent access supports real-time collaboration.
• Built-in support for standards (transparently compliant).
• Automatic generation of ready-made computable data.
• Client-side application relieves server bottleneck and supports privacy.
• Supports annotation of genes, pseudogenes, tRNAs, snRNAs, snoRNAs, ncRNAs, miRNAs, TEs, and repeats.

Page 9: Web Apollo at Genome Informatics 2014

WEB APOLLO architecture

[Architecture diagram with three numbered components: 1, 2, 3]

Page 10: Web Apollo at Genome Informatics 2014

WEB-BASED CLIENT: user interaction

• Plugin to JBrowse.
• Graphic interface for editing operations and to handle user management.
• Two new kinds of tracks: DNA and User-created Annotations.

1) Pulls from data service
2) Sends “edit” operations to server, and
3) “Listens” to edits pushed back from server
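
The pull / send / listen cycle described on this slide can be illustrated with a small in-memory sketch. This is not Web Apollo's code or API: the class names `EditServer` and `Client`, the edit dictionary layout, and the `add`/`delete` operations are all hypothetical, chosen only to show how an edit sent by one client might be pushed back to every listening client.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EditServer:
    """Toy in-memory stand-in for an annotation server (hypothetical)."""
    annotations: Dict[str, dict] = field(default_factory=dict)
    listeners: List[Callable[[dict], None]] = field(default_factory=list)

    def fetch(self) -> Dict[str, dict]:
        # Step 1: a client pulls a copy of the current annotations.
        return {key: dict(value) for key, value in self.annotations.items()}

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        # Step 3: a client registers to be told about future edits.
        self.listeners.append(callback)

    def apply_edit(self, edit: dict) -> None:
        # Step 2: apply the edit, then push it to every listener.
        if edit["op"] == "add":
            self.annotations[edit["id"]] = {"start": edit["start"], "end": edit["end"]}
        elif edit["op"] == "delete":
            self.annotations.pop(edit["id"], None)
        else:
            raise ValueError(f"unknown edit operation: {edit['op']}")
        for notify in self.listeners:
            notify(edit)


class Client:
    """Toy client that keeps a local view in sync with the server."""

    def __init__(self, server: EditServer) -> None:
        self.server = server
        self.view = server.fetch()
        server.subscribe(self.on_edit)

    def send(self, edit: dict) -> None:
        self.server.apply_edit(edit)

    def on_edit(self, edit: dict) -> None:
        # Mirror the pushed edit in the local view.
        if edit["op"] == "add":
            self.view[edit["id"]] = {"start": edit["start"], "end": edit["end"]}
        elif edit["op"] == "delete":
            self.view.pop(edit["id"], None)


server = EditServer()
alice, bob = Client(server), Client(server)
alice.send({"op": "add", "id": "gene1", "start": 100, "end": 900})
print(bob.view)  # bob's view now includes the annotation alice added
```

In a real deployment the push step would travel over the network rather than through a direct callback; the sketch only shows the direction of each message.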


Page 11: Web Apollo at Genome Informatics 2014

ANNOTATION EDITING ENGINE: the logic

• Server: Java servlet.
• Data Model (and I/O): GBOL. Chado-based. Simple Hibernate layer & wrapper bio-layer that considers SO.
• Editing Logic: Selects longest ORFs, flags non-canonical splice sites. This is where biology “reasons”.
• Plug-in Architecture: for sequence alignment searches (BLAT).
• JE (Java version of Berkeley DB): stores annotations, edits, and history.
• Real-time support.
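
The two editing rules named above (select the longest ORF, flag non-canonical splice sites) can be sketched in a few lines. This is an illustrative sketch, not the engine's Java implementation. It assumes forward-strand sequence only, ORFs that run from ATG to the first in-frame stop codon, exons given as 0-based half-open `(start, end)` pairs, and GT...AG as the only canonical intron boundary; the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF on the forward strand.

    Coordinates are 0-based, half-open, and include the stop codon.
    Returns None when no complete ORF is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


def noncanonical_splice_sites(seq, exons):
    """Flag introns whose boundaries are not GT (donor) ... AG (acceptor).

    Assumption: only GT-AG counts as canonical here.
    Returns a list of (intron_start, intron_end, donor, acceptor) tuples.
    """
    seq = seq.upper()
    ordered = sorted(exons)
    flagged = []
    for (_, donor_end), (acceptor_start, _) in zip(ordered, ordered[1:]):
        donor = seq[donor_end:donor_end + 2]
        acceptor = seq[acceptor_start - 2:acceptor_start]
        if (donor, acceptor) != ("GT", "AG"):
            flagged.append((donor_end, acceptor_start, donor, acceptor))
    return flagged


# A two-exon toy gene whose intron starts with GC instead of GT.
genome = "ATGGCC" + "GCAAAAAG" + "CCCTAA"
print(noncanonical_splice_sites(genome, [(0, 6), (14, 20)]))
print(longest_orf("CCATGAAATAGATGAAACCCGGGTAA"))
```

A production engine would also handle the reverse strand and other accepted splice-site pairs; both are left out here to keep the sketch short.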


Page 12: Web Apollo at Genome Informatics 2014

SERVER-SIDE DATA SERVICE: access and broker data

• Data are processed with (Perl) pipelines that generate static JSON.
  • Cultural shift: Reliance on big genome centers is not so prominent any more, and not all data come from large repositories.
  • Mostly from GFF3s (both from sequencing centers and individual laboratories).
• Data repositories (i.e. Chado, UCSC-MySQL, DAS) accessed by Java data broker (Trellis): passes them as JSON to JBrowse for display.
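
To make the "GFF3 in, static JSON out" idea concrete, here is a minimal sketch in Python. It is not the Perl pipeline mentioned on the slide, and the JSON layout it emits is an assumption for illustration, not the format JBrowse actually reads. It relies only on the nine tab-separated GFF3 columns and on `key=value` pairs separated by semicolons in the attributes column.

```python
import json

GFF3_COLUMNS = [
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
]


def parse_gff3_line(line):
    """Turn one GFF3 feature line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    # GFF3 coordinates are 1-based and inclusive; keep them as integers.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for pair in record["attributes"].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value
    record["attributes"] = attributes
    return record


def gff3_to_json(text):
    """Convert GFF3 text to a JSON array, skipping comments and blank lines."""
    features = [
        parse_gff3_line(line)
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    return json.dumps(features, indent=2)


example = (
    "##gff-version 3\n"
    "chr1\texample\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=abc\n"
)
print(gff3_to_json(example))
```

Writing the result to a file once, rather than querying a database on every request, is what makes the JSON "static" in the sense used on the slide.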


Page 13: Web Apollo at Genome Informatics 2014

CURRENT COLLABORATIONS: crowdsourcing development too

• New avenues for landing on Apollo and customization of additional applications.
• Web services for alignment and functional annotation tools.
• RNAseq datasets being used to re-annotate the bovine genome, finding genes that neither RefSeq nor Ensembl predicted. Also creating track of disagreement between sets.
• Bovine genome consortium making previous iterations of manual annotation efforts (from 3 assemblies ago) available for integration of curated models.


UNIVERSITY of MISSOURI

National Agricultural Library

Page 14: Web Apollo at Genome Informatics 2014

CURRENT COLLABORATIONS: training and contributions

Partnerships

UNIVERSITY of MISSOURI

National Agricultural Library

Nature Reviews Genetics 2009 (10), 346-347

Page 15: Web Apollo at Genome Informatics 2014

CURRENT COLLABORATIONS: training and contributions

Partnerships

• Norwegian Spruce http://congenie.org/
• Phlebotomus papatasi
• Tallapoosa darter http://darter2.westga.edu/
• Wasmannia auropunctata
• Homo sapiens hg19
• Pinus taeda http://dendrome.ucdavis.edu/treegenes/browsers/

Page 16: Web Apollo at Genome Informatics 2014

FUTURE PLANS: interactive analysis and curation of variants

• Interactive exploration of VCF files (e.g. from GATK, VAAST) in addition to BAM and GVF. Multiple tracks in one: visualization of genetic alterations and population frequency of variants.


• Clinical applications: analysis of Copy Number Variations for regulatory effects; overlaying display of the regulatory domains.

Phillips-Cremins and Corces. 2013. Molecular Cell 50(4): 461-474

TADs: topologically associating domains
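
The "population frequency of variants" mentioned in the VCF item can be derived directly from the genotype columns of a VCF record. The sketch below is an assumption about how such a value might be computed for display, not a description of the planned Web Apollo feature. It uses only the standard VCF column order (eight fixed columns, then FORMAT, then one column per sample) and counts every non-reference allele as "alternate"; the function name is hypothetical.

```python
def alt_allele_frequency(vcf_line):
    """Fraction of called alleles that are non-reference in one VCF record.

    Returns None when no sample has a called genotype.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    gt_index = format_keys.index("GT")
    alt_count = 0
    called = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        # Treat phased (|) and unphased (/) genotypes the same way.
        for allele in genotype.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call
            called += 1
            if allele != "0":
                alt_count += 1
    return alt_count / called if called else None


record = "chr1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t1/1:8\t0/0:10\t./.:0"
print(alt_allele_frequency(record))
```

For multi-allelic sites a per-allele breakdown would be more informative; the single number here is the simplest quantity a frequency track could plot.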

Page 17: Web Apollo at Genome Informatics 2014

FUTURE PLANS: educational tools

We are working with educators to make Web Apollo part of their curricula.

Lecture Series.

In the classroom. At the lab.

Classroom exercises: from genome sequence to hypothesis.

Curation group dedicated to producing education materials for non-model organism communities.

Our team provides online documentation, hands-on training, and rapid response to users.

Page 18: Web Apollo at Genome Informatics 2014

FEDERATED ENVIRONMENT: other BBOP tools

Page 19: Web Apollo at Genome Informatics 2014

ALL ARE WELCOME: monthly, or permanently!

Open Call for Developers on the First Thursday of each month at 9:00 AM (Pacific Time).

Message @monimunozto for details.

We Are Hiring: join us in the San Francisco Bay Area! http://tinyurl.com/jobs-at-bbop

Page 20: Web Apollo at Genome Informatics 2014

• Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Web Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
• § Christine G. Elsik (PI). University of Missouri.
• * Ian Holmes (PI). University of California Berkeley.
• Arthropod genomics community: i5K Steering Committee, Alexie Papanicolaou (CSIRO), Monica Poelchau (USDA/NAL), fringy Richards (HGSC-BCM), BGI, Oliver Niehuis at 1KITE http://www.1kite.org/, and the Honey Bee Genome Sequencing Consortium.
• Web Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI, and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
• Insect images used with permission: http://AlexanderWild.com and O. Niehuis.
• For your attention, thank you!


Web Apollo team: Nathan Dunn, Colin Diesh §, Deepak Unni §

Gene Ontology team: Chris Mungall, Seth Carbon, Heiko Dietze

Alumni: Gregg Helt, Ed Lee, Rob Buels *

BBOP
Web Apollo: http://GenomeArchitect.org
GO: http://GeneOntology.org
i5K: http://arthropodgenomes.org/wiki/i5K

Thanks!