Research resources: curating the new eagle-i discovery system



Nicole Vasilevsky1, Tenille Johnson2, Karen Corday2, Carlo Torniai1, Matthew Brush1, Scott Hoffmann1, Erik Segerdell1, Melanie L. Wilson1, Christopher J. Shaffer1, David Robinson1, and Melissa A. Haendel1**

1 Oregon Health & Science University, Library, Portland, Oregon
2 Harvard Medical School, Center for Biomedical Informatics, Cambridge, Massachusetts

www.eagle-i.net
Open source software available at: https://open.med.harvard.edu/display/eaglei/Software
eagle-i Ontology GoogleCode: http://code.google.com/p/eagle-i/

Acknowledgements

**We, the authors, represent the members and leaders of the eagle-i Curation team, and describe some of the efforts and products of all teams involved in the development of the eagle-i discovery system. We would like to thank the Resource Navigation team, led by Richard Pearse; the Software Build team, led by Daniela Bourges; and the Project Management team, led by Julie McMurry. We would also like to thank Jackie Wirz. We gratefully acknowledge NIH award #U24RR029825.

Semantic Web Entry and Editing Tool

Components of the eagle-i annotation tool, known by the acronym SWEET, are generated directly from the eagle-i ontology. The SWEET contains both annotation fields that are auto-populated using the ontology (purple box) and free text (orange box). Entrez Gene ID links out to the NCBI database (red box). Fields in the SWEET can also link records to other records in the repository, such as related publications or documentation (blue box). Users can request new terms be added to the ontology using the Term Request field.
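As a rough illustration of the field types described above, the following minimal Python sketch models a record with an ontology-constrained field, a free-text field, an Entrez Gene link-out, links to other records, and term requests. The class name, field names, and the tiny term set are hypothetical and are not taken from the SWEET or the eagle-i ontology; only the NCBI Gene URL pattern is a real address.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-in for classes drawn from an ontology.
ONTOLOGY_TERMS = frozenset({"Antibody", "Instrument", "Plasmid"})

# Public NCBI Gene page pattern used for the link-out.
NCBI_GENE_URL = "https://www.ncbi.nlm.nih.gov/gene/{gene_id}"


@dataclass
class ResourceRecord:
    """Illustrative record; not the actual SWEET data model."""
    label: str
    resource_type: Optional[str] = None       # ontology-constrained field
    description: str = ""                     # free-text field
    entrez_gene_id: Optional[str] = None      # link-out to NCBI
    related_records: List[str] = field(default_factory=list)  # links to other records
    term_requests: List[str] = field(default_factory=list)    # requested new terms

    def set_type(self, term: str) -> bool:
        """Accept a known term; otherwise log it as a term request."""
        if term in ONTOLOGY_TERMS:
            self.resource_type = term
            return True
        self.term_requests.append(term)
        return False

    def gene_link(self) -> Optional[str]:
        """Build the NCBI link-out, or None when no gene ID is recorded."""
        if self.entrez_gene_id is None:
            return None
        return NCBI_GENE_URL.format(gene_id=self.entrez_gene_id)
```

In this sketch an unrecognized term is never stored as the resource type; it is queued as a request instead, which mirrors the idea of keeping annotations tied to the ontology while still capturing what users need.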

Ontological modeling of research resources

Data Curation at eagle-i

Development of data curation practices at eagle-i was an iterative process that depended on the Resource Navigation team for data collection, the Curation team for ontology development and data QA, and the Software team for user interface design. Tools and documentation were developed to assist users and team members with each of these processes.

Lessons Learned
• Balance the data you need with the data you can get
• Documentation and quality assurance are iterative
• Tools and technology choices depend on the above

Figure: Decision trees assist with data entry and annotation of resources. The legend marks required annotations; questions eliciting information for annotation; redirection to a different decision tree; higher, medium, and lower value/priority annotations; and drop-down or annotation field examples.
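The decision-tree legend distinguishes required annotations from higher, medium, and lower priority ones. A quality-assurance check built on that distinction might look like the sketch below. The field names and their assignment to tiers are assumptions made for illustration; the poster does not list which fields fall in which tier.

```python
from typing import Dict, List

# Hypothetical tier assignments; the real decision trees define their own.
REQUIRED_FIELDS = ("label", "resource_type")
PRIORITY_FIELDS = {
    "higher": ("contact", "location"),
    "medium": ("manufacturer",),
    "lower": ("description",),
}


def qa_report(record: Dict[str, str]) -> Dict[str, object]:
    """Report which required and prioritized annotations are missing or empty."""
    missing_required: List[str] = [f for f in REQUIRED_FIELDS if not record.get(f)]
    missing_by_priority = {
        tier: [f for f in fields if not record.get(f)]
        for tier, fields in PRIORITY_FIELDS.items()
    }
    return {
        "valid": not missing_required,  # only required fields block a record
        "missing_required": missing_required,
        "missing_by_priority": missing_by_priority,
    }
```

Treating only the required tier as blocking reflects the first lesson learned: a record with gaps in lower-priority fields can still be useful, so the report lists those gaps without rejecting the record.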

The Ideal Scholarly Research Cycle

During the course of collecting information about research resources, which many laboratories were willing to share, we discovered that while larger core facilities routinely have resource and workflow organization strategies, primary research labs very rarely do. This creates barriers to reproducing experiments as well as to publishing and sharing resources. Giving labs organizational tools can help address these issues.

Provide scientists with the tools they need to record their resources during the course of research

 

How can we make this cycle more efficient?

o Researchers produce data and resources that lead to publications.
o Published data informs researchers of new experimental designs.
o Information about researchers, resources, data, and published papers is stored in various public repositories.

The goal of eagle-i is to make scientific research resources more visible via a federated network of institutional repositories. The Network uses an ontology-driven approach to biomedical resource annotation and discovery, and currently includes resources from 23 institutions.
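To make the idea of a federated network concrete, the sketch below runs one query against several institutional repositories and merges the results. It is an in-memory stand-in under stated assumptions: the repository contents, institution names, and record fields are hypothetical, and the actual eagle-i search protocol is not described in this poster.

```python
from typing import Dict, List, Tuple

# Each repository is modeled as a plain list of record dictionaries.
Repository = List[Dict[str, str]]


def federated_search(
    repositories: Dict[str, Repository], resource_type: str
) -> List[Tuple[str, str]]:
    """Return (institution, label) pairs for records of the given type."""
    hits: List[Tuple[str, str]] = []
    for institution, records in repositories.items():
        for record in records:
            # Matching on a shared type term is what lets one query
            # work across independently maintained repositories.
            if record.get("resource_type") == resource_type:
                hits.append((institution, record["label"]))
    return sorted(hits)


# Hypothetical example data.
EXAMPLE_REPOSITORIES: Dict[str, Repository] = {
    "Institution A": [
        {"label": "Flow cytometer", "resource_type": "Instrument"},
        {"label": "GFP plasmid", "resource_type": "Plasmid"},
    ],
    "Institution B": [
        {"label": "Confocal microscope", "resource_type": "Instrument"},
    ],
}
```

The point of the sketch is that a shared vocabulary for `resource_type` is what makes the merged result meaningful; without it, each repository would need its own query.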

New initiatives with eagle-i

NCATS has funded two new projects that leverage eagle-i to further translational science. The first project aims to expand the breadth, quality, and discoverability of data about people and resources by harmonizing the ontologies of VIVO, eagle-i, and ShareCenter (www.ctsaconnect.org). The second project aims to expand the eagle-i platform to new CTSA institutions, and to publish resources as Linked Open Data.

Figure labels: Biocuration; Data collection; User interface design; Ontology development; Curation guidelines; SPARQL query tool for QA; Ontology Browser; SWEET; Search application; Decision trees; Google Code.

Figure: The eagle-i workflow. Labels: Search application; Annotation tool; Institutional repositories; Biocurator; Ontology; Request new terms; Request resources; eagle-i participating lab.

Figure labels (research cycle): Researcher; Resources and data; Researcher; Publications; Public repositories (eagle-i, MODs, NIF, Entrez Gene...); Public repositories (PubMed, Google Scholar, Mendeley…); Professional networking (VIVO, Harvard Profiles, LinkedIn…); markers 1, 2, and 3.

Major eagle-i resource types are shown as dark boxes. Persons and laboratories play a central role in eagle-i. Classes and properties are reused from pre-existing ontologies or created de novo. Examples of some of the relations between the classes are indicated.