VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida [email protected].

19
VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida [email protected]

Transcript of VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida [email protected].

Page 1: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.

VIVO: Sharing Data for Research

Discovery

Mike ConlonUniversity of Florida

[email protected]

Page 2: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.

Public, structured linked data about investigators interests, activities and accomplishments, and toolsto use thatdata to advancescience

Page 3: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.
Page 4: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.

VIVO Searchlight

Page 5: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.
Page 6: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.
Page 7: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.

Data Production

Page 8: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.

Producing Data

Page 9: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.

processOrg<-function(uri){ x<-xmlParse(uri) u<-NULL name<-xmlValue(getNodeSet(x,"//rdfs:label")[[1]]) subs<-getNodeSet(x,"//j.1:hasSubOrganization") if(length(subs)==0) list(name=name,subs=NULL) else { for(i in 1:length(subs)){ sub.uri<-getURI(xmlAttrs(subs[[i]])["resource"]) u<-c(u,processOrg(sub.uri)) } list(name=name,subs=u) }}

VIVO produces human and machine

readable formats

Software reads RDF from VIVO

and displays

Page 10: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.

Data Sharing

Photograph by J. G. Park. Flicker.com Photograph by Ell Brown Flicker.com

Page 11: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.

Information is stored using the Resource Description

Framework (RDF) as subject-predicate-object “triples”

Jane Smith

professor in

author ofhas affiliation with

Dept. of Genetic

s College of

Medicine

Journal article

Book chapte

r

Book

Genetics Institute

Subject Predicate Object

A Web of Data – The Semantic Web

Page 12: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.
Page 13: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.

The Role of the Archive• Collate data, final semantics, ready for

consumption

Data Archive

Page 14: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.

Institutions record activities, interests, accomplishments

Page 15: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.

Data, Tools and Scientists

Page 16: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.

Data Consumption

Photograph by ScoopMedia. Flicker.com

Photograph by Janet Tarbox. Flicker.com

Page 17: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.

A Consumption ScenarioFind all faculty members whose genetic work

is implicated in breast cancer

VIVO will store information about faculty and associate to genes. Diseaseome associates genes to diseases.

Query resolves across VIVO and data sources it links to.

Page 18: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.

Data ReasoningData integration continues to be a serious bottleneck for the expectations of increased productivity in the pharmaceutical and biotechnology domain.

“Linked Life Data” integrates common public datasets that describe the relationships between gene, protein, interaction, pathway, target, drug, disease and patient and currently consist of more than 5 billion RDF statements.

The dataset interconnects more than 20 complete data sources and helps to understand the “bigger picture” of a research problem by linking previously unrelated data from heterogeneous knowledge.

From the LarKC (Large Knowledge Collider) http://www.larkc.eu/overview/

Page 19: VIVO: Sharing Data for Research Discovery Mike Conlon University of Florida mconlon@ufl.edu.

http://vivo.ufl.edu/individual/mconlon