Cshals Tech Talk

SciVerse Platform Embraces Semantic Applications. Vishal Gupta (@Visha1Gupta), Ari Tuchman


Talk given at the CSHALS 2011 conference at Boston

Transcript of Cshals Tech Talk

Page 1: Cshals Tech Talk

SciVerse Platform Embraces Semantic Applications 

Vishal Gupta, @Visha1Gupta

Ari Tuchman

Page 2: Cshals Tech Talk

Researchers are looking for tools to help them find, integrate, and re‐use content

“Scientific innovation depends on finding, integrating, and re‐using the products of previous research... Our semantic enhancements led to the creation of a whole “ecosystem” of articles, documents, spreadsheets, data fusions related to that original work... frictionless interoperability between papers and datasets is highly desirable.”

Shotton D. et al., Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article. PLoS Comput Biol 5(4)

Page 3: Cshals Tech Talk

Elsevier’s SciVerse platform delivers the right information and tools, at the right time

Diagram: the SciVerse platform, with applications (APP) surrounding four content areas: ScienceDirect, Scopus, Web/Third Party Content, and SciTopics.

Figures shown in the diagram:

• 10 million full-text articles
• 15 thousand e‐books
• 41 million abstracts
• 300 million web pages
• 13 sources (society)
• 18 institutions (intl. repository)
• 23 million patent files
• Pages written by scientific experts only
• Share your knowledge and learn from others

Page 4: Cshals Tech Talk

The Developer Network beta provides developers access to resources and community

Page 5: Cshals Tech Talk

Developers can access content APIs and download SDKs to get started…

Page 6: Cshals Tech Talk

…with a variety of options, depending on the use case

Page 7: Cshals Tech Talk

Some of our APIs beyond the framework

Page 8: Cshals Tech Talk

SciVerse Applications beta to provide access to applications developed by the scientific community

Page 9: Cshals Tech Talk

SciVerse Applications gallery will make available a constantly expanding universe of applications

Page 10: Cshals Tech Talk

NCBO, Stanford semantic search app

Page 11: Cshals Tech Talk

Trusted academic and corporate partners are already beginning to drive innovation

Page 12: Cshals Tech Talk

Understanding by visualizing big‐picture context (trends, clusters, correlations)

... and providing tools to rapidly drill down on high‐value details.

Page 13: Cshals Tech Talk

Search for Warfarin (Coumadin)

• Identify salient data types for analyzing query topic {Phrase, gram/sec, Molar…}
• Aggregate data values of learned type.
• Uncover trends in related phrases.
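The first two bullets (identify salient data types, then aggregate values of each type) can be illustrated with a minimal sketch. It is not the method used by the app shown in the talk: the regular expression, the function names `aggregate_by_unit` and `salient_units`, and the `min_count` threshold are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed pattern: a number followed by a unit-like token such as "mg" or "g/sec".
VALUE_WITH_UNIT = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+(?:/[A-Za-z]+)?)")


def aggregate_by_unit(snippets):
    """Group every numeric value found in the snippets under its unit string."""
    values = defaultdict(list)
    for text in snippets:
        for number, unit in VALUE_WITH_UNIT.findall(text):
            values[unit].append(float(number))
    return dict(values)


def salient_units(values, min_count=2):
    """Rank units by number of observed values; keep those seen at least min_count times."""
    ranked = sorted(values.items(), key=lambda item: len(item[1]), reverse=True)
    return [unit for unit, numbers in ranked if len(numbers) >= min_count]
```

A pattern this simple treats any word that follows a number as a unit, so a real system would need a unit vocabulary or a learned model to filter the candidates.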

Page 14: Cshals Tech Talk

Examine Landscape of Dosages

• Visualize landscape of numeric data.
• Intuitive displays of clusters, standard values, and ranges.
• Recognize whitespace and zoom in to desired parameter space.

Page 15: Cshals Tech Talk

Focus on High‐Impact Patient Studies

• Filter clinical studies by patient number.
• Identify clusters and correlations between axes.

Page 16: Cshals Tech Talk

Correlate Warfarin Dosages with Individual Genes

• Extract linkages between specific genes and optimum dosages.

Page 17: Cshals Tech Talk

Timeline of Genes Associated with Warfarin

Genes listed in order of most recent appearance in corpus. Chart axis: DATE

• Chart all genes in the literature that are investigated in the context of Warfarin.
• Sort by first/last appearance date in the corpus.
• Analyze genes by degree of correlation (“friend‐of‐friend” genes).
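Sorting genes by first or last appearance can be sketched as follows. This is an illustrative sketch only, assuming the corpus has already been reduced to (gene, date) mention pairs; the function names and the input shape are hypothetical and not taken from the talk.

```python
from datetime import date


def gene_timeline(mentions):
    """Map each gene to its (first, last) appearance date among the mentions."""
    spans = {}
    for gene, seen in mentions:
        first, last = spans.get(gene, (seen, seen))
        spans[gene] = (min(first, seen), max(last, seen))
    return spans


def sort_by_last_appearance(spans):
    """Return genes ordered by most recent appearance, newest first."""
    return sorted(spans, key=lambda gene: spans[gene][1], reverse=True)


def sort_by_first_appearance(spans):
    """Return genes ordered by earliest appearance, oldest first."""
    return sorted(spans, key=lambda gene: spans[gene][0])
```

Keeping both dates per gene means either ordering comes from a single pass over the mentions.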

Page 18: Cshals Tech Talk

In Summary

• Discover: Quantifind App on SciVerse Applications Marketplace
• Develop: Start building an app for your institution on the Developer Network (Elsevier)
• Share: Vishal Gupta (@Visha1Gupta), Ari Tuchman

Page 19: Cshals Tech Talk

Forces that shape research 

• Government Policies
• Technology
• Global Competition
• Workflow Inefficiencies

LEAN RESEARCH

Trend exacerbated by economic downturn

Page 20: Cshals Tech Talk

Web of linked data and knowledge outside the formal literature is growing exponentially

Figure: linked data cloud diagram. Datasets shown: BBC Playcount Data, Flickr, LIBRIS, RDF ohloh, BBC Music, MySpace Wrapper, Audio‐Scrobbler, Musicbrainz, FOAF profiles, cexporter, ACM, DBLP RKB, RAE 2001, National Science Foundation, RDF Book Mashup, riese, BBC Programmes, Geonames, Project Gutenberg, Open Calais, Explorer, eprints, Virtuoso Sponger, CORDIS, DBpedia, Linked GeoData, US Census Data, W3C WordNet, LinkedCT, DBLP Berlin, UniParc, Freebase, Reactome, DBLP Hanover, Taxonomy, PROSITE, CiteSEER, UniRef, HomoloGene, GeneID, ChEBI, Gene Ontology, KEGG, OMIM, UniProt, Pfam, ProDom, CAS, UniSTS, HGNC, MGI, PubMed, PDB.

Source: Richard Cyganiak and Anja Jentzsch – linkeddata.org (as of July 2009)

Page 21: Cshals Tech Talk

Linked data is also clouded by spam

Figure: the linked data cloud diagram from the previous slide, overlaid with SPAM labels.

“Approximately 90% of the 10 billion pages that will be added to the web over the next year to be spam… The massive flooding of the web with endless copies and permutations and shadows of existing things is what is pulling the rug out from under link‐based search rankings... Links don't represent a human voting on the quality of a site anymore.”

Rich Skrenta, CEO of Blekko, The "Useless Garbage" Of The Web, Jan 10, 2011