ChemSpider Presentation At University Of Toronto
Uploaded by: orcid-0000-0002-2668-4821
ChemSpider: Building the Premier Online Resource for Chemists. University of Toronto, June 8th 2010
Overview
The status of chemistry online today; The pragmatic vision of ChemSpider; The quality of online chemistry; Linking together the internet using InChIs; Citizen scientists for deposition and curation; ChemSpider as a multimedia container; Comparing ChemSpider, Reaxys and SciFinder
Where is chemistry online?
Encyclopedic articles (Wikipedia); Chemical vendor databases; Metabolic pathway databases; Property databases; Patents with chemical structures; Drug discovery data; Scientific publications; Compound aggregators; Blogs/Wikis and Open Notebook Science
Chemistry on the Internet TODAY
Chemistry searches are generally limited to text-based searches across the internet
Data are dirty: sorting the wheat from the chaff. Who can you trust?
Too many searches are required to find data
media.obsessable.com
What do humans want?
As few interfaces as possible
A Pragmatic Vision
“Build a Structure Centric Community”
December 2006 – A hobby project initiated to connect chemistry on the web
Integrate chemical structure data on the web; Create a “structure-based hub” to information and data; Provide access to structure-based “algorithms”; Let chemists contribute their own data; Allow the community to curate/correct data
ChemSpider Searches
Search Cholesterol
Linked across the internet
Kyoto Encyclopedia of Genes and Genomes
Links to Patents based on structure
Articles Linked
ChemSpider Complex Searches
Link off a structure in ChemSpider
Chemical suppliers; Other publications; Analytical data; Related reactions; Wikipedia; Patents; “Everything”
Answering Questions for Chemists
Questions a chemist might ask…
What is the melting point of n-butanol? What is the chemical structure of Xanax? Chemically, what is phenolphthalein? What are the stereocenters of cholesterol? Where can I find publications about xylene? What are the different trade names for Ketoconazole? What is the NMR spectrum of Aspirin? What are the safety handling issues for Thymol Blue?
What is a compound?
ChemSpider is a structure-centric hub
ChemSpider aggregates and links out across the internet
Data aggregate based on “structures and links”
What defines a chemical compound?
Linked Data on the Web
Taken from: Rafael Sidis’ Blog
Where Would You look? What Do You Trust?
Chemistry on The Internet Is Messy
It’s Methane…
What’s Methane?
What ELSE is Methane???
PubChem
Chemistry is REALLY Messy
Vancomycin
Who will curate?
How would you clean such a large dataset?
Assertions!!!
Vancomycin on ChemSpider: 1 compound – 3 days
The EXPERTS must get it right?!
Wikipedia, C&E News, PubChem C&E News (from ACS)
The InChI Identifier
Multiple Layers
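As a rough illustration of the layered structure, the sketch below splits a standard InChI string on its "/" separators. The function name `split_inchi_layers` is made up for this example, and the parsing is deliberately simplified: it keeps only the first occurrence of each layer prefix and does not validate the chemistry. The ethanol InChI is used as sample input.

```python
def split_inchi_layers(inchi: str) -> dict:
    """Split an InChI string into version, formula, and prefixed layers.

    Simplified sketch: a real parser would also handle repeated prefixes
    (for example in fixed-H or isotopic sections) and validate each layer.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")

    parts = [p for p in inchi[len(prefix):].split("/") if p]
    layers = {"version": parts[0]}

    for part in parts[1:]:
        if part[0].islower():
            # Layers such as c (connectivity), h (hydrogens), q (charge)
            # start with a one-letter lowercase prefix.
            layers.setdefault(part[0], part[1:])
        else:
            # The formula layer carries no prefix letter.
            layers.setdefault("formula", part)
    return layers


ethanol_layers = split_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
for name, value in ethanol_layers.items():
    print(f"{name}: {value}")
```

Each layer adds detail on top of the previous one, which is what lets two records be compared at different levels of strictness.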
InChI Strings Hash to InChIKeys
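The hashing step itself is done by the official InChI software and is not reproduced here. What can be sketched safely is the shape of the result: a standard InChIKey is 27 characters long, with a 14-letter first block derived from the molecular skeleton, a second block covering the remaining layers plus flag and version letters, and a final protonation letter. The helper name `split_inchikey` and the dictionary keys are assumptions for this example, and the pattern only accepts the standard (S) and non-standard (N) flags.

```python
import re

# 14 letters, hyphen, 8 letters + flag + version letter, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def split_inchikey(key: str) -> dict:
    """Split an InChIKey into its blocks; raise ValueError if malformed."""
    match = INCHIKEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, other_layers, flag, version, protonation = match.groups()
    return {
        "skeleton": skeleton,          # hash of the connectivity
        "other_layers": other_layers,  # hash of stereo, isotopes, etc.
        "standard": flag == "S",       # S = standard InChI, N = non-standard
        "version": version,
        "protonation": protonation,    # N = neutral
    }


# InChIKey of methane (InChI=1S/CH4/h1H4).
methane_key = split_inchikey("VNWKTOKETHGBQD-UHFFFAOYSA-N")
print(methane_key["skeleton"])
```

Because the key has a fixed length and contains only letters and hyphens, it can be typed into an ordinary text search engine, unlike the full InChI string.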
Vancomycin – Search the Internet
Full Molecule Search: 4 Hits
Full Skeleton Search: 104 Hits
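The slides do not say how the two searches are run. One plausible reading, assumed here, is that a full molecule search matches the whole InChIKey while a full skeleton search matches only its first 14-letter block. The sketch below shows why the second returns more hits. The site names and keys are placeholders invented for the example, not real compounds.

```python
def skeleton_block(inchikey: str) -> str:
    """Return the first block of an InChIKey (the connectivity hash)."""
    return inchikey.split("-")[0]


def full_molecule_hits(query: str, records: list) -> list:
    """Sources whose InChIKey matches the query exactly."""
    return [source for source, key in records if key == query]


def full_skeleton_hits(query: str, records: list) -> list:
    """Sources whose InChIKey shares the query's first block."""
    target = skeleton_block(query)
    return [source for source, key in records if skeleton_block(key) == target]


# Placeholder records: (hypothetical source, placeholder InChIKey).
records = [
    ("site-a", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),  # same molecule as the query
    ("site-b", "AAAAAAAAAAAAAA-CCCCCCCCSA-N"),  # same skeleton, other stereo
    ("site-c", "DDDDDDDDDDDDDD-BBBBBBBBSA-N"),  # different skeleton
]
query = "AAAAAAAAAAAAAA-BBBBBBBBSA-N"

exact = full_molecule_hits(query, records)
skeleton = full_skeleton_hits(query, records)
print(len(exact), len(skeleton))
```

Under this assumption the skeleton search also picks up records that draw the same connectivity with different or missing stereochemistry, which would fit a molecule as complex as vancomycin.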
Citizen Scientists
Crowd-sourcing Chemistry Curation
Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
Citizens as Data Sources
Semantic Markup: Project Prospect
Entity-Extraction, Mark-up, Annotate
Success Depends on Dictionaries
Semantic Linking of Structures
What would you want to link off a structure?
Chemical suppliers; Other publications; Analytical data; Related reactions; Wikipedia; Patents; “Everything”
Unpublished Chemistry
Only a fraction of chemistry is published
Only a tiny fraction of chemistry is patented
What of the “Lost Chemistry”, never published and so never abstracted?
Reactions performed; Structures made and studied; Spectra acquired and then disposed of; Available chemicals never found
Org Prep Daily (Blog)
ChemSpider SyntheticPages
Submission Process
Submissions reviewed by editorial board
Published as is or comments sent to author
Online Peer Review process
Data supported include web movies, images, live spectra etc.
Micro- and Nano-publications
Blogs, wiki entries and even Amazon book reviews are micro/nano-publications
ChemSpider SyntheticPages will be DOI’ed – students can add these “micro-publications” to their resume
Structures and spectra are nano-publications; these contributions (depositions, curations, etc.) can also be tracked and referenced. Students participate in building one of the premier sources of chemistry data.
ChemSpider Everywhere: What do computers want?
Web services
flickr.com/photos/microcosmos
ChemSpider Everywhere: ChemMobi
Mobile ChemSpider
Multimedia Content Holder
Periodic Table Images
CAS SciFinder
reaxys
Differences between ChemSpider, Reaxys and SciFinder
Reaxys and SciFinder: Everything on Reaxys and SciFinder is curated; The data resources can be over 100 years old; The platforms are commercial and “read-only”
ChemSpider: ChemSpider is free to everyone; Data are in a state of ongoing curation & annotation; Data resources are from the “electronic era”; Data are expanded daily and enhanced on an ongoing basis; The platform delivers integrated algorithm access
Community Contribution
We make a bigger contribution to the community if the community shares via ChemSpider
ChemSpider wins “Community Contribution” best practice award
How Can You Help ChemSpider?
Encourage students to deposit their data and share with the community: Structures – one or many; Spectra; Links; Syntheses into ChemSpider SyntheticPages
Spread the word – ChemSpider is an untapped resource
Chemistry on the Internet FUTURE
The semantic web for chemistry is in place; Crowdsourced contributions are commonplace; Chemists will search by structure/substructure; Chemistry articles indexed and searchable; Reduced number of searches to find data; Data are integrated – compounds, vendors, syntheses, data, publications and patents; A world of Open Access and Open Data
Thank you
[email protected]: ChemSpiderman
www.chemspider.com/blog
SLIDES: www.slideshare.net/AntonyWilliams