The RSC & e-Science: Reflecting the Change in the World We Live In
Valery Tkachenko
RSC-OSDD Consultative Workshop on Cheminformatics
Delhi, September 28th 2013
Royal Society of Chemistry and Global Chemistry Network
The World we live in
Internet World
• 20+ years into the Internet Revolution
• Web 2.0 -> Web 3.0
Connected World
• Social Networks
• Real-time Communications
Big Data World
• Semantic content
• New Interfaces
Pillars of the World
Data
• Data (knowledge) is King
• Dataflow
Navigation
• Domain-specific search and navigation
• Navigate inside and link out - federation
Interfaces
• HCI (human-computer interface)
• M2M (machine to machine)
Science map
Chemical sciences map
Chemistry on the Internet
What’s wrong?!?!
Complexity
Royal Society of Chemistry and Global Chemistry Network
Knowledgebases and delivery systems
Big Data challenge
Crowdsourcing and altmetrics
New interfaces
50,000 ft view of an STM publisher
Knowledge
Our User Interfaces (Desktop, Web, Mobile, etc.)
Customers
Delivery Magic
3rd party integrations (our web services)
ChemSpider Suite
Data Layer
ChemSpider Assays
ChemSpider Compounds
ChemSpider Reactions
ChemSpider Spectra
ChemSpider Materials
ChemSpider Algorithms
Business Objects Layer
CSAs BO, CSC BO, CSR BO, CSS BO, CSM BO, CSA BO
APIs Layer
DS API, Export API, Search API, Processing API
CSAs API, CSC API, CSR API, CSS API, CSM API, CSA API
Components Layer
JS Components
Google Apps Components
Python widgets
SharePoint Components
PHP snippets
ASP.NET Components
UIs
ChemSpider website
ChemSpider Reactions
mobile web app
ChemSpider desktop app
Depositions client
Java Beans
• 29 million chemicals and growing
• Data sourced from >500 different sources
• Crowdsourced curation and annotation
• Ongoing deposition of data from our journals and our collaborators
• A structure-centric hub for web searching
ChemSpider and Atovaquone
Micropublishing
ChemSpider Reactions
Knowledge in our own archives
DERA and Text Mining
The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6, thionyl chloride (5 ml) and benzene (50 ml) were charged into a glass reaction vessel equipped with a mechanical stirrer, thermometer and reflux condenser.
The reaction mixture was heated at reflux, with stirring, for a period of about one-half hour.
After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a solid residue.
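As a rough illustration of what a text-mining step has to recover from a paragraph like the one above, the sketch below pulls reagent names and amounts out of the first sentence with a regular expression. This is not DERA's method, which the slides do not describe; the pattern and the name `extract_quantities` are assumptions made for this example only.

```python
import re

# Assumed pattern: an optional leading "and", a reagent name made of letters and
# spaces, then "(<number> <unit>)". Real patent text needs far more than this.
QUANTITY = re.compile(
    r"(?:and\s+)?([A-Za-z][A-Za-z ]*?)\s*\(\s*(\d+(?:\.\d+)?)\s*(ml|mg|g)\s*\)"
)


def extract_quantities(text):
    """Return (reagent, amount, unit) tuples found in a sentence."""
    return [
        (name, float(amount), unit)
        for name, amount, unit in QUANTITY.findall(text)
    ]


# First sentence of the example, with the long urea name shortened for readability.
sentence = (
    "The urea prepared in Example 6, thionyl chloride ( 5 ml ) and benzene ( 50 ml ) "
    "were charged into a glass reaction vessel."
)

for reagent, amount, unit in extract_quantities(sentence):
    print(reagent, amount, unit)
```

A pattern this simple only handles the "name (amount unit)" shape; resolving the systematic names to structures is the harder part and is not attempted here.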
It is so difficult to navigate…
What’s the structure?
Are they in our file?
What’s similar?
What’s the target?
Pharmacology data?
Known Pathways?
Working On Now?
Connections to disease?
Expressed in right cell type?
Competitors?
IP?
Digitally Enabling RSC Archive
Text, PDF, XML
Structures
Reactions
Spectra
Materials
Chemistry Validation and Standardization Platform (CVSP)
DERA (Text Mining)
Biological Activities
Data quality issue and CVSP
Robochemistry
Proliferation of errors in public and private databases
Automated quality control system
ChemSpider issues
DrugBank dataset (6516 records)
~60 records that can’t be dearomatized unambiguously
DB04283 DB04462
~30 records with bonds that do not make sense
DB04283
DB04009
2 records where SMILES, InChI, and name did not match the structure
DB00611 DB01547
~40 records where InChIs did not match the structure
DrugBank ID: DB00755
InChI=1S/C20H28O2/c1-15(8-6-9-16(2)14-19(21)22)11-12-18-17(3)10-7-13-20(18,4)5/h6,8-9,11-12,14H,7,10,13H2,1-5H3,(H,21,22)/b9-6+,12-11+,15-8+,16-14+
DrugBank ID: DB00614
DB08128
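To make the InChI mismatch check more concrete, the sketch below splits the DB00755 InChI shown above into its layers so that individual layers (formula, connectivity, double-bond stereo) can be compared one by one. Generating an InChI from the drawn structure requires a cheminformatics toolkit and is not shown; the function name `split_inchi` is an assumption for this example, and the sketch ignores InChIs that repeat a layer letter.

```python
def split_inchi(inchi):
    """Split an InChI string into version, formula and single-letter layers."""
    prefix, _, rest = inchi.partition("/")
    parts = rest.split("/")
    layers = {"version": prefix.replace("InChI=", ""), "formula": parts[0]}
    for part in parts[1:]:
        # Each remaining layer starts with a one-letter tag: c, h, b, t, m, s, ...
        layers[part[0]] = part[1:]
    return layers


db00755 = (
    "InChI=1S/C20H28O2/c1-15(8-6-9-16(2)14-19(21)22)11-12-18-17(3)10-7-13-20(18,4)5"
    "/h6,8-9,11-12,14H,7,10,13H2,1-5H3,(H,21,22)/b9-6+,12-11+,15-8+,16-14+"
)

layers = split_inchi(db00755)
print(layers["formula"])                 # molecular formula layer
print(len(layers["b"].split(",")))       # number of double-bond stereo descriptors
```

Comparing the `b` layer of the deposited InChI with the one computed from the structure would show whether the disagreement lies in double-bond geometry rather than in connectivity.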
J. Brecher, IUPAC, Graphical Representation of Stereochemical Configuration, Section ST-1.1.10
DB06287
7 records with 2 stereo bonds at chiral atoms
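One way such a rule could be checked on a V2000 molfile is to count wedge bonds per starting atom in the bond block. The sketch below does only that; it does not test whether the atom is actually a stereocentre, and the slides do not say how CVSP implements the rule, so the function names and the threshold are assumptions.

```python
from collections import Counter

# V2000 bond-block stereo codes for single bonds: 1 = up, 4 = either, 6 = down.
STEREO_CODES = {1, 4, 6}


def bond_line(first_atom, second_atom, order, stereo):
    """Build the first four 3-character fields of a V2000 bond line."""
    return f"{first_atom:3d}{second_atom:3d}{order:3d}{stereo:3d}"


def atoms_with_multiple_stereo_bonds(bond_lines):
    """Return atoms that are the starting atom of two or more stereo bonds."""
    counts = Counter()
    for line in bond_lines:
        first_atom = int(line[0:3])   # the narrow end of a wedge is the first atom
        stereo = int(line[9:12])
        if stereo in STEREO_CODES:
            counts[first_atom] += 1
    return sorted(atom for atom, n in counts.items() if n >= 2)


# Hypothetical bond block: atom 1 carries both an up and a down wedge.
bonds = [
    bond_line(1, 2, 1, 1),
    bond_line(1, 3, 1, 6),
    bond_line(1, 4, 1, 0),
    bond_line(4, 5, 1, 1),
]
flagged = atoms_with_multiple_stereo_bonds(bonds)
print(flagged)
```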
CVSP validation of ChEMBL 16 (~1.3 million records)
• Overall 0.7% of records had validation issues
• Stereo problems (~82%)
  • Directions of bonds do not make sense (~63%)
  • Ambiguous stereo: 2 stereo bonds at chiral center (~19%)
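For a sense of scale, the percentages above can be turned into approximate record counts. The sketch assumes that the ~82%, ~63% and ~19% figures are shares of the records with validation issues, not of the whole database; the slide does not state the base, and all inputs are rounded.

```python
# Approximate inputs taken from the slide; integer arithmetic avoids float noise.
total_records = 1_300_000

records_with_issues = total_records * 7 // 1000          # 0.7% of all records

# Assumption: these shares are relative to the records with issues.
stereo_problems = records_with_issues * 82 // 100
bad_bond_direction = records_with_issues * 63 // 100
two_stereo_bonds = records_with_issues * 19 // 100

print(records_with_issues, stereo_problems, bad_bond_direction, two_stereo_bonds)
```

Under that assumption the two stereo sub-categories add up to the stereo total, which is consistent with 63% + 19% = 82%.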
“Direction of bond makes no sense” – 63%
“Stereo types of the opposite bonds mismatch” – 15%
http://www.iupac.org/publications/pac/2006/pdf/7810x1897.pdf
“Stereo types of non-opposite bonds match” – 2%
“Atom not recognized” – 3% (isotopes)
Should be atom from periodic table
In molfile:
• No mass difference in atom line
• No “M  ISO” in connection table
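The two places where a V2000 molfile can carry isotope information are the mass-difference field of the atom line and an "M  ISO" property line. The sketch below builds and reads both for deuterium written as H, which is the periodic-table form the slide asks for. The helper names are assumptions for this example, and the `M  ISO` parser relies on whitespace splitting instead of strict column positions.

```python
# Symbols sometimes used for hydrogen isotopes that are not periodic-table
# entries, mapped to the absolute mass they stand for.
PSEUDO_ISOTOPE_SYMBOLS = {"D": 2, "T": 3}


def atom_line(symbol, mass_diff=0, x=0.0, y=0.0, z=0.0):
    """Build the leading fields of a V2000 atom line: x, y, z, symbol, mass difference, charge."""
    return f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3}{mass_diff:2d}  0"


def parse_atom_line(line):
    """Return (symbol, mass_difference) from fixed columns 32-34 and 35-36."""
    return line[31:34].strip(), int(line[34:36])


def parse_m_iso(line):
    """Return {atom_index: absolute_mass} from an 'M  ISO' property line."""
    tokens = line.split()
    count = int(tokens[2])
    values = tokens[3:3 + 2 * count]
    return {int(values[i]): int(values[i + 1]) for i in range(0, len(values), 2)}


# Deuterium as hydrogen with a mass difference of +1 in the atom line ...
deuterium_atom = atom_line("H", mass_diff=1)
# ... or as atom 1 with absolute mass 2 in the properties block.
deuterium_iso = "M  ISO  1   1   2"

print(parse_atom_line(deuterium_atom))
print(parse_m_iso(deuterium_iso))
print("D" in PSEUDO_ISOTOPE_SYMBOLS)   # a "D" atom symbol would need rewriting as H
```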
Knowledgebases and delivery systems
Big Data challenge
Crowdsourcing and altmetrics
New interfaces
Started with 2 servers in a basement
Presently – two farms of ~40 servers each
Future – in the Clouds
Compute intensive calculations
Delivery systems
Knowledgebases and delivery systems
Big Data challenge
Crowdsourcing and altmetrics
New interfaces
AltMetrics
Curation in ChemSpider
Knowledgebases and delivery systems
Big Data challenge
Crowdsourcing and altmetrics
New interfaces
Visualization
Navigation
ChemSpider APIs
We are a part of a larger world
National Chemistry Database
National Data Repository
University 1
Data Hub
Workstations
University 2
Data Hub
Workstations
Company 3
Data Hub
Workstations
Data Repository indexed storage
Data Repository provided data storage
Chemically intelligent services
Indexes
Data
External clients
Publishers
Scientists
Funding bodies
http://www.openphacts.org
Open PHACTS is an Innovative Medicines Initiative (IMI) project, aiming to reduce the barriers to drug discovery in industry, academia and for small businesses.
The Semantic Web is one of the cornerstones
What does e-Science do in Open PHACTS?
• ChemSpider provides many of the physicochemical properties within the Open PHACTS Discovery Platform
• e-Science develops tools to check and standardise chemical structures
• e-Science is creating the Open PHACTS chemical registration system
RDF Export
Data: ChEMBL, HMDB, DrugBank
Chemistry Validation and Standardization Platform (CVSP) at cvsp.chemspider.com
• Validation
• Standardization
• Parent generation
• Run on Hadoop-based farm
We know about Natural Products
MarinLit
OSDD
The Global Chemistry Network