The rsc e science - reflecting the change in the world we live in

The RSC & e-Science: Reflecting the Change in the World we Live In. Valery Tkachenko, RSC-OSDD Consultative Workshop on Cheminformatics, Delhi, September 28th 2013

Transcript of The rsc e science - reflecting the change in the world we live in

Page 1: The rsc e science - reflecting the change in the world we live in

The RSC & e-Science: Reflecting the Change in the World we Live In

Valery Tkachenko

RSC-OSDD Consultative Workshop on Cheminformatics

Delhi, September 28th 2013

Page 2: The rsc e science - reflecting the change in the world we live in

Royal Society of Chemistry and Global Chemistry Network

Page 3: The rsc e science - reflecting the change in the world we live in

The World we live in

Internet World
• 20+ years into the Internet Revolution
• Web 2.0 -> Web 3.0

Connected World
• Social Networks
• Real-time Communications

Big Data World
• Semantic content
• New Interfaces

Page 4: The rsc e science - reflecting the change in the world we live in

Pillars of the World

Data
• Data (knowledge) is king
• Dataflow

Navigation
• Domain-specific search and navigation
• Navigate inside and link out: federation

Interfaces
• HCI (human-computer interface)
• M2M (machine to machine)

Page 5: The rsc e science - reflecting the change in the world we live in

Science map

Page 6: The rsc e science - reflecting the change in the world we live in

Chemical sciences map

Page 7: The rsc e science - reflecting the change in the world we live in

Chemistry on the Internet

Page 8: The rsc e science - reflecting the change in the world we live in

What’s wrong?!?!

Complexity

Page 9: The rsc e science - reflecting the change in the world we live in

Royal Society of Chemistry and Global Chemistry Network

Page 10: The rsc e science - reflecting the change in the world we live in

Knowledgebases and delivery systems

Big Data challenge

Crowdsourcing and altmetrics

New interfaces

Page 11: The rsc e science - reflecting the change in the world we live in

Knowledgebases and delivery systems

Big Data challenge

Crowdsourcing and altmetrics

New interfaces

Page 12: The rsc e science - reflecting the change in the world we live in

50,000 ft view of an STM publisher

Knowledge

Our User Interfaces (Desktop, Web, Mobile, etc.)

Customers

Delivery Magic

3rd party integrations (our web services)

Page 13: The rsc e science - reflecting the change in the world we live in

ChemSpider Suite

Data Layer

ChemSpider Assays

ChemSpider Compounds

ChemSpider Reactions

ChemSpider Spectra

ChemSpider Materials

ChemSpider Algorithms

Business Objects Layer

CSAs BO, CSC BO, CSR BO, CSS BO, CSM BO, CSA BO

APIs Layer

DS API, Export API, Search API, Processing API

CSAs API, CSC API, CSR API, CSS API, CSM API, CSA API

Components Layer

JS Components

Google Apps Components

Python widgets

SharePoint Components

PHP snippets

ASP.NET Components

UIs

ChemSpider website

ChemSpider Reactions

mobile web app

ChemSpider desktop app

Depositions client

Java Beans

Page 14: The rsc e science - reflecting the change in the world we live in

• 29 million chemicals and growing

• Data sourced from >500 different sources

• Crowdsourced curation and annotation

• Ongoing deposition of data from our journals and our collaborators

• A structure centric hub for web-searching

Page 15: The rsc e science - reflecting the change in the world we live in

ChemSpider and Atovaquone

Page 16: The rsc e science - reflecting the change in the world we live in

ChemSpider and Atovaquone

Page 17: The rsc e science - reflecting the change in the world we live in

ChemSpider and Atovaquone

Page 18: The rsc e science - reflecting the change in the world we live in

ChemSpider and Atovaquone

Page 19: The rsc e science - reflecting the change in the world we live in

ChemSpider and Atovaquone

Page 20: The rsc e science - reflecting the change in the world we live in

ChemSpider and Atovaquone

Page 21: The rsc e science - reflecting the change in the world we live in

ChemSpider and Atovaquone

Page 22: The rsc e science - reflecting the change in the world we live in

ChemSpider and Atovaquone

Page 23: The rsc e science - reflecting the change in the world we live in

ChemSpider and Atovaquone

Page 24: The rsc e science - reflecting the change in the world we live in

ChemSpider and Atovaquone

Page 25: The rsc e science - reflecting the change in the world we live in

ChemSpider and Atovaquone

Page 26: The rsc e science - reflecting the change in the world we live in

ChemSpider and Atovaquone

Page 27: The rsc e science - reflecting the change in the world we live in

Micropublishing

Page 28: The rsc e science - reflecting the change in the world we live in

Micropublishing

Page 29: The rsc e science - reflecting the change in the world we live in

Micropublishing

Page 30: The rsc e science - reflecting the change in the world we live in

ChemSpider Reactions

Page 31: The rsc e science - reflecting the change in the world we live in

ChemSpider Reactions

Page 32: The rsc e science - reflecting the change in the world we live in

Knowledge in our own archives

Page 33: The rsc e science - reflecting the change in the world we live in

DERA and Text Mining

The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6, thionyl chloride ( 5 ml ) and benzene ( 50 ml ) were charged into a glass reaction vessel equipped with a mechanical stirrer, thermometer and reflux condenser .

The reaction mixture was heated at reflux with stirring, for a period of about one-half hour .

After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a solid residue .

Page 34: The rsc e science - reflecting the change in the world we live in

Text Mining

The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6 , thionyl chloride ( 5 ml ) and benzene ( 50 ml ) were charged into a glass reaction vessel equipped with a mechanical stirrer , thermometer and reflux condenser .

The reaction mixture was heated at reflux with stirring , for a period of about one-half hour .

After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a solid residue .
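
The kind of information a text-mining pass might pull out of this paragraph can be illustrated with a toy sketch. It is not how DERA works (the slides do not describe its internals); it is a plain regular-expression example that picks out reagent quantities written in the tokenized form "( 5 ml )". The names QUANTITY and extract_quantities and the list of units are assumptions made for illustration only.

```python
import re

# Hypothetical pattern: a number and a unit inside parentheses, e.g. "( 5 ml )".
QUANTITY = re.compile(r"\(\s*(\d+(?:\.\d+)?)\s*(ml|mg|g|mmol|mol)\s*\)")


def extract_quantities(text):
    """Return (reagent, amount, unit) tuples for each quantity found in text."""
    results = []
    last_end = 0
    for match in QUANTITY.finditer(text):
        # The reagent name is taken as the words between the previous
        # separator (a comma or the word "and") and the opening parenthesis.
        preceding = text[last_end:match.start()]
        reagent = re.split(r",|\band\b", preceding)[-1].strip()
        results.append((reagent, float(match.group(1)), match.group(2)))
        last_end = match.end()
    return results


# Verbatim fragment of the first sentence on the slide.
EXCERPT = ("thionyl chloride ( 5 ml ) and benzene ( 50 ml ) were charged "
           "into a glass reaction vessel")

quantities = extract_quantities(EXCERPT)
for reagent, amount, unit in quantities:
    print(reagent, amount, unit)
```

A real system also has to recognize systematic names such as the urea derivative in the first sentence, which a pattern this simple cannot do.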

Page 35: The rsc e science - reflecting the change in the world we live in

It is so difficult to navigate…

• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known Pathways?
• Working On Now?
• Connections to disease?
• Expressed in right cell type?
• Competitors?
• IP?

Page 36: The rsc e science - reflecting the change in the world we live in

Digitally Enabling RSC Archive

Text, PDF, XML

Structures

Reactions

Spectra

Materials

Chemistry Validation and Standardization Platform (CVSP)

DERA (Text Mining)

Biological Activities

Page 37: The rsc e science - reflecting the change in the world we live in

Data quality issue and CVSP

Robochemistry

Proliferation of errors in public and private databases

Automated quality control system

Page 38: The rsc e science - reflecting the change in the world we live in

ChemSpider issues

Page 39: The rsc e science - reflecting the change in the world we live in

DrugBank dataset (6516 records)

~60 records that can’t be dearomatized unambiguously

DB04283 DB04462

Page 40: The rsc e science - reflecting the change in the world we live in

~30 records with bonds that do not make sense

DB04283

DB04009

Page 41: The rsc e science - reflecting the change in the world we live in

2 records where SMILES, InChI, and name did not match the structure

DB00611 DB01547

Page 42: The rsc e science - reflecting the change in the world we live in

~40 records where InChIs did not match the structure

DrugBank ID: DB00755

InChI=1S/C20H28O2/c1-15(8-6-9-16(2)14-19(21)22)11-12-18-17(3)10-7-13-20(18,4)5/h6,8-9,11-12,14H,7,10,13H2,1-5H3,(H,21,22)/b9-6+,12-11+,15-8+,16-14+

DrugBank ID: DB00614
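
To make concrete what a structure-versus-InChI comparison has to look at, the sketch below splits the DB00755 InChI from this slide into its slash-separated layers (formula, connectivity, hydrogens, double-bond stereo). It is only an illustration using string handling; the function name inchi_layers is hypothetical, and a real check would regenerate the InChI from the drawn structure with the official InChI software and compare the two strings.

```python
def inchi_layers(inchi):
    """Split an InChI string into a dict of its slash-separated layers.

    Simplification: assumes each layer after the formula starts with a
    distinct one-letter tag, which holds for the example below but not for
    every InChI (for instance, tags can repeat in fixed-hydrogen layers).
    """
    prefix, formula, *rest = inchi.split("/")
    layers = {"version": prefix.split("=", 1)[1], "formula": formula}
    for part in rest:
        layers[part[0]] = part[1:]
    return layers


# InChI quoted on the slide for DrugBank ID DB00755.
INCHI = ("InChI=1S/C20H28O2/c1-15(8-6-9-16(2)14-19(21)22)11-12-18-17(3)10-7-13-"
         "20(18,4)5/h6,8-9,11-12,14H,7,10,13H2,1-5H3,(H,21,22)"
         "/b9-6+,12-11+,15-8+,16-14+")

layers = inchi_layers(INCHI)
print(layers["formula"])              # molecular formula layer
print(layers["b"].split(","))         # double-bond stereo descriptors
```

Comparing layer by layer shows where two identifiers disagree, for example a matching formula and connectivity but a different stereo layer.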

Page 43: The rsc e science - reflecting the change in the world we live in

DB08128

J. Brecher, IUPAC, Graphical Representation of Stereochemical Configuration, Section ST-1.1.10

DB06287

7 records with 2 stereo bonds at chiral atoms
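
A check of this kind can be sketched against the bond block of a V2000 molfile, where each bond line carries a stereo flag (1 = up wedge, 6 = down wedge) whose narrow end sits at the first atom. The sketch below counts wedge bonds per starting atom and reports atoms that have two or more. It is an assumption-laden illustration, not CVSP's actual rule; the helper names and the sample molecule are hypothetical.

```python
from collections import Counter


def build_molfile(symbols, bonds):
    """Build a minimal V2000 molfile; bonds are (atom1, atom2, order, stereo)."""
    atoms = [f"{0.0:10.4f}{0.0:10.4f}{0.0:10.4f} {s:<3} 0  0  0  0  0  0  0"
             for s in symbols]
    bond_lines = [f"{a:3d}{b:3d}{order:3d}{stereo:3d}"
                  for a, b, order, stereo in bonds]
    counts = f"{len(symbols):3d}{len(bonds):3d}  0  0  0  0  0  0  0  0999 V2000"
    return "\n".join(["", "  sketch", "", counts, *atoms, *bond_lines, "M  END"])


def atoms_with_multiple_stereo_bonds(molfile):
    """Return 1-based atom numbers at which two or more wedge bonds start."""
    lines = molfile.splitlines()
    n_atoms = int(lines[3][0:3])
    n_bonds = int(lines[3][3:6])
    first_bond = 4 + n_atoms
    wedges = Counter()
    for line in lines[first_bond:first_bond + n_bonds]:
        stereo = int(line[9:12].strip() or 0)
        if stereo in (1, 6):          # up or down wedge
            wedges[int(line[0:3])] += 1
    return sorted(atom for atom, n in wedges.items() if n >= 2)


# Hypothetical centre with one up and one down wedge starting at atom 1.
SAMPLE = build_molfile(
    ["C", "F", "Cl", "Br", "H"],
    [(1, 2, 1, 1), (1, 3, 1, 6), (1, 4, 1, 0), (1, 5, 1, 0)],
)
flagged_atoms = atoms_with_multiple_stereo_bonds(SAMPLE)
print(flagged_atoms)
```

Two wedges at one centre are not always wrong, so a flag like this marks a record for closer inspection rather than proving an error.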

Page 44: The rsc e science - reflecting the change in the world we live in

CVSP validation of ChEMBL 16 (~1.3 million records)

• Overall 0.7% of records had validation issues

• Stereo problems (~82%)
  • Directions of bonds do not make sense (~63%)
  • Ambiguous stereo: 2 stereo bonds at chiral center (~19%)
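
For a sense of scale, the percentages can be turned into rough record counts. The sketch below assumes that the stereo percentages are shares of the flagged records rather than of the whole dataset, which the slide does not state explicitly; all figures are the rounded values quoted above, so the results are approximate.

```python
# Rounded figures from the slide; integer arithmetic keeps the results exact.
total_records = 1_300_000                      # ChEMBL 16, approximate
flagged = total_records * 7 // 1000            # 0.7% with validation issues

# Assumption: these percentages are shares of the flagged records.
shares_percent = {
    "stereo problems": 82,
    "bond direction makes no sense": 63,
    "two stereo bonds at chiral center": 19,
}
counts = {name: flagged * pct // 100 for name, pct in shares_percent.items()}

print(flagged)
for name, count in counts.items():
    print(name, count)
```

Under that assumption roughly nine thousand records are flagged, and the two stereo sub-categories add up to the stereo total.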

Page 45: The rsc e science - reflecting the change in the world we live in

“Direction of bond makes no sense” – 63%

Page 46: The rsc e science - reflecting the change in the world we live in

“Stereo types of the opposite bonds mismatch” – 15%

http://www.iupac.org/publications/pac/2006/pdf/7810x1897.pdf

Page 47: The rsc e science - reflecting the change in the world we live in

“Stereo types of non-opposite bonds match” – 2%

Page 48: The rsc e science - reflecting the change in the world we live in

“Atom not recognized” – 3% (isotopes)

Should be atom from periodic table

In molfile:

• No mass difference in atom line

• No “M ISO” in connection table
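
The "atom not recognized" case can be sketched as a scan of the V2000 atom block for symbols that are not in the periodic table, such as "D" written for deuterium instead of "H" with isotope information. This is an illustration under stated assumptions, not CVSP code: the function names and the sample molfile are hypothetical, and only the fixed-column V2000 layout is relied on.

```python
ELEMENTS = set("""
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu
Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs
Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl
Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh
Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og
""".split())


def atom_line(symbol, mass_diff=0):
    """Format one V2000 atom line (symbol in columns 32-34, then mass diff)."""
    return (f"{0.0:10.4f}{0.0:10.4f}{0.0:10.4f} {symbol:<3}{mass_diff:2d}"
            "  0  0  0  0  0  0  0")


def unrecognized_atoms(molfile):
    """Return (atom number, symbol) for symbols that are not element symbols."""
    lines = molfile.splitlines()
    n_atoms = int(lines[3][0:3])
    bad = []
    for i, line in enumerate(lines[4:4 + n_atoms], start=1):
        symbol = line[31:34].strip()
        if symbol not in ELEMENTS:
            bad.append((i, symbol))
    return bad


def has_iso_line(molfile):
    """True if the properties block contains an 'M  ISO' isotope line."""
    return any(line.startswith("M  ISO") for line in molfile.splitlines())


# Hypothetical record: deuterium drawn as "D" with no isotope information.
SAMPLE = "\n".join([
    "", "  sketch", "",
    f"{2:3d}{1:3d}  0  0  0  0  0  0  0  0999 V2000",
    atom_line("C"), atom_line("D"),
    f"{1:3d}{2:3d}{1:3d}{0:3d}",
    "M  END",
])
problems = unrecognized_atoms(SAMPLE)
print(problems, has_iso_line(SAMPLE))
```

A record like this would be repaired by writing the atom as "H" and recording the isotope either as a mass difference in the atom line or in an "M  ISO" line.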

Page 49: The rsc e science - reflecting the change in the world we live in

ChemSpider Suite

Data Layer

ChemSpider Assays

ChemSpider Compounds

ChemSpider Reactions

ChemSpider Spectra

ChemSpider Materials

ChemSpider Algorithms

Business Objects Layer

CSAs BO, CSC BO, CSR BO, CSS BO, CSM BO, CSA BO

APIs Layer

DS API, Export API, Search API, Processing API

CSAs API, CSC API, CSR API, CSS API, CSM API, CSA API

Components Layer

JS Components

Google Apps Components

Python widgets

SharePoint Components

PHP snippets

ASP.NET Components

UIs

ChemSpider website

ChemSpider Reactions

mobile web app

ChemSpider desktop app

Depositions client

Java Beans

Page 50: The rsc e science - reflecting the change in the world we live in

Knowledgebases and delivery systems

Big Data challenge

Crowdsourcing and altmetrics

New interfaces

Page 51: The rsc e science - reflecting the change in the world we live in

Started with 2 servers in a basement

Presently – two farms of ~40 servers each

Future – in the Clouds

Page 52: The rsc e science - reflecting the change in the world we live in

Compute intensive calculations

Delivery systems

Page 53: The rsc e science - reflecting the change in the world we live in

Knowledgebases and delivery systems

Big Data challenge

Crowdsourcing and altmetrics

New interfaces

Page 54: The rsc e science - reflecting the change in the world we live in

AltMetrics

Page 55: The rsc e science - reflecting the change in the world we live in

Curation in ChemSpider

Page 56: The rsc e science - reflecting the change in the world we live in

Knowledgebases and delivery systems

Big Data challenge

Crowdsourcing and altmetrics

New interfaces

Page 57: The rsc e science - reflecting the change in the world we live in

Visualization

Page 58: The rsc e science - reflecting the change in the world we live in

Navigation

Page 59: The rsc e science - reflecting the change in the world we live in

ChemSpider APIs

Page 60: The rsc e science - reflecting the change in the world we live in

We are a part of a larger world

Page 61: The rsc e science - reflecting the change in the world we live in

National Chemistry Database

Page 62: The rsc e science - reflecting the change in the world we live in

National Data Repository

University 1

Data Hub

Workstations

University 2

Data Hub

Workstations

Company 3

Data Hub

Workstations

Data Repository indexed storage

Data Repository provided data storage

Chemically intelligent services

Indexes

Data

External clients: Publishers, Scientists, Funding bodies

Page 63: The rsc e science - reflecting the change in the world we live in

http://www.openphacts.org

Open PHACTS is an Innovative Medicines Initiative (IMI) project, aiming to reduce the barriers to drug discovery in industry, academia and for small businesses.

Semantic web is one of the cornerstones

Page 64: The rsc e science - reflecting the change in the world we live in

What does e-Science do in Open PHACTS?

ChemSpider provides many of the physicochemical properties within the Open PHACTS Discovery Platform

e-Science develops tools to check and standardise chemical structures

e-Science is creating the Open PHACTS chemical registration system

Page 65: The rsc e science - reflecting the change in the world we live in

RDF Export

Data: ChEMBL, HMDB, DrugBank

Chemistry Validation and Standardization Platform (CVSP) at cvsp.chemspider.com

• Validation

• Standardization

• Parent generation

• Run on Hadoop-based farm

Page 66: The rsc e science - reflecting the change in the world we live in
Page 67: The rsc e science - reflecting the change in the world we live in

We know about Natural Products

Page 68: The rsc e science - reflecting the change in the world we live in

MarinLit

Page 69: The rsc e science - reflecting the change in the world we live in

OSDD

Page 70: The rsc e science - reflecting the change in the world we live in

The Global Chemistry Network