Linking RDF to cheminformatics and proteochemometrics

Linking Resource Description Framework to Cheminformatics and Proteochemometrics Egon Willighagen <http://chem-bla-ics.blogspot.com/> Bioclipse & Proteochemometric Group (Prof. Wikberg) Until 2010-09-30 Department of Pharmaceutical Biosciences Uppsala University 2010-08-22

description

My presentation at #acs_boston on 23 August 2010.

Transcript of Linking RDF to cheminformatics and proteochemometrics

Page 1: Linking RDF to cheminformatics and proteochemometrics

Linking Resource Description Framework to Cheminformatics and Proteochemometrics

Egon Willighagen <http://chem-bla-ics.blogspot.com/>

Bioclipse & Proteochemometric Group (Prof. Wikberg) Until 2010-09-30

Department of Pharmaceutical Biosciences

Uppsala University

2010-08-22

Page 2: Linking RDF to cheminformatics and proteochemometrics

Problem

Building Blocks

Open Data

Application

Conclusion

Proteochemometrics

2010-08-22 Bioclipse & Proteochemometric Group - 2 - Egon Willighagen | chem-bla-ics.blogspot.com

Page 3: Linking RDF to cheminformatics and proteochemometrics

Data Analysis

Page 4: Linking RDF to cheminformatics and proteochemometrics

Knowledge...

Solanum lycopersicum...

We model our world, but ...
Life is not uni- or bivariate
Knowledge is not either
Different representations: compatible?
Information Loss!

Page 5: Linking RDF to cheminformatics and proteochemometrics

Names...

benzene
3-[4-[3-(1-methyl-7-oxo-3-propyl-4H-pyrazolo[4,3-d]pyrimidin-5-yl)-4-propoxyphenyl]sulfonylpiperazin-1-yl]propanoic acid

InChI=1S/C25H34N6O6S/c1-4-6-19-22-23(29(3)28-19)25(34)27-24(26-22)18-16-17(7-8-20(18)37-15-5-2)38(35,36)31-13-11-30(12-14-31)10-9-21(32)33/h7-8,16H,4-6,9-15H2,1-3H3,(H,32,33)(H,26,27,34)

p450 (which one?? all residues known?)
Solanum lycopersicum (well....)
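An InChI like the one on this slide is a name that can be taken apart mechanically: layers are separated by "/", the formula layer comes first, and later layers carry a one-letter prefix ("c" for connectivity, "h" for hydrogens). The following is a minimal sketch in plain JavaScript, not part of Bioclipse or the CDK; the helper name splitInchi is hypothetical, and it assumes each prefix occurs at most once, which does not hold for every InChI.

```javascript
// Hypothetical helper: split an InChI into version, formula and prefixed layers.
// Simplification: a repeated layer prefix would overwrite the earlier entry.
function splitInchi(inchi) {
  const parts = inchi.split("/");
  const result = {
    version: parts[0].replace("InChI=", ""),
    formula: parts[1] || "",
    layers: {}
  };
  for (let i = 2; i < parts.length; i++) {
    // The first character of each remaining layer is its one-letter prefix.
    result.layers[parts[i].charAt(0)] = parts[i].slice(1);
  }
  return result;
}

const methane = splitInchi("InChI=1/CH4/h1H4");
console.log(methane.formula);  // prints CH4
console.log(methane.layers.h); // prints 1H4
```

The formula and hydrogen layers are "numbers" recovered from a "name", which is the bridge this talk is about; real work would use a cheminformatics toolkit rather than string splitting.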

Page 6: Linking RDF to cheminformatics and proteochemometrics

... Molecular reality...

1 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000

Page 7: Linking RDF to cheminformatics and proteochemometrics

... and Numbers

Page 8: Linking RDF to cheminformatics and proteochemometrics

Main Theme

How do we navigate this dimensional space?
How to include prior knowledge?
Minimize information loss?
With optimal knowledge extraction?
Maximizing interpretability?
Without ending up in random correlation?

Page 9: Linking RDF to cheminformatics and proteochemometrics

OpenMolecules RDF: dereferenceable URI

http://rdf.openmolecules.net/?InChI=1/CH4/h1H4
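The URI above embeds the InChI of methane directly in the query string. A minimal sketch of that pattern in plain JavaScript follows; the helper name openMoleculesUri is hypothetical, and it assumes every compound follows the same "base + ? + InChI" form as the methane example, without percent-encoding. Whether the service still resolves these URIs is not checked here.

```javascript
// Base of the OpenMolecules RDF service, taken from the slide.
const OPENMOLECULES_BASE = "http://rdf.openmolecules.net/";

// Hypothetical helper: build the URI for a compound from its InChI.
// Assumption: the InChI is appended verbatim after "?", as in the
// methane example on the slide.
function openMoleculesUri(inchi) {
  if (typeof inchi !== "string" || inchi.indexOf("InChI=") !== 0) {
    throw new Error("expected a string starting with 'InChI='");
  }
  return OPENMOLECULES_BASE + "?" + inchi;
}

console.log(openMoleculesUri("InChI=1/CH4/h1H4"));
// prints http://rdf.openmolecules.net/?InChI=1/CH4/h1H4
```

Because the identifier is derived from the structure itself, two parties can arrive at the same URI for the same compound without consulting a registry.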

Page 10: Linking RDF to cheminformatics and proteochemometrics

OpenMolecules RDF: linked data

Page 11: Linking RDF to cheminformatics and proteochemometrics

The Chemistry Development Kit

A Family of Projects:
CDK-Taverna (chemoinformatics workflows)
JChemPaint (semantic 2D editor)
ChemoJava (GPL-ed extension)

Goals:
library of cheminformatics algorithms
educational

Usage:
CDK: 100+ times cited in scientific literature
Bioclipse, KNIME, Jumbo (CML), AMBIT, ...

C. Steinbeck et al., J. Chem. Inf. Comput. Sci., 2003
C. Steinbeck et al., Curr. Pharm. Design, 2006

Page 12: Linking RDF to cheminformatics and proteochemometrics

Bioclipse

O. Spjuth et al., BMC Bioinformatics, 2007
O. Spjuth et al., BMC Bioinformatics, 2010

Page 13: Linking RDF to cheminformatics and proteochemometrics

Bioclipse-RDF

local RDF storage (memory, on disk)
read/write RDF/XML, N3
run SPARQL queries (local and remote)
extract RDF from XHTML/RDFa

Thanks to Open Source projects including Jena, SWI-Prolog, and Pellet.
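To make "local RDF storage" and "run queries" concrete, here is a toy in-memory triple store in plain JavaScript. It is a sketch of the idea only: it is not the Jena or Bioclipse API, all function names are hypothetical, the example.org predicates are made up for illustration, and it matches a single triple pattern rather than full SPARQL.

```javascript
// Toy in-memory triple store; NOT the Jena or Bioclipse API.
function createStore() {
  return { triples: [] };
}

function addTriple(store, subject, predicate, object) {
  store.triples.push({ s: subject, p: predicate, o: object });
}

// Match one triple pattern; null acts as a wildcard, like a SPARQL variable.
function match(store, s, p, o) {
  return store.triples.filter(function (t) {
    return (s === null || t.s === s) &&
           (p === null || t.p === p) &&
           (o === null || t.o === o);
  });
}

const store = createStore();
const methaneUri = "http://rdf.openmolecules.net/?InChI=1/CH4/h1H4";
// Hypothetical example.org predicates, used only for illustration.
addTriple(store, methaneUri, "http://example.org/label", "methane");
addTriple(store, methaneUri, "http://example.org/formula", "CH4");

console.log(match(store, methaneUri, null, null).length); // prints 2
```

A real store adds indexing, persistence, and parsers for RDF/XML and N3, which is what the libraries credited on this slide provide.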

Page 14: Linking RDF to cheminformatics and proteochemometrics

SPARQL end points (Open Data)

NMRShiftDB data (C. Steinbeck, EBI/UK)
ChEMBL (J. Overington, EBI/UK)
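A remote SPARQL end point is queried over HTTP; in the SPARQL Protocol, a GET request carries the query text in a percent-encoded "query" parameter. The sketch below builds such a request URL in plain JavaScript. The function name sparqlGetUrl and the example.org end point are hypothetical placeholders, not the addresses of the NMRShiftDB or ChEMBL end points, and no request is actually sent.

```javascript
// Build a SPARQL Protocol GET request URL: the query text goes into
// the percent-encoded "query" parameter.
function sparqlGetUrl(endpoint, query) {
  // Append with "&" if the end point URL already has a query string.
  const separator = endpoint.indexOf("?") === -1 ? "?" : "&";
  return endpoint + separator + "query=" + encodeURIComponent(query);
}

// Hypothetical end point; replace with a real SPARQL end point URL.
const ENDPOINT = "http://example.org/sparql";
const query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";

console.log(sparqlGetUrl(ENDPOINT, query));
```

The LIMIT clause keeps an exploratory query cheap on a public end point; the resulting URL can be fetched with any HTTP client.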

Page 15: Linking RDF to cheminformatics and proteochemometrics

Proteochemometrics: simple QSAR

E.L.Willighagen et al., J. Biomed. Sem., 2010, in print

Page 16: Linking RDF to cheminformatics and proteochemometrics

Proteochemometrics: RDF input

Page 17: Linking RDF to cheminformatics and proteochemometrics

Proteochemometrics: Bayesian + extra Priors

[Figure: two scatter plots, panels (a) and (b), of Predicted (y-axis, ticks −5 to 20) versus Actual (x-axis, ticks 2 to 12) values]

Page 18: Linking RDF to cheminformatics and proteochemometrics

MyExperiment: Bioclipse Scripting Language

myexperiment.search("RDF")
myexperiment.downloadWorkflow(937)

Page 19: Linking RDF to cheminformatics and proteochemometrics

Reasoning: Prolog and Pellet

Samuel Lampa, M.Sc. project

Page 20: Linking RDF to cheminformatics and proteochemometrics

Semantic Wikis

Samuel Lampa, Google Summer of Code 2010

Page 21: Linking RDF to cheminformatics and proteochemometrics

XHTML+RDFa

Page 22: Linking RDF to cheminformatics and proteochemometrics

OpenTox: downloading

Page 23: Linking RDF to cheminformatics and proteochemometrics

OpenTox: uploading

// requires an unspecified Bioclipse
// development version
ds = opentox.createDataset(
  "http://apps.ideaconsult.net:8080/ambit2/"
);
opentox.addMolecule(ds, cdk.fromSMILES("CCCCC[N+](C)(C)C"));
opentox.addMolecule(ds, cdk.fromSMILES("ClC(I)Br"));
opentox.deleteDataset(ds);

Page 24: Linking RDF to cheminformatics and proteochemometrics

Linked Data: Visualization

Page 25: Linking RDF to cheminformatics and proteochemometrics

Substructure mining: ChEMBL

Annsofie Andersson, M.Sc. project

Page 26: Linking RDF to cheminformatics and proteochemometrics

Substructure mining: .. and MoSS

Page 27: Linking RDF to cheminformatics and proteochemometrics

What does this bring us?

Platform to integrate RDF with the computational world
Bioclipse as single point of access
Scripting, sharing of scripts with MyExperiment.org
Bridging Names to Numbers

Page 28: Linking RDF to cheminformatics and proteochemometrics

Acknowledgements

Maris Lapins, Martin Eklund: statistics
Annsofie Andersson: ChEMBL + MoSS integration
Samuel Lampa: reasoning (Pellet/Prolog) and RDFIO
Nina Jeliazkova: OpenTox integration

Page 29: Linking RDF to cheminformatics and proteochemometrics

The Details

http://www.citeulike.org/user/egonw/tag/papers
http://chem-bla-ics.blogspot.com
http://egonw.github.com
waveto: [email protected]
