ChemInfo 2011 class1

Chemical Information Retrieval 2011, First Class. Jean-Claude Bradley, Associate Professor of Chemistry, Drexel University. September 23, 2011. CHEM367/767, Drexel University.

description

Jean-Claude Bradley presents the introductory lecture for Chemical Information Retrieval at Drexel University for Fall 2011 on September 23, 2011. Examples are given to demonstrate how difficult it can be to find and assess chemical information such as melting points. An overview of the class wiki is then given.

Transcript of ChemInfo 2011 class1

Page 1: ChemInfo 2011 class1

Chemical Information Retrieval 2011

Jean-Claude Bradley

September 23, 2011

First Class

Associate Professor of Chemistry, Drexel University

CHEM367/767 Drexel University

Page 2: ChemInfo 2011 class1

Finding reliable chemical information

can be really hard

Page 3: ChemInfo 2011 class1

After this class, you should feel that

you can never blindly trust

chemical data sources again

Page 4: ChemInfo 2011 class1

But… you will learn how to do the best you can

with imperfect information

Page 5: ChemInfo 2011 class1

The Chemical Information Validation Sheet

567 curated and referenced measurements from the Fall 2010 Chemical Information Retrieval course

Page 6: ChemInfo 2011 class1

Discovering outliers for melting points (stdev/average)
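The stdev/average screen named on this slide can be sketched in a few lines. This is only an illustration, not the spreadsheet formula used in the course: the function names, the 0.05 threshold, and the two sets of melting points are all made up for the example.

```python
from statistics import mean, stdev

def relative_spread(values):
    """Sample standard deviation divided by |mean| for one compound's reported values."""
    avg = mean(values)
    if avg == 0:
        raise ValueError("mean is zero; stdev/average is undefined")
    return stdev(values) / abs(avg)

def flag_outliers(reports, threshold=0.05):
    """Return compounds whose stdev/average exceeds the threshold, worst first.

    `threshold` is an arbitrary illustrative cutoff, not a value from the course.
    """
    scored = {
        name: relative_spread(vals)
        for name, vals in reports.items()
        if len(vals) >= 2  # stdev needs at least two measurements
    }
    flagged = [name for name, score in scored.items() if score > threshold]
    return sorted(flagged, key=lambda name: scored[name], reverse=True)

# Hypothetical melting points in degrees C (invented for illustration only)
reports = {
    "compound_A": [80.0, 81.0, 80.5],
    "compound_B": [120.0, 95.0, 121.0],
}
print(flag_outliers(reports))
```

Note that the ratio depends on the temperature scale: in degrees C an average near zero inflates it, so converting to kelvin first is one possible alternative.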

Page 7: ChemInfo 2011 class1

Investigating the m.p. inconsistencies of EGCG

Page 8: ChemInfo 2011 class1

Investigating the m.p. inconsistencies of cyclohexanone

Page 9: ChemInfo 2011 class1

Most popular data sources

Page 10: ChemInfo 2011 class1

Alfa Aesar donates melting points to the public

Page 11: ChemInfo 2011 class1

Open Melting Point Explorer

(Andrew Lang)

Page 12: ChemInfo 2011 class1

Outliers

MDPI dataset

EPI (also donated all data to the public)

Page 13: ChemInfo 2011 class1

Outliers for ethanol: Alfa Aesar and Oxford MSDS

Page 14: ChemInfo 2011 class1

Inconsistencies and SMILES problems within MDPI dataset

Page 15: ChemInfo 2011 class1

MDPI Dataset labeled with High Trust Level

Page 16: ChemInfo 2011 class1

Open Melting Point Datasets

Currently 20,000 compounds with Open MPs

Page 17: ChemInfo 2011 class1

What is the melting point of 4-benzyltoluene?

• American Petroleum Institute: 5 C
• PHYSPROP: -30 C
• PHYSPROP: 125 C
• peer reviewed journal (2008): 97.5 C
• government database: -30 C
• government database: 4.58 C
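To make the disagreement concrete, the six values reported on this slide can be summarized with a short script. The pairing of each source with its value follows the slide text as transcribed; the variable names are illustrative.

```python
from statistics import median

# Melting points for 4-benzyltoluene in degrees C, as listed on the slide
reported = [
    ("American Petroleum Institute", 5.0),
    ("PHYSPROP", -30.0),
    ("PHYSPROP", 125.0),
    ("peer reviewed journal (2008)", 97.5),
    ("government database", -30.0),
    ("government database", 4.58),
]

values = [value for _, value in reported]
lowest, highest = min(values), max(values)
spread = highest - lowest   # distance between the extreme reports
middle = median(values)     # less sensitive to extreme reports than the mean

print(f"range: {lowest} to {highest} C, spread: {spread} C, median: {middle:.2f} C")
```

A spread of this size across nominally authoritative sources is the kind of inconsistency that calls for going back to a measurement.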

Page 18: ChemInfo 2011 class1

The quest to resolve the melting point of 4-benzyltoluene: liquid at room temperature and can be frozen below -30 C

Page 19: ChemInfo 2011 class1

Open Lab Notebook page measuring the melting point of 4-benzyltoluene

Page 20: ChemInfo 2011 class1

Motivation: Faster Science, Better Science

Page 21: ChemInfo 2011 class1

Ruling out all melting points above -15C?

Page 22: ChemInfo 2011 class1

Oops – 4-benzyltoluene freezes after 16 days at -15C!

Page 23: ChemInfo 2011 class1

Measuring the melting point by slowly heating from -15 C gives 5 C

Page 24: ChemInfo 2011 class1

There are NO FACTS, only measurements embedded within assumptions

Open Notebook Science maintains the integrity of data provenance by making assumptions explicit

Page 25: ChemInfo 2011 class1

Open Random Forest modeling of Open Melting Point data using CDK descriptors

(Andrew Lang)

R² = 0.78; TPSA and nHdon are the most important descriptors
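For readers unfamiliar with the statistic, here is a minimal sketch of how an R² value can be computed from measured and predicted melting points. It assumes the slide reports the coefficient of determination (it could instead be a squared correlation coefficient), and the numbers below are invented for illustration, not taken from the model.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_measured = sum(measured) / len(measured)
    # Residual sum of squares: how far predictions are from measurements
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    # Total sum of squares: how far measurements are from their own mean
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("measured values are all identical; R^2 is undefined")
    return 1 - ss_res / ss_tot

# Hypothetical melting points in degrees C (illustrative only)
measured = [10.0, 50.0, 90.0, 130.0]
predicted = [20.0, 40.0, 100.0, 120.0]
print(round(r_squared(measured, predicted), 3))
```

Under this definition a value of 0.78 would mean the model accounts for about 78% of the variance in the measured melting points.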

Page 26: ChemInfo 2011 class1

Melting point prediction service

Page 27: ChemInfo 2011 class1

Melting point predictions and measurements on iPhone/iPad (Andrew Lang and Alex Clark)

Page 28: ChemInfo 2011 class1

Using melting point for temperature dependent solubility prediction

Page 29: ChemInfo 2011 class1

Web services for summary data

(Andrew Lang)

Page 30: ChemInfo 2011 class1

Web service calls from within a Google Spreadsheet for solubility measurement and prediction

(Andrew Lang)
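A web service call of this kind generally comes down to requesting a URL that carries the compound identifier as a query parameter. The sketch below shows only that URL-building step. The base URL and the `smiles` parameter name are hypothetical placeholders, not the actual ONS web service interface, which the slide does not specify.

```python
from urllib.parse import urlencode

# Hypothetical placeholder endpoint; NOT a real ONS web service address
BASE_URL = "https://example.org/meltingpoint"

def build_query_url(smiles, base_url=BASE_URL):
    """Build a GET URL with the SMILES string safely percent-encoded.

    The parameter name 'smiles' is an assumption made for illustration.
    """
    return f"{base_url}?{urlencode({'smiles': smiles})}"

# Acetic acid: parentheses and '=' must be encoded to survive in a URL
print(build_query_url("CC(=O)O"))
```

Encoding matters because SMILES strings routinely contain characters such as `=`, `#`, and parentheses that have special meaning in URLs.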

Page 31: ChemInfo 2011 class1

Integration of Multiple Web Services to Recommend Solvents for Reactions

(Andrew Lang)

Page 32: ChemInfo 2011 class1

Publication of double+ validated melting point dataset to Nature Precedings and Lulu

Page 33: ChemInfo 2011 class1
Page 34: ChemInfo 2011 class1
Page 35: ChemInfo 2011 class1

Reaction Attempts Book

Page 36: ChemInfo 2011 class1

Reaction Attempts Book: Reactants listed Alphabetically

Page 37: ChemInfo 2011 class1
Page 38: ChemInfo 2011 class1

All ONS web services

Page 39: ChemInfo 2011 class1

Google Apps Scripts web services

Page 40: ChemInfo 2011 class1

Google Apps Scripts for conveniently exploring melting point data

Page 41: ChemInfo 2011 class1

Straight chain carboxylic acids from 1 to 10 carbons

Straight chain alcohols from 1 to 10 carbons

Comparison of model with triple validated measurements
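The two homologous series on this slide are easy to enumerate programmatically. The sketch below generates SMILES strings for them, assuming SMILES is a convenient input for looking up or predicting melting points (the slide does not say which identifier the scripts use). The function names are illustrative.

```python
def alcohol_smiles(n_carbons):
    """SMILES for the straight chain primary alcohol with n carbons, e.g. 2 -> 'CCO'."""
    if n_carbons < 1:
        raise ValueError("need at least one carbon")
    return "C" * n_carbons + "O"

def acid_smiles(n_carbons):
    """SMILES for the straight chain carboxylic acid with n carbons, e.g. 2 -> 'CC(=O)O'."""
    if n_carbons < 1:
        raise ValueError("need at least one carbon")
    # The carboxyl carbon counts toward n, so the alkyl chain has n - 1 carbons
    return "C" * (n_carbons - 1) + "C(=O)O"

# Both series from 1 to 10 carbons, as on the slide
alcohols = [alcohol_smiles(n) for n in range(1, 11)]
acids = [acid_smiles(n) for n in range(1, 11)]

for n, (alcohol, acid) in enumerate(zip(alcohols, acids), start=1):
    print(n, alcohol, acid)
```

Each string could then be passed to whatever lookup or prediction step is available, so that measured and modeled values are compared across the whole series.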

Page 42: ChemInfo 2011 class1

Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for validation – only single source available)

Page 43: ChemInfo 2011 class1

Google Apps Scripts for planning reactions and creating schemes

Page 44: ChemInfo 2011 class1

Open Melting Points in Supplementary Data Pages of Wikipedia (Martin Walker)

Page 45: ChemInfo 2011 class1

Web services from data collected in this class will be added here

Page 46: ChemInfo 2011 class1

In this class you will learn

How to search Science1.0 resources

• Peer-reviewed journals
• Commercial databases
• Patents
• Conference proceedings

Page 47: ChemInfo 2011 class1

In this class you will learn

How to participate in Science2.0

• wikis (Wikipedia, class wiki)
• blogs
• interactive databases (ChemSpider)
• social software (Twitter, FriendFeed)

Page 48: ChemInfo 2011 class1

In this class you will learn

How to leverage Science3.0

(via collaboration with Andrew Lang)

• machine-readable web services

Page 49: ChemInfo 2011 class1

Now let's take a look at the class wiki