237th ACS Meeting - acscinf.orgacscinf.org/docs/meetings/237nm/presentations/237nm55.pdfGoals...

23
Learning Scoring Functions for Chemical Expert Systems C.-A. Azencott, M. A. Kayala, and P. Baldi Institute for Genomics and Bioinformatics Donald Bren School of Information and Computer Sciences 237th ACS Meeting March 24, 2009 Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 1 / 20

Transcript of 237th ACS Meeting - acscinf.orgacscinf.org/docs/meetings/237nm/presentations/237nm55.pdfGoals...

Learning Scoring Functions for Chemical Expert Systems

C.-A. Azencott, M. A. Kayala, and P. Baldi

Institute for Genomics and BioinformaticsDonald Bren School of Information and Computer Sciences

237th ACS Meeting

March 24, 2009

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 1 / 20

Goals

Chemical space exploration

Reproduce Augmentproblem-solving ability oforganic chemists

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 2 / 20

Reaction Simulator

Input:

Starting materialsConditions (pH, temperature)

Output:

ProductsEnergy diagram

→ Reaction favorability scoringfunction?

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 3 / 20

Energy of Activation ∆G ‡

Measure of reaction favorability

Directly related to reaction rates

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 4 / 20

Energy of Activation ∆G ‡

Measure of reaction favorability

Directly related to reaction rates

How can we compute ∆G ‡?

Direct computation not possibleLaboratory experiment or QM/MM simulation

Infer with Machine Learning → Data

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 4 / 20

∆G ‡ numerical data

Experimental data

Quantum Mechanical /Molecular Modeling

Problems:

Time Consuming

Expensive

Very little available

→ What can we do?

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 5 / 20

How do chemists solve problems?

Without:

performing experiments

producing numerical ∆G ‡

chemists can (almost) always:

tell which of several possiblereactions is more favorable

produce relative rates

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 6 / 20

How do chemists solve problems?

Without:

performing experiments

producing numerical ∆G ‡

chemists can (almost) always:

tell which of several possiblereactions is more favorable

produce relative rates

→ How do we get this qualitative knowledge?

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 6 / 20

Reaction Explorer

Jonathan H. Chen: http://www.reactionexplorer.org

Rule-based expert system

Basic undergraduate organicchemistry

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 7 / 20

Reaction Explorer

Jonathan H. Chen: http://www.reactionexplorer.org

Rule-based expert system

Basic undergraduate organicchemistry

→ Create pairs of

elementary steps

ordered by favorability

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 7 / 20

Approach 1: Priority Ordering

Rules priority → order elementary movements

Most favorable electron movement

����������������������

����������������

�����������������

����������������

Less favorable electron movement

�������������������������

����������������

�����������������

����������������

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 8 / 20

Approach 2: Implicitly Unfavorable Reactions

From what simple elementary movement should happen

����������������������

����������������

�����������������

����������������

Deduce what elementary movements should not happen

������������������������

����������������

�����������������

����������������

��������������

����������������

��������

�����������������

���������������

��������

����������������

����������������

��������

�����������������

���������������

��������

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 9 / 20

Data Sets

CHNO + Halides, Ionic Reactions.

Data Set Number of Pairs

Priority ordered 3, 457

Implicitly unfavored475, 684(from 16, 806 most favorable reactions)

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 10 / 20

Feature Representation, f : Elementary Step → Rn

Electron movement between pair of source / sink orbitalsPrincipled choice of 150 features

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 11 / 20

Neural Network Architecture I

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 12 / 20

Neural Network Architecture II

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 13 / 20

Classification Performance

Cross-Validated accuracy:

Data Set Architecture Accuracy

Priority orderedPerceptron (no hidden nodes) 92.8%15 hidden nodes 93.7%

Implicitly unfavoredPerceptron (no hidden nodes) 98.8%15 hidden nodes 99.0%

AltogetherPerceptron (no hidden nodes) 98.6%15 hidden nodes 99.0%

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 14 / 20

Performance in the Simulator I

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 15 / 20

Performance in the Simulator II

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 16 / 20

Performance in the Simulator III

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 17 / 20

Performance in the Simulator IV

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 18 / 20

Conclusion

Lack of quantitative data

Compensate with qualitative knowledge

Applied to the prediction of ∆G ‡

Ionic ReactionsC, H, N, O + halidesQualitative trends / ranking order of reactivities necessary to solverelevant chemistry problems

Future Work

Expand coverageValidation on external reaction databases (e.g. SPRESI)

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 19 / 20

Thanks

Matthew A. Kayala

Jonathan H. ChenReaction Simulation Expert System for Synthetic Organic

Chemistry, talk Thursday, March 26th at 10:30 am (CINF GeneralPapers)

Prof. Pierre Baldi

The Baldi Lab

OpenEye Academic Software License

ChemAxon Acadedmic Software License

NIH/NLM Biomedical Informatics Training Program

NSF Grants 0321390 and 0513376

The Dreyfus Foundation Special Grant Program in Chemical Sciences

Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 20 / 20