237th ACS Meeting - acscinf.orgacscinf.org/docs/meetings/237nm/presentations/237nm55.pdfGoals...
Transcript of 237th ACS Meeting - acscinf.orgacscinf.org/docs/meetings/237nm/presentations/237nm55.pdfGoals...
Learning Scoring Functions for Chemical Expert Systems
C.-A. Azencott, M. A. Kayala, and P. Baldi
Institute for Genomics and BioinformaticsDonald Bren School of Information and Computer Sciences
237th ACS Meeting
March 24, 2009
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 1 / 20
Goals
Chemical space exploration
Reproduce Augmentproblem-solving ability oforganic chemists
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 2 / 20
Reaction Simulator
Input:
Starting materialsConditions (pH, temperature)
Output:
ProductsEnergy diagram
→ Reaction favorability scoringfunction?
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 3 / 20
Energy of Activation ∆G ‡
Measure of reaction favorability
Directly related to reaction rates
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 4 / 20
Energy of Activation ∆G ‡
Measure of reaction favorability
Directly related to reaction rates
How can we compute ∆G ‡?
Direct computation not possibleLaboratory experiment or QM/MM simulation
Infer with Machine Learning → Data
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 4 / 20
∆G ‡ numerical data
Experimental data
Quantum Mechanical /Molecular Modeling
Problems:
Time Consuming
Expensive
Very little available
→ What can we do?
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 5 / 20
How do chemists solve problems?
Without:
performing experiments
producing numerical ∆G ‡
chemists can (almost) always:
tell which of several possiblereactions is more favorable
produce relative rates
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 6 / 20
How do chemists solve problems?
Without:
performing experiments
producing numerical ∆G ‡
chemists can (almost) always:
tell which of several possiblereactions is more favorable
produce relative rates
→ How do we get this qualitative knowledge?
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 6 / 20
Reaction Explorer
Jonathan H. Chen: http://www.reactionexplorer.org
Rule-based expert system
Basic undergraduate organicchemistry
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 7 / 20
Reaction Explorer
Jonathan H. Chen: http://www.reactionexplorer.org
Rule-based expert system
Basic undergraduate organicchemistry
→ Create pairs of
elementary steps
ordered by favorability
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 7 / 20
Approach 1: Priority Ordering
Rules priority → order elementary movements
Most favorable electron movement
����������������������
����������������
�����������������
����������������
Less favorable electron movement
�������������������������
����������������
�����������������
����������������
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 8 / 20
Approach 2: Implicitly Unfavorable Reactions
From what simple elementary movement should happen
����������������������
����������������
�����������������
����������������
Deduce what elementary movements should not happen
������������������������
����������������
�����������������
����������������
��������������
����������������
��������
�����������������
���������������
��������
����������������
����������������
��������
�����������������
���������������
��������
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 9 / 20
Data Sets
CHNO + Halides, Ionic Reactions.
Data Set Number of Pairs
Priority ordered 3, 457
Implicitly unfavored475, 684(from 16, 806 most favorable reactions)
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 10 / 20
Feature Representation, f : Elementary Step → Rn
Electron movement between pair of source / sink orbitalsPrincipled choice of 150 features
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 11 / 20
Neural Network Architecture I
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 12 / 20
Neural Network Architecture II
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 13 / 20
Classification Performance
Cross-Validated accuracy:
Data Set Architecture Accuracy
Priority orderedPerceptron (no hidden nodes) 92.8%15 hidden nodes 93.7%
Implicitly unfavoredPerceptron (no hidden nodes) 98.8%15 hidden nodes 99.0%
AltogetherPerceptron (no hidden nodes) 98.6%15 hidden nodes 99.0%
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 14 / 20
Performance in the Simulator I
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 15 / 20
Performance in the Simulator II
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 16 / 20
Performance in the Simulator III
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 17 / 20
Performance in the Simulator IV
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 18 / 20
Conclusion
Lack of quantitative data
Compensate with qualitative knowledge
Applied to the prediction of ∆G ‡
Ionic ReactionsC, H, N, O + halidesQualitative trends / ranking order of reactivities necessary to solverelevant chemistry problems
Future Work
Expand coverageValidation on external reaction databases (e.g. SPRESI)
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 19 / 20
Thanks
Matthew A. Kayala
Jonathan H. ChenReaction Simulation Expert System for Synthetic Organic
Chemistry, talk Thursday, March 26th at 10:30 am (CINF GeneralPapers)
Prof. Pierre Baldi
The Baldi Lab
OpenEye Academic Software License
ChemAxon Acadedmic Software License
NIH/NLM Biomedical Informatics Training Program
NSF Grants 0321390 and 0513376
The Dreyfus Foundation Special Grant Program in Chemical Sciences
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 20 / 20