IUPAC pKa Compilations Converted to Substructure-Searchable Databases
Tony Slater§ and Joe Corkeryǂ
ǂ 9 Bisbee Ct, Suite D, Santa Fe, NM 87508
www.eyesopen.com
Abstract
OpenEye’s partner, pKaData Limited, has converted all four aqueous pKa compilations of organic acids and bases sponsored by the International Union of Pure and Applied Chemistry (IUPAC) from book form into fully curated, computer-readable data, searchable by substructure.
The 13697 molecules, with 30415 experimental pKa values, can be searched very flexibly because the information has been carefully assigned to defined data fields. Simple examples of such searches are provided below.
Searches
Figure 3: Search for para-substituted phenols in the pKa range 6.5 < pKa < 7.5
Products
References
1) Bandura, A. V., and S. N. Lvov, "The Ionization Constant of Water over a Wide Range of Temperatures and Densities," J. Phys. Chem. Ref. Data, Vol. 35, 2006, pp. 15–30.
Conclusions
In association with OpenEye, pKaData Limited has converted all four IUPAC compilations of aqueous pKa data from book form into computer-readable, substructure-searchable form. The databases are fully curated, and the complete database gives researchers access to 30415 experimental pKa determinations. The IUPAC critical data quality assessment provides confidence limits for the measurements, and ionisation assignments have been provided for logD calculations.
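The ionisation assignment matters for logD because the correction to logP depends on whether the ionisable group is an acid or a base. The following is a minimal sketch, assuming a monoprotic compound and that only the neutral species partitions into octanol (a common simplification). The function `log_d` and its `ion_class` argument are hypothetical names; the poster does not state which logD model the assignments are intended to feed.

```python
import math


def log_d(log_p: float, pka: float, ph: float, ion_class: str) -> float:
    """logD of a monoprotic compound, assuming only the neutral form partitions.

    ion_class is the (hypothetical) ionisation assignment: 'acid' or 'base'.
    For a base, pka is the pKa of its conjugate acid.
    """
    if ion_class == "acid":
        # Acid is ionised above its pKa
        return log_p - math.log10(1.0 + 10.0 ** (ph - pka))
    if ion_class == "base":
        # Base is ionised (protonated) below its pKa
        return log_p - math.log10(1.0 + 10.0 ** (pka - ph))
    raise ValueError("ion_class must be 'acid' or 'base'")


# Same logP and pKa, opposite assignments: the label changes logD sharply
print(round(log_d(2.0, 4.0, 7.4, "acid"), 2))  # -1.4
print(round(log_d(2.0, 4.0, 7.4, "base"), 2))  # 2.0
```

With pKa 4.0 at pH 7.4, an acid is almost fully ionised while a base is almost fully neutral, so a wrong assignment would shift logD by more than three log units.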
As a simple first search, suppose we are interested in the effect of a para halogen substituent on the pKa of aniline, as in Figure 2. Furthermore, we want only pKas measured at 25°C, and only the most reliable data as defined by the IUPAC data quality assessment. The results are shown in Table 1.
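The field-based part of Search 1 can be sketched as follows, assuming each record has already been matched to the para-haloaniline substructure. The record fields `substituent`, `pka`, `temperature_c` and an integer `quality` rank (higher meaning more reliable under the IUPAC assessment) are hypothetical and are not the database's actual column names. The sketch groups the 25°C records by halogen and reports the same columns as Table 1: the pKa range, the number of observations, and the value with the best quality rank.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PkaRecord:
    # Hypothetical field names, not the database's real columns
    substituent: str      # para substituent, e.g. "Cl"
    pka: float
    temperature_c: float
    quality: int          # assumed rank: higher = more reliable


def summarise_by_substituent(records, temperature_c=25.0, tol=0.05):
    """Return {substituent: (min_pka, max_pka, n_obs, most_reliable_pka)}
    for records measured within tol of temperature_c."""
    groups = defaultdict(list)
    for r in records:
        if abs(r.temperature_c - temperature_c) <= tol:
            groups[r.substituent].append(r)

    summary = {}
    for sub, rs in groups.items():
        values = [r.pka for r in rs]
        # max() returns the first record with the highest quality rank
        best = max(rs, key=lambda r: r.quality)
        summary[sub] = (min(values), max(values), len(values), best.pka)
    return summary


# Toy records (invented values, for illustration only)
toy = [
    PkaRecord("F", 4.60, 25.0, 2),
    PkaRecord("F", 4.55, 25.0, 3),
    PkaRecord("F", 4.70, 30.0, 3),  # excluded: not measured at 25 °C
]
print(summarise_by_substituent(toy))  # {'F': (4.55, 4.6, 2, 4.55)}
```

A small temperature tolerance is used because temperatures stored as floating-point numbers should not be compared for exact equality.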
Search 2
In this search, we have an enzyme active site that we think will accept an ionised phenol. Can we bring the phenolic pKa down into the range 6.5 < pKa < 7.5 with a para substituent, as in Figure 3? Will nitro suffice, and are there any “off the wall” substituents that will do the job? The results are shown in Table 2.
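The reason for targeting this pKa window can be made concrete with the Henderson–Hasselbalch relation: a phenol whose pKa equals the local pH is half ionised, and lowering the pKa by one unit raises the anion fraction to about 91%. The sketch below applies the strict 6.5 < pKa < 7.5 window and reports the anion fraction; the substituent labels, the pKa values and the active-site pH of 7.0 are all hypothetical.

```python
def in_window(pka, lower=6.5, upper=7.5):
    """Strict window lower < pKa < upper, as in the Search 2 query."""
    return lower < pka < upper


def fraction_ionised_acid(pka, ph):
    """Henderson-Hasselbalch: fraction of a monoprotic acid (such as a
    phenol) present as its anion at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Hypothetical substituent labels and pKa values, for illustration only
candidates = {"X1": 7.0, "X2": 8.3, "X3": 7.5}
hits = {x: p for x, p in candidates.items() if in_window(p)}

for x, p in hits.items():
    # pH 7.0 is an assumed active-site pH, not a value from the poster
    print(x, p, round(fraction_ionised_acid(p, ph=7.0), 2))  # X1 7.0 0.5
```

X3 is excluded because the query uses strict inequalities, so a pKa of exactly 7.5 falls outside the window.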
Data Source
Introduction
pKa databases created by conversion of the following IUPAC books:
Base 1 (3775 molecules, 8766 pKas): Dissociation Constants of Organic Bases in Aqueous Solution, by D. D. Perrin
Acid 1 (1063 molecules, 2893 pKas): Dissociation Constants of Organic Acids in Aqueous Solution, by G. Kortum, W. Vogel and K. Andrussow
Base 2 (4275 molecules, 7844 pKas): Dissociation Constants of Organic Bases in Aqueous Solution, Supplement 1972, by D. D. Perrin
Acid 2 (4584 molecules, 10912 pKas): Ionisation Constants of Organic Acids in Aqueous Solution, by E. P. Serjeant and Boyd Dempsey
Complete Database (13697 molecules, 30415 pKas)
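As a quick consistency check on the counts listed for the four book-derived databases, their molecule and pKa counts add up exactly to the Complete Database totals. The dictionary below simply restates those published counts; the variable names are illustrative.

```python
# Molecule and pKa counts for each IUPAC-derived database, as listed above
databases = {
    "Base 1": (3775, 8766),
    "Acid 1": (1063, 2893),
    "Base 2": (4275, 7844),
    "Acid 2": (4584, 10912),
}

total_molecules = sum(m for m, _ in databases.values())
total_pkas = sum(p for _, p in databases.values())
print(total_molecules, total_pkas)  # 13697 30415
```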
The four books of pKa compilations sponsored by IUPAC are:
1. Dissociation Constants of Organic Bases in Aqueous Solution, by D. D. Perrin
2. Dissociation Constants of Organic Bases in Aqueous Solution, Supplement 1972, by D. D. Perrin
3. Dissociation Constants of Organic Acids in Aqueous Solution, by G. Kortum, W. Vogel, and K. Andrussow
4. Ionisation Constants of Organic Acids in Aqueous Solution, by E. P. Serjeant and Boyd Dempsey
§ pKaData Limited, 116 Wood Road, RD9 Maungatapere, Whangarei 0179, New Zealand
www.pKaData.com
pKaData Limited has kindly been granted exclusive permission by IUPAC to convert its extensive compilations of experimental pKa values of organic acids and bases from book form into fully curated, computer-readable data, searchable by substructure.
The project was completed in mid-2011, providing researchers with access to 30415 experimental pKa values in aqueous solution.
Conversion steps annotated in Figure 1:
• Convert names to SMILES
• Supply full reference
• Translate quality assessment into confidence limits
• Supply details of method
• Assign data and text to relevant fields
• Assign temperature to the correct field and put other text (e.g. ~ or >) in a separate field
• Convert pKb into pKa
• Assign non-numeric text (e.g. <) to a separate field; also assign ion group
Conversion
Figure 1 illustrates how the data were extracted from the books and assigned to defined fields. The SMILES description of each molecule was derived from its name and/or molecular diagram. pKbs were converted into pKas (pKa = pKw − pKb) using the ionisation constant of water, pKw, from Bandura and Lvov¹.
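The pKb-to-pKa step reduces to one subtraction once pKw is known at the measurement temperature. The sketch below is hypothetical: the caller supplies pKw rather than evaluating the Bandura and Lvov formulation, and the pKb value and the rounded pKw of 14.0 at 25 °C are illustrative only.

```python
def pka_from_pkb(pkb: float, pkw: float) -> float:
    """pKa of a base's conjugate acid from its pKb: pKa = pKw - pKb.

    pkw must be the ionisation constant of water at the temperature of the
    pKb measurement; here it is supplied by the caller rather than computed
    from the Bandura-Lvov model.
    """
    return pkw - pkb


# Illustration only: pKw ~ 14.0 is a rounded 25 °C value; pKb 9.0 is hypothetical
print(pka_from_pkb(9.0, pkw=14.0))  # 5.0
```

Because pKw falls as temperature rises, a fixed value of 14 would misconvert pKbs measured away from 25 °C; this is why a temperature-dependent pKw is needed.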
Figure 1: Example of data extraction from book into database.
Figure 2: Search for pKa of aniline with para-substituted halogen measured at 25°C.
Features
• Molecule names and structures converted to SMILES
• IUPAC critical data quality assessment
• Data assigned to separate fields (e.g. ionic strength, concentration and temperature)
• Associated alphabetic data placed in a field separate from the numerical data (e.g. <5.3 is split into two data fields) for enhanced search capabilities
• Full reference and method description for each record
• Ionisation assignment for logD calculations
• Very flexible searching due to careful field assignment:
  • Substructure
  • Search for basic pKa with 6.5 < pKa < 7.5
  • Use only the highest-quality data
  • Search where 35°C ≤ temp ≤ 40°C
  • Any combination of the above and much more
• Database can be merged with existing in-house data, with the IUPAC-sourced data clearly identified
• Tautomers were enumerated using OpenEye’s QuacPac program, with the ability to display just a single representative tautomer
• A total of 75232 records and 79 columns
• Good range of organic chemistry, applicable to pharmaceutical, agrochemical and specialty chemicals research, as well as to pKa prediction research
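The benefit of keeping qualifiers such as "<" apart from the number, and of storing temperature in its own field, is that both become simple numeric filters. The following is a minimal sketch under stated assumptions: `split_qualifier` and `in_temperature_range` are hypothetical helpers, the accepted qualifier set (<, >, <=, >=, ~) is assumed, and the raw entries are toy strings rather than database records.

```python
import re
from typing import Optional, Tuple

# Assumed entry format: optional qualifier followed by a number, e.g. "<5.3"
_ENTRY = re.compile(r"^\s*(<=|>=|<|>|~)?\s*(-?\d+(?:\.\d+)?)\s*$")


def split_qualifier(raw: str) -> Tuple[Optional[str], float]:
    """Split a raw entry into (qualifier, value); qualifier is None if absent."""
    m = _ENTRY.match(raw)
    if m is None:
        raise ValueError(f"unrecognised pKa entry: {raw!r}")
    return m.group(1), float(m.group(2))


def in_temperature_range(temp_c: float, low: float = 35.0, high: float = 40.0) -> bool:
    """Inclusive range, matching the 35°C <= temp <= 40°C example."""
    return low <= temp_c <= high


# Toy (raw pKa, temperature) pairs; keep exact values measured at 35-40 °C
rows = [("<5.3", 25.0), ("6.9", 37.0), ("~7.2", 40.0)]
exact_hits = [
    (raw, t) for raw, t in rows
    if split_qualifier(raw)[0] is None and in_temperature_range(t)
]
print(exact_hits)  # [('6.9', 37.0)]
```

Listing "<=" before "<" in the regular expression ensures that a two-character qualifier is captured whole rather than leaving "=" unmatched.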
Hal   pKa range (# of obs)   Most reliable value
F     4.53–4.65 (5)          4.64
Cl    3.82–4.15 (9)          3.982
Br    3.8–3.95 (6)           3.888
I     3.75–3.84 (6)          3.812
Table 1: Results of the search for the pKa of para-halogen-substituted anilines measured at 25°C.
Search 1
Table 2: Results for search for para-substituted phenols in the pKa range 6.5 < pKa < 7.5
Substituent X pKa
7.152
7.42
7.3