QSPR Prediction of pK for Aliphatic Carboxylic Acids and Anilines in Different Solvents

QSAR Comb. Sci. 27, 2008, No. 10, 1204 – 1215. Full Papers. © 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


Jesús Jover, Ramón Bosque and Joaquim Sales*

Departament de Química Inorgànica, Universitat de Barcelona, Martí i Franquès, 1, 08028 Barcelona, Spain, E-mail: [email protected]

Keywords: Anilines, Aprotic solvents, Carboxylic acids, Neural networks, Protic solvents, pK, QSPR

Received: May 06, 2008; Revised: July 10, 2008; Accepted: July 23, 2008

DOI: 10.1002/qsar.200810049

Abstract

Computational Neural Network (CNN)-based QSPR methodology is applied to predict the pKa of aliphatic carboxylic acids and the pKb of anilines in protic and aprotic solvents. The proposed non-linear models contain seven descriptors: five are of the usual molecular type and correspond to the solutes, and the other two describe the solvent as a bulk. The descriptors of both models are similar, and comparable to those obtained in previous studies for phenols and benzoic acids. The statistical results of the correlations for the training, prediction, and validation sets of the two families are very good, with R² ≥ 0.98 and rms error ≤ 0.3 pK units. The information encoded in the descriptors allows an interpretation of the dissociation process studied based on the specific and non-specific solute – solvent interactions.

1 Introduction

The tendency of a compound to transfer a proton is a fundamental property for understanding many chemical and biochemical processes; consequently, a good knowledge of the corresponding dissociation constants in various solvents is essential for the study of the general reactivity of a chemical compound. Although most experimental pKa values have been determined in water, the increasing importance of chemistry in non-aqueous solvents has promoted the determination of acidity constants in them. The acidity of small molecules in the gas phase can currently be calculated with an accuracy similar to that of the experimental determinations. Conversely, the difficulty of describing solvation effects and solute – solvent interactions with theoretical methods makes the situation much more complicated in solution. However, using dielectric continuum methods, it has been possible to calculate the acidity of organic molecules in water and also in some organic solvents such as acetonitrile (AN) and dimethylsulfoxide (DMSO) [1 – 3].

The well-known Quantitative Structure – Property Relationship (QSPR) methodology has long been used in the pKa estimation of several families of organic compounds in aqueous solutions. More recently, we have applied this approach to predict the acidity of phenols and benzoic acids in water and also in several organic solvents [4, 5]. In the study of these multicomponent systems, we have developed non-linear models, by means of Computational Neural Networks (CNN), that contain the usual molecular descriptors of the solutes as well as others that characterize the solvents as a bulk. In order to extend the application of QSPR methods to other multicomponent systems, we report here the results obtained in the pK prediction of two other families of organic compounds, aliphatic carboxylic acids and anilines, in different protic and aprotic solvents.

Some papers on the acidity of these two families of organic compounds in aqueous solutions have appeared recently. Gasteiger and coworkers [6] have proposed models for the pKa prediction of a wide set composed of 1122 aliphatic carboxylic acids; the models were developed by Multiple Linear Regression (MLR) analysis and contain empirical atomic descriptors, the squared correlation coefficient is 0.813, and the standard error of prediction is 0.423 pKa units. Multivariate models [7], using Quantum Topological Molecular Similarity (QTMS) tools, have been proposed for sets of carboxylic acids and anilines. QSPR models to estimate the pKa of non-aromatic carboxylic acids [8] (R² = 0.84), anilines (R² = 0.77, s = 0.958), and aliphatic carboxylic acids (R² = 0.83, s = 0.564) have also been reported [9]. On the other hand, estimations of acidities in non-aqueous solvents are very scarce; a comparative study of Hammett – Taft and Drago [10] models for the pKa prediction of a set of 38 carboxylic acids and another of 28 anilines in methanol has been published. R² values of 0.957 and 0.975 and standard deviations of 0.26 and 0.30 pKa units were obtained for the carboxylic acid and aniline sets, respectively. More recently, an ab initio protocol [11] that can predict the pKa value of protonated amines or phosphines in AN has appeared; the precision of the method is 1.1 pKa units. Other methods, such as the fragmental approach, have been applied to the pKa estimation of phenols, carboxylic acids, and nitrogen-containing compounds in water [12].

2 Data and Computational Methods

2.1 Datasets

The aliphatic carboxylic acid set contains 158 pKa values corresponding to 43 carboxylic acids in eleven solvents: five protic (water, methanol, ethanol, isopropanol, and tert-butanol) and six aprotic (DMSO, AN, nitromethane (NM), N,N-dimethylformamide (DMF), N,N-dimethylacetamide (DMA), and acetone (Ac)). The experimental values in the protic solvents were taken from Bosch [13 – 17], Kolthoff [18, 19], Headley [20], and others [21 – 25], and those in the aprotic solvents from the Izutsu compilation [26] and Bartnicka [27]. To keep the dataset reasonably homogeneous, only monoprotic aliphatic carboxylic acids with two to six carbon atoms were considered. They contain different substituents such as halogens, OH, and CN groups, and in some cases phenyl and vinyl groups. The number of acids varies from solvent to solvent: water (43), methanol (37), ethanol (11), isopropanol (6), tert-butanol (6), DMSO (12), AN (12), NM (9), DMF (7), Ac (5), and DMA (5). Many more acids have pKa values determined in water, but they were left out so as not to unbalance the number of experimental values in the two types of solvents; if the number of values in one or two solvents is much larger than in the others, the derived models will probably predict the pKa in the less represented solvents incorrectly. In summary, two thirds of the values correspond to protic solvents and the remaining third to aprotic ones. The pKa values range from 0.70 for trichloroacetic acid in water to 22.73 for butanoic acid in AN, with a mean value of 9.22. Table 1 contains all the experimental and predicted pKa values.

pKb values, rather than pKa values, were used for the aniline set because the CODESSA program does not work properly when calculating structural descriptors of ionic species. The pKb values were calculated from the pKa of the corresponding anilinium cations, determined by several authors [16, 17, 26, 28 – 34], and the autoionization constant, Ksolv, of each solvent (pKa + pKb = pKsolv). The pKsolv values used for these calculations are: water (14.0), methanol (17.2), NM (24.0), AN (33.3), Ac (32.5), DMSO (33.3), and DMF (29.4). The full set of anilines is composed of 108 pKb values corresponding to 41 compounds in seven solvents. As with the carboxylic acids, more pKb values of anilines have been determined in water, but they were not added in order to keep the numbers of values in protic and aprotic solvents similar. The number of values in each solvent is: water (28), methanol (26), NM (24), AN (11), Ac (8), DMSO (6), and DMF (5); that is, nearly half of the experimental values are in protic solvents and the other half in aprotic ones. Besides aniline itself, the set contains 26 mono-, 3 di-, and 3 tri-substituted compounds, as well as eight different N-alkyl-substituted anilines. Of all these values, 25 correspond to anilines with at least one substituent in the ortho positions of the ring and 83 to compounds without substituents in these positions. The substituents are the most common ones, such as halogens, OH, CN, and NO2. The range of pKb values is wide, from 7.43 for N,N-diethylaniline in water to 30.8 for 2,6-dichloro-4-nitroaniline in AN. Table 2 contains all the experimental and predicted pKb values.
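The conversion pKa + pKb = pKsolv used for the anilines can be illustrated with a minimal Python sketch. The pKsolv values are those quoted in the text; the function names and the dictionary layout are illustrative assumptions, not part of the original work.

```python
# Autoionization constants (pKsolv) quoted in the text for each solvent.
PK_SOLV = {
    "water": 14.0,
    "methanol": 17.2,
    "NM": 24.0,
    "AN": 33.3,
    "Ac": 32.5,
    "DMSO": 33.3,
    "DMF": 29.4,
}


def pkb_from_pka(pka_anilinium: float, solvent: str) -> float:
    """pKb of an aniline from the pKa of its anilinium cation (hypothetical helper)."""
    return PK_SOLV[solvent] - pka_anilinium


def pka_from_pkb(pkb: float, solvent: str) -> float:
    """Inverse conversion: anilinium pKa recovered from a tabulated pKb."""
    return PK_SOLV[solvent] - pkb


# Aniline in water is listed with pKb = 9.40 in Table 2; the corresponding
# anilinium pKa follows from pKsolv(water) = 14.0.
print(round(pka_from_pkb(9.40, "water"), 2))
```

Because pKsolv differs by almost 20 units between water and AN or DMSO, the same anilinium pKa maps onto very different pKb values in different solvents, which is one reason the pKb range of the set is so wide.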

The full dataset of experimental pK values was divided into three subsets: training, prediction, and validation sets. The selection guarantees that the three subsets contain values in all the solvents studied and with all the types of substituents. The training set, formed by 70% of the data, is used exclusively to derive the models; the prediction set, which contains 20% of the values and was not included in the model development, is used to probe the predictive ability of the model; and the validation set, with the remaining 10% of the experimental values, is used in the training of the neural networks (see below).
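A 70/20/10 division that keeps every solvent represented in each subset could be sketched as below. This is only an illustration under stated assumptions: the text does not say how the selection was made, so the random, per-solvent draw, the function name `split_by_solvent`, and the record layout (compound, solvent, pK) are all hypothetical, and the sketch does not attempt to balance substituent types.

```python
import random
from collections import defaultdict


def split_by_solvent(records, fractions=(0.7, 0.2, 0.1), seed=0):
    """Split (compound, solvent, pK) records into training, prediction and
    validation lists, drawing from each solvent group separately."""
    rng = random.Random(seed)
    by_solvent = defaultdict(list)
    for rec in records:
        by_solvent[rec[1]].append(rec)

    train, pred, val = [], [], []
    for solvent in sorted(by_solvent):
        group = by_solvent[solvent]
        rng.shuffle(group)
        n = len(group)
        # Groups with at least three values contribute at least one value
        # to the prediction and validation sets; smaller groups go to training.
        n_val = max(1, round(fractions[2] * n)) if n >= 3 else 0
        n_pred = max(1, round(fractions[1] * n)) if n >= 3 else 0
        val.extend(group[:n_val])
        pred.extend(group[n_val:n_val + n_pred])
        train.extend(group[n_val + n_pred:])
    return train, pred, val
```

Drawing per solvent matters here because some solvents contribute only five or six values; a purely global random split could easily leave such a solvent out of the prediction or validation set.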

2.2 Solute Descriptors

The structural descriptors of the carboxylic acids and the anilines were derived with the CODESSA program [35]. Beforehand, the molecular geometries were fully optimized, without symmetry restrictions, using the semi-empirical AM1 method [36] implemented in the MOPAC program [37]. For each molecule, frequency calculations were performed to ensure that all the calculated geometries correspond to global minima. The MOPAC output files were passed to CODESSA to calculate several hundred molecular descriptors, which can be classified into six general classes: constitutional, topological, geometrical, electrostatic, quantum-chemical, and Charged Partial Surface Area (CPSA).

2.3 Solvent Descriptors

The solvents were characterized as a bulk by several physical properties and by the parameters of some solvent polarity scales. The following 11 physical properties were considered: molecular weight (M), density at 25 °C (d), molar volume (V), refractive index (n), dielectric constant or relative permittivity (εT), dipole moment (μ), polarizability (ALFA), refractivity (c), standard molar vaporization enthalpy at 25 °C (ΔvapH⁰), standard internal energy of vaporization (ΔvapU⁰), and Hildebrand's solubility parameter (δ). The solvent polarity scales used are the most popular ones, including α, β, and π* of Kamlet and Taft [38 – 40]; ET(30) of Dimroth and Reichardt [41]; the donor and acceptor numbers of Gutmann [42, 43]; the polarity and acidity/basicity parameters of Koppel and Palm [44]; and others [45 – 55]. In total, 25 descriptors of this kind were found for each solvent.

Table 1. Experimental and calculated pKa values for the carboxylic acids set.

No.  Compound  Solvent  Experimental  Calculated
1  2-(3-Chlorophenyl)acrylic acid a)  Water  4.29  4.06
2  2-(3-Chlorophenyl)acrylic acid a)  MeOH  9.06  9.14
3  2-(3-Nitrophenyl)acrylic acid a)  Water  4.12  3.90
4  2-(3-Nitrophenyl)acrylic acid a)  MeOH  9.89  9.58
5  2-(4-Nitrophenyl)acrylic acid a)  Water  4.05  3.77
6  2-(4-Nitrophenyl)acrylic acid a)  MeOH  8.83  9.22
7  2,2-Dichloroacetic acid a)  iPrOH  7.80  7.68
8  2,2-Dichloroacetic acid a)  AN  15.81  15.57
9  2,2-Dichloroacetic acid a)  DMA  6.10  6.16
10  2,2-Dichloroacetic acid a)  DMF  7.90  7.81
11  2,2-Dichloroacetic acid a)  DMSO  6.36  6.16
12  2,2-Dichloroacetic acid a)  EtOH  7.30  7.06
13  2,2-Dichloroacetic acid a)  Water  1.34  1.67
14  2,2-Dichloroacetic acid a)  MeOH  6.38  6.47
15  2,2-Dichloroacetic acid a)  NM  8.88  9.21
16  2,2-Dichloroacetic acid a)  tBuOH  10.20  10.44
17  2,3-Dibromopropanoic acid a)  iPrOH  8.88  8.93
18  2,3-Dibromopropanoic acid a)  AN  17.10  17.22
19  2,3-Dibromopropanoic acid a)  DMSO  7.13  7.51
20  2,3-Dibromopropanoic acid a)  MeOH  7.38  7.57
21  2,3-Dibromopropanoic acid a)  tBuOH  11.70  11.70
22  2,3-Dibromopropanoic acid c)  Water  2.36  2.70
23  2,3-Dichloropropanoic acid a)  MeOH  7.50  7.54
24  2,3-Dichloropropanoic acid b)  Water  2.85  2.69
25  2-Bromopropanoic acid a)  Water  3.00  3.01
26  2-Bromopropanoic acid b)  MeOH  8.22  8.02
27  2-Chlorobutanoic acid a)  MeOH  8.11  8.03
28  2-Chlorobutanoic acid c)  Water  2.92  3.03
29  2-Chloropropanoic acid a)  Water  2.90  3.08
30  2-Chloropropanoic acid c)  MeOH  8.06  8.10
31  2-Phenylpropanoic acid a)  MeOH  9.31  9.43
32  2-Phenylpropanoic acid b)  Water  4.44  4.26
33  2-Fluoroacrylic acid a)  Water  2.57  2.29
34  2-Propenoic acid a)  Water  4.25  4.40
35  2-Propenoic acid c)  MeOH  9.27  9.57
36  3-(2-Furyl)acrylic acid a)  Water  4.29  4.46
37  3-Bromobutanoic acid a)  MeOH  9.12  9.16
38  3-Bromobutanoic acid b)  Water  4.01  3.92
39  3-Bromopropanoic acid a)  Water  4.04  3.88
40  3-Bromopropanoic acid a)  MeOH  9.00  8.96
41  3-Chlorobutanoic acid a)  Water  4.17  3.94
42  3-Chlorobutanoic acid c)  MeOH  9.18  9.16
43  3-Chloropropanoic acid a)  MeOH  9.18  9.00
44  3-Chloropropanoic acid c)  Water  4.09  3.95
45  3-Phenylpropanoic acid a)  Water  4.45  4.46
46  3-Phenylpropanoic acid b)  NM  14.37  14.60
47  3-Hydroxypropanoic acid a)  Water  4.51  4.41
48  3-Hydroxypropanoic acid b)  MeOH  9.42  9.53
49  3-Iodopropanoic acid a)  Water  4.05  3.68
50  3-Iodopropanoic acid a)  MeOH  8.89  8.83
51  Acetic acid a)  iPrOH  11.30  11.49
52  Acetic acid a)  Ac  18.33  18.23
53  Acetic acid a)  DMF  13.25  13.45
54  Acetic acid a)  DMSO  12.30  12.51
55  Acetic acid a)  EtOH  10.30  10.64
56  Acetic acid a)  Water  4.75  4.74
57  Acetic acid a)  MeOH  9.70  9.80
58  Acetic acid a)  NM  14.41  14.76
59  Acetic acid b)  DMA  12.60  12.51
60  Acetic acid b)  tBuOH  14.20  13.95
61  Acetic acid c)  AN  22.30  22.16
62  Bromoacetic acid a)  Water  2.90  2.98
63  Bromoacetic acid c)  MeOH  8.06  7.96
64  Bromoacetic acid c)  NM  11.70  11.54
65  Butanoic acid a)  AN  22.73  22.53
66  Butanoic acid a)  EtOH  10.70  10.83
67  Butanoic acid a)  MeOH  9.68  9.89
68  Butanoic acid b)  Water  4.82  4.73
69  Butanoic acid c)  iPrOH  12.20  11.86
70  Butanoic acid c)  DMSO  12.86  12.55
71  Cyanoacetic acid a)  AN  18.04  17.95
72  Cyanoacetic acid a)  Water  2.46  2.69
73  Cyanoacetic acid a)  MeOH  7.50  7.57
74  Cyanoacetic acid a)  NM  10.68  10.64
75  Cyanoacetic acid b)  EtOH  8.00  8.32
76  Cyanoacetic acid c)  DMSO  8.50  8.11
77  Cyanoacetic acid c)  tBuOH  12.60  12.41
78  Chloroacetic acid a)  iPrOH  9.20  9.20
79  Chloroacetic acid a)  AN  18.80  19.17
80  Chloroacetic acid a)  DMSO  8.90  8.70
81  Chloroacetic acid a)  EtOH  8.20  8.54
82  Chloroacetic acid a)  Water  2.85  2.95
83  Chloroacetic acid a)  MeOH  7.88  7.91
84  Chloroacetic acid a)  NM  11.62  11.54
85  Chloroacetic acid a)  tBuOH  12.20  11.83
86  Chloroacetic acid c)  DMA  8.75  8.55
87  Chloroacetic acid c)  DMF  10.13  9.75
88  Phenylacetic acid a)  Ac  17.17  17.22
89  Phenylacetic acid a)  DMF  13.50  13.22
90  Phenylacetic acid a)  EtOH  10.20  10.34
91  Phenylacetic acid a)  Water  4.31  4.20
92  Phenylacetic acid a)  MeOH  9.43  9.41
93  Phenylacetic acid b)  AN  20.73  21.10
94  Phenylacetic acid b)  DMSO  11.60  11.67
95  Phenylacetic acid c)  iPrOH  10.80  11.17
96  Phenylacetic acid c)  NM  13.80  14.13
97  Phenoxyacetic acid a)  iPrOH  9.50  9.85
98  Phenoxyacetic acid a)  Ac  15.28  15.42
99  Phenoxyacetic acid a)  AN  20.12  20.22
100  Phenoxyacetic acid a)  DMSO  9.71  9.52
101  Phenoxyacetic acid a)  Water  3.17  3.24
102  Phenoxyacetic acid a)  NM  12.60  12.35
103  Phenoxyacetic acid b)  DMF  11.17  11.21
104  Fluoroacetic acid a)  Water  2.82  2.81
105  Fluoroacetic acid a)  MeOH  7.99  7.69
106  Hexanoic acid a)  MeOH  9.90  9.52
107  Hexanoic acid c)  Water  4.83  4.60
108  Hydroxyacetic acid a)  AN  19.30  19.22
109  Hydroxyacetic acid a)  DMA  9.85  9.86
110  Hydroxyacetic acid a)  DMSO  10.20  9.86
111  Hydroxyacetic acid a)  EtOH  9.27  9.17
112  Hydroxyacetic acid a)  MeOH  8.68  8.55
113  Hydroxyacetic acid c)  Water  3.85  3.52
114  Iodoacetic acid a)  Water  3.13  3.15
115  Iodoacetic acid a)  MeOH  8.38  8.24
116  Isobutanoic acid a)  DMF  14.05  14.11
117  Isobutanoic acid a)  DMSO  12.79  12.68
118  Isobutanoic acid a)  Water  4.88  4.80
119  Isobutanoic acid a)  MeOH  9.90  9.94
120  Isobutanoic acid b)  Ac  18.90  18.69
121  Isobutanoic acid c)  AN  22.20  22.32
122  Isohexanoic acid a)  Water  4.80  5.01

The heuristic procedures incorporated into the CODESSA program were used to make the first reduction in the pool of descriptors, using the compounds of the training set. This process eliminates all incomplete and invariant descriptors, as well as those with a correlation coefficient below 0.01. Descriptors highly correlated (above 0.95) with another descriptor that has a higher regression coefficient were also deleted. The initial pool of about 700 descriptors, formed by the molecular descriptors of the solutes and the specific descriptors of the solvents, was reduced to 165 and 180 for the carboxylic acid and aniline sets, respectively.
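The three screening steps described above can be approximated with a short NumPy sketch. It is not the CODESSA implementation: the function name `prefilter_descriptors` is hypothetical, the "correlation coefficient" is assumed to be the absolute Pearson correlation of each descriptor with the property, and that same correlation is used as a stand-in for the "regression coefficient" that decides which member of an intercorrelated pair is kept.

```python
import numpy as np


def prefilter_descriptors(X, y, min_r=0.01, max_inter_r=0.95):
    """Return the column indices of X kept after a three-step screen."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Step 1: drop incomplete (non-finite) and invariant (zero-variance) columns.
    candidates = [j for j in range(X.shape[1])
                  if np.all(np.isfinite(X[:, j])) and np.std(X[:, j]) > 0.0]

    # Step 2: drop columns essentially uncorrelated with the property.
    r_y = {j: abs(np.corrcoef(X[:, j], y)[0, 1]) for j in candidates}
    candidates = [j for j in candidates if r_y[j] >= min_r]

    # Step 3: visit descriptors from best to worst correlation with the
    # property and keep one only if it is not highly intercorrelated with
    # a descriptor that has already been kept.
    kept = []
    for j in sorted(candidates, key=lambda k: r_y[k], reverse=True):
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= max_inter_r
               for k in kept):
            kept.append(j)
    return sorted(kept)
```

Calling the same function with `max_inter_r=0.9` gives a rough analogue of a stricter intercorrelation cut-off, although any real program may order and break ties differently.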

2.4 CNN Methods (ADAPT)

The computations were performed with the ADAPT (Automated Data Analysis and Pattern recognition Toolkit) program [56, 57], including feature selection routines (genetic algorithm [58] and simulated annealing [59]) and CNN procedures [60]. The CNNs used are three-layer, fully connected, feed-forward networks of the kind employed in our previous papers on the pKa estimation of phenols [4] and benzoic acids [5]; they have been described in detail by Jurs and coworkers [60, 61]. The number of neurons in the input layer corresponds to the number of descriptors in the model. The number of hidden neurons controls the flexibility of the network and has to be adjusted until the optimal network architecture is achieved. The number of neurons in these first two layers is optimized by a building-up procedure: starting with a low number of neurons, the number is increased one unit at a time until the results achieved with the new architecture are no better than those obtained with the previous one. The output layer contains one neuron representing the pK value.

The descriptors retained after the heuristic procedures of CODESSA were imported into ADAPT, where they were subjected to the objective feature selection routines of this program for the compounds in the training set. In this case, only the identical test and the intercorrelation of descriptors are taken into account, both with a cut-off value of 0.9.

Table 1. (cont.)

No.  Compound  Solvent  Experimental  Calculated
123  Isohexanoic acid c)  MeOH  9.88  10.13
124  Isopentanoic acid a)  iPrOH  12.30  12.08
125  Isopentanoic acid a)  EtOH  10.80  10.96
126  Isopentanoic acid a)  Water  4.74  4.90
127  Isopentanoic acid c)  MeOH  9.81  10.03
128  Lactic acid a)  Water  3.73  3.55
129  Lactic acid a)  MeOH  8.81  8.62
130  Mandelic acid a)  DMSO  8.79  9.17
131  Mandelic acid a)  Water  3.37  3.21
132  Mandelic acid c)  MeOH  8.40  8.27
133  Mercaptoacetic acid a)  MeOH  8.52  8.60
134  Mercaptoacetic acid c)  Water  3.73  3.51
135  Methacrylic acid a)  Water  4.16  4.02
136  Pentanoic acid a)  Water  4.25  4.42
137  Pentanoic acid a)  Water  4.70  4.79
138  Pentanoic acid a)  MeOH  9.86  9.96
139  Pentanoic acid b)  iPrOH  12.30  12.17
140  Pentanoic acid c)  EtOH  10.70  10.92
141  Propanoic acid a)  iPrOH  12.20  12.19
142  Propanoic acid a)  AN  22.04  22.16
143  Propanoic acid a)  DMA  12.05  12.24
144  Propanoic acid a)  DMF  13.95  13.63
145  Propanoic acid a)  DMSO  12.52  12.24
146  Propanoic acid a)  EtOH  10.60  10.67
147  Propanoic acid b)  MeOH  9.71  9.75
148  Propanoic acid c)  Ac  18.74  18.44
149  Propanoic acid c)  Water  4.88  4.59
150  Trichloroacetic acid a)  AN  10.57  10.58
151  Trichloroacetic acid a)  Water  0.70  0.96
152  Trichloroacetic acid a)  MeOH  4.90  4.85
153  Trichloroacetic acid a)  NM  7.27  7.18
154  Trichloroacetic acid a)  tBuOH  8.90  8.87
155  Vinylacetic acid a)  iPrOH  11.20  11.30
156  Vinylacetic acid a)  Water  4.34  4.31
157  Vinylacetic acid a)  MeOH  9.50  9.50
158  Vinylacetic acid c)  EtOH  10.20  10.40

a) Training set. b) Validation set. c) Prediction set.

Table 2. Experimental and calculated pKb values for the anilines set.

No.  Compound  Solvent  Experimental  Calculated
1  2,6-Dichloro-4-nitroaniline a)  AN  30.80  30.80
2  2,6-Dimethylaniline a)  Water  10.11  9.44
3  2,6-Dimethylaniline a)  MeOH  11.70  11.50
4  2-Aminoaniline a)  DMF  24.21  24.19
5  2-Aminoaniline a)  DMSO  28.50  28.66
6  2-Bromoaniline b)  Water  11.47  11.57
7  2-Bromoaniline c)  MeOH  13.74  14.12
8  2-Chloroaniline a)  Water  11.33  11.38
9  2-Chloroaniline a)  MeOH  13.49  13.69
10  2-Chloroaniline c)  NM  17.54  17.18
11  2-Hydroxyaniline a)  NM  14.15  14.39
12  2-Methylaniline a)  MeOH  11.25  11.43
13  2-Methylaniline a)  NM  14.95  14.92
14  2-Methylaniline c)  Water  9.57  9.40
15  2-Methoxyaniline a)  NM  14.68  14.32
16  2-Nitroaniline a)  Ac  28.73  28.71
17  2-Nitroaniline a)  Water  14.25  14.23
18  2-Nitroaniline a)  MeOH  17.00  17.34
19  2-Nitroaniline c)  AN  28.45  28.72
20  3,5-Dimethoxyaniline a)  Water  10.14  9.97
21  3,5-Dimethoxyaniline a)  MeOH  11.54  11.55
22  3-Aminoaniline a)  NM  13.87  13.82
23  3-Aminoaniline c)  DMF  24.13  24.43
24  3-Bromoaniline a)  Water  10.44  10.37
25  3-Bromoaniline a)  MeOH  12.78  12.55
26  3-Chloroaniline a)  Water  10.49  10.38
27  3-Chloroaniline a)  NM  16.26  16.01
28  3-Chloroaniline c)  MeOH  12.68  12.50
29  3-Ethoxyaniline a)  MeOH  11.83  11.55
30  3-Ethoxyaniline b)  Water  9.82  9.75
31  3-Fluoroaniline a)  MeOH  12.60  12.42
32  3-Fluoroaniline a)  NM  16.18  15.95
33  3-Fluoroaniline c)  Water  10.50  10.47
34  3-Hydroxyaniline a)  Water  9.75  9.76
35  3-Hydroxyaniline b)  MeOH  11.21  11.49
36  3-Methylaniline a)  MeOH  11.11  11.31
37  3-Methylaniline a)  NM  14.30  14.88
38  3-Methylaniline b)  DMSO  29.44  29.42
39  3-Methylaniline c)  Ac  26.35  26.20
40  3-Methylaniline c)  Water  9.31  9.33
41  3-Methoxyaniline a)  DMSO  30.01  29.87
42  3-Methoxyaniline a)  Water  9.80  9.67
43  3-Methoxyaniline c)  MeOH  11.16  11.39
44  3-Nitroaniline a)  Ac  27.53  27.64
45  3-Nitroaniline a)  AN  25.70  25.71
46  3-Nitroaniline a)  DMF  26.66  26.68
47  3-Nitroaniline a)  Water  11.49  11.66
48  3-Nitroaniline a)  MeOH  13.74  14.01
49  3-Nitroaniline c)  NM  17.38  17.74
50  4,5-Dichloro-2-nitroaniline a)  AN  30.20  30.20
51  4,5-Dimethyl-2-nitroaniline a)  AN  27.65  27.65
52  4-Aminoaniline a)  DMF  22.77  22.78
53  4-Aminoaniline b)  NM  12.15  12.50
54  4-Benzylaniline a)  Water  9.22  9.20
55  4-Benzylaniline a)  MeOH  11.22  11.02
56  4-Bromoaniline a)  Water  10.12  10.44
57  4-Bromoaniline a)  MeOH  12.36  12.52
58  4-Bromoaniline c)  NM  16.03  16.26
59  4-Bromo-N,N-dimethylaniline a)  NM  14.28  14.11
60  4-Chloro-2-nitroaniline a)  Water  15.07  15.07
61  4-Chloro-2-nitroaniline a)  MeOH  17.87  17.53

Table 2. (cont.)

No.  Compound  Solvent  Experimental  Calculated
62  4-Chloro-2-nitroaniline b)  AN  29.10  28.51
63  4-Chloro-2-nitro-N-methylaniline a)  AN  29.53  29.54
64  4-Chloroaniline a)  Water  10.02  10.03
65  4-Chloroaniline a)  NM  16.01  15.65
66  4-Chloroaniline b)  MeOH  12.25  11.96
67  4-Chloroaniline c)  DMSO  30.51  30.75
68  4-Cyanoaniline a)  NM  18.28  18.28
69  4-Ethoxyaniline a)  MeOH  10.28  10.11
70  4-Ethoxyaniline c)  Water  8.68  8.73
71  4-Hydroxyaniline a)  Water  8.35  8.75
72  4-Hydroxyaniline a)  MeOH  9.79  10.12
73  4-Hydroxyaniline a)  NM  13.09  13.48
74  4-Methylaniline a)  Ac  25.93  25.97
75  4-Methylaniline a)  AN  22.05  22.06
76  4-Methylaniline a)  DMSO  28.96  28.94
77  4-Methylaniline a)  Water  8.92  9.21
78  4-Methylaniline a)  MeOH  10.63  11.06
79  4-Methylaniline a)  NM  14.20  14.61
80  4-Methoxyaniline a)  Water  8.64  8.73
81  4-Methoxyaniline a)  NM  13.72  13.36
82  4-Methoxyaniline c)  MeOH  10.31  10.09
83  4-Nitroaniline a)  MeOH  15.65  15.66
84  4-Nitroaniline a)  NM  18.82  18.94
85  4-Nitroaniline b)  Water  13.00  12.92
86  4-Nitroaniline c)  Ac  28.48  28.29
87  Aniline a)  Ac  26.58  26.42
88  Aniline a)  AN  22.74  22.84
89  Aniline a)  DMF  25.04  25.04
90  Aniline a)  DMSO  29.70  29.63
91  Aniline a)  Water  9.40  9.38
92  Aniline a)  NM  14.93  15.01
93  Aniline c)  MeOH  11.15  11.44
94  N,N-Diethylaniline a)  Ac  23.33  23.36
95  N,N-Diethylaniline a)  Water  7.43  7.36
96  N,N-Diethylaniline a)  MeOH  10.28  10.09
97  N,N-Diethylaniline a)  NM  11.89  12.40
98  N,N-Dimethylaniline a)  AN  21.87  21.79
99  N,N-Dimethylaniline a)  Water  8.93  8.24
100  N,N-Dimethylaniline b)  Ac  25.26  24.89
101  N,N-Dimethylaniline c)  NM  12.96  12.59
102  N-Ethylaniline b)  NM  13.82  14.08
103  N-Isopropylaniline a)  MeOH  10.70  10.46
104  N-Isopropylaniline c)  Water  8.23  8.73
105  N-Methyl-3-nitroaniline a)  NM  17.41  17.22
106  N-Methylaniline a)  Water  9.15  8.69
107  N-Methylaniline a)  NM  14.20  13.75
108  N-Methylaniline c)  AN  22.55  22.25

a) Training set. b) Validation set. c) Prediction set.

The new reduced pools contain 99 (11 of them solvent descriptors) and 90 (10 of them solvent descriptors) descriptors for the carboxylic acids and the anilines, respectively. These descriptors were used as the starting point in the non-linear selection of the models.

Fully non-linear CNN models were developed using an automatic genetic algorithm descriptor selection routine, with a CNN evaluating the fitness of each subset of descriptors selected. The fitness of the descriptor subsets was calculated as COST = TSET + 0.4 |TSET – VSET|, where TSET and VSET denote the rms errors for the training and validation sets, respectively. Models chosen with this quality factor performed better than models chosen with the training set rms error alone as the quality factor. That is, CNNs that produce training and validation set errors that are low and similar in magnitude tend to perform well in predicting properties of interest for compounds not used in the training process. A quasi-Newton method, Broyden – Fletcher – Goldfarb – Shanno (BFGS) [61], is used to train the network.
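The quality factor is simple enough to write out directly. The sketch below, in Python, only restates the COST formula from the text; the helper names `rms_error` and `cost` and the example error values are illustrative assumptions.

```python
import math


def rms_error(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def cost(tset_rms, vset_rms, penalty=0.4):
    """COST = TSET + 0.4 * |TSET - VSET|, as quoted in the text."""
    return tset_rms + penalty * abs(tset_rms - vset_rms)


# Two hypothetical descriptor subsets with the same training error:
balanced = cost(0.20, 0.22)  # validation error close to the training error
overfit = cost(0.20, 0.60)   # validation error three times larger
```

With equal training errors, the subset whose validation error departs further from the training error receives the higher cost, so the genetic algorithm is steered away from descriptor sets that fit the training data but generalize poorly.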

The training, prediction, and validation sets for the carboxylic acid data contain 114, 28, and 16 pKa values, respectively. The best results were obtained with a 7-5-1 architecture. For the anilines, the training, prediction, and validation sets contain 78, 20, and 10 pKb values, respectively; the best architecture found was 7-4-1. The ten models derived by ADAPT were analyzed and evaluated with the external prediction set, and the best one was selected.
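To make the 7-5-1 and 7-4-1 notation concrete, the following NumPy sketch builds a three-layer, fully connected, feed-forward network and counts its adjustable parameters. It is not the ADAPT code: the sigmoid hidden units, the linear output neuron, the random initial weights, and all function names are assumptions made for illustration, and the BFGS training step is omitted.

```python
import numpy as np


def init_network(n_in=7, n_hidden=5, n_out=1, seed=0):
    """Random weights and zero biases for an n_in-n_hidden-n_out network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.1, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def forward(params, X):
    """Map descriptor rows X (n_samples x n_in) to one output per sample."""
    hidden = 1.0 / (1.0 + np.exp(-(X @ params["W1"] + params["b1"])))
    return hidden @ params["W2"] + params["b2"]


def n_adjustable(params):
    """Total number of weights and biases in the network."""
    return sum(p.size for p in params.values())


acids_net = init_network(n_hidden=5)     # 7-5-1
anilines_net = init_network(n_hidden=4)  # 7-4-1
print(n_adjustable(acids_net), n_adjustable(anilines_net))
```

Under these assumptions a 7-5-1 network has 7·5 + 5 + 5·1 + 1 = 46 adjustable parameters and a 7-4-1 network has 37, which can be compared with the 114 and 78 training values available for the two families.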

3 Results and Discussion

Table 3 shows the descriptors contained in the proposed model for the carboxylic acid set; five descriptors correspond to the solutes and the other two to the solvents. Of the five solute descriptors, two are of the quantum-chemical type, and the other three are topological, electrostatic, and CPSA descriptors, respectively. The topological descriptor, the average information content (order 0), is derived from the information theory of Shannon [62]; it is a measure of the atomic diversity of the compound and can be related to the molecular symmetry. The electrostatic descriptor is the maximum partial charge of a hydrogen atom; it corresponds to the hydrogen atom of the carboxylic group, and the charge is calculated by the Zefirov method [63], based on the electronegativity scale of Sanderson. The quantum-chemical descriptors are the total dipole moment of the molecule and the minimum electron – electron repulsion energy of an O–H bond; both are calculated by the semi-empirical AM1 method. The fifth solute descriptor belongs to the Charged Partial Surface Area (CPSA) [64] type. These descriptors combine shape and electronic data and encode information related to polar intermolecular interactions. The FNSA-3 descriptor, fractional partial negative surface area, is calculated as

$$\text{FNSA-3} = \frac{\sum_{A} q_A S_A}{\mathrm{TMSA}}$$

where $q_A$ are the negative partial atomic charges, $S_A$ the corresponding negatively charged solvent-accessible atomic surface areas, and TMSA is the total molecular surface area. The other two descriptors of the model belong to the solvents. The Kamlet–Taft hydrogen-bond donation ability α [38] is one of the most widely used scales to describe the hydrogen-bond capacity of solvents; it is based on solvatochromic parameters and measures the tendency of the solvent to donate hydrogen bonds to the solute molecules. The last descriptor is the standard internal energy of vaporization, ΔvapU°; it measures the energy necessary to bring the molecules of the liquid from their equilibrium distances to an infinite distance. The statistical results obtained with this model are very good: the coefficient of determination, R², is 0.998 for the training and prediction sets and 0.999 for the validation set; the rms error is 0.19, 0.25, and 0.19 for the training, prediction, and validation sets, respectively. Figure 1 shows the plot of the calculated versus experimental pKa values for all the carboxylic acids. Table 1 shows the experimental and calculated pKa values; the rms error for all 158 entries is 0.20, and no entry has a residual larger than twice this error. To ensure that this model is not due to chance effects, Y-randomization experiments were conducted. The average R² and rms error for the ten experiments performed are 0.17 and 4.3, respectively, showing that chance correlation did not play any significant role in this study. We repeated the neural network analysis five more times, varying in each case the splitting of the original dataset into training, validation, and prediction subsets. The statistical parameters obtained in each case are very similar; the average values for R² and rms error are 0.999 and 0.23, respectively, showing the quality of the proposed model. Table 4 shows the statistical parameters for the subsets of pKa values for each solvent; these results, with R² and rms errors very similar to each other and to those of the complete set, support the robustness of the model.
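The FNSA-3 definition given earlier in this section can be illustrated with a short sketch. The per-atom charges and surface areas below are hypothetical, and TMSA is assumed here to be the sum of the atomic solvent-accessible areas; CODESSA's actual surface construction is not reproduced.

```python
def fnsa3(charges, surface_areas):
    """Fractional charge-weighted partial negative surface area.

    Sums q_A * S_A over negatively charged atoms only and divides by the
    total molecular surface area (assumed to be the sum of atomic areas).
    """
    if len(charges) != len(surface_areas):
        raise ValueError("one surface area is required per atomic charge")
    tmsa = sum(surface_areas)
    weighted_negative_area = sum(
        q * s for q, s in zip(charges, surface_areas) if q < 0
    )
    return weighted_negative_area / tmsa


# Hypothetical four-atom example: partial charges (e) and atomic areas (A^2).
example_charges = [-0.4, 0.3, -0.1, 0.2]
example_areas = [10.0, 20.0, 30.0, 40.0]
example_value = fnsa3(example_charges, example_areas)
```

Because only negative charges contribute, the descriptor is zero or negative; the analogous FPSA-3 would sum over the positively charged atoms instead.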

The descriptors forming the model for the anilines set are shown in Table 5. As for the carboxylic acids set, there are five solute descriptors and two solvent descriptors. Among the solute descriptors there is one of the CPSA type and one electrostatic; the other three belong to the quantum type. The CPSA descriptor, FPSA-3, is very similar to that found in the carboxylic acids model, but it is computed from the positive partial charges. The electrostatic descriptor is the minimum partial charge of a nitrogen atom; the charge is calculated by the Zefirov method [63], and the nitrogen atom is always that of the amino group. The quantum descriptors, obtained directly from the AM1 method, are the energy of the HOMO, which in all cases has a contribution from the nitrogen atomic orbitals, the maximum index of nucleophilic reactivity of a nitrogen atom, and the maximum exchange energy of a C–N bond. Thus, these three quantum descriptors are also clearly related to the amino group, which participates directly in the proton acceptance process. The two solvent descriptors, the hydrogen-bond donation ability of Kamlet and Taft, α, and the standard internal energy of vaporization, ΔvapU°, are the same as those found in the analysis of the carboxylic acids set. For the anilines set, the statistical parameters of the correlation are also very good: R² is 0.999 for the training, prediction, and validation sets, and the rms errors are 0.24, 0.28, and 0.26, respectively, for these three sets. Figure 2 shows the plot of the calculated versus experimental pKb values for the three sets. The Y-randomization experiments allow possible chance effects in the derivation of the model to be ruled out, since the average R² and rms error obtained for ten scrambles are 0.24 and 6.5, respectively. We also repeated the neural network analysis five more times, varying the composition of the training, prediction, and validation sets; the average values for R² and rms error are 0.998 and 0.29, again confirming the quality of the model. Table 2 shows the experimental and calculated pKb values; the rms error for all 108 entries is 0.25, and there are five cases with residuals larger than twice this error, shown in Table 6. These five values correspond to five different anilines: two of them have substituents in the ortho position and another two are N-alkyl substituted; the solvents are NM, AN, and water; two residuals are negative and the other three are positive. These facts indicate that there is no structural reason associated with these larger prediction errors. As for the carboxylic acids set, the statistics for the subsets shown in Table 7 support the robustness of the proposed model.
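The Y-randomization test used for both families can be sketched as follows. This is only an illustration of the procedure: a one-descriptor least-squares line stands in for the neural network, and the data are synthetic, so none of the numbers correspond to the models reported here.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one descriptor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat against y."""
    my = statistics.fmean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def y_randomization(x, y, n_scrambles=10, seed=0):
    """Average R2 of models refitted on randomly permuted property values."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_scrambles):
        y_scrambled = list(y)
        rng.shuffle(y_scrambled)
        slope, intercept = fit_line(x, y_scrambled)
        fitted = [slope * a + intercept for a in x]
        scores.append(r_squared(y_scrambled, fitted))
    return statistics.fmean(scores)


# Synthetic data: a real linear relationship plus Gaussian noise.
data_rng = random.Random(1)
x = [float(i) for i in range(30)]
y = [2.0 * a + data_rng.gauss(0.0, 1.0) for a in x]

slope, intercept = fit_line(x, y)
true_r2 = r_squared(y, [slope * a + intercept for a in x])
scrambled_r2 = y_randomization(x, y)
```

A genuine structure-property relationship gives a high R² for the real data and a much lower average R² once the property values are scrambled, which is the pattern reported above.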


Table 3. Descriptors forming the model for the carboxylic acids set.

Descriptor                                                       Type
Average information content (order 0) (solute)                   Topological
Maximum partial charge of a hydrogen atom (solute)               Electrostatic
FNSA-3 (solute)                                                  CPSA
Dipole moment (solute)                                           Quantum
Minimum electron–electron repulsion of an H–O bond (solute)      Quantum
Hydrogen-bond donation ability (Kamlet and Taft), α (solvent)
ΔvapU° (solvent)                                                 –

Table 4. Statistics of pKa estimations for different subsets of carboxylic acids.

Subset          n     R²      rms error
Protic          108   0.996   0.19
Aprotic         50    0.998   0.23
Water           43    0.965   0.17
Methanol        37    0.976   0.17
Ethanol         11    0.982   0.19
Isopropanol     11    0.981   0.23
Tert-butanol    6     0.991   0.19
DMSO            12    0.985   0.28
DMF             7     0.992   0.24
AN              12    0.997   0.21
Ac              5     0.997   0.08
NM              9     0.992   0.24
DMA             5     0.997   0.17
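As a consistency check on Table 4, the rms error of a set that is the union of disjoint subsets is the size-weighted quadratic mean of the subset rms errors. The short sketch below applies this to the protic and aprotic rows; the function name is ours, and because the tabulated values are rounded to two decimals, the agreement can only be approximate.

```python
import math


def pooled_rms(subsets):
    """Combine (n, rms) pairs of disjoint subsets into one overall rms error."""
    total_n = sum(n for n, _ in subsets)
    total_squared_error = sum(n * rms ** 2 for n, rms in subsets)
    return math.sqrt(total_squared_error / total_n)


# Protic and aprotic rows of Table 4 as (n, rms error).
protic_aprotic = [(108, 0.19), (50, 0.23)]
overall_rms = pooled_rms(protic_aprotic)
total_entries = sum(n for n, _ in protic_aprotic)
```

The pooled value rounds to the rms error quoted in the text for the complete set of 158 entries.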

Figure 1. Plot of calculated versus experimental pKa values for the training, prediction, and validation sets of the carboxylic acids.

Figure 2. Plot of calculated versus experimental pKb values for the training, prediction, and validation sets of the anilines.


There is a simple method for measuring the relative importance of the descriptors in neural network-derived models [65]. The first input descriptor is randomly scrambled and the property is then predicted again by the model; a worse correlation between descriptor values and property values is obtained, and the rms error for the new prediction is larger than the rms error of the model, the so-called base rms error. The procedure is repeated for each descriptor in turn. The difference between the scrambled rms error and the base rms error gives the importance of the descriptor to the predictive capacity of the model; larger differences are associated with more important descriptors. For both sets of compounds studied, the two most significant descriptors are the solvent descriptors, the hydrogen-bond donation ability α and ΔvapU°; the third most significant descriptor is the maximum partial charge of a hydrogen atom for the carboxylic acids set and the CPSA descriptor, FPSA-3, for the anilines set.
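The scrambling procedure described above can be sketched in a few lines. This is a minimal illustration in the spirit of [65], not its implementation: the "model" is a hypothetical fixed linear function standing in for a trained network, and the two-descriptor data set is synthetic.

```python
import math
import random


def rms_error(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def scramble_importance(model, rows, observed, seed=0):
    """Return the base rms error and, per descriptor, scrambled rms minus base rms."""
    rng = random.Random(seed)
    base = rms_error(observed, [model(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # scramble descriptor j only
        scrambled_rows = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        scrambled = rms_error(observed, [model(r) for r in scrambled_rows])
        importances.append(scrambled - base)
    return base, importances


def toy_model(row):
    """Hypothetical stand-in for a trained network with two descriptors."""
    return 3.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(42)
rows = [(data_rng.random(), data_rng.random()) for _ in range(50)]
observed = [toy_model(r) + data_rng.gauss(0.0, 0.05) for r in rows]
base_rms, importances = scramble_importance(toy_model, rows, observed)
```

In this toy case the first descriptor carries most of the signal, so scrambling it degrades the rms error far more than scrambling the second one.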

The reaction analyzed here is very complicated and can be considered the result of several processes, such as the dissolution of the chemicals in the corresponding solvent and their subsequent dissociation to establish an equilibrium between the remaining non-dissociated solvated acid molecules and the solvated conjugate base and proton formed. Moreover, the solute/solvent interactions present in all these processes are also very complex, making it rather difficult to explain the role of the descriptors in the value of the pK. These solute/solvent interactions are usually divided [66] into two general types: the non-specific interactions, which include solvent polarization in the field of the solute molecule, isotropic dispersion forces, and solute cavity formation; and the specific interactions, which mainly involve the chemical bonds formed between the solute and the solvent molecules in the solution. In spite of this complexity, and taking into account that, in general, models developed with neural networks do not give a clear interpretation of the effect of the descriptors on the property studied, it is possible to find some physicochemical meaning in the descriptors of the proposed models. Thus, the internal energy of vaporization or cohesive energy, ΔvapU°, indicates the energy involved in the dissolution process and is clearly related to the non-specific interactions, while the solvent acidity descriptor α accounts for the specific interactions due to the hydrogen bonds formed between the solvent and the solute molecules. The solute descriptors, on the other hand, are more clearly involved in non-specific interactions; in the carboxylic acids set, FNSA-3 and the dipole moment account for polar solute/solvent interactions, while the maximum partial charge of a hydrogen atom and the minimum electron–electron repulsion energy of an O–H bond reflect the ease of dissociation of the O–H bond. In the anilines set, the energy of the HOMO and the maximum index of nucleophilic reactivity of a nitrogen atom are related to the specific Lewis acid–base interactions, while the other three solute descriptors, the minimum partial charge of a nitrogen atom, the CPSA descriptor FPSA-3, and the maximum exchange energy of a C–N bond, correspond to polar non-specific interactions. The statistical parameters obtained with our models, R² of at least 0.99 and rms errors of about 0.25 pK units, are better than those reported for aqueous solutions [6–9] mentioned earlier; only the application of the Hammett–Taft and Drago models [10] in methanol gives predictions comparable to the results reported here.

Table 5. Descriptors forming the model for the anilines set.

Descriptor                                                       Type
Minimum partial charge for a N atom (solute)                     Electrostatic
FPSA-3 (solute)                                                  CPSA
HOMO energy (solute)                                             Quantum
Maximum nucleophilic reaction index for a N atom (solute)        Quantum
Maximum exchange energy for a C–N bond (solute)                  Quantum
Hydrogen-bond donation ability (Kamlet and Taft), α (solvent)
ΔvapU° (solvent)                                                 –

Table 6. Predictions with residuals larger than twice the rms error, for the anilines.

Aniline                    Entry   Solvent   pKb exp.   pKb calc.   Residual
N,N-Diethylaniline         97      NM        11.89      12.40       −0.51
3-Methylaniline            37      NM        14.30      14.88       −0.58
4-Chloro-2-nitroaniline    62      AN        29.10      28.51       0.59
2,6-Dimethylaniline        2       Water     10.11      9.44        0.67
N,N-Dimethylaniline        99      Water     8.93       8.24        0.69

Table 7. Statistics of pKb estimations for different subsets of anilines.

Subset               n     R²      rms error
Ortho                25    0.999   0.27
Non-ortho            83    0.999   0.25
Protic               54    0.986   0.26
Aprotic              54    0.998   0.25
Non-N-substituted    91    0.999   0.23
N-substituted        17    0.998   0.33
Water                28    0.977   0.27
Methanol             26    0.984   0.26
DMSO                 6     0.961   0.16
DMF                  5     0.991   0.16
AN                   11    0.996   0.24
Ac                   8     0.992   0.17
NM                   24    0.971   0.32

4 Conclusions

The results described here on the pK estimation of carboxylic acids and anilines in different solvents, together with those reported previously [4, 5] for phenols and benzoic acids, demonstrate the high capacity of the QSPR methodology in the analysis of properties of multicomponent systems. For these four families of organic compounds, the proposed models contain seven descriptors, five for the solutes and two for the solvents. The two solvent descriptors, the hydrogen-bond donation ability α of the Kamlet–Taft scale and the standard internal energy of vaporization, ΔvapU°, are the same in all cases; these two descriptors are also the most significant in all the models. The five solute descriptors are similar in all the models and are mainly of the electrostatic, quantum, and CPSA types. The statistical results of the correlations are very good in all cases, with R² of at least 0.98 and rms errors of about 0.3 pK units; they are better than those reported for the pK estimation in water for these families of organic compounds. The models are also very robust. These models contain descriptors that allow the analysis of the pK values in the different solvents according to the general explanation of solute–solvent interactions, based on specific and non-specific effects. Thus, the hydrogen-bond donation ability α of the solvent, the HOMO energy (present in the anilines set), and the LUMO+1 energy (present in the phenols set) of the solutes account for specific solute–solvent interactions. Other descriptors encode information related to the non-specific interactions. Thus, the descriptors of the CPSA type (present in the four families) and others such as the dipole moment (present in the carboxylic acids set) and the polarizability of the solutes (present in the phenols set) reflect polar intermolecular interactions. It is also worth noting that the solute descriptors are localized mainly on the atoms of the functional groups that undergo proton dissociation (carboxylic –COOH) or association (amino –NH2). Thus, the maximum partial charge for a hydrogen atom (present in the phenols, benzoic acids, and carboxylic acids sets), the minimum resonance energy for an O–H bond (present in the benzoic acids set), and the minimum electron–electron repulsion energy of an O–H bond (present in the carboxylic acids set) are clearly related to the energy of the O–H bond broken in the dissociation process. Other descriptors, such as the maximum e–n attraction energy for a C–O bond (present in the phenols set), the maximum e–e repulsion for an oxygen atom (present in the benzoic acids set), and the minimum valency of a carbon atom (present in the benzoic acids set), which involve the atoms of the carboxylic group, can also be related to the dissociation process of the O–H bond. The situation is analogous for the anilines set, where the minimum partial charge of a nitrogen atom, the maximum index of nucleophilic reactivity of a nitrogen atom, and the maximum exchange energy of a C–N bond are localized on the amino group, which participates directly in the proton acceptance process. The standard internal energy of vaporization of the solvent, ΔvapU°, mainly reflects the non-specific effects consisting of the formation of cavities in the solvent to accommodate the molecules of the solute and the species formed in the dissociation process.

Acknowledgements

The authors thank Professor Peter C. Jurs for giving us access to the ADAPT program. We are grateful to the Catalan Government for financial support (grant 2005 SGR 00814), and J. J. thanks the Universitat de Barcelona for a BRD grant.

References

[1] Y. Fu, L. Liu, R.-Q. Li, Q.-X. Guo, J. Am. Chem. Soc. 2004, 126, 814–822.
[2] D. M. Chipman, J. Phys. Chem. A 2002, 106, 7413–7422.
[3] G. I. Almerindo, D. W. Tondo, J. R. Pliego, J. Phys. Chem. A 2004, 108, 166–171.
[4] J. Jover, R. Bosque, J. Sales, QSAR Comb. Sci. 2007, 26, 385–397.
[5] J. Jover, R. Bosque, J. Sales, QSAR Comb. Sci. 2008, 27, 563–581.
[6] J. Zhang, T. Kleinöder, J. Gasteiger, J. Chem. Inf. Model. 2006, 46, 2256–2266.
[7] U. A. Chaudry, P. L. A. Popelier, J. Org. Chem. 2004, 69, 233–241.
[8] M. J. Citra, Chemosphere 1999, 38, 191–206.
[9] B. G. Tehan, E. J. Lloyd, M. G. Wong, W. R. Pitt, E. Gancia, D. T. Manallack, QSAR Comb. Sci. 2002, 21, 473–485.
[10] E. Bosch, F. Rived, M. Roses, J. Sales, J. Chem. Soc., Perkin Trans. 2 1999, 1953–1958.
[11] J.-N. Li, Y. Fao, L. Liu, Q.-X. Guo, Tetrahedron 2006, 62, 11801–11813.
[12] A. A. Ivanova, I. I. Baskin, V. A. Palyulin, N. S. Zefirov, Dokl. Chem. 2007, 413, 90–94.
[13] E. Bosch, C. Rafols, M. Roses, Talanta 1989, 36, 1227–1231.
[14] E. Bosch, C. Rafols, M. Roses, Anal. Chim. Acta 1995, 302, 109–119.
[15] C. Rafols, M. Roses, E. Bosch, Anal. Chim. Acta 1997, 338, 127–134.
[16] F. Rived, M. Roses, E. Bosch, Anal. Chim. Acta 1998, 374, 309–324.
[17] F. Rived, I. Canals, E. Bosch, M. Roses, Anal. Chim. Acta 2001, 439, 315–333.
[18] M. K. Chantooni, I. M. Kolthoff, Anal. Chem. 1979, 51, 133–140.
[19] I. M. Kolthoff, M. K. Chantooni, J. Am. Chem. Soc. 1971, 93, 3843–1446.
[20] A. D. Headley, S. D. Starnes, L. Y. Wilson, G. R. Famini, J. Org. Chem. 1994, 59, 8040–8046.
[21] I. J. Minnick, M. Kilpatrick, J. Phys. Chem. 1939, 43, 259–274.
[22] A. L. Henne, C. J. Fox, J. Am. Chem. Soc. 1954, 76, 479–481.
[23] H. Sigel, R. Malini-Balakrishnan, U. K. Haering, J. Am. Chem. Soc. 1985, 107, 5137–5148.
[24] V. N. Viryulina, R. A. Chupakhina, V. V. Serebrennikov, J. Gen. Chem. USSR (Engl. Transl.) 1981, 51, 1244–1246.
[25] B. Chawla, S. K. Mehta, J. Phys. Chem. 1984, 88, 2650–2655.


[26] K. Izutsu, Acid–Base Dissociation Constants in Dipolar Aprotic Solvents, Blackwell Scientific Publications, Oxford 1990.
[27] H. Bartnicka, I. Bojanowska, M. K. Kalinowski, Aust. J. Chem. 1991, 44, 1077–1084.
[28] S. P. Porras, M.-L. Riekkola, E. Kenndler, J. Chromatograph. A 2001, 905, 259–268.
[29] N. G. Korzhenevskaya, M. M. Mestechkin, K. Y. Chotii, A. B. Abdusalamov, K. B. Ulanenko, V. I. Rybachenko, J. Gen. Chem. USSR (Engl. Transl.) 1992, 62, 939–941.
[30] I. Kaljurand, A. Kuett, L. Soovaeli, T. Rodima, V. Maeemets, I. Leito, I. A. Koppel, J. Org. Chem. 2005, 70, 1019–1028.
[31] M. R. Crampton, L. C. Rabbitt, J. Chem. Soc., Perkin Trans. 2 1999, 1669–1674.
[32] Y. F. Milyaev, S. A. Khorishko, L. N. Balyatinskaya, J. Gen. Chem. USSR (Engl. Transl.) 1981, 51, 168–171.
[33] T. V. Kashik, V. A. Usov, S. M. Ponomareva, L. V. Timokhina, M. G. Voronkov, J. Org. Chem. USSR (Engl. Transl.) 1985, 21, 549–553.
[34] Z. Pawlak, S. Kuna, M. Richert, E. Giersz, M. Wisniewska, J. Chem. Thermodynamics 1991, 23, 135–140.
[35] A. R. Katritzky, V. S. Lobanov, M. Karelson, CODESSA, Reference Manual V 2.13, Semichem and the University of Florida, Gainesville, FL 1997.
[36] M. J. S. Dewar, E. G. Zoebisch, E. F. Healy, J. J. P. Stewart, J. Am. Chem. Soc. 1985, 107, 3902–3909.
[37] J. J. P. Stewart, MOPAC 6.0 Quantum Chemistry Program Exchange, QCPE, no. 455, Indiana University, Bloomington, IN 1989.
[38] R. W. Taft, M. J. Kamlet, J. Am. Chem. Soc. 1976, 98, 2886–2894.
[39] M. J. Kamlet, R. W. Taft, J. Am. Chem. Soc. 1976, 98, 377–389.
[40] M. J. Kamlet, J.-L. M. Abboud, R. W. Taft, J. Am. Chem. Soc. 1977, 99, 6027–6038.
[41] K. Dimroth, C. Reichardt, T. Siepmann, F. Bohlmann, Liebigs Ann. Chem. 1963, 661, 1–37.
[42] V. Gutmann, E. Wychera, Inorg. Nucl. Chem. Lett. 1996, 2, 257–260.
[43] U. Mayer, V. Gutmann, W. Gerger, Monatsh. Chem. 1975, 1235–1257.
[44] I. A. Koppel, V. A. Palm, in: N. B. Chapman, J. Shorter (Eds.), The Influence of Solvent on Organic Reactivity in Advances in Linear Free Energy Relationships, Plenum Press, London 1972, Chapter 5.
[45] I. A. Koppel, A. I. Paju, Org. React. 1974, 1, 121–136.
[46] E. M. Kosower, J. Am. Chem. Soc. 1958, 80, 3253–3260.
[47] J. Catalan, C. A. Díaz, Liebigs Ann./Recl. 1997, 1941–1949.
[48] J. Catalan, C. A. Díaz, V. Lopez, P. Perez, J. L. G. Paz, J. G. Rodríguez, Liebigs Ann. 1996, 1785–1794.
[49] J. Catalan, V. Lopez, P. Perez, R. Martín-Villamil, J. G. Rodríguez, Liebigs Ann. 1995, 241–252.
[50] C. G. Swain, M. S. Swain, A. L. Powell, S. Alumni, J. Am. Chem. Soc. 1983, 105, 502–513.
[51] R. S. Drago, J. Chem. Soc., Perkin Trans. 2 1992, 1827–1838.
[52] R. S. Drago, Applications of Electrostatic-Covalent Models in Chemistry, Surfside, Gainesville 1994.
[53] S. Browstein, Can. J. Chem. 1960, 38, 1590–1596.
[54] A. Janowski, I. Turowska-Tyrk, P. K. Wrona, J. Chem. Soc., Perkin Trans. 2 1985, 821–825.
[55] F. W. Fowler, A. R. Katritzky, R. J. D. Rutherford, J. Chem. Soc. B 1971, 460–489.
[56] P. C. Jurs, J. T. Chow, M. Yuan, in: E. C. Olson, R. E. Christoffersen (Eds.), Computer-Assisted Drug Design, The American Chemical Society, Washington DC 1979, pp. 103–129.
[57] A. J. Stuper, W. E. Brugger, P. C. Jurs, Computer-Assisted Studies of Chemical Structure and Biological Function, Wiley, New York 1979.
[58] B. T. Luke, J. Chem. Inf. Comput. Sci. 1994, 34, 179–1287.
[59] J. M. Sutter, S. L. Dixon, P. C. Jurs, J. Chem. Inf. Comput. Sci. 1995, 35, 77–84.
[60] L. Xu, J. W. Ball, S. L. Dixon, P. C. Jurs, Environ. Toxicol. Chem. 1994, 13, 841–851.
[61] M. D. Wessel, P. C. Jurs, Anal. Chem. 1994, 66, 2480–2487.
[62] C. Shannon, W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, IL 1949.
[63] N. S. Zefirov, M. A. Kirpichenok, F. F. Izmailov, M. I. Trofimov, Dokl. Akad. Nauk. SSSR 1987, 296, 883–887.
[64] D. T. Stanton, P. C. Jurs, Anal. Chem. 1990, 62, 2323–2329.
[65] R. Guha, P. C. Jurs, J. Chem. Inf. Comput. Model. 2005, 45, 800–806.
[66] C. Reichardt, Solvents and Solvent Effects in Organic Chemistry, 3rd Edn., VCH, Weinheim 2003.
