Is clay-polycation adsorbent future of the greener society...


Page 1: Is clay-polycation adsorbent future of the greener society ...pubs.ccmsi.us/pubs/Chemosphere19-220-1108.pdf · Is clay-polycation adsorbent future of the greener society? In silico

Contents lists available at ScienceDirect

Chemosphere

journal homepage: www.elsevier.com/locate/chemosphere

Chemosphere 220 (2019) 1108-1117

Is clay-polycation adsorbent future of the greener society? In silico modeling approach with comprehensive virtual screening

Supratik Kar a, Shinjita Ghosh b, Jerzy Leszczynski a, *

a Interdisciplinary Nanotoxicity Center, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, Jackson, MS, USA
b School of Public Health, Jackson State University, Jackson, MS, USA

Highlights

• Clay-polyelectrolyte nanocomposites are one of the future adsorbents.
• Determination of the adsorption coefficient (kd) is decisive for an adsorbent's adsorption property.
• Developed a QSPR model of log kd for 30 organic pollutants.
• Predicted the log kd of ~0.9 million chemicals from five diverse databases.
• Responsible structural scaffolds for higher log kd are identified employing the MCS algorithm.

Article info

Article history:
Received 3 November 2018
Received in revised form 27 December 2018
Accepted 31 December 2018
Available online 2 January 2019

Handling Editor: Y. Yeomin Yoon

Keywords:
Adsorption
Clay-polymer nanocomposites (CPNs)
Organic pollutants
QSPR
Virtual screening

Abstract

The presence of organic pollutants in wastewater and the aquatic environment is a serious concern worldwide. Superior adsorption of organic pollutants on clays modified with organocations is well established. Among hybrid materials, clay-polyelectrolyte nanocomposites (CPNs) are specifically designed for the efficient adsorption of diverse organic pollutants. Owing to the high surface area of the clay mineral coupled with a polymer coating, they have an explicit affinity for organic pollutants. Against this background, we have developed a statistically significant and mechanistically interpretable quantitative structure-property relationship (QSPR) model for the adsorption coefficient of diverse organic pollutants to protonated montmorillonite-poly-4-vinylpyridine-co-styrene (Mt-HPVPcoS), a hybrid CPN. Further, the model was employed to predict the log kd values of ~0.9 million chemicals from five diverse databases, spanning existing and experimental pharmaceuticals, natural and synthetic chemicals, and dyes with unknown log kd values for the mentioned CPN. The reliability of the predicted data is checked with two layers of confidence screening, i.e., the applicability domain study followed by a prediction quality check with the ‘Prediction Reliability Indicator’. Thus, the prediction for each compound can be used for data gap filling by environmental regulatory authorities as well as industries. Subsequently, a maximum common substructure (MCS)-based algorithm is employed for each database to extract the important structural scaffolds for higher log kd to the mentioned CPN.

© 2018 Published by Elsevier Ltd.

* Corresponding author.
E-mail address: [email protected] (J. Leszczynski).

https://doi.org/10.1016/j.chemosphere.2018.12.215
0045-6535/© 2018 Published by Elsevier Ltd.

1. Introduction

Environmental pollution led to around 9 million premature deaths worldwide in 2015. Water pollution alone accounted for 1.8 million deaths according to the Lancet Commission report (Landrigan et al., 2018). Due to water pollution, less than 1% of the earth's water is accessible for drinking, and the crisis will be more pronounced by 2050, when the demand for clean water will be one-third higher than at present. Thus, there is a serious need for proper monitoring and a solution-driven plan to avoid major sources of water pollution, followed by economical and fast water cleaning options (Grandclement et al., 2017). According to a report, more than 700 listed environmental pollutants, their respective metabolites and transformed wastes exist in the European aquatic environment (NORMAN Network). This list includes pharmaceuticals, agrochemicals, heavy metals, dyes, and oil products. According to the European Union watch list Decision 2015/495/EU of 20 March 2015, the following materials are found in water systems: the natural hormone 17-β-estradiol (E2), the synthetic hormone 17-α-ethinylestradiol (EE2), the pain killer diclofenac, major macrolide antibiotics (clarithromycin, azithromycin, and erythromycin), pesticides (oxadiazon, imidacloprid, methiocarb, thiamethoxam, and triallate) as well as UV filters and dye additives (Barbosa et al., 2016).

To eliminate hazardous organic micropollutants from water bodies as well as different wastewaters, multiple techniques are available, such as traditional coagulation, chemical precipitation, ion exchange, reverse osmosis, electrodialysis and adsorption. Among these, adsorption of organic pollutants from water is one of the most economical (Ruiz-Hitzky et al., 2012). However, a major drawback is the regeneration cost, particularly if a thermal process is implemented. To overcome this disadvantage, clays and/or clay minerals, activated carbon and polymers have played an imperative role in the removal of these organic compounds. Clay minerals in particular have received great attention due to their cation exchange capability, large surface area, flexibility, low cost, easy production and ecofriendly nature (Unuabonah and Taubert, 2014). Further, hybrid clay-polyelectrolyte nanocomposites (CPNs) have attracted considerable attention in the field of organic pollutant removal from water systems. Commonly employed CPNs are the chitosan/montmorillonite (MMT) composite, nanocomposites of poly(4-vinylpyridine-co-styrene) (PVPcoS) or polydiallyl dimethylammonium chloride (PDADMAC) with MMT, the magnetite/bentonite clay composite and the PVPcoS/MMT clay composite (Gardi et al., 2015). Shabtai and Mishael (2017) reported that protonated PVPcoS-MMT (HPVPcoS-MMT) showed enhanced adsorption over the normal PVPcoS/MMT. The efficient removal of pyrene (Radian and Mishael, 2012) and atrazine (Zadaka et al., 2009) by the HPVPcoS-MMT system has also drawn attention to it as a future efficient adsorbent CPN, owing to its large functionalized surface, which enables hydrogen bond, van der Waals, and π-π bond formation for rapid adsorption with high efficiency (Radian and Mishael, 2012).

In this scenario, a great number of studies need to be performed to explore the adsorbent material experimentally as well as computationally. Recently, adsorption of organic pollutants by carbon nanotubes (CNTs) has been successfully modeled by quantitative structure-property relationship (QSPR) approaches (Roy et al., 2019; Wang et al., 2019). Despite the importance of CPN materials in combating water pollution, only a handful of studies have been performed to date (Radian and Mishael, 2012; Radian et al., 2015; Zadaka et al., 2009). Radian et al. (2015) experimentally determined the adsorption coefficients (log kd) of 30 organic pollutants to HPVPcoS-MMT, which probably represents the largest available dataset. Thus, more efficient approaches are needed, and a computational model can play a significant role in predicting log kd for a large number of chemicals. This would provide sufficient information about their removal from wastewater systems employing the studied CPN. Among the computational approaches, the QSPR model can encode the property response using physicochemical properties of organic pollutants, an approach well accepted by regulatory agencies worldwide (Roy et al., 2015a; Petrosyan et al., 2017). Following QSPR modeling, the developed model can be employed for virtual screening and prediction, as well as for identifying the major structural scaffolds responsible for a higher adsorption coefficient, quickly, at minimal cost and with confidence.

The present study used the experimental log kd values of thirty organic pollutants (Radian et al., 2015) to develop a statistically robust and mechanistically interpretable QSPR model employing simple structural descriptors. The developed model was further used to predict log kd values for ~0.9 million compounds from five diverse datasets, including approved as well as experimental pharmaceuticals, synthetic and natural pharmaceuticals, chemicals and reagents, and dyes. All five databases are important and common sources for virtual screening in drug lead finding, toxicity testing and dye selection by research groups. Removal of these compounds from the environment is crucial, as they are likely to be present in the environment now or later. Along with the mechanistic interpretation of the modeled compounds, the major structural scaffolds responsible for higher log kd are identified employing the maximum common substructure (MCS) approach for individual datasets. Predictions for such a large number of chemicals not only provide extensive data gap filling regarding ecotoxicity for regulatory agencies and industries but also offer a good repository for designing future chemicals with regard to their removal from the environment employing the studied CPN. The entire computational process is represented in Fig. 1.

2. Materials and methods

2.1. Dataset

2.1.1. Dataset for modeling

The adsorption coefficients (kd) of 30 organic pollutants to Mt-HPVPcoS were collected from Radian et al. (2015). Details of the experimental conditions and the calculation of kd can be found in that work (Radian et al., 2015). The logarithmic kd values (Table 1) were modeled through QSPR tools employing physicochemical parameters of the studied organic pollutants. Although the number of data points is small, this is the only available dataset on the adsorption coefficients of organic pollutants to Mt-HPVPcoS.

2.1.2. Datasets for screening and prediction

For screening and prediction, around 0.9 million chemicals (spanning existing pharmaceuticals, new synthetic and natural chemicals as well as pharmaceuticals, agrochemicals, industrial chemicals and dyes) covering five different databases are considered (Supplementary). The databases are as follows: Super Natural II (Banerjee et al., 2015), the Interbioscreen Natural and Synthetic database (InterBioScreen ltd. database), DrugBank (Wishart et al., 2018), the Paola Gramatica database (Sangion and Gramatica, 2016) and the Hair Dye database (Williams et al., 2018). The reasons for selecting these specific groups of compounds are as follows:

a) Pharmaceuticals are one of the major sources of ecotoxicity, and thus their risk assessment and management is very important.

b) The majority of the considered compounds are persistent in the environment due to long half-lives and hydrophobic characteristics.

c) They tend to remain in soil, aqueous compartments and sediment, and subsequently bioaccumulate in different organisms.

d) As the majority of pharmaceuticals are intended to act in the human system on specific targets, substrates or enzymes, the chance of toxicity is high and the details are unknown.

Fig. 1. The flow diagram of the employed computational steps in the present study.

2.2. Descriptor calculation

Chemical structures were drawn using GaussView 6 software (GaussView, Version 6, 2016) followed by optimization using density functional theory (Parr and Yang, 1989) with the B3LYP functional (Becke, 1993) and the 6-31G(d,p) basis set employing Gaussian 16 software (Gaussian 16, Revision B.01, 2016). The final output structures were then saved in .mol2 format and used in Dragon software version 6 (DRAGON Version 6.0, 2011) to compute the following descriptors: constitutional indices, topological indices, ring descriptors, connectivity indices, atom-centred fragments, functional group counts, atom-type E-state indices, molecular properties, and charge descriptors. To compute first and second generation extended topochemical atom (ETA) indices, we used PaDEL-Descriptor 2.21 (Yap, 2011). Thereafter, all computed descriptors were pretreated with a 0.0001 variance cutoff and a 0.99 correlation coefficient threshold to remove near-constant and intercorrelated descriptors. Finally, a total pool of 275 descriptors was obtained.
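The pretreatment step described above (a variance cutoff followed by a correlation filter) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `pretreat`, the greedy order in which correlated descriptors are dropped, and the use of population variance are choices made for this sketch, not details taken from the tools used in the study.

```python
import numpy as np

def pretreat(X, names, var_cut=1e-4, corr_cut=0.99):
    """Drop near-constant descriptors, then one of each highly correlated pair.

    X is an (n_compounds, n_descriptors) array; names labels the columns.
    The greedy keep-first strategy is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    # Step 1: keep only descriptors whose variance reaches the cutoff.
    keep = [j for j in range(X.shape[1]) if X[:, j].var() >= var_cut]
    # Step 2: keep a descriptor only if |r| with every already-kept one is below the cutoff.
    selected = []
    for j in keep:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < corr_cut for k in selected):
            selected.append(j)
    return X[:, selected], [names[j] for j in selected]

# Toy example: "b" duplicates "a" (r = 1) and "c" is constant.
X_toy = [[1, 2, 5, 1], [2, 4, 5, 0], [3, 6, 5, 1], [4, 8, 5, 0]]
X_reduced, kept = pretreat(X_toy, ["a", "b", "c", "d"])
print(kept)
```

In the toy matrix only the columns named "a" and "d" survive, since "b" is perfectly correlated with "a" and "c" has zero variance.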

2.3. Dataset splitting

The studied dataset shows variance in structure and chemical class. Thus, we grouped the compounds by clustering with Kohonen's Self-Organizing Map (SOM) approach in Matlab (MATLAB and Statistics Toolbox Release, 2012), as reported in Fig. 2. The left portion of Fig. 2 shows the neurons, which represent clusters of compounds, as blue hexagons connected by red lines. Greater distances between clusters are represented by darker colors (near black and red) and smaller distances by lighter colors (close to yellow). The right part of Fig. 2 shows the SOM grouping, with the cluster sizes stated in the hexagonal boxes. Overall, the compounds are quite diverse in nature. The dataset was divided into a training set (23 compounds) and a test set (7 compounds), an approximately 3:1 ratio, by sorting the compounds on the response and placing every fourth molecule in the test set using the ‘Dataset division GUI 1.2’ tool (DTC Lab Software Tools).
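The activity-sorted division can be illustrated with a short sketch. It assumes the rule is "sort ascending by experimental log kd and send every fourth compound to the test set"; the function name `activity_sorted_split` is hypothetical and this is not the code of the ‘Dataset division GUI 1.2’ tool. The log kd values are the experimental values listed in Table 1.

```python
def activity_sorted_split(responses, step=4):
    """Sort compounds by response and send every `step`-th one to the test set."""
    ordered = sorted(responses, key=responses.get)
    train, test = [], []
    for rank, compound in enumerate(ordered, start=1):
        (test if rank % step == 0 else train).append(compound)
    return train, test

# Experimental log kd values from Table 1.
LOG_KD = {
    "OP1": 4.191, "OP2": 3.941, "OP3": 4.523, "OP4": 2.14, "OP5": 2.217,
    "OP6": 1.89, "OP7": 2.803, "OP8": 4.763, "OP9": 5.884, "OP10": 2.743,
    "OP11": 3.445, "OP12": 4.144, "OP13": 3.331, "OP14": 1.958, "OP15": 3.296,
    "OP16": 3.199, "OP17": 1.317, "OP18": 2.216, "OP19": 2.584, "OP20": 2.372,
    "OP21": 1.955, "OP22": 4.544, "OP23": 5.405, "OP24": 4.644, "OP25": 2.946,
    "OP26": 4.186, "OP27": 5.168, "OP28": 2.525, "OP29": 3.1, "OP30": 3.055,
}

train_ids, test_ids = activity_sorted_split(LOG_KD)
print(len(train_ids), len(test_ids))
print(sorted(test_ids, key=lambda s: int(s[2:])))
```

With the Table 1 values this rule yields 23 training and 7 test compounds, and the test set it produces (OP2, OP3, OP7, OP14, OP16, OP20, OP27) coincides with the test set listed in Table 1.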


Table 1. Experimental and predicted adsorption coefficient values on a logarithmic scale for the modeled training and test set organic pollutants employing the developed QSPR equation.

| ID | Chemical | RBN | MAXDP | ΔεB | Experimental log kd | Calculated/Predicted log kd |
|---|---|---|---|---|---|---|
| Training set | | | | | | |
| OP1 | Acenaphthylene | 0 | 0.178 | 0.09 | 4.191 | 3.708 |
| OP4 | Benzoic acid | 1 | 3.201 | 0.09778 | 2.14 | 3.168 |
| OP5 | Bromacil | 2 | 4.707 | 0.02043 | 2.217 | 1.567 |
| OP6 | Carbamazepine | 0 | 4.753 | 0.09298 | 1.89 | 2.071 |
| OP8 | Diazinon | 7 | 2.185 | 0.03511 | 4.763 | 4.934 |
| OP9 | Diclofenac | 4 | 3.827 | 0.10385 | 5.884 | 4.345 |
| OP10 | Diuron | 1 | 4.247 | 0.07115 | 2.743 | 2.273 |
| OP11 | Eosin B | 2 | 5.431 | 0.13887 | 3.445 | 3.556 |
| OP12 | Gemfibrozil | 6 | 3.943 | 0.03098 | 4.144 | 3.774 |
| OP13 | Ibuprofen | 4 | 3.762 | 0.03512 | 3.331 | 3.059 |
| OP15 | Imazaquin | 3 | 5.265 | 0.0635 | 3.296 | 2.612 |
| OP17 | Metaldehyde | 0 | 1.808 | 0 | 1.317 | 1.389 |
| OP18 | Metazaclor | 3 | 5.096 | 0.06509 | 2.216 | 2.705 |
| OP19 | Metolachlor | 5 | 5.188 | 0.03182 | 2.584 | 2.898 |
| OP21 | Paracetamol | 1 | 3.524 | 0.07308 | 1.955 | 2.578 |
| OP22 | Phenanthrene | 0 | 0.178 | 0.08596 | 4.544 | 3.631 |
| OP23 | Picric acid | 3 | 3.372 | 0.16379 | 5.405 | 5.224 |
| OP24 | Prometryn | 5 | 1.304 | 0.03624 | 4.644 | 4.420 |
| OP25 | Pyrene | 0 | 0.212 | 0.09377 | 2.946 | 3.767 |
| OP26 | Pyrithiobac | 5 | 4.291 | 0.11266 | 4.186 | 4.771 |
| OP28 | Sulfentrazone | 3 | 5.018 | 0.0791 | 2.525 | 3.001 |
| OP29 | Terbuthylazine | 4 | 1.661 | 0.04544 | 3.1 | 4.033 |
| OP30 | Toluene | 0 | 0.083 | 0.05333 | 3.055 | 3.044 |
| Test set | | | | | | |
| OP2 | Ametryn | 5 | 1.264 | 0.04079 | 3.941 | 4.523 |
| OP3 | Atrazine | 4 | 1.616 | 0.05222 | 4.523 | 4.179 |
| OP7 | Clofibric acid | 3 | 3.741 | 0.06647 | 2.803 | 3.234 |
| OP14 | Imazapyr | 3 | 5.001 | 0.04897 | 1.958 | 2.434 |
| OP16 | Ketoprofen | 4 | 5.244 | 0.08 | 3.199 | 3.367 |
| OP20 | Naphthalene | 0 | 0.12 | 0.07937 | 2.372 | 3.526 |
| OP27 | Simazine | 4 | 1.568 | 0.06105 | 5.168 | 4.366 |

Fig. 2. Clustering of the studied dataset employing Kohonen's self-organizing map.


2.4. Model development and validation

The model was generated employing genetic function approximation (GFA) for selection of the best descriptors followed by partial least squares (PLS) regression, using the ‘Genetic Algorithm v4.1’ and ‘Partial Least Squares’ software, respectively, from the DTC Lab open access tools (DTC Lab Software Tools). To evaluate the model's robustness, quality and predictive capability, internal ($R^2$ and $Q^2_{LOO}$) and external ($Q^2_{F1}$ or $R^2_{pred}$, and $Q^2_{F2}$) validation metrics were computed.
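For readers unfamiliar with these metrics, the sketch below shows how they are commonly defined in the QSPR literature. It is not the code of the DTC Lab tools, and the leave-one-out loop uses ordinary least squares as a stand-in for PLS, which is an assumption made to keep the example self-contained; the function names are hypothetical.

```python
import numpy as np

def r2(y, y_fit):
    """Coefficient of determination for fitted training-set values."""
    y, y_fit = np.asarray(y, float), np.asarray(y_fit, float)
    return 1 - np.sum((y - y_fit) ** 2) / np.sum((y - y.mean()) ** 2)

def q2_loo(X, y):
    """Leave-one-out Q2; ordinary least squares replaces PLS in this sketch."""
    y = np.asarray(y, float)
    A = np.column_stack([np.ones(len(y)), np.asarray(X, float)])
    press = 0.0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i          # leave compound i out
        coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2     # squared error on the left-out compound
    return 1 - press / np.sum((y - y.mean()) ** 2)

def q2_f1(y_test, y_pred, y_train):
    """External Q2_F1 (R2_pred): deviations are taken from the training-set mean."""
    y_test, y_pred = np.asarray(y_test, float), np.asarray(y_pred, float)
    ref = np.mean(y_train)
    return 1 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - ref) ** 2)

def q2_f2(y_test, y_pred):
    """External Q2_F2: deviations are taken from the test-set mean."""
    y_test, y_pred = np.asarray(y_test, float), np.asarray(y_pred, float)
    return 1 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
```

The only difference between the two external metrics is the reference mean in the denominator, which is why they can differ slightly for the same test set.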

2.5. Applicability domain (AD) and randomization

The AD study is important for checking the prediction reliability of each modeled as well as screened compound. Thus, two approaches, a) the standardization technique (STD-AD) (Roy et al., 2015b) and b) the Euclidean distance approach (ED-AD) (DTC Lab Software Tools), were applied employing the open access tools ‘Applicability Domain 1.0’ and ‘Euclidean Applicability Domain 1.0’, respectively (DTC Lab Software Tools). To check whether the model was obtained by chance, we employed two randomization approaches: X-randomization and Y-randomization (DTC Lab Software Tools). In X-randomization, random models were generated by shuffling the entire descriptor matrix (X) while keeping the response values (Y) unchanged. In Y-randomization, by contrast, random models were developed by shuffling the response values (Y) while keeping the descriptor matrix (X) unchanged. In both cases, 100 random models were generated and the average $R^2$ and $Q^2$ of the 100 random models were computed.
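A minimal sketch of the two randomization tests is given below. It rests on several assumptions: ordinary least squares stands in for the PLS model, X-randomization is implemented by shuffling each descriptor column independently (the source only says the descriptor matrix is shuffled), and the function names and the fixed seed are choices made for this illustration rather than features of the DTC Lab tools.

```python
import numpy as np

def fit_r2(X, y):
    """R2 of an ordinary least-squares fit with intercept (stand-in for PLS)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def y_randomization(X, y, n_models=100, seed=0):
    """Refit after shuffling the response; the descriptor matrix is untouched."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, float), np.asarray(y, float)
    return np.array([fit_r2(X, rng.permutation(y)) for _ in range(n_models)])

def x_randomization(X, y, n_models=100, seed=0):
    """Refit after shuffling each descriptor column; the response is untouched."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, float), np.asarray(y, float)
    scores = []
    for _ in range(n_models):
        X_shuffled = np.column_stack([rng.permutation(col) for col in X.T])
        scores.append(fit_r2(X_shuffled, y))
    return np.array(scores)

# Toy data with a perfect linear relationship, so the true model has R2 = 1.
X_demo = np.arange(20, dtype=float).reshape(-1, 1)
y_demo = 2 * X_demo[:, 0] + 1
print(fit_r2(X_demo, y_demo), y_randomization(X_demo, y_demo).mean())
```

If the average metric of the randomized models stays far below that of the real model, the real model is unlikely to be a chance correlation, which is the logic applied in the study.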


2.6. Prediction reliability check

As we predict log kd values for a huge number of compounds with the QSPR model for data gap filling, the reliability of each prediction is an important issue. Thus, we used an open access tool, the ‘Prediction Reliability Indicator’ developed by Roy et al. (2018), which provides a composite prediction reliability score and categorizes each prediction as good (score 3), moderate (score 2) or bad (score 1). It is important to mention that although AD studies can indicate whether a prediction is reliable, they cannot grade the reliability or prediction quality with a score. With this tool, prediction quality can be checked and classified for individual molecules, which lends reliability to data gap filling.

3. Results and discussion

3.1. GA-PLS model

The final model was developed employing the GA-PLS tool and is given by equation (1):

$$\log k_d = 2.058 + 0.431 \times \mathrm{RBN} + 19.061 \times \Delta\varepsilon_B - 0.370 \times \mathrm{MAXDP} \tag{1}$$

$n_{Training} = 23$; latent variables $= 2$; $R^2 = 0.75$; $Q^2_{LOO} = 0.63$; $n_{Test} = 7$; $Q^2_{F1} = R^2_{pred} = 0.66$; $Q^2_{F2} = 0.65$

The obtained statistical parameters suggest that the model is robust and predictive, although the dataset to be modeled is extremely diverse. The developed model has only two latent variables. The training and test set compositions, the computed descriptor values and the predicted log kd are summarized in Table 1. Experimental and calculated/predicted log kd values are shown in a scatter plot in which all the molecules are scattered within ±0.5 of the fitted line, signifying a robust model (Fig. 3a). To see the importance of the modeled descriptors, they are standardized and plotted (Fig. 3b). The variable significance plot shows that RBN has the highest, positive contribution towards log kd. ΔεB and MAXDP have positive and negative contributions, respectively, towards log kd, and rank second and third in significance, respectively.
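As a worked illustration, the model equation (intercept 2.058 with coefficients 0.431 for RBN, 19.061 for ΔεB and −0.370 for MAXDP) can be evaluated directly from the descriptor values in Table 1. The function name `predict_log_kd` is hypothetical; the coefficients and descriptor values are those reported in the source.

```python
def predict_log_kd(rbn, delta_eps_b, maxdp):
    """Evaluate the reported QSPR equation for one compound."""
    return 2.058 + 0.431 * rbn + 19.061 * delta_eps_b - 0.370 * maxdp

# (RBN, delta-epsilon-B, MAXDP) taken from Table 1.
examples = {
    "Acenaphthylene": (0, 0.09, 0.178),
    "Metaldehyde": (0, 0.0, 1.808),
    "Atrazine": (4, 0.05222, 1.616),
}

for name, descriptors in examples.items():
    print(name, round(predict_log_kd(*descriptors), 3))
```

These three cases reproduce the Table 1 entries (3.708, 1.389 and 4.179). For some other compounds the result differs from the tabulated value in the third decimal place, presumably because the published coefficients are rounded.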

The PLS model was studied with two AD approaches, and both the ED-AD and STD-AD methods indicated that no compounds reside outside the AD zone. Thus, considering both approaches, the predictions for all test set compounds are reliable. The result of the ED-based AD study is plotted in Fig. 3c. We also performed X- and Y-randomization tests to verify whether the model was obtained by chance. For each technique, 100 random models were generated; the average $R^2$ and $Q^2$ of those 100 random models are 0.159 and −0.244 for X-randomization and 0.131 and −0.27 for Y-randomization, much lower than the acceptable limit of 0.5 for both parameters (Fig. 3d). The results suggest that the PLS model was not obtained by chance.

3.2. Mechanistic interpretation of the QSPR model

As discussed earlier, RBN, a constitutional index (Todeschini and Consonni, 2008) that gives the number of rotatable bonds present in a molecule, has the highest, positive contribution towards log kd. In other words, the conformational restriction or flexibility of a compound seems to affect log kd. A higher number of rotatable bonds, i.e. greater flexibility, increases the adsorption coefficient proportionally, whereas relatively rigid or composite compounds showed lower adsorption coefficients to the studied CPN.

ΔεB is an ETA index which measures the contribution of unsaturation present in a molecule (Roy and Das, 2012). The positive contribution of this feature is responsible for the higher adsorption coefficient of organic pollutants towards the CPN. ΔεB can be explained with the following equations:

$$\Delta\varepsilon_B = \varepsilon_1 - \varepsilon_4 \tag{2}$$

$$\varepsilon_1 = \frac{\sum \varepsilon}{N} \tag{3}$$

$$\varepsilon_4 = \frac{\left[\sum \varepsilon\right]_{SS}}{N_{SS}} \tag{4}$$

where $\varepsilon_1$ gives a measure of electronegative atom count. It represents the summed ε value of a compound relative to the total number of atoms including hydrogen. On the other hand, $\varepsilon_4$ is the summed ε value relative to the total number of atoms, counting hydrogen, for the saturated carbon skeleton of the standard molecule, i.e., with carbon-carbon multiple bonds treated as single bonds.

MAXDP is an electrotopological index which defines the maximum positive intrinsic state difference and is related to the electrophilicity of the molecule (Todeschini and Consonni, 2008). Owing to its negative contribution, log kd of organic pollutants decreases as electrophilicity increases. Thus, the presence of nucleophiles increases log kd for the studied organic pollutants. Compounds with carboxylic groups, amine derivatives, hydroxyl groups and pyrimidine derivatives showed higher adsorption coefficients towards the CPN.

Although PLS can eliminate intercorrelated descriptors in the final model, we still checked for intercorrelation between descriptors and for the correlation between log kd and the individual descriptors. There is no intercorrelation between any of the modeled descriptors according to the correlation plot (Fig. 3e). The correlation plot also confirms the importance of the RBN descriptor in encoding the response property, followed by the ΔεB and MAXDP descriptors. The heat map (Fig. 3f) uses color-coded information to illustrate how the log kd of each compound shifts with each modeled descriptor (the darkest blue, white and darkest yellow signify values of −1, 0 and 1, respectively).

3.3. Screening of datasets

Knowledge of and quantitative information about the adsorption affinity of diverse organic chemicals towards Mt-HPVPcoS is vital for environmental risk assessment and risk management (Radian et al., 2010). Thus, we used five different databases covering a wide range of chemical classes and structural diversity, which are extensively used in the pharmaceutical and chemical industries. Prediction for ~0.9 million (exactly 885880) compounds with the developed QSPR model will provide a broader perspective on data gap filling, along with quantitative risk assessment and risk management criteria for environmental regulatory authorities and industries. As each compound's prediction is checked through two layers of reliability criteria, the AD study and the ‘Prediction Reliability Indicator’, each response value can be considered with utmost confidence.

Fig. 3. a) Scatter plot, b) descriptor importance plot, c) Euclidean-distance based AD study plot, d) X- and Y-randomization plot for the QSPR model (Equation (1)), e) correlation plot, and f) heat map for modeled descriptors and response.

For prediction, the chemical structures of each database were thoroughly checked and processed in Dragon 6 and PaDEL-Descriptor 2.21 to compute the modeled descriptors. After prediction with the QSPR model, the AD was checked for each compound. The prediction reliability indicator tool was then applied only to those compounds which passed the AD study, to classify each prediction as good, moderate or bad. The predicted log kd ranges for all five databases after the AD study, and the prediction quality according to the ‘Prediction Reliability Indicator’, are reported in Table 2 as compound counts. The optimum weighting combination of 0.2-0-0.8 was selected from the best QSPR model and later employed for all five screening datasets to check the reliability of prediction. Details of the compounds studied for prediction, their computed descriptor values, predicted log kd and composite score with prediction quality can be found for each database in the Supplementary.
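The bookkeeping behind such a screening summary can be sketched as follows. The record layout, the function names and the handling of the range boundaries (values up to 3 in the first bin, values up to 6 in the second) are assumptions of this sketch; the AD check and the reliability score themselves come from the external tools named above and are simply taken as inputs here.

```python
from collections import Counter

QUALITY = {3: "Good", 2: "Moderate", 1: "Bad"}  # composite scores as described in the text

def log_kd_bin(value):
    """Assign a predicted log kd to one of the three ranges used in Table 2."""
    if value <= 3.0:
        return "0.001-3"
    if value <= 6.0:
        return "3.001-6"
    return "6.001 and up"

def summarize(records):
    """records: list of (compound_id, predicted_log_kd, inside_ad, score) tuples."""
    passed = [r for r in records if r[2]]  # only AD-passing compounds are binned
    return {
        "ad_failed": len(records) - len(passed),
        "ranges": Counter(log_kd_bin(r[1]) for r in passed),
        "quality": Counter(QUALITY[r[3]] for r in passed),
    }

# Hypothetical records, for illustration only.
demo = [("A", 2.5, True, 3), ("B", 4.2, True, 2), ("C", 6.5, True, 3), ("D", 3.3, False, 1)]
print(summarize(demo))
```

Filtering on the AD flag before counting mirrors the order of steps in the study, where range and quality counts are reported only for compounds inside the applicability domain.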

3.3.1. Super Natural II database

Super Natural II is a natural products database (Banerjee et al., 2015) which contains 325,508 natural compounds (NCs), including detailed data about their 2D structures, physicochemical properties and predicted toxicity class. Of the 325,508 natural products, 264891 compounds passed the AD study. Among these 264891 compounds, good, moderate and bad predictions were shown by 264890, 0 and 1 compound, respectively. Thus, the predicted log kd values of 264890 compounds can be considered with confidence (Supplementary). In the Supplementary file, the Super Natural II entries carry each compound's individual ID (SNID) and SMILES. To check a compound's structure and additional information, follow the link http://bioinf-applied.charite.de/supernatural_new/index.php?site=compound_input and provide the required SNID or SMILES. The highest log kd value of 7.374 is obtained for SN00132349. The top ten compounds of this database along with their predicted log kd are shown in Fig. 4.

Table 2. Final screening result in respect to number of compounds residing in different predicted log kd ranges for each database, with prediction quality employing the ‘Prediction Reliability Indicator’.

| Parameter | Range | Super Natural II | Interbioscreen Natural Dataset | Interbioscreen Synthetic Dataset | DrugBank | Gramatica Dataset | Dye dataset |
|---|---|---|---|---|---|---|---|
| Initial total compound | - | 325508 | 66804 | 482688 | 9300 | 1267 | 313 |
| AD failed | - | 60617 | 4571 | 14296 | 1867 | 120 | 6 |
| Predicted log kd (a) | 0.001-3 | 140921 | 40657 | 223948 | 2831 | 561 | 152 |
| | 3.001-6 | 123617 | 21574 | 244067 | 4575 | 583 | 154 |
| | 6.001 and up | 352 | 2 | 377 | 27 | 3 | 1 |
| Prediction quality (a) | Good | 264890 | 62233 | 468392 | 7433 | 1096 | 304 |
| | Moderate | 0 | 0 | 0 | 0 | 51 | 3 |
| | Bad | 1 | 0 | 0 | 0 | 0 | 0 |

(a) Predicted log kd range and prediction quality are implemented only for those compounds which passed the AD study.
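The counts reported in Table 2 can be cross-checked with a few lines of arithmetic. The sketch below only re-tallies the numbers given in the table; the dictionary layout and variable names are choices made for this illustration.

```python
# Counts copied from Table 2: ranges = (0.001-3, 3.001-6, 6.001 and up),
# quality = (good, moderate, bad).
TABLE2 = {
    "Super Natural II": dict(total=325508, ad_failed=60617,
                             ranges=(140921, 123617, 352), quality=(264890, 0, 1)),
    "Interbioscreen Natural": dict(total=66804, ad_failed=4571,
                                   ranges=(40657, 21574, 2), quality=(62233, 0, 0)),
    "Interbioscreen Synthetic": dict(total=482688, ad_failed=14296,
                                     ranges=(223948, 244067, 377), quality=(468392, 0, 0)),
    "DrugBank": dict(total=9300, ad_failed=1867,
                     ranges=(2831, 4575, 27), quality=(7433, 0, 0)),
    "Gramatica": dict(total=1267, ad_failed=120,
                      ranges=(561, 583, 3), quality=(1096, 51, 0)),
    "Dye": dict(total=313, ad_failed=6,
                ranges=(152, 154, 1), quality=(304, 3, 0)),
}

grand_total = sum(row["total"] for row in TABLE2.values())
print("all databases:", grand_total)

for name, row in TABLE2.items():
    passed = row["total"] - row["ad_failed"]
    # Compare AD-passing count with the sums over ranges and over quality classes.
    print(name, passed, sum(row["ranges"]), sum(row["quality"]))
```

The grand total is 885880, matching the figure quoted in the text, and for every database the quality classes sum to the number of compounds that passed the AD study. The range counts also match, except for Super Natural II, where the three ranges sum to 264890, one fewer than the 264891 compounds that passed the AD study; the source does not say where the remaining compound falls.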

Fig. 4. Top 10 chemicals in respect to adsorption coefficient to Mt-HPVPcoS for individual database.

3.3.2. Interbioscreen database

Interbioscreen is a high-quality chemical library for screening purposes with around 550000 compounds (66804 natural compounds (NCs) and 482688 synthetic compounds (SCs), precisely 549492 in total) available in stock (InterBioScreen ltd. database). For calculation purposes, we treated NCs and SCs separately. Out of 66804 NCs, 62233 compounds passed the AD, and 62233, 0 and 0 compounds showed good, moderate and bad predictions, respectively. Among the 482688 SCs, 468392 compounds passed the AD, and 468392, 0 and 0 compounds showed good, moderate and bad predictions, respectively. Therefore, the predicted log kd values of 62233 NCs and 468392 SCs from the Interbioscreen database can be considered with confidence (Supplementary). To check details about each compound, enter the compound's ID from the Supplementary file at http://mastersearch.chemexper.com/misc/hosted/ibscreen/. The highest log kd values of 6.218 and 7.359 are obtained for STOCK1N-25773 and STOCK1S-62119 for the NCs and SCs databases, respectively. The top ten compounds for NCs and SCs along with their predicted log kd are shown separately in Fig. 4.

Fig. 5. Common structural scaffolds among the molecules with highest adsorption coefficient for individual database.


3.3.3. DrugBank database

The DrugBank database combines detailed drug data with comprehensive drug target information and consists of approved small molecule drugs, biotech drugs, nutraceuticals and experimental drugs (Wishart et al., 2018). At the time of access, it contained 9300 molecules, which were used for prediction following AD screening. Among the 9300 molecules, 7433 passed the AD study, and all of them showed good prediction according to the prediction reliability indicator (Supplementary). Additional information on each DrugBank molecule can be found by entering the molecule's ID at the following link: https://www.drugbank.ca/drugs. The highest log (kd) value of 6.658 is obtained for DB00941. The top ten DrugBank compounds, along with their predicted log (kd) values, are reported in Fig. 4.
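As an illustration of what an AD screening step can look like, the following minimal sketch implements the standardization-based check described by Roy et al. (2015b), which appears in the reference list. This section does not state which AD method was applied, so treating it as the one used here is an assumption. The training descriptor values, the function name `inside_ad` and the use of the sample standard deviation are hypothetical choices made only for this example.

```python
from statistics import mean, stdev

def inside_ad(train_rows, query, threshold=3.0):
    """Standardization-based applicability domain check (sketch).

    train_rows: list of descriptor vectors for the training compounds.
    query: descriptor vector of the compound to be screened.
    Assumes every training descriptor has a non-zero standard deviation.
    """
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    # Absolute standardized value of each descriptor of the query compound
    s = [abs(x - m) / sd for x, m, sd in zip(query, means, sds)]
    if max(s) <= threshold:
        return True            # every descriptor lies close to the training data
    if min(s) > threshold:
        return False           # every descriptor lies far from the training data
    # Mixed case: decide on mean + 1.28 * standard deviation of the values
    return mean(s) + 1.28 * stdev(s) <= threshold

# Hypothetical training set: five compounds, three descriptors each
training = [
    [1.0, 10.0, 0.2],
    [2.0, 12.0, 0.4],
    [3.0, 11.0, 0.3],
    [4.0, 13.0, 0.5],
    [5.0, 14.0, 0.6],
]

near_query = [3.5, 12.5, 0.45]   # close to the training means
far_query = [20.0, 40.0, 3.0]    # far from the training data on every descriptor

print(inside_ad(training, near_query))
print(inside_ad(training, far_query))
```

Only compounds for which such a check succeeds would be carried forward to the prediction-quality step; the threshold of 3 standardized units is the conventional choice for this approach and can be tightened if a more conservative domain is wanted.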

3.3.4. Paola Gramatica database

A database of 1267 human and veterinary pharmaceuticals prepared by Sangion and Gramatica (2016) was used for log (kd) prediction (Supplementary). The majority of the pharmaceuticals in this study are proven environmental hazards owing to their persistence and bioaccumulation. Thus, log (kd) information for the studied CPN is significant for the risk assessment of these pharmaceuticals. Among the 1267 pharmaceuticals, 1147 passed the AD study; of these, 1096 showed good prediction and 51 fell under the moderate prediction level (the CAS number of each pharmaceutical is provided in the Supplementary file). Alverine (CAS: 150-59-4) showed the highest log (kd) value of 6.422 among them, and its prediction quality is good. The top ten pharmaceuticals of this database, along with their predicted log (kd) values, are reported in Fig. 4.

3.3.5. Hair dye database

Dyes are also one of the major sources of environmental pollutants; thus, 313 promising hair dyes (Williams et al., 2018) have been used for prediction (Supplementary). Out of 313 hair dyes, 307 passed the AD study (the IUPAC name of each compound is provided). Of these 307 dyes, 304 showed good prediction quality and 3 showed moderate prediction quality. C.I. Direct Red 81 showed the highest log (kd) value of 6.022 among them, and its prediction quality is moderate. The top ten hair dyes of this database, along with their predicted log (kd) values, are reported in Fig. 4.
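The screening counts reported in Sections 3.3.2 to 3.3.5 can be compared more easily as percentages. The short sketch below only re-expresses the numbers given in the text; the dictionary layout and the helper name `ad_pass_rate` are illustrative choices, and the Super Natural II counts are left out because they are reported outside this part of the paper.

```python
# (total screened, passed AD, good prediction, moderate prediction),
# taken from the counts reported in Sections 3.3.2-3.3.5
screening = {
    "Interbioscreen NCs": (66804, 62233, 62233, 0),
    "Interbioscreen SCs": (482688, 468392, 468392, 0),
    "DrugBank": (9300, 7433, 7433, 0),
    "Sangion and Gramatica pharmaceuticals": (1267, 1147, 1096, 51),
    "Hair dyes": (313, 307, 304, 3),
}

def ad_pass_rate(total, passed):
    """Percentage of screened compounds that fall inside the AD."""
    return 100.0 * passed / total

for name, (total, passed, good, moderate) in screening.items():
    # Compounds inside the AD are rated good, moderate or bad; none is bad here
    bad = passed - good - moderate
    print(f"{name}: {ad_pass_rate(total, passed):.1f}% inside AD; "
          f"good={good}, moderate={moderate}, bad={bad}")
```

Read this way, the DrugBank set has the lowest share of compounds inside the AD, while the hair dye and synthetic Interbioscreen sets have the highest.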

3.4. Maximum common substructure (MCS) identification

The MCS is a metric for similarity searching between two different chemicals that identifies the largest structural scaffold appearing in both chemicals (Cao et al., 2008). The MCS algorithm offers more flexible similarity measures than conventional similarity search approaches. Generally, the MCS approach is employed largely in drug design, but here we have employed it to evaluate the major structural scaffolds responsible for a higher adsorption coefficient to the Mt-HPVPcoS. Identifying common structural scaffolds will help to anticipate a chemical's adsorption behavior at the design stage, well before its synthesis. In addition, this information will help to plan the efficient removal of such chemicals from the environment using Mt-HPVPcoS. Thus, taking the top twenty chemicals based on predicted log (kd) value for each individual dataset, we have determined the important structural scaffolds using the web-based ChemMine tools (Backman et al., 2011), as reported in Fig. 5.
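To make the idea of a maximum common substructure concrete, the following toy sketch finds the largest connected set of atoms shared by two small molecular graphs. It is not the ChemMine implementation or the algorithm of Cao et al. (2008): it is an exhaustive backtracking search that is practical only for very small graphs, and it assumes heavy atoms only, matches atoms by element symbol and ignores bond orders and aromaticity. The hand-written atom and bond lists for methoxybenzene and nitrobenzene, and the function names, are illustrative.

```python
from itertools import product

def build_adjacency(n_atoms, bonds):
    """Return a list of neighbor sets for an undirected molecular graph."""
    adj = [set() for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def max_common_substructure(atoms1, bonds1, atoms2, bonds2):
    """Largest connected common induced subgraph with matching elements.

    Returns a dict mapping atom indices of molecule 1 to molecule 2.
    """
    adj1 = build_adjacency(len(atoms1), bonds1)
    adj2 = build_adjacency(len(atoms2), bonds2)
    best = {}
    seen = set()  # partial mappings already explored

    def compatible(mapping, u, v):
        if atoms1[u] != atoms2[v]:
            return False
        # Bonded pairs must map to bonded pairs, non-bonded to non-bonded
        return all((a in adj1[u]) == (b in adj2[v]) for a, b in mapping.items())

    def extend(mapping):
        nonlocal best
        key = frozenset(mapping.items())
        if key in seen:
            return
        seen.add(key)
        if len(mapping) > len(best):
            best = dict(mapping)
        used2 = set(mapping.values())
        # Grow only through neighbors of mapped atoms to stay connected
        frontier = {n for a in mapping for n in adj1[a] if n not in mapping}
        for u in frontier:
            for v in range(len(atoms2)):
                if v not in used2 and compatible(mapping, u, v):
                    mapping[u] = v
                    extend(mapping)
                    del mapping[u]

    for u, v in product(range(len(atoms1)), range(len(atoms2))):
        if atoms1[u] == atoms2[v]:
            extend({u: v})
    return best

ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
anisole_atoms = ["C"] * 6 + ["O", "C"]           # methoxybenzene
anisole_bonds = ring + [(0, 6), (6, 7)]
nitro_atoms = ["C"] * 6 + ["N", "O", "O"]        # nitrobenzene
nitro_bonds = ring + [(0, 6), (6, 7), (6, 8)]

mcs = max_common_substructure(anisole_atoms, anisole_bonds, nitro_atoms, nitro_bonds)
print(len(mcs), sorted(anisole_atoms[i] for i in mcs))
```

For this pair the shared scaffold is the six-carbon ring. Applied pairwise over the top-ranked compounds of a dataset, the same idea yields the recurring scaffolds, although a production tool would also handle bond types, ring perception and much larger molecules.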

4. Conclusion

A statistically robust and predictive QSPR model is developed with only three descriptors and two latent variables for 30 organic pollutants. The modeled response, log (kd) of pollutants to the Mt-HPVPcoS, is a significant parameter at present, when ecotoxicity due to pharmaceuticals and industrial chemicals is a major concern for environmental regulatory authorities. For removing organic pollutants from the ecosystem, especially from water compartments, this hybrid clay-polymer nanocomposite is a future adsorbent for a greener society. Thus, modeling this specific response for the mentioned clay-polycation adsorbent is a present need.

The present QSPR model is statistically significant, mechanistically interpretable and developed with only two latent variables. Further, the present model was employed for successful prediction and data gap filling for a large variety of chemical classes and structures spanning ~0.9 million chemicals from five diverse datasets.

I. The present model showed high statistical quality and passed a stringent validation process before being employed for virtual screening and prediction.

II. The PLS model predicted the log (kd) values of ~0.9 million chemicals and pharmaceuticals to the Mt-HPVPcoS, which can be very significant data for risk assessment and management. In addition, by checking the AD, the developed model can be employed as a prediction tool to quantify the log (kd) value of new and/or untested chemicals even before their synthesis.

III. The MCS algorithm provided important structural scaffolds/fragments for individual datasets responsible for higher log (kd) to the Mt-HPVPcoS. Analyzing all the structural scaffolds obtained in Fig. 5, we have prepared a final list of common scaffolds among all studied datasets. The identified scaffolds are derivatives of methoxybenzene, nitrobenzene, sulfonate/sulfonic acid, diphenylmethane, and primary, secondary and tertiary amines.

Acknowledgements

The authors are thankful to the National Science Foundation (NSF/CREST HRD-1547754 and NSF/RISE HRD-1547836) for financial support.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.chemosphere.2018.12.215.

References

Backman, T.W., Cao, Y., Girke, T., 2011. ChemMine tools: an online service for analyzing and clustering small molecules. Nucleic Acids Res. 39, W486-W491.

Banerjee, P., Erehman, J., Gohlke, B.-O., Wilhelm, T., Preissner, R., Dunkel, M., 2015. Super Natural II - a database of natural products. Nucleic Acids Res. 43, D935-D939.

Barbosa, M.O., Moreira, N.F.F., Ribeiro, A.R., Pereira, M.F.R., Silva, A.M.T., 2016. Occurrence and removal of organic micropollutants: an overview of the watch list of EU decision 2015/495. Water Res. 94, 257-279.

Becke, A.D., 1993. Density-functional thermochemistry. III. The role of exact exchange. J. Chem. Phys. 98, 5648-5652.

Cao, Y., Jiang, T., Girke, T., 2008. A maximum common substructure-based algorithm for searching and predicting drug-like compounds. Bioinformatics 24 (13), i366-i374.

DRAGON Version 6.0, 2011. http://www.talete.mi.it/.

DTC Lab Software Tools: http://teqip.jdvu.ac.in/QSAR_Tools/.

Gardi, I., Nir, S., Mishael, Y.G., 2015. Filtration of triazine herbicides by polymer-clay sorbents: coupling an experimental mechanistic approach with empirical modeling. Water Res. 70, 64-73.

Grandclement, C., Seyssiecq, I., Piram, A., Chung, P.W.W., Vanot, G., Tiliacos, N., Roche, N., Doumenq, P., 2017. From the conventional biological wastewater treatment to hybrid processes, the evaluation of organic micropollutant removal: a review. Water Res. 111, 297-317.

Frisch, M.J., Trucks, G.W., Schlegel, H.B., Scuseria, G.E., Robb, M.A., Cheeseman, J.R., Scalmani, G., Barone, V., Petersson, G.A., Nakatsuji, H., Li, X., Caricato, M., Marenich, A.V., Bloino, J., Janesko, B.G., Gomperts, R., Mennucci, B., Hratchian, H.P., Ortiz, J.V., Izmaylov, A.F., Sonnenberg, J.L., Williams-Young, D., Ding, F., Lipparini, F., Egidi, F., Goings, J., Peng, B., Petrone, A., Henderson, T., Ranasinghe, D., Zakrzewski, V.G., Gao, J., Rega, N., Zheng, G., Liang, W., Hada, M., Ehara, M., Toyota, K., Fukuda, R., Hasegawa, J., Ishida, M., Nakajima, T., Honda, Y., Kitao, O., Nakai, H., Vreven, T., Throssell, K., Montgomery Jr., J.A., Peralta, J.E., Ogliaro, F., Bearpark, M.J., Heyd, J.J., Brothers, E.N., Kudin, K.N., Staroverov, V.N., Keith, T.A., Kobayashi, R., Normand, J., Raghavachari, K., Rendell, A.P., Burant, J.C., Iyengar, S.S., Tomasi, J., Cossi, M., Millam, J.M., Klene, M., Adamo, C., Cammi, R., Ochterski, J.W., Martin, R.L., Morokuma, K., Farkas, O., Foresman, J.B., Fox, D.J., 2016. Gaussian 16, Revision B.01. Gaussian, Inc., Wallingford CT.

Dennington, R., Keith, T.A., Millam, J.M., 2016. GaussView, Version 6. Semichem Inc., Shawnee Mission, KS.

InterBioScreen ltd. database. Accessed at https://www.ibscreen.com/bases on October 3, 2018.

Landrigan, P.J., Fuller, R., Acosta, N.J.R., Adeyi, O., Arnold, R., Basu, N., et al., 2018. The Lancet Commission on pollution and health. Lancet 391, 462-512.

MATLAB and Statistics Toolbox Release, 2012. The MathWorks, Inc., Natick, Massachusetts, United States.

NORMAN Network. Accessed at www.norman-network.net on October 3, 2018.

Parr, R.G., Yang, W., 1989. Density-functional Theory of Atoms and Molecules.

Petrosyan, L.S., Kar, S., Leszczynski, J., Rasulev, B., 2017. Exploring simple, interpretable and predictive QSPR model of fullerene C60 solubility in organic solvents. J. Nanotox. Nanomed. 2, 28-43.

Radian, A., Fichman, M., Mishael, Y., 2015. Modeling binding of organic pollutants to a clay-polycation adsorbent using quantitative structural-activity relationships (QSARs). Appl. Clay Sci. 116-117, 241-247.

Radian, A., Michaeli, D., Serban, C., Nechushtai, R., Mishael, Y.G., 2010. Bioactive apoferredoxin-polycation-clay composites for iron binding. J. Mater. Chem. 20, 4361-4365.

Radian, A., Mishael, Y., 2012. Effect of humic acid on pyrene removal from water by polycation-clay mineral composites and activated carbon. Environ. Sci. Technol. 46, 6228-6235.

Roy, J., Ghosh, S., Ojha, P.K., Roy, K., 2019. Predictive quantitative structure-property relationship (QSPR) modeling for adsorption of organic pollutants by carbon nanotubes (CNTs). Environ. Sci.: Nano. https://doi.org/10.1039/C8EN01059E (in press).

Roy, K., Ambure, P., Kar, S., 2018. How precise are our quantitative structure-activity relationship derived predictions for new query chemicals? ACS Omega 3, 11392-11406.

Roy, K., Das, R.N., 2012. QSTR with extended topochemical atom (ETA) indices. 15. Development of predictive models for toxicity of organic chemicals against fathead minnow using second generation ETA indices. SAR QSAR Environ. Res. 23, 125-140.

Roy, K., Kar, S., Das, R.N., 2015a. A Primer on QSAR/QSPR Modeling: Fundamental Concepts. Springer (SpringerBriefs in Molecular Science).

Roy, K., Kar, S., Ambure, P., 2015b. On a simple approach for determining applicability domain of QSAR models. Chemometr. Intell. Lab. Syst. 145, 22-29.

Ruiz-Hitzky, E., Aranda, P., Darder, M., Rytwo, G., 2012. Hybrid materials based on clays for environmental and biomedical applications. J. Mater. Chem. 20, 9306.

Sangion, A., Gramatica, P., 2016. PBT assessment and prioritization of contaminants of emerging concern: Pharmaceuticals. Environ. Res. 147, 297-306.

Shabtai, I.A., Mishael, Y.G., 2017. Catalytic polymer-clay composite for enhanced removal and degradation of diazinon. J. Hazard Mater. 335, 135-142.

Todeschini, R., Consonni, V., 2008. Handbook of Molecular Descriptors. Wiley-VCH Verlag GmbH, Weinheim, Germany.

Unuabonah, E.I., Taubert, A., 2014. Clay-polymer nanocomposites (CPNs): adsorbents of the future for water treatment. Appl. Clay Sci. 99, 83-92.

Wang, Y., Chen, J., Tang, W., Xia, D., Liang, Y., Li, X., 2019. Modeling adsorption of organic pollutants onto single-walled carbon nanotubes with theoretical molecular descriptors using MLR and SVM algorithms. Chemosphere 214, 79-84.

Williams, T.V., Kuenemann, M.A., Driessche, G.A.V., Williams, A.J., Fourches, D., Freeman, H.S., 2018. Toward the rational design of sustainable hair dyes using cheminformatics approaches: step 1. Database development and analysis. ACS Sustain. Chem. Eng. 6, 2344-2352.

Wishart, D.S., Feunang, Y.D., Guo, A.C., Lo, E.J., Marcu, A., Grant, J.R., Sajed, T., Johnson, D., Li, C., Sayeeda, Z., Assempour, N., Iynkkaran, I., Liu, Y., Maciejewski, A., Gale, N., Wilson, A., Chin, L., Cummings, R., Le, D., Pon, A., Knox, C., Wilson, M., 2018. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 46, D1074-D1082.

Yap, C.W., 2011. PaDEL-Descriptor: an open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem. 32, 1466-1474.

Zadaka, D., Nir, S., Radian, A., Mishael, Y.G., 2009. Atrazine removal from water by polycation-clay composites: effect of dissolved organic matter and comparison to activated carbon. Water Res. 43, 677-683.