Evaluation of two models for solubility of active ... · Evaluation of two models for solubility of...

10
Evaluation of two models for solubility of active pharmaceutical ingredients: Application on solvent selection in process design Sara Antunes Febra [email protected] Instituto Superior T´ ecnico, University of Lisbon, Portugal November 2013 Abstract Having an optimal solvent selection, and ultimately designing the optimal process, can only be achieved with a suitable solubility model. The empirical NRTL-SAC model [1, 2] has proven to be superior to other solubility models and Pharma mod. UNIFAC(PMUNIFAC) [3] , specially designed for application in API organic solutions, has arisen as a powerful candidate for the application in solvent selection. Unlike NRTL-SAC, PMUNIFAC has the advantage of being a predictive model, not requiring experimental data as user input, but only molecular structure, melting temperature and melting enthalpy. The implementations of NRTL-SAC and PMUNIFAC were accomplished with success in the softwares gPROMS ModelBuilder and Excel/Visual Basic for Applications (VBA). The two models were evaluated against experimental data from 8 APIs from GlaxoSmithKline (GSK). Only NRTL-SAC proved to be t for purpose (average order of magnitude of discrepancy (OMD) = 0.27 and maximum OMD = 1.63). However, when only systems with all parameters available were considered, PMUNIFAC showed good predicting potential with an average OMD = 0.40 and a maximum OMD = 2.05. It was conclusive that the bottleneck for PMUNIFAC to be applied in solubility modelling was having too many missing parameters, but its results were encouraging to support its further development. In the last stage of the study, NRTL-SAC was successfully applied in the solvent selection of two GSK crystallisation cases. Alternative solvent systems were identified and validated, adding value to the solvent selection procedures of the company. Keywords: Solubility; Molecular modelling; NRTL-SAC; Pharma. Mod. UNIFAC; Solvent selection; Solid-liquid equilibria. 1. Introduction There are many new medicines that are being de- veloped today. Many of these potential new medi- cines will fail in clinical trials, but some may rep- resent tomorrow’s new treatments. In total, it takes approximately 10 to 15 years to go through the drug discovery and clinical development process and bring a medicine to the market. Ultimately, the manufacturing process should give a product matching all quality standards while assuring its economical efficiency and sustainability that starts from the optimization of yield, process volumes, mass efficiency, vessel occupancy, carbon footprint, among others. The whole procedure is also costly, which are all reasons that support the big effort by the pharmaceutical industry on the improvement of the process design strategies. In the pharmaceutical industry, the presence of solvents is essential in all steps of manufacturing process (Figure 1). Therefore, the solubility predic- tion methodologies have drawn the attention of the scientific community [4] , since the key parameters in the manufacture of active pharmaceutical ingredi- ents (APIs), being related with solvent selection, are the solute solubility and the solvation promo- tion (form change) [5] . Input API Reaction Extraction Isolation Washing Drying Distillation Crystallisation Figure 1: Typical process sequence in Pharma in- dustry. Crystallisation is the preferred method of purific- ation in the pharmaceutical industry for both the final drug substance and the isolated intermediates in the synthesis [2] , hence, the selection of a suitable solvent system is especially relevant for this step, where the quality of the isolated product is determ- ined and the yield of the process can be the most compromised. 1

Transcript of Evaluation of two models for solubility of active ... · Evaluation of two models for solubility of...

Evaluation of two models for solubility of active pharmaceutical

ingredients: Application on solvent selection in process design

Sara Antunes [email protected]

Instituto Superior Tecnico, University of Lisbon, Portugal

November 2013

Abstract

Having an optimal solvent selection, and ultimately designing the optimal process, can only beachieved with a suitable solubility model. The empirical NRTL-SAC model[1, 2] has proven to besuperior to other solubility models and Pharma mod. UNIFAC(PMUNIFAC)[3], specially designed forapplication in API organic solutions, has arisen as a powerful candidate for the application in solventselection. Unlike NRTL-SAC, PMUNIFAC has the advantage of being a predictive model, not requiringexperimental data as user input, but only molecular structure, melting temperature and meltingenthalpy. The implementations of NRTL-SAC and PMUNIFAC were accomplished with success in thesoftwares gPROMS ModelBuilder and Excel/Visual Basic for Applications (VBA). The two modelswere evaluated against experimental data from 8 APIs from GlaxoSmithKline (GSK). Only NRTL-SACproved to be t for purpose (average order of magnitude of discrepancy (OMD) = 0.27 and maximumOMD = 1.63). However, when only systems with all parameters available were considered, PMUNIFACshowed good predicting potential with an average OMD = 0.40 and a maximum OMD = 2.05. Itwas conclusive that the bottleneck for PMUNIFAC to be applied in solubility modelling was havingtoo many missing parameters, but its results were encouraging to support its further development. Inthe last stage of the study, NRTL-SAC was successfully applied in the solvent selection of two GSKcrystallisation cases. Alternative solvent systems were identified and validated, adding value to thesolvent selection procedures of the company.Keywords: Solubility; Molecular modelling; NRTL-SAC; Pharma. Mod. UNIFAC; Solvent selection;Solid-liquid equilibria.

1. Introduction

There are many new medicines that are being de-veloped today. Many of these potential new medi-cines will fail in clinical trials, but some may rep-resent tomorrow’s new treatments. In total, ittakes approximately 10 to 15 years to go throughthe drug discovery and clinical development processand bring a medicine to the market. Ultimately,the manufacturing process should give a productmatching all quality standards while assuring itseconomical efficiency and sustainability that startsfrom the optimization of yield, process volumes,mass efficiency, vessel occupancy, carbon footprint,among others. The whole procedure is also costly,which are all reasons that support the big effort bythe pharmaceutical industry on the improvement ofthe process design strategies.

In the pharmaceutical industry, the presence ofsolvents is essential in all steps of manufacturingprocess (Figure 1). Therefore, the solubility predic-tion methodologies have drawn the attention of thescientific community[4], since the key parameters in

the manufacture of active pharmaceutical ingredi-ents (APIs), being related with solvent selection,are the solute solubility and the solvation promo-tion (form change)[5].

Input API

Reaction ExtractionIsolation Washing DryingDistillation Crystallisation

Figure 1: Typical process sequence in Pharma in-dustry.

Crystallisation is the preferred method of purific-ation in the pharmaceutical industry for both thefinal drug substance and the isolated intermediatesin the synthesis[2], hence, the selection of a suitablesolvent system is especially relevant for this step,where the quality of the isolated product is determ-ined and the yield of the process can be the mostcompromised.

1

In practice, the selection of solvents and anti-solvents for crystallisation mostly relies on exper-ience, analogy and experimental testing[5]. It istime and resources consuming to test many puresolvent/mixtures systems, let alone all combina-tions. Hence, representatives are selected (solventclasses), leaving out many potentially better op-tions. Moreover, it is not always possible to gener-ate experimental data due to material constraints,as they tend to occur in the early developmentphases or in case of impurities. To surpass thesehindrances, one interesting perspective is the use ofthermodynamic models for the calculation.

Many distinct thermodynamic models have beendeveloped throughout the years and some have pre-dictive potential with high accuracy such as Wilson,UNIQUAC, NRTL(-RK) and modified UNIFAC(Dortmund)[6] for vapour-liquid equilibria (VLE).On the other hand, the choice of the best thermody-namic model for solid-liquid equilibria (SLE) is notan easy task, as most of the models are not designedfor SLE, but for other applications[7]. Neverthe-less, there are currently two recent models, PharmaMod. UNIFAC (PMUNIFAC) and Non-RandomTwo-Liquid Segment Activity (NRTL-SAC), thatwere built exactly for API solubility prediction andshow promising results for activity coefficients inSLE [3, 1], however PMUNIFAC had not been im-plemented and evaluated within a pharmaceuticalcompany yet.

2. Solubility TheoryAccording to literature[8], the general relation forSLE calculation of simple eutectic systems is de-scribed in equation 1.

lnxLI γLI = −∆hm,I

RT

(1− T

Tm,I

)− ∆cP,I

R

×(

1− Tm,I

T

)+

∆cP,I

Rln

T

Tm,I(1)

For temperatures that are not very far from themelting point, the contributions of the heat capa-city are negligible, since they have tendency to can-cel out due to the opposite signs[9]. After this sim-plification a simple relation is obtained, equation2.

lnxLI γLI = −∆hm,I

RT

(1− T

Tm,I

)=

∆sm,I

R

(1− Tm,I

T

)(2)

The solubility of a solid compound in the liquidphase can then be calculated for a given temper-ature if, apart from the enthalpy (or entropy) of fu-sion and the melting temperature information, thesolute activity coefficient is available. By the use

of an activity coefficient model, it is possible to de-termine the solute solubility in an iterative way.

2.1. NRTL-SAC model

NRTL-SAC is a segment contribution activity coef-ficient model, derived from the polymer non-random two-liquid model (NRTL). It describes theeffective surface interactions for any organic, non-electrolyte and non-solvate molecule in terms of fourtypes of conceptual segments. The hydrophobicsegment X represents the molecular surface areathat is adverse to hydrogen bonding. The hydro-philic segment Z) simulates polar molecular sur-faces with the tendency to form a hydrogen bond.The polar segments represent the molecular sur-face area with interactions characteristic of an elec-tron donor or acceptor. The polar-attractive seg-ment Y − exhibits attractive interactions with ahydrophilic molecular surface, whereas the polar-repulsive segment Y + exhibits repulsive interac-tions with a hydrophilic molecular surface. Thesesegments have unary parameters (rk, with k = seg-ment letter or just X, Y −, Y + and Z) and binaryparameters (αnm and τnm , with n and m as thesegment indices) associated.

The generation of the parameters X, Y −, Y + andZ for solvent parameters is done by the model de-velopers through regressions that involve availableexperimental VLE or liquid-liquid equilibria (LLE)data for solvent-solvent binary pairs. There is totalof five different possible natures for solvents that arerelated with the unary parameters weights: hydro-phobic, hydrophilic, polar, hydrophobic/polar andhydrophilic/hydrophobic.

The solute parameters X, Y −, Y + and Z can beretrieved by a model user in a capable tool withNRTL-SAC implemented, such as Aspen Solubil-ity Modeler, by feeding a representative solubilitydataset of solute-solvent(s) systems. It is also im-portant to cover all solvents natures (up to five) thatare present in the target systems to be predicted,in the initial experimental dataset.

The correlative character of the model enables adifferent approach for the general relation for SLE(equation 1 or its simplification, equation 2). Since,as suggested in literature[2], there is the possibilityof creating up to three adjustable parameters A, Band C, according to equations 3 and 4.

equation (2) = A+B

T(3)

A =∆hm,I

RTm,I, B = −∆hm,I

R

equation (1) = A+B

T+ Cln(T ) (4)

2

A =∆hm,I

RTm,I− ∆cP,I

R− ∆cP,I ln(Tm,I )

R,

B = −∆hm,I

R+

∆cP,ITm,I

R, C =

∆cP,I

R

Instead of using the melting temperature and en-thalpy, It is possible to regress these three addi-tional API-specific parameters or only A and B,as the heat capacity contribution is typically neg-ligible. This approach is specially useful for caseswhen there is no knowledge on the fusion temper-ature, the fusion enthalpy or the liquid and solidheat capacity (when significant). Plus, it enables abetter fit for the experimental data.

Regarding the binary interactions, the developersproposed “reference compounds” for the conceptualsegments - Hexane for X, Water for Z and Acet-onitrile for Y . The interaction between the polarmolecule and water can be negative or positive, ori-gin of the Y − and Y +. The reference compoundsare used to identify the segment-segment nonran-domness factor (αnm) and binary interaction energy(τnm) parameters for the conceptual segments fromregression of available experimental VLE and LLEdata associated with these reference compounds.This way, the binary interaction matrices τnm andαnm , have each sixteen predefined entries. Despitethe activity coefficients being highly dependent onthe temperature, NRTL-SAC attempts to provideorder-of-magnitude estimates based on minimal ex-perimental information. Thus no temperature de-pendency is provided in the model, apart from thecontribution in the equilibrium constant parametersfrom equations 3 and 4.

2.2. Pharma. mod. UNIFAC model

Pharma. mod. UNIFAC is a group contributionmethod, hence, it looks at a solution as a mix-ture of functional groups (FGs) with predeterminedparameters rather than molecules. The advantageof this concept is that the number of FGs is muchsmaller than the number of possible molecules. Themodel is, in fact, a derivative of modified UNI-FAC model, created with the intention of overcom-ing the functional groups diversity limitation thatthe mod. UNIFAC faced when applied to APIsolutions. Therefore, the system of equations ofPMUNIFAC is the same as the one of mod. UNI-FAC and the difference between the models residessolely in the set of available FGs and its unary (Rk

and Qk) and binary parameters values Ψnm, wherek, n and m are the functional group indices.

Incrementation is the process of decomposinga molecule in functional groups respective to thegroup contribution method in use. Each groupstructure is associated to a main group (MG), and,to illustrate, an example of an incrementation ofHeptane is present in Figure 2. The respective func-

tional groups count νik are in Table 1, where k andi are the FG and component indices, respectively.

Figure 2: Incrementation for Heptane.

Table 1: Incrementation for Heptane.

Main group 1Functional group (k) 1 CH3 2 CH2

νHeptanek 2 5

There are two functional group specific paramet-ers, the surface area R and the volume Q, that areobtained from the van der Waals surface area andvolumes. As for the binary parameters, Ψnm, asclear from equation 5, they have three contribu-tions, anm, bnm and cnm, defined between two maingroups, m and n, rather than functional groups.

Ψnm = exp

(−anm + bnmT + cnmT

2

T

)(5)

The author of the model started by identifying 5solvent classes (plus Water) that cover 75% of allsolvents used in pharmaceutical industry: alkane,alcohol, ketone, ester, ether and Water. Each oneof these classes is associated to a main group, be-ing main group 1 characteristic of alkane class, 5 ofalcohol, 13 of ketone, 19 of ester, 34 of ether and73 of Water. For this first approach of the model,the author mainly considered the binary interac-tions between each main group present in APIs andsolvents MGs 1, 5, 13, 19, 34 and 73. There wereonly a few binary interactions considered with boththe groups in large concentration. This shows thatthe model is not ready yet to applied in solvent mix-tures. For instance, a ternary system of an API witha solvent mixture of alcohol/ketone would not bepossible to work with because it would imply para-meters for the interaction 5-13 with both groupsin large quantity, and, currently, this interaction isonly described for API/alcohol or API/ketone in-teractions (parameters fitted over limited concen-tration range).

An analysis was done for GSK solvents, in or-der to assess at which extent they were describedby the model. The results show that consideringonly the groups 1, 5, 13, 19, 34 and 73, to repres-ent the solvents that are used in API manufacture,is a fair approach to start with, since only 34 outof 162 solvents in the GSK database and 3 out of18 in GSK manufacture are not being considered.However, the fact that the model can not describethe solvents in their totality is already represent-ing a limitation for the application of the model insolvent selection procedures.

3

3. Models Evaluation

This evaluation was based on the comparisonbetween experimental data and solubility predic-tions for GSK molecules. All the molecules thatmade part in the evaluation step are in the scope ofthe models:

• Organic• Non-electrolytes• Non-solvates• Fully incrementable by Pharma Mod. UNIFAC

The solubility values generated by the models inthis exercise were for the systems included in theexperimental database. While NRTL-SAC hadthe experimental data from the database fed intothe solute parameters, Pharma Mod. UNIFAC,as a semi-empirical model, had not. This trans-lates into NRTL-SAC predicting interpolations andPMUNIFAC predicting “extrapolations”. The ob-jective of this stage was to assess if either NRTL-SAC or Pharma Mod. UNIFAC had the potentialto be applied in solvent selection procedures. But itis not conclusive for NRTL-SAC, since solvent se-lection procedures require predicting potential forextrapolations (other systems and temperatures dif-ferent from the experimental database).

Two different selections of the same databasewere targeted for the global analysis. The first se-lection being the complete set of solubility data (8GSK API molecules that included 87 points) and,the second, the points correspondent to pairs API-solvent that have all PMUNIFAC functional groupsand binary parameters available (3 APIs that in-cluded 13 points). It is the second scenario that canactually show the potential of the UNIFAC modelin a fair way.

For each model and scenario, the differencebetween the logarithms to base 10 of the calculatedand the experimental data was plotted against theexperimental data. The plots enabled a visual per-ception of the accuracy of each model by the scatter-ing of the data points and the included boundariesfor one and half an order of magnitude up and downthe reference to easily pin down how much the cal-culated data points differed from the experimentaldata. It is desirable that the points are mainly in-side the half an order of magnitude boundaries andnot so much to have the points outside the orderof magnitude boundaries. The statistical analysiscalculations are presented next.

Mean Absolute Deviation of the logarithmof base 10 (MAD LOG10) - it gives the averagefor the order of magnitude of discrepancy betweencalculated and experimental values.

MAD LOG10 =1

N

∑i

|LOG(xcalc,i)

−LOG(xexp,i)| (6)

Maximum Logarithm Absolute Deviation(Max AD LOG10 ) - it gives the maximum for theorder of magnitude of discrepancy between calcu-lated and experimental values. This value is some-times an outlier due to experimental error.

Max AD LOG10 = Max {|LOG(xcalc,i)

−LOG(xexp,i)|} (7)

In Figures 3, 4 and 5, it is possible to visualisethe accuracy of each model for the two selectionsof data, as well as the extent of the scattering.

-7

-5

-3

-1

1

-1 0 1 2 3

LOG

(xsa

t,ca

lc)

-LO

G(x

sat,

exp)

LOG(xsat,exp)

PMUNIFAC1 order of magnitude1/2 order of magnitude

Figure 3: NRTL-SAC performance for the completedatabase.

-7

-5

-3

-1

1

-1 0 1 2 3

LOG

(xsa

t,ca

lc)

-LO

G(x

sat,

exp)

LOG(xsat,exp)

PMUNIFAC1 order of magnitude1/2 order of magnitude

Figure 4: PMUNIFAC performance for the com-plete database.

-7

-5

-3

-1

1

-1 0 1 2 3

LOG

(xsa

t,ca

lc)

-LO

G(x

sat,

exp)

LOG(xsat,exp)

PMUNIFAC1 order of magnitude1/2 order of magnitude

Figure 5: PMUNIFAC performance for the selecteddatabase of API-solvent pairs with all PMUNIFACparameters available.

Tables 2 and 3 display the statistical results forthe two scenarios.

4

Table 2: Fit statistics of the two models for thecommon set of the company molecules. Number ofAPIs: 8; number of points: 87; number of solvents:26; temperature range: 0.6 - 50 ◦C.

Fit statistics - deviationsModel MAD LOG10 Max AD LOG10

PMUNIFAC 0.85 6.12NRTL-SAC 0.27 1.63

Table 3: Fit statistics of the two models for the se-lected set of API-solvent pairs with all PMUNIFACparameters available. Number of APIs: 3; num-ber of points: 13; number of solvents: 6 solvents;temperature range: 6.2 - 47.5 ◦C.

Fit statistics - deviationsModel MAD LOG10 Max AD LOG10

PMUNIFAC 0.40 2.05NRTL-SAC 0.18 0.74

Looking at Table 2, NRTL-SAC shows a verygood fit. The average MAD of 0.85 fromPMUNIFAC would label the model as unsuitablefor the application in solvent selection. But hav-ing the knowledge that the majority of the requiredparameters were not available (not yet regressed),that could be a precipitated conclusion before hav-ing looked at the seconde scenario. Indeed, Table3 shows that the results from PMUNIFAC signific-antly improve when all interactions and functionalgroups are well described by parameters.

Figures 3 and 4 show that the deviations observedhave reduced scattering for NRTL-SAC and low ac-curacy with high scattering for PMUNIFAC. Butas can be seen from Figure 5, PMUNIFAC becomesmore accurate when only points with structuralgroups and group interaction parameters availableare considered. In Figure 5, the point with lar-ger deviation is correspondent to Ethylene glycolthat is considered a natural outlier by the author ofthe model[10], that may be due to its propensity toself-association, resulting in an equilibrium betweendifferent incrementations.

To finish the study, the probability of predictionfailure (PPF)for each model over the MAD LOG10

was calculated and plotted. The PPF of solubility(mg API/ mL solvent) in the MAD LOG10, equa-tion 8, is used for comparison between models. Be-ing Nn relative to molecule ID, Nmax to maximumnumber of molecules (8 for the complete dataset and3 for the selection) and value to MAD value.

PPF (%) = 100% (1−∑Nmax

n=1 Nn {(MAD LOG10 )n ≤ value}Nmax

)(8)

Both PPFs are shown in Figures 6 and 7. Theyshow which percentage of data points possesses lar-

ger deviations than a certain value, for instance, inFigure 6, the probability to have a PMUNIFAC fitwith a MAD LOG10 value larger than 1 is 30 %.Comparing to Figure 6, a great improvement forPMUNIFAC performance is clear in Figure 7.

0102030405060708090

100

0 0.5 1 1.5Part

of

dat

a w

ith

gre

ater

d

evia

tio

n (

PP

F)

MAD LOG10(solubility)

NRTL-SACPMUNIFAC

Figure 6: Probability of prediction failure of the twomodels for their common database.

0102030405060708090

100

0 0.2 0.4 0.6 0.8 1Part

of

dat

a w

ith

gre

ater

d

evia

tio

n (

PP

F)

MAD LOG10(solubility)

NRTL-SACPMUNIFAC

Figure 7: Probability of prediction failure of thetwo models for the selected database of API-solventpairs with all PMUNIFAC parameters available.

3.1. Evaluation conclusions

It is possible to relate A, B and C with the thermo-dynamic properties by equation 3, but it does notmean that the properties can be calculated fromthe regressed parameters. In fact, it was found thatmany times, even with an extensive experimentaldatabase, A andB regression results would not havephysical meaning, such as negative melting temper-ature.

PMUNIFAC proved to be a model with poten-tial for future applications based on the predictionsof great accuracy when all inputs were gathered.Nevertheless, it still is a work in progress, sincethe majority of binary parameters is missing. Itwas also conclusive that further binary interac-tions should be considered for future regressions,since having all considered ones available would stillnot be enough for application in solvent selectionprocedures. Even though NRTL-SAC results arehighly dependent on the initial experimental data-base, they endorse that NRTL-SAC is a very strongmodel, at least for interpolations. The only modelthat could be fit for purpose in the solvent selectionprocedures is NRTL-SAC.

5

4. Solubility Modelling Application

The last part of the present study describes the ap-plication of solubility modelling with NRTL-SAC inan actual GSK project as an attempt to contributefor the improvement of processes in API manufac-ture. The regression of the parameters was done inAspen Solubility Modeler V7.3. For the regressiononly the molar weight of the solid and experimentalsolubility data were required. The knowledge aboutthe melting point was not required since A and Bwere regressed together with X, Y −, Y + and Z.

Having the parameters, predictions were gener-ated for 130 solvents at several temperatures, pureand binary mixtures with a selected co-solvent. Theresults were compiled in a new spreadsheet togetherwith the relevant database columns for each case.Keeping in mind the target and solvent selectioncriteria, Excel filters were applied and a group ofsolvent alternatives stood out. New meetings werearranged where the discussion of the alternativesolutions took place. Closing the cycle, follow-upsto assess the quality of the results were achieved byboth author and project scientists.

It is worth mentioning that, unfortunately, thecase that that came up in the short time windowof this study was not organic non solvate and non-electrolyte (model scope), but hemi-hydrate organicelectrolyte. The solvent selection analysis was takenforth anyway and it was possible to assess the valueof NRTL-SAC outside its scope.

Some concepts used in the solvent selection pro-cedures are clarified in the next paragraphs.

Common solvents - The favourable solvents.They are the ones that the company commonly usesin manufacture. These are listed in internal docu-mentation of the company.

Form screen - The objective of a form screenis to discover and characterize crystal-forms of theAPI that are relevant to the project objectives. Thescreen includes a set of crystallisation experimentsin several solvents and solvent mixtures to identifyand characterize crystal-forms (polymorphs, hy-drates, and solvates).

Process volume (vol) - This value is one ofthe most relevant to get a feel about the equipmentdimensions requested by the process under consid-eration. It equals to the mL solvent per g inputmaterial. Normally it is desirable to have vol < 30.

Red solvents - The solvents to avoid. Theirdenomination comes from the consideration of sev-eral weights related with environmental, health andsafety issues and are listed in internal documenta-tion of the company.

Yield - Yield of the system undergoing the crys-tallisation. The yield can be calculated by equation9, where solubility (xsat) is in mg API/mL solvent,

and vol and V in L.

yield(%) =

(x sat × vol)final − (x sat × vol)initial

(x sat × vol)initial× 100% (9)

vol initial =1000 L/mL

x sat initial(10)

volfinal = vol initial + V added anti−solvent (11)

4.1. Solubility Modelling - CaseThe process to move from was a cooling crystallisa-tion in 10% water/IPA. The kinetics of the crys-tallisation were very slow (> 24h for cooling andageing) and the final yield was not ideal (< 70%).The API (Molecule D) is an hemi-hydrate HBrsalt formed from the reaction of Molecule C withaqueous HBr. The process is represented in Figure8 where the grey area is related to the solid stateof the API and the coloured pointers illustrate heatexchange. All weights, volumes and equivalents arerelative to Molecule C.

The data available on this HBr salt compoundincluded solubility data in 18 solvents at 2 tem-peratures (20 and 50 ◦C). A form screen was alsoavailable, it was known that higher hydrates couldbe generated with solvent mediums with high wa-ter activity and HBr was only available in watersource, max 48 %wt HBr in water. BuOH wasnot suitable because it promotes a different form.DMSO/THF system result in gelling. The object-ive was to identify solvent/anti-solvent pair or puresolvent for cooling crystallisation (by first intent)that would give improved crystallisation kinetics(currently > 24h), avoid gelling and improve yield.The solvent selection procedure comprised:

1. Regression of Molecule D NRTL-SAC paramet-ers from experimental data;

2. Calculation of solubility values for 130 puresolvents at 5, 20, 50 and 80 ◦C;

3. Calculation of solubility values for mixtures of130 solvents with 10 %wt water at 20 ◦C;

4. Calculation of solubility values for mixtures of130 solvents with 20, 30, 40, 60 %wt water at5, 20, 50, 80 ◦C;

5. Analysis of calculated data for high temperat-ure and low water content;

6. Evaluation of the results from the analysisthrough solubility experiments.

The results for the predictions fit with the NRTL-SAC parameters from the regression are condensedin Table 4 and illustrated in Figure 9.

Table 4: Fit statistics for Molecule D.

Fit statistics - deviationsMAD LOG10 Max AD LOG10

0.54 1.37

6

Molecule C (1 wt)IPA (9.2 vol)

Water (0.625 vol)

4 FILTRATION

IPA/Water 10% w/w (1 vol, 0.77wt)

Mother Liquors (10 vol)

8 FILTRATION

10 voladded over 30 min

age for 30 min

3 AGE

10 vol 75°C

5 COOL

40°C

Organic Waste (3.5 vol)

Water/IPA 10w/w% (3.5 vol)

10 WASH 2

Organic Waste (3.5 vol)

9 WASH 1

Water/IPA 10w/w% (3.5 vol)

Seed (Molecule D, 0.05wt)slurried in 10% water/IPA (0.2 vol)

Age for 31h

6 SEED, AGE

40°C

7 COOL, AGE

0°C Molecule DYield 65-70%

IPM

Water+IPA11 VACUUM DRY

40°C

1 HEAT

50°C10 vol

2 REACTION

48% aq HBr (0.39 wt, 0.263 vol, 1.025 eq)

50°C

Figure 8: Process Definition Diagram of Molecule D from Molecule C.

The fit is not perfect, maybe due to the lar-ger amount of points, possible experimental errorand the molecule being an electrolyte (NRT-SACparameters do not consider an electrolyte segment).Nevertheless, the fitted parameters still show somepredicting potential, as observable from Figure 9.

-1.5

0

1.5

-1.5 0 1.5

Log

Pre

dic

ted

mas

s fr

acti

on

(m

g A

PI/

mL

Solv

en

t)

Log Experimental mass fraction (mg API/ mL Solvent)

1 order of magnitude1/2 order of magnitude

Figure 9: Predicted and experimental solubilityLOG values.

An API is, commonly, a large organic moleculewhich hampers its solubility. Hence, the bottleneckis to identify a suitable solvent for the crystallisa-tion. Because one restriction for the solvent systemis not to have high water activity, the focus was re-duced to the 0 and 20 %wt water mixtures. Thedesired solvent system must provide high solubil-ity and the model was sub-estimating some values,so the focus moved to the higher temperatures (50and 80 ◦C). At 80 ◦C the solubility values are thehighest but on the other hand, some solvents arenot suitable anymore if they have a lower boilingpoint.

The criteria was applied to the calculated dataas summarized in Table 5. The solvent should havesolubility larger than 30 mg/mL since this trans-lates into process volumes smaller than 30 vol. Thesolvent should not belong to the red category in

first intent. Alcohols, bases and acids are to beavoided, since they may promote undesirable re-actions. There will always be water in the pro-cess (the HBr is added in aqueous solution), sothe solvent should be water miscible. Finally, thesolvent should be preferably listed in the commonsolvents list from the company.

After applying the restrictions, 9 alternativesolvents stood out, from which 3 were common -Acetone, 2-MeTHF and 1,4-dioxane. After meetingwith the project tem members, only Acetone waspointed as a potential solution of the procedure.2-MeTHF and 1,4-dioxane, as alcohols, could alsopromote the undesirable reaction with HBr. Acet-one was taken into closer investigation in the labor-atory.

4.1.1. Evaluation of the resultsA series of experiments with several water contentsat 20 and 50 ◦C were carried out showing that wetAcetone provided very high solubility as predicted.The model predicted correctly the existence of asolubility maximum behaviour over the water con-tent. The results also showed, as predicted, thatthere was no advantage in working at higher tem-peratures than ambient, since there is no great in-crease in the solubility of the API. These results aresummed up in Table 6.

Table 6: Experimental versus predicted solubilityvalues for the Acetone and wet Acetone systems.

Temp. %wt water (W) W activity vol(◦C) (%) (molar)

20 0.0 0 25.020 5.2 0.48 5.820 10.1 0.62 1.320 19.5 0.70 2.050 5.2 0.50 4.8

7

Table 5: Solvent selection for Molecule D.

Criteria 50 ◦C 80 ◦C

0 %wt water 20 %wt water 0 %wt water 20 %wt water

Solubility > 30 mg/mL 12 102 21 89

Red solvents 2 17 4 15

Alcohols/Bases/Acids 9 23 15 24

Water non-miscible 6 85 9 69

Common solvent 3 16 1 8

After restrictions 3 9 4 9(0 common) (3 common: (0 common) (1 common:

Acetone; 1,4-dioxane)2-MeTHF;1,4-dioxane)

It was a case of success since 1 alternative cameout of 1 proposed solvent mixture (wet acetone).

Despite the maximum value for solubility beingaround 10 %wt water content, 5 %wt is a better op-tion, because oiling phenomena is promoted whenthe solute has such an affinity with the solvent (sol-ubility too high), keeping the API from crystallisingout. Plus, at 5 %wt water content a good processvolume is already allowed (≈ 6 vol), and the wa-ter activity is at a reasonable value (not very highor low) to expect the right form - hemi-hydrate.In order to determine if the form of the API inthe selected solvent system was the correct one, theform screen was consulted and it was found that thehemi-hydrate was the form obtained in the Acet-one/5 %wt water system.

Having a strong candidate for solvent and coolingcrystallisation being excluded (low yields), an anti-solvent had to be selected. From the initial experi-mental data, three candidates for anti-solvent wereidentified: t-Butylmethyl ether (TBME), EthylAcetate and Butyl Acetate.

In order to check the miscibility in the tern-ary systems, ternary maps were generated with the“Conceptual design” tool from Aspen Plus V7.3.All three candidates for anti-solvent were misciblewith wet Acetone at process conditions. Although,the miscibility behaviour might change with the ad-dition of the API.

Several solubility experiments were done for thethree different ternary solvent systems so that themiscibility could be checked, the yield predicted andthe API form determined. The study included volu-metric ratios of anti-solvent:solvent of 1:1, 2:1, 3:1and pure anti-solvent, all at ambient temperature.

The yields and final process volumes were calcu-lated based on the initial values (in 5 % wet Acet-one): solubility of 159.9 mg/mL and process volume

of 6.3 vol. High yields and low process volumes wereobtained for the three anti-solvents in volumetricratio 2:1, especially for TBME.

The form was checked through IR spectroscopy.IR confirmed that TBME gives the same form asthe reference (i.e., the hemi-hydrate). Butyl Acet-ate and Ethyl Acetate give a slightly different spec-tra, however, because the reference is a dry powderwhilst the samples are slurries, there is a possibilitythey are actually giving the hemi-hydrate as well.The study was not carried further since an altern-ative process had been successfully identified - anti-solvent crystallisation at 20 ◦C of 5 %wet Acetoneas solvent and TBME as anti-solvent in the volu-metric ratio of 1:2, respectively.

Because there are always yield losses in steps pre-vious to the crystallisation, such as vessel transfer,process yield > 95% is a realistic value.

The new proposal for the process diagram ispresented in Figure 10. It has 2 steps less than ini-tial (heating is no longer required), the system doesnot contain reactive species and the yield has im-proved significantly. The process volume increasedfrom ≈ 11 vol to ≈ 20 vol, which is still a good valueand that can be reduced compromising the yield. Itis also expected a great decrease on the crystallisa-tion time, but no crystallisation experiments werecarried out during the study duration.

The project team members accepted the pro-posed alternative process and further experimentswould be done for optimisation purposes before thescale-up.

5. ConclusionsIt was found that NRTL-SAC, unlike PMUNIFACat this time, can be successfully used for solvent se-lection. NRTL-SAC can be applied on electrolytesand solvates, however higher quality of dataset isrequired and lower success rate is expected from re-

8

Table 7: Experimental solubility for the ternary systems Solvent (S)/Anti-solvent (AS) at 20 ◦C withSolvent = 5 %wt wet Acetone.

ASRatio AS/S Solubility Yield Final Process Vol. Form(volumetric) (mg/ mL) (%) (vol)

None Pure Solvent 159.9 Initial vol = 6.3 Hemi-hydrate

Ethyl Acetate 1:1 20.32 87.3 12.5Ethyl Acetate 2:1 5.54 96.5 18.8 OtherEthyl Acetate 3:1 2.83 98.2 25.0Ethyl Acetate Pure AS 0.5 99.7 ∞Butyl Acetate 1:1 20.63 87.1 12.5Butyl Acetate 2:1 5.92 96.3 18.8 OtherButyl Acetate 3:1 2.74 98.3 25.0Butyl Acetate Pure AS 0.1 99.9 ∞TBME 1:1 10.85 93.2 12.5TBME 2:1 1.07 99.3 18.8 Hemi-hydrateTBME 3:1 0.19 99.9 25.0TBME Pure AS 0.0 100.0 ∞

Molecule DYield > 95%

IPMTBME

9 VACUUM DRY

40°C

Input (1 wt)Acetone (4.75 wt)

Water (0.25 wt)

3 INLINE FILTRATION

Water/acetone 5%wt (1 vol, 0.77wt)

Mother Liquors (18 vol)

6 FILTRATION

1 HEAT

20°C5 vol 10 vol

added over 30 minage for 30 min

Organic Waste (3.5 vol)

TBME (3.5 vol)

8 WASH 2

Organic Waste (3.5 vol)

7 WASH 1

TBME/Acetone/Water, composition as in ML (3.5 vol =cake vol)

2 REACTION

48% aq HBr (1.025 eq) TBME (12 vol)

4 SEED, AGE

20°C

5 ANTI-SOLVENT

20°C5 vol

Seed (Molecule D, 0.05wt)slurried in TBME/Acetone/Water

20°C

Figure 10: New Proposal for the Process Definition Diagram of Molecule D from Molecule C.

commended solvents. Nevertheless, NRTL-SAC hasdisadvantages, such as the requirement of a repres-entative set of experimental data for each solute(parameters regression) and the high dependencyof the accuracy of prediction on the selection ofdataset going into regression. Therefore, the iden-tification of the ”ideal” dataset (which is expectedto give the highest accuracy on prediction) is ulti-mately, depending on the choice of the modeller.

PMUNIFAC is currently not fit for purpose inGSK solvent selection due to the early stage of itsdevelopment. Since it is highly limited in terms ofparameters, only for a very small number of solvent-solute combinations all parameters are available.However, the results obtained for PMUNIFAC areencouraging to support its further development,since it showed to give comparable or even better

solubility prediction for certain solute-solvent pairs,when compared to NRTL-SAC. The main advant-age of PMUNIFAC is that no experimental solubil-ity data for solute is required, which widens modelapplicability within GSK significantly.

Solvent selection through solubility modelling hasproven to add value in process design, after beenapplied with success on two GSK processes. Thetool is best applied on early phase projects wherenot much data is yet available and the amount ofmaterial is limited, hence targeted experiments areconducted. Solvent selection cannot be based onmodelling alone but requires experimental work.

6. Future Work

The next steps should focus on the improvementof the applicability and the accuracy of solubility

9

models in order to allow the design and optimisationof unit operations to rely on solubility models moreefficiently.

It would be desirable to improve the accuracyof predictive models such as Pharma Mod. UNI-FAC to allow solubility modelling to be applied onsolutes where no experimental data is available. Forthis, the existing Pharma Mod. UNIFAC parametermatrix is recommended to be revised, expandedand completed, which imply an increase of the datapoints that are fed into the parameter regressionin order to generate further binary group interac-tion parameters. Since both the group contributionbased models and the fully empirical models rely ondatasets with a high quantity and quality of datapoints, it is strongly recommended that more effortsare made within the industry to organise existingsolubility data. A central storage of solubility datawould allow solubility predictions at an early stageof the solute development, enabling a more efficientprocess design at a later stage in development for aspecific project. It would also enable model evalu-ation with a more relevant basis.

In order to widen the applicability of solubilitymodels further to electrolytes and solvates, whichaccount for a significant part of solutes that appearin the manufacturing processes of APIs, it is re-commended to carry out an evaluation of existingmodels for electrolytes such as LIQUAC, LIFAC[11]

and eNRTL-SAC[12] in a similar manner as it hasbeen done in this work. It is also recommended toexplore models which take a different approach thatmay offer a viable alternative, such as the SAFT-gamma[13] model, which is based on an equation-of-state approach.

Acknowledgements

The author would like to thank Dr.-Ing. Thor-alf Hartwig, Prof. Dr. Benilde Saramago for allthe support and company GlaxoSmithKline Phar-maceuticals for having provided with the workingfacilities and for the knowledge kindly shared.

References

[1] Chen Chau-Chyun and Song Yuhua. Solu-bility modeling with a nonrandom two-liquidsegment activity coefficient model. Ind. Eng.Chem. Res., 43(26):8354–8362, 2004.

[2] Chen Chau-Chyun and Crafts Peter A. Cor-relation and prediction of drug molecule solu-bility in mixed solvent system with the non-random two-liquid segment activity coefficient(NRTL - SAC) model. Ind. Eng. Chem. Res.,45(13):4816–4824, 2006.

[3] Diedrichs Anja and Gmehling Jurgen. Solu-bility calculation of active pharmaceutical in-gredients in alkanes, alcohols, water and their

mixtures using various activity coefficient mod-els. Ind. Eng. Chem. Res., 50(3):1757–1769,2011.

[4] S. Perez-Vega, E. Ortega-Rivas, I. Salmeron-Ochoa, and P.N. Sharratt. A system viewof solvent selection in the pharmaceutical in-dustry: towards a sustainable choice. En-vironment, Development and Sustainability,15(1):1–21, 2013.

[5] Petr Kolar, Jun-Wei Shen, Akio Tsuboi,and Takeshi Ishikawa. Solvent selection forpharmaceuticals. Fluid Phase Equilib., 194-197:771–782, 2002.

[6] Prof. Dr. J. Gmehling. 7th common Meetingof UNIFAC Consortium and DDBST GmbH,Oldenburg, Germany, September 2009.

[7] Baptiste Bouillot, Sebastien Teychene, andBeatrice Biscans. An evaluation of thermo-dynamic models for the prediction of drugand drug-like molecule solubility in organicsolvents. Fluid Phase Equilib., 309:36–52,2011.

[8] John M. Prausnitz, Rudiger N. Lichtenthaler,and Edmundo Gomes de Azevedo. Molecu-lar Thermodynamics of Fluid-Phase Equilibria.Prentice Hall PTR, Upper Saddle River, NJ,3rd edition, 1999. ISBN:978-0-1397-7745-5.

[9] Jurgen Gmehling, Barbel Kolbe, MichaelKleiber, and Jurgen Rarey. Chemical Thermo-dynamics for Process Simulation. Wiley-VCH,Darmstadt, Germany, 1st edition, March 2012.ISBN: 978-3-527-31277-1.

[10] Diedrichs Anja. Evaluation und Erweiterungthermodynamischer Modelle zur Vorhersagevon Wirkstoffloslichkeiten. PhD thesis, Uni-versitat Oldenburg, 2010.

[11] Andre Mohs and Jurgen Gmehling. A re-vised {LIQUAC} and {LIFAC} model (LI-QUAC*/LIFAC*) for the prediction of prop-erties of electrolyte containing solutions. FluidPhase Equilibria, 337(0):311 – 322, 2013.

[12] Chen Chaun-Chyun and Song Yuhua. Exten-sion of nonrandom two-liquid segment activ-ity coefficient model for electrolytes. Ind. Eng.Chem. Res., 44(23):8909–8921, 2005.

[13] Alexandros Lymperiadis, Claire S. Adjiman,Amparo Galindo, and George Jackson. Agroup contribution method for associatingchain molecules based on the statistical associ-ating fluid theory (SAFT-γ). J. Chem. Phys.,127(23):–, 2007.

10