Structure-Based Methods for the Prediction of the Dominant P450 Enzyme in Human Drug...

This article was downloaded by: [University of California Santa Cruz]
On: 20 November 2014, At: 11:19
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

SAR and QSAR in Environmental Research
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/gsar20

Structure-Based Methods for the Prediction of the Dominant P450 Enzyme in Human Drug Biotransformation: Consideration of CYP3A4, CYP2C9, CYP2D6
N. Manga, J.C. Duffy, P.H. Rowe & M.T.D. Cronin
School of Pharmacy and Chemistry, Liverpool John Moores University, Byrom Street, Liverpool, L3 3AF, England
Published online: 01 Feb 2007.

To cite this article: N. Manga, J.C. Duffy, P.H. Rowe & M.T.D. Cronin (2005) Structure-Based Methods for the Prediction of the Dominant P450 Enzyme in Human Drug Biotransformation: Consideration of CYP3A4, CYP2C9, CYP2D6, SAR and QSAR in Environmental Research, 16:1-2, 43-61, DOI: 10.1080/10629360412331319871

To link to this article: http://dx.doi.org/10.1080/10629360412331319871


STRUCTURE-BASED METHODS FOR THE PREDICTION OF THE DOMINANT P450 ENZYME IN HUMAN DRUG BIOTRANSFORMATION: CONSIDERATION OF CYP3A4, CYP2C9, CYP2D6*

N. MANGA, J.C. DUFFY, P.H. ROWE and M.T.D. CRONIN†

School of Pharmacy and Chemistry, Liverpool John Moores University, Byrom Street, Liverpool, L3 3AF, England

(Received 13 May 2004; In final form 6 September 2004)

Metabolic drug–drug interactions are receiving increasing attention from the in silico community. Early prediction of such interactions would not only improve drug safety but also help to make drug design more predictable and rational. The aim of this study was to build a simple and interpretable model for determining the P450 enzyme predominantly responsible for a drug's metabolism. The P450 enzymes taken into consideration were CYP3A4, CYP2D6 and CYP2C9. Physico-chemical and structural descriptors for 96 currently marketed drugs were submitted to statistical analysis using the formal inference-based recursive modelling (FIRM) method, a form of recursive partitioning. Generally accepted knowledge on metabolism by these enzymes was also used to construct a hierarchical decision tree. Robust methods of variable selection using recursive partitioning were utilised. The descriptive ability of the resulting hierarchical model is very satisfactory, with 94% of the compounds correctly classified.

Keywords: Drug interactions; QSPKR; Metabolism; Recursive partitioning; WHIM descriptors; Hydrogen bond factors

INTRODUCTION

The use of in silico modelling to predict the pharmacokinetic profile of drugs is steadily gaining momentum, and the areas of investigation are becoming increasingly wide-ranging. It is hoped that, as researchers gather experience, every property of a drug will progressively be addressed with greater success [1]. The ability to predict the pharmacokinetic profile of drugs should bring considerable financial and strategic benefits to preclinical drug research.

One of the new fields of interest for in silico modelling is metabolic drug–drug interactions. Interest in this area is not limited to the in silico community, however. For example, such data have now been specifically added to the latest edition of Goodman and Gilman's [2] standard pharmacological textbook. This is evidence of a growing awareness of drug safety throughout the scientific community concerned with pharmaceutical development and compliance. The emphasis on drug safety is partly attributable to the legal pressures placed on pharmaceutical companies when they submit applications for marketing authorisation. Another contributing factor is that it is now easier to find lead compounds for any given pharmacological target, so any method that screens out otherwise flawed compounds is beneficial. In addition, mapping out drug–drug interactions ensures that drugs can be used more widely and, at the same time, discourages competitors from developing safer versions of the same drugs.

ISSN 1062-936X print/ISSN 1029-046X online © 2005 Taylor & Francis Ltd

DOI: 10.1080/10629360412331319871

*Presented at the 11th International Workshop on Quantitative Structure-Activity Relationships in the Human Health and Environmental Sciences (QSAR2004), 9–13 May 2004, Liverpool, England.

†Corresponding author. E-mail: [email protected]

SAR and QSAR in Environmental Research, Vol. 16 (1–2), February–April 2005, pp. 43–61

The P450 enzymes represent an ideal subject for the investigation of metabolic drug–drug interactions. This superfamily of enzymes is thought to metabolise approximately 90% of all marketed drugs, and abundant literature data are available on these enzymes. Most pharmaceutical companies have acknowledged the need to screen compounds against the most significant P450 enzymes.

P450 is considered to be the most important single enzyme family in drug metabolism [3]. P450-mediated metabolism is an oxidation reaction that forms part of phase 1 metabolism. It is characterised by the attack of an activated oxygen species, which facilitates the conjugation and elimination of compounds that would otherwise lack suitable functional groups. P450s are among the strongest oxidising agents known in living systems; consequently, many drugs can be oxidised by more than one P450 enzyme.

In humans, seven members of the P450 family account for most of the P450-mediated metabolism of drugs: CYP1A2, CYP2A6, CYP2C9, CYP2C19, CYP2D6, CYP2E1 and CYP3A4 [4]. These are the major enzymes involved in drug biotransformation, and they are known to act upon the most therapeutically significant drugs. They differ, however, in the proportion of drugs they process. For example, CYP1A2, the major metabolising enzyme for theophylline and caffeine, is thought to metabolise only 4% of known drugs [5]. In contrast, CYP3A4 accounts for 50–60% [5,6] and CYP2D6 for 30% [5] of drug metabolism. Together, CYP2C9 and CYP2C19 are thought to act upon 10–20% of drugs [5,6]; each of the other enzymes is thought to be responsible for less than 10% of drug metabolism. Note that another, albeit minor, isoenzyme of the CYP3A subfamily exists: CYP3A5. This enzyme has a substrate specificity very similar to that of CYP3A4.

Lewis and co-workers [7,8] have investigated the feasibility of a general model to predict substrate specificity for the different P450 enzymes. Using six substrates per enzyme, they derived a decision tree approach for eight major P450 enzymes. The virtues of this model are its simplicity and interpretability, as it uses only four descriptors: COMPACT ratio, molecular volume, pKa and log P. Because it is easy to use and apply, such a tool is extremely useful, and its contribution to making drug design more rational and predictable is clear. With regard to metabolic drug interactions, however, the fact that it does not address overlapping substrate specificity may be considered a shortcoming.

Metabolic drug–drug interactions become clinically significant essentially when they interfere with a drug's major metabolising enzyme(s) [9,10]. Models that consider the P450 preferences of substrates would therefore be a valuable aid in elucidating such interactions in early drug discovery. Models that evaluate the extent of metabolism (i.e. the binding affinity and rate of P450-mediated metabolism) are one option. Alongside these, models that specifically predict and/or describe the predominance of one P450 enzyme over the others in the metabolism of a drug could be very useful. They would support rational drug design both when metabolism by a particular enzyme is desired and when it should be avoided.

The aim of the present study, therefore, was to predict the predominant P450 enzyme involved in a drug's metabolism. A first attempt was made to separate drugs with CYP3A, CYP2D6 or CYP2C9 as their primary oxidising enzyme using rules derived from the literature. This "reduced" the data set, and a novel model was then used to assess the remaining drugs. The descriptors entering the model are interpreted in the light of currently available knowledge, in order to make the model more transparent.

METHODS

P450 Data

Data for 96 currently marketed drugs were taken from Bertz et al. [4]. For each drug, these data consisted of its pathways of biotransformation ranked in order of prevalence. Only drugs whose main route of metabolism was CYP3A, CYP2D6 or CYP2C9 were selected for the study, because the necessary information could not be collated in sufficient quantity for other P450 enzymes. Table I lists the names and SMILES strings of the compounds used, together with the name of each compound's primary metabolising P450 enzyme. Some data gave no numerical percentage for the proportion of metabolism taken by a route, or did not explicitly state that one route was more important than another; these data were not considered. For use in this study, the data were converted into categorical values: "1" indicates that a compound is primarily oxidised by an enzyme, and "0" indicates that it is not.
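The 1/0 coding described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code. It assumes a simple mapping from each drug name to its dominant enzyme, uses three entries taken from Table I, and the function name `encode_dominant_cyp` is hypothetical.

```python
from __future__ import annotations

# Minimal sketch: turn "dominant enzyme" labels into the 1/0 coding described
# in the text (1 = primarily oxidised by that enzyme, 0 = not).
CYP_CLASSES = ("CYP3A", "CYP2D6", "CYP2C9")


def encode_dominant_cyp(dominant: dict[str, str]) -> dict[str, dict[str, int]]:
    """Return {drug: {enzyme: 1 or 0}} for every drug in `dominant`."""
    encoded = {}
    for drug, cyp in dominant.items():
        if cyp not in CYP_CLASSES:
            raise ValueError(f"{drug}: unexpected enzyme label {cyp!r}")
        # Exactly one enzyme per drug receives a 1
        encoded[drug] = {enzyme: int(enzyme == cyp) for enzyme in CYP_CLASSES}
    return encoded


# Three training-set entries from Table I
example = {"Alfentanil": "CYP3A", "Codeine": "CYP2D6", "Diclofenac": "CYP2C9"}
coded = encode_dominant_cyp(example)
print(coded["Codeine"])  # {'CYP3A': 0, 'CYP2D6': 1, 'CYP2C9': 0}
```

Because each drug has exactly one dominant enzyme in this data set, every encoded row contains a single 1.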

In a subsequent, independent process, a test set of 51 compounds with characteristics similar to those of the training set was collected from the professional medical literature [11]. These compounds are also listed in Table I with their SMILES notation and P450 metabolising enzyme.

Physico-chemical Descriptors

A total of 70 descriptors was calculated for each molecule in this study. The descriptors are summarised in Table II and cover the physico-chemical properties considered to be important in governing P450 predominance. Descriptors with an inter-correlation greater than 90% were removed from the set. A full listing of descriptor values is available upon request from the authors.
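The removal of inter-correlated descriptors can be sketched as a greedy filter. The text does not say which member of a correlated pair was discarded, or whether "90%" refers to r or r². This sketch therefore assumes |r| > 0.9 and keeps whichever descriptor is encountered first. The descriptor names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # A constant descriptor (sxx or syy == 0) would raise ZeroDivisionError here
    return sxy / math.sqrt(sxx * syy)


def drop_intercorrelated(descriptors, threshold=0.9):
    """Keep a descriptor only if |r| <= threshold against every descriptor
    already kept. `descriptors` maps name -> values (one per compound) and is
    visited in insertion order."""
    kept = {}
    for name, values in descriptors.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return list(kept)


# Hypothetical descriptor columns for five compounds
toy = {
    "desc_a": [1, 2, 3, 4, 5],
    "desc_b": [3, 5, 7, 9, 11],  # exactly 2 * desc_a + 1, so r = 1
    "desc_c": [5, 1, 4, 2, 3],   # r = -0.3 with desc_a
}
print(drop_intercorrelated(toy))  # ['desc_a', 'desc_c']
```

A greedy pass like this depends on column order. A different ordering can retain a different member of each correlated pair.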

Measures of hydrophobicity were obtained from various sources. The calculated logarithm of the octanol–water partition coefficient (C log P) was obtained from the C log P for Windows (ver 1.0.0) programme (Biobyte Corporation). The logarithm of the distribution coefficient (log D) at different pH values was computed using the ACD Labs 6.00 software (Advanced Chemistry Development Incorporation). An index of Brønsted acidity was also calculated from the difference between log D values.
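The paper obtained log D from the ACD software, whose internal method is not described here. The sketch below swaps in the textbook Henderson–Hasselbalch model for a monoprotic compound, assuming that only the un-ionised species partitions into octanol. It shows why log D7.4 minus log D5.0 behaves as an acidity index: the difference is negative for acids, positive for bases and zero for neutral compounds. The log P and pKa values are hypothetical.

```python
import math


def log_d(log_p, ph, pka=None, kind="neutral"):
    """log D of a monoprotic compound, assuming (Henderson-Hasselbalch) that
    only the un-ionised species partitions into octanol."""
    if kind == "neutral":
        return log_p
    if kind == "acid":
        return log_p - math.log10(1 + 10 ** (ph - pka))
    if kind == "base":
        return log_p - math.log10(1 + 10 ** (pka - ph))
    raise ValueError(f"unknown kind: {kind!r}")


def delta_log_d(log_p, pka=None, kind="neutral"):
    """Acidity index: log D at pH 7.4 minus log D at pH 5.0."""
    return log_d(log_p, 7.4, pka, kind) - log_d(log_p, 5.0, pka, kind)


# Hypothetical compounds, all with log P = 3
print(round(delta_log_d(3.0, pka=4.5, kind="acid"), 2))  # -2.28
print(round(delta_log_d(3.0, pka=9.0, kind="base"), 2))  # 2.39
print(delta_log_d(3.0))                                  # 0.0
```

This simple model ignores ion-pair partitioning and multiprotic species, so it only illustrates the sign behaviour of the index. It is not a replacement for the software values used in the study.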

Molecular orbital properties were obtained following geometry optimisation in MOPAC6 (for PC) utilising the AM1 Hamiltonian. Continuous H-bond donor and acceptor descriptors were calculated using the HYBOT programme [12].

A variety of descriptors was calculated using the TSAR for Windows (ver 3.3) molecular spreadsheet (Accelrys Incorporation). These descriptors included molecular connectivities, molecular weight, the number of rotatable bonds and the total lipole.


TABLE I Drug compounds used in this study, the P450 enzyme primarily responsible for metabolism and their SMILES strings

Name                Dominant CYP  SMILES annotation

Training set
Alfentanil          CYP3A         C1CN(CCN2C(=O)N(CC)N=N2)CCC1(COC)N(C(=O)CC)c3ccccc3
Alprazolam          CYP3A         c1cc2N3C(C)=NN=C3CN=C(c4ccccc4)c2cc1Cl
Amiodarone          CYP3A         CCCCc1oc2ccccc2c1C(=O)c1cc(c(c(c1)[I])OCCN(CC)CC)[I]
Amlodipine          CYP3A         Clc1ccccc1C2C(C(=O)OC)=C(C)NC(COCCN)=C2C(=O)OCC
Astemizole          CYP3A         Fc3ccc(cc3)CN2c1ccccc1N=C2NC4CCN(CC4)CCc5ccc(OC)cc5
Bepridil            CYP3A         CC(C)COCC(N1CCCC1)CN(c2ccccc2)Cc3ccccc3
Carbamazepine       CYP3A         NC(=O)N1c2ccccc2C=Cc2ccccc12
Cisapride           CYP3A         Clc1c(N)cc(OC)c(c1)C(=O)NC2C(OC)CN(CC2)CCCOc3ccc(F)cc3
Clarithromycin      CYP3A         CCC3OC(=O)C(C)C(OC1CC(C)(OC)C(O)C(C)O1)C(C)C(OC2OC(C)CC(C2O)N(C)C)C(C)(O)CC(C)C(=O)C(C)C(OC)C3(C)O
Cocaine             CYP3A         COC(=O)C1C2CCC(CC1OC(=O)c1ccccc1)N2
Cyclosporin         CYP3A         CCC1NC(=O)C(C(O)C(C)C\C=C\C)N(C)C(=O)C(C(C)C)N(C)C(=O)C(CC(C)C)N(C)C(=O)C(CC(C)C)N(C)C(=O)C(C)NC(=O)C(C)NC(=O)C(CC(C)C)N(C)C(=O)C(NC(=O)C(CC(C)C)N(C)C(=O)CN(C)C1=O)C(C)C
Dexamethasone       CYP3A         CC1CC2C3CCC4=CC(=O)C=CC4(C)C3(F)C(O)CC2(C)C1(O)C(=O)CO
Diltiazem           CYP3A         COc1cccc(c1)C1Sc2ccccc2N(CCN(C)C)C(=O)C1OC(C)=O
Disopyramide        CYP3A         CC(C)N(CCC(C(N)=O)(c1ccccc1)c1ccccc1)C(C)C
Ergotamine          CYP3A         c1ccccc1CC4C(=O)N(CCC2)C2C3(O)N4C(=O)C(O3)(C)NC(=O)C5CNC6C(=C5)C7=C8C(C6)=CNC8=CC=C7
Erythromycin        CYP3A         CCC1OC(=O)C(C)C(OC2CC(C)(OC)C(O)C(C)O2)C(C)C(OC2OC(C)CC(C2O)N(C)C)C(C)(O)CC(C)C(=O)C(C)C(O)C1(C)O
Ethinylestradiol    CYP3A         CC12CCC3C(CCc4cc(ccc43)O)C2CCC1(O)C#C
Ethosuximide        CYP3A         CCC1(C)CC(=O)NC1=O
Etoposide           CYP3A         COc1cc(cc(c1O)OC)C1C2C(COC2=O)C(OC2OC3COC(C)OC3C(O)C2O)c2cc3c(cc21)OCO3
Felodipine          CYP3A         CCOC(=O)C1=C(C)NC(=C(C1c1cccc(c1Cl)Cl)C(O)=O)C
Fentanyl            CYP3A         CCC(=O)N(C1CCN(CC1)CCc2ccccc2)c3ccccc3
Finasteride         CYP3A         O=C1NC2CCC3C4CCC(C(=O)NC(C)(C)C)C4(C)CCC3C2(C)C=C1
Fluconazole         CYP3A         OC(Cn1cncn1)(Cn1cncn1)c1ccc(cc1F)F
Flutamide           CYP3A         CC(C)C(=O)Nc1ccc(N(=O)=O)c(c1)C(F)(F)F
Ifosfamide          CYP3A         ClCCNP1(=O)OCCCN1CCCl
Indinavir           CYP3A         n1cc(ccc1)CN2CCN(C(C(=O)NC(C)(C)C)C2)CC(O)CC(Cc3ccccc3)C(=O)NC4C(O)Cc5c4cccc5
Isradipine          CYP3A         COC(=O)C1=C(C)NC(=C(C1c2cccc3nonc23)C(=O)OC(C)C)C
Itraconazole        CYP3A         O=C1N(c4ccc(cc4)N2CCN(c5ccc(cc5)OCC6(COC(O6)(Cn7ncnc7)c3c(cc(cc3)Cl)Cl))CC2)C=NN1C(CC)C
Ketoconazole        CYP3A         CC(=O)N(C1)CCN(C1)c(c2)ccc(c2)OCC(CO3)OC3(CN4C=NC=C4)c5c(Cl)cc(Cl)cc5
Lidocaine           CYP3A         CCN(CC)CC(=O)Nc1c(cccc1C)C
Loratadine          CYP3A         c1cc2C(=C3CCN(C(=O)OCC)CC3)c4ncccc4CCc2cc1Cl
Methadone           CYP3A         CCC(=O)C(CC(C)N(C)C)(c1ccccc1)c1ccccc1
Methylprednisolone  CYP3A         CC1CC2C3CCC(O)(C(=O)CO)C3(C)CC(O)C2C2(C)C=CC(=O)C=C12
Miconazole          CYP3A         c1nccn1CC(c2ccc(Cl)cc2Cl)OCc3c(Cl)cc(Cl)cc3
Midazolam           CYP3A         Cc1ncc2n1c1ccc(cc1C(=NC2)c1ccccc1F)Cl


Nefazodone          CYP3A         Clc1cc(ccc1)N2CCN(CC2)CCCN3C(=O)N(C(=N3)CC)CCOc8ccccc8
Nicardipine         CYP3A         COC(=O)C1=C(C)NC(=C(C1c1cccc(c1)[N+]([O-])=O)C(=O)OCCN(C)Cc1ccccc1)C
Nifedipine          CYP3A         COC(=O)C1=C(C)NC(=C(C1c1ccccc1[N+]([O-])=O)C(=O)OC)C
Nimodipine          CYP3A         c1c(N(=O)=O)cccc1C2C(C(=O)OC(C)C)=C(C)NC(C)=C2C(=O)OCCOC
Nisoldipine         CYP3A         CC(C)COC(=O)C2=C(C)NC(C)=C(C(=O)OC)C2c1c(cc(cc1N(=O)=O))
Nitrendipine        CYP3A         c1ccc(N(=O)=O)cc1C2C(C(=O)OC)=C(C)NC(C)=C2C(=O)OCC
Pimozide            CYP3A         Fc1ccc(cc1)C(CCCN2CCC(CC2)n3c(=O)[nH]c4ccccc34)c5ccc(F)cc5
Prednisolone        CYP3A         CC12CC(O)C3C(CCC4=CC(=O)C=CC34C)C2CCC1(O)C(=O)CO
Quinine             CYP3A         COc1ccc2nccc(c2c1)C(O)C1CC2CCN1CC2C=C
Rapamycin           CYP3A         C3(C1CCCCN1C(=O)C(C2(OC(CCC2C)CC(C(C)=CC=CC=CC(CC(C(C(C(C(=CC(C(CC(O3)C(CC4CCC(C(C4)OC)O)C)=O)C)C)O)OC)=O)C)C)OC)O)=O)=O
Rifabutin           CYP3A         c21c4OC(C1=O)(OC=CC(C(C(C(C(C(C(O)C(C=CC=C(C(Nc3c(c(c2c(c3O)c(O)c4C)O)C=NN5CCN(CC5)C)=O)C)C)C)O)C)OC(=O)C)C)OC)C
Ritonavir           CYP3A         N1=CSC(=C1)COC(=O)NC(Cc2ccccc2)C(O)CC(Cc3ccccc3)NC(=O)C(C(C)C)NC(=O)N(C)CC4=CSC(C(C)C)=N4
Saquinavir          CYP3A         c1cccc2c1nc(cc2)C(=O)NC(C(=O)N)C(=O)NC(Cc3ccccc3)C(O)CN4C(C(=O)NC(C)(C)C)CC5C(C4)CCCC5
Sertraline          CYP3A         Clc3c(ccc(c3)C1(c2c(cccc2)C(CC1)(NC)))Cl
Simvastatin         CYP3A         CCC(C)(C)C(=O)OC1CC(C)C=C(C12)C=CC(C)C2CCC(O)CC(O)CC(=O)O
Tacrolimus          CYP3A         OC1C(OC)CC(CC1)C=C(C)CC(C)C9CC(=O)C(CC=C)C=C(C)CC(C)CC(OC)C3OC(O)(C(C)CC3(OC))C(=O)C(=O)N4C(CCCC4)C(=O)O9
Tamoxifen           CYP3A         c1ccccc1C(CC)=C(c2ccccc2)c3ccc(OCCN(C)C)cc3
Terfenadine         CYP3A         c1ccccc1C(O)(c2ccccc2)C(C3)CCN(C3)CCCC(O)c4ccc(C(C)(C)C)cc4
Testosterone        CYP3A         CC34CCC1C(CCC2=CC(=O)CCC12C)C3CCC4O
Triazolam           CYP3A         Cc1nnc2n1c1ccc(cc1C(=NC2)c1ccccc1Cl)Cl
Verapamil           CYP3A         COc1ccc(cc1OC)CCN(C)CCCC(C#N)(C(C)C)c1ccc(c(c1)OC)OC
Vinblastine         CYP3A         c1cccc2c1NC3=C2CCN4CC(O)(CC)CC(C4)CC3(C(=O)OC)c5c(OC)cc6c(c5)C7(CC8)C(N6(C))C(O)(C(=O)OC)C(OC(=O)C)C9(CC)C7N8CC=C9
Vincristine         CYP3A         c1cccc2c1NC3=C2CCN4CC(O)(CC)CC(C4)CC3(C(=O)OC)c5c(OC)cc6c(c5)C7(CC8)C(N6(C=O))C(O)(C(=O)OC)C(OC(=O)C)C9(CC)C7N8CC=C9
Zidovudine          CYP3A         CC1=CN(C2CC(N=[N+]=[N-])C(CO)O2)C(=O)NC1=O
Zolpidem            CYP3A         Cc1ccc(cc1)C2=C(CC(=O)N(C)C)N3C=C(C)C=CC3=N2
Codeine             CYP2D6        COc1ccc2c3c1OC1C(O)C=CC4C(C2)NCCC143
Desipramine         CYP2D6        CNCCCN1c2ccccc2CCc2ccccc12
Dexfenfluramine     CYP2D6        CCNC(C)Cc1cc(C(F)(F)F)ccc1
Encainide           CYP2D6        COc1ccc(cc1)C(=O)Nc1c(cccc1)CC2N(C)CCC2
Flecainide          CYP2D6        FC(F)(F)COc1ccc(c(c1)C(=O)NCC1CCCCN1)OCC(F)(F)F
Fluoxetine          CYP2D6        CNCCC(Oc1ccc(cc1)C(F)(F)F)c1ccccc1
Fluvoxamine         CYP2D6        COCCCCC(=NOCCN)c1ccc(cc1)C(F)(F)F


Haloperidol         CYP2D6        OC1(CCN(CCCC(=O)c2ccc(cc2)F)CC1)c1ccc(cc1)Cl
Hydrocodone         CYP2D6        COc1ccc2CC5C3CCC(=O)C4Oc1c2C34CCN5C
Maprotiline         CYP2D6        c1cccc2c1C4c3c(C2(CCCNC)CC4)cccc3
Methamphetamine     CYP2D6        CNC(C)Cc1ccccc1
Metoprolol          CYP2D6        COCCc1ccc(cc1)OCC(O)CNC(C)C
Mexiletine          CYP2D6        CC(N)Coc1c(cccc1C)C
Nortriptyline       CYP2D6        CNCCC=C1c2ccccc2CCc2ccccc12
Oxycodone           CYP2D6        COc1ccc2CC5C3(O)CCC(=O)C4Oc1c2C34CCN5C
Paroxetine          CYP2D6        Fc4ccc(cc4)C1C(OCc2cc3c(cc2)OCO3)CNCC1
Perphenazine        CYP2D6        OCCN4CCN(CCCN2c1ccccc1Sc3ccc(Cl)cc23)CC4
Propafenone         CYP2D6        c1ccccc1CCC(=O)c2ccccc2OCC(O)CNCCC
Risperidone         CYP2D6        O=C1N2CCCCC2=NC(C)=C1CCN3CCC(C4=NOc5cc(F)ccc45)CC3
Thioridazine        CYP2D6        CSc4ccc3Sc1ccccc1N(CCC2CCCCN2C)c3c4
Timolol             CYP2D6        CC(C)(C)NCC(O)COc1nsnc1N1CCOCC1
Tramadol            CYP2D6        COc1cccc(c1)C2(O)CCCCC2CN(C)C
Trazodone           CYP2D6        Clc1cccc(c1)N1CCN(CCCN2N=C3C=CC=CN3C2=O)CC1
Trimipramine        CYP2D6        CN(C)CC(C)CN1c2ccccc2CCc3ccccc13
Venlafaxine         CYP2D6        CN(CC(c2ccc(cc2)OC)C1(CCCCC1)O)C
Diclofenac          CYP2C9        OC(=O)Cc1ccccc1Nc2c(Cl)cccc2Cl
Dronabinol          CYP2C9        c12C3C(C(Oc1cc(cc2O)CCCCC)(C)C)CCC(=C3)C
Flurbiprofen        CYP2C9        c1ccccc1c2c(F)cc(C(C)C(=O)O)cc2
Glimepiride         CYP2C9        CCC(=C(C)C1)C(=O)N1C(=O)NCCc2ccc(cc2)S(=O)(=O)NC(=O)NC3CCC(CC3)C
Ibuprofen           CYP2C9        CC(C(C)=O)c1ccc(cc1)C(C)(C)C
Indomethacin        CYP2C9        COc1ccc2n(c(c(c2c1)CC(O)=O)C)C(=O)c1ccc(cc1)Cl
Naproxen            CYP2C9        COc1ccc2cc(ccc2c1)C(C)C(O)=O
Phenytoin           CYP2C9        O=C1NC(=O)C(N1)(c2ccccc2)c3ccccc3
Piroxicam           CYP2C9        OC1=C(N(C)S(=O)(=O)c2ccccc21)C(=O)Nc1ccccn1
Tolbutamide         CYP2C9        CCCCNC(=O)NS(=O)(=O)c1ccc(cc1)C
Torsemide           CYP2C9        CC(C)NC(=O)NS(=O)(=O)c1cnccc1Nc2cccc(C)c2

Test set
Alpidem             CYP3A         c1(n3c(nc1c2ccc(cc2)Cl)ccc(c3)Cl)CC(N(CCC)CCC)=O
Astemizole          CYP3A         n2(c1c(cccc1)nc2NC3CCN(CC3)CCc4ccc(cc4)OC)Cc5ccc(cc5)F
Azatadine           CYP3A         C2(c1ncccc1CCc3c2cccc3)=C4CCN(CC4)C
Budesonide          CYP3A         C45(C1(C(C2C(C(C1)O)C3(C(CC2)=CC(C=C3)=O)C)CC4OC(O5)CCC)C)C(CO)=O
Cisapride           CYP3A         c3(C(NC1C(CN(CC1)CCCOc2ccc(cc2)F)OC)=O)c(cc(c(c3)Cl)N)OC
Colchicine          CYP3A         COc3cc2CCC(NC(C)=O)c1cc(=O)c(OC)ccc1c2c(OC)c3OC


Cortisol            CYP3A         C13(C(C(=O)CO)(CCC1C2CCC=4C(C2C(C3)O)(CCC(C=4)=O)C)O)C
Cyclobenzaprine     CYP3A         C1(c3c(C=Cc2c1cccc2)cccc3)=CCCN(C)C
Ebastine            CYP3A         C(c1ccccc1)(c2ccccc2)OC3CCN(CC3)CCCC(c4ccc(cc4)C(C)(C)C)=O
Enalapril           CYP3A         N2(C(C(NC(C(=O)OCC)CCc1ccccc1)C)=O)C(C(=O)O)CCC2
Ethinylestradiol    CYP3A         CC34CCC1C(CCc2cc(O)ccc12)C3CCC4(O)C#C
Gestodene           CYP3A         C42(C(C1CCC=3C(C1CC2)CCC(C=3)=O)C=CC4(C#C)O)CC
Glibenclamide       CYP3A         S(NC(NC1CCCCC1)=O)(c2ccc(cc2)CCNC(c3c(ccc(c3)Cl)OC)=O)(=O)=O
Irinotecan          CYP3A         C1(OCC2=C(C1(CC)O)C=C3c4c(CN3C2=O)c(c5c(n4)ccc(c5)OC(=O)N6CCC(CC6)N7CCCCC7)CC)=O
Levonorgestrel      CYP3A         C42(C(C1CCC=3C(C1CC2)CCC(C=3)=O)CCC4(C#C)O)CC
Lisuride            CYP3A         N1C=C2C(C1=CC=C3)C3C4=CC(CN(C)C4C2)NC(=O)N(CC)CC
Mifepristone        CYP3A         C23=C1C(=CC(CC1)=O)CCC2C5C(CC3c4ccc(cc4)N(C)C)(C(C#CC)(O)CC5)C
Primidone           CYP3A         C1(C(NCNC1=O)=O)(c2ccccc2)CC
Quercetin           CYP3A         C1(=C(Oc2c(C1=O)c(cc(c2)O)O)c3cc(O)c(cc3)O)O
Quetiapine          CYP3A         C=1(c3c(Sc2c(N=1)cccc2)cccc3)N4CCN(CC4)CCOCCO
Sertindole          CYP3A         c2(c1c(ccc(c1)Cl)n(c2)c3ccc(cc3)F)C4CCN(CC4)CCN5C(NCC5)=O
Sulfidimidine       CYP3A         S(Nc1ncccc1)(c2ccc(cc2)N)(=O)=O
Toremifene          CYP3A         C(=C(c1ccccc1)CCCl)(c2ccc(cc2)OCCN(C)C)c3ccccc3
Amiflamine          CYP2D6        N(C)(C)c1ccc(c(c1)C)CC(C)N
Bufuralol           CYP2D6        o1c(cc2c1c(ccc2)CC)C(CNC(C)(C)C)O
Chlorpheniramine    CYP2D6        CN(C)CCC(c1ccc(Cl)cc1)c2ccccn2
Cinnarizine         CYP2D6        N3(C(c1ccccc1)c2ccccc2)CCN(CC3)CC=Cc4ccccc4
Debrisoquine        CYP2D6        N2(C(=N)N)Cc1ccccc1CC2
Deprenyl            CYP2D6        C(#C)CN(C(Cc1ccccc1)C)C
Flunarizine         CYP2D6        N3(C(c1ccc(cc1)F)c2ccc(cc2)F)CCN(CC3)CC=Cc4ccccc4
Fluphenazine        CYP2D6        OCCN4CCN(CCCN2c1ccccc1Sc3ccc(cc23)C(F)(F)F)CC4
Indoramin           CYP2D6        c1cccc2c1ncc2CCN3CCC(CC3)NC(=O)c4ccccc4
Lobeline            CYP2D6        N2(C(CC(=O)c1ccccc1)CCCC2CC(c3ccccc3)O)C
Methoxyphenamine    CYP2D6        COc1c(cccc1)CC(C)NC
Mianserin           CYP2D6        N24C(c1c(cccc1)Cc3c2cccc3)CN(CC4)C
Minaprine           CYP2D6        n2nc(c1ccccc1)cc(c2NCCN3CCOCC3)C
Morphine            CYP2D6        CN1CCC24C3Oc5c(O)ccc(CC1C2C=CC3O)c45
Perhexiline         CYP2D6        C1CCCC(N1)CC(C3CCCCC3)C2CCCCC2
Phenformin          CYP2D6        c1ccccc1CCNC(=N)NC(=N)N
Remoxipride         CYP2D6        c1(c(c(ccc1OC)Br)OC)C(NCC2N(CCC2)CC)=O
Sparteine           CYP2D6        C21C3CN4CCCC4C(CN1CCCC2)C3
Tropisetron         CYP2D6        C1C(N2(C))CCC2CC1OC(=O)c3c4c(nc3)cccc4
Zuclopenthixol      CYP2D6        S1c2c(cccc2)C(c3c1cccc3)=CCCN4CCN(CC4)CCO


Aceclofenac         CYP2C9        c1(c(CC(OCC(O)=O)=O)cccc1)Nc2c(cccc2Cl)Cl
Fluvastatin         CYP2C9        c2(c1c(cccc1)n(c2C=CC(CC(CC(O)=O)O)O)C(C)C)c3ccc(cc3)F
Lornoxicam          CYP2C9        c13c(S(N(C(=C1O)C(Nc2ccccn2)=O)C)(=O)=O)cc(s3)Cl
Mefenamic acid      CYP2C9        c2(c(Nc1c(c(C)ccc1)C)cccc2)C(=O)O
Phenylbutazone      CYP2C9        CCCCC1C(=O)N(N(C1=O)c2ccccc2)c3ccccc3
Sulfamethizole      CYP2C9        Cc2nnc(NS(=O)(=O)c1ccc(N)cc1)s2
Suprofen            CYP2C9        c2(C(c1ccc(C(C(=O)O)C)cc1)=O)sccc2
Tenoxicam           CYP2C9        S1(c3c(C(=C(N1C)C(Nc2ncccc2)=O)O)scc3)(=O)=O
Trimethoprim        CYP2C9        COc2cc(Cc1cnc(N)nc1N)cc(OC)c2OC
Zafirlukast         CYP2C9        c1(ccccc1S(NC(c2cc(c(c2)Cc4c3cc(ccc3n(c4)C)NC(OC5CCCC5)=O)OC)=O)(=O)=O)C


In addition, WHIM descriptors and BCUT descriptors were calculated using the Dragon software (ver 2.1), available on the internet. Electrotopological state indices were derived from the QSAR-is (ver 1.1) modelling package (SciVision Incorporation).

Statistical Analysis

Statistical analysis was performed using the TSAR for Windows software (ver 3.3). The data were modelled using the Formal Inference-based Recursive Modelling (FIRM) method, together with visual inspection of the data.

Visual Inspection of the Data to Identify Qualitative Distributions

This consisted of observing the distribution of each descriptor in the data with respect to P450 metabolism, using simple 2D plots of the relevant descriptor.

Recursive Partitioning (FIRM Method) Analysis

FIRM is a form of recursive partitioning or decision tree analysis where a large set of data is

split into subgroups based on important predictor variables. The response data are split by

TABLE II List of physicochemical and structural descriptors used in the study

Total energyHeat of formationIonisation potentialEnergy of the lowest unoccupied molecular orbitalTotal dipole momentlog PLog of the distribution coefficient (log D) at pH 5.0 and 7.4Difference log D7.4–log D5

Total lipole4th order cluster molecular connectivity index4th order valence-corrected path/cluster molecular connectivity indexRotatable bondsMolecular weightCounts of H-bond acceptors (calculated from both TSAR and HYBOT)Highest H-bonding factor values for oxygen, nitrogen and hydrogen atomsCounts of H-bond donors with H-bonding factors higher than 1.5, 2 and 2.5

BCUT metrics2nd, 3rd and 5th highest eigenvalues of the Burden matrix weighted by atomic masses2nd and 3rd lowest eigenvalues of the Burden matrix weighted by atomic masses2nd highest and 1st lowest eigenvalues of the Burden matrix weighted by van der Waals volume1st highest and 4th lowest eigenvalues of the Burden matrix weighted by polarisabilities

WHIM descriptors2nd component symmetry directional WHIM index2nd component shape directional WHIM index weighted by atomic masses1st through 3rd component accessibility directional WHIM indices weighed by atomic masses1st and 2nd component shape directional WHIM indices weighted by van der Waals volumes2nd and 3rd component symmetry directional WHIM indices weighted by van der Waals volumes1st through 3rd component accessibility directional WHIM indices weighted by van der Waals volumes2nd component size directional WHIM index weighted by Sanderson electronegativities2nd component symmetry directional WHIM index weighted by Sanderson electronegativities1st through 3rd component accessibility directional WHIM indices weighted by Sanderson electronegativities1st component symmetry directional WHIM index weighted by atomic polarisabilities1st through 3rd component accessibility directional WHIM indices weighted by atomic electrotopological statesSum of electrotopological state indices for the following groups:RCH3, R2CH2, CR4, vCHR2, vCR2, Aromatic CH, CR, RNH2, R2NH, vNR, R3N, Aromatic N, ROH, vO,R2O, RF, RCl



each predictor variable into up to ten groups. A p-value is computed for each possible split, based on the predictor variable. The process is then repeated for each resulting subgroup of response data, and the analysis stops when no subgroup can be split any further.
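The splitting step summarised above can be illustrated with a simplified sketch for a two-class response, such as the CYP3A/CYP2D6 separation used later. It is not the FIRM program: it cuts one descriptor into at most ten groups of similar size, scores every binary cut between groups with a 2×2 chi-square test and keeps the cut with the smallest p-value. The full method is generally described as also merging similar groups and adjusting p-values for multiple testing, which this sketch omits. All function names are hypothetical.

```python
import math

def chi2_p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def candidate_cuts(values, max_groups=10):
    """Cut points dividing the sorted values into at most max_groups groups of similar size."""
    ordered = sorted(values)
    n = len(ordered)
    k = min(max_groups, n)
    return sorted({ordered[(i * n) // k] for i in range(1, k)})

def best_binary_split(values, labels, max_groups=10):
    """Return (cut, p_value) for the most significant split 'value < cut' vs 'value >= cut'.

    labels are 0/1 class codes; each candidate split is scored with a 2x2 chi-square test.
    """
    best_cut, best_p = None, 1.0
    for cut in candidate_cuts(values, max_groups):
        left = [y for x, y in zip(values, labels) if x < cut]
        right = [y for x, y in zip(values, labels) if x >= cut]
        a, b = sum(left), len(left) - sum(left)      # class 1 / class 0 counts, left branch
        c, d = sum(right), len(right) - sum(right)   # class 1 / class 0 counts, right branch
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            continue  # empty branch or only one class present: no test possible
        chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
        p = chi2_p_value_1df(chi2)
        if p < best_p:
            best_cut, best_p = cut, p
    return best_cut, best_p
```

Applying `best_binary_split` again to each branch would give the recursive behaviour described in the text.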

RESULTS

The metabolic routes of 96 drugs were analysed in this study. Of these, CYP3A metabolism is known to prevail over that of other P450 isozymes for 60 drugs, CYP2D6 metabolism for 25 and CYP2C9 metabolism for 11. The data were first assessed by visual inspection to identify any obvious trends.

Visual Inspection of the Data (Removal of Compounds with a Particular Metabolic Profile that can be Identified by Simple SARs)

An initial attempt was made to split the data using knowledge derived from the literature. The aim was to reduce the data set to a smaller pool of compounds whose further modelling would require a multivariate tool. Simple 2D plots of the 96 data points against single molecular descriptors for molecular weight, hydrophobicity and Brønsted acidity were investigated. These properties are generally known to be indicative of substrates of CYP3A and CYP2C9 [13]. Analysis of these plots revealed that, in the present study, these descriptors are also useful indicators of the prevalence of one P450 isozyme over another. The splits obtained are shown in Figs. 1 and 2.

Log D7.4 was found to be a more useful hydrophobicity descriptor than log P or log D at pH 5.0, in that it successfully split a greater number of compounds. All compounds with molecular weight greater than 500 were primarily oxidised by CYP3A, as were all compounds with log D7.4 > 4.1 (except for one, dronabinol). Overall, 26 “CYP3A” compounds could be separated using these two descriptors (Fig. 2).

The arithmetic difference between log D7.4 and log D5 was chosen as the index of the Brønsted acidity, basicity or neutrality of the molecules. This measure was selected because the information derived from it is qualitative as well as quantitative. The algebraic sign of the index reflects whether a molecule is acidic (negative) or basic (positive), a value of 0 indicating a neutral molecule, while its magnitude reflects the strength of the molecule's acidity or basicity. This index can therefore be used as a continuous descriptor. Acidity was a good indicator for compounds predominantly undergoing metabolic oxidation by CYP2C9: it is clearly visible from Fig. 1 that every acidic compound in the data set falls within the “CYP2C9” category.

FIGURE 1 Distribution of Brønsted acidity, as measured by (log D7.4 – log D5), for the whole data set of 96 compounds. Acidic compounds are indicated by negative values on the acidity scale.
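The section does not say how log D was calculated. Assuming a single ionisable group and the textbook Henderson–Hasselbalch relation (only the un-ionised species partitions into octanol), the sketch below shows why the sign of log D7.4 – log D5 separates acids from bases. The function names and the example log P and pKa values are hypothetical.

```python
import math

def log_d(log_p, pka, ph, kind):
    """Approximate log D of a compound with one ionisable group (Henderson-Hasselbalch).

    kind is "acid", "base" or "neutral"; only the un-ionised species is assumed to partition.
    """
    if kind == "neutral":
        return log_p
    if kind == "acid":
        return log_p - math.log10(1.0 + 10.0 ** (ph - pka))
    if kind == "base":
        return log_p - math.log10(1.0 + 10.0 ** (pka - ph))
    raise ValueError(f"unknown kind: {kind!r}")

def acidity_index(log_p, pka, kind):
    """log D7.4 - log D5.0: negative for acids, positive for bases, zero for neutral compounds."""
    return log_d(log_p, pka, 7.4, kind) - log_d(log_p, pka, 5.0, kind)

# Hypothetical compounds with the same log P but different ionisation behaviour.
acid = acidity_index(3.0, 4.5, "acid")
base = acidity_index(3.0, 9.5, "base")
neutral = acidity_index(3.0, None, "neutral")
```

For these inputs the acid gives an index of about –2.28 and the base about +2.40, while the neutral compound gives exactly 0.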

As expected, the combined use of acidity, log D7.4 and molecular weight makes it possible to filter out compounds primarily oxidised by CYP3A and CYP2C9. The generally accepted rationale for this is as follows [14,15]. The active site of the CYP2C9 enzyme contains a positively charged group (possibly arginine) that is pivotal in substrate binding; interaction with negatively charged compounds is therefore favoured. CYP3A, for its part, is known to accommodate bulkier compounds than most other P450 enzymes, which explains its preference for compounds of higher molecular weight. A marked lipophilic character is also a known feature of prototypic CYP3A substrates [16]. Consequently, compounds with high log D (log D7.4 > 4.1) are, to a considerable extent, primarily oxidised by CYP3A. This is probably due in part to the fact that the likelihood of ionisable functional groups being present decreases as log D increases; such groups are critical moieties in CYP2C9 and CYP2D6 substrates [6].

Thus, a cursory examination of the data using the three descriptors described above identified 37 compounds. Of these, 26 were correctly classified as having CYP3A as the primary oxidising enzyme and ten as having CYP2C9. Dronabinol was incorrectly classified as being metabolised by CYP3A because of its high log D7.4 value: although it is primarily oxidised by CYP2C9, it is a neutral molecule and so is not captured by the acidity rule. These 37 compounds, including dronabinol, were excluded from further analysis to avoid introducing noise into the data. It should be noted that no “CYP2C9” drugs remain in the resulting pool of 59 compounds. Hence, what follows describes the development of a model to separate CYP3A from CYP2D6 prevalence.

FIGURE 2 Distribution of molecular weight and log D7.4 as a function of the compounds’ predominant oxidising P450 isozyme. Compounds with MW > 500 are all primarily oxidised by CYP3A. Compounds with log D7.4 > 4.1 are all primarily oxidised by CYP3A apart from one (dronabinol).
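Read literally, the first-stage filter can be written as a small rule function. The thresholds (MW > 500, log D7.4 > 4.1, negative acidity index) come from the text; the order in which the checks are evaluated is an assumption, because the rules are joined by OR without a stated priority. The function name is hypothetical.

```python
from typing import Optional

def first_stage(mw: float, log_d74: float, acidity: float) -> Optional[str]:
    """Hypothetical first-stage filter: returns "CYP3A", "CYP2C9", or None.

    None means the compound is passed on to the second (FIRM-derived) stage.
    Checking bulk/lipophilicity before acidity is an assumed ordering.
    """
    if mw > 500 or log_d74 > 4.1:
        return "CYP3A"
    if acidity < 0:  # a negative log D7.4 - log D5 marks an acid
        return "CYP2C9"
    return None
```

Under this ordering an acidic compound with log D7.4 > 4.1 is assigned to CYP3A; swapping the two checks would assign it to CYP2C9, which shows how an unprioritised OR can yield conflicting answers.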



Development of a Model for the Separation between CYP3A and CYP2D6 Prevalence

In the second stage of the analysis, a hierarchical model was built for the reduced data set, using the FIRM method for feature selection. To this end, the data were randomly divided into five subsets of approximately equal size. Models were built on each combination of four subsets, the fifth subset being left out to test the model, until every subset had been left out once. This is, in effect, a five-fold (leave-20%-out) cross-validation. The procedure was employed to ensure that the feature selection was robust [17]. The names of the descriptors entering the models and the frequency with which they appeared are given in Table III. L3v (dimensions along the 3rd principal axis, van der Waals weighted; a directional WHIM descriptor) and HiN (highest H-bond factor strength on a nitrogen atom) enter the models repeatedly, both in combination and separately, suggesting that they are useful for separating “CYP3A” from “CYP2D6” compounds. These two descriptors appear in the models clearly more often than any other descriptor listed in Table III. Notwithstanding their prevalence, the following criteria were set for a descriptor to be selected for the novel model:

(1) The FIRM models in which the descriptor enters should yield a minimum of 85% correct classification in training.

(2) The descriptor in question should be found to be robust.

The limit of 85% was set arbitrarily and was not considered too stringent, since classification was between only two categories.
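The resampling scheme and the first selection criterion can be sketched as follows. The partition mimics the five random subsets described above; the model records passed to the tally are hypothetical placeholders for FIRM output, since FIRM itself is not reproduced here. With the 59 compounds of the reduced set, the folds contain 12 or 11 compounds, so each training set holds 47 or 48 compounds.

```python
import random
from collections import Counter

def five_fold_splits(items, n_folds=5, seed=0):
    """Yield (training, test) pairs; each of n_folds random subsets is the test set once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]  # sizes differ by at most one
    for k in range(n_folds):
        training = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield training, folds[k]

def descriptor_frequency(models, min_training_accuracy=0.85):
    """Count descriptors over the models that meet the training-accuracy criterion.

    models is an iterable of (descriptor_names, training_accuracy) pairs.
    """
    counts = Counter()
    for names, accuracy in models:
        if accuracy >= min_training_accuracy:
            counts.update(names)
    return counts
```

The second criterion (robustness) is a judgement across folds and is not encoded here.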

Five FIRM-derived models were found to satisfy the first selection criterion. These five models were of two types, as illustrated in Fig. 3. Corroborating the frequency of appearance in the models (Table III), four of them used the descriptors HiN and L3v (Fig. 3); they were variants of one another, with cut-off values differing only slightly. The fifth model used HiN and log P (Fig. 3b). Judging from the frequency results shown in Table III, the use of log P is somewhat unexpected. However, it was felt that this descriptor could be used, as its robustness is supported by the known mechanistic link between CYP3A and lipophilicity [16,18]. This is illustrated in Fig. 4, where log P is plotted against L3v for the entire data set of 96 compounds: “CYP2D6” and “CYP2C9” compounds are clearly segregated in the lower log P region of the graph (log P < 4.48). Log P and log D may be correlated, and in some data sets the degree of correlation may be high; in such cases log D7.4 alone will suffice to classify the compounds that would otherwise have required log P. In the present study, the correlation coefficient between log P and log D7.4 was only 0.53 (Table IV), so concerns about redundancy between the two descriptors do not arise.

Figure 4 can also be taken as further evidence of the efficacy of the descriptor L3v in separating compounds oxidised primarily by CYP3A: here too, “CYP2D6” and “CYP2C9” compounds are concentrated in the lower L3v region. This descriptor represents the three-dimensional, van der Waals volume-weighted projection of a molecule along invariant molecular axes. It encodes the spatial arrangement of the molecule and hence its ability to fit into the active site of the P450s considered here; it is, de facto, a surface obtained from a “slice” of the molecule. Accordingly, it is logical that high L3v values are indicative of compounds predominantly oxidised by CYP3A. The correlations between L3v, molecular weight and all other relevant descriptors used in this study are shown in Table IV. The correlation coefficient of 0.69 between L3v and molecular weight (r² ≈ 0.48) indicates that the two descriptors are only moderately collinear and carry partly independent information.
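Table IV reports correlation coefficients between descriptors; the text does not name the coefficient, so Pearson's r is assumed in this sketch. Squaring r gives the fraction of variance two descriptors share, a convenient way to judge how much information they duplicate. The function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length descriptor columns."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two columns of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def shared_variance(r):
    """Fraction of variance one descriptor linearly explains in the other (r squared)."""
    return r * r
```

For L3v and molecular weight, `shared_variance(0.69)` gives about 0.48, so roughly half of the variance in one is not accounted for by the other.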

HiN is the third descriptor entering the models. It measures the highest H-bonding strength of a nitrogen atom in a molecule, derived from the enthalpy of H-bond complex formation; it is calculated by predictive comparison with a reference H-bond acceptor [12] on a scale from 0 to 2.5. Nitrogen atoms able to form stronger H-bonds are more basic because of the higher electron density around them. Hence HiN can be interpreted as the basicity of the most basic nitrogen atom in a molecule. Such a measure of basicity is of particular interest, since CYP2D6 substrates have to possess a basic nitrogen atom [5,18], and HiN could help determine the minimum basicity required for interaction to remain possible. Within the scope of this study, a minimum basicity level could be determined above which compounds are primarily oxidised by CYP2D6, provided that their other properties (lipophilicity and bulk) do not favour CYP3A oxidation (Fig. 3). Compounds with a HiN value below this threshold are preferentially acted upon by CYP3A. All three descriptors (HiN, L3v and log P) were selected for use in the novel model, as they satisfied the two criteria laid down above.

TABLE III List of the descriptors entering the models built using the FIRM method on the reduced data set, for the entirety of the leave-20%-out cross-validation experiment

Descriptor | Number of “sets”* entered (max. possible = 5) | Total number of appearances in models (max. possible = 40) | Descriptor definition
BEHm2 | 1 | 1 | Highest eigenvalue number 2 of Burden matrix, weighted by atomic masses (BCUT)
BEHm3 | 1 | 1 | Highest eigenvalue number 3 of Burden matrix, weighted by atomic masses (BCUT)
BELp4 | 2 | 9 | Lowest eigenvalue number 4 of Burden matrix, weighted by polarisabilities (BCUT)
E2m | 1 | 2 | Emptiness along 2nd principal axis, mass weighted (WHIM descriptor)
E2v | 1 | 1 | Emptiness along 2nd principal axis, van der Waals weighted (WHIM descriptor)
G1p | 1 | 1 | Symmetry along 1st principal axis, polarisability weighted (WHIM descriptor)
G2u | 2 | 3 | Symmetry along 2nd principal axis, unweighted (WHIM descriptor)
G2v | 1 | 1 | Symmetry along 2nd principal axis, van der Waals weighted (WHIM descriptor)
HiN | 5 | 29 | N atom with the highest H-bond factor strength
HiO | 2 | 2 | O atom with the highest H-bond factor strength
L3v | 4 | 21 | Dimensions along 3rd principal axis, van der Waals weighted (WHIM descriptor)
Log D5 | 1 | 1 | Log octanol/water distribution coefficient at pH 5
Log D7.4 | 1 | 1 | Log octanol/water distribution coefficient at pH 7.4
Log P | 2 | 2 | Log octanol/water partition coefficient
SdO | 1 | 1 | Sum of electrotopological indices for the C=O moiety

*FIRM analysis was conducted on five series of data sets. “Set” refers here to a subset comprising a particular combination of approximately 80% of the compounds of the reduced data set (20% being randomly left out for testing).
Note: The percentages are approximate because the number of compounds to be split by 20% was not a multiple of 5.

The FIRM-derived models were tested on their allocated test sets. The test results for the model using log P (Fig. 3b) and the best result from the four models using HiN and L3v are shown in Table V. The correct classification rates for the test sets, which comprised only 15 and 9 compounds, were 73% and 77%, respectively. These modest results, obtained with either HiN and log P or HiN and L3v, suggest that all three descriptors need to be used together to improve the predictive ability of the novel model.
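The overall rates in Table V are pooled over both classes, which the following arithmetic check makes explicit. The helper name is hypothetical; the counts are those of Table V.

```python
from fractions import Fraction

def overall_accuracy(per_class):
    """per_class maps a class label to (correctly_assigned, total); returns the pooled fraction."""
    correct = sum(c for c, _ in per_class.values())
    total = sum(t for _, t in per_class.values())
    return Fraction(correct, total)

best_hin_l3v = overall_accuracy({"CYP3A": (4, 6), "CYP2D6": (3, 3)})  # Table V, first row
hin_log_p = overall_accuracy({"CYP3A": (8, 10), "CYP2D6": (3, 5)})    # Table V, second row
```

This gives 7/9 ≈ 77.8% and 11/15 ≈ 73.3%, reported as 77% and 73%. With 9 test compounds a single misclassification moves the rate by about 11 percentage points, so the gap between the two models is smaller than one compound's worth.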

Model of the Whole Data Set (Combining the Results from the Previous Steps)

The rules identified above were combined into the single hierarchical model shown in Fig. 5. The cut-off values used in the model for the complete data set were all chosen empirically. For log D7.4, molecular mass and log P, they were derived from inspection of the distribution of the data. For L3v and HiN, the cut-off points were chosen from the values computed by the recursive partitioning method (FIRM) for the five FIRM-derived models that satisfied the feature-selection criteria; these values ranged from 1.93 to 1.94 for HiN and from 1 to 1.13 for L3v. The final cut-off points for these two descriptors were then chosen to optimise the model for the complete data set. It should also be noted that the positions of log P and L3v in the decision tree depicted in Fig. 5 are interchangeable.

FIGURE 3 Schematic representation of the FIRM-derived models (built on the reduced data set) that satisfy the criteria for feature selection.

FIGURE 4 Plot of log P vs. L3v (van der Waals weighted directional WHIM dimension along the 3rd principal axis) for the whole set of 96 compounds.
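Figure 5 itself is not reproduced in this text, but the rules and cut-offs quoted in the Results and Discussion allow a hypothetical reconstruction of the whole data model. The cut-offs (MW 500, log D7.4 4.1, HiN 1.93, L3v 1.071, log P 4.48) are taken from the text; the branch order after HiN, whether a value equal to a cut-off goes left or right, and the use of the Fig. 4 log P boundary inside the tree are assumptions. The function name is hypothetical and this is not the authors' implementation.

```python
# Cut-offs quoted in the text; ordering and tie handling below are assumptions.
MW_CUT = 500.0
LOG_D74_CUT = 4.1
HIN_CUT = 1.93
L3V_CUT = 1.071
LOG_P_CUT = 4.48

def predict_dominant_p450(mw, log_d74, acidity, hin, l3v, log_p):
    """Hypothetical reconstruction of the two-part decision tree; returns an enzyme label."""
    # Part 1: rule-based filter (bulky or lipophilic -> CYP3A, acidic -> CYP2C9).
    if mw > MW_CUT or log_d74 > LOG_D74_CUT:
        return "CYP3A"
    if acidity < 0:
        return "CYP2C9"
    # Part 2: FIRM-derived splits, with HiN at the top of the hierarchy.
    if hin < HIN_CUT:
        return "CYP3A"  # nitrogen not basic enough for CYP2D6
    if l3v > L3V_CUT or log_p > LOG_P_CUT:
        return "CYP3A"  # basic, but bulky or lipophilic enough to favour CYP3A
    return "CYP2D6"
```

Because the last two checks are joined by OR, their order does not change the result, consistent with the statement that log P and L3v are interchangeable in the tree.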

Step-by-step Y-randomisation was performed to ascertain that the classification results were not due to chance [17], as follows. First, the 37 compounds that had been classified using the descriptors log D7.4, molecular mass and acidity were removed. For the remaining compounds, the Y response data were randomly permuted, and the compounds were submitted to a complete FIRM analysis for separation between “CYP3A” and “CYP2D6” drugs. This was repeated ten times. The best result obtained with randomly permuted Y variables was 59% correct prediction, and the average correct classification over the ten runs was 49%, compared with 94% (56/59) using the real data.
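The logic of the Y-randomisation test can be sketched with a deliberately simple stand-in for FIRM: a single-threshold rule fitted to one descriptor, whose score on the real labels is compared with scores after the class labels are shuffled. The helper names, the one-descriptor model and the use of training accuracy are assumptions. Because a two-class threshold rule can always be oriented to score at least 50% in training, permuted scores in this sketch cannot fall below 0.5, unlike the 49% average reported in the text for the full FIRM protocol.

```python
import random

def stump_accuracy(x, y):
    """Best training accuracy of a rule 'x >= t -> one class, else the other' over all cut-offs t.

    y holds 0/1 labels; both orientations of the rule are tried.
    """
    n = len(y)
    best = 0.0
    for t in sorted(set(x)):
        correct_if_above_is_1 = sum(1 for xi, yi in zip(x, y) if (xi >= t) == (yi == 1))
        acc = correct_if_above_is_1 / n
        best = max(best, acc, 1.0 - acc)  # 1 - acc is the reversed orientation
    return best

def y_randomisation(x, y, n_runs=10, seed=0):
    """Return (real_score, best_permuted_score, mean_permuted_score)."""
    rng = random.Random(seed)
    real = stump_accuracy(x, y)
    scores = []
    for _ in range(n_runs):
        shuffled = list(y)
        rng.shuffle(shuffled)
        scores.append(stump_accuracy(x, shuffled))
    return real, max(scores), sum(scores) / n_runs
```

A real-label score well above the best permuted score, as in the 94% versus 59% reported above, is the pattern that argues against a chance result.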

Overall, the model for the complete data set correctly classified 95% (92/96) of the compounds as having CYP3A, CYP2D6 or CYP2C9 as the primary enzyme responsible for their oxidative biotransformation. Four compounds were misclassified; in the order in which they are dealt with by the model, they are dronabinol, oxycodone, fentanyl and nefazodone. Oxycodone has a HiN value of exactly 1.93, which is the cut-off value for this descriptor. However, no obvious explanation could be found for the incorrect predictions for the other compounds, apart from the limitations of the model.

The model was tested using an external test set of 51 drugs. The distribution of the dominant P450 isoenzymes in the test set was as follows: CYP3A was known to be the primary metabolising isoenzyme for 23 drugs, CYP2D6 for 18 and CYP2C9 for ten. In total, 68% (35/51) of the test-set drugs were classified correctly. The node-by-node performance of the model on the test set is shown in Fig. 6.

TABLE IV Correlation matrix for the six descriptors entering the model for the whole data set shown in Fig. 5

 | Log D7.4 | Molecular weight | Acidity* | HiN | L3v
Molecular weight | 0.34 | | | |
Acidity* | 0.22 | 0.25 | | |
HiN | −0.03 | 0.07 | 0.66 | |
L3v | 0.21 | 0.69 | 0.19 | 0.13 |
log P | 0.53 | 0.31 | 0.03 | 0.06 | −0.02

HiN = Highest H-bonding strength on a nitrogen atom. L3v = Dimensions along 3rd principal axis, van der Waals weighted (directional WHIM descriptor). *Acidity as measured by the difference (log D7.4 – log D5).

TABLE V Test results of the novel models, built using the reduced set of compounds (FIRM-derived), on their allocated test sets

Model | Proportion of correctly assigned “CYP3A” drugs | Proportion of correctly assigned “CYP2D6” drugs | Total number of test drugs | Overall % of correct classification
Best of four models of the type shown in Fig. 3 (HiN and L3v) | 4 out of 6 | 3 out of 3 | 9 | 77%
Model as in Fig. 3b (HiN and log P) | 8 out of 10 | 3 out of 5 | 15 | 73%

“CYP3A” and “CYP2D6” indicate that the drugs are primarily oxidised by CYP3A and CYP2D6, respectively.



To delineate a domain of validity for the whole data model, the range of each descriptor in the training set was determined; these ranges are reported in Table VI.
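Table VI can be used as a simple range check before the model is applied to a new compound. The bounds below are the training-set extremes given in Table VI; the dictionary keys and function name are hypothetical. A compound inside every range is not thereby guaranteed a reliable prediction, since the text presents the ranges only as a broad indication of the model's working domain.

```python
# Training-set ranges (lowest, highest) from Table VI.
TABLE_VI_RANGES = {
    "log_d74": (-1.4, 7.64),
    "mw": (141.0, 1202.0),
    "acidity": (-2.24, 2.94),
    "hin": (0.0, 2.1),
    "l3v": (0.21, 4.55),
    "log_p": (-0.14, 7.42),
}

def out_of_domain(descriptors):
    """Return the names of descriptors whose values lie outside the training ranges."""
    flagged = []
    for name, (low, high) in TABLE_VI_RANGES.items():
        value = descriptors[name]
        if not (low <= value <= high):
            flagged.append(name)
    return flagged
```

An empty list means the compound lies within the descriptor box spanned by the training set.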

DISCUSSION

A model describing P450 isoform dominance in drug biotransformation among CYP3A, CYP2D6 and CYP2C9 is presented (Fig. 5). It takes the form of a hierarchical decision tree built up of two main parts. It was necessary to eliminate 37 compounds before the recursive partitioning procedure (the FIRM method) could be applied successfully to the remaining drugs. In the first part, the descriptors molecular weight, log D7.4 and Brønsted acidity are used to split off a first group of compounds primarily acted upon by CYP3A and CYP2C9. In the second part, a recursive partitioning-derived tree is used to split the remaining data into CYP3A- and CYP2D6-preferring drugs.

At the point of entry of the model for the complete data set, the three descriptors are connected by an “or” logical clause. That is to say, compounds are classified as primarily processed by CYP3A when log D7.4 > 4.1 OR MW > 500, OR by CYP2C9 when they are acidic. Consequently, no hierarchy is established between these descriptors, which can lead to confusing or even contradictory results. For example, the only drug misclassified at this level, dronabinol, is classified as preferring CYP3A because of its high log D7.4 value. Because dronabinol is a neutral molecule, this decision does not conflict with the acidity rule, since the model predicts CYP2C9-preferring drugs to be acidic. Were this drug acidic, however, the log D7.4 and acidity rules would point to different enzymes and the model would reach its limits. This lack of prioritisation is therefore felt to constitute a clear weakness of the model.

FIGURE 5 Flow diagram representing the whole data model. Results are for the training set. “CYP3A” and “CYP2D6” refer to the prevalence of the corresponding P450 enzymes in a compound’s oxidative metabolism.

FIGURE 6 Flow diagram representing the whole data model. Results are for the test set. “CYP3A” and “CYP2D6” refer to the prevalence of the corresponding P450 enzymes in a compound’s oxidative metabolism.


The introduction of recursive partitioning in the second part of the model removes the aforementioned limitation. A decision tree is built in which the more significant descriptors are always positioned at earlier splitting points. In the second, FIRM-derived, part of the model, the descriptors L3v, log P and HiN are used in the following hierarchical order: HiN > L3v = log P. The equality between L3v and log P arises because log P was introduced manually into the decision tree after the fusion of different FIRM models. HiN can be viewed here as an alternative to commonly used nitrogen basicity indices (pKa, nitrogen ionisability) in the parameterisation of CYP2D6 substrates. Its main virtues are that it is a continuous descriptor and that its calculation is unambiguous, as it is a whole-molecule descriptor.

Recursive partitioning is a very powerful classification method [19]. However, it must be used properly to ensure the reliability of the resulting decision tree. The main pitfalls of this technique are highlighted in a recent publication by Susnow and Dixon [19]. Essentially, to prevent chance results, the classes should be represented in nearly equal proportions in the training sets, and the training sets should not be too small. In this study, recursive partitioning was performed on pools of approximately 47 compounds (the 59 remaining compounds less 20%) belonging to two classes in a ratio of 34:25 (≈ 1.36). Whether these values are satisfactory is open to debate. However, the FIRM method yielded a consistent series of similar results when performed several times on different subsets of compounds. This was interpreted as a sign of robustness of the decision trees for two reasons. First, recursive partitioning techniques are very sensitive to the composition of the training set [19]; one would therefore have expected more discrepancy in the resulting splits, as there was a 20% variation in composition between the different training sets. Secondly, as outlined in the Results section, the descriptors repeatedly entering the decision trees (as well as log P) are generally known to influence the property of interest, which supports their mechanistic significance. Thus it is felt that a suitable protocol was followed in generating the FIRM decision trees and that these results could be employed to construct the whole data model shown in Fig. 5.

The model uses indices of bulk (molecular weight, L3v), H-bonding and Brønsted acidity/basicity (HiN, acidity) and hydrophobicity (log D7.4, log P) to distinguish between CYP3A, CYP2D6 and CYP2C9 metabolic predominance. CYP3A-preferring drugs are tagged as having high molecular weight or high log D. HiN, log P and L3v enable discrimination between “CYP3A” and “CYP2D6” drugs, while acidic drugs are all classed as being oxidised primarily by CYP2C9.

It is striking that the model does not seem to address the well-known CYP2D6 substrate requirement that the basic nitrogen be at a distance of 5, 7 or 10 Å from the site of oxidation. Neither HiN, log P nor L3v is likely to account for this, as none of them encodes the distance between the nitrogen atom and the site of oxidation. This points to another weakness of the model: it does not account for every known determinant of enzyme–substrate interaction. It would be surprising, for instance, if the 5 Å rule did not play a role in determining CYP2D6 predominance. However, it is difficult to tell whether the satisfactory results obtained here are due to the data set being devoid of basic compounds that fail to satisfy these distance requirements, or whether these requirements rank lower in importance than the factors included in the current model for determining P450 prevalence.

TABLE VI Determination of a domain of validity for the whole data model (data ranges)

 | log D7.4 | Molecular weight | Acidity* | HiN | L3v | log P
Lowest value | −1.4 | 141 | −2.24 | 0 | 0.21 | −0.14
Highest value | 7.64 | 1202 | 2.94 | 2.1 | 4.55 | 7.42

HiN = Highest H-bonding strength on a nitrogen atom. L3v = Dimensions along 3rd principal axis, van der Waals weighted (directional WHIM descriptor). *Acidity as measured by (log D7.4 – log D5).

Similarly, it is likely that the four compounds that appear as outliers to the model do so because the factors governing their P450 enzyme preference differ from those the model accounts for. Hence the data ranges (Table VI) are given only as a broad indication of the working domain of the model. Inspection of the distribution of values for the six descriptors entering the model shows that compounds with extreme values fit the model very well, whereas the outliers exhibit intermediate values.

The model was tested with an external set of drug compounds to evaluate its predictive power (Fig. 6). Sixty-eight percent of the 51 test drugs were correctly classified. This is a reasonably good result considering that classification was among three categories. As can be seen from Fig. 6, owing to a certain lack of diversity in the test data, some decision nodes of the model could not be tested adequately: for example, none of the test drugs had an L3v value above the 1.071 threshold. It is also evident that classification of test drugs predominantly metabolised by CYP3A is generally poor. This suggests that specific characteristics of “CYP3A-preferring” drugs are more difficult to identify, in line with the fact that CYP3A is a very versatile enzyme able to process a very wide range of substrates. On the whole, however, the trends identified in training are respected in the test set.
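One way to put the 68% test-set rate in context, not used in the original study, is to compare it with naive baselines computed from the test-set class counts (23 CYP3A, 18 CYP2D6 and 10 CYP2C9 drugs). The helper names are hypothetical.

```python
from fractions import Fraction

def majority_baseline(counts):
    """Accuracy of always predicting the most frequent class."""
    return Fraction(max(counts.values()), sum(counts.values()))

def proportional_baseline(counts):
    """Expected accuracy of random guessing with probabilities equal to the class frequencies."""
    total = sum(counts.values())
    return sum(Fraction(c, total) ** 2 for c in counts.values())

test_counts = {"CYP3A": 23, "CYP2D6": 18, "CYP2C9": 10}
```

Always predicting CYP3A would be right 23/51 ≈ 45% of the time, and frequency-matched guessing about 37% (953/2601) on average, both well below the 35/51 ≈ 68% obtained by the model.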

In conclusion, this study shows the feasibility of a simple and transparent model for determining P450 predominance among CYP3A, CYP2D6 and CYP2C9. This is achieved using three classes of descriptors: those for lipophilicity, bulk and Brønsted acidity. With correct classification rates of 95% (92/96) for the training set and 68% (35/51) for the test set, the model can be considered satisfactory. The use of novel and mechanistically relevant descriptors such as HiN and L3v is also exemplified. These descriptors appear to convey useful information on the basicity and three-dimensional requirements for CYP2D6 and CYP3A predominance, respectively.

References

[1] Ekins, S. and Wrighton, S.A. (2001) “Application of in silico approaches to predict drug–drug interactions”, J. Pharmacol. Toxicol. Methods 45, 65–69.

[2] Hardman, J.G., Goodman, A.G. and Limbird, L.E. (2001) Goodman and Gilman’s the Pharmacological Basis of Therapeutics (McGraw-Hill, New York), pp 1924–2023.

[3] Guengerich, F.P. (1991) “Oxidation of toxic and carcinogenic chemicals by human cytochrome P-450 enzymes”, Chem. Res. Toxicol. 4, 391–407.

[4] Bertz, R.J. and Granneman, G.R. (1997) “Use of in vitro and in vivo data to estimate the likelihood of metabolic pharmacokinetic interactions”, Clin. Pharmacokinet. 32, 210–258.

[5] Anzenbacher, P. and Anzenbacherova, E. (2001) “Cytochromes P450 and metabolism of xenobiotics”, Cell. Mol. Life Sci. 58, 737–747.

[6] Danielson, P.B. (2002) “The cytochrome P450 superfamily: biochemistry, evolution and drug metabolism in humans”, Curr. Drug Metab. 3, 561–597.

[7] Lewis, D.F.V. (2000) “On the recognition of mammalian microsomal cytochrome P450 substrates and their characteristics”, Biochem. Pharmacol. 60, 293–306.

[8] Lewis, D.F.V. and Dickins, M. (2002) “Substrate SARs in human P450s”, Drug Disc. Today 7, 918–925.

[9] Fuhr, U. (1996) “Systematic screening for pharmacokinetic interactions during drug development”, Int. J. Clin. Pharmacol. Ther. 34, 139–151.


[10] Michalets, E.L., et al. (1998) “Update: clinically significant cytochrome P-450 drug interactions”, Pharmacotherapy 18, 84–112.

[11] Tredger, J.M. and Stoll, S. (2002) “Cytochromes P450 – their impact on drug treatment”, Hosp. Pharmacist 9, 167–173.

[12] Raevsky, O.A. (1997) “Hydrogen-bond strength estimation by means of the HYBOT program package”, In: van de Waterbeemd, H., Testa, B. and Folkers, G., eds, Computer-Assisted Lead Finding and Optimisation (Wiley-VCH, Basel), pp 367–378.

[13] Hardman, J.G., Goodman, A.G. and Limbird, L.E. (1996) Goodman and Gilman’s the Pharmacological Basis of Therapeutics (McGraw-Hill, New York), pp 1710–1792.

[14] Jones, B.C., Hawksworth, G., Horne, V.A., Newlands, A., Morsman, J., Tute, M.S. and Smith, D.A. (1996) “Putative active site template model for cytochrome P450 CYP2C9”, Drug Metab. Dispos. 24, 260–266.

[15] Mancy, A., Broto, P., Dijols, S., Dansette, P.M. and Mansuy, D. (1995) “The substrate binding site of human liver CYP450 CYP2C9: an approach using designed tienilic acid derivatives and molecular modelling”, Biochemistry 34, 10365–10375.

[16] De Groot, M.J. and Ekins, S. (2002) “Pharmacophore modelling of cytochrome P450”, Adv. Drug Delivery Rev. 54, 367–383.

[17] Shen, M., Xiao, Y., Golbraikh, A., Gombar, V.K. and Tropsha, A. (2003) “Development and validation of k-nearest neighbor QSPR models of metabolic stability of drug candidates”, J. Med. Chem. 46, 3013–3020.

[18] Smith, D.A., Ackland, B.J. and Jones, B.C. (1997) “Properties of cytochrome P450 isoenzymes and their substrates. Part 2: properties of cytochrome P450 substrates”, Drug Disc. Today 2, 479–486.

[19] Susnow, R. and Dixon, S.L. (2003) “Use of robust classification techniques for the prediction of human cytochrome P450 2D6 inhibition”, J. Chem. Inf. Comput. Sci. 43, 1308–1315.

