Review of Literature

The cherished goal of medicinal chemists has been to create molecules with a sufficient number of favorable properties to justify the huge investments of time and money needed to bring a drug molecule to market. (Quantitative) structure-activity/property relationship [(Q)SAR/QSPR] studies play an important role in the development of a cost-effective and efficient drug discovery process. The QSAR/QSPR

studies predict complex physical, chemical, biological, and technological properties of

molecules directly from their structure. The structural formula of an organic compound

encodes within it all the information that predetermines the chemical, biological and

physical properties of that compound (Grover et al., 2000a). However, the molecular structure is not easily accessible to numerical analysis and learning methods because of its non-quantitative nature (Trinajstic, 1983). This inherent problem in (Q)SAR/QSPR studies can be overcome by deriving parameters (descriptors) from molecules that describe their physicochemical/biological properties, electronic features, and so on. Among these, the non-empirical graph-theoretic parameters (graph invariants) known as topological descriptors (TDs) are some of the most useful descriptors currently available.

In chemical graph theory, the molecules are depicted as hydrogen-depleted

graphs with non-hydrogen atoms as vertices and covalent bonds as edges. A

molecular structure can be represented by a planar graph, G = (V, E), where the nonempty set V represents the set of atoms and the set E generally represents covalent bonds (Randic, 1991a; 1991b; Dureja and Madan, 2007). Such a graph represents the molecular topology by depicting the pattern of connectedness of atoms in the molecule, while remaining independent of the metric aspects of molecular structure such as bond angles, equilibrium internuclear distances, etc. TDs are numbers

associated with constitutional formulas by mathematical operations on the graphs

representing these formulas, which can be used to characterize and order molecules

and predict properties (Basak et al., 1990). They offer a simple way of measuring

shape, size, symmetry, molecular branching, cyclicity, chirality, complexity and

heterogeneity of atomic environments in the molecule (Roy, 2004). When compared with other classes of structural descriptors, such as geometric, quantum or grid (field) descriptors, topology-based molecular descriptors (MDs) have a distinct advantage because they can be computed easily from the molecular graph.


The Wiener index (Wiener, 1947a; 1947b) and the Platt index (Platt, 1947) were the first graph-theoretical descriptors, introduced practically simultaneously in 1947. After that, there were no significant developments in the use of topological descriptors for almost two decades. Interest in topological descriptors then revived in the 1970s. Randic (1973) stated that "the revival of interest in graph theory and graph-theoretical approach to chemistry has probably been stimulated by the impressive developments in the synthetic chemistry and the production of a large number of structurally complex organic molecules". However, the use of topological descriptors gained real momentum only in the 1990s, when computers became available as efficient tools for faster computing, and this trend continues to date.

Randic (1991b) proposed some desirable requirements for chemical codes and topology-based MDs, with a view to preventing their hazardous proliferation:

Good correlation with at least one property

Simplicity

Direct structural interpretation

Ability to discriminate the isomers

Generalization for higher analogues

Not dependent on physico-chemical properties

Not trivially related to other MDs

Linearly independent

Based on familiar structural concepts

Show a correct size-dependence

Gradual change with gradual change in structure

The application of topology-based MDs to the design and selection of novel active compounds is probably one of the most active areas of research in the application of such descriptors to biological problems (Estrada and Uriarte, 2001). Topological descriptors have several advantages and a few limitations.

Advantages

The main advantages suggested for TDs are (Gozalbes et al., 2002):


Use of TDs does not require any experimentally derived measurement, and

structure information is easily available in modern databases. Furthermore, the

TDs can be calculated for new drugs and for drugs under development.

TDs can be used to develop QSAR models without prior knowledge of the receptor structure or of the mechanism of action of the drugs.

Any structure can be described in terms of topological features.

Therefore, unlike other descriptors, TDs can be calculated for compounds with

very different structures, and theoretical models can be built in databases

without a common parent structure.

TDs are simple descriptors that may be quickly calculated for a large number

of compounds. The structures do not need to be displayed on graphical workstations beforehand.

The information provided by the TDs is 2D and not 3D. This could be regarded as a limitation, but it is also an advantage: the conformational and alignment problems associated with 3D QSAR (as in CoMFA applications) are completely avoided and, as a consequence, the results are more reproducible.

Some limitations of TDs have also been described (Gozalbes et al., 2002), especially:

The degree of redundancy and degeneracy of certain TDs can be very high. TDs are degenerate when they take identical values for two or more different molecular graphs, while redundancy is the duplication of information across several TDs. However, the fourth and higher generation descriptors are expected to solve this problem to some extent.

Criticism of TDs usually concerns the physicochemical interpretation of their meaning. Not all TDs present this difficulty of interpretation; a good example is the E-state descriptors, in which the role of substituents and skeletal atoms is directly related to electronegativity.

Some reports have been criticized because of the use of large numbers of TDs, with the inherent danger of chance effects. This is a common statistical problem shared with other descriptors and not exclusive to TDs; caution should also be applied to the indiscriminate use of TDs for sets without a common parent.

In the last few years, a large number of TDs have been reported in the literature.

Various TDs have been found to possess different correlating abilities with molecular

properties/activities of diverse nature. Different TDs and matrices associated with them

have been briefly reviewed in the present chapter.

Balaban et al. (1992) classified TDs according to their nature in first, second

and third generations. The first generation TDs are integer numbers derived from integer "local vertex invariants (LOVIs)" assigned to each vertex, such as vertex degrees or distance sums, and are based on integer graph properties like topological distances. These descriptors suffer from high degeneracy, i.e. the descriptor takes the same value for many non-isomorphic graphs. The most

important descriptors of this class are Wiener index, W (Wiener, 1947a; 1947b),

Hosoya index, Z (Hosoya, 1971) and the centric descriptors of Balaban B (Balaban,

1979). The second generation TDs are real numbers based on integer LOVIs, such as vertex degrees or distance sums, or on integer graph properties. The Randic molecular connectivity (Randic, 1975), the related Kier-Hall descriptors (Kier and Hall, 1976a; 1976b; 1986), the mean square distance (Balaban, 1983), the Balaban index J (Balaban, 1982) and Jhet (Balaban, 1986; Ivanciuc et al., 1998b), the overall connectivity descriptor of Bonchev (1999), and the Harary descriptor, also called the Harary number, H (Plavsic et al., 1993), are the most frequently used second generation TDs to date.

The third generation TDs are real numbers based on real-number LOVIs having very

low degeneracy. These descriptors convert a matrix into a system of linear equations

whose solutions are LOVIs, by including a column vector on the main diagonal and

another column vector as the free term. The column vectors may carry chemical information (e.g., the atomic number of the atom symbolized by the corresponding vertex), graph-theoretical information (e.g., the vertex degree or the distance sum of the corresponding vertex) or simply constant numerical data (e.g., the number of vertices in the graph, or its square). Information-theoretic indices (Basak et al., 1997a; Bonchev, 1983; Balaban et al., 1991), the triplet indices (Filip et al., 1987) and the hyper-Wiener descriptor or the molecular identification (ID) numbers (Randic, 1984) are examples of third-generation descriptors. The fourth generation TDs have


discriminating power >100 for structures containing only five vertices {with or

without heteroatom(s)}. The examples include augmented eccentric connectivity

descriptors, superaugmented eccentric connectivity descriptors (Dureja and Madan,

2007; Dureja et al., 2008). Topology-based MDs suffer from one major limitation: they do not account for the presence of heteroatom(s) in a molecule. To overcome this limitation, the MDs need to be refined into their topochemical counterparts, which are sensitive to both the presence and the relative position(s) of heteroatom(s) (Bajaj et al., 2005). Topochemical MDs that are sensitive to both the presence and the relative position(s) of heteroatom(s), and that also have a high discriminating power of 100 for all possible structures containing only five vertices (> 25 in the case of pendenticity-based TDs), are treated as fifth generation topology-based MDs (Dureja and Madan, 2012). Consequently, fourth generation MDs may be treated as topostructural MDs and fifth generation MDs as topochemical MDs.

A large number of TDs of diverse nature have been reported in the recent past.

These TDs have been found to possess different correlating abilities with molecular

properties/activities of diverse nature. These are widely employed as simple numerical

descriptors for quantitative comparison of physical, chemical or biological parameters

of molecules in a wide range of [(Q)SAR/QSPR] studies. Various TDs and matrices

associated with them have been briefly reviewed in the present chapter.

Adjacency based graph invariants

These invariants are based on the consideration that the whole set of connections between adjacent pairs of atoms may be represented in matrix form, termed the adjacency matrix. The simplest number that can be associated with a chemical structure is the graph adjacency, A(G), which is the sum of all entries of the adjacency

matrix of the graph. However, this simplest topological descriptor is extremely

degenerate; it has the same numerical value for all graphs having the same number of

edges. Various attempts have been reported to express the connectivity of atoms in the

molecule by more discriminating graph invariants.
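The degeneracy of the graph adjacency can be illustrated with a minimal sketch. The function names and the atom numbering of the two hydrogen-depleted C5 skeletons below are assumptions chosen only for illustration; they do not come from the cited works.

```python
def adjacency_matrix(n_atoms, bonds):
    """Build the symmetric 0/1 adjacency matrix of a hydrogen-depleted graph."""
    a = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        a[i][j] = a[j][i] = 1
    return a


def graph_adjacency(n_atoms, bonds):
    """A(G): sum of all entries of the adjacency matrix (twice the edge count)."""
    return sum(sum(row) for row in adjacency_matrix(n_atoms, bonds))


# Hypothetical atom numbering for two C5 isomers (carbon skeletons only).
N_PENTANE = [(0, 1), (1, 2), (2, 3), (3, 4)]
ISOPENTANE = [(0, 1), (1, 2), (2, 3), (1, 4)]  # 2-methylbutane

print(graph_adjacency(5, N_PENTANE), graph_adjacency(5, ISOPENTANE))
```

Both skeletons have four edges, so A(G) cannot tell them apart, which is the degeneracy described in the text.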


Platt (1947) proposed the total edge adjacency descriptor, also known as the Platt number or F descriptor, which may be defined as the sum of all entries of the edge adjacency matrix:

$$F(G) = \sum_{i=1}^{A} \sum_{j=1}^{A} [E]_{ij} = 2N_2$$

The entries Eij are equal to one if edges ei and ej are adjacent (the two edges

thus forming a path of length two) and zero otherwise. This topological descriptor is

simultaneously a measure of the order and dimension of the molecular graph, that is

of the size of the molecule and the degree of chain branching. For equal size (equal

number of edges) the calculated value of this topological descriptor is higher for the

branched molecular graph.

Gordon and Scantlebury (1964) proposed the Gordon-Scantlebury descriptor, also known as the connection number (N2) or Bertz branching descriptor (BI), the simplest graph invariant obtained from the edge adjacency matrix that considers both vertices and edges. This descriptor is based on the topological distance and is calculated as:

$$N_2 = \sum_{i} ({}^{2}P)_i = \frac{A_t}{2}$$

where $A_t$ is the total edge adjacency descriptor and $^{2}P$ is the second-order path count. The Platt number (F descriptor) is therefore twice the Gordon-Scantlebury descriptor (connection number N2, or Bertz branching descriptor BI).
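A small sketch of the two edge-based descriptors, assuming a simple graph given as a bond list (helper names and atom numbering are hypothetical). The edge adjacency matrix is indexed by bonds, and two bonds are adjacent when they share an atom.

```python
def edge_adjacency_matrix(bonds):
    """E[p][q] = 1 when bonds p and q are different and share an atom."""
    m = len(bonds)
    e = [[0] * m for _ in range(m)]
    for p in range(m):
        for q in range(m):
            if p != q and set(bonds[p]) & set(bonds[q]):
                e[p][q] = 1
    return e


def platt_number(bonds):
    """F: sum of all entries of the edge adjacency matrix."""
    return sum(sum(row) for row in edge_adjacency_matrix(bonds))


def connection_number(bonds):
    """N2: number of pairs of adjacent bonds, i.e. paths of length two."""
    return platt_number(bonds) // 2


# Hypothetical atom numbering for two C5 carbon skeletons.
N_PENTANE = [(0, 1), (1, 2), (2, 3), (3, 4)]
ISOPENTANE = [(0, 1), (1, 2), (2, 3), (1, 4)]

print(platt_number(N_PENTANE), connection_number(N_PENTANE))
print(platt_number(ISOPENTANE), connection_number(ISOPENTANE))
```

For the same number of edges, the branched skeleton gives the larger value, in line with the behaviour described for the Platt number.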

Morgan (1965) described the concept of extended connectivity, according to which graph vertices are ordered on the basis of their extended connectivity values, obtained by iterating until a constant atom ordering is obtained in two consecutive steps. The extended connectivity (or extended vertex degree) of a vertex, denoted as $EC_i$, is calculated as the iterative summation of the connectivities of all first neighbors:

$${}^{k+1}EC_i = \sum_{j=1}^{n} a_{ij} \cdot {}^{k}EC_j$$

where $a_{ij}$ are the elements of the adjacency matrix; at k = 0, the connectivity of each atom is simply the vertex degree δ.
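The iteration can be sketched as follows. This is only an illustration of the recurrence, not Morgan's full canonical numbering procedure; the function names and the atom numbering of 2-methylbutane are assumptions.

```python
def neighbour_lists(n_atoms, bonds):
    """Adjacency lists of a hydrogen-depleted molecular graph."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def extended_connectivity(n_atoms, bonds, k):
    """EC values after k iterations; k = 0 returns the vertex degrees."""
    adj = neighbour_lists(n_atoms, bonds)
    ec = [len(nbrs) for nbrs in adj]
    for _ in range(k):
        # Each atom receives the sum of the current values of its first neighbours.
        ec = [sum(ec[j] for j in adj[i]) for i in range(n_atoms)]
    return ec


ISOPENTANE = [(0, 1), (1, 2), (2, 3), (1, 4)]  # hypothetical numbering
for step in range(3):
    print(step, extended_connectivity(5, ISOPENTANE, step))
```

In this example the two methyl groups attached to the branching atom (atoms 0 and 4) keep identical values at every step, reflecting their topological equivalence.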


Bonacich (1972) devised the eigenvector centrality, ECi of a vertex i, which is

derived from the leading eigenvector of the adjacency matrix (Bonacich, 1972, 2007).

The concept of centrality is related to the ability of a vertex to communicate with

other vertices or to its closeness to many other vertices or to the number of pairs of

vertices that need a specific vertex as intermediary in their communications. $EC_i$ is defined as the ith component of the eigenvector associated with the highest eigenvalue of the adjacency matrix:

$$EC_i = \ell_{i1}$$

A vertex connected to many other vertices will have a high $EC_i$ value (Freeman, 1977; 1979).

Gutman and Trinajstic (1972) proposed two novel TDs, called the first and second Zagreb indices or Zagreb group parameters, based on the vertex degree δ of the atoms in the H-depleted molecular graph (the vertex degree of atom i equals the sum of all entries in the ith row of the adjacency matrix).

The first Zagreb group index M1 was defined as the sum of the squares of the vertex degrees of a graph (rather than a simple sum), whereas the second Zagreb group index M2 is the sum of the products of the degrees of pairs of adjacent vertices of the respective (molecular) graph:

$$M_1 = M_1(G) = \sum_{i=1}^{n} \delta_i^{2}$$

$$M_2 = M_2(G) = \sum_{(i,j)} \delta_i \delta_j$$

where δi is the degree (number of first neighbors) of the vertex vi in the molecular graph and δiδj is the weight of the edge {i, j}. The two Zagreb group parameters are strictly related to the zero-order $^{0}\chi$ and first-order $^{1}\chi$ connectivity descriptors, respectively.

The first Zagreb descriptor M1 (also called the Gutman descriptor) is also related to the Platt number F and the connection number N2 by the following relationship:

M1 = F + 2(A - 1) = 2(N2 + A - 1)

where A represents the number of atoms.
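A brief numerical check of these definitions, under the assumption of an acyclic skeleton (the relation between M1 and F uses A - 1 as the number of bonds, which holds only for trees). Function names and atom numbering are illustrative.

```python
def vertex_degrees(n_atoms, bonds):
    """Number of first neighbours of every atom."""
    deg = [0] * n_atoms
    for i, j in bonds:
        deg[i] += 1
        deg[j] += 1
    return deg


def zagreb_indices(n_atoms, bonds):
    """Return (M1, M2) for a hydrogen-depleted molecular graph."""
    deg = vertex_degrees(n_atoms, bonds)
    m1 = sum(d * d for d in deg)
    m2 = sum(deg[i] * deg[j] for i, j in bonds)
    return m1, m2


def platt_number(n_atoms, bonds):
    """F counted per atom: each atom of degree d has d*(d-1) ordered pairs of incident bonds."""
    return sum(d * (d - 1) for d in vertex_degrees(n_atoms, bonds))


ISOPENTANE = [(0, 1), (1, 2), (2, 3), (1, 4)]  # hypothetical numbering
m1, m2 = zagreb_indices(5, ISOPENTANE)
f = platt_number(5, ISOPENTANE)
print(m1, m2, f, f + 2 * (5 - 1))
```

For 2-methylbutane the sketch reproduces M1 from F and the atom count, as the relation above requires for a tree.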

Lovasz and Pelikan (1973) developed the Lovasz-Pelikan descriptor, denoted as $\lambda_1^{LP}$, by using the largest eigenvalue of the adjacency matrix (also known as the leading eigenvalue λ1) as a molecular descriptor:

$$\lambda_1^{LP} \equiv \max \, Sp(\mathbf{A})$$

The leading eigenvalue of the adjacency matrix has been suggested as a descriptor of molecular branching: small values of λ1 correspond to linear or chain graphs and large values to more branched graphs. It is not a very discriminating descriptor, because in many cases the same value is obtained for two or more non-isomorphic graphs.
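The branching behaviour of the leading eigenvalue can be checked with a dependency-free sketch. It uses plain power iteration on A + I (the identity shift is an implementation choice made here to avoid oscillation on bipartite graphs such as trees, not part of the cited definition); the function name, iteration count and atom numbering are assumptions.

```python
import math


def leading_eigenpair(n_atoms, bonds, iterations=200):
    """Approximate the largest adjacency eigenvalue and its unit eigenvector."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    v = [1.0] * n_atoms
    for _ in range(iterations):
        # Multiply by (A + I), which has the same eigenvectors as A.
        w = [v[i] + sum(v[j] for j in adj[i]) for i in range(n_atoms)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = [sum(v[j] for j in adj[i]) for i in range(n_atoms)]
    eigenvalue = sum(v[i] * av[i] for i in range(n_atoms))  # Rayleigh quotient
    return eigenvalue, v


N_BUTANE = [(0, 1), (1, 2), (2, 3)]
ISOBUTANE = [(0, 1), (0, 2), (0, 3)]  # atom 0 is the branching carbon

lam_chain, _ = leading_eigenpair(4, N_BUTANE)
lam_branched, centrality = leading_eigenpair(4, ISOBUTANE)
print(lam_chain, lam_branched, centrality)
```

The branched C4 skeleton gives the larger leading eigenvalue, and the components of the returned eigenvector correspond to the eigenvector centrality of Bonacich, with the branching atom scoring highest.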

Randic (1975) introduced the first connectivity descriptor, the Randic connectivity index ($^{1}\chi$), also known as the connectivity index or branching index, by transforming M2 into an inverse square-root function for characterizing the branching in molecular graphs. It is defined by the following equation:

$${}^{1}\chi = \sum_{(i,j) \in E(G)} (\delta_i \delta_j)^{-1/2}$$

where δi and δj represent the degrees of the vertices vi and vj; the term $(\delta_i \delta_j)^{-1/2}$ for each pair of adjacent vertices is called the edge connectivity.
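As a worked illustration of the formula, the following sketch sums the edge connectivities over a bond list. The function name and the atom numbering of the two C5 skeletons are assumptions.

```python
import math


def randic_index(n_atoms, bonds):
    """First-order connectivity index: sum of (deg_i * deg_j) ** -0.5 over all bonds."""
    deg = [0] * n_atoms
    for i, j in bonds:
        deg[i] += 1
        deg[j] += 1
    return sum(1.0 / math.sqrt(deg[i] * deg[j]) for i, j in bonds)


N_PENTANE = [(0, 1), (1, 2), (2, 3), (3, 4)]
ISOPENTANE = [(0, 1), (1, 2), (2, 3), (1, 4)]

print(round(randic_index(5, N_PENTANE), 3))   # linear skeleton
print(round(randic_index(5, ISOPENTANE), 3))  # branched skeleton
```

Unlike M2, the index decreases with branching: the chain gives about 2.414 and the branched isomer about 2.270.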

Kier et al. (1975a) extended this idea from edges (paths of unit length) to paths of higher lengths (two, three, etc.). The MDs thus constructed were termed molecular connectivities of first, second, third, etc. order, respectively. These generalized descriptors are also known as Kier-Hall connectivity descriptors (Kier and Hall, 1976a) and are calculated as follows:

$${}^{0}\chi = \sum_{i=1}^{n} \delta_i^{-1/2}, \qquad {}^{1}\chi = \sum_{\text{all edges}} (\delta_i \delta_j)^{-1/2}, \qquad {}^{2}\chi = \sum_{\text{all paths of length 2}} (\delta_i \delta_j \delta_k)^{-1/2}$$

A zero-order descriptor was defined for completeness; the first-order Kier-Hall connectivity descriptor is the Randic connectivity descriptor and can be written as $^{1}\chi$.
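The path-type connectivities of any order can be sketched with a simple path enumeration. This is a minimal illustration restricted to path subgraphs (clusters and chains are not handled); the function names and atom numbering are hypothetical.

```python
import math


def simple_paths(n_atoms, bonds, order):
    """All simple paths with `order` bonds, each undirected path listed once."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    found = []

    def extend(path):
        if len(path) == order + 1:
            # A path is reached from both ends; keep the copy starting at the lower index.
            if path[0] <= path[-1]:
                found.append(tuple(path))
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for start in range(n_atoms):
        extend([start])
    return found


def path_connectivity(n_atoms, bonds, order):
    """Connectivity index of the given order, summed over paths only."""
    deg = [0] * n_atoms
    for i, j in bonds:
        deg[i] += 1
        deg[j] += 1
    return sum(1.0 / math.sqrt(math.prod(deg[v] for v in p))
               for p in simple_paths(n_atoms, bonds, order))


ISOPENTANE = [(0, 1), (1, 2), (2, 3), (1, 4)]
print([round(path_connectivity(5, ISOPENTANE, m), 3) for m in range(3)])
```

For 2-methylbutane the three values are approximately 4.284, 2.270 and 1.802 for orders zero, one and two.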

Kier and Hall (1976b) further extended the Randic descriptor (χ) to heteroatom-containing molecules, in order to account for heteroatom differentiation as well as for different subgraphs in the molecule. They replaced the vertex degree δi by the valence vertex degree $\delta_i^{v}$ in the construction of the analogous valence connectivity descriptor, $\chi^{v}$:

$$\chi^{v} = \sum_{(i,j) \in E(G)} (\delta_i^{v} \delta_j^{v})^{-1/2}$$


where $\delta_i^{v}$ is equal to:

$$\delta_i^{v} = Z_i^{v} - h_i = \sigma_i + \pi_i + n_i - h_i$$

where $Z_i^{v}$ is the number of valence electrons (σ electrons, π electrons and lone pair (n) electrons) of the ith atom and $h_i$ is the number of hydrogen atoms bonded to it. This definition holds for atoms of the second principal quantum level (C, N, O, F). For atoms of higher principal quantum levels (P, S, Cl, Br, I), Kier and Hall (1976b) proposed to account for both valence and non-valence electrons, as follows:

$$\delta_i^{v} = \frac{Z_i^{v} - h_i}{Z_i - Z_i^{v} - 1}$$

where $Z_i$ is the total number of electrons of the ith atom, i.e. its atomic number. $\delta_i^{v}$ encodes the electronic identity of the atom in terms of both valence electron and core electron counts; it is a valence electron descriptor.
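The two forms of the valence vertex degree can be combined in one small helper. The function name and its argument convention (passing the atomic number only for atoms beyond the second row) are assumptions made for this sketch.

```python
def valence_delta(valence_electrons, hydrogens, atomic_number=None):
    """Valence vertex degree of an atom.

    Without `atomic_number`, the second-row form Zv - h is used.
    With it, the form (Zv - h) / (Z - Zv - 1) for heavier atoms is applied.
    """
    if atomic_number is None:
        return valence_electrons - hydrogens
    return (valence_electrons - hydrogens) / (atomic_number - valence_electrons - 1)


print(valence_delta(4, 3))       # carbon in a methyl group
print(valence_delta(6, 1))       # oxygen in a hydroxyl group
print(valence_delta(7, 0, 17))   # chlorine substituent
print(valence_delta(6, 0, 16))   # sulfur in a thioether
```

A hydroxyl oxygen is thereby distinguished from a methyl carbon even though both have a simple vertex degree of one.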

The molecular connectivity descriptor has the following advantages:

It possesses great discriminating power by virtue of its high monotonicity.

Computation of the molecular connectivity descriptor is simple, as only a basic algorithm needs to be applied.

Molecular connectivity descriptor being related to degree of branching is a good

measure of molecular surface area or volume.

Molecular connectivity descriptor has provision to consider heteroatoms and

multiple bonds.

Molecular connectivity descriptor can be applied to acyclic, cyclic and aromatic

molecules.

Molecular connectivity descriptor correlates well with physicochemical and

biological properties.

Razinger (1982, 1986) showed that the $^{k}EC$ values are connected to the respective kth powers of the adjacency matrix. Later, Rucker and Rucker (1993) derived this relationship by proving that the vertex (or atom) walk count, $^{k}awc(i)$, and the graph (molecule) walk count, $^{k}mwc(G)$, of length k are identical to Morgan's $^{k}a_i$ and $^{k}EC(G)$, respectively.

Narumi and Katayama (1984) reported a simple topological descriptor, S, related to molecular branching and calculated as the product of the vertex degrees δi:

$$S = \prod_{i=1}^{n} \delta_i$$

where n is the number of atoms.
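A minimal sketch of this product, with a hypothetical function name and atom numbering:

```python
import math


def narumi_katayama(n_atoms, bonds):
    """S: product of the vertex degrees of a hydrogen-depleted molecular graph."""
    deg = [0] * n_atoms
    for i, j in bonds:
        deg[i] += 1
        deg[j] += 1
    return math.prod(deg)


N_PENTANE = [(0, 1), (1, 2), (2, 3), (3, 4)]
ISOPENTANE = [(0, 1), (1, 2), (2, 3), (1, 4)]
print(narumi_katayama(5, N_PENTANE), narumi_katayama(5, ISOPENTANE))
```

In this pair the branched isomer has the smaller product, because branching replaces a degree-two atom by a terminal atom of degree one.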

Kier and Hall (1986) introduced the benzene-likeliness descriptor (BLI) with the aim of measuring molecular aromaticity. It is calculated by dividing the first-order valence connectivity descriptor $^{1}\chi^{v}$ by the number of non-hydrogen bonds in the molecule and then normalizing on the benzene molecule.

Gombar et al. (1987) conceptualized modified connectivity descriptors known as perturbation connectivity descriptors, $^{m}\chi_q^{p}$, based on perturbation delta values $\delta^{p}$ and defined as:

$${}^{m}\chi_q^{p} = \sum_{k=1}^{K} \left( \prod_{a=1}^{n} \delta_a^{p} \right)_k^{-1/2}$$

where k runs over all the mth-order subgraphs of type q constituted by n atoms and K is the total number of mth-order subgraphs. Perturbation delta values are obtained from the valence vertex degree $\delta^{v}$ by incorporating the effect of atomic environments at the topological level.

Burden (1989) presented a method for generating molecular identification numbers, also known as Burden numbers, for hydrogen-depleted structures from the smallest eigenvalues of a modified connectivity matrix known as the Burden matrix. Burden numbers are attractive because of their one-dimensional nature and the comparative ease of their computation. Moreover, two molecules with close Burden numbers often appear similar when their chemical structures are compared, for example by comparing the numbers of fragments or functional groups the two molecules do and do not have in common.

Lohninger (1993) proposed the modified Randic descriptor ($^{1}\chi_{mod}$) as a sum of atomic properties, accounting for valence electrons and extended connectivities in the H-depleted molecular graph, using a Randic connectivity descriptor-type formula:

$${}^{1}\chi_{mod} = \frac{1}{2} \sum_{i=1}^{A} \sum_{j=1}^{\delta_i} \frac{Z_i}{(\delta_i \cdot \delta_j)^{1/2}}$$

where the first sum runs over all the atoms in the molecular graph and the second runs over the first neighbors of the considered atom; δ is the vertex degree and Zi is the atomic number of the ith atom.

Yang et al. (1994) proposed two novel molecular descriptors based upon

extended adjacency matrices (EA). These descriptors consider the presence of

heteroatom(s) and multiple bonds, possess high discriminating power, and correlate

well with a number of physico-chemical properties and biological activities of organic

compounds.

The first is the sum of the absolute eigenvalues of the EA matrix, called the EA descriptor, which can be calculated as:

$$EA = \sum_{i=1}^{A} |\lambda_i^{EA}|$$

The second is the maximum absolute eigenvalue of the EA matrix, called the EAmax descriptor, which can be calculated as:

$$EA_{max} = \max_{i = 1, \ldots, A} |\lambda_i^{EA}|$$

where $\lambda_i^{EA}$ are the eigenvalues of the EA matrix.

Goel et al. (1995) conceptualized an adjacency-based topochemical graph invariant, the atomic molecular connectivity descriptor ($\chi^{A}$), which is a modification of Randic's molecular connectivity descriptor (χ). It may be defined as the sum of the inverse square roots of the products of the chemical degrees (modified bond values) of adjacent vertices over all edges in the hydrogen-depleted molecular graph:

$$\chi^{A}(G) = \sum_{i=1,\,j=1}^{n} (V_i^{c} V_j^{c})^{-1/2}$$

where $V_i^{c}$ and $V_j^{c}$ are the chemical degrees of the adjacent vertices i and j. The computation of $\chi^{A}$ is conducted in a manner similar to that described by Kier and Hall (1976b), except that the modified valency of each vertex involved in a pair is calculated by summing the relative atomic weights of all the adjacent atoms.


Estrada (1995) derived an edge analog of the classical Randic connectivity descriptor, known as the edge connectivity descriptor. The descriptor was reported to correlate well with molar volume and to be capable of discriminating between isomers. It is abbreviated as ε and can be calculated as:

$$\varepsilon = \sum_{r} \left[ {}^{w}\delta(e_i)\, {}^{w}\delta(e_j) \right]^{-1/2}$$

where $^{w}\delta(e_i)$ and $^{w}\delta(e_j)$ are the values of the edge-weighted degrees and the summation is over all r pairs of adjacent edges.

Estrada and Ramirez (1996) defined a new bond order-weighted edge connectivity descriptor ($\varepsilon^{\pi}$) based on edge adjacency relationships. The proposed descriptor is sensitive to both the presence and the position(s) of heteroatom(s) in the molecule (greater values refer to the central positions) and is able to discriminate conformational isomers. The elements of the edge set were substituted by bond orders or, more precisely, by valence descriptors calculated from quantum chemical methods. The descriptor is calculated as follows:

$$\varepsilon^{\pi} = \sum_{r} \left[ {}^{w}\delta(e_i)\, {}^{w}\delta(e_j) \right]^{-1/2}$$

where π is the bond order.

Hu and Xu (1996) proposed two new super-descriptors, EATI1 and EATI2, based on the extended adjacency matrix of the weighted molecular graph. The extended adjacency ID numbers EATI1 and EATI2 are calculated as:

$$EATI_1 = \sum_{i=1}^{n} \left( [\mathbf{EA}^{*}]_{ii} \right)^{2}$$

$$EATI_2 = \sum_{i=1}^{n} [\mathbf{EA}^{*}]_{ii}$$

where [EA*]ii are the diagonal entries of the matrix EA*.

EATI1 was tested for selectivity on over 610,000 structures and was also found to have good correlating ability. EATI2 (also called EAID) was specifically tested for selectivity, and no degeneracy appeared. It was described as the most powerful descriptor designed to that date and as a candidate for CAS Registry Numbers (Guo et al., 1997).

Estrada (1996) developed the theory of the spectral moments of the so-called

edge adjacency matrix (which, in turn, is precisely the standard adjacency matrix of

the line graph of the molecular graph). Estrada's approach is the expansion of the spectral moments in terms of counts of certain fragments contained in the molecular graph. The spectral moments of the edge adjacency matrix E were defined as:

$$\mu_k = \mathrm{tr}\,[\mathbf{E}^{k}]$$

where tr stands for the trace (sum of diagonal entries) of the respective matrix.
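The definition can be illustrated with an unweighted edge adjacency matrix and plain list-based matrix products. This is only a sketch of the trace formula; bond weights, as used in the weighted variants, are not included, and all names and the atom numbering are assumptions.

```python
def edge_adjacency_matrix(bonds):
    """E[p][q] = 1 when bonds p and q are different and share an atom."""
    m = len(bonds)
    return [[1 if p != q and set(bonds[p]) & set(bonds[q]) else 0
             for q in range(m)] for p in range(m)]


def matmul(x, y):
    """Product of two square matrices stored as nested lists."""
    n = len(x)
    return [[sum(x[r][t] * y[t][c] for t in range(n)) for c in range(n)]
            for r in range(n)]


def spectral_moment(bonds, k):
    """mu_k = trace of the k-th power of the edge adjacency matrix."""
    m = len(bonds)
    e = edge_adjacency_matrix(bonds)
    power = [[1 if r == c else 0 for c in range(m)] for r in range(m)]  # E**0
    for _ in range(k):
        power = matmul(power, e)
    return sum(power[b][b] for b in range(m))


ISOPENTANE = [(0, 1), (1, 2), (2, 3), (1, 4)]
print([spectral_moment(ISOPENTANE, k) for k in range(4)])
```

For 2-methylbutane the zeroth moment counts the bonds, the second moment equals the Platt number, and the third moment is six times the number of three-bond stars around the branching atom, which is how the moments come to encode fragment counts.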

The spectral moments of the edge adjacency matrix have been successfully

applied in QSPR and QSAR studies of alkanes, alkyl halides, benzyl alcohols,

cycloalkanes and benzenoid hydrocarbons (Estrada, 1996; 1997; 1998).

Estrada and coworkers (Estrada, 1996; 1997; Gutierrez and Estrada, 1997) introduced a novel approach, TOpological SubStructural MOlecular DEsign (TOSS-MODE), now called TOPological Substructural MOlecular DEsign (TOPSMODE), which is based on determining the spectral moments of the topological bond matrix. The application of this approach to the study of quantitative structure-permeability relationships (QSPR/QSAR) can be summarized as:

1. Construction of hydrogen–suppressed molecular graphs for every molecule of

the data set.

2. Using appropriate bond weights to differentiate the molecular bonds, e.g.,

bond length, bond dipoles and bond polarizabilities.

3. Calculating the spectral moments of the bond matrix with the appropriate

weights for each molecule in the data set and preparing a table in which rows

represent the compounds and columns represent the spectral moments in the

bond matrix.

4. Generating the QSPR/QSAR model with an appropriate linear or non-linear multivariate statistical technique, such as multiple linear regression analysis (MLRA).

5. Finally, using cross-validation techniques to optimize the predictive capability of the QSAR/QSPR model (Estrada, 2000).

Pearlman and Smith (1998; 1999) extended the Burden approach to searching for chemicals in large databases by conceptualizing BCUT descriptors (Burden-CAS-University of Texas eigenvalues), based on three different matrices whose diagonal elements were atomic polarizability-related values, atomic charge-related values and atomic H-bond abilities.

Estrada et al. (1998a), Estrada (2000b) and Estrada and Rodriguez (2000) made an important extension by generalizing edge connectivity descriptors into extended edge connectivity descriptors, which can be defined as:

$${}^{m}\varepsilon_q = \sum_{k=1}^{K} \left( \prod_{b} \delta(e_b) \right)_k^{-1/2}$$

where k runs over all of the mth-order subgraphs, m is the number of edges in the subgraph, K is the total number of mth-order subgraphs present in the molecular graph, and the product runs over the edge degrees $\delta(e_b)$ of all of the edges b involved in the subgraph. The subscript "q" refers to the type of molecular subgraph.

Bonchev (1999, 2000) proposed two overall connectivity descriptors, TC and TC1, as a meaningful measure of the topological complexity of molecules, since they satisfy two fundamental requirements of a complexity measure: to increase with both the number of structural elements and their interconnectedness (Bonchev, 1997). The topological complexity descriptor TC(G) of the graph G may be defined as the sum of the total adjacencies of all $^{e}K$ connected subgraphs having e edges and $n_t$ vertices of degree $a_i$, including the graph itself, which has E edges and K connected subgraphs. Because it summarizes the information on the connectivity of the vertices in all subgraphs, the new descriptor has the meaning of the overall connectivity of G:

$$TC(G) = \sum_{e=0}^{E} {}^{e}TC(G) = \sum_{e=0}^{E} \sum_{t=1}^{{}^{e}K} \sum_{i=1}^{n_t} a_i$$

where $^{e}TC(G)$ is the topological complexity of order e (e = 0, 1, 2, …, E).

The TC(G) descriptor is defined in two quantitatively different versions. In the basic version, the vertex degrees are those in the entire graph G; in the second version, the TC1(G) descriptor is calculated with the vertex degrees taken from the corresponding subgraph Gt (Bonchev, 2000).

Nikolic et al. (2000) proposed symmetry-modified Zagreb invariants, obtained from M1 and M2 by summing only the degrees (SMM1) or edge weights (SMM2) of symmetry-nonequivalent vertices or edges of graphs. Closely related symmetry-independent and symmetry-dependent complexity invariants were found to produce different orderings.

Basak et al. (2000) considered molecular surface dependent properties (boiling

point and gas chromatograph retention times) and molecular volume dependent

properties (molar volume and molar refraction). They found that edge connectivity invariants were appropriate for modelling properties that depend on the molecular surface.

Estrada and Molina (2001) defined the spectral moment of the edge adjacency matrix for a molecular fragment as the sum of the diagonal entries of the different powers of the edge adjacency matrix corresponding to that fragment. It is defined as:

$$\mu_k(MF) = \sum_{b=1}^{B_F} [\mathbf{E}^{k}]_{bb}$$

where MF indicates the molecular fragment considered, and the summation goes over all the $B_F$ bonds forming the fragment.

Bonchev (2001a) verified the applicability of the overall connectivity

descriptors to QSPR studies by linear regression modeling of 10 physicochemical

properties of linear alkanes. The development of the molecular connectivity concept and some of its key elements (Randic's inverse-square-root function and the detailed subgraph characterization) were also analyzed. This study demonstrated the

usefulness of overall connectivity concept for QSPR applications, which combines the

basic ideas of the classical concept of molecular connectivity with those of molecular

complexity.

Lukovits and Linert (2001) applied a chiral function F satisfying the condition F(D) = F(L), where D and L represent the enantiomers of the same structure, in combination with the Randic atom-modified first-order molecular connectivity descriptor.

Tomovic and Gutman (2001) renamed the simple topological descriptor S developed by Narumi and Katayama (1984) as the "Narumi-Katayama descriptor" while studying the properties of this descriptor for phenylenes.

Kezele et al. (2001) tested the use of the variable connectivity descriptor in QSPR. The descriptor yielded very good regression equations for homogeneous sets of molecules.


Torrens (2002) optimized a method for determining the permanent of the adjacency matrix, per(A), of fullerenes. The permanent of the adjacency matrix is defined as:

$$\mathrm{per}(\mathbf{A}) = \sum_{\pi \in \Lambda_n} \prod_{i=1}^{n} a_{i\,\pi(i)}$$

where Λn denotes the set of all possible permutations of (1, 2, …, n).

The algorithm allows rapid computation of per(A) for the adjacency matrices of molecules large enough to be theoretically interesting, and the author concluded that this approach could be useful in designing and optimizing the structures of unknown fullerenes.
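To make the definition concrete, the permanent can be evaluated directly from the sum over permutations. This brute-force sketch is not the optimized algorithm of Torrens; its cost grows factorially with n, so it is only usable for very small graphs. The function names and the two four-atom test graphs are assumptions.

```python
from itertools import permutations


def adjacency_matrix(n_atoms, bonds):
    """Symmetric 0/1 adjacency matrix as nested lists."""
    a = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        a[i][j] = a[j][i] = 1
    return a


def permanent(a):
    """per(A) by direct summation over all permutations of the column indices."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        product = 1
        for i in range(n):
            product *= a[i][perm[i]]
            if product == 0:
                break  # this permutation contributes nothing
        total += product
    return total


FOUR_RING = [(0, 1), (1, 2), (2, 3), (3, 0)]   # cyclobutane-like skeleton
FOUR_CHAIN = [(0, 1), (1, 2), (2, 3)]          # n-butane skeleton
print(permanent(adjacency_matrix(4, FOUR_RING)))
print(permanent(adjacency_matrix(4, FOUR_CHAIN)))
```

Unlike the determinant, every non-zero permutation term enters with a positive sign, so the ring and the chain are clearly separated even in this tiny example.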

Ren (2002a; 2002b) derived a novel vertex degree vm for heteroatoms in the molecular graph on the basis of the valence connectivity v of Kier-Hall. The atom-type AI descriptor and the Xu descriptor were modified for compounds with heteroatoms by replacing the vertex degree of each heteroatom by the proposed vm. The modified Xu descriptor and AI descriptor provided QSPR models for the normal boiling points (BP), molar volumes (MV), molar refractions (MR) and molecular total surface areas (TSA) of alcohols with up to 17 non-hydrogen atoms. These physical properties were expressed as a linear combination of the individual descriptors related to molecular size and atom type.

Starting from the concept of T(A) graphs for alternant hydrocarbons, Turker (2003a) defined a novel topological descriptor (L). The proposed descriptor differentiated isomeric as well as isospectral molecules, encoded alternate saturated and unsaturated systems, and could be extended to heteroatom-containing structures.

Cao et al. (2003) extended the application of eigenvalues of bonding orbital–

connection matrix to different physicochemical properties and developed bond

adjacency matrix BCH and orbital overlapping matrix BCC based on polarizability

effect descriptor (PEI) for each C-H bond carbon skeleton in alkane molecule.

Nikolic et al. (2003) amended the original Zagreb descriptors through

insertion of inverse values of the vertex-degrees:

$$^{m}M_1 = \sum_{\text{all vertices}} \frac{1}{\delta_i^{2}}$$

$$^{m}M_2 = \sum_{\text{all edges } ij} \frac{1}{\delta_i \, \delta_j}$$

They concluded that the modified Zagreb mM1 descriptor gave a greater

contribution to outer atoms than to inner atoms in a molecule. Similarly, the modified

Zagreb mM2 descriptor gave a greater contribution to outer bonds than to inner bonds

in a molecule. This was opposite to the behaviour of the original Zagreb indices and

in agreement with the chemists' understanding that the most important contributions

to the interactions between molecules that are essential for many of their physical,

chemical, biological and even technological properties arise from the more exposed

atoms and bonds.
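The contrast between the original and the modified Zagreb descriptors can be sketched numerically. The example below assumes the usual forms M1 = Σ δi², M2 = Σ δi·δj and their modified counterparts with inverse terms, 1/δi² summed over vertices and 1/(δi·δj) summed over edges; the function names and the atom numbering of the 2-methylbutane skeleton are hypothetical choices for illustration.

```python
from fractions import Fraction


def degrees(n, edges):
    """Vertex degrees of a hydrogen-depleted graph given as an edge list."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


def zagreb(n, edges):
    """Original Zagreb descriptors (M1, M2)."""
    d = degrees(n, edges)
    m1 = sum(x * x for x in d)
    m2 = sum(d[i] * d[j] for i, j in edges)
    return m1, m2


def modified_zagreb(n, edges):
    """Modified Zagreb descriptors (mM1, mM2), kept exact with Fraction."""
    d = degrees(n, edges)
    mm1 = sum(Fraction(1, x * x) for x in d)
    mm2 = sum(Fraction(1, d[i] * d[j]) for i, j in edges)
    return mm1, mm2


# 2-methylbutane skeleton: C0-C1(-C4)-C2-C3 (assumed numbering).
ISOPENTANE = [(0, 1), (1, 2), (2, 3), (1, 4)]

print(zagreb(5, ISOPENTANE))
print(modified_zagreb(5, ISOPENTANE))
```

In the modified versions each terminal atom contributes 1 while the branched atom of degree 3 contributes only 1/9, which is the weighting towards outer atoms described above.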

Lailong and Chengjun (2004) developed a novel connectivity descriptor mF

on the basis of adjacency matrix and edge valency. This descriptor reflects the

chemical bond specificity of edge i.

$$^{m}F = \sum (f_i \cdot f_j \cdot f_k \cdots)^{-0.5}$$

Where fi is the edge valency.

Pompe (2005) constructed a modified variable topological descriptor to

determine the overall contributions of atoms or bonds in QSAR/QSPR studies.

$$^{k}\chi^{f} = \sum_{i=1}^{n} {}^{k}\chi_i^{f} = \sum_{i=1}^{n} \prod_{j=1}^{k+1} \left(\delta_j^{f}\right)^{-1/2}$$

Where kχfi expresses the contribution of path length (k) to the variable connectivity descriptor of the same order and n is the number of paths having length k.

He concluded that the modified variable connectivity descriptor may offer better

structural interpretations by providing adequate knowledge about the part of the

molecule responsible for enhancing or suppressing the modelled property.

Dureja and Madan (2005) renamed atomic molecular connectivity descriptor

as molecular connectivity topochemical descriptor on similar grounds of other

topochemical descriptors for the sake of simplicity and to avoid any confusion.

Bajaj et al. (2005) refined Zagreb indices M1 and M2. The topochemical

descriptors were based on topochemical adjacency matrix. These refined descriptors

were sensitive to both the presence as well as position of the heteroatom(s) in the



molecule. Zagreb topochemical descriptors M1c may be defined as the sum of the

squares of chemical degrees of all the vertices in the hydrogen depleted molecular

graph. Zagreb topochemical descriptors, M2c may be defined as the sum of the chemical

weights of all the edges in hydrogen depleted molecular graph.

$$M_1^{c} = \sum_{a} \left(\delta_a^{c}\right)^{2}$$

$$M_2^{c} = \sum_{b} \left(\delta_i^{c} \, \delta_j^{c}\right)_b$$

Where a runs over the A atoms of the molecule and b over all of the B bonds of the molecule. δic and δjc refer to the vertex degrees of the atoms incident to the considered bond.

Zhang et al. (2007) characterized DNA by a numerical sequence by

considering positions of bases and the pairs of bases in DNA. For generating the

sequence invariants, the following function is used:

dr= m/n

where dr is defined as the relative position parameter of a base; m represents the

position of this base in a sequence, n represents the number of all bases in this

sequence. They extracted a novel invariant (molecular connectivity descriptor type)

from the derived numerical sequences.

Mu et al. (2008) devised novel molecular connectivity descriptor, denoted as

mχ′ based on the adjacency matrix of molecular graphs. By using delta value ( i )

instead of the original delta value ( vi ) of the molecular connectivity descriptor, this

new descriptor was obtained as:

2/1

1

1

1

)(

j

n

j

m

i

im

m

G

Where m is the order of the molecular connectivity descriptor. This descriptor was

successfully applied to predict the molar diamagnetic susceptibilities of organic

compounds. Later on the converse descriptor, denoted by m

χ′ was also proposed by

Mu et al. (2009) as:

y

j

n

j

m

i

im

m

G

1

1

1

)(



Where y is a variable, whose optimal value can be found by the optimization method.

Vukicevic and Furtula (2009) constructed a novel TD based on the end-vertex degrees of edges, called the 'geometric-arithmetic' (GA) descriptor:

$$GA(G) = \sum_{uv \in E} \frac{\sqrt{\delta_u \delta_v}}{(\delta_u + \delta_v)/2}$$

Where δu and δv are the degrees of the vertices connected by edge uv and the summation goes over all edges of graph G. (δu + δv)/2 is the arithmetic mean whereas √(δu·δv) is the geometric mean, hence the name 'geometric-arithmetic' descriptor. The predictive ability of this descriptor was found to be better than that of the Randic connectivity descriptor χ in modeling some physico-chemical properties of octanes.
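A minimal sketch of the GA descriptor, taken as the sum over edges of the geometric mean of the end-vertex degrees divided by their arithmetic mean, is given below. The function name `geometric_arithmetic` and the edge-list representation are assumptions made for illustration.

```python
from math import sqrt


def geometric_arithmetic(n, edges):
    """GA descriptor: sum over edges of sqrt(du*dv) / ((du + dv) / 2)."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sum(sqrt(deg[u] * deg[v]) / ((deg[u] + deg[v]) / 2) for u, v in edges)


# Five-membered ring: every vertex has degree 2, so each edge contributes 1.
C5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
# 2-methylbutane skeleton (assumed numbering).
ISOPENTANE = [(0, 1), (1, 2), (2, 3), (1, 4)]

print(geometric_arithmetic(5, C5))
print(geometric_arithmetic(5, ISOPENTANE))
```

Because the geometric mean never exceeds the arithmetic mean, each edge contributes at most 1, and GA equals the number of edges only for graphs in which every edge joins vertices of equal degree.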

Furtula et al. (2010) defined the augmented Zagreb descriptor, AZI, of a molecular graph G as:

$$AZI(G) = \sum_{ij \in E(G)} \left( \frac{\delta_i \, \delta_j}{\delta_i + \delta_j - 2} \right)^{3}$$

Where E(G) is the edge set of G, and δi and δj are the degrees of the terminal vertices i and j of edge ij. Some tight upper and lower bounds were also reported for the AZI descriptor of a chemical tree.

Das and Trinajstic (2010) compared the geometric-arithmetic descriptor and the atom-bond connectivity descriptor for chemical trees and molecular graphs. Besides chemical trees and molecular graphs, general graphs were also investigated. They concluded that the geometric-arithmetic descriptor is greater than the atom-bond connectivity descriptor when the difference between the maximum and minimum vertex degrees is less than or equal to three.

Fath-Tabar et al. (2010) proposed a new molecular-structure descriptor GA2

belonging to the class of geometric-arithmetic descriptors. It is closely related to the

Szeged and vertex PI descriptors.

$$GA_2(G) = \sum_{uv \in E(G)} \frac{\sqrt{n_u(e) \, n_v(e)}}{0.5 \, [n_u(e) + n_v(e)]}$$

They established the main properties of GA2 including lower and upper bounds and

the trees with minimum and maximum GA2 were also characterized.



Randic et al. (2010) described novel matrix termed as natural distance matrix

for graphs, which is based on interpretation of columns of the adjacency matrix of a

graph as a set of points in n-dimensional space. The leading eigenvalue λ1, average

row sum (ARS) and the J descriptor of the natural distance matrix were proposed as

three new MDs for characterization of molecular graph.

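Andova and Petrusevski (2011) devised variable Zagreb descriptors, denoted by λM1 and λM2, in accordance with Karamata's inequality. The first and second variable Zagreb descriptors were defined as:

$$^{\lambda}M_1 = {}^{\lambda}M_1(G) = \sum_{i=1}^{n} \delta_i^{2\lambda}$$

$$^{\lambda}M_2 = {}^{\lambda}M_2(G) = \sum_{(i,j)} \delta_i^{\lambda} \, \delta_j^{\lambda}$$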

Natarajan (2011) proposed three new descriptors by taking advantage of both

the uniqueness of path code and the spectrum of connectivity descriptors. The

formulae to compute the new TDs were expressed as:

Overall path multiplicity:

$$OPM = \sum_{i=0} P_i \cdot {}^{i}\chi$$

Mean path connectivity:

$$MPM = \sum_{i=0} {}^{i}\chi / P_i$$

Sum of connectivity descriptors of all orders:

$$SumR = \sum_{i=0} {}^{i}\chi$$

Where Pi is the number of paths of length i and iχ is the connectivity descriptor of

order i. These descriptors ranked all planar graphs of alkanes C4 to C6 uniquely and

were found to have non-degenerate values for all the 7668 constitutional isomers

(alkane trees) from C4 to C15.

Doslic et al. (2011) defined the average neighbor degree number, an invariant

useful for measuring the diversity of vertices in molecular graph G, as:

$$m_{avg}(G) = \frac{1}{|V(G)|} \sum_{u \in V(G)} m_{avg}(u)$$

where mavg(u) is the average of the degrees of the vertices adjacent to u.

The application of mavg(G) was investigated on the benchmark set of 18 octane isomers, and a decent correlation was found between mavg(G) and enthalpy of vaporization, as well as the standard enthalpy of vaporization.

Ghorbani and Hosseinzadeh (2012) reported new version of Zagreb

descriptors calculated as:

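$$M_1^{*} = M_1^{*}(G) = \sum_{ij \in E(G)} (\varepsilon_i + \varepsilon_j)$$

$$M_1^{**} = M_1^{**}(G) = \sum_{j \in V(G)} \varepsilon_j^{2}$$

$$M_2^{*} = M_2^{*}(G) = \sum_{ij \in E(G)} \varepsilon_i \, \varepsilon_j$$

Where εi is the largest distance between i and any other vertex j of G.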

Saha and Bandyopadhyay (2012) introduced novel cluster validity

descriptors which are able to automatically detect clusters of any shape, size or

convexity as long as they are well-separated. They measured connectivity using a

novel approach following the concept of relative neighborhood graph.

Ghorbani et al. (2012) modified the Narumi-Katayama descriptor as NK*, in which each vertex degree (δi) is multiplied δi times. They determined its basic properties and characterized the graphs extremal with respect to it. This new version of the NK descriptor may be represented as:

$$NK^{*} = NK^{*}(G) = \prod_{i=1}^{n} \delta_i^{\delta_i}$$

Eliasi et al. (2012) described multiplicative version of first Zagreb descriptor

as:

$$\Pi_1^{*} = \Pi_1^{*}(G) = \prod_{ij \in E(G)} (\delta_i + \delta_j)$$

They proved that among all connected graphs with a given number of vertices, the

path has minimal ∏1*.

Singh et al. (2013) proposed refined general Randic descriptors (R2, R3 and

R4), as well as their topochemical counterparts as a modification of general Randic

descriptor which can be defined as the sum of the quotients of the inverse of the

product of the degree of each vertex on every edge in the hydrogen depleted

molecular graph. It can be expressed as:



RN = ∑ ( )

where vi and vj are the degrees of the ith and jth vertices respectively, n is the number of vertices and N is equal to 2, 3, 4 for refined general Randic descriptor-1, 2, 3 respectively (denoted by R2, R3, R4). The MD values of complex chemical structures were kept within reasonable limits by inserting the constant km to avoid any compromise with the discriminating power.

Distance based graph invariants

These descriptors utilize the distance matrix/detour matrix to characterize molecular

graphs. Distance matrix D(G), for a graph G is defined as a real, square, symmetrical

matrix of order n, with entries dij representing the distance traversed in moving from

vertex i to vertex j in G. The entries in the distance matrix indicate the number of edges

in the shortest path between vertices i and j whereas entries in case of detour matrix

indicate the number of edges in the longest path between vertices i and j.

The first chemical graph theory based theoretical MD called Wiener descriptor

was conceptualized by H. Wiener in 1947 (1947a; 1947b) to model some

thermodynamic properties of acyclic hydrocarbons. Wiener index W is defined as “the

summation of the distances between any two carbon atoms in an acyclic molecule, in

terms of carbon-carbon bonds.” The value of the Wiener descriptor can simply be calculated by multiplying the number of carbon atoms on one side of any bond by those on the other side and then summing up these products as:

$$W = W(G) = \sum_{e} W_e = \sum_{e} N_{L,e} \cdot N_{R,e}$$

Where NL,e and NR,e are the numbers of vertices lying to the left and to the right of edge e, and the summation runs over all edges in G.

The Wiener number was initially employed to predict physical parameters such

as boiling points, heats of formation, heats of vaporization, molar volumes, and molar

refractions of alkanes by simple QSPR models.

Hosoya (1971) found out that Wiener index (W) can also be obtained by

simply adding all the elements of the graph distance matrix above the main diagonal.

This not only offers an alternative route to W but also allows a particular extension of

W to cyclic structures. He extended the aforementioned definition to cyclic



compounds with the aid of the distance matrix and gave the formula for Wiener

descriptor/number as:

$$W = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} D_{ij}$$

Where Dij are the off-diagonal elements of D and N represents the total number of vertices in graph G.
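The two routes to W described above can be compared with a short sketch: the half-sum of the distance matrix entries (valid for any connected graph) and Wiener's original bond-by-bond product, which applies to acyclic structures only. The function names and the breadth-first-search implementation are assumptions chosen for illustration, not the procedures used in the cited papers.

```python
from collections import deque


def adjacency(n, edges):
    """Adjacency lists of an undirected graph given as an edge list."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def bfs_distances(adj, source, blocked=None):
    """Topological distances from source; optionally ignore one edge."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if blocked is not None and {u, w} == set(blocked):
                continue
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def wiener(n, edges):
    """W as the half-sum of all distance matrix entries (connected graph)."""
    adj = adjacency(n, edges)
    return sum(sum(bfs_distances(adj, s).values()) for s in range(n)) // 2


def wiener_edge_cut(n, edges):
    """W as the sum over bonds of N_L * N_R; valid for acyclic graphs only."""
    adj = adjacency(n, edges)
    total = 0
    for u, v in edges:
        n_left = len(bfs_distances(adj, u, blocked=(u, v)))
        total += n_left * (n - n_left)
    return total


N_BUTANE = [(0, 1), (1, 2), (2, 3)]
ISOBUTANE = [(0, 1), (0, 2), (0, 3)]
CYCLOHEXANE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

print(wiener(4, N_BUTANE), wiener_edge_cut(4, N_BUTANE))
print(wiener(4, ISOBUTANE), wiener(6, CYCLOHEXANE))
```

For the two butane isomers both routes agree, while only the distance matrix route is applicable to the six-membered ring.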

Hosoya, in 1971, also suggested for the first time a nontrivial single graph

descriptor (Z), for expressing structure-property relationship (Hosoya, 1971). Although

Hosoya’s Z descriptor has been associated with the adjacency matrix, it can be

classified among the distance matrix invariants due to the procedure used to calculate

p(G, k)

$$Z = \sum_{k=0}^{[n/2]} p(G, k)$$

Where n represents the number of graph vertices, p(G, k) denotes the number of ways in which k non-adjacent bonds can be selected from graph G, and the Gaussian brackets [ ] represent the greatest integer not exceeding n/2. The Z descriptor is calculated by summing the p(G,k) coefficients over all different k values.
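The counting of p(G, k) can be sketched with a simple recursion: every selection of mutually non-adjacent bonds either omits the first bond of the list or includes it, in which case all bonds sharing an atom with it are discarded. The function names below are hypothetical, p(G, 0) is taken as 1 by convention, and the recursion is exponential in the worst case, so it is meant only for small graphs.

```python
def matching_counts(edges):
    """Return [p(G,0), p(G,1), ...]: numbers of sets of k non-adjacent bonds."""
    if not edges:
        return [1]  # only the empty selection
    (u, v), rest = edges[0], edges[1:]
    # Selections that do not use the first bond.
    without = matching_counts(rest)
    # Selections that use it: drop every bond touching u or v.
    compatible = [e for e in rest if u not in e and v not in e]
    with_edge = matching_counts(compatible)
    out = [0] * max(len(without), len(with_edge) + 1)
    for k, count in enumerate(without):
        out[k] += count
    for k, count in enumerate(with_edge):
        out[k + 1] += count
    return out


def hosoya_z(edges):
    """Hosoya Z descriptor as the sum of all p(G, k)."""
    return sum(matching_counts(edges))


N_BUTANE = [(0, 1), (1, 2), (2, 3)]
ISOBUTANE = [(0, 1), (0, 2), (0, 3)]
CYCLOHEXANE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

print(matching_counts(N_BUTANE), hosoya_z(N_BUTANE))
print(hosoya_z(ISOBUTANE), hosoya_z(CYCLOHEXANE))
```

The lower value for the branched isomer than for the linear one illustrates the sensitivity of Z to branching.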

For linear graphs G, Z can also be defined as the summation of the absolute values of the coefficients in the characteristic polynomial:

$$P_G(X) = \sum_{k=0}^{m} (-1)^{k} \, p(G, k) \, X^{N-2k}$$

Where m represents the highest number of bonds disconnected to each other in G.

The Hosoya Z descriptor is correlated well with the mode of branching and ring

closure and can be used as first sorting device for coding or retrieving the structures of

the compounds with or without rings (Hosoya, 1972).

Hosoya’s index Z has some advantages over other graph invariants:

It possesses high discriminating power.

It can be computed either from the molecular structure or from the polynomial.

It is extendable to the distance polynomial and the matching polynomial, as it is defined through the counting polynomial, which is closely related to the characteristic polynomial.

It is able to consider the effect of electronic systems.

Hosoya et al. (1973) reported the distance polynomial as the characteristic

polynomial of the distance matrix of the molecular graph and defined the Hosoya Z′

descriptor :

$$Z' = \sum_{i=0}^{n} |c_i|$$

Where ci are the coefficients of the distance polynomial of the molecular graph.

Rouvray (1973) defined the total sum of the row entries of the distance matrix

as the new topological descriptor called Rouvray descriptor, denoted by IROUV. It is

actually twice the Wiener descriptor, W (Rouvray, 1976) and expressed as:

$$I_{ROUV} = \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij} = \sum_{i=1}^{n} S_i = 2W$$

Hosoya et al. (1975) proposed a generalization of the Z descriptor to account for the contributions made by unsaturated systems in the structure of a molecule. The π-energy Z descriptor was given as:

$$Z = \sum_{i=0}^{n} S_i$$

Hosoya's Z descriptor, despite being discriminatory, possesses the following limitations:

1. It represents a vague graph topological nature of the molecular structure with respect to branching and cyclization.

2. No provision to consider heteroatoms.

3. Sophisticated programs are needed to compute the Z descriptor as the size, branching and compactness of a molecule increase.

Analogous to the above, a more generalized total π-energy descriptor, Z*, is defined as:

$$Z^{*} = \sum_{i=0}^{n} \left( s_i^{2} + A_i^{2} \right)^{0.5}$$



A Huckel molecular orbital (HMO) characteristic polynomial P(i) is divided into an

even function S(i) and an odd function A(i) with respect to i (Aihara, 1976).

Bonchev et al. (1980) developed total distance rank (dr), based on the distance

sum (vdi) of the vertex vi. It's defined as the summation of all the elements (dij) of the

ith

row of the distance matrix D.

min1

n

j

ijddr

Where dij are the elements of the distance matrix.

Balaban (1982; 1983) introduced three new distance-based topological descriptors denoted by D, D1 and J. The first one, D, named the mean square distance descriptor, may be represented as:

$$D = \left[ \frac{\sum_i i^{k} \cdot g_i}{\sum_i g_i} \right]^{1/k}$$

Where gi is number of occurrences of distances of length i.

This descriptor possesses good discriminating ability, especially for acyclic graphs;

however, it shows poor performance for polycyclic graphs.

The second invariant, known as end point mean (square) distance topological

descriptor, D1, was calculated by taking only distances between endpoints.

The third one was the Balaban distance connectivity descriptor (also called distance connectivity descriptor or average distance sum connectivity), denoted as J. At the time of its inception, J was one of the most discriminating molecular descriptors and its values did not increase substantially with molecular size or number of rings. Further, J was claimed to have the lowest degeneracy of all TDs proposed till the time of its proposal. It is defined as:

$$J = \frac{B}{C+1} \sum_{b} (\zeta_i \cdot \zeta_j)_b^{-1/2} = \frac{1}{C+1} \sum_{b} (\zeta_i^{*} \cdot \zeta_j^{*})_b^{-1/2}$$

Where the summation runs over all the molecular bonds b, ζi and ζj are the vertex distance degrees of the adjoining atoms, C is the cyclomatic number representing the number of rings in the graph and B is the number of bonds in the molecular graph G. The denominator C+1 is a normalization factor against the number of rings in the molecule. ζi* = ζi /B is the average vertex distance degree.
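A minimal sketch of J for simple (unweighted, heteroatom-free) graphs follows. It assumes J = B/(C+1) times the sum over bonds of (ζi·ζj)^(-1/2), with the cyclomatic number taken as C = B - n + 1 for a connected graph; the function names are hypothetical.

```python
from collections import deque
from math import sqrt


def distance_sums(n, edges):
    """Vertex distance degrees: row sums of the topological distance matrix."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    sums = []
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        sums.append(sum(dist.values()))
    return sums


def balaban_j(n, edges):
    """Balaban J for a connected graph without bond or atom weights."""
    b = len(edges)
    c = b - n + 1  # cyclomatic number of a connected graph
    zeta = distance_sums(n, edges)
    return b / (c + 1) * sum(1 / sqrt(zeta[i] * zeta[j]) for i, j in edges)


N_BUTANE = [(0, 1), (1, 2), (2, 3)]
CYCLOHEXANE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

print(round(balaban_j(4, N_BUTANE), 4))
print(round(balaban_j(6, CYCLOHEXANE), 4))
```

The normalization by C+1 is what keeps the value for the six-membered ring close to that of the four-carbon chain despite the doubling of the bond count.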

Barysz et al. (1983) introduced a Wiener-type descriptor derived from the Z-weighted distance matrix [W(Gvew)]. It may be defined as the summation of all the elements in the upper triangular Barysz matrix as well as all the diagonal elements thereof, as per the following:

$$W(G_{vew}) = \sum_{i} d_{ii} + \frac{1}{2} \sum_{i} \sum_{j \neq i} d_{ij}$$

Where

$$d_{ii} = 1 - \frac{6}{Z_i}$$

Where Zi is the number of all (valence and inner shell) electrons in the atom i.

The off-diagonal elements of the distance matrix for the vertex- and edge-

weighted (multi) graphs are defined as:

$$d_{ij} = \sum_{r} k_r$$

Where the summation goes over the r (r = 1, 2, ...) bonds, weighted and unweighted, while the parameter kr is given by (Barysz et al., 1983):

$$k_r = \frac{1}{b_r} \cdot \frac{36}{Z_i \cdot Z_j}$$

The values of br for a single bond, double bond, triple bond and an aromatic bond are 1, 2, 3 and 1.5 respectively; Zi and Zj denote the numbers of electrons in atoms i and j making up the r-bond (Barysz et al., 1983).

Broto et al. (1984a; 1984b) conceived autocorrelation descriptors of topological structure, namely the autocorrelation of a topological structure (ATS). This is a spatial autocorrelation defined on a molecular graph G as:

$$ATS_d = \sum_{i=1}^{A} \sum_{j=1}^{A} \delta_{ij} \cdot (W_i \cdot W_j)_d$$

Where W is any atomic property, A is the atom number, d is the considered topological distance (i.e. the lag in autocorrelation terms), δij is the Kronecker delta (δij = 1 if dij = d, zero otherwise).

Randic (1984) and Szymanski et al. (1985) introduced novel molecular

descriptors known as Molecular ID numbers (MID). These can be defined as the sum of

all paths (weighted or non weighted) in a molecule (graph). These were mainly

proposed to unequivocally identify a molecule by a single real number, with the aim to

obtain high discriminating power. These descriptors carry considerable structural

information and are successfully used in QSPR/QSAR analysis.

The first MID proposed by Randic (1984), namely Randic connectivity ID

number (CID) can be defined as a weighted molecular path count as per the following

equation:

$$CID = A + \sum_{{}^{m}P_{ij}} w_{ij}$$

Where A is the number of atoms, mPij denotes a path of length m from the vertex vi to vertex vj, and wij is the path weight. The sum runs over all paths of the graph.

The weight wij is calculated by multiplying the edge connectivity of all m edges (bonds) of the path mPij as:

$$w_{ij} = \prod_{b=1}^{m} \left( \delta_{b(1)} \cdot \delta_{b(2)} \right)_b^{-1/2}$$

Where δb(1) and δb(2) are the vertex degrees of the two atoms incident to the bth edge and b runs over all of the m edges of the path.

Kier (1985) developed molecular shape descriptors or kappa descriptors,

based on the count of the two-bond fragments in order to encode the overall molecular

shape. They are calculated using the counts of path of length one (one bond), two

(two bond) and three (three bond) in hydrogen suppressed molecular graph of the

molecule, and correspondingly the kappa descriptors were defined as of first, second

and third order.

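The general formula for calculating kappa descriptors (mκ) is the following:

$$^{m}\kappa = \frac{2 \; {}^{m}P_{max} \; {}^{m}P_{min}}{({}^{m}P_i)^{2}}$$

The mPmax and mPmin values can be calculated directly from the number of non-hydrogen atoms (nH) in the molecules. Their substitution for paths of different length has resulted in the following equations for mκ:

$$^{1}\kappa = \frac{nH \, (nH-1)^{2}}{({}^{1}P_i)^{2}}$$

$$^{2}\kappa = \frac{(nH-1)(nH-2)^{2}}{({}^{2}P_i)^{2}}$$

$$^{3}\kappa = \frac{(nH-3)(nH-2)^{2}}{({}^{3}P_i)^{2}} \quad \text{when } nH \text{ is even}$$

$$^{3}\kappa = \frac{(nH-1)(nH-3)^{2}}{({}^{3}P_i)^{2}} \quad \text{when } nH \text{ is odd}$$

Where nH is the number of non-hydrogen atoms in the molecule.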

The kappa descriptors were derived assuming that all atoms in the molecule are equivalent. The influence on molecular shape of atoms other than carbon in the sp3 hybrid state is accounted for by the kappa alpha shape descriptors (mκα) (Kier, 1986a). They can be obtained by modifying each nH and mPi in the above equations by adding an α value:

$$\alpha = \frac{r(x)}{r(C_{sp^3})} - 1$$

Where r(x) is the covalent radius of atom x and r(C(sp3)) is the covalent radius of the carbon atom in the sp3 hybrid state (Kier, 1986b, 1986c).

Balaban (1986) modified the average distance-sum-connectivity descriptor, J, in order to account for both bond multiplicity and heteroatoms and defined two new descriptors, i.e. JX and JY, employing the fractional distance matrix. The quantities X and Y are recalculated atomic Sanderson electronegativities and covalent radii relative to the carbon atom, respectively, obtained as a function of the atomic number.

Balaban (1987) also proposed a molecular identification number namely

Balaban ID number (BID) which can be defined as:

$$BID = A + \sum_{{}^{m}P_{ij}} w_{ij}$$

Where A is the number of atoms, mPij denotes a path of length m from the vertex vi to vertex vj, and wij is the path weight. The sum runs over all paths of the graph.

The weight wij is calculated by multiplying the edge weights of all m edges (bonds) of the path mPij as per the following:

$$w_{ij} = \prod_{b=1}^{m} \left( \delta_{b(1)} \cdot \delta_{b(2)} \right)_b^{-1/2}$$

Where δb(1) and δb(2) are the vertex degrees of the two atoms incident to the bth edge and b runs over all of the m edges of the path.

Kier (1989) introduced shape flexibility descriptors. The flexibility of molecules depends upon the presence of cycles and/or branching. By combining the 1κα and 2κα descriptors with the number of non-hydrogen atoms (nHA) (for normalization), a further descriptor ф has been defined, which is considered to measure molecular flexibility.

$$\phi = \frac{{}^{1}\kappa_{\alpha} \cdot {}^{2}\kappa_{\alpha}}{nHA}$$

Where 1κα represents the information about the relative cyclicity and atom count of molecules and 2κα represents information about the relative spatial density or branching of molecules. The flexibility descriptor decreases with increased branching and cyclicity.

Schultz (1989) described Schultz molecular topological descriptor (MTI) by

using adjacency, valence, and distance matrices. The quantity is defined by the

following expression:

$$MTI = MTI(G) = \sum_{i=1}^{N} [v(A + D)]_i$$

Where G is the molecular graph considered, possessing N = N(G) vertices, A is the (N × N dimensional) adjacency matrix, D is the (N × N dimensional) distance matrix, and v = (v1, v2, ..., vN) is the (1 × N dimensional) vector of the vertex valencies (degrees) of the molecular graph G.
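The Schultz expression can be sketched directly from its matrix form, assuming MTI is the sum of the elements of the row vector v(A + D). The helper names below are hypothetical and the graph is assumed to be connected and free of heteroatoms.

```python
from collections import deque


def distance_matrix(n, edges):
    """Topological distance matrix of a connected graph via breadth-first search."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    dist = [[0] * n for _ in range(n)]
    for source in range(n):
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        for target, d in seen.items():
            dist[source][target] = d
    return dist


def schultz_mti(n, edges):
    """MTI = sum over columns of the row vector v(A + D)."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    d = distance_matrix(n, edges)
    v = [sum(row) for row in a]  # vertex degrees
    return sum(v[i] * (a[i][j] + d[i][j]) for i in range(n) for j in range(n))


PROPANE = [(0, 1), (1, 2)]
N_BUTANE = [(0, 1), (1, 2), (2, 3)]

print(schultz_mti(3, PROPANE), schultz_mti(4, N_BUTANE))
```

The vA part of the sum equals the sum of squared vertex degrees, so the descriptor combines degree information with the degree-weighted distance sums contributed by vD.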

Hall and Kier (1990) proposed topological state invariants Si, as numerical

values related to every vertex in a molecule which may encode information about the

topological environment of the vertex due to all other vertices in the molecule. The

molecular topological relationship to each other vertex is based on the encoding of

vertex information in all the paths emanating from that vertex and derived from

topological state matrix as:


$$S_i = VS_i(T) = \sum_{j=1}^{n} [T]_{ij}$$

Where VS stands for the row sum operator. Molecular topologically equivalent vertices

have identical values of molecular topological state descriptor and in-equivalent

vertices would have different values.

Balaban et al. (1990) used distance measure in defining a highly selective

connectivity based spectrum of TDs (denoted by: DMk) as:

$$DM_k = \left[ \sum_{j=1}^{n} \left( {}^{m}\chi_t - {}^{m}\chi_t(R) \right)^{k} \right]^{1/k}$$

Where the summation goes over all the connectivity descriptors of different type t up to the sixth order (m = 6); k is an integer parameter ranging from 1 to 5 (k=1 is the Manhattan distance, k=2 is the Euclidean distance). mχt and mχt(R) are the connectivity descriptors for the considered molecule and a reference molecule R, respectively (Balaban et al., 1990).

Lall (1991) defined Topological I-descriptor of a graph based on the

topological distances from a given vertex in the edge weighted graph of the organic

molecule and is calculated as:

$$I(G) = \frac{\sum_{r=1}^{N} n_r \, g_r}{\sum_{r=1}^{N} n_r}$$

Where nr is the number of vertices of the rth kind for which gr is the topological distance from the root in the edge-weighted graph, and the topological distance dij between the vertices i and j is defined as the distance associated with a minimum weight. The weights in the edge-weighted graphs correspond to k values of the Huckel parameter for the heteroatom.

Petitjean (1992) described graph theoretical shape coefficient I on the basis of

topological radius and diameter as:

$$I = \frac{D - R}{R}, \qquad 0 \le I \le 1$$

The topological radius (R) is defined as the smallest vertex eccentricity in the graph

and topological diameter (D) is defined as the largest vertex eccentricity in the graph.

He proposed the graph-theoretical bivariate repartition of the (R,D) pairs in the form

of "radius-diameter diagram" and observed unexpected partitioning/classification of

the compounds from the chemical abstracts services registry file in comparison to

other shape coefficients (Petitjean, 1992).
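A short sketch of the radius, diameter and shape coefficient follows, assuming I = (D - R)/R with vertex eccentricities taken from breadth-first-search distances on a connected graph. The function names are hypothetical.

```python
from collections import deque
from fractions import Fraction


def eccentricities(n, edges):
    """Eccentricity of each vertex: its largest distance to any other vertex."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    ecc = []
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        ecc.append(max(dist.values()))
    return ecc


def petitjean_shape(n, edges):
    """Return (R, D, I) with I = (D - R) / R as an exact fraction."""
    ecc = eccentricities(n, edges)
    radius, diameter = min(ecc), max(ecc)
    return radius, diameter, Fraction(diameter - radius, radius)


N_BUTANE = [(0, 1), (1, 2), (2, 3)]
N_PENTANE = [(0, 1), (1, 2), (2, 3), (3, 4)]
CYCLOHEXANE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

for name, n, g in [("butane", 4, N_BUTANE), ("pentane", 5, N_PENTANE),
                   ("cyclohexane", 6, CYCLOHEXANE)]:
    print(name, petitjean_shape(n, g))
```

The three examples span the range of the coefficient: the ring gives I = 0 because all eccentricities coincide, while the odd-membered chain reaches the upper limit I = 1.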

Randic (1993) constructed new graph matrices by modifying Wiener's procedure for the calculation of Wiener numbers in alkanes and reported the sequences (higher Wiener numbers KW) generated by summing the entries in the matrix for vertices at the same distances from one another. The matrix was termed the Wiener matrix and was extended to consider heteroatoms.

Ivanciuc et al. (1993a) introduced the reciprocal distance matrix D−1 by modifying the distance matrix so that each off-diagonal element is the reciprocal of the topological distance between the vertices. A local graph invariant termed the reciprocal distance sum, RDSi, derived from this matrix was defined as:

$$RDS_i = VS_i(D^{-1}) = \sum_{j=1}^{n} d_{ij}^{-1}$$

Where the symbol VS stands for the row sum operator.

Randic (1993) introduced a modified Wiener descriptor known as the hyper-Wiener descriptor, denoted as WW = WW(G). However, Randic's algorithm for computing the hyper-Wiener descriptor could be applied only to acyclic structures. It was later shown that WW can be computed for all structures as follows (Lukovits and Linert, 1994; Klein et al., 1995).

$$WW = \frac{1}{4} \sum_{i=1}^{N} \sum_{j=1}^{N} \left[ (D)_{ij}^{2} + (D)_{ij} \right]$$

Where the summation goes over all pairs of vertices i and j.

Plavsic et al. (1993) introduced the Harary descriptor or RDSUM descriptor (H), named in honor of Prof. Frank Harary. It is defined as the half-sum of the off-diagonal elements of the reciprocal molecular distance matrix Dr = Dr(G) as per the following:

$$H = 0.5 \cdot \sum_{i=1}^{N} \sum_{j=1}^{N} (D^{r})_{ij}$$
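The hyper-Wiener and Harary descriptors can both be sketched from the same distance matrix, assuming WW is the quarter-sum of (Dij² + Dij) and H the half-sum of 1/Dij over all ordered pairs of distinct vertices. The function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from collections import deque
from fractions import Fraction


def distance_matrix(n, edges):
    """Topological distance matrix of a connected graph via breadth-first search."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    dist = [[0] * n for _ in range(n)]
    for source in range(n):
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        for target, d in seen.items():
            dist[source][target] = d
    return dist


def hyper_wiener(n, edges):
    """WW = (1/4) * sum over i != j of (D_ij^2 + D_ij)."""
    d = distance_matrix(n, edges)
    total = sum(d[i][j] ** 2 + d[i][j] for i in range(n) for j in range(n) if i != j)
    return Fraction(total, 4)


def harary(n, edges):
    """H = (1/2) * sum over i != j of 1 / D_ij."""
    d = distance_matrix(n, edges)
    total = sum(Fraction(1, d[i][j]) for i in range(n) for j in range(n) if i != j)
    return total / 2


N_BUTANE = [(0, 1), (1, 2), (2, 3)]
CYCLOHEXANE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

print(hyper_wiener(4, N_BUTANE), harary(4, N_BUTANE))
print(hyper_wiener(6, CYCLOHEXANE), harary(6, CYCLOHEXANE))
```

WW emphasises long distances through the squared term, whereas H emphasises short ones through the reciprocal, so the two descriptors respond in opposite directions to molecular compactness.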

Ivanciuc et al. (1993a) defined two other MDs, known as RDSQ descriptor

and RDCHI descriptor, defined as:

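$$RDSQ = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} a_{ij} \, (RDS_i \cdot RDS_j)^{0.5}$$

$$RDCHI = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} a_{ij} \, (RDS_i \cdot RDS_j)^{-0.5}$$

Where RDSi is the reciprocal distance sum of vertex vi and aij is the corresponding element of the adjacency matrix.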

Balaban and Diudea (1993) reported Balaban DJ descriptor in terms of

modified vertex distance degrees, Si but using the formula of the matrix sum

descriptors as with Balaban J descriptor. When the weighting factor w is equal to one

and the multigraph factor is equal to zero then the descriptor DJ is related to the

Balaban descriptor J by the following:

GEij

ji SSC

BJ

C

BDJ

2/1

1.2.

1.2

Gutman (1994a) proposed a Schultz-type topological descriptor, namely the Gutman molecular topological descriptor by valence vertex degrees (SG). It may be defined as:

$$S_G = \sum_{i=1}^{N} \sum_{j=1}^{N} v_i \, v_j \cdot D_{ij}$$

Where vivj.Dij is the topological distance between the vertices vi and vj weighted by the product of the endpoint vertex degrees. Like the Schultz molecular topological descriptor, the Gutman molecular topological descriptor is a vertex-valency-weighted analogue of the Wiener descriptor, whereas the weighting factor is multiplicative instead of additive.

Randic et al. (1994a) proposed the first eigenvalue of the Wiener matrix as an alternative descriptor for molecular branching. The JJ descriptor was developed in analogy with the connectivity descriptor χ from the adjacency matrix and Balaban's descriptor J from the distance matrix. It can be calculated as:

$$JJ = \sum_{b} (R_i \cdot R_j)_b^{-1/2}$$

Where Ri is the Wiener matrix degree.

Randic et al. (1994b) also proposed new structural invariants based upon distance/distance matrices (DD matrices) for graphs which are embedded on two- and three-dimensional grids. The leading eigenvalue of these matrices, taken as λ/n for path graphs, was reported to be a descriptor of folding. The ratio Φ = λ/n approaches 1 (one) for geometrically linear structures while it approaches 0 (zero) as the path graph is repeatedly folded.

Ivanciuc and Balaban (1994) reported two path matrices based theoretical

invariants called maximum path sum (MPS) and maximum minimum path sum

(MmPS) topological descriptor. MPS topological descriptor is defined as the sum of

the number of bonds on the longest path between any two vertices in the molecular

graph i.e. half-sum of the elements of the maximum path matrix MP, whereas, the

MmPS topological descriptor was defined as the sum of the longest and shortest path

between any two vertices in the molecular graph i.e. sum of the elements of the

max/min path matrix MmP. These descriptors were represented as:

$$MPS(G) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} [MP]_{ij}$$

$$MmPS(G) = \sum_{i=1}^{n} \sum_{j=1}^{n} [MmP]_{ij} = W(G) + MPS(G)$$

If G is an acyclic graph then:

MPS(G) = MmPS(G)/2 = W(G)

Where W(G) is the Wiener descriptor of a molecular graph G.

Khadikar et al. (1995) proposed Szeged descriptor (Sz), analogous to Wiener

descriptor, valid both for acyclic and cyclic graphs. Wiener descriptor is the sum of

the product of the number of vertices on each side of a bond, while the Szeged

descriptor is defined as the sum of the product of the number of vertices closer to the

atoms on each side of a bond (Gutman and Klavzar, 1995). Szeged descriptor was

defined as:

$$Sz = Sz(G) = \sum_{(u,v)} n_u \cdot n_v$$

Where nu stands for the number of vertices nearer to the vertex u than to v, nv stands for the number of vertices nearer to the vertex v than to u, and the summation goes over all edges (u, v) in a cyclic graph G.
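The Szeged descriptor can be sketched by counting, for every edge, the vertices strictly closer to each of its two end points; vertices equidistant from both ends are not counted. The function names are hypothetical and a connected graph is assumed.

```python
from collections import deque


def distance_matrix(n, edges):
    """Topological distance matrix of a connected graph via breadth-first search."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    dist = [[0] * n for _ in range(n)]
    for source in range(n):
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        for target, d in seen.items():
            dist[source][target] = d
    return dist


def szeged(n, edges):
    """Sz = sum over edges (u, v) of n_u * n_v."""
    d = distance_matrix(n, edges)
    total = 0
    for u, v in edges:
        n_u = sum(1 for w in range(n) if d[w][u] < d[w][v])  # closer to u
        n_v = sum(1 for w in range(n) if d[w][v] < d[w][u])  # closer to v
        total += n_u * n_v
    return total


N_BUTANE = [(0, 1), (1, 2), (2, 3)]
CYCLOPENTANE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
CYCLOHEXANE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

print(szeged(4, N_BUTANE), szeged(5, CYCLOPENTANE), szeged(6, CYCLOHEXANE))
```

For the acyclic chain the result coincides with the Wiener descriptor, while for the rings the two descriptors differ, which is the sense in which Sz extends Wiener's bond-partition idea to cyclic graphs.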

Randic and Razinger (1995) introduced geometry dependent molecular

topographic descriptors which can be calculated from novel matrices whose elements

depend upon the molecular geometry. The entries in these matrices were either 3-D geometric distances between atoms or some modified function of inter-atomic distances.

Estrada and Gutman (1996) described a novel molecular topological

descriptor MTI (E) based on edge-distances in molecular graphs in analogy with

Schultz molecular topological descriptor (Schultz, 1989). The edge-based version of

the molecular topological descriptor may be expressed as:

$$MTI(E) = \sum_{i=1}^{n} [v^{e}(A^{e} + D^{e})]_i$$

Where Ae is the edge adjacency matrix, De is the edge distance matrix and vei is the degree of the ith edge of the molecular graph G. Distances between vertices of the respective line graph are described as edge-distances in a graph. A simple relation was found between edge-distances and the distances between the vertices that are incident to the respective edges.

Lukovits (1996) introduced a Wiener-type descriptor, originally called the MPS topological descriptor (Ivanciuc and Balaban, 1994) but usually known as the detour descriptor and denoted as ω. The detour descriptor is calculated as the sum of the detour distances between any two vertices in the molecular graph G as:

$$\omega = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\Delta)_{ij}$$

Where (∆)ij represents the length of a longest path between vertices i and j of G.

Diudea (1996a; 1996b) described a novel unsymmetrical square matrix, CJu,

for calculating both Wiener (W) and hyper-Wiener (WW) numbers. This matrix is

constructed by using the principle of single endpoint characterization of paths.



Diudea (1997a; 1997b) reported several descriptors from the Cluj matrices,

either as the half-sum of entries in the corresponding symmetric matrices or directly

from the unsymmetric matrices as:

$$TI_{e/p} = \sum_{e/p} [M_u]_{ij} \, [M_u]_{ji}$$

When defined on edges, TIe is a Cluj descriptor, denoted by CDe or CΔe, depending on whether it is derived from the Cluj-distance or Cluj-detour matrix. Similarly, when defined on paths, TIp is a hyper-Cluj descriptor denoted by CDp or CΔp. The novel hyper-descriptor, CDp, showed good correlation with the boiling points of a selected set of cycloalkanes.

Linert and Lukovits (1997) introduced the hyper-detour descriptor (ωω) by replacing (D)ij by (∆)ij in the defining equation. It was defined similarly to the hyper-Wiener descriptor (WW), that is, as the quarter-sum of the entries of the sum-matrix made up from the off-diagonal elements of the detour matrix (∆)ij and their squares (∆)ij². It is calculated as:

$$\omega\omega = \frac{1}{4} \sum_{i=1}^{N} \sum_{j=1}^{N} \left[ (\Delta)_{ij} + (\Delta)_{ij}^{2} \right]$$

Since the Wiener descriptor (W) and the detour descriptor (ω) are identical for acyclic structures, the same is also true for their hyper counterparts.
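As an illustration, the detour and hyper-detour descriptors can be sketched for small hydrogen-depleted graphs. The adjacency-list encoding, the function names and the exhaustive depth-first search for longest paths are assumptions made for this example (the search is only practical for small molecules); they are not taken from the cited papers.

```python
def detour_matrix(adj):
    """Longest simple-path length between every pair of vertices (exhaustive DFS)."""
    n = len(adj)
    delta = [[0] * n for _ in range(n)]

    def extend(start, v, visited, length):
        # Record the longest path found so far from start to v.
        delta[start][v] = max(delta[start][v], length)
        for w in adj[v]:
            if w not in visited:
                visited.add(w)
                extend(start, w, visited, length + 1)
                visited.discard(w)

    for s in range(n):
        extend(s, s, {s}, 0)
    return delta


def detour_index(adj):
    """Half-sum of all detour matrix entries."""
    return sum(map(sum, detour_matrix(adj))) // 2


def hyper_detour_index(adj):
    """Quarter-sum of the detour distances plus their squares."""
    return sum(d + d * d for row in detour_matrix(adj) for d in row) // 4


# Hypothetical test graphs: n-butane (path) and cyclobutane (4-membered ring).
BUTANE = [[1], [0, 2], [1, 3], [2]]
CYCLOBUTANE = [[1, 3], [0, 2], [1, 3], [0, 2]]

print(detour_index(CYCLOBUTANE))        # 16
print(hyper_detour_index(CYCLOBUTANE))  # 30
```

For the acyclic graph the two functions return 10 and 15, which are the Wiener and hyper-Wiener values of n-butane, in line with the statement that the detour-based and distance-based descriptors coincide for acyclic structures.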

Lukovits (1998) introduced the all-path Wiener descriptor ($W^{AP}$), also called the Pasareti descriptor (P), as an all-path version of the Wiener descriptor with more discriminating power among cycle-containing structures. It is defined as:

$$W^{AP} = P = \sum_{i=1}^{A-1} \sum_{j=i+1}^{A} \sum_{p_{ij}} |p_{ij}|$$

Where the two outer sums on the right side run over all pairs of vertices in the graph, the inner sum runs over all paths pij from vertex vi to vertex vj, and |pij| denotes the length of the considered path. The name Pasareti descriptor was given because the descriptor was derived in the home of Lukovits, which is located in the part of Budapest called Pasarét.

The descriptor was later transformed into a new variant V = V(G), called the Verhalom descriptor, which is calculated as:

V = P/k

Where k is the total number of paths in G divided by N(N - 1)/2. The name Verhalom descriptor was given to this distance-related descriptor because it originated in the Chemical Research Center of the Hungarian Academy of Sciences, located in the Verhalom district of Budapest.

Plavsic et al. (1998) described the Wiener-sum or D/Δ descriptor of a molecular graph G. It was defined as the half-sum of the off-diagonal elements of the distance/detour quotient matrix D/Δ:

$$WS = \frac{1}{2} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} \frac{(D)_{ij}}{(\Delta)_{ij}}$$

The Wiener-sum or D/Δ descriptor decreases as the cyclicity of the molecule increases. Further, they also suggested the inverse Wiener-sum or Δ/D descriptor, which was defined as the half-sum of the off-diagonal elements of the detour/distance quotient matrix Δ/D:

$$\Delta/D = Wi(\Delta/D) = \frac{1}{2} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} \frac{(\Delta)_{ij}}{(D)_{ij}}$$

Where Wi is the Wiener operator. Both descriptors have so far been used only for the structure-boiling point modeling of condensed benzenoid hydrocarbons.

Diudea et al. (1998) defined new Harary-type descriptors on the basis of

detour and Cluj-detour matrices as:

$$H_{M} = \frac{1}{2} \sum_{i} \sum_{j \neq i} \frac{1}{[M]_{ij}}$$

The symbol M stands for detour (Δe and Δp) and Cluj-detour matrices (CJΔe and

CJΔp). Additionally, the usefulness of Cluj descriptors and their Harary counterparts

in modeling of physicochemical properties of chemical structures was also

demonstrated.

Gupta et al. (1999) introduced a novel pendent-distance based graph invariant known as the superpendentic descriptor ($\int^{P}$) to enhance the role of terminal vertices in [(Q)SAR/SPR] studies. This descriptor was defined as the square root of the summation of the products of the non-zero row elements in the pendent matrix, which is a submatrix of the distance matrix. The descriptor can be calculated as:

$$\int^{P} = \left( \sum_{i=1}^{m} \prod_{j=1}^{n} P_{(i,j)} \right)^{0.5}$$

Where m and n are the maximum possible values of i and j respectively, and the distance P(vi, vj|G) is the length of the shortest path connecting vertices vi and vj.
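A minimal sketch of the superpendentic calculation is given below. It assumes that the pendent matrix keeps one row per vertex and one column per pendent (degree-one) vertex of the distance matrix; this reading of "submatrix", the adjacency-list encoding and all names are assumptions of the example.

```python
import math
from collections import deque


def distance_matrix(adj):
    """All-pairs shortest-path lengths of a connected graph via BFS."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s}
        queue = deque([(s, 0)])
        while queue:
            v, d = queue.popleft()
            dist[s][v] = d
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
    return dist


def superpendentic_index(adj):
    """Square root of the sum, over rows, of the product of non-zero pendent-column entries."""
    dist = distance_matrix(adj)
    pendent = [v for v, nbrs in enumerate(adj) if len(nbrs) == 1]
    total = 0
    for row in dist:
        entries = [row[p] for p in pendent if row[p] != 0]
        if entries:  # rows without non-zero entries contribute nothing
            total += math.prod(entries)
    return math.sqrt(total)


# Hypothetical test graphs: n-butane and isobutane (2-methylpropane).
BUTANE = [[1], [0, 2], [1, 3], [2]]
ISOBUTANE = [[1, 2, 3], [0], [0], [0]]

print(superpendentic_index(BUTANE))     # square root of 10
print(superpendentic_index(ISOBUTANE))  # square root of 13
```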

Galvez et al. (2000) introduced novel invariants developed from the physical

model of wave interferences, known as differences of path lengths (DPs) and

demonstrated that the mean global kinetic energy of the electrons can be simply

measured from the overall sum of the inverse of the squares of the differences of

distances between all pairs of vertices of the graph. These invariants were employed to

predict the resonance energies and in the evaluation of biological properties such as

antibacterial activities of a wide set of heterogeneous compounds.

Gutman et al. (2000) described a novel molecular structure descriptor called the multiplicative Wiener descriptor, π(G), defined as the product of the distances between all pairs of vertices of the underlying molecular graph. It can be calculated as:

$$\pi(G) = \prod_{\substack{i<j \\ i,j \in V(G)}} d_{ij}$$

Due to the very large numbers that are often reached by π(G), its logarithmic version, i.e. ln π(G), seems to be more appropriate in searching for (Q)SAR/QSPR models (Todeschini and Consonni, 2009).
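The product over vertex pairs and its logarithmic version can be sketched as follows; the graph encoding and function names are assumptions of this example.

```python
import math
from collections import deque


def distance_matrix(adj):
    """All-pairs shortest-path lengths of a connected graph via BFS."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s}
        queue = deque([(s, 0)])
        while queue:
            v, d = queue.popleft()
            dist[s][v] = d
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
    return dist


def multiplicative_wiener(adj):
    """Product of the topological distances over all unordered vertex pairs."""
    dist = distance_matrix(adj)
    n = len(adj)
    product = 1
    for i in range(n):
        for j in range(i + 1, n):
            product *= dist[i][j]
    return product


def log_multiplicative_wiener(adj):
    """ln of the product, accumulated as a sum of logs to avoid very large integers."""
    dist = distance_matrix(adj)
    n = len(adj)
    return sum(math.log(dist[i][j]) for i in range(n) for j in range(i + 1, n))


# Hypothetical test graphs: n-butane and isobutane.
BUTANE = [[1], [0, 2], [1, 3], [2]]
ISOBUTANE = [[1, 2, 3], [0], [0], [0]]

print(multiplicative_wiener(BUTANE))     # 12
print(multiplicative_wiener(ISOBUTANE))  # 8
```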

Balaban et al. (2000) conceptualized the reverse Wiener matrix (RW), obtained by subtracting all topological distances from the graph diameter, with zeros as diagonal elements. New integer-number graph invariants ζi were obtained by summing over the rows or columns of this matrix. The half-sum of the ζi is also a novel topological descriptor, known as the reverse Wiener descriptor Λ. The general formula for calculating the reverse Wiener descriptor Λ is:

$$\Lambda = 0.5 \sum_{i=1}^{n} \zeta_{i} = 0.5 \sum_{i=1}^{n} \sum_{j=1}^{n} [RW]_{ij} = 0.5\, N(N-1)\, d - W$$

Where [RW]ij are the elements of the reverse Wiener matrix, N is the number of vertices, d is the graph diameter and W is the Wiener descriptor.

They demonstrated that the descriptor value of both W and Λ increases as the size of

graphs increases. However, the value of W increases sharply in comparison to the

value of Λ for strongly branched graphs whereas the value of Λ increases sharply in

comparison to the value of W for lesser branched graphs such as the linear graphs.
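The relation between the reverse Wiener matrix, its row sums and the closed form 0.5 N(N-1)d - W can be checked numerically with a short sketch; the graph encoding and function names are assumptions of this example.

```python
from collections import deque


def distance_matrix(adj):
    """All-pairs shortest-path lengths of a connected graph via BFS."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s}
        queue = deque([(s, 0)])
        while queue:
            v, d = queue.popleft()
            dist[s][v] = d
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
    return dist


def wiener(adj):
    """Half-sum of the distance matrix."""
    return sum(map(sum, distance_matrix(adj))) // 2


def reverse_wiener(adj):
    """Half-sum of the reverse Wiener matrix (diameter minus distance, zero diagonal)."""
    dist = distance_matrix(adj)
    n = len(adj)
    diameter = max(max(row) for row in dist)
    rw = [[diameter - dist[i][j] if i != j else 0 for j in range(n)]
          for i in range(n)]
    row_sums = [sum(row) for row in rw]  # the vertex invariants
    return sum(row_sums) // 2


# Hypothetical test graphs: n-butane (diameter 3) and isobutane (diameter 2).
BUTANE = [[1], [0, 2], [1, 3], [2]]
ISOBUTANE = [[1, 2, 3], [0], [0], [0]]

print(reverse_wiener(BUTANE))     # 8, i.e. 0.5*4*3*3 - 10
print(reverse_wiener(ISOBUTANE))  # 3, i.e. 0.5*4*3*2 - 9
```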

Li et al. (2000) defined new quantum-topology descriptor by modifying the

base of molecular connectivity descriptor i.e. δ values. The novel quantum-topology

descriptor was expressed in terms of modified molecular connectivity descriptor as:

$${}^{m}\chi_{t} = \sum_{j=1}^{n} \prod_{i=1}^{m+1} \left( \delta_{i} \right)_{j}^{-0.5}$$

Where n and m are the number and the rank of subfigures respectively, δ is the

atomic delta value, and t is the type of subfigure.

The quantum-topology descriptor showed improved ability in correlating molecular features with the force constant, bonding energy and radius of a heterogeneous set of monohydrides as compared to the original descriptor. The combination of topological and quantum parameters was proposed as the reason for the improved correlating ability of the novel quantum-topology descriptor.

Espeso et al. (2000) conceptualized molecular descriptors kZ, based on kZ

matrices i.e. second decomposition of Hosoya Z matrix. The kZ descriptor was

defined as:

$${}^{k}Z = \sum_{i<j} \left( {}^{k}Z \right)_{ij}$$

Where the summation runs over all off-diagonal entries kZij in the upper triangle of the

kZ matrix of T.

Khadikar et al. (2001a; 2001b) described a novel descriptor named the Padmakar-Ivan descriptor, abbreviated as the PI descriptor. The descriptor is simple to calculate and its discriminating power is comparable to that of the Wiener and Szeged descriptors. It can be defined as:

$$PI = PI(G) = \sum_{e \in E(G)} \left[ n_{ei}(e|G) + n_{ej}(e|G) \right]$$

Where nei(e|G) is the number of edges closer to vertex i than to vertex j of the edge e = ij, and nej(e|G) is the number of edges closer to j than to i. The summation goes over all edges of G. This descriptor does not coincide with the Wiener descriptor for acyclic trees. Thus, unlike the Sz descriptor, which coincides with the Wiener descriptor for acyclic graphs, the PI descriptor differs from the Wiener descriptor for both acyclic and cyclic graphs.
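A small sketch of the PI calculation follows. It assumes that the distance from an edge to a vertex is the smaller of the distances from the edge's two endpoints to that vertex, a common convention that is an assumption here; the graph encoding and names are also illustrative.

```python
from collections import deque


def distance_matrix(adj):
    """All-pairs shortest-path lengths of a connected graph via BFS."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s}
        queue = deque([(s, 0)])
        while queue:
            v, d = queue.popleft()
            dist[s][v] = d
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
    return dist


def pi_index(adj):
    """Sum over edges e = ij of the edges closer to i plus the edges closer to j."""
    dist = distance_matrix(adj)
    edges = [(u, v) for u in range(len(adj)) for v in adj[u] if u < v]

    def edge_to_vertex(f, w):
        # Assumed convention: an edge is as far from w as its nearer endpoint.
        return min(dist[f[0]][w], dist[f[1]][w])

    total = 0
    for i, j in edges:
        n_ei = sum(1 for f in edges if edge_to_vertex(f, i) < edge_to_vertex(f, j))
        n_ej = sum(1 for f in edges if edge_to_vertex(f, j) < edge_to_vertex(f, i))
        total += n_ei + n_ej  # equidistant edges (including e itself) are not counted
    return total


# Hypothetical test graphs: n-butane, cyclopropane and cyclobutane.
BUTANE = [[1], [0, 2], [1, 3], [2]]
CYCLOPROPANE = [[1, 2], [0, 2], [0, 1]]
CYCLOBUTANE = [[1, 3], [0, 2], [1, 3], [0, 2]]

print(pi_index(BUTANE))       # 6, whereas the Wiener value of n-butane is 10
print(pi_index(CYCLOBUTANE))  # 8
```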

Randic et al. (2001a) proposed the variable Balaban J descriptor, the "reversed" Balaban descriptor 1/J and a novel descriptor 1/JJ based on J and 1/J. The distance matrix and the "reversed" distance matrix were the basis of all the variable descriptors. The "reversed" distance matrix was constructed from the distance matrix by replacing the diagonal zeroes with the variables x, y, z (Randic and Pomp, 2001). The "reversed" Balaban descriptor 1/J can be expressed as:

$$1/J = \frac{B}{C+1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( {}^{c}S_{i} \cdot {}^{c}S_{j} \right)^{-0.5}$$

Where cSi are the row sums of the distance complement or reversed distance matrix.

Estrada and Molina (2001) conceptualized novel molecular invariants based

on local spectral moments of the bond matrix. Local spectral moments were defined as

the sum of diagonal entries of the different powers of the bond matrix corresponding

to a given molecular fragment. Mathematically, spectral moments of the bond matrix

were expressed as follows:

$$\mu_{k}(f) = \sum_{i=1}^{f} (e_{ii})^{k}$$

Where f is the corresponding fragment for which the moments are defined and the sum is carried out over all bonds forming the fragment f. The elements (eii)k are the diagonal elements of the kth power of the bond matrix.

Balaban et al. (2001) derived a general formula W=d/[3(v+0)] for the

normalized Wiener descriptor of polymers. It made possible the calculation of the graph

invariant directly from simple structural information: the number of atoms v, the

number of rings (0) in the repeating polymer, and the topological distance d between the

corresponding pairs of equivalent atoms in two neighboring monomer units. The

reciprocal relationship with the similar descriptor J was pointed out and an approximate

hyperbolic dependence is presented between these two descriptors.

Ivanciuc et al. (2001) introduced novel MDs by separating the terms of the Wiener polynomial into even and odd molecular graph distances. The even and odd Wiener polynomial sums WiPolE(X) and WiPolO(Y) were used as descriptors in (Q)SAR/QSPR models. The sum of the Wiener polynomial terms corresponding to even graph distances and the sum of the terms corresponding to odd graph distances were defined as:

$$WiPolE(X) = \sum_{k\ \mathrm{even}} f_{k}\, X^{k}, \qquad WiPolO(Y) = \sum_{k\ \mathrm{odd}} f_{k}\, Y^{k}$$

Where fk is the number of pairs of vertices located at a topological distance equal to k, and the summations go up to the maximal distance in the graph; X and Y are two independent variable parameters optimized during the modeling procedure.
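The even and odd polynomial sums can be sketched by first counting vertex pairs per distance and then evaluating the two partial sums. The graph encoding, function names and the values chosen for X and Y are assumptions of this example; in the cited work X and Y are optimized during modeling.

```python
from collections import Counter, deque


def distance_matrix(adj):
    """All-pairs shortest-path lengths of a connected graph via BFS."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s}
        queue = deque([(s, 0)])
        while queue:
            v, d = queue.popleft()
            dist[s][v] = d
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
    return dist


def distance_counts(adj):
    """Map each distance k to the number of unordered vertex pairs at that distance."""
    dist = distance_matrix(adj)
    n = len(adj)
    return Counter(dist[i][j] for i in range(n) for j in range(i + 1, n))


def wipol_even(adj, x):
    """Sum of f_k * x**k over even distances k."""
    return sum(f * x ** k for k, f in distance_counts(adj).items() if k % 2 == 0)


def wipol_odd(adj, y):
    """Sum of f_k * y**k over odd distances k."""
    return sum(f * y ** k for k, f in distance_counts(adj).items() if k % 2 == 1)


# Hypothetical test graph: n-butane, with f_1 = 3, f_2 = 2, f_3 = 1.
BUTANE = [[1], [0, 2], [1, 3], [2]]

print(wipol_even(BUTANE, 2))  # 8  (2 * 2**2)
print(wipol_odd(BUTANE, 2))   # 14 (3 * 2 + 1 * 2**3)
```

Setting X = Y = 1 recovers the total number of vertex pairs, which gives a quick sanity check on the distance counts.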

Bonchev (2001b) extended the approach of overall connectivity to overall distances, OW(G), for the characterization of molecular structure. The overall Wiener number OW(G) of any graph G was defined as the sum of the Wiener numbers Wi(Gi) of all K subgraphs of G:

$$OW(G) = \sum_{i=1}^{K} W_{i}(G_{i} \subset G)$$

This descriptor was also partitioned into eth-order terms, eOW(G), where e represents the number of edges in the subgraph. The topological complexity of acyclic and cyclic structures was measured in terms of overall distance.

Randic and Zupan (2001) modified the Wiener descriptor-W to W* and

Hosoya descriptor-Z to Z*. In the modified Wiener descriptor-W*, bond contributions

were determined using the reciprocal of the product of the number of atoms on each

side of a bond as:

$$W^{*} = \sum_{e=1}^{n_{e}} N_{i,e}^{-1}\, N_{j,e}^{-1}$$

Where Ni,e and Nj,e denote the number of vertices on each side of the edge e, including

vertex i and vertex j, respectively, and ne represents the total number of graph edges.

Similarly, Hosoya descriptor Z was modified to Z* by considering the frequency of

occurrence of carbon-carbon bonds in the patterns of disjoint bonds as:

$$Z^{*}(x_{1}, x_{2}, \ldots) = \sum_{k} a(G,k)\, x_{k}$$

Where xk are integer weights representing the number of times each edge has

appeared in all disjoint edge patterns. Randic and Zupan (2001) outlined a general


scheme for partitioning of MDs into bond contributions for descriptors derived from a

selection of matrices associated with molecular graphs.

Cao and Yuan (2001) conceptualized a set of three novel MDs known as VDI, OEI and RDI, based upon distance, vertex and ring (in cyclic compounds). These descriptors were defined as:

1. Vertex degree-distance descriptor (VDI):

$$VDI = \left( \prod_{i=1}^{n} f_{i} \right)^{1/N}$$

Where fi are the elements of the (1xN) vector VS, which is derived by multiplying the vertex degree matrix (V) and the derivative distance matrix (S).

2. The odd-even descriptor (OEI):

$$OEI = \sum_{i=1}^{n} \sum_{j=1}^{n} (-1)^{D_{ij}-1}\, S_{ij}$$

Where n represents the number of vertices in the molecular graph and S denotes the derivative matrix of the distance matrix D, whose elements are the squares of the reciprocal distances.

3. Ring degree-distance descriptor (RDI):

$$RDI = \left( \prod_{i=1}^{n} g_{i} \right)^{1/N}$$

Where gi are the elements of a (1xN) vector.

The usefulness of these descriptors was demonstrated through QSPR models for the prediction of boiling points of acyclic, monocyclic, and polycyclic alkanes (n=343).

Mercader et al. (2001) investigated the applications of TDs based on distance

and detour distance matrices. They employed some usual TDs based on both these

distances to investigate the heat of formation of a set of structurally diverse

hydrocarbons. Surprisingly, TDs based on detour matrix yielded better correlations to

predict enthalpies of formation.

Ivanciuc and Klein (2002) introduced efficient algorithms for the computation

of several distance based topological descriptors of a molecular graph from the distance

invariants of its subgraphs. The procedure utilized vertex- and edge- weighted


molecular graphs that account for the multiple bonds as well as the presence of

heteroatoms in the organic compounds.

Gupta et al. (2002b) conceptualized a novel distance based topological descriptor termed the eccentric distance sum descriptor. The eccentric distance sum, denoted by $\xi^{DS}$, can be defined as the sum of the products of eccentricity and distance sum of each vertex in the hydrogen depleted molecular graph as per the following equation:

$$\xi^{DS}(G) = \sum_{i=1}^{n} (E_{i} * S_{i})$$

Where Si and Ei are the distance sum and eccentricity of vertex i respectively in graph G having n vertices. The eccentric distance sum takes into consideration the eccentricity and distance sum of all vertices in the graph.
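The eccentric distance sum can be sketched directly from the distance matrix, taking the eccentricity of a vertex as the largest entry in its row and the distance sum as the row total. The graph encoding and function names are assumptions of this example.

```python
from collections import deque


def distance_matrix(adj):
    """All-pairs shortest-path lengths of a connected graph via BFS."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s}
        queue = deque([(s, 0)])
        while queue:
            v, d = queue.popleft()
            dist[s][v] = d
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
    return dist


def eccentric_distance_sum(adj):
    """Sum over vertices of eccentricity times distance sum."""
    dist = distance_matrix(adj)
    return sum(max(row) * sum(row) for row in dist)


# Hypothetical test graphs: n-butane and isobutane.
BUTANE = [[1], [0, 2], [1, 3], [2]]
ISOBUTANE = [[1, 2, 3], [0], [0], [0]]

print(eccentric_distance_sum(BUTANE))     # 52 = 3*6 + 2*4 + 2*4 + 3*6
print(eccentric_distance_sum(ISOBUTANE))  # 33 = 1*3 + 3 * (2*5)
```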

Cao and Yuan (2002) conceptualized a novel topological descriptor, VDI(±1), based on the vertex degree-distance descriptor (VDI), to distinguish the cis/trans isomers of cycloalkanes. They substituted the derivative matrix S with Dmod to obtain VDI(±1). This new structural descriptor showed better QSPR results than VDI.

Castro et al. (2002) reported some upgraded versions of the Wiener descriptor: a) by using the sum of the bond lengths along the shortest path instead of the graph theoretical distance; b) by using the Euclidean distance between the respective pairs of atoms; or c) by using hydrogen filled graphs. However, none of these theoretically justifiable modifications of the Wiener descriptor improved the applicability and value of these structure descriptors in designing quantitative structure-property relations. They concluded that the original Wiener descriptor, now already more than half a century old, is a much more valuable topological descriptor than one would expect from its extremely simple and seemingly naive definition.

Milicevic et al. (2003) extended the Z descriptor to general graphs and investigated its behavior with regard to different structural characteristics of graphs such as branching, cyclicity, size, loops and multiple edges. The Z counting polynomial and the matching polynomial are used to calculate this descriptor. These polynomials can be generated using proper recurrence relations. The structural behavior of the Z descriptor for simple graphs and general graphs was tested against the total walk count descriptor (twc), and it was found that the Z descriptor followed the structural changes, i.e. the value of the descriptor increased with the loops, multiple edges, size, cycles and branching.

Hu et al. (2003) proposed a new variable descriptor, external factor variable

connectivity descriptor (EFVCI), in which the atomic attribute was divided into two

parts i.e. internal and external attributes. Along with atomic attributes, the form of

molecular connectivity descriptor was used to get the external factor variable

connectivity descriptor:

$${}^{P}F_{EFVCI} = \sum_{\text{all edges}} \left( A_{i}\, A_{j} \cdots A_{n} \right)^{-0.5}$$

Where Ai is the attribute of carbon atom perturbed by other atoms.

This kind of descriptor can be regarded as an extension of the molecular connectivity

descriptor by using a new atomic attribute, which makes the descriptor flexible to

different properties.

Narumi (2003) defined two novel topological descriptors, based on the partition function of a graph, for analyzing the statistico-mechanical aspect of the Hosoya descriptor. These TDs were termed the bond descriptor B and the connective descriptor C.

The bond descriptor B was defined as:

1,,)(

0

GSkGnGBm

k

Where n (G,k) represents the number of different ways in which k bonds are selected

from graph G.

The connective descriptor C was defined as:

1,,)(

0

GRkGqGCm

k

Where m represents the maximum value of k for the molecular graph G.

Duchowicz et al. (2003) used TDs based on the distance and detour matrices to calculate the Gibbs free energy for a set of 60 hydrocarbons. The distance matrix considers the shortest path between any two vertices whereas the detour matrix considers the longest path between any two vertices. The results showed that the TDs derived from the detour matrix produce better correlations for predicting the Gibbs free energy. They concluded that the detour matrix is an appropriate topological tool to be applied in [(Q)SAR/SPR] analysis.

Yuan and Cao (2003) developed the edge degree-distance descriptor (EDI) and the sum of edges (Se), based on the edges and distances of the molecular graph, in order to distinguish saturated and unsaturated structures. The EDI was defined as:

$$EDI = \left( \prod_{i=1}^{n} h_{i} \right)^{1/N}$$

Where hi are the elements of the (1xN) vector ES, obtained by multiplying the edge degree vector E by the derivative distance matrix S:

ES = (h1, h2, ..., hn)

The sum of edges (Se) equals the half-sum of the edge degrees (Ei) of each vertex in the molecular graph:

$$S_{e} = 0.5 \sum_{i=1}^{n} E_{i}$$

Yuan and Cao (2003) suggested that the combination of these descriptors could represent the molecular structures not only of alkanes but also of alkenes, alkynes, and benzenoid hydrocarbons.

Randic (2004) described the construction of a novel MD, called the Wiener-

Hosoya descriptor (W-H), in view of its structural relationship to both the Wiener

number W and the Hosoya topological descriptor Z. This descriptor was expressed as:

W-H(G) = W(G) + W(G-ee)

Where W is the Wiener descriptor and W(G-ee) is a Wiener-type descriptor calculated

by summing the Wiener descriptors relative to the subgraphs obtained with deletion

of each edge and all incident edges to it, following an analogous approach to the

Hosoya Z descriptor calculation.

Klein et al. (2004) introduced 3D-topological distance based (3D-TDB)

descriptors by relating Euclidean to topological distances. The descriptors were tested

with three different data sets: the benchmark steroids, a well characterized

benzodiazepine set, and a set of β-cyclodextrin inclusion compounds. The predictive

abilities of models obtained with 3D-TDB descriptors were reported to be in good

agreement with those obtained from other 3D-(Q)SAR methods.


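Gutman et al. (2004) developed the modified Wiener descriptors mWλ as:

$${}^{m}W_{\lambda}(T) = \sum_{e} \left[ n_{1}(e)\, n_{2}(e) \right]^{\lambda}$$

Where n1(e) and n2(e) are the numbers of vertices lying on the two sides of the edge e of the tree T, and λ is a parameter that may assume different values. Clearly, for λ = +1 the modified Wiener descriptor mWλ reduces to the ordinary Wiener descriptor W.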
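A brief sketch of the modified Wiener descriptor for trees is shown below: for each edge, the vertices on either side are counted and their product is raised to the power λ. The graph encoding and function name are assumptions of this example, and the shortcut n2 = n - n1 is valid only for acyclic graphs.

```python
from collections import deque


def distance_matrix(adj):
    """All-pairs shortest-path lengths of a connected graph via BFS."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s}
        queue = deque([(s, 0)])
        while queue:
            v, d = queue.popleft()
            dist[s][v] = d
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
    return dist


def modified_wiener(tree_adj, lam):
    """Sum over tree edges of (n1 * n2) ** lam, where n1 and n2 count the two sides."""
    dist = distance_matrix(tree_adj)
    n = len(tree_adj)
    total = 0.0
    for u in range(n):
        for v in tree_adj[u]:
            if u < v:
                n1 = sum(1 for w in range(n) if dist[w][u] < dist[w][v])
                n2 = n - n1  # in a tree every vertex lies on exactly one side
                total += (n1 * n2) ** lam
    return total


# Hypothetical test graph: n-butane, with edge products 3, 4 and 3.
BUTANE = [[1], [0, 2], [1, 3], [2]]

print(modified_wiener(BUTANE, 1))   # 10.0, the ordinary Wiener value
print(modified_wiener(BUTANE, -1))  # 1/3 + 1/4 + 1/3
```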

Bajaj et al. (2004a) conceptualized a modification of Wiener's index, termed Wiener's topochemical descriptor (Wc), based on the topochemical distance matrix:

$$W_{c} = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} P_{i_{c} j_{c}}$$

Where Picjc denotes the chemical path length in the graph G, and n is the highest possible value of i and j. Wiener's topochemical descriptor was derived from the weighted molecular graph whose vertices were properly weighted with a selected chemical/physical property. It was not only sensitive to the nature, number, and relative position of heteroatoms but also exhibited far less degeneracy as compared to the Wiener descriptor.

Lu et al. (2006) conceptualized a novel MD called Lu descriptor which is

based on Wiener descriptor, for modeling properties of heteroatom and multiple bond

containing organic compounds. The Lu descriptor was defined as:

n

i

n

j

ij

n

k

vk SqnLu

1 11

5.0log

Where qk is the relative electronegativity value of vertex k, Sij represents the sum of v

power of the relative bond lengths between two adjacent vertices in the shortest path

between the vertices i and j and n represents the number of vertices in a molecular

graph G.

Diudea (2006) conceptualized a new counting polynomial, called the "Omega" polynomial Ω(G,x), on the ground of quasi-orthogonal cut ("qoc") edge strips in a bipartite lattice. The Omega polynomial Ω(G,x) for counting qocs was defined as:

$$\Omega(G,x) = \sum_{c} m(G,c)\, x^{c}$$

Where m(G, c) represents the number of qocs of length c. The summation runs up to the maximum length of qocs in G. The polynomial is an elegant form of topological description of lattice graphs. It is related to the well-known PI descriptor.

Balaban et al. (2007) developed five new TDs based on the numbers of paths pi with length i increasing from i = 1 (i.e. the number of edges) to the maximal value of i, which form the molecular path code. These descriptors were defined as:

1. Quadratic descriptor Q: $Q = \sum_{i} p_{i}^{2} / (\mu + 1)$

2. Descriptor S: $S = \sum_{i} p_{i}^{1/2} / (\mu + 1)$

3. Path count descriptor P: $P = \sum_{i} \{ p_{i}^{1/2} / [ i^{1/2} (\mu + 1) ] \}$

4. Distance-reduced descriptor D: $D = \sum_{i} \{ p_{i}^{1/2} / [ i (\mu + 1) ] \}$

5. Distance-attenuated descriptor A: $A = \sum_{i} \{ p_{i} / [ i (\mu + 1) ] \}$

Where μ is the cyclomatic number of the graph. Among these descriptors, the path-count descriptor P was found to be the least degenerate and also showed the best biparametric correlation with the normal boiling points of alkanes.

Zhou and Trinajstic (2008) described lower bounds for the Kirchhoff

descriptor (Bonchev et al., 1994) in terms of its structural parameters viz. the number

of edges (bonds), the number of vertices (atoms), maximum vertex degree (valency),

connectivity and chromatic number etc. The bounds of a descriptor furnish important

information of a molecule (graph) as they establish the approximate range of the

descriptor in terms of molecular structural parameters (Zhou and Trinajstic, 2008).

Iranmanesh et al. (2009) defined an edge version of the well known Wiener descriptor as:

$$W_{e0}(G) = 0.5 \sum_{\{e,f\} \subseteq E(G)} d_{0}(e,f)$$

Where

$$d_{0}(e,f) = \begin{cases} d_{1}(e,f) + 1 & \text{if } e \neq f \\ 0 & \text{if } e = f \end{cases}$$

Here, the distance between two edges of the graph G is taken as the distance between the corresponding vertices of its line graph.

Mahmiani et al. (2010) introduced the total version of the Szeged descriptor as:

$$Sz_{T}(G) = \sum_{e=uv \in E(G)} t_{e}(u)\, t_{e}(v)$$

Where te(u) represents the number of vertices and edges of G closer to u than to v and te(v) represents the number of vertices and edges of G closer to v than to u. The computation of this novel descriptor was exemplified for some well-known graphs and in particular for zigzag nanotubes.

Goyal et al. (2010) described a novel pendenticity based topochemical

descriptor termed as pendentic eccentricity descriptor expressed as:

n

i

m

j

iijp Ep

1 1

2/

Where P(ij) is the path length containing the least number of edges between vertices i

and j in graph G; Ei is the eccentricity of a vertex vi in G and n is the maximum

numbers of i and j.

Diudea et al. (2011) investigated the uniqueness (discriminating ability) of a

newly proposed CJN super descriptor using (real) atomic and synthetic structures.

This new descriptor distinguished all graphs uniquely and some MDs which are

embedded in the super descriptor have shown excellent correlating ability with

alkane properties.

Alaeiyan and Asadpour (2011) proposed a revised version of the well known Szeged descriptor of a molecular graph G as:

$$Sz^{*}(G) = \sum_{e=\{u,v\} \in E(G)} \left[ n(u,v) + \tfrac{1}{2}\, o(u,v) \right] \left[ n(v,u) + \tfrac{1}{2}\, o(v,u) \right]$$

Where n(u,v) and o(u,v) denote the number of vertices that are closer to u than to v and the number of vertices at the same distance from u and from v, respectively. They demonstrated the computation of the revised Szeged descriptor for bridged graphs.
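The revised Szeged descriptor can be sketched by counting, for every edge uv, the vertices closer to u, the vertices closer to v and the equidistant vertices, with half of the equidistant count added to each factor. The graph encoding and function name are assumptions of this example.

```python
from collections import deque


def distance_matrix(adj):
    """All-pairs shortest-path lengths of a connected graph via BFS."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s}
        queue = deque([(s, 0)])
        while queue:
            v, d = queue.popleft()
            dist[s][v] = d
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
    return dist


def revised_szeged(adj):
    """Sum over edges uv of [n(u,v) + o(u,v)/2] * [n(v,u) + o(u,v)/2]."""
    dist = distance_matrix(adj)
    n = len(adj)
    total = 0.0
    for u in range(n):
        for v in adj[u]:
            if u < v:
                n_uv = sum(1 for w in range(n) if dist[w][u] < dist[w][v])
                n_vu = sum(1 for w in range(n) if dist[w][v] < dist[w][u])
                o_uv = n - n_uv - n_vu  # vertices equidistant from u and v
                total += (n_uv + o_uv / 2) * (n_vu + o_uv / 2)
    return total


# Hypothetical test graphs: n-butane, cyclopropane and cyclobutane.
BUTANE = [[1], [0, 2], [1, 3], [2]]
CYCLOPROPANE = [[1, 2], [0, 2], [0, 1]]
CYCLOBUTANE = [[1, 3], [0, 2], [1, 3], [0, 2]]

print(revised_szeged(CYCLOPROPANE))  # 6.75 = 3 * (1.5 * 1.5)
print(revised_szeged(CYCLOBUTANE))   # 16.0 = 4 * (2 * 2)
```

Equidistant vertices occur only in graphs with odd cycles, so for trees and even rings the revised value equals the ordinary Szeged value.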

Dong et al. (2011) proposed a novel version of the edge-Szeged descriptor in

parallel to the revised (vertex) Szeged descriptor. The revised edge-Szeged descriptor

was defined as:

)(}{

* )]2/),(),([]2/),(),([)(

GEe

GGGGe uvnuvnvunvunGSz

Where nG(u,v) is the number of edges equidistant from both ends of e = uv ∈ E(G).

The lower and upper bounds were also demonstrated by them for this MD for various

graphs.


Iranmanesh et al. (2011) proposed a new version of the hyper-Wiener

descriptor as:

)()()(2

GWGWGWW deieiei

where

)(},{

2 4,0),,(5.0)(2

GEfe

id

ei ifedGW

The calculation of this edge version of hyper-Wiener descriptor was exemplified on

some well-known graphs such as path, cycle, complete graphs.

Deng (2011) introduced a novel variant topological descriptor for molecular

graphs, called sum-Balaban descriptor. For a simple and connected graph G with

vertex-set V (G) and edge-set E(G), sum-Balaban descriptor was defined as:

$$SJ(G) = \frac{B}{C+1} \sum_{ij \in E(G)} \left( S_{i} + S_{j} \right)^{-0.5}$$

Where Si and Sj are the distance sums of the vertices i and j respectively, B is the number of graph edges, and C is the cyclomatic number, that is, the number of rings. The predictive ability of this descriptor was investigated through QSPR modeling of some physicochemical properties of octanes.
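A minimal sketch of the sum-Balaban calculation follows, taking the cyclomatic number of a connected graph as B - N + 1 and summing the inverse square root of the added distance sums over all edges. The graph encoding and function name are assumptions of this example.

```python
import math
from collections import deque


def distance_matrix(adj):
    """All-pairs shortest-path lengths of a connected graph via BFS."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s}
        queue = deque([(s, 0)])
        while queue:
            v, d = queue.popleft()
            dist[s][v] = d
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
    return dist


def sum_balaban(adj):
    """B / (C + 1) times the sum over edges ij of (S_i + S_j) ** -0.5."""
    dist = distance_matrix(adj)
    n = len(adj)
    s = [sum(row) for row in dist]                  # distance sums
    edges = [(i, j) for i in range(n) for j in adj[i] if i < j]
    b = len(edges)                                  # number of edges
    c = b - n + 1                                   # cyclomatic number (connected graph)
    return b / (c + 1) * sum((s[i] + s[j]) ** -0.5 for i, j in edges)


# Hypothetical test graphs: n-butane and cyclobutane.
BUTANE = [[1], [0, 2], [1, 3], [2]]
CYCLOBUTANE = [[1, 3], [0, 2], [1, 3], [0, 2]]

print(sum_balaban(BUTANE))
print(sum_balaban(CYCLOBUTANE))
```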

Bruckler et al. (2011) deduced a new class of distance-based molecular

structure descriptors i.e. Q-descriptors with an aim to eliminate a general shortcoming

of the Wiener and Wiener-type descriptors, namely that the greatest contributions to

their numerical values come from vertex pairs at greatest distance. The Q-descriptor

may be represented by the following:

$$Q = \sum_{\{u,v\} \subseteq V(G)} \gamma(u,v)$$

Where γ(u,v) depends solely on the distance d(u,v) between the vertices u and v.

Q-descriptor was also related with the Hosoya polynomial as:

,2 GHQ

The multiplier 2 comes from the fact that each pairwise interaction has been counted

twice. Thus Q is an additive function of increments associated with pairs of vertices

of G.


Zhang et al. (2012) reported q-analogs of Wiener descriptor motivated by the

theory of hypergeometric series. Some possible chemical interpretations and

applications of the q-Wiener descriptors were also discussed.

Doslic and Reti (2012) investigated discriminating potential of traditional

degree-based descriptors and proposed a novel T(G) descriptor characterized by an

improved discriminating potential and reduced degeneracy. The T(G) descriptor was

expressed as:

1)(),(max

)(),(min

)(

1)(

)(}{ 21

21 GVu

umum

umum

GVGT

This descriptor was judged to be more efficient for discriminating between

topological structures of molecular graphs than several traditional molecular

descriptors.

Adjacency-cum-distance based graph invariants:

These descriptors employ the distance matrix as well as the adjacency matrix to characterize a molecular graph. As they combine the information of both matrices, these MDs contain considerably more topological information than MDs derived from a single matrix.

Sharma et al. (1997) conceptualized a novel, distance-cum-adjacency based MD termed the eccentric connectivity descriptor (ECI), defined as the summation of the products of eccentricity and degree of each vertex in the hydrogen depleted molecular graph. It can be expressed as:

$$\xi^{c} = \sum_{i=1}^{n} (E_{i} * V_{i})$$

Where n is the total number of vertices, Ei is the eccentricity and Vi is the degree of vertex i in graph G. The descriptor was successfully used for mathematical models of biological properties of diverse nature, i.e. anticonvulsant activity (Sardana and Madan, 2002b), CDK-1 inhibitory activity (Lather and Madan, 2005a), genotoxicity (Mosier et al., 2003; Linnan et al., 2005), anti-HIV activity (Kumar et al., 2004; Lather and Madan, 2005b; Dureja and Madan, 2009), anti-inflammatory activity (Gupta et al., 2002a), diuretic activity (Sardana and Madan, 2001), phospholipase A2 inhibitory activity (Kumar and Madan, 2006), glycogen synthase kinase-3 inhibitory activity (Kumar and Madan, 2005), carbonic anhydrase inhibitory activity (Kumar and Madan, 2007), anti-allergic activity (Kumar and Madan, 2007b), and adenosine receptor binding activity (Lather and Madan, 2004; Kumar and Madan, 2007c). The

mathematical properties of ECI have also been investigated extensively in the recent

past (Ilic, 2010; Doslic et al., 2010; Zhou and Du, 2010; Moradi and Baba-Rahim,

2013).
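The eccentric connectivity descriptor can be sketched as follows, with the eccentricity of a vertex taken as the largest entry of its distance matrix row and the degree as the length of its neighbour list. The graph encoding and function name are assumptions of this example.

```python
from collections import deque


def distance_matrix(adj):
    """All-pairs shortest-path lengths of a connected graph via BFS."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s}
        queue = deque([(s, 0)])
        while queue:
            v, d = queue.popleft()
            dist[s][v] = d
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
    return dist


def eccentric_connectivity(adj):
    """Sum over vertices of eccentricity times degree."""
    dist = distance_matrix(adj)
    return sum(max(dist[i]) * len(adj[i]) for i in range(len(adj)))


# Hypothetical test graphs: n-butane and isobutane.
BUTANE = [[1], [0, 2], [1, 3], [2]]
ISOBUTANE = [[1, 2, 3], [0], [0], [0]]

print(eccentric_connectivity(BUTANE))     # 14 = 3*1 + 2*2 + 2*2 + 3*1
print(eccentric_connectivity(ISOBUTANE))  # 9  = 1*3 + 3 * (2*1)
```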

Ren (1999) developed a novel topological descriptor based on the adjacency matrix and the distance matrix. It was denoted as the Xu descriptor and was claimed to have high discriminating power, particularly for molecular size and branching. It is defined as:

$$Xu = \sqrt{A} \cdot \log L = \sqrt{A} \cdot \log \left( \frac{\sum_{i=1}^{A} \delta_{i}\, \zeta_{i}^{2}}{\sum_{i=1}^{A} \delta_{i}\, \zeta_{i}} \right)$$

Where A represents the number of atoms and L represents the valence average topological distance calculated from the vertex degree δ and the vertex distance degree ζ of all the atoms. The Xu descriptor has better discriminatory power for alkane isomers and is very simple to calculate. The Xu descriptor promises to be a useful parameter in (Q)SAR/QSPR studies.

Gupta et al. (2000) conceptualized and developed adjacency-cum-path length

based topological descriptor termed as connective eccentricity descriptor (Cξ). It is

defined as the sum of the ratios of the degree of a vertex (Vi) and its eccentricity (Ei) for

all vertices in the hydrogen depleted molecular graph. It can be expressed as:

$$C^{\xi} = \sum_{i=1}^{n} V_{i} / E_{i}$$

The discriminating power and sensitivity of the connective eccentricity descriptor were found to be better than those of the well-known Balaban's mean square distance (MSD) descriptor. The utility of the connective eccentricity descriptor in structure-activity studies was investigated by developing models to predict the antihypertensive activity of 81 derivatives of N-benzylimidazole. The results obtained using the connective eccentricity descriptor were reported to be better than those obtained using Balaban's MSD descriptor.
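The connective eccentricity descriptor, the sum over all vertices of degree divided by eccentricity, can be sketched in the same way. The graph encoding and function name are assumptions of this example.

```python
from collections import deque


def distance_matrix(adj):
    """All-pairs shortest-path lengths of a connected graph via BFS."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s}
        queue = deque([(s, 0)])
        while queue:
            v, d = queue.popleft()
            dist[s][v] = d
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
    return dist


def connective_eccentricity(adj):
    """Sum over vertices of degree divided by eccentricity (graph needs at least 2 vertices)."""
    dist = distance_matrix(adj)
    return sum(len(adj[i]) / max(dist[i]) for i in range(len(adj)))


# Hypothetical test graphs: n-butane and isobutane.
BUTANE = [[1], [0, 2], [1, 3], [2]]
ISOBUTANE = [[1, 2, 3], [0], [0], [0]]

print(connective_eccentricity(BUTANE))     # 1/3 + 2/2 + 2/2 + 1/3
print(connective_eccentricity(ISOBUTANE))  # 4.5 = 3/1 + 3 * (1/2)
```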


Gupta et al. (2001) conceptualized a new descriptor, the eccentric adjacency descriptor ($\xi^{A}$), for the characterization of molecular structure, which can be simply calculated from a modified adjacency matrix termed the additive adjacency matrix. The eccentric adjacency descriptor was defined as the summation of the ratios of the sum of the degrees of adjacent vertices to the eccentricity of the concerned vertex, for each vertex in the hydrogen suppressed molecular structure:

$$\xi^{A} = \sum_{i=1}^{n} \frac{\zeta_{i}}{E_{i}}$$

Where ζi is the sum of the valence values of all the vertices adjacent to the concerned vertex in a hydrogen depleted molecular graph, n represents the total number of vertices and Ei represents the eccentricity of vertex i in graph G. This descriptor was found to be more sensitive than the first order molecular connectivity descriptor.
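A short sketch of the eccentric adjacency calculation is given below: for each vertex, the degrees of its neighbours are added and the total is divided by the vertex eccentricity. The graph encoding and function name are assumptions of this example.

```python
from collections import deque


def distance_matrix(adj):
    """All-pairs shortest-path lengths of a connected graph via BFS."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s}
        queue = deque([(s, 0)])
        while queue:
            v, d = queue.popleft()
            dist[s][v] = d
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
    return dist


def eccentric_adjacency(adj):
    """Sum over vertices of (sum of neighbour degrees) / eccentricity."""
    dist = distance_matrix(adj)
    total = 0.0
    for i, nbrs in enumerate(adj):
        neighbour_degree_sum = sum(len(adj[j]) for j in nbrs)
        total += neighbour_degree_sum / max(dist[i])
    return total


# Hypothetical test graphs: n-butane and isobutane.
BUTANE = [[1], [0, 2], [1, 3], [2]]
ISOBUTANE = [[1, 2, 3], [0], [0], [0]]

print(eccentric_adjacency(BUTANE))     # 2/3 + 3/2 + 3/2 + 2/3
print(eccentric_adjacency(ISOBUTANE))  # 7.5 = 3/1 + 3 * (3/2)
```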

Ren (2002a) conceptualized novel atom-type AI topological descriptors based

on the distance matrix and adjacency matrix of a graph to code the structural

environment of each atomic type in a molecule. The topological descriptor for any

atom type i in molecular graph, AIi, was defined as:

1iAI

iiii svsv /2

Where the parameter ɸ is considered as a perturbing term of the ith atom, reflecting the effects of its structural environment on its AIi value; vi is the vertex degree and si is the distance sum. The efficiency of the Xu descriptor and AI descriptors was verified by high quality QSPR/(Q)SAR models obtained for several physical properties and biological activities of several data sets of alcohols with a wide range of non-hydrogen atoms.

Sardana and Madan (2002a; 2002b) conceptualized a novel adjacency-cum-distance based topological descriptor known as the adjacent eccentric distance sum descriptor ($\xi^{SV}$). It was defined as the sum, over all vertices of a hydrogen depleted molecular graph having n vertices, of the product of distance sum and eccentricity divided by the degree of the corresponding vertex:

$$\xi^{SV} = \sum_{i=1}^{n} \frac{S_{i}\, E_{i}}{V_{i}}$$

Where Si, Ei and Vi are the distance sum, eccentricity and degree of vertex i in graph G respectively. The adjacent eccentric distance sum descriptor exhibited very low degeneracy. The discriminating power of the adjacent eccentric distance sum descriptor was found to be much superior to that of the eccentric connectivity descriptor.
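The adjacent eccentric distance sum can be sketched by combining the three vertex invariants used above: distance sum, eccentricity and degree. The graph encoding and function name are assumptions of this example.

```python
from collections import deque


def distance_matrix(adj):
    """All-pairs shortest-path lengths of a connected graph via BFS."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s}
        queue = deque([(s, 0)])
        while queue:
            v, d = queue.popleft()
            dist[s][v] = d
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append((w, d + 1))
    return dist


def adjacent_eccentric_distance_sum(adj):
    """Sum over vertices of (distance sum * eccentricity) / degree."""
    dist = distance_matrix(adj)
    return sum(sum(dist[i]) * max(dist[i]) / len(adj[i]) for i in range(len(adj)))


# Hypothetical test graphs: n-butane and isobutane.
BUTANE = [[1], [0, 2], [1, 3], [2]]
ISOBUTANE = [[1, 2, 3], [0], [0], [0]]

print(adjacent_eccentric_distance_sum(BUTANE))     # 44.0 = 18 + 4 + 4 + 18
print(adjacent_eccentric_distance_sum(ISOBUTANE))  # 31.0 = 1 + 3 * 10
```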

Quigley and Naughton (2002) modified the eccentric adjacency descriptor (Gupta et al., 2001) and proposed the valence eccentric adjacency descriptor by substituting the simple connectivity value with the vertex valence value. The valence eccentric adjacency descriptor can be easily calculated from the additive valence adjacency and distance matrices by using the following equation:

$$\xi^{v} = \sum_{i=1}^{n} S_{i}^{v} / E_{i}$$

Where $S_{i}^{v}$ is the sum of vertex valences and Ei is the eccentricity of vertex i.

The vertex valences (incorporating the superscript v to allow for calculations involving multiple bonding and heteroatoms) were defined as follows:

$$\delta_{i}^{v} = Z^{v} - h_{i}$$

Where Zv is the number of valence electrons of the vertex (atom) and hi is the number of hydrogen atoms attached to it.

Quigley and Naughton (2002) also derived another descriptor (Δξ) in a

manner analogous to the differential molecular connectivity descriptor (Δm

χ). This

differential eccentric adjacency descriptor (Δξ) was expressed as:

Δξ = ξ - ξv

They envisaged that this descriptor will be useful in encoding further information

which may be employed in structure/activity studies.

Gupta et al. (2003) developed three novel eccentric adjacency topochemical

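descriptors i.e. eccentric adjacency topochemical descriptor-1 ($\xi_{1c}^{A}$), eccentric adjacency topochemical descriptor-2 ($\xi_{2c}^{A}$) and eccentric adjacency topochemical descriptor-3 ($\xi_{3c}^{A}$). These MDs can be represented as:

1. $$\xi_{1c}^{A} = \sum_{i=1}^{n} \frac{S_{ic}}{E_{ic}}$$

2. $$\xi_{2c}^{A} = \sum_{i=1}^{n} \frac{S_{i}}{E_{ic}}$$

3. $$\xi_{3c}^{A} = \sum_{i=1}^{n} \frac{S_{ic}}{E_{i}}$$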

Where Ei is the eccentricity and Si is the distance sum of concerned vertex i, Eic is the

chemical eccentricity and Sic is the chemical distance sum of concerned vertex i, n

represents the number of vertices in the hydrogen suppressed graph.

These descriptors were found to be sensitive towards small change in molecular

structure and showed high discriminating power with regard to anti-HIV activity of

HEPT derivatives.

Kumar et al. (2004) refined eccentric connectivity descriptor to improve its

degeneracy and made it sensitive towards the presence and relative position of

heteroatom(s). The refined eccentric connectivity descriptor, termed the eccentric connectivity topochemical descriptor ($\xi_c^c$), overcomes the limitations of the eccentric connectivity descriptor by exhibiting very low degeneracy and displaying sensitivity to both the presence and relative position of heteroatom(s) without compromising the discriminating power of the eccentric connectivity descriptor. It was defined as the sum of the product of chemical eccentricity and chemical degree of each vertex in the hydrogen depleted molecular graph. It can be expressed as:

$$\xi_c^c = \sum_{i=1}^{n} E_{ic} V_{ic}$$

Where Eic is the chemical eccentricity and Vic is the chemical degree of vertex i. n

represents the number of the vertices in graph G. The values of eccentric connectivity

topochemical descriptor were computed using topochemical adjacency matrix (Ac) and

topochemical distance matrix (Dc).

Bajaj et al. (2004b) conceptualized a new adjacency-cum-distance based

topochemical descriptor with high discriminating power, known as superadjacency

topochemical descriptor ($\int^{AC}$). It was defined as the sum, over all vertices of the hydrogen depleted molecular graph, of the quotients of the product of the chemical degree of the concerned vertex and the sum of the chemical degrees of its adjacent vertices, and the chemical eccentricity of the concerned vertex. It was represented as:

$$\int^{AC}(G) = \sum_{i=1}^{n} \frac{\deg(v_{ic}) \cdot S_{ic}}{E_{ic}}$$

Where $S_{ic}$ represents the sum of chemical degrees of all vertices (vj) adjacent to vertex i and n is the number of vertices in graph G. The discriminating power of the superadjacency topochemical descriptor was far superior to that of the distance based Wiener’s descriptor and the adjacency based molecular connectivity descriptor.

This descriptor has been successfully utilized for modeling anti-HIV activity

(Bajaj et al., 2005c; 2005d) and anti-tumor activity (Bajaj et al., 2005b).

Shamsipur et al. (2004a, 2004b) proposed some new topological descriptors (Sh descriptors) based on the distance sum (Si) and connectivity (vi) of a molecular graph, derived directly from 2D molecular topology, for use in (Q)SAR/QSPR studies. These are a set of descriptors calculated by different combinations of distance sum and connectivity vectors:

b

vj

vi

ji

b

SSSh

*

*log1

bji

vj

vi

b SSSh

*

*log2

2/1

3 **log

b

vj

viji

bSSSh

2/1

4*

*log

bji

vj

vi

b SSSh

2/1

5 **

b

vj

viji

bSSSh

b

vj

viji

bSSSh **log6

bji

vj

vi

bSSSh *log*7


$$Sh_8 = \log\left(\mathbf{S}^{T}\mathbf{v}\right)$$

$$Sh_9 = \log \sum_{i=1}^{n}\sum_{j=1}^{n} [Sd]_{ij}$$

$$Sh_{10} = \log\left(\max Sp(Sd)\right)$$

1ShNNSh CC

Where S is the column vector collecting the distance sum, v the column vector

collecting the valence vertex degrees, and Sd the square A*A matrix obtained by the

inner product of the two vectors Si and v. In the Sh descriptors 1-7, the summations

run over all the adjacent vertices. Sh9 descriptor is the sum over all the entries of the

Sd matrix, whereas the Sh10 descriptor is the logarithm of its highest eigenvalue; the Sh

descriptor was derived from the descriptor Sh1, including the number of carbon atoms

NC to account for molecular size.

Bajaj et al. (2006) developed a highly discriminating TD, termed as

augmented eccentric connectivity descriptor ($^{A}\xi^{c}$). It was defined as the sum of the quotients of the product of adjacent vertex degrees and eccentricity of the concerned vertex, for each vertex in the hydrogen depleted molecular graph and expressed as:

$$^{A}\xi^{c} = \sum_{i=1}^{n} \frac{M_i}{E_i}$$

Where n represents the number of vertices, $E_i$ is the eccentricity and $M_i$ is the product of degrees of all vertices (vj) adjacent to vertex i in graph G. The augmented eccentric connectivity descriptor had greater discriminating power than the distance based Wiener’s descriptor and the adjacency based molecular connectivity descriptor.

Moreover, this descriptor exhibited very low degeneracy and predicted the anti-HIV

activity of 2-pyridinone derivatives with an accuracy of 89% (Bajaj et al., 2006).

Dureja et al. (2008) conceptualized three new generation descriptors, termed

as: superaugmented eccentric connectivity descriptor-1 ($^{SA}\xi_1^c$), superaugmented eccentric connectivity descriptor-2 ($^{SA}\xi_2^c$) and superaugmented eccentric connectivity descriptor-3 ($^{SA}\xi_3^c$). These can be defined as the sum of the quotients of the product of adjacent vertex degrees and the powered eccentricity of the concerned vertex, for each vertex in the hydrogen suppressed molecular graph and expressed as:

$$^{SA}\xi^{c} = \sum_{i=1}^{n} \frac{M_i}{E_i^{N}}$$

Where $M_i$ is the product of degrees of all the vertices (vj) adjacent to vertex i, $E_i$ is the eccentricity, and n is the number of vertices in the graph. N is equal to 2, 3, 4 for $^{SA}\xi_1^c$, $^{SA}\xi_2^c$, $^{SA}\xi_3^c$ respectively.
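As a hedged illustration of this family of descriptors, the sketch below evaluates the sum of $M_i / E_i^N$ for an arbitrary exponent, passed here as `power`. The function names, the adjacency-list representation and the 2-methylbutane example are assumptions for illustration and do not come from Dureja et al.; a connected graph with at least two vertices is assumed so that no eccentricity is zero.

```python
from collections import deque
from math import prod

def eccentricity(adj, source):
    """Largest topological distance from source to any other vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())

def superaugmented_eccentric_connectivity(adj, power):
    """Sum over vertices of M_i / E_i**power, with M_i the product of neighbour degrees."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    total = 0.0
    for i in adj:
        m_i = prod(degree[j] for j in adj[i])   # product of adjacent vertex degrees
        e_i = eccentricity(adj, i)
        total += m_i / e_i ** power
    return total

# Hypothetical example: carbon skeleton of 2-methylbutane
isopentane = {1: [2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
print(superaugmented_eccentric_connectivity(isopentane, 2))
```

Raising the exponent shrinks the contribution of peripheral vertices (those with large eccentricity) relative to central ones.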

Dureja et al. (2008) also proposed the topochemical version of

superaugmented eccentric connectivity descriptors termed as: superaugmented

eccentric connectivity topochemical descriptor-1 ($^{SAc}\xi_1^c$), superaugmented eccentric connectivity topochemical descriptor-2 ($^{SAc}\xi_2^c$), and superaugmented eccentric connectivity topochemical descriptor-3 ($^{SAc}\xi_3^c$). These can be defined as the sum of the quotients of the product of adjacent vertex chemical degrees and powered chemical eccentricity of the concerned vertex, for each vertex in the hydrogen suppressed molecular graph and can be expressed as:

$$^{SAc}\xi^{c} = \sum_{i=1}^{n} \frac{M_{ic}}{E_{ic}^{N}}$$

Where $M_{ic}$ is the product of chemical degrees of all the vertices (vj) adjacent to vertex i, $E_{ic}$ is the chemical eccentricity of the concerned vertex, and n is the number of vertices in the graph. N is equal to 2, 3, 4 for $^{SAc}\xi_1^c$, $^{SAc}\xi_2^c$, $^{SAc}\xi_3^c$ respectively.

The (Q)SAR models based on superaugmented eccentric connectivity topochemical

descriptors predicted anti-HIV-1 activity of 6-arylbenzonitriles with high degree of

accuracy.

Dureja et al. (2009) defined four novel TDs termed as superaugmented

pendentic descriptors ($^{SA}\int^{P-1}$, $^{SA}\int^{P-2}$, $^{SA}\int^{P-3}$ and $^{SA}\int^{P-4}$) defined as the summation of quotients of the product of non-zero row elements in the pendent matrix and product of adjacent vertex degrees, and the Nth power of the eccentricity of the concerned vertex, for all the vertices in a hydrogen suppressed molecular graph. It can be expressed as:

$$^{SA}\int^{P-N} = \sum_{i=1}^{n} \frac{\left(\prod_{j=1}^{n} P_{ij}\right) M_i}{E_i^{N}}$$

Where P(ij) is the length of the path that contains the least number of edges between vertex i and vertex j in graph G; Mi is the product of degrees of all vertices (vj) adjacent to vertex i. The eccentricity Ei of a vertex vi in G is the path length from vertex i to the vertex j which is farthest from vi ($E_i = \max d(v_i, v_j)$, $v_j \in G$) and n is the maximum possible number of i and j. The value of N is equal to 1, 2, 3 and 4 for $^{SA}\int^{P-1}$, $^{SA}\int^{P-2}$, $^{SA}\int^{P-3}$ and $^{SA}\int^{P-4}$ respectively. These descriptors exhibited high sensitivity

towards branching, high discriminating power and extremely low degeneracy.

Dutt and Madan (2010) proposed new generation superaugmented eccentric

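connectivity descriptors (denoted by: $^{SA}\xi^{c}_{4}$, $^{SA}\xi^{c}_{5}$, $^{SA}\xi^{c}_{6}$ and $^{SA}\xi^{c}_{7}$) along with their topochemical versions (denoted by: $^{SAc}\xi^{c}_{4}$, $^{SAc}\xi^{c}_{5}$, $^{SAc}\xi^{c}_{6}$ and $^{SAc}\xi^{c}_{7}$) for the purpose of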

(Q)SAR/QSPR modeling. These descriptors can be expressed as:

n

iNi

icSA

E

M

1

2

Where Mi is the product of degrees of all the vertices (vj), adjacent to vertex i; Ei is the

eccentricity; n is the number of vertices in the graph and N is equal to 1, 2, 3 and 4 for $^{SAc}\xi^{c}_{4}$, $^{SAc}\xi^{c}_{5}$, $^{SAc}\xi^{c}_{6}$ and $^{SAc}\xi^{c}_{7}$ respectively. These descriptors exhibited high

discriminating power and very low degeneracy.

Todeschini and Consonni (2010) proposed different kinds of novel local

vertex invariant (LOVIs), based on a multiplicative form of some reported vertex

degrees. In addition to above, they derived different kinds of MDs from each of the

LOVIs in analysis. Some of these MDs, defined in terms of LOVIs (L) or derived from the eigenvalues (λ), were expressed as:

1. Sum like descriptors: $$S = \sum_{i=1}^{n} L_i$$

2. First Zagreb like descriptors: $$M_1 = \sum_{i=1}^{n} L_i^{2}$$

3. Second Zagreb like descriptors: $$M_2 = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} a_{ij}\,(L_i L_j)$$

4. Randic like descriptors: $${}^{1}\chi = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} a_{ij}\,(L_i L_j)^{-0.5}$$

5. Leading eigenvalues: $$SpMax(V,1) = \max_i \{\lambda_i\}$$

6. Estrada like descriptors: $$EE(V,1) = \sum_{i=1}^{n} e^{\lambda_i}$$

7. Hosoya-type descriptors: $$H_O(V;1) = \sum_{i=0}^{n} |C_i|$$

The utility of the proposed descriptors was also investigated by deriving QSPR

models for the series of 18 hydrocarbons with 8 carbon atoms (C8). The MDs based

on the newly defined vertex degrees showed higher prediction ability than that

obtained by the classical vertex degrees.

Ediz (2010) defined a modified version of ECI, called the Ediz eccentric connectivity descriptor, as:

$$^{E}\xi^{c}(G) = \sum_{v \in V} \frac{S_v}{Ei(v)}$$

Where Sv is the sum of degrees of all vertices u, adjacent to vertex v; Ei(v) is the

eccentricity of v. The calculation of this descriptor was demonstrated for nanostar

dendrimers.

Turker et al. (2010) developed a novel MD called the Turker-Gumus

descriptor (TG descriptor) by utilizing the concepts of both, the connectivity and the

path distances in defining this novel MD as:

m

ij

ij

n

ij

ij ddTG

Where ijd are the elements of distance degree matrix. The n and m are the number of

atoms in starred and unstarred set respectively.

Zhou and Du (2010) described mathematical properties of the eccentric

connectivity descriptor (ECI). They established various lower and upper bounds for

the ECI in terms of other graph invariants including the number of vertices, the

number of edges and the degree distance.


Goyal et al. (2011) proposed four refinements of eccentric distance sum

topochemical descriptor termed as augmented eccentric distance sum topochemical

descriptors 1-4 ( ADSc1 , ,2

ADSc ADS

c3 and )4ADSc so as to significantly augment

discriminating power and to reduce degeneracy. These MDs were defined as:

ic

n

i

icADSc SE

1

21 , ic

n

i

icADSc SE

1

32 ,

n

i

icicADSc SE

1

23 ,

n

i

icicADSc SE

1

34

Where Sic is chemical distance-sum of vertex i, Eic is chemical eccentricity of vertex i

and n is the number of vertices in graph G. These MDs were successfully utilized for

developing models for prediction of anti-tumor activity of bisphosphonates.

Das and Trinajstić (2011) compared the relationship between ECI and

Zagreb descriptors (M1 and M2) for chemical trees. Besides chemical trees, molecular

graphs were also treated and the value of ECI was found greater than Zagreb

descriptor-M1 for diameter greater than or equal to 7.

Gupta et al. (2011) conceptualized highly discriminating superaugmented

eccentric distance sum connectivity descriptors as fourth generation MDs. The

topochemical versions of these MDs (denoted by: cc

SED1 , c

cSED

2 , cc

SED3 and c

cSED

4 ) was

expressed by the following:

1

21

icNic

icn

i

ccN

SED

SE

M

Where Mic is the product of chemical degrees of all the vertices (vj), adjacent to vertex

i; Eic is the chemical eccentricity; Si is the chemical distance sum of vertex i and n is

the number of vertices in the graph and the N is equal to 1,2,3,4 for superaugmented

eccentric distance sum connectivity topochemical descriptors-1, 2, 3, 4 (denoted by:

cc

SED1 , c

cSED

2 , cc

SED3 and c

cSED

4 ). These MDs were successfully employed for

development of numerous models for Chk2 inhibitory activity of 2-

arylbenzimidazoles.

Ediz (2012) proposed another modified version of ECI, called the reverse eccentric connectivity descriptor (REEC), as:

$$^{RE}\xi^{c}(G) = \sum_{v \in V} \frac{Ei(v)}{S_v}$$

Where summation goes over all vertices of graph G; Ei(v) represents the eccentricity

of v and Sv denotes sum of degrees of all vertices adjacent to vertex v. The predictive

power of this MD was demonstrated on some physico-chemical properties of octanes.

In addition, basic mathematical properties in terms of lower and upper bounds were

also investigated.

Xu and Guo (2012) devised the edge version of the commonly used adjacency-cum-distance based eccentric connectivity descriptor, ECI. The edge version of ECI (denoted by $\xi_e^c$) was defined as:

$$\xi_e^c(G) = \sum_{f \in E(G)} V(f)\,E(f)$$

Where f = ij i.e. an edge in E(G); V(f) is the degree of an edge f and E(f) is its

eccentricity. Various upper and lower bounds were also reported for this MD of

connected graphs in terms of order, size and girth.

Gupta et al. (2011) proposed four highly discriminating fourth-generation

topological descriptors, termed as superaugmented eccentric distance sum connectivity

descriptors, as well as their topochemical versions known as superaugmented eccentric

distance sum topochemical connectivity descriptors.

Dutt and Madan (2012b) conceptualized and developed four novel MDs

termed as superpendentic eccentric distance sum descriptors 1-4 (denoted by: 1PEDS

,

2PEDS

, 3PEDS

and 4PEDS

) along with their topochemical counterparts (denoted by: 1P

c

EDS

, 2P

c

EDS

, 3P

c

EDS

and 4P

c

EDS

). The topochemical version of these MDs can be expressed

as:

n

ji icNic

icicjcP

N

EDS

SE

MP

11

1

Where Picjc represents the chemical path length with least number of edges between

vertices i and j in graph G. Mic is the product of chemical degrees of all vertices (vj),

adjacent to vertex i. Eic is the chemical eccentricity of concerned vertex i, Sic is the


chemical distance sum of vertex i and n is the number of vertices in the hydrogen

depleted graph. N is equal to 1,2,3,4 for 1P

c

EDS

, 2P

c

EDS

, 3P

c

EDS

and 4P

c

EDS

respectively.

The utility of proposed MDs was investigated through development of models for the

prediction of hCRF-1 binding affinity of substituted pyrazines using decision tree and

moving average analysis.

Singh et al. (2013) conceptualized and developed three novel MDs termed as

refined general Randic descriptors along with their topochemical counterparts. The

descriptors can be defined as the summation of the quotients of the inverse of the

product of the degree of each vertex on every edge in the hydrogen-suppressed

molecular graph having n vertices.

Centric graph invariants

The concept of graph center is based on molecular topological distances between the

graph vertices. The graph center can be a single vertex, a single edge, or a single

group of equivalent vertices. The center vertices have the smallest maximal distance

to other vertices.

$$\max_{j} d_{ij} = \min \quad \text{for } j = 1, 2, \ldots, p$$

Invariants derived from the concept of center are called centric graph descriptors. These MDs were proposed to quantify the degree of compactness of molecules, based on the recognition of the graph center, by distinguishing between molecular structures organized differently with respect to their centers (Todeschini and Consonni, 2009).

Balaban (1979) proposed a set of five new graph invariants classified as centric

invariants on the basis of sequences of numbers obtained by pruning an acyclic graph.

By pruning stepwise all vertices of degree one (δi), a vertex (center) or an edge

connecting two adjacent vertices (bicenter) is obtained. Balaban developed the Balaban centric descriptor (B), which is defined as:

$$B = \sum_{i} \delta_i^{2}$$

This descriptor provides a measure of molecular branching: the higher the value of B,

the more branched is the tree.
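The pruning procedure can be sketched as follows. This is an illustrative reading rather than Balaban's own algorithm: it assumes that $\delta_i$ denotes the number of vertices removed at pruning step i (including the final center or bicenter step), and the function name and adjacency-list inputs are hypothetical.

```python
def balaban_centric_index(adj):
    """Sum of squared counts of vertices pruned at each step of an acyclic graph."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    b = 0
    while remaining:
        # Vertices of degree one, or the isolated last vertex (the center)
        leaves = [v for v, nbrs in remaining.items() if len(nbrs) <= 1]
        if not leaves:
            raise ValueError("graph contains a cycle; pruning assumes a tree")
        b += len(leaves) ** 2
        for v in leaves:
            del remaining[v]
        for nbrs in remaining.values():
            nbrs.difference_update(leaves)
    return b

# Hypothetical examples: the three C5 alkane skeletons
pentane = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
isopentane = {1: [2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
neopentane = {1: [2], 2: [1, 3, 4, 5], 3: [2], 4: [2], 5: [2]}
print([balaban_centric_index(g) for g in (pentane, isopentane, neopentane)])
```

Under this reading the value grows from the linear to the most branched isomer, in line with the statement that higher B indicates more branching.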

It is known as a centric descriptor because it reflects the topology of the tree as viewed from the centre. Four invariants were devised from B and M1 to differentiate branching from number of vertices by normalization and binormalization. Normalized centric (C), binormalized centric (C’) and quadratic invariants (Q, Q’) were defined as:

$$C = \frac{B - 2n + U}{2} \qquad C' = \frac{B - 2n + U}{(n-2)^{2} - 2 + U}$$

$$Q = 3V_4 + V_3 \qquad Q' = \frac{2(3V_4 + V_3)}{(n-2)(n-3)}$$

Where n is the number of vertices, $U = [1 - (-1)^{n}]/2$, while $V_3$ and $V_4$ are the numbers of vertices of degree three and four respectively.

Normalization of the topological descriptor is done by imposing the same lower bound

(regardless of n) for all graphs which is equal to zero for the least branched (linear) tree

on all the graphs. In order to find the normalized quadratic descriptor one is required to

find the quadratic function of the general form. It was found that the centric invariants

parallel the ordering induced by descriptor B, while the quadratic invariants induce orderings that parallel those due to Gutman et al.’s descriptor M1 and the Gordon-Scantlebury descriptor N2.

Bonchev et al. (1980) generalized the graph center concept known as polycentre

for any connected cyclic or acyclic graph based on topological distance matrix. The

centric invariants were used to determine correlations, differentiating isomeric chemical

structures and for coding and computer processing of chemical structures.

The descriptors termed as Bonchev centric information descriptors were derived

from distance matrix D and edge distance matrix ED. One of them along with its

corresponding generalized centric information descriptor is being defined below.

The distance degree centric descriptor $^{v}I_{c,deg}$ is defined as:

$$^{v}I_{c,deg} = -\sum_{g=1}^{G} \frac{n_g}{A} \log_2 \frac{n_g}{A}$$

Where ng is the number of graph vertices having both the same atom eccentricity η and

the same vertex distance degree ζ, G is the number of different vertex equivalence

classes and A is the total number of atoms.
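A minimal sketch of this calculation is given below, assuming the descriptor is the Shannon entropy of the class partition, $-\sum_g (n_g/A)\log_2(n_g/A)$, and that the vertex distance degree is the distance sum of the vertex. The function names and the example graph are illustrative assumptions.

```python
from collections import Counter, deque
from math import log2

def bfs_distances(adj, source):
    """Topological distances from source to every vertex (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def distance_degree_centric_information(adj):
    """Entropy of vertex classes sharing both eccentricity and distance sum."""
    a = len(adj)
    classes = Counter()
    for v in adj:
        d = bfs_distances(adj, v)
        classes[(max(d.values()), sum(d.values()))] += 1
    return -sum((n_g / a) * log2(n_g / a) for n_g in classes.values())

# Hypothetical example: carbon skeleton of 2-methylbutane
isopentane = {1: [2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
print(distance_degree_centric_information(isopentane))
```

In this example the two methyl carbons attached to C2 fall into the same class, so the five atoms split into classes of sizes 2, 1, 1 and 1.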

The generalized distance degree centric descriptor $^{v}I^{G}_{c,deg}$ is defined as:

$$^{v}I^{G}_{c,deg} = -\sum_{g=1}^{G} \frac{n_g}{A} \log_2 \frac{n_g}{A}$$

Where ng is the number of graph vertices having both the same average topological distance to the polycentre and the same vertex distance degree ζ, G is the number of different vertex equivalence classes and A is the total number of atoms.

Diudea et al. (1991) introduced the so-called B-matrix to develop a novel

descriptor based on counting of the vertices in graph spheres (layers). A sphere may be

defined as the list of atoms at a given topological distance surrounding a central vertex

and its use is advantageous in studies which investigate the influence of neighboring

vertices on a specific property of central vertex.

The two types of originally proposed centro-complexity operators were

defined as:

3

0

3

0

1 )(

kbkbBxD

k

iki

D

k

ikRi

kD

k

ikik

D

k

ikRi bbBx

1010)(

00

2

Where B is the branching layer matrix and δ the vertex degree.

Balaban and Diudea (1993) constructed a new type of layer matrix, called R

matrix, on the basis of distance sums of vertices. This matrix was operated with two

classes of operators: one of “centricity” (“c”) type and the other of

“centrocomplexity” (“x”) type, the last one taking into account the „more important‟

vertices in molecular graphs. By analogy with the regressive degrees they defined

new real-number LOVIs, regressive distance sums as:


ik

D

k

nki rr

0

10

Where D is the diameter and n denotes the number of digits for the maximal rik value in G. A simplified form of the centrocomplexity operator was also proposed as:

nkD

k

ikink

D

k

iki rSrRx

1010)(

10

Where Si is the distance sum (i.e., vertex distance sum) of the ith vertex and z is the

number of digits of the max rik-value in graph.

Diudea (1994) differentiated the layer matrices (LM) from the sequence matrices (SM). The layer matrix is a collection of the properties of vertices u located in layers (concentric shells) at a distance j around each vertex i in G whereas the sequence matrix collects the walks starting from the vertices i to all other n - 1 vertices in G. He also defined two descriptors, centrocomplexity x(LM) and centric invariant c(LM), based on LM matrices:

$$SM^{(e)} = \left[ m_i^{(e)};\ i \in (1, n);\ e = 1, 2, \ldots, e_{sp} \right]$$

$$LM^{(e)} = \left[ lm_{ij}^{(e)} = \sum_{u \in G(u)_i} m_u^{(e)};\ i \in (1, n);\ j \in (0, d);\ e \in (1, e_{sp}) \right], \qquad G(u)_i = [u;\ d_{iu} = j]$$

Where m is the label for a particular type of walk or property, n is the number of vertices in G, d is the diameter of G and esp is the eccentricity of vertex i.

Balaban (1995) proposed regressive decremental distance sums to obtain

greater discrimination between the terminal and central vertices. They are calculated

from the distance sum layer matrix LDS by the following:

zj

j

ijii

i

lmSLDSx

10][

0

Where Si is the distance sum of the ith vertex. In this way, the progressively attenuated

contributions due to more distant vertices are subtracted from the distance degree of

the focused vertex.

Newman (2005) reported a new betweenness measure, i.e. betweenness centrality, that counts essentially the fraction of shortest paths going through a given vertex i as:

$$BC(i) = \sum_{k=1}^{n} \sum_{j=1}^{n} \frac{{}^{short}p_{kj}(i)}{{}^{short}p_{kj}}, \qquad i \neq j \neq k$$

Where $^{short}p_{kj}$ is the number of shortest paths connecting vertices k and j, and $^{short}p_{kj}(i)$

is the number of these shortest paths that pass through the vertex i. The betweenness

centrality BC(i) characterizes the degree of influence a vertex has in communicating

between vertex pairs.

Information theory based graph invariants:

Information theory has been used in chemical graph theory for describing chemical

structures and for providing good correlations between physico-chemical and structural

properties. Information descriptors are constructed for various matrices and also for

some topological descriptors. The advantage of such kind of descriptors is that they

may be used directly as simple numerical descriptors in a comparison with physical,

chemical or biologic parameters of molecules in structure property and activity

relationships. It can also be noted that information descriptors normally have greater

discriminating power for isomers than the respective topological descriptors.

An appropriate set A of n elements is derived from a molecular graph G

depending upon certain structural characteristics. On the basis of an equivalence

relation defined on A, the set A is partitioned into disjoint subsets Ai of order ni (i = 1, 2, 3, ..., h; $\sum_i n_i = n$). A probability distribution is then assigned to the set of equivalence classes:

(A1, A2, ..., Ah)

(p1, p2, ..., ph)

Where pi = ni/n is the probability that a randomly selected element of A will occur in the ith set.

The information content of an element of A is defined by Shannon’s relation (Shannon, 1948):

$$IC = -\sum_{i=1}^{h} p_i \log_2 p_i$$

The logarithm is taken at base 2 for measuring the information content in bits. The total

complexity of the molecule or the information content of the set A is then nIC.


Shannon and Weaver (1949) showed that the statistical concept of entropy

can be extended beyond the thermodynamics and applied to the process of

transmitting information. The basic Shannon’s formula to measure entropy of information in bits can be expressed as:

$$H = n \log_2 n - \sum_{i} n_i \log_2 n_i$$

Where $n_i$ is the number of elements in the ith class and n is the total number of elements, so that $n_i/n$ is the probability of randomly selecting an element of the ith class.

One of the major consequences of Shannon’s theory was the radically new idea of

viewing the structure of any kind as a communication. This study was one of the

founding works in the field of information theory.

Dancoff and Quastler (1953) introduced the first molecular information

descriptor as “information on the kind of atoms in a molecule”. The elemental

composition distribution incorporates subsets of atoms of the same chemical element.

The entropy of this distribution, also called information on chemical composition, $I_{CC}$, was proposed as a measure of the compound compositional diversity:

$$I_{CC} = n_h \log_2 n_h - \sum_{i} n_i \log_2 n_i$$

Where nh is the total number of atoms (hydrogen included) and ni is the number of

atoms of chemical element of type i.
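The calculation can be sketched for a molecular formula, assuming the relation $I_{CC} = n_h \log_2 n_h - \sum_i n_i \log_2 n_i$ with hydrogens counted. The function name and the ethanol example are illustrative assumptions and do not come from Dancoff and Quastler.

```python
from math import log2

def composition_information(atom_counts):
    """Information on chemical composition, in bits, from element counts."""
    n_h = sum(atom_counts.values())   # total number of atoms, hydrogens included
    return n_h * log2(n_h) - sum(n * log2(n) for n in atom_counts.values())

# Hypothetical example: ethanol, C2H6O
ethanol = {"C": 2, "H": 6, "O": 1}
print(composition_information(ethanol))
```

A molecule built from a single element gives zero under this relation, since all atoms fall into one class.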

Rashevsky (1955) was the first to calculate the information content of graphs

where “topologically equivalent” vertices were placed in the same equivalence class. In

Rashevsky approach, two vertices u and v of a graph are said to be topologically

equivalent if and only if for each neighboring vertex ui (i=1,2,….,k) of the vertex u,

there is a distinct neighboring vertex vi of the same degree for the vertex v. While

Rashevsky used simple linear graphs with indistinguishable vertices to symbolize

molecular structure, weighted linear graphs or multigraphs are better models for

conjugated or aromatic molecules because they more properly reflect the actual bonding

patterns, i.e. electron distribution.

Trucco (1956) refined the definition of topological equivalence of atoms in

terms of the orbits of the automorphism group of the molecular graph. This type of

molecular information content was later termed as orbit’s information descriptor, Iorb.


In the latter case, two vertices are considered equivalent if they belong to the same

orbit of the automorphism group, i.e., if they can interchange preserving the adjacency

of the graph.

Brillouin (1962) defined a complementary quantity from the Shannon entropy

H, called Brillouin redundancy descriptor-R (or redundancy descriptor), to measure

the information redundancy of the system:

$$R = 1 - \frac{H}{\log_2 N}$$

The logarithm is taken to base 2 for measuring the information content in bits.

Mowshowitz (1968a) described a rigorous reinterpretation of Shannon’s H

function as information content but not entropy. He pointed out that Shannon’s

function does not measure the average uncertainty per structure of a given ensemble

of all structures having the same number of elements. Rather, it is the information

content of the structure relative to a system of symmetry transformations that leave

the system invariant.

Mowshowitz (1968b) formalized the Shannon’s equation to finite systems

with a symmetry element. He introduced a probability scheme applicable to any

system having N elements partitioned into k classes according to equivalence criterion

α :

Equivalent classes 1,2,………..k

Element partition n1, n2,……nk

Probability distribution p1, p2,……pk

Where pi = ni/n is the probability for a randomly chosen element to belong to class i having ni elements and $n = \sum_{i=1}^{k} n_i$.

Bonchev et al. (1976) introduced their first information descriptor on the basis

of grouping the atoms in a molecule into equivalence classes determined by the point

group of symmetry to which the molecule belongs. This molecular symmetry

descriptor, Isym, complements the orbit’s information descriptor, in accounting for

specific molecular geometry and conformations. Linear relationships were obtained

between Isym and thermodynamic entropy for several homologous series of organic

compounds.


Bonchev and Trinajstic (1977) applied Shannon’s formula to the summands before the summation, thereby obtaining information-based TDs patterned after various other descriptors, but having lower degeneracy. Thus, instead of adding all graph distances dij to obtain the Wiener descriptor, one first sorts these distances according to their values into groups of $g_i$ summands, each having the same value i. The Wiener descriptor is then

$$W = \sum_{i=1}^{l} i\, g_i$$

The information theory descriptors having information for adjacency,

incidence, and polynomial coefficients of the adjacency matrix and for distance of

molecular graph were defined by the following equations:

$$I_{adj} = N^{2} \log_2 N^{2} - 2(N-1) \log_2 2(N-1) - (N^{2} - 2N + 2) \log_2 (N^{2} - 2N + 2)$$

$$I_{inc} = N(N-1) \log_2 N(N-1) - 2(N-1) \log_2 2(N-1) - (N-1)(N-2) \log_2 (N-1)(N-2)$$

$$I_{pc} = Z \log_2 Z - \sum_{k=0}^{N/2} p(G,k) \log_2 p(G,k)$$

$$I_{D} = N^{2} \log_2 N^{2} - N \log_2 N - \sum_{i=1}^{m} 2k_i \log_2 2k_i$$

Where Z is Hosoya’s descriptor, N is the number of vertices, p(G,k) is the probability of a randomly chosen polynomial coefficient, and $2k_i$ is the number of times the distance value i appears in the distance matrix. These invariants are largely defined from a combination of the parameters used to obtain Wiener’s descriptor, the Hosoya descriptor and the Randic connectivity descriptor.
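The grouping of distances can be illustrated with a short sketch. It assumes the relations $W = \sum_i i\,k_i$ and $I_D = N^2 \log_2 N^2 - N \log_2 N - \sum_i 2k_i \log_2 2k_i$, where $k_i$ is the number of vertex pairs at distance i; the function names and the example graph are hypothetical.

```python
from collections import Counter, deque
from math import log2

def bfs_distances(adj, source):
    """Topological distances from source to every vertex (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def wiener_and_distance_information(adj):
    """Return (W, I_D) by grouping the pairwise distances by value."""
    n = len(adj)
    vertices = list(adj)
    k = Counter()   # k[i] = number of unordered vertex pairs at distance i
    for idx, u in enumerate(vertices):
        d = bfs_distances(adj, u)
        for w in vertices[idx + 1:]:
            k[d[w]] += 1
    wiener = sum(i * k_i for i, k_i in k.items())
    i_d = (n ** 2 * log2(n ** 2) - n * log2(n)
           - sum(2 * k_i * log2(2 * k_i) for k_i in k.values()))
    return wiener, i_d

# Hypothetical example: carbon skeleton of 2-methylbutane
isopentane = {1: [2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
w, i_d = wiener_and_distance_information(isopentane)
print(w, round(i_d, 4))
```

Here the ten vertex pairs split into four at distance 1, four at distance 2 and two at distance 3. Two graphs with the same W but a different split would still be told apart by $I_D$, which is the sense in which the information-based form has lower degeneracy.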

Bonchev et al. (1979) introduced a novel information theory based descriptor

termed as information content descriptor (I) to deal with the problem of

characterizing molecular structures. The relation describes the information of a

system, I, having N elements and expressed as:

$$I = N \log_2 N - \sum_{j=1}^{n} N_j \log_2 N_j$$

Where n represents the number of different sets of the elements and Nj is the number

of elements in the set j of the elements and summation is done over all sets of

elements.

Basak and co-workers (Basak et al., 1980; Basak and Magnuson, 1983)

developed information-theoretic descriptors that take into account all atoms in the constitutional formula (hydrogens also being included), and consider the information content provided by various classes of atoms based on their topological neighborhood. Three main types of informational descriptors developed by Basak et al. (1980; 1983) are:

Mean information content (IC), or complexity of a hydrogen-filled graph, with vertices grouped into equivalence classes having r vertices; the equivalence is based on the nature of atoms and bonds in successive neighborhood groups

SIC (structural information content) and

CIC (complementary information content).

The mean information content IC, also called Shannon’s entropy H (Shannon and Weaver, 1949), was defined as:

$$IC_r = -\sum_{i} P_i \log_2 P_i$$

Where n represents the total number of vertices of the graph and ICr is computed using Shannon’s relation. It is the most common measure of uncertainty.

The rth order structural information content SICr was defined as a normalized form of ICr to remove the influence of graph size:

$$SIC_r = IC_r / \log_2 n$$

The rth order complementary information content CICr measures the deviation of ICr from its maximum value, which corresponds to the vertex partition into equivalence classes containing one element each:

$$CIC_r = \log_2 n - IC_r$$

The ICr, SICr, and CICr descriptors can be calculated for different orders of

neighborhoods, r (r = 0, 1, 2, ...,ρ), where ρ is the radius of the molecular graph G.
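Once the vertices have been partitioned into equivalence classes for a given order r, the three descriptors follow directly from the class sizes. The sketch below assumes the relations $IC = -\sum_i P_i \log_2 P_i$, $SIC = IC / \log_2 n$ and $CIC = \log_2 n - IC$; it does not implement the neighborhood-based partitioning itself, and the function name and class sizes are hypothetical. At least two vertices are assumed so that $\log_2 n$ is non-zero.

```python
from math import log2

def information_content_indices(class_sizes):
    """Return (IC, SIC, CIC) in bits from the sizes of the equivalence classes."""
    n = sum(class_sizes)   # total number of vertices
    ic = -sum((s / n) * log2(s / n) for s in class_sizes)
    sic = ic / log2(n)     # normalised by the maximum possible IC
    cic = log2(n) - ic     # distance from the maximum possible IC
    return ic, sic, cic

# Hypothetical partition of five vertices into classes of sizes 2, 1, 1, 1
ic, sic, cic = information_content_indices([2, 1, 1, 1])
print(round(ic, 4), round(sic, 4), round(cic, 4))
```

When every vertex is in its own class, IC reaches $\log_2 n$, so SIC is 1 and CIC is 0.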

Bertz (1981) proposed a new general information descriptor that incorporates

the information on atomic composition, information on graph connections and

molecular size. The general descriptor of molecular complexity of Bertz is given by

the equation:

$$I_{BERTZ} = I_{AC} + I_{CONN} + I_{SIZE}$$

Where IAC, ICONN and ISIZE are the information contents related to the chemical/atomic

composition, bond connectivity and the molecular size respectively. Chirality and

stereochemistry can be reflected by the distribution of connections into classes of

orbital equivalency.

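Bonchev (1983) proposed information-theoretic topological descriptors. The information-theoretic descriptors on graph distance, $I_D^W$ and $\bar{I}_D^W$, are computed using the distance matrix and described as follows:

$$I_D^W = W \log_2 W - \sum_{h} g_h\, h \log_2 h$$

$$\bar{I}_D^W = -\sum_{h} g_h \frac{h}{W} \log_2 \frac{h}{W} = I_D^W / W$$

Where W is the Wiener number (the sum of all topological distances in the graph) and $g_h$ is the number of vertex pairs at distance h.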
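The two distance-based information descriptors can be sketched directly from a distance matrix. The snippet below assumes that W is the sum of the distances over all unordered vertex pairs and that g_h counts the pairs at distance h; the function name and the three-vertex path used as input are illustrative only.

```python
import math
from collections import Counter

def distance_information(dist):
    """Return (W, total information, mean information) for a distance matrix."""
    n = len(dist)
    # g_h: number of unordered vertex pairs at topological distance h
    counts = Counter(dist[i][j] for i in range(n) for j in range(i + 1, n))
    w = sum(h * g for h, g in counts.items())
    total = w * math.log2(w) - sum(g * h * math.log2(h) for h, g in counts.items())
    mean = -sum(g * (h / w) * math.log2(h / w) for h, g in counts.items())
    return w, total, mean

# Distance matrix of a three-vertex path (hydrogen-depleted propane skeleton)
path3 = [[0, 1, 2],
         [1, 0, 1],
         [2, 1, 0]]
w, total, mean = distance_information(path3)
print(w, total, mean)
```

In this example the mean value equals the total value divided by W, as expected from the two definitions.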

Raychaudhury (1983) introduced the concept of 'vertex distance complexity', which has been found to have a high discriminating power. This local vertex invariant, calculated on the hydrogen-filled molecular graph, was expressed as:

$$v_i^d = -\sum_{j=1}^{n} \frac{d_{ij}}{S_i}\,\mathrm{lb}\,\frac{d_{ij}}{S_i} = -\sum_{g=1}^{\eta_i} {}^{g}f_i\,\frac{g}{S_i}\,\mathrm{lb}\,\frac{g}{S_i}$$

Where n is the number of vertices, g spans all of the different distances from the vertex $v_i$, ${}^{g}f_i$ is the number of distances from the vertex $v_i$ equal to g, $\eta_i$ is the ith atom eccentricity, $S_i$ is the distance sum, lb stands for 'log2' and the descriptor $v_i^d$ is expressed in bits.

Raychaudhury et al. (1984) defined three information invariants: degree complexity (Id), graph vertex complexity ($H^V$) and graph distance complexity ($H^D$). Among these, graph distance complexity was found to be the only descriptor that discriminated well among all the studied graphs:

$$H^D = \sum_{i=1}^{n} \frac{S_i}{I_{ROUV}}\, v_i^d$$

Where $v_i^d$ is the vertex distance complexity, $S_i$ is the ith vertex distance sum and $I_{ROUV}$ is the Rouvray descriptor.

Bertz (1988) devised a descriptor C(m) based on information theory and the graph invariant m using Shannon's formula, taking into account the complexity of the graph, including the presence of heteroatoms (as vertex weights) and multiple bonds (as edge weights):

$$C(m) = 2m \log_2 m - \sum_{i} m_i \log_2 m_i$$

Klopman and Raychaudhury (1988) described an information descriptor-

vertex distance complexity (Vd), for the vertices of a molecular graph and used the

same for qualitative evaluation of mutagenic activity of a series of non-fused ring

aromatic compounds.

King (1989) described two other information descriptors from the descriptors of neighborhood symmetry by modifying the classical definition of Shannon's entropy. In particular, the modified information content descriptor (or MIC descriptor) was proposed using the weighted atomic masses as:

$$MIC = \sum_{i=1}^{n} m_i\,(n_i \log_2 n_i)$$

Where $m_i$ is the atomic mass of all the equivalent atoms in the ith class and $n_i$ is the probability of selecting a vertex of class i.

The Z-modified information content descriptor (or ZMIC descriptor) was analogously defined as:

$$ZMIC = \sum_{i=1}^{n} Z_i n_i\,(n_i \log_2 n_i)$$

Where $Z_i$ is the atomic number and $n_i$ the number of atoms in the ith class.

Konstantinova and Paleev (1990) introduced the information distance descriptor of graph vertices on the basis of the distance matrix as:

$$H_D(i) = -\sum_{j=1}^{p} \frac{d_{ij}}{d(i)} \log_2 \frac{d_{ij}}{d(i)}$$

Where $d(i) = \sum_{j=1}^{p} d_{ij}$ and $p_{ij} = d_{ij}/d(i)$ is the probability for an arbitrarily chosen vertex to be at a distance $d_{ij}$ from the vertex i.

Skorobogatov et al. (1991) considered one more information descriptor based on the distance matrix in structure-activity correlation. The information descriptor $H_2$ was defined as:

$$H_2 = -\sum_{i=1}^{n} n_i \frac{S_i}{2W} \log_2 \frac{S_i}{2W}$$

$$H_2 = -\sum_{i=1}^{k} \frac{d(i)\,k_i}{2W} \log_2 \frac{d(i)\,k_i}{2W}$$

Where $k_i$, i = 1,…,k, is the number of vertices having the distance d(i), 2W is the Rouvray descriptor, $d_i$ (the distance sum $S_i$) is the vertex distance degree of the ith atom and $n_i$ is the number of vertices having equal vertex distance degrees in the ith class.

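Balaban and Balaban (1991) defined four new information MDs, i.e. the U, V, X and Y descriptors, on the basis of the local graph invariants $u_i$, $v_i$, $x_i$ and $y_i$, respectively:

$$U(G) = \frac{B}{C+1} \sum_{E(G)} (u_i u_j)^{-0.5}$$

$$V(G) = \frac{B}{C+1} \sum_{E(G)} (v_i v_j)^{-0.5}$$

$$X(G) = \frac{B}{C+1} \sum_{E(G)} (x_i x_j)^{-0.5}$$

$$Y(G) = \frac{B}{C+1} \sum_{E(G)} (y_i y_j)^{-0.5}$$

Where B is the number of edges and C is the cyclomatic number of the graph G.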

In all these formulas, the summations run over all edges in the molecular graph. These highly degenerate MDs were obtained by applying information theory to distance degree sequences (Ivanciuc et al., 1993b).

Sahu and Lee (2004) derived a novel information theoretic topological

descriptor Ik on the basis of chemical signed graph theory or specifically edge signed

graph. The expression for this novel information theoretic topological descriptor, Ik

was defined as:

$$I_k = m\,\mathrm{lb}\,m - n\,\mathrm{lb}\,n - (n - m)\,\mathrm{lb}\,|n - m|$$

Where m and n represent the total number of positive (+) signs and total number of

negative (-) signs from edge signed graphs of the corresponding molecular graph and

k is the molecular orbital level.

Raychaudhury and Ghosh (2004) proposed a new information-theoretical measure of similarity, INFSIM, based on Shannon's measure of the information content of a discrete system. They used Shannon's information-theoretical measure of the redundancy of a system to derive the similarity measure. They also used a topological shape and size descriptor (TSS) and a topo-physical molecular descriptor (TPMD) for the study. These descriptors were used to carry out molecular similarity analysis for the quantitative discrimination (active/inactive) of eleven β-lactams with respect to the anti-bacterial activity of penicillin G. It was concluded that the information-theoretical similarity measure, INFSIM, produced similarities that appear to help classify the studied compounds as active or inactive with significant accuracy.

Sahu and Lee (2008) deduced the novel net-sign identity information descriptor, $I_\varepsilon$, from the molecular electronic structure on the basis of chemical graph theory. It was defined as the summation of the squares of the numerical values obtained for $I_k$ (Sahu and Lee, 2004) for each molecular orbital level, k, ranging from 1 to n:

$$I_\varepsilon = \sum_{k=1}^{n} I_k^2$$

Where k is the molecular orbital level.

The net-sign-identity information descriptor was utilized in QSPR studies of the

saturated and unsaturated hydrocarbons successfully.

Varmuza et al. (2009) proposed a new family of topological information

descriptors based on the full neighborhood of all atoms. They considered each atom of

a molecular structure as a subsystem and for each atom the complete neighbourhood

was characterized by an information functional fi, based on the number of atoms in all

spheres around the atom. The properties of all atoms were normalized to a sum of one

(a probability-like measure, pi) from which the information entropy was calculated.

The entropy was scaled by the number of atoms in the structure to give a molecular

descriptor E.

n

i

ii pldpnldaE

1

Where the $p_i$ are the "normalized" probabilities, calculated as:

$$p_i = f_i \Big/ \sum_{j=1}^{n} f_j$$

For each subsystem the value fi of an invariant is calculated based on the complete

neighborhood. The values of the invariants are normalized to give “probabilities” pi

that are combined to an entropy measure E.

Dehmer et al. (2010) derived entropic measures to calculate the information content of vertex- and edge-labeled graphs and investigated the influence of the proposed MDs on the prediction performance of the underlying graph classification problem. They demonstrated that applying entropic measures to graphs representing molecules is useful for characterizing such structures meaningfully, and that such methods might be valuable for solving problems within biological network analysis.

Dehmer et al. (2012) evaluated the uniqueness of several information-

theoretic measures for graphs based on so-called information functions and compared

the results with other information descriptors and non-information-theoretic measures

such as the well-known Balaban J descriptor. They found that one of the information

measures for graphs using the information functional based on degree–degree

associations outperformed the Balaban J descriptor.

Miscellaneous graph invariants and approaches:

Geary (1954) suggested the contiguity ratio, c, on the basis of the squared differences between contiguous areas:

$$c = \frac{n-1}{2k_1} \cdot \frac{\sum'_{t \neq t'} (x_t - x_{t'})^2}{\sum_{t} (x_t - \bar{x})^2}$$

Where n is the number of areas, $x_t$ is the value for area t, $\bar{x}$ is the mean of all the values, $k_t$ is the number of areas connected to area t, $k_1 = \sum_t k_t$ is twice the number of connections and the primed sum runs over contiguous areas.

Wiberg (1968) proposed a bond descriptor to measure the multiplicity of bonds between two atoms. It was defined as the sum of the squares of the bond orders ($p_{jk}$) between any one atomic orbital and all other orbitals in a molecule. It is two times the charge density in that orbital ($p_{jj}$) less the square of the charge density:

$$\sum_{k \neq j} p_{jk}^2 = 2p_{jj} - p_{jj}^2$$

For a unit charge density, the value is 1, whereas it goes to zero for $p_{jj}$ = 2 (a non-bonded pair) or for $p_{jj}$ = 0 (an empty orbital). Correspondingly, the sum of the squares of the bond orders to an atom corresponds to the number of covalent bonds formed by that atom, corrected for the ionic character in each bond (Trindle, 1969).

Gutman and Randic (1977) used the algebraic concept of comparability of

functions to derive a new comparability descriptor. They suggested that the structure

having an identical distribution of valencies should not be discriminated.

Moreau and Broto (1980a; 1980b) derived 2D-autocorrelation descriptors from the molecular graph weighted by atom physicochemical properties (i.e. the atom weightings $w_i$). The spatial autocorrelation was then evaluated by considering separately the contributions of each different path length (lag) in the molecular graph, as collected in the topological distance matrix. The total spatial autocorrelation at lag k, $ATS_k$, was obtained by summing the products $w_i w_j$ of all the pairs of atoms i and j for which the topological distance equals the lag:

$$ATS_k = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j\, \delta(k; d_{ij}), \qquad k = 0, 1, 2, \ldots, d$$

Where w is any atomic property, N is the number of atoms in a molecule, k is the lag, $d_{ij}$ is the topological distance between atoms i and j, d is the topological diameter, i.e. the maximum topological distance in the molecule, and δ is a Kronecker delta function defined as:

δ(k; dij) = 1 if dij = k, and 0 otherwise

The autocorrelation $ATS_0$, defined for the path of length zero, was calculated as:

$$ATS_0 = \sum_{i=1}^{N} w_i^2$$

i.e. the sum of the squares of the atomic properties. Typical atomic properties that can

be considered are atomic masses, polarisabilities, charges, and electronegativities

(Broto et al., 1984a; 1984b).
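A minimal sketch of the autocorrelation calculation follows, taking ATS at lag k as the sum of the products of atomic properties over all ordered atom pairs whose topological distance equals k. The function name, the property values and the three-atom path are hypothetical. Because the double sum visits every pair in both orders, each pair contributes twice for k > 0; a variant that counts each pair once would halve those values.

```python
def moreau_broto_ats(weights, dist, lag):
    """Sum w_i * w_j over all ordered pairs (i, j) with dist[i][j] == lag."""
    n = len(weights)
    return sum(
        weights[i] * weights[j]
        for i in range(n)
        for j in range(n)
        if dist[i][j] == lag
    )

# Hypothetical atomic property values on a three-atom path
weights = [1.0, 2.0, 3.0]
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
ats = [moreau_broto_ats(weights, dist, k) for k in range(3)]
print(ats)
```

At lag zero only the diagonal pairs (i, i) match, so the first value reduces to the sum of the squared atomic properties.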

Mekenyan and Bonchev (1986) outlined the Optimized Approach based on

Structural Descriptors Set (OASIS) methodology as a generalization of Hansch

approach. A large set of calculable geometrical, topological and quantum-chemical

descriptors were utilized to characterize the molecular structures. This methodology

was reported as a second generation (Q)SAR approach for SAR studies of structurally

related compounds.



Ghose and Crippen (1987) defined atomic refractivity values of the topological environment of each skeleton atom in the molecule as:

$$AMR = \sum_{i} n_i \alpha_i$$

Where $n_i$ is the number of atoms of type i and $\alpha_i$ is the corresponding atomic refractivity value.

Cramer et al.(1988) introduced a three-dimensional (3D) (Q)SAR technique

termed as comparative molecular field analysis (CoMFA) for structure/activity

correlation studies. This approach involves the alignment of a set of molecules in 3D

space. Once a suitable alignment is obtained, a steric or electrostatic field is

constructed using a probe atom. The resultant field is then correlated with the reported

activity values of the molecules.

Pal et al. (1988, 1989) developed novel topochemically arrived unique (TAU) descriptors based on the electronic and nuclear properties of the atoms present in the molecular graph. This scheme describes the molecular graph in terms of sets of edge weights (E) and vertex weights (V). Four MDs, namely the composite topochemical descriptor (T), skeletal descriptor (TR), functionality descriptor (F) and branchedness descriptor (B), were derived from it. These descriptors were calculated from the core and mobile valence electron (VEM) counts. QSAR models based on these descriptors were also developed for the inhibition of M. tuberculosis by substituted bromophenols.

Bangov (1990) conceptualized the charge-related molecular descriptor (CMI), defined as:

$$CMI = \sum_{i} \sum_{j} L_i L_j / d_{ij}$$

Where $d_{ij}$ is the inter-atomic distance and the $L_i$ are local descriptors, one for each heavy (non-hydrogen) atom i, which can be expressed as follows:

$$L_i = L_o - n_H + Q_i$$

$L_o$ is a constant value for each atom in each hybridization state, $n_H$ is the number of hydrogen atoms attached to the given heavy atom and $Q_i$ is the corresponding charge density.

Kier and Hall (1990) introduced a new set of MDs called the electrotopological state (E-state) descriptors, based on graph invariants for each atom in the molecule. The E-state variable encodes the intrinsic electronic state of the atom as influenced by the electronic environment of all other atoms within the topological framework of the molecule. For simplicity, these descriptors were referred to as the E-state descriptors. The electrotopological state $S_i$ of the ith atom in the molecule, or E-state descriptor, was defined as:

$$S_i = I_i + \Delta I_i$$

$$\Delta I_i = \sum_{j \neq i} (I_i - I_j) / d_{ij}^2$$

Where $I_i$ is the intrinsic state of the ith atom and $\Delta I_i$ is the field effect on the ith atom, calculated as the perturbation of the intrinsic state of the ith atom by all other atoms in the molecule; $d_{ij}$ is the topological distance between the ith and the jth atoms.

The intrinsic state I is based on the Kier-Hall electronegativity and is derived from the ratio of that electronegativity to the number of skeletal σ bonds for that atom:

$$I = \left[(2/N)^2\,\delta^v + 1\right] / \delta$$

Where N is the principal quantum number of the valence shell, and δ and $\delta^v$ are the molecular connectivity δ values:

δ = σ – h = number of connections in the skeleton

Where σ is the number of electrons in σ orbitals and h is the number of hydrogen atoms bonded to the atom.

Randic (1991a; 1991b) developed orthogonal descriptors in multivariate

regression and observed that the concept of orthogonality applies equally to molecular

properties as to descriptors, to quantum chemical descriptors as well as to ad hoc

combinations of topological descriptors. He also proposed a new structure-explicit

graph matrix P and also developed a novel molecular descriptor P’/P based on it.

1

1 1

)(/'N

i

N

j

ijPPP

The quantity P’/P represents the graphical bond order, πeij of the edge (bond)

eij of G

Bonchev et al. (1992) utilized the graph topological extrapolation method for the modeling of polymer properties (TEMPO), which was based on the graph topological description of the polymer elementary units by means of the normalized Wiener number, represented as a polynomial of degree 3 with respect to the number of atoms. The method was applied to the calculation of π-electron energies and energy gaps of various conjugated polymers, as well as to the assessment of the melting point, density, refractive index and specific rotation of some industrially produced polymers.

Yao et al. (1993) developed three new topological descriptors Ax1, Ax2 and Ax3

for use in multivariate analysis in structure-property relationship and structure-activity

relationship studies. Good results were obtained by using them to predict the physical

and chemical properties and biological activities of some organic compounds. The

studies also indicated that the three topological descriptors have high structural

selectivity.

Todeschini et al. (1994, 1995) developed novel 3D molecular descriptors,

termed as Weighted Holistic Invariant Molecular (WHIM) descriptors, which represent

different sources of chemical information. WHIM descriptors contain information

about the whole 3D molecular structure in terms of size, shape, symmetry and atom

distribution. These descriptors were calculated from x, y, z-coordinates of a 3D

structure of the molecule, usually from a spatial conformation of minimum energy,

within different weighting schemes in a straightforward manner and represent a very

general approach to describe molecules in a unitary conceptual framework. The

directional WHIM size descriptors were defined as the eigenvalues λ1, λ2, and λ3 of the

weighted covariance matrix of the molecule atomic coordinates; they account for the

molecular size along each principal direction. The weighted covariance matrix is a 3×3

matrix whose elements are the weighted covariances $s_{jk}$ between the jth and the kth atomic coordinates, for j, k ∈ {1, 2, 3}, defined as follows:

$$s_{jk} = \frac{\sum_{i=1}^{A} w_i\,(q_{ij} - \bar{q}_j)(q_{ik} - \bar{q}_k)}{\sum_{i=1}^{A} w_i}$$

Where A is the number of atoms, $q_{ij}$ and $q_{ik}$ are the jth and the kth coordinates of the ith atom, $\bar{q}$ is the corresponding average value and $w_i$ is the weight of the ith atom. Six different weighting schemes were proposed, i.e. (1) the unweighted case U, (2) the atomic masses M, (3) the van der Waals volumes V, (4) the Mulliken atomic electronegativities E, (5) the atomic polarizabilities P and (6) the electrotopological descriptors of Kier and Hall. All the weights (1)–(5) were scaled with respect to the carbon atom, and their original and scaled values have been tabulated (Todeschini and Gramatica, 1998).

The non-directional WHIM descriptors were also derived directly from the

directional WHIM descriptors. Thus, for non-directional WHIM descriptors, any

information related to the principal axes disappears and the description is related only

to a global holistic view of the molecule. These descriptors were built in such a way

so as to capture variation of molecular properties along with the three principal

directions in the molecule:

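$$V = \prod_{p=1}^{3} (1 + \lambda_p) - 1 = T + A + \lambda_1 \lambda_2 \lambda_3$$

$$T = \lambda_1 + \lambda_2 + \lambda_3$$

$$A = \lambda_1 \lambda_2 + \lambda_1 \lambda_3 + \lambda_2 \lambda_3$$

Where T and A are the linear and quadratic contributions to the total molecular size. V is the complete expansion, including also the third-order term. λ1, λ2 and λ3 are the eigenvalues of the weighted covariance matrix of the molecule atomic coordinates (Todeschini and Consonni, 2009).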
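The relation between the three non-directional size descriptors can be checked numerically. The sketch below assumes the three eigenvalues are already available (the values used are hypothetical) and confirms that the product form of V expands into T, A and the third-order term.

```python
def whim_size_descriptors(l1, l2, l3):
    """Return (T, A, V) from the three eigenvalues of the weighted covariance matrix."""
    t = l1 + l2 + l3                           # linear contribution
    a = l1 * l2 + l1 * l3 + l2 * l3            # quadratic contribution
    v = (1 + l1) * (1 + l2) * (1 + l3) - 1     # complete expansion
    return t, a, v

# Hypothetical eigenvalues, for illustration only
t, a, v = whim_size_descriptors(3.0, 2.0, 1.0)
print(t, a, v)
```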

Galvez et al. (1995) demonstrated that by an adequate choice of topological

descriptors it is possible to not only predict different pharmacological activities but

also to design new active compounds, including lead drugs, in several therapeutic

scopes, with a surprising level of efficiency, especially considering the simplicity of

the calculations. They concluded that in spite of its limitations, molecular topology

ought to be considered not just as an excellent tool for molecular and drug design but

as a real alternative approach to the study of chemical bonds, whose theoretical

physicochemical basis is still to be developed.

Hall and Kier (1995) defined atom-type E-state descriptors encoding

topological and electronic information related to particular atom types in the

molecule. These descriptors were calculated by summing the E-state values of all

atoms of the same atom-type in the molecule or, alternatively, as average of the E-

state values. The electrotopological state descriptors have shown considerable

usefulness in the establishment of (Q)SAR/QSPR/QSTR equations. The ability to

focus on individual atoms has provided significant utility in their applicability.



Schuur et al. (1996) derived a molecule representation of structures based on

electron diffraction (MoRSE) code from an equation used in electron diffraction

studies that allowed the representation of the 3D structure of a molecule by a fixed

number of values. Various atomic properties were taken into account giving high

flexibility to this representation of a molecule.

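$$I(s) = \sum_{i=2}^{N} \sum_{j=1}^{i-1} A_i A_j \frac{\sin(s\,r_{ij})}{s\,r_{ij}}, \qquad s = 0, \ldots, 31.0\ \text{Å}^{-1}$$

Where $A_i$ and $A_j$ are atomic properties of atoms i and j, $r_{ij}$ is the interatomic distance and N is the number of atoms.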

Values of this function were calculated at 32 evenly distributed values of s in the

range of 0-31.0 Å-1 from the 3D atomic coordinates of a molecule. This 3D-MoRSE

code retained the important structural features such as the mass and the amount of

branching and was able to distinguish between benzene, cyclohexane, and

naphthalene derivatives in a dataset of great structural variety. For the atomic

weighting scheme w, various physico-chemical properties such as atomic mass, partial

atomic charges, and atomic polarisability were considered. These descriptors have

shown wide applicability in (Q)SAR/QSPR studies.
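As a sketch of how such a fixed-length code might be generated, the function below evaluates the scattering-type sum over all atom pairs at 32 evenly spaced values of s. The function name, the coordinates and the unit atomic properties are hypothetical, and the s = 0 term is taken as the limit sin(x)/x → 1.

```python
import math

def morse_code(coords, props, s_values):
    """Evaluate I(s) = sum over pairs of A_i * A_j * sin(s * r_ij) / (s * r_ij)."""
    n = len(coords)
    code = []
    for s in s_values:
        total = 0.0
        for i in range(1, n):
            for j in range(i):
                x = s * math.dist(coords[i], coords[j])
                # sin(x)/x tends to 1 as x tends to 0
                total += props[i] * props[j] * (math.sin(x) / x if x != 0 else 1.0)
        code.append(total)
    return code

# Two hypothetical atoms 1.5 Å apart, each with unit atomic property
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
props = [1.0, 1.0]
s_values = [float(k) for k in range(32)]  # 0, 1, ..., 31 Å^-1
code = morse_code(coords, props, s_values)
print(len(code), code[:3])
```

Whatever the size of the molecule, the result has 32 entries, which is what makes the code usable as a fixed-length descriptor vector.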

Basak et al. (1997b) used topostructural, topochemical and geometric parameters to develop a hierarchical QSAR approach that limits the number of independent variables in linear regression modeling, in order to avoid the problem of chance correlations. This new approach was found to be useful in illuminating the relationships of different types of molecular description information to physicochemical properties.

Hu and Xu (1997) devised a new topological descriptor called as molecular

identification number from an all-paths method. This new topological descriptor

displayed high discriminating power for various kinds of organic compounds such as

alkane trees, complex cyclic or polycyclic graphs, and structures containing

heteroatoms and thus used as a molecular identification number (MID06) for

chemical documentation.

Dejulian-Ortiz et al. (1998) proposed chiral MDs to consider the chirality

within a topological model. They suggested that the chiral information is related to

symmetry, which allows the topological handling of chiral atoms by weighted graphs

and the calculation of new descriptors that give a weight to the corresponding entry in



the main diagonal of the topological matrix. These chiral MDs differentiated the

pharmacological activity between pairs of enantiomers.

Ivanciuc et al. (1998b) presented two new approaches for the calculation of

atom and bond parameters for heteroatom-containing molecules. In the first approach,

the atom and bond weights were computed on the basis of relative atomic

electronegativity, using carbon as standard. The weight parameter AWX for atom i,

which utilized the relative electronegativity, was defined as:

AWXi =1-1/Xi

In the second system, the relative covalent radii were used to compute atom and bond

weights, again with the carbon atom as standard. The bond weight parameter BWX

was defined as:

BWXij =1/BXiXj

The two approaches were used to define and compute topological descriptors based

on graph distance.

Borodina et al. (1998) applied a method based on topoelectrical invariants to

estimate the synthetic molecule resemblance to small endogenous bio-regulators. In

this work, each atom was characterized by its electronegativity and equilibrium

charge. The results demonstrated discriminative ability of proposed structure

description and measure of similarity.

Pearlman and Smith (1998; 1999) developed the Burden-CAS-University of Texas eigenvalues (BCUT) descriptors on the basis of the Burden matrix (Burden, 1989, 1997), which is an adjacency matrix in which the non-diagonal elements are weighted based on the nature of the connectivity of the atoms involved. The fundamental modification made by them was to place atomic properties along the diagonal of the Burden matrix. This leads to a variety of weighted Burden matrices where the weights include atomic weight, polarizability, electronegativity and hydrogen bonding ability. The actual descriptors were obtained by performing an eigenvalue decomposition of the Burden matrix and taking the lowest and highest eigenvalues. It was also shown that the extreme eigenvalues of the Burden matrix encode global information regarding the molecule. The holistic nature of these descriptors has led to their frequent use in studies of chemical diversity, library design and hit selection in high-throughput screens (Stanton, 1999).



Hemmer et al. (1999) represented the 3D structure of a molecule by a radial

distribution function (RDF) code. The RDF of an ensemble of N atoms was

interpreted as the probability distribution to find an atom in a spherical volume of

radius r:

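$$g(r) = f \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} A_i A_j\, e^{-B (r - r_{ij})^2}$$

Where f is a scaling factor, N is the number of atoms, $r_{ij}$ is the distance between atoms i and j and B is a smoothing parameter. By including characteristic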

atomic properties A of the atoms i and j, the RDF code can be used in different tasks

to fit the requirements of the information to be represented. These atomic properties

enable the discrimination of the atoms of a molecule for almost any property that can

be attributed to an atom.
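The RDF code can be sketched as a sum of Gaussian terms centred on the interatomic distances and sampled on a grid of radii. In the snippet below, the function name, the smoothing parameter value of 100, the unit scaling factor and the two-atom input are assumptions chosen purely for illustration.

```python
import math

def rdf_code(coords, props, r_values, b=100.0, f=1.0):
    """Evaluate g(r) = f * sum over pairs of A_i * A_j * exp(-b * (r - r_ij)**2)."""
    n = len(coords)
    # Precompute the property product and distance for every unordered atom pair
    pairs = [
        (props[i] * props[j], math.dist(coords[i], coords[j]))
        for i in range(n - 1)
        for j in range(i + 1, n)
    ]
    return [
        f * sum(a * math.exp(-b * (r - r_ij) ** 2) for a, r_ij in pairs)
        for r in r_values
    ]

# Two hypothetical atoms 1.5 Å apart, each with unit atomic property
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
props = [1.0, 1.0]
g = rdf_code(coords, props, [1.0, 1.5, 2.0])
print(g)
```

The value peaks at r = 1.5 Å, the only interatomic distance present, and decays quickly on either side; a smaller smoothing parameter would broaden the peak.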

Tuppurainen (1999) described a modification of EVA descriptors termed as

electronic eigenvalues (EEVA), for use in the derivation of predictive (Q)SAR and

QSPR models. In this approach, semi-empirical molecular orbital energies, i.e. the

eigenvalues of the Schrödinger equation, were used instead of the vibrational

frequencies of the molecule. Its performance was tested with respect to the Ah

receptor binding of polychlorinated biphenyls (PCBs), dibenzo-p-dioxins (PCDDs)

and dibenzofurans (PCDFs).

Karmarkar et al. (2000) estimated the proton-ligand formation constants of salicylhydroxamic acids and their nuclear substituted derivatives using the normalized Wiener's descriptor, referred to as the mean square Wiener's descriptor ($W_{ms}$). It was defined as the mean of the squares of the elements $d_{ij}(G)$ of the off-diagonal submatrix:

ij

ijms dN

W 2

1

1

They indicated that the normalized Wiener's descriptor gives better results than the

Wiener’s descriptor itself.

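Estrada (2000a) introduced a graph-spectrum-based invariant which is now known as the Estrada descriptor. This descriptor was defined as:

$$EE(G) = \sum_{i=1}^{n} e^{\lambda_i}$$

Where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of the adjacency matrix of G.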

Estrada descriptor gives maximum values for the most folded structures, thus it is

useful in the measure of folding of the molecular structures, especially protein chain.

Estrada descriptor is also an effective method to measure the centrality of complex

networks, extended atomic branching and the carbon-atom skeleton (de la Pena et al.,

2007).
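Since the sum of exp(λ) over the adjacency eigenvalues equals the trace of the matrix exponential, the Estrada descriptor can also be approximated without an eigenvalue solver, by accumulating the traces of successive powers of the adjacency matrix divided by k!. The sketch below uses that series; the function name, the number of terms and the single-edge test graph are illustrative choices, not part of the original definition.

```python
import math

def estrada_index(adj, terms=30):
    """Approximate EE(G) = trace(exp(A)) as the sum over k of trace(A^k) / k!."""
    n = len(adj)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0
    factorial = 1.0
    total = 0.0
    for k in range(terms):
        if k > 0:
            # Multiply the running power by A to obtain A^k
            power = [
                [sum(power[i][m] * adj[m][j] for m in range(n)) for j in range(n)]
                for i in range(n)
            ]
            factorial *= k
        total += sum(power[i][i] for i in range(n)) / factorial
    return total

# A single edge (two bonded atoms): adjacency eigenvalues are +1 and -1
ee = estrada_index([[0, 1], [1, 0]])
print(ee, 2 * math.cosh(1))
```

The trace of the kth power counts closed walks of length k, so the series form makes explicit why more compact, highly connected structures receive larger values.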

Palyulin et al. (2000) developed a novel approach of QSAR analysis for

organic compounds known as molecular field topology analysis (MFTA). This method

involved the construction of a molecular supergraph (MSG) by topological

superposition of the training set structures and resulted in generation of uniform

descriptor vectors based on the local physicochemical parameters (atom and bond

properties) of the molecules. They concluded that MFTA may provide the prediction

models that are comparable or superior in quality of description and prediction to the

models based on the widely used classical (Q)SAR methods and 3D approaches.

Randic (2001a; 2001b) reported novel shape descriptors based on the number of paths and the number of walks within a graph for all atoms, obtained by taking the quotients of the number of paths and the number of walks of the same length. The new

shape descriptors showed superior discriminating power among isomers as compared

to the kappa shape descriptors. The new descriptors offered regressions of high

quality for diverse physicochemical properties of octanes.

O’Brien and Popelier (2001) described a new molecular similarity method

called quantum topological molecular similarity (QTMS) depending on the topology

of the electron density. The QTMS method directly compares discrete topological

representations of molecules without 3D superposition, using properties evaluated at

the bond critical points (BCP) and is able to suggest a molecular fragment that contains

the active center or the part of the molecule responsible for the QSAR. QTMS was

applied to five carboxylic acid systems at five different levels of calculation. Each

level benefited from the geometry optimization of the lower level since successively

updated geometries were obtained. All levels of calculation provided very good

regression outcomes.

Cao and Yuan (2001) proposed three novel topological descriptors, OEI (odd-even descriptor), VDI (vertex degree-distance descriptor) and RDI (ring degree-distance descriptor), and then carried out multiple regression analysis with these descriptors against the boiling points of paraffins and cycloalkanes.

The three descriptors are defined as:

$$OEI = \sum_{i=1}^{N} \sum_{j=1}^{N} \left[(-1)^{D_{ij}-1}\, S_{ij}\right]$$

Where N is the number of vertices in the molecular graph and S is the matrix derived from the distance matrix D, whose elements are the squares of the reciprocal distances, $(D_{ij})^{-2}$, i.e. $S = [1/D_{ij}^2]$ (when i = j, let $1/D_{ij}^2 = 0$). It means that the interaction between vertices i and j is proportional to $(D_{ij})^{-2}$.

The interaction of vertices i and j is determined not only by the distance between i and j, but also by their vertex degrees. So VDI is defined as:

$$VDI = \left(\prod_{i=1}^{N} f_i\right)^{1/N}$$

Where the $f_i$ are the elements of the (1 × N) vector VS obtained as the product V × S:

$$VS = [f_1, f_2, \ldots, f_N]$$

Because of the rigidity of the ring, the freedom of a vertex in a ring is smaller than that of a vertex in a chain. Thus another descriptor, RDI, was proposed as:

$$RDI = \left(\prod_{i=1}^{N} g_i\right)^{1/N}$$

Where the $g_i$ are the elements of the (1 × N) vector RS obtained as the product R × S:

$$RS = [g_1, g_2, \ldots, g_N]$$
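A small worked sketch of OEI and VDI follows. It assumes that V is the row vector of vertex degrees (the text does not spell this out) and uses the three-carbon path of propane as input; the function name is hypothetical. RDI would follow the same pattern with a ring-degree vector R in place of V.

```python
def odd_even_and_vdi(dist, degrees):
    """Return (OEI, VDI) from a topological distance matrix and vertex degrees."""
    n = len(dist)
    # S holds squared reciprocal distances, with zeros on the diagonal
    s = [[0.0 if i == j else 1.0 / dist[i][j] ** 2 for j in range(n)]
         for i in range(n)]
    # The sign alternates with distance parity: + for odd, - for even distances
    oei = sum((-1) ** (dist[i][j] - 1) * s[i][j]
              for i in range(n) for j in range(n) if i != j)
    # f = V x S, assuming V is the (1 x N) row vector of vertex degrees
    f = [sum(degrees[i] * s[i][j] for i in range(n)) for j in range(n)]
    product = 1.0
    for value in f:
        product *= value
    vdi = product ** (1.0 / n)  # geometric mean of the f_i
    return oei, vdi

# Propane skeleton: a path of three carbon atoms
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
degrees = [1, 2, 1]
oei, vdi = odd_even_and_vdi(dist, degrees)
print(oei, vdi)
```

For this graph the four ordered pairs at distance 1 contribute +1 each and the two ordered pairs at distance 2 contribute -1/4 each, giving OEI = 3.5.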

Consonni et al. (2002a) developed novel 3D GEometry, Topology, and Atom-

Weights AssemblY (GETAWAY) MDs based on an influence or leverage matrix.

These descriptors encode both the geometrical information given by the influence

molecular matrix and the topological information given by the molecular graph,

weighted by chemical information encoded in selected atomic weightings. Two sets of

MDs were devised: H-GETAWAY (calculated from molecular influence matrix H) and

R-GETAWAY (calculated from the influence/distance matrix R) descriptors. A set of

the H- GETAWAY (HGM, ITH, ISH, HIC) and R- GETAWAY (RARS, RCON and REIG)

was derived by applying some traditional matrix operators and concepts of



information theory both to the molecular influence matrix H and to the

influence/distance matrix R.

The geometric mean on the leverage magnitude ($H_{GM}$) was defined as:

$$H_{GM} = 100 \cdot \left(\prod_{i=1}^{N} h_{ii}\right)^{1/N}$$

Where N represents the number of atoms, the factor 100 scales the descriptor values between 0 and 100, and the $h_{ii}$ are the diagonal elements of the molecular influence matrix, called leverages.

The total information content on the leverage equality ($I_{TH}$) and the standardized information content on the leverage equality ($I_{SH}$) were defined as:

$$I_{TH} = A_0 \log_2 A_0 - \sum_{g=1}^{G} n_g \log_2 n_g$$

$$I_{SH} = \frac{I_{TH}}{A_0 \log_2 A_0}$$

Where $n_g$ is the number of atoms with the same leverage value and G is the number of equivalence classes into which the atoms are partitioned according to the leverage equality. $A_0$ represents the number of non-hydrogen atoms in the molecule.

The mean information content on the leverage magnitude ($H_{IC}$) was defined as:

$$H_{IC} = -\sum_{i=1}^{N_H} \frac{h_{ii}}{D} \log_2 \frac{h_{ii}}{D}$$

Where D is the matrix rank (i.e. the sum of all leverages) and $N_H$ is the total number of atoms including hydrogens.
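The leverage-based H-GETAWAY descriptors can be sketched from a list of leverages alone. The snippet below is a simplification under stated assumptions: the same atom list is used for N, for the non-hydrogen count and for the total atom count (the text distinguishes them), leverages are treated as equal when they agree to three decimals, and the leverage values themselves are hypothetical.

```python
import math
from collections import Counter

def leverage_descriptors(leverages, decimals=3):
    """Return (H_GM, I_TH, I_SH, H_IC) from the diagonal of the influence matrix."""
    n = len(leverages)
    # Geometric mean of the leverages, scaled by 100
    h_gm = 100.0 * math.prod(leverages) ** (1.0 / n)
    # Equivalence classes: atoms whose leverages agree after rounding (assumed tolerance)
    class_sizes = Counter(round(h, decimals) for h in leverages).values()
    max_info = n * math.log2(n)
    i_th = max_info - sum(k * math.log2(k) for k in class_sizes)
    i_sh = i_th / max_info if max_info > 0 else 0.0
    # D is the sum of all leverages (the rank of the influence matrix)
    d = sum(leverages)
    h_ic = -sum((h / d) * math.log2(h / d) for h in leverages if h > 0)
    return h_gm, i_th, i_sh, h_ic

# Hypothetical leverages for a three-atom structure
h_gm, i_th, i_sh, h_ic = leverage_descriptors([0.5, 0.5, 1.0])
print(h_gm, i_th, i_sh, h_ic)
```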

The average row sum of the influence/distance matrix ($R_{ARS}$) and the R-connectivity descriptor ($R_{CON}$) were derived from the R matrix as:

$$R_{ARS} = \frac{1}{N} \sum_{i=1}^{N} RS_i$$

Where N is the number of atoms in the molecule and $RS_i$ is the ith row sum.

$$R_{CON} = \sum_{b=1}^{B} (RS_i \cdot RS_j)_b^{0.5}$$

Where the sum runs over all B bonds in the molecule and $RS_i$ and $RS_j$ indicate the row sums of two adjacent vertices.

The third R-GETAWAY descriptor was defined in analogy with the Lovasz-Pelikan

descriptor (Lovasz and Pelikan, 1973). The R-matrix leading eigenvalue (REIG), a

descriptor of molecular branching, was calculated as the first eigenvalue of the

influence/distance matrix. RARS and REIG descriptors are closely related; their values

decrease as the molecular size increases and seem to be a little more sensitive to

molecular branching than to cyclicity and conformational changes (Todeschini and

Consonni, 2009).

In analogy with Moreau-Broto autocorrelation descriptors, ATS (Moreau and Broto,

1980a; 1980b) the GETAWAY autocorrelation descriptors were defined, weighting

each atom of the molecule by using physicochemical weights combined with the

elements of H or R matrix, thus also accounting for the 3D features of the molecules.

HATS descriptors were defined as:

2

1

0 )(

A

i

iji hwwHATS

);()()()(1

1

A

iij

ijjjjiiik dkhwhwwHATS

k=1,2,…..,d

Where hij and hjj are the diagonal entries corresponding to the atoms i and j in the

molecular influence matrix, dij is the topological distance between atoms i and j and d

is the topological diameter. The function δ (k; dij) is Kronecker delta function.

To consider also the off-diagonal elements which provide information on the degree

of interaction between atom pairs these H descriptors were derived:

$$H_0(w) = \sum_{i=1}^{A} h_{ii} \, w_i^{2}$$

$$H_k(w) = \sum_{i=1}^{A-1} \sum_{j>i} h_{ij} \, w_i \, w_j \, \delta(k; d_{ij}; h_{ij}), \qquad k = 1, 2, \ldots, d$$

Where dij is the topological distance between atoms i and j and d is the topological

diameter. The function δ (k; dij; hij) is a Dirac delta function. The terms H1, H2, …,

Hd represent autocorrelation quantities for each path length, i.e., lag 1,

2, …, d, weighted by the molecular influence matrix. The weights used for the

GETAWAY descriptors were those proposed for the calculation of the WHIM

descriptors (Todeschini et al., 1997).

Consonni et al. (2002b) utilized GETAWAY and WHIM descriptors in

(Q)SAR/QSPR studies and suggested that the joint use of GETAWAY and WHIM

descriptors may provide more predictive models especially when the property to be

modelled depends strictly on the 3D features of the molecule.

Golbraikh et al. (2002) introduced several series of novel ZE-isomerism

descriptors for description of cis- and trans-isomers. They introduced a quantity

named ZE-isomerism correction for the vertex degrees of atoms connected by double

bonds in Z- or E-configuration following the general approach introduced earlier for

the chirality descriptors (Golbraikh et al., 2001). They included modified molecular

connectivity descriptors, Zagreb group descriptors, extended connectivity, overall

connectivity, and topological charge descriptors.

Junkes et al. (2003) proposed a new semi-empirical topological descriptor

denoted as IET, for the prediction of retention descriptors for a diverse set of organic

compounds i.e., alkanes, alkenes, esters, ketones, aldehydes, and alcohols. The

descriptor was based on the hypothesis that the chromatographic retention is due to the

interaction of each atom of the molecule with the stationary phase and consequently the

value of the descriptor is reduced by steric effects from its neighbors. It can be

calculated as:

$$I_{ET} = \sum_{i} (C_i + \delta_i), \qquad \delta_i = \sum_{j} \log C_j$$

Where Ci is the value attributed to each carbon atom i and to the functional group in

the molecule and δi is the sum of the logarithm of the value of each adjacent carbon

atom (C1, C2, C3 and C4) and/or the logarithm of the value of the functional group.

Hu et al. (2003) devised a new variable descriptor, the external factor variable

connectivity descriptor (EFVCI), in which the atomic attribute is divided into two

parts: the innate part and the external part or perturbation term. The innate part was

defined in terms of the number of valence electrons, while the perturbation term by


reciprocal square distances and a variable parameter x. The local vertex invariant

relative to the ith atom was calculated as:

$$\gamma_i = Z_i^{v} + x \cdot VS_i(D^{-2})$$

Where Zv is the number of valence electrons and VSi is the ith row sum of the reciprocal

square distance matrix D^-2. Then, the EFVCIs were calculated by using the variable

local vertex invariants γi in place of the classic vertex degree δi in the formula of the

Kier–Hall connectivity descriptors (Kier and Hall, 1986b).

Ma et al. (2003) defined a new TD i.e., edge structure descriptor (ESI), for

evaluating the ground-state properties of one–dimensional macro– to suprabenzenoid

hydrocarbons.

ESI = (np – c x nb) / (np + na),

Where np, na, and nb are the number of phenanthrene–edge structures, acene–edge

structures and benzo[c]phenanthrene–edge structures, respectively, and c is an

empirical parameter. The ESI was shown to be effective in providing good

correlations with the ground state properties. For one dimensional macro to

suprabenzenoid hydrocarbons, ESI showed better correlations than the connectivity

descriptor.

Toropova and Toropova (2004) proposed an empirical descriptor termed as

hydrogen bond descriptor (HBI) for chloro–fluoro hydrocarbons (CFC) as:

HBI= 5000 + NH – (NCl +NF)

Where NH, NCl, and NF are the numbers of hydrogen, chlorine, and fluorine atoms,

respectively; the offset 5000 was added to numerically distinguish this descriptor

from other descriptors.

Hert et al. (2004) compared a range of different types of 2D fingerprints when

used for similarity based virtual screening with multiple reference structures. They

demonstrated the effectiveness of fingerprints that encode circular substructure

descriptors generated using the Morgan algorithm. These fingerprints were found to

be more effective than fingerprints based on a fragment dictionary, on hashing and on

topological pharmacophores. The combination of these fingerprints with data fusion

based on similarity scores was proposed to be an effective approach to virtual

screening in lead-discovery programmes.


Ivanciuc (2004) presented a new application of topological descriptors in

computing similarity matrices that were subsequently used to develop QSPR/QSAR

models. The similarity matrices were computed using four similarity descriptors,

namely the Cosine, Dice, Richards, and Good similarity descriptors. The similarity

matrices were used to develop multilinear regression QSAR models of the

anticonvulsant activity of 30 phenylacetanilides. The results showed that similarity

matrices derived from molecular graph descriptors could provide the basis for the

investigation of [(Q)SAR/SPR] relationships.

Garcia et al. (2005) developed an algorithm for the generation of molecular

graphs with a given value of the Wiener descriptor. The selection of parameters such as the

interval of values for the Wiener descriptor, the diversity and occurrence of atoms and

bonds, the size and number of cycles, and the presence of structural patterns guide the

processing of the heuristics generating molecular graphs with a considerable saving in

computational cost. The modularity in the design of the algorithm allows it to be used

as a pattern for the development of other algorithms based on different topological

invariants, which allow for its use in areas of interest.

Cuadrado et al. (2006) corrected invariant-based similarity measurements

using non-isomorphic fragment (NIF) dissimilarities with the aim of achieving more

realistic similarity values. Thus, NIF information was used for correcting invariant

based similarities (approximate similarity) because external NIF substructures have

key influence on activity values. The new method for computing approximate

similarities was expressed as:

BA

BABABA

TDTD

NIFTDNIFTDabsSAS 11

,,

Where TD(NIFA) and TD(NIFB)account for the NIF fragments of molecules A and B.

The new similarity measurement methods can be used for the development of fast,

cheap, and simple (Q)SAR/QSPR models.

Xu et al. (2006) devised three extended TDs for characterizing chiral

molecules as:

eAm1 = λmax1/ 2 - Am1




Where λmax1, λmax2, λmax3 are the largest eigenvalues of matrices Z1, Z2, and Z3. The

applicability of the modified TDs was demonstrated through (Q)SAR studies on D2

for dopamine receptor and α receptor activities of fourteen N-alkylated 3-(3-

hydroxyphenyl)-piperidines.

Gutierrez-Oliva et al. (2006) analyzed the application of the core-valence

bifurcation (CVB) descriptor and bond order descriptor by considering a series of

doubly hydrogen-bonded complexes. Their values are seen to be linearly related to

bond energies estimated through a bond-energy-bond-order relationship; also, the mean

value of the topological descriptor appears to be related to the complexation energy

computed by methods based on density functional theory.

Cheng and Yuan (2006) developed two novel structural descriptors namely

lone-pair electrons descriptor (LEI) and molecular volume descriptor (MVI) to

quantify the molecular electrostatic and steric effects, respectively.

$$LEI = 0.5 \sum_{i=1}^{n_{het}} \sum_{j=1,\, j \neq i}^{n} \frac{LE_i}{d_{ij}^{2}}$$

Where n is the number of vertices in hydrogen-suppressed molecular graph, nhet is the

number of heteroatoms and dij is the topological distance between vertex i and j.

)( bi

nnnLE

Where n is the principal quantum number, nv and nb are the numbers of valence

electrons and bonding electrons, respectively, and χ is the Pauling electronegativity.

Molecular volume descriptor (MVI) was defined as:

$$MVI = 0.5 \sum_{i=1}^{n} \sum_{j \neq i}^{n} \frac{V_i V_j}{d_{ij}^{2}}$$

Where Vi and Vj are the van der Waals volumes of groups i and j, respectively. Group i is

composed of vertex i and the adjacent hydrogen atoms. The utility of these descriptors

was also evaluated through QSPR modeling of diverse physicochemical properties of

four data sets.

Zhou et al. (2006) reported a novel molecular structural expression method

named three-dimensional vector of atomic interaction field (3D-VAIF) based on

electrostatic and steric interactions between different types of atoms.


Dureja and Madan (2006) utilized normalized Wiener’s topochemical

descriptor and normalized eccentric connectivity topochemical descriptor along with

molecular connectivity topochemical descriptor, Wiener’s topochemical descriptor

and eccentric connectivity topochemical descriptor for prediction of permeability

through blood brain barrier of diverse series of compounds using simpler approach.

They concluded that the high predictability of the proposed models derived from the

topochemical descriptors offers a vast potential for providing compounds for the

development of potent therapeutic agents with high permeability through blood brain

barrier.

Estrada and Matamala (2007) proposed the use of the generalized

topological descriptors (GTDs), which account for several of the classical TDs in one

single graph invariant. GTDs represent points in a six-dimensional space of

topological parameters, which can be optimized for describing a specific property.

The situation shows some resemblance with the geometry optimization procedures

used to minimize molecular energy. The family of GTI-simplex descriptors comprised

of autocorrelation descriptors was defined as:

$$GTI = \sum_{k=1}^{D} C_k(x_0, p_0) \, \eta_k$$

Where the summation goes over the different topological distances in the graph, D

being the topological diameter, and accounts for the contributions ηk of pairs of

vertices located at the same topological distance k.

Using this approach, it was observed that GTI have improved QSPRs by

reducing the standard deviation by almost 50%. In addition, the current approach

permits the illustration of the similarities and differences among the different

descriptors studied, indicating possible directions for searching new optimal

molecular descriptors.

Hosoya (2007) stressed the mathematical importance of the topological

descriptor, ZG, or the so-called Hosoya descriptor. He proposed a conjecture that for a

given pair of coprime positive integers (n1 < n2) there exists a

series of Z-trees {Gm} with the property Z(Gm) = Z(Gm-1) + Z(Gm-2) (m ≥ 3), with Z(G1)

= n1 and Z(G2) = n2. He suggested that the role of Z-descriptor is not limited to


elementary mathematics but also will be found in sophisticated algebraic number

theory and graph theory.

Peltason and Bajorath (2007) conceptualized structure-activity relationship

descriptor (SARI) for evaluating presence of activity cliffs and was defined as a

function of two separately calculated scores that assess intra class diversity and

activity differences of similar compounds:

SARI = 0.5 [Scorecont + (1 - Scoredisc)]

Where Scorecont and Scoredisc are the continuity and discontinuity scores, respectively.

The continuity score measures potency-weighted structural diversity within a class of

active compounds. High continuity scores reflect the presence of structurally diverse

molecules having comparable potency, which is a major characteristic of continuous

SARs. The discontinuity score determines average potency differences for pairs of

similar ligands, which reveals the presence of activity cliffs as a major determinant of

discontinuous SARs.

Veljkovic et al. (2007) described a very simple and efficient criterion based

on the electron-ion interaction potential (EIIP) and the average quasi valence number

(AQVN) to discriminate active from inactive flavonoids and selection and

optimization of lead compounds with anti-HIV-1 activity. In comparison with other

more complex approaches for in silico selection of flavonoids, EIIP/AQVN approach

showed a good correlation with anti-HIV-1 activity.

Bonacich (2007) described eigenvector centrality-x in two equivalent ways, as

a matrix equation and as a sum. The centrality of a vertex is proportional to the sum of

the centralities of the vertices to which it is connected. It was defined as the ith

component of the eigenvector associated to the largest eigenvalue of A:

Ax = λx,

$$\lambda x_i = \sum_{j=1}^{n} a_{ij} x_j, \qquad i = 1, \ldots, n$$

A second measure of centrality, beta-centrality or c(β), was also defined as a

weighted sum of paths connecting other vertices to each position, where longer paths

are weighted less.

$$c(\beta) = \sum_{k=1}^{\infty} \beta^{k-1} A^{k} \mathbf{1}$$

Where |β| < 1/λ and 1 is a vector of ones.
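
Both centrality measures can be illustrated numerically. The sketch below is an assumption-laden example rather than Bonacich's own procedure: the function names are hypothetical, the eigenvector is obtained by power iteration on A + I (the shift is a numerical convenience chosen here so that bipartite graphs such as chains converge; it leaves the eigenvectors unchanged), and beta-centrality is approximated by truncating the series after a fixed number of terms.

```python
def mat_vec(a, x):
    """Multiply the square matrix a (list of lists) by the vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def eigenvector_centrality(a, iterations=500):
    """Leading eigenvector of A by power iteration on the shifted matrix A + I."""
    x = [1.0] * len(a)
    for _ in range(iterations):
        ax = mat_vec(a, x)
        y = [ax_i + x_i for ax_i, x_i in zip(ax, x)]  # (A + I) x
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


def beta_centrality(a, beta, terms=200):
    """Truncated series c(beta) = sum_{k>=1} beta^(k-1) A^k 1, valid for |beta| < 1/lambda."""
    n = len(a)
    power = mat_vec(a, [1.0] * n)  # A^1 applied to the vector of ones
    c = [0.0] * n
    scale = 1.0                    # beta^(k-1), starting at k = 1
    for _ in range(terms):
        c = [c_i + scale * p_i for c_i, p_i in zip(c, power)]
        power = mat_vec(a, power)
        scale *= beta
    return c


# Three-vertex chain 0-1-2 (largest eigenvalue sqrt(2), so beta = 0.5 is admissible)
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = eigenvector_centrality(chain)
c = beta_centrality(chain, beta=0.5)
```

For this chain the central vertex receives the highest score under both measures, as expected from the definition.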


The advantages of eigenvector and beta-centrality over conventional graph theoretic

measures of centrality like degree, betweenness, and closeness were also

discussed in this study.

Guha and Van Drie (2008) proposed the structure activity landscape (SAL)

descriptor to identify “structure-activity cliffs”, i.e., pairs of molecules which are most

similar but have the largest change in activity. The SAL descriptor for a pair of

compounds was defined as:

$$SALI_{ij} = \frac{|A_i - A_j|}{1 - Sim_{ij}}$$

Where Ai and Aj are the biological activities of the ith and jth molecules and Simij is the similarity

coefficient between the two molecules. The robustness of this method was also

demonstrated using a variety of computational control experiments.
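
As a small worked illustration of the SAL descriptor, the following sketch evaluates the ratio of activity difference to structural dissimilarity for one pair of compounds. The function name `sali`, the activity values, and the handling of identical structures (similarity equal to 1, returned here as infinity) are assumptions made for the example and are not taken from the cited study.

```python
def sali(activity_i, activity_j, similarity):
    """SAL descriptor for one pair: |A_i - A_j| / (1 - Sim_ij).

    Assumption: pairs with similarity 1.0 are reported as infinity,
    because the denominator vanishes.
    """
    if similarity >= 1.0:
        return float("inf")
    return abs(activity_i - activity_j) / (1.0 - similarity)


# Hypothetical pair: activities 7.0 and 5.0 (e.g. pIC50), similarity 0.8
cliff_score = sali(7.0, 5.0, 0.8)
# A less similar pair with the same activity gap scores lower
smooth_score = sali(7.0, 5.0, 0.2)
```

Large values flag pairs that are structurally close yet differ strongly in activity, which is the signature of an activity cliff.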

Tong et al. (2008) derived a novel descriptor, vector of principal component

scores (VSW) for weighted holistic invariant molecular descriptor, from the principal

component analysis of a matrix of 99 weighted holistic invariant molecular

descriptors of amino acids. Satisfactory results were obtained by utilizing VSW

descriptors in (Q)SAR studies for three kinds of classical peptide analogues. Their

study indicated that the novel VSW descriptors were suitable for not only small-

molecule drugs, but also for structural characterization of polypeptide sequences.

They concluded that the VSW descriptors have a great prospect in (Q)SAR studies for

polypeptide and its analogues.

Chekmarev et al. (2008) extended the application of the shape signatures

methodology to the domain of computational models for cardiotoxicity. They applied

Shape Signatures method to generate MDs for use in classification techniques such as

k-nearest neighbors (k-NN), support vector machines (SVM), and Kohonen self-

organizing maps (SOM). They concluded that the shape signatures method offers a

novel practical approach to classifying compounds with respect to their potential for

cardiotoxicity.

Burden et al. (2009) described charge fingerprints as a new type of universal

descriptors for building good (Q)SAR/QSPR models of a diverse range of

physicochemical and biological properties. The atomistic and charge fingerprint

descriptors were found more successful than the eigenvalue descriptors in building

(Q)SAR models on their own. They have suggested that universal descriptors will be

useful for modeling large data sets as well as for screening large virtual libraries.

Vukicevic (2011) presented a new measure for checking fitting ability of the

model, i.e., the chor coefficient (rc), and compared it to the Pearson correlation coefficient, r.

The chor coefficient can be calculated as:

,1min1

2rrc

,

0

3

0

2

0

1 ,,,1minDX

DX

DX

DX

DX

DX

Where r2 is the squared value of the correlation coefficient.

Besides illustrating its advantages, he showed that it is strongly connected

with Pearson correlation coefficient and that all algorithms for optimization of r can

be applied to optimize rc with minimal programming interventions.

Verma and Hansch (2011) reviewed the various applications of the 13C-NMR

chemical shift as a (Q)SAR/QSPR descriptor. Their detailed investigation indicated that

the 13C-NMR chemical shifts are sufficiently rich in chemical information and are

able to encode the structural features of the molecules contributing significantly to

their biological activity, chemical reactivity, or physical characteristics. They

proposed 13C-NMR chemical shifts as promising descriptors in classical

(Q)SAR/QSPR modelling studies.

Liu et al. (2011) derived a novel class of MDs termed as “Power keys” by

exhaustively enumerating, canonicalizing, and uniquely encoding all possible

subgraphs up to a certain length. In this work, they have demonstrated the utility of

“Power keys” in substructure searching/screening a chemical database in order to


minimize the number of molecules that need to be verified by expensive atom-to-atom

matching.

Hemmateenejad et al. (2011) proposed four different sets of amino acid (AA)

descriptors on the basis of the QTMS approach for use in the (Q)SAR study of peptides.

These descriptors were successfully utilized for modeling 3 data sets of peptides.

Nie et al. (2012) proposed a novel TD ′EDm by introducing the bond angle into

hidden hydrogen graph of molecules and using the geometric distance instead of the

sum of bond length between two vertices. The ′EDm was derived from ionicity

descriptor matrix Q (a subtype of distance matrix), and branching degree matrix ′G as:

′EDm = ′GSm Q (m= 1,2,3)

The utility of ′EDm was also demonstrated through development of high quality

QSPR models of 44 cis-trans isomers of alkenes. The ′EDm described the molecular

structure more accurately and allowed unique characterization of cis-trans isomers.

Rabal and Oyarzabal (2012) developed a novel descriptor (LIR1f)

accounting for ligand-receptor interactions to define and visually explore biologically

relevant chemical space. It converts structural information into a one-dimensional

string accounting for the plausible ligand-receptor interactions as well as for

topological information. This descriptor was proposed with an aim to enable the

clustering, profiling, and comparison of libraries of compounds from a chemical

biology and medicinal chemistry perspective. The ligand receptor application of

LIR1f was demonstrated with four reported compound data sets associated with four

different target families.

Matrices associated to a molecular graph

The graph-theoretical approach to quantitative structure-activity/property

relationships [(Q)SAR/QSPR] is based on a well-defined mathematical representation of

the chemical structure. In chemical graph theory, molecular structures are normally

represented as hydrogen-suppressed graphs, whose vertices and edges represent atoms and

covalent bonds, respectively. The molecular graph is therefore a non-numerical

representation of the chemical structure. In order to obtain a quantitative

characterization of the molecular structure, to compare molecular structures, and to

compute various structural and topological descriptors, one has to use graphs

represented as matrices.


The calculation of the descriptors begins with the reduction of the molecule to the

hydrogen-suppressed skeleton or graph and reduction of this graph to several different

matrices depending upon what kind of entries are chosen for the atoms and bonds.

Thus a variety of matrices have been proposed in the literature (Engel, 2012). The

most commonly used matrices are as follows:

The Adjacency matrix

First identification of an organic molecule with a graph and its representation

by an adjacency matrix was made by Sylvester (1878). The adjacency matrix A is one

of the fundamental graph theoretical matrices; which represents the whole set of

connections between adjacent pairs of atoms (Trinajstic, 1983). The entries of the

adjacency matrix, symbolized as Aij, equal 1 if vertices vi and vj are adjacent (i.e., the

atoms i and j are bonded) and zero otherwise. Thus, the adjacency matrix is a Boolean

matrix with bits (0 or 1).

The adjacency matrix A = A(G) of a graph G with N vertices is the square N x

N symmetric matrix whose entry in the ith row and jth column is defined as:

$$[A]_{ij} = \begin{cases} 1 & \text{if } i \neq j \text{ and } e_{ij} \in E(G) \\ 0 & \text{if } i = j \text{ or } e_{ij} \notin E(G) \end{cases}$$

Where E(G) is the set of the edges in a connected graph (G), eij is the edge formed by

atoms i and j. The diagonal elements are zero.

The ith row sum of the adjacency matrix is called the vertex degree, δi, and is defined as:

$$\delta_i = \sum_{j=1}^{A} a_{ij}$$

Where aij are the elements of adjacency matrix (Todeschini and Consonni, 2009).
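
The construction of the adjacency matrix and the vertex degrees can be sketched as follows. The function names and the atom numbering of the example (the hydrogen-suppressed skeleton of 2-methylbutane, numbered 0 to 4) are assumptions chosen for illustration.

```python
def adjacency_matrix(n_atoms, bonds):
    """Build the symmetric 0/1 adjacency matrix from a list of bonded atom pairs."""
    a = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        a[i][j] = 1
        a[j][i] = 1
    return a


def vertex_degrees(a):
    """Vertex degree of each atom: the corresponding row sum of A."""
    return [sum(row) for row in a]


# 2-methylbutane skeleton: chain 0-1-2-3 with a methyl group (atom 4) on atom 1
bonds = [(0, 1), (1, 2), (2, 3), (1, 4)]
A = adjacency_matrix(5, bonds)
degrees = vertex_degrees(A)
```

The branched atom 1 has degree 3, the chain atom 2 has degree 2, and the three terminal carbons have degree 1.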

The Edge-adjacency Matrix

The edge-adjacency matrix of a graph G, denoted by EA (also called Bond

matrix) encodes information about the connectivity between graph edges:

[EA]ij = [Eij] = 1 if (i,j) are adjacent bonds

= 0 otherwise

The entries Eij of the matrix are equal to one if edges ei and ej are adjacent (the two

edges thus forming a path of length two) and zero otherwise (Gutman and Estrada,

1996).
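
A minimal sketch of the edge-adjacency matrix is given below, treating two bonds as adjacent when they share an atom. The function name and the bond ordering of the example (the same 2-methylbutane skeleton, with bonds indexed in list order) are assumptions for illustration.

```python
def edge_adjacency_matrix(bonds):
    """Edge-adjacency matrix: entry (p, q) is 1 when bonds p and q share an atom."""
    m = len(bonds)
    ea = [[0] * m for _ in range(m)]
    for p in range(m):
        for q in range(p + 1, m):
            if set(bonds[p]) & set(bonds[q]):
                ea[p][q] = 1
                ea[q][p] = 1
    return ea


# 2-methylbutane skeleton; bond indices follow the order of this list
bonds = [(0, 1), (1, 2), (2, 3), (1, 4)]
EA = edge_adjacency_matrix(bonds)
edge_degrees = [sum(row) for row in EA]
```

The row sums give the edge degrees, i.e. the number of bonds adjacent to each bond.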


The augmented adjacency matrix

To account for heteroatoms and multiple bonds in the molecule, the

augmented adjacency matrix, aA was proposed by Randic (1991b) replacing the zero

diagonal entries of the adjacency matrix of the simple graph with values

characterizing different atoms in the molecule. Molecules containing heteroatom (s)

and/or multiple bonds are represented by vertex- and edge-weighted graphs. The

adjacency matrix of a vertex- and edge-weighted molecular graph is defined by:

$$[{}^{a}A(w)]_{ij} = \begin{cases} Ew_{ij} & \text{if } e_{ij} \in E(G) \\ Vw_i & \text{if } i = j \\ 0 & \text{if } e_{ij} \notin E(G) \end{cases}$$

Where Vwi is the parameter of the vertex vi , and Ewij is the parameter of the edge eij

and the diagonal elements wi usually are some atomic physico-chemical properties

(Randic and Dobrowolski, 1998).

The additive adjacency matrix

The additive adjacency matrix (A) is obtained by modifying adjacency

matrix. When sum of the non-zero row elements in the adjacency matrix represents

the degree of corresponding vertex (of the vertices adjacent to vertex i) of a molecular

graph G, the matrix obtained is termed as additive adjacency matrix (Gupta et al.,

2001).

The augmentative adjacency matrix

The augmentative adjacency matrix (A) is obtained by modifying adjacency

matrix. When product of the non-zero row elements in the adjacency matrix

represents the degree of corresponding vertex (of the vertices adjacent to vertex i) of a

molecular graph G, the matrix may be defined as augmentative adjacency matrix

(Dureja and Madan, 2007).

The extended adjacency matrix

The extended adjacency matrices, denoted by ExA, are weighted adjacency

matrices N x N whose elements are defined as a function of local vertex invariants of

the adjacency matrix A and of some atomic properties (Yang et al., 1994) and can be

defined as:

$$[{}^{Ex}A]_{ij} = \begin{cases} a_{ij} \cdot \dfrac{\delta_i / \delta_j + \delta_j / \delta_i}{2} & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases}$$


Where aij are the entries of the adjacency matrix and δ is the vertex degree (Yang et

al., 1994).

The additive chemical adjacency matrix

The additive topochemical adjacency matrix (Ac) is obtained by modifying the

adjacency matrix. When the sum of the non-zero row elements in the adjacency matrix

represents the chemical degree of corresponding vertex (of the vertices adjacent to

vertex i) of a molecular graph G, the matrix may be defined as additive topochemical

adjacency matrix. The chemical degree of a vertex was obtained from the adjacency

matrix by substituting, row elements corresponding to heteroatom(s), with atomic

weight with respect to carbon atom (Gupta et al., 2003).

The augmentative chemical adjacency matrix

The augmentative chemical adjacency matrix (Ac) is obtained by modifying

augmentative adjacency matrix. When product of the non-zero row elements in the

adjacency matrix represents the chemical degree of corresponding vertex (of the

vertices adjacent to vertex i) of a molecular graph G, the matrix may be defined as

augmentative chemical adjacency matrix. The chemical degree of a vertex was

calculated from the adjacency matrix by substituting the row elements corresponding

to heteroatom(s), with atomic weight with respect to carbon atom (Dureja and Madan,

2007).

The Laplacian Matrix

The Laplacian matrix of G, L = L(G), is a square N x N symmetric matrix, N

being the number of vertices in the molecular graph, obtained as the difference

between the diagonal matrix of vertex degrees, DEG(G), and the adjacency matrix A(G). It

is defined by the following equation:

L(G) = DEG(G) - A(G)

The elements of the Laplacian matrix are:

$$[L]_{ij} = \begin{cases} \delta_i & \text{if } i = j \\ -1 & \text{if } i \neq j \text{ and } e_{ij} \in E(G) \\ 0 & \text{if } i \neq j \text{ and } e_{ij} \notin E(G) \end{cases}$$

Where δi is the vertex degree of atom i.

The Laplacian Matrix is also called the Kirchhoff matrix due to its role in the

spanning tree theorem of Kirchhoff. The Laplacian Matrix offers a new method for


computing the Wiener descriptor of trees, and represents the source of new graph

invariants and topological descriptors. (Mohar, 1989; Trinajstic et al., 1994).
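
The Laplacian matrix and its connection to Kirchhoff's spanning tree theorem can be illustrated with a short sketch. By that theorem, the number of spanning trees of a connected graph equals the determinant of the Laplacian after deleting any one row and the matching column. The function names are hypothetical, and exact rational arithmetic with Python's `fractions` module is a choice made here to avoid rounding; it is not prescribed by the cited works.

```python
from fractions import Fraction


def laplacian_matrix(n_atoms, bonds):
    """L = DEG - A: vertex degrees on the diagonal, -1 for each bonded pair."""
    l = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        l[i][i] += 1
        l[j][j] += 1
        l[i][j] -= 1
        l[j][i] -= 1
    return l


def determinant(matrix):
    """Exact determinant by Gaussian elimination over rational numbers."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def spanning_tree_count(n_atoms, bonds):
    """Kirchhoff's theorem: delete row and column 0, then take the determinant."""
    l = laplacian_matrix(n_atoms, bonds)
    minor = [row[1:] for row in l[1:]]
    return int(determinant(minor))


# Cyclobutane ring (four atoms, four bonds) and the acyclic 2-methylbutane skeleton
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
tree = [(0, 1), (1, 2), (2, 3), (1, 4)]
L_ring = laplacian_matrix(4, ring)
n_ring = spanning_tree_count(4, ring)
n_tree = spanning_tree_count(5, tree)
```

A four-membered ring has four spanning trees (one per deleted bond), while an acyclic skeleton has exactly one.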

The χ Matrix

The χ matrix is derived from the adjacency matrix by assigning the value

(δi*δj)^(-1/2) to the matrix element corresponding to the edge eij between vertices vi and vj.

$$[\chi]_{ij} = \begin{cases} (\delta_i \delta_j)^{-1/2} & \text{if } e_{ij} \in E(G) \\ 0 & \text{otherwise} \end{cases}$$

Where δ is the vertex degree of the atoms (Randic, 1992).
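
The following sketch builds this matrix from a bond list and sums its entries over all bonds, which yields the Randic connectivity value of the graph. The function names and the example molecule (the n-butane skeleton) are assumptions for illustration.

```python
def chi_matrix(n_atoms, bonds):
    """Weighted adjacency matrix with (delta_i * delta_j)^(-1/2) on each bond."""
    degree = [0] * n_atoms
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    chi = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        value = (degree[i] * degree[j]) ** -0.5
        chi[i][j] = value
        chi[j][i] = value
    return chi


def connectivity_index(chi):
    """Half-sum of all entries, i.e. one contribution per bond."""
    return 0.5 * sum(sum(row) for row in chi)


# n-butane skeleton: chain 0-1-2-3 with vertex degrees 1, 2, 2, 1
butane = [(0, 1), (1, 2), (2, 3)]
chi = chi_matrix(4, butane)
randic = connectivity_index(chi)
```

For n-butane the two terminal bonds each contribute 1/sqrt(2) and the central bond contributes 1/2.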

The Burden Matrix

The Burden matrix is another interesting weighted adjacency matrix from which

Burden eigenvalues are computed and used in (Q)SAR/QSPR/QSTR modeling. This

is defined as:

$$[B]_{ij} = \begin{cases} \pi^{*}_{ij} \times 10^{-1} & \text{if } e_{ij} \in E(G) \\ Z_i & \text{if } i = j \\ 0.001 & \text{if } e_{ij} \notin E(G) \end{cases}$$

Where Zi is the atomic number of atom i. The off-diagonal elements (Bij) are positive

real numbers that depend on whether two atoms are neighbors and, if so, on the type of

bond between them: for two bonded atoms i and j they are a function of the

conventional bond order π*, i.e. 0.1, 0.2, 0.3 and 0.15 for a single, double, triple and aromatic bond,

respectively; the remaining matrix elements are set at 0.001 (Burden, 1989; 1997).
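
The element rules of the Burden matrix translate directly into code. In the sketch below the function name, the bond-type labels, and the example (a three-atom C-C=O skeleton, as in acetaldehyde) are assumptions for illustration; only the numeric entries (atomic numbers on the diagonal, 0.1/0.2/0.3/0.15 for bonds, 0.001 elsewhere) follow the definition given above.

```python
# Off-diagonal values for bonded atoms, as listed in the definition above
BOND_WEIGHT = {"single": 0.1, "double": 0.2, "triple": 0.3, "aromatic": 0.15}


def burden_matrix(atomic_numbers, bonds):
    """Burden matrix: Z_i on the diagonal, bond weights for bonded pairs, 0.001 otherwise."""
    n = len(atomic_numbers)
    b = [[0.001] * n for _ in range(n)]
    for i, z in enumerate(atomic_numbers):
        b[i][i] = float(z)
    for i, j, kind in bonds:
        b[i][j] = BOND_WEIGHT[kind]
        b[j][i] = BOND_WEIGHT[kind]
    return b


# Hydrogen-suppressed C-C=O skeleton: atoms 0 and 1 are carbon, atom 2 is oxygen
atomic_numbers = [6, 6, 8]
bonds = [(0, 1, "single"), (1, 2, "double")]
B = burden_matrix(atomic_numbers, bonds)
```

The Burden eigenvalues mentioned in the text would then be obtained by diagonalizing this symmetric matrix with any standard eigenvalue routine.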

The Zagreb matrix

Zagreb matrices are a generalization of the χ matrix in terms of a variable

exponent λ as:

$$[ZM_e(\lambda)]_{ij} = \begin{cases} (\delta_i \delta_j)^{\lambda} & \text{if } e_{ij} \in E(G) \\ 0 & \text{otherwise} \end{cases}$$

For λ = -1/2, the Zagreb matrix reduces to the edge-χ matrix.

The Zagreb matrices can also be considered as vertex- and edge-weighted matrices

related to the vertex- and edge-connectivity matrices. They can be formulated in terms

of the vertex- or edge-degrees (Janezic et al., 2007).

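The vertex-Zagreb matrix, ZMv, was defined as (Janezic et al., 2007):

$$[ZM_v]_{ij} = \begin{cases} \delta_i^{2} & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$

Similarly, the modified vertex-Zagreb matrix, mZMv, was defined as (Janezic et al.,

2007):

$$[{}^{m}ZM_v]_{ij} = \begin{cases} 1/\delta_i^{2} & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$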

The sum of the diagonal elements of the vertex Zagreb matrix results into the first

Zagreb descriptor (M1) (Gutman and Trinajstic, 1972), while the sum of the diagonal

elements of the modified vertex Zagreb matrix results into the modified first Zagreb

descriptor.

The edge-Zagreb matrix, ZMe, was defined for λ = 1 as (Janezic et al., 2007):

$$[ZM_e]_{ij} = \begin{cases} \delta_i \delta_j & \text{if } e_{ij} \in E(G) \\ 0 & \text{otherwise} \end{cases}$$

The modified edge-Zagreb matrix, denoted by mZMe, was defined for λ = -1 as (Janezic et

al., 2007):

$$[{}^{m}ZM_e]_{ij} = \begin{cases} (\delta_i \delta_j)^{-1} & \text{if } e_{ij} \in E(G) \\ 0 & \text{otherwise} \end{cases}$$

The half sum of the off-diagonal elements of the edge-Zagreb matrix is the

second Zagreb descriptor (M2) (Gutman and Trinajstic, 1972), whereas the half sum

of the off-diagonal elements of the modified edge-Zagreb matrix is the modified

second Zagreb descriptor (Janezic et al., 2007).
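
The relation between the Zagreb matrices and the Zagreb descriptors can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the graph is assumed to be connected with at least two atoms (so that no vertex degree is zero), and the example is the isobutane skeleton.

```python
def vertex_degrees(n_atoms, bonds):
    """Vertex degree of each atom, counted from the bond list."""
    degree = [0] * n_atoms
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    return degree


def vertex_zagreb_matrix(degree, modified=False):
    """Diagonal matrix holding delta_i^2, or 1/delta_i^2 in the modified form."""
    n = len(degree)
    zm = [[0.0] * n for _ in range(n)]
    for i, d in enumerate(degree):
        zm[i][i] = 1.0 / d ** 2 if modified else float(d ** 2)
    return zm


def edge_zagreb_matrix(bonds, degree, lam=1):
    """Entries (delta_i * delta_j)^lam on bonded pairs; lam = -1 gives the modified form."""
    n = len(degree)
    zm = [[0.0] * n for _ in range(n)]
    for i, j in bonds:
        value = float(degree[i] * degree[j]) ** lam
        zm[i][j] = value
        zm[j][i] = value
    return zm


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


def half_sum_off_diagonal(m):
    """For a symmetric matrix this equals the sum over the upper triangle."""
    n = len(m)
    return sum(m[i][j] for i in range(n) for j in range(i + 1, n))


# Isobutane skeleton: central atom 0 bonded to atoms 1, 2 and 3
isobutane = [(0, 1), (0, 2), (0, 3)]
deg = vertex_degrees(4, isobutane)
M1 = trace(vertex_zagreb_matrix(deg))
M2 = half_sum_off_diagonal(edge_zagreb_matrix(isobutane, deg, lam=1))
mM1 = trace(vertex_zagreb_matrix(deg, modified=True))
mM2 = half_sum_off_diagonal(edge_zagreb_matrix(isobutane, deg, lam=-1))
```

For isobutane the degrees are 3, 1, 1, 1, so M1 = 9 + 1 + 1 + 1 and M2 = 3 + 3 + 3.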

The Distance Matrix

The distance matrix D(G), introduced by Harary (1969) is based on the very

old concept of the topological distance between vertices in a graph, which is measured

by the number of edges separating a pair of vertices. The distance matrix, D = D(G),

of a connected graph G is a real symmetric matrix whose elements [D]ij are defined as:

$$[D]_{ij} = \begin{cases} d_{ij} & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases}$$

The distance matrix is the source of a large number of graph invariants and

topological descriptors, and its computation can be performed with various

algorithms.


The distance sum of the vertex vi, DSi, is defined as the sum of the topological

distances between vertex vi and every other vertex in the molecular graph, i.e., the sum

over row i in the D matrix:

$$DS_i = \sum_{j=1}^{N} [D]_{ij}$$

The distance sum was used to define the topological descriptor J, while the sum of the

graph distances between all the pairs of vertices defines the Wiener descriptor, W.
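
One of the possible algorithms for the distance matrix is a breadth-first search started from every vertex, which is sufficient for unweighted molecular graphs. The sketch below uses that approach; the function names are hypothetical, the graph is assumed to be connected, and the example is the n-butane skeleton.

```python
from collections import deque


def distance_matrix(n_atoms, bonds):
    """Topological distance matrix of a connected graph via breadth-first search."""
    neighbors = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    d = [[0] * n_atoms for _ in range(n_atoms)]
    for start in range(n_atoms):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for u in neighbors[v]:
                if u not in seen:
                    seen[u] = seen[v] + 1
                    queue.append(u)
        for v, dist in seen.items():
            d[start][v] = dist
    return d


def distance_sums(d):
    """Distance sum DS_i of each vertex: the corresponding row sum of D."""
    return [sum(row) for row in d]


def wiener(d):
    """Sum of distances over all vertex pairs, i.e. half the sum of all entries."""
    return sum(distance_sums(d)) // 2


# n-butane skeleton: chain 0-1-2-3
butane = [(0, 1), (1, 2), (2, 3)]
D = distance_matrix(4, butane)
DS = distance_sums(D)
W = wiener(D)
```

For n-butane the six pairwise distances are 1, 1, 1, 2, 2 and 3, giving W = 10.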

The reciprocal distance matrix

In a graph descriptor or topological descriptor computed on the basis of graph

distances the highest contribution to the numerical value of the descriptor is made by

the large distances between the vertices of the molecular graph. A new graph metric,

the reciprocal distance, was introduced in order to define graph descriptors in which

the contribution of the distance between two vertices decreases with the increase of

the distance. The reciprocal distance matrix of a graph G with N vertices, RD =

RD(G), is the square N x N symmetric matrix whose entries [RD]ij are equal to the

reciprocal of the distances between vertices vi and vj, i.e., 1/dij, for non-diagonal

elements, and are equal to zero for the diagonal elements:

$$[RD]_{ij} = \begin{cases} 0 & \text{if } i = j \\ 1/d_{ij} & \text{if } i \neq j \end{cases}$$

Besides calculation of the Wiener number analogue, called the Harary descriptor (Plavsic

et al., 1993), the D^-1 or Harary matrix was successfully used to generate new structural

descriptors and in the computer generation of acyclic graphs based on the local vertex

invariants and TDs. For vertex- and edge-weighted molecular graphs, the reciprocal

distance matrix was defined as:

$$[D^{-1}(w)]_{ij} = \begin{cases} [d(w)]_{ii} & \text{if } i = j \\ 1/[d(w)]_{ij} & \text{if } i \neq j \end{cases}$$

Where d(w) is a weighted distance matrix and w denotes a weighting scheme

(Ivanciuc, 2000).
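
For an unweighted graph, the reciprocal distance matrix and the Harary descriptor follow from the distance matrix in two short steps. The sketch below takes a precomputed distance matrix as input; the function names and the n-butane example are assumptions for illustration.

```python
def reciprocal_distance_matrix(d):
    """Replace every off-diagonal distance by its reciprocal; keep zeros on the diagonal."""
    n = len(d)
    return [[0.0 if i == j else 1.0 / d[i][j] for j in range(n)] for i in range(n)]


def harary(rd):
    """Sum of reciprocal distances over all vertex pairs (upper triangle)."""
    n = len(rd)
    return sum(rd[i][j] for i in range(n) for j in range(i + 1, n))


# Distance matrix of the n-butane skeleton (chain 0-1-2-3)
D = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
RD = reciprocal_distance_matrix(D)
H = harary(RD)
```

Here the three bonded pairs contribute 1 each, the two pairs at distance 2 contribute 1/2 each, and the terminal pair contributes 1/3, so close pairs dominate the value.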


The resistance distance matrix (Ω)

The resistance distance matrix (Ω), proposed by Klein and Randic (1993) is

based on electrical network theory and considers that a single bond between two

carbon atoms from the molecular graph corresponds to a 1 ohm resistor. The resistance

distance between a given pair of vertices vi and vj is defined as the effective electrical

resistance between the vertices.

The elements of the resistance distance matrix, i.e. Ωij, are defined as:

$$[\Omega]_{ij} = \begin{cases} 0 & \text{if } i = j \\ \Omega_{ij} & \text{if } i \neq j \end{cases}$$

For acyclic graphs, resistance distances are equal to topological distances but

for cyclic graphs resistance distances may be smaller than, or equal to topological

distances. The resistance distance was used to generate rules to characterize molecular

cyclicity and centricity. (Klein and Randic, 1993).
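
Resistance distances can be computed from the Laplacian matrix. The sketch below uses one standard route, stated here as an assumption rather than as the method of the cited paper: for a connected graph, Ωij equals the determinant of the Laplacian with rows and columns i and j deleted, divided by the determinant with only row and column i deleted. Function names are hypothetical and exact rational arithmetic is used for clarity.

```python
from fractions import Fraction


def laplacian_matrix(n_atoms, bonds):
    """L = DEG - A for an unweighted graph (every bond is a unit resistor)."""
    l = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        l[i][i] += 1
        l[j][j] += 1
        l[i][j] -= 1
        l[j][i] -= 1
    return l


def determinant(matrix):
    """Exact determinant by Gaussian elimination; an empty matrix gives 1."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def resistance_distance_matrix(n_atoms, bonds):
    """Omega_ij = det L(i, j deleted) / det L(i deleted), for a connected graph."""
    l = laplacian_matrix(n_atoms, bonds)

    def minor(removed):
        keep = [v for v in range(n_atoms) if v not in removed]
        return [[l[r][c] for c in keep] for r in keep]

    omega = [[Fraction(0)] * n_atoms for _ in range(n_atoms)]
    for i in range(n_atoms):
        det_i = determinant(minor({i}))
        for j in range(i + 1, n_atoms):
            omega[i][j] = determinant(minor({i, j})) / det_i
            omega[j][i] = omega[i][j]
    return omega


# Cyclobutane ring and a three-atom chain
ring = resistance_distance_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
chain = resistance_distance_matrix(3, [(0, 1), (1, 2)])
```

In the ring, adjacent atoms are joined by a 1 ohm path in parallel with a 3 ohm path (3/4 ohm in total), which is smaller than their topological distance of 1; in the acyclic chain the resistance distance equals the topological distance.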

The reciprocal resistance matrix or conductance matrix (σ)

The inverse of the resistance distance, the conductance σij between two

vertices vi and vj, is calculated as follows (Klein and Ivanciuc, 2001):

$$[\sigma]_{ij} = \frac{1}{\Omega_{ij}} = \sum_{p_{ij}} |p_{ij}|^{-1}$$

Where the sum runs over all the paths pij connecting the two considered vertices and

|pij| is the length of the considered path pij (Klein and Ivanciuc, 2001).

The conductance matrix (or electrical conductance matrix) is therefore the reciprocal

resistance matrix, whose elements are the conductance values σij between two vertices

vi and vj. Other quotient matrices derived from the resistance matrix (Babic et al.

2002) are the distance/resistance quotient matrix, D/Ω (or topological

distance/resistance distance quotient matrix) and resistance/distance quotient matrix,

Ω/D. The two descriptors obtained from these matrices are D/Ω descriptor or Wiener

sum D/Ω descriptor and the Kirchhoff sum descriptor, Ω/D descriptor respectively.

The Detour matrix (Δ)

The detour matrix, together with the distance matrix, was introduced into the mathematical literature by Frank Harary (1969). It was introduced into the chemical literature under the name maximum path matrix of a molecular graph by Ivanciuc and Balaban (1994) and independently by Amic and Trinajstic (1995). The detour matrix Δ of a graph G (or maximum path matrix) is a square symmetric N × N matrix, N being the number of graph vertices, whose entry i–j is the length of the longest path from vertex vi to vj. The elements of the detour matrix (or maximum path matrix, MP), [Δ]ij, are defined as:

[Δ]ij = δij if i ≠ j

= 0 if i = j

where δij is the number of steps in a longest path (i.e. the maximum number of edges) in G between vertices i and j and is called the detour distance. For acyclic graphs, the detour matrix is identical to the distance matrix, but for cyclic graphs elements in the detour matrix may be equal to, or larger than, those of the distance matrix (Trinajstic et al., 1997). The two types of paths, the shortest and the longest ones, can be combined in one and the same square matrix, i.e. the detour-distance matrix, Δ-D (originally called the Maximum minimum Path matrix, MmP, or topological distance–detour distance combined matrix), whose entries are defined as:

[Δ-D]ij = [Δ]ij, (i,j) ∈ E(G), if i < j

= [D]ij, (i,j) ∈ E(G), if i > j

= 0 if i = j

In the detour-distance matrix Δ-D, defined by Ivanciuc and Balaban (1994), the elements of the upper triangle are identical to those of the detour matrix, while the lower-triangle elements are identical to those of the distance matrix (Amic and Trinajstic, 1995). Several molecular descriptors (MDs) derived from this matrix, such as the spectral descriptors and Wiener-type descriptors, are the same as those from the detour–distance combined matrix.
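The detour and detour-distance matrices can be illustrated on a small cyclic graph. The sketch below finds longest paths by exhaustive depth-first enumeration of simple paths, which is practical only for small molecular graphs because the longest-path problem is computationally hard in general. The function names and the methylcyclopropane example are assumptions made for illustration, and a connected graph is assumed.

```python
from collections import deque
import numpy as np

def adjacency_lists(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def distance_matrix(n, adj):
    """Shortest-path lengths (edge counts) by breadth-first search."""
    D = np.full((n, n), -1, dtype=int)
    for s in range(n):
        D[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if D[s, v] < 0:
                    D[s, v] = D[s, u] + 1
                    queue.append(v)
    return D

def detour_matrix(n, adj):
    """Longest simple-path lengths, found by enumerating every simple path."""
    delta = np.zeros((n, n), dtype=int)

    def extend(start, u, visited, length):
        delta[start, u] = max(delta[start, u], length)
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                extend(start, v, visited, length + 1)
                visited.remove(v)

    for s in range(n):
        extend(s, s, {s}, 0)
    return delta

def detour_distance_matrix(delta, D):
    """Upper triangle from the detour matrix, lower triangle from the distance matrix."""
    return np.triu(delta, k=1) + np.tril(D, k=-1)

# Methylcyclopropane: ring atoms 0, 1, 2 with the methyl carbon 3 attached to atom 0.
adj = adjacency_lists(4, [(0, 1), (1, 2), (2, 0), (0, 3)])
D = distance_matrix(4, adj)
delta = detour_matrix(4, adj)
delta_D = detour_distance_matrix(delta, D)
```

For the pair (1, 3) the shortest path 1-0-3 has length 2, while the longest path 1-2-0-3 has length 3, so the two triangles of the combined matrix differ for that pair.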

Reciprocal detour matrix (Δ⁻¹)

The reciprocal detour matrix (Δ⁻¹) may be expressed as:

[Δ⁻¹]ij = (Δij)⁻¹ if i ≠ j

= 0 if i = j

All elements equal to zero are left unchanged in the reciprocal matrix. Harary detour

descriptors are derived from the reciprocal detour matrix (Diudea et al., 1998).

Detour-path matrix

The detour-path matrix, denoted as Δp, is a combinatorial matrix whose off-diagonal entry i–j is the count of all paths of any length m (1 ≤ m ≤ Δij) that are included within the longest path from vertex vi to vj (Δij) (Diudea, 1996a). The diagonal entries are zero. Each entry i–j of the detour-path matrix is calculated from the detour matrix Δ as the binomial coefficient of Δij + 1 over 2, written here as C(Δij + 1, 2):

[Δp]ij = C(Δij + 1, 2)

= (Δij² + Δij)/2

Detour-delta matrix (ΔΔ)

The detour-delta matrix (ΔΔ) is another combinatorial matrix derived as the difference

between the detour-path matrix (Δp) and the detour matrix (Δ) (Janezic et al., 2007):

ΔΔ = Δp – Δ

The distance/detour quotient matrix (or topological distance/detour distance

quotient matrix), denoted as D/Δ, is also derived from detour and distance matrices

whose off-diagonal entries are the ratio of the lengths of the shortest over the longest

path between any pair of vertices (Randic, 1997b). It is defined as:

[D/Δ]ij = dij/Δij if i ≠ j

= 0 if i = j

Where dij and Δij are the topological and detour distances between vertices vi and vj

respectively.

The detour complement matrix (Δc) for simple graphs is defined as (Janezic et

al., 2007):

[Δc]ij = N − Δij if i ≠ j

= 0 if i = j

Where N is the number of atoms.
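The matrices derived from the detour matrix are simple element-wise transformations. The sketch below applies the definitions given above to hard-coded distance and detour matrices of methylcyclopropane (ring atoms 0, 1, 2 and a methyl carbon 3 on atom 0). The variable and function names are illustrative assumptions.

```python
import numpy as np

# Distance (D) and detour (DELTA) matrices of methylcyclopropane, entered by hand.
D = np.array([[0, 1, 1, 1],
              [1, 0, 1, 2],
              [1, 1, 0, 2],
              [1, 2, 2, 0]])
DELTA = np.array([[0, 2, 2, 1],
                  [2, 0, 2, 3],
                  [2, 2, 0, 3],
                  [1, 3, 3, 0]])

def off_diagonal_ratio(numerator, denominator):
    """Element-wise quotient off the diagonal; the diagonal is left at zero."""
    out = np.zeros(denominator.shape, dtype=float)
    mask = ~np.eye(len(denominator), dtype=bool)
    out[mask] = numerator[mask] / denominator[mask]
    return out

n_atoms = len(DELTA)
off_diagonal = 1 - np.eye(n_atoms, dtype=int)

reciprocal_detour = off_diagonal_ratio(np.ones_like(DELTA), DELTA)  # 1/Δij
detour_path = DELTA * (DELTA + 1) // 2                               # (Δij² + Δij)/2
detour_delta = detour_path - DELTA                                   # Δp − Δ
distance_detour_quotient = off_diagonal_ratio(D, DELTA)              # dij/Δij
detour_complement = (n_atoms - DELTA) * off_diagonal                 # N − Δij
```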

The Wiener Matrix

The Wiener matrix W is a square symmetric N × N matrix proposed by Randic in 1993, used to define new structural invariants useful in QSPR/(Q)SAR studies. Each off-diagonal entry of the Wiener matrix corresponds to the number of external paths in the graph that contain the path pij from vertex vi to vertex vj and is calculated as the product of the numbers of vertices on each side of the path pij, namely Ni,p and Nj,p, including both vertices i and j (Randic et al., 1994a; Ivanciuc and Ivanciuc, 1999). This matrix, which is a dense Wiener matrix, is usually called the path-Wiener matrix, denoted as Wp. Wiener matrix entries are:

[We/p]ij = Ni,e/p × Nj,e/p



where Ni and Nj denote the number of vertices lying on the two sides of the edge/path e/p having vertices vi and vj as the endpoints. This definition gives the 'edge/path contributions' to a global descriptor, which is identical to the Wiener descriptor when it is defined on edges. The same equation defined on the graph paths gives a structural descriptor that is identical to the hyper-Wiener descriptor. The Wiener matrices were used as the basis of new topological descriptors.
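For an acyclic graph, the edge- and path-Wiener matrices can be sketched as follows. For each vertex pair the unique connecting path is located, and the vertices on each side are counted by a search that is not allowed to step onto the path. The function names and the n-butane example are assumptions made for illustration, and the input is assumed to be a tree.

```python
from collections import deque
import numpy as np

def wiener_matrices(n, edges):
    """Return (We, Wp) for a tree given as an edge list on vertices 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def tree_path(i, j):
        """Vertices of the unique path from i to j, in order."""
        parent = {i: None}
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    queue.append(v)
        path = [j]
        while path[-1] != i:
            path.append(parent[path[-1]])
        return path[::-1]

    def side_size(start, blocked):
        """Vertices reachable from start without stepping onto the blocked neighbour."""
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen and v != blocked:
                    seen.add(v)
                    queue.append(v)
        return len(seen)

    We = np.zeros((n, n), dtype=int)
    Wp = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            path = tree_path(i, j)
            n_i = side_size(i, path[1])    # vertices on the side of i, i included
            n_j = side_size(j, path[-2])   # vertices on the side of j, j included
            Wp[i, j] = Wp[j, i] = n_i * n_j
            if len(path) == 2:             # i and j are joined by an edge
                We[i, j] = We[j, i] = n_i * n_j
    return We, Wp

We, Wp = wiener_matrices(4, [(0, 1), (1, 2), (2, 3)])   # n-butane
wiener_descriptor = We.sum() // 2
hyper_wiener_descriptor = Wp.sum() // 2
```

For n-butane the half-sums are 10 and 15, the known Wiener and hyper-Wiener values of a four-vertex path.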

The reciprocal Wiener matrix

The reciprocal Wiener matrix, denoted by W⁻¹, is the matrix whose elements are the reciprocals of the corresponding Wiener matrix elements (Diudea, 1997c). Moreover, the Wiener difference matrix, WΔ, was also proposed as:

WΔ = Wp – We

whose non-diagonal elements are based on path contributions calculated only on paths longer than 1 (Diudea, 1996a).

The Szeged matrices

The Szeged matrix of a hydrogen-depleted molecular graph G is a square unsymmetrical matrix whose off-diagonal entry i–j is the number of vertices, Ni,p, lying closer to the focused vertex vi. The edge/path Szeged matrix was defined as (Dobrynin and Gutman, 1994; Diudea et al., 1997b):

[SZe/p]ij = Ni,e/p × Nj,e/p

where [SZe/p]ij are the non-diagonal entries of this matrix.

Since the Wiener matrix is not defined for cyclic structures, Gutman (1994a) changed the meaning of Ni and Nj as follows:

Ni,e/p = |{vk | vk ∈ V(G); [D]ik < [D]jk}|

Nj,e/p = |{vk | vk ∈ V(G); [D]jk < [D]ik}|

Thus, Ni and Nj denote the cardinality of the sets of vertices closer to the two vertices vi and vj, respectively; vertices equidistant to vi and vj are not counted. The half sum of entries in the SZe/p matrix gives the Szeged descriptor SZc and the hyper-Szeged descriptor SZp, respectively. The difference between SZp and SZc gives the SZΔ matrix: SZΔ = SZp − SZc (Gutman, 1994a).
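Because the counts Ni and Nj are defined purely through distances, they apply to cyclic graphs as well. The following sketch computes them for a six-membered ring; the names `closer`, `SZe`, `SZp` and `szeged_matrices` are illustrative assumptions, and a connected graph is assumed.

```python
from collections import deque
import numpy as np

def szeged_matrices(n, edges):
    """Return (closer, SZe, SZp); closer[i, j] counts vertices nearer to i than to j."""
    A = np.zeros((n, n), dtype=int)
    adj = [[] for _ in range(n)]
    for u, v in edges:
        A[u, v] = A[v, u] = 1
        adj[u].append(v)
        adj[v].append(u)

    # Topological distance matrix by breadth-first search.
    D = np.full((n, n), -1, dtype=int)
    for s in range(n):
        D[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if D[s, v] < 0:
                    D[s, v] = D[s, u] + 1
                    queue.append(v)

    closer = np.zeros((n, n), dtype=int)      # unsymmetrical counts Ni
    for i in range(n):
        for j in range(n):
            if i != j:
                closer[i, j] = int(np.sum(D[i] < D[j]))  # equidistant vertices excluded

    SZp = closer * closer.T                   # Ni × Nj for every vertex pair
    SZe = SZp * A                             # the same product, adjacent pairs only
    return closer, SZe, SZp

ring = [(k, (k + 1) % 6) for k in range(6)]   # six-membered carbocycle
closer, SZe, SZp = szeged_matrices(6, ring)
edge_half_sum = SZe.sum() // 2
path_half_sum = SZp.sum() // 2
```

Each ring edge has three vertices on either side, so every edge contributes 3 × 3 = 9 and the edge half-sum is 6 × 9 = 54.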



The Cluj Matrix

The Cluj matrix, CJu, a square unsymmetrical matrix, was defined by Diudea (1997a; 1997b) following the principle of “single endpoint characterization of a path” by using either the distance or the detour concept:

[CJu]ij = Ni,(i,j)

Ni,(i,j) = max |{vk | vk ∈ V(G); [D]ik < [D]jk; (i,k) ∩ (i,j) = {i}; |(i,j)| = min}|

It collects the vertices lying closer to the focused vertex i but out of the shortest path

(i,j) or, in other words, the “external” paths on the side of i, which include the path

(i,j). The above definition is valid both for acyclic and cyclic graphs. It can be used as

a basis for constructing Wiener-type descriptors as well as Schultz-type descriptors

(Diudea et al., 2002).

The Barysz Distance Matrix

The Barysz distance matrix (DZ) is a weighted distance matrix accounting

simultaneously for the presence of heteroatom(s) and multiple bond(s) in the molecule;

it is defined as:

\[
[DZ]_{ij} =
\begin{cases}
1 - \dfrac{Z_c}{Z_i} & \text{if } i = j \\[6pt]
\displaystyle\sum_{b=1}^{d_{ij}} \frac{1}{\pi^{*}_{b}} \cdot \frac{Z_c^{2}}{Z_{b(1)} \cdot Z_{b(2)}} & \text{if } i \neq j
\end{cases}
\]

where Zc is the atomic number of the carbon atom, Zi is the atomic number of the ith atom, and π* is the conventional bond order. The sum runs over all dij bonds involved in the shortest path between vertices vi and vj, dij being the topological distance, and the subscripts b(1) and b(2) represent the two vertices incident to the considered bond b.
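A minimal sketch of the Barysz matrix, following the definition above, is given below. It accumulates bond weights along a shortest path found by breadth-first search. The function name, the bond-list format and the ethanol example are assumptions made for illustration. The definition does not say which path to use when several shortest paths exist; this sketch simply takes the first one found, so results for cyclic graphs with such ties may not be symmetric.

```python
from collections import deque
import numpy as np

Z_CARBON = 6

def barysz_matrix(atomic_numbers, bonds):
    """bonds is a list of (u, v, conventional_bond_order) tuples."""
    n = len(atomic_numbers)
    adj = [[] for _ in range(n)]
    weight = {}
    for u, v, order in bonds:
        # Bond weight: (1 / bond order) * Zc^2 / (Z_u * Z_v)
        w = (1.0 / order) * Z_CARBON ** 2 / (atomic_numbers[u] * atomic_numbers[v])
        weight[(u, v)] = weight[(v, u)] = w
        adj[u].append(v)
        adj[v].append(u)

    DZ = np.zeros((n, n))
    for i in range(n):
        DZ[i, i] = 1.0 - Z_CARBON / atomic_numbers[i]
        # Breadth-first search reaches every vertex along a topological shortest path;
        # the bond weights are summed along that path.
        total = {i: 0.0}
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in total:
                    total[v] = total[u] + weight[(u, v)]
                    queue.append(v)
        for j, value in total.items():
            if j != i:
                DZ[i, j] = value
    return DZ

# Ethanol, hydrogen-depleted: C(0)-C(1)-O(2), both bonds single.
DZ = barysz_matrix([6, 6, 8], [(0, 1, 1.0), (1, 2, 1.0)])
```

Here the C–O bond weight is 36/(6 × 8) = 0.75, so the entry for the terminal carbon and the oxygen is 1 + 0.75 = 1.75, and the oxygen diagonal entry is 1 − 6/8 = 0.25.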

The combinatorial matrices

Two matrices have been derived from the classical distance matrix D: the distance-delta matrix, DΔ, and the distance-path matrix, Dp, whose elements are calculated by a combinatorial algorithm:

[DΔ]ij = C([D]ij, 2)

[Dp]ij = C([D]ij + 1, 2)

where C(n, 2) = n(n − 1)/2 denotes the binomial coefficient. The element [DΔ]ij counts the number of 'internal' paths (longer than unity) included in the shortest paths between vertices vi and vj; the element [Dp]ij counts all internal paths included in the shortest paths between vertices vi and vj in a graph (Brualdi and Ryser, 1991).
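The combinatorial meaning of the two entries can be checked by brute force. The sketch below counts the contiguous sub-paths contained in a single shortest path of d edges and compares the counts with the binomial coefficients; the helper names are hypothetical.

```python
from math import comb

def count_subpaths(d, min_length):
    """Count contiguous sub-paths with at least min_length edges inside a path of d edges."""
    return sum(1
               for start in range(d + 1)
               for end in range(start + min_length, d + 1))

def distance_path_entry(d):
    """[Dp]ij for a topological distance d: all sub-paths of length 1 or more."""
    return comb(d + 1, 2)

def distance_delta_entry(d):
    """[DΔ]ij for a topological distance d: sub-paths of length 2 or more."""
    return comb(d, 2)

# Compare the closed forms with direct enumeration for distances 0 to 7.
checks = [(count_subpaths(d, 1) == distance_path_entry(d),
           count_subpaths(d, 2) == distance_delta_entry(d))
          for d in range(8)]
```

The two entries differ by exactly d, the number of sub-paths of length 1, so that DΔ = Dp − D.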

The Hosoya matrix

Randic (1994) introduced the square symmetric Hosoya matrix or Z matrix by an analogous cutting procedure. The original Hosoya Z matrix was defined only for acyclic graphs; each off-diagonal element is equal to the Hosoya Z descriptor of the subgraph G′ obtained from the graph G by erasing all edges along the path connecting two vertices vi and vj. The Z matrix entry [Z]ij corresponding to a pair of vertices vi and vj of a tree, T, is given by:

[Z]ij = Z(T − pij) if i ≠ j

= 0 if i = j

where Z(T − pij) is the Z descriptor of the spanning subgraph T − pij obtained from T by the removal of all edges along the path pij connecting the vertices vi and vj (Randic, 1994; Milicevic et al., 2003).

A general definition of the Hosoya Z matrix (generalized Hosoya Z matrix), able to represent both acyclic and cyclic graphs, is the following:

[Z]ij = Σmin pij Z(T − pij)/pij if i ≠ j

= Zij if i = j

where Z(T − pij) is the Z descriptor of the spanning subgraph (Plavšic et al., 1997).
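For trees, the original Z matrix can be sketched directly from its definition. The Hosoya Z descriptor is taken here as the number of sets of mutually non-adjacent edges (the empty set included), evaluated with the recursion Z(G) = Z(G − e) + Z(G − u − v) for an edge e = (u, v). The function names and the n-butane example are assumptions made for illustration; the recursion is exponential in the worst case and is intended only for small graphs.

```python
from collections import deque
import numpy as np

def hosoya_z(edges):
    """Number of sets of mutually non-adjacent edges, the empty set included."""
    if not edges:
        return 1
    (u, v), rest = edges[0], edges[1:]
    # Either edge (u, v) is left out, or it is chosen and every edge touching u or v goes.
    not_touching = [e for e in rest if u not in e and v not in e]
    return hosoya_z(rest) + hosoya_z(not_touching)

def path_edges(n, edges, i, j):
    """Edges (as frozensets) on the unique tree path between i and j."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent = {i: None}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    on_path = set()
    node = j
    while parent[node] is not None:
        on_path.add(frozenset((node, parent[node])))
        node = parent[node]
    return on_path

def hosoya_matrix(n, edges):
    """[Z]ij = Z(T - pij) off the diagonal, zero on the diagonal."""
    Z = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            cut = path_edges(n, edges, i, j)
            remaining = [e for e in edges if frozenset(e) not in cut]
            Z[i, j] = Z[j, i] = hosoya_z(remaining)
    return Z

butane = [(0, 1), (1, 2), (2, 3)]
z_butane = hosoya_z(butane)
Z = hosoya_matrix(4, butane)
```

Removing the central bond of n-butane leaves two isolated edges, which admit four sets of non-adjacent edges, so the entry for vertices 1 and 2 is 4.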

The Path Matrix

Randic (1991a) defined the elements of the path matrix P as the quotient between the number of paths p′ in a subgraph G′ and the number of paths p in the simple graph G. It is a square symmetric N × N matrix whose entry in the ith row and jth column is defined by the equation:

[P]ij = p′ij/p if i ≠ j and (i, j) ∈ E(G)

= 0 otherwise

where p′ij is the total number of paths in the subgraph G′ = G − (i,j) and p is the total number of paths in G. If G′ is disconnected, the contributions of each component are added. The descriptor calculated on this matrix is called the P′/P descriptor (Randic, 1991b).



The natural distance matrix

Randic et al. (2010) developed a novel distance matrix, termed the natural distance matrix (NDM), which offers scope for developing novel graph invariants as MDs for structure-property-activity studies. The matrix elements (i, j) of the NDM are given by:

NDM(i, j) = [di + dj – naij]^0.5

where di and dj are the degrees (valences) of vertices i and j, and naij is the number of vertices adjacent to both i and j (Randic et al., 2010).

The walk matrix

There are not many nonsymmetrical matrices that are of interest in chemistry. One of the simplest is the “random walk” matrix, the elements of which are determined by the number of walks needed to move from a point i to the point j. In general, the number of walks from a point i to the point j is different from the number of walks from the point j to the point i (Diudea, 1996a). The random walk matrix, nWM, constructed on the basis of the principle of the single endpoint characterization of a path, is a diagonal matrix whose diagonal elements are the nth-order weighted walk degrees, nWM,i, that is, the sum of the weights (the property collected in matrix M) of all walks of length n starting from the ith vertex to any other vertex in the graph, directly calculated as:

\[
{}^{n}W_{M,i} = \sum_{j=1}^{A} [M^{n}]_{ij}
\]

where A is the total number of vertices in the graph and Mⁿ is the nth power of the matrix M, which can be any square N × N topological matrix (Diudea, 1996a; Diudea and Randic, 1997).
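The walk degrees follow directly from a matrix power. In the sketch below M is taken to be the adjacency matrix of n-butane, which is only one possible choice of topological matrix, and the function name is an assumption made for illustration.

```python
import numpy as np

def walk_degrees(M, n):
    """Row sums of the nth power of M: nth-order weighted walk degree of each vertex."""
    return np.linalg.matrix_power(M, n).sum(axis=1)

# Adjacency matrix of n-butane (hydrogen-depleted path of four carbon atoms).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

first_order = walk_degrees(A, 1)    # ordinary vertex degrees
third_order = walk_degrees(A, 3)    # number of walks of length 3 from each vertex
walk_matrix = np.diag(third_order)  # diagonal matrix holding the walk degrees
```

With the adjacency matrix as M, the first-order walk degrees reduce to the vertex degrees, and each higher order is the sum of the previous-order values over the neighbours of the vertex.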

The pendent matrix

The pendent matrix was proposed to enhance the role of terminal or pendent vertices in (Q)SAR and QSPR studies. The pendent matrix (Dp) of a graph G is a submatrix of the distance matrix, obtained by retaining the columns corresponding to pendent vertices (Gupta et al., 1999).

[Dp]ij = dij if i ≠ j

= 0 if i = j

where dij is the length of the path that contains the least number of edges between vertex i and vertex j in graph G.
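A short sketch of the column selection follows. Pendent vertices are identified here as vertices of degree one, and the function name and the 2-methylbutane example are assumptions made for illustration; a connected graph is assumed.

```python
from collections import deque
import numpy as np

def pendent_matrix(n, edges):
    """Columns of the distance matrix that belong to pendent (degree-one) vertices."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Topological distance matrix by breadth-first search.
    D = np.full((n, n), -1, dtype=int)
    for s in range(n):
        D[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if D[s, v] < 0:
                    D[s, v] = D[s, u] + 1
                    queue.append(v)

    pendent = [v for v in range(n) if len(adj[v]) == 1]
    return pendent, D[:, pendent]

# 2-Methylbutane: chain 0-1-2-3 with a methyl carbon 4 attached to atom 1.
pendent, Dp = pendent_matrix(5, [(0, 1), (1, 2), (2, 3), (1, 4)])
```

For 2-methylbutane the pendent vertices are 0, 3 and 4, so the result is a 5 × 3 matrix.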

The chemical pendent matrix

The chemical pendent matrix is obtained by modifying the pendent matrix. The chemical distance of a terminal or pendent vertex was obtained from the pendent matrix by replacing the row elements corresponding to heteroatom(s) with their atomic weights relative to that of the carbon atom (Gupta et al., 1999; Gupta et al., 2002b; Goyal et al., 2010).

A review of the literature reveals that, although a large number of topology-based molecular descriptors have been reported, many of them either contain information similar to that of others or are information-poor. Accordingly, only a small fraction of these MDs have been successfully employed in structure-activity studies. As a consequence, there is a strong need to develop novel topology-based descriptors having very high discriminating power, low degeneracy and no correlation with the existing topology-based descriptors, so as to accelerate lead discovery/optimization in a rapid and cost-effective manner. Moreover, novel descriptors with these desired features may give better insight into the structure-activity/property relationship.
