Review of Literature
The cherished goal of medicinal chemists has been to create molecules with
sufficient number of favorable properties to justify huge investments involved in terms
of time and money to bring a drug molecule into the market. (Quantitative) structure
activity / property relationships [(Q)SAR/QSPR] studies play an important role in the
development of cost-effective and efficient drug discovery process. The QSAR/QSPR
studies predict complex physical, chemical, biological, and technological properties of
molecules directly from their structure. The structural formula of an organic compound
encodes within it all the information that predetermines the chemical, biological and
physical properties of that compound (Grover et al, 2000a). However the molecular
structure is not easily accessible to numerical analysis and learning methods due to its
non-quantitative nature (Trinajstic, 1983). This inherent problem of non-quantitative
nature of chemical structures in (Q)SAR/QSPRs can be easily overcome by deriving
parameters (descriptors) from molecules that describe their physicochemical/biological
properties, electronic features, and so on. In recent years, non-empirical
graph-theoretic parameters (graph invariants), called topological descriptors (TDs),
have emerged as some of the most useful descriptors available.
In chemical graph theory, the molecules are depicted as hydrogen-depleted
graphs with non-hydrogen atoms as vertices and covalent bonds as edges. A
molecular structure can be represented by a planar graph, G = (V, E), where the
nonempty set V represents the atoms and the set E generally represents the covalent
bonds (Randic, 1991a; 1991b; Dureja and Madan, 2007). Such a graph represents the
molecular topology by depicting the pattern of connectedness of atoms in the
molecule, while remaining independent of the metric aspects of molecular
structure such as bond angles and equilibrium internuclear distances. TDs are numbers
associated with constitutional formulas by mathematical operations on the graphs
representing these formulas, which can be used to characterize and order molecules
and predict properties (Basak et al., 1990). They offer a simple way of measuring
shape, size, symmetry, molecular branching, cyclicity, chirality, complexity and
heterogeneity of atomic environments in the molecule (Roy, 2004). When compared
with other classes of structural descriptors, such as geometric, quantum or grid (field)
descriptors, topology-based molecular descriptors (MDs) have a distinct advantage
because they can be easily computed from the molecular graph.
The Wiener index (Wiener, 1947a; 1947b) and the Platt index (Platt, 1947)
were the first graph-theoretical descriptors, introduced practically simultaneously in 1947.
After that, there were no significant developments in the use of topological
descriptors for almost two decades. Then, in the 1970s, interest in topological
descriptors revived. Randic (1973) stated that “the revival of interest in graph
theory and graph-theoretical approach to chemistry has probably been stimulated by
the impressive developments in the synthetic chemistry and the production of a large
number of structurally complex organic molecules”. But the use of topological
descriptors actually gained momentum in the 1990s with the availability of computers as
efficient tools for faster computing, and this trend continues to date.
Randic (1991b) proposed some desirable requirements for the chemical codes/
topology based MDs with a view to preventing their hazardous proliferation:
Good correlation with at least one property
Simplicity
Direct structural interpretation
Ability to discriminate the isomers
Generalization for higher analogues
Not dependent on physico-chemical properties
Not trivially related to other MDs
Linearly independent
Based on familiar structural concepts
Show a correct size-dependence
Gradual change with gradual change in structure
The application of topology based MDs to the design and selection of novel
active compounds is probably one of the most active areas of research in the
application of such descriptors to biological problems (Estrada and Uriate, 2001).
There are several advantages and a few limitations of topological descriptors.
Advantages
The main advantages suggested for TDs are (Gozalbes et al., 2002):
Use of TDs does not require any experimentally derived measurement, and
structure information is easily available in modern databases. Furthermore, the
TDs can be calculated for new drugs and for drugs under development.
Use of TDs to develop QSAR models is possible without previous
knowledge of the receptor structure or mechanism of action of the drugs.
Any structure can be described in terms of topological features.
Therefore, unlike other descriptors, TDs can be calculated for compounds with
very different structures, and theoretical models can be built in databases
without a common parent structure.
TDs are simple descriptors that may be quickly calculated for a large number
of compounds. It is not necessary to previously display the structures on
graphical workstations.
The information provided by the TDs is 2D and not 3D. This could be
regarded as a limitation, but it is also an advantage as the conformational and
alignment problems related with 3-D QSAR (as in CoMFA applications) are
completely avoided, and as a consequence of this, the results are more
reproducible.
Some Limitations of TDs have also been described (Gozalbes et al., 2002),
especially:
Degree of redundancy and degeneracy of certain TDs can be very high. TDs
are degenerate when presenting identical values for two or more different
molecular graphs, while redundancy is the duplicated information contained
by several TDs. However, the fourth and higher generation descriptors will solve
this problem to some extent.
Criticism of TDs usually concerns the physicochemical
interpretation of their meaning. Not all TDs present this difficulty of
interpretation; a good example is the E-state descriptors, in which
the role of substituents and skeletal atoms is directly related to
electronegativity.
Some reports have been criticized because of the use of large numbers of TDs,
with the inherent danger of chance effects. Obviously, this is a common
statistical problem shared with other descriptors, not exclusive to TDs, and
caution should also be applied in the indiscriminate use of TDs for sets
without a common parent.
In the last few years, a large number of TDs have been reported in the literature.
Various TDs have been found to possess different correlating abilities with molecular
properties/activities of diverse nature. Different TDs and matrices associated with them
have been briefly reviewed in the present chapter.
Balaban et al. (1992) classified TDs according to their nature in first, second
and third generations. The first generation TDs are integer numbers derived from
integer "local vertex invariants" (LOVIs) assigned to each vertex, such as vertex
degrees or distance sums and are based on integer graph properties like topological
distances. These descriptors suffer from the drawback that they have high degeneracy,
i.e. the same value for the descriptor for many non-isomorphic graphs. The most
important descriptors of this class are Wiener index, W (Wiener, 1947a; 1947b),
Hosoya index, Z (Hosoya, 1971) and the centric descriptors of Balaban B (Balaban,
1979). The second generation TDs are real numbers based on integer LOVIs such
as vertex degrees/distance sums or integer graph properties. The Randic molecular
connectivity (Randic, 1975), the related Kier-Hall descriptors (Kier and Hall, 1976a;
1976b; 1986), the mean square distance (Balaban, 1983), the Balaban index
J (Balaban, 1982) and Jhet (Balaban, 1986; Ivanciuc et al. 1998b), the overall connectivity
descriptor of Bonchev (1999), and the Harary descriptor, also called Harary number, H
(Plavsic et al., 1993), are the most frequently used second generation TDs to date.
The third generation TDs are real numbers based on real-number LOVIs and have very
low degeneracy. These descriptors convert a matrix into a system of linear equations
whose solutions are LOVIs, by including a column vector on the main diagonal and
another column vector as the free term. The column vectors may convey chemical
information (e.g., the atomic number of the atom symbolized by the corresponding
vertex), graph-theoretical information (e.g., the vertex degree or the distance sum of
the corresponding vertex) or simply constant numerical data (e.g., the number of
vertices in the graph, or its square). Information-theoretic indices (Basak et al., 1997a;
Bonchev, 1983; Balaban et al. 1991), the triplet indices (Filip et al. 1987), the
hyper-Wiener descriptor and the molecular identification (ID) numbers (Randic, 1984) are
examples of third-generation descriptors. The fourth generation TDs have
discriminating power >100 for structures containing only five vertices {with or
without heteroatom(s)}. The examples include augmented eccentric connectivity
descriptors, superaugmented eccentric connectivity descriptors (Dureja and Madan,
2007; Dureja et al., 2008). Topology based MDs suffer from one major limitation of
not considering the presence of heteroatom(s) in a molecule. To overcome this
limitation, the MDs need to be refined as their topochemical counterparts. These
topochemical counterparts of proposed MDs are sensitive to both the presence as well
as the relative position(s) of heteroatom(s) (Bajaj et al., 2005). The topochemical
versions of MDs that are sensitive to both the presence and relative position(s) of
heteroatom(s), along with a high discriminating power of 100 for all possible
structures containing only five vertices (> 25 in case of pendenticity based TDs), are
treated as fifth generation topology based MDs (Dureja and Madan, 2012). As a
consequence fourth generation MDs may be treated as topostructural MDs whereas
fifth generation as topochemical MDs.
A large number of TDs of diverse nature have been reported in the recent past.
These TDs have been found to possess different correlating abilities with molecular
properties/activities of diverse nature. These are widely employed as simple numerical
descriptors for quantitative comparison of physical, chemical or biological parameters
of molecules in a wide range of [(Q)SAR/QSPR] studies. Various TDs and matrices
associated with them have been briefly reviewed in the present chapter.
Adjacency based graph invariants
These invariants are based on the consideration that the whole set of
connections between adjacent pairs of atoms may be represented in a matrix form,
termed the adjacency matrix. The simplest number that can be associated with a chemical
structure is the graph adjacency, A(G), which is the sum of all entries of the adjacency
matrix of the graph. However, this simplest topological descriptor is extremely
degenerate; it has the same numerical value for all graphs having the same number of
edges. Various attempts have been reported to express the connectivity of atoms in the
molecule by more discriminating graph invariants.
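The degeneracy of A(G) can be seen on a small case. The following is a minimal sketch, assuming hydrogen-depleted graphs are stored as edge lists over vertex indices 0..n-1; the function names are illustrative and do not come from the cited works. n-Butane and isobutane both have three edges, so they receive the same value.

```python
def adjacency_matrix(n, edges):
    """Build the symmetric 0/1 adjacency matrix of an undirected graph."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    return a


def graph_adjacency(n, edges):
    """A(G): sum of all entries of the adjacency matrix (twice the edge count)."""
    return sum(sum(row) for row in adjacency_matrix(n, edges))


# Carbon skeletons only (hydrogens are omitted).
n_butane = [(0, 1), (1, 2), (2, 3)]
isobutane = [(0, 1), (1, 2), (1, 3)]

print(graph_adjacency(4, n_butane), graph_adjacency(4, isobutane))  # 6 6
```

Because every edge contributes two entries to the matrix, A(G) cannot distinguish isomers with the same number of bonds.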
Platt (1947) proposed the total edge adjacency descriptor, also known as the Platt
number or F descriptor, which may be defined as the summation of all entries of the edge
adjacency matrix:
$$F(G) = \sum_{i} \sum_{j} [E]_{ij} = 2N_2$$
The entries Eij are equal to one if edges ei and ej are adjacent (the two edges
thus forming a path of length two) and zero otherwise. This topological descriptor is
simultaneously a measure of the order and dimension of the molecular graph, that is
of the size of the molecule and the degree of chain branching. For equal size (equal
number of edges) the calculated value of this topological descriptor is higher for the
branched molecular graph.
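As an illustration of how the Platt number grows with branching at constant size, here is a small sketch for the three C5 alkane skeletons. It assumes simple graphs given as edge lists; `platt_number` is an illustrative name, not an API from the literature.

```python
from itertools import combinations


def platt_number(edges):
    """Sum of all entries of the edge adjacency matrix.

    Two distinct edges are adjacent when they share a vertex; each adjacent
    pair contributes two entries (E_ij and E_ji) to the symmetric matrix.
    """
    adjacent_pairs = sum(
        1 for e1, e2 in combinations(edges, 2) if set(e1) & set(e2)
    )
    return 2 * adjacent_pairs


n_pentane = [(0, 1), (1, 2), (2, 3), (3, 4)]
isopentane = [(0, 1), (1, 2), (2, 3), (1, 4)]   # 2-methylbutane
neopentane = [(0, 1), (0, 2), (0, 3), (0, 4)]   # 2,2-dimethylpropane

for name, g in [("n-pentane", n_pentane),
                ("isopentane", isopentane),
                ("neopentane", neopentane)]:
    print(name, platt_number(g))
# n-pentane 6, isopentane 8, neopentane 12
```

All three graphs have four edges, yet F increases from the chain to the most branched isomer.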
Gordon and Scantlebury (1964) proposed the Gordon-Scantlebury
descriptor, also known as the connection number (N2) or Bertz branching descriptor (BI). It is
the simplest graph invariant obtained from the edge adjacency matrix that considers
both vertices and edges. This descriptor is based on the topological distance and is
calculated as:
$$N_2 = {}^2P = A_t / 2$$
where $A_t$ is the total edge adjacency descriptor and ${}^2P$ is the second-order path count.
The Platt number (F descriptor) is thus twice the Gordon-Scantlebury descriptor (Gordon
and Scantlebury, 1964), i.e. the connection number N2 or Bertz branching descriptor BI.
Morgan (1965) described the concept of extended connectivity, according to
which graph vertices are ordered on the basis of their extended connectivity values
obtained after a number of iterations, until a constant atom ordering is obtained in two
consecutive steps. The extended connectivity (or extended vertex degree) of a vertex, denoted
$EC_i$, is calculated as the iterative summation of the connectivities of all first
neighbors:
$$EC_i^{k+1} = \sum_{j=1}^{n} a_{ij}\, EC_j^{k}$$
where $a_{ij}$ are the elements of the adjacency matrix; at k = 0, the connectivity of each
atom is simply the vertex degree δ.
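The iterative summation can be sketched as follows. This is a minimal illustration, assuming an edge-list graph and a fixed number of summation steps k chosen by the caller; Morgan's stopping rule for the ordering is deliberately not implemented here, and the function name is hypothetical.

```python
def extended_connectivity(n, edges, k):
    """Return the extended connectivity of every vertex after k summation steps."""
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    ec = [len(nb) for nb in neighbors]  # k = 0: plain vertex degrees
    for _ in range(k):
        # Each vertex receives the sum of its first neighbors' current values.
        ec = [sum(ec[j] for j in neighbors[i]) for i in range(n)]
    return ec


isopentane = [(0, 1), (1, 2), (2, 3), (1, 4)]  # 2-methylbutane skeleton
for k in range(3):
    print(k, extended_connectivity(5, isopentane, k))
# 0 [1, 3, 2, 1, 1]
# 1 [3, 4, 4, 2, 3]
# 2 [4, 10, 6, 4, 4]
```

The two methyl groups on the branched carbon (vertices 0 and 4) keep identical values at every step, which reflects their topological equivalence.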
Bonacich (1972) devised the eigenvector centrality, $EC_i$, of a vertex i, which is
derived from the leading eigenvector of the adjacency matrix (Bonacich, 1972, 2007).
The concept of centrality is related to the ability of a vertex to communicate with
other vertices, to its closeness to many other vertices, or to the number of pairs of
vertices that need a specific vertex as intermediary in their communications. $EC_i$ is
defined as the ith component of the eigenvector associated with the highest eigenvalue
of the adjacency matrix:
$$EC_i = \ell_{i1}$$
A vertex connected to many other vertices will have a high $EC_i$ value (Freeman,
1977; 1979).
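One way to obtain the leading eigenvector without a linear algebra library is power iteration. The sketch below is an assumption-laden illustration rather than Bonacich's procedure: it iterates with A + I instead of A, which has the same eigenvectors but avoids the oscillation that plain power iteration shows on bipartite graphs such as acyclic skeletons. It assumes a connected graph and normalizes the vector to unit Euclidean length.

```python
import math


def eigenvector_centrality(n, edges, iterations=200):
    """Approximate the leading eigenvector of the adjacency matrix."""
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    x = [1.0] * n
    for _ in range(iterations):
        # One multiplication by (A + I): own value plus the neighbors' values.
        y = [x[i] + sum(x[j] for j in neighbors[i]) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


isobutane = [(0, 1), (1, 2), (1, 3)]  # vertex 1 is the branched carbon
centrality = eigenvector_centrality(4, isobutane)
print([round(c, 4) for c in centrality])  # [0.4082, 0.7071, 0.4082, 0.4082]
```

The central carbon, which is bonded to all other atoms, receives the largest component.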
Gutman and Trinajstic (1972) proposed two novel TDs called the first and
second Zagreb indices, or Zagreb group parameters, based on the vertex degree δ
(i.e. the sum of all the entries in the ith row of the adjacency matrix) of the
atoms in the H-depleted molecular graph.
The first Zagreb group index M1 was defined as the summation of squares of
vertex degrees of a graph (rather than a simple sum), whereas the second Zagreb
group index M2 is the sum of the products of the degrees of pairs of adjacent vertices
of the respective (molecular) graph:
$$M_1 = M_1(G) = \sum_{i=1}^{n} \delta_i^2$$
$$M_2 = M_2(G) = \sum_{(i,j)} \delta_i \delta_j$$
where $\delta_i$ is the degree (number of first neighbors) of the vertex $v_i$ in the molecular
graph and $\delta_i\delta_j$ is the weight of edge {i, j}. The two Zagreb group parameters are closely
related to the zero-order ${}^0\chi$ and first-order ${}^1\chi$ connectivity descriptors, respectively.
The 1st Zagreb descriptor M1 (also called Gutman descriptor) is also related to the Platt
number F and the connectivity number N2 by the following relationship:
M1 = F + 2(A-1) = 2* (N2 + A - 1)
where A represents the number of atoms.
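The Zagreb indices and their relation to the Platt number can be checked numerically. The sketch below assumes edge-list graphs and uses illustrative function names. Since the sum of squared degrees equals the sum of δ(δ - 1) plus the sum of δ, the general identity is M1 = F + 2B, with B the number of edges; the form with A - 1 quoted above therefore applies to acyclic graphs, where B = A - 1.

```python
from itertools import combinations


def vertex_degrees(n, edges):
    d = [0] * n
    for i, j in edges:
        d[i] += 1
        d[j] += 1
    return d


def zagreb_indices(n, edges):
    """Return (M1, M2): sum of squared degrees, sum of degree products over edges."""
    d = vertex_degrees(n, edges)
    m1 = sum(x * x for x in d)
    m2 = sum(d[i] * d[j] for i, j in edges)
    return m1, m2


def platt_number(edges):
    """Twice the number of adjacent edge pairs (sum of the edge adjacency matrix)."""
    return 2 * sum(1 for e1, e2 in combinations(edges, 2) if set(e1) & set(e2))


isopentane = [(0, 1), (1, 2), (2, 3), (1, 4)]      # acyclic, A = 5
cyclobutane = [(0, 1), (1, 2), (2, 3), (3, 0)]     # cyclic, A = 4

m1, m2 = zagreb_indices(5, isopentane)
print(m1, m2, platt_number(isopentane))            # 16 14 8
print(m1 == platt_number(isopentane) + 2 * (5 - 1))  # True for the acyclic graph

m1_ring, _ = zagreb_indices(4, cyclobutane)
print(m1_ring == platt_number(cyclobutane) + 2 * len(cyclobutane))  # True with B edges
```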
Lovasz and Pelikan (1973) developed the Lovasz-Pelikan descriptor, denoted as
$\lambda_1^{LP}$, by using the largest eigenvalue of the adjacency matrix as a molecular descriptor
(also known as the leading eigenvalue $\lambda_1$):
$$\lambda_1^{LP} \equiv \max \mathrm{Sp}(A)$$
The leading eigenvalue of the adjacency matrix has been suggested as a descriptor of molecular branching;
small values of $\lambda_1$ correspond to linear or chain graphs and
large values to more branched graphs. It is not a very discriminating descriptor
because in many cases the same value is obtained for two or more non-isomorphic
graphs.
Randic (1975) introduced the first connectivity descriptor, namely the Randic
connectivity index (${}^1\chi$), also known as the connectivity index or branching index, by
transforming M2 into an inverse square-root function for characterizing the branching
in molecular graphs. It is defined by the following equation:
$${}^1\chi = \sum_{(i,j) \in E(G)} (\delta_i \delta_j)^{-1/2}$$
where $\delta_i$ and $\delta_j$ represent the degrees of the vertices $v_i$ and $v_j$; the term $(\delta_i\delta_j)^{-1/2}$ for
each pair of adjacent vertices is called edge connectivity.
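A short worked example may help: the sketch below computes the Randic connectivity index for the two C4 alkane skeletons, assuming edge-list graphs (the function name is illustrative).

```python
import math


def randic_index(n, edges):
    """Sum of (delta_i * delta_j) ** -0.5 over all edges of the graph."""
    d = [0] * n
    for i, j in edges:
        d[i] += 1
        d[j] += 1
    return sum(1.0 / math.sqrt(d[i] * d[j]) for i, j in edges)


n_butane = [(0, 1), (1, 2), (2, 3)]
isobutane = [(0, 1), (1, 2), (1, 3)]

print(round(randic_index(4, n_butane), 4))   # 1.9142
print(round(randic_index(4, isobutane), 4))  # 1.7321
```

For n-butane the edge terms are 1/√2, 1/2 and 1/√2; for isobutane each of the three edges contributes 1/√3, so the more branched isomer has the lower value.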
Kier et al. (1975a) extended this idea from edges (paths of unit length) to
paths of higher lengths (two, three, etc.). MDs thus constructed were termed molecular
connectivities of first, second, third, etc. order, respectively. These generalized
descriptors are also known as Kier-Hall connectivity descriptors (Kier and Hall,
1976a) and are calculated as follows:
$${}^0\chi = \sum_{i=1}^{n} \delta_i^{-1/2}, \qquad {}^1\chi = \sum_{\text{all edges}} (\delta_i \delta_j)^{-1/2}, \qquad {}^2\chi = \sum_{\text{all paths}} (\delta_i \delta_j \delta_k)^{-1/2}$$
A zero order descriptor was defined for completeness; the first order Kier-Hall
connectivity descriptor is the Randic connectivity descriptor and can be written as
${}^1\chi$.
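The zero and second order descriptors can be sketched in the same way. The code below is an illustration under the assumption that a path of length two is identified by its middle vertex and an unordered pair of that vertex's neighbors; the function names are hypothetical.

```python
from itertools import combinations


def neighbor_lists(n, edges):
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    return neighbors


def chi0(n, edges):
    """Zero order connectivity: sum of delta_i ** -0.5 over all vertices."""
    return sum(len(nb) ** -0.5 for nb in neighbor_lists(n, edges))


def chi2(n, edges):
    """Second order connectivity: sum over all paths i-j-k of length two."""
    neighbors = neighbor_lists(n, edges)
    d = [len(nb) for nb in neighbors]
    total = 0.0
    for j in range(n):  # j is the middle vertex of the path
        for i, k in combinations(neighbors[j], 2):
            total += (d[i] * d[j] * d[k]) ** -0.5
    return total


n_butane = [(0, 1), (1, 2), (2, 3)]
isobutane = [(0, 1), (1, 2), (1, 3)]

print(round(chi0(4, n_butane), 4), round(chi2(4, n_butane), 4))    # 3.4142 1.0
print(round(chi0(4, isobutane), 4), round(chi2(4, isobutane), 4))  # 3.5774 1.7321
```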
Kier and Hall (1976b) further extended the Randic descriptor
(χ) to heteroatom-containing molecules, in order to account for heteroatom differentiation
as well as for different subgraphs in the molecule. They replaced the vertex degree $\delta_i$
by the valence vertex degree $\delta_i^v$ in the construction of the analogous valence
connectivity descriptor, $\chi^v$:
$$\chi^v = \sum_{(i,j) \in E(G)} (\delta_i^v \delta_j^v)^{-1/2}$$
where $\delta_i^v$ is equal to:
$$\delta_i^v = Z_i^v - h_i = \sigma_i + \pi_i + n_i - h_i$$
where $Z_i^v$ is the number of valence electrons (σ electrons, π electrons, and lone pair (n)
electrons) of the ith atom and $h_i$ is the number of hydrogen atoms bonded to it. This
definition holds for atoms of the second principal quantum level (C, N, O, F). For
atoms of higher principal quantum levels (P, S, Cl, Br, I), Kier and Hall (1976b)
proposed to account for both valence and non-valence electrons, as follows:
$$\delta_i^v = \frac{Z_i^v - h_i}{Z_i - Z_i^v - 1}$$
where $Z_i$ is the total number of electrons of the ith atom, i.e. its atomic number, and
$\delta_i^v$ encodes the electronic identity of the atom in terms of both valence electron and
core electron counts; it is a valence electron descriptor.
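The two definitions of the valence vertex degree can be combined in one small function. This is a sketch with illustrative parameter names; it relies on the observation that second-row atoms have two core electrons, so the denominator Z - Z^v - 1 equals 1 and the general expression reduces to Z^v - h for C, N, O and F.

```python
def valence_delta(z, z_valence, h):
    """Valence vertex degree of one atom.

    z          atomic number (total electron count)
    z_valence  number of valence electrons
    h          number of hydrogen atoms bonded to the atom
    """
    return (z_valence - h) / (z - z_valence - 1)


print(valence_delta(6, 4, 3))             # methyl carbon: 1.0
print(valence_delta(8, 6, 1))             # hydroxyl oxygen: 5.0
print(round(valence_delta(17, 7, 0), 4))  # chlorine: 0.7778
print(round(valence_delta(16, 6, 1), 4))  # thiol sulfur: 0.5556
```

Heavier heteroatoms thus receive smaller valence deltas than their second-row analogs, which is how the descriptor distinguishes, for example, Cl from F.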
The molecular connectivity descriptor has the following advantages:
It possesses great discriminating power by virtue of its high monotonicity.
Computation of the molecular connectivity descriptor is simple, as only a basic
algorithm needs to be applied.
Molecular connectivity descriptor being related to degree of branching is a good
measure of molecular surface area or volume.
Molecular connectivity descriptor has provision to consider heteroatoms and
multiple bonds.
Molecular connectivity descriptor can be applied to acyclic, cyclic and aromatic
molecules.
Molecular connectivity descriptor correlates well with physicochemical and
biological properties.
Razinger (1982, 1986) showed that the ${}^kEC$ values are connected to the
respective kth powers of the adjacency matrix. Later Rucker and Rucker (1993)
derived this relationship by proving that the vertex (or atom) walk count, ${}^k awc(i)$, and
the graph (molecule) walk count, ${}^k mwc(G)$, of length k are identical to Morgan's
${}^k a_i$ and ${}^kEC(G)$, respectively.
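The link between walk counts and Morgan's extended connectivities can be verified on a small graph. In the sketch below (illustrative names, pure Python matrix products), the atomic walk count of length k is taken as the ith row sum of the kth power of the adjacency matrix. With the indexing assumed here, where Morgan's iteration starts from the vertex degrees at step 0, the walk counts of length k coincide with the Morgan values after k - 1 summation steps.

```python
def adjacency_matrix(n, edges):
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    return a


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def atomic_walk_counts(a, k):
    """Row sums of A**k: number of walks of length k (k >= 1) starting at each vertex."""
    power = a
    for _ in range(k - 1):
        power = mat_mul(power, a)
    return [sum(row) for row in power]


def morgan_values(a, steps):
    """Morgan summation: start from vertex degrees, then sum over neighbors."""
    n = len(a)
    ec = [sum(row) for row in a]
    for _ in range(steps):
        ec = [sum(a[i][j] * ec[j] for j in range(n)) for i in range(n)]
    return ec


a = adjacency_matrix(5, [(0, 1), (1, 2), (2, 3), (1, 4)])  # 2-methylbutane
for k in (1, 2, 3):
    print(k, atomic_walk_counts(a, k), morgan_values(a, k - 1))
# 1 [1, 3, 2, 1, 1] [1, 3, 2, 1, 1]
# 2 [3, 4, 4, 2, 3] [3, 4, 4, 2, 3]
# 3 [4, 10, 6, 4, 4] [4, 10, 6, 4, 4]
```

The molecular walk count of length k is then simply the sum of the atomic walk counts.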
Narumi and Katayama (1984) reported a simple topological descriptor, S,
related to molecular branching and calculated as the product of the vertex degrees, $\delta_i$:
$$S = \prod_{i=1}^{n} \delta_i$$
where n is the number of atoms.
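A minimal sketch of this product of vertex degrees, assuming an edge-list graph and Python 3.8 or later for `math.prod` (the function name is illustrative):

```python
import math


def narumi_katayama(n, edges):
    """Product of the vertex degrees of the hydrogen-depleted graph."""
    d = [0] * n
    for i, j in edges:
        d[i] += 1
        d[j] += 1
    return math.prod(d)


n_butane = [(0, 1), (1, 2), (2, 3)]
isobutane = [(0, 1), (1, 2), (1, 3)]
print(narumi_katayama(4, n_butane), narumi_katayama(4, isobutane))  # 4 3
```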
Kier and Hall (1986) introduced the benzene likeliness descriptor (BLI) with the
aim of measuring molecular aromaticity. It is calculated by dividing the first order
valence connectivity descriptor ${}^1\chi^v$ by the number of non-hydrogen bonds in the
molecule and then normalizing on the benzene molecule.
Gombar et al. (1987) conceptualized the modified connectivity descriptors
known as perturbation connectivity descriptors, ${}^m\chi_q^p$, based on perturbation delta
values $\delta^p$ and defined as:
$${}^m\chi_q^p = \sum_{k=1}^{K} \left( \prod_{a=1}^{n} \delta_a^p \right)_k^{-1/2}$$
where k runs over all mth order subgraphs of type q constituted by n atoms; K is the
total number of mth order subgraphs. Perturbation delta values are obtained from the valence
vertex degree $\delta^v$ by incorporating the effect of atomic environments at the topological level.
Burden (1989) presented a method for generating molecular identification
numbers, also known as Burden numbers, of hydrogen-depleted structures from the
smallest eigenvalues of a modified connectivity matrix known as the Burden matrix.
Burden numbers are attractive because of their one-dimensional nature and the
comparative ease of their computation. Moreover, two molecules with close Burden
numbers often appear similar when their chemical structures are compared, for example,
by comparing the numbers of fragments or functional groups that the two molecules do and
do not have in common.
Lohninger (1993) proposed the modified Randic descriptor (${}^1\chi_{mod}$) as the sum of
atomic properties, accounting for valence electrons and extended connectivities in the
H-depleted molecular graph, using a Randic connectivity descriptor-type formula:
$${}^1\chi_{mod} = \frac{1}{2} \sum_{i=1}^{A} \sum_{j=1}^{\delta_i} \frac{Z_i}{(\delta_i \delta_j)^{1/2}}$$
where the first sum runs over all the atoms in the molecular graph while the second
runs over the first neighbors of the considered atom; δ is the vertex degree, and $Z_i$ is the
atomic number of the ith atom.
Yang et al. (1994) proposed two novel molecular descriptors based upon
extended adjacency matrices (EA). These descriptors consider the presence of
heteroatom(s) and multiple bonds, possess high discriminating power, and correlate
well with a number of physico-chemical properties and biological activities of organic
compounds.
The first one is the sum of the absolute eigenvalues of the EA matrix, called the
$EA_{\Sigma}$ descriptor, which can be calculated as:
$$EA_{\Sigma} = \sum_{i=1}^{A} |\lambda_i^{EA}|$$
The second molecular descriptor is the maximum absolute eigenvalue of the EA
matrix, called the $EA_{max}$ descriptor, which can be calculated as:
$$EA_{max} = \max_{1 \le i \le A} |\lambda_i^{EA}|$$
Goel et al. (1995) conceptualized an adjacency based topochemical graph
invariant, namely the atomic molecular connectivity descriptor ($\chi^A$), which is a modification
of Randic's molecular connectivity descriptor (χ). It may be defined as the sum of
the inverse square roots of the products of chemical degrees (modified bond values) of adjacent
vertices over all edges in the hydrogen depleted molecular graph:
$$\chi^A(G) = \sum_{i=1,\, j=1}^{n} (V_i^c V_j^c)^{-1/2}$$
where $V_i^c$ and $V_j^c$ are the chemical degrees of the adjacent vertices i and j. The computation of
$\chi^A$ is conducted in a manner similar to that described by Kier and Hall (1976b) except
that the modified valency of each vertex involved in a pair is calculated by summing
up relative atomic weights of all the adjacent atoms.
Estrada (1995) derived the edge analog of the classical Randic connectivity
descriptor, known as the edge connectivity descriptor. The descriptor was claimed to
correlate well with molar volume and to be capable of discriminating between isomers. It
is abbreviated as ε and can be calculated as:
$$\varepsilon = \sum_{r} [{}^w\delta(e_i)\, {}^w\delta(e_j)]^{-1/2}$$
where ${}^w\delta(e_i)$ and ${}^w\delta(e_j)$ are the values of edge-weighted degrees and the summation
is over all r pairs of adjacent edges.
Estrada and Ramirez (1996) defined a new bond order-weighted edge
connectivity descriptor (επ) based on edge adjacency relationships. The proposed
descriptor is sensitive to both the presence as well as the position(s) of heteroatom(s) in
the molecule (greater values refer to the central positions) and is able to discriminate
conformational isomers. The elements of edge set were substituted by bond orders or
more precisely valence descriptors calculated from quantum chemical methods and
their calculation is as follows:
$$\varepsilon^{\pi} = \sum_{r} [{}^w\delta(e_i)\, {}^w\delta(e_j)]^{-1/2}$$
where π is the bond order.
Hu and Xu (1996) proposed two new super-descriptors, EATI1 and
EATI2, based on the extended adjacency matrix of the weighted molecular graph. The
extended adjacency ID numbers EATI1 and EATI2 are calculated as:
$$EATI_1 = \sum_{i=1}^{n} ([EA^*]_{ii})^2$$
$$EATI_2 = \sum_{i=1}^{n} [EA^*]_{ii}$$
where $[EA^*]_{ii}$ are the diagonal entries of the matrix EA*.
EATI1 was tested for selectivity on over 610,000 structures and was also found to have good
correlating ability. EATI2 (also called EAID) was particularly tested for selectivity and
no degeneracy appeared. It was described as the most powerful descriptor designed so far and as a
candidate for CAS Registry Numbers (Guo et al., 1997).
Estrada (1996) developed the theory of the spectral moments of the so-called
edge adjacency matrix (which, in turn, is precisely the standard adjacency matrix of
the line graph of the molecular graph). Estrada's approach is the expansion of the
spectral moments in terms of counts of certain fragments contained in the molecular
graph. The spectral moments of the edge adjacency matrix E were defined as:
$$\mu_k = \mathrm{tr}(E^k)$$
where tr stands for the trace (sum of diagonal entries) of the respective matrix.
The spectral moments of the edge adjacency matrix have been successfully
applied in QSPR and QSAR studies of alkanes, alkyl halides, benzyl alcohols,
cycloalkanes and benzenoid hydrocarbons (Estrada, 1996; 1997; 1998).
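The spectral moments of the edge adjacency matrix can be sketched without external libraries. The code below is an unweighted illustration under the assumption of simple edge-list graphs; it does not include the bond weights used in later weighted variants, and the function names are hypothetical.

```python
def edge_adjacency_matrix(edges):
    """E[p][q] = 1 when edges p and q share a vertex (adjacency of the line graph)."""
    m = len(edges)
    e = [[0] * m for _ in range(m)]
    for p in range(m):
        for q in range(p + 1, m):
            if set(edges[p]) & set(edges[q]):
                e[p][q] = e[q][p] = 1
    return e


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(edges, k_max):
    """Return [tr(E**0), tr(E**1), ..., tr(E**k_max)]."""
    e = edge_adjacency_matrix(edges)
    m = len(e)
    power = [[int(i == j) for j in range(m)] for i in range(m)]  # E**0 = identity
    moments = []
    for _ in range(k_max + 1):
        moments.append(sum(power[i][i] for i in range(m)))
        power = mat_mul(power, e)
    return moments


n_butane = [(0, 1), (1, 2), (2, 3)]
isobutane = [(0, 1), (1, 2), (1, 3)]
print(spectral_moments(n_butane, 3))   # [3, 0, 4, 0]
print(spectral_moments(isobutane, 3))  # [3, 0, 6, 6]
```

The zeroth moment counts the bonds, the second moment is twice the number of adjacent bond pairs, and the third moment is nonzero only when three bonds are mutually adjacent, as at the branched carbon of isobutane. This is the fragment-counting interpretation mentioned in the text.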
Estrada and coworkers (Estrada, 1996; 1997; Gutierrez and Estrada, 1997)
introduced a novel approach, the TOpological SubStructural MOlecular DEsign
(TOSS-MODE), now called TOPological Substructural MOlecular DEsign
(TOPSMODE) which is based on determining the spectral moments of the topological
bond matrix. The application of this approach in the study of quantitative structure–
permeability relationships (QSPR/QSAR) can be summarized as:
1. Construction of hydrogen–suppressed molecular graphs for every molecule of
the data set.
2. Using appropriate bond weights to differentiate the molecular bonds, e.g.,
bond length, bond dipoles and bond polarizabilities.
3. Calculating the spectral moments of the bond matrix with the appropriate
weights for each molecule in the data set and preparing a table in which rows
represent the compounds and columns represent the spectral moments in the
bond matrix.
4. Generating the QSPR/QSAR model with the help of an appropriate linear or non–linear
multivariate statistical technique, such as MLRA.
5. Finally, using cross–validation techniques to optimize the predictive capability of
the QSAR/QSPR model (Estrada, 2000).
Pearlman and Smith (1998; 1999) extended the Burden approach to address
searching for chemicals in large databases by conceptualizing BCUT descriptors
(Burden-CAS-University of Texas eigenvalues) based on three different matrices whose
diagonal elements were atomic polarizability-related values, atomic charge-related
values and atomic H-bond abilities.
Estrada et al. (1998a); Estrada (2000b); Estrada and Rodriguez (2000) made
an important extension by generalizing edge connectivity descriptors to propose
extended edge connectivity descriptors, which can be defined as:
$${}^m\varepsilon_q = \sum_{k=1}^{K} \left( \prod_{b} \delta(e_b) \right)_k^{-1/2}$$
where k runs over all of the mth order subgraphs, m is the number of edges in the
subgraph, and K is the total number of mth order subgraphs present in the molecular graph;
the edge degrees of all of the edges involved in the subgraph are considered. The
subscript “q” refers to the type of molecular subgraph.
Bonchev (1999, 2000) proposed two overall connectivity descriptors, TC and
TC1, as a meaningful measure of topological complexity of molecules, since they
satisfy two fundamental requirements to a complexity measure: to increase with both
the number of structural elements and their interconnectedness (Bonchev, 1997). The
topological complexity descriptor TC(G) of the graph G may be defined as the
summation of the total adjacencies of all ${}^eK$ connected subgraphs having e edges and
$n_t$ vertices of degree $a_i$, including the graph itself, which has E edges and K connected
subgraphs; summarizing the information on the connectivity of vertices in all
subgraphs, the new descriptor has the meaning of the overall connectivity of G:
$$TC(G) = \sum_{e=0}^{E} {}^eTC(G) = \sum_{e=0}^{E} \sum_{t=1}^{{}^eK} \sum_{i=1}^{n_t} a_i(G)$$
where ${}^eTC(G)$ is the topological complexity of order e (e = 0, 1, 2, …, E) and i runs
over the $n_t$ vertices of the tth subgraph.
The TC(G) descriptor is defined in two quantitatively different versions. In the basic
version, the vertex degrees δi are those in the entire graph G; in the second version,
the TC1(G) descriptor is calculated with the vertex degrees taken from the
corresponding subgraph Gt (Bonchev, 2000).
Nikolic et al. (2000) proposed symmetry-modified Zagreb invariants M1 and M2
by summing up only the degrees (SMM1) or edge weights (SMM2) of symmetry
nonequivalent vertices or edges of graphs. When closely related symmetry-independent
and symmetry-dependent complexity invariants were compared, they produced different
orderings.
Basak et al. (2000) considered molecular surface dependent properties (boiling
point and gas chromatograph retention times) and molecular volume dependent
properties (molar volume and molar refraction). They found that edge connectivity
invariants were appropriate for structure-molecular surface properties.
Estrada and Molina (2001) defined spectral moment of the edge adjacency
matrix as the sum of diagonal entries of the different powers of the edge adjacency
matrix corresponding to a molecular fragment. It is defined as:
$$\mu_k(MF) = \sum_{b=1}^{B_F} [E^k]_{bb}$$
where MF indicates the molecular fragment considered, and the summation goes over
all the $B_F$ bonds forming the fragment.
Bonchev (2001a) verified the applicability of the overall connectivity
descriptors to QSPR studies by linear regression modeling of 10 physicochemical
properties of linear alkanes. The development of the molecular connectivity concept and
some of its key elements (Randic's inverse-square-root function and the detailed
subgraph characterization) were also analyzed. This study demonstrated the
usefulness of overall connectivity concept for QSPR applications, which combines the
basic ideas of the classical concept of molecular connectivity with those of molecular
complexity.
Lukovits and Linert (2001) applied a chiral function F having the condition F
(D) = F (L), where D and L, represent the enantiomers of the same structure, in
combination with Randic atom-modified first order molecular connectivity descriptor.
Tomovic and Gutman (2001) renamed the simple topological descriptor S,
developed by Narumi and Katayama (1984), as the “Narumi-Katayama descriptor” while
studying the properties of this descriptor for phenylenes.
Kezele et al. (2001) tested the use of the variable connectivity descriptor in QSPR.
The descriptor yielded very good regression equations in the case of homogeneous sets
of molecules.
Torrens (2002) optimized a method for determining the permanent of the
adjacency matrix, per(A), of fullerenes. The permanent of the adjacency matrix
is defined as:
$$\mathrm{per}(A) = \sum_{\pi \in \Lambda_n} \prod_{i=1}^{n} a_{i\,\pi(i)}$$
where $\Lambda_n$ denotes the set of all possible permutations of (1, 2, …, n).
The algorithm allows rapid computation of per(A) for adjacency matrices of molecules
large enough to be theoretically interesting, and the author concluded that this approach could be
useful in designing and optimizing the structures of unknown fullerenes.
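To make the definition of the permanent concrete, here is a brute-force sketch that enumerates all permutations directly. It is not the optimized algorithm of Torrens (2002); its cost grows as n! and it is only practical for very small graphs. The function name is illustrative.

```python
from itertools import permutations


def permanent(a):
    """Permanent of a square matrix by direct enumeration of all permutations."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        product = 1
        for i in range(n):
            product *= a[i][perm[i]]
            if product == 0:  # this permutation uses a zero entry; skip the rest
                break
        total += product
    return total


# Adjacency matrices of a 4-membered ring and a 3-membered ring.
cyclobutane = [[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]]
cyclopropane = [[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]]

print(permanent(cyclobutane), permanent(cyclopropane))  # 4 2
```

Unlike the determinant, every permutation enters with a positive sign, so for a 0/1 adjacency matrix the permanent counts the permutations that map each vertex to one of its neighbors.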
Ren (2002a; 2002b) derived a novel vertex degree $v^m$ for heteroatoms in the
molecular graph on the basis of the valence connectivity $\delta^v$ of Kier and Hall. The atom-type
AI descriptor and the Xu descriptor were modified for compounds with heteroatoms by
replacing the vertex degree of each heteroatom by the proposed $v^m$. The modified Xu
descriptor and AI descriptor provided QSPR models for the normal boiling points (BP),
molar volumes (MV), molar refractions (MR) and molecular total surface areas (TSA)
of alcohols with up to 17 non-hydrogen atoms. These physical properties were
expressed as a linear combination of the individual descriptors related to molecular size
and atom-type.
Turker (2003a), starting from the concept of T(A) graphs for alternant
hydrocarbons, defined a novel topological descriptor (L). The proposed descriptor
differentiated isomeric as well as isospectral molecules, encoded alternate saturated
and unsaturated systems and could be extended to heteroatom containing structures.
Cao et al. (2003) extended the application of eigenvalues of bonding orbital–
connection matrix to different physicochemical properties and developed bond
adjacency matrix BCH and orbital overlapping matrix BCC based on polarizability
effect descriptor (PEI) for each C-H bond carbon skeleton in alkane molecule.
Nikolic et al. (2003) amended the original Zagreb descriptors through
insertion of inverse values of the vertex-degrees:
$$^{m}M_1 = \sum_{\text{all vertices}} \frac{1}{\delta_i^{2}}$$
$$^{m}M_2 = \sum_{\text{all edges}} \frac{1}{\delta_i \, \delta_j}$$
They concluded that the modified Zagreb mM1 descriptor gave a greater
contribution to outer atoms than to inner atoms in a molecule. Similarly, the modified
Zagreb mM2 descriptor gave a greater contribution to outer bonds than to inner bonds
in a molecule. This was opposite to the behaviour of the original Zagreb indices and
in agreement with the chemists' understanding that the most important contributions
to the interactions between molecules that are essential for many of their physical,
chemical, biological and even technological properties arise from the more exposed
atoms and bonds.
Lailong and Chengjun (2004) developed a novel connectivity descriptor mF
on the basis of adjacency matrix and edge valency. This descriptor reflects the
chemical bond specificity of edge i.
$$^{m}F = \sum (f_i \, f_j \, f_k \cdots)^{-0.5}$$
where fi is the edge valency.
Pompe (2005) constructed a modified variable topological descriptor to
determine the overall contributions of atoms or bonds in QSAR/QSPR studies.
$$^{k}\chi^{f} = \sum_{i=1}^{n} f_i^{k} = \sum_{i=1}^{n} \prod_{j=1}^{k+1} \left( \delta_j^{f} \right)^{-1/2}$$
where f_i^k expresses the contribution of path length (k) to the variable connectivity
descriptor of the same order and n is the number of paths having length k.
He concluded that the modified variable connectivity descriptor may offer better
structural interpretations by providing adequate knowledge about the part of the
molecule responsible for enhancing or suppressing the modelled property.
Dureja and Madan (2005) renamed atomic molecular connectivity descriptor
as molecular connectivity topochemical descriptor on similar grounds of other
topochemical descriptors for the sake of simplicity and to avoid any confusion.
Bajaj et al. (2005) refined Zagreb indices M1 and M2. The topochemical
descriptors were based on topochemical adjacency matrix. These refined descriptors
were sensitive to both the presence as well as position of the heteroatom(s) in the
molecule. The Zagreb topochemical descriptor M1c may be defined as the sum of the
squares of the chemical degrees of all the vertices in the hydrogen-depleted molecular
graph. The Zagreb topochemical descriptor M2c may be defined as the sum of the chemical
weights of all the edges in the hydrogen-depleted molecular graph.
$$M_1^{c} = \sum_{a} (\delta_a^{c})^{2}$$
$$M_2^{c} = \sum_{b} (\delta_i^{c} \, \delta_j^{c})_b$$
where a runs over the A atoms of the molecule and b over all of the B bonds of the
molecule. δic and δjc refer to the vertex degrees of the atoms incident to the considered
bond.
Zhang et al. (2007) characterized DNA by a numerical sequence by
considering positions of bases and the pairs of bases in DNA. For generating the
sequence invariants, the following function is used:
dr= m/n
where dr is defined as the relative position parameter of a base; m represents the
position of this base in a sequence, n represents the number of all bases in this
sequence. They extracted a novel invariant (molecular connectivity descriptor type)
from the derived numerical sequences.
Mu et al. (2008) devised a novel molecular connectivity descriptor, denoted as
mχ′, based on the adjacency matrix of molecular graphs. By using a modified delta value (δ′i)
instead of the original delta value (δvi) of the molecular connectivity descriptor, this
new descriptor was obtained as:
$$^{m}\chi'(G) = \sum_{i=1}^{n_m} \prod_{j=1}^{m+1} (\delta'_j)^{-1/2}$$
Where m is the order of the molecular connectivity descriptor. This descriptor was
successfully applied to predict the molar diamagnetic susceptibilities of organic
compounds. Later on, the converse descriptor, denoted by mχ′, was also proposed by
Mu et al. (2009) as:
$$^{m}\chi'(G) = \sum_{i=1}^{n_m} \prod_{j=1}^{m+1} (\delta'_j)^{y}$$
where y is a variable whose optimal value can be found by the optimization method.
Vukicevic and Furtula (2009) constructed a novel TD based on the end-vertex
degrees of edges, called the 'geometric-arithmetic' (GA) descriptor:
$$GA(G) = \sum_{uv \in E} \frac{\sqrt{\delta_u \delta_v}}{(\delta_u + \delta_v)/2}$$
where δu and δv are the degrees of the vertices connected by edge uv and the
summation goes over all edges of graph G. Here (δu + δv)/2 is the arithmetic mean
whereas √(δu δv) is the geometric mean, hence the name 'geometric-arithmetic'
descriptor. The predictive ability of this descriptor was found to be better than that of the Randic
connectivity descriptor χ in modeling some physico-chemical properties of octanes.
Furtula et al. (2010) defined the augmented Zagreb descriptor, AZI, of a molecular
graph G as:
$$AZI(G) = \sum_{ij \in E(G)} \left( \frac{\delta_i \delta_j}{\delta_i + \delta_j - 2} \right)^{3}$$
where E(G) is the edge set of G, and δi and δj are the degrees of the terminal vertices
i and j of edge ij. Some tight upper and lower bounds were also reported for the AZI
descriptor of a chemical tree.
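As a minimal sketch of how degree-based descriptors such as GA and AZI are evaluated, the following Python fragment computes both from an edge list of a hydrogen-depleted graph. The function names and the edge-list representation are illustrative assumptions, not taken from the cited papers; n-butane (a four-vertex path) serves as the example.

```python
from collections import Counter
from math import sqrt


def vertex_degrees(edges):
    """Degree of every vertex that appears in the edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def geometric_arithmetic(edges):
    """GA: sum over edges of geometric mean / arithmetic mean of end-vertex degrees."""
    deg = vertex_degrees(edges)
    return sum(sqrt(deg[u] * deg[v]) / ((deg[u] + deg[v]) / 2) for u, v in edges)


def augmented_zagreb(edges):
    """AZI: sum over edges of (du*dv / (du + dv - 2))**3.

    Undefined for an isolated edge (du + dv = 2), which is not handled here.
    """
    deg = vertex_degrees(edges)
    return sum((deg[u] * deg[v] / (deg[u] + deg[v] - 2)) ** 3 for u, v in edges)


# Hydrogen-depleted graph of n-butane: a path on four vertices.
butane = [(0, 1), (1, 2), (2, 3)]
ga_butane = geometric_arithmetic(butane)
azi_butane = augmented_zagreb(butane)
```

For n-butane the two terminal edges each contribute √2/1.5 to GA and the central edge contributes 1, while every edge contributes 8 to AZI.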
Das and Trinajstic (2010) compared the geometric-arithmetic descriptor
and the atom-bond connectivity descriptor for chemical trees and molecular graphs.
Besides chemical trees and molecular graphs, general graphs were also investigated.
They concluded that the geometric-arithmetic descriptor is greater than the atom-bond
connectivity descriptor when the difference between the maximum and minimum vertex
degrees is less than or equal to three.
Fath-Tabar et al. (2010) proposed a new molecular-structure descriptor GA2
belonging to the class of geometric-arithmetic descriptors. It is closely related to the
Szeged and vertex PI descriptors.
$$GA_2(G) = \sum_{uv \in E(G)} \frac{\sqrt{n_u(e) \, n_v(e)}}{0.5 \, [n_u(e) + n_v(e)]}$$
They established the main properties of GA2 including lower and upper bounds and
the trees with minimum and maximum GA2 were also characterized.
Randic et al. (2010) described a novel matrix, termed the natural distance matrix
for graphs, which is based on interpretation of columns of the adjacency matrix of a
graph as a set of points in n-dimensional space. The leading eigenvalue λ1, average
row sum (ARS) and the J descriptor of the natural distance matrix were proposed as
three new MDs for characterization of molecular graph.
Andova and Petrusevski (2011) devised variable Zagreb descriptors, denoted
by λM1 and λM2, in accordance with Karamata's inequality. The first and second
variable Zagreb descriptors were defined as:
$$^{\lambda}M_1(G) = \sum_{i=1}^{n} \delta_i^{2\lambda}$$
$$^{\lambda}M_2(G) = \sum_{(i,j)} (\delta_i \, \delta_j)^{\lambda}$$
Natarajan (2011) proposed three new descriptors by taking advantage of both
the uniqueness of path code and the spectrum of connectivity descriptors. The
formulae to compute the new TDs were expressed as:
Overall path multiplicity:
$$OPM = \sum_{i=0} P_i \, {}^{i}\chi$$
Mean path connectivity:
$$MPM = \sum_{i=0} {}^{i}\chi / P_i$$
Sum of connectivity descriptors of all orders:
$$SumR = \sum_{i=0} {}^{i}\chi$$
Where Pi is the number of paths of length i and iχ is the connectivity descriptor of
order i. These descriptors ranked all planar graphs of alkanes C4 to C6 uniquely and
were found to have non-degenerate values for all the 7668 constitutional isomers
(alkane trees) from C4 to C15.
Doslic et al. (2011) defined the average neighbor degree number, an invariant
useful for measuring the diversity of vertices in molecular graph G, as:
$$m_{avg}(G) = \frac{1}{|V(G)|} \sum_{u \in V(G)} m_{avg}(u)$$
where mavg(u) is the average of degree of vertices adjacent to u.
The application of mavg(G) was investigated on the benchmark set of 18 octane
isomers, and a decent correlation was found between mavg(G) and the enthalpy of vaporization,
as well as the standard enthalpy of vaporization.
Ghorbani and Hosseinzadeh (2012) reported new version of Zagreb
descriptors calculated as:
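$$M_1^{*}(G) = \sum_{ij \in E(G)} (\varepsilon_i + \varepsilon_j)$$
$$M_1^{**}(G) = \sum_{j \in V(G)} \varepsilon_j^{2}$$
$$M_2^{*}(G) = \sum_{ij \in E(G)} \varepsilon_i \, \varepsilon_j$$
where εi is the largest distance between i and any other vertex j of G.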
Saha and Bandyopadhyay (2012) introduced novel cluster validity
descriptors which are able to automatically detect clusters of any shape, size or
convexity as long as they are well-separated. They measured connectivity using a
novel approach following the concept of relative neighborhood graph.
Ghorbani et al. (2012) modified the Narumi-Katayama descriptor as NK*, in
which each vertex degree (δi) is multiplied δi times. They determined its basic
properties and characterized the graphs extremal with respect to it. This new version of the NK
descriptor may be represented as:
$$NK^{*} = NK^{*}(G) = \prod_{i=1}^{n} \delta_i^{\delta_i}$$
Eliasi et al. (2012) described a multiplicative version of the first Zagreb descriptor
as:
$$\Pi_1^{*} = \Pi_1^{*}(G) = \prod_{ij \in E(G)} (\delta_i + \delta_j)$$
They proved that among all connected graphs with a given number of vertices, the
path has minimal ∏1*.
Singh et al. (2013) proposed refined general Randic descriptors (R2, R3 and
R4), as well as their topochemical counterparts as a modification of general Randic
descriptor which can be defined as the sum of the quotients of the inverse of the
product of the degree of each vertex on every edge in the hydrogen depleted
molecular graph. It can be expressed as:
RN = ∑ ( )
where vi and vj are the degrees of the ith and jth vertices, respectively, n is the
number of vertices, and N is equal to 2, 3 and 4 for refined general Randic descriptor-1, -2 and -3,
respectively (denoted by R2, R3, R4). The MD values of complex chemical structures
were kept within reasonable limits by inserting a constant km to avoid any compromise
with the discriminating power.
Distance based graph invariants
These descriptors utilize the distance matrix/detour matrix to characterize molecular
graphs. Distance matrix D(G), for a graph G is defined as a real, square, symmetrical
matrix of order n, with entries dij representing the distance traversed in moving from
vertex i to vertex j in G. The entries in the distance matrix indicate the number of edges
in the shortest path between vertices i and j whereas entries in case of detour matrix
indicate the number of edges in the longest path between vertices i and j.
The first chemical graph theory based theoretical MD called Wiener descriptor
was conceptualized by H. Wiener in 1947 (1947a; 1947b) to model some
thermodynamic properties of acyclic hydrocarbons. The Wiener index W is defined as "the
summation of the distances between any two carbon atoms in an acyclic molecule, in
terms of carbon-carbon bonds." The value of the Wiener descriptor can be calculated simply
by multiplying the number of carbon atoms on one side of any bond by the number on the
other side and then summing these products:
$$W = W(G) = \sum_{e} W_e = \sum_{e} N_{L,e} \cdot N_{R,e}$$
where NL,e and NR,e are the numbers of vertices lying to the left and to the right of edge e,
and the summation runs over all edges in G.
The Wiener number was initially employed to predict physical parameters such
as boiling points, heats of formation, heats of vaporization, molar volumes, and molar
refractions of alkanes by simple QSPR models.
Hosoya (1971) found out that Wiener index (W) can also be obtained by
simply adding all the elements of the graph distance matrix above the main diagonal.
This not only offers an alternative route to W but also allows a particular extension of
W to cyclic structures. He extended the aforementioned definition to cyclic
compounds with the aid of the distance matrix and gave the formula for Wiener
descriptor/number as:
$$W = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} D_{ij}$$
where Dij are the off-diagonal elements of D and N represents the total number of
vertices in graph G.
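The two routes to W described above can be compared in a short Python sketch: Wiener's bond-by-bond product for acyclic graphs and Hosoya's half-sum of the distance matrix. The function names, the edge-list input and the choice of 2-methylbutane as the example are assumptions made for illustration.

```python
from collections import deque


def build_adjacency(n, edges):
    """Adjacency sets of an undirected graph with vertices 0..n-1."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def wiener_from_distances(n, edges):
    """Hosoya's route: half the sum of all distance-matrix entries."""
    adj = build_adjacency(n, edges)
    return sum(sum(bfs_distances(adj, i).values()) for i in range(n)) // 2


def wiener_from_edge_cuts(n, edges):
    """Wiener's route (acyclic graphs only): sum of N_L * N_R over all edges."""
    total = 0
    for cut in edges:
        adj = build_adjacency(n, [e for e in edges if e != cut])
        n_left = len(bfs_distances(adj, cut[0]))  # vertices on one side of the bond
        total += n_left * (n - n_left)
    return total


# Hydrogen-depleted graph of 2-methylbutane (a tree on five carbon atoms).
isopentane = [(0, 1), (1, 2), (2, 3), (1, 4)]
w_dist = wiener_from_distances(5, isopentane)
w_cut = wiener_from_edge_cuts(5, isopentane)
```

Both routes agree for trees; only the distance-matrix route remains meaningful once rings are present.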
Hosoya, in 1971, also suggested for the first time a nontrivial single graph
descriptor (Z), for expressing structure-property relationship (Hosoya, 1971). Although
Hosoya’s Z descriptor has been associated with the adjacency matrix, it can be
classified among the distance matrix invariants due to the procedure used to calculate
p(G, k):
$$Z = \sum_{k=0}^{[n/2]} p(G, k)$$
where n represents the number of graph vertices, p(G, k) denotes the number of ways
in which k non-adjacent bonds can be selected from graph G (with p(G, 0) = 1), and the Gaussian
brackets [ ] denote the greatest integer not exceeding n/2. The Z descriptor is calculated by
summing the p(G, k) coefficients over all different k values.
For linear graphs G, Z can also be defined as the summation of the absolute
values of the coefficients in the characteristic polynomial:
$$P_G(X) = \sum_{k=0}^{m} (-1)^{k} \, p(G, k) \, X^{N-2k}$$
where m represents the maximum number of mutually non-adjacent bonds in G and N is the
number of vertices.
The Hosoya Z descriptor is correlated well with the mode of branching and ring
closure and can be used as first sorting device for coding or retrieving the structures of
the compounds with or without rings (Hosoya, 1972).
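A minimal sketch of the counting behind Z: every set of mutually non-adjacent bonds (including the empty set) contributes one to the total. The recursive function below is an illustrative assumption rather than Hosoya's own procedure; it branches on whether the first remaining bond is selected.

```python
def hosoya_z(edges):
    """Hosoya Z: number of sets of mutually non-adjacent edges, empty set included."""

    def count(remaining):
        if not remaining:
            return 1  # only the empty selection is left
        (u, v), rest = remaining[0], remaining[1:]
        # Case 1: the first edge is not selected.
        without_edge = count(rest)
        # Case 2: the first edge is selected, so every edge touching u or v is excluded.
        with_edge = count([e for e in rest if u not in e and v not in e])
        return without_edge + with_edge

    return count(list(edges))


# Hydrogen-depleted graphs: n-butane (path) and cyclohexane (six-membered ring).
butane = [(0, 1), (1, 2), (2, 3)]
cyclohexane = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
z_butane = hosoya_z(butane)
z_cyclohexane = hosoya_z(cyclohexane)
```

For cyclohexane the individual counts are p(G,0) = 1, p(G,1) = 6, p(G,2) = 9 and p(G,3) = 2. The recursion is exponential in the worst case, which illustrates why larger and more compact molecules call for more sophisticated programs.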
Hosoya's index Z has some advantages over other graph invariants:
1. It possesses high discriminating power.
2. It can be computed either from the molecular structure or from the polynomial.
3. It is extendable to the distance polynomial and the matching polynomial, as it is
defined through the counting polynomial, which is closely related to the characteristic
polynomial.
4. It is able to consider the effect of electronic systems.
Hosoya et al. (1973) reported the distance polynomial as the characteristic
polynomial of the distance matrix of the molecular graph and defined the Hosoya Z′
descriptor :
$$Z' = \sum_{i=0}^{n} |c_i|$$
where ci are the coefficients of the distance polynomial of the molecular graph.
Rouvray (1973) defined the total sum of the row entries of the distance matrix
as the new topological descriptor called Rouvray descriptor, denoted by IROUV. It is
actually twice the Wiener descriptor, W (Rouvray, 1976) and expressed as:
$$I_{ROUV} = \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij} = \sum_{i=1}^{n} S_i = 2W$$
Hosoya et al. (1975) proposed a generalization of the Z descriptor to account for
the contributions made by unsaturated systems in the structure of a molecule. The
π-energy Z descriptor was defined as:
$$Z = \sum_{i=0}^{n} S_i$$
Hosoya's Z descriptor, despite being discriminatory, possesses the following limitations:
1. It represents a vague graph topological nature of the molecular structure with
respect to branching and cyclization.
2. No provision to consider heteroatoms.
3. Sophisticated programs are needed to compute the Z descriptor as the size,
branching and compactness of a molecule increases.
Analogously, a more generalized total π-energy descriptor, Z*, is defined as:
$$Z^{*} = \sum_{i=0}^{n} (S_i^{2} + A_i^{2})^{0.5}$$
A Huckel molecular orbital (HMO) characteristic polynomial P(i) is divided into an
even function S(i) and an odd function A(i) with respect to i (Aihara, 1976).
Bonchev et al. (1980) developed the total distance rank (dr), based on the distance
sum (vdi) of the vertex vi. It is defined as the summation of all the elements (dij) of the
ith row of the distance matrix D:
$$dr = \sum_{j=1}^{n} d_{ij} = \min$$
where dij are the elements of the distance matrix.
Balaban (1982; 1983) introduced three new topological distances based
descriptors denoted by D, D1 and J. The first one D, named as mean square distance
descriptor may be represented as:
$$D = \left( \frac{\sum_{i} g_i \cdot i^{k}}{\sum_{i} g_i} \right)^{1/k}$$
where gi is the number of occurrences of distances of length i, and k = 2 for the mean square distance.
This descriptor possesses good discriminating ability, especially for acyclic graphs;
however, it shows poor performance for polycyclic graphs.
The second invariant, known as end point mean (square) distance topological
descriptor, D1, was calculated by taking only distances between endpoints.
The third one was Balaban distance connectivity descriptor (also called distance
connectivity descriptor or average distance sum connectivity) and is denoted as J. At
the time of its inception, J was one of the most discriminating molecular descriptors and
its values did not increase substantially with molecular size or number of rings. Further,
J was claimed to have the lowest degeneracy of all TDs proposed till the time of its
proposal. It is defined as:
$$J = \frac{B}{C+1} \sum_{b} (\zeta_i \cdot \zeta_j)_b^{-1/2} = \frac{1}{C+1} \sum_{b} (\zeta_i^{*} \cdot \zeta_j^{*})_b^{-1/2}$$
where the summation runs over all the molecular bonds b, ζi and ζj are the vertex
distance degrees of the adjoining atoms, C is the cyclomatic number representing the number
of rings in the graph and B is the number of bonds in the molecular graph G. The
denominator C+1 is a normalization factor against the number of rings in the molecule.
ζi* = ζi /B is the average vertex distance degree.
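The definition of J can be illustrated with a short Python sketch for simple (unweighted, heteroatom-free) connected graphs. The function names are assumptions for illustration; the cyclomatic number is taken as C = B - n + 1, which holds for a connected graph with n vertices.

```python
from collections import deque
from math import sqrt


def distance_sums(n, edges):
    """Vertex distance degrees: row sums of the topological distance matrix."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    sums = []
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        sums.append(sum(dist.values()))
    return sums


def balaban_j(n, edges):
    """J = B / (C + 1) * sum over bonds of (s_i * s_j) ** -0.5 for a connected graph."""
    b = len(edges)
    c = b - n + 1  # cyclomatic number of a connected graph
    s = distance_sums(n, edges)
    return b / (c + 1) * sum(1 / sqrt(s[u] * s[v]) for u, v in edges)


# Hydrogen-depleted graphs of n-butane and cyclohexane.
j_butane = balaban_j(4, [(0, 1), (1, 2), (2, 3)])
j_cyclohexane = balaban_j(6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])
```

For n-butane the distance sums are 6, 4, 4 and 6, giving J of about 1.9747; for cyclohexane every distance sum is 9 and the ring normalization yields exactly 2.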
Barysz et al. (1983) introduced Wiener-type descriptor from Z weighted
distance matrix [W(Gvew)]. It may be defined as the summation of all the elements in
upper triangular Barysz matrix as well as all the diagonal elements thereof and defined
as per the following:
$$W(G_{vew}) = \sum_{i} d_{ii} + \frac{1}{2} \sum_{i} \sum_{j \neq i} d_{ij}$$
where
$$d_{ii} = 1 - \frac{6}{Z_i}$$
and Zi is the number of all (valence and inner shell) electrons in the atom i.
The off-diagonal elements of the distance matrix for the vertex- and edge-
weighted (multi) graphs are defined as:
$$d_{ij} = \sum_{r} k_r$$
where the summation goes over the r (r = 1, 2, ...) weighted and unweighted bonds,
while the parameter kr is given by (Barysz et al., 1983):
$$k_r = \frac{1}{b_r} \cdot \frac{36}{Z_i \cdot Z_j}$$
The value of br for single bond, double bond, triple bond and an aromatic bond are 1,
2, 3, and 1.5 respectively, Zi, and Zj, denote the numbers of electrons in atoms i and j
making up the r-bond (Barysz et al., 1983).
Broto et al. (1984a; 1984b) conceived autocorrelation descriptors of topological
structure, namely the autocorrelation of a topological structure (ATS). This is a spatial
autocorrelation defined on a molecular graph G as:
$$ATS_d = \sum_{i=1}^{A} \sum_{j=1}^{A} \delta_{ij} \cdot W_i \cdot W_j$$
where W is any atomic property, A is the number of atoms, d is the considered topological
distance (i.e. the lag in autocorrelation terms), and δij is the Kronecker delta (δij = 1 if dij = d,
zero otherwise).
Randic (1984) and Szymanski et al. (1985) introduced novel molecular
descriptors known as Molecular ID numbers (MID). These can be defined as the sum of
all paths (weighted or non weighted) in a molecule (graph). These were mainly
proposed to unequivocally identify a molecule by a single real number, with the aim to
obtain high discriminating power. These descriptors carry considerable structural
information and are successfully used in QSPR/QSAR analysis.
The first MID proposed by Randic (1984), namely Randic connectivity ID
number (CID) can be defined as a weighted molecular path count as per the following
equation:
$$CID = A + \sum_{{}^{m}P_{ij}} w_{ij}$$
where A is the number of atoms, mPij denotes a path of length m from the vertex vi to
vertex vj, and wij is the path weight. The sum runs over all paths of the graph.
The weight wij is calculated by multiplying the edge connectivities of all m edges
(bonds) of the path mPij as:
$$w_{ij} = \prod_{b=1}^{m} (\delta_{b(1)} \cdot \delta_{b(2)})_b^{-1/2}$$
where δb(1) and δb(2) are the vertex degrees of the two atoms incident to the bth
edge and b runs over all of the m edges of the path.
Kier (1985) developed molecular shape descriptors or kappa descriptors,
based on the count of the two-bond fragments in order to encode the overall molecular
shape. They are calculated using the counts of path of length one (one bond), two
(two bond) and three (three bond) in hydrogen suppressed molecular graph of the
molecule, and correspondingly the kappa descriptors were defined as of first, second
and third order.
The general formula for calculating kappa descriptors (mκ) is the following:
$$^{m}\kappa = \frac{2 \; {}^{m}P_{max} \; {}^{m}P_{min}}{({}^{m}P_i)^{2}}$$
The mPmax and mPmin values can be calculated directly from the number of
non-hydrogen atoms (nH) in the molecule. Their substitution for paths of different length
has resulted in the following equations for mκ:
$$^{1}\kappa = \frac{nH \, (nH - 1)^{2}}{({}^{1}P_i)^{2}}$$
$$^{2}\kappa = \frac{(nH - 1)(nH - 2)^{2}}{({}^{2}P_i)^{2}}$$
$$^{3}\kappa = \frac{(nH - 3)(nH - 2)^{2}}{({}^{3}P_i)^{2}} \quad \text{when nH is even}$$
$$^{3}\kappa = \frac{(nH - 1)(nH - 3)^{2}}{({}^{3}P_i)^{2}} \quad \text{when nH is odd}$$
where nH is the number of non-hydrogen atoms in the molecule.
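A minimal sketch of the first-, second- and third-order kappa calculation, assuming all atoms are equivalent (no alpha correction). The path-counting helper and the function names are illustrative assumptions; returning `None` when a molecule has no path of the required length is a choice of this sketch, not part of Kier's definition.

```python
def count_paths(adj, length):
    """Number of simple paths with the given number of bonds (length >= 1)."""

    def extend(path):
        if len(path) - 1 == length:
            return 1
        return sum(extend(path + [v]) for v in adj[path[-1]] if v not in path)

    # Every path is found once from each of its two ends, hence the halving.
    return sum(extend([start]) for start in adj) // 2


def kappa_indices(n, edges):
    """Return (kappa1, kappa2, kappa3) for a hydrogen-suppressed graph on n atoms."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    p1, p2, p3 = (count_paths(adj, m) for m in (1, 2, 3))
    k1 = n * (n - 1) ** 2 / p1 ** 2 if p1 else None
    k2 = (n - 1) * (n - 2) ** 2 / p2 ** 2 if p2 else None
    if not p3:
        k3 = None
    elif n % 2 == 0:
        k3 = (n - 3) * (n - 2) ** 2 / p3 ** 2
    else:
        k3 = (n - 1) * (n - 3) ** 2 / p3 ** 2
    return k1, k2, k3


# n-butane (four atoms, even) and 2-methylbutane (five atoms, odd).
kappa_butane = kappa_indices(4, [(0, 1), (1, 2), (2, 3)])
kappa_isopentane = kappa_indices(5, [(0, 1), (1, 2), (2, 3), (1, 4)])
```

For n-butane the path counts are 3, 2 and 1; for 2-methylbutane they are 4, 4 and 2, and the lower second-order value reflects its branching.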
The kappa descriptors were derived assuming that all atoms in the molecule are
equivalent. The influence on molecular shape of atoms other than carbon in the sp3
hybrid state is accounted for by the kappa alpha shape descriptors (mκα) (Kier, 1986a).
They can be obtained by modifying each nH and mPi in the above equations by
adding an α value:
$$\alpha = \frac{r(x)}{r(C(sp^3))} - 1$$
where r(x) is the covalent radius of atom x and r(C(sp3)) is the covalent radius of a carbon
atom in the sp3 hybrid state (Kier, 1986b, 1986c).
Balaban (1986) modified the average distance-sum connectivity descriptor, J, in
order to account for both bond multiplicity and heteroatoms and defined two new
descriptors, JX and JY, employing a fractional distance matrix. The quantities X and Y
are recalculated atomic Sanderson electronegativities and covalent radii relative to the
carbon atom, respectively, obtained as a function of the atomic number.
Balaban (1987) also proposed a molecular identification number namely
Balaban ID number (BID) which can be defined as:
$$BID = A + \sum_{{}^{m}P_{ij}} w_{ij}$$
where A is the number of atoms, mPij denotes a path of length m from the vertex vi to
vertex vj, and wij is the path weight. The sum runs over all paths of the graph.
The weight wij is calculated by multiplying the edge weights of all m edges
(bonds) of the path mPij as per the following:
$$w_{ij} = \prod_{b=1}^{m} (\delta_{b(1)} \cdot \delta_{b(2)})_b^{-1/2}$$
where δb(1) and δb(2) are the vertex degrees of the two atoms incident to the bth edge and b
runs over all of the m edges of the path.
Kier (1989) introduced shape flexibility descriptors. The flexibility of
molecules depends upon the presence of cycles and/or branching. By combining the 1κα and
2κα descriptors with the number of non-hydrogen atoms (nHA) (for normalization), a
further descriptor Φ has been defined, which is considered to measure molecular
flexibility:
$$\Phi = \frac{{}^{1}\kappa_{\alpha} \cdot {}^{2}\kappa_{\alpha}}{nHA}$$
where 1κα represents the information about the relative cyclicity and atom count of
molecules and 2κα represents information about the relative spatial density or branching of
molecules. The flexibility descriptor decreases with increased branching and cyclicity.
Schultz (1989) described Schultz molecular topological descriptor (MTI) by
using adjacency, valence, and distance matrices. The quantity is defined by the
following expression:
$$MTI = MTI(G) = \sum_{i=1}^{N} [v(A + D)]_i$$
Where G is the molecular graph considered, possessing N= N(G) vertices, A is the (N X
N dimensional) adjacency matrix, D is the (N X N dimensional) distance matrix, and v =
(v1, v2,....... vN) is the (1 X N dimensional) vector of the vertex valencies (degrees) of the
molecular graph G.
Hall and Kier (1990) proposed topological state invariants Si, as numerical
values related to every vertex in a molecule which may encode information about the
topological environment of the vertex due to all other vertices in the molecule. The
molecular topological relationship to each other vertex is based on the encoding of
vertex information in all the paths emanating from that vertex and derived from
topological state matrix as:
$$S_i = VS_i(T) = \sum_{j=1}^{n} [T]_{ij}$$
Where VS stands for the row sum operator. Molecular topologically equivalent vertices
have identical values of molecular topological state descriptor and in-equivalent
vertices would have different values.
Balaban et al. (1990) used distance measure in defining a highly selective
connectivity based spectrum of TDs (denoted by: DMk) as:
$$DM^{k} = \left[ \sum_{j=1}^{n} \left( {}^{m}\chi_t - {}^{m}\chi_t(R) \right)^{k} \right]^{1/k}$$
where the summation goes over all the connectivity descriptors of different type t up
to the sixth order (m = 6); k is an integer parameter ranging from 1 to 5 (k = 1 is the
Manhattan distance, k = 2 is the Euclidean distance). mχt and mχt(R) are the
connectivity descriptors for the considered molecule and a reference molecule R,
respectively (Balaban et al., 1990).
Lall (1991) defined Topological I-descriptor of a graph based on the
topological distances from a given vertex in the edge weighted graph of the organic
molecule and is calculated as:
$$I(G) = \frac{\sum_{r=1}^{N} n_r \, g_r}{\sum_{r=1}^{N} n_r}$$
where nr is the number of vertices of the rth kind, for which gr is the topological distance
from the root in the edge-weighted graph, and the topological distance dij between the
vertices i and j is defined as the distance associated with a minimum weight. The
weights in the edge-weighted graphs correspond to k values of the Huckel parameter for
the heteroatom.
Petitjean (1992) described graph theoretical shape coefficient I on the basis of
topological radius and diameter as:
$$I = \frac{D - R}{R}, \qquad 0 \le I \le 1$$
The topological radius (R) is defined as the smallest vertex eccentricity in the graph
and topological diameter (D) is defined as the largest vertex eccentricity in the graph.
He proposed the graph-theoretical bivariate repartition of the (R,D) pairs in the form
of "radius-diameter diagram" and observed unexpected partitioning/classification of
the compounds from the chemical abstracts services registry file in comparison to
other shape coefficients (Petitjean, 1992).
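The radius, diameter and shape coefficient described above can be sketched in a few lines of Python. Vertex eccentricities are obtained by breadth-first search; the function names and the example graphs are illustrative assumptions.

```python
from collections import deque


def eccentricities(n, edges):
    """Eccentricity of each vertex: its largest topological distance to any other vertex."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    result = []
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        result.append(max(dist.values()))
    return result


def petitjean_shape(n, edges):
    """Shape coefficient I = (D - R) / R from topological diameter D and radius R."""
    ecc = eccentricities(n, edges)
    radius, diameter = min(ecc), max(ecc)
    return (diameter - radius) / radius


# n-butane, n-pentane and cyclohexane as hydrogen-depleted graphs.
i_butane = petitjean_shape(4, [(0, 1), (1, 2), (2, 3)])
i_pentane = petitjean_shape(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
i_cyclohexane = petitjean_shape(6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])
```

The ring gives I = 0 because all its vertices have the same eccentricity, while the odd-membered chain reaches the upper bound I = 1.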
Randic (1993) constructed new graph matrices by modifying Wiener's
procedure for calculation of Wiener numbers in alkanes and reported the sequences
(higher Wiener numbers KW) generated by summing the entries in the matrix for
vertices at the same distances from one another. Wiener matrix was extended to
consider heteroatoms. The matrix was termed as Wiener matrix.
Ivanciuc et al. (1993a) introduced the reciprocal distance matrix D^{-1} by
modifying the distance matrix so that each off-diagonal element is the reciprocal of the
topological distance between the vertices. A local graph invariant termed the
reciprocal distance sum, RDSi, derived from this matrix was defined as:
$$RDS_i = VS_i(D^{-1}) = \sum_{j=1}^{n} d_{ij}^{-1}$$
where the symbol VS stands for the row sum operator.
Randic (1993) introduced a modified Wiener descriptor known as the Hyper-Wiener
descriptor, denoted as WW = WW(G). However, Randic's algorithm for
computing the hyper-Wiener descriptor could be applied only to acyclic structures. It
was later shown that WW can be computed for all structures as follows (Lukovits and
Linert, 1994; Klein et al., 1995):
$$WW = \frac{1}{4} \sum_{i=1}^{N} \sum_{j=1}^{N} \left[ (D)_{ij}^{2} + (D)_{ij} \right]$$
where the summation goes over all pairs of vertices i and j.
Plavsic et al. (1993) introduced the Harary descriptor or RDSUM descriptor
(H), in honor of Prof. Frank Harary. It is defined as the half-sum of the off-diagonal
elements of the reciprocal molecular distance matrix Dr = Dr(G) as per the following:
$$H = 0.5 \sum_{i=1}^{N} \sum_{j=1}^{N} (D^{r})_{ij}$$
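Both the hyper-Wiener and the Harary descriptors follow directly from the topological distance matrix, as the following Python sketch shows. The helper names are assumptions for illustration, and the graph is assumed to be connected so that every off-diagonal distance is positive.

```python
from collections import deque


def distance_matrix(n, edges):
    """Topological distance matrix of a connected graph, computed by breadth-first search."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    matrix = []
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        matrix.append([dist[j] for j in range(n)])
    return matrix


def hyper_wiener(dm):
    """WW: quarter-sum of d**2 + d over all matrix entries (diagonal terms are zero)."""
    return sum(d * d + d for row in dm for d in row) / 4


def harary(dm):
    """H: half-sum of the reciprocal off-diagonal distances."""
    return 0.5 * sum(1 / d for row in dm for d in row if d > 0)


# Hydrogen-depleted graph of n-butane.
dm_butane = distance_matrix(4, [(0, 1), (1, 2), (2, 3)])
ww_butane = hyper_wiener(dm_butane)
h_butane = harary(dm_butane)
```

For n-butane there are three pairs at distance 1, two at distance 2 and one at distance 3, so WW = 15 and H = 3 + 1 + 1/3.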
Ivanciuc et al. (1993a) defined two other MDs, known as RDSQ descriptor
and RDCHI descriptor, defined as:
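$$RDSQ = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} a_{ij} \, (RDS_i \cdot RDS_j)^{0.5}$$
$$RDCHI = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} a_{ij} \, (RDS_i \cdot RDS_j)^{-0.5}$$
where RDSi is the reciprocal distance sum of vertex vi and aij are the elements of the adjacency matrix.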
Balaban and Diudea (1993) reported Balaban DJ descriptor in terms of
modified vertex distance degrees, Si but using the formula of the matrix sum
descriptors as with Balaban J descriptor. When the weighting factor w is equal to one
and the multigraph factor is equal to zero then the descriptor DJ is related to the
Balaban descriptor J by the following:
$$DJ = \frac{1}{2} \cdot \frac{B}{C+1} \sum_{ij \in E(G)} (S_i \, S_j)^{-1/2} = \frac{1}{2} J$$
Gutman (1994a) proposed a Schultz-type topological descriptor, namely the
Gutman molecular topological descriptor by valence vertex degrees (SG). It may be
defined as:
$$S_G = \sum_{i=1}^{N} \sum_{j=1}^{N} v_i v_j \cdot D_{ij}$$
where vivj·Dij is the topological distance between the vertices vi and vj weighted by the
product of the endpoint vertex degrees. Like the Schultz molecular topological
descriptor, the Gutman molecular topological descriptor is a vertex-valency-weighted
analogue of the Wiener descriptor, whereas the weighting factor is multiplicative
instead of additive.
Randic et al. (1994a) proposed the first eigenvalue of the Wiener matrix as an
alternative descriptor for molecular branching. The JJ descriptor was developed in
analogy with the connectivity descriptor χ from the adjacency matrix and Balaban's
descriptor J from the distance matrix. It can be calculated as:
$$JJ = \sum_{b} (R_i \, R_j)_b^{-1/2}$$
where Ri is the Wiener matrix degree.
Randic et al. (1994b) also proposed new structural invariants based upon
distance/distance matrices (DD matrix) for graphs which are embedded on two- and
three-dimensional grids. The first eigenvalue of these matrices, normalized as λ/n, was reported
for path graphs to be a descriptor of folding. The ratio Φ = λ/n approaches 1 (one) for geometrically linear
structures while it approaches 0 (zero) as the path graph is repeatedly folded.
Ivanciuc and Balaban (1994) reported two path matrices based theoretical
invariants called maximum path sum (MPS) and maximum minimum path sum
(MmPS) topological descriptor. MPS topological descriptor is defined as the sum of
the number of bonds on the longest path between any two vertices in the molecular
graph i.e. half-sum of the elements of the maximum path matrix MP, whereas, the
MmPS topological descriptor was defined as the sum of the longest and shortest path
between any two vertices in the molecular graph i.e. sum of the elements of the
max/min path matrix MmP. These descriptors were represented as:
$$MPS(G) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} [MP]_{ij}$$
$$MmPS(G) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} [MmP]_{ij} = W(G) + MPS(G)$$
If G is an acyclic graph then:
MPS(G)= MmPS(G)/2= W(G)
Where W(G) is Wiener descriptor of a molecular graph G.
Khadikar et al. (1995) proposed Szeged descriptor (Sz), analogous to Wiener
descriptor, valid both for acyclic and cyclic graphs. Wiener descriptor is the sum of
the product of the number of vertices on each side of a bond, while the Szeged
descriptor is defined as the sum of the product of the number of vertices closer to the
atoms on each side of a bond (Gutman and Klavzar, 1995). Szeged descriptor was
defined as:
$$Sz = Sz(G) = \sum_{(u,v)} n_u \cdot n_v$$
where nu stands for the number of vertices nearer to the vertex u than to v, nv
stands for the number of vertices nearer to the vertex v than to u, and the summation goes over
all edges (u, v) of the (cyclic or acyclic) graph G.
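A minimal Python sketch of the Szeged calculation, assuming a connected, unweighted graph given as an edge list (the function names are illustrative). Vertices equidistant from both ends of a bond are counted on neither side.

```python
from collections import deque


def distance_matrix(n, edges):
    """Topological distance matrix of a connected graph, computed by breadth-first search."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    matrix = []
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        matrix.append([dist[j] for j in range(n)])
    return matrix


def szeged(n, edges):
    """Sz: sum over edges (u, v) of n_u * n_v, counting vertices strictly closer to each end."""
    dm = distance_matrix(n, edges)
    total = 0
    for u, v in edges:
        n_u = sum(1 for w in range(n) if dm[u][w] < dm[v][w])
        n_v = sum(1 for w in range(n) if dm[v][w] < dm[u][w])
        total += n_u * n_v
    return total


# 2-methylbutane (acyclic) and cyclobutane (cyclic) as hydrogen-depleted graphs.
sz_isopentane = szeged(5, [(0, 1), (1, 2), (2, 3), (1, 4)])
sz_cyclobutane = szeged(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

For the acyclic example Sz coincides with the Wiener descriptor (18), whereas for the four-membered ring Sz = 16 differs from W = 8.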
Randic and Razinger (1995) introduced geometry dependent molecular
topographic descriptors which can be calculated from novel matrices whose elements
depend upon the molecular geometry. The entries in these matrices were either 3-D
geometric distances between atoms or some modified function of inter-atomic
distances.
Estrada and Gutman (1996) described a novel molecular topological
descriptor MTI (E) based on edge-distances in molecular graphs in analogy with
Schultz molecular topological descriptor (Schultz, 1989). The edge-based version of
the molecular topological descriptor may be expressed as:
$$MTI(E) = \sum_{i=1}^{n} [v^{e}(A^{e} + D^{e})]_i$$
where Ae is the edge adjacency matrix and De is the edge distance matrix. v^e_i is the
degree of the ith edge of the molecular graph G. Distances between vertices of the
respective line graph are described as edge-distances in a graph. A simple relation was
found between edge-distances and the distances between the vertices that are incident
to the respective edges.
Lukovits (1996) introduced a Wiener-type descriptor, originally called the MPS
topological descriptor (Ivanciuc and Balaban, 1994) but usually known as the detour
descriptor and denoted as ω. The detour descriptor is calculated as the sum of the
detour distances between any two vertices in the molecular graph G as:
$$\omega = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\Delta)_{ij}$$
where (∆)ij represents the length of a longest path between vertices i and j of G.
Diudea (1996a; 1996b) described a novel unsymmetrical square matrix, CJu,
for calculating both Wiener (W) and hyper-Wiener (WW) numbers. This matrix is
constructed by using the principle of single endpoint characterization of paths.
Diudea (1997a; 1997b) reported several descriptors from the Cluj matrices,
either as the half-sum of entries in the corresponding symmetric matrices or directly
from the unsymmetric matrices as:
$$TI_{e/p} = \sum_{e/p} [M_u]_{ij} \, [M_u]_{ji}$$
When defined on edges, TIe is a Cluj descriptor, denoted by CDe or CΔe depending
on whether it is derived from the Cluj-distance or Cluj-detour matrix. Similarly, when
defined on paths, TIp is a hyper-Cluj descriptor denoted by CDp or CΔp. The novel
hyper-descriptor, CDp, showed good correlation with the boiling points of a selected
set of cycloalkanes.
Linert and Lukovits (1997) introduced the hyper-detour descriptor (ωω) by
replacing (D)ij with (∆)ij, as in the equation of the detour descriptor (ω). It was defined similarly
to the hyper-Wiener descriptor (WW), that is, as the quarter-sum of the sum-matrix
made up from the off-diagonal elements of the detour matrix (∆)ij and their squares
(∆)ij². It is calculated as:
$$\omega\omega = \frac{1}{4} \sum_{i=1}^{N} \sum_{j=1}^{N} \left[ (\Delta)_{ij}^{2} + (\Delta)_{ij} \right]$$
Since the Wiener descriptor (W) and the detour descriptor (ω) are identical for acyclic
structures, the same is also true for their hyper counterparts.
Lukovits (1998) introduced the all-path Wiener descriptor (WAP), an all-path version
of the Wiener descriptor with more discriminating power among cycle-containing
structures, also called the Pasareti descriptor (P). It is defined as:
$$W^{AP} = P = \sum_{i=1}^{A-1} \sum_{j=i+1}^{A} \sum_{p_{ij}} |p_{ij}|$$
where the two outer sums on the right side run over all pairs of vertices in the graph,
the inner sum runs over all paths pij from vertex vi to vertex vj, and |pij| denotes the
length of the considered path. The name Pasareti descriptor was chosen
because it was derived in the home of Lukovits, which is located in the part of Budapest
called Pasaret.
The descriptor was later transformed into a new variant V = V(G), called the
Verhalom descriptor, which is calculated as:
V = P/k
Where k is the total number of paths in G divided by N(N-1)/2. The name Verhalom
descriptor was given to this distance-related descriptor because it originated in the
Chemical Research Center of the Hungarian Academy of Sciences, located in the district
Verhalom in Budapest.
Plavsic et al. (1998) described the Wiener-sum or D/Δ descriptor of a molecular
graph G. It was defined as the half-sum of the off-diagonal elements of the
quotient matrix D/Δ:
$$WS = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1,\,j\neq i}^{N}\left[(D)_{ij}/(\Delta)_{ij}\right]$$
The Wiener-sum or D/Δ descriptor decreases as the cyclicity of the molecule
increases. Further, they also suggested the inverse Wiener-sum or Δ/D descriptor,
which was defined as the half-sum of the off-diagonal elements of the detour/distance
quotient matrix Δ/D:
$$\Delta/D = Wi(\Delta/D) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1,\,j\neq i}^{N}\left[(\Delta)_{ij}/(D)_{ij}\right]$$
Where Wi is the Wiener operator. Both descriptors have so far been used only for
the structure-boiling point modeling of condensed benzenoid hydrocarbons.
Diudea et al. (1998) defined new Harary-type descriptors on the basis of
detour and Cluj-detour matrices as:
$$H_M = \frac{1}{2}\sum_{i}\sum_{j\neq i}\frac{1}{[M]_{ij}}$$
The symbol M stands for detour (Δe and Δp) and Cluj-detour matrices (CJΔe and
CJΔp). Additionally, the usefulness of Cluj descriptors and their Harary counterparts
in modeling of physicochemical properties of chemical structures was also
demonstrated.
Gupta et al. (1999) introduced a novel pendent-distance based graph invariant
known as the superpendentic descriptor (∫P) to enhance the role of terminal vertices in
[(Q)SAR/SPR] studies. This descriptor was defined as the square root of the summation
of products of non-zero row elements in the pendent matrix, which is a submatrix of the
distance matrix. The descriptor can be calculated as:
$$\int^{P} = \left(\sum_{i=1}^{m}\prod_{j=1}^{n}P_{(i,j)}\right)^{0.5}$$
Where m and n are the maximum possible numbers of i and j respectively, and the distance
P(vi, vj | G) is the length of the shortest path connecting vertices vi and vj.
Galvez et al. (2000) introduced novel invariants developed from the physical
model of wave interferences, known as differences of path lengths (DPs) and
demonstrated that the mean global kinetic energy of the electrons can be simply
measured from the overall sum of the inverse of the squares of the differences of
distances between all pairs of vertices of the graph. These invariants were employed to
predict the resonance energies and in the evaluation of biological properties such as
antibacterial activities of a wide set of heterogeneous compounds.
Gutman et al. (2000) described a novel molecular structure descriptor called the
multiplicative Wiener descriptor, π(G), defined as the product of the distances of all
pairs of vertices of the underlying molecular graph. It can be calculated as:
$$\pi(G) = \prod_{i<j;\ i,j \in V(G)} d_{ij}$$
Due to the very large numbers that are often reached by π(G), its logarithmic version,
i.e. ln π(G), seems to be more appropriate in searching for (Q)SAR/QSPR models
(Todeschini and Consonni, 2009).
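The product form and its logarithm can be illustrated with a short Python sketch. It assumes a connected, unweighted graph given as an adjacency dictionary; the helper names are hypothetical and chosen only for this example.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def multiplicative_wiener(adj):
    """Return (pi, ln pi): product of distances over unordered vertex pairs."""
    product, log_sum = 1, 0.0
    for i in adj:
        dist = bfs_distances(adj, i)
        for j in adj:
            if i < j:
                product *= dist[j]
                log_sum += math.log(dist[j])
    return product, log_sum


butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
pi_value, ln_pi = multiplicative_wiener(butane)
print(pi_value, ln_pi)
```

Accumulating the logarithm term by term avoids forming the very large product when only ln π(G) is needed.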
Balaban et al. (2000) conceptualized the reverse Wiener matrix (RW) by
subtracting all topological distances from the graph diameter, with zeros as diagonal
elements. New integer-number graph invariants ζi were obtained by summing over
rows or columns in this matrix. The half-sum of the ζi is also a novel topological
descriptor known as reverse Wiener’s descriptor Λ. The general formula for
calculating reverse Wiener’s descriptor Λ is:
$$\Lambda = 0.5\sum_{i=1}^{n}\zeta_i = 0.5\sum_{i=1}^{n}\sum_{j=1}^{n}[RW]_{ij} = 0.5\,N(N-1)\,d - W$$
Where [RW]ij are the elements of the reverse Wiener matrix, N is the number of vertices,
d is the graph diameter and W is the Wiener descriptor.
They demonstrated that the descriptor value of both W and Λ increases as the size of
graphs increases. However, the value of W increases sharply in comparison to the
value of Λ for strongly branched graphs whereas the value of Λ increases sharply in
comparison to the value of W for lesser branched graphs such as the linear graphs.
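A small Python sketch of the construction described above: distances are subtracted from the diameter off the diagonal, the rows are summed, and the half-sum is taken. The graph encoding and function names are assumptions made for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def reverse_wiener(adj):
    """Half-sum of the reverse Wiener matrix (diameter minus distance)."""
    dist = {v: bfs_distances(adj, v) for v in adj}
    diameter = max(max(d.values()) for d in dist.values())
    # Row sums of RW; diagonal entries are zero by definition.
    row_sums = {i: sum(diameter - dist[i][j] for j in adj if j != i)
                for i in adj}
    return sum(row_sums.values()) // 2


def wiener(adj):
    """Ordinary Wiener descriptor, used here only to check the identity."""
    return sum(sum(bfs_distances(adj, v).values()) for v in adj) // 2


butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
n, d = len(butane), 3  # four vertices, diameter 3
print(reverse_wiener(butane), n * (n - 1) * d // 2 - wiener(butane))
```

Both printed values agree, which is the identity between the reverse Wiener descriptor, the diameter and W.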
Li et al. (2000) defined a new quantum-topology descriptor by modifying the
base of the molecular connectivity descriptor, i.e. the δ values. The novel quantum-topology
descriptor was expressed in terms of the modified molecular connectivity descriptor as:
$${}^{m}\chi_{t} = \sum_{j=1}^{n}\prod_{i=1}^{m+1}(\delta_i)_j^{-0.5}$$
Where n and m are the number and the rank of subfigures respectively, δ is the
atomic delta value, and t is the type of subfigure.
The quantum-topology descriptor showed improved ability in correlating
molecular features with the force constant, bonding energy and radius of
heterogeneous monohydrides as compared to the original descriptor. The advantage of
having topological as well as quantum parameters was proposed as the obvious reason for
the improved correlating ability of the novel quantum-topology descriptors.
Espeso et al. (2000) conceptualized molecular descriptors kZ, based on kZ
matrices i.e. second decomposition of Hosoya Z matrix. The kZ descriptor was
defined as:
$${}^{k}Z = \sum_{i<j}({}^{k}Z)_{ij}$$
Where the summation runs over all off-diagonal entries kZij in the upper triangle of the
kZ matrix of T.
Khadikar et al. (2001a; 2001b) described a novel descriptor namely
Padmakar-Ivan Descriptor and abbreviated as PI Descriptor. The discriminating power
of the descriptor is comparable to that of Wiener and Szeged descriptors and is simple
to calculate. It can be defined as:
$$PI = PI(G) = \sum_{e \in E(G)}\left[n_{ei}(e|G) + n_{ej}(e|G)\right]$$
Where nei (e|G) is the number of edges closer to vertex i than j and nej (e|G) is the
number of edges closer to j than to i. The summation goes over all edges of G. This
descriptor does not coincide with the Wiener descriptor for acyclic trees. Thus, unlike
the Sz descriptor, the PI descriptor is different for acyclic and cyclic graphs.
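The edge-counting rule behind the PI descriptor can be sketched in Python as follows. This is an illustrative sketch only: it assumes that the distance from an edge to a vertex is the smaller of the distances from its two endpoints, and the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def pi_index(adj):
    """Sum over edges e = (u, v) of the edges lying strictly closer to u or to v."""
    dist = {v: bfs_distances(adj, v) for v in adj}
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    total = 0
    for u, v in edges:
        for a, b in edges:
            d_u = min(dist[u][a], dist[u][b])  # distance of edge (a, b) to u
            d_v = min(dist[v][a], dist[v][b])  # distance of edge (a, b) to v
            if d_u != d_v:  # equidistant edges (including e itself) are skipped
                total += 1
    return total


butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cyclobutane = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(pi_index(butane), pi_index(cyclobutane))
```

For n-butane the value differs from its Wiener descriptor (10), in line with the remark that the two descriptors do not coincide for acyclic graphs.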
Randic et al. (2001a) proposed the variable Balaban J descriptor, the "reversed"
Balaban descriptor 1/J and a novel descriptor 1/JJ based on J and 1/J. The distance
matrix and the "reversed" distance matrix were the basis of all the variable descriptors.
The "reversed" distance matrix was constructed from the distance matrix by replacing
the diagonal zeroes with the variables x, y, z (Randic and Pomp, 2001). The "reversed"
Balaban descriptor 1/J can be expressed as:
$$1/J = \frac{B}{C+1}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\left({}^{c}S_i\,{}^{c}S_j\right)^{-0.5}$$
Where cSi are the row sums of the distance complement or reversed distance matrix.
Estrada and Molina (2001) conceptualized novel molecular invariants based
on local spectral moments of the bond matrix. Local spectral moments were defined as
the sum of diagonal entries of the different powers of the bond matrix corresponding
to a given molecular fragment. Mathematically, spectral moments of the bond matrix
were expressed as follows:
$$\mu_k(f) = \sum_{i \in f}(e_{ii})^{k}$$
Where f is the corresponding fragment for which the moments are defined and the
sum is carried out over all bonds forming the fragment f. The elements (eii)k are the
diagonal elements of the kth power of the bond matrix.
Balaban et al. (2001) derived a general formula W=d/[3(v+0)] for the
normalized Wiener descriptor of polymers. It made possible the calculation of the graph
invariant directly from simple structural information: the number of atoms v, the
number of rings (0) in the repeating polymer, and the topological distance d between the
corresponding pairs of equivalent atoms in two neighboring monomer units. The
reciprocal relationship with the similar descriptor J was pointed out and an approximate
hyperbolic dependence is presented between these two descriptors.
Ivanciuc et al. (2001) introduced novel MDs by separating the terms of the
Wiener's polynomial into even and odd molecular graph distances. The even and odd
Wiener's polynomial sums WiPolE(x) and WiPolO(x) were used as descriptors in
(Q)SAR/QSPR models. The sum of Wiener polynomial terms corresponding to even
graph distances and the sum of the terms corresponding to odd graph distances were
defined as:
$$WiPolE(X) = \sum_{k\ \mathrm{even}} f_k\,X^{k}, \qquad WiPolO(Y) = \sum_{k\ \mathrm{odd}} f_k\,Y^{k}$$
Where fk is the number of pairs of vertices located at a topological distance equal to k,
and the summations go up to the maximal distance in the graph; X and Y are two
independent variable parameters optimized during the modeling procedure.
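The even/odd split can be sketched in Python by first counting vertex pairs per distance and then evaluating the two partial sums. The adjacency-dictionary input, the function names and the example values of X and Y are assumptions for illustration; in the cited work X and Y are optimized during modeling.

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def distance_counts(adj):
    """f[k] = number of unordered vertex pairs at topological distance k."""
    counts = Counter()
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if s < t:
                counts[d] += 1
    return counts


def wipol_even(adj, x):
    """Sum of f_k * x**k over even distances k."""
    return sum(f * x ** k for k, f in distance_counts(adj).items() if k % 2 == 0)


def wipol_odd(adj, y):
    """Sum of f_k * y**k over odd distances k."""
    return sum(f * y ** k for k, f in distance_counts(adj).items() if k % 2 == 1)


butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(dict(distance_counts(butane)), wipol_even(butane, 2), wipol_odd(butane, 2))
```

At X = Y = 1 the two sums add up to the total number of vertex pairs.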
Bonchev (2001b) extended the approach of overall connectivity to overall
distances OW(G) for characterization of molecular structure. The overall Wiener
number OW(G) of any graph G was defined as the sum of the Wiener numbers Wi(Gi)
of all K subgraphs of G:
$$OW(G) = \sum_{i=1}^{K} W_i(G_i \subset G)$$
This descriptor was also expressed in terms of its eth-order terms, eOW(G), where e represents the
number of edges in the subgraph. The topological complexity in acyclic and cyclic
structures was measured in terms of overall distance.
Randic and Zupan (2001) modified the Wiener descriptor-W to W* and
Hosoya descriptor-Z to Z*. In the modified Wiener descriptor-W*, bond contributions
were determined using the reciprocal of the product of the number of atoms on each
side of a bond as:
$$W^{*} = \sum_{e=1}^{n_e} N_{i,e}^{-1}\,N_{j,e}^{-1}$$
Where Ni,e and Nj,e denote the number of vertices on each side of the edge e, including
vertex i and vertex j, respectively, and ne represents the total number of graph edges.
Similarly, Hosoya descriptor Z was modified to Z* by considering the frequency of
occurrence of carbon-carbon bonds in the patterns of disjoint bonds as:
$$Z^{*}(x_1, x_2, \ldots) = \sum_{k} a(G,k)\,x_k$$
Where xk are integer weights representing the number of times each edge has
appeared in all disjoint edge patterns. Randic and Zupan (2001) outlined a general
scheme for partitioning of MDs into bond contributions for descriptors derived from a
selection of matrices associated with molecular graphs.
Cao and Yuan (2001) conceptualized a set of three novel MDs known as
VDI, OEI and RDI, based upon distance, vertex and ring (in cyclic compounds). These
descriptors were defined as:
1. Vertex degree-distance descriptor (VDI):
$$VDI = \left(\prod_{i=1}^{n} f_i\right)^{1/N}$$
Where fi are the elements of the (1xN) vector VS, which is derived by multiplying the
vertex degree matrix (V) and the derivative distance matrix (S).
2. The odd-even descriptor (OEI):
$$OEI = \sum_{i=1}^{n}\sum_{j=1}^{n}(-1)^{D_{ij}-1}\,S_{ij}$$
Where n represents the number of vertices in the molecular graph and S denotes the
derivative matrix of the distance matrix D, whose elements are the squares of the
reciprocal distances.
3. Ring degree-distance descriptor (RDI):
$$RDI = \left(\prod_{i=1}^{n} g_i\right)^{1/N}$$
Where gi are the elements of a (1xN) vector.
The usefulness of these descriptors was demonstrated through QSPR models for
prediction of boiling points of acyclic, monocyclic, and polycyclic alkanes (n=343).
Mercader et al. (2001) investigated the applications of TDs based on distance
and detour distance matrices. They employed some usual TDs based on both these
distances to investigate the heat of formation of a set of structurally diverse
hydrocarbons. Surprisingly, TDs based on detour matrix yielded better correlations to
predict enthalpies of formation.
Ivanciuc and Klein (2002) introduced efficient algorithms for the computation
of several distance based topological descriptors of a molecular graph from the distance
invariants of its subgraphs. The procedure utilized vertex- and edge- weighted
molecular graphs that account for the multiple bonds as well as the presence of
heteroatoms in the organic compounds.
Gupta et al. (2002b) conceptualized a novel distance based topological
descriptor termed as eccentric distance sum descriptor. Eccentric distance sum, denoted
by ξDS, can be defined as the sum of the products of eccentricity and distance sum of each
vertex in the hydrogen depleted molecular graph as per the following equation:
$$\xi^{DS}(G) = \sum_{i=1}^{n} E_i * S_i$$
Where Si and Ei are distance sum and eccentricity of vertex i respectively in graph G
having n number of vertices. Eccentric distance sum takes into consideration the
eccentricity and distance sum of all vertices in the graph.
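The eccentric distance sum follows directly from one breadth-first search per vertex, as the following Python sketch shows. The adjacency-dictionary encoding and function names are assumptions made for this illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def eccentric_distance_sum(adj):
    """Sum over vertices of eccentricity E_i times distance sum S_i."""
    total = 0
    for v in adj:
        d = bfs_distances(adj, v)
        eccentricity = max(d.values())   # E_i: largest distance from v
        distance_sum = sum(d.values())   # S_i: sum of distances from v
        total += eccentricity * distance_sum
    return total


butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(eccentric_distance_sum(butane), eccentric_distance_sum(isobutane))
```

The two C4 isomers receive clearly different values, reflecting the sensitivity of the descriptor to branching.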
Cao and Yuan (2002) conceptualized a novel topological descriptor VDI(±1)
based on the vertex degree-distance descriptor (VDI) to distinguish the cis/trans isomers
of cycloalkanes. They substituted the derivative matrix S with Dmod to obtain
VDI(±1). This new structural descriptor showed better QSPR results than VDI.
Castro et al. (2002) reported some upgraded versions of the Wiener’s
descriptor: a) by using the sum of the bond lengths along the shortest path instead of
the graph theoretical distance; b) by using the Euclidean distance between the respective
pairs of atoms; or c) by using hydrogen filled graphs. However, none of these
theoretically justifiable modifications of the Wiener descriptor improved the
applicability and value of these structure descriptors in designing quantitative
structure-property relations. They concluded that the original Wiener descriptor, now
already more than half a century old, is a much more valuable topological descriptor
than one would expect from its extremely simple and seemingly naive definition.
Milicevic et al. (2003) extended the Z descriptor to general graphs and
investigated its behavior with regard to different structural characteristics of graph such
as branching, cyclicity, size, loops and multiple edges etc. Z counting polynomial and
the matching polynomial are used to calculate this descriptor. These polynomials could
be generated using proper recurrence relations. The structural behavior of the Z
descriptor for simple graphs and general graphs was tested against the total walk count
descriptor (twc) and it was found that the Z descriptor followed the structural changes,
i.e. the value of the descriptor increased with the loops, multiple edges, size, cycles and
branching.
Hu et al. (2003) proposed a new variable descriptor, external factor variable
connectivity descriptor (EFVCI), in which the atomic attribute was divided into two
parts i.e. internal and external attributes. Along with atomic attributes, the form of
molecular connectivity descriptor was used to get the external factor variable
connectivity descriptor:
$${}^{P}F_{EFVCI} = \sum_{\text{all edges}}(A_i A_j \cdots A_n)^{-0.5}$$
Where Ai is the attribute of carbon atom perturbed by other atoms.
This kind of descriptor can be regarded as an extension of the molecular connectivity
descriptor by using a new atomic attribute, which makes the descriptor flexible to
different properties.
Narumi (2003) defined two novel topological descriptors, based on the partition
function of a graph, for analyzing the statistico-mechanical aspect of the Hosoya
descriptor. These TDs were termed as the bond descriptor B and the connective
descriptor C.
The bond descriptor B was defined as:
$$B(G) = \sum_{k=0}^{m} n(G,k) = S(G,1)$$
Where n(G,k) represents the number of different ways in which k bonds are selected
from graph G.
The connective descriptor C was defined as:
$$C(G) = \sum_{k=0}^{m} q(G,k) = R(G,1)$$
Where m represents the maximum number of k for molecular graph G.
Duchowicz et al. (2003) used the distance and detour matrices based TDs to
calculate the Gibbs free energy for a set of 60 hydrocarbons. The distance matrix
considers the shortest path between any two vertices whereas the detour matrix
considers longest path between any two vertices. The results showed that the TI derived
from detour matrix produces better correlation to predict Gibbs free energy. They
concluded that the detour matrix is an appropriate topological tool to be applied in
[(Q)SAR/SPR] analysis.
Yuan and Cao (2003) developed the Edge degree-Distance Descriptor (EDI)
and Sum of edges (Se) based on the edge and distance of molecular graph in order to
distinguish saturated and unsaturated structures. The EDI was defined as:
$$EDI = \left(\prod_{i=1}^{n} h_i\right)^{1/N}$$
Where hi are the elements of the (1xN) vector ES, obtained by multiplying the edge degree vector E
by the derivative distance matrix S:
ES = (h1, h2, ….. hn)
The Sum of edges (Se) equals the half-sum of the edge degrees (Ei) of each vertex in
the molecular graph:
$$S_e = 0.5\sum_{i=1}^{n} E_i$$
Yuan and Cao (2003) suggested that the combination of these descriptors together
could represent the molecular structures not only of alkanes but also of alkenes,
alkynes, and benzenoid hydrocarbons.
Randic (2004) described the construction of a novel MD, called the Wiener-
Hosoya descriptor (W-H), in view of its structural relationship to both the Wiener
number W and the Hosoya topological descriptor Z. This descriptor was expressed as:
W-H(G) = W(G) + W(G-ee)
Where W is the Wiener descriptor and W(G-ee) is a Wiener-type descriptor calculated
by summing the Wiener descriptors relative to the subgraphs obtained with deletion
of each edge and all incident edges to it, following an analogous approach to the
Hosoya Z descriptor calculation.
Klein et al. (2004) introduced 3D-topological distance based (3D-TDB)
descriptors by relating Euclidean to topological distances. The descriptors were tested
with three different data sets: the benchmark steroids, a well characterized
benzodiazepine set, and a set of β-cyclodextrin inclusion compounds. The predictive
abilities of models obtained with 3D-TDB descriptors were reported to be in good
agreement with those obtained from other 3D-(Q)SAR methods.
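Gutman et al. (2004) developed modified Wiener descriptors mWλ as:
$${}^{m}W_{\lambda}(T) = \sum_{e}\left[n_1(e)\,n_2(e)\right]^{\lambda}$$
Where n1(e) and n2(e) are the numbers of vertices on the two sides of edge e of the tree T,
and λ is a parameter that may assume different values. Clearly, for λ = +1 the
modified Wiener descriptor mWλ reduces to the ordinary Wiener descriptor W.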
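The reduction to the ordinary Wiener descriptor at λ = +1 can be checked numerically with a short Python sketch. It assumes that the graph is a tree (so that removing an edge splits it into exactly two parts); the function names and the adjacency-dictionary input are illustrative assumptions.

```python
def side_size(adj, start, blocked):
    """Number of vertices on the `start` side of the tree edge (start, blocked)."""
    seen = {start, blocked}
    stack = [start]
    count = 0
    while stack:
        v = stack.pop()
        count += 1
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return count


def modified_wiener(adj, lam):
    """Sum over tree edges of [n1(e) * n2(e)] ** lam."""
    n = len(adj)
    total = 0.0
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each edge once
                n1 = side_size(adj, u, v)
                total += (n1 * (n - n1)) ** lam
    return total


butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(modified_wiener(butane, 1), modified_wiener(butane, -1))
```

With λ = 1 the result for n-butane equals its Wiener descriptor (10); other exponents reweight the contributions of central and peripheral bonds.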
Bajaj et al. (2004a) conceptualized a modification of Wiener's index, termed as
Wiener's topochemical descriptor (Wc) based on topochemical distance matrix:
$$W_c = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}P_{i_c j_c}$$
Where Picjc denotes the chemical path length in the graph G, and n is the highest possible
number of i and j. Wiener’s topochemical descriptor was derived from the weighted
molecular graph whose vertices were properly weighted with a selected
chemical/physical property. It was not only sensitive to the nature, number, and relative
position of heteroatoms but also exhibited far less degeneracy as compared to Wiener’s
descriptor.
Lu et al. (2006) conceptualized a novel MD called Lu descriptor which is
based on Wiener descriptor, for modeling properties of heteroatom and multiple bond
containing organic compounds. The Lu descriptor was defined as:
n
i
n
j
ij
n
k
vk SqnLu
1 11
5.0log
Where qk is the relative electronegativity value of vertex k, Sij represents the sum of v
power of the relative bond lengths between two adjacent vertices in the shortest path
between the vertices i and j and n represents the number of vertices in a molecular
graph G.
Diudea (2006) conceptualized a new counting polynomial, called the
“Omega” Ω (G,x) polynomial on the ground of quasi-orthogonal cut “qoc” edge strips
in a bipartite lattice. The Omega Ω (G,x) polynomial qoc counting was defined as:
$$\Omega(G,x) = \sum_{c} m(G,c)\,x^{c}$$
Where m(G, c) represents the number of qocs of length c. The summation runs up to
the maximum length of qocs in G. The polynomial is an elegant form of topological
description of lattice graphs. It is related to the well-known PI descriptor.
Balaban et al. (2007) developed five new TDs based on the number of paths
pi with length increasing from i = 1 (i.e. the number of edges) to the maximal value of
pi, which form the molecular path code. These descriptors were defined as:
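1. Quadratic descriptor (Q): $Q = \sum_i p_i^{2}/(\mu+1)$
2. Descriptor S: $S = \sum_i p_i^{1/2}/(\mu+1)$
3. Path count descriptor P: $P = \sum_i \{p_i^{1/2}/[i^{1/2}(\mu+1)]\}$
4. Distance-reduced descriptor D: $D = \sum_i \{p_i^{1/2}/[i(\mu+1)]\}$
5. Distance-attenuated descriptor A: $A = \sum_i \{p_i/[i(\mu+1)]\}$
Where μ is the cyclomatic number of the graph.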
Among these descriptors, path-count descriptor P was found to be least degenerate
and also showed best biparametric correlation with normal boiling point of alkanes.
Zhou and Trinajstic (2008) described lower bounds for the Kirchhoff
descriptor (Bonchev et al., 1994) in terms of its structural parameters viz. the number
of edges (bonds), the number of vertices (atoms), maximum vertex degree (valency),
connectivity and chromatic number etc. The bounds of a descriptor furnish important
information of a molecule (graph) as they establish the approximate range of the
descriptor in terms of molecular structural parameters (Zhou and Trinajstic, 2008).
Iranmanesh et al. (2009) defined an edge version of the well-known Wiener
descriptor as:
$$W_{e0}(G) = 0.5\sum_{\{e,f\} \subseteq E(G)} d_0(e,f)$$
Where d0(e, f) = d1(e, f) + 1 if e ≠ f
= 0 if e = f
Here, the distance between the corresponding vertices is the distance between two
edges in the graph G.
Mahmiani et al. (2010) introduced the total version of the Szeged descriptor as:
$$Sz_T(G) = \sum_{e=uv \in E(G)} t_e(u)\,t_e(v)$$
Where te(u) represents the number of vertices and edges of G closer to u than to v
and te(v) represents the number of vertices and edges of G closer to v than to u. The
computation of this novel descriptor was exemplified for some well-known graphs
and in particular for zigzag nanotubes.
Goyal et al. (2010) described a novel pendenticity based topochemical
descriptor termed as pendentic eccentricity descriptor expressed as:
n
i
m
j
iijp Ep
1 1
2/
Where P(ij) is the path length containing the least number of edges between vertices i
and j in graph G; Ei is the eccentricity of a vertex vi in G and n is the maximum
numbers of i and j.
Diudea et al. (2011) investigated the uniqueness (discriminating ability) of a
newly proposed CJN super descriptor using (real) atomic and synthetic structures.
This new descriptor distinguished all graphs uniquely and some MDs which are
embedded in the super descriptor have shown excellent correlating ability with
alkanes properties.
Alaeiyan and Asadpour (2011) proposed a revised version of the well-known
Szeged descriptor of a molecular graph G as:
$$Sz^{*}(G) = \sum_{e=\{u,v\} \in E(G)}\left[n(u,v) + \tfrac{1}{2}o(u,v)\right]\left[n(v,u) + \tfrac{1}{2}o(v,u)\right]$$
Where n(u, v) and o(u, v) denote the number of vertices that are closer to u than to v
and the number of vertices at the same distance from u and from v, respectively. They
demonstrated the computation of the revised Szeged descriptor for bridged graphs.
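As an illustrative sketch, the revised Szeged descriptor can be evaluated in Python by counting, for each edge, the vertices closer to either end and splitting the equidistant ones evenly. The input format and function names are assumptions; the cited authors' own procedure for bridged graphs is not reproduced here.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def revised_szeged(adj):
    """Sum over edges uv of [n(u,v) + o/2] * [n(v,u) + o/2]."""
    dist = {v: bfs_distances(adj, v) for v in adj}
    total = 0.0
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each edge once
                n_u = sum(1 for w in adj if dist[u][w] < dist[v][w])
                n_v = sum(1 for w in adj if dist[v][w] < dist[u][w])
                o = len(adj) - n_u - n_v  # vertices equidistant from u and v
                total += (n_u + o / 2) * (n_v + o / 2)
    return total


cyclopropane = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
cyclobutane = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
propane = {0: [1], 1: [0, 2], 2: [1]}
print(revised_szeged(cyclopropane), revised_szeged(cyclobutane), revised_szeged(propane))
```

Odd rings are where the correction matters: in cyclopropane every edge has one equidistant vertex, which the unrevised count would ignore.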
Dong et al. (2011) proposed a novel version of the edge-Szeged descriptor in
parallel to the revised (vertex) Szeged descriptor. The revised edge-Szeged descriptor
was defined as:
$$Sz_e^{*}(G) = \sum_{e=uv \in E(G)}\left[m_G(u,v) + n_G(u,v)/2\right]\left[m_G(v,u) + n_G(v,u)/2\right]$$
Where mG(u,v) is the number of edges closer to u than to v and nG(u,v) is the number of
edges equidistant from both ends of e = uv ∈ E(G).
The lower and upper bounds of this MD were also demonstrated by them for various
graphs.
Iranmanesh et al. (2011) proposed a new version of the hyper-Wiener
descriptor as:
)()()(2
GWGWGWW deieiei
where
)(},{
2 4,0),,(5.0)(2
GEfe
id
ei ifedGW
The calculation of this edge version of hyper-Wiener descriptor was exemplified on
some well-known graphs such as path, cycle, complete graphs.
Deng (2011) introduced a novel variant topological descriptor for molecular
graphs, called sum-Balaban descriptor. For a simple and connected graph G with
vertex-set V (G) and edge-set E(G), sum-Balaban descriptor was defined as:
$$SJ(G) = \frac{B}{C+1}\sum_{ij \in E(G)}(S_i + S_j)^{-0.5}$$
Where Si and Sj are the distance sums of the vertices i and j respectively, B is the number
of graph edges, and C is the cyclomatic number, that is, the number of rings. The
predictive ability of this descriptor was investigated through QSPR modeling of some
physicochemical properties of octanes.
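A minimal Python sketch of the sum-Balaban formula is given below. It assumes a connected simple graph stored as an adjacency dictionary and obtains the cyclomatic number as C = B - N + 1; the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def sum_balaban(adj):
    """B/(C+1) times the sum over edges ij of (S_i + S_j) ** -0.5."""
    S = {v: sum(bfs_distances(adj, v).values()) for v in adj}  # distance sums
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    B = len(edges)
    C = B - len(adj) + 1  # cyclomatic number of a connected graph
    return B / (C + 1) * sum((S[u] + S[v]) ** -0.5 for u, v in edges)


propane = {0: [1], 1: [0, 2], 2: [1]}
cyclopropane = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(sum_balaban(propane), sum_balaban(cyclopropane))
```

The only change relative to the Balaban J descriptor is that the distance sums of the two end vertices are added rather than multiplied.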
Bruckler et al. (2011) deduced a new class of distance-based molecular
structure descriptors i.e. Q-descriptors with an aim to eliminate a general shortcoming
of the Wiener and Wiener-type descriptors, namely that the greatest contributions to
their numerical values come from vertex pairs at greatest distance. The Q-descriptor
may be represented by the following:
$$Q = \sum_{\{u,v\} \subseteq V(G)}\gamma(u,v)$$
Where γ(u,v) depends solely on the distance d(u,v) between the vertices u and v.
Q-descriptor was also related with the Hosoya polynomial as:
,2 GHQ
The multiplier 2 comes from the fact that each pairwise interaction has been counted
twice. Thus Q is an additive function of increments associated with pairs of vertices
of G.
Zhang et al. (2012) reported q-analogs of Wiener descriptor motivated by the
theory of hypergeometric series. Some possible chemical interpretations and
applications of the q-Wiener descriptors were also discussed.
Doslic and Reti (2012) investigated discriminating potential of traditional
degree-based descriptors and proposed a novel T(G) descriptor characterized by an
improved discriminating potential and reduced degeneracy. The T(G) descriptor was
expressed as:
1)(),(max
)(),(min
)(
1)(
)(}{ 21
21 GVu
umum
umum
GVGT
This descriptor was judged to be more efficient for discriminating between
topological structures of molecular graphs than several traditional molecular
descriptors.
Adjacency-cum-distance based graph invariants:
These descriptors employ the distance matrix as well as the adjacency matrix to characterize a
molecular graph. Because they combine the information of both matrices, these MDs contain
considerably more topological information than MDs derived from only a single matrix.
Sharma et al. (1997) conceptualized a novel, distance-cum-adjacency based
MD termed as eccentric connectivity descriptor (ECI) and defined as the summation
of the product of eccentricity and degree of each vertex in hydrogen depleted
molecular graph. It can be expressed as:
$$\xi^{c} = \sum_{i=1}^{n}(E_i * V_i)$$
Where n is the total number of vertices, Ei is the eccentricity and Vi is the degree of
vertex in graph G. The descriptor was successfully used for mathematical models of
biological properties of diverse nature i.e anticonvulsant activity (Sardana and Madan,
2002b), CDK-1 inhibitory activity (Lather and Madan, 2005a), genotoxicity (Mosier
et al., 2003; Linnan et al., 2005), anti-HIV activity (Kumar et al., 2004; Lather and
Madan, 2005b; Dureja and Madan, 2009), anti-inflammatory activity (Gupta et al.,
2002a), Diuretic activity (Sardana and Madan, 2001), phospholipase A2 inhibitory
activity (Kumar and Madan, 2006), glycogen synthase kinase-3 inhibitory activity
(Kumar and Madan, 2005), carbonic anhydrase inhibitory activity (Kumar and
Madan, 2007), anti allergic activity (Kumar and Madan, 2007b), adenosine receptors
binding activity (Lather and Madan, 2004, Kumar and Madan, 2007c). The
mathematical properties of ECI have also been investigated extensively in the recent
past (Ilic, 2010; Doslic et al., 2010; Zhou and Du, 2010; Moradi and Baba-Rahim,
2013).
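The definition of the eccentric connectivity descriptor translates into a few lines of Python. The sketch below assumes a connected hydrogen-depleted graph given as an adjacency dictionary; the names are illustrative and not taken from the cited papers.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def eccentric_connectivity(adj):
    """Sum over vertices of eccentricity E_i times degree V_i."""
    total = 0
    for v in adj:
        eccentricity = max(bfs_distances(adj, v).values())
        degree = len(adj[v])
        total += eccentricity * degree
    return total


butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(eccentric_connectivity(butane), eccentric_connectivity(isobutane))
```

The linear and branched C4 skeletons give different values, which illustrates how the descriptor couples adjacency (degree) with distance (eccentricity).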
Ren (1999) developed a novel topological descriptor based on adjacency matrix
and distance matrix. It was denoted as Xu descriptor and was claimed to have high
discriminant power particularly for molecular size and branching. It is defined as:
$$Xu = A^{1/2}\log L = A^{1/2}\log\left[\frac{\sum_{i=1}^{A}\delta_i\,\zeta_i^{2}}{\sum_{i=1}^{A}\delta_i\,\zeta_i}\right]$$
Where A represents the number of atoms and L represents the valence average
topological distance calculated by vertex degree δ and vertex distance degree ζ of all
the atoms. The Xu descriptor has better discriminatory power of alkane isomers and is
very simple to calculate. The Xu descriptor promises to be a useful parameter in the
(Q)SAR/QSPR studies.
Gupta et al. (2000) conceptualized and developed adjacency-cum-path length
based topological descriptor termed as connective eccentricity descriptor (Cξ). It is
defined as the sum of the ratios of the degree of a vertex (Vi) and its eccentricity (Ei) for
all vertices in the hydrogen depleted molecular graph. It can be expressed as:
$$C^{\xi} = \sum_{i=1}^{n} V_i / E_i$$
The discriminating power and sensitivity of connective eccentricity descriptor was
found to be better than that of well-known Balaban’s mean square distance (MSD)
descriptor. The utility of connective eccentricity descriptor in structure-activity
studies was investigated by developing models to predict antihypertensive activity of
81 derivatives of N-benzylimidazole. The results obtained using the connective
eccentricity descriptor were reported to be better than those obtained using Balaban’s
MSD descriptor.
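For comparison with the product form used in the eccentric connectivity descriptor, the ratio form of the connective eccentricity descriptor can be sketched in Python as follows. The graph encoding and function names are assumptions made for this example.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def connective_eccentricity(adj):
    """Sum over vertices of degree V_i divided by eccentricity E_i."""
    total = 0.0
    for v in adj:
        eccentricity = max(bfs_distances(adj, v).values())
        total += len(adj[v]) / eccentricity
    return total


butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(connective_eccentricity(butane))
```

Because eccentricity appears in the denominator, central vertices of high degree dominate the sum, whereas in the product form peripheral vertices contribute more.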
Gupta et al. (2001) conceptualized a new descriptor, the eccentric adjacency
descriptor (ξA), for characterization of molecular structure; it can be simply
calculated from a modified adjacency matrix termed as additive adjacency matrix.
Eccentric adjacency descriptor was defined as the summation of the ratios of the sum of the
degrees of adjacent vertices to the eccentricity of the concerned vertex, for each vertex
in the hydrogen suppressed molecular structure:
$$\xi^{A} = \sum_{i=1}^{n}\zeta_i / E_i$$
Where ζi is the sum of valence values of all the vertices adjacent to the concerned vertex in
a hydrogen depleted molecular graph, n represents the total number of vertices and Ei
represents the eccentricity of the vertex i in graph G. This descriptor was found to be
more sensitive compared to the first order molecular connectivity descriptor.
Ren (2002a) conceptualized novel atom-type AI topological descriptors based
on the distance matrix and adjacency matrix of a graph to code the structural
environment of each atomic type in a molecule. The topological descriptor for any
atom type i in molecular graph, AIi, was defined as:
1iAI
iiii svsv /2
Where parameter ɸ is considered as a perturbing term of the ith atom, reflecting the
effects of its structural environment on its AIi value; vi is the vertex degree and si is the
distance sum. The efficiency of the Xu descriptor and AI descriptors was verified by
high quality QSPR/(Q)SAR models obtained for several physical properties and
biological activities of several data sets of alcohols with a wide range of non-
hydrogen atoms.
Sardana and Madan (2002a; 2002b) conceptualized a novel adjacency-cum-distance
based topological descriptor known as adjacent eccentric distance sum
descriptor (ξSV). It was defined as the sum, over all vertices, of the product of distance sum
and eccentricity divided by the degree of the corresponding vertex
in a hydrogen depleted molecular graph having n vertices as:
$$\xi^{SV} = \sum_{i=1}^{n}\frac{S_i\,E_i}{V_i}$$
Where Si, Ei and Vi are the distance sum, eccentricity and degree of vertex i in graph
G respectively. The adjacent eccentric distance sum descriptor exhibited very low
degeneracy. The discriminating power of adjacent eccentric distance sum descriptor
was found to be much superior to that of the eccentric connectivity descriptor.
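A brief Python sketch of the adjacent eccentric distance sum descriptor is shown below. It assumes a connected graph with no isolated vertices (so that every degree is non-zero) given as an adjacency dictionary; the function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def adjacent_eccentric_distance_sum(adj):
    """Sum over vertices of (distance sum S_i * eccentricity E_i) / degree V_i."""
    total = 0.0
    for v in adj:
        d = bfs_distances(adj, v)
        total += sum(d.values()) * max(d.values()) / len(adj[v])
    return total


butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(adjacent_eccentric_distance_sum(butane))
```

Dividing by the degree couples the purely distance-based product S_i * E_i with adjacency information, which is what distinguishes this descriptor from the eccentric distance sum.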
Quigley and Naughton (2002) modified the eccentric adjacency descriptor
(Gupta et al., 2001) and proposed valence eccentricity adjacency descriptor by
substituting simple connectivity value with vertex valence value. The valence
eccentric connectivity descriptor can be easily calculated from additive valence
adjacency and distance matrices by using following equation:
$$\xi^{v} = \sum_{i=1}^{n} S_i^{v} / E_i$$
Where Svi is the sum of vertex valences and Ei is the eccentricity of vertex i.
The vertex valences (incorporating the superscript v to allow for calculations
involving multiple bonding and heteroatoms) were defined as follows:
$$\delta_i^{v} = Z^{v} - h_i$$
Where Zv is the number of valence electrons of the vertex (atom) and hi is the number
of hydrogen atoms attached to it.
Quigley and Naughton (2002) also derived another descriptor (Δξ) in a
manner analogous to the differential molecular connectivity descriptor (Δmχ). This
differential eccentric adjacency descriptor (Δξ) was expressed as:
Δξ = ξ - ξv
They envisaged that this descriptor will be useful in encoding further information
which may be employed in structure/activity studies.
Gupta et al. (2003) developed three novel eccentric adjacency topochemical
descriptors, i.e. eccentric adjacency topochemical descriptor-1 (ξ1cA), eccentric
adjacency topochemical descriptor-2 (ξ2cA) and eccentric adjacency topochemical
descriptor-3 (ξ3cA). These MDs can be represented as:
1. $\xi_{1c}^{A} = \sum_{i=1}^{n} S_{ic}/E_{ic}$
2. $\xi_{2c}^{A} = \sum_{i=1}^{n} S_{i}/E_{ic}$
3. $\xi_{3c}^{A} = \sum_{i=1}^{n} S_{ic}/E_{i}$
Where Ei is the eccentricity and Si is the distance sum of concerned vertex i, Eic is the
chemical eccentricity and Sic is the chemical distance sum of concerned vertex i, and n
represents the number of vertices in the hydrogen suppressed graph.
These descriptors were found to be sensitive towards small change in molecular
structure and showed high discriminating power with regard to anti-HIV activity of
HEPT derivatives.
Kumar et al. (2004) refined the eccentric connectivity descriptor to reduce its
degeneracy and make it sensitive towards the presence and relative position of
heteroatom(s). The refined eccentric connectivity descriptor, termed as eccentric
connectivity topochemical descriptor ($\xi^{c}_{c}$), overcomes the limitations of the eccentric
connectivity descriptor by exhibiting very low degeneracy and displaying sensitivity to
both the presence and relative position of heteroatom(s) without compromising the
discriminating power of the eccentric connectivity descriptor. It was defined as the sum of
the product of chemical eccentricity and the chemical degree of each vertex in the
hydrogen depleted molecular graph. It can be expressed as:
$$\xi^{c}_{c} = \sum_{i=1}^{n} E_{ic} V_{ic}$$
Where Eic is the chemical eccentricity and Vic is the chemical degree of vertex i. n
represents the number of the vertices in graph G. The values of eccentric connectivity
topochemical descriptor were computed using topochemical adjacency matrix (Ac) and
topochemical distance matrix (Dc).
Bajaj et al. (2004b) conceptualized a new adjacency-cum-distance based
topochemical descriptor with high discriminating power, known as superadjacency
topochemical descriptor ($\int^{AC}$). It was defined as the sum of the quotients of the
product of concerned vertex chemical degree and sum of adjacent vertex chemical
degrees; and chemical eccentricity of the concerned vertex, for each vertex of the
hydrogen depleted molecular graph. It was represented as:
$$\int^{AC}(G) = \sum_{i=1}^{n} \frac{\deg(v_{ic}) \cdot S_{ic}}{E_{ic}}$$
Where Sic represents the sum of chemical degrees of all vertices (vj), adjacent to vertex
i and n is the number of vertices in graph G. The discriminating power of
superadjacency topochemical descriptor was far superior as compared to distance
based Wiener's descriptor and adjacency based molecular connectivity descriptor.
This descriptor has been successfully utilized for modeling anti-HIV activity
(Bajaj et al., 2005c; 2005d) and anti-tumor activity (Bajaj et al., 2005b).
Shamsipur et al. (2004a, 2004b) proposed some new topological descriptors
(Sh descriptors) based on the distance sum ($S_i$) and connectivity ($v_i$) of a molecular
graph, derived directly from 2D molecular topology, for use in (Q)SAR/QSPR
studies. These are a set of descriptors calculated by different combinations of distance
sum and connectivity vectors:
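$$Sh_1 = \log \sum_{b} \left( \frac{S_i \cdot S_j}{v_i \cdot v_j} \right)_b$$
$$Sh_2 = \log \sum_{b} \left( \frac{v_i \cdot v_j}{S_i \cdot S_j} \right)_b$$
$$Sh_3 = \log \sum_{b} \left( S_i \cdot S_j \cdot v_i \cdot v_j \right)_b^{1/2}$$
$$Sh_4 = \log \sum_{b} \left( \frac{v_i \cdot v_j}{S_i \cdot S_j} \right)_b^{1/2}$$
$$Sh_5 = \sum_{b} \left( S_i \cdot S_j \cdot v_i \cdot v_j \right)_b^{1/2}$$
$$Sh_6 = \log \sum_{b} \left( S_i \cdot S_j \cdot v_i \cdot v_j \right)_b$$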
bji
vj
vi
bSSSh *log*7
$$Sh_8 = \log \left( S^{T} v \right)$$
$$Sh_9 = \log \sum_{i=1}^{n} \sum_{j=1}^{n} [Sd]_{ij}$$
$$Sh_{10} = \log Sp_{\max}(Sd)$$
1ShNNSh CC
Where S is the column vector collecting the distance sum, v the column vector
collecting the valence vertex degrees, and Sd the square A*A matrix obtained by the
inner product of the two vectors Si and v. In the Sh descriptors 1-7, the summations
run over all the adjacent vertices. Sh9 descriptor is the sum over all the entries of the
Sd matrix, whereas Sh10 descriptor is the logarithm of its highest eigenvalue; Sh
descriptor was derived from the descriptor Sh1, including the number of carbon atoms
NC to account for molecular size.
Bajaj et al. (2006) developed a highly discriminating TD, termed as
augmented eccentric connectivity descriptor (${}^{A}\xi^{c}$). It was defined as the sum of the
quotients of the product of adjacent vertex degrees and eccentricity of the concerned
vertex, for each vertex in the hydrogen depleted molecular graph and expressed as:
$${}^{A}\xi^{c} = \sum_{i=1}^{n} \frac{M_i}{E_i}$$
Where n represents the number of vertices, Ei is the eccentricity and Mi is the product
of degrees of all vertices (vj) adjacent to vertex i in graph G. The augmented eccentric
connectivity descriptor had discriminating power superior to that of the distance
based Wiener's descriptor and adjacency based molecular connectivity descriptor.
Moreover, this descriptor exhibited very low degeneracy and predicted the anti-HIV
activity of 2-pyridinone derivatives with an accuracy of 89% (Bajaj et al., 2006).
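The definition can be sketched in Python as follows. The adjacency-list representation, the function names and the isobutane example are assumptions made for illustration. The optional `power` argument is likewise an assumption, included so that the same routine can raise the eccentricity to a power; with the default `power=1` it reproduces the augmented eccentric connectivity descriptor as defined above.

```python
from collections import deque
from math import prod


def eccentricities(adj):
    """Eccentricity of every vertex, using one breadth-first search per vertex."""
    ecc = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        ecc[source] = max(dist.values())
    return ecc


def augmented_eccentric_connectivity(adj, power=1):
    """Sum of M_i / E_i**power; assumes a connected graph with >= 2 vertices."""
    ecc = eccentricities(adj)
    degree = {v: len(nb) for v, nb in adj.items()}
    return sum(
        prod(degree[w] for w in adj[i]) / ecc[i] ** power  # M_i / E_i^power
        for i in adj
    )


# Hydrogen-depleted graph of isobutane: central carbon 0 bonded to 1, 2 and 3
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(augmented_eccentric_connectivity(isobutane))           # 5.5
print(augmented_eccentric_connectivity(isobutane, power=2))  # 3.25
```

In isobutane the central carbon has M = 1 and E = 1, while each terminal carbon has M = 3 and E = 2, so the sum is 1 + 3 x 1.5 = 5.5.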
Dureja et al. (2008) conceptualized three new generation descriptors, termed
as: superaugmented eccentric connectivity descriptor-1 (${}^{SA}\xi^{c}_{1}$), superaugmented
eccentric connectivity descriptor-2 (${}^{SA}\xi^{c}_{2}$) and superaugmented eccentric connectivity
descriptor-3 (${}^{SA}\xi^{c}_{3}$). These can be defined as the sum of the quotients of the product of
adjacent vertex degrees and powered eccentricity of the concerned vertex, for each vertex
in the hydrogen suppressed molecular graph and expressed as:
$${}^{SA}\xi^{c}_{N} = \sum_{i=1}^{n} \frac{M_i}{E_i^{N}}$$
Where Mi is the product of degrees of all the vertices (vj), adjacent to vertex i, Ei is the
eccentricity, and n is the number of vertices in the graph. N is equal to 2, 3, 4 for
${}^{SA}\xi^{c}_{1}$, ${}^{SA}\xi^{c}_{2}$, ${}^{SA}\xi^{c}_{3}$ respectively.
Dureja et al. (2008) also proposed the topochemical version of
superaugmented eccentric connectivity descriptors termed as: superaugmented
eccentric connectivity topochemical descriptor-1 (${}^{SAc}\xi^{c}_{1}$), superaugmented eccentric
connectivity topochemical descriptor-2 (${}^{SAc}\xi^{c}_{2}$), and superaugmented eccentric
connectivity topochemical descriptor-3 (${}^{SAc}\xi^{c}_{3}$). These can be defined as the sum of the
quotients of the product of adjacent vertex chemical degrees and powered chemical
eccentricity of concerned vertex, for each vertex in the hydrogen suppressed
molecular graph and can be expressed as:
$${}^{SAc}\xi^{c}_{N} = \sum_{i=1}^{n} \frac{M_{ic}}{E_{ic}^{N}}$$
Where Mic is the product of chemical degrees of all the vertices (vj), adjacent to vertex i,
Eic is the chemical eccentricity of concerned vertex, and n is the number of vertices in
the graph. N is equal to 2, 3, 4 for ${}^{SAc}\xi^{c}_{1}$, ${}^{SAc}\xi^{c}_{2}$, ${}^{SAc}\xi^{c}_{3}$ respectively.
The (Q)SAR models based on superaugmented eccentric connectivity topochemical
descriptors predicted anti-HIV-1 activity of 6-arylbenzonitriles with high degree of
accuracy.
Dureja et al. (2009) defined four novel TDs termed as superaugmented
pendentic descriptors ($\int^{SAP\text{-}1}$, $\int^{SAP\text{-}2}$, $\int^{SAP\text{-}3}$ and $\int^{SAP\text{-}4}$) defined as the summation of
quotients, of the product of non-zero row elements in the pendent matrix and product
of adjacent vertex degrees; and Nth power eccentricity of the concerned vertex, for all
the vertices in a hydrogen suppressed molecular graph. It can be expressed as:
$$\int^{SAP\text{-}N} = \sum_{i=1}^{n} \frac{\left( \prod_{j=1}^{n} P_{ij} \right) M_i}{E_i^{N}}$$
Where P(ij) is the length of the path that contains the least number of edges between
vertex i and vertex j in graph G; Mi is the product of degrees of all vertices (vj),
adjacent to vertex i. The eccentricity Ei of a vertex vi in G is the path length from
vertex i to the vertex j which is farthest from vi ($E_i = \max_{v_j \in G} d(v_i, v_j)$) and n is the
maximum possible number of i and j. The value of N is equal to 1, 2, 3 and 4 for
$\int^{SAP\text{-}1}$, $\int^{SAP\text{-}2}$, $\int^{SAP\text{-}3}$ and $\int^{SAP\text{-}4}$ respectively. These descriptors exhibited high sensitivity
towards branching, high discriminating power and extremely low degeneracy.
Dutt and Madan (2010) proposed new generation superaugmented eccentric
connectivity descriptors (denoted by: ${}^{SA}\xi^{c}_{4}$, ${}^{SA}\xi^{c}_{5}$, ${}^{SA}\xi^{c}_{6}$ and ${}^{SA}\xi^{c}_{7}$) along with their
topochemical versions (denoted by: ${}^{SAc}\xi^{c}_{4}$, ${}^{SAc}\xi^{c}_{5}$, ${}^{SAc}\xi^{c}_{6}$ and ${}^{SAc}\xi^{c}_{7}$) for the purpose of
(Q)SAR/QSPR modeling. These descriptors can be expressed as:
n
iNi
icSA
E
M
1
2
Where Mi is the product of degrees of all the vertices (vj), adjacent to vertex i; Ei is the
eccentricity; n is the number of vertices in the graph and N is equal to 1, 2, 3 and 4
for ${}^{SAc}\xi^{c}_{4}$, ${}^{SAc}\xi^{c}_{5}$, ${}^{SAc}\xi^{c}_{6}$ and ${}^{SAc}\xi^{c}_{7}$ respectively. These descriptors exhibited high
discriminating power and very low degeneracy.
Todeschini and Consonni (2010) proposed different kinds of novel local
vertex invariants (LOVIs), based on a multiplicative form of some reported vertex
degrees. In addition, they derived different kinds of MDs from each of the
LOVIs under analysis. Some of these MDs, defined in terms of LOVIs (L) and derived
from the eigenvalues (λ), were expressed as:
1. Sum like descriptors
$$S = \sum_{i=1}^{n} L_i$$
2. First Zagreb like descriptors
$$M_1 = \sum_{i=1}^{n} L_i^{2}$$
3. Second Zagreb like descriptors
$$M_2 = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} a_{ij} (L_i L_j)$$
4. Randic like descriptors
$${}^{1}\chi = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} a_{ij} (L_i L_j)^{-0.5}$$
5. Leading eigenvalues
$$SpMax(V,1) = \max_i \{\lambda_i\}$$
6. Estrada like descriptors
$$EE(V,1) = \sum_{i=1}^{n} e^{\lambda_i}$$
7. Hosoya-type descriptors
$$H_O(V;1) = \sum_{i=0}^{n} C_i$$
The utility of the proposed descriptors was also investigated by deriving QSPR
models for the series of 18 hydrocarbons with 8 carbon atoms (C8). The MDs based
on the newly defined vertex degrees showed higher prediction ability than that
obtained by the classical vertex degrees.
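The first four LOVI-based descriptors can be sketched in a few lines of Python. The sketch assumes that a_ij are adjacency matrix entries (so the double sums run over bonded pairs only) and that the Randic-like form carries the exponent -0.5; the function name and the use of classical vertex degrees as the LOVI vector are illustrative choices, not the authors' implementation.

```python
def lovi_descriptors(adj, L):
    """Sum-like, Zagreb-like and Randic-like descriptors from a LOVI vector L.

    adj is a list of neighbour lists; vertex i has local invariant L[i].
    """
    n = len(L)
    # Each bond counted once (i < j), matching the double sum with a_ij = 1
    edges = [(i, j) for i in range(n) for j in adj[i] if i < j]
    return {
        "sum_like": sum(L),
        "first_zagreb_like": sum(x * x for x in L),
        "second_zagreb_like": sum(L[i] * L[j] for i, j in edges),
        "randic_like": sum((L[i] * L[j]) ** -0.5 for i, j in edges),
    }


# Propane skeleton, with the classical vertex degree used as the LOVI
adj = [[1], [0, 2], [1]]
L = [len(neighbours) for neighbours in adj]  # [1, 2, 1]
result = lovi_descriptors(adj, L)
print(result)
```

Replacing `L` with any other local vertex invariant leaves the rest of the calculation unchanged, which is the point of defining the descriptors in terms of LOVIs.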
Ediz (2010) defined a modified version of ECI, called as Ediz eccentric
connectivity descriptor as:
$${}^{E}\xi^{c}(G) = \sum_{v \in V} \frac{S_v}{Ei(v)}$$
Where Sv is the sum of degrees of all vertices u, adjacent to vertex v; Ei(v) is the
eccentricity of v. The calculation of this descriptor was demonstrated for nanostar
dendrimers.
Turker et al. (2010) developed a novel MD called the Turker-Gumus
descriptor (TG descriptor) by utilizing the concepts of both, the connectivity and the
path distances in defining this novel MD as:
m
ij
ij
n
ij
ij ddTG
Where $d_{ij}$ are the elements of the distance degree matrix, and n and m are the numbers of
atoms in the starred and unstarred sets respectively.
Zhou and Du (2010) described mathematical properties of the eccentric
connectivity descriptor (ECI). They established various lower and upper bounds for
the ECI in terms of other graph invariants including the number of vertices, the
number of edges and the degree distance.
Goyal et al. (2011) proposed four refinements of eccentric distance sum
topochemical descriptor termed as augmented eccentric distance sum topochemical
descriptors 1-4 ( ADSc1 , ,2
ADSc ADS
c3 and )4ADSc so as to significantly augment
discriminating power and to reduce degeneracy. These MDs were defined as:
ic
n
i
icADSc SE
1
21 , ic
n
i
icADSc SE
1
32 ,
n
i
icicADSc SE
1
23 ,
n
i
icicADSc SE
1
34
Where Sic is chemical distance-sum of vertex i, Eic is chemical eccentricity of vertex i
and n is the number of vertices in graph G. These MDs were successfully utilized for
developing models for prediction of anti-tumor activity of bisphosphonates.
Das and Trinajstić (2011) compared the relationship between ECI and
Zagreb descriptors (M1 and M2) for chemical trees. Besides chemical trees, molecular
graphs were also treated and the value of ECI was found greater than Zagreb
descriptor-M1 for diameter greater than or equal to 7.
Gupta et al. (2011) conceptualized highly discriminating superaugmented
eccentric distance sum connectivity descriptors as fourth generation MDs. The
topochemical versions of these MDs (denoted by: cc
SED1 , c
cSED
2 , cc
SED3 and c
cSED
4 ) was
expressed by the following:
1
21
icNic
icn
i
ccN
SED
SE
M
Where Mic is the product of chemical degrees of all the vertices (vj), adjacent to vertex
i; Eic is the chemical eccentricity; Si is the chemical distance sum of vertex i and n is
the number of vertices in the graph and the N is equal to 1,2,3,4 for superaugmented
eccentric distance sum connectivity topochemical descriptors-1, 2, 3, 4 (denoted by:
cc
SED1 , c
cSED
2 , cc
SED3 and c
cSED
4 ). These MDs were successfully employed for
development of numerous models for Chk2 inhibitory activity of 2-
arylbenzimidazoles.
Ediz (2012) proposed another modified version of ECI, called as reverse
eccentric connectivity descriptor (REEC) as:
$${}^{RE}\xi^{c}(G) = \sum_{v \in V} \frac{Ei(v)}{S_v}$$
Where summation goes over all vertices of graph G; Ei(v) represents the eccentricity
of v and Sv denotes sum of degrees of all vertices adjacent to vertex v. The predictive
power of this MD was demonstrated on some physico-chemical properties of octanes.
In addition, basic mathematical properties in terms of lower and upper bounds were
also investigated.
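Both Ediz-type descriptors use the same two vertex quantities, the eccentricity and the sum of degrees of adjacent vertices, and differ only in which one is the numerator. A minimal Python sketch, with illustrative function names and a propane example that are assumptions of this sketch:

```python
from collections import deque


def eccentricity(adj, source):
    """Largest topological distance from source (connected graph assumed)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())


def ediz_descriptors(adj):
    """Return (Ediz eccentric connectivity, reverse eccentric connectivity)."""
    degree = {v: len(nb) for v, nb in adj.items()}
    ediz = 0.0
    reverse = 0.0
    for v in adj:
        S_v = sum(degree[u] for u in adj[v])  # sum of degrees of neighbours
        E_v = eccentricity(adj, v)
        ediz += S_v / E_v
        reverse += E_v / S_v
    return ediz, reverse


propane = {0: [1], 1: [0, 2], 2: [1]}
print(ediz_descriptors(propane))  # (4.0, 2.5)
```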
Xu and Guo (2012) devised the edge version of the commonly used
adjacency-cum-distance based eccentric connectivity descriptor, ECI. The edge version of ECI
(denoted by: $\xi^{c}_{e}$) was defined as:
$$\xi^{c}_{e}(G) = \sum_{f \in E(G)} V(f)\, E(f)$$
Where f = ij i.e. an edge in E(G); V(f) is the degree of an edge f and E(f) is its
eccentricity. Various upper and lower bounds were also reported for this MD of
connected graphs in terms of order, size and girth.
Gupta et al. (2011) proposed four highly discriminating fourth-generation
topological descriptors, termed as superaugmented eccentric distance sum connectivity
descriptors, as well as their topochemical versions known as superaugmented eccentric
distance sum topochemical connectivity descriptors.
Dutt and Madan (2012b) conceptualized and developed four novel MDs
termed as superpendentic eccentric distance sum descriptors 1-4 (denoted by: 1PEDS
,
2PEDS
, 3PEDS
and 4PEDS
) along with their topochemical counterparts (denoted by: 1P
c
EDS
, 2P
c
EDS
, 3P
c
EDS
and 4P
c
EDS
). The topochemical version of these MDs can be expressed
as:
n
ji icNic
icicjcP
N
EDS
SE
MP
11
1
Where Picjc represents the chemical path length with least number of edges between
vertices i and j in graph G. Mic is the product of chemical degrees of all vertices (vj),
adjacent to vertex i. Eic is the chemical eccentricity of concerned vertex i, Sic is the
chemical distance sum of vertex i and n is the number of vertices in the hydrogen
depleted graph. N is equal to 1,2,3,4 for 1P
c
EDS
, 2P
c
EDS
, 3P
c
EDS
and 4P
c
EDS
respectively.
The utility of proposed MDs was investigated through development of models for the
prediction of hCRF-1 binding affinity of substituted pyrazines using decision tree and
moving average analysis.
Singh et al. (2013) conceptualized and developed three novel MDs termed as
refined general Randic descriptors along with their topochemical counterparts. The
descriptors can be defined as the summation of the quotients of the inverse of the
product of the degree of each vertex on every edge in the hydrogen-suppressed
molecular graph having n vertices.
Centric graph invariants
The concept of graph center is based on molecular topological distances between the
graph vertices. The graph center can be a single vertex, a single edge, or a single
group of equivalent vertices. The center vertices have the smallest maximal distance
to other vertices.
$$d_{ij}^{\max} = \min \quad \text{for } j = 1, 2, \ldots, p$$
Invariants derived from the concept of center are called centric graph descriptors and
were proposed to quantify the degree of compactness of molecules by distinguishing
between molecular structures organized differently with respect to their centers.
Centric descriptors are MDs that quantify the degree of compactness of molecules
based on the recognition of the graph center (Todeschini and Consonni, 2009).
Balaban (1979) proposed a set of five new graph invariants classified as centric
invariants on the basis of sequences of numbers obtained by pruning an acyclic graph.
By pruning stepwise all vertices of degree one (δi), a vertex (center) or an edge
connecting two adjacent vertices (bicenter) is obtained. Balaban developed the Balaban
centric descriptor (B), which is defined as:
$$B = \sum_{i} \delta_i^{2}$$
where δi is the number of vertices of degree one removed at the ith pruning step.
This descriptor provides a measure of molecular branching: the higher the value of B,
the more branched is the tree. It is known as centric descriptor
because it reflects the topology of the tree as viewed from the centre. Four invariants
were devised from B and M1 to differentiate branching from number of vertices by
normalization and binormalization. Normalized centric (C), binormalized centric (C')
and quadratic invariants (Q, Q') were defined as:
$$C = \frac{B - 2n + U}{2}$$
$$C' = \frac{B - 2n + U}{(n-2)^{2} - 2 + U}$$
$$Q = 3V_4 + V_3$$
$$Q' = \frac{2(3V_4 + V_3)}{(n-2)(n-3)}$$
Where n is the number of vertices, $U = [1 - (-1)^{n}]/2$, while V3 and V4 are the numbers of
vertices of degree three and four respectively.
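The pruning procedure behind B lends itself to a short sketch. The Python code below assumes an acyclic graph given as an adjacency list, takes B as the sum of squares of the numbers of vertices removed at each pruning step, and takes the normalized form as C = (B - 2n + U)/2 with U = [1 - (-1)^n]/2, which is zero for unbranched chains. All function names and the example molecules are illustrative assumptions.

```python
def pruning_sequence(adj):
    """Number of vertices removed at each step when pruning a tree's leaves."""
    adj = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    sequence = []
    while adj:
        leaves = [v for v, nb in adj.items() if len(nb) <= 1]
        if not leaves:
            raise ValueError("graph contains a cycle; pruning needs a tree")
        sequence.append(len(leaves))
        for v in leaves:
            for w in adj[v]:
                adj[w].discard(v)
            del adj[v]
    return sequence


def balaban_centric_B(adj):
    """Sum of squared pruning counts."""
    return sum(d * d for d in pruning_sequence(adj))


def normalized_centric_C(adj):
    """C = (B - 2n + U) / 2 with U = [1 - (-1)**n] / 2 (assumed form)."""
    n = len(adj)
    U = (1 - (-1) ** n) // 2
    return (balaban_centric_B(adj) - 2 * n + U) / 2


pentane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
isopentane = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
neopentane = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}

for name, graph in [("pentane", pentane), ("isopentane", isopentane),
                    ("neopentane", neopentane)]:
    print(name, pruning_sequence(graph), balaban_centric_B(graph),
          normalized_centric_C(graph))
```

For the three C5 isomers the pruning sequences are (2, 2, 1), (3, 2) and (4, 1), so B rises from 9 to 13 to 17 and C from 0 to 2 to 4 as branching increases.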
Normalization of the topological descriptor is done by imposing the same lower bound
(regardless of n) for all graphs which is equal to zero for the least branched (linear) tree
on all the graphs. In order to find the normalized quadratic descriptor one is required to
find the quadratic function of the general form. It was found that the centric invariants
parallel the ordering induced by descriptor B, while the quadratic invariants induced
ordering which parallel those due to Gutman et al.'s descriptor M1 and Gordon-
Scantlebury descriptor N2.
Bonchev et al. (1980) generalized the graph center concept known as polycentre
for any connected cyclic or acyclic graph based on topological distance matrix. The
centric invariants were used to determine correlations, differentiating isomeric chemical
structures and for coding and computer processing of chemical structures.
The descriptors termed as Bonchev centric information descriptors were derived
from distance matrix D and edge distance matrix ED. One of them along with its
corresponding generalized centric information descriptor is being defined below.
Distance degree centric descriptor ($I^{v}_{c,\deg}$) is defined as:
$$I^{v}_{c,\deg} = -\sum_{g=1}^{G} \frac{n_g}{A} \log_2 \frac{n_g}{A}$$
Where ng is the number of graph vertices having both the same atom eccentricity η and
the same vertex distance degree ζ, G is the number of different vertex equivalence
classes and A is the total number of atoms.
Generalized distance degree centric descriptor ($I^{v,G}_{c,\deg}$) is defined as:
$$I^{v,G}_{c,\deg} = -\sum_{g=1}^{G} \frac{n_g}{A} \log_2 \frac{n_g}{A}$$
Where ng is the number of graph vertices having both the same average topological
distance to the polycentre and the same vertex distance degree ζ, G is the number of
different vertex equivalence classes and A is the total number of atoms
Diudea et al. (1991) introduced the so-called B-matrix to develop a novel
descriptor based on counting of the vertices in graph spheres (layers). A sphere may be
defined as the list of atoms at a given topological distance surrounding a central vertex
and its use is advantageous in studies which investigate the influence of neighboring
vertices on a specific property of central vertex.
The two types of originally proposed centro-complexity operators were
defined as:
3
0
3
0
1 )(
kbkbBxD
k
iki
D
k
ikRi
kD
k
ikik
D
k
ikRi bbBx
1010)(
00
2
Where B is the branching layer matrix and δ the vertex degree.
Balaban and Diudea (1993) constructed a new type of layer matrix, called R
matrix, on the basis of distance sums of vertices. This matrix was operated with two
classes of operators: one of "centricity" ("c") type and the other of
"centrocomplexity" ("x") type, the last one taking into account the "more important"
vertices in molecular graphs. By analogy with the regressive degrees they defined
new real-number LOVIs, regressive distance sums as:
$$r_i = \sum_{k=0}^{D} r_{ik} \cdot 10^{-nk}$$
Where D is the diameter and n denotes the number of digits for the maximal rik value
in G. A simplified form of the centrocomplexity operator was also proposed as:
$$x(R)_i = \sum_{k=0}^{D} r_{ik} \cdot 10^{-nk} = S_i + \sum_{k=1}^{D} r_{ik} \cdot 10^{-nk}$$
Where Si is the distance sum (i.e., vertex distance sum) of the ith vertex and n is the
number of digits of the max rik-value in graph.
Diudea (1994) differentiated the layer matrices (LM) from the sequence
matrices (SM). The layer matrix is a collection of the properties of vertices u located in
layers (concentric shells) at a distance j around each vertex i in G whereas sequence
matrix collects the walks starting from the vertices i to all other n - 1 vertices in G. He
also defined two descriptors - centrocomplexity x (LM) and centric invariant c (LM)
based on LM matrices:
$$SM^{(e)} = [\, m_i^{(e)};\ i \in (1,n);\ e = 1, 2, \ldots, esp \,]$$
$$LM^{(e)} = [\, lm_{ij}^{(e)} = \sum_{u \in G(u)_i} m_u^{(e)};\ i \in (1,n);\ j \in (0,d);\ e \in (1,esp) \,]$$
$$G(u)_i = [\, u;\ d_{iu} = j \,]$$
Where m is the label for a particular type of walk or property, n is the number of vertices
in G, d is the diameter of G and esp is the eccentricity of vertex i.
Balaban (1995) proposed regressive decremental distance sums to obtain
greater discrimination between the terminal and central vertices. They are calculated
from the distance sum layer matrix LDS by the following:
zj
j
ijii
i
lmSLDSx
10][
0
Where Si is the distance sum of the ith vertex. In this way, the progressively attenuated
contributions due to more distant vertices are subtracted from the distance degree of
the focused vertex.
Newman (2005) reported a new betweenness measure, i.e. betweenness
centrality, that essentially counts the fraction of shortest paths going through a given
vertex i as:
ijkp
ipiBC
n
k
n
kj kjshort
kjshort
,)(
)(1
1 1
Where $p^{short}_{kj}$ is the number of shortest paths connecting vertices k and j, and $p^{short}_{kj}(i)$
is the number of these shortest paths that pass through the vertex i. The betweenness
centrality BC(i) characterizes the degree of influence a vertex has in communicating
between vertex pairs.
Information theory based graph invariants:
Information theory has been used in chemical graph theory for describing chemical
structures and for providing good correlations between physico-chemical and structural
properties. Information descriptors are constructed for various matrices and also for
some topological descriptors. The advantage of such kind of descriptors is that they
may be used directly as simple numerical descriptors in a comparison with physical,
chemical or biologic parameters of molecules in structure property and activity
relationships. It can also be noted that information descriptors normally have greater
discriminating power for isomers than the respective topological descriptors.
An appropriate set A of n elements is derived from a molecular graph G
depending upon certain structural characteristics. On the basis of an equivalence
relation defined on A, the set A is partitioned into disjoint subsets Ai of order ni (i = 1, 2,
3, ..., h; $\sum_i n_i = n$). A probability distribution is then assigned to the set of equivalence
classes:
(A1, A2, ..., Ah)
(p1, p2, ..., ph)
Where pi = ni/n is the probability that a randomly selected element of A will occur in the
ith set.
The information content of an element of A is defined by Shannon's relation
(Shannon, 1948).
$$IC = -\sum_{i=1}^{h} p_i \log_2 p_i$$
The logarithm is taken at base 2 for measuring the information content in bits. The total
complexity of the molecule or the information content of the set A is then nIC.
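The partition-based calculation can be checked numerically with a short Python sketch. The function names are illustrative assumptions; the input is simply the list of class sizes n_i. The second function uses the equivalent total form n log2 n - Σ n_i log2 n_i, which should equal n·IC.

```python
from math import isclose, log2


def information_content(class_sizes):
    """Mean information content IC = -sum(p_i * log2(p_i)) in bits per element."""
    n = sum(class_sizes)
    return -sum((n_i / n) * log2(n_i / n) for n_i in class_sizes)


def total_information(class_sizes):
    """Total information content n*log2(n) - sum(n_i * log2(n_i)) in bits."""
    n = sum(class_sizes)
    return n * log2(n) - sum(n_i * log2(n_i) for n_i in class_sizes)


# Four elements split into two equivalence classes of two elements each
sizes = [2, 2]
print(information_content(sizes))  # 1.0 bit per element
print(total_information(sizes))    # 4.0 bits in total

# The identity n * IC = total information holds for any partition
other = [3, 1, 2]
print(isclose(sum(other) * information_content(other), total_information(other)))
```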
Shannon and Weaver (1949) showed that the statistical concept of entropy
can be extended beyond the thermodynamics and applied to the process of
transmitting information. The basic Shannon’s formula to measure entropy of
information in bits can be expressed as:
$$H = n \log_2 n - \sum_{i=1}^{n} n_i \log_2 n_i$$
Where ni is the probability of randomly selecting an element of the ith class.
One of the major consequences of Shannon's theory was the radically new idea of
viewing the structure of any kind as a communication. This study was one of the
founding works in the field of information theory.
Dancoff and Quastler (1953) introduced the first molecular information
descriptor as “information on the kind of atoms in a molecule”. The elemental
composition distribution incorporates subsets of atoms of the same chemical element.
The entropy of this distribution, also called information on chemical composition, $I_{CC}$,
was proposed as a measure of the compound compositional diversity:
$$I_{CC} = n^{h} \log_2 n^{h} - \sum_{i} n_i \log_2 n_i$$
Where $n^{h}$ is the total number of atoms (hydrogen included) and ni is the number of
atoms of chemical element of type i.
Rashevsky (1955) was the first to calculate the information content of graphs
where “topologically equivalent” vertices were placed in the same equivalence class. In
Rashevsky approach, two vertices u and v of a graph are said to be topologically
equivalent if and only if for each neighboring vertex ui (i=1,2,….,k) of the vertex u,
there is a distinct neighboring vertex vi of the same degree for the vertex v. While
Rashevsky used simple linear graphs with indistinguishable vertices to symbolize
molecular structure, weighted linear graphs or multigraphs are better models for
conjugated or aromatic molecules because they more properly reflect the actual bonding
patterns, i.e. electron distribution.
Trucco (1956) refined the definition of topological equivalence of atoms in
terms of the orbits of the automorphism group of the molecular graph. This type of
molecular information content was later termed as orbit’s information descriptor, Iorb.
In the latter case, two vertices are considered equivalent if they belong to the same
orbit of the automorphism group, i.e., if they can interchange preserving the adjacency
of the graph.
Brillouin (1962) defined a complementary quantity from the Shannon entropy
H, called Brillouin redundancy descriptor-R (or redundancy descriptor), to measure
the information redundancy of the system:
$$R = 1 - \frac{H}{\log_2 N}$$
The logarithm is taken to base two for measuring the information content in bits.
Mowshowitz (1968a) described a rigorous reinterpretation of Shannon’s H
function as information content but not entropy. He pointed out that Shannon’s
function does not measure the average uncertainty per structure of a given ensemble
of all structures having the same number of elements. Rather, it is the information
content of the structure relative to a system of symmetry transformations that leave
the system invariant.
Mowshowitz (1968b) formalized the Shannon’s equation to finite systems
with a symmetry element. He introduced a probability scheme applicable to any
system having N elements partitioned into k classes according to equivalence criterion
α :
Equivalent classes 1,2,………..k
Element partition n1, n2,……nk
Probability distribution p1, p2,……pk
Where pi = ni/n is the probability for a randomly chosen element to belong to class i
having ni elements and $n = \sum_{i=1}^{k} n_i$.
Bonchev et al. (1976) introduced their first information descriptor on the basis
of grouping the atoms in a molecule into equivalence classes determined by the point
group of symmetry to which the molecule belongs. This molecular symmetry
descriptor, Isym, complements the orbit's information descriptor, in accounting for
specific molecular geometry and conformations. Linear relationships were obtained
between Isym and thermodynamic entropy for several homologous series of organic
compounds.
Bonchev and Trinajstic (1977) applied Shannon's formula to the summands
before the summation, thereby obtaining information-based TDs patterned after
various other descriptors, but having lower degeneracy. Thus instead of adding all
graph distances dij to obtain the Wiener descriptor, one first sorts these distances
according to their values into groups of gi summands having the same value i.
The Wiener descriptor is then
$$W = \sum_{i=1}^{l} i \, g_i$$
The information theory descriptors having information for adjacency,
incidence, and polynomial coefficients of the adjacency matrix and for distance of
molecular graph were defined by the following equations:
$$I_{adj} = N^{2} \log_2 N^{2} - 2(N-1) \log_2 2(N-1) - (N^{2} - 2N + 2) \log_2 (N^{2} - 2N + 2)$$
$$I_{inc} = N(N-1) \log_2 N(N-1) - 2(N-1) \log_2 2(N-1) - (N-1)(N-2) \log_2 (N-1)(N-2)$$
$$I_{pc} = Z \log_2 Z - \sum_{k=0}^{N/2} p(G,k) \log_2 p(G,k)$$
$$I_{D} = N^{2} \log_2 N^{2} - N \log_2 N - \sum_{i=1}^{m} 2k_i \log_2 2k_i$$
Where Z is Hosoya's descriptor, N is the number of vertices, p(G,k) is the probability
of a randomly chosen polynomial coefficient, and $2k_i$ is the number of times the distance value
i appears in the distance matrix. These invariants are largely defined from a combination of
the parameters used to obtain Wiener's descriptor, Hosoya descriptor and Randic
connectivity descriptor.
Bonchev et al. (1979) introduced a novel information theory based descriptor
termed as information content descriptor (I) to deal with the problem of
characterizing molecular structures. The relation describes the information of a
system, I, having N elements and expressed as:
$$I = N \log_2 N - \sum_{j=1}^{n} N_j \log_2 N_j$$
Where n represents the number of different sets of the elements and Nj is the number
of elements in the set j of the elements and summation is done over all sets of
elements.
Basak and co-workers (Basak et al., 1980; Basak and Magnuson, 1983)
developed information-theoretic descriptors that take into account all atoms in the
constitutional formula (hydrogens also being included), and consider the information
content provided by various classes of atoms based on their topological neighborhood.
Three main types of informational descriptors developed by Basak et al. (1980; 1983)
are:
Mean information content (IC) or complexity of a hydrogen-filled graph, with
vertices grouped into equivalence classes having r vertices (the equivalence is
based on the nature of atoms and bonds, in successive neighborhood groups),
SIC (structural information content) and
CIC (complementary information content).
The mean information content IC, also called Shannon's entropy H (Shannon and
Weaver, 1949) was defined as:
$$IC_r = -\sum_{i} P_i \log_2 P_i$$
Where n represents the total number of vertices of the graph and ICr is computed
using Shannon's relation. It is the most common measure of uncertainty.
The rth order structural information content SICr was defined as a normalized form of
the ICr to remove the influence of graph size:
$$SIC_r = IC_r / \log_2 n$$
The rth order complementary information content CICr measures the deviation of ICr
from its maximum value, which corresponds to the vertex partition into equivalence
classes containing one element each:
$$CIC_r = \log_2 n - IC_r$$
The ICr, SICr, and CICr descriptors can be calculated for different orders of
neighborhoods, r (r = 0, 1, 2, ...,ρ), where ρ is the radius of the molecular graph G.
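The relationship between the three descriptors can be illustrated with a small Python sketch. It assumes the simplest case, in which the zeroth-order neighborhood groups atoms by chemical element only; higher orders would need the bonding environment of each atom and are not attempted here. The function name and the ethanol example are illustrative assumptions.

```python
from collections import Counter
from math import isclose, log2


def ic_sic_cic(labels):
    """Return (IC, SIC, CIC) for vertices labelled by equivalence class."""
    n = len(labels)
    counts = Counter(labels).values()
    ic = -sum((c / n) * log2(c / n) for c in counts)
    sic = ic / log2(n)   # normalized by the maximum possible value log2(n)
    cic = log2(n) - ic   # deviation from that maximum
    return ic, sic, cic


# Hydrogen-filled graph of ethanol (C2H6O), atoms classed by element only
ethanol_atoms = ["C", "C", "O"] + ["H"] * 6
ic, sic, cic = ic_sic_cic(ethanol_atoms)
print(ic, sic, cic)

# When every atom is in its own class, IC reaches log2(n): SIC = 1 and CIC = 0
print(ic_sic_cic(["a", "b", "c", "d"]))  # (2.0, 1.0, 0.0)
```

By construction IC + CIC always equals log2 n, so SIC and CIC carry the same information as IC once the graph size is known.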
Bertz (1981) proposed a new general information descriptor that incorporates
the information on atomic composition, information on graph connections and
molecular size. The general descriptor of molecular complexity of Bertz is given by
the equation:
$$I_{BERTZ} = I_{AC} + I_{CONN} + I_{SIZE}$$
Where $I_{AC}$, $I_{CONN}$ and $I_{SIZE}$ are the information contents related to the chemical/atomic
composition, bond connectivity and the molecular size respectively. Chirality and
stereochemistry can be reflected by the distribution of connections into classes of
orbital equivalency.
Bonchev (1983) proposed information-theoretic topological descriptors. The
information-theoretic descriptors on graph distance $I_D^W$ and $\bar{I}_D^W$ are computed using
distance matrix and described as follows:
$$I_D^W = W \log_2 W - \sum_{h} g_h \, h \log_2 h$$
$$\bar{I}_D^W = -\sum_{h} g_h \frac{h}{W} \log_2 \frac{h}{W} = I_D^W / W$$
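A minimal Python sketch of the two distance-based information descriptors follows. It assumes that W is the Wiener descriptor (sum of all topological distances, each vertex pair counted once), that g_h is the number of vertex pairs at distance h, and that the two descriptors take the forms W log2 W - Σ g_h h log2 h and -Σ g_h (h/W) log2 (h/W). The function names and the propane example are illustrative.

```python
from collections import Counter, deque
from math import log2


def distance_counts(adj):
    """g_h: number of unordered vertex pairs at topological distance h."""
    counts = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for target, d in dist.items():
            if source < target:  # count each pair once
                counts[d] += 1
    return counts


def wiener_information(adj):
    """Return (W, total information, mean information) on distance magnitudes."""
    g = distance_counts(adj)
    W = sum(h * g_h for h, g_h in g.items())
    total = W * log2(W) - sum(g_h * h * log2(h) for h, g_h in g.items())
    mean = -sum(g_h * (h / W) * log2(h / W) for h, g_h in g.items())
    return W, total, mean


propane = {0: [1], 1: [0, 2], 2: [1]}
print(wiener_information(propane))  # (4, 6.0, 1.5)
```

For propane there are two distances of length 1 and one of length 2, so W = 4, the total information is 4 x 2 - 2 = 6 bits, and the mean value is 6/4 = 1.5 bits.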
Raychaudhury (1983) introduced the concept of "vertex distance complexity"
which has been found to have a high discriminating power. This local vertex invariant
calculated on the hydrogen filled molecular graph was expressed as:
$$V_i^{d} = -\sum_{j=1}^{n} \frac{d_{ij}}{S_i} \,\mathrm{lb}\, \frac{d_{ij}}{S_i} = -\sum_{g=1}^{\eta_i} {}^{g}f_i \, \frac{g}{S_i} \,\mathrm{lb}\, \frac{g}{S_i}$$
Where n is the number of vertices, g spans all of the different distances from the
vertex vi, ${}^{g}f_i$ is the number of distances from the vertex vi equal to g,
$\eta_i$ is the ith atom eccentricity, Si is the distance sum, lb stands for "log2" and the
descriptor $V_i^{d}$ is expressed in bits.
Raychaudhury et al. (1984) defined three information invariants: degree
complexity ($I^{d}$), graph vertex complexity ($H^{V}$) and graph distance complexity ($H^{D}$).
Among these, graph distance complexity was found to be the only descriptor to
discriminate well all the studied graphs:
$$H^{D} = \sum_{i=1}^{n} \frac{S_i}{I_{ROUV}} \, v_i^{d}$$
Where $v_i^{d}$ is the vertex distance complexity, Si is the ith vertex distance sum, $I_{ROUV}$ is
the Rouvray descriptor.
Bertz (1988) devised a descriptor C(m) based on information theory and the
graph invariant m using Shannon's formula, and taking into account the complexity of
the graph including the presence of heteroatoms (as vertex weights) and multiple
bonds (as edge weights):
$$C(m) = 2m \log_2 m - \sum_{i} m_i \log_2 m_i$$
Klopman and Raychaudhury (1988) described an information descriptor-
vertex distance complexity (Vd), for the vertices of a molecular graph and used the
same for qualitative evaluation of mutagenic activity of a series of non-fused ring
aromatic compounds.
King (1989) described two other information descriptors from the descriptors
of neighborhood symmetry by modifying the classical definition of Shannon’s
entropy. In particular, the modified information content descriptor (or MIC descriptor)
was proposed using the weighted atomic masses as:
$$MIC = -\sum_{i=1}^{n} m_i\,(n_i \cdot \log_2 n_i)$$
Where $m_i$ is the atomic mass of all the equivalent atoms in the ith class and $n_i$ is the
probability of selecting a vertex of class i.
The Z-modified information content descriptor (or ZMIC descriptor) was
analogously defined as:
$$ZMIC = -\sum_{i=1}^{n} n_i Z_i \cdot (n_i \log_2 n_i)$$
Where $Z_i$ is the atomic number and $n_i$ the number of atoms in the ith class.
Konstantinova and Paleev (1990) introduced the information distance
descriptor of graph vertices on the basis of the distance matrix as:
$$H_D(i) = -\sum_{j=1}^{p} \frac{d_{ij}}{d(i)} \log_2 \frac{d_{ij}}{d(i)}$$
Where $d(i) = \sum_{j=1}^{p} d_{ij}$ and $p_{ij} = d_{ij}/d(i)$ is the probability for an arbitrarily chosen vertex to
be at a distance $d_{ij}$ from the vertex i.
Skorobogatov et al. (1991) considered one more information descriptor based
on the distance matrix in structure-activity correlation. The information descriptor H2
was defined as:
$$H_2 = -\sum_{i=1}^{n} n_i \frac{S_i}{2W} \log_2 \frac{S_i}{2W}$$
$$H_2 = -\sum_{i=1}^{k} \frac{d(i)\,k_i}{2W} \log_2 \frac{d(i)\,k_i}{2W}$$
Where $k_i$, i = 1,…,k, is the number of vertices having the distance d(i), 2W is the
Rouvray descriptor, $S_i$ is the vertex distance degree of the ith atom and $n_i$ is the number of
vertices having equal vertex distance degrees in the ith class.
Balaban and Balaban (1991) defined four new information MDs, i.e. the U
descriptor, V descriptor, X descriptor and Y descriptor, on the basis of the local graph
invariants $u_i$, $v_i$, $x_i$ and $y_i$, respectively:
$$U(G) = \frac{B}{C+1} \sum_{E(G)} (u_i u_j)^{-0.5}$$
$$V(G) = \frac{B}{C+1} \sum_{E(G)} (v_i v_j)^{-0.5}$$
$$X(G) = \frac{B}{C+1} \sum_{E(G)} (x_i x_j)^{-0.5}$$
$$Y(G) = \frac{B}{C+1} \sum_{E(G)} (y_i y_j)^{-0.5}$$
Where B is the number of edges and C is the cyclomatic number of the graph. In all these
formulas the summations run over all edges in the molecular graph. These highly degenerate
MDs were obtained by applying information theory to distance degree sequences (Ivanciuc et al., 1993b).
Sahu and Lee (2004) derived a novel information theoretic topological
descriptor Ik on the basis of chemical signed graph theory or specifically edge signed
graph. The expression for this novel information theoretic topological descriptor, Ik
was defined as:
$$I_k = m\,\mathrm{lb}\,m - n\,\mathrm{lb}\,n - (n - m)\,\mathrm{lb}\,|n - m|$$
Where m and n represent the total number of positive (+) signs and total number of
negative (-) signs from edge signed graphs of the corresponding molecular graph and
k is the molecular orbital level.
Raychaudhury and Ghosh (2004) proposed a new information-theoretical
measure of similarity, INFSIM, based on Shannon's measure of the information content of a
discrete system. They used Shannon’s information-theoretical measure of the
redundancy of a system to derive the similarity measure. They also used a topological
shape and size descriptor (TSS) and a topo-physical molecular descriptor (TPMD) for
the study. These descriptors were used to carry out molecular similarity analysis
for the quantitative discrimination (active/inactive) of eleven β-lactams with respect to
the antibacterial activity of penicillin G. It was concluded that the information-theoretical
similarity measure, INFSIM, produced similarities that appear to help
classify (active/inactive) the studied compounds with significant accuracy.
Sahu and Lee (2008) deduced the novel net-sign identity information
descriptor, $I_\varepsilon$, from the molecular electronic structure on the basis of chemical graph
theory. It was defined as the sum of the squares of the numerical values
obtained for $I_k$ (Sahu and Lee, 2004) for each molecular orbital level, k, ranging from
1 to n:
$$I_\varepsilon = \sum_{k=1}^{n} I_k^2$$
Where k is the molecular orbital level.
The net-sign-identity information descriptor was utilized in QSPR studies of the
saturated and unsaturated hydrocarbons successfully.
Varmuza et al. (2009) proposed a new family of topological information
descriptors based on the full neighborhood of all atoms. They considered each atom of
a molecular structure as a subsystem and for each atom the complete neighbourhood
was characterized by an information functional fi, based on the number of atoms in all
spheres around the atom. The properties of all atoms were normalized to a sum of one
(a probability-like measure, pi) from which the information entropy was calculated.
The entropy was scaled by the number of atoms in the structure to give a molecular
descriptor E.
$$E = a\left(\mathrm{ld}\,n + \sum_{i=1}^{n} p_i\,\mathrm{ld}\,p_i\right)$$
Where $p_i$ are the “normalized” probabilities, calculated as:
$$p_i = \frac{f_i}{\sum_{j=1}^{n} f_j}$$
For each subsystem the value fi of an invariant is calculated based on the complete
neighborhood. The values of the invariants are normalized to give “probabilities” pi
that are combined to an entropy measure E.
Dehmer et al. (2010) derived entropic measures to calculate the information
content of vertex- and edge-labeled graphs and investigated the influence of the proposed
MDs on the prediction performance of the underlying graph classification problem.
They demonstrated that the application of entropic measures to graphs
representing molecules is useful for characterizing such structures meaningfully and that such
methods might be valuable for solving problems within biological network analysis.
Dehmer et al. (2012) evaluated the uniqueness of several information-
theoretic measures for graphs based on so-called information functions and compared
the results with other information descriptors and non-information-theoretic measures
such as the well-known Balaban J descriptor. They found that one of the information
measures for graphs using the information functional based on degree–degree
associations outperformed the Balaban J descriptor.
Miscellaneous graph invariants and approaches:
Geary (1954) suggested the contiguity ratio, c, on the basis of the squared
differences between contiguous areas:
$$c = \frac{n-1}{2k_1}\,\frac{{\sum}' (x_t - x_{t'})^2}{\sum_t (x_t - \bar{x})^2}$$
Where n is the number of areas, $x_t$ is the value for area t, $\bar{x}$ is the mean of all the
values, $k_t$ is the number of areas connected to area t, $k_1 = \sum_t k_t$ is twice the total number of
connections, and the primed sum runs over contiguous areas t and t'.
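The contiguity ratio can be illustrated with a short Python sketch. It assumes the form $c = \frac{n-1}{2k_1}\,\frac{\sum' (x_t - x_{t'})^2}{\sum_t (x_t - \bar{x})^2}$ and, as a further assumption, that every contiguous pair is counted once in each direction in the primed sum, which is consistent with $k_1$ being twice the number of connections. The neighbour map and the values are invented for illustration.

```python
def geary_contiguity_ratio(values, neighbours):
    """Contiguity ratio c for area values and a symmetric neighbour map."""
    n = len(values)
    mean = sum(values) / n
    # k_1: sum of the numbers of neighbours, i.e. twice the number of connections
    k1 = sum(len(nb) for nb in neighbours.values())
    # squared differences over contiguous pairs, each pair visited in both directions
    numerator = sum((values[t] - values[u]) ** 2
                    for t, nb in neighbours.items() for u in nb)
    denominator = sum((x - mean) ** 2 for x in values)
    return (n - 1) / (2 * k1) * numerator / denominator

# Four areas in a row with smoothly increasing values (illustrative data).
row_neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
c = geary_contiguity_ratio([1.0, 2.0, 3.0, 4.0], row_neighbours)
```

Values of c well below 1, as in this example, indicate that contiguous areas carry similar values.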
Wiberg (1968) proposed a bond descriptor to measure the multiplicity of bonds
between two atoms. It was defined as the sum of the squares of the bond orders ($p_{jk}$)
between any one atomic orbital and all other orbitals in a molecule. It is two times the
charge density in that orbital ($p_{jj}$) less the square of the charge density:
$$\sum_{k \neq j} p_{jk}^2 = 2p_{jj} - p_{jj}^2$$
For a unit charge density, the value is 1, whereas it goes to zero for $p_{jj} = 2$ (a
non-bonded pair) or for $p_{jj} = 0$ (an empty orbital). Correspondingly, the sum of the squares
of the bond orders to an atom corresponds to the number of covalent bonds formed by
that atom, corrected for the ionic character in each bond (Trindle, 1969).
Gutman and Randic (1977) used the algebraic concept of comparability of
functions to derive a new comparability descriptor. They suggested that structures
having an identical distribution of valencies should not be discriminated.
Moreau and Broto (1980a; 1980b) derived 2D-autocorrelation descriptors
from the molecular graph weighted by atom physicochemical properties (i.e. the atom
weightings $w_i$). The spatial autocorrelation was then evaluated by considering
separately the contributions of each different path length (lag) in the molecular
graph, as collected in the topological distance matrix. The total spatial autocorrelation
at lag k, $ATS_k$, was obtained by summing the products $w_i w_j$ over all the pairs of atoms i
and j for which the topological distance equals the lag:
$$ATS_k = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i\, w_j\, \delta(k; d_{ij}), \qquad k = 0, 1, 2, \ldots, d$$
Where w is any atomic property; N is the number of atoms in a molecule; k is the lag;
$d_{ij}$ is the topological distance between atoms i and j; d is the topological diameter,
i.e. the maximum topological distance in the molecule; and δ is a Kronecker delta function
defined as:
$$\delta(k; d_{ij}) = \begin{cases} 1 & \text{if } d_{ij} = k \\ 0 & \text{otherwise} \end{cases}$$
The autocorrelation $ATS_0$ defined for the path of length zero was calculated as:
$$ATS_0 = \sum_{i=1}^{N} w_i^2$$
i.e. the sum of the squares of the atomic properties. Typical atomic properties that can
be considered are atomic masses, polarisabilities, charges, and electronegativities
(Broto et al., 1984a; 1984b).
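The autocorrelation at each lag can be sketched in Python as follows. The sketch assumes the double sum $ATS_k = \sum_i \sum_j w_i w_j\, \delta(k; d_{ij})$ taken over all ordered atom pairs, so that lag 0 reduces to the sum of squared weights and every pair at lag k ≥ 1 contributes twice; the function names, the three-atom chain and the weights are illustrative only.

```python
from collections import deque

def topological_distances(adj):
    """All-pairs shortest path lengths (in bonds) by breadth-first search."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for start in range(n):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            dist[start][v] = d
    return dist

def moreau_broto_ats(adj, weights):
    """Return [ATS_0, ATS_1, ..., ATS_d] for a connected molecular graph."""
    dist = topological_distances(adj)
    diameter = max(max(row) for row in dist)
    ats = [0.0] * (diameter + 1)
    n = len(adj)
    for i in range(n):
        for j in range(n):  # ordered pairs, matching the double sum over i and j
            ats[dist[i][j]] += weights[i] * weights[j]
    return ats

# Three-atom chain with made-up atomic weightings (illustrative values only).
chain = [[1], [0, 2], [1]]
ats = moreau_broto_ats(chain, [1.0, 2.0, 3.0])
```

Implementations that count each unordered pair only once would report half of these values for k ≥ 1.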
Mekenyan and Bonchev (1986) outlined the Optimized Approach based on
Structural Descriptors Set (OASIS) methodology as a generalization of the Hansch
approach. A large set of calculable geometrical, topological and quantum-chemical
descriptors were utilized to characterize the molecular structures. This methodology
was reported as a second generation (Q)SAR approach for SAR studies of structurally
related compounds.
Ghose and Crippen (1987) defined atomic refractivity values of the
topological environment of each skeleton atom in the molecule as:
$$AMR = \sum_i n_i \alpha_i$$
Where $n_i$ is the number of atoms of type i and $\alpha_i$ is the corresponding atomic refractivity value.
Cramer et al. (1988) introduced a three-dimensional (3D) (Q)SAR technique
termed as comparative molecular field analysis (CoMFA) for structure/activity
correlation studies. This approach involves the alignment of a set of molecules in 3D
space. Once a suitable alignment is obtained, a steric or electrostatic field is
constructed using a probe atom. The resultant field is then correlated with the reported
activity values of the molecules.
Pal et al. (1988, 1989) developed novel topochemically arrived unique
(TAU) descriptors based on the electronic and nuclear properties of the atoms present in the
molecular graph. This scheme describes the molecular graph in terms of sets of edge
weights (E) and vertex weights (V). Four MDs, namely the composite topochemical descriptor (T),
skeletal descriptor (TR), functionality descriptor (F) and branchedness descriptor
(B), were derived from it. These descriptors were calculated from the core count and the
valence electron mobile (VEM) count. QSAR models based on these descriptors were also
developed for the inhibition of M. tuberculosis by substituted bromophenols.
Bangov (1990) conceptualized the charge-related molecular descriptor (CMI),
defined as:
$$CMI = \sum_i \sum_j \frac{L_i L_j}{d_{ij}}$$
Where $d_{ij}$ is the inter-atomic distance and $L_i$ are local descriptors characterizing each
heavy (non-hydrogen) atom i, which can be expressed as follows:
$$L_i = L_0 - n_H + Q_i$$
$L_0$ is a constant value for each atom in each hybridization state, $n_H$ is the number
of hydrogen atoms attached to a given heavy atom, and $Q_i$ is the corresponding
charge density.
Kier and Hall (1990) introduced a new set of MDs called the electrotopological
state (E-state) descriptors based on graph invariants for each atom in the molecule. The
E-state variable encodes the intrinsic electronic state of the atom as influenced by the
electronic environment of all other atoms within the topological framework of the
molecule. For simplicity, these descriptors were referred to as the E-state descriptors.
The electrotopological state $S_i$ of the ith atom in the molecule, or E-state descriptor, was
defined as:
$$S_i = I_i + \Delta I_i$$
$$\Delta I_i = \sum_{j \neq i} \frac{I_i - I_j}{d_{ij}^2}$$
Where $I_i$ is the intrinsic state of the ith atom, $\Delta I_i$ is the field effect on the ith atom,
calculated as the perturbation of the intrinsic state of the ith atom by all other atoms in the
molecule, and $d_{ij}$ is the topological distance between the ith and the jth atoms.
The intrinsic state I is based on the Kier-Hall electronegativity and is derived
from the ratio of that electronegativity to the number of skeletal σ bonds for that atom:
$$I = \frac{(2/N)^2\,\delta^v + 1}{\delta}$$
Where N is the principal quantum number and δ and $\delta^v$ are the molecular connectivity δ values:
δ = σ – h = number of connections in the skeleton
Where σ is the number of electrons in σ orbitals and h is the number of hydrogen atoms
bonded to the atom.
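A small worked example in Python may help to follow the two steps (intrinsic state, then field effect). It assumes the expressions $I = [(2/N)^2 \delta^v + 1]/\delta$ and $\Delta I_i = \sum_{j \neq i} (I_i - I_j)/d_{ij}^2$ with $d_{ij}$ taken as the topological distance, as stated in the text; implementations that instead count the atoms in the path ($d_{ij} + 1$) would give different numbers. The ethanol skeleton and the per-atom inputs are illustrative choices.

```python
def intrinsic_state(n_quantum, delta_v, delta):
    """I = ((2/N)^2 * delta_v + 1) / delta for one skeleton atom."""
    return ((2.0 / n_quantum) ** 2 * delta_v + 1.0) / delta

def e_state(intrinsic, dist):
    """S_i = I_i + sum over j != i of (I_i - I_j) / d_ij^2."""
    states = []
    for i, value_i in enumerate(intrinsic):
        field = sum((value_i - value_j) / dist[i][j] ** 2
                    for j, value_j in enumerate(intrinsic) if j != i)
        states.append(value_i + field)
    return states

# Ethanol skeleton C-C-O: (N, delta_v, delta) for CH3, CH2 and OH (illustrative inputs).
atoms = [(2, 1, 1), (2, 2, 2), (2, 5, 1)]
I = [intrinsic_state(*atom) for atom in atoms]
D = [[0, 1, 2],
     [1, 0, 1],
     [2, 1, 0]]
S = e_state(I, D)
```

The perturbation terms cancel pairwise, so the E-state values sum to the same total as the intrinsic states.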
Randic (1991a; 1991b) developed orthogonal descriptors in multivariate
regression and observed that the concept of orthogonality applies equally to molecular
properties as to descriptors, to quantum chemical descriptors as well as to ad hoc
combinations of topological descriptors. He also proposed a new structure-explicit
graph matrix P and also developed a novel molecular descriptor P’/P based on it.
$$P'/P = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} (P)_{ij}$$
The quantity P’/P represents the graphical bond order, $\pi_{e_{ij}}$, of the edge (bond)
$e_{ij}$ of G.
Bonchev et al. (1992) utilized the graph topological extrapolation method for the
modeling of polymer properties (TEMPO), which was based on the graph topological
description of the polymer elementary units by means of the normalized Wiener number
represented as a polynomial of degree 3 with respect to the number of atoms. The
method was applied to the calculation of π-electron energies and energy gaps of various
conjugated polymers, as well as to the assessment of the melting point, density,
refractive index, and specific rotation of some industrially produced polymers.
Yao et al. (1993) developed three new topological descriptors Ax1, Ax2 and Ax3
for use in multivariate analysis in structure-property relationship and structure-activity
relationship studies. Good results were obtained by using them to predict the physical
and chemical properties and biological activities of some organic compounds. The
studies also indicated that the three topological descriptors have high structural
selectivity.
Todeschini et al. (1994, 1995) developed novel 3D molecular descriptors,
termed as Weighted Holistic Invariant Molecular (WHIM) descriptors, which represent
different sources of chemical information. WHIM descriptors contain information
about the whole 3D molecular structure in terms of size, shape, symmetry and atom
distribution. These descriptors were calculated from x, y, z-coordinates of a 3D
structure of the molecule, usually from a spatial conformation of minimum energy,
within different weighting schemes in a straightforward manner and represent a very
general approach to describe molecules in a unitary conceptual framework. The
directional WHIM size descriptors were defined as the eigenvalues λ1, λ2, and λ3 of the
weighted covariance matrix of the molecule atomic coordinates; they account for the
molecular size along each principal direction. The weighted covariance matrix is a 3×3
matrix whose elements are the weighted covariances $s_{jk}$ between the jth and the kth atomic
coordinates, for j, k ∈ {1, 2, 3}, defined as follows:
$$s_{jk} = \frac{\sum_{i=1}^{A} w_i\,(q_{ij} - \bar{q}_j)(q_{ik} - \bar{q}_k)}{\sum_{i=1}^{A} w_i}$$
Where A is the number of atoms, $q_{ij}$ and $q_{ik}$ are the jth and the kth coordinates of the ith
atom, $\bar{q}$ is the corresponding average value and $w_i$ is the weight of the ith atom. Six
different weighting schemes were proposed, i.e. (1) the unweighted case U, (2) atomic
masses M, (3) the van der Waals volumes V, (4) the Mulliken atomic
electronegativities E, (5) the atomic polarizabilities P and (6) the electrotopological
state descriptors of Kier and Hall. All the weights (1)–(5) were scaled with respect to the
carbon atom; their original and scaled values are given in Todeschini and Gramatica
(1998).
The non-directional WHIM descriptors were also derived directly from the
directional WHIM descriptors. Thus, for non-directional WHIM descriptors, any
information related to the principal axes disappears and the description is related only
to a global holistic view of the molecule. These descriptors were built in such a way
as to capture the variation of molecular properties along the three principal
directions in the molecule:
$$V = \prod_{p=1}^{3} (1 + \lambda_p) - 1 = T + A + \lambda_1 \lambda_2 \lambda_3$$
$$T = \lambda_1 + \lambda_2 + \lambda_3$$
$$A = \lambda_1 \lambda_2 + \lambda_1 \lambda_3 + \lambda_2 \lambda_3$$
Where T and A are the linear and quadratic contributions to the total molecular size. V
is the complete expansion including also the third order term. λ1, λ2, and λ3 are the
eigenvalues of the weighted covariance matrix of the molecule atomic coordinates
(Todeschini and Consonni, 2009).
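The size descriptors T, A and V can be obtained without an explicit eigenvalue decomposition, because the sum of the eigenvalues of a 3×3 matrix equals its trace, the sum of their pairwise products equals the sum of its principal 2×2 minors, and their product equals its determinant. The Python sketch below uses these identities. It assumes that the averages in the covariance are weighted means (an assumption, since the text only speaks of "the corresponding average value"); the coordinates and unit weights are invented for illustration.

```python
def weighted_covariance(coords, weights):
    """3x3 weighted covariance matrix of atomic coordinates."""
    total = sum(weights)
    # assumption: coordinates are centred on the weighted mean
    means = [sum(w * q[j] for w, q in zip(weights, coords)) / total
             for j in range(3)]
    return [[sum(w * (q[j] - means[j]) * (q[k] - means[k])
                 for w, q in zip(weights, coords)) / total
             for k in range(3)] for j in range(3)]

def whim_size(s):
    """Return (T, A, V) from the covariance matrix s."""
    t = s[0][0] + s[1][1] + s[2][2]                      # sum of eigenvalues
    a = ((s[0][0] * s[1][1] - s[0][1] * s[1][0])
         + (s[0][0] * s[2][2] - s[0][2] * s[2][0])
         + (s[1][1] * s[2][2] - s[1][2] * s[2][1]))      # sum of pairwise products
    det = (s[0][0] * (s[1][1] * s[2][2] - s[1][2] * s[2][1])
           - s[0][1] * (s[1][0] * s[2][2] - s[1][2] * s[2][0])
           + s[0][2] * (s[1][0] * s[2][1] - s[1][1] * s[2][0]))  # product of eigenvalues
    return t, a, t + a + det

# Six unit-weight points placed symmetrically on the axes (illustrative geometry).
points = [(1, 0, 0), (-1, 0, 0), (0, 2, 0), (0, -2, 0), (0, 0, 3), (0, 0, -3)]
T, A, V = whim_size(weighted_covariance(points, [1.0] * 6))
```

For this geometry the covariance matrix is diagonal with entries 1/3, 4/3 and 3, so V can be cross-checked against (1 + 1/3)(1 + 4/3)(1 + 3) − 1.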
Galvez et al. (1995) demonstrated that by an adequate choice of topological
descriptors it is possible to not only predict different pharmacological activities but
also to design new active compounds, including lead drugs, in several therapeutic
scopes, with a surprising level of efficiency, especially considering the simplicity of
the calculations. They concluded that in spite of its limitations, molecular topology
ought to be considered not just as an excellent tool for molecular and drug design but
as a real alternative approach to the study of chemical bonds, whose theoretical
physicochemical basis is still to be developed.
Hall and Kier (1995) defined atom-type E-state descriptors encoding
topological and electronic information related to particular atom types in the
molecule. These descriptors were calculated by summing the E-state values of all
atoms of the same atom-type in the molecule or, alternatively, as average of the E-
state values. The electrotopological state descriptors have shown considerable
usefulness in the establishment of (Q)SAR/QSPR/QSTR equations. The ability to
focus on individual atoms has provided significant utility in their applicability.
Schuur et al. (1996) derived a molecule representation of structures based on
electron diffraction (MoRSE) code from an equation used in electron diffraction
studies that allowed the representation of the 3D structure of a molecule by a fixed
number of values. Various atomic properties were taken into account giving high
flexibility to this representation of a molecule.
$$I(s) = \sum_{i=2}^{N} \sum_{j=1}^{i-1} A_i A_j \frac{\sin(s\, r_{ij})}{s\, r_{ij}}, \qquad s = 0, \ldots, 31.0\ \text{Å}^{-1}$$
Where $A_i$ and $A_j$ are atomic properties of atoms i and j, $r_{ij}$ is the interatomic distance
and N is the number of atoms. Values of this function were calculated at 32 evenly distributed values of s in the
range of 0–31.0 Å⁻¹ from the 3D atomic coordinates of a molecule. This 3D-MoRSE
code retained important structural features such as the mass and the amount of
branching and was able to distinguish between benzene, cyclohexane, and
naphthalene derivatives in a dataset of great structural variety. For the atomic
property A, various physicochemical properties such as atomic mass, partial
atomic charges, and atomic polarisability were considered. These descriptors have
shown wide applicability in (Q)SAR/QSPR studies.
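A hedged Python sketch of the 3D-MoRSE code follows. It assumes the form $I(s) = \sum_{i=2}^{N} \sum_{j=1}^{i-1} A_i A_j \sin(s r_{ij})/(s r_{ij})$ evaluated at 32 evenly spaced values of s between 0 and 31.0 Å⁻¹, and it takes the term at s = 0 as its limiting value 1, which is an assumption not stated in the text. The coordinates and atomic properties are made up.

```python
import math

def morse_code(coords, props, n_values=32, s_max=31.0):
    """Return [I(s_0), ..., I(s_31)] for 3D coordinates and atomic properties."""
    values = []
    for step in range(n_values):
        s = s_max * step / (n_values - 1)
        total = 0.0
        for i in range(1, len(coords)):
            for j in range(i):
                x = s * math.dist(coords[i], coords[j])
                # sin(x)/x tends to 1 as x -> 0 (assumed convention for s = 0)
                total += props[i] * props[j] * (1.0 if x == 0 else math.sin(x) / x)
        values.append(total)
    return values

# Two atoms 1.5 Å apart with illustrative property values.
code = morse_code([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], [1.0, 2.0])
```

The result is a fixed-length vector of 32 values regardless of the number of atoms, which is what makes the code usable as a uniform molecular representation.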
Basak et al. (1997b) used topostructural, topochemical and geometric
parameters to develop a hierarchical QSAR approach for limiting the number of
independent variables in linear regression modeling to avoid the problem of chance
correlations. This new approach was found to be useful in illuminating the
relationships of different types of molecular description information to
physicochemical properties.
Hu and Xu (1997) devised a new topological descriptor, called the molecular
identification number, from an all-paths method. This new topological descriptor
displayed high discriminating power for various kinds of organic compounds such as
alkane trees, complex cyclic or polycyclic graphs, and structures containing
heteroatoms, and was thus used as a molecular identification number (MID06) for
chemical documentation.
Dejulian-Ortiz et al. (1998) proposed chiral MDs to consider the chirality
within a topological model. They suggested that the chiral information is related to
symmetry, which allows the topological handling of chiral atoms by weighted graphs
and the calculation of new descriptors that give a weight to the corresponding entry in
the main diagonal of the topological matrix. These chiral MDs differentiated the
pharmacological activity between pairs of enantiomers.
Ivanciuc et al. (1998b) presented two new approaches for the calculation of
atom and bond parameters for heteroatom-containing molecules. In the first approach,
the atom and bond weights were computed on the basis of relative atomic
electronegativity, using carbon as standard. The weight parameter AWX for atom i,
which utilized the relative electronegativity, was defined as:
$$AWX_i = 1 - \frac{1}{X_i}$$
In the second system, the relative covalent radii were used to compute atom and bond
weights, again with the carbon atom as standard. The bond weight parameter BWX
was defined as:
$$BWX_{ij} = \frac{1}{B\, X_i X_j}$$
The two approaches were used to define and compute topological descriptors based
on graph distance.
Borodina et al. (1998) applied a method based on topoelectrical invariants to
estimate the synthetic molecule resemblance to small endogenous bio-regulators. In
this work, each atom was characterized by its electronegativity and equilibrium
charge. The results demonstrated discriminative ability of proposed structure
description and measure of similarity.
Pearlman and Smith (1998; 1999) developed the Burden-CAS-University of
Texas eigenvalues (BCUT) descriptors on the basis of the Burden matrix (Burden, 1989,
1997), which is an adjacency matrix in which the non-diagonal elements are weighted
based on the nature of the connectivity of the atoms involved. The fundamental
modification made by them was to place atomic properties along the diagonal of the
Burden matrix. This leads to a variety of weighted Burden matrices where the weights
include atomic weight, polarizability, electronegativity and hydrogen bonding ability.
The actual descriptors were obtained by performing an eigenvalue decomposition of
the Burden matrix and taking the lowest and highest eigenvalues. It was also shown
that the extreme eigenvalues of the Burden matrix encode global information
regarding the molecule. The holistic nature of these descriptors has led to their
frequent use in studies of chemical diversity, library design and hit selection in high
throughput screens (Stanton, 1999).
Hemmer et al. (1999) represented the 3D structure of a molecule by a radial
distribution function (RDF) code. The RDF of an ensemble of N atoms was
interpreted as the probability distribution to find an atom in a spherical volume of
radius r:
$$g(r) = f \sum_{i=1}^{N-1} \sum_{j>i}^{N} A_i A_j\, e^{-B (r - r_{ij})^2}$$
Where f is a scaling factor, N is the number of atoms, $r_{ij}$ is the distance between atoms
i and j and B is a smoothing parameter. By including characteristic
atomic properties A of the atoms i and j, the RDF code can be used in different tasks
to fit the requirements of the information to be represented. These atomic properties
enable the discrimination of the atoms of a molecule for almost any property that can
be attributed to an atom.
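The RDF code can be sketched in Python as shown below. The sketch assumes the form $g(r) = f \sum_{i<j} A_i A_j\, e^{-B(r - r_{ij})^2}$, with $r_{ij}$ the interatomic distance and B a smoothing parameter; the default values of B and f, the function name and the two-atom example are illustrative assumptions.

```python
import math

def rdf_code(coords, props, r, smoothing=100.0, scale=1.0):
    """g(r): property-weighted, Gaussian-smoothed count of atom pairs at distance r."""
    total = 0.0
    n = len(coords)
    for i in range(n - 1):
        for j in range(i + 1, n):
            r_ij = math.dist(coords[i], coords[j])
            total += props[i] * props[j] * math.exp(-smoothing * (r - r_ij) ** 2)
    return scale * total

# Two atoms 1.5 Å apart with illustrative property values.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
peak = rdf_code(atoms, [1.0, 2.0], r=1.5)   # evaluated exactly at the pair distance
tail = rdf_code(atoms, [1.0, 2.0], r=3.0)   # far from any pair distance
```

Evaluating g(r) on a grid of r values yields the fixed-length descriptor vector; a larger smoothing parameter gives sharper peaks around the actual interatomic distances.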
Tuppurainen (1999) described a modification of EVA descriptors termed as
electronic eigenvalues (EEVA), for use in the derivation of predictive (Q)SAR and
QSPR models. In this approach, semi-empirical molecular orbital energies, i.e. the
eigenvalues of the Schrödinger equation, were used instead of the vibrational
frequencies of the molecule. Its performance was tested with respect to the Ah
receptor binding of polychlorinated biphenyls (PCBs), dibenzo-p-dioxins (PCDDs)
and dibenzofurans (PCDFs).
Karmarkar et al. (2000) estimated the proton-ligand formation constants of
salicylhydroxamic acids and their nuclear substituted derivatives using the
normalized Wiener’s descriptor, referred to as the mean square Wiener’s descriptor
($W_{ms}$). It was defined as the mean of the squares of the elements $d_{ij}(G)$ of the
off-diagonal submatrix:
$$W_{ms} = \frac{1}{N-1} \sum_{i<j} d_{ij}^2$$
They indicated that the normalized Wiener’s descriptor gives better results than the
Wiener’s descriptor itself.
Estrada (2000a) introduced a graph-spectrum-based invariant which is now
known as the Estrada descriptor. This descriptor was defined as:
$$EE(G) = \sum_{i=1}^{n} e^{\lambda_i}$$
Where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of the adjacency matrix of G. The
Estrada descriptor gives maximum values for the most folded structures; it is therefore
useful as a measure of the folding of molecular structures, especially protein chains.
The Estrada descriptor is also an effective measure of the centrality of complex
networks, extended atomic branching and the carbon-atom skeleton (de la Pena et al.,
2007).
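Since the sum of $e^{\lambda_i}$ over the adjacency eigenvalues equals the trace of the matrix exponential of the adjacency matrix A, the Estrada descriptor can be approximated without an eigenvalue solver by truncating the series $\sum_k \mathrm{tr}(A^k)/k!$. The Python sketch below takes this route; the number of terms, the function name and the three-vertex path are illustrative choices, and the approach is a substitute for, not a reproduction of, the eigenvalue computation.

```python
import math

def estrada_index(adjacency, terms=60):
    """Approximate EE(G) = trace(exp(A)) by a truncated power series."""
    n = len(adjacency)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0
    total, factorial = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            # power <- power * A, so that power holds A^k
            power = [[sum(power[i][m] * adjacency[m][j] for m in range(n))
                      for j in range(n)] for i in range(n)]
            factorial *= k
        total += sum(power[i][i] for i in range(n)) / factorial
    return total

# Path of three vertices; its adjacency eigenvalues are sqrt(2), 0 and -sqrt(2).
path3 = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
ee = estrada_index(path3)
reference = 2.0 * math.cosh(math.sqrt(2.0)) + 1.0  # closed form from the eigenvalues
```

The truncated series and the closed form agree to within floating-point error for this small graph.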
Palyulin et al. (2000) developed a novel approach of QSAR analysis for
organic compounds known as molecular field topology analysis (MFTA). This method
involved the construction of a molecular supergraph (MSG) by topological
superposition of the training set structures and resulted in generation of uniform
descriptor vectors based on the local physicochemical parameters (atom and bond
properties) of the molecules. They concluded that MFTA may provide prediction
models that are comparable or superior in quality of description and prediction to the
models based on the widely used classical (Q)SAR methods and 3D approaches.
Randic (2001a; 2001b) reported novel shape descriptors based on the number
of paths and the number of walks within a graph for all atoms, formed as the
quotients of the number of paths and the number of walks of the same length. The new
shape descriptors showed superior discriminating power among isomers as compared
to the kappa shape descriptors. The new descriptors offered regressions of high
quality for diverse physicochemical properties of octanes.
O’Brien and Popelier (2001) described a new molecular similarity method
called quantum topological molecular similarity (QTMS) depending on the topology
of the electron density. The QTMS method directly compares discrete topological
representations of molecules without 3D superposition, using properties evaluated at
the bond critical points (BCP) and is able to suggest a molecular fragment that contains
the active center or the part of the molecule responsible for the QSAR. QTMS was
applied to five carboxylic acid systems at five different levels of calculation. Each
level benefited from the geometry optimization of the lower level since successively
updated geometries were obtained. All levels of calculation provided very good
regression outcomes.
Cao and Yuan (2001) proposed three novel topological descriptors, OEI (odd-even
descriptor), VDI (vertex degree-distance descriptor), and RDI (ring degree-distance
descriptor), and then carried out multiple regression analysis with these
descriptors against the boiling points of paraffins and cycloalkanes.
The three descriptors are defined as:
$$OEI = \sum_{i=1}^{N} \sum_{j=1}^{N} \left[ (-1)^{D_{ij} - 1}\, S_{ij} \right]$$
Where N is the number of vertices in the molecular graph and S is the matrix derived from the
distance matrix D, whose elements are the squares of the reciprocal distances $(D_{ij})^{-2}$,
i.e. $S = [1/D_{ij}^2]$ (when i = j, let $1/D_{ij}^2 = 0$). It means that the interaction between vertices i and j is
proportional to $(D_{ij})^{-2}$.
The interaction of vertices i and j is determined not only by the distance between i
and j, but also by their vertex degrees. So VDI is defined as:
$$VDI = \left( \prod_{i=1}^{N} f_i \right)^{1/N}$$
Where $f_i$ are the elements of the (1 × N) vector VS obtained as the product V × S:
$$VS = [f_1, f_2, \ldots, f_N]$$
Because of the rigidity of the ring, the freedom of a vertex in the ring is smaller than that
in the chain. Thus another descriptor, RDI, was proposed as:
$$RDI = \left( \prod_{i=1}^{N} g_i \right)^{1/N}$$
Where $g_i$ are the elements of the (1 × N) vector RS obtained as the product R × S:
$$RS = [g_1, g_2, \ldots, g_N]$$
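The odd-even and vertex degree-distance descriptors can be illustrated with a short Python sketch. It assumes $OEI = \sum_i \sum_j (-1)^{D_{ij}-1} S_{ij}$ with $S_{ij} = 1/D_{ij}^2$ (zero on the diagonal) and $VDI = (\prod_i f_i)^{1/N}$ with $f = V \times S$, where V is taken to be the row vector of vertex degrees; that reading of V, the function name and the propane skeleton are assumptions made for illustration. RDI would follow the same pattern with a ring-degree vector R in place of V.

```python
def oei_and_vdi(dist, degrees):
    """Return (OEI, VDI) from a distance matrix and a vertex degree vector."""
    n = len(dist)
    # S: squared reciprocal distances, with zeros on the diagonal
    s = [[0.0 if i == j else 1.0 / dist[i][j] ** 2 for j in range(n)]
         for i in range(n)]
    # odd distances contribute positively, even distances negatively
    oei = sum((-1) ** (dist[i][j] - 1) * s[i][j]
              for i in range(n) for j in range(n) if i != j)
    # f = V x S, a row vector with one entry per vertex
    f = [sum(degrees[i] * s[i][j] for i in range(n)) for j in range(n)]
    product = 1.0
    for value in f:
        product *= value
    return oei, product ** (1.0 / n)

# Propane skeleton: a path of three carbon atoms (illustrative input).
D = [[0, 1, 2],
     [1, 0, 1],
     [2, 1, 0]]
OEI, VDI = oei_and_vdi(D, degrees=[1, 2, 1])
```

For propane the four ordered pairs at distance 1 contribute +1 each and the two ordered pairs at distance 2 contribute −0.25 each, giving OEI = 3.5.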
Consonni et al. (2002a) developed novel 3D GEometry, Topology, and Atom-Weights
AssemblY (GETAWAY) MDs based on an influence or leverage matrix.
These descriptors encode both the geometrical information given by the molecular
influence matrix and the topological information given by the molecular graph,
weighted by chemical information encoded in selected atomic weightings. Two sets of
MDs were devised: H-GETAWAY (calculated from the molecular influence matrix H) and
R-GETAWAY (calculated from the influence/distance matrix R) descriptors. A set of
H-GETAWAY (HGM, ITH, ISH, HIC) and R-GETAWAY (RARS, RCON and REIG) descriptors
was derived by applying some traditional matrix operators and concepts of
information theory both to the molecular influence matrix H and to the
influence/distance matrix R.
The geometric mean on the leverage magnitude ($H_{GM}$) was defined as:
$$H_{GM} = 100 \left( \prod_{i=1}^{N} h_{ii} \right)^{1/N}$$
Where N represents the number of atoms and the factor 100 scales the descriptor
values between 0 and 100. The diagonal elements $h_{ii}$ of the molecular influence
matrix are called leverages.
The total information content on the leverage equality ($I_{TH}$) and the standardized
information content on the leverage equality ($I_{SH}$) were defined as:
$$I_{TH} = A_0 \log_2 A_0 - \sum_{g=1}^{G} N_g \log_2 N_g$$
$$I_{SH} = \frac{I_{TH}}{A_0 \log_2 A_0}$$
Where $N_g$ is the number of atoms with the same leverage value and G is the number of
equivalence classes into which the atoms are partitioned according to the leverage
equality. $A_0$ represents the number of non-hydrogen atoms in the molecule.
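As a rough illustration, the two leverage-equality descriptors can be computed from a list of leverages as in the Python sketch below. It assumes $I_{TH} = A_0 \log_2 A_0 - \sum_g N_g \log_2 N_g$ and $I_{SH} = I_{TH}/(A_0 \log_2 A_0)$, takes the supplied leverages to be those of the non-hydrogen atoms, and treats leverages as equal when they agree after rounding to three decimals; the rounding tolerance, the function name and the input values are assumptions, not part of the original definition.

```python
import math
from collections import Counter

def leverage_equality_information(leverages, decimals=3):
    """Return (I_TH, I_SH) for the leverages of the non-hydrogen atoms."""
    # equivalence classes: atoms whose rounded leverages coincide
    classes = Counter(round(h, decimals) for h in leverages)
    a0 = len(leverages)
    maximum = a0 * math.log2(a0)  # requires at least two atoms
    i_th = maximum - sum(n_g * math.log2(n_g) for n_g in classes.values())
    return i_th, i_th / maximum

# Six atoms falling into leverage classes of size 2, 3 and 1 (made-up values).
I_TH, I_SH = leverage_equality_information([0.42, 0.42, 0.31, 0.31, 0.31, 0.23])
```

When all leverages are equal the total information content is zero, and when all are distinct the standardized value reaches one.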
The mean information content on the leverage magnitude ($H_{IC}$) was defined
as:
$$H_{IC} = -\sum_{i=1}^{N_H} \frac{h_{ii}}{D} \log_2 \frac{h_{ii}}{D}$$
Where D is the matrix rank (i.e. the sum of all leverages) and $N_H$ is the total number
of atoms including hydrogens.
The average row sum of the influence/distance matrix (RARS) and the R-connectivity
descriptor (RCON) were derived from the R matrix as:
$$RARS = \frac{1}{N} \sum_{i=1}^{N} RS_i$$
Where N is the number of atoms in the molecule and $RS_i$ is the ith row sum.
$$RCON = \sum_{b=1}^{B} (RS_i \cdot RS_j)_b^{0.5}$$
Where the sum runs over all bonds in the molecule and $RS_i$ and $RS_j$ indicate the row
sums of two adjacent vertices.
The third R-GETAWAY descriptor was defined in analogy with the Lovasz-Pelikan
descriptor (Lovasz and Pelikan, 1973). The R-matrix leading eigenvalue (REIG), a
descriptor of molecular branching, was calculated as the first eigenvalue of the
influence/distance matrix. RARS and REIG descriptors are closely related; their values
decrease as the molecular size increases and seem to be a little more sensitive to
molecular branching than to cyclicity and conformational changes (Todeschini and
Consonni, 2009).
In analogy with the Moreau-Broto autocorrelation descriptors, ATS (Moreau and Broto,
1980a; 1980b), the GETAWAY autocorrelation descriptors were defined by weighting
each atom of the molecule with physicochemical weights combined with the
elements of the H or R matrix, thus also accounting for the 3D features of the molecules.
HATS descriptors were defined as:
$$HATS_0(w) = \sum_{i=1}^{A} (w_i h_{ii})^2$$
$$HATS_k(w) = \sum_{i=1}^{A-1} \sum_{j>i} (w_i h_{ii})(w_j h_{jj})\, \delta(k; d_{ij}), \qquad k = 1, 2, \ldots, d$$
Where $h_{ii}$ and $h_{jj}$ are the diagonal entries corresponding to the atoms i and j in the
molecular influence matrix, $d_{ij}$ is the topological distance between atoms i and j and d
is the topological diameter. The function δ(k; $d_{ij}$) is the Kronecker delta function.
To consider also the off-diagonal elements, which provide information on the degree
of interaction between atom pairs, these H descriptors were derived:
$$H_0(w) = \sum_{i=1}^{A} h_{ii}\, w_i^2$$
$$H_k(w) = \sum_{i=1}^{A-1} \sum_{j>i} h_{ij}\, w_i w_j\, \delta(k; d_{ij}; h_{ij}), \qquad k = 1, 2, \ldots, d$$
Where $d_{ij}$ is the topological distance between atoms i and j and d is the topological
diameter. The function δ(k; $d_{ij}$; $h_{ij}$) is a Dirac delta function. The terms $H_1, H_2, \ldots, H_d$
represent autocorrelation quantities for each different path length, i.e. lag 1,
2, …, d, weighted by the molecular influence matrix. The weights used for the
GETAWAY descriptors were those proposed for the calculation of the WHIM
descriptors (Todeschini et al., 1997).
Consonni et al. (2002b) utilized GETAWAY and WHIM descriptors in
(Q)SAR/QSPR studies and suggested that the joint use of GETAWAY and WHIM
descriptors may provide more predictive models especially when the property to be
modelled depends strictly on the 3D features of the molecule.
Golbraikh et al. (2002) introduced several series of novel ZE-isomerism
descriptors for description of cis- and trans-isomers. They introduced a quantity
named ZE-isomerism correction for the vertex degrees of atoms connected by double
bonds in Z- or E-configuration following the general approach introduced earlier for
the chirality descriptors (Golbraikh et al., 2001). They included modified molecular
connectivity descriptors, Zagreb group descriptors, extended connectivity, overall
connectivity, and topological charge descriptors.
Junkes et al. (2003) proposed a new semi-empirical topological descriptor,
denoted as $I_{ET}$, for the prediction of retention indices for a diverse set of organic
compounds, i.e. alkanes, alkenes, esters, ketones, aldehydes, and alcohols. The
descriptor was based on the hypothesis that the chromatographic retention is due to the
interaction of each atom of the molecule with the stationary phase and that consequently the
value of the descriptor is reduced by steric effects from its neighbors. It can be
calculated as:
$$I_{ET} = \sum_i (C_i + \delta_i)$$
$$\delta_i = \sum_j \log C_j$$
Where Ci is the value attributed to each carbon atom i and to the functional group in
the molecule and δi is the sum of the logarithm of the value of each adjacent carbon
atom (C1, C2, C3 and C4) and/or the logarithm of the value of the functional group.
Hu et al. (2003) devised a new variable descriptor; external factor variable
connectivity descriptor (EFVCI), in which the atomic attribute is divided into two
parts: the innate part and the external part or perturbation term. The innate part was
defined in terms of the number of valence electrons, while the perturbation term by
reciprocal square distances and a variable parameter x. The local vertex invariant
relative to the ith atom was calculated as:
$$\gamma_i = Z_i^v + x \cdot VS_i(D^{-2})$$
Where Zv is the number of valence electrons and VSi is the ith row sum of the reciprocal
square distance matrix D^-2. Then, the EFVCIs were calculated by using the variable
local vertex invariants γi in place of the classic vertex degree δi in the formula of the
Kier–Hall connectivity descriptors (Kier and Hall, 1986b).
Ma et al. (2003) defined a new TD i.e., edge structure descriptor (ESI), for
evaluating the ground-state properties of one–dimensional macro– to suprabenzenoid
hydrocarbons.
ESI = (np – c x nb) / (np + na),
Where np, na, and nb are the number of phenanthrene–edge structures, acene–edge
structures and benzo[c]phenanthrene–edge structures, respectively, and c is an
empirical parameter. The ESI was shown to be effective in providing good
correlations with the ground state properties. For one dimensional macro to
suprabenzenoid hydrocarbons, ESI showed better correlations than the connectivity
descriptor.
Toropova and Toropova (2004) proposed an empirical descriptor termed as
hydrogen bond descriptor (HBI) for chloro–fluoro hydrocarbons (CFC) as:
HBI= 5000 + NH – (NCl +NF)
Where NH, NCl, and NF are the numbers of hydrogen, chlorine, and fluorine atoms,
respectively; the offset 5000 was added to numerically distinguish this descriptor
from other descriptors.
Hert et al. (2004) compared a range of different types of 2D fingerprints when
used for similarity based virtual screening with multiple reference structures. They
demonstrated the effectiveness of fingerprints that encode circular substructure
descriptors generated using the Morgan algorithm. These fingerprints were found to
be more effective than fingerprints based on a fragment dictionary, on hashing and on
topological pharmacophores. The combination of these fingerprints with data fusion
based on similarity scores was proposed to be an effective approach to virtual
screening in lead-discovery programmes.
Ivanciuc (2004) presented a new application of topological descriptors in
computing similarity matrices that were subsequently used to develop QSPR/QSAR
models. The similarity matrices were computed using four similarity descriptors,
namely the Cosine, Dice, Richards, and Good similarity descriptors. The similarity
matrices were used to develop multilinear regression QSAR models of the
anticonvulsant activity of 30 phenylacetanilides. The results showed that similarity
matrices derived from molecular graph descriptors could provide the basis for the
investigation of [(Q)SAR/SPR] relationships.
Garcia et al. (2005) developed an algorithm for the generation of molecular
graphs with a given value of the Wiener descriptor. The selection of parameters as the
interval of values for the Wiener descriptor, the diversity and occurrence of atoms and
bonds, the size and number of cycles, and the presence of structural patterns guide the
processing of the heuristics generating molecular graphs with a considerable saving in
computational cost. The modularity in the design of the algorithm allows it to be used
as a pattern for the development of other algorithms based on different topological
invariants, which allow for its use in areas of interest.
Cuadrado et al. (2006) corrected invariant-based similarity measurements
using non-isomorphic fragment (NIF) dissimilarities with the aim of achieving more
realistic similarity values. Thus, NIF information was used for correcting invariant
based similarities (approximate similarity) because external NIF substructures have
key influence on activity values. The new method for computing approximate
similarities was expressed as:
BA
BABABA
TDTD
NIFTDNIFTDabsSAS 11
,,
Where TD(NIFA) and TD(NIFB) account for the NIF fragments of molecules A and B.
The new similarity measurement method can be used for the development of fast,
cheap, and simple (Q)SAR/QSPR models.
Xu et al. (2006) devised three extended TDs for characterizing chiral
molecules as:
eAm1 = λmax1/ 2 - Am1
eAm2 = λmax2/ 2 - Am2
eAm3 = λmax3/ 2 - Am3
Where λmax1, λmax2, λmax3 are the largest eigenvalues of matrices Z1, Z2, and Z3. The
applicability of the modified TDs was demonstrated through (Q)SAR studies on D2
for dopamine receptor and α receptor activities of fourteen N-alkylated 3-(3-
hydroxyphenyl)-piperidines.
Gutierrez-Oliva et al. (2006) analyzed the application of the core-valence
bifurcation (CVB) descriptor and bond order descriptor by considering a series of
doubly hydrogen-bonded complexes. Their values are seen to be linearly related to
bond energies estimated through a bond-energy-bond-order relationship; also, the mean
value of the topological descriptor appears to be related to the complexation energy
computed by methods based on density functional theory.
Cheng and Yuan (2006) developed two novel structural descriptors namely
lone-pair electrons descriptor (LEI) and molecular volume descriptor (MVI) to
quantify the molecular electrostatic and steric effects, respectively.
hetn
i
n
j ij
i
d
LELEI
1 12
5.0
Where n is the number of vertices in hydrogen-suppressed molecular graph, nhet is the
number of heteroatoms and dij is the topological distance between vertices i and j.
)( bi
nnnLE
Where n is the principal quantum number, n and nb are the numbers of valence
electrons and bonding electrons, respectively, and x is the Pauling electronegativity.
The molecular volume descriptor (MVI) was defined as:
tn
i
n
ij ij
ji
d
VVMVI
1 12
5.0
Where Vi and Vj are the van der Waals volumes of groups i and j, respectively. Group i is
composed of vertex i and the adjacent hydrogen atoms. The utility of these descriptors
was also evaluated through QSPR modeling of diverse physicochemical properties of
four data sets.
Zhou et al. (2006) reported a novel molecular structural expression method
named three-dimensional vector of atomic interaction field (3D-VAIF) based on
electrostatic and steric interactions between different types of atoms.
Dureja and Madan (2006) utilized normalized Wiener’s topochemical
descriptor and normalized eccentric connectivity topochemical descriptor along with
molecular connectivity topochemical descriptor, Wiener’s topochemical descriptor
and eccentric connectivity topochemical descriptor for prediction of permeability
through blood brain barrier of diverse series of compounds using simpler approach.
They concluded that the high predictability of the proposed models derived from the
topochemical descriptors offers vast potential for providing compounds for the
development of potent therapeutic agents with high permeability through blood brain
barrier.
Estrada and Matamala (2007) proposed the use of the generalized
topological descriptors (GTDs), which account for several of the classical TDs in one
single graph invariant. GTDs represent points in a six-dimensional space of
topological parameters, which can be optimized for describing a specific property.
The situation shows some resemblance with the geometry optimization procedures
used to minimize molecular energy. The family of GTI-simplex descriptors comprised
of autocorrelation descriptors was defined as:
kD
k
ok pxCGTI 1
0 ,
Where the summation goes over the different topological distances in the graph, D
being the topological diameter, and accounts for the contributions ηk of pairs of
vertices located at the same topological distance k.
Using this approach, it was observed that GTI have improved QSPRs by
reducing the standard deviation by almost 50%. In addition, the current approach
permits the illustration of the similarities and differences among the different
descriptors studied, indicating possible directions for searching new optimal
molecular descriptors.
Hosoya (2007) stressed the mathematical importance of the topological
descriptor, ZG, or the so-called Hosoya descriptor. He proposed a conjecture that for a
given pair of positive integers (n1 < n2) which are coprime there exists a
series of Z-trees {Gm} with the property Z(Gm) = Z(Gm-1) + Z(Gm-2) (m ≥ 3), with Z(G1)
= n1 and Z(G2) = n2. He suggested that the role of the Z-descriptor is not limited to
elementary mathematics but will also be found in sophisticated algebraic number
theory and graph theory.
Peltason and Bajorath (2007) conceptualized structure-activity relationship
descriptor (SARI) for evaluating presence of activity cliffs and was defined as a
function of two separately calculated scores that assess intra class diversity and
activity differences of similar compounds:
SARI = 0.5 [Scorecont + (1 − Scoredisc)]
Where Scorecont and Scoredisc are the continuity and discontinuity scores, respectively.
The continuity score measures potency-weighted structural diversity within a class of
active compounds. High continuity scores reflect the presence of structurally diverse
molecules having comparable potency, which is a major characteristic of continuous
SARs. The discontinuity score determines average potency differences for pairs of
similar ligands, which reveals the presence of activity cliffs as a major determinant of
discontinuous SARs.
Veljkovic et al. (2007) described a very simple and efficient criterion based
on the electron-ion interaction potential (EIIP) and the average quasi valence number
(AQVN) to discriminate active from inactive flavonoids and selection and
optimization of lead compounds with anti-HIV-1 activity. In comparison with other
more complex approaches for in silico selection of flavonoids, EIIP/AQVN approach
showed a good correlation with anti-HIV-1 activity.
Bonacich (2007) described eigenvector centrality, x, in two equivalent ways, as
a matrix equation and as a sum. The centrality of a vertex is proportional to the sum of
the centralities of the vertices to which it is connected. It was defined as the ith
component of the eigenvector associated with the largest eigenvalue λ of the adjacency matrix A:
$$Ax = \lambda x, \qquad \lambda x_i = \sum_{j=1}^{n} a_{ij} x_j, \quad i = 1, \ldots, n$$
A second measure of centrality, beta-centrality or c(β), was also defined as a
weighted sum of paths connecting other vertices to each position, where longer paths
are weighted less:
$$c(\beta) = \sum_{k=1}^{\infty} \beta^{k-1} A^{k} \mathbf{1}$$
Where |β| < 1/λ and 1 is a vector of ones.
The advantages of eigenvector and beta-centrality over conventional graph theoretic
centrality measures such as degree, betweenness, and closeness were also
discussed in this study.
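The two centrality measures described by Bonacich can be illustrated with a minimal sketch. The function names, the three-vertex path graph, and the value β = 0.5 are assumptions chosen for illustration and are not taken from the cited study. Eigenvector centrality is approximated by power iteration on A + I (the shift leaves the eigenvectors unchanged and avoids oscillation on bipartite graphs), and beta-centrality by truncating the series.

```python
import math

def mat_vec(a, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def eigenvector_centrality(a, iterations=200):
    """Approximate the principal eigenvector of A by power iteration on A + I."""
    x = [1.0] * len(a)
    for _ in range(iterations):
        ax = mat_vec(a, x)
        y = [axi + xi for axi, xi in zip(ax, x)]  # (A + I) x
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x

def beta_centrality(a, beta, terms=200):
    """Truncated series c(beta) = sum_{k>=1} beta^(k-1) A^k 1, valid for |beta| < 1/lambda."""
    term = mat_vec(a, [1.0] * len(a))  # k = 1 term: A 1
    c = term[:]
    for _ in range(terms - 1):
        term = [beta * v for v in mat_vec(a, term)]  # next term: beta^(k-1) A^k 1
        c = [ci + ti for ci, ti in zip(c, term)]
    return c

# Hypothetical example: path graph v0 - v1 - v2 (largest eigenvalue sqrt(2))
A_path = [[0, 1, 0],
          [1, 0, 1],
          [0, 1, 0]]
x_cent = eigenvector_centrality(A_path)
c_beta = beta_centrality(A_path, beta=0.5)
```

In this example the central vertex receives the highest score under both measures, as expected from the definitions.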
Guha and Van Drie (2008) proposed the structure activity landscape (SAL)
descriptor to identify “structure-activity cliffs”, i.e., pairs of molecules which are most
similar but have the largest change in activity. The SAL descriptor for a pair of
compounds was defined as:
$$SALI_{ij} = \frac{|A_i - A_j|}{1 - Sim_{ij}}$$
Where Ai and Aj are the biological activities of the ith and jth molecules and Simij is the similarity
coefficient between the two molecules. The robustness of this method was also
demonstrated using a variety of computational control experiments.
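The SAL descriptor can be sketched in a few lines. The function name `sali` and the numerical inputs are hypothetical, and returning infinity when the similarity equals 1 is an assumption made here to keep the function total; it is not stated in the source.

```python
import math

def sali(activity_i, activity_j, similarity):
    """SAL descriptor |Ai - Aj| / (1 - Sim_ij) for one pair of molecules."""
    if similarity >= 1.0:
        # Assumed convention: identical fingerprints give an unbounded value.
        return math.inf
    return abs(activity_i - activity_j) / (1.0 - similarity)

# Hypothetical pair: activities 7.0 and 5.0 (e.g. pIC50), similarity 0.5
cliff_score = sali(7.0, 5.0, 0.5)
```

Pairs that are highly similar yet differ strongly in activity give large values, which is how activity cliffs are flagged.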
Tong et al. (2008) derived a novel descriptor, vector of principal component
scores (VSW) for weighted holistic invariant molecular descriptor, from the principal
component analysis of a matrix of 99 weighted holistic invariant molecular
descriptors of amino acids. The satisfactory results were obtained by utilizing VSW
descriptors in (Q)SAR studies for three kinds of classical peptide analogues. Their
study indicated that the novel VSW descriptors were suitable for not only small-
molecule drugs, but also for structural characterization of polypeptide sequences.
They concluded that the VSW descriptors have a great prospect in (Q)SAR studies for
polypeptide and its analogues.
Chekmarev et al. (2008) extended the application of the shape signatures
methodology to the domain of computational models for cardiotoxicity. They applied
Shape Signatures method to generate MDs for use in classification techniques such as
k-nearest neighbors (k-NN), support vector machines (SVM), and Kohonen self-
organizing maps (SOM). They concluded that the shape signatures method offers a
novel practical approach to classifying compounds with respect to their potential for
cardiotoxicity.
Burden et al. (2009) described charge fingerprints as a new type of universal
descriptors for building good (Q)SAR/QSPR models of a diverse range of
physicochemical and biological properties. The atomistic and charge fingerprint
descriptors were found more successful than the eigenvalue descriptors in building
(Q)SAR models on their own. They have suggested that universal descriptors will be
useful for modeling large data sets as well as for screening large virtual libraries.
Vukicevic (2011) presented a new measure for checking fitting ability of the
model i.e chor coefficient (rc) and compared it to Pearson correlation coefficient, r.
The chor coefficient can be calculated as:
,1min1
2rrc
,
0
3
0
2
0
1 ,,,1minDX
DX
DX
DX
DX
DX
Where r^2 is the squared value of the correlation coefficient.
Besides illustrating its advantages, he showed that it is strongly connected
with Pearson correlation coefficient and that all algorithms for optimization of r can
be applied to optimize rc with minimal programming interventions.
Verma and Hansch (2011) reviewed the various applications of the 13C-NMR
chemical shift as a (Q)SAR/QSPR descriptor. Their detailed investigation indicated that
13C-NMR chemical shifts are sufficiently rich in chemical information and are
able to encode the structural features of the molecules contributing significantly to
their biological activity, chemical reactivity, or physical characteristics. They
proposed 13C-NMR chemical shifts as a promising descriptor in classical
(Q)SAR/QSPR modelling studies.
Liu et al. (2011) derived a novel class of MDs termed as “Power keys” by
exhaustively enumerating, canonicalizing, and uniquely encoding all possible
subgraphs up to a certain length. In this work, they have demonstrated the utility of
“Power keys” in substructure searching/screening a chemical database in order to
minimize the number of molecules that need to be verified by expensive atom-to-atom
matching.
Hemmateenejad et al. (2011) proposed four different sets of amino acid (AA)
descriptors on basis of QTMS approach for use in the (Q)SAR study of peptides.
These descriptors were successfully utilized for modeling 3 data sets of peptides.
Nie et al. (2012) proposed a novel TD ′EDm by introducing the bond angle into
hidden hydrogen graph of molecules and using the geometric distance instead of the
sum of bond length between two vertices. The ′EDm was derived from ionicity
descriptor matrix Q (a subtype of distance matrix), and branching degree matrix ′G as:
′EDm = ′GSm Q (m= 1,2,3)
The utility of ′EDm was also demonstrated through development of high quality
QSPR models of 44 cis-trans isomers for alkenes. The ′EDm described the molecular
structure more accurately, and realizes unique characterization to cis-trans isomers.
Rabal and Oyarzabal (2012) developed a novel descriptor (LIR1f)
accounting for ligand-receptor interactions to define and visually explore biologically
relevant chemical space. It converts structural information into a one-dimensional
string accounting for the plausible ligand-receptor interactions as well as for
topological information. This descriptor was proposed with an aim to enable the
clustering, profiling, and comparison of libraries of compounds from a chemical
biology and medicinal chemistry perspective. The ligand receptor application of
LIR1f was demonstrated with four reported compound data sets associated with four
different target families.
Matrices associated to a molecular graph
The graph-theoretical approach to quantitative structure-activity/property
relationships [(Q)SAR/SPR] is based on a well-defined mathematical representation of
the chemical structure. In Chemical Graph Theory, molecular structures are normally
represented as hydrogen-suppressed graphs, whose vertices and edges act as atoms and
covalent bonds, respectively; a molecular graph is therefore a non-numerical
representation of the chemical structure. In order to obtain a quantitative
characterization of the molecular structure, to compare molecular structures, and to
compute various structural and topological descriptors, one has to use graphs
represented as matrices.
The calculation of the descriptors begins with the reduction of the molecule to the
hydrogen-suppressed skeleton or graph and reduction of this graph to several different
matrices depending upon what kind of entries are chosen for the atoms and bonds.
Thus a variety of matrices have been proposed in the literature (Engel, 2012). The
most commonly used matrices are as follows:
The Adjacency matrix
First identification of an organic molecule with a graph and its representation
by an adjacency matrix was made by Sylvester (1878). The adjacency matrix A is one
of the fundamental graph theoretical matrices; it represents the whole set of
connections between adjacent pairs of atoms (Trinajstic, 1983). The entries Aij of the
matrix equal 1 if vertices vi and vj are adjacent (i.e., the atoms i and j are bonded)
and zero otherwise. Thus, the adjacency matrix is a Boolean matrix with bits (0 or 1).
The adjacency matrix A = A(G) of a graph G with N vertices is the square N × N
symmetric matrix whose entry in the ith row and jth column is defined as:
[A]ij = 1 if i ≠ j and eij ∈ E(G)
      = 0 if i = j or eij ∉ E(G)
Where E(G) is the set of the edges in a connected graph G and eij is the edge formed by
atoms i and j. The diagonal elements are zero.
The ith row sum of the adjacency matrix is called the vertex degree, δi, and is defined as:
$$\delta_i = \sum_{j=1}^{A} a_{ij}$$
Where aij are the elements of the adjacency matrix and A is the number of atoms (Todeschini and Consonni, 2009).
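The construction of the adjacency matrix and the vertex degrees can be sketched as follows. The function names and the choice of the carbon skeleton of 2-methylbutane as an example are assumptions made for illustration; atoms are numbered from 0.

```python
def adjacency_matrix(n_atoms, bonds):
    """Build the N x N adjacency matrix of a hydrogen-suppressed molecular graph."""
    a = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        a[i][j] = 1
        a[j][i] = 1  # undirected graph, so A is symmetric
    return a

def vertex_degrees(a):
    """Vertex degree of atom i = sum of row i of the adjacency matrix."""
    return [sum(row) for row in a]

# Hypothetical example: 2-methylbutane skeleton, C0-C1-C2-C3 with methyl C4 on C1
bonds = [(0, 1), (1, 2), (2, 3), (1, 4)]
A = adjacency_matrix(5, bonds)
degrees = vertex_degrees(A)
print(degrees)  # [1, 3, 2, 1, 1]
```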
The Edge-adjacency Matrix
The edge-adjacency matrix of a graph G, denoted by EA (also called Bond
matrix) encodes information about the connectivity between graph edges:
[EA]ij = [Eij] = 1 if (i,j) are adjacent bonds
= 0 otherwise
The entries Eij of the matrix are equal to one if edges ei and ej are adjacent (the two
edges thus forming a path of length two) and zero otherwise (Gutman and Estrada,
1996).
The augmented adjacency matrix
To account for heteroatoms and multiple bonds in the molecule, the
augmented adjacency matrix, aA was proposed by Randic (1991b) replacing the zero
diagonal entries of the adjacency matrix of the simple graph with values
characterizing different atoms in the molecule. Molecules containing heteroatom (s)
and/or multiple bonds are represented by vertex- and edge-weighted graphs. The
adjacency matrix of a vertex- and edge-weighted molecular graph is defined by:
[aA(w)]ij = Ewij if eij ∈ E(G)
          = Vwi if i = j
          = 0 if i ≠ j and eij ∉ E(G)
Where Vwi is the parameter of the vertex vi and Ewij is the parameter of the edge eij;
the diagonal elements Vwi are usually some atomic physico-chemical properties
(Randic and Dobrowolski, 1998).
The additive adjacency matrix
The additive adjacency matrix (A) is obtained by modifying adjacency
matrix. When sum of the non-zero row elements in the adjacency matrix represents
the degree of corresponding vertex (of the vertices adjacent to vertex i) of a molecular
graph G, the matrix obtained is termed as additive adjacency matrix (Gupta et al.,
2001).
The augmentative adjacency matrix
The augmentative adjacency matrix (A) is obtained by modifying adjacency
matrix. When product of the non-zero row elements in the adjacency matrix
represents the degree of corresponding vertex (of the vertices adjacent to vertex i) of a
molecular graph G, the matrix may be defined as augmentative adjacency matrix
(Dureja and Madan, 2007).
The extended adjacency matrix
The extended adjacency matrices, denoted by ExA, are weighted N × N adjacency
matrices whose elements are defined as a function of local vertex invariants of
the adjacency matrix A and of some atomic properties (Yang et al., 1994) and can be
defined as:
[ExA]ij = aij · (δi/δj + δj/δi)/2 if i ≠ j
        = 0 if i = j
Where aij are the entries of the adjacency matrix and δ is the vertex degree (Yang et
al., 1994).
The additive chemical adjacency matrix
The additive topochemical adjacency matrix (Ac) is obtained by modifying the
adjacency matrix. When the sum of the non-zero row elements in the adjacency matrix
represents the chemical degree of the corresponding vertex (of the vertices adjacent to
vertex i) of a molecular graph G, the matrix may be defined as the additive topochemical
adjacency matrix. The chemical degree of a vertex was obtained from the adjacency
matrix by substituting, row elements corresponding to heteroatom(s), with atomic
weight with respect to carbon atom (Gupta et al., 2003).
The augmentative chemical adjacency matrix
The augmentative chemical adjacency matrix (Ac) is obtained by modifying
augmentative adjacency matrix. When product of the non-zero row elements in the
adjacency matrix represents the chemical degree of corresponding vertex (of the
vertices adjacent to vertex i) of a molecular graph G, the matrix may be defined as
augmentative chemical adjacency matrix. The chemical degree of a vertex was
calculated from the adjacency matrix by substituting the row elements corresponding
to heteroatom(s), with atomic weight with respect to carbon atom (Dureja and Madan,
2007).
The Laplacian Matrix
The Laplacian Matrix of G, L = L(G), is a square N × N symmetric matrix, N
being the number of vertices in the molecular graph, obtained as the difference
between the diagonal matrix of vertex degrees, DEG(G), and the adjacency matrix A(G). It
is defined by the following equation:
L(G) = DEG(G) − A(G)
The elements of the Laplacian Matrix are:
[L]ij = δi if i = j
      = −1 if eij ∈ E(G)
      = 0 if i ≠ j and eij ∉ E(G)
Where δi is the vertex degree of atom i.
The Laplacian Matrix is also called the Kirchhoff matrix due to its role in the
spanning tree theorem of Kirchhoff. The Laplacian Matrix offers a new method for
computing the Wiener descriptor of trees, and represents the source of new graph
invariants and topological descriptors (Mohar, 1989; Trinajstic et al., 1994).
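A minimal sketch of the Laplacian matrix and of its use in Kirchhoff's spanning tree theorem is given here. By that theorem, the number of spanning trees equals the determinant of the Laplacian with any one row and the matching column deleted. The function names and the six-membered ring example are assumptions for illustration; exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def laplacian(n, bonds):
    """L = DEG - A for a simple graph given as a list of (i, j) bonds."""
    lap = [[0] * n for _ in range(n)]
    for i, j in bonds:
        lap[i][j] -= 1
        lap[j][i] -= 1
        lap[i][i] += 1
        lap[j][j] += 1
    return lap

def determinant(m):
    """Exact determinant by Gaussian elimination (empty matrix gives 1)."""
    m = [[Fraction(x) for x in row] for row in m]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def spanning_tree_count(lap):
    """Kirchhoff's theorem: delete row 0 and column 0, take the determinant."""
    minor = [row[1:] for row in lap[1:]]
    return int(determinant(minor))

# Hypothetical examples: a six-membered ring and a four-vertex chain
ring_bonds = [(k, (k + 1) % 6) for k in range(6)]
L_ring = laplacian(6, ring_bonds)
L_chain = laplacian(4, [(0, 1), (1, 2), (2, 3)])
```

Every row of the Laplacian sums to zero, which is a quick consistency check on the construction.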
The χ Matrix
The χ Matrix is derived from the adjacency matrix by assigning the value
(δi·δj)^(-1/2) to the matrix element corresponding to the edge eij between vertices vi and vj.
[χ]ij = (δi·δj)^(-1/2) if eij ∈ E(G)
      = 0 otherwise
Where δ is the vertex degree of the atoms (Randic, 1992).
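The definition above can be sketched as follows. The function names and the n-butane skeleton are assumptions for illustration. The half sum of the off-diagonal entries of this matrix gives the Randic connectivity index, which serves here as a check.

```python
import math

def chi_matrix(n, bonds):
    """Weighted adjacency matrix with entries (deg_i * deg_j)^(-1/2) on bonded pairs."""
    deg = [0] * n
    for i, j in bonds:
        deg[i] += 1
        deg[j] += 1
    chi = [[0.0] * n for _ in range(n)]
    for i, j in bonds:
        weight = 1.0 / math.sqrt(deg[i] * deg[j])
        chi[i][j] = weight
        chi[j][i] = weight
    return chi

def half_sum(matrix):
    """Half the sum of all entries (each edge is stored twice)."""
    return sum(sum(row) for row in matrix) / 2.0

# Hypothetical example: n-butane skeleton C0-C1-C2-C3
chi = chi_matrix(4, [(0, 1), (1, 2), (2, 3)])
randic_index = half_sum(chi)  # 2/sqrt(2) + 1/2
```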
The Burden Matrix
The Burden matrix is another interesting weighted adjacency matrix from which
Burden eigenvalues are computed and used in (Q)SAR/QSPR/QSTR modeling. It
is defined as:
[B]ij = π*ij × 10^-1 if eij ∈ E(G)
      = Zi if i = j
      = 0.001 if i ≠ j and eij ∉ E(G)
Where Zi is the atomic number of atom i. The off-diagonal elements Bij are
positive real numbers that depend on whether two atoms are neighbors and, if so, on
the type of bond between them: for two bonded atoms i and j, Bij is a function of the
conventional bond order π*, i.e., 0.1, 0.2, 0.3 and 0.15 for a single, double, triple and
aromatic bond, respectively; all remaining off-diagonal elements are set to 0.001
(Burden, 1989; 1997).
The Zagreb matrix
Zagreb matrices are a generalization of the χ matrix in terms of a variable
exponent λ as:
[ZMe(λ)]ij = (δi·δj)^λ if eij ∈ E(G)
           = 0 otherwise
For λ = −1/2, the Zagreb matrix reduces to the edge-χ matrix.
The Zagreb matrices can also be considered as vertex- and edge-weighted matrices
related to the vertex- and edge-connectivity matrices. They can be formulated in terms
of the vertex- or edge-degrees (Janezic et al., 2007).
The vertex-Zagreb matrix, ZMv, was defined as (Janezic et al., 2007):
[ZMv]ij = δi^2 if i = j
        = 0 if i ≠ j
Similarly, the modified vertex-Zagreb matrix, mZMv, was defined as (Janezic et al.,
2007):
[mZMv]ij = 1/δi^2 if i = j
         = 0 if i ≠ j
The sum of the diagonal elements of the vertex Zagreb matrix results into the first
Zagreb descriptor (M1) (Gutman and Trinajstic, 1972), while the sum of the diagonal
elements of the modified vertex Zagreb matrix results into the modified first Zagreb
descriptor.
The edge-Zagreb matrix, ZMe, was defined for λ = 1 as (Janezic et al., 2007):
[ZMe]ij = δi·δj if eij ∈ E(G)
        = 0 otherwise
The modified edge-Zagreb matrix, denoted by mZMe, was defined for λ = −1 as (Janezic et
al., 2007):
[mZMe]ij = 1/(δi·δj) if eij ∈ E(G)
         = 0 otherwise
The half sum of the off-diagonal elements of the edge-Zagreb matrix is the
second Zagreb descriptor (M2) (Gutman and Trinajstic, 1972), whereas the half sum
of the off-diagonal elements of the modified edge-Zagreb matrix is the modified
second Zagreb descriptor (Janezic et al., 2007).
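The relation between the Zagreb matrices and the Zagreb descriptors can be checked with a short sketch. The function names and the 2-methylbutane skeleton are assumptions for illustration, and the exponent is restricted to integers here so that exact fractions can be used.

```python
from fractions import Fraction

def degrees_from_bonds(n, bonds):
    deg = [0] * n
    for i, j in bonds:
        deg[i] += 1
        deg[j] += 1
    return deg

def vertex_zagreb_matrix(n, bonds):
    """Diagonal matrix with deg_i^2 on the diagonal."""
    deg = degrees_from_bonds(n, bonds)
    return [[deg[i] ** 2 if i == j else 0 for j in range(n)] for i in range(n)]

def edge_zagreb_matrix(n, bonds, lam=1):
    """Entries (deg_i * deg_j)^lam on bonded pairs; lam must be an integer here."""
    deg = degrees_from_bonds(n, bonds)
    m = [[Fraction(0)] * n for _ in range(n)]
    for i, j in bonds:
        m[i][j] = m[j][i] = Fraction(deg[i] * deg[j]) ** lam
    return m

# Hypothetical example: 2-methylbutane skeleton, degrees [1, 3, 2, 1, 1]
bonds = [(0, 1), (1, 2), (2, 3), (1, 4)]
zm_v = vertex_zagreb_matrix(5, bonds)
M1 = sum(zm_v[i][i] for i in range(5))                             # trace
M2 = sum(sum(row) for row in edge_zagreb_matrix(5, bonds, 1)) / 2  # half sum
mM2 = sum(sum(row) for row in edge_zagreb_matrix(5, bonds, -1)) / 2
```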
The Distance Matrix
The distance matrix D(G), introduced by Harary (1969) is based on the very
old concept of the topological distance between vertices in a graph, which is measured
by the number of edges separating a pair of vertices. The Distance Matrix, D = D(G),
of a connected graph G is a real symmetric matrix whose elements [D]ij are defined as:
[D]ij = dij if i ≠ j
      = 0 if i = j
The distance matrix is the source of a large number of graph invariants and
topological descriptors, and its computation can be performed with various
algorithms.
The distance sum of the vertex vi, DSi, is defined as the sum of the topological
distances between vertex vi and every other vertex in the molecular graph, i.e., the
sum over row i of the D matrix:
$$DS_i = \sum_{j=1}^{N} [D]_{ij}$$
The distance sum was used to define the topological descriptor J, while the sum of the
graph distances between all the pairs of vertices defines the Wiener descriptor, W.
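One of the possible algorithms for computing the distance matrix is a breadth-first search from every vertex, sketched here together with the distance sums and the Wiener descriptor. The function names and the 2-methylbutane skeleton are assumptions for illustration, and the graph is assumed to be connected.

```python
from collections import deque

def distance_matrix(n, bonds):
    """Topological distance matrix of a connected graph via breadth-first search."""
    neighbours = [[] for _ in range(n)]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    d = []
    for source in range(n):
        dist = [None] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if dist[v] is None:  # first visit gives the shortest distance
                    dist[v] = dist[u] + 1
                    queue.append(v)
        d.append(dist)
    return d

def distance_sums(d):
    """DS_i = sum over row i of the distance matrix."""
    return [sum(row) for row in d]

def wiener(d):
    """Sum of distances over all unordered vertex pairs."""
    return sum(distance_sums(d)) // 2

# Hypothetical example: 2-methylbutane skeleton
D = distance_matrix(5, [(0, 1), (1, 2), (2, 3), (1, 4)])
DS = distance_sums(D)
W = wiener(D)
print(DS, W)  # [8, 5, 6, 9, 8] 18
```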
The reciprocal distance matrix
In a graph descriptor or topological descriptor computed on the basis of graph
distances the highest contribution to the numerical value of the descriptor is made by
the large distances between the vertices of the molecular graph. A new graph metric,
the reciprocal distance, was introduced in order to define graph descriptors in which
the contribution of the distance between two vertices decreases with the increase of
the distance. The reciprocal distance matrix of a graph G with N vertices, RD =
RD(G), is the square N × N symmetric matrix whose entries [RD]ij are equal to the
reciprocal of the distance between vertices vi and vj, i.e., 1/dij, for non-diagonal
elements, and are equal to zero for the diagonal elements:
[RD]ij = 0 if i = j
       = 1/dij if i ≠ j
Besides calculation of the Wiener number analogue, called the Harary descriptor (Plavis
et al., 1993), the D^-1 or Harary matrix was successfully used to generate new structural
descriptors and in the computer generation of acyclic graphs based on the local vertex
invariants and TDs. For vertex- and edge-weighted molecular graphs, the reciprocal
distance matrix was defined as:
[D^-1(w)]ij = [d(w)]ii if i = j
            = 1/[d(w)]ij if i ≠ j
Where d(w) is a weighted distance matrix and w denotes a weighting scheme
(Ivanciuc, 2000).
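For an unweighted graph, the reciprocal distance matrix and the Harary descriptor (taken here as the half sum of its off-diagonal entries) can be sketched as follows. The function names and the n-butane skeleton are assumptions for illustration; the graph is assumed connected and exact fractions are used.

```python
from collections import deque
from fractions import Fraction

def distance_matrix(n, bonds):
    """Topological distances by breadth-first search from each vertex."""
    neighbours = [[] for _ in range(n)]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    d = []
    for source in range(n):
        dist = [None] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if dist[v] is None:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        d.append(dist)
    return d

def reciprocal_distance_matrix(d):
    """[RD]ij = 1/dij off the diagonal and 0 on the diagonal."""
    n = len(d)
    return [[Fraction(1, d[i][j]) if i != j else Fraction(0) for j in range(n)]
            for i in range(n)]

def harary(rd):
    """Half sum of the entries of the reciprocal distance matrix."""
    return sum(sum(row) for row in rd) / 2

# Hypothetical example: n-butane skeleton, three pairs at distance 1,
# two at distance 2 and one at distance 3
RD = reciprocal_distance_matrix(distance_matrix(4, [(0, 1), (1, 2), (2, 3)]))
H = harary(RD)
```

Distant pairs contribute less than adjacent pairs, which is the stated motivation for the reciprocal metric.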
The resistance distance matrix (Ω)
The resistance distance matrix (Ω), proposed by Klein and Randic (1993), is
based on electrical network theory and considers that a single bond between two
carbon atoms from the molecular graph corresponds to a 1 ohm resistor. The resistance
distance between a given pair of vertices vi and vj is defined as the effective electrical
resistance between the vertices.
The elements of the resistance distance matrix, i.e., Ωij, are defined as:
[Ω]ij = 0 if i = j
      = Ωij if i ≠ j
For acyclic graphs, resistance distances are equal to topological distances, but
for cyclic graphs resistance distances may be smaller than or equal to topological
distances. The resistance distance was used to generate rules to characterize molecular
cyclicity and centricity (Klein and Randic, 1993).
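The behaviour on acyclic and cyclic graphs can be verified with a small sketch. It assumes a standard determinant identity that is not given in the source: Ωij equals the determinant of the Laplacian with rows and columns i and j deleted, divided by the determinant with only row and column i deleted. The function names and the example graphs are hypothetical.

```python
from fractions import Fraction

def laplacian(n, bonds):
    lap = [[0] * n for _ in range(n)]
    for i, j in bonds:
        lap[i][j] -= 1
        lap[j][i] -= 1
        lap[i][i] += 1
        lap[j][j] += 1
    return lap

def determinant(m):
    """Exact determinant by Gaussian elimination (empty matrix gives 1)."""
    m = [[Fraction(x) for x in row] for row in m]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def delete_vertices(m, removed):
    """Drop the rows and columns whose indices are in `removed`."""
    return [[x for c, x in enumerate(row) if c not in removed]
            for r, row in enumerate(m) if r not in removed]

def resistance_distance(lap, i, j):
    """Effective resistance between i and j with 1 ohm per bond (assumed identity)."""
    if i == j:
        return Fraction(0)
    return determinant(delete_vertices(lap, {i, j})) / determinant(delete_vertices(lap, {i}))

# Hypothetical examples: a three-vertex chain and a four-membered ring
chain = laplacian(3, [(0, 1), (1, 2)])
ring = laplacian(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

In the ring, two adjacent vertices are joined by a 1 ohm path in parallel with a 3 ohm path, so the resistance distance is 3/4, smaller than the topological distance of 1.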
The reciprocal resistance matrix or conductance matrix (σ)
The inverse of the resistance distance is the conductance σij between two
vertices vi and vj, calculated as follows (Klein and Ivanciuc, 2001):
[σ]ij = 1/Ωij = Σ_pij |pij|^-1
Where the sum runs over all the paths pij connecting the two considered vertices and
|pij| is the length of the considered path pij (Klein and Ivanciuc, 2001).
The conductance matrix (or electrical conductance matrix) is therefore the reciprocal
resistance matrix, whose elements are the conductance values σij between two vertices
vi and vj. Other quotient matrices derived from the resistance matrix (Babic et al.,
2002) are the distance/resistance quotient matrix, D/Ω (or topological
distance/resistance distance quotient matrix), and the resistance/distance quotient matrix,
Ω/D. The two descriptors obtained from these matrices are the D/Ω descriptor, or Wiener
sum D/Ω descriptor, and the Ω/D descriptor, or Kirchhoff sum descriptor, respectively.
The Detour matrix (Δ)
The detour matrix, together with the distance matrix, was introduced into the
mathematical literature by Frank Harary (1969). The detour matrix was introduced
into the chemical literature under the name the maximum path matrix of a molecular
graph by Ivanciuc and Balaban (1994) and independently by Amic and Trinajstic
(1995). The detour matrix Δ of a graph G (or maximum path matrix) is a square
symmetric N × N matrix, N being the number of graph vertices, whose entry i–j is the
length of the longest path from vertex vi to vj. The element of the detour matrix (or the
maximum path matrix MP), [Δ]ij, is defined as:
[Δ]ij = δij if i ≠ j
      = 0 if i = j
Where δij is the number of steps in a longest path (i.e. the maximum number of edges)
in G between vertices i and j and is called detour distance. For acyclic graphs, the
detour matrix is identical to the distance matrix, but for cyclic graphs elements in the
detour matrix may be equal to, or larger than, those of the distance matrix (Trinajstic
et al., 1997). The two types of paths, the shortest and the longest ones, can be
combined in one and the same square matrix, i.e., the detour-distance matrix, Δ-D
(originally called Maximum minimum Path, MmP, or topological distance–detour
distance combined matrix), whose entries are defined as:
[Δ-D]ij = [Δ]ij if i < j
        = [D]ij if i > j
        = 0 if i = j
In the detour-distance matrix Δ-D defined by Ivanciuc and Balaban (1994), the
elements of the upper triangle are identical to those of the detour matrix, while the lower
triangle elements are identical to those in the distance matrix (Amic and Trinajstic,
1995). Several molecular descriptors (MDs) derived from this matrix, such as the
spectral descriptors and Wiener-type descriptors, are the same as those from the
detour–distance combined matrix.
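The detour matrix can be computed for small graphs by an exhaustive depth-first enumeration of simple paths, as sketched here. The function name and the example graphs are assumptions for illustration, and the running time grows exponentially with ring complexity, so the sketch is suitable only for small molecular graphs.

```python
def detour_matrix(n, bonds):
    """Longest simple-path length between every pair of vertices (brute force)."""
    neighbours = [[] for _ in range(n)]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    delta = [[0] * n for _ in range(n)]

    def extend(start, current, visited, length):
        # Record the longest path seen so far from start to current.
        if length > delta[start][current]:
            delta[start][current] = length
        for nxt in neighbours[current]:
            if nxt not in visited:
                visited.add(nxt)
                extend(start, nxt, visited, length + 1)
                visited.remove(nxt)  # backtrack

    for s in range(n):
        extend(s, s, {s}, 0)
    return delta

# Hypothetical examples: a four-membered ring and a four-vertex chain
ring_detour = detour_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
chain_detour = detour_matrix(4, [(0, 1), (1, 2), (2, 3)])
```

In the ring, adjacent vertices have a detour distance of 3 although their topological distance is 1, whereas in the acyclic chain the detour and distance matrices coincide.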
Reciprocal detour matrix (Δ^-1)
The reciprocal detour matrix (Δ^-1) may be expressed as:
[Δ^-1]ij = 1/Δij if i ≠ j
         = 0 if i = j
All elements equal to zero are left unchanged in the reciprocal matrix. Harary detour
descriptors are derived from the reciprocal detour matrix (Diudea et al., 1998).
Detour-path matrix
The detour-path matrix, denoted as Δp, is a combinatorial matrix whose off-
diagonal entry i–j is the count of all paths of any length m (1 ≤ m ≤ Δij) that are
included within the longest path from vertex vi to vj (Δij) (Diudea, 1996a). The
diagonal entries are zero. Each entry i–j of the detour-path matrix is calculated from
the detour matrix Δ as follows:
$$[\Delta_p]_{ij} = \binom{\Delta_{ij} + 1}{2} = \frac{\Delta_{ij}^{2} + \Delta_{ij}}{2}$$
Detour-delta matrix (ΔΔ)
The detour-delta matrix (ΔΔ) is another combinatorial matrix derived as the difference
between the detour-path matrix (Δp) and the detour matrix (Δ) (Janezic et al., 2007):
ΔΔ = Δp – Δ
The distance/detour quotient matrix (or topological distance/detour distance
quotient matrix), denoted as D/Δ, is also derived from detour and distance matrices
whose off-diagonal entries are the ratio of the lengths of the shortest over the longest
path between any pair of vertices (Randic, 1997b). It is defined as:
$$[D/\Delta]_{ij} = \begin{cases} d_{ij} / \Delta_{ij} & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases}$$
Where dij and Δij are the topological and detour distances between vertices vi and vj
respectively.
The detour complement matrix (Δc) for simple graphs is defined as (Janezic et
al., 2007):
$$[\Delta_c]_{ij} = \begin{cases} N - \Delta_{ij} & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases}$$
Where N is the number of atoms.
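The matrices derived from Δ in the preceding paragraphs (reciprocal, detour-path, detour-delta, distance/detour quotient and detour complement) are all entry-wise transformations, so they can be sketched together. The following is a minimal illustration, assuming a connected graph whose distance and detour matrices are already available; the function name, the dictionary keys and the example matrices are hypothetical.

```python
from fractions import Fraction

def derived_detour_matrices(dist, det):
    """Entry-wise matrices derived from the detour matrix of a connected graph.

    Assumes every off-diagonal detour entry is at least 1, so divisions are defined.
    """
    n = len(det)

    def build(entry):
        # Apply `entry` to every off-diagonal pair; diagonal entries stay zero.
        return [[entry(i, j) if i != j else 0 for j in range(n)] for i in range(n)]

    return {
        "reciprocal": build(lambda i, j: Fraction(1, det[i][j])),
        "path": build(lambda i, j: det[i][j] * (det[i][j] + 1) // 2),
        "delta": build(lambda i, j: det[i][j] * (det[i][j] - 1) // 2),
        "quotient": build(lambda i, j: Fraction(dist[i][j], det[i][j])),
        "complement": build(lambda i, j: n - det[i][j]),
    }

# Hypothetical example: ring 0-1-2 with a pendent vertex 3 attached to vertex 0.
dist = [[0, 1, 1, 1], [1, 0, 1, 2], [1, 1, 0, 2], [1, 2, 2, 0]]
det = [[0, 2, 2, 1], [2, 0, 2, 3], [2, 2, 0, 3], [1, 3, 3, 0]]
derived = derived_detour_matrices(dist, det)
```

The "delta" entries are written in closed form here; they equal the detour-path entries minus the detour entries, which is the difference that defines the detour-delta matrix.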
The Wiener Matrix
The Wiener matrix W is a square symmetric N × N matrix proposed by Randic
in 1993, used to define new structural invariants useful in QSPR/(Q)SAR studies.
Each off-diagonal entry of the Wiener matrix corresponds to the number of external
paths in the graph that contain the path pij from vertex vi to vertex vj and is calculated
as the product of the numbers of vertices on each side of the path pij, namely, Ni,p and
Nj,p, including both vertices i and j (Randic et al., 1994a; Ivanciuc and Ivanciuc,
1999). This matrix, which is a dense Wiener matrix, is usually called the path-Wiener
matrix, denoted as Wp. Wiener matrix entries are:
$$[W_{e/p}]_{ij} = N_{i,e/p} \, N_{j,e/p}$$
Where Ni and Nj denote the number of vertices lying on the two sides of the edge/path
e/p having vertices vi and vj as the endpoints. This definition gives the “edge/path
contributions” to a global descriptor, which is identical to the Wiener descriptor when
it is defined on edges. The same equation defined on the graph paths gives a
structural descriptor that is identical to the hyper-Wiener descriptor. The Wiener
matrices were used as the basis of new topological descriptors.
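The relation between the Wiener matrices and the Wiener and hyper-Wiener descriptors can be checked numerically. The sketch below is restricted to trees (acyclic graphs), where the path between two vertices is unique; the helper names and the example (the carbon skeleton of 2-methylbutane) are illustrative assumptions.

```python
def tree_path(adj, i, j):
    """Unique path between i and j in a tree, returned as a list of vertices."""
    parent = {i: None}
    stack = [i]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)
    path = [j]
    while path[-1] != i:
        path.append(parent[path[-1]])
    return path[::-1]

def side_size(adj, start, blocked):
    """Number of vertices reachable from start without using the edge (start, blocked)."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen and not (u == start and v == blocked):
                seen.add(v)
                stack.append(v)
    return len(seen)

def wiener_matrices(adj):
    """Edge-Wiener and path-Wiener matrices of a tree given as an adjacency list."""
    n = len(adj)
    w_edge = [[0] * n for _ in range(n)]
    w_path = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            path = tree_path(adj, i, j)
            # Vertices on the i side times vertices on the j side of the path.
            value = side_size(adj, i, path[1]) * side_size(adj, j, path[-2])
            w_path[i][j] = w_path[j][i] = value
            if len(path) == 2:  # i and j are adjacent, so the path is a single edge
                w_edge[i][j] = w_edge[j][i] = value
    return w_edge, w_path

# Hypothetical example: 2-methylbutane skeleton (chain 0-1-2-3, branch 4 on vertex 1).
adj = [[1], [0, 2, 4], [1, 3], [2], [1]]
w_edge, w_path = wiener_matrices(adj)
wiener = sum(map(sum, w_edge)) // 2
hyper_wiener = sum(map(sum, w_path)) // 2
```

For this skeleton the half sum of the edge-defined matrix is 18 and that of the path-defined matrix is 28, which agree with the values obtained directly from the distances between all vertex pairs.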
The reciprocal Wiener matrix
The reciprocal Wiener matrix, denoted by W⁻¹, is the matrix whose elements are the
reciprocals of the corresponding Wiener matrix elements (Diudea, 1997c). Moreover, the
Wiener difference matrix, WΔ, was also proposed as:
WΔ = Wp – We
whose non-diagonal elements are based on path contributions calculated only on paths
longer than 1 (Diudea, 1996a).
The Szeged matrices
The Szeged matrix of a hydrogen-depleted molecular graph G is a square
unsymmetrical matrix whose off-diagonal entry i–j is the number of vertices, Ni,p, lying
closer to the focused vertex vi. This matrix was defined as (Dobrynin and Gutman, 1994;
Diudea et al., 1997b):
$$[SZ_{e/p}]_{ij} = N_{i,e/p} \, N_{j,e/p}$$
Where [SZe/p]ij are the non-diagonal entries of this matrix.
Since the Szeged matrix was not defined for cyclic structures, Gutman (1994a)
changed the meaning of Ni and Nj as follows:
$$N_{i,e/p} = \left|\{v_k \mid v_k \in V(G);\ [D]_{ik} < [D]_{jk}\}\right|$$
$$N_{j,e/p} = \left|\{v_k \mid v_k \in V(G);\ [D]_{jk} < [D]_{ik}\}\right|$$
Thus, Ni and Nj denote the cardinality of the sets of vertices closer to the two
vertices vi and vj, respectively; vertices equidistant to vi and vj are not counted. The
half sum of entries in the SZe/p matrix gives the Szeged descriptor SZc (edge version) and
the hyper-Szeged descriptor SZp (path version), respectively. The difference between SZp
and SZc gives the SZΔ matrix: SZΔ = SZp – SZc (Gutman, 1994a).
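Because the redefined Ni and Nj depend only on the distance matrix, the Szeged matrices are straightforward to sketch for cyclic graphs as well. The code below is an illustration under that reading, with hypothetical function names and a four-membered ring as the assumed example.

```python
def szeged_matrices(dist):
    """Edge- and path-defined Szeged matrices computed from a distance matrix."""
    n = len(dist)

    def closer(i, j):
        # Count vertices strictly closer to i than to j; equidistant ones are skipped.
        return sum(1 for k in range(n) if dist[i][k] < dist[j][k])

    sz_path = [[closer(i, j) * closer(j, i) if i != j else 0
                for j in range(n)] for i in range(n)]
    # The edge-defined matrix keeps only the entries of adjacent vertex pairs.
    sz_edge = [[sz_path[i][j] if dist[i][j] == 1 else 0
                for j in range(n)] for i in range(n)]
    return sz_edge, sz_path

# Hypothetical example: distance matrix of a four-membered ring 0-1-2-3.
dist = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
sz_edge, sz_path = szeged_matrices(dist)
szeged = sum(map(sum, sz_edge)) // 2
hyper_szeged = sum(map(sum, sz_path)) // 2
```

Each ring edge contributes 2 × 2 = 4, so the edge-defined half sum is 16; the two pairs of opposite vertices each add 1 × 1 to the path-defined half sum, giving 18.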
The Cluj Matrix
The Cluj matrix, CJu, a square unsymmetrical matrix, was defined by Diudea
(1997a; 1997b) following the principle of “single endpoint characterization of a path”
by using either the distance or the detour concept:
$$[CJ_u]_{ij} = N_{i,(i,j)}$$
$$N_{i,(i,j)} = \max \left|\{v_k \mid v_k \in V(G);\ [D]_{ik} < [D]_{jk};\ (i,k) \cap (i,j) = \{i\};\ |(i,j)| = \min\}\right|$$
It collects the vertices lying closer to the focused vertex i but out of the shortest path
(i,j) or, in other words, the “external” paths on the side of i, which include the path
(i,j). The above definition is valid both for acyclic and cyclic graphs. It can be used as
a basis for constructing Wiener-type descriptors as well as Schultz-type descriptors
(Diudea et al., 2002).
The Barysz Distance Matrix
The Barysz distance matrix (DZ) is a weighted distance matrix accounting
simultaneously for the presence of heteroatom(s) and multiple bond(s) in the molecule;
it is defined as:
$$[DZ]_{ij} = \begin{cases} 1 - \dfrac{Z_c}{Z_i} & \text{if } i = j \\ \displaystyle\sum_{b=1}^{d_{ij}} \dfrac{1}{\pi_b^{*}} \cdot \dfrac{Z_c^{2}}{Z_{b(1)} \, Z_{b(2)}} & \text{if } i \neq j \end{cases}$$
Where Zc is the atomic number of the carbon atom, Zi is the atomic number of the ith
atom, and π* is the conventional bond order. The sum runs over all dij bonds involved in
the shortest path between vertices vi and vj, dij being the topological distance, and the
subscripts b(1) and b(2) represent the two vertices incident to the considered bond b.
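A small worked example may make the weighting scheme concrete. The sketch below assumes the molecule is given as a list of atomic numbers and a dictionary of bond orders; these input conventions and the function name are hypothetical. Where several shortest paths exist (cyclic graphs), this sketch simply uses the first one found by breadth-first search, which is an assumption and not part of the definition above.

```python
from collections import deque
from fractions import Fraction

Z_CARBON = 6  # atomic number of carbon

def barysz_matrix(z, bonds):
    """Barysz-type weighted distance matrix.

    z: atomic numbers of the non-hydrogen atoms.
    bonds: {(u, v): conventional bond order} for each covalent bond.
    """
    n = len(z)
    weight = {}
    adj = [[] for _ in range(n)]
    for (u, v), order in bonds.items():
        # Bond weight: (1 / bond order) * Zc^2 / (Z_u * Z_v)
        w = Fraction(Z_CARBON ** 2) / (Fraction(order) * z[u] * z[v])
        weight[u, v] = weight[v, u] = w
        adj[u].append(v)
        adj[v].append(u)
    dz = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        dz[i][i] = 1 - Fraction(Z_CARBON, z[i])
        # Breadth-first search accumulates bond weights along one shortest path from i.
        total = {i: Fraction(0)}
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in total:
                    total[v] = total[u] + weight[u, v]
                    queue.append(v)
        for j, w in total.items():
            if j != i:
                dz[i][j] = w
    return dz

# Hypothetical example: acetaldehyde skeleton C(0)-C(1)=O(2).
dz = barysz_matrix([6, 6, 8], {(0, 1): 1, (1, 2): 2})
```

In this example the carbon diagonal entries are zero, the oxygen diagonal entry is 1 − 6/8, and the C=O bond contributes (1/2)(36/48) = 3/8 to every path that crosses it.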
The combinatorial matrices
Two matrices have been derived from the classical distance matrix D: the
distance-delta matrix, DΔ, and the distance-path matrix, Dp, whose elements are
calculated by a combinatorial algorithm:
$$[D_\Delta]_{ij} = \binom{[D]_{ij}}{2}$$
$$[D_p]_{ij} = \binom{[D]_{ij} + 1}{2}$$
The element [DΔ]ij counts the number of “internal” paths (longer than unity) included
in the shortest paths between vertices vi and vj; the element [Dp]ij counts all internal
paths included in the shortest paths between vertices vi and vj in a graph (Brualdi and
Ryser, 1991).
The Hosoya matrix
Randic (1994) introduced the square symmetric Hosoya matrix or Z matrix by
an analogous cutting procedure. The original Hosoya Z matrix was defined only for
acyclic graphs; each off-diagonal element is equal to the Hosoya Z descriptor of the
subgraph G′ obtained from the graph G by erasing all edges along the path connecting
two vertices vi and vj. The Z matrix entry [Z]ij corresponding to a pair of vertices vi and
vj of a tree, T, is given by:
$$[Z]_{ij} = \begin{cases} Z(T - p_{ij}) & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases}$$
Where Z(T − pij) is the Z descriptor of the spanning subgraph T − pij obtained from T by
the removal of all edges along the path pij connecting the vertices vi and vj (Randic,
1994; Milicevic et al., 2003).
A general definition of the Hosoya Z matrix (generalized Hosoya Z matrix) able to
represent both acyclic and cyclic graphs is the following:
$$[Z]_{ij} = \begin{cases} \sum_{\min p_{ij}} Z(T - p_{ij}) / p_{ij} & \text{if } i \neq j \\ Z_{ij} & \text{if } i = j \end{cases}$$
Where Z(T-pij) is the Z descriptor of the spanning subgraph (Plavšic et al., 1997).
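For the acyclic case, the cutting procedure can be sketched as follows. The code assumes the standard reading of the Hosoya Z descriptor as the number of sets of mutually non-adjacent edges (the empty set included), and it covers trees only; the generalized definition for cyclic graphs is not implemented. Function names and the example (the n-butane skeleton) are hypothetical.

```python
def hosoya_z(edges):
    """Number of sets of mutually non-adjacent edges, the empty set included."""
    if not edges:
        return 1
    (u, v), rest = edges[0], edges[1:]
    # Either the first edge is left out, or it is chosen and its neighbours are dropped.
    without_uv = hosoya_z(rest)
    with_uv = hosoya_z([e for e in rest if u not in e and v not in e])
    return without_uv + with_uv

def hosoya_matrix(n, edges):
    """Hosoya Z matrix of a tree with n vertices, by removing the edges of each path."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def path_edges(i, j):
        # Edges on the unique tree path between i and j.
        parent = {i: None}
        stack = [i]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    stack.append(v)
        found = set()
        while j != i:
            found.add(frozenset((j, parent[j])))
            j = parent[j]
        return found

    z = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            removed = path_edges(i, j)
            kept = [e for e in edges if frozenset(e) not in removed]
            z[i][j] = z[j][i] = hosoya_z(kept)
    return z

# Hypothetical example: n-butane skeleton 0-1-2-3.
edges = [(0, 1), (1, 2), (2, 3)]
z_matrix = hosoya_matrix(4, edges)
```

Removing the central edge leaves two disjoint edges, for which Z = 4, while removing the whole chain leaves only isolated vertices, for which Z = 1.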
The Path Matrix
Randic (1991a) defined the elements of the path matrix P as the quotient between
the number of paths p′ in a subgraph G′ and the number of paths p in the simple graph G.
It is a square symmetric N × N matrix whose entry in the ith row and jth column is
defined by the equation:
$$[P]_{ij} = \begin{cases} p'_{ij} / p & \text{if } i \neq j \text{ and } (i, j) \in E(G) \\ 0 & \text{otherwise} \end{cases}$$
Where p′ij is the total number of paths in the subgraph G′ = G − (i,j) and p is the total
number of paths in G. If G′ is disconnected, the contributions of each component are
added. The descriptor calculated on this matrix is called the P′/P descriptor (Randic,
1991b).
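The quotient defined above can be sketched for small graphs by brute-force path counting. The sketch assumes that "paths" means simple paths of length one or more, each counted once regardless of direction; whether paths of length zero should also be counted is not stated here, so this convention is an assumption. Function names and the example (the n-butane skeleton) are hypothetical.

```python
from fractions import Fraction

def count_paths(n, edges):
    """Number of simple paths of length >= 1, each counted once (assumed convention)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def walk(u, visited):
        # Count every extension of the current simple path that ends at u.
        total = 0
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                total += 1 + walk(v, visited)
                visited.remove(v)
        return total

    # Every path is found from both of its endpoints, hence the division by two.
    return sum(walk(s, {s}) for s in range(n)) // 2

def path_matrix(n, edges):
    """P matrix: for each edge (i, j), paths in G - (i, j) divided by paths in G."""
    p = count_paths(n, edges)
    matrix = [[Fraction(0)] * n for _ in range(n)]
    for i, j in edges:
        remaining = [e for e in edges if e != (i, j)]
        matrix[i][j] = matrix[j][i] = Fraction(count_paths(n, remaining), p)
    return matrix

# Hypothetical example: n-butane skeleton 0-1-2-3, which has 6 paths in total.
edges = [(0, 1), (1, 2), (2, 3)]
p_matrix = path_matrix(4, edges)
```

Removing the central edge disconnects the chain into two single edges, and the path counts of the two components are added, giving the entry 2/6.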
The natural distance matrix
Randic et al. (2010) recently developed a novel distance matrix, termed the
natural distance matrix (NDM), which offers scope for developing novel
graph invariants as MDs for structure-property-activity studies. The matrix elements
(i,j) of NDM are given by:
$$NDM(i, j) = [d_i + d_j - na_{ij}]^{0.5}$$
Where di and dj are the degrees (valences) of vertices i and j, and naij is the number of
vertices adjacent to both i and j (Randic et al., 2010).
The walk matrix
There are not many nonsymmetrical matrices that are of interest in chemistry.
One of the simplest is the “random walk” matrix, the elements of which are
determined by the number of walks needed to move from a point i to the point j. In
general, the number of walks from a point i to the point j is different from the number
of walks from the point j to the point i (Diudea, 1996a). The random walk matrix,
nWm, constructed on the basis of the principle of the single endpoint characterization of
a path, is a diagonal matrix whose diagonal elements are the nth-order weighted walk
degrees, nWmi, that is, the sum of the weights (the property collected in matrix M) of all
walks of length n starting from the ith vertex to any other vertex in the graph, directly
calculated as:
$${}^{n}W_{m,i} = \sum_{j=1}^{A} [M^{n}]_{ij}$$
Where A is the total number of vertices in the graph and Mn is the nth power of
the matrix M, which can be any square N × N topological matrix (Diudea, 1996a;
Diudea and Randic, 1997).
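The walk degrees are row sums of a matrix power, which a few lines of code can illustrate. The sketch below takes M to be the adjacency matrix of the propane skeleton; this choice of M, the function names and the restriction to n ≥ 1 are assumptions made for the example.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def walk_degrees(m, n):
    """Row sums of the nth power of m, i.e. the nth-order weighted walk degrees (n >= 1)."""
    power = m
    for _ in range(n - 1):
        power = mat_mul(power, m)
    return [sum(row) for row in power]

# Hypothetical example: adjacency matrix of the propane skeleton 0-1-2.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
first_order = walk_degrees(adjacency, 1)
second_order = walk_degrees(adjacency, 2)
third_order = walk_degrees(adjacency, 3)
```

With the adjacency matrix as M, the first-order values are the ordinary vertex degrees, and higher orders count the walks of the given length that start at each vertex.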
The pendent matrix
The pendent matrix was proposed to enhance the role of terminal or pendent
vertices in (Q)SAR and QSPR studies. The pendent matrix (Dp) of a graph G is a
submatrix of distance matrix, obtained by retaining the columns corresponding to
pendent vertices (Gupta et al., 1999).
$$[D_p]_{ij} = \begin{cases} d_{ij} & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases}$$
where dij is the length of the path that contains the least number of edges between
vertex i and vertex j in graph G.
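The column selection that produces the pendent matrix can be sketched as follows, assuming a connected graph given as an adjacency list and taking pendent vertices to be those of degree one. The function name and the example (the 2-methylbutane skeleton) are hypothetical.

```python
from collections import deque

def pendent_matrix(adj):
    """Columns of the distance matrix that correspond to pendent (degree-one) vertices."""
    n = len(adj)
    pendent = [v for v in range(n) if len(adj[v]) == 1]
    rows = []
    for s in range(n):
        # Breadth-first search gives the shortest-path lengths from s.
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        rows.append([dist[v] for v in pendent])
    return pendent, rows

# Hypothetical example: 2-methylbutane skeleton (chain 0-1-2-3, branch 4 on vertex 1).
adj = [[1], [0, 2, 4], [1, 3], [2], [1]]
pendent, rows = pendent_matrix(adj)
```

The result has one row per vertex and one column per pendent vertex, here vertices 0, 3 and 4.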
The chemical pendent matrix
The chemical pendent matrix is obtained by modifying the pendent matrix. The
chemical distance of a terminal or pendent vertex is obtained from the pendent
matrix by replacing the row elements corresponding to heteroatom(s) with their atomic
weights relative to that of the carbon atom (Gupta et al., 1999; Gupta et al., 2002b;
Goyal et al., 2010).
A review of the literature reveals that although a large number of topology-based
molecular descriptors have been reported, many of them either contain
information similar to that of others or are information-poor. Accordingly, only a small
fraction of these MDs have been successfully employed in structure-activity studies.
As a consequence, there is a strong need to develop novel topology-based descriptors
having very high discriminating power, low degeneracy and no correlation with the
existing topology-based descriptors, so as to accelerate the process of lead
discovery/optimization in a cost-effective manner. Moreover, novel
descriptors with these desired features may give better insight into the
structure-activity/property relationship.