Characterization of Chemical Libraries Using Scaffolds and Network Models
![Page 1: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/1.jpg)
Characterization of Chemical Libraries Using Scaffolds and Network Models
Dac-Trung Nguyen, Rajarshi Guha, NIH NCATS
ACS National Meeting, Boston 2015
![Page 2: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/2.jpg)
Outline
OR
![Page 3: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/3.jpg)
Motivations
• Library comparison is usually driven by a need to construct or expand a library
  – Often with constraints on resources
• Two classes of features to consider
  – Compound-centric (physchem properties, bioactivity, target preferences)
  – Library-centric (diversity, chemical space coverage)
• Library comparisons generally reduce to
  – Distributions of compound features (univariate)
  – Overlap in some chemical space (multivariate)
![Page 4: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/4.jpg)
Comparing Libraries
• Most comparisons employ a reduced (numerical) representation of the structure
  – Fingerprints, BCUTs, physicochemical descriptors
• Perform comparisons in the new space
  – PCA, SOM, MDS, GTM, …
Schamberger et al, DDT, 2011, 16, 636-641; Kireeva et al, Mol. Inf., 2012, 31, 301-312
![Page 5: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/5.jpg)
Scaffolds & Networks
• Scaffolds represent a chemically meaningful reduced representation of the structures
• Can be challenging to define what a (good) scaffold is
• A network representation of the collection of structures allows for novel ways to perform library comparisons
  – How fine grained can such comparisons be?
![Page 6: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/6.jpg)
Scaffold Network Representations
• Scaffolds are generated by exhaustive enumeration of the SSSR (smallest set of smallest rings)
• Scaffolds are nodes, connected by directed edges
• Nodes are labeled by a hash key of the scaffold
Figure: example networks for 4 compounds and for 1912 compounds
![Page 7: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/7.jpg)
Scaffold Network Construction
• A scaffold network is a directed graph
• Edges denote sub/super-structure relationships between scaffolds
• Each node in the network represents a unique scaffold
• Singletons are acyclic molecules
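The construction above can be sketched with a toy stand-in for real chemistry. This sketch assumes each molecule is given simply as a set of ring labels and that every non-empty subset of rings counts as a scaffold; a real implementation would respect ring connectivity and use a cheminformatics toolkit. The function names, the truncated SHA-1 key, and the edge direction (substructure to superstructure) are all assumptions made for illustration, not details from the slides.

```python
import hashlib
from itertools import combinations


def scaffold_key(rings):
    """Hash key for a scaffold, here just a digest of its sorted ring labels."""
    text = "|".join(sorted(rings))
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:12]


def enumerate_scaffolds(rings):
    """All non-empty ring subsets (toy version of exhaustive enumeration)."""
    rings = sorted(rings)
    for size in range(1, len(rings) + 1):
        for combo in combinations(rings, size):
            yield frozenset(combo)


def build_scaffold_network(molecules):
    """Return (nodes, edges): nodes maps key -> ring set, edges is a set of
    (substructure_key, superstructure_key) pairs that differ by one ring."""
    nodes, edges = {}, set()
    for rings in molecules:
        for scaffold in enumerate_scaffolds(rings):
            nodes[scaffold_key(scaffold)] = scaffold
            if len(scaffold) > 1:
                for ring in scaffold:
                    child = scaffold - {ring}
                    edges.add((scaffold_key(child), scaffold_key(scaffold)))
    return nodes, edges


# Two hypothetical molecules described only by their rings.
molecules = [{"benzene", "pyridine"}, {"benzene", "pyridine", "furan"}]
nodes, edges = build_scaffold_network(molecules)
print(len(nodes), "scaffolds,", len(edges), "edges")
```

Because nodes are keyed by hash, a scaffold shared by several molecules appears only once, which is what makes each node a unique scaffold.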
![Page 8: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/8.jpg)
Datasets
CL1420, 31320 compounds
CL886, 3552 compounds
MIPE, 1920 compounds
Natural Products, 5000 compounds
Mathews and Guha et al, PNAS, 2014, 111, 11365; Singh et al, JCIM, 2009, 49, 1010
LOPAC, 1280 compounds
1079 nodes, 115287 edges, 69 trees
2131 nodes, 1843 edges, 129 trees
Approved, investigational drugs, constructed for functional diversity
Diverse library, designed for enrichment of bioactivity
15283 nodes, 13622 edges, 729 trees
5563 nodes, 4832 edges, 239 trees
23716 nodes, 21468 edges, 750 trees
![Page 9: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/9.jpg)
Scaffold Network Representations
• The overall structure of the complete network can characterize the library
• But distributions of vertex-level network metrics may be informative
• We can also consider approaches to identify “important” scaffolds
![Page 10: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/10.jpg)
Metrics for the Complete Network
• Examined vertex-level measures of centrality
  – Closeness, betweenness, …
  – High similarity of MIPE & NP and low similarity of LOPAC & NP is surprising (Ertl et al, JCIM, 2008)
Figure: density of log10(betweenness) and number of scaffolds vs. log closeness (in-degree), per library (CL1420, CL886, LOPAC, MIPE, NP)
![Page 11: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/11.jpg)
Metrics for the Complete Network
• Useful to summarize distributions by scalar metrics
• Path length metrics are not discriminatory due to many short paths
• Extent of clustering differs but is quite low overall
Figure: centralization, CPL and transitivity values per library (CL1420, CL886, LOPAC, MIPE, NP)
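Two of these scalar summaries can be sketched in plain Python. The sketch assumes that CPL stands for characteristic path length (mean shortest-path length), that transitivity is the global clustering coefficient, and that the network is treated as undirected; the slides do not state these details, and the adjacency-dict input is a choice made for illustration.

```python
from collections import deque
from itertools import combinations


def transitivity(adj):
    """Global clustering: closed connected triples / all connected triples."""
    closed = total = 0
    for node, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            total += 1
            if b in adj[a]:
                closed += 1
    return closed / total if total else 0.0


def characteristic_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs."""
    lengths = []
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        lengths.extend(d for n, d in dist.items() if n != source)
    return sum(lengths) / len(lengths) if lengths else 0.0


# Toy undirected graph: a triangle (a, b, c) with a pendant node d on c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(transitivity(adj), characteristic_path_length(adj))
```

A tree has no triangles, so its transitivity is zero; low values across libraries are therefore consistent with networks that are close to tree-like.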
![Page 12: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/12.jpg)
Comparing Complete Networks
• Library overlap is characterized by the set of common scaffolds
• Scaffolds can be ranked (e.g., PageRank)
  – Small fragments have low PR
  – Large frameworks have high PR
  – Interesting scaffolds lie in between?
• Similar libraries will have common scaffolds with similar PageRank values
Figure: for each of two libraries, the PageRank vector is subset to the common fragments; the two subsets are compared by a normalized dot product
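The comparison pipeline (PageRank per library, restrict to common scaffolds, normalized dot product) can be sketched as follows. This is a plain power-iteration PageRank written for illustration; the damping factor of 0.85, the iteration count, and the function names are assumptions rather than settings taken from the slides.

```python
import math


def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Power-iteration PageRank; edges are (source, target) pairs."""
    nodes = list(nodes)
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {n: (1 - damping + damping * dangling) / len(nodes) for n in nodes}
        for n in nodes:
            for dst in out[n]:
                new[dst] += damping * rank[n] / len(out[n])
        rank = new
    return rank


def pagerank_similarity(rank_a, rank_b):
    """Normalized dot product of two PageRank vectors on the shared scaffolds."""
    common = sorted(set(rank_a) & set(rank_b))
    if not common:
        return 0.0
    dot = sum(rank_a[k] * rank_b[k] for k in common)
    norm_a = math.sqrt(sum(rank_a[k] ** 2 for k in common))
    norm_b = math.sqrt(sum(rank_b[k] ** 2 for k in common))
    return dot / (norm_a * norm_b)


# Two hypothetical libraries that share scaffolds s1 and s2.
lib_a = pagerank({"s1", "s2", "s3"}, [("s1", "s2"), ("s2", "s3")])
lib_b = pagerank({"s1", "s2", "s4"}, [("s2", "s1"), ("s1", "s4")])
print(round(pagerank_similarity(lib_a, lib_b), 3))
```

With edges pointing from substructure to superstructure (an assumed direction), larger frameworks collect rank from their many substructures, which matches the low-PR fragments and high-PR frameworks described above.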
![Page 13: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/13.jpg)
Comparing Complete Networks
         CL1420  CL886  LOPAC  MIPE  NP
CL1420   1       0      0      0     0
CL886    0       1      0      0     0
LOPAC    0       0      1      0.2   0.3
MIPE     0       0      0.2    1     0.3
NP       0       0      0.3    0.3   1
![Page 14: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/14.jpg)
Scaffold Recognition
• What is a scaffold?
• Can be addressed through the scaffold network
  – A scaffold is a hub within the scaffold network
• Provides a practical answer to “What are the missing scaffolds in my library?”
• Examples of scaffolds found in MIPE but not in NP
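Both ideas on this slide reduce to simple operations once scaffolds are keyed by hash. The sketch below assumes a hub is a node whose total degree reaches a chosen threshold; that threshold, and the function names, are hypothetical, since the slides do not say how hubs are selected.

```python
def missing_scaffolds(reference_keys, library_keys):
    """Scaffold hash keys present in the reference but absent from the library."""
    return set(reference_keys) - set(library_keys)


def hubs(edges, min_degree=3):
    """Nodes whose total degree (in + out) reaches min_degree."""
    degree = {}
    for src, dst in edges:
        degree[src] = degree.get(src, 0) + 1
        degree[dst] = degree.get(dst, 0) + 1
    return {n for n, d in degree.items() if d >= min_degree}


# Hypothetical hash keys for two libraries.
only_in_first = missing_scaffolds({"k1", "k2", "k3"}, {"k2"})
hub_nodes = hubs([("a", "h"), ("b", "h"), ("h", "c"), ("c", "d")])
print(sorted(only_in_first), sorted(hub_nodes))
```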
![Page 15: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/15.jpg)
Scaffold Comparison
![Page 16: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/16.jpg)
Reduced Network Representation
• The complete network can be reduced to a forest of trees
• Order nodes by out-degree
• From each node, traverse network until a terminal node is reached
• Result is a set of spanning trees
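One possible reading of these steps is a greedy traversal that gives every node at most one parent. The sketch assumes nodes are visited in order of decreasing out-degree and that the traversal is depth-first; the slides do not specify the sort direction, tie-breaking, or traversal order, so this is an interpretation rather than the authors' procedure.

```python
def reduce_to_forest(nodes, edges):
    """Greedy spanning forest: returns the subset of edges kept as tree edges."""
    succ = {n: [] for n in nodes}
    for src, dst in edges:
        succ[src].append(dst)
    # Visit nodes with the largest out-degree first (ties broken by name).
    order = sorted(nodes, key=lambda n: (-len(succ[n]), n))
    visited, tree_edges = set(), []
    for root in order:
        if root in visited:
            continue
        visited.add(root)
        stack = [root]
        while stack:
            node = stack.pop()
            for nxt in sorted(succ[node]):
                if nxt not in visited:  # each node gets at most one parent
                    visited.add(nxt)
                    tree_edges.append((node, nxt))
                    stack.append(nxt)
    return tree_edges


# Toy diamond-shaped network: s4 is reachable from s1 by two routes.
nodes = ["s1", "s2", "s3", "s4"]
edges = [("s1", "s2"), ("s1", "s3"), ("s2", "s4"), ("s3", "s4")]
forest = reduce_to_forest(nodes, edges)
print(forest, "trees:", len(nodes) - len(forest))
```

Since each kept edge attaches one previously unvisited node, the number of trees equals the number of nodes minus the number of kept edges.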
![Page 17: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/17.jpg)
Reduced Network Representation
MIPE, 1912 compounds
![Page 18: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/18.jpg)
Network Structure
• A scaffold forest is characterized by
  – Disconnected components
    • structurally related scaffolds, scaffold diversity
  – Singletons
    • scaffolds with no superstructure
  – Branching within connected components
    • scaffold complexity
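The three descriptors can be read off a forest with a few lines of bookkeeping. This sketch assumes the forest is given as a node list plus (parent, child) tree edges and uses the maximum number of children as a crude branching indicator; both the representation and that indicator are choices made here for illustration.

```python
def forest_summary(nodes, tree_edges):
    """Count trees, singletons, and the largest number of children of any node."""
    root_of = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while root_of[n] != n:
            root_of[n] = root_of[root_of[n]]
            n = root_of[n]
        return n

    children = {n: 0 for n in nodes}
    touched = set()
    for src, dst in tree_edges:
        root_of[find(src)] = find(dst)
        children[src] += 1
        touched.update((src, dst))
    return {
        "trees": len({find(n) for n in nodes}),
        "singletons": len(set(nodes) - touched),
        "max_branching": max(children.values(), default=0),
    }


summary = forest_summary(["a", "b", "c", "d", "e"], [("a", "b"), ("a", "c")])
print(summary)
```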
![Page 19: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/19.jpg)
Forest Size vs Library Size
• A large library doesn’t imply a large forest
• Forest size is a function of scaffold diversity
Figure: CL1420, 31K combinatorial library; MIPE, 1912 (target) diverse library
![Page 20: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/20.jpg)
Summarizing Forests
• A key feature is the nature of branching in individual trees
• Characterized by ID, an information-theoretic descriptor of branching derived from the distance matrix
Bonchev & Trinajstić, IJQC, 1978, 14, 293-303
Figure: example trees with ID = 978, ID = 90794, ID = 3456, ID = 979252
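The slide does not give the formula for ID. As an assumption, the sketch below uses the total information on distance magnitudes, a form commonly associated with Bonchev and Trinajstić's work: I = W log2 W − Σ_k g_k · k · log2 k, where W is the sum of all pairwise distances and g_k is the number of node pairs at distance k. The exact descriptor used in the talk may differ.

```python
import math
from collections import Counter, deque


def distance_counts(adj):
    """Count unordered node pairs at each shortest-path distance (BFS)."""
    counts = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for node, d in dist.items():
            if d > 0:
                counts[d] += 1
    # Every unordered pair was seen twice, once from each endpoint.
    return {d: c // 2 for d, c in counts.items()}


def distance_information(adj):
    """I = W*log2(W) - sum_k g_k * k * log2(k), with W the sum of distances."""
    counts = distance_counts(adj)
    wiener = sum(d * g for d, g in counts.items())
    if wiener == 0:
        return 0.0
    return wiener * math.log2(wiener) - sum(
        g * d * math.log2(d) for d, g in counts.items()
    )


# Two 4-node trees: an unbranched path and a star with three leaves.
path4 = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
star4 = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(distance_information(path4), distance_information(star4))
```

Under this assumed form, the value depends only on the distance matrix, so two trees with the same number of nodes but different branching get different values.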
![Page 21: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/21.jpg)
Summarizing Forests
• Distribution of ID distinguishes datasets primarily in the tails
• Aggregating by mean ID still discriminates well
  – Driven by the tails
Figure: density of log10(ID) and mean log10(ID), per library (CL1420, CL886, LOPAC, MIPE, NP)
![Page 22: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/22.jpg)
Exploring the Forest
• The metric also allows us to drill down
  – Select scaffolds of given branching complexity
  – Identify scaffolds of given complexity range across different libraries (equivalent to finding holes in scaffold coverage)
Figure: LOPAC, ID = 10214 ≈ MIPE, ID = 10197
![Page 23: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/23.jpg)
Library Comparison via Merging
• … reduces to comparing networks
• We compute a graph union and construct new edges between nodes with the same hash
• How does the network structure of the union differ from the original networks?
• Can be extended to merge more than two networks
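The merge step can be sketched by tagging each node with its source library and adding a bridging edge wherever the same hash appears in both. The (library, hash) node representation and the function name are assumptions made for this illustration.

```python
def merge_networks(net_a, net_b, label_a="A", label_b="B"):
    """Union of two networks; nodes are (library, hash) pairs.

    Each network is a (hashes, edges) tuple. Bridging edges join the two
    copies of any hash found in both libraries.
    """
    hashes_a, edges_a = net_a
    hashes_b, edges_b = net_b
    nodes = {(label_a, h) for h in hashes_a} | {(label_b, h) for h in hashes_b}
    edges = {((label_a, s), (label_a, t)) for s, t in edges_a}
    edges |= {((label_b, s), (label_b, t)) for s, t in edges_b}
    bridges = {((label_a, h), (label_b, h)) for h in set(hashes_a) & set(hashes_b)}
    return nodes, edges, bridges


# Hypothetical networks that share the scaffold with hash "y".
first = ({"x", "y"}, {("x", "y")})
second = ({"y", "z"}, {("y", "z")})
merged_nodes, merged_edges, bridges = merge_networks(first, second)
print(len(merged_nodes), len(merged_edges), sorted(bridges))
```

Keeping the library label on every node means later steps can still tell which library a node came from, and merging a third network only requires another label.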
![Page 24: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/24.jpg)
Source Forests
• Structurally similar networks
• 2659 identical nodes
• Construct union by connecting nodes with identical hash
Figure: LOPAC and MIPE forests
![Page 25: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/25.jpg)
Merged Network
• Green edges “bridge” the two networks
• Trees can now have two types of nodes
• How can we characterize the
  – Contraction?
  – Degree of mixing?
![Page 26: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/26.jpg)
Contraction to Measure Overlap
• Merging very similar libraries should generate a smaller forest compared to the original forests
• But this doesn’t really describe how the individual trees become (more) connected
$$C_{\mathrm{norm}} = \frac{|F_{12}|}{|F_1| + |F_2|}, \quad \text{where } F_i = \{G_{1i}, G_{2i}, \ldots, G_{ni}\}$$
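A small worked example of the contraction measure, assuming each forest size is the number of trees (connected components) it contains and that the merged forest is formed by bridging nodes with the same hash. The union-find helper and the toy node labels are hypothetical.

```python
def count_trees(nodes, edges):
    """Number of connected components, ignoring edge direction."""
    root_of = {n: n for n in nodes}

    def find(n):
        while root_of[n] != n:
            root_of[n] = root_of[root_of[n]]
            n = root_of[n]
        return n

    for src, dst in edges:
        root_of[find(src)] = find(dst)
    return len({find(n) for n in nodes})


def c_norm(n_merged, n_first, n_second):
    """Merged forest size relative to the summed sizes of the source forests."""
    return n_merged / (n_first + n_second)


# Library 1 has trees {a-b} and {c}; library 2 has trees {b-d} and {e}.
nodes_1, edges_1 = ["1:a", "1:b", "1:c"], [("1:a", "1:b")]
nodes_2, edges_2 = ["2:b", "2:d", "2:e"], [("2:b", "2:d")]
bridge = [("1:b", "2:b")]  # the scaffold "b" occurs in both libraries

f1 = count_trees(nodes_1, edges_1)
f2 = count_trees(nodes_2, edges_2)
f12 = count_trees(nodes_1 + nodes_2, edges_1 + edges_2 + bridge)
print(f1, f2, f12, c_norm(f12, f1, f2))
```

Under this reading, two libraries with no shared scaffold give a value of 1, and two identical libraries give 0.5, because every tree is bridged to its copy.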
Figure: Cnorm (0 to 1) for the library pairs CL886/CL1420, MIPE/CL886, MIPE/LOPAC, MIPE/NP
![Page 27: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/27.jpg)
Assortativity to Measure Overlap
• Quantifies the notion that “like connects to like”
• Undefined for trees that only have one type of vertex (i.e., only from a single library)
• The number of trees that are assortative is a global indicator of library similarity
Figure: % of trees that are assortative vs. not assortative for CL886/CL1420, MIPE/CL886, MIPE/LOPAC, MIPE/NP
Newman, Phys. Rev. E., 2003, 026126
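A minimal sketch of Newman's assortativity coefficient for a categorical attribute, here the source library of each node, computed on a single merged tree. It assumes undirected edges and the form r = (Tr e − Σ a_i²) / (1 − Σ a_i²), where e is the matrix of edge-end fractions between libraries and a_i its row sums; how the slides decide that a tree counts as assortative is not stated, so no cutoff is applied.

```python
def assortativity(edges, library_of):
    """Newman's attribute assortativity for an undirected graph.

    Returns None when it is undefined (e.g. only one library is present).
    """
    labels = sorted({library_of[n] for edge in edges for n in edge})
    index = {lab: i for i, lab in enumerate(labels)}
    k = len(labels)
    e = [[0.0] * k for _ in range(k)]
    for u, v in edges:
        i, j = index[library_of[u]], index[library_of[v]]
        # Count each undirected edge once in each direction.
        e[i][j] += 1
        e[j][i] += 1
    total = sum(map(sum, e))
    if total == 0:
        return None
    trace = sum(e[i][i] for i in range(k)) / total
    a = [sum(row) / total for row in e]  # fraction of edge ends per library
    expected = sum(x * x for x in a)
    if abs(1 - expected) < 1e-12:
        return None
    return (trace - expected) / (1 - expected)


# Toy merged tree: two MIPE nodes, two NP nodes, one bridging edge.
library_of = {"m1": "MIPE", "m2": "MIPE", "n1": "NP", "n2": "NP"}
tree = [("m1", "m2"), ("n1", "n2"), ("m2", "n1")]
print(assortativity(tree, library_of))
```

The None return for a single-library tree corresponds to the undefined case noted above: with one label the denominator is zero.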
![Page 28: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/28.jpg)
Assortativity to Measure Overlap
• We then examine the distribution of assortativity across assortative trees
• Dissimilar libraries have few assortative trees
  – But they have high values of assortativity
• However, high assortativity doesn’t imply high overlap
Figure: density of assortativity for CL886/CL1420, MIPE/CL886, MIPE/LOPAC, MIPE/NP
![Page 29: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/29.jpg)
Assortativity to Measure Overlap
Assortativity = 0.85 (MIPE & NP)
Assortativity = 0.95 (CL886 & CL1420)
![Page 30: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/30.jpg)
Overlap via Tree Complexity
• Similar libraries lead to fewer trees in the merged network, but also denser trees
• Change in density (branching) across the forest can also measure the extent of overlap
Figure panels: MIPE, LOPAC, Merged
![Page 31: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/31.jpg)
Summarizing via Tree Complexity
• Distributions of ID before and after merging don’t differ very much, visually
• However, a KS test does discriminate them
Figure: density of log10(ID), individual vs. merged forests, for CL886 / CL1420 (D = 0.0173, p = 1) and MIPE / NP (D = 0.0582, p = .0008)
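The D values reported here are two-sample Kolmogorov-Smirnov statistics: the largest vertical gap between the two empirical cumulative distributions. A minimal pure-Python sketch of D follows; the p-value is omitted, and the sample values are made up for illustration.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the CDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Hypothetical log10(ID) values before and after merging.
individual = [1.2, 1.9, 2.4, 2.8, 3.1]
merged = [1.3, 2.0, 2.6, 3.0, 3.6]
print(ks_statistic(individual, merged))
```

In practice a library routine such as `scipy.stats.ks_2samp` returns both D and a p-value; the sketch only shows what D measures.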
![Page 32: Characterization of Chemical Libraries Using Scaffolds and Network Models](https://reader031.fdocuments.net/reader031/viewer/2022030315/587e50801a28abeb1a8b5e15/html5/thumbnails/32.jpg)
Summary
• Scaffold networks are a relatively objective way to characterize & compare libraries
  – Supports fast comparisons between libraries
• The approach supports multiplexing information into a single data structure
  – Physchem properties, bioactivities, …
• “What is a good comparison?” quickly becomes a philosophical question