www.ccdc.cam.ac.uk

CCDC Tools for Mining Structural Databases Or – Building Solid Foundations for a Structure Based Design Campaign

John Liebeschuetz, Peter Carlqvist, Simon Bowden

Cambridge Crystallographic Data Centre

12 Union Rd., Cambridge, UK

Assessment and Comparison of Ligand – Protein Structural Models

• For the Crystallographer

– What is wrong with my model?

– What interesting features or differences with related structures can I highlight in my publication?

• For the Molecular Modeller

– What is wrong with the Crystallographer’s model?

– What interesting features or differences with related structures can I use to inform my structure-based drug design campaign?

– Are there non-homologous structures with similar features that I need to watch out for?

Why can’t I take a structure from the PDB and just use it?

• Validation of ligand structures bound to proteins

15% of 100 recent PDB entries have ligand geometries that are almost certainly in significant error (in-house analysis using Relibase+/Mogul)

Evaluation of PDB ligand datasets with Mogul and Relibase (pie charts):

• Pre-2000 dataset (1990s): correct 34%, wrong 26%, not unusual 40%

• 2006 dataset (most recent): correct 29%, wrong 16%, not unusual 55%

How much ligand strain is accommodated by the protein?

• Accepted view – Many ligands adopt strained conformations when bound to proteins; some (60%) do not bind even in a local-minimum conformation. (Perola & Charifson, J. Med. Chem. 2004, 47, 2499-2510)

• Alternative view – Ligands usually (but not always) bind in a local minimum. Many ‘strained’ structures found in the PDB are imperfectly refined. (OpenEye, B. Kelley and G. Warren, EuroCUP)

CCDC Tools that can help you

• Relibase/Relibase+ - Web-based database system for searching, retrieving and analysing 3D structures of protein-ligand complexes in the Brookhaven Protein Data Bank (PDB)

– Relibase is freely available for academics

– Relibase+ has extra features (some of these will be used in this workshop)

• The Cambridge Structural Database System - Database of >400,000 small-molecule crystal structures, and associated query software

– Mogul and IsoStar knowledge-bases of molecular geometry and inter-molecular interactions

– Directly linked access from Relibase+

The Workshop

Part 1: Validation of models and structural analysis

• Analysing a protein structure for errors and interesting features

• Comparing a structure with structures related by homology or by functionality

Part 2: Probing the Protein-Ligand Interface

• Substructure searching in Relibase/Relibase+

• Comparing the interactions of different ligands with the same target

• Validating an unusual interaction using substructure searching in Relibase+

Relibase+

• Relibase+

– Web-based database system for searching, retrieving and analysing 3D structures of protein-ligand complexes in the Brookhaven Protein Data Bank (PDB)

– Successor to ReLiBase, developed by Manfred Hendlich et al. (Merck, Marburg U.); M. Hendlich, Acta Cryst. D54, 1178-1182, 1998

• Relibase: free on WWW for academics

– http://relibase.ccdc.cam.ac.uk/

– http://relibase.rutgers.edu/

Relibase+: Basic Functionality

• Keyword searching

• FASTA protein sequence searching

• 2D substructure searching

• 3D protein-ligand interaction searching

• Protein-protein interaction searching

• Similarity searching for ligands

• SMILES substructure matching

• Automatic superposition of related binding sites to compare ligand binding modes, water positions, etc.

• 3D visualisation with AstexViewer and ReliView (Hermes)

Relibase+: Advanced Functionality

• Functionality for generation and search of proprietary databases of protein-ligand complexes alongside the PDB

• Links to the Mogul and IsoStar modules of the CSDS for geometry validation

• Additional modules: Crystal packing, WaterBase, CavBase

• Detailed analysis of superimposed binding sites

• Enhanced treatment of hitlists

• Reliscript: Command-line access via a Python-based toolkit

• Coming Soon: SecBase including Turn Classification

CavBase

• Detect unexpected similarities amongst protein cavities (e.g. active sites) that share little or no sequence homology.

• Similarity judged by matching 3D property descriptors (pseudocentres) that encode the shape and chemical characteristics of each cavity

• No sequence information used, can detect similar cavities even if they have no obvious secondary-structure relationship

• Developed by S. Schmitt et al., J. Mol. Biol. (2002)

Cambridge Structural Database

• Repository for the world’s small organic and metal-organic crystal structures (up to 500 non-H atoms)

• Experimentally determined 3D structures via X-ray and neutron diffraction methods

• 2007 release contains 423,798 entries

– approximately 32,000 entries added per year

• Derived from around 1200 published sources

– official depository for >80 major journals

– majority of data directly deposited electronically (CIF)

• Increasing number of Private Communications

How much Data is Available?

[Chart: CSD growth 1970-2006 and predicted growth to 2010]

• 419,768 entries June 2007

• >500,000 entries predicted during 2009

CSD Information content

Crystal structure data: atomic coordinates, unit cell, space-group symmetry (fully validated)

CSD Information content: Bibliographic and Chemical Information

• Bibliographic and chemical text and properties (all searchable)

4-Oxonicotinamide-1-(1’-beta-D-2’,3’,5’-tri-O-acetyl-ribofuranoside)

Source: Rothmannia longiflora

Colour: pale yellow

Habit: acicular

Polymorph: Form IV

C17 H20 N2 O9

G. Bringmann, M. Ochse, K. Wolf, J. Kraus, K. Peters, E-M. Peters, M. Herderich, L. Ake, F. Tayman, Phytochemistry 51 (1999), p271

R-factor: 0.0506

• Chemical diagram and chemical connectivity to enable 2D and 3D searching for substructures, pharmacophores and intermolecular interactions

• Cross-referencing between entries

Cambridge Structural Database System

Cambridge Structural Database

• PreQuest: database production

• ConQuest: database search

• VISTA: statistical analysis

• Mercury: graphical display, packing analysis

Knowledge Bases

– Mogul: library of molecular geometry

– IsoStar: library of intermolecular interactions

Mogul: A Knowledge Base of Molecular Geometries

Bruno et al., J. Chem. Inf. Comput. Sci., 44, 2133-2144, 2004

Mogul: Rapid access to CSD information

• Incorporates pre-computed libraries of bond lengths, valence angles and torsion angles, derived entirely from the CSD

• Sketch or import molecule, then click on feature of interest to view distribution, mean values and statistics

• Very fast search speeds, with hyperlinks to the CSD to view specific structures

• Complete geometry: retrieve distributions for all bonds, angles and torsions in the molecule
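
As a rough illustration of the kind of check such distributions allow, the sketch below compares one query bond length with a set of observed values and reports how far it lies from the mean in units of the standard deviation. The function names, the sample values and the cutoff of 2.0 are all assumptions made for illustration; none of them is taken from Mogul, whose own criteria may differ. Torsion angles are periodic and would need circular statistics rather than this simple treatment.

```python
from statistics import mean, stdev


def geometry_z_score(query_value, observed_values):
    """Return (mean, standard deviation, z-score) of a query value
    against a distribution of observed values for the same feature."""
    mu = mean(observed_values)
    sigma = stdev(observed_values)  # sample standard deviation
    return mu, sigma, (query_value - mu) / sigma


def is_unusual(query_value, observed_values, z_cutoff=2.0):
    """Flag the query when it lies more than z_cutoff standard deviations
    from the mean. The cutoff is an illustrative choice, not Mogul's."""
    _, _, z = geometry_z_score(query_value, observed_values)
    return abs(z) > z_cutoff


# Invented bond lengths (angstroms) standing in for a library distribution.
library_lengths = [1.33, 1.34, 1.32, 1.33, 1.35, 1.33, 1.34, 1.32]

print(is_unusual(1.33, library_lengths))  # value close to the mean
print(is_unusual(1.45, library_lengths))  # value far from the mean
```

A real check would run over every bond, angle and torsion in the ligand, which is what the "complete geometry" option above refers to.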

IsoStar: A Knowledge Base of Intermolecular Interactions

• Experimental data from:

– Cambridge Structural Database

– Protein Data Bank (protein-ligand complexes only)

• Theoretical potential energy minima (DMA, IMPT)

• Interaction distributions displayed immediately as scatterplots or contour surfaces

• >20,000 CSD scatterplots, >5,500 PDB scatterplots, 1,500 theoretical energy minima

IsoStar Methodology

• Search CSD or PDB for structures containing desired contact

• Superimpose hits and display as scatterplots

Example: central group -CONH2, contact group NH
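
The superposition step can be sketched as follows. This is a minimal illustration, assuming each hit supplies the coordinates of three non-collinear central-group atoms plus one contact atom. It places every hit in a common local frame built from the three central-group atoms, which is a simpler stand-in for a least-squares overlay and is not presented as IsoStar's actual procedure. All function names are hypothetical.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def local_frame(origin, on_x_axis, in_xy_plane):
    """Orthonormal frame from three non-collinear central-group atoms."""
    x = unit(sub(on_x_axis, origin))
    z = unit(cross(x, sub(in_xy_plane, origin)))
    y = cross(z, x)
    return origin, (x, y, z)


def to_frame(point, frame):
    """Express a point in the local frame of a central group."""
    origin, (x, y, z) = frame
    d = sub(point, origin)
    return (dot(d, x), dot(d, y), dot(d, z))


def scatterplot(hits):
    """hits: list of ((atom_a, atom_b, atom_c), contact_atom) coordinates.
    Returns the contact positions with all central groups overlaid."""
    return [to_frame(contact, local_frame(a, b, c))
            for (a, b, c), contact in hits]
```

Because every contact atom ends up in the same coordinate system, the returned points can be plotted together as one scatterplot around a single copy of the central group.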

Density Maps

Can also represent distribution as density maps
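
One simple way to turn a scatterplot into a density map is to count contact points per grid cell. The sketch below does only that, under the assumption of a cubic grid with an arbitrary 0.5 angstrom cell edge. The function name and cell size are illustrative, and any normalisation or contouring that IsoStar applies is not reproduced here.

```python
import math
from collections import Counter


def density_map(points, cell=0.5):
    """Count scatterplot points per cubic grid cell of edge `cell`.
    Keys are integer (i, j, k) cell indices; values are raw counts."""
    counts = Counter()
    for x, y, z in points:
        index = (math.floor(x / cell),
                 math.floor(y / cell),
                 math.floor(z / cell))
        counts[index] += 1
    return counts
```

Cells with high counts correspond to the regions that would appear as dense contours around the central group.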

The Workshop

Part 1: Validation of models and structural analysis

• Analysing a protein structure for errors and interesting features

• Comparing a structure with structures related by homology or by functionality

Part 2: Probing the Protein-Ligand Interface

• Substructure searching in Relibase/Relibase+

• Comparing the interactions of different ligands with the same target

• Validating an unusual interaction using substructure searching in Relibase+

How to access the workshop

Webpage: http://relibase.ccdc.cam.ac.uk/

Email address: [email protected]

Password: s1mple

Cavity Detection

[Figure: schematic of a protein cavity lined with N and O atoms]

Based on the LIGSITE program, M. Hendlich et al., J. Mol. Graph. (1997).
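
The grid-scanning idea behind LIGSITE-style cavity detection can be shown with a two-dimensional toy. This sketch assumes a character grid where `#` marks protein and `.` marks solvent, and scans four directions (two axes and two diagonals). The published method works on a 3D grid and scans along x, y, z and the cubic diagonals. The function names and the threshold of 3 are illustrative assumptions, not CavBase or LIGSITE settings.

```python
def buriedness(grid):
    """grid: list of equal-length strings, '#' = protein, '.' = solvent.
    Returns {(row, col): number of scan directions in which the solvent
    cell has protein on both sides}."""
    rows, cols = len(grid), len(grid[0])
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]

    def hits_protein(r, c, dr, dc):
        # Walk from the cell along one direction until protein or the edge.
        r, c = r + dr, c + dc
        while 0 <= r < rows and 0 <= c < cols:
            if grid[r][c] == '#':
                return True
            r, c = r + dr, c + dc
        return False

    scores = {}
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '.':
                scores[(r, c)] = sum(
                    hits_protein(r, c, dr, dc) and hits_protein(r, c, -dr, -dc)
                    for dr, dc in directions)
    return scores


def cavity_cells(grid, min_score=3):
    """Solvent cells enclosed in at least min_score scan directions."""
    return sorted(cell for cell, s in buriedness(grid).items() if s >= min_score)


# Toy example: a pocket with a narrow opening towards bulk solvent (bottom row).
toy_grid = [
    "#######",
    "#.....#",
    "#.....#",
    "###.###",
    ".......",
]
pocket = cavity_cells(toy_grid)
```

Cells deep inside the pocket are enclosed in most directions and pass the threshold, while cells in the open bottom row are not enclosed in any direction and are rejected.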

The pseudo-centre concept

Coding Molecular Recognition into Simple Descriptors

[Figure: chemical structure diagram (N, O and H atom labels) with legend: donor, acceptor, aliphatic, pi/aromatic]

3D Property Description

[Figure: cavity and protein, with O and NH groups labelled]

Similarity Search

Clique detection: Bron-Kerbosch
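
The slide names only the algorithm, so the following is a hedged sketch of how Bron-Kerbosch clique detection could be applied to two sets of pseudo-centres. It assumes each pseudo-centre is a `(type, (x, y, z))` tuple and builds an association graph whose nodes pair same-type centres and whose edges link pairs with matching distances within a tolerance. The graph construction, the 1.0 angstrom tolerance and all function names are assumptions for illustration, not CavBase's implementation.

```python
import math
from itertools import combinations


def bron_kerbosch(R, P, X, adj, out):
    """Basic Bron-Kerbosch (no pivoting): collect every maximal clique."""
    if not P and not X:
        out.append(R)
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P = P - {v}
        X = X | {v}


def association_graph(site_a, site_b, tol=1.0):
    """Nodes pair same-type pseudo-centres from the two cavities; an edge
    joins two nodes when the distance between the two centres in cavity A
    agrees with the distance in cavity B to within `tol`."""
    nodes = [(i, j)
             for i, (type_a, _) in enumerate(site_a)
             for j, (type_b, _) in enumerate(site_b)
             if type_a == type_b]
    adj = {n: set() for n in nodes}
    for (i1, j1), (i2, j2) in combinations(nodes, 2):
        if i1 == i2 or j1 == j2:
            continue  # a centre may be matched only once
        da = math.dist(site_a[i1][1], site_a[i2][1])
        db = math.dist(site_b[j1][1], site_b[j2][1])
        if abs(da - db) <= tol:
            adj[(i1, j1)].add((i2, j2))
            adj[(i2, j2)].add((i1, j1))
    return adj


def best_match(site_a, site_b, tol=1.0):
    """Largest set of mutually compatible pseudo-centre pairings."""
    adj = association_graph(site_a, site_b, tol)
    cliques = []
    bron_kerbosch(set(), set(adj), set(), adj, cliques)
    return max(cliques, key=len) if cliques else set()
```

Each clique is a set of pairings that are all mutually consistent in distance, so the largest clique gives a candidate common sub-pattern between the two cavities. `math.dist` requires Python 3.8 or later.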

Similarity Analysis

Scoring based on matching pseudo-centres, and the associated surface patches

An Example

1OXO/1F2D

• Overlay of PLP ligands

• Matching pseudo-centres and surface patches shown

Crystal Packing

Important e.g. when docking ligands

Concanavalin A (1cjp) Binding site in Relibase+

1mtw

reference ligand, no packing

reference in green, first-rank solution atom-coloured

1mtw, Packing Included

reference ligand, no packing

including neighbouring chains

GOLD’s first-rank solution