Protein docking by LZerD, KiharaLab at CAPRI meeting 2016

Human and Server CAPRI Protein Docking Prediction Using LZerD with Combined Scoring Functions

Daisuke Kihara
Department of Biological Sciences, Department of Computer Science
Purdue University, Indiana, USA

http://kiharalab.org

CAPRI Round 30 Results

(Lensink et al., CAPRI30 group paper, 2016)

Overview of Protein Docking Prediction Using LZerD in CAPRI

[Flowchart]
Single-chain modeling (HHPred, SparksX, MUFold, TASSER, Phyre2, TASSERlite, MultiCom)
→ PRESCO
→ Sub-unit models
→ LZerD
→ ~50,000 docking models
→ Clustering, RMSD < 5 Å
→ Re-ranking with scoring functions
→ 10 models
→ MD relaxation
→ Submit
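The clustering stage (RMSD < 5 Å) can be illustrated with a minimal greedy sketch. The slides do not say which clustering algorithm is used, so the greedy scheme, the function names `rmsd` and `greedy_cluster`, and the toy coordinates are all assumptions. The RMSD is computed without superposition, on the assumption that all decoys share the receptor frame.

```python
import math

def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) points, no superposition."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(sq / len(a))

def greedy_cluster(models, cutoff=5.0):
    """Assign each model to the first cluster whose representative is within cutoff.

    models: list of coordinate lists, assumed to be sorted best-first.
    Returns a list of clusters; each cluster is a list of model indices and
    its first index is the representative.
    """
    clusters = []
    for i, coords in enumerate(models):
        for cluster in clusters:
            if rmsd(coords, models[cluster[0]]) < cutoff:
                cluster.append(i)
                break
        else:
            # No existing representative is close enough: start a new cluster.
            clusters.append([i])
    return clusters

# Hypothetical toy decoys: three points each.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
near = [(x + 1.0, y, z) for x, y, z in base]   # shifted by 1 Å
far = [(x + 20.0, y, z) for x, y, z in base]   # shifted by 20 Å

clusters = greedy_cluster([base, near, far])
print(clusters)
```

Keeping one representative per cluster reduces the decoy set before re-ranking; a real run would use ligand backbone atoms rather than three toy points.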

LZerD ("Lizard"): Local 3D Zernike descriptor-based Docking program

[Figure labels: normal vector; 3D Zernike descriptor; 6 Å; interface area]

(Venkatraman, Yang, Sael, & Kihara, BMC Bioinformatics, 2009)

3D Zernike Descriptors (3DZD)

• An extension of spherical harmonics based descriptors
• A 3D object can be represented by a series of orthogonal functions, and thus in practice by a series of coefficients that form a feature vector
• Compact
• Rotation invariant

[Figure: A surface representation of 1ew0A (A) is reconstructed from its 3D Zernike invariants of the order 5, 10, 15, 20, and 25 (B-F). (Sael & Kihara, 2009)]

3D Zernike functions (polynomials in Cartesian coordinates):

$$Z_{nl}^{m}(r, \theta, \varphi) = R_{nl}(r)\, Y_{l}^{m}(\theta, \varphi)$$

where $Y_{l}^{m}(\theta, \varphi)$ are spherical harmonics and $R_{nl}(r)$ are radial functions.

Zernike moments:

$$\Omega_{nl}^{m} = \frac{3}{4\pi} \int_{|\mathbf{x}| \le 1} f(\mathbf{x})\, \overline{Z_{nl}^{m}(\mathbf{x})}\, d\mathbf{x}$$

Zernike descriptor:

$$F_{nl} = \sqrt{\sum_{m=-l}^{l} \left(\Omega_{nl}^{m}\right)^{2}}$$
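To illustrate why the descriptor is rotation invariant, the sketch below collapses a set of Zernike moments over m into one number per (n, l) pair, using the squared modulus of each complex moment. The moment values are made up, and `zernike_descriptor` and `rotate_about_z` are hypothetical helper names. The only rotation modeled is one about the z axis, which changes each moment by a unit-modulus phase that depends on m; the sign convention of that phase is an assumption.

```python
import cmath
import math
from collections import defaultdict

def zernike_descriptor(moments):
    """Collapse moments {(n, l, m): complex} into {(n, l): norm over m}."""
    sums = defaultdict(float)
    for (n, l, m), omega in moments.items():
        sums[(n, l)] += abs(omega) ** 2
    return {key: math.sqrt(total) for key, total in sums.items()}

def rotate_about_z(moments, alpha):
    """Multiply each moment by exp(-i*m*alpha), a unit-modulus phase."""
    return {
        (n, l, m): omega * cmath.exp(-1j * m * alpha)
        for (n, l, m), omega in moments.items()
    }

# Hypothetical toy moments, not taken from any real protein surface.
moments = {
    (2, 0, 0): 1.0 + 0.0j,
    (2, 2, -2): 0.5 + 0.5j,
    (2, 2, 0): 0.25 + 0.0j,
    (2, 2, 2): 0.5 - 0.5j,
}

descriptor_before = zernike_descriptor(moments)
descriptor_after = zernike_descriptor(rotate_about_z(moments, alpha=0.7))

for key in sorted(descriptor_before):
    print(key, descriptor_before[key], descriptor_after[key])
```

The individual moments change under the rotation, but the per-(n, l) norms do not, which is what allows two surface patches to be compared without aligning them first.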

Protein Residue Environment SCOre (PRESCO)

[Figure labels: center; along the main-chain; within a sphere of 6 or 8 Å]

(Kim & Kihara, Proteins 2014)

Finding Similar Side-Chain Depth Environment (SDE) from a database

• Query SDE
• Structure database: 2536 proteins
• 500 lowest RMSD fragments of 9 side-chain centroids, superimposed with the query fragment
• Select SDE with the same number of side-chain centroids in the sphere of 8.0 Å
• Compute RMSD of residue-depth for corresponding side-chain centroids
• Sort by depth RMSD to the query

[Figure label: surface]
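The last three steps of the SDE search (filter by centroid count, compute depth RMSD, sort) can be sketched as follows. This is a simplified illustration rather than the PRESCO implementation: the names `depth_rmsd` and `rank_environments` are hypothetical, the depth values are invented, and centroid correspondence is assumed to be given by list order.

```python
import math

def depth_rmsd(query_depths, candidate_depths):
    """RMSD between residue depths of corresponding side-chain centroids."""
    sq = sum((q - c) ** 2 for q, c in zip(query_depths, candidate_depths))
    return math.sqrt(sq / len(query_depths))

def rank_environments(query_depths, candidates):
    """Keep candidates with the same centroid count as the query, then sort.

    candidates: list of (label, depths) pairs.
    Returns a list of (label, depth RMSD) sorted by increasing depth RMSD.
    """
    same_size = [
        (label, depths)
        for label, depths in candidates
        if len(depths) == len(query_depths)
    ]
    scored = [(label, depth_rmsd(query_depths, depths)) for label, depths in same_size]
    return sorted(scored, key=lambda item: item[1])

# Hypothetical depths (in Å) for the centroids inside the 8.0 Å sphere.
query = [1.0, 2.0, 3.0]
database = [
    ("a", [1.0, 2.0, 3.0]),
    ("b", [2.0, 3.0, 4.0]),
    ("c", [1.0, 2.0]),        # different centroid count, filtered out
    ("d", [1.5, 2.5, 3.5]),
]

ranked = rank_environments(query, database)
print(ranked)
```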

CASP11 Free Modeling Category Ranking (Model 1)

(http://www.predictioncenter.org/casp11/zscores_final.cgi?formula=assessors)

(Kim & Kihara, Proteins 2015)

DFIRE, GOAP, ITScore Scoring Functions

DFIRE (Yaoqi Zhou): statistical distance-dependent atom contact potential using the finite ideal-gas reference state

GOAP (Jeff Skolnick): DFIRE combined with an orientation-dependent term

ITScore (Xiaoqin Zou): iteratively refined statistical distance-dependent atom contact potential

The BindML Algorithm

(La D, & Kihara D, Proteins 2012)

Generating Substitution Models

iPfam (505 families)

[Figure labels: Model; Model]

iPfam Dataset Benchmark

ROC based on 449 Protein Complexes

BindML Webserver

http://kiharalab.org/bindml

(Wei Q, La D, & Kihara D, Methods in Mol. Biol., in press, 2016)

T79 (Round 30)

• (Interface 2) Kihara: 3 hits; LZerD: 1 hit
• Homodimer
• LZerD runs:
    No-interface prediction
    With BindML-consPPISP prediction
• LZerD selection strategy:
    Consensus of ITScore and GOAP
    5 from no-interface, 5 from BindML-consPPISP
• Kihara selection strategy:
    Manual combination of ITScore, GOAP, DFIRE, and PRESCO
    10 from no-interface
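One simple way to form a consensus of two scoring functions is to sum each decoy's rank under each score. The slides do not define how the consensus was computed, so the rank-sum rule below is an assumption, as are the names `ranks` and `consensus_order`, the toy scores, and the convention that a lower score is better.

```python
def ranks(scores):
    """Map each model id to its rank (1 = lowest, i.e. best, score)."""
    ordered = sorted(scores, key=scores.get)
    return {model: position + 1 for position, model in enumerate(ordered)}

def consensus_order(*score_tables):
    """Order models by the sum of their ranks across all score tables.

    Only models present in every table are ranked; ties are broken by id.
    """
    rank_tables = [ranks(table) for table in score_tables]
    common = set(rank_tables[0])
    for table in rank_tables[1:]:
        common &= set(table)
    return sorted(common, key=lambda m: (sum(t[m] for t in rank_tables), m))

# Hypothetical scores for four decoys (lower is assumed to be better).
itscore = {"d1": -120.0, "d2": -150.0, "d3": -90.0, "d4": -140.0}
goap = {"d1": -2900.0, "d2": -2700.0, "d3": -2500.0, "d4": -3000.0}

order = consensus_order(itscore, goap)
print(order)
```

Working on ranks rather than raw values avoids having to put scores with different units and ranges on a common scale.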

T79 Subunit Model Quality

Chain A RMSD: 4.0 Å

Chain B RMSD: 4.0 Å

[Figure legend: native, model]

T79 Human Selected Model

fnat 0.16, L-RMSD 14.1 Å, i-RMSD 3.8 Å

[Figure legend: native, model]
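The three reported metrics map onto a CAPRI quality class. The thresholds in the sketch below are the commonly quoted CAPRI criteria in a simplified cascade form; they are not stated on the slides, so treat them and the function name `capri_class` as assumptions.

```python
def capri_class(fnat, l_rmsd, i_rmsd):
    """Classify a docking model from fnat, ligand RMSD and interface RMSD (Å)."""
    if fnat >= 0.5 and (l_rmsd <= 1.0 or i_rmsd <= 1.0):
        return "high"
    if fnat >= 0.3 and (l_rmsd <= 5.0 or i_rmsd <= 2.0):
        return "medium"
    if fnat >= 0.1 and (l_rmsd <= 10.0 or i_rmsd <= 4.0):
        return "acceptable"
    return "incorrect"

# Values reported for the T79 human-selected model.
t79_quality = capri_class(fnat=0.16, l_rmsd=14.1, i_rmsd=3.8)
print(t79_quality)
```

Under these thresholds the T79 model counts as a hit through its interface RMSD, even though its ligand RMSD is above 10 Å.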

T79 Interface Prediction

Method       Precision   Recall   F-score
BindML       0           0        NA
Cons-PPISP   0.10        0.18     0.12
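The table columns can be computed as in the sketch below, which treats predicted and true interface residues as sets of residue numbers. The residue numbers are invented and `interface_metrics` is a hypothetical name. The F-score is returned as `None` when precision and recall are both zero, which corresponds to the NA entry for BindML.

```python
def interface_metrics(predicted, actual):
    """Return (precision, recall, F-score) for predicted interface residues.

    F-score is None when it is undefined (precision + recall == 0).
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, None
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Hypothetical residue numbers.
partial = interface_metrics(predicted={10, 11, 12, 13, 14}, actual={12, 13, 20, 21})
no_overlap = interface_metrics(predicted={1, 2}, actual={3})
print(partial)
print(no_overlap)
```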

T79 Scores (no-interface prediction)

[Figure: score plots for ITScore, GOAP, and DFIRE; axis labels: L-RMSD, fnat, iRMSD]

T79 Score Comparison

[Figure: panels comparing ITScore, GOAP, and DFIRE; both axes labeled ITScore, GOAP, DFIRE]

T79 PRESCO scores

[Figure: two panels, With Interface Prediction and Without Interface Prediction; axis labels: PRESCO, L-RMSD]

T79 Score performance summary

Run                Score     RFH       Hits in top 10
nointerface        ITScore   1 (62)    3
nointerface        GOAP      1 (72)    3
nointerface        DFIRE     1 (111)   5
BindML-consPPISP   all       -         -

RFH: rank of first acceptable (medium) hit
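The two summary columns can be derived from a score-sorted list of quality labels, as in this sketch. The label list is invented, and `rank_of_first_hit` and `hits_in_top` are hypothetical names; the second call shows how a parenthesized value such as the "(62)" in the table would be obtained by restricting hits to medium quality or better.

```python
ACCEPTABLE_OR_BETTER = ("acceptable", "medium", "high")
MEDIUM_OR_BETTER = ("medium", "high")

def rank_of_first_hit(ranked_labels, accepted=ACCEPTABLE_OR_BETTER):
    """1-based rank of the first model whose label is in accepted, or None."""
    for rank, label in enumerate(ranked_labels, start=1):
        if label in accepted:
            return rank
    return None

def hits_in_top(ranked_labels, n=10, accepted=ACCEPTABLE_OR_BETTER):
    """Number of accepted models among the first n of the ranking."""
    return sum(label in accepted for label in ranked_labels[:n])

# Hypothetical quality labels for decoys sorted best score first.
labels = ["acceptable", "incorrect", "incorrect", "medium", "incorrect",
          "acceptable"] + ["incorrect"] * 6

rfh_acceptable = rank_of_first_hit(labels)
rfh_medium = rank_of_first_hit(labels, accepted=MEDIUM_OR_BETTER)
top10_hits = hits_in_top(labels, n=10)
print(rfh_acceptable, rfh_medium, top10_hits)
```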

T91 (Round 30)

• Kihara: 8 hits; LZerD: 2 hits
• Homodimer
• LZerD runs:
    No-interface prediction (with our monomer model)
    With BindML+consPPISP interface prediction
    Zhang1 CASP server model, no-interface prediction
• Server selection strategy:
    10 from no-interface
• Human selection strategy:
    Consensus of ITScore, GOAP, PRESCO, and visual inspection
    5 from no-interface, 5 from Zhang1

T91 Subunit Models

Chain C: our model RMSD 6.0 Å; Zhang RMSD 4.9 Å

Chain D: our model RMSD 6.5 Å; Zhang RMSD 5.7 Å

[Figure legend: native, our model, Zhang1]

T91 Human Selected Model

fnat 0.33, L-RMSD 9.0 Å, I-RMSD 4.2 Å

[Figure legend: model, native]

T91 Interface Prediction

Method       Precision   Recall   F-score
BindML       0.64        0.20     0.30
Cons-PPISP   0.50        0.28     0.36

T91 Score (no interface prediction)

[Figure: score plots for ITScore, GOAP, and DFIRE; axis labels: L-RMSD, fnat, iRMSD]

T91 Scores (with interface prediction)

[Figure: score plots for ITScore, GOAP, and DFIRE; axis labels: L-RMSD, fnat, iRMSD]

T91 Scores (Zhang models)

[Figure: score plots for ITScore, GOAP, and DFIRE; axis labels: L-RMSD, fnat, iRMSD]

T91 Zhang1 Score Comparison

[Figure: panels comparing ITScore, GOAP, and DFIRE; both axes labeled ITScore, GOAP, DFIRE]

T91 PRESCO Scores

[Figure: two panels, Without Interface Prediction and Docking with Zhang models; axis labels: PRESCO, L-RMSD]

Top 5 models selected from each

T91 Score Performance Summary

Run           Score     RFH      Hits in top 10
nointerface   ITScore   2        2
nointerface   GOAP      2        1
nointerface   DFIRE     1        2
interface     ITScore   1042     0
interface     GOAP      165      0
interface     DFIRE     116      0
zhang1        ITScore   1 (4)    5
zhang1        GOAP      2 (16)   5
zhang1        DFIRE     1 (6)    6

RFH: rank of first acceptable (medium) hit

T96 (Round 31)

• Heterodimer
• Predictor hits: 0 (5 by other groups)
• Scorer hits: human 1, server 0 (1 by other group)
• Human: 6 selected by PRESCO, 4 selected with predicted interface, ITScore, GOAP, and DFIRE
• No PDB file for the native structure available: metrics computed using two scorer hits (average L-RMSD/I-RMSD, max fnat)
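The fallback used when no native structure is available (average L-RMSD and I-RMSD, maximum fnat over the two scorer hits) can be sketched as a small helper. The function name `combine_reference_metrics`, the dictionary keys, and the numbers are hypothetical; how each per-reference metric is computed is outside this sketch.

```python
def combine_reference_metrics(per_reference):
    """Combine metrics measured against several reference models.

    per_reference: non-empty list of dicts with keys 'fnat', 'l_rmsd', 'i_rmsd'.
    Returns the maximum fnat and the mean L-RMSD and I-RMSD.
    """
    if not per_reference:
        raise ValueError("at least one reference is required")
    count = len(per_reference)
    return {
        "fnat": max(entry["fnat"] for entry in per_reference),
        "l_rmsd": sum(entry["l_rmsd"] for entry in per_reference) / count,
        "i_rmsd": sum(entry["i_rmsd"] for entry in per_reference) / count,
    }

# Hypothetical metrics of one decoy against two reference models.
combined = combine_reference_metrics([
    {"fnat": 0.2, "l_rmsd": 6.0, "i_rmsd": 2.0},
    {"fnat": 0.4, "l_rmsd": 8.0, "i_rmsd": 3.0},
])
print(combined)
```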

T96 scorer hits

S39.M03 (Haliloglu): fnat 0.22, L-RMSD 5.68 Å, I-RMSD 2.44 Å

S31.M06 (Kihara): fnat 0.32, L-RMSD 7.99 Å, I-RMSD 2.67 Å

[Figure labels: Chain A; Chain B]

T96 interface prediction

Chain   Method        Precision   Recall   F-score
A       BindML        0.15        0.2      0.17
A       Cons-PPISP    0           0        NA
B       BindML        0.12        0.11     0.12
B       Cons-PPISP*   NA          NA       NA

*Cons-PPISP predictions were only for the N-terminal tail; visual inspection suggests that the N-terminal tail is not a likely binding site, so these predictions were not used.

T96 Scorer-Models Scores

[Figure: score plots for ITScore, GOAP, and DFIRE; axis labels: L-RMSD, fnat, iRMSD]

T96 Score Performance Summary

Score     RFH   Hits in top 10
ITScore   529   0
GOAP      6     1
DFIRE     125   0

RFH: rank of first acceptable hit

• The hit for GOAP/DFIRE is the same model picked by PRESCO

Summary

• Our docking prediction procedure runs LZerD, and decoys were selected by combining DFIRE, ITScore, GOAP, and PRESCO. Binding sites were predicted by BindML and cons-PPISP.
• On the examples shown, PRESCO's performance was not as spectacular as we expected from its performance on single-chain structure prediction.
• DFIRE, ITScore, and GOAP showed similar, reasonably good performance.
• Scoring function performance depends on subunit model quality.
• The way BindML predictions are used needs to be improved.

Lab Members

@kiharalab

Lenna Peterson

Hyung-Rae Kim
