Modelling Assignment

Submitted to: Dr. Durg Vijay Singh

Submitted by: Shweta Kumari, Roll 21, M.Sc. Bioinformatics, 2nd sem


CONTENT

Objective

Structure prediction

Threading

Ab initio

Phyre2 and result

Dali structure-structure alignment and its result

Robetta and its result

Validation

Result and discussion

Conclusion

References

OBJECTIVE

To build a 3D model of the given amino acid sequence and validate the generated model.

>gi|407259499|gb|AFT91383.1| EcdL [Emericella rugulosa]
MDDSPWPQCDIRVQDTFGPQVSGCYEDFDFTLLFEESILYLPPLLIAASVALLRIWQL

RSTENLLKRSGLLSILKPTSTTRLSNAAIAIGFVASPIFAWLSFWEHARSLRPSTILNVYLLGTIPMDAARARTLFRMPGNSAIASIFATIVVCKVVLLVVEAMEKQRLLLDRGWAPEETAGILNRSFLWWFNPLLLSGYKQALTVDKLLAVDEDIGVEKSKDEIRRRWAQAVKQNASSLQDVLLAVYRTELWGGFLPRLCLIGVNYAQPFLVNRVVTFLGQPDTSTSRGVASGLIAAYAIVYMGIAVATAAFHHRSYRMVMMVRGGLILLIYDHTLTLNALSPSKNDSYTLITADIERIVSGLRSLHETWASLIEIALSLWLLETKIRVSAVAAAMVVLVCLLVSGALSGLLGVHQNLWLEAMQKRLNATLATIGSIKGIKATGRTNTLYETILQLRRTEIQKSLKFRELLVALVTLSYLSTTMAPTFAFGTYSILAKIRNMTPLLAAPAFSSLTIMTLLGQAVSGFVESLMGLRQAMASLERIRQYLVGKEAPEPSPNKPGVASTEGLVAWSASLDEPGLDPRVEMRRMSSLQHRFYNLGELQD
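Before submitting a sequence like the one above to a prediction server, it can help to check that the FASTA record parses cleanly and has the expected length. The following is a minimal sketch, assuming the header sits on its own line starting with ">"; the function name read_fasta and the toy record are illustrative and not part of the assignment.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between sequence chunks
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy record for illustration only (this is not the EcdL sequence).
example = ">toy|demo protein\nMDDSPW\n\nPQCDIR\n"
for header, sequence in read_fasta(example):
    print(header, len(sequence))
```

Applied to the real record, len(sequence) gives the residue count to compare against the length quoted for the query.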

Structure Prediction

Protein structure prediction is the prediction of the three-dimensional structure of a protein from its amino acid sequence, i.e. the prediction of its folding and its secondary, tertiary, and quaternary structure from its primary structure.

Knowledge of the 3D structure is useful for rational drug design, protein engineering, detailed study of protein-biomolecule interactions, the study of evolutionary relationships between proteins or protein families, etc.

METHOD OF STRUCTURE PREDICTION

Structure prediction

  Experimental methods: X-ray, NMR, EM

  Computational methods:

    Template based: Homology, Threading

    Template free: Ab initio

We have to build a model of the given sequence, the 604 AA residue EcdL protein (Emericella rugulosa).

The given protein sequence did not show significant alignment with any solved structure.

Hence, we cannot use homology modelling to build a model of the given sequence.

The only alternatives are the THREADING and AB INITIO methods.

Threading

A "remote homology" method of protein modelling, used for proteins that have the same fold as proteins of known structure but have no homologous proteins with known structure.

Software used for fold recognition:

PHYRE2

I-TASSER

MUSTER

RaptorX

GenThreader

LOMETS

Ab initio method

Predicting the 3D structure without any "prior knowledge"

If structural homologues (occasionally analogues) do not exist, or exist but cannot be identified, models have to be constructed from scratch. This procedure is called ab initio modelling.

Software used for ab initio structure prediction:

Robetta

PHYRE2 (Protein Homology/analogY Recognition Engine V 2.0)

Developed by Dr. Kelley and released on 14 February 2011. One of the most popular structure prediction servers, cited over 1500 times. Ranked best for function prediction in CASP9.

The basic working principle of PHYRE2 is:

1. Finding a sequence alignment to a known structure.

2. Copying the coordinates and relabelling the residues according to our sequence, based on the alignment.
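The coordinate-copying step can be illustrated with a toy example. This is only a sketch under simplifying assumptions, not PHYRE2's actual implementation: it handles C-alpha coordinates only, leaves unaligned query residues unmodelled, and the function name build_model and the example alignment are hypothetical.

```python
def build_model(query_aln, template_aln, template_coords):
    """Transfer template coordinates onto the aligned query residues.

    query_aln, template_aln: equal-length aligned strings, '-' marks a gap.
    template_coords: one (x, y, z) tuple per template residue, in order.
    Returns a list of (query_position, query_residue, (x, y, z)).
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned strings must have equal length")
    model = []
    q_pos = 0  # 1-based position in the ungapped query sequence
    t_idx = 0  # 0-based index into template_coords
    for q_res, t_res in zip(query_aln, template_aln):
        if q_res != "-":
            q_pos += 1
        if q_res != "-" and t_res != "-":
            # Aligned pair: copy the coordinate, relabel with the query residue.
            model.append((q_pos, q_res, template_coords[t_idx]))
        if t_res != "-":
            t_idx += 1
    return model


# Hypothetical alignment and coordinates, for illustration only.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = build_model("MK-LV", "MRAL-", coords)
for position, residue, xyz in model:
    print(position, residue, xyz)
```

Query residues that fall opposite a template gap (here the final V) receive no coordinates in this sketch; a real server has to model such insertions separately.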

Features of PHYRE2:

Domain analysis

Motif highlighting

Transmembrane helices are coloured

The algorithm used to predict the 3D structure combines LOCAL ALIGNMENT and HMMs. Our sequence is locally aligned against the fold library by HMM matching between our sequence and sequences of known structure.

When the server returns confident predictions only for subsequences of our sequence, these confident segments are cut out and resubmitted so that they can be joined during assembly.

PHYRE2 result

[Figure: PHYRE2 best model]

[Figure: alignment of query to 4f4cA]

[Figure: transmembrane region analysed by PHYRE2]

DALI (Distance mAtrix aLIgnment)

A method for structure-structure alignment.

It uses the 3D Cartesian coordinates of the C-alpha atoms of each protein to calculate residue-residue distance matrices.
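The residue-residue distance matrix mentioned above can be sketched in a few lines. This is a minimal illustration rather than DALI's own code; the function name distance_matrix and the toy coordinates are assumptions, and real input would be the C-alpha coordinates read from a PDB file.

```python
import math


def distance_matrix(ca_coords):
    """Return the symmetric matrix of pairwise C-alpha distances."""
    n = len(ca_coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(ca_coords[i], ca_coords[j])
            matrix[i][j] = d
            matrix[j][i] = d  # distances are symmetric
    return matrix


# Three made-up C-alpha positions, for illustration only.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
for row in distance_matrix(toy_coords):
    print(row)
```

Because the matrix depends only on internal distances, it is unchanged by rotating or translating the protein, which is what makes it convenient for comparing two structures.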

Output generated:

Rank and PDB identifier

Z-score

RMSD

Lali (number of aligned positions)

Nres (number of residues in the matched structure)

%ID

PDB description

DALI result

DALI result analysis

A low RMSD together with a high number of aligned residues indicates a better alignment. If both values are high, or both are low, the alignments cannot be ranked against each other.

RMSD: the root-mean-square deviation in distance between aligned alpha carbons, i.e. a measure of how far one structure diverges from the other.
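The RMSD over aligned alpha carbons can be written out as follows. This is a minimal sketch that assumes the two coordinate lists are already superimposed and paired residue by residue; it does not perform the optimal superposition that a structure alignment program carries out first, and the function name rmsd is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, superimposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Two toy structures offset by exactly 1 unit along z.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(structure_a, structure_b))
```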

Z-score: a measure of the quality of the structural alignment.

Note: the DALI package is based on Fortran programs and Perl scripts.

"The model shows the best alignment with 4f4c_A, with a low RMSD of 0.6 and a high Lali of 403."

STRUCTURE-STRUCTURE ALIGNMENT BY PYMOL

Ab initio through ROBETTA

Not based on alignment of the query to a template

Robetta ranked among the best servers in CASP (Critical Assessment of Techniques for Protein Structure Prediction) 4, 5, 6, 7 and 8.

Robetta prediction types:

1. Ginzu: domain prediction

2. Structure: 3D model (available per domain from the result page after Ginzu completes)

Domain prediction by the GINZU protocol

Robetta produces several models.

When it detects more than one domain, Robetta breaks the query sequence up into putative domains and models each of them separately.

It then assembles the models into one contiguous chain.

RESULT OF ROBETTA

Robetta result analysis

For domain prediction, Robetta shows alignment with these three proteins:

Sl. no.   Protein ID   Description
1         4p79         Crystal structure of claudin provides insight into the architecture of tight junctions; ion channel regulator; alpha-helical membrane protein
2         1ni0         Hydrolase; restriction endonuclease PvuII from Proteus vulgaris; class alpha/beta protein; EC 3.1.21.4
3         4m1m         Multidrug resistance protein; ATP-binding cassette transporter; Pgp

VALIDATION of MODEL

ANOLEA

PROSA

PROCHECK(PDBsum)

ANOLEA

Atomic Non-LOcal Environment Assessment

Performs energy calculations on a protein chain, evaluating the non-local environment of each heavy atom in the molecule.

Steps:

1. Open the ANOLEA server.

2. Browse for the model file.

3. Fill in the job title and submit to the server.

ANOLEA result

PROSA

PROtein Structure Analysis

Developed by Sippl, 1993.

Calculates a quality score for the input structure from its C-alpha atoms.

OUTPUT:

Z score

Plot of residue scores

3D structure of the input protein


1. Z score: indicates the overall quality of the model. Its value is displayed in a plot containing the Z scores of all experimentally determined protein chains in the PDB.

"more negative value, more accurate structure".

2. Plot of residue scores: shows the local quality of the model by plotting energies as a function of amino acid sequence position i (window size 40).

Positive values correspond to problematic or erroneous parts of the structure.

3. ProSA-web visualizes the 3D structure of the input protein using the molecular viewer Jmol.

Residues are coloured from blue to red in order of increasing residue energy.
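The windowed residue-score plot can be approximated by averaging per-residue energies over a sliding window and flagging positions where the average is positive. This is a simplified sketch, not the ProSA implementation: the per-residue energies are assumed to be given, the window is truncated at the chain ends, and the names window_average and problem_positions are hypothetical.

```python
def window_average(energies, window=40):
    """Average each residue's energy over a window centred on that residue."""
    n = len(energies)
    smoothed = []
    for i in range(n):
        start = i - window // 2
        end = min(n, start + window)  # truncate at the C-terminal end
        start = max(0, start)         # truncate at the N-terminal end
        segment = energies[start:end]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def problem_positions(smoothed):
    """Return 1-based positions whose windowed energy is positive."""
    return [i + 1 for i, value in enumerate(smoothed) if value > 0]


# Made-up per-residue energies with a small window, for illustration only.
toy_energies = [-1.0, -2.0, 3.0, 4.0]
toy_smoothed = window_average(toy_energies, window=3)
print(toy_smoothed)
print(problem_positions(toy_smoothed))
```

A wider window smooths out single-residue noise, so that only longer stretches of unfavourable energy show up as positive regions.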

PROSA RESULT


PROCHECK (PDBsum)

PDBsum is a pictorial database that provides an at-a-glance overview of the contents of each 3D structure deposited in the Protein Data Bank (PDB).

The PROCHECK analyses provide an idea of the stereo-chemical quality of all protein chains in a given PDB structure.

They highlight regions of the proteins which appear to have unusual geometry and provide an overall assessment of the structure as a whole.

PDBsum uses version 3.6.2 of PROCHECK.


PROCHECK result


PROCHECK ANALYSIS

• G-factor: a log-odds score based on the observed distributions of stereochemical parameters.

• A low G-factor indicates that the property corresponds to a low-probability conformation.

• The stereochemical properties are:

1. planarity

2. chirality

3. phi/psi preferences

4. chi angles.
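The phi/psi and chi preferences listed above are all torsion (dihedral) angles defined by four consecutive atoms. The sketch below shows one common way to compute such an angle from coordinates; it is not PROCHECK's code, and the function name dihedral and the test points are illustrative. For phi the four atoms are C(i-1), N(i), CA(i), C(i); for psi they are N(i), CA(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in the range (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Four made-up points in a planar cis arrangement, for illustration only.
cis_angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
print(cis_angle)
```

Plotting the phi and psi values of every residue against each other gives the Ramachandran plot used to judge which residues fall in allowed regions.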

Result and discussion

Fold recognition was done through the PHYRE2 server for fold assessment. Ab initio prediction was carried out with the Robetta server, which gives information about domains.

After the model was built, it was validated with the ANOLEA and PROSA servers.

The Ramachandran plot of the model was analysed using PDBsum PROCHECK, with a description of the allowed regions.

The comparative and combined study of PHYRE2 and Robetta shows:

Sl. no.   Structure prediction method   Protein ID   Description
1         Fold recognition by PHYRE2    4F4C         Crystal structure of the multidrug transporter P-glycoprotein from C. elegans
2         Ab initio by Robetta          4p79         Membrane protein
3         Ab initio by Robetta          1ni0         Hydrolase
4         Ab initio by Robetta          4m1m         Multidrug resistance protein; ATP-binding cassette transporter; Pgp

Conclusion

The models generated for the given AA sequence by PHYRE2 (fold recognition method) and Robetta (ab initio prediction) indicate that the given protein is:

A P-glycoprotein: a multidrug-resistance protein belonging to a superfamily of membrane-associated transport proteins

An ABC (ATP-binding cassette) transporter

A transmembrane protein (alpha-helical structure)

References

http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index

http://www.sbg.bio.ic.ac.uk/phyre2/phyre2_output/7330b2b464c1ea64/summary.html

http://robetta.bakerlab.org/

http://melolab.org/anolea/

https://prosa.services.came.sbg.ac.at/prosa.php

http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/Generate.html

http://ekhidna.biocenter.helsinki.fi/dali_server/results/20150324-0049-69ef51112579617192cac4dcad7075f2/index.html