Mol.modelling of Nattokinase Final

8/3/2019 Mol.modelling of Nattokinase Final


1. INTRODUCTION

BIOINFORMATICS

Bioinformatics is the combination of biology and information technology. The

discipline encompasses any computational tools and methods used to manage, analyze and

manipulate large sets of biological data. Essentially, bioinformatics has three components:

Fig.1. Applications of Bioinformatics

The creation of databases allowing the storage and management of large

biological data sets.

HOMOLOGY MODELLING OF NATTOKINASE1



The development of algorithms and statistics to determine relationships among

members of large data sets.

The use of these tools for the analysis and interpretation of various types of biological

data, including DNA, RNA and protein sequences, protein structures, gene expression profiles, and biochemical pathways.

The term bioinformatics first came into use in the 1990s and was originally

synonymous with the management and analysis of DNA, RNA and protein sequence data.

Computational tools for sequence analysis had been available since the 1960s, but this was a

minority interest until advances in sequencing technology led to a rapid expansion in the

number of stored sequences in databases such as GenBank. Now, the term has expanded to

incorporate many other types of biological data, for example protein structures, gene

expression profiles and protein interactions. Each of these areas requires its own set of

databases, algorithms and statistical methods.

Computers are also required for their problem-solving power. Typical problems that

might be addressed using bioinformatics include solving the folding pathway of a protein

given its amino acid sequence, or deducing a biochemical pathway given a collection of RNA

expression profiles. Computers can help with such problems, but it is important to note that

expert input and robust original data are also required.




The future of bioinformatics is integration. For example, integration of a wide variety

of data sources such as clinical and genomic data will allow us to use disease symptoms to

predict genetic mutations and vice versa. The integration of GIS data, such as maps, weather

systems, with crop health and genotype data, will allow us to predict successful outcomes of

agriculture experiments. Another future area of research in bioinformatics is large-scale

comparative genomics. For example, the development of tools that can do 10-way comparisons

of genomes will push forward the discovery rate in this field of bioinformatics. Along these

lines, the modeling and visualization of full networks of complex systems could be used in the

future to predict, for example, how the system (or cell) reacts to a drug.

A technical set of challenges faces bioinformatics and is being addressed by faster

computers, technological advances in disk storage space, and increased bandwidth. Finally, a

key research question for the future of bioinformatics will be how to computationally compare

complex biological observations, such as gene expression patterns and protein networks.

Bioinformatics is about converting biological observations to a model that a computer will

understand. This is a very challenging task since biology can be very complex. This problem of

how to digitize phenotypic data such as behavior, electrocardiograms, and crop health into a

computer readable form offers exciting challenges for future bioinformaticians.

HOMOLOGY MODELING

Homology modeling, also known as comparative modeling of proteins, refers to

constructing an atomic-resolution model of the "target" protein from its amino acid

sequence and an experimental three-dimensional structure of a related homologous protein

(the "template"). Homology modeling relies on the identification of one or more known

protein structures likely to resemble the structure of the query sequence, and on the

production of an alignment that maps residues in the query sequence to residues in the

template sequence. The sequence alignment and template structure are then used to

produce a structural model of the target. Because protein structures are more conserved

than DNA sequences, detectable levels of sequence similarity usually imply significant

structural similarity.




The quality of the homology model is dependent on the quality of the sequence

alignment and template structure. The approach can be complicated by the presence of

alignment gaps (commonly called indels) that indicate a structural region present in the

target but not in the template, and by structure gaps in the template that arise from poor

resolution in the experimental procedure (usually X-ray crystallography) used to solve the

structure. Model quality declines with decreasing sequence identity; a typical model has

~1-2 Å root mean square deviation between the matched Cα atoms at 70% sequence

identity but only 2-4 Å agreement at 25% sequence identity. However, the errors are

significantly higher in the loop regions, where the amino acid sequences of the target and

template proteins may be completely different.
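The root mean square deviation quoted above can be illustrated with a short calculation. The following is a minimal sketch, assuming the two structures have already been superposed and that the Cα atoms are listed in matched order; the coordinates are made-up values for illustration only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two matched atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical C-alpha coordinates (in angstroms) for three matched residues.
model_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
reference_ca = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]

print(rmsd(model_ca, reference_ca))
```

In this toy case every atom is displaced by 1 Å, so the RMSD is 1 Å. In practice an optimal superposition step would normally precede the calculation; that step is not shown here.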

Regions of the model that were constructed without a template, usually by loop

modeling, are generally much less accurate than the rest of the model. Errors in side chain

packing and position also increase with decreasing identity, and variations in these packing

configurations have been suggested as a major reason for poor model quality at low

identity. Taken together, these various atomic-position errors are significant and impede

the use of homology models for purposes that require atomic-resolution data, such as drug

design and protein-protein interaction predictions; even the quaternary structure of a

protein may be difficult to predict from homology models of its subunit(s). Nevertheless,

homology models can be useful in reaching qualitative conclusions about the biochemistry

of the query sequence, especially in formulating hypotheses about why certain residues are

conserved, which may in turn lead to experiments to test those hypotheses. For example,

the spatial arrangement of conserved residues may suggest whether a particular residue is

conserved to stabilize the folding, to participate in binding some small molecule, or to

foster association with another protein or nucleic acid.

Homology modeling can produce high-quality structural models when the target

and template are closely related, which has inspired the formation of a structural genomics

consortium dedicated to the production of representative experimental structures for all

classes of protein folds. The chief inaccuracies in homology modeling, which worsen with

lower sequence identity, derive from errors in the initial sequence alignment and from

improper template selection. Like other methods of structure prediction, current practice in

homology modeling is assessed in a biennial large-scale experiment known as the Critical

Assessment of Techniques for Protein Structure Prediction, or CASP.

MODELLER

MODELLER is a computer program used in producing homology models of

protein tertiary structures as well as quaternary structures (rarer). It implements a technique

inspired by nuclear magnetic resonance known as satisfaction of spatial restraints, by

which a set of geometrical criteria are used to create a probability density function for the

location of each atom in the protein. The method relies on an input sequence alignment

between the target amino acid sequence to be modeled and a template protein whose

structure has been solved.

MODELLER was originally written and is currently maintained by Andrej Sali

at the University of California, San Francisco. Although it is freely available for academic

use, graphical user interfaces and commercial versions are distributed by Accelrys.

MODELLER is most frequently used for homology or comparative protein structure

modeling: The user provides an alignment of a sequence to be modeled with known related

structures and MODELLER will automatically calculate a model with all non-hydrogen

atoms. MODELLER can also perform multiple comparisons of protein sequences and/or

structures, clustering of proteins, and searching of sequence databases. The program is

used with a scripting language and does not include any graphics. MODELLER implements

an automated approach to comparative protein structure modeling by satisfaction of spatial

restraints.

Briefly, the core modeling procedure begins with an alignment of the

sequence to be modeled (target) with related known 3D structures (templates). This

alignment is usually the input to the program. The output is a 3D model for the target sequence containing all main chain and side chain non-hydrogen atoms. Given an

alignment, the model is obtained without any user intervention.




Method for comparative protein structure modeling by Modeller

Modeller implements an automated approach to comparative protein structure modeling by satisfaction of spatial restraints. First, many distance and dihedral angle restraints on the target sequence are calculated from its alignment with template 3D structures. The form of these restraints was obtained from a statistical analysis of the relationships between many pairs of homologous structures. This analysis relied on a database of 105 family alignments that included 416 proteins with known three-dimensional structure. By scanning the database, tables quantifying various correlations were obtained, such as the correlations between two equivalent Cα-Cα distances, or between equivalent main chain dihedral angles from two related proteins. These relationships were expressed as conditional probability density functions (pdfs) and can be used directly as spatial restraints. For example, probabilities for different values of the main chain dihedral angles are calculated from the type of residue considered, from the main chain conformation of an equivalent residue, and from the sequence similarity between the two proteins. Another example is the pdf for a certain Cα-Cα distance given equivalent distances in two related protein structures.
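The idea of using a pdf directly as a spatial restraint can be sketched as follows. This is only an illustration under stated assumptions: the pdf is taken to be a simple Gaussian centred on the template distance, and the width (sigma) is an arbitrary value. The pdfs actually derived from the family-alignment database are not reproduced here.

```python
import math


def gaussian_pdf(distance, mean, sigma):
    """Gaussian probability density for a distance (assumed functional form)."""
    z = (distance - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def restraint_penalty(distance, template_distance, sigma):
    """Negative log of the pdf: smallest when the model distance matches the template."""
    return -math.log(gaussian_pdf(distance, template_distance, sigma))


# Hypothetical values: the template C-alpha to C-alpha distance is 10.0 angstroms,
# and sigma = 1.0 angstrom is an arbitrary illustrative width.
for d in (9.0, 10.0, 12.0):
    print(d, round(restraint_penalty(d, 10.0, 1.0), 3))
```

A model distance equal to the template distance gives the lowest penalty, and the penalty grows as the model moves away from it, which is the sense in which a pdf acts as a restraint.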

Using Modeller for comparative modeling

Simple demonstrations of Modeller in all steps of comparative protein structure modeling, including fold assignment, sequence-structure alignment, model building, and model assessment, can be found in the references listed at http://salilab.org/modeller/documentation.html. A number of additional tools useful in comparative modeling are listed at http://salilab.org/bioinformatics_resources.shtml.




The rest of this section is a hands-on description of the most basic use of Modeller

in comparative modeling, in which the inputs are Protein Data Bank (PDB) atom files of

known protein structures and their alignment with the target sequence to be modeled, and

the output is a model for the target that includes all non-hydrogen atoms. Although

Modeller can find template structures as well as calculate sequence and structure

alignments, it is better in the difficult cases to identify the templates and prepare the

alignment carefully by other means.

The sample input files in this tutorial can be found in the examples/automodel

directory of the Modeller distribution. There are three kinds of input files: Protein Data

Bank atom files with coordinates for the template structures, the alignment file with the

alignment of the template structures with the target sequence, and Modeller commands in

script files that instruct Modeller what to do.

Each atom file is named code.atm, where code is a short protein code, preferably the

PDB code; for example, Peptococcus aerogenes ferredoxin would be in a file 1fdx.atm. If

you wish, you can also use file extensions .pdb and .ent instead of .atm. The code must be

used as that protein's identifier throughout the modeling.

The influence of the alignment on the quality of the model cannot be overemphasized. To obtain the best possible model, it is important to understand how the alignment is used by Modeller [Sali & Blundell, 1993]. In outline, for the aligned regions, Modeller tries to derive a 3D model for the target sequence that is as close to one or the other of the template structures as possible while also satisfying stereochemical restraints (e.g., bond lengths, angles, non-bonded atom contacts). The inserted regions, which do not have any equivalent segments in any of the templates, are modeled in the context of the whole molecule, but using their sequence alone. This way of deriving a model means that whenever a user aligns a target residue with a template residue, the user tells Modeller to treat the aligned residues as structurally equivalent. The command alignment.check() can be used to find some trivial alignment mistakes.





Modeller is a command-line only tool, and has no graphical user interface; instead, you must provide it with a script file containing Modeller commands. This is an ordinary

Python script. If you are not familiar with Python, you can simply adapt one of the many

examples in the examples directory, or look at the code for the classes used by Modeller

itself, in the modlib/modeller directory. Finally, there are many resources for learning

Python itself, such as a comprehensive tutorial at http://www.python.org/doc/2.3.5/tut/
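A script of the kind described here might look like the sketch below. It is a hedged illustration, not the exact script used in this work: the alignment file name and the template code in the example call are placeholders, and the helper expected_model_filename is a hypothetical convenience whose naming pattern is inferred from the output file name 1fdx.B99990001.pdb. The Modeller calls (environ, automodel, make) follow the classic scripting interface of the 9v7 generation.

```python
def expected_model_filename(sequence_code, index):
    """Hypothetical helper: file name pattern inferred from 1fdx.B99990001.pdb."""
    return "%s.B9999%04d.pdb" % (sequence_code, index)


def build_models(alnfile, knowns, sequence, n_models=1):
    """Build n_models comparative models from an alignment file and template code(s)."""
    # Imported inside the function so the helper above still works
    # on machines where Modeller is not installed.
    from modeller import environ
    from modeller.automodel import automodel

    env = environ()
    # Directories searched for the template atom files (code.atm, .pdb or .ent).
    env.io.atom_files_directory = ['.']

    a = automodel(env,
                  alnfile=alnfile,    # alignment of the template(s) with the target
                  knowns=knowns,      # code(s) of the template structure(s)
                  sequence=sequence)  # code of the target sequence
    a.starting_model = 1
    a.ending_model = n_models
    a.make()
    return [expected_model_filename(sequence, i) for i in range(1, n_models + 1)]


# Example call (the file name and template code are placeholders):
# build_models('alignment.ali', knowns='5fd1', sequence='1fdx')
print(expected_model_filename('1fdx', 1))
```

The codes passed as knowns and sequence must match the protein codes used in the alignment file and in the atom file names.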

To run Modeller with the script file model-default.py above, do the following:

1. On Windows: Click on the Modeller link on your Start Menu. This will give

you a Windows Command Prompt, set up for you to run Modeller.

2. Change to the directory containing the script and alignment files you created

earlier, using the cd command.

3. Run Modeller itself by typing the following at the command prompt:

    mod9v7 model-default.py

A number of intermediary files are created as the program proceeds. After about 10

seconds on a modern PC, the final 1fdx model is written to file 1fdx.B99990001.pdb.

Examine the model-default.log file for information about the run. In particular, one should

always check the output of the alignment.check() command, which you can find by

searching for check_a. Also, check for warning and error messages by searching for W>

and E>, respectively. There should be no error messages; most often, there are some

warning messages that can usually be ignored.
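The search for W> and E> markers can also be done with a few lines of Python. This is a minimal sketch; the sample lines below are placeholders written for this example and are not real Modeller output.

```python
def scan_log(lines):
    """Return (warnings, errors): lines containing the W> and E> markers."""
    warnings = [ln.rstrip("\n") for ln in lines if "W>" in ln]
    errors = [ln.rstrip("\n") for ln in lines if "E>" in ln]
    return warnings, errors


def scan_log_file(path):
    """Apply scan_log to a log file on disk, e.g. model-default.log."""
    with open(path) as handle:
        return scan_log(handle)


# Placeholder lines for illustration only (not real Modeller output).
sample_log = [
    "ordinary line of log text\n",
    "W> placeholder warning line\n",
    "another ordinary line\n",
]

warnings, errors = scan_log(sample_log)
print(len(warnings), "warning(s),", len(errors), "error(s)")
```

An empty error list corresponds to the expected situation described above, in which the log contains no error messages.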




2. REVIEW OF LITERATURE

Nattokinase (NK) is a potent fibrinolytic enzyme from Bacillus natto. Closely resembling plasmin, NK dissolves fibrin directly. In addition, it also enhances the body's production of both plasmin and other clot-dissolving agents, including urokinase. In some ways, NK is actually superior to conventional clot-dissolving drugs: its benefits include convenience of oral administration, confirmed efficacy, prolonged effects, cost effectiveness and suitability for preventative use. NK has demonstrated stability to pH and temperature, so it can remain stable in the gastrointestinal tract.

NK is a single-chain structure comprised of 275 amino acids and has no intramolecular disulfide bond (Nakamura et al., 1992). Belonging to the subtilisin family of serine proteases, NK has the same conserved catalytic triad (D32, H64, S221) and oxyanion hole (N155) (Yong et al., 2003). The substrate binding sites (S125, L126, G127) also position the binding pockets S1 and S4 of subtilisin (Bryan P.N et al., 2003). NK is highly homologous to most subtilisins, and the 3D structures of many subtilisins have been obtained by X-ray crystal diffraction and NMR. But the 3D structure of NK is still unknown.

The homology model for NK was generated by using the 3D structures of SB, SC, SE and SS, based on sequence homologies of 84.9%, 67.8%, 98.9% and 62.92%, respectively, between NK and each of them. In order to understand the catalytic mechanism and substrate specificity of NK, several substrates have been docked into the active site of the model structure with the Lamarckian Genetic Algorithm. The interaction between NK and substrates has been determined by calculating the hydrogen bonds of the binding site for the enzyme-substrate complexes. Based on our work, we attempt to explain the interrelation between the structure and the function of NK.




Sequence and structure alignment

The sequence of NK was taken from the NCBI protein database (GenBank accession no. S51909). Sequences and structures of SB, SC, SE and SS, all from the Bacillus subtilis family, were obtained from the RCSB Protein Data Bank (PDB IDs 1AU9, 1AF4, 1SCJ and 1GCI, respectively).

Sequence alignment was derived using the CLUSTAL W program, and default

parameters were applied (Higgins et al., 1994). Structure alignment was obtained and

analyzed by using GRASP package with default parameters (Nicholls et al., 1991) and

both aligned results were inspected and adjusted manually to minimize the number of gaps

and insertions.

SUPPORT FOR HEALTHY BLOOD FLOW AND CIRCULATION

Nattokinase is a systemic enzyme isolated from the traditional Japanese soy food,

natto. It has been shown to support healthy blood flow by assisting the circulatory clearing

system of the body.

Nattokinase is a component of a soybean food. It is a 275 amino acid peptide. It is said to have clot-dissolving abilities similar to those of plasmin, an enzyme that we all have in our blood as our natural defense mechanism to dissolve unwanted blood clots. The "clot busters" used in clinical medicine (tPA = tissue plasminogen activator, streptokinase, urokinase, etc.) to dissolve blood clots that have led to heart attacks, strokes, pulmonary embolism or deep vein thrombosis all work through enhancing plasmin's action. They have to be given intravenously, because they are not active when given orally.

It has been reported that Nattokinase increases the clot-dissolving activities of blood in animals and human volunteers and that it suppresses clot formation and enhances clot resolution in animals. However, to my knowledge, only one clinical study has been performed to assess whether Nattokinase has any real benefit in the prevention of blood clots in humans. In that study, Nattokinase or placebo was given to individuals prior to long-distance (7-8 hour) flights. Of the 92 individuals in the placebo group, 7 developed a clot, all without symptoms, discovered by ultrasound; of the 94 individuals in the Nattokinase group, none developed a clot. The main flaw of the study, limiting the usefulness of its conclusions, is that the publication does not indicate whether this was a double-blinded study or, at least, an investigator-blinded study. A non-blinded study has the potential for bias, limiting the validity of its findings and conclusions.

Importance of hydrogen bonds in the active site of the subtilisin nattokinase

Hydrogen bonds occurring in the catalytic triad (Asp32, His64 and Ser221) and the oxyanion hole (Asn155) are very important to the catalysis of peptide bond hydrolysis by serine proteases. For nattokinase, a bacterial serine protease, construction and analysis of a three-dimensional structural model suggested that several hydrogen bonds formed by four residues function to stabilize the transition state of the hydrolysis reaction. These four residues are Ser33, Asp60, Ser62 and Thr220. In order to remove the effect of these hydrogen bonds, four mutants (Ser33-Ala33, Asp60-Ala60, Ser62-Ala62, and Thr220-Ala220) were constructed by site-directed mutagenesis. The results of enzyme kinetics indicated that removal of these hydrogen bonds increases the free energy of the transition state (ΔGT). We concluded that these hydrogen bonds are more important for catalysis than for binding the substrate, because removal of these bonds mainly affects the kcat but not the Km values. A substrate, SUB1 (succinyl-Ala-Ala-Pro-Phe-p-nitroanilide), was used during enzyme kinetics experiments. In the present study we have also shown the results of FEP (free-energy perturbation) calculations with regard to the binding and catalysis reactions for these mutant subtilisins. The calculated difference in FEP also suggested that these four residues are more important for catalysis than binding of the substrate, and the simulated values compared well with the experimental values from enzyme kinetics.
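The link between the kinetic constants and the transition-state free energy can be made concrete with a small calculation. The sketch below assumes the commonly used transition-state relation ΔΔG = -RT ln[(kcat/Km)mutant / (kcat/Km)wild type]; this relation and the numerical inputs are assumptions introduced for illustration and are not data from the study described above.

```python
import math

GAS_CONSTANT_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def transition_state_ddg(kcat_km_wild, kcat_km_mutant, temperature_k=298.15):
    """Change in transition-state free energy (kJ/mol) caused by a mutation.

    Positive values mean the mutant stabilizes the transition state less well.
    """
    ratio = kcat_km_mutant / kcat_km_wild
    return -GAS_CONSTANT_KJ * temperature_k * math.log(ratio)


# Hypothetical numbers: the mutant keeps the same Km but kcat drops tenfold,
# so kcat/Km also drops tenfold.
print(round(transition_state_ddg(1.0, 0.1), 2), "kJ/mol")
```

With Km unchanged, a drop in kcat alone lowers kcat/Km and so raises the computed free energy, which is consistent with the reasoning that such hydrogen bonds matter more for catalysis than for substrate binding.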

The results of molecular dynamics simulations further demonstrated that removal

of these hydrogen bonds partially releases Asp32, His64 and Asn155 so that the stability of the

transition state decreases. Another substrate, SUB2 (H-D-Val-Leu-Lys-p-nitroanilide), was

used for FEP calculations and MD simulations.




3. MATERIALS AND METHODS

Homology modeling is a method based on the fact that homologous proteins have similar 3D structures. When a homologue of the protein of interest is available, tools such as MODELLER make it possible to build a model from the template 3D coordinates and an alignment of amino acid sequences. MODELLER applies the structure of the template to the protein of interest, taking into account the sequence constraints (steric clashes, electrostatic interactions, amino acid secondary structure propensities, etc.).

3.1 STEPS IN HOMOLOGY MODELING

1. Selection of Template molecule

2. Alignment of Template with Target

3. Model Generation

4. Model Assessment

3.1.1 Template Selection

If the percentage sequence identity between the sequence of interest and a protein

with known structure is high enough (more than 25 or 30%), simple database search

programs like FASTA or BLAST are clearly adequate to detect the homology.
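The percentage sequence identity referred to above can be computed from a pairwise alignment as in the following minimal sketch. It assumes identity is defined over aligned, non-gap columns; other definitions (for example, dividing by the length of the shorter sequence) give different numbers. The aligned sequences are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments (one gap in the first sequence).
target = "ACDE-G"
template = "ACDFHG"
identity = percent_identity(target, template)
print(identity, "-> above 30% threshold:", identity > 30.0)
```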

3.1.2 Template Alignment

A critical step in the development of a homology model is the alignment of the

unknown sequence with the homologues. Factors to be considered when performing an

alignment are:

(1) Which algorithm to use for sequence alignment

(2) Which scoring method to apply

(3) Whether and how to assign gap penalties
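The three factors listed above can be seen together in a small dynamic-programming example. The sketch below uses the Needleman-Wunsch global alignment algorithm with a simple match/mismatch score and a linear gap penalty; these choices and the parameter values are assumptions for illustration, and the sketch is not the progressive method implemented in CLUSTAL W.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align the two residues
                score[i - 1][j] + gap,               # gap in seq_b
                score[i][j - 1] + gap,               # gap in seq_a
            )
    return score[-1][-1]


print(global_alignment_score("ACGT", "AGT"))
```

Changing the scoring method (for example, to a substitution matrix) or the gap penalty changes which alignment is optimal, which is why these choices have to be made deliberately.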




3.1.3 Model Generation

Given a template and an alignment, the information contained therein must be used

to generate a three-dimensional structural model of the target, represented as a set of

Cartesian coordinates for each atom in the protein. Three major classes of model

generation methods have been proposed.

3.1.4 Fragment assembly

The original method of homology modeling relied on the assembly of a complete

model from conserved structural fragments identified in closely related solved structures.

For example, a modeling study of serine proteases in mammals identified a sharp

distinction between "core" structural regions conserved in all experimental structures in the

class, and variable regions typically located in the loops where the majority of the

sequence differences were localized. Thus unsolved proteins could be modeled by first

constructing the conserved core and then substituting variable regions from other proteins

in the set of solved structures. Current implementations of this method differ mainly in the

way they deal with regions that are not conserved or that lack a template.

3.1.5 Segment matching

The segment-matching method divides the target into a series of short segments,

each of which is matched to its own template fitted from the Protein Data Bank. Thus,

sequence alignment is done over segments rather than over the entire protein. Selection of

the template for each segment is based on sequence similarity, comparisons of alpha

carbon coordinates, and predicted steric conflicts arising from the van der Waals radii of

the divergent atoms between target and template.

3.1.6 Model Assessment

Assessment of homology models without reference to the true target structure is

usually performed with two methods: statistical potentials or physics-based energy

calculations. Both methods produce an estimate of the energy (or an energy-like analog)




for the model or models being assessed; independent criteria are needed to determine

acceptable cutoffs. Neither of the two methods correlates exceptionally well with true

structural accuracy, especially on protein types underrepresented in the PDB, such as

membrane proteins.

3.2 NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION (NCBI)

The National Center for Biotechnology Information advances science and health by

providing access to biomedical and genomic information.

The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health. The NCBI is located in Bethesda, Maryland and was founded in 1988 through legislation sponsored by Senator Claude Pepper. The NCBI houses genome sequencing data in GenBank and an index of biomedical research articles in PubMed Central and PubMed, as well as other information relevant to biotechnology. All these databases are available online through the Entrez search engine.

3.3 BASIC LOCAL ALIGNMENT SEARCH TOOL (BLAST)




In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold. For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if humans carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on similarity of sequence.




The BLAST programs listed below perform the following tasks:

a) Blastp

Compares an amino acid query sequence against a protein sequence database

b) Blastn

Compares a nucleotide query sequence against a nucleotide sequence database

c) Blastx

Compares the six-frame conceptual translation products of a nucleotide query

sequence (both strands) against a protein sequence database
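The six-frame conceptual translation used by Blastx can be sketched in a few lines. This is an illustrative implementation using the standard genetic code, not the code used inside BLAST; the frame labels (+1 to +3 and -1 to -3) and the use of X for codons containing unrecognized bases are choices made for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Translations of the three forward and three reverse reading frames."""
    dna = dna.upper()
    frames = {}
    for label, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames["%s%d" % (label, offset + 1)] = translate(strand[offset:])
    return frames


for frame, protein in six_frame_translations("ATGGCC").items():
    print(frame, protein)
```

Each of the six protein strings would then be compared against the protein database in the same way as an amino acid query.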

3.3.1 Working of Blast

The fundamental unit of BLAST algorithm output is the High-scoring Segment Pair

(HSP), wherein each segment of the pair is an equal-length but arbitrarily long run of

contiguous residues for which the aggregate alignment score against the other segment in

the pair is locally maximal and, further, meets or exceeds some positive-valued threshold

or cutoff score.

A (possibly empty) set of HSPs is thus defined by two sequences, a scoring system,

and a cutoff score.

In the programmatic implementations of the BLAST algorithm described here, each

HSP consists of a segment from the query sequence and one from a database

sequence.

The cutoff score has been parameterized to permit the programs' sensitivity and

selectivity to be adjusted.

A Maximal-scoring Segment Pair (MSP) is defined by two sequences and a scoring

system and is the highest-scoring of all possible segment pairs that can be produced

from the two sequences. The available statistical methods are applicable to determining the

significance of MSP scores in the limit of infinitely long sequences, under a

random sequence model that assumes independent and identically distributed

residues at each sequence position.

In the programs described here, statistics have been extrapolated to assessing the

significance of HSP scores obtained from comparisons of biological sequences

within the context of a database search.

The approach to similarity searching taken by the BLAST programs is first to look

for similar segments between the query sequence and a database sequence, then to

evaluate the statistical significance of any matches that were found, and finally to

report only those matches that satisfy a user-selectable threshold of significance.
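The definition of a maximal-scoring segment pair can be illustrated with an exhaustive search over all ungapped, equal-length segment pairs. The sketch below is for illustration only: it scans every diagonal of the comparison and is not the heuristic search that BLAST itself performs; the match and mismatch scores are assumed values.

```python
def maximal_segment_pair(seq_a, seq_b, match=1, mismatch=-1):
    """Return (score, start_a, start_b, length) of the best ungapped segment pair."""
    best = (0, 0, 0, 0)
    # Each diagonal pairs seq_a[i] with seq_b[j] for a fixed offset i - j.
    for diagonal in range(-(len(seq_b) - 1), len(seq_a)):
        i, j = (diagonal, 0) if diagonal >= 0 else (0, -diagonal)
        running = 0
        start_i, start_j = i, j
        while i < len(seq_a) and j < len(seq_b):
            if running <= 0:
                # A non-positive prefix never helps: restart the segment here.
                running = 0
                start_i, start_j = i, j
            running += match if seq_a[i] == seq_b[j] else mismatch
            if running > best[0]:
                best = (running, start_i, start_j, i - start_i + 1)
            i += 1
            j += 1
    return best


# Hypothetical sequences sharing the segment ACGT.
score, start_a, start_b, length = maximal_segment_pair("XXACGTXX", "YYYACGTY")
print(score, start_a, start_b, length)
```

An HSP in the sense described above would be any such locally maximal segment pair whose score also meets the chosen cutoff.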

3.4 PROTEIN DATA BANK (PDB):




The PDB archive contains information about experimentally-determined structures

of proteins, nucleic acids, and complex assemblies. As a member of the wwPDB, the RCSB

PDB curates and annotates PDB data according to agreed upon standards.

The RCSB PDB also provides a variety of tools and resources. Users can perform

simple and advanced searches based on annotations relating to sequence, structure and

function. These molecules are visualized, downloaded, and analyzed by users who range

from students to specialized scientists.

The PDB is a key resource in areas of structural biology, such as structural

genomics. Most major scientific journals, and some funding agencies, such as the NIH in

the USA, now require scientists to submit their structure data to the PDB. If the contents of

the PDB are thought of as primary data, then there are hundreds of derived (i.e., secondary)

databases that categorize the data differently. For example, both SCOP and CATH

categorize structures according to type of structure and assumed evolutionary relations;

GO categorizes structures based on genes.

3.5 MODELLER:

Modeller is a computer program that models three-dimensional structures of

proteins and their assemblies by satisfaction of spatial restraints. Modeller is most

frequently used for homology or comparative protein structure modeling: The user

provides an alignment of a sequence to be modeled with known related structures and

Modeller will automatically calculate a model with all non-hydrogen atoms.




3.5.1 TYPES OF MODELLER:

There are five types of modeling in Modeller:

a) Basic Modeling

Model a sequence with high identity to a template. This exercise introduces the use

of MODELLER in a simple case where the template selection and target-template alignments are not a problem.

b) Advanced Modeling

Model a sequence based on multiple templates and bound to a ligand. This exercise

introduces the use of multiple templates, ligands and loop refinement in the process of

model building with MODELLER.

c) Iterative Modeling

Increase the accuracy of the modeling exercise by iterating the 4 step process. This

exercise introduces the concept of MOULDING to improve the accuracy of comparative

models.

d) Difficult Modeling




Model a sequence based on a low identity to a template. This exercise uses

resources external to MODELLER in order to select a template for a difficult case of

protein structure prediction.

e) Modeling with Cryo-EM

Model a sequence using both template and cryo-EM data. This exercise assesses

the quality of generated models and loops by rigid fitting into cryo-EM maps, and

improves them with flexible EM fitting.


4. RESULTS AND DISCUSSION

NATTOKINASE [Bacillus subtilis subsp. natto]

The NattoKinase sequence is 275 amino acids long and matches the template in length, so the consequences of gaps were not considered. The conserved domain of NattoKinase was detected in NCBI and is the same as the common secondary structures determined by the GRASP package. Interestingly, the same key structures are also predicted, including the catalytic triad (D32, H64, S221). The sequence identity of the catalytic domain is as high as 99%, which suggests that the part of the sequence most important for catalytic activity is the most conserved. The binding pocket also has a sequence identity above 90%. Therefore, we conclude that this alignment can be used to construct a reliable 3D model for NattoKinase. To predict the structure, we ran BLAST with our target sequence against the template sequence of the protein Calcium Independent Subtilisin Bpn Mutant. The model and the template have Ramachandran plots of similar quality, which are acceptable given the relatively low percentage of residues having disallowed torsional angles. Secondary structures were investigated with the GRASP package, and we found that the model has more extended secondary structures and better stereochemical character, which allows further refinement. The quality of the Ramachandran plot as well as the goodness factors was found to be better, and no residues have disallowed conformations. Thus, the above analysis suggests the backbone conformations to be better than those of the templates. The results show that the total, potential and kinetic energies remained constant during the simulation, and the protein size also remained constant. It can be seen that the system remains in equilibrium during the entire simulation. We therefore concluded that the predicted structure is stable at room temperature. In summary, the quality of the backbone conformation, the residue interaction, the residue contact and the dynamic stability of the structure are all well within the limits established for reliable structures. This suggests that the structure obtained for NattoKinase can be used to characterize protein-substrate interactions and to investigate the relation between structure and function.

BLAST OUTPUT




The FASTA Format of the Target Sequence

>gi|58866693|gb|AAW83000.1| nattokinase [Bacillus subtilis subsp. natto]

MAFSNMSAQAAGKSSTEKKYIVGFKQTMSAMSSAKKKDVISEKGGKVQKQFKYVNAAAATLDEKAVKELK

KDPSVAYVEEDHIAHEYAQSVPYGISQIKAPALHSQGYTGSNVKVAVIDSGIDSSHPDLNVRGGASFVPS

ETNPYQDGSSHGTHVAGTIAALNNSIGVLGVAPSASLYAVKVLDSTGSGQYSWIINGIEWAISNNMDVIN

MSLGGPTGSTALKTVVDKAVSSGIVVAAAAGNEGSSGSTSTVGYPAKYPSTIAVGAVNSSDQRASFSSVG

SELDVMAPGVSIQSTLPGGTYGAYNGTSMATPHVAGAAALILSKHPTWTNAQVRDRLESTATYLGNSFYY

GKGLINVQAAAH

Template sequence

>gi|21730195|pdb|1GNV|A Chain A, Calcium Independent Subtilisin Bpn' Mutant

AKCVSYGVSQIKAPALHSQGYTGSNVKVAVIDSGIDSSHPDLNVAGGASFVPSETNPFQDNNSHGTHVAG

TVLAVAPSASLYAVKVLGADGSGQYSWIINGIEWAIANNMDVINMSLGGPSGSAALKAAVDKAVASGVVV

VAAAGNEGTSGSSSTVGYPGKYPSVIAVGAVDSSNQRASFSSVGPELDVMAPGVSICSTLPGNKYGAKSG

TXMASPHVAGAAALILSKHPNWTNTQVRSSLENTTTKLGDSFYYGKGLINVEAAAQ
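
Records in this format are simple to read programmatically: a header line starting with ">" is followed by one or more sequence lines. The following is a minimal sketch of such a reader; the function name and the shortened demonstration record are assumptions made for illustration and are not data from this study.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line[1:]             # drop the leading '>'
            records[header] = []
        elif header is not None:
            records[header].append(line)  # a sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}

# Shortened, made-up record used only to demonstrate the parser.
example = """>demo_protein hypothetical example
MAFSNMSAQA
AGKSS
"""
seqs = read_fasta(example)
```

Here seqs maps the header text to the joined sequence, so len() of a value gives the sequence length directly.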




STRUCTURE




ALIGNMENT RESULT

_aln.pos          10        20        30        40        50        60
1gnvA     --------------------------------------------------------------------
nat       MAFSNMSAQAAGKSSTEKKYIVGFKQTMSAMSSAKKKDVISEKGGKVQKQFKYVNAAAATLDEKAVKE
_consrvd

_aln.p    70        80        90       100       110       120       130
1gnvA     -------------------AKCVSYGVSQIKAPALHSQGYTGSNVKVAVIDSGIDSSHPDLNVAGGAS
nat       LKKDPSVAYVEEDHIAHEYAQSVPYGISQIKAPALHSQGYTGSNVKVAVIDSGIDSSHPDLNVRGGAS
_consrvd                     *  * ** ************************************ ****

_aln.pos   140       150       160       170       180       190       200
1gnvA     FVPSETNPFQDNNSHGTHVAGT---------VLAVAPSASLYAVKVLGADGSGQYSWIINGIEWAIAN
nat       FVPSETNPYQDGSSHGTHVAGTIAALNNSIGVLGVAPSASLYAVKVLDSTGSGQYSWIINGIEWAISN
_consrvd  ******** **  *********         ** *************   **************** *

_aln.pos     210       220       230       240       250       260       270
1gnvA     NMDVINMSLGGPSGSAALKAAVDKAVASGVVVVAAAGNEGTSGSSSTVGYPGKYPSVIAVGAVDSSNQ
nat       NMDVINMSLGGPTGSTALKTVVDKAVSSGIVVAAAAGNEGSSGSTSTVGYPAKYPSTIAVGAVNSSNQ
_consrvd  ************ ** ***  ***** ** ** ******* *** ****** **** ****** ****

_aln.pos       280       290       300       310       320       330       340
1gnvA     RASFSSVGPELDVMAPGVSICSTLPGNKYGAKSGT-MASPHVAGAAALILSKHPNWTNTQVRSSLENT
nat       RASFSSVGSELDVMAPGVSIQSTLPGGTYGAYNGTSMATPHVAGAAALILSKHPTWTNAQVRDRLEST
_consrvd  ******** *********** *****  ***  ** ** *************** *** ***  ** *

_aln.pos         350       360
1gnvA     TTKLGDSFYYGKGLINVEAAAQ
nat       ATYLGNSFYYGKGLINVQAAAH
_consrvd   * ** *********** ***
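
The conservation line marks with an asterisk every column in which the template and the target carry the same residue. As a minimal sketch of how such a line and a percent identity can be derived from two aligned rows, the code below uses the last block of the alignment (positions 341-362). The function names are illustrative, and identity is computed here over columns where neither row has a gap, which is one of several conventions in use.

```python
def conservation_line(seq_a, seq_b, gap="-"):
    """Return a string with '*' where two aligned rows hold the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "*" if a == b and a != gap else " "
        for a, b in zip(seq_a, seq_b)
    )

def percent_identity(seq_a, seq_b, gap="-"):
    """Identical columns as a percentage of the columns without a gap."""
    marks = conservation_line(seq_a, seq_b, gap)
    aligned = sum(1 for a, b in zip(seq_a, seq_b) if a != gap and b != gap)
    return 100.0 * marks.count("*") / aligned if aligned else 0.0

# Last block of the alignment (positions 341-362).
template = "TTKLGDSFYYGKGLINVEAAAQ"
target = "ATYLGNSFYYGKGLINVQAAAH"
line = conservation_line(template, target)
identity = percent_identity(template, target)
```

For this block the two rows agree in 17 of 22 columns. Applying the same functions to the full rows gives the identity over the whole aligned region.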

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map, a Ramachandran

diagram or a [φ,ψ] plot), developed by Gopalasamudram Narayana Ramachandran

and Viswanathan Sasisekharan, is a way to visualize the dihedral angles ψ against φ of

amino acid residues in protein structure. It shows the possible conformations of the ψ and φ

angles for a polypeptide.
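
Each of the two angles is a dihedral defined by four consecutive backbone atoms: φ by C(i-1), N(i), CA(i), C(i) and ψ by N(i), CA(i), C(i), N(i+1). The following is a minimal sketch of the dihedral calculation from four Cartesian coordinates, written in plain Python; the function name is illustrative, and the coordinates in the usage note are simple test points, not atoms from the model.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points in 3D space."""
    def sub(a, b):
        return [x - y for x, y in zip(a, b)]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)          # central bond, the rotation axis
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = [x / norm for x in b1]

    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = sub(b0, [dot(b0, b1) * x for x in b1])
    w = sub(b2, [dot(b2, b1) * x for x in b1])

    # The angle between the two projections is the dihedral angle.
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))
```

For example, dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)) describes a cis arrangement and evaluates to 0 degrees, while moving the last point to (-1, 0, 1) gives the trans arrangement with a magnitude of 180 degrees.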




Evaluation of residues

Residue [ 19 :LYS] (-116.25, 65.43) in Allowed region

Residue [ 89 :GLN] (-178.77, 138.46) in Allowed region

Residue [ 99 :LYS] ( 77.06, 27.85) in Allowed region

Residue [ 119 :ASP] (-157.41,-149.41) in Allowed region

Residue [ 150 :SER] ( 89.55, -10.08) in Allowed region

Residue [ 158 :THR] ( -45.20, -23.52) in Allowed region

Residue [ 160 :ALA] ( 172.15, 161.68) in Allowed region

Residue [ 164 :ASN] (-162.66, 107.31) in Allowed region

Residue [ 166 :ILE] (-151.77,-147.52) in Allowed region

Residue [ 168 :VAL] ( -59.33, -81.43) in Allowed region

Residue [ 242 :ASN] (-112.94, 46.23) in Allowed region

Residue [ 246 :SER] (-140.84, 84.84) in Allowed region

Residue [ 344 :LEU] ( -80.17, -65.17) in Allowed region

Residue [ 140 :SER] ( 141.66, -70.87) in Outlier region

Residue [ 159 :ILE] ( 62.75, 95.13) in Outlier region

Residue [ 165 :SER] ( 160.35, 72.08) in Outlier region

Residue [ 289 :GLY] (-166.83, 9.96) in Outlier region

Number of residues in favored region (~98.0% expected) : 343 ( 95.3%)

Number of residues in allowed region ( ~2.0% expected) : 13 ( 3.6%)

Number of residues in outlier region : 4 ( 1.1%)
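
The percentages in this summary follow directly from the three residue counts, as the short check below shows. The variable names are illustrative; the counts are those reported above.

```python
# Residue counts reported in the Ramachandran evaluation above.
counts = {"favored": 343, "allowed": 13, "outlier": 4}

total = sum(counts.values())
percent = {region: round(100.0 * n / total, 1) for region, n in counts.items()}
```

The counts sum to 360 residues, two fewer than the 362 residues of the target sequence, which is consistent with the two terminal residues each lacking one of the two angles.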




5. CONCLUSION

A 3D structural model of the nattokinase enzyme [Bacillus subtilis subsp. natto] was built by homology modelling with MODELLER, because no three-dimensional structure is available in the PDB. The structure of nattokinase [Bacillus subtilis subsp. natto] is important for establishing its molecular function. The sequence similarity with the template is 99%, which supports the reliability of the model generated with MODELLER. The alignment between the two proteins shows high identity compared with other proteins. The model with the lowest objective function score was selected for model building, and DOPE scores were obtained for the template and for this model. The Ramachandran plot placed 343 residues (95.3%; ~98.0% expected) in the most favoured regions (A, B, L), 13 residues (3.6%; ~2.0% expected) in the allowed region and 4 residues (1.1%) in the outlier region.




6. REFERENCES

Bryan P.N., Protein engineering of subtilisin, Biochim. Biophys. Acta 1543 (2000) 203-222.

Laskowski R.A., MacArthur M.W., Moss D.S., Thornton J.M., PROCHECK, J. Appl. Cryst. 26 (1993) 283-291.

Nakamura T., Yamagata Y., Ichishima E., Nucleotide sequence of the subtilisin NAT gene, aprN, of Bacillus subtilis (natto), Bioscience. Biotechnology. Biochemistry. 56 (11) (1992) 1869.

Nicholls A., Sharp K., Honig B., Graphical representation and analysis of structural properties, Proteins Struct. Functional. Genetics. 11 (4) (1991) 281.

Rost B., Sander C., Prediction of protein secondary structure at better than 70% accuracy, J. Mol. Biol. 232 (1993) 584-599.

Sanchez R., Sali A., Advances in comparative protein-structure modeling, Curr. Opin. Struct. Biol. 7 (1997) 206-214.

Sumi H., A novel fibrinolytic enzyme in the vegetable cheese natto: a typical and popular soybean food in the Japanese diet, Experientia 43 (20) (1987) 1110-1111.

Thompson J.D., Higgins D.G., Gibson T.J., CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice, Nucleic Acids Res. 22 (1994) 4673-4680.

Yong P., Qing H., Ren-huai Z., Yi-zheng Z., Purification and characterization of a fibrinolytic enzyme produced by Bacillus amyloliquefaciens DC-4 screened from douchi, a traditional Chinese soybean food, Comp. Biochemistry. Physiol. 134 (2003) 45-52.

Zhong-liang Zheng, Mao-qing Ye, Zhen-yu Zuo, Zhi-gang Liu, Keng-chang Tai, and Guo-lin Zou, Biochemistry. 2006 395(Pt 3): 509-515.
