Mol.modelling of Nattokinase Final


  • 8/3/2019 Mol.modelling of Nattokinase Final


    1. INTRODUCTION

    BIOINFORMATICS

    Bioinformatics is the combination of biology and information technology. The

    discipline encompasses any computational tools and methods used to manage, analyze and

    manipulate large sets of biological data. Essentially, bioinformatics has three components:

    Fig.1. Applications of Bioinformatics

    The creation of databases allowing the storage and management of large

    biological data sets.

    HOMOLOGY MODELLING OF NATTOKINASE1


    The development of algorithms and statistics to determine relationships among

    members of large data sets.

    The use of these tools for the analysis and interpretation of various types of biological data, including DNA, RNA and protein sequences, protein structures, gene expression profiles, and biochemical pathways.

    The term bioinformatics came into widespread use in the 1990s and was originally synonymous with the management and analysis of DNA, RNA and protein sequence data.

    Computational tools for sequence analysis had been available since the 1960s, but this was a

    minority interest until advances in sequencing technology led to a rapid expansion in the

    number of stored sequences in databases such as GenBank. Now, the term has expanded to

    incorporate many other types of biological data, for example protein structures, gene

    expression profiles and protein interactions. Each of these areas requires its own set of

    databases, algorithms and statistical methods.

    Computers are also required for their problem-solving power. Typical problems that might be addressed using bioinformatics include solving the folding pathway of a protein given its amino acid sequence, or deducing a biochemical pathway given a collection of RNA expression profiles. Computers can help with such problems, but expert input and robust original data are also required.


    The future of bioinformatics is integration. For example, integrating a wide variety of data sources, such as clinical and genomic data, will allow us to use disease symptoms to predict genetic mutations and vice versa. Integrating GIS data, such as maps and weather systems, with crop health and genotype data will allow us to predict successful outcomes of agricultural experiments. Another future area of research in bioinformatics is large-scale comparative genomics; for example, tools that can perform 10-way comparisons of genomes will push forward the discovery rate in this field. Along these lines, modeling and visualizing full networks of complex systems could be used in the future to predict, for example, how the system (or cell) reacts to a drug.

    Bioinformatics faces a set of technical challenges that are being addressed by faster computers, advances in disk storage capacity, and increased bandwidth. Finally, a key research question for the future of bioinformatics will be how to computationally compare complex biological observations, such as gene expression patterns and protein networks. Bioinformatics is about converting biological observations into a model that a computer can understand. This is a challenging task because biology can be very complex. The problem of digitizing phenotypic data such as behavior, electrocardiograms and crop health into a computer-readable form offers exciting challenges for future bioinformaticians.

    HOMOLOGY MODELING

    Homology modeling, also known as comparative modeling of proteins, refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein (the "template"). Homology modeling relies on identifying one or more known protein structures likely to resemble the structure of the query sequence, and on producing an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than DNA sequences, detectable levels of sequence similarity usually imply significant structural similarity.


    The quality of the homology model depends on the quality of the sequence alignment and the template structure. The approach can be complicated by alignment gaps (commonly called indels), which indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~1–2 Å root mean square deviation (RMSD) between the matched Cα atoms at 70% sequence identity, but only 2–4 Å agreement at 25% sequence identity. The errors are significantly higher in the loop regions, where the amino acid sequences of the target and template proteins may be completely different.
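    The RMSD figures above compare matched Cα positions in a model and a reference structure. The sketch below assumes the two coordinate lists are already superposed and paired residue by residue. The superposition step itself (for example the Kabsch algorithm) is omitted, and the function name `ca_rmsd` is hypothetical.

```python
import math

def ca_rmsd(model, reference):
    """Root mean square deviation between matched Calpha coordinates.

    Assumes both lists hold (x, y, z) tuples in angstroms, are paired by
    residue, and have already been superposed onto a common frame.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        squared += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared / len(model))

# Toy example: every Calpha shifted by 1.5 A along x gives an RMSD of 1.5 A.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(x + 1.5, y, z) for x, y, z in reference]
print(round(ca_rmsd(model, reference), 2))  # 1.5
```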

    Regions of the model that were constructed without a template, usually by loop

    modeling, are generally much less accurate than the rest of the model. Errors in side chain

    packing and position also increase with decreasing identity, and variations in these packing

    configurations have been suggested as a major reason for poor model quality at low

    identity. Taken together, these various atomic-position errors are significant and impede

    the use of homology models for purposes that require atomic-resolution data, such as drug

    design and protein-protein interaction predictions; even the quaternary structure of a

    protein may be difficult to predict from homology models of its subunit(s). Nevertheless,

    homology models can be useful in reaching qualitative conclusions about the biochemistry

    of the query sequence, especially in formulating hypotheses about why certain residues are

    conserved, which may in turn lead to experiments to test those hypotheses. For example,

    the spatial arrangement of conserved residues may suggest whether a particular residue is

    conserved to stabilize the folding, to participate in binding some small molecule, or to

    foster association with another protein or nucleic acid.

    Homology modeling can produce high-quality structural models when the target

    and template are closely related, which has inspired the formation of a structural genomics

    consortium dedicated to the production of representative experimental structures for all

    classes of protein folds. The chief inaccuracies in homology modeling, which worsen with

    lower sequence identity, derive from errors in the initial sequence alignment and from

    improper template selection. Like other methods of structure prediction, current practice in


    homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

    MODELLER

    MODELLER is a computer program used to produce homology models of protein tertiary structures and, less commonly, quaternary structures. It implements a technique inspired by nuclear magnetic resonance, known as satisfaction of spatial restraints, in which a set of geometrical criteria is used to create a probability density function for the location of each atom in the protein. The method relies on an input sequence alignment between the target amino acid sequence to be modeled and a template protein whose structure has been solved.

    MODELLER was originally written and is currently maintained by Andrej Sali

    at the University of California, San Francisco. Although it is freely available for academic

    use, graphical user interfaces and commercial versions are distributed by Accelrys.

    MODELLER is most frequently used for homology or comparative protein structure

    modeling: The user provides an alignment of a sequence to be modeled with known related

    structures and MODELLER will automatically calculate a model with all non-hydrogen

    atoms. MODELLER can also perform multiple comparisons of protein sequences and/or

    structures, clustering of proteins, and searching of sequence databases. The program is

    used with a scripting language and does not include any graphics. MODELLER implements

    an automated approach to comparative protein structure modeling by satisfaction of spatial

    restraints.



    Method for comparative protein structure modeling by Modeller

    Modeller implements an automated approach to comparative protein structure modeling by satisfaction of spatial restraints. Briefly, the core modeling procedure begins with an alignment of the sequence to be modeled (target) with related known 3D structures (templates). This alignment is usually the input to the program. The output is a 3D model for the target sequence containing all main chain and side chain non-hydrogen atoms. Given an alignment, the model is obtained without any user intervention.

    First, many distance and dihedral angle restraints on the target sequence are calculated from its alignment with template 3D structures. The form of these restraints was obtained from a statistical analysis of the relationships between many pairs of homologous structures. This analysis relied on a database of 105 family alignments that included 416 proteins of known three-dimensional structure. By scanning the database, tables quantifying various correlations were obtained, such as the correlations between two equivalent Cα–Cα distances, or between equivalent main chain dihedral angles from two related proteins. These relationships were expressed as conditional probability density functions (pdfs) and can be used directly as spatial restraints. For example, probabilities for different values of the main chain dihedral angles are calculated from the type of residue considered, from the main chain conformation of an equivalent residue, and from the sequence similarity between the two proteins. Another example is the pdf for a certain Cα–Cα distance given equivalent distances in two related protein structures.
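    The sketch below shows, in simplified form, how pdf-based restraints can be scored. It is not Modeller's implementation. Each Cα–Cα distance restraint is modeled as a single Gaussian centred on the equivalent template distance. The restraints are combined by summing their negative log-densities, so lower totals mean the restraints are better satisfied. The Gaussian form, the 0.5 Å standard deviation, the restraint values and the function names are all assumptions made for illustration.

```python
import math

def gaussian_pdf(d, mean, sd):
    """Probability density of distance d under a Gaussian restraint."""
    return math.exp(-0.5 * ((d - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def objective(distances, restraints):
    """Sum of -ln(pdf) over all restraints; lower means better satisfied.

    distances:  model distances, one per restraint (angstroms)
    restraints: (template_mean, sd) pairs, one per restraint (hypothetical)
    """
    return sum(-math.log(gaussian_pdf(d, mean, sd))
               for d, (mean, sd) in zip(distances, restraints))

# Two hypothetical Calpha-Calpha restraints derived from an aligned template.
restraints = [(5.4, 0.5), (6.1, 0.5)]
good_model = [5.4, 6.1]   # reproduces the template distances exactly
poor_model = [6.4, 7.1]   # each distance is off by 1.0 A
print(objective(good_model, restraints) < objective(poor_model, restraints))  # True
```

With a single Gaussian, each 1 Å deviation at σ = 0.5 Å adds 0.5 × (1/0.5)² = 2 units to the objective. Modeller's real restraints can be more complex, for example combining information from several templates.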

    Using Modeller for comparative modeling

    Simple demonstrations of Modeller in all steps of comparative protein structure modeling, including fold assignment, sequence-structure alignment, model building, and model assessment, can be found in the references listed at http://salilab.org/modeller/documentation.html. A number of additional tools useful in comparative modeling are listed at http://salilab.org/bioinformatics_resources.shtml.


    The rest of this section is a hands-on description of the most basic use of Modeller in comparative modeling. Here the inputs are Protein Data Bank (PDB) atom files of known protein structures and their alignment with the target sequence to be modeled, and the output is a model for the target that includes all non-hydrogen atoms. Although Modeller can find template structures as well as calculate sequence and structure alignments, in difficult cases it is better to identify the templates and prepare the alignment carefully by other means.

    The sample input files in this tutorial can be found in the examples/automodel directory of the Modeller distribution. There are three kinds of input files: Protein Data Bank atom files with coordinates for the template structures, the alignment file with the alignment of the template structures with the target sequence, and script files containing Modeller commands that instruct Modeller what to do.

    Each atom file is named code.atm, where code is a short protein code, preferably the PDB code; for example, the Peptococcus aerogenes ferredoxin would be in the file 1fdx.atm. If you wish, you can also use the file extensions .pdb and .ent instead of .atm. The code must be used as that protein's identifier throughout the modeling.
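    Template atom files use the fixed-column PDB format. The sketch below extracts Cα coordinates from ATOM records. It assumes the standard column positions: atom name in columns 13–16, chain in column 22, residue number in 23–26, and x, y, z in 31–54. It ignores alternate locations and HETATM records. The function names and the coordinates are hypothetical. In normal use Modeller reads these files itself.

```python
def read_ca_coords(lines):
    """Return [(chain, residue_number, (x, y, z)), ...] for Calpha atoms.

    Uses fixed PDB columns (1-based): atom name 13-16, chain 22,
    residue number 23-26, x 31-38, y 39-46, z 47-54.
    """
    coords = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue  # skip REMARK, HETATM, TER, END and other records
        if line[12:16].strip() != "CA":
            continue  # keep only alpha-carbon atoms
        chain = line[21]
        resnum = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        coords.append((chain, resnum, xyz))
    return coords

def atom_line(serial, name, resname, chain, resnum, x, y, z):
    """Build a minimal ATOM record in the standard column layout (demo helper)."""
    return (f"{'ATOM':<6}{serial:>5} {name:<4} {resname:>3} {chain}{resnum:>4}    "
            f"{x:>8.3f}{y:>8.3f}{z:>8.3f}")

# Hypothetical records standing in for the contents of a code.atm file.
pdb_lines = [
    "REMARK   1 hypothetical demo file",
    atom_line(1, " N  ", "ALA", "A", 1, 11.104, 13.207, 2.100),
    atom_line(2, " CA ", "ALA", "A", 1, 12.560, 13.320, 2.310),
    atom_line(6, " CA ", "TYR", "A", 2, 14.020, 16.540, 3.010),
]
print(read_ca_coords(pdb_lines))
# [('A', 1, (12.56, 13.32, 2.31)), ('A', 2, (14.02, 16.54, 3.01))]
```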

    The influence of the alignment on the quality of the model cannot be overemphasized. To obtain the best possible model, it is important to understand how Modeller uses the alignment [Sali & Blundell, 1993]. In outline, for the aligned regions, Modeller tries to derive a 3D model for the target sequence that is as close as possible to one or the other of the template structures, while also satisfying stereochemical restraints (e.g., bond lengths, angles and non-bonded atom contacts). The inserted regions, which do not have any equivalent segments in any of the templates, are modeled in the context of the whole molecule, but using their sequence alone. This way of deriving a model means that whenever a user aligns a target residue with a template residue, the user tells Modeller to treat the aligned residues as structurally equivalent. The command alignment.check() can be used to find some trivial alignment mistakes.
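    Modeller alignment files use the PIR format. Each entry has a `>P1;code` line, a colon-separated description line, and the aligned sequence terminated by `*`. To illustrate one trivial mistake a check can catch, the sketch below parses such a file and reports entries whose aligned length differs from the first entry's. It is not a re-implementation of alignment.check(), whose checks are more extensive. The parser, the function names and the example entries are hypothetical.

```python
def parse_pir(text):
    """Parse a PIR-format alignment into {code: aligned sequence}."""
    entries = {}
    code, seen_description, chunks = None, False, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">P1;"):
            code, seen_description, chunks = line[4:], False, []
        elif code is None:
            continue  # text outside any entry is ignored
        elif not seen_description:
            seen_description = True  # e.g. "structureX:..." or "sequence:..."
        else:
            chunks.append(line)  # sequences may wrap over several lines
            if line.endswith("*"):
                entries[code] = "".join(chunks)[:-1]  # drop the terminating '*'
                code = None
    return entries

def check_lengths(entries):
    """Report entries whose aligned length differs from the first entry's."""
    lengths = {code: len(seq) for code, seq in entries.items()}
    if not lengths:
        return []
    expected = next(iter(lengths.values()))
    return [f"{code}: length {n}, expected {expected}"
            for code, n in lengths.items() if n != expected]

# Hypothetical two-entry alignment: a template with a structure and a target.
example = """
>P1;templ
structureX:templ:1:A:10:A:hypothetical template:::
AYVINDSCIA*

>P1;target
sequence:target:1::8::hypothetical target:::
AYVIN--CIA*
"""
print(check_lengths(parse_pir(example)))  # [] (both rows are 10 columns long)
```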



    Modeller is a command-line only tool and has no graphical user interface; instead, you must provide it with a script file containing Modeller commands. This is an ordinary Python script. If you are not familiar with Python, you can simply adapt one of the many examples in the examples directory, or look at the code for the classes used by Modeller itself, in the modlib/modeller directory. Finally, there are many resources for learning Python itself, such as a comprehensive tutorial at http://www.python.org/doc/2.3.5/tut/

    To run Modeller with the script file model-default.py, do the following:

    1. On Windows: Click on the Modeller link on your Start Menu. This will give you a Windows Command Prompt, set up for you to run Modeller.

    2. Change to the directory containing the script and alignment files you created earlier, using the cd command.

    3. Run Modeller itself by typing the following at the command prompt:

       mod9v7 model-default.py

    A number of intermediary files are created as the program proceeds. After about 10 seconds on a modern PC, the final 1fdx model is written to the file 1fdx.B99990001.pdb. Examine the model-default.log file for information about the run. In particular, always check the output of the alignment.check() command, which you can find by searching for check_a. Also check for warning and error messages by searching for W> and E>, respectively. There should be no error messages; there are usually some warning messages, and these can usually be ignored.
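    The manual log inspection described above can be scripted. The sketch below relies only on what the text states: warnings and errors are lines beginning with W> and E>, and the alignment-check output can be found by searching for check_a. The function name and the sample log lines are hypothetical, not real Modeller output.

```python
def scan_log(text):
    """Collect warning (W>), error (E>) and alignment-check (check_a) lines."""
    found = {"warnings": [], "errors": [], "check_a": []}
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("W>"):
            found["warnings"].append(stripped)
        elif stripped.startswith("E>"):
            found["errors"].append(stripped)
        elif "check_a" in stripped:
            found["check_a"].append(stripped)
    return found

# Hypothetical log excerpt; for a real run, read the log file instead:
#   with open("model-default.log") as fh:
#       report = scan_log(fh.read())
sample = """\
hypothetical informational line
check_a hypothetical alignment-check summary
W> hypothetical warning message
"""
report = scan_log(sample)
print({key: len(lines) for key, lines in report.items()})
# {'warnings': 1, 'errors': 0, 'check_a': 1}
```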


    2. REVIEW OF LITERATURE

    Nattokinase (NK) is a potent fibrinolytic enzyme from Bacillus natto. Closely resembling plasmin, NK dissolves fibrin directly. In addition, it enhances the body's production of both plasmin and other clot-dissolving agents, including urokinase. In some ways, NK is actually superior to conventional clot-dissolving drugs. Its benefits include convenient oral administration, confirmed efficacy, prolonged effects and cost-effectiveness, and it can be used preventatively. NK has demonstrated pH and temperature stability, so it can remain stable in the gastrointestinal tract.

    NK is a single-chain protein of 275 amino acids with no intramolecular disulfide bond (Nakamura et al., 1992). Belonging to the subtilisin family of serine proteases, NK has the same conserved catalytic triad (D32, H64, S221) and oxyanion hole (N155) (Yong et al., 2003). Its substrate-binding residues (S125, L126, G127) likewise correspond to the S1 and S4 binding pockets of subtilisin (Bryan P.N. et al., 2003). NK is highly homologous to most subtilisins, and the 3D structures of many subtilisins have been determined by X-ray crystallography and NMR. However, the 3D structure of NK is still unknown.

    The homology model of NK was generated using the 3D structures of SB, SC, SE and SS, which share 84.9%, 67.8%, 98.9% and 62.92% sequence homology with NK, respectively. To understand the catalytic mechanism and substrate specificity of NK, several substrates were docked into the active site of the model structure using a Lamarckian Genetic Algorithm. The interaction between NK and the substrates was characterized by calculating the hydrogen bonds in the binding site of the enzyme-substrate complexes. Based on this work, we attempt to explain the relationship between the structure and the function of NK.
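    In its crudest form, the hydrogen-bond analysis of a docked complex can be approximated by a donor–acceptor distance cutoff. The sketch below assumes a 3.5 Å cutoff and ignores the angular criteria that real hydrogen-bond analyses also apply. The atom labels and coordinates are hypothetical and are not taken from the NK model.

```python
import math

def hydrogen_bond_candidates(donors, acceptors, cutoff=3.5):
    """Return (donor, acceptor, distance) for pairs within cutoff angstroms.

    donors, acceptors: dicts mapping an atom label to (x, y, z) coordinates.
    Only the donor-acceptor distance is tested; angles are ignored.
    """
    pairs = []
    for d_label, d_xyz in donors.items():
        for a_label, a_xyz in acceptors.items():
            distance = math.dist(d_xyz, a_xyz)  # Euclidean distance (Python 3.8+)
            if distance <= cutoff:
                pairs.append((d_label, a_label, round(distance, 2)))
    return sorted(pairs, key=lambda pair: pair[2])

# Hypothetical coordinates (angstroms) for enzyme donors and ligand acceptors.
donors = {"Ser221 OG": (0.0, 0.0, 0.0), "His64 NE2": (10.0, 0.0, 0.0)}
acceptors = {"ligand O1": (0.0, 0.0, 2.9), "ligand O2": (0.0, 6.0, 0.0)}
print(hydrogen_bond_candidates(donors, acceptors))
# [('Ser221 OG', 'ligand O1', 2.9)]
```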


    Sequence and structure alignment

    The sequence of NK was obtained from the NCBI protein database (GenBank accession no. S51909). Sequences and structures of SB, SC, SE and SS, all from the Bacillus subtilis family, were obtained from the RCSB Protein Data Bank (PDB IDs 1AU9, 1AF4, 1SCJ and 1GCI, respectively).

    The sequence alignment was derived using the CLUSTAL W program with default parameters (Higgins et al., 1994). The structure alignment was obtained and analyzed using the GRASP package with default parameters (Nicholls et al., 1991). Both alignments were then inspected and adjusted manually to minimize the number of gaps and insertions.

    SUPPORT FOR HEALTHY BLOOD FLOW AND CIRCULATION

    Nattokinase is a systemic enzyme isolated from the traditional Japanese soy food,

    natto. It has been shown to support healthy blood flow by assisting the circulatory clearing

    system of the body.

    Nattokinase is a component of this soybean food. It is a 275-amino-acid peptide. It is said to have clot-dissolving abilities similar to those of plasmin, an enzyme present in our blood as a natural defense mechanism for dissolving unwanted blood clots. The "clot busters" used in clinical medicine (tPA = tissue plasminogen activator, streptokinase, urokinase, etc.) dissolve blood clots that have led to heart attacks, strokes, pulmonary embolism or deep vein thrombosis, and all of them work by enhancing plasmin's action. They have to be given intravenously because they are not active when given orally.

    Nattokinase has been reported to increase the clot-dissolving activity of blood in animals and human volunteers, and to suppress clot formation and enhance clot resolution in animals. However, to my knowledge, only one clinical study has been performed to assess whether Nattokinase has any real benefit in the prevention of blood clots in humans. In that study, Nattokinase or placebo was given to individuals before long-distance (7–8 hour) flights. Of the 92 individuals in the placebo group, 7 developed a clot (all without symptoms, discovered by ultrasound); of the 94 individuals in the Nattokinase group, none developed a clot. The main flaw of the study, which limits the usefulness of its conclusions, is that the publication does not indicate whether it was double-blinded or, at least, investigator-blinded. A non-blinded study has the potential for bias, limiting the validity of its findings and conclusions.

    Importance of hydrogen bonds in the active site of the subtilisin nattokinase

    Hydrogen bonds occurring in the catalytic triad (Asp32, His64 and Ser221) and the

    oxyanion hole (Asn155) are very important to the catalysis of peptide bond hydrolysis by

    serine proteases. For nattokinase, a bacterial serine protease, construction and analysis of a

    three-dimensional structural model suggested that several hydrogen bonds formed by four

    residues function to stabilize the transition state of the hydrolysis reaction. These four

    residues are Ser33, Asp60, Ser62 and Thr220. To remove the effect of these hydrogen bonds, four mutants (Ser33-Ala33, Asp60-Ala60, Ser62-Ala62 and Thr220-Ala220) were constructed by site-directed mutagenesis. The results of enzyme kinetics indicated that removing these hydrogen bonds increases the free energy of the transition state (ΔG_T). We concluded that these hydrogen bonds are more important for catalysis than for substrate binding, because removing them mainly affects the kcat values but not the Km values. A substrate, SUB1 (succinyl-Ala-Ala-Pro-Phe-p-nitroanilide), was used in the enzyme kinetics experiments. In the present study we have also shown the results of FEP (free-energy perturbation) calculations for the binding and catalysis reactions of these mutant subtilisins. The calculated FEP differences also suggested that these four residues are more important for catalysis than for substrate binding, and the simulated values compared well with the experimental values from enzyme kinetics.
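    The link between kinetic constants and transition-state free energy can be made quantitative. The standard transition-state-theory relation is ΔΔG_T = -RT ln[(kcat/Km)mutant / (kcat/Km)wild type]. The sketch below applies it at an assumed 298 K. The kinetic constants are hypothetical placeholders, not the measured values for these mutants.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

def ddg_transition_state(kcat_wt, km_wt, kcat_mut, km_mut, temperature=298.0):
    """Change in transition-state free energy (kJ/mol) caused by a mutation.

    Uses ddG_T = -RT ln[(kcat/Km)_mut / (kcat/Km)_wt]; a positive value means
    the mutant's transition state is less stable than the wild type's.
    """
    ratio = (kcat_mut / km_mut) / (kcat_wt / km_wt)
    return -R * temperature * math.log(ratio) / 1000.0

# Hypothetical mutant: kcat falls 10-fold while Km is unchanged, matching the
# pattern described above (kcat affected, Km not).
print(round(ddg_transition_state(50.0, 0.2, 5.0, 0.2), 1))  # 5.7
```

Because Km is held fixed in this example, the whole ΔΔG_T comes from the drop in kcat. This matches the reasoning that the hydrogen bonds matter more for catalysis than for binding.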

    The results of molecular dynamics simulations further demonstrated that removal

    of these hydrogen bonds partially releases Asp32, His64 and Asn155 so that the stability of the

    transition state decreases. Another substrate, SUB2 (H-D-Val-Leu-Lys-p-nitroanilide), was

    used for FEP calculations and MD simulations.


    3. MATERIALS AND METHODS

    Homology modeling is based on the observation that homologous proteins have similar 3D structures. When a homologue of the protein of interest with a known structure is available, tools such as MODELLER can build a model from the template 3D coordinates and an alignment of the amino acid sequences. MODELLER applies the structure of the template to the protein of interest, taking into account sequence constraints (steric clashes, electrostatic interactions, amino acid secondary-structure propensities, etc.).

    3.1 STEPS IN HOMOLOGY MODELING

    1. Selection of Template molecule

    2. Alignment of Template with Target

    3. Model Generation

    4. Model Assessment

    3.1.1 Template Selection

    If the percentage sequence identity between the sequence of interest and a protein with known structure is high enough (more than 25–30%), simple database search programs such as FASTA or BLAST are clearly adequate to detect the homology.
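    Sequence identity is the criterion used above for template detection, and it is also quoted earlier for the NK templates. It is usually computed over an alignment. The sketch below assumes identity is measured as identical residues divided by the columns where neither sequence has a gap. Other conventions divide by the shorter sequence length or by the full alignment length, and they give different numbers. The 30% cutoff, the function names and the sequences are illustrative assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has '-'."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)

def select_templates(alignments, cutoff=30.0):
    """Keep candidate templates whose identity to the target meets the cutoff.

    alignments: {template_code: (aligned_target, aligned_template)}
    Returns template codes sorted from highest to lowest identity.
    """
    scores = {code: percent_identity(t, s) for code, (t, s) in alignments.items()}
    return sorted((code for code, pid in scores.items() if pid >= cutoff),
                  key=lambda code: scores[code], reverse=True)

# Hypothetical candidates aligned to the same target sequence.
candidates = {
    "tmplA": ("AYVINDSCIA", "AYVINDSCIA"),  # identical
    "tmplB": ("AYVINDSCIA", "GFLKE--CWR"),  # 1 identity in 8 ungapped columns
}
print(select_templates(candidates))  # ['tmplA']
```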

    3.1.2 Template Alignment

    A critical step in the development of a homology model is the alignment of the

    unknown sequence with the homologues. Factors to be considered when performing an alignment are:

    (1) Which algorithm to use for sequence alignment

    (2) Which scoring method to apply

    (3) Whether and how to assign gap penalties
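    The three choices listed above (algorithm, scoring method and gap penalties) all appear in a basic global alignment. The sketch below implements the Needleman–Wunsch algorithm with a simple match/mismatch score and a linear gap penalty. The scores (+1, -1, -2) are illustrative assumptions. Practical pipelines typically use a substitution matrix such as BLOSUM62 and separate gap-opening and gap-extension penalties, as CLUSTAL W does.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b) for one optimal alignment.
    """
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

print(needleman_wunsch("ACDEFG", "ACEFG"))  # (3, 'ACDEFG', 'AC-EFG')
```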


    3.1.3 Model Generation

    Given a template and an alignment, the information contained therein must be used

    to generate a three-dimensional structural model of the target, represented as a set of

    Cartesian coordinates for each atom in the protein. Three major classes of model

    generation methods have been proposed.

    3.1.4 Fragment assembly

    The original method of homology modeling relied on the assembly of a complete

    model from conserved structural fragments identified in closely related solved structures.

    For example, a modeling study of serine proteases in mammals identified a sharp

    distinction between "core" structural regions conserved in all experimental structures in the

    class, and variable regions typically located in the loops where the majority of the

    sequence differences were localized. Thus unsolved proteins could be modeled by first

    constructing the conserved core and then substituting variable regions from other proteins

    in the set of solved structures. Current implementations of this method differ mainly in the

    way they deal with regions that are not conserved or that lack a template.

    3.1.5 Segment matching

    The segment-matching method divides the target into a series of short segments,

    each of which is matched to its own template fitted from the Protein Data Bank. Thus,

    sequence alignment is done over segments rather than over the entire protein. Selection of

    the template for each segment is based on sequence similarity, comparisons of alpha

    carbon coordinates, and predicted steric conflicts arising from the van der Waals radii of

    the divergent atoms between target and template.

    3.1.6 Model Assessment

    Assessment of homology models without reference to the true target structure is

    usually performed with two methods: statistical potentials or physics-based energy

    calculations. Both methods produce an estimate of the energy (or an energy-like analog)


    for the model or models being assessed; independent criteria are needed to determine

    acceptable cutoffs. Neither of the two methods correlates exceptionally well with true

    structural accuracy, especially on protein types underrepresented in the PDB, such as

    membrane proteins.
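    Many statistical potentials rest on an inverse-Boltzmann relation, E = -kT ln(P_observed / P_reference). It converts how often a structural feature appears in known structures, relative to a reference expectation, into an energy-like score. The sketch below applies this relation to hypothetical counts for four binned Cα–Cα distances, using a uniform reference state and a pseudocount. The bins, counts and function names are invented for illustration, and kT is in arbitrary units.

```python
import math

def inverse_boltzmann(observed_counts, reference_counts, kT=1.0, pseudocount=1.0):
    """Energy-like score per bin: E = -kT * ln(P_obs / P_ref).

    A pseudocount avoids taking log(0) for bins with no observations.
    """
    obs_total = sum(observed_counts) + pseudocount * len(observed_counts)
    ref_total = sum(reference_counts) + pseudocount * len(reference_counts)
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies

def score_model(bin_indices, energies):
    """Sum per-bin energies over the distance bins observed in a model."""
    return sum(energies[i] for i in bin_indices)

# Hypothetical counts for four distance bins.
observed = [10, 60, 20, 10]
reference = [25, 25, 25, 25]  # uniform reference state
energies = inverse_boltzmann(observed, reference)
print(energies[1] < 0 < energies[0])  # True: over-represented bin is favourable
```

A lower total from `score_model` suggests a more native-like model. As noted above, however, such scores correlate only imperfectly with true structural accuracy.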

    3.2 NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION (NCBI)

    The National Center for Biotechnology Information advances science and health by

    providing access to biomedical and genomic information.

    The National Center for Biotechnology Information (NCBI) is part of the

    United States National Library of Medicine (NLM), a branch of the National Institutes of

    Health. The NCBI is located in Bethesda, Maryland and was founded in 1988 through

    legislation sponsored by Senator Claude Pepper. The NCBI houses genome sequencing data in GenBank and an index of biomedical research articles in PubMed Central and PubMed, as well as other information relevant to biotechnology. All these databases are available online through the Entrez search engine.

    3.3 BASIC LOCAL ALIGNMENT SEARCH TOOL (BLAST)


    In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino acid sequences of different proteins or the nucleotide sequences of DNA. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold. For example, after discovering a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if humans carry a similar gene. BLAST will identify sequences in the human genome that resemble the mouse gene based on sequence similarity.


    The BLAST programs listed below perform the following tasks:

    a) Blastp

    Compares an amino acid query sequence against a protein sequence database

    b) Blastn

    Compares a nucleotide query sequence against a nucleotide sequence database

    c) Blastx

    Compares the six-frame conceptual translation products of a nucleotide query

    sequence (both strands) against a protein sequence database
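    Blastx's six-frame conceptual translation can be illustrated directly. The sketch below uses the standard genetic code (NCBI translation table 1). Stop codons are shown as `*`, codons containing non-ACGT characters become `X`, and incomplete trailing codons are dropped. The function names are hypothetical; BLAST performs this translation internally.

```python
BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frame_translation(dna):
    """Return the three forward-strand and three reverse-strand translations."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    return ([translate(dna[f:]) for f in range(3)] +
            [translate(reverse[f:]) for f in range(3)])

print(six_frame_translation("ATGGCC"))  # ['MA', 'W', 'G', 'GH', 'A', 'P']
```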

    3.3.1 Working of Blast

    The fundamental unit of BLAST algorithm output is the High-scoring Segment Pair

    (HSP), wherein each segment of the pair is an equal-length but arbitrarily long run of

    contiguous residues for which the aggregate alignment score against the other segment in

    the pair is locally maximal and, further, meets or exceeds some positive-valued threshold

    or cutoff score.

    A (possibly empty) set of HSPs is thus defined by two sequences, a scoring system,

    and a cutoff score.

    In the programmatic implementations of the BLAST algorithm described here, each

    HSP consists of a segment from the query sequence and one from a database

    sequence.

    The cutoff score has been parameterized to permit the programs' sensitivity and

    selectivity to be adjusted.

    A Maximal-scoring Segment Pair (MSP) is defined by two sequences and a scoring

    system and is the highest-scoring of all possible segment pairs that can be produced


    from the two sequences. Established statistical methods can determine the significance of MSP scores in the limit of infinitely long sequences, under a random sequence model that assumes independent and identically distributed residues at each sequence position.
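    An MSP is the highest-scoring pair of equal-length, ungapped segments. For short sequences it can therefore be found exhaustively by scanning every diagonal of the comparison matrix with a maximum-subarray pass. The sketch below uses a simple +1/-1 match/mismatch scheme, which is an illustrative choice; protein searches use a substitution matrix. BLAST itself avoids this exhaustive scan by seeding with short word hits and extending them. The function name is hypothetical.

```python
def maximal_segment_pair(query, subject, match=1, mismatch=-1):
    """Exhaustively find the highest-scoring ungapped segment pair.

    Returns (score, query_segment, subject_segment).
    """
    best = (0, "", "")
    # Each diagonal fixes the offset between query and subject positions.
    for offset in range(-(len(subject) - 1), len(query)):
        run_score, run_start = 0, None
        i = max(offset, 0)   # query index
        j = i - offset       # subject index
        while i < len(query) and j < len(subject):
            step = match if query[i] == subject[j] else mismatch
            if run_start is None or run_score <= 0:
                run_score, run_start = step, i   # start a new segment here
            else:
                run_score += step                # extend the current segment
            if run_score > best[0]:
                length = i - run_start + 1
                sub_start = run_start - offset
                best = (run_score, query[run_start:i + 1],
                        subject[sub_start:sub_start + length])
            i += 1
            j += 1
    return best

print(maximal_segment_pair("GGAKLMNT", "PAKLMQ"))  # (4, 'AKLM', 'AKLM')
```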

    In the programs described here, statistics have been extrapolated to assessing the

    significance of HSP scores obtained from comparisons of biological sequences

    within the context of a database search.

    The approach to similarity searching taken by the BLAST programs is first to look

    for similar segments between the query sequence and a database sequence, then to

    evaluate the statistical significance of any matches that were found, and finally to

    report only those matches that satisfy a user-selectable threshold of significance.
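    The significance-evaluation step rests on the Karlin–Altschul result for ungapped local alignments of random sequences of lengths m and n. The expected number of HSPs scoring at least S by chance is E = K·m·n·e^(-λS). The sketch below computes this E-value and applies the user-selected threshold. K and λ depend on the scoring system and residue composition, so the values used here are placeholders, not BLAST defaults. The length corrections applied by real BLAST are omitted, and the hit names are hypothetical.

```python
import math

def expect_value(score, m, n, lam, k):
    """Karlin-Altschul expected number of chance hits scoring >= score."""
    return k * m * n * math.exp(-lam * score)

def bit_score(score, lam, k):
    """Normalized (bit) score; equivalently E = m * n * 2 ** (-bit_score)."""
    return (lam * score - math.log(k)) / math.log(2)

def report(hits, m, n, lam, k, threshold=1e-3):
    """Keep only hits whose E-value meets the user-selected threshold."""
    kept = []
    for name, score in hits:
        e = expect_value(score, m, n, lam, k)
        if e <= threshold:
            kept.append((name, score, e))
    return kept

# Placeholder parameters: a 275-residue query against 10 million database residues.
LAMBDA, K = 0.3, 0.1
hits = [("hit_a", 120), ("hit_b", 35)]
print([name for name, _, _ in report(hits, 275, 10_000_000, LAMBDA, K)])  # ['hit_a']
```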

    3.4 PROTEIN DATA BANK (PDB):


    The PDB archive contains information about experimentally determined structures of proteins, nucleic acids, and complex assemblies. As a member of the Worldwide PDB (wwPDB), the RCSB PDB curates and annotates PDB data according to agreed-upon standards.

    The RCSB PDB also provides a variety of tools and resources. Users can perform simple and advanced searches based on annotations relating to sequence, structure, and function. The molecules can be visualized, downloaded, and analyzed by users ranging from students to specialized scientists.
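    As an example of analyzing a downloaded entry, the sketch below reads atom records from a file in the legacy PDB format. In that format, ATOM and HETATM lines use fixed column positions. Only commonly used fields are extracted. The function names are hypothetical, and the newer PDBx/mmCIF format would need a different parser.

```python
# Sketch: parse ATOM/HETATM records of a legacy PDB-format file using the
# fixed column layout (1-based columns in the format description become the
# 0-based slices below).


def parse_atom_line(line: str) -> dict:
    """Extract the commonly used fields from one ATOM/HETATM line."""
    return {
        "record": line[0:6].strip(),   # columns 1-6
        "serial": int(line[6:11]),     # columns 7-11
        "name": line[12:16].strip(),   # columns 13-16, atom name
        "alt_loc": line[16].strip(),   # column 17, alternate location
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21].strip(),     # column 22
        "res_seq": int(line[22:26]),   # columns 23-26
        "x": float(line[30:38]),       # columns 31-38
        "y": float(line[38:46]),       # columns 39-46
        "z": float(line[46:54]),       # columns 47-54
    }


def read_atoms(lines):
    """Yield parsed ATOM/HETATM records, skipping every other record type."""
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            yield parse_atom_line(line)
```

    For instance, keeping only the records from `read_atoms(...)` whose `name` is `"CA"` gives the C-alpha trace of each chain in a downloaded file.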

    The PDB is a key resource in areas of structural biology, such as structural

    genomics. Most major scientific journals, and some funding agencies, such as the NIH in

    the USA, now require scientists to submit their structure data to the PDB. If the contents of

    the PDB are thought of as primary data, then there are hundreds of derived (i.e., secondary)

    databases that categorize the data differently. For example, both SCOP and CATH categorize structures according to type of structure and assumed evolutionary relationships, while GO categorizes them based on genes.

    3.5 MODELLER:

    Modeller is a computer program that models three-dimensional structures of proteins and their assemblies by satisfaction of spatial restraints. Modeller is most frequently used for homology or comparative protein structure modeling: the user provides an alignment of a sequence to be modeled with known related structures, and Modeller automatically calculates a model with all non-hydrogen atoms.
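    MODELLER reads the target-template alignment from a file, commonly in PIR format. The sketch below builds such a file as a string for the template `1gnvA` and the target `nat` used in the alignment results. The header layouts follow the examples in the MODELLER tutorial. The helper names and the FIRST/LAST residue range are assumptions.

```python
# Sketch: build a two-entry PIR alignment for MODELLER as a string.
# Header layouts follow MODELLER tutorial examples; helper names and the
# FIRST/LAST residue range are assumptions for illustration.


def pir_entry(code: str, header: str, aligned_seq: str) -> str:
    # Each entry: ">P1;code", a colon-separated header line, then the sequence ending in "*".
    return f">P1;{code}\n{header}\n{aligned_seq}*\n"


def write_pir(template_code, pdb_code, chain, template_seq, target_code, target_seq):
    """Return PIR text for one template structure and one target sequence."""
    if len(template_seq) != len(target_seq):
        raise ValueError("aligned sequences (with '-' gaps) must have equal length")
    template_header = f"structureX:{pdb_code}:FIRST:{chain}:LAST:{chain}::::"
    target_header = f"sequence:{target_code}:::::::0.00: 0.00"
    return (pir_entry(template_code, template_header, template_seq)
            + pir_entry(target_code, target_header, target_seq))
```

    The resulting file would then be passed to MODELLER's `AutoModel` class through its `alnfile`, `knowns`, and `sequence` arguments.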


    3.5.1 TYPES OF MODELLING:

    There are five types of modelling exercises in MODELLER:

    a) Basic Modeling

    Model a sequence with high identity to a template. This exercise introduces the use of MODELLER in a simple case where template selection and the target-template alignment are not a problem.

    b) Advanced Modeling

    Model a sequence based on multiple templates and bound to a ligand. This exercise

    introduces the use of multiple templates, ligands and loop refinement in the process of

    model building with MODELLER.

    c) Iterative Modeling

    Increase the accuracy of the modeling exercise by iterating the four-step process. This exercise introduces the concept of MOULDING to improve the accuracy of comparative models.

    d) Difficult Modeling


    Model a sequence based on a low identity to a template. This exercise uses

    resources external to MODELLER in order to select a template for a difficult case of

    protein structure prediction.

    e) Modeling with Cryo-EM

    Model a sequence using both template and cryo-EM data. This exercise assesses the quality of generated models and loops by rigid fitting into cryo-EM maps, and improves them with flexible EM fitting.


    4. RESULTS AND DISCUSSION

    NATTOKINASE [Bacillus subtilis subsp. natto]

    Nattokinase and the template share an identical sequence length of 275 amino acids, so the effect of gaps was not considered. The conserved domain of nattokinase was detected using NCBI and is the same as the common secondary structures determined by the GRASP package. Interestingly, the same key structures are also predicted, including the catalytic triad (D32, H64, S221). The sequence identity of the catalytic domain is as high as 99%, which suggests that the part of the sequence most important for catalytic activity is the most conserved. The binding pocket also has a sequence identity above 90%. Therefore, we conclude that this alignment can be used to construct a reliable 3D model of nattokinase. To predict the structure, we ran BLAST with our target sequence and used the template sequence of the protein Calcium Independent Subtilisin BPN' Mutant. Ramachandran plots of similar quality were obtained, which are acceptable given the relatively low percentage of residues


    having disallowed torsion angles. Secondary structures were investigated with the GRASP package and found to be more extensive, with better stereochemistry, which allows further refinement. The quality of the Ramachandran plot and the goodness factors were also found to be better, and no residues have disallowed conformations. Thus, the analysis above suggests that the backbone conformations are better than those of the templates. The results show that the total, potential, and kinetic energies remained constant throughout the simulation, as did the protein size, so the system remained in equilibrium for the entire simulation. We therefore conclude that the predicted structure is stable at room temperature. In summary, the quality of the backbone conformation, the residue interactions, the residue contacts, and the dynamic stability of the structure are all well within the limits established for reliable structures. This suggests that the predicted nattokinase structure can be used to characterize protein-substrate interactions and to investigate the relationship between structure and function.

    BLAST OUTPUT


    The FASTA Format of the Target Sequence

    >gi|58866693|gb|AAW83000.1| nattokinase [Bacillus subtilis subsp. natto]

    MAFSNMSAQAAGKSSTEKKYIVGFKQTMSAMSSAKKKDVISEKGGKVQKQFKYVNAAAATLDEKAVKELK

    KDPSVAYVEEDHIAHEYAQSVPYGISQIKAPALHSQGYTGSNVKVAVIDSGIDSSHPDLNVRGGASFVPS

    ETNPYQDGSSHGTHVAGTIAALNNSIGVLGVAPSASLYAVKVLDSTGSGQYSWIINGIEWAISNNMDVIN

    MSLGGPTGSTALKTVVDKAVSSGIVVAAAAGNEGSSGSTSTVGYPAKYPSTIAVGAVNSSDQRASFSSVG

    SELDVMAPGVSIQSTLPGGTYGAYNGTSMATPHVAGAAALILSKHPTWTNAQVRDRLESTATYLGNSFYY

    GKGLINVQAAAH

    Template sequence

    >gi|21730195|pdb|1GNV|A Chain A, Calcium Independent Subtilisin Bpn' Mutant

    AKCVSYGVSQIKAPALHSQGYTGSNVKVAVIDSGIDSSHPDLNVAGGASFVPSETNPFQDNNSHGTHVAG

    TVLAVAPSASLYAVKVLGADGSGQYSWIINGIEWAIANNMDVINMSLGGPSGSAALKAAVDKAVASGVVV

    VAAAGNEGTSGSSSTVGYPGKYPSVIAVGAVDSSNQRASFSSVGPELDVMAPGVSICSTLPGNKYGAKSG

    TXMASPHVAGAAALILSKHPNWTNTQVRSSLENTTTKLGDSFYYGKGLINVEAAAQ
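    FASTA files wrap sequences over several lines. Reading one therefore means joining every line after a `>` header until the next header. The minimal Python sketch below does this; the function name `parse_fasta` is hypothetical.

```python
def parse_fasta(text: str) -> dict:
    """Map each FASTA header (without '>') to its sequence, joining wrapped lines."""
    records, header, chunks = {}, None, []
    for raw in text.splitlines():
        line = raw.strip()  # tolerate indentation and trailing spaces
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

    Taking `len()` of each returned sequence gives the residue count used when comparing target and template lengths.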


    STRUCTURE


    ALIGNMENT RESULT

    _aln.pos          10        20        30        40        50        60
    1gnvA     --------------------------------------------------------------------
    nat       MAFSNMSAQAAGKSSTEKKYIVGFKQTMSAMSSAKKKDVISEKGGKVQKQFKYVNAAAATLDEKAVKE
    _consrvd

    _aln.pos  70        80        90       100       110       120       130
    1gnvA     -------------------AKCVSYGVSQIKAPALHSQGYTGSNVKVAVIDSGIDSSHPDLNVAGGAS
    nat       LKKDPSVAYVEEDHIAHEYAQSVPYGISQIKAPALHSQGYTGSNVKVAVIDSGIDSSHPDLNVRGGAS
    _consrvd                     *  * ** ************************************ ****

    _aln.pos   140       150       160       170       180       190       200
    1gnvA     FVPSETNPFQDNNSHGTHVAGT---------VLAVAPSASLYAVKVLGADGSGQYSWIINGIEWAIAN
    nat       FVPSETNPYQDGSSHGTHVAGTIAALNNSIGVLGVAPSASLYAVKVLDSTGSGQYSWIINGIEWAISN
    _consrvd  ******** **  *********         ** *************   **************** *

    _aln.pos     210       220       230       240       250       260       270
    1gnvA     NMDVINMSLGGPSGSAALKAAVDKAVASGVVVVAAAGNEGTSGSSSTVGYPGKYPSVIAVGAVDSSNQ
    nat       NMDVINMSLGGPTGSTALKTVVDKAVSSGIVVAAAAGNEGSSGSTSTVGYPAKYPSTIAVGAVNSSNQ
    _consrvd  ************ ** ***  ***** ** ** ******* *** ****** **** ****** ****


    _aln.pos       280       290       300       310       320       330       340
    1gnvA     RASFSSVGPELDVMAPGVSICSTLPGNKYGAKSGT-MASPHVAGAAALILSKHPNWTNTQVRSSLENT
    nat       RASFSSVGSELDVMAPGVSIQSTLPGGTYGAYNGTSMATPHVAGAAALILSKHPTWTNAQVRDRLEST
    _consrvd  ******** *********** *****  ***  ** ** *************** *** ***  ** *

    _aln.pos         350       360
    1gnvA     TTKLGDSFYYGKGLINVEAAAQ
    nat       ATYLGNSFYYGKGLINVQAAAH
    _consrvd   * ** *********** ***
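    The `_consrvd` line marks with `*` each column where both aligned residues are identical. The sketch below regenerates that line from two aligned strings and computes percent identity. Identity is defined here as identical positions divided by columns where neither sequence has a gap. This is only one convention: others divide by the full alignment length or by the length of the shorter sequence, so reported identity values depend on the choice. The function names are hypothetical.

```python
def conservation_line(seq_a: str, seq_b: str) -> str:
    """'*' under columns where both aligned residues are identical, space elsewhere."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return "".join("*" if x == y and x != "-" else " " for x, y in zip(seq_a, seq_b))


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Identical positions divided by columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs) if pairs else 0.0
```

    Applied to the last alignment block (`TTKLGDSFYYGKGLINVEAAAQ` against `ATYLGNSFYYGKGLINVQAAAH`), it finds 17 identical positions out of 22 columns, about 77.3%.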

    RAMACHANDRAN PLOT

    A Ramachandran plot (also known as a Ramachandran map, a Ramachandran diagram, or a [φ,ψ] plot), developed by Gopalasamudram Narayana Ramachandran and Viswanathan Sasisekharan, is a way to visualize the backbone dihedral angles ψ against φ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide.
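    The φ and ψ values on a Ramachandran plot are torsion angles computed from backbone atom coordinates. φ of residue i is the dihedral defined by C(i-1), N(i), CA(i), C(i), and ψ is defined by N(i), CA(i), C(i), N(i+1). The Python sketch below computes a signed dihedral angle from four points. It projects the outer bonds onto the plane perpendicular to the central bond and uses atan2. The helper names are hypothetical, and coordinates are plain (x, y, z) tuples.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)  # unit vector of the central bond
    # Remove the components along the central bond (projection onto its normal plane).
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Backbone phi of residue i from C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Backbone psi of residue i from N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)
```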


    Evaluation of residues

    Residue [ 19 :LYS] (-116.25, 65.43) in Allowed region

    Residue [ 89 :GLN] (-178.77, 138.46) in Allowed region

    Residue [ 99 :LYS] ( 77.06, 27.85) in Allowed region

    Residue [ 119 :ASP] (-157.41,-149.41) in Allowed region

    Residue [ 150 :SER] ( 89.55, -10.08) in Allowed region

    Residue [ 158 :THR] ( -45.20, -23.52) in Allowed region

    Residue [ 160 :ALA] ( 172.15, 161.68) in Allowed region

    Residue [ 164 :ASN] (-162.66, 107.31) in Allowed region

    Residue [ 166 :ILE] (-151.77,-147.52) in Allowed region

    Residue [ 168 :VAL] ( -59.33, -81.43) in Allowed region

    Residue [ 242 :ASN] (-112.94, 46.23) in Allowed region

    Residue [ 246 :SER] (-140.84, 84.84) in Allowed region

    Residue [ 344 :LEU] ( -80.17, -65.17) in Allowed region

    Residue [ 140 :SER] ( 141.66, -70.87) in Outlier region

    Residue [ 159 :ILE] ( 62.75, 95.13) in Outlier region

    Residue [ 165 :SER] ( 160.35, 72.08) in Outlier region

    Residue [ 289 :GLY] (-166.83, 9.96) in Outlier region

    Number of residues in favored region (~98.0% expected) : 343 ( 95.3%)

    Number of residues in allowed region ( ~2.0% expected) : 13 ( 3.6%)

    Number of residues in outlier region : 4 ( 1.1%)
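    The summary counts can be reproduced from the per-residue listing. The sketch below tallies the listed residues by region, using a regular expression that assumes the line layout shown above. It then converts the counts to percentages of the 360 evaluated residues (343 + 13 + 4). Residues in the favored region are not listed individually, so their count is the total minus the listed ones. The helper names are hypothetical.

```python
import re
from collections import Counter

# Assumed layout: "Residue [ 19 :LYS] (-116.25, 65.43) in Allowed region"
RESIDUE_LINE = re.compile(
    r"Residue \[\s*(?P<num>\d+)\s*:(?P<name>\w+)\]\s*"
    r"\(\s*(?P<phi>-?[\d.]+),\s*(?P<psi>-?[\d.]+)\)\s*in (?P<region>\w+) region"
)


def tally_regions(lines):
    """Count listed residues per Ramachandran region (e.g. Allowed, Outlier)."""
    counts = Counter()
    for line in lines:
        match = RESIDUE_LINE.search(line)
        if match:
            counts[match.group("region")] += 1
    return counts


def percentages(favored: int, allowed: int, outlier: int):
    """Percent of all evaluated residues in each region, rounded to one decimal."""
    total = favored + allowed + outlier
    return tuple(round(100.0 * count / total, 1) for count in (favored, allowed, outlier))
```

    With the listed counts, the favored count is 360 - 13 - 4 = 343, and `percentages(343, 13, 4)` returns `(95.3, 3.6, 1.1)`, matching the reported values.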


    5. CONCLUSION

    A three-dimensional structural model of the enzyme nattokinase [Bacillus subtilis subsp. natto] was built by homology modelling with MODELLER, because no three-dimensional structure of it is available in the PDB. The structure of nattokinase [Bacillus subtilis subsp. natto] is important for establishing its molecular function. The sequence similarity with the template is 99%, which supports the reliability of the model generated with MODELLER. The alignment between the two proteins shows higher identity than alignments with other proteins. The model with the lowest objective function score was selected, and DOPE scores were obtained for the template and for that model. The Ramachandran plot places 343 residues (95.3%; ~98.0% expected) in the most favoured regions (A, B, L), 13 residues (3.6%; ~2.0% expected) in the allowed region, and 4 residues (1.1%) in the outlier region.


    6. REFERENCES

    Bryan P.N., Protein engineering of subtilisin, Biochim. Biophys. Acta 1543 (2000) 203-222.

    Laskowski R.A., MacArthur M.W., Moss D.S., Thornton J.M., PROCHECK, J. Appl. Cryst. 26 (1993) 283-291.

    Nakamura T., Yamagata Y., Ichishima E., Nucleotide sequence of the subtilisin NAT gene, aprN, of Bacillus subtilis (natto), Biosci. Biotechnol. Biochem. 56 (11) (1992) 1869.

    Nicholls A., Sharp K., Honig B., Graphical representation and analysis of structural properties, Proteins Struct. Funct. Genet. 11 (4) (1991) 281.

    Rost B., Sander C., Prediction of protein secondary structure at better than 70% accuracy, J. Mol. Biol. 232 (1993) 584-599.

    Sanchez R., Sali A., Advances in comparative protein-structure modeling, Curr. Opin. Struct. Biol. 7 (1997) 206-214.

    Sumi H., A novel fibrinolytic enzyme in the vegetable cheese natto: a typical and popular soybean food in the Japanese diet, Experientia 43 (20) (1987) 1110-1111.

    Thompson J.D., Higgins D.G., Gibson T.J., CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Res. 22 (1994) 4673-4680.

    Yong P., Qing H., Ren-huai Z., Yi-zheng Z., Purification and characterization of a fibrinolytic enzyme produced by Bacillus amyloliquefaciens DC-4 screened from douchi, a traditional Chinese soybean food, Comp. Biochem. Physiol. 134 (2003) 45-52.


    Zheng Z.-L., Ye M.-Q., Zuo Z.-Y., Liu Z.-G., Tai K.-C., Zou G.-L., Biochemistry 395 (Pt 3) (2006) 509-515.