Needleman-Wunsch and Smith-Waterman Algorithm

BIOTOOLS

DATABASES

C.GAYATHRI

(I M.Sc.BIOINFORMATICS)

Needleman Wunsch Algorithm

Smith Waterman Algorithm

Published in 1970 by SAUL NEEDLEMAN and CHRISTIAN WUNSCH

General algorithm for sequence comparison

Commonly used in bioinformatics to align protein or nucleotide sequences

An example of dynamic programming; it was the first application of dynamic programming to biological sequence comparison.

Scores for aligned characters are specified by a SIMILARITY MATRIX. Here, S(i, j) is the similarity of characters i and j. It uses a LINEAR GAP PENALTY, here called d.

Maximizes a similarity score, to give ‘MAXIMUM MATCH’

Maximum match = largest number of residues of one sequence that can be matched with another allowing for all possible deletions

Finds the best GLOBAL alignment of any two sequences
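The similarity matrix S and linear gap penalty d described above can be sketched in a few lines. This is only an illustration, not the presenter's code: the function name `nw_score`, the convention that d is a positive number subtracted for every gap position, and the toy +1/-1 scoring function are all assumptions made for the example.

```python
def nw_score(a, b, S, d):
    """Global alignment score with similarity function S and linear gap penalty d.

    Assumption: d is a positive penalty subtracted once per gap position.
    """
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, n + 1):
        F[i][0] = -d * i
    for j in range(1, m + 1):
        F[0][j] = -d * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + S(a[i - 1], b[j - 1]),  # align the two residues
                F[i - 1][j] - d,                          # gap in b
                F[i][j - 1] - d,                          # gap in a
            )
    return F[n][m]


def toy_similarity(x, y):
    # Hypothetical scoring: +1 for identical residues, -1 otherwise.
    return 1 if x == y else -1
```

For instance, `nw_score("ACGT", "AGT", toy_similarity, 1)` aligns three identical residues and pays for one gap.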

N-W involves an iterative matrix method of calculation

All possible pairs of residues (bases or amino acids), one from each sequence, are represented in a 2-dimensional array

All possible alignments (comparisons) are represented by pathways through this array

Three main steps

1. Assign similarity values

2. For each cell, find the maximum possible score, allowing for insertions and deletions

3. Construct an alignment (pathway) back from the highest scoring cell

Similarity values

A numerical value is assigned to every cell (depending on the similarity/dissimilarity of the two residues)

Values may be simple scores or more complicated ones (chemical similarities or frequency of observed substitutions)

The example shown here has match = +1 mismatch = 0

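    M P R C L C Q R J N C B A
P   0 1 0 0 0 0 0 0 0 0 0 0 0
B   0 0 0 0 0 0 0 0 0 0 0 1 0
R   0 0 1 0 0 0 0 1 0 0 0 0 0
C   0 0 0 1 0 1 0 0 0 0 1 0 0
K   0 0 0 0 0 0 0 0 0 0 0 0 0
C   0 0 0 1 0 1 0 0 0 0 1 0 0
R   0 0 1 0 0 0 0 1 0 0 0 0 0
N   0 0 0 0 0 0 0 0 0 1 0 0 0
J   0 0 0 0 0 0 0 0 1 0 0 0 0
C   0 0 0 1 0 1 0 0 0 0 1 0 0
J   0 0 0 0 0 0 0 0 1 0 0 0 0
A   0 0 0 0 0 0 0 0 0 0 0 0 1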
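Assigning the similarity values is mechanical, so a short sketch may help. The function name `similarity_matrix` is hypothetical; the two sequences and the match = +1, mismatch = 0 scores are the ones used in the example.

```python
def similarity_matrix(rows, cols, match=1, mismatch=0):
    """One cell per residue pair: `match` if the residues are identical, else `mismatch`."""
    return [[match if r == c else mismatch for c in cols] for r in rows]


row_seq = "PBRCKCRNJCJA"    # sequence written down the side of the array
col_seq = "MPRCLCQRJNCBA"   # sequence written across the top of the array
S = similarity_matrix(row_seq, col_seq)

for label, values in zip(row_seq, S):
    print(label, " ", " ".join(str(v) for v in values))
```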

Score pathways through array

The aim is to find the maximum possible score for an alignment

For each cell, the preceding sub-row and sub-column are searched for the highest score

This is added to the score for the current cell

The calculation proceeds row by row through the array

Gap penalty for the introduction of gaps in the alignment = 0

    M P R C L C Q R J N C B A
P   0 1 0 0 0 0 0 0 0 0 0 0 0
B   0 0 1 1 1 1 1 1 1 1 1 2 1
R   0 0 2 1 1 1 1 2 1 1 1 1 2
C   0 0 1 3 2 3 2 2 2 2 3 2 2
K   0 0 1 2 3 3 3 3 3 3 3 3 3
C   0 0 1 3 3 4 3 3 3 3 4 3 3
R   0 0 2 2 3 3 4 ?
N                     1
J                   1
C         1   1         1
J                   1
A                           1

$$H_{i,j} = \max\left\{ H_{i-1,j-1} + s(a_i,b_j),\ \max_{k}\{H_{i-k,j-1} - W_k + s(a_i,b_j)\},\ \max_{l}\{H_{i-1,j-l} - W_l + s(a_i,b_j)\} \right\}$$
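The scoring pass can be sketched as follows. This is a minimal illustration under stated assumptions, not the original authors' implementation: the gap penalty W is taken to be zero (as in the example), match = +1 and mismatch = 0, and the function name `nw_fill` is made up for the sketch. With W = 0 the recurrence reduces to adding the cell's own similarity value to the highest score found in the previous row (to the left of the cell) or the previous column (above the cell).

```python
def nw_fill(rows, cols, match=1, mismatch=0):
    """Fill the score array row by row, assuming a gap penalty of zero."""
    H = [[0] * len(cols) for _ in rows]
    for i in range(len(rows)):
        for j in range(len(cols)):
            s = match if rows[i] == cols[j] else mismatch
            best_prev = 0
            if i > 0 and j > 0:
                # Sub-row: previous row, every column left of j.
                in_row = max(H[i - 1][:j])
                # Sub-column: previous column, every row above i.
                in_col = max(H[r][j - 1] for r in range(i))
                best_prev = max(in_row, in_col)
            H[i][j] = s + best_prev
    return H


H = nw_fill("PBRCKCRNJCJA", "MPRCLCQRJNCBA")
print(H[6][7])    # second R row, second R column (the cell marked "?")
print(H[-1][-1])  # bottom-right corner
```

Under these assumptions the cell marked "?" comes out as 5 and the bottom-right corner as 8.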

Construct alignment

The alignment score is cumulative, obtained by adding along a path through the array

The best alignment has the highest score i.e. the maximum match

Maximum match = largest sum of cell values obtained over all pathways

The maximum match will ALWAYS be somewhere in the outer row or column shown

The alignment is constructed by working backwards from the maximum match

    M P R C L C Q R J N C B A
P   0 1 0 0 0 0 0 0 0 0 0 0 0
B   0 0 1 1 1 1 1 1 1 1 1 2 1
R   0 0 2 1 1 1 1 2 1 1 1 1 2
C   0 0 1 3 2 3 2 2 2 2 3 2 2
K   0 0 1 2 3 3 3 3 3 3 3 3 3
C   0 0 1 3 3 4 3 3 3 3 4 3 3
R   0 0 2 2 3 3 4 5 4 4 4 4 4
N   0 0 1 2 3 3 4 4 5 6 5 5 5
J   0 0 1 2 3 3 4 4 6 5 6 6 6
C   0 0 1 3 3 4 4 4 5 6 7 6 6
J   0 0 1 2 3 3 4 4 6 6 6 7 7
A   0 0 1 2 3 3 4 4 5 6 6 7 8
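Working backwards from the maximum match can be sketched like this. The code is an illustration only: it assumes the zero-gap-penalty scoring of the example (match = +1, mismatch = 0), and the names `nw_fill` and `nw_path` are hypothetical. Several pathways can share the same score, so the sketch simply returns the first one it finds.

```python
def nw_fill(rows, cols, match=1, mismatch=0):
    """Score array for the example scoring, assuming a gap penalty of zero."""
    H = [[0] * len(cols) for _ in rows]
    for i in range(len(rows)):
        for j in range(len(cols)):
            s = match if rows[i] == cols[j] else mismatch
            prev = 0
            if i > 0 and j > 0:
                prev = max(max(H[i - 1][:j]), max(H[r][j - 1] for r in range(i)))
            H[i][j] = s + prev
    return H


def nw_path(rows, cols, H, match=1, mismatch=0):
    """Trace one pathway back from the best cell in the last row or column."""
    n, m = len(rows), len(cols)
    border = [(n - 1, j) for j in range(m)] + [(i, m - 1) for i in range(n)]
    i, j = max(border, key=lambda cell: H[cell[0]][cell[1]])
    path = [(i, j)]
    while i > 0 and j > 0:
        s = match if rows[i] == cols[j] else mismatch
        target = H[i][j] - s
        # Candidate predecessors: diagonal first, then the rest of the
        # previous row, then the rest of the previous column.
        candidates = [(i - 1, c) for c in range(j - 1, -1, -1)]
        candidates += [(r, j - 1) for r in range(i - 2, -1, -1)]
        i, j = next(c for c in candidates if H[c[0]][c[1]] == target)
        path.append((i, j))
    path.reverse()
    return path


a, b = "PBRCKCRNJCJA", "MPRCLCQRJNCBA"
H = nw_fill(a, b)
path = nw_path(a, b, H)
matched = [(a[i], b[j]) for i, j in path if a[i] == b[j]]
print(len(matched), "matched residue pairs")
```

The number of matched pairs along the returned pathway equals the score of the cell the traceback started from.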

Statistical Significance

Maximum match is a function of sequence relationship and composition

Useful to know probability of obtaining result (maximum match) from a pair of random sequences

Estimate this experimentally

Sequences from random proteins are taken (i.e. having the same composition as the real proteins)

If the value for the random proteins is significantly different from that for the real proteins, then the difference is a function of the sequences alone and not of their composition
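One way to try this experiment is to shuffle each real sequence, which keeps its composition but destroys its order, and compare the maximum match of the shuffled pairs with that of the real pair. The sketch below is an assumption-laden illustration: the function names, the number of trials and the fixed random seed are all made up, and `max_match` uses a standard row-by-row recurrence for the example scoring (match = +1, mismatch = 0, no gap penalty) rather than the slide's array method.

```python
import random


def max_match(a, b):
    """Maximum match for match = +1, mismatch = 0 and no gap penalty."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(max(prev[j - 1] + (x == y), prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def shuffled_scores(a, b, trials=200, seed=0):
    """Maximum match for pairs of shuffled sequences with the same composition."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    scores = []
    for _ in range(trials):
        ra, rb = list(a), list(b)
        rng.shuffle(ra)
        rng.shuffle(rb)
        scores.append(max_match(ra, rb))
    return scores


def fraction_at_least(a, b, trials=200, seed=0):
    """Fraction of shuffled pairs scoring at least as high as the real pair."""
    real = max_match(a, b)
    scores = shuffled_scores(a, b, trials, seed)
    return sum(s >= real for s in scores) / trials


print(fraction_at_least("PBRCKCRNJCJA", "MPRCLCQRJNCBA"))
```

A small fraction suggests that the real score is unlikely to arise from composition alone.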

Proposed by Temple Smith and Michael Waterman in 1981

Smith-Waterman algorithm is useful for performing local sequence alignment

Determining similar regions between two nucleotide or protein sequences

Instead of looking at entire sequence, it compares segments of all possible lengths and optimizes the similarity measure.

For every cell, the algorithm considers ALL possible paths leading to it; these can be of any length and can contain insertions and deletions

Works effectively only when gap penalties are used

In the example shown, match = +1, mismatch = -1/3, gap = -(1 + k/3) (k = extent of gap)

Start with all cell values = 0

For each cell, look in the sub-column and sub-row shown, and in the direct diagonal, for the score that is highest once the alignment score or gap penalty is taken into account

    C   A   G   C   C   U   C   G   C   U   U   A   G
A   0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
A   0.0 1.0 0.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.7
U   0.0 0.0 0.8 0.3 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.7
G   0.0 0.0 1.0 0.3 0.0 0.0 0.7 1.0 0.0 0.0 0.7 0.7 1.0
C   1.0 0.0 0.0 2.0 1.3 0.3 1.0 0.3 2.0 0.7 0.3 0.3 0.3
C   1.0 0.7 0.0 1.0 3.0 1.7 ?
A
U
U
G
A
C
G
G

$$H_{i,j} = \max\left\{ H_{i-1,j-1} + s(a_i,b_j),\ \max_{k}\{H_{i-k,j} - W_k\},\ \max_{l}\{H_{i,j-l} - W_l\},\ 0 \right\}$$
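The recurrence can be sketched directly in code. This is an illustrative sketch, not the published implementation: it assumes the example's scores (match = +1, mismatch = -1/3), a gap penalty W_k = 1 + k/3, and a hypothetical function name `sw_fill`. The array is padded with a row and a column of zeros, so cell (i, j) compares the i-th residue of the first sequence with the j-th residue of the second.

```python
def sw_fill(a, b, match=1.0, mismatch=-1 / 3, gap=lambda k: 1 + k / 3):
    """Fill the local-alignment score array; no cell is allowed to fall below 0."""
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = H[i - 1][j - 1] + s
            # Gap of length k ending here, looking up the sub-column.
            up = max(H[i - k][j] - gap(k) for k in range(1, i + 1))
            # Gap of length l ending here, looking along the sub-row.
            left = max(H[i][j - l] - gap(l) for l in range(1, j + 1))
            H[i][j] = max(diagonal, up, left, 0.0)
    return H


H = sw_fill("AAUGCCAUUGACGG", "CAGCCUCGCUUAG")
print(round(H[6][7], 1))                  # the cell marked "?"
print(round(max(max(r) for r in H), 1))   # highest score in the array
```

Under these assumptions the cell marked "?" evaluates to about 1.3, and the highest score in the array is about 3.3.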

Four possible ways of forming a path

For every residue in the query sequence

1. Align with the next residue: score = previous score + similarity score

2. Deletion (i.e. match residue of query with a gap): score = previous score - gap penalty dependent on size of the gap

3. Insertion (i.e. match residue of db sequence with a gap): score = previous score - gap penalty dependent on size of the gap

4. Stop when the score is zero

Choose whichever of these has the highest score

Construct Alignment

The score in each cell is the maximum possible score for an alignment of ANY LENGTH ending at those coordinates

Trace pathway back from highest scoring cell

This cell can be anywhere in the array

Align highest scoring segment

    C   A   G   C   C   U   C   G   C   U   U   A   G
A   0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
A   0.0 1.0 0.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.7
U   0.0 0.0 0.8 0.3 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.7
G   0.0 0.0 1.0 0.3 0.0 0.0 0.7 1.0 0.0 0.0 0.7 0.7 1.0
C   1.0 0.0 0.0 2.0 1.3 0.3 1.0 0.3 2.0 0.7 0.3 0.3 0.3
C   1.0 0.7 0.0 1.0 3.0 1.7 1.3 1.0 1.3 1.7 0.3 0.0 0.0
A   0.0 2.0 0.7 0.3 1.7 2.7 1.3 1.0 0.7 1.0 1.3 1.3 0.0
U   0.0 0.7 1.7 0.3 1.3 2.7 2.3 1.0 0.7 1.7 2.0 1.0 1.0
U   0.0 0.3 0.3 1.3 1.0 2.3 2.3 2.0 0.7 1.7 2.7 1.7 1.0
G   0.0 0.0 1.3 0.0 1.0 1.0 2.0 3.3 2.0 1.7 1.3 2.3 2.7
A   0.0 1.0 0.0 1.0 0.3 0.7 0.7 2.0 3.0 1.7 1.3 2.3 2.0
C   1.0 0.0 0.7 1.0 2.0 0.7 1.7 1.7 3.0 2.7 1.3 1.0 2.0
G   0.0 0.7 1.0 0.3 0.7 1.7 0.3 2.7 1.7 2.7 2.3 1.0 2.0
G   0.0 0.0 1.7 0.7 0.3 0.3 1.3 1.3 2.3 1.3 2.3 2.0 2.0

GCC-UCG
GCCAUUG
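Tracing the pathway back from the highest scoring cell until the score reaches zero can be sketched as below. As before this is only an illustration under assumptions: match = +1, mismatch = -1/3, gap penalty W_k = 1 + k/3, and the name `sw_align` is hypothetical. When several moves explain a cell's score the sketch prefers the diagonal, so other equally good alignments are not reported.

```python
import math


def sw_align(a, b, match=1.0, mismatch=-1 / 3, gap=lambda k: 1 + k / 3):
    """Return (score, aligned part of a, aligned part of b) for one best local alignment."""
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                H[i - 1][j - 1] + s,
                max(H[i - k][j] - gap(k) for k in range(1, i + 1)),
                max(H[i][j - l] - gap(l) for l in range(1, j + 1)),
                0.0,
            )
    # The starting cell can be anywhere in the array.
    cells = ((i, j) for i in range(n + 1) for j in range(m + 1))
    i, j = max(cells, key=lambda c: H[c[0]][c[1]])
    score = H[i][j]
    top, bottom = [], []
    while H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if math.isclose(H[i][j], H[i - 1][j - 1] + s):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
            continue
        k = next((k for k in range(1, i + 1)
                  if math.isclose(H[i][j], H[i - k][j] - gap(k))), None)
        if k is not None:
            # Residues of a aligned against a gap in b.
            top.extend(reversed(a[i - k:i]))
            bottom.extend("-" * k)
            i -= k
            continue
        l = next(l for l in range(1, j + 1)
                 if math.isclose(H[i][j], H[i][j - l] - gap(l)))
        # Residues of b aligned against a gap in a.
        top.extend("-" * l)
        bottom.extend(reversed(b[j - l:j]))
        j -= l
    return score, "".join(reversed(top)), "".join(reversed(bottom))


score, seg_a, seg_b = sw_align("AAUGCCAUUGACGG", "CAGCCUCGCUUAG")
print(seg_b)
print(seg_a)
```

For the two example sequences this prints GCC-UCG above GCCAUUG.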

Needleman-Wunsch

1. Global alignments

2. Requires alignment score for a pair of residues to be >=0

3. No gap penalty required

4. Score cannot decrease between two cells of a pathway

5. Trace back usually starts from the highest scoring cell in the last row or column

Smith-Waterman

1. Local alignments

2. Residue alignment score may be positive or negative

3. Requires a gap penalty to work effectively

4. Score can increase, decrease or stay level between two cells of a pathway

5. Trace back is from the cell that has the highest score

CONCLUSION

From working through these algorithms repeatedly on sequences from different organisms, it is found that the NW and SW algorithms are excellent methods for finding the similarity and dissimilarity between sequences from different organisms
