Multialign New
-
Upload
sndppm7878 -
Category
Documents
-
view
218 -
download
0
Transcript of Multialign New
8/12/2019 Multialign New
http://slidepdf.com/reader/full/multialign-new 1/24
Multiple sequence alignment
8/12/2019 Multialign New
http://slidepdf.com/reader/full/multialign-new 2/24
Why we do multiple alignments?
• Multiple nucleotide or amino sequencealignment techniques are usually performed to
fit one of the following scopes :
– In order to characterize protein families,identify shared regions of homology in a
multiple sequence alignment; (this happens
generally when a sequence search revealed
homologies to several sequences) – Determination of the consensus sequence of
several aligned sequences.
8/12/2019 Multialign New
http://slidepdf.com/reader/full/multialign-new 3/24
Why we do multiple alignments?
– Help prediction of the secondary and tertiary
structures of new sequences;
– Preliminary step in molecular evolutionanalysis using Phylogenetic methods for
constructing phylogenetic trees.
8/12/2019 Multialign New
http://slidepdf.com/reader/full/multialign-new 4/24
An example of Multiple Alignment
VTISCTGSSSNIGAG-NHVK W YQQLPG VTISCTGTSSNIGS--ITVN W YQQLPG
LRLSCSSSGFIFSS--YAMY W VRQAPG
LSLTCTVSGTSFDD--YYST W VRQPPG
PEVTCVVVDVSHEDPQVKFN W YVDG--
ATLVCLISDFYPGA--VTVA W KADS--AALGCLVKDYFPEP--VTVS W NSG---
VSLTCLVKGFYPSD--IAVE W WSNG--
8/12/2019 Multialign New
http://slidepdf.com/reader/full/multialign-new 5/24
Multiple Alignment Method
• The most practical and widely used method
in multiple sequence alignment is the
hierarchical extensions of pairwise
alignment methods.
• The principal is that multiple alignments is
achieved by successive application of
pairwise methods.
8/12/2019 Multialign New
http://slidepdf.com/reader/full/multialign-new 6/24
Multiple Alignment Method• The steps are summarized as follows:
• Compare all sequences pairwise.
• Perform cluster analysis on the pairwise data to generate
a hierarchy for alignment. This may be in the form of a
binary tree or a simple ordering
• Build the multiple alignment by first aligning the most
similar pair of sequences, then the next most similar pair
and so on. Once an alignment of two sequences has
been made, then this is fixed. Thus for a set of sequences A, B, C, D having aligned A with C and B with D the
alignment of A, B, C, D is obtained by comparing the
alignments of A and C with that of B and D using averaged
scores at each aligned position.
8/12/2019 Multialign New
http://slidepdf.com/reader/full/multialign-new 8/24
Multiple alignment in GCG• The program available in GCG for multiple
alignment is Pileup.
• The input file for Pileup is a list of sequence
file_names or sequence codes in the database,
created by a text editor.• Pileup creates a multiple sequence alignment from
a group of related sequences using progressive,
pairwise alignments. It can also plot a tree showing
the clustering relationships used to create the
alignment.
• Please note that there is no one absolute alignment,
even for a limited number of sequences.
8/12/2019 Multialign New
http://slidepdf.com/reader/full/multialign-new 9/24
ClustalW- for multiple alignment
• ClustaW is a general purpose multiplealignment program for DNA or proteins.
• ClustalW is produced by Julie D. Thompson,
Toby Gibson of European Molecular BiologyLaboratory, Germany and Desmond Higgins of
European Bioinformatics Institute, Cambridge,
UK. Algorithmic
• ClustalW is cited: improving the sensitivity of
progressive multiple sequence alignment through
sequence weighting, positions-specific gap penalties
and weight matrix choice. Nucleic Acids Research,-
8/12/2019 Multialign New
http://slidepdf.com/reader/full/multialign-new 10/24
ClustalW- for multiple alignment
ClustalW can create multiple alignments,
manipulate existing alignments, do
profile analysis and create phylogentic
trees.
Alignment can be done by 2 methods:
- slow/accurate
- fast/approximate
8/12/2019 Multialign New
http://slidepdf.com/reader/full/multialign-new 11/24
Running ClustalW
[~]% clustalw
********************************************************************** CLUSTAL W (1.7) Multiple Sequence Alignments **********************************************************************
1. Sequence Input From Disc2. Multiple Alignments3. Profile / Structure Alignments4. Phylogenetic trees
S. Execute a system command
H. HELPX. EXIT (leave program)
Your choice:
8/12/2019 Multialign New
http://slidepdf.com/reader/full/multialign-new 12/24
Running ClustalW
The input file for clustalW is a file containing all
sequences in one of the following formats:
NBRF/PIR, EMBL/SwissProt, Pearson (Fasta),GDE, Clustal, GCG/MSF, RSF.
8/12/2019 Multialign New
http://slidepdf.com/reader/full/multialign-new 13/24
Using ClustalW
****** MULTIPLE ALIGNMENT MENU ******
1. Do complete multiple alignment now (Slow/Accurate)2. Produce guide tree file only3. Do alignment using old guide tree file
4. Toggle Slow/Fast pairwise alignments = SLOW
5. Pairwise alignment parameters6. Multiple alignment parameters
7. Reset gaps between alignments? = OFF8. Toggle screen display = ON
9. Output format options
S. Execute a system commandH. HELPor press [RETURN] to go back to main menu
Your choice:
8/12/2019 Multialign New
http://slidepdf.com/reader/full/multialign-new 15/24
ClustalW options Your choice: 5
********* PAIRWISE ALIGNMENT PARAMETERS *********
Slow/Accurate alignments:
1. Gap Open Penalty :15.002. Gap Extension Penalty :6.663. Protein weight matrix :BLOSUM304. DNA weight matrix :IUB
Fast/Approximate alignments:
5. Gap penalty :56. K-tuple (word) size :27. No. of top diagonals :4
8. Window size :4
9. Toggle Slow/Fast pairwise alignments = SLOW
H. HELPEnter number (or [RETURN] to exit):
8/12/2019 Multialign New
http://slidepdf.com/reader/full/multialign-new 16/24
ClustalW options Your choice: 6
********* MULTIPLE ALIGNMENT PARAMETERS *********
1. Gap Opening Penalty :15.002. Gap Extension Penalty :6.66
3. Delay divergent sequences :40 %
4. DNA Transitions Weight :0.50
5. Protein weight matrix :BLOSUM series6. DNA weight matrix :IUB
7. Use negative matrix :OFF
8. Protein Gap Parameters
H. HELP
Enter number (or [RETURN] to exit):
8/12/2019 Multialign New
http://slidepdf.com/reader/full/multialign-new 17/24
ClustalX - Multiple Sequence
Alignment Program
• ClustalX provides a new window-based
user interface to the ClustalW program.
• It uses the Vibrant multi-platform user
interface development library, developed by
the National Center for Biotechnology
Information (Bldg 38A, NIH 8600 RockvillePike,Bethesda, MD 20894) as part of their
NCBI SOFTWARE DEVELOPEMENT TOOLKIT.