Multialign New

24
 Multiple sequence alignment

Transcript of Multialign New

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 1/24

 Multiple sequence alignment

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 2/24

Why we do multiple alignments? 

• Multiple nucleotide or amino sequencealignment techniques are usually performed to

fit one of the following scopes :

 – In order to characterize protein families,identify shared regions of homology in a

multiple sequence alignment; (this happens

generally when a sequence search revealed

homologies to several sequences) –  Determination of the consensus sequence of

several aligned sequences.

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 3/24

Why we do multiple alignments?

 – Help prediction of the secondary and tertiary

structures of new sequences;

 –  Preliminary step in molecular evolutionanalysis using Phylogenetic methods for

constructing phylogenetic trees.

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 4/24

 An example of Multiple Alignment

VTISCTGSSSNIGAG-NHVK W YQQLPG VTISCTGTSSNIGS--ITVN W YQQLPG 

LRLSCSSSGFIFSS--YAMY W VRQAPG 

LSLTCTVSGTSFDD--YYST W VRQPPG 

PEVTCVVVDVSHEDPQVKFN W YVDG--

ATLVCLISDFYPGA--VTVA W KADS--AALGCLVKDYFPEP--VTVS W NSG---

VSLTCLVKGFYPSD--IAVE W WSNG--

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 5/24

Multiple Alignment Method  

• The most practical and widely used method

in multiple sequence alignment is the

hierarchical extensions of pairwise

alignment methods.

• The principal is that multiple alignments is

achieved by successive application of

pairwise methods.

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 6/24

Multiple Alignment Method• The steps are summarized as follows:

• Compare all sequences pairwise.

• Perform cluster analysis on the pairwise data to generate

a hierarchy for alignment. This may be in the form of a

binary tree or a simple ordering

• Build the multiple alignment by first aligning the most

similar pair of sequences, then the next most similar pair

and so on. Once an alignment of two sequences has

been made, then this is fixed. Thus for a set of sequences A, B, C, D having aligned A with C and B with D the

alignment of A, B, C, D is obtained by comparing the

alignments of A and C with that of B and D using averaged

scores at each aligned position.

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 7/24

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 8/24

 Multiple alignment in GCG• The program available in GCG for multiple

alignment is Pileup.

• The input file for Pileup is a list of sequence

file_names or sequence codes in the database,

created by a text editor.• Pileup creates a multiple sequence alignment from

a group of related sequences using progressive,

pairwise alignments. It can also plot a tree showing

the clustering relationships used to create the

alignment.

• Please note that there is no one absolute alignment,

even for a limited number of sequences.

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 9/24

ClustalW- for multiple alignment  

• ClustaW is a general purpose multiplealignment program for DNA or proteins.

• ClustalW is produced by Julie D. Thompson,

Toby Gibson of European Molecular BiologyLaboratory, Germany and Desmond Higgins of

European Bioinformatics Institute, Cambridge,

UK. Algorithmic

• ClustalW is cited: improving the sensitivity of

progressive multiple sequence alignment through

sequence weighting, positions-specific gap penalties

and weight matrix choice. Nucleic Acids Research,-

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 10/24

ClustalW- for multiple alignment

ClustalW can create multiple alignments,

manipulate existing alignments, do

profile analysis and create phylogentic

trees.

 Alignment can be done by 2 methods:

- slow/accurate

- fast/approximate

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 11/24

Running ClustalW

[~]% clustalw

********************************************************************** CLUSTAL W (1.7) Multiple Sequence Alignments **********************************************************************

1. Sequence Input From Disc2. Multiple Alignments3. Profile / Structure Alignments4. Phylogenetic trees

S. Execute a system command

H. HELPX. EXIT (leave program)

 Your choice:

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 12/24

Running ClustalW  

The input file for clustalW is a file containing all

sequences in one of the following formats:

NBRF/PIR, EMBL/SwissProt, Pearson (Fasta),GDE, Clustal, GCG/MSF, RSF.

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 13/24

Using ClustalW

****** MULTIPLE ALIGNMENT MENU ******

1. Do complete multiple alignment now (Slow/Accurate)2. Produce guide tree file only3. Do alignment using old guide tree file

4. Toggle Slow/Fast pairwise alignments = SLOW

5. Pairwise alignment parameters6. Multiple alignment parameters

7. Reset gaps between alignments? = OFF8. Toggle screen display = ON

9. Output format options

S. Execute a system commandH. HELPor press [RETURN] to go back to main menu

 Your choice:

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 14/24

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 15/24

ClustalW options Your choice: 5

********* PAIRWISE ALIGNMENT PARAMETERS *********

Slow/Accurate alignments:

1. Gap Open Penalty :15.002. Gap Extension Penalty :6.663. Protein weight matrix :BLOSUM304. DNA weight matrix :IUB

Fast/Approximate alignments:

5. Gap penalty :56. K-tuple (word) size :27. No. of top diagonals :4

8. Window size :4

9. Toggle Slow/Fast pairwise alignments = SLOW

H. HELPEnter number (or [RETURN] to exit):

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 16/24

ClustalW options Your choice: 6

********* MULTIPLE ALIGNMENT PARAMETERS *********

1. Gap Opening Penalty :15.002. Gap Extension Penalty :6.66

3. Delay divergent sequences :40 %

4. DNA Transitions Weight :0.50

5. Protein weight matrix :BLOSUM series6. DNA weight matrix :IUB

7. Use negative matrix :OFF

8. Protein Gap Parameters

H. HELP

Enter number (or [RETURN] to exit):

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 17/24

ClustalX - Multiple Sequence

 Alignment Program 

• ClustalX provides a new window-based

user interface to the ClustalW program.

• It uses the Vibrant multi-platform user

interface development library, developed by

the National Center for Biotechnology

Information (Bldg 38A, NIH 8600 RockvillePike,Bethesda, MD 20894) as part of their

NCBI SOFTWARE DEVELOPEMENT TOOLKIT.

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 18/24

ClustalX

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 19/24

ClustalX

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 20/24

ClustalX

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 21/24

ClustalX

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 22/24

ClustalX

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 23/24

ClustalX

8/12/2019 Multialign New

http://slidepdf.com/reader/full/multialign-new 24/24

THANK YOU