Multiple Sequence Alignment - Lecture 01
MULTIPLE SEQUENCE ALIGNMENT
Bioinformatics of Cyanobacteria and Plants, April 2007
University of Turku
How Multiple Sequence Alignment Is Used
• Prediction of structural similarities in unknown proteins based on known proteins in the alignment
• Searches for sequences for an unknown genome, e.g., for the design of a specific DNA probe
• Automated assembly of large sequences in genome sequencing or mRNA sequences from ESTs
Phylogenetic Analyses May Start With Multiple Sequence Alignments
seqA  N • F L S
seqB  N • F – S
seqC  N K Y L S
seqD  N • Y L S
[Figure: phylogenetic tree with leaves N Y L S, N K Y L S, N F S, and N F L S; branch labels: +K, -L, Y to F]
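The four aligned rows on this slide can be compared column by column to count the changes a phylogenetic analysis would start from. The following is a minimal sketch, assuming that both gap symbols on the slide mean "gap" and writing them as `-`; the function names are hypothetical.

```python
# Aligned rows from the slide; '-' stands in for both gap symbols (an assumption).
alignment = {
    "seqA": "N-FLS",
    "seqB": "N-F-S",
    "seqC": "NKYLS",
    "seqD": "N-YLS",
}

def compare(row_a, row_b):
    """Count (substitutions, indels) between two aligned rows."""
    substitutions = indels = 0
    for a, b in zip(row_a, row_b):
        if a == b:
            continue  # identical residues, or a gap in both rows
        if a == "-" or b == "-":
            indels += 1
        else:
            substitutions += 1
    return substitutions, indels

def difference_table(aln):
    """Compare every pair of rows once, keyed by the two sequence names."""
    names = sorted(aln)
    return {(p, q): compare(aln[p], aln[q])
            for i, p in enumerate(names) for q in names[i + 1:]}

for pair, (subs, gaps) in difference_table(alignment).items():
    print(pair, "substitutions:", subs, "indels:", gaps)
```

Pairs with few differences (seqA and seqB, seqC and seqD) are the ones the tree on the slide places next to each other.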
Multiple Sequence Alignments Can Be Global or Local
• In global MSA, pair-wise alignment is extended to include more (closely) related sequences
• Conserved domains in DNA or protein sequences are found using local MSA methods
Global MSA Is Done Using Approximate Methods
• Progressive global alignment starts by aligning the most similar sequences
• Iterative methods globally align groups of sequences and revise the result
• Alignments based on locally conserved patterns found in the same order
• Statistical methods that generate probabilistic models
• Graph-based methods
The Dynamic Programming Algorithm Limits the MSA
• Number of comparisons is the product of the lengths of the sequences
  – Three protein sequences of 300 amino acids each require 300^3 = 2.7 × 10^7 comparisons
• Supercomputers may handle ten sequences of up to 1000 amino acids each
• Sum-of-pairs method is used
  – Sums the scores of all pair-wise alignments
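Both points on this slide can be checked with a few lines of code. The following is a minimal sketch: `dp_cells` multiplies the sequence lengths, and `sum_of_pairs` adds up the scores of every pair of rows in every column. The match, mismatch, and gap values are hypothetical placeholders, not a real substitution matrix.

```python
from itertools import combinations
from math import prod

def dp_cells(lengths):
    """Number of comparisons for full dynamic programming: the product of the lengths."""
    return prod(lengths)

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned symbols (hypothetical scoring values)."""
    if a == "-" and b == "-":
        return 0  # two gaps facing each other are not scored
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def sum_of_pairs(rows):
    """Sum pair_score over all pairs of rows in every column of the alignment."""
    total = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total

print(dp_cells([300, 300, 300]))  # 27000000
print(sum_of_pairs(["N-FLS", "N-F-S", "NKYLS", "N-YLS"]))
```

Adding a fourth 300-residue sequence multiplies the count by another 300, which is why exact dynamic programming is practical only for a handful of sequences.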
CLUSTALW Uses a Progressive Method for Global MSA
• Pair-wise alignments of all sequence pairs using, e.g., FASTA
• Alignment scores are used to produce a phylogenetic tree (calculated using the neighbor-joining method [Saitou and Nei, 1987])
• Sequences are sequentially aligned using dynamic programming in the order of their relationship on the tree
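The first two steps above can be sketched in a few lines. This is an illustration under stated assumptions, not CLUSTALW's implementation: a small Needleman-Wunsch scorer stands in for FASTA, a greedy average-linkage clustering stands in for neighbor joining, and the scoring values and function names are hypothetical. The final step, aligning sequences and profiles in the resulting order, is not shown.

```python
from itertools import combinations

def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global pair-wise alignment score (Needleman-Wunsch, linear gap cost)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def merge_order(seqs):
    """Return the clusters joined at each step, most similar pair first."""
    score = {frozenset(p): nw_score(seqs[p[0]], seqs[p[1]])
             for p in combinations(seqs, 2)}

    def avg(c1, c2):
        # Average pair-wise score between two clusters (average linkage).
        return sum(score[frozenset((x, y))] for x in c1 for y in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in seqs]
    order = []
    while len(clusters) > 1:
        best = max(combinations(clusters, 2), key=lambda pair: avg(*pair))
        clusters = [c for c in clusters if c not in best] + [best[0] + best[1]]
        order.append(best)
    return order

# Ungapped sequences from the tree on the earlier slide.
sequences = {"seqA": "NFLS", "seqB": "NFS", "seqC": "NKYLS", "seqD": "NYLS"}
for step in merge_order(sequences):
    print(step)
```

With these toy scores, seqC and seqD are joined first, then seqA and seqB, and the two pairs are joined last.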
Alignment Scores Are Weighted
• Scores for aligning similar sequences are given a small weight, so they have a smaller effect on the MSA
• Scores for aligning less similar sequences are given a larger weight
• Weighting more accurately reflects sequence changes in the phylogenetic tree
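The idea of down-weighting similar sequences can be illustrated with a simplified stand-in. The sketch below weights each sequence by its mean distance to the others, so near-duplicates receive small weights and divergent sequences larger ones. This is an assumption made for illustration, not the tree-based weighting the slide refers to; the sequences and function names are hypothetical.

```python
def p_distance(row_a, row_b):
    """Fraction of compared columns in which two aligned rows differ."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if not (a == "-" and b == "-")]
    return sum(a != b for a, b in pairs) / len(pairs)

def sequence_weights(aln):
    """Weight each sequence by its mean distance to the others; weights sum to 1."""
    names = list(aln)
    mean_dist = {
        n: sum(p_distance(aln[n], aln[m]) for m in names if m != n) / (len(names) - 1)
        for n in names
    }
    total = sum(mean_dist.values())
    if total == 0:
        # All sequences identical: fall back to equal weights.
        return {n: 1 / len(names) for n in names}
    return {n: d / total for n, d in mean_dist.items()}

# Hypothetical aligned sequences: x1 and x2 are near-duplicates, x3 is divergent.
example = {"x1": "ACDEFGHIKL", "x2": "ACDEFGHIKM", "x3": "ACNEYGHLKV"}
for name, weight in sequence_weights(example).items():
    print(name, round(weight, 3))
```

Here x1 and x2 share a small weight each, while x3 receives the largest weight, so the pair of near-duplicates does not dominate the alignment score.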
Gaps Are Placed Between Conserved Domains
• When structurally related proteins were aligned, gaps were found preferentially between secondary structural elements (Pascarella and Argos, 1992)
• A table was prepared giving the observed frequency of gaps next to each amino acid in these regions
• CLUSTALW tries to locate the corresponding domains by appropriate gap placement
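One way to picture residue-dependent gap placement is to scale the gap-opening penalty in each column by a factor that depends on the residues found there. The sketch below does this with made-up factors; they are hypothetical placeholders, not the published Pascarella and Argos frequencies, and the function is not CLUSTALW's actual routine.

```python
# Hypothetical gap-propensity factors (NOT the published table):
# below 1 makes a gap cheaper next to that residue, above 1 makes it costlier.
GAP_FACTOR = {"G": 0.6, "P": 0.7, "S": 0.8, "W": 1.3, "F": 1.2}
DEFAULT_FACTOR = 1.0

def column_gap_penalties(rows, base_penalty=10.0):
    """Return one gap-opening penalty per alignment column."""
    penalties = []
    for column in zip(*rows):
        residues = [r for r in column if r != "-"]
        if not residues:
            penalties.append(base_penalty)  # all-gap column: leave the penalty unchanged
            continue
        mean_factor = sum(GAP_FACTOR.get(r, DEFAULT_FACTOR) for r in residues) / len(residues)
        penalties.append(base_penalty * mean_factor)
    return penalties

# Hypothetical aligned rows.
print(column_gap_penalties(["GWSA", "GFSA", "PWSA"]))
```

Columns rich in residues with a low factor become cheaper places to open a gap, which steers gaps away from the columns treated as conserved core positions.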
CLUSTALW at EBI
MSAs Can Be Viewed and Edited Using Jalview
The MUSCLE Algorithm Provides a Faster and Refined Alignment