Multiple Sequence Alignment - Lecture 01

Transcript of Multiple Sequence Alignment - Lecture 01

Page 1: Multiple Sequence Alignment - Lecture 01

MULTIPLE SEQUENCE ALIGNMENT

Page 2: Multiple Sequence Alignment - Lecture 01

Bioinformatics of Cyanobacteria and Plants, April 2007

University of Turku

Multiple Sequence Alignment

How Multiple Sequence Alignment Is Used

• Prediction of structural similarities in unknown proteins based on known proteins in the alignment

• Searches for sequences for an unknown genome, e.g., for the design of a specific DNA probe

• Automated assembly of large sequences in genome sequencing, or of mRNA sequences from ESTs

Page 3: Multiple Sequence Alignment - Lecture 01

Phylogenetic Analyses May Start With Multiple Sequence Alignments

seqA  N • F L S
seqB  N • F – S
seqC  N K Y L S
seqD  N • Y L S

[Figure: tree relating the four sequences, with leaves N Y L S, N K Y L S, N F S and N F L S, and branch labels "+K", "-L" and "Y to F"]
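The link between the alignment and the tree on this slide can be sketched by counting, for every pair of sequences, the alignment columns in which they differ. This is only an illustration: treating the "•" and "–" placeholders as gap characters is an assumption, and the function names are hypothetical.

```python
from itertools import combinations

# Assumption: these placeholder characters all stand for a gap.
GAPS = {"-", "•", "–"}

alignment = {
    "seqA": "N•FLS",
    "seqB": "N•F–S",
    "seqC": "NKYLS",
    "seqD": "N•YLS",
}

def column_differences(a, b):
    """Count columns where two aligned rows differ (gap-gap columns are ignored)."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    diffs = 0
    for x, y in zip(a, b):
        if x in GAPS and y in GAPS:
            continue  # both rows have a gap: no information
        if x in GAPS or y in GAPS or x != y:
            diffs += 1
    return diffs

def distance_matrix(aln):
    """Pairwise difference counts for every pair of named rows."""
    return {
        (p, q): column_differences(aln[p], aln[q])
        for p, q in combinations(sorted(aln), 2)
    }

for pair, d in distance_matrix(alignment).items():
    print(pair, d)
```

Pairs separated by a single event in the figure (for example seqC and seqD, which differ only by the K) come out at distance 1, while seqB and seqC, which are separated by all three events, come out at distance 3.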

Page 4: Multiple Sequence Alignment - Lecture 01

Multiple Sequence Alignments Can Be Global or Local

• In global MSA, pair-wise alignment is extended to include more (closely) related sequences

• Conserved domains in DNA or protein sequences are found using local MSA methods

Page 5: Multiple Sequence Alignment - Lecture 01

Global MSA Is Done Using Approximate Methods

• Progressive global alignment starts by aligning the most-alike sequences

• Iterative methods globally align groups of sequences and revise the result

• Alignments based on locally conserved patterns found in the same order

• Statistical methods generating probabilistic models

• Graph-based methods

Page 6: Multiple Sequence Alignment - Lecture 01

The Dynamic Programming Algorithm Limits the MSA

• Number of comparisons is the product of the lengths of the sequences
  – Three protein sequences of 300 amino acids each means 300^3 comparisons

• Supercomputers may handle ten sequences of up to 1000 amino acids each

• Sum-of-pairs method is used
  – Sums the scores of all pair-wise alignments
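The two points above can be made concrete with a short sketch: the first function multiplies the sequence lengths to get the number of comparisons, and the second computes a sum-of-pairs score for a small alignment. The scoring values (match +1, mismatch -1, residue against gap -2, gap against gap 0) and the function names are assumptions chosen for illustration, not values taken from the lecture.

```python
from itertools import combinations
from math import prod

def dp_cells(lengths):
    """Number of comparisons: the product of the sequence lengths."""
    return prod(lengths)

def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned characters (illustrative values only)."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch

def sum_of_pairs(rows):
    """Sum the pair scores over every column and every pair of rows."""
    return sum(
        pair_score(x, y)
        for column in zip(*rows)
        for x, y in combinations(column, 2)
    )

print(dp_cells([300, 300, 300]))   # three proteins of 300 residues
print(dp_cells([1000] * 10))       # ten proteins of 1000 residues
print(sum_of_pairs(["N-FLS", "N-F-S", "NKYLS", "N-YLS"]))
```

Three sequences of 300 residues give 27,000,000 comparisons, while ten sequences of 1000 residues give 10^30, which is why exact dynamic programming is limited to a few short sequences.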

Page 7: Multiple Sequence Alignment - Lecture 01

CLUSTALW Uses a Progressive Method for Global MSA

• Pair-wise alignments of all sequence pairs using, e.g., FASTA

• Alignment scores are used to produce a phylogenetic tree (calculated using the neighbor-joining method [Saitou and Nei, 1987])

• Sequences are sequentially aligned using dynamic programming in the order of their relationship on the tree
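The first two steps above can be sketched as follows. This is not CLUSTALW itself: the pair-wise scores come from a plain Needleman-Wunsch recursion with made-up scoring values, and a simple greedy "most similar first" ordering stands in for the neighbor-joining tree. All names and parameters are hypothetical.

```python
from itertools import combinations

def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global pair-wise alignment score (Needleman-Wunsch, linear gap cost)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def joining_order(seqs):
    """Greedy stand-in for a guide tree: start with the best-scoring pair,
    then keep adding the sequence most similar to one already included."""
    names = sorted(seqs)
    score = {}
    for x, y in combinations(names, 2):
        score[x, y] = score[y, x] = nw_score(seqs[x], seqs[y])
    order = list(max(combinations(names, 2), key=lambda p: score[p]))
    remaining = [n for n in names if n not in order]
    while remaining:
        nxt = max(remaining, key=lambda n: max(score[n, m] for m in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order

seqs = {"seqA": "NFLS", "seqB": "NFS", "seqC": "NKYLS", "seqD": "NYLS"}
print(joining_order(seqs))
```

In a real progressive aligner, each step would align a sequence (or a group of already aligned sequences) to the growing alignment by dynamic programming; the sketch only shows how pair-wise scores determine the order.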

Page 8: Multiple Sequence Alignment - Lecture 01

Alignment Scores Are Weighted

• Scores for aligning similar sequences are given a small weight, so they have a smaller effect on the MSA

• Scores for aligning less similar sequences are given a larger weight

• Weighting more accurately reflects sequence changes in the phylogenetic tree
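One way to picture such weighting is to derive a weight for each sequence from the branch lengths of the tree, splitting every shared branch equally among the sequences below it. The sketch below follows that idea under stated assumptions: the tree shape, the branch lengths and the function names are invented for illustration and are not taken from the lecture.

```python
def count_leaves(node):
    """Number of sequences (leaves) below a node."""
    if isinstance(node, str):
        return 1
    return sum(count_leaves(child) for child, _ in node)

def leaf_weights(node, inherited=0.0):
    """Weight of each leaf: sum over the branches on its path from the root
    of (branch length / number of leaves sharing that branch)."""
    if isinstance(node, str):
        return {node: inherited}
    weights = {}
    for child, length in node:
        share = length / count_leaves(child)
        weights.update(leaf_weights(child, inherited + share))
    return weights

# Hypothetical tree: a node is a list of (child, branch_length) pairs.
# seqA and seqB are close relatives; seqC is more distant.
tree = [
    ([("seqA", 0.1), ("seqB", 0.1)], 0.2),
    ("seqC", 0.5),
]

for name, w in sorted(leaf_weights(tree).items()):
    print(name, round(w, 3))
```

The two closely related sequences share their common branch and end up with a smaller weight each than the divergent sequence, which matches the behaviour described in the bullets.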

Page 9: Multiple Sequence Alignment - Lecture 01

Gaps Are Placed Between Conserved Domains

• When aligning structurally related proteins, gaps were found preferentially between secondary structural elements (Pascarella and Argos, 1992)

• A table was prepared giving the observed frequency of gaps next to each amino acid in these regions

• CLUSTALW tries to locate the corresponding domains by appropriate gap placement

Page 10: Multiple Sequence Alignment - Lecture 01

CLUSTALW at EBI

Page 11: Multiple Sequence Alignment - Lecture 01

Page 12: Multiple Sequence Alignment - Lecture 01

Page 13: Multiple Sequence Alignment - Lecture 01

Page 14: Multiple Sequence Alignment - Lecture 01

Page 15: Multiple Sequence Alignment - Lecture 01

MSAs Can Be Viewed and Edited Using Jalview

Page 16: Multiple Sequence Alignment - Lecture 01

The MUSCLE Algorithm Provides a Faster and Refined Alignment