Multiple sequence alignment (MSA)
Petri Törönen
Help contributed by: Liisa Holm & Ari Löytynoja
What is MSA?
• MSA is an alignment of three or more sequences.
• MSA is usually global: the aim is to align homologous residues (nucleotides or amino acids) in columns along the whole length of the sequences.
GA--GTACA
CAC-GTATA
CACGGTAT-
G-CGGTCTA
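The column idea can be made concrete with a minimal Python sketch. It stores the toy alignment above as equal-length strings, reads it column by column, and drops the gaps to recover the unaligned sequences. The helper names `columns` and `ungap` are illustrative and do not come from any MSA package.

```python
# The toy alignment above: one string per sequence, "-" marks a gap.
MSA = ["GA--GTACA", "CAC-GTATA", "CACGGTAT-", "G-CGGTCTA"]


def columns(msa):
    """Return the alignment columns as strings; all rows must have equal length."""
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all rows of an MSA must have the same length")
    return ["".join(col) for col in zip(*msa)]


def ungap(row):
    """Recover the original, unaligned sequence by dropping the gaps."""
    return row.replace("-", "")


print(columns(MSA))
print([ungap(row) for row in MSA])
```

Each printed column holds the residues that the alignment claims are homologous. Only the fifth column (all G) and the sixth column (all T) are identical in every row.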
What is MSA?
(Figure: a protein multiple sequence alignment. Source: http://en.wikipedia.org/wiki/Multiple_sequence_alignment)
Why MSA?
• “MSA emphasises signal observed in the pairwise alignment” (Liisa Holm)
• Improved alignments
• Alignment of more distant sequences with the help of intermediate sequences
• Highlighting of the conserved regions in the sequences
http://ekhidna.biocenter.helsinki.fi/users/petri/public/opetus_jutut/Bioinf_Per_Lects/urease_output.txt
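One simple way to highlight conserved regions is sketched below. It assumes that conservation is measured as the fraction of rows sharing the most common residue in a column; real tools use more elaborate scores, often based on substitution matrices. Fully conserved, gap-free columns are marked with `*`, similar to the conservation line Clustal prints under its alignments. The sketch reuses the toy alignment from the "What is MSA?" slide, and the function names are illustrative.

```python
from collections import Counter

# The toy alignment from the "What is MSA?" slide.
MSA = ["GA--GTACA", "CAC-GTATA", "CACGGTAT-", "G-CGGTCTA"]


def column_identity(column):
    """Fraction of rows carrying the column's most common residue (gaps never count)."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    return Counter(residues).most_common(1)[0][1] / len(column)


def conservation_line(msa):
    """'*' under fully conserved columns, a space elsewhere."""
    return "".join("*" if column_identity(col) == 1.0 else " " for col in zip(*msa))


for row in MSA:
    print(row)
print(conservation_line(MSA))
```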
Why MSA?
MSA is the input to many analysis tasks:
• Detection of active sites
• Generation of sequence profiles
• Detection of protein domains and motifs
• Phylogenetics
…
Remember
• First step of MSA: a good selection of sequences for the analysis
• The sequences need to be functionally/evolutionarily related
• Sometimes it is good to have some variation among the sequences (depends on the analysis task)
• Otherwise: rubbish in → rubbish out
MSA methods
• Finding the optimal multiple sequence alignment is a computationally hard task
• The “correct” answer could in principle be obtained by extending the dynamic programming algorithm to multiple sequences
• In practice, the time and memory needed grow exponentially with the number of sequences, so dynamic programming cannot be applied to realistic MSA problems
• We need approximate solutions (heuristics)
http://en.wikipedia.org/wiki/Multiple_sequence_alignment#Dynamic_programming_and_computational_complexity
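To see why exact dynamic programming does not scale, the sketch below extends it to just three sequences. Each column is scored as a sum of pairs. Match = 1 and mismatch = 0 follow the scoring slide later in this section; the gap penalty of -1 and all function names are assumptions for illustration. Every cell of the 3-D table has 7 possible predecessors, one for each non-empty subset of sequences that advances.

```python
from itertools import product

# Assumed scoring: match = 1 and mismatch = 0 as on the scoring slide;
# the gap penalty of -1 is purely illustrative.
MATCH, MISMATCH, GAP = 1, 0, -1


def pair_score(a, b):
    """Score of one pair of characters in a column ('-' is a gap)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sp_column(a, b, c):
    """Sum-of-pairs score of a three-row column."""
    return pair_score(a, b) + pair_score(a, c) + pair_score(b, c)


def align3(x, y, z):
    """Optimal sum-of-pairs alignment of three sequences (3-D dynamic programming)."""
    # A move advances any non-empty subset of the sequences: 2**3 - 1 = 7 moves.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    score, back = {(0, 0, 0): 0}, {}
    # product() visits cells in lexicographic order, so predecessors are already filled.
    for cell in product(range(len(x) + 1), range(len(y) + 1), range(len(z) + 1)):
        if cell == (0, 0, 0):
            continue
        i, j, k = cell
        for di, dj, dk in moves:
            prev = (i - di, j - dj, k - dk)
            if min(prev) < 0:
                continue
            column = (x[i - 1] if di else "-", y[j - 1] if dj else "-", z[k - 1] if dk else "-")
            s = score[prev] + sp_column(*column)
            if cell not in score or s > score[cell]:
                score[cell], back[cell] = s, (di, dj, dk)
    # Trace back from the final cell to rebuild the three gapped rows.
    rows = ["", "", ""]
    i, j, k = len(x), len(y), len(z)
    while (i, j, k) != (0, 0, 0):
        di, dj, dk = back[(i, j, k)]
        rows[0] = (x[i - 1] if di else "-") + rows[0]
        rows[1] = (y[j - 1] if dj else "-") + rows[1]
        rows[2] = (z[k - 1] if dk else "-") + rows[2]
        i, j, k = i - di, j - dj, k - dk
    return score[(len(x), len(y), len(z))], rows


print(align3("ACG", "ACG", "ACG"))
print(align3("AC", "A", "AC"))
```

For three sequences of length n the table already has (n+1)^3 cells. For N sequences of length L it grows to roughly L^N cells, each with 2^N - 1 predecessors. This is why exact MSA is limited to a handful of short sequences.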
MSA methods: heuristics
• Progressive Alignment (not much used)
• Iterative Alignment (most popular)
• Hidden Markov Models
• Pattern Based methods
Progressive alignment
• Divide the unsolvable task into subtasks that can be solved
• First align the most similar pairs of sequence sets
  – A sequence set can contain one or many sequences
  – At the start, each set contains a single sequence
• Move progressively to bigger sets and to more difficult pairs of sets
• Always align only one pair of sets at a time
Progressive alignment
• Produce pairwise alignments between all the sequences you want to include in the MSA
  – Dynamic programming, k-tuple (ktup) methods, ...
• Produce a “guide tree” from the pairwise distances calculated from the pairwise alignments
  – UPGMA, neighbor joining
• Produce an MSA using the guide tree
  – Sequences are aligned in the order given by the guide tree
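The guide-tree step can be sketched as a post-order walk. Each internal node of the tree means "align the set on the left with the set on the right", and children are always handled before their parent. The nested-tuple tree below is a hypothetical stand-in for a real tree file (Clustal, for example, writes its guide tree in Newick format). Its shape is the tree that the UPGMA example in this section arrives at. The function name is illustrative.

```python
# Hypothetical guide tree as nested pairs; leaves are sequence names.
GUIDE_TREE = ((("human", "chimp"), ("gorilla", "orangutan")), "macaque")


def progressive_plan(tree):
    """List the merges of a progressive alignment, children before parents."""
    steps = []

    def visit(node):
        if isinstance(node, str):   # leaf: a set holding a single sequence
            return [node]
        left, right = node          # the guide tree is binary
        a, b = visit(left), visit(right)
        steps.append((a, b))        # align set a with set b (one pair of sets at a time)
        return a + b                # the merged set

    visit(tree)
    return steps


for a, b in progressive_plan(GUIDE_TREE):
    print(a, "+", b)
```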
(Figure: progressive alignment workflow. Set of sequences → all-against-all pairwise alignment (here demonstrated for the first sequence) → pairwise similarities from the alignments → cluster tree created from the similarities → sequences joined in the order obtained from the cluster tree.)
Guide tree construction: UPGMA
• Unweighted Pair Group Method with Arithmetic mean
• One of the fastest tree construction methods
An example: Pairwise alignments
Pairwise distances, based on the pairwise alignments
• Number of nucleotide differences (absolute distances, used in Pileup/Clustal)
• JC-distance
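UPGMA based on JC-distances*
The closest pair (human and chimp, JC-distance 0.107) is joined first, at height 0.107 / 2.
*JC-distances = Jukes-Cantor distances. The observed distance D (the proportion of differing sites) is corrected for multiple substitutions with the correction function -(3/4) * ln(1 - (4/3) * D).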
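Here is a minimal sketch of the two distances. It assumes gap-free aligned sequences of equal length; how gaps should be counted is left out. `p_distance` computes the observed distance D, and `jukes_cantor` applies the correction above. Applied to the human, chimp and gorilla sequences from the "Constructing MSA" slide, it gives 0.107 for human/chimp (1 difference in 10 sites) and 0.383 for human/gorilla (3 differences), the values used in the UPGMA example. The function names are illustrative.

```python
import math


def p_distance(a, b):
    """Observed distance D: proportion of differing sites (gap-free, equal-length sequences)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jukes_cantor(d):
    """Jukes-Cantor correction -(3/4) * ln(1 - (4/3) * D); undefined for D >= 0.75."""
    if d >= 0.75:
        raise ValueError("the JC correction is undefined for D >= 0.75")
    return -0.75 * math.log(1 - 4 * d / 3)


human, chimp, gorilla = "ACGTACGTCC", "ACCTACGTCC", "ACCACCGTCC"
print(round(jukes_cantor(p_distance(human, chimp)), 3))    # 1 difference in 10 sites
print(round(jukes_cantor(p_distance(human, gorilla)), 3))  # 3 differences in 10 sites
```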
UPGMA, distance updates
d((human, chimp), gorilla) = [d(human, gorilla) + d(chimp, gorilla)] / 2 = [0.383 + 0.232] / 2 = 0.3075
UPGMA
UPGMA
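UPGMA
The clusters (human, chimp) and (gorilla, orangutan) are joined at a new node U:
d((human, chimp), U) = 0.3923 / 2 = 0.1962
d((gorilla, orangutan), U) = 0.3923 / 2 = 0.1962
Branch from the (human, chimp) node to U: 0.1962 - 0.0537 ≈ 0.1426
Branch from the (gorilla, orangutan) node to U: 0.1962 - 0.116 ≈ 0.080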
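UPGMA
The last sequence (macaque) is joined at height 0.7083 / 2 ≈ 0.3541.
Branch from U to the root: 0.3541 - 0.1426 - 0.0537, or equivalently 0.3541 - 0.080 - 0.116 (≈ 0.158)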
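The whole UPGMA example can be reproduced with a short sketch. It assumes that the five 10-nucleotide sequences from the "Constructing MSA" slide, with JC-distances, are the input. The distance between two clusters is the mean of all sequence-to-sequence distances between them. For (human, chimp) against gorilla, this is exactly the update [d(human, gorilla) + d(chimp, gorilla)] / 2 shown above. Each join is reported with its height, which is half the cluster distance. The function names are illustrative.

```python
import math

# Assumed input: the five example sequences from the "Constructing MSA" slide.
SEQS = {
    "human": "ACGTACGTCC",
    "chimp": "ACCTACGTCC",
    "gorilla": "ACCACCGTCC",
    "orangutan": "ACCCCCCTCC",
    "macaque": "CCCCCCCCCC",
}


def jc_distance(a, b):
    """Jukes-Cantor distance of two gap-free aligned sequences of equal length."""
    d = sum(x != y for x, y in zip(a, b)) / len(a)
    return -0.75 * math.log(1 - 4 * d / 3)


def upgma(seqs):
    """Return UPGMA joins as (left_label, right_label, height), height = distance / 2."""
    names = list(seqs)
    leaf_d = {(a, b): jc_distance(seqs[a], seqs[b]) for a in names for b in names if a != b}
    clusters = [(name, (name,)) for name in names]  # (label, member sequences)

    def cluster_distance(c1, c2):
        # Arithmetic mean over all member pairs; for (human, chimp) vs gorilla this
        # is [d(human, gorilla) + d(chimp, gorilla)] / 2.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(leaf_d[p] for p in pairs) / len(pairs)

    joins = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:  # strict '<': a tie keeps the first pair
                    best = (d, i, j)
        d, i, j = best
        left, right = clusters[i], clusters[j]
        joins.append((left[0], right[0], d / 2))
        merged = (f"({left[0]}, {right[0]})", left[1] + right[1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return joins


for left, right, height in upgma(SEQS):
    print(f"{left} + {right}: height {height:.4f}")
```

Orangutan is equally distant from gorilla and from macaque (2 differences each), so the second join is a tie. This sketch keeps the first pair it finds, which gives the same tree as the slides. The resulting heights (about 0.054, 0.116, 0.196 and 0.354) match the slides up to rounding. Branch lengths follow as differences of heights, for example 0.1963 - 0.0537 ≈ 0.1426 for the (human, chimp) branch.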
Constructing MSA
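Input sequences:
human      ACGTACGTCC
chimp      ACCTACGTCC
gorilla    ACCACCGTCC
orangutan  ACCCCCCTCC
macaque    CCCCCCCCCC

Following the guide tree, human and chimp are aligned first, then gorilla and orangutan:
human      ACGTACGTCC
chimp      ACCTACGTCC

gorilla    ACCACCGTCC
orangutan  ACCCCCCTCC

Then the two pairs are aligned with each other:
human      ACGTACGTCC
chimp      ACCTACGTCC
gorilla    ACCACCGTCC
orangutan  ACCCCCCTCC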
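Each merge in the progressive alignment aligns two groups of already-aligned sequences. The sketch below shows that step under an assumed sum-of-pairs scoring between the two groups (match = 1, mismatch = 0, gap = -1; the gap value and the names are illustrative). It is ordinary global dynamic programming in which whole columns, not single residues, are aligned. Gaps are therefore inserted into a group only as complete columns, which is also why a gap opened early in progressive alignment is never removed later.

```python
# Assumed scoring: match = 1, mismatch = 0 (as on the scoring slide), gap = -1 (illustrative).
MATCH, MISMATCH, GAP = 1, 0, -1


def pair_score(a, b):
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def cross_score(col_a, col_b):
    """Sum of pair scores with one character from each group's column."""
    return sum(pair_score(a, b) for a in col_a for b in col_b)


def align_groups(group_a, group_b):
    """Align two alignments without changing either one internally."""
    cols_a, cols_b = list(zip(*group_a)), list(zip(*group_b))
    gap_a, gap_b = ("-",) * len(group_a), ("-",) * len(group_b)
    n, m = len(cols_a), len(cols_b)
    # Global (Needleman-Wunsch style) DP over columns instead of residues.
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + cross_score(cols_a[i - 1], gap_b)
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + cross_score(gap_a, cols_b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + cross_score(cols_a[i - 1], cols_b[j - 1]),
                          S[i - 1][j] + cross_score(cols_a[i - 1], gap_b),
                          S[i][j - 1] + cross_score(gap_a, cols_b[j - 1]))
    # Trace back, emitting merged columns (a gap always fills a whole group column).
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + cross_score(cols_a[i - 1], cols_b[j - 1]):
            merged.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + cross_score(cols_a[i - 1], gap_b):
            merged.append(cols_a[i - 1] + gap_b)
            i -= 1
        else:
            merged.append(gap_a + cols_b[j - 1])
            j -= 1
    merged.reverse()
    return ["".join(row) for row in zip(*merged)]


pair1 = ["ACGTACGTCC", "ACCTACGTCC"]  # human, chimp
pair2 = ["ACCACCGTCC", "ACCCCCCTCC"]  # gorilla, orangutan
for row in align_groups(pair1, pair2):
    print(row)
```

For the human/chimp and gorilla/orangutan pairs no gaps are needed, so the merge simply stacks the four rows, as in the last step above.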
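Alignment score (match = 1, mismatch = 0)
Column: 1234
        ACGT
        ACGA
        AGGA
• Each column is scored as the sum over all pairs of sequences (sum of pairs):
• Column 1: A/A + A/A + A/A = 1 + 1 + 1 = 3
• Column 2: C/C + C/G + C/G = 1 + 0 + 0 = 1
• Column 3: G/G + G/G + G/G = 1 + 1 + 1 = 3
• Column 4: T/A + T/A + A/A = 0 + 0 + 1 = 1
• S(alignment) = S(1) + S(2) + S(3) + S(4) = 3 + 1 + 3 + 1 = 8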
• The higher the score, the better the alignment
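The column-by-column calculation above is the sum-of-pairs score, and a minimal sketch reproduces it. The example has no gaps, so gap scoring is deliberately left out; a real scorer would also need a gap penalty. The function name is illustrative.

```python
from itertools import combinations


def sp_score(msa, match=1, mismatch=0):
    """Sum-of-pairs score: add match/mismatch over every pair of rows in every column.

    Gaps are not handled (the slide's example has none), so this is only a sketch.
    """
    total = 0
    for column in zip(*msa):
        for a, b in combinations(column, 2):
            total += match if a == b else mismatch
    return total


print(sp_score(["ACGT", "ACGA", "AGGA"]))  # the slide's example
```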
Progressive alignment - pros and cons
• Pros:
  – Fast
• Cons:
  – Once gaps are opened, they can never be closed
  – Errors in the alignment of the first few sequences can have catastrophic effects on the whole alignment
  – Not much used (to my knowledge)
Iterative alignment
• Create a progressive alignment
• After obtaining the alignment, calculate a quality score
• REPEAT THE FOLLOWING STEPS:
  – Redo the cluster tree
  – Realign the sequences using the new cluster tree
  – Calculate a quality score
• The loop can be stopped when a maximum number of iterations is reached or when the quality score no longer improves
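The loop can be written as a generic skeleton. Here `realign` (redo the tree and realign) and `score` (the quality score) are hypothetical placeholders supplied by the caller. The only logic shown is the stopping rule above: stop after `max_rounds` rounds, or as soon as the score does not improve.

```python
from typing import Callable, Tuple, TypeVar

Alignment = TypeVar("Alignment")  # any representation, e.g. a list of gapped strings


def iterative_refinement(
    initial: Alignment,
    realign: Callable[[Alignment], Alignment],
    score: Callable[[Alignment], float],
    max_rounds: int = 10,
) -> Tuple[Alignment, float]:
    """Repeat realign + score until the score stops improving or max_rounds is reached."""
    best, best_score = initial, score(initial)
    for _ in range(max_rounds):
        candidate = realign(best)          # e.g. redo the cluster tree, then realign
        candidate_score = score(candidate)
        if candidate_score <= best_score:  # no improvement: stop iterating
            break
        best, best_score = candidate, candidate_score
    return best, best_score


# Toy stand-in where "alignments" are integers and realignment improves them up to 3.
print(iterative_refinement(0, lambda a: min(a + 1, 3), lambda a: a))
```

The skeleton keeps the best-scoring alignment rather than the last one, so a bad round can never make the result worse.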
Iterative alignment
• Allows correction of errors that cannot be corrected in progressive alignment
• Very popular among MSA methods
• Increases the running time compared with plain progressive alignment
Iterative alignment
(Figure: diagram of a typical iterative MSA program workflow, including the iteration loop. Figure from Do & Katoh 2008, http://ai.stanford.edu/~chuongdo/papers/alignment_review.pdf)
What MSA program(s) to use?
• Depends on the application
  – Phylogenetic studies
  – Structure-based studies
• Depends on the size of the data
  – Some programs cannot handle large datasets
• Remember to evaluate the alignment by eye
What MSA program(s) to use?
• Collection of MSA programs at EBI
• http://www.ebi.ac.uk/Tools/msa/
Summary of MSA
• MSA is relevant for many analysis tasks
  – Improved signal from the alignment
• Solving MSA requires heuristics
• Selection of MSA methods depends on the application
• Results should be evaluated by eye
  – And the errors should be corrected with MSA editors
Manual editing of MSAs?
• Say you have computed an MSA, but biologically it has some faults: it needs manual editing
• Editors: Jalview and Seaview (http://www.csc.fi/english/research/sciences/bioscience/programs/index_html)
• Input data can be in any of the most common MSA formats (Mase, Phylip, Clustal, MSF, Fasta, NEXUS, PIR and BCL)
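Of these formats, aligned FASTA is the simplest: each sequence has a `>` header line followed by the gapped sequence. Below is a minimal sketch of reading and writing it. The function names are illustrative, and wrapping at 60 characters is a common convention rather than a requirement. The editors handle all the formats listed above themselves, and libraries such as Biopython's `Bio.AlignIO` convert between many of them.

```python
def to_fasta(names, rows, width=60):
    """Format an alignment as aligned FASTA, wrapping each gapped row at `width` characters."""
    lines = []
    for name, row in zip(names, rows):
        lines.append(">" + name)
        lines.extend(row[i:i + width] for i in range(0, len(row), width))
    return "\n".join(lines) + "\n"


def from_fasta(text):
    """Parse aligned FASTA back into (names, rows); gap characters are kept as-is."""
    names, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            names.append(line[1:].strip())
            rows.append("")
        elif rows:
            rows[-1] += line
        else:
            raise ValueError("sequence data before the first '>' header")
    if len({len(r) for r in rows}) > 1:
        raise ValueError("rows of an alignment must all have the same length")
    return names, rows


print(to_fasta(["human", "chimp"], ["ACGTACGTCC", "ACCTACGTCC"]), end="")
```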