Multiple Sequence Alignment
ClustalW
TCoffee
Ka, Ks, and Ka/Ks
Anchored alignment
ClustalW
http://www.ebi.ac.uk/clustalw/
ClustalW
Paste your sequences
Multiple sequence alignment options
Submit
Exercise
HomoloGene is a system for automated detection of homologs among annotated genes of several completely sequenced eukaryotic genomes.
Download the FASTA sequences of HomoloGene:5276 and align them with ClustalW.
Download protein sequences
Result
Alignment
Guide Tree
TCoffee
http://tcoffee.crg.cat/
TCoffee computes its alignments by combining a collection of smaller alignments.
Alignment at the DNA level based on an alignment at the protein level
The 18-kDa protein plays an important role in fertilization of several abalone species.
Build a multiple sequence alignment using the following sequences:
Sequences
>gi|604533|gb|AAC37231.1| fertilization protein
MRSLVLLCVLLMAICAADKKTSVSKENEAAMKVAMMKFLDMKAGVFKEIIEDMGYPITPPQWTTLLYYNR
ERLIEFCRSFLALSKKIILLGGNKLNKANFARMGRILGWKSQWAVRQRQWGMVRVSRRHTSTAIAKRIVA
MKVADLPCN
>gi|604531|gb|AAC37233.1| fertilization protein
MRFLLLLCVLMGAVSQAVCRKRPNVWGKIVVKEKNKAAMKIGFMEYLDAKLVKFKRHWLVGANWKLQKFE
TDEMRYLAIKRLIKVCHGYTIWSQRLIMLKYRPLNEKYFKKVGRYLAWRNYLIVFRMWIGVLKKNLKRSE
ITKPMQKLLDTKDGELPCPVRKIHG
>gi|604529|gb|AAC37232.1| fertilization protein
MRSLVLLCVLMAVGCVAFDDVVVSRQEQSYVQRGMVNFLDEEMHKLVKRFRDMRWNLGPGFVFLLKKVNR
ERMMRYCMDYARYSKKILQLKHLPVNKKTLTKMGRFVGYRNYGVIRELYADVFRDVQGFRGPKMTAAMRK
YSSKDPGTFPCKNEKRRG
>gi|604527|gb|AAC37230.1| fertilization protein
MRSLVLLCVLLMAICAADKKTTVSKENAAAMKIAMIKFLDARAGKFKKRVENMGYPITPPQWTTLLYYNR
QRLMEWCHTYVEFSKKIILMGGNKLNKKNFTRMGRIIGWKNQWVLKRRQWEMVRVMRRYKSTAIAKKIVA
MKVADLPCN
>gi|604525|gb|AAC37229.1| fertilization protein
MRSLVLLCVLLMAICAADKKSTVSKENAAAMKVAMIKFLDSRTDRFKKRIEKIGYPITPPQYTTLLYYNR
ERLMDWCHNYVEVSKKIILLGGNKLNKKNFARMGRIIGWKNQWILKRRQWHMVRVMRRYKASAIAKKIVA
MKVADLPCN
Choose TCoffee Regular, paste the sequences in the data box, and press submit
Download formats
Guide tree
Codon Alignment
In order to study selection patterns, you will need to have the corresponding DNA alignment.
Using PROTOGENE (Protein-to-Gene) in TCoffee, the amino-acid alignment will be transformed into a codon alignment. The actual procedure involves tBLASTn.
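The core of the protein-to-codon step can be sketched as follows. This is a minimal illustration, assuming the coding sequence for each protein is already known and in frame; `thread_codons` is a hypothetical helper, not part of TCoffee, and the tBLASTn search that PROTOGENE uses to find the DNA is not reproduced here.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map an ungapped coding sequence onto one gapped protein alignment row.

    Every residue is replaced by its codon and every gap by three gap
    characters, so the DNA row stays in register with the protein row.
    """
    residues = aligned_protein.replace("-", "")
    # Accept a CDS with or without a trailing stop codon (assumption).
    if len(cds) not in (3 * len(residues), 3 * len(residues) + 3):
        raise ValueError("CDS length does not match the protein length")
    columns = []
    pos = 0
    for symbol in aligned_protein:
        if symbol == "-":
            columns.append("---")
        else:
            columns.append(cds[pos:pos + 3])
            pos += 3
    return "".join(columns)


# Toy example: one gap in the protein row becomes one gapped codon.
print(thread_codons("MR-SL", "ATGAGGTCTTTG"))
```

Because gaps are only ever inserted in multiples of three, the resulting DNA alignment keeps the reading frame, which is what the downstream Ka/Ks analysis relies on.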
• PROTOGENE (in TCoffee) is time consuming. Please submit your email address, and the results will be emailed to you.
• PROTOGENE may return more than one DNA sequence for any given protein sequence. For your homework assignment, please choose one sequence for each species.
(Result) Codon alignment
>gi|604533|gb|AAC37231.1|_G_L36554 _S_ AAC37231 _DESC_ fertilization protein MATCHES_ON Haliotis assimilis fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC------
------------------AAAAAAACCTCGGTCTCGAAGGAAAATGAAGCCGCAATGAAG
GTAGCGATGATGAAGTTTTTGGATATGAAGGCGGGTGTATTCAAAGAAATC---ATTGAG
GATATGGGATATCCAATAACCCCTCCGCAATGGACAACTCTACTGTACTACAACAGAGAG
AGATTGATTGAATTTTGCCGTTCCTTCCTTGCATTGTCCAAAAAGATTATATTGCTGGGA
GGTAACAAATTAAATAAGGCGAACTTCGCTAGGATGGGTCGAATCCTTGGCTGGAAAAGC
CAGTGGGCTGTGAGACAGAGGCAATGGGGGATGGTCAGA---------GTGTCGAGGCGC
CATACAAGTACTGCAATAGCTAAAAGGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
>gi|604531|gb|AAC37233.1|_G_L36590 _S_ AAC37233 _DESC_ fertilization protein MATCHES_ON Haliotis corrugata fertilization protein mRNA, complete cds
ATGAGGTTTTTGCTGCTTCTCTGTGTTTTGATGGGGGCAGTATCTCAGGCAGTATGCAGA
AAAAGACCTAATGTCTGGGGGAAAATCGTGGTCAAGGAGAAAAATAAAGCCGCAATGAAG
ATAGGGTTTATGGAATATTTGGATGCAAAGTTGGTAAAGTTTAAAAGGCACTGGCTTGTT
GGAGCCAATTGGAAACTTCAAAAATTTGAAACGGATGAAATGAGATACCTCGCCATAAAG
AGACTGATAAAAGTTTGCCATGGATACACTATTTGGTCCCAACGACTAATAATGTTAAAA
TATCGACCATTGAATGAGAAATACTTCAAAAAGGTGGGTCGATACCTTGCCTGGCGAAAC
TACCTCATAGTTTTTCGGATGTGGATCGGCGTTTTG------AAGAAAAATCTTAAAAGA
TCGGAAATAACGAAACCCATGCAAAAACTCCTCGACACAAAGGATGGTGAGTTGCCCTGC
CCTGTTAGAAAGATACATGGATAA
>gi|604529|gb|AAC37232.1|_G_L36589 _S_ AAC37232 _DESC_ fertilization protein MATCHES_ON Haliotis fulgens fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGATGGCGGTAGGATGTGTGGCGTTT------
------------------GATGATGTGGTGGTCTCAAGGCAAGAGCAATCTTATGTGCAG
AGAGGGATGGTCAACTTTTTGGATGAAGAAATGCATAAACTGGTTAAACGG---TTTAGA
GATATGCGATGGAATTTAGGGCCAGGCTTTGTATTCCTTCTAAAAAAAGTCAACAGAGAG
AGAATGATGCGCTACTGCATGGATTACGCCAGATATTCCAAAAAGATTTTACAGCTAAAA
CATCTTCCAGTAAATAAGAAGACCCTCACTAAAATGGGTAGATTCGTTGGATATCGAAAC
TATGGGGTCATCAGGGAGTTGTACGCCGACGTATTCAGAGACGTTCAAGGATTTAGGGGG
CCTAAAATGACTGCAGCCATGAGGAAGTACAGCAGCAAGGATCCTGGTACATTTCCTTGC
AAGAACGAGAAACGCCGCGGATGA
>gi|604527|gb|AAC37230.1|_G_L36553 _S_ AAC37230 _DESC_ fertilization protein MATCHES_ON Haliotis sorenseni fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC------
------------------AAAAAAACCACGGTCTCGAAGGAAAATGCAGCCGCAATGAAG
ATAGCTATGATAAAGTTTTTGGATGCGAGGGCGGGTAAATTCAAAAAACGC---GTTGAG
AATATGGGATATCCAATAACCCCTCCGCAATGGACAACTCTACTATACTACAACAGACAG
AGATTGATGGAATGGTGCCATACCTACGTTGAATTTTCCAAAAAGATTATATTGATGGGA
GGTAACAAATTAAATAAGAAGAACTTCACTAGGATGGGTCGAATCATTGGCTGGAAAAAC
CAGTGGGTTTTGAAAAGGAGGCAATGGGAGATGGTCAGA---------GTGATGAGGCGC
TATAAAAGTACTGCAATAGCTAAAAAGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
>gi|604525|gb|AAC37229.1|_G_L36552 _S_ AAC37229 _DESC_ fertilization protein MATCHES_ON Haliotis rufescens fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC------
------------------AAAAAATCCACGGTCTCGAAGGAAAATGCAGCCGCAATGAAG
GTAGCGATGATAAAGTTTTTGGATTCGAGGACGGATAGATTCAAAAAACGC---ATTGAG
AAGATTGGATATCCAATAACCCCTCCGCAATATACAACTCTACTATACTACAACAGAGAG
AGATTGATGGATTGGTGCCATAACTACGTTGAAGTATCCAAAAAGATTATATTGTTGGGA
GGTAACAAATTAAATAAGAAGAACTTCGCTAGGATGGGTCGAATCATTGGCTGGAAAAAC
CAGTGGATTTTGAAAAGGAGGCAATGGCACATGGTCAGA---------GTGATGAGGCGC
TATAAAGCTTCTGCAATAGCTAAAAAGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
SNAP - Ds/Dn Calculation Tool
http://hcv.lanl.gov/content/sequence/SNAP/SNAP.html
Calculates synonymous and nonsynonymous substitution rates based on codon alignments, according to the method of Nei and Gojobori (1986).
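To make the calculation concrete, here is a rough sketch of a Nei and Gojobori style pairwise estimate. It is an illustration under stated assumptions, not SNAP's code: all function names are hypothetical, the standard genetic code is assumed, changes to stop codons are counted as nonsynonymous sites, pathways through stop codons are discarded, and codons containing gaps or stops are skipped. SNAP may treat these cases differently.

```python
from itertools import permutations, product
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; "*" marks stop codons.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3)."""
    count = 0
    for i in range(3):
        for base in BASES:
            mutant = codon[:i] + base + codon[i + 1:]
            if base != codon[i] and CODON_TABLE[mutant] == CODON_TABLE[codon]:
                count += 1
    return count / 3


def count_diffs(c1, c2):
    """Synonymous and nonsynonymous differences, averaged over pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    paths = []
    for order in permutations(positions):
        current, syn, non = c1, 0, 0
        for i in order:
            step = current[:i] + c2[i] + current[i + 1:]
            if CODON_TABLE[step] == "*":
                break  # discard pathways that pass through a stop codon
            if CODON_TABLE[step] == CODON_TABLE[current]:
                syn += 1
            else:
                non += 1
            current = step
        else:
            paths.append((syn, non))
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def jukes_cantor(p):
    """Correct a proportion of differences for multiple substitutions."""
    return -0.75 * log(1 - 4 * p / 3) if p < 0.75 else float("nan")


def nei_gojobori(seq1, seq2):
    """Return (dS, dN) for two in-frame, aligned coding sequences."""
    S = N = sd = nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # skip gapped or ambiguous codons
        if "*" in (CODON_TABLE[c1], CODON_TABLE[c2]):
            continue  # skip stop codons
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S += s
        N += 3 - s
        syn, non = count_diffs(c1, c2)
        sd += syn
        nd += non
    if S == 0 or N == 0:
        raise ValueError("no synonymous or nonsynonymous sites to compare")
    return jukes_cantor(sd / S), jukes_cantor(nd / N)
```

Calling `nei_gojobori(row_a, row_b)` on two rows of a codon alignment returns the pair (dS, dN). A ratio dN/dS above 1 for a pair is the usual indication of positive selection.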
Input codon alignment
Select output statistics
SNAP - Ds/Dn Calculation Tool
Conclusion: We detect positive selection in six of the comparisons. So did Swanson and Vacquier (1998).
Distmat calculates the evolutionary distances between every pair of sequences in a multiple alignment.
The distances are expressed as the number of substitutions per 100 nucleotides or the number of replacements per 100 amino acids.
Distmat
http://emboss.bioinformatics.nl/cgi-bin/emboss/distmat
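As an illustration of what "per 100 positions" means, the sketch below computes an uncorrected pairwise distance matrix. The function names are hypothetical, and skipping every column that contains a gap is an assumption made here for simplicity; distmat's own gap weighting and its multiple-substitution corrections are not reproduced.

```python
from itertools import combinations


def distance_per_100(row_a: str, row_b: str) -> float:
    """Uncorrected differences per 100 compared positions of two aligned rows."""
    compared = mismatches = 0
    for x, y in zip(row_a.upper(), row_b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: columns with a gap are ignored
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * mismatches / compared


def distance_matrix(alignment: dict) -> dict:
    """Distances for every pair of rows in a {name: aligned_sequence} mapping."""
    return {
        (a, b): distance_per_100(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }


# Toy alignment: five ungapped columns, one mismatch.
print(distance_per_100("ACGT-A", "ACGAAA"))
```

The same function works for protein rows, in which case the result reads as replacements per 100 amino acids.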
Feed the DNA alignment of the 18-kDa protein into distmat.
Calculate separately the distances between the sequences for codon positions 1 and 2, and for codon position 3.
Are the results in agreement with those from the dn/ds analysis?
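If you want to check the web tool's numbers by hand, the codon positions can be separated with a few lines of Python. This is only a sketch: both helper names are hypothetical, and it assumes the alignment starts in frame at the first column, as a codon alignment does.

```python
def split_codon_positions(aligned_dna: str):
    """Return (columns at codon positions 1 and 2, columns at codon position 3)."""
    pos12 = "".join(ch for i, ch in enumerate(aligned_dna) if i % 3 != 2)
    pos3 = aligned_dna[2::3]
    return pos12, pos3


def percent_diff(row_a: str, row_b: str) -> float:
    """Uncorrected differences per 100 ungapped columns (gap columns skipped)."""
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * sum(x != y for x, y in pairs) / len(pairs)


# First four codons of the H. assimilis and H. corrugata rows.
a12, a3 = split_codon_positions("ATGAGGTCTTTG")
b12, b3 = split_codon_positions("ATGAGGTTTTTG")
print(percent_diff(a12, b12), percent_diff(a3, b3))
```

Most changes at position 3 are synonymous, while changes at positions 1 and 2 usually alter the amino acid, so comparing the two distances gives a rough cross-check on the dn/ds result.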
Anchored multiple-sequence alignment with DIALIGN
http://dialign.gobics.de/anchor/submission.php
User manual: http://dialign.gobics.de/anchor/manual
Align the following sequences (use the file dalign_sequences.txt):
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
Results
DIALIGN makes alignments from fragments
Results
Numbers below the alignment reflect some rough degree of local similarity among the sequences
Anchored alignment
Now, let us assume that the user has some expert knowledge concerning a certain domain that is present in all the input sequences.
The domains marked in red in the three sequences are thought to be homologous to one another.
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
Therefore, the user wants to define this domain as an anchor and align the rest of the sequences automatically.
To specify a set of anchor points, define each one as an equal-length segment pair involving two of the input sequences.
• first sequence involved
• second sequence involved
• start of anchor in first sequence
• start of anchor in second sequence
• length of anchor
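The five fields listed above can be modelled and sanity-checked before submission with a small sketch like the one below. Everything here is illustrative: the `Anchor` class and both helpers are hypothetical, 1-based numbering of sequences and positions is an assumption, and the example coordinates simply pair the shared motif KAAY in seq1 and seq3, not the domain marked in red on the slide. The DIALIGN form may expect further fields, so check the user manual for the exact format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    seq_a: int    # first sequence involved (1-based, assumption)
    seq_b: int    # second sequence involved
    start_a: int  # start of anchor in first sequence (1-based, assumption)
    start_b: int  # start of anchor in second sequence
    length: int   # length of anchor, identical in both sequences


def anchor_segments(anchor: Anchor, sequences: list) -> tuple:
    """Return the two anchored segments, or raise if the anchor does not fit."""
    if anchor.length < 1 or anchor.start_a < 1 or anchor.start_b < 1:
        raise ValueError("positions and length must be positive")
    if not (1 <= anchor.seq_a <= len(sequences) and 1 <= anchor.seq_b <= len(sequences)):
        raise ValueError("sequence index out of range")
    a = sequences[anchor.seq_a - 1]
    b = sequences[anchor.seq_b - 1]
    seg_a = a[anchor.start_a - 1:anchor.start_a - 1 + anchor.length]
    seg_b = b[anchor.start_b - 1:anchor.start_b - 1 + anchor.length]
    if len(seg_a) != anchor.length or len(seg_b) != anchor.length:
        raise ValueError("anchor runs past the end of a sequence")
    return seg_a, seg_b


def format_anchor(anchor: Anchor) -> str:
    """Write the five fields as one whitespace-separated line."""
    return f"{anchor.seq_a} {anchor.seq_b} {anchor.start_a} {anchor.start_b} {anchor.length}"


sequences = [
    "WKKNADAPKRAMTSFMKAAY",
    "WNLDTNSPEEKQAYIQLAKDDRIRYD",
    "WRMDSNQKNPDSNNPKAAYNKGDANAPK",
]
example = Anchor(seq_a=1, seq_b=3, start_a=17, start_b=16, length=4)
print(format_anchor(example), anchor_segments(example, sequences))
```

Printing the two segments side by side is a quick way to confirm that the coordinates really select the residues you intended to anchor.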
Results
The specified domain is aligned and the remainder of the sequences is aligned automatically, respecting the constraints given by the anchor points:
Guidance/HoT
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
>seq4
WRMDSNQKNPNNPKAAYNKGDANAPK