Multiple Alignment and Phylogenetic Trees
Csc 487/687 Computing for Bioinformatics
Multiple Sequence Alignment
• One amino acid sequence plays coy; a pair of homologous sequences whisper; many aligned sequences shout out loud.
• Very informative
Definition
• A global alignment of a set of sequences is obtained by
  – inserting into each sequence gap characters
• so that
  – the resulting sequences are of the same length
• and so that
  – no “column” has only gap characters
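The definition above can be checked mechanically. The following is a minimal sketch, assuming the alignment is given as a list of equal-role strings with "-" as the gap character; the function name `is_valid_alignment` and the optional `originals` argument are illustrative choices, not part of the course material.

```python
GAP = "-"  # assumed gap character

def is_valid_alignment(rows, originals=None):
    """Return True if rows satisfy the definition of a global alignment."""
    if not rows:
        return False
    width = len(rows[0])
    # The resulting sequences must all have the same length.
    if any(len(row) != width for row in rows):
        return False
    # No column may contain only gap characters.
    for column in zip(*rows):
        if all(ch == GAP for ch in column):
            return False
    # Optional check: stripping the gaps must give back the input sequences.
    if originals is not None:
        if [row.replace(GAP, "") for row in rows] != list(originals):
            return False
    return True

print(is_valid_alignment(["AC-GT", "A-CGT"]))   # valid
print(is_valid_alignment(["A--T", "A--T"]))     # all-gap columns
```

The optional third check is not stated in the definition itself, but it follows from "inserting gap characters": the alignment may add gaps and nothing else.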
Example: Chromo domains aligned
Use of alignments
• High sequence similarity usually means significant structural and/or functional similarity. The reverse need not be true.
• Homologous proteins (sharing a common ancestor) can vary significantly over large parts of their sequences but still retain common 2D patterns, 3D patterns, or a common active site or binding site.
• Comparing several sequences in a family can reveal what is common to the family. A feature shared by several sequences can be significant when all of the sequences are considered, even if it is not significant when only two are compared.
• Multiple alignment can be used to derive evolutionary history.
Use of alignments
• Predict features of aligned objects
  – conserved positions
    • structurally/functionally important
  – patterns of hydrophobicity/hydrophilicity
    • secondary structure elements
  – “gappy” regions
    • loops/variable regions
[Figure labels on the example alignment: Conserved positions; Helix pattern; Loop?]
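As a rough illustration of how conserved positions and "gappy" regions can be read off an alignment, the sketch below computes two per-column statistics. The function names and the toy alignment are made up for illustration; real analyses usually use more refined conservation measures.

```python
from collections import Counter

def column_conservation(rows, gap="-"):
    """Per column: (most common residue, fraction of rows that carry it)."""
    result = []
    for column in zip(*rows):
        residues = [ch for ch in column if ch != gap]
        if not residues:
            result.append((gap, 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        result.append((residue, count / len(column)))
    return result

def gap_fraction(rows, gap="-"):
    """Per column: fraction of rows that have a gap there."""
    return [column.count(gap) / len(column) for column in zip(*rows)]

# Toy alignment (hypothetical sequences, not the chromo domain example).
rows = ["MKW-LA",
        "MRW-LG",
        "MKWTLA"]
conserved = [i for i, (_, frac) in enumerate(column_conservation(rows)) if frac == 1.0]
print(conserved)            # columns where every row has the same residue
print(gap_fraction(rows))   # high values hint at loops/variable regions
```

Fully conserved columns are candidates for structurally or functionally important positions, while columns with a high gap fraction are candidates for loops.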
Use of Alignments - make patterns/profiles
• Can make a profile or a pattern that can be used to match against a sequence database and identify new family members
• Profiles/patterns can be used to predict family membership of new sequences
• Databases of profiles/patterns
  – PROSITE
  – PFAM
  – PRINTS
  – ...
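To make the idea of a profile concrete, here is a deliberately simplified sketch: the profile is just the residue frequencies in each alignment column, and a candidate sequence is scanned with a sliding window. This is an assumption made for illustration; the databases listed above use more elaborate models (for example log-odds weights and gap handling), and the names `build_profile`, `profile_score` and `best_hit` are hypothetical.

```python
from collections import Counter

def build_profile(rows, gap="-"):
    """One dict of residue frequencies per alignment column (gaps ignored)."""
    profile = []
    for column in zip(*rows):
        counts = Counter(ch for ch in column if ch != gap)
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile

def profile_score(profile, segment):
    """Sum of the column frequencies of the residues in an ungapped segment."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, segment))

def best_hit(profile, sequence):
    """Best-scoring window as (score, start), or None if the sequence is too short."""
    width = len(profile)
    scores = [(profile_score(profile, sequence[i:i + width]), i)
              for i in range(len(sequence) - width + 1)]
    return max(scores, default=None)

# Toy family alignment and a toy database sequence (both made up).
profile = build_profile(["KWG", "RWG", "KWD"])
print(best_hit(profile, "AAKWGAA"))
```

A new sequence whose best window scores well against the profile would be a candidate family member; where to set the threshold is a separate question not covered here.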
Prosite: Motifs for classification
[Figure: a protein sequence is matched against Prosite pattern 1, Prosite pattern 2, ..., Prosite pattern n, corresponding to Family 1, Family 2, ..., Family n. Labels: Pattern (regular expression); Profile.]
Pattern from alignment
[FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-[ST]-W-[ES]-[PSTDN]-x(3)-[LIVMC]
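Because a pattern of this kind is essentially a regular expression, it can be translated and used for searching with a few lines of code. The sketch below covers only the syntax elements that appear in the pattern above plus the {..} "forbidden residues" form; anchors and other PROSITE features are not handled. The function name `prosite_to_regex` and the test sequence are made up for illustration.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split off an optional repeat count such as (3) or (5,6).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."                        # x matches any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # {..} lists forbidden residues
        # [..] and single letters are already valid regex syntax.
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)

PATTERN = ("[FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-"
           "[ST]-W-[ES]-[PSTDN]-x(3)-[LIVMC]")
regex = prosite_to_regex(PATTERN)
print(regex)

# Hypothetical sequence built by hand to satisfy the pattern.
sequence = "GG" + "FALKWAGF" + "AAAAA" + "SWEP" + "AAA" + "L" + "GG"
hit = re.search(regex, sequence)
print(hit.start() if hit else None)
```

Scanning every sequence in a database with `re.search` in this way is the simplest form of pattern-based family assignment.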
Alignment problem
Given a set of sequences, produce a multiple alignment which corresponds as
well as possible to the biological relationships between the corresponding
bio-molecules
For homologous proteins
• Two residues should be aligned (on top of each other)
  – if they are homologous (evolved from the same residue in a common ancestor protein)
  – if they are structurally equivalent
Automatic approach
• Need a way of scoring alignments
  – fitness function which for an alignment quantifies its “goodness”
• Need an algorithm for finding alignments with good scores
• Not all methods provide a scoring function for the final alignment!
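The slides do not name a particular fitness function. One widely used choice is the sum-of-pairs score, which adds up a pairwise score for every pair of rows in every column. The sketch below uses it as an example; the match, mismatch and gap values are toy assumptions, and a real scoring scheme would normally use a substitution matrix.

```python
from itertools import combinations

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of symbols from the same column (toy values)."""
    if a == "-" and b == "-":
        return 0            # gap against gap is ignored
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def sum_of_pairs(rows, **weights):
    """Sum the pairwise scores over all row pairs in all columns."""
    total = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            total += pair_score(a, b, **weights)
    return total

print(sum_of_pairs(["AC-GT", "A-CGT", "ACCGT"]))
```

With a function like this, two candidate alignments of the same sequences can be compared directly: the one with the higher score is the "better" one under this particular fitness function.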
Analysis of fitness function
• One can test whether the alignments that are optimal under a given fitness function correspond well to the biological relationships between the sequences
• For example, when the structures of (some of) the proteins are known.
Align by use of dynamic programming
• Dynamic programming finds the best alignment of k sequences under a given scoring scheme
• For two sequences there are three different column types
• For three sequences there are seven different column types (x means an amino acid, - a blank):
  Sequence1  x - x x - - x
  Sequence2  x x - x - x -
  Sequence3  x x x - x - x
• Time complexity of O(n^k) (sequence lengths = n)
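The counts of three and seven column types follow a general rule: each of the k rows holds either a residue or a blank, and the all-blank column is excluded, giving 2^k - 1 types. The short sketch below enumerates them and gives a rough estimate of the work involved, assuming one table cell per combination of prefix lengths and one look-up per column type; the function names are illustrative.

```python
from itertools import product

def column_types(k):
    """All possible column shapes for k sequences (x = residue, - = blank)."""
    return [col for col in product("x-", repeat=k) if "x" in col]

def dp_work(n, k):
    """Rough operation count: (n + 1)^k cells times 2^k - 1 predecessors each."""
    return (n + 1) ** k * (2 ** k - 1)

for k in (2, 3, 4):
    print(k, len(column_types(k)))

print(dp_work(100, 2), dp_work(100, 3))
```

The exponent k is what makes exact dynamic programming impractical for more than a handful of sequences of realistic length.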
Use of dynamic programming
• Dynamic programming finds best alignment of k sequences given scoring scheme
Algorithm for dynamic programming