Theory and Application of Multiple Sequence Alignments. Brett Pickett, PhD. a.k.a. What is a Multiple Sequence Alignment, How to Make One, and What to Do With It

  • Slide 1
  • Theory and Application of Multiple Sequence Alignments. Brett Pickett, PhD. a.k.a. What is a Multiple Sequence Alignment, How to Make One, and What to Do With It
  • Slide 2
  • History: Structure of DNA discovered (1953). First (phage) genome determined in 1977. Human Genome Project begun in 1990. First living organism (H.i., Haemophilus influenzae) sequenced in 1995. Human rough draft completed in 2000: NHGRI (public) vs. J. Craig Venter (private). Used a supercomputer to put the human genome together in the right order.
  • Slide 3
  • What is a Genome? Genetic material required for organism to replicate. Eukaryotes (humans): # chromosomes. Prokaryotes (bacteria): 1 chromosome. Viruses: what's a chromosome? Human genome: 3.2 Gb, about 2 m of DNA per cell; 10 trillion cells in the human body x 2 m = 780,000 times around Earth, or 67.8 round trips to the sun. Bacteria (580 kb to 10 Mb). Virus (3.5 kb to 1.3 Mb). http://www.rsc.org/chemsoc/timeline/pages/2001.html
  • Slide 4
  • Why are Genomes so Important? Encode all organismal functions: DNA -> RNA -> protein. Unique to each organism. Find differences (mutations) only by comparing genomes with each other. www.thednastore.com/images/cells/mrdna1.jpg
  • Slide 5
  • How are Sequences Made? 1. Make lots of copies of original sequence (PCR). 2. Put the copies into a machine to make even more copies. 3. Fluorescent (glow-in-the-dark) bases get incorporated randomly into new DNA molecule. 4. Laser detects glowing bases and tells the computer the order of bases = sequence. http://bjpsbiotech.edublogs.org/files/2007/12/electropherogram.jpg
  • Slide 6
  • What's the Next Step? After sequence is determined, then what? Make sense of it by comparing with other related (homologous) sequences: Multiple Sequence Alignment.
  • Slide 7
  • What is an Alignment? Lining up related (homologous) positions. Allows comparison. Figure labels: Unaligned, Aligned.
  • Slide 8
  • Comparing Sequences (Genomes): All DNA contains a unique genetic fingerprint. Similarity reveals related function and shared evolutionary history. education.vetmed.vt.edu/.../FINGERPRINT.jpg
  • Slide 9
  • Aligning with Computational Methods: Computers can't see patterns. Use math to find best alignment by assigning scores: match, mismatch, gap. Gap types: internal = insertion / deletion (indel); terminal = missing information?
  • Slide 10
  • What is a Gap? Allows bases to be lined up even if sequences are different lengths. Insertions / deletions (indels): impossible to tell which sequence has lost (gained) information. Terminal gaps: sequence is either naturally shorter or artificially cut off.
  • Slide 11
  • Nucleotide Alignment Custom Scores: Match. Mismatch. Gap-opening penalty: penalized for not having letter (begin a gap). Why? Gap-extension penalty: little or no penalty for lengthening a gap. Why? Scores balance between mismatch & gap. Figure labels: Mismatches, Gaps.
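The scoring scheme on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function name `score_alignment` is hypothetical, and the default values are borrowed from the manual alignment example later in the deck (Match = 5, Mismatch = -2, Gap Opening = -4, Gap Extension = 0). As a simplification, a gap run is treated as continuing even if it switches from one sequence to the other.

```python
def score_alignment(a, b, match=5, mismatch=-2, gap_open=-4, gap_extend=0):
    """Score two already-aligned, equal-length strings column by column."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    in_gap = False  # True while we are inside a run of gap columns
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            # First gap column pays the opening penalty, later ones the extension
            total += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            total += match if x == y else mismatch
            in_gap = False
    return total


print(score_alignment("AATC", "A-TC"))    # one gap opened
print(score_alignment("AATTC", "A--TC"))  # same gap, extended at no extra cost
print(score_alignment("AATC", "ATCC"))    # no gaps, two mismatches instead
```

With these values the gapped alignments score higher than the ungapped one, which is the balance between mismatch and gap that the slide refers to.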
  • Slide 12
  • Dynamic Programming: Used to calculate alignment. Breaks a very complicated process into smaller steps. Helps computers to solve the problem faster. Figure labels: Sequence 1, Sequence 2, Math, Read. http://www.myspacepimper.com/images/232763/Disney-s-Goofy-Baking-a-Cake.htm
  • Slide 13
  • Manual Alignment. Figure: dynamic-programming score matrix for sequence AATC (across) vs. ATC (down), with the first row and first column set to 0. Scores: Match = 5, Mismatch = -2, Gap Opening = -4, Gap Extension = 0. Traceback: follow the highest scores back to the beginning. Up or sideways = gap, diagonal = homology (line up). Resulting alignment: AATC over A-TC.
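The fill-and-traceback procedure above can be sketched as follows. This is a simplified illustration rather than a reproduction of the slide's matrix: it assumes a single linear gap penalty of -4 per gap position (the slide uses separate opening and extension penalties) and penalizes terminal gaps like any other gap. The function name `global_align` is hypothetical.

```python
def global_align(s1, s2, match=5, mismatch=-2, gap=-4):
    """Global alignment by dynamic programming with a linear gap penalty."""
    rows, cols = len(s2) + 1, len(s1) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap

    def sub(i, j):
        # Match or mismatch score for the letters that meet at cell (i, j)
        return match if s2[i - 1] == s1[j - 1] else mismatch

    # Fill: each cell is the best of diagonal, up, and sideways moves
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Traceback: diagonal = letters line up, up or sideways = gap
    out1, out2 = [], []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out1.append(s1[j - 1])
            out2.append(s2[i - 1])
            i, j = i - 1, j - 1
        elif j > 0 and score[i][j] == score[i][j - 1] + gap:
            out1.append(s1[j - 1])
            out2.append("-")
            j -= 1
        else:
            out1.append("-")
            out2.append(s2[i - 1])
            i -= 1
    return score[-1][-1], "".join(reversed(out1)), "".join(reversed(out2))


total, top, bottom = global_align("AATC", "ATC")
print(total)
print(top)
print(bottom)
```

Under this simplified scoring, A-TC and -ATC tie as partners for AATC; the traceback above prefers diagonal moves, so it reports -ATC. Different tie-breaking rules or gap penalties can move the gap.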
  • Slide 14
  • Computer-Generated Alignment: Much faster than we are. 2 GHz = 2 billion clock cycles per second. Don't get tired, make mistakes, or get hand cramps.
  • Slide 15
  • Alignment Process
  • Slide 16
  • Types of Alignment. Global: aligns entire sequence; permits gaps; forced even if sequences not homologous. Local: aligns longest region possible with minimal (no) gaps.
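A local alignment can be sketched with the same dynamic-programming idea, with two changes: cell scores are never allowed to drop below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. The sketch below assumes a linear gap penalty and illustrative score values; `local_align` is a hypothetical name, not a library function.

```python
def local_align(s1, s2, match=5, mismatch=-2, gap=-4):
    """Best-scoring local alignment (Smith-Waterman style, linear gap penalty)."""
    rows, cols = len(s2) + 1, len(s1) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        return match if s2[i - 1] == s1[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            # The 0 lets an alignment restart anywhere instead of going negative
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score falls to zero
    out1, out2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out1.append(s1[j - 1])
            out2.append(s2[i - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i][j - 1] + gap:
            out1.append(s1[j - 1])
            out2.append("-")
            j -= 1
        else:
            out1.append("-")
            out2.append(s2[i - 1])
            i -= 1
    return best, "".join(reversed(out1)), "".join(reversed(out2))


print(local_align("GGGATCGGG", "TTATCTT"))
```

Here only the shared ATC region is reported; the unrelated flanking letters are left out rather than being forced into the alignment, which is the contrast with global alignment drawn on the slide.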
  • Slide 17
  • Beware! The computer is not always right. Alignments: Optimal = highest score; True = evolutionarily correct. Can be improved: hard for computer to accurately place indels (gaps). Apply prior knowledge, e.g., codons. Example (nucleotide sequence vs. amino acid sequence): AAA CCC = Lys Pro; AA- ACC C = ??? Thr ?; Asn Lys.
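The codon example can be made concrete with a short sketch. It reads an aligned nucleotide string three columns at a time and shows how a gap placed inside a codon leaves the translation undefined. The codon table is deliberately partial (only the four codons needed here), and `translate_aligned` is a hypothetical helper name.

```python
# Partial codon table: only the codons used in this example
CODON_TABLE = {"AAA": "Lys", "CCC": "Pro", "ACC": "Thr", "AAC": "Asn"}


def translate_aligned(seq):
    """Translate an aligned nucleotide string three columns at a time."""
    residues = []
    for start in range(0, len(seq), 3):
        codon = seq[start:start + 3]
        if codon == "---":
            # A whole-codon gap keeps the reading frame intact
            residues.append("-")
        elif "-" in codon or len(codon) < 3:
            # A gap inside a codon (or a leftover partial codon) cannot be translated
            residues.append("???")
        else:
            residues.append(CODON_TABLE.get(codon, "???"))
    return residues


print(translate_aligned("AAACCC"))     # intact codons
print(translate_aligned("AA-ACCC"))    # gap inside the first codon
print(translate_aligned("---AAACCC"))  # gap spanning a whole codon
```

Moving a gap so that it covers a whole codon, as in the third call, is one way to apply prior knowledge when hand-editing a computer-generated alignment of protein-coding sequence.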
  • Slide 18
  • BLAST: Basic Local Alignment Search Tool. Most frequently used alignment tool. Local alignment of 1 sequence (query) against all known sequences (subjects) in database. Uses a heuristic to reduce number of sequences it actually has to align. Like using Google to find most homologous sequences.
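One way to picture the heuristic is a word index: short words from every database sequence are indexed once, and only subjects that share a word with the query are passed on for alignment. The sketch below is a toy illustration of that idea, not BLAST's actual implementation; the word length of 4, the function names, and the three database entries are all made up for the example.

```python
from collections import defaultdict


def build_index(database, k=4):
    """Map every k-letter word to the names of the sequences containing it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidate_subjects(query, index, k=4):
    """Return subjects sharing at least one word with the query, best first."""
    hits = defaultdict(int)
    for i in range(len(query) - k + 1):
        for name in index.get(query[i:i + k], ()):
            hits[name] += 1
    # Sort by number of shared words (descending), then by name
    return sorted(hits, key=lambda n: (-hits[n], n))


# Hypothetical mini-database
database = {
    "subject_1": "ATGGCGTACGTTAGC",
    "subject_2": "TTTTTTTTTTTTTTT",
    "subject_3": "CCCGTACGTTCCCAA",
}
index = build_index(database)
print(candidate_subjects("GCGTACGTTA", index))
```

In this toy run subject_2 shares no word with the query, so it is never aligned at all; that skipped work is where the speed-up comes from.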
  • Slide 19
  • BLAST Input
  • Slide 20
  • BLAST Output
  • Slide 21
  • How Does This Impact Me? Human Microbiome Project: sequence all bacteria in intestines; millions of bacteria in each gram of excrement. Which ones make us sick? How different is flora between people? Ocean Virus Metagenomics project: try to get an idea of virus diversity across the globe; boat goes around North America collecting samples; billions of viruses in each gallon of seawater.
  • Slide 22
  • How Does This Impact Me (cont'd)? Used to take swabs, grow colonies on agar (antimicrobial resistance in turkeys). Sequencing removes middle step. How to quickly assign genus and species to new sequences? BLAST. Project: new phage from ponds.
  • Slide 23
  • Other Uses for Alignments
  • Slide 24
  • SNP Detection: Single Nucleotide Polymorphism. Genetic changes occurring in at least one sequence. May have biological significance: antibiotic resistance; changes could avoid detection by immune system; cause of genetic disease (CF, cystic fibrosis).
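Once sequences are aligned, candidate SNPs can be found by scanning the alignment one column at a time. The sketch below is a minimal version of that scan; the sequence names and bases are invented for illustration, gaps are ignored rather than counted as changes, and `find_snps` is a hypothetical name.

```python
def find_snps(alignment):
    """Return (1-based position, column) for every column with more than one base."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    snps = []
    for pos in range(length):
        column = {name: alignment[name][pos] for name in names}
        # Ignore gaps: an indel is not a single-nucleotide change
        bases = set(column.values()) - {"-"}
        if len(bases) > 1:
            snps.append((pos + 1, column))
    return snps


# Hypothetical aligned sequences
alignment = {
    "ref":       "ATGCCGTA",
    "isolate_1": "ATGCCGTA",
    "isolate_2": "ATGACGTA",
    "isolate_3": "ATGAC-TA",
}
for position, column in find_snps(alignment):
    print(position, column)
```

Only position 4 is reported here: two isolates carry A where the others carry C. The gap at position 6 is skipped because the remaining sequences all agree.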
  • Slide 25
  • Phylogenetic Trees. Computer generated by: examining alignment; looking for shared mutations. Show relationship(s) between sequences. History of sequences: where they came from; genetic changes that have occurred. Figure labels: Clade, Node, Leaf, Branch. iOS Phylogram App (free).
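As a small illustration of "examining the alignment", the sketch below computes the fraction of differing positions (often called the p-distance) for every pair of aligned sequences. Distance-based tree methods take a table like this as input; the tree-building step itself is not shown. The sequence names and bases are invented, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of compared (non-gap) columns at which two aligned sequences differ."""
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(alignment):
    """Pairwise p-distances, keyed by (name_1, name_2)."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }


# Hypothetical aligned sequences
alignment = {
    "seq_A": "ATGCCGTA",
    "seq_B": "ATGCCGTT",
    "seq_C": "ATGACGAA",
}
for pair, dist in distance_matrix(alignment).items():
    print(pair, dist)
```

In this toy data seq_A and seq_B differ at one position out of eight, so they would be expected to sit closer together on a tree than either does to seq_C.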
  • Slide 26
  • Recombination: Can occur in all types of organisms: eukaryotes, prokaryotes, viruses. May change characteristic of organism: make you sick (or not); not recognized by immune system. Fast way of getting lots of genetic changes. Figure labels: Breakpoint, RdRP, Genome 1, Genome 2, Daughter Sequence, Major Parent, Minor Parent.
  • Slide 27
  • Reassortment: Chromosomes (segments) from one organism replace those from another. May change characteristic of organism: make you sick (or not); not recognized by immune system. Fast way of getting lots of genetic changes.
  • Slide 28
  • Other Analysis Options: Align sequences. Look for genetic changes (genotype) that are associated with traits (phenotype): host; how sick it makes you; drug resistance; inherited disease. Do any mutations consistently accompany the traits? Genome-Wide Association Studies. http://lovestats.wordpress.com/dman/
  • Slide 29
  • Slide 30
  • Slide 31
  • How Does an Alignment Get a Score? Amino acids: identical >> similar >> dissimilar.
  • Slide 32
  • Score Lookup Table (Matrix): Symmetrical. Positive scores on diagonal (matches). Some mismatches get negative scores; some mismatches don't.
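A lookup table with these properties can be sketched as a small dictionary. The scores below are made-up illustrative values, not taken from any published matrix; they are chosen only so that identical pairs score highest, a similar pair (L and I) scores positive, and dissimilar pairs score negative. The names `PAIR_SCORES`, `pair_score`, and `score_ungapped` are hypothetical.

```python
# Illustrative scores only (hypothetical values, not a published matrix)
PAIR_SCORES = {
    ("L", "L"): 4, ("I", "I"): 4, ("D", "D"): 6,  # diagonal: matches
    ("L", "I"): 2,                                # similar amino acids
    ("L", "D"): -4, ("I", "D"): -3,               # dissimilar amino acids
}


def pair_score(a, b):
    """Look up a pair in either order, so the table behaves symmetrically."""
    if (a, b) in PAIR_SCORES:
        return PAIR_SCORES[(a, b)]
    return PAIR_SCORES[(b, a)]


def score_ungapped(seq1, seq2):
    """Sum the pair scores of two equal-length sequences, column by column."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(x, y) for x, y in zip(seq1, seq2))


print(pair_score("L", "I"), pair_score("I", "L"))  # same score in both orders
print(score_ungapped("LID", "LLD"))                # one similar mismatch
print(score_ungapped("LID", "LDD"))                # one dissimilar mismatch
```

Storing each pair once and checking both orders is one simple way to guarantee symmetry; a full table for all twenty amino acids would work the same way.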