Pairwise Alignment How do we tell whether two sequences are similar? BIO520 Bioinformatics Jim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can out of 5.2, 5.4


Pairwise Alignment

How do we tell whether two sequences are similar?

BIO520 Bioinformatics
Jim Lund

Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can out of 5.2, 5.4


Pairwise alignment

• DNA:DNA

• polypeptide:polypeptide

The BASIC Sequence Analysis Operation


Alignments

• Pairwise sequence alignments
– One-to-One
– One-to-Database
• Multiple sequence alignments
– Many-to-Many


Origins of Sequence Similarity

• Homology
– common evolutionary descent
• Chance
– Short similar segments are very common.
• Similarity in function
– Convergence (very rare)


Visual sequence comparison: Dotplot


Visual sequence comparison: Filtered dotplot

4 bp window, 75% identity cutoff
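
A filtered dotplot of this kind can be sketched in a few lines of Python. This is only an illustration, not the program used to draw the slides; the function name `dotplot` and the example sequences are assumptions. A dot is placed at (i, j) when the window starting at position i of one sequence and position j of the other shares at least the cutoff fraction of identical bases (for a 4 bp window and a 75% cutoff, at least 3 of 4).

```python
def dotplot(seq_a, seq_b, window=4, cutoff=0.75):
    """Return text rows: '*' where a window pair meets the identity cutoff."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = []
        for j in range(len(seq_b) - window + 1):
            # Count identical positions inside the two windows.
            same = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append("*" if same / window >= cutoff else ".")
        rows.append("".join(row))
    return rows


# Hypothetical example: two sequences that differ at one base.
for line in dotplot("GAACAATGGC", "GAATAATGGC"):
    print(line)
```

Similar regions show up as diagonal runs of `*`; raising the window size or the cutoff removes more of the scattered chance matches.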


Visual sequence comparison: Dotplot

4 bp window, 75% identity cutoff


Dotplots of sequence rearrangements


Assessing similarity

GAACAAT
|||||||   7/7 or 100%
GAACAAT

GAACAAT
     |        1/7 or 14%
    GAACAAT

Which is BETTER?
How do we SCORE?


Similarity

GAACAAT
|||||||   7/7 or 100%
GAACAAT

GAACAAT
||| |||   6/7 or 86%   (MISMATCH at position 4)
GAATAAT
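
The identity figures on these slides are simply matches divided by aligned length. A minimal sketch, assuming two already-aligned, equal-length, ungapped strings (the function name `percent_identity` is an assumption for illustration):

```python
def percent_identity(top, bottom):
    """Percent of aligned positions that carry the same character."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(x == y for x, y in zip(top, bottom))
    return 100.0 * matches / len(top)


print(round(percent_identity("GAACAAT", "GAACAAT")))
print(round(percent_identity("GAACAAT", "GAATAAT")))
```

Percent identity treats every mismatch alike and says nothing about gaps, which is why the later slides move to an explicit scoring scheme.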


Mismatches

GAACAAT
||| |||   6/7 or 86%
GAATAAT

GAACAAT
||| |||   6/7 or 86%
GAAGAAT


Terminal Mismatch

     GAACAATttttt
     ||| |||         6/7 or 86%
aaaccGAATAAT


INDELS

GAAgCAAT
||| ||||   7/7 or 100%
GAA*CAAT


Indels, cont’d

GAAgCAAT
||| ||||
GAA*CAAT

GAAggggCAAT
|||    ||||
GAA****CAAT


Similarity Scoring

Common Method:
• Terminal mismatches (0)
• Match score (1)
• Mismatch penalty (-3)
• Gap penalty (-1)
• Gap extension penalty (-1)

DNA Defaults


DNA Scoring

GGGGGGAGAA
|||||*|*||          8(1)+2(-3)=2
GGGGGAAAAAGGGGG

GGGGGGAGAA--GGG
|||||*|*||  |||     11(1)+2(-3)+1(-1)+1(-1)=3
GGGGGAAAAAGGGGG
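
The arithmetic on this slide can be checked with a small scoring function. This is a sketch under stated assumptions: the function name `score_alignment` is invented here, the two inputs are already-aligned strings of equal length with `-` for gaps, and unaligned ends are written as terminal gaps and scored 0, following the "Terminal mismatches (0)" default above. The first gap position is charged the gap penalty and each further position the gap extension penalty.

```python
def score_alignment(top, bottom, match=1, mismatch=-3, gap_open=-1, gap_extend=-1):
    """Score a given alignment; terminal overhangs (end gaps) cost nothing."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    cols = list(zip(top, bottom))
    # Skip leading and trailing columns in which either sequence has a gap.
    start, end = 0, len(cols)
    while start < end and "-" in cols[start]:
        start += 1
    while end > start and "-" in cols[end - 1]:
        end -= 1
    score = 0
    gap_row = None  # which sequence the currently open gap is in, if any
    for x, y in cols[start:end]:
        if x == "-" or y == "-":
            row = 0 if x == "-" else 1
            score += gap_extend if gap_row == row else gap_open
            gap_row = row
        else:
            score += match if x == y else mismatch
            gap_row = None
    return score


# The two alignments from the slide; the first is padded with terminal gaps.
print(score_alignment("GGGGGGAGAA-----", "GGGGGAAAAAGGGGG"))
print(score_alignment("GGGGGGAGAA--GGG", "GGGGGAAAAAGGGGG"))
```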


Absurdity of Low Gap Penalty

GATCGCTACGCTCAGC
 A.C.C..C..T

Perfect similarity, every time!
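
The point can be made concrete: if gaps are free, a short sequence aligns "perfectly" to any longer sequence that merely contains its letters in the same order. A minimal sketch (the helper name `is_subsequence` is an assumption for illustration):

```python
def is_subsequence(short, long):
    """True if the letters of `short` occur in `long` in the same order."""
    remaining = iter(long)
    # `ch in remaining` consumes the iterator up to the first occurrence of ch.
    return all(ch in remaining for ch in short)


# The slide's example: ACCCT can be threaded through the longer sequence.
print(is_subsequence("ACCCT", "GATCGCTACGCTCAGC"))
```

For short queries against long DNA sequences this test succeeds very often, which is why a realistic gap penalty is needed.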


Sequence alignment algorithms

• Local alignment
– Smith-Waterman
• Global alignment
– Needleman-Wunsch


Alignment Programs

• Local alignment (Smith-Waterman)
– BLAST (simplified Smith-Waterman)
– FASTA (simplified Smith-Waterman)
– BESTFIT (GCG program)
• Global alignment (Needleman-Wunsch)
– GAP


Local vs. global alignment

10 gaggc 15
   |||||
 3 gaggc 7

 1    gggggaaaaagtggccccc 19
      ||        ||||   ||
 1 gggggttttttttgtggtttcc 22

Global alignment: alignment of the full length of the sequences

Local alignment: alignment of regions of substantial similarity
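
The difference between the two definitions shows up directly in the dynamic-programming recurrences. The sketch below computes only the best score (no traceback) and, for brevity, assumes a single linear gap penalty rather than separate open and extension penalties; the function name `align_score` and the example sequences are assumptions. In global (Needleman-Wunsch style) mode the first row and column are charged for gaps and the answer is the bottom-right cell; in local (Smith-Waterman style) mode cells never drop below 0 and the answer is the largest cell anywhere.

```python
def align_score(a, b, local=False, match=1, mismatch=-3, gap=-1):
    """Best global or local alignment score with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps must be paid for.
        for i in range(1, rows):
            h[i][0] = i * gap
        for j in range(1, cols):
            h[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # local: a poor prefix can be dropped
            h[i][j] = cell
            best = max(best, cell)
    return best if local else h[-1][-1]


# Hypothetical example: a shared core (GAGGC) inside unrelated flanks.
a, b = "AAAAGAGGCAAAA", "TTGAGGCTT"
print(align_score(a, b, local=True), align_score(a, b, local=False))
```

The local score reflects only the shared core, while the global score is dragged down by the unrelated flanks.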


Local vs. global alignment


BLAST Algorithm

Look for a local alignment, a High Scoring Segment Pair (HSP)
• Find a word (W) shared by query and subject. Score > T.
• Extend the local alignment until the score falls X below its maximum.
• Keep High Scoring Segment Pairs (HSPs) with scores > S.
• Find multiple HSPs per query if present
• Expectation value (E value) using Karlin-Altschul stats
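
The seed-and-extend idea in the steps above can be sketched as follows. This is a heavily simplified illustration, not BLAST itself: it seeds only on exact words of length W (real BLAST also accepts neighbourhood words scoring above T), it extends without gaps, and all names and default values (`find_seeds`, `extend`, `find_hsps`, `x_drop`, `s_cutoff`) are assumptions chosen for the example.

```python
def find_seeds(query, subject, w):
    """Return (i, j) pairs where query[i:i+w] == subject[j:j+w]."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j)
            for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def extend(query, subject, i, j, step, x_drop, match, mismatch):
    """Ungapped extension from (i, j) in direction step (+1 or -1).

    Stops once the running score falls more than x_drop below its maximum.
    Returns (best score gained, number of positions kept).
    """
    score = best = best_len = k = 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        k += 1
        if score > best:
            best, best_len = score, k
        elif best - score > x_drop:
            break
        i += step
        j += step
    return best, best_len


def find_hsps(query, subject, w=4, x_drop=5, s_cutoff=6, match=1, mismatch=-3):
    """Return sorted (score, query_start, subject_start, length) with score > S."""
    hsps = set()
    for i, j in find_seeds(query, subject, w):
        right, r_len = extend(query, subject, i + w, j + w, 1, x_drop, match, mismatch)
        left, l_len = extend(query, subject, i - 1, j - 1, -1, x_drop, match, mismatch)
        score = w * match + left + right
        if score > s_cutoff:
            hsps.add((score, i - l_len, j - l_len, l_len + w + r_len))
    return sorted(hsps, reverse=True)


print(find_hsps("AAAGAACAATCCC", "TTTGAACAATGGG"))
```

Several overlapping seeds usually extend to the same segment, which is why the results are collected in a set.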


BLAST statistical significance: assessing the likelihood a match occurs by chance

Karlin-Altschul statistic: E = k m N exp(-Lambda S)

m = Size of query sequence
N = Size of database
k = Search space scaling parameter
Lambda = Scoring scaling parameter
S = BLAST HSP score

Low E -> good match
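
The formula is easy to evaluate directly. In the sketch below the function name `expect_value` and every numeric value are hypothetical; in practice k and Lambda depend on the scoring system in use and are reported by BLAST itself.

```python
import math


def expect_value(score, query_len, db_len, k, lam):
    """E = k * m * N * exp(-Lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * score)


# Hypothetical parameters: the same search space, two different HSP scores.
for s in (30, 60):
    print(s, expect_value(s, query_len=500, db_len=1_000_000_000, k=0.1, lam=0.3))
```

Because S sits in the exponent, a modest increase in score lowers E by orders of magnitude, while E grows only linearly with query and database size.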


BLAST statistical significance:

Rule of thumb for a good match:

• Nucleotide match
– E < 1e-6
– Identity > 70%
• Protein match
– E < 1e-3
– Identity > 25%


Protein Similarity Scoring

• Identity - Easy
• WEAK Alignments
• Chemical Similarity
– L vs I, K vs R…
• Evolutionary Similarity
– How do proteins evolve?
– How do we infer similarities?


BLOSUM62

     C   S   T   P   A   G   N   D
C    9  -1  -1  -3   0  -3  -3  -3
S   -1   4   1  -1   1   0   1   0
T   -1   1   4   1  -1   1   0   1
P   -3  -1   1   7  -1  -2  -1  -1
A    0   1  -1  -1   4   0  -1  -2
G   -3   0   1  -2   0   6  -2  -1
N   -3   1   0  -2  -2   0   6   1
D   -3   0   1  -1  -2  -1   1   6


Single-base evolution changes the encoded AA

CAU=H

CAC=H CGU=R UAU=Y

CAA=Q CCU=P GAU=D

CAG=Q CUU=L AAU=N
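
The nine neighbours of CAU listed on this slide can be generated mechanically. A minimal sketch using the standard genetic code written in the usual compact UCAG order; the names `CODON_TABLE` and `single_base_neighbors` are assumptions for illustration.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def single_base_neighbors(codon):
    """Map every codon one substitution away to its amino acid."""
    neighbors = {}
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                neighbors[mutant] = CODON_TABLE[mutant]
    return neighbors


for mutant, aa in single_base_neighbors("CAU").items():
    print(f"{mutant}={aa}")
```

Only a limited set of amino acids is reachable from a given codon in one step, which is part of why some substitutions are observed far more often than others.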


Substitution Matrices

Two main classes:

• PAM-Dayhoff

• BLOSUM-Henikoff


PAM-Dayhoff

• Built from closely related proteins, substitutions constrained by evolution and function
• “accepted” by evolution (Point Accepted Mutation=PAM)
• 1 PAM = 1% divergence
• PAM120=closely related proteins
• PAM250=divergent proteins


BLOSUM-Henikoff&Henikoff

• Built from ungapped alignments in proteins: “BLOCKS”

• Merge blocks at given % similar to one sequence

• Calculate “target” frequencies

• BLOSUM62=62% similar blocks
– good general purpose
• BLOSUM30
– Detects weak similarities, used for distantly related proteins


BLOSUM62

     C   S   T   P   A   G   N   D
C    9  -1  -1  -3   0  -3  -3  -3
S   -1   4   1  -1   1   0   1   0
T   -1   1   4   1  -1   1   0   1
P   -3  -1   1   7  -1  -2  -1  -1
A    0   1  -1  -1   4   0  -1  -2
G   -3   0   1  -2   0   6  -2  -1
N   -3   1   0  -2  -2   0   6   1
D   -3   0   1  -1  -2  -1   1   6


Gapped alignments

• No general theory for significance of matches!!

• Gap penalty: G + L(n), with G charged for opening a gap and L for each of its n positions
– indel mutations rare
– variation in gap length “easy”, G > L
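
A short sketch of this gap cost, following the G + L(n) form on the slide. The function name `gap_cost` and the values G = 10, L = 1 are hypothetical and chosen only to satisfy G > L.

```python
def gap_cost(length, g=10, l=1):
    """Cost of one gap of `length` positions: G to open it plus L per position."""
    if length < 1:
        raise ValueError("a gap has at least one position")
    return g + l * length


# One gap of four positions versus four separate one-position gaps.
print(gap_cost(4), 4 * gap_cost(1))
```

With G > L, a single long gap costs much less than several short ones of the same total length, matching the idea that indel events are rare but their length varies easily.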


Real Alignments


Phylogeny


Cow-to-Pig Protein

   1 MGLSDGEWQLVLNAWGKVEADVAGHGQEVLIRLFTGHPETLEKFDKFKHL 50
     ||||||||||||| |||||||||||||||||||| |||||||||||||||
   1 MGLSDGEWQLVLNVWGKVEADVAGHGQEVLIRLFKGHPETLEKFDKFKHL 50

  51 KTEAEMKASEDLKKHGNTVLTALGGILKKKGHHEAEVKHLAESHANKHKI 100
     |.| ||||||||||||||||||||||||||||||||.  ||:||| ||||
  51 KSEDEMKASEDLKKHGNTVLTALGGILKKKGHHEAELTPLAQSHATKHKI 100

 101 PVKYLEFISDAIIHVLHAKHPSDFGADAQAAMSKALELFRNDMAAQYKVL 150
     |||||||||:||| || .||| ||||||| |||||||||||||||.|| |
 101 PVKYLEFISEAIIQVLQSKHPGDFGADAQGAMSKALELFRNDMAAKYKEL 150

 151 GFHG 154
     || |
 151 GFQG 154


Cow-to-Pig cDNA

   1 CAGCTGTCGGAGACAGACACCCAGTCAGTCCCGCCCTTGTTCTTTTTCTC 50
            | ||| |||    ||    |  ||||| |||| ||| ||||||
   1 .......CAGAGCCAGGACACCCAGTACGCCCGCACTTGCTCTGTTTCTC 43

  51 TTCTTCAGACTGCGCCATGGGGCTCAGCGACGGGGAATGGCAGTTGGTGC 100
     |||| ||||||| |||||||||||||||||||||||||||||| ||||||
  44 TTCTGCAGACTGTGCCATGGGGCTCAGCGACGGGGAATGGCAGCTGGTGC 93

 101 TGAATGCCTGGGGGAAGGTGGAGGCTGATGTCGCAGGCCATGGGCAGGAG 150
     |||| | |||||||||||||||||||||||||||||||||||||||||||
  94 TGAACGTCTGGGGGAAGGTGGAGGCTGATGTCGCAGGCCATGGGCAGGAG 143

 151 GTCCTCATCAGGCTCTTCACAGGTCATCCCGAGACCCTGGAGAAATTTGA 200
     ||||||||||||||||| |  ||||| |||||||||||||||||||||||
 144 GTCCTCATCAGGCTCTTTAAGGGTCACCCCGAGACCCTGGAGAAATTTGA 193

 201 CAAGTTCAAGCACCTGAAGACAGAGGCTGAGATGAAGGCCTCCGAGGACC 250
     |||||| |||||||||||| |||||| ||||||||||||||| |||||||
 194 CAAGTTTAAGCACCTGAAGTCAGAGGATGAGATGAAGGCCTCTGAGGACC 243

80% Identity (88% at aa!)


DNA similarity reflects polypeptide similarity

 101 TGAATGCCTGGGGGAAGGTGGAGGCTGATGTCGCAGGCCATGGGCAGGAG 150
     |||| | |||||||||||||||||||||||||||||||||||||||||||
  94 TGAACGTCTGGGGGAAGGTGGAGGCTGATGTCGCAGGCCATGGGCAGGAG 143

 501 CCAGTACAAGGTGCTGGGCTTCCATGGCTAAGCCCCACCCCTGTGCCCCT 550
     | ||||||||| |||||||||||| ||||||||||| |    |  || |
 494 CAAGTACAAGGAGCTGGGCTTCCAGGGCTAAGCCCCCCAGACGCCCCTCA 543


Coding vs Non-coding Regions

 451 CAGGCTGCCATGAGCAAGGCCCTGGAACTGTTCCGGAATGACATGGCTGC 500
     ||||  ||||||||||||||||||||||| |||||||| |||||||| ||
 444 CAGGGAGCCATGAGCAAGGCCCTGGAACTCTTCCGGAACGACATGGCGGC 493

 501 CCAGTACAAGGTGCTGGGCTTCCATGGCTAAGCCCCACCCCTGTGCCCCT 550
     | ||||||||| |||||||||||| ||||||||||| |    |  || |
 494 CAAGTACAAGGAGCTGGGCTTCCAGGGCTAAGCCCCCCAGACGCCCCTCA 543

 551 CAC.CCCACCCACCTGGG...........CAGGGTGGGCGGGGACTGAAT 588
     | | |||| |||| ||||           |  || ||| |||  |||||
 544 CCCACCCATCCACTTGGGCCAGGGCCCCCCGCGGAGGGTGGGCGCTGAAG 593

 589 CCCAAGTAGTTATAGGGTTTGCTTCTGAGTGTGTGCTTTGTTTAGGAGAG 638
     | |  |||| | |||||||||||||||||||| ||||||||| | |||||
 594 CTCCTGTAGCTGTAGGGTTTGCTTCTGAGTGT.TGCTTTGTTCATGAGAG 642

 639 GTGGGTGGAAGAGGTGGATGGGTTAGGGGTGGAGG............... 673
     |||||||| ||||||||| ||| | | ||||| ||
 643 GTGGGTGGGAGAGGTGGAGGGGCTGGTGGTGGTGGTGGGGGGGTGTTCAG 692

90% in coding (70% in non-coding)


Third Base of Codon is Hypervariable

 201 CAAGTTCAAGCACCTGAAGACAGAGGCTGAGATGAAGGCCTCCGAGGACC 250
     ||||||*||||||||||||*||||||*|||||||||||||||*|||||||
 194 CAAGTTTAAGCACCTGAAGTCAGAGGATGAGATGAAGGCCTCTGAGGACC 243

 251 TGAAGAAGCATGGCAACACGGTGCTCACGGCCCTGGGGGGTATCCTGAAG 300
     ||||||||||*||||||||||||||*||*|||||||||||*|||||*|||
 244 TGAAGAAGCACGGCAACACGGTGCTGACTGCCCTGGGGGGCATCCTTAAG 293


Cow-to-Fish Protein

   1 MGLSDGEWQLVLNAWGKVEADVAGHGQEVLIRLFTGHPETLEKFDKFKHL 50
           :. :||  || .||| | ||  || |||| |||||. | ||  :
   1 ....MADFDMVLKCWGPMEADHATHGSLVLTRLFTEHPETLKLFPKFAGI 46

  51 KTEAEMKASEDLKKHGNTVLTALGGILKKKGHHEAEVKHLAESHANKHKI 100
         ::     .  || |||  || :|| :| | | .| |. ||| ||||
  47 .AHGDLAGDAGVSAHGATVLNKLGDLLKARGAHAALLKPLSSSHATKHKI 95

 101 PVKYLEFISDAIIHVLHAKHPSDFGADAQAAMSKALELFRNDMAAQYKVL 150
     |:   . |.: |  |:  |   |  |  | |:   : :   || | || |
  96 PIINFKLIAEVIGKVMEEKAGLD..AAGQTALRNVMAIIITDMEADYKEL 143

 151 GFHG 154
     ||
 144 GFTE 147

42% identity, 51% similarity


Cow-to-Fish DNA

  32 .ACAGGACATTTTACTACTCTGCAGATAATGGCTGACTTTGACATGGTAC 80
       |   | | |   | |    ||  |     |  || |   |  |||| |
  51 TTCTTCAGACTGCGCCATGGGGCTCAGCGACGGGGAATGGCAGTTGGTGC 100

  81 TGAAGTGCTGGGGTCCAATGGAGGCGGACCACGCAACCCACGGGAGTCTG 130
     ||||   ||||||     ||||||| ||   ||||  ||| |||     |
 101 TGAATGCCTGGGGGAAGGTGGAGGCTGATGTCGCAGGCCATGGGCAGGAG 150

 131 GTGCTGACCCGTTTATTCACAGAGCACCCAGAAACCCTAAAGTTATTCCC 180
     || || | | |  | |||||||  || || || |||||  ||  |||
 151 GTCCTCATCAGGCTCTTCACAGGTCATCCCGAGACCCTGGAGAAATTTGA 200

 181 CAAGTTTGCTGGC...ATCGCCCATGGGGACCTGGCCGGGGATGCAGGTG 227
     ||||||      |   |   |  | |  ||  ||   |     |  |
 201 CAAGTTCAAGCACCTGAAGACAGAGGCTGAGATGAAGGCCTCCGAGGACC 250

48% similarity


Protein vs. DNA Alignments

• Polypeptide similarity > DNA
• Coding DNA > Non-coding
• 3rd base of codon hypervariable
• Moderate distance: poor DNA similarity


Rules of Thumb

• DNA-DNA similarities
– 50% significant if “long”
– E < 1e-6, 70% identity
• Protein-protein similarities
– 80% end-end: same structure, same function
– 30% over domain, similar function, structure overall similar
– 15-30% “twilight zone”
– Short, strong match…could be a “motif”


Basic BLAST Family

• BLASTN
– DNA to DNA database
• BLASTP
– protein to protein database
• BLASTX
– DNA (translated) to protein database
• TBLASTN
– protein to DNA database (translated)
• TBLASTX
– DNA (translated) to DNA database (translated)


DNA Databases

• nr (non-redundantish merge of Genbank, EMBL, etc…)
– EXCLUDES HTGS0,1,2, EST, GSS, STS, PAT, WGS
• est (expressed sequence tags)
• htgs (high throughput genome seq.)
• gss (genome survey sequence)
• vector, yeast, ecoli, mito
• chromosome (complete genomes)
• And more

http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#nucleotide_databases


Protein Databases

• nr (non-redundant Swiss-prot, PIR, PRF, PDB, Genbank CDS)

• swissprot

• ecoli, yeast, fly

• month

• And more


BLAST Input

• Program

• Database

• Options - see more

• Sequence
– FASTA
– gi or accession#


BLAST Options

• Algorithm and output options
– # descriptions, # alignments returned
– Probability cutoff
– Strand
• Alignment parameters
– Scoring Matrix: PAM30, PAM70, BLOSUM45, BLOSUM62, BLOSUM80
– Filter (low complexity) PPPPP->XXXXX


Extended BLAST Family

• Gapped Blast (default)
• PSI-Blast (Position-specific iterated blast)
– “self” generated scoring matrix
• PHI BLAST (motif plus BLAST)
• BLAST2 client (align two seqs)
• megablast (genomic sequence)
• rpsblast (search for domains)