Page 1: Multiple sequence alignments  and motif discovery

Multiple sequence alignments and motif discovery

Tutorial 5

Page 2: Multiple sequence alignments  and motif discovery

• Multiple sequence alignment
  – ClustalW
  – Muscle

• Motif discovery
  – MEME
  – JASPAR

Multiple sequence alignments and motif discovery

Page 3: Multiple sequence alignments  and motif discovery

• More than two sequences
  – DNA
  – Protein

• Evolutionary relation
  – Homology
  – Phylogenetic tree
  – Detect motif

Multiple Sequence Alignment

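[Figure: four sequences before and after alignment, with a tree whose leaves are labeled A, B, C and D]

Unaligned:
GTCGTAGTCGGCTCGAC
GTCTAGCGAGCGTGAT
GCGAAGAGGCGAGC
GCCGTCGCGTCGTAAC

Aligned:
GTCGTAGTCG-GC-TCGAC
GTC-TAG-CGAGCGT-GAT
GC-GAAG-AG-GCG-AG-C
GCCGTCG-CG-TCGTA-AC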

Page 4: Multiple sequence alignments  and motif discovery

• Dynamic programming
  – Optimal alignment
  – Running time exponential in the number of sequences

• Progressive
  – Efficient
  – Heuristic
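To make the "exponential" point concrete, here is a small sketch that counts the cells of the full dynamic-programming table. It assumes all sequences have the same length; the function names are illustrative only. For k sequences of length n the table has (n + 1)^k cells, and each cell looks at up to 2^k - 1 predecessor cells.

```python
def dp_table_cells(num_sequences: int, length: int) -> int:
    """Cells in the full DP table for sequences of equal length."""
    return (length + 1) ** num_sequences


def predecessors_per_cell(num_sequences: int) -> int:
    """Each cell can be reached from up to 2^k - 1 neighbouring cells."""
    return 2 ** num_sequences - 1


if __name__ == "__main__":
    for k in (2, 3, 4, 5):
        print(k, dp_table_cells(k, 100), predecessors_per_cell(k))
```

With sequences of length 100, going from two to five sequences grows the table from about 10^4 to about 10^10 cells, which is why heuristic progressive methods are used in practice.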

Multiple Sequence Alignment

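[Figure: four sequences before and after alignment, with a tree whose leaves are labeled A, B, C and D]

Unaligned:
GTCGTAGTCGGCTCGAC
GTCTAGCGAGCGTGAT
GCGAAGAGGCGAGC
GCCGTCGCGTCGTAAC

Aligned:
GTCGTAGTCG-GC-TCGAC
GTC-TAG-CGAGCGT-GAT
GC-GAAG-AG-GCG-AG-C
GCCGTCG-CG-TCGTA-AC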

Page 5: Multiple sequence alignments  and motif discovery

ClustalW

“CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice”, J. D. Thompson et al., Nucleic Acids Research, 1994

• Pairwise alignment – calculate distance matrix
• Guide tree
• Progressive alignment using the guide tree
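The first stage can be sketched as follows. This is a simplified stand-in, not ClustalW's own procedure: it assumes a plain Needleman-Wunsch alignment with made-up scores (match +1, mismatch -1, gap -2) and defines distance as 1 minus the fraction of identical ungapped columns. ClustalW uses its own scoring and builds the guide tree by neighbor-joining; here we only pick the closest pair, which is where a guide tree would start joining. The sequences in the usage lines are hypothetical.

```python
from itertools import combinations


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def distance(a, b):
    """1 - fraction of identical columns among columns without gaps."""
    x, y = global_align(a, b)
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not pairs:
        return 1.0
    return 1.0 - sum(p == q for p, q in pairs) / len(pairs)


def distance_matrix(seqs):
    """Map each unordered pair of sequence names to its distance."""
    return {(p, q): distance(seqs[p], seqs[q])
            for p, q in combinations(list(seqs), 2)}


def closest_pair(dist):
    """The pair a guide tree would join first."""
    return min(dist, key=dist.get)


if __name__ == "__main__":
    seqs = {"s1": "GATTACA", "s2": "GATTACA", "s3": "CCCGGG"}  # hypothetical
    dist = distance_matrix(seqs)
    print(dist)
    print(closest_pair(dist))
```

The progressive stage would then align the closest pair first and keep merging sequences or sub-alignments in the order given by the tree.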

Page 6: Multiple sequence alignments  and motif discovery

ClustalW

• Progressive
  – At each step, align two existing alignments or sequences
  – Gaps present in older alignments remain fixed

-TGTTAAC
-TGT-AAC
-TGT--AC
ATGT---C
ATGT-GGC

Page 7: Multiple sequence alignments  and motif discovery

ClustalW - Input

http://www.ebi.ac.uk/Tools/clustalw2/index.html

Input sequences

Gap scoring

Scoring matrix

Email address

Output format

Page 8: Multiple sequence alignments  and motif discovery

ClustalW - Output

Match strength in decreasing order: * : .
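As a rough illustration of how such a conservation line can be derived for a protein alignment, here is a minimal sketch. The "strong" and "weak" residue groups are written from memory of the Clustal documentation and should be treated as an assumption, as should the choice to leave columns containing gaps blank. The test alignment is the six-sequence protein block used later in this tutorial.

```python
# Residue groups as recalled from the Clustal documentation (assumption).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column)
    if "-" in residues:
        return " "          # columns with gaps are left blank here
    if len(residues) == 1:
        return "*"          # fully conserved
    if any(residues <= set(group) for group in STRONG):
        return ":"          # all residues fall in one strong group
    if any(residues <= set(group) for group in WEAK):
        return "."          # all residues fall in one weak group
    return " "


def conservation_line(rows):
    """Build the symbol line for a list of equal-length aligned sequences."""
    return "".join(column_symbol(col) for col in zip(*rows))


if __name__ == "__main__":
    rows = ["YDEEGGDAEE", "YDEEGGDAEE", "YGEEGADYED",
            "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]
    for row in rows:
        print(row)
    print(conservation_line(rows))
```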

Page 9: Multiple sequence alignments  and motif discovery

ClustalW - Output

Page 10: Multiple sequence alignments  and motif discovery

ClustalW - Output

Page 11: Multiple sequence alignments  and motif discovery

ClustalW - Output

Page 12: Multiple sequence alignments  and motif discovery

ClustalW - Output

Pairwise alignment scores

Building alignment

Final score

Building tree

Page 13: Multiple sequence alignments  and motif discovery

ClustalW - Output

Page 14: Multiple sequence alignments  and motif discovery

ClustalW Output

Sequence names

Sequence positions

Match strength in decreasing order: * : .

Page 15: Multiple sequence alignments  and motif discovery

ClustalW - Output

Page 16: Multiple sequence alignments  and motif discovery

ClustalW - Output

Branch length

Page 17: Multiple sequence alignments  and motif discovery

ClustalW - Output

Page 18: Multiple sequence alignments  and motif discovery

ClustalW - Output

Page 19: Multiple sequence alignments  and motif discovery

http://www.ebi.ac.uk/Tools/muscle/index.html

Muscle

Page 20: Multiple sequence alignments  and motif discovery

Muscle - output

Page 21: Multiple sequence alignments  and motif discovery

What’s the difference between Muscle and ClustalW?

ClustalW Muscle

Page 22: Multiple sequence alignments  and motif discovery

http://www.megasoftware.net/index.html

Page 23: Multiple sequence alignments  and motif discovery

Can we find motifs using multiple sequence alignment?

    1    2    3    4    5    6    7    8    9    10
A   0    0    0    0    0    0.5  1/6  1/3  0    0
D   0    0.5  1/3  0    0    1/6  5/6  1/6  0    1/6
E   0    0    2/3  1    0    0    0    0    1    5/6
G   0    1/6  0    0    1    1/3  0    0    0    0
H   0    1/6  0    0    0    0    0    0    0    0
N   0    1/6  0    0    0    0    0    0    0    0
Y   1    0    0    0    0    0    0    0.5  0    0

  1 3 5 7 9
..YDEEGGDAEE..
..YDEEGGDAEE..
..YGEEGADYED..
..YDEEGADYEE..
..YNDEGDDYEE..
..YHDEGAADEE..
  * :**   *:

Motif: a widespread pattern with biological significance
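A position frequency matrix like the one on this slide can be computed from the aligned block by counting residues column by column. The sketch below is one way to do it; the function name is illustrative, and exact fractions are used so the values can be compared with the slide directly.

```python
from collections import Counter
from fractions import Fraction


def frequency_matrix(rows):
    """Map each residue to its per-column frequency in the alignment."""
    n = len(rows)
    columns = [Counter(col) for col in zip(*rows)]
    residues = sorted(set("".join(rows)))
    # Counter returns 0 for residues absent from a column.
    return {r: [Fraction(c[r], n) for c in columns] for r in residues}


if __name__ == "__main__":
    rows = ["YDEEGGDAEE", "YDEEGGDAEE", "YGEEGADYED",
            "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]
    for residue, freqs in frequency_matrix(rows).items():
        print(residue, " ".join(str(f) for f in freqs))
```

Each column of the result sums to 1, which is a quick sanity check on any hand-built matrix.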

Page 24: Multiple sequence alignments  and motif discovery

Can we find motifs using multiple sequence alignment?

YES! NO

Page 25: Multiple sequence alignments  and motif discovery

MEME – Multiple EM* for Motif Elicitation

• http://meme.sdsc.edu/
• Motif discovery from unaligned sequences
  – Genomic or protein sequences
• Flexible model of motif presence (a motif can be absent in some sequences or appear several times in one sequence)

*Expectation-maximization

Page 26: Multiple sequence alignments  and motif discovery

MEME - Input

Email address

Input file (fasta file)

How many times in each sequence?

How many motifs?

How many sites?

Range of motif lengths
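For readers running MEME locally rather than through the web form, the fields above correspond roughly to command-line options. The sketch below only builds an argument list and does not run anything. The mapping of form fields to flags (`-mod`, `-nmotifs`, `-minsites`, `-maxsites`, `-minw`, `-maxw`) is an assumption to check against the documentation of the installed MEME version, and the file names and default values are hypothetical.

```python
def meme_command(fasta_path, out_dir, *, alphabet="dna", model="zoops",
                 nmotifs=3, minw=6, maxw=12, minsites=None, maxsites=None):
    """Build a MEME argument list (assumed flag names; nothing is executed).

    model: "oops"  = exactly one occurrence per sequence,
           "zoops" = zero or one occurrence per sequence,
           "anr"   = any number of repetitions.
    """
    cmd = ["meme", fasta_path, "-oc", out_dir, f"-{alphabet}",
           "-mod", model,
           "-nmotifs", str(nmotifs),              # how many motifs
           "-minw", str(minw), "-maxw", str(maxw)]  # range of motif lengths
    if minsites is not None:
        cmd += ["-minsites", str(minsites)]       # how many sites (lower bound)
    if maxsites is not None:
        cmd += ["-maxsites", str(maxsites)]       # how many sites (upper bound)
    return cmd


if __name__ == "__main__":
    # Hypothetical input file and output directory.
    print(" ".join(meme_command("promoters.fasta", "meme_out", nmotifs=2)))
```

The returned list could be passed to `subprocess.run` once the flags have been confirmed.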

Page 27: Multiple sequence alignments  and motif discovery

MEME - Output

Motif score

Page 28: Multiple sequence alignments  and motif discovery

MEME - Output

Motif length

Number of times

Motif score

Page 29: Multiple sequence alignments  and motif discovery

MEME - Output

Low uncertainty = High information content
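The relation between uncertainty and information content can be made explicit. A minimal sketch, assuming the common definition used for sequence logos without the small-sample correction: the information content of a column is log2 of the alphabet size minus the Shannon entropy of the column's residue frequencies.

```python
import math


def column_information(freqs, alphabet_size):
    """Information content in bits: log2(alphabet size) - Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs if p > 0)
    return math.log2(alphabet_size) - entropy


if __name__ == "__main__":
    # A fully conserved DNA column: no uncertainty, maximal information.
    print(column_information([1.0], 4))
    # A uniform DNA column: maximal uncertainty, no information.
    print(column_information([0.25, 0.25, 0.25, 0.25], 4))
```

A fully conserved DNA position carries 2 bits, and a position where all four bases are equally likely carries 0 bits. Tall logo columns are therefore the positions where the motif is most constrained.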

Page 30: Multiple sequence alignments  and motif discovery

MEME - Output

Multilevel Consensus

Page 31: Multiple sequence alignments  and motif discovery

MEME - Output

Sequence names

Position in sequence

Strength of match

Motif within sequence

Page 32: Multiple sequence alignments  and motif discovery

MEME - Output

Sequence names

Overall strength of motif matches

Motif location in the input sequence

Page 33: Multiple sequence alignments  and motif discovery

MAST

• Searches for motifs (one or more) in sequence databases:
  – Like BLAST, but with motifs as input
  – Similar to iterations of PSI-BLAST

• Profile defines strength of match
  – Multiple motif matches per sequence
  – Combined E-value for all motifs

• MEME uses MAST to summarize results:
  – Each MEME result is accompanied by the MAST result for searching the discovered motifs on the given sequences.

http://meme.sdsc.edu/meme4_4_0/cgi-bin/mast.cgi

Page 34: Multiple sequence alignments  and motif discovery

MAST - Input

Email address

Input file (motifs)

Database

Page 35: Multiple sequence alignments  and motif discovery

JASPAR

• Profiles
  – Transcription factor binding sites
  – Multicellular eukaryotes
  – Derived from published collections of experiments

• Open data access

Page 36: Multiple sequence alignments  and motif discovery

JASPAR

• Profiles
  – Modeled as matrices
  – Can be converted into PSSMs for scanning genomic sequences

    1    2    3    4    5    6    7    8    9    10
A   0    0    0    0    0    0.5  1/6  1/3  0    0
D   0    0.5  1/3  0    0    1/6  5/6  1/6  0    1/6
E   0    0    2/3  1    0    0    0    0    1    5/6
G   0    1/6  0    0    1    1/3  0    0    0    0
H   0    1/6  0    0    0    0    0    0    0    0
N   0    1/6  0    0    0    0    0    0    0    0
Y   1    0    0    0    0    0    0    0.5  0    0
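Converting a frequency matrix into a PSSM and scanning a sequence with it can be sketched as below. The assumptions are: log-odds scores against a background distribution, a small pseudocount so that zero frequencies do not give minus infinity, and a sequence that contains only symbols present in the background. The four-position DNA matrix in the usage lines is a made-up toy, not a JASPAR profile.

```python
import math


def to_pssm(freqs, background, pseudo=0.01):
    """freqs: symbol -> list of per-position frequencies.
    Returns one dict of log2-odds scores per motif position."""
    length = len(next(iter(freqs.values())))
    pssm = []
    for i in range(length):
        column = {}
        for symbol, bg in background.items():
            p = freqs.get(symbol, [0.0] * length)[i]
            # Mix in the background so every probability stays above zero.
            p = (p + pseudo * bg) / (1 + pseudo)
            column[symbol] = math.log2(p / bg)
        pssm.append(column)
    return pssm


def scan(sequence, pssm):
    """Score every window of motif length; returns (start, score) pairs."""
    width = len(pssm)
    return [(start, sum(pssm[i][sequence[start + i]] for i in range(width)))
            for start in range(len(sequence) - width + 1)]


def best_hit(sequence, pssm):
    """The highest-scoring window."""
    return max(scan(sequence, pssm), key=lambda hit: hit[1])


if __name__ == "__main__":
    # Toy matrix (hypothetical): a perfectly conserved "TATA".
    freqs = {"A": [0, 1, 0, 1], "C": [0, 0, 0, 0],
             "G": [0, 0, 0, 0], "T": [1, 0, 1, 0]}
    background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    pssm = to_pssm(freqs, background)
    print(best_hit("GGTATACC", pssm))
```

Windows scoring above a chosen threshold would be reported as candidate binding sites; choosing that threshold is a separate question not covered here.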

Page 37: Multiple sequence alignments  and motif discovery

Search profile

http://jaspar.genereg.net/

Page 38: Multiple sequence alignments  and motif discovery

Name of gene/protein

organism

score

logo