Introduction to Bioinformatics Dot Plots. One of the simplest and oldest methods for sequence...

Post on 26-Dec-2015


Introduction to Bioinformatics

Dot Plots


• One of the simplest and oldest methods for sequence alignment

• Visualization of regions of similarity
– Assign one sequence to the horizontal axis
– Assign the other to the vertical axis
– Place a dot in each cell where the horizontal and vertical symbols match
– Diagonal lines mean adjacent regions of identity

Simple Example

• Construct a simple dot plot for

GCTGAA
GCTAA

– One sequence goes horizontally, the other vertically
– Mark boxes with matched horizontal and vertical symbols
– Look for diagonal(s)

Alignment:
GCTGAA
GCT-AA

  G C T G A A
G * . . * . .
C . * . . . .
T . . * . . .
A . . . . * *
A . . . . * *
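The construction above can be sketched in a few lines of Python. This is a minimal illustration rather than the method used to draw the slide: the function names `dot_plot` and `render` are made up for this example, and the two sequences are read off the alignment shown above (GCTGAA and GCT-AA without its gap). Every matching pair is marked, so repeated letters such as the two As also produce dots off the main diagonal.

```python
def dot_plot(horizontal, vertical):
    """Return a grid of booleans; grid[r][c] is True when vertical[r] == horizontal[c]."""
    return [[v == h for h in horizontal] for v in vertical]


def render(horizontal, vertical):
    """Draw the grid as text: '*' marks a match, '.' marks an empty cell."""
    grid = dot_plot(horizontal, vertical)
    lines = ["  " + " ".join(horizontal)]  # header row: the horizontal sequence
    for symbol, row in zip(vertical, grid):
        cells = " ".join("*" if cell else "." for cell in row)
        lines.append(symbol + " " + cells)
    return "\n".join(lines)


# Sequences assumed from the alignment in the simple example.
print(render("GCTGAA", "GCTAA"))
```

The diagonal that breaks after the third row and resumes one column to the right corresponds to the gap in the alignment.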

Another Example

• Construct a simple dot plot for

GCTAGTCAGATCTGACGCTA
GATGGTCACATCTGCCGC

A long stretch of nearly identical residues is revealed starting at the fifth nucleotide of each sequence (GTCA-ATCTG-CGC).
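One way to make "look for diagonals" concrete is to walk every diagonal of the dot matrix and report unbroken runs of matches. The sketch below is only an illustration: `diagonal_runs` and its `min_len` parameter are hypothetical names, and the two sequences are assumed to be a 20-mer and an 18-mer, a split chosen so that the shared stretch begins at the fifth nucleotide of each.

```python
def diagonal_runs(seq_a, seq_b, min_len=3):
    """Return (start_a, start_b, length) for each maximal run of matches.

    Start positions are 1-based. Runs shorter than min_len are ignored.
    """
    runs = []
    # offset = i - j identifies one diagonal of the dot matrix.
    for offset in range(-(len(seq_b) - 1), len(seq_a)):
        i, j = max(offset, 0), max(-offset, 0)
        length = 0
        while i < len(seq_a) and j < len(seq_b):
            if seq_a[i] == seq_b[j]:
                length += 1
            else:
                if length >= min_len:
                    runs.append((i - length + 1, j - length + 1, length))
                length = 0
            i += 1
            j += 1
        # A run may reach the edge of the matrix without hitting a mismatch.
        if length >= min_len:
            runs.append((i - length + 1, j - length + 1, length))
    return runs


# Assumed split of the example sequences (see the note above).
SEQ_A = "GCTAGTCAGATCTGACGCTA"
SEQ_B = "GATGGTCACATCTGCCGC"

for start_a, start_b, length in diagonal_runs(SEQ_A, SEQ_B, min_len=3):
    print(start_a, start_b, SEQ_A[start_a - 1:start_a - 1 + length])
```

On the main diagonal this reports GTCA, ATCTG, and CGC as separate runs, which are the three pieces of GTCA-ATCTG-CGC separated by single mismatches.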

Sliding Window and Cutoff

• Problem
– Plot becomes noisy when comparing large, similar sequences

• Solution
– Sliding window (size = w)
– Cutoff (value = v)
– Consider w nucleotides at a time
– When at least v matches occur in a window, place a dot on the space where the window starts
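The window-and-cutoff rule above can be sketched as follows. This is a straightforward, unoptimized illustration: `windowed_dots` is a hypothetical name, positions are 0-based, and the example sequences are assumed to split into a 20-mer and an 18-mer.

```python
def windowed_dots(seq_a, seq_b, w, v):
    """Return the set of (i, j) window starts where at least v of w positions match.

    i indexes seq_a and j indexes seq_b, both 0-based.
    """
    dots = set()
    for i in range(len(seq_a) - w + 1):
        for j in range(len(seq_b) - w + 1):
            # Count matching positions inside the two aligned windows.
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(w))
            if matches >= v:
                dots.add((i, j))
    return dots


# Assumed split of the earlier example sequences.
SEQ_A = "GCTAGTCAGATCTGACGCTA"
SEQ_B = "GATGGTCACATCTGCCGC"

raw = windowed_dots(SEQ_A, SEQ_B, w=1, v=1)       # plain dot plot
filtered = windowed_dots(SEQ_A, SEQ_B, w=4, v=3)  # window of 4, cutoff of 3
print(len(raw), len(filtered))
```

Setting w = 1 and v = 1 reproduces the unfiltered plot, so the same function covers both cases and the two dot counts can be compared directly.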

Example

• Same example with w = 4 and v = 3

• Compare to the previous plot. You make the call!

Worksheet

• w = 4 and v = 3

What else can it do (and how)?

• Gaps
• Inverse subsequence
• Repeats
• Palindrome
• Genome rearrangement
• Exon identification
• RNA structure prediction
• Nice tool for conceptualizing sequence-related algorithms
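Taking repeats as one case of the "how": plotting a sequence against itself always gives a full main diagonal, and any additional off-diagonal line marks a segment that occurs more than once. The sketch below is an assumption-laden illustration, not something taken from the slides. `repeat_dots` is a hypothetical name, the test sequence is invented, and exact windows of length k stand in for the window and cutoff (w = v = k).

```python
def repeat_dots(seq, k):
    """Return off-diagonal dots (i, j), i < j, of a self dot plot with exact k-windows.

    A dot at (i, j) means seq[i:i+k] and seq[j:j+k] are identical (0-based starts).
    """
    last_start = len(seq) - k
    return [
        (i, j)
        for i in range(last_start + 1)
        for j in range(i + 1, last_start + 1)  # skip the trivial main diagonal
        if seq[i:i + k] == seq[j:j + k]
    ]


# Invented sequence in which ACGT occurs twice.
print(repeat_dots("ACGTTTACGT", 4))
```

Only the upper triangle is scanned because a self dot plot is symmetric about its main diagonal.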