BLAT – The BLAST-Like Alignment Tool

Kent, W.J.

Genome Res. 2002 12: 656-664

Presenters: 巨彥霖, 田知本

BLAT overview

• Use an index to find regions in the genome homologous to the query.

• Do a detailed alignment between the query and the homologous regions.

• Use dynamic programming to stitch the detailed alignments of individual regions into a detailed alignment of the whole.

• Database: non-overlapping K-mers

• Query: overlapping K-mers

[Figure: the database divided into non-overlapping K-mers and the query into overlapping K-mers]

Example

• Database: cacaattatcacgaccgc

3-mers (non-overlapping): cac aat tat cac gac cgc

Index (3-mer: database offsets):
aat: 3
cac: 0, 9
cgc: 15
gac: 12
tat: 6

• Query: aattctcac

3-mers (overlapping): aat att ttc tct ctc tca cac
Query offsets:        0   1   2   3   4   5   6
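The example above can be reproduced with a short Python sketch. The function names `build_index` and `find_hits` are illustrative and not taken from BLAT itself; the sketch only assumes a non-overlapping index over the database and an overlapping scan of the query, as the slides describe.

```python
from collections import defaultdict

def build_index(database, k):
    """Map each non-overlapping k-mer of the database to its start offsets."""
    index = defaultdict(list)
    for pos in range(0, len(database) - k + 1, k):
        index[database[pos:pos + k]].append(pos)
    return index

def find_hits(query, index, k):
    """Look up every overlapping k-mer of the query.

    Returns (query_offset, database_offset) pairs.
    """
    hits = []
    for qpos in range(len(query) - k + 1):
        for dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, dpos))
    return hits

index = build_index("cacaattatcacgaccgc", 3)
hits = find_hits("aattctcac", index, 3)
print(dict(index))
print(hits)  # [(0, 3), (6, 0), (6, 9)]
```

Only the query 3-mers aat (offset 0) and cac (offset 6) occur in the index, so the query produces three hits.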

Search Criteria

• Single Perfect Matches

• Single Near Perfect Matches

• Multiple Perfect Matches

Notation

• K : K-mer size

• M : The match ratio between homologous regions

• H : Homologous region size

• G : Database (genome) size

• Q : Query sequence size

• A : The alphabet size

Single Perfect Matches (1)

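[Figure: a homologous region divided into non-overlapping K-mers, one of which is a perfect match]

The probability of at least one perfect K-mer match in the homologous region (sensitivity):

$P = 1 - (1 - M^K)^{H/K}$

Here $M^K$ is the probability that a given K-mer matches perfectly, and H/K is the number of non-overlapping K-mers in the homologous region.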

• The number of K-mers in the database = G / K

• The number of K-mers in the query = Q - K + 1

The number of K-mers expected to match by chance (specificity):

$F = (Q - K + 1) \cdot (G / K) \cdot (1/A)^K$
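The two formulas for the single-perfect-match criterion can be evaluated directly. The sketch below is a minimal illustration, not BLAT code: the function names are hypothetical, and taking the integer part of H/K as the number of non-overlapping K-mers is an assumption made here.

```python
def perfect_sensitivity(M, H, K):
    """P = 1 - (1 - M^K)^(H/K); H // K (integer part) is an assumption."""
    return 1 - (1 - M ** K) ** (H // K)

def perfect_chance_matches(Q, G, K, A):
    """F = (Q - K + 1) * (G / K) * (1/A)^K."""
    return (Q - K + 1) * (G / K) * (1 / A) ** K

# Mouse/human coding sequence settings used in the slides.
P = perfect_sensitivity(M=0.86, H=100, K=7)
F = perfect_chance_matches(Q=500, G=3_000_000_000, K=7, A=4)
print(f"P = {P:.4f}, F = {F:,.0f}")
```

With M = 0.86 and H = 100, K = 7 is the largest K-mer size that keeps P at or above 0.99 in this sketch; K = 8 drops below it. F evaluates to roughly 12.9 million for a 500-base query against a 3-billion-base database. The figure of 13,078,962 quoted in the case slide appears to correspond to approximating the query K-mer count Q - K + 1 by Q.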

Single Perfect Nucleotide K-mer Matches as Search Criterion

Case (perfect match)

• Comparing mouse and human coding sequences at the nucleotide level :

H = 100

M = 86%

Sensitivity = 0.99

max K = 7

chance matches = 13078962

(query = 500 , database = 3 billion)

Single Near Perfect Matches (1)

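Almost perfect: one letter in the K-mer may mismatch.

The probability that a given K-mer in the homologous region is a near perfect match:

$p_1 = K \cdot M^{K-1} \cdot (1 - M) + M^K$

[Figure: a homologous region containing a near perfect K-mer match]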

Single Near Perfect Matches (2)

• Sensitivity

$P = 1 - (1 - p_1)^{H/K}$

• Specificity

$F = (Q - K + 1) \cdot (G / K) \cdot \left( K \cdot (1/A)^{K-1} \cdot (1 - 1/A) + (1/A)^K \right)$
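The near-perfect formulas can be checked numerically in the same way. This is a sketch under stated assumptions: the function names are hypothetical, and the integer part of H/K is used as the number of non-overlapping K-mers.

```python
def near_perfect_kmer_prob(M, K):
    """p_1 = K * M^(K-1) * (1 - M) + M^K: at most one mismatch in a K-mer."""
    return K * M ** (K - 1) * (1 - M) + M ** K

def near_perfect_sensitivity(M, H, K):
    """P = 1 - (1 - p_1)^(H/K); H // K (integer part) is an assumption."""
    return 1 - (1 - near_perfect_kmer_prob(M, K)) ** (H // K)

def near_perfect_chance_matches(Q, G, K, A):
    """F = (Q - K + 1) * (G / K) * (K * (1/A)^(K-1) * (1 - 1/A) + (1/A)^K)."""
    per_kmer = K * (1 / A) ** (K - 1) * (1 - 1 / A) + (1 / A) ** K
    return (Q - K + 1) * (G / K) * per_kmer

P_near = near_perfect_sensitivity(M=0.86, H=100, K=12)
F_near = near_perfect_chance_matches(Q=500, G=3_000_000_000, K=12, A=4)
print(f"P = {P_near:.4f}, F = {F_near:,.0f}")
```

Allowing one mismatch lets K grow to 12 while P stays at or above 0.99 for M = 0.86 and H = 100, and F falls to roughly 270 thousand. The 275,671 quoted in the case slide appears to correspond to using Q in place of Q - K + 1.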

Case (near perfect match)

• Comparing mouse and human coding sequences at the nucleotide level :

H = 100

M = 86%

Sensitivity = 0.99

max K = 12

chance matches = 275671

(query = 500 , database = 3 billion)

Single Near Perfect Nucleotide K-mer Matches as Search Criterion

Multiple Perfect Matches

• A hit is triggered when:
– there are N perfect matches
– the matches are no further than W letters from each other in database coordinates
– the matches have the same diagonal coordinate

Example

The hits a, b, c, and d are all K letters long. Hits b and d have the same diagonal coordinate and lie within W letters of each other. Therefore, they satisfy the two-perfect-K-mer search criterion.

[Figure: hits a, b, c, and d plotted by target coordinate and query coordinate]
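A minimal sketch of the multiple-perfect-match test follows. It assumes that the diagonal coordinate is the database offset minus the query offset and that "within W letters" is checked between consecutive hits on a diagonal. The function name `triggers_hit` and the coordinates assigned to a, b, c, and d are made up for illustration.

```python
from collections import defaultdict

def triggers_hit(hits, n, w):
    """Return True if n hits share a diagonal and each consecutive pair
    is at most w letters apart in database coordinates.

    hits: iterable of (query_offset, database_offset) pairs.
    """
    by_diagonal = defaultdict(list)
    for qpos, dpos in hits:
        by_diagonal[dpos - qpos].append(dpos)  # diagonal = database - query
    for positions in by_diagonal.values():
        positions.sort()
        run = 1
        for prev, cur in zip(positions, positions[1:]):
            run = run + 1 if cur - prev <= w else 1
            if run >= n:
                return True
    return n <= 1 and bool(hits)

# Hypothetical coordinates: b and d lie on diagonal 7, a and c do not.
example = [(2, 30), (5, 12), (20, 8), (16, 23)]  # a, b, c, d
print(triggers_hit(example, n=2, w=20))  # True
print(triggers_hit(example, n=2, w=5))   # False
```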

Multiple Perfect Nucleotide K-mer Matches as Search Criterion

Default

• Nucleotide
– two perfect 11-mers

• Protein
– single perfect 5-mer for the standalone version
– three perfect 4-mers for the client/server version

1) Build the hash table for Sequence A.

2) Scan Sequence B for hits.

3) Extend hits.

BLAST

Step 1: Build the hash table for Sequence A. (3-tuple example)

For DNA sequences:

Seq. A    = AGATCGAT
Positions = 12345678

Hash table (3-tuple: positions):
AAA: (none)
AAC: (none)
...
AGA: 1
...
ATC: 3
...
CGA: 5
...
GAT: 2, 6
...
TCG: 4
...

For protein sequences:

Seq. A = ELVIS

Add xyz to the hash table if Score(xyz, ELV) ≥ T;
Add xyz to the hash table if Score(xyz, LVI) ≥ T;
Add xyz to the hash table if Score(xyz, VIS) ≥ T;
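The DNA case of Step 1 can be sketched as follows. The function name `build_word_table` is hypothetical, positions are 1-based to match the slide, and only 3-tuples that actually occur in Sequence A are stored. The protein case is not covered, because it needs a scoring matrix and a threshold T.

```python
from collections import defaultdict

def build_word_table(seq, w=3):
    """Map each overlapping w-tuple of seq to its 1-based start positions."""
    table = defaultdict(list)
    for i in range(len(seq) - w + 1):
        table[seq[i:i + w]].append(i + 1)  # 1-based, as on the slide
    return dict(table)

table = build_word_table("AGATCGAT")
print(table)
# {'AGA': [1], 'GAT': [2, 6], 'ATC': [3], 'TCG': [4], 'CGA': [5]}
```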

BLAST

Step 2: Scan Sequence B for hits.

Step 3: Extend hits.

Terminate if the score of the extension fades away. (That is, when we reach a segment pair whose score falls a certain distance below the best score found for shorter extensions.)

BLAST 2.0 saves the time spent in extension and considers gapped alignments.

Algorithm

1. Search Stage
– Use an index to find regions in the genome homologous to the query

2. Alignment Stage
– Do a detailed alignment between the query and the homologous regions

3. Stitching and Filling In
– Use dynamic programming to stitch the detailed alignments of individual regions into a detailed alignment of the whole

Search Stage

• Build an index that contains the positions of each K-mer in the database.

• Step through each overlapping K-mer in the query and look it up in the index.

• Get a list of 'hits': positions in the query and in the database that match for K bases.

• Cluster the hits to find homologous regions.

Search Stage

• Clump hits

• Clump ‘clumps’

• Eliminate small clumps
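The clumping idea can be illustrated with a simplified greedy pass. This is a sketch under assumptions rather than BLAT's actual procedure: the thresholds `max_gap`, `max_diag_shift`, and `min_hits` are hypothetical parameters, each hit is compared only with the last hit of the current clump, and the second step of clumping the clumps is omitted.

```python
def clump_hits(hits, max_gap, max_diag_shift, min_hits):
    """Group (query_offset, database_offset) hits into clumps.

    A hit joins the current clump when it is close to the clump's last hit
    both along the database and in diagonal (database - query).
    Clumps with fewer than min_hits hits are eliminated.
    """
    clumps = []
    for qpos, dpos in sorted(hits, key=lambda h: h[1]):
        last = clumps[-1][-1] if clumps else None
        if (last is not None
                and dpos - last[1] <= max_gap
                and abs((dpos - qpos) - (last[1] - last[0])) <= max_diag_shift):
            clumps[-1].append((qpos, dpos))
        else:
            clumps.append([(qpos, dpos)])
    return [c for c in clumps if len(c) >= min_hits]

# Made-up hits: two real clumps and one isolated hit at database offset 5000.
hits_example = [(0, 100), (12, 112), (25, 126), (40, 5000), (3, 9000), (15, 9012)]
clumps = clump_hits(hits_example, max_gap=50, max_diag_shift=2, min_hits=2)
print(clumps)
# [[(0, 100), (12, 112), (25, 126)], [(3, 9000), (15, 9012)]]
```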

[Figure: clumped hits marking a homologous region]

Alignment Stage (nucleotide)

• Start from scratch with regions defined with K-mers

• Index on smaller K-mers, but extend each K-mer until it becomes specific

• Extend in both directions without mismatches or gaps, and merge overlapping or contiguous alignments

• Recurse on gaps with smaller K until gap or hits are eliminated
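The extension step in the list above can be sketched as follows. The function name `extend_exact` and the example sequences are illustrative; the sketch covers only the exact extension of one seed, not the merging or the recursion on gaps.

```python
def extend_exact(query, target, qpos, tpos, k):
    """Extend an exact k-mer match in both directions without mismatches or gaps.

    Returns (query_start, target_start, length) of the extended match.
    """
    qs, ts = qpos, tpos
    while qs > 0 and ts > 0 and query[qs - 1] == target[ts - 1]:
        qs -= 1
        ts -= 1
    qe, te = qpos + k, tpos + k
    while qe < len(query) and te < len(target) and query[qe] == target[te]:
        qe += 1
        te += 1
    return qs, ts, qe - qs

# The seed "gta" at offset 4 in both sequences grows to "acgtacg".
extended = extend_exact("ttacgtacgg", "ccacgtacgt", qpos=4, tpos=4, k=3)
print(extended)  # (2, 2, 7)
```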

[Figure: the recursive nucleotide alignment procedure]

Alignment Stage (protein)

• Extend hits into maximal-scoring ungapped alignments (HSPs) with a +2/-1 scoring scheme (match +2, mismatch -1)

• Create a graph of all possible HSP merges

• Use dynamic programming to traverse the graph
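The first step, extending a hit into an HSP, can be sketched as follows. The function name `extend_hsp` and the example peptides are made up, and the sketch scans each side of the seed to its end rather than stopping early. Building the HSP graph and traversing it are not shown.

```python
def extend_hsp(query, target, qpos, tpos, k, match=2, mismatch=-1):
    """Extend a k-long hit along its diagonal into the highest-scoring
    ungapped segment that contains it.

    Returns (query_start, target_start, length, score).
    """
    def score_pair(a, b):
        return match if a == b else mismatch

    def best_extension(pairs):
        # Walk outward from the seed, remembering the best running score.
        best_score = best_len = score = 0
        for n, (a, b) in enumerate(pairs, start=1):
            score += score_pair(a, b)
            if score > best_score:
                best_score, best_len = score, n
        return best_score, best_len

    seed = sum(score_pair(a, b)
               for a, b in zip(query[qpos:qpos + k], target[tpos:tpos + k]))
    right_score, right_len = best_extension(
        zip(query[qpos + k:], target[tpos + k:]))
    left_score, left_len = best_extension(
        zip(reversed(query[:qpos]), reversed(target[:tpos])))
    length = left_len + k + right_len
    return qpos - left_len, tpos - left_len, length, seed + left_score + right_score

# Seed "AYIA" at offset 3; 8 matches and 2 mismatches give 8*2 - 2 = 14.
hsp = extend_hsp("MKTAYIAKQR", "MKSAYIAKHR", qpos=3, tpos=3, k=4)
print(hsp)  # (0, 0, 10, 14)
```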

[Figure: protein alignment within a homologous region]

Stitching and Filling In

• The alignment of a gene is often scattered across multiple homologous regions found in the search stage

[Figure: homologous regions along the database]

Evaluation

• Comparison with Other Tools:
– mRNA/Genome Alignments
– Remapped 713 mRNAs corresponding to annotated chromosome 22
– BLAT took 26 sec while Sim4 took 17,468 sec (almost 5 h)

                 Est_genome   Sim4     BLAT
Relative speed   1            333      223,000
Base accuracy    N/A          99.66%   99.99%
Gene accuracy    77.7%        93.4%    99.5%

Evaluation

• Comparison with Other Tools:
– Translated Mouse/Human Alignments
– 13 million mouse genomic reads vs. human chromosome 22

                   WU-TBLASTX   BLAT
Relative speed     1x           73x
% RefSeq covered   84.5%        86.7%
% Genome covered   2.67%        2.89%

BLAT vs. BLAST

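• Index
– BLAT indexes the database; BLAST indexes the query

• Hits
– BLAT uses perfect matches by default; BLAST uses near perfect matches

• Alignment
– BLAT stitches alignments together; BLAST reports them separately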


