We do not have to understand the language to identify patterns: “klaatu barada nikto”

DNA, RNA and protein are an alien language ... We try to cryptographically attack this language ... we want to decipher both its meaning and its history …

We do not have to understand the language to identify patterns: “klaatu barada nikto”

Fortunately, the genetic code is alphabetic … so it lends itself to string comparison and pattern recognition

Pairwise Sequence Alignment

• Principles of pairwise sequence comparison
  • global / local alignments
  • scoring systems
  • gap penalties

• Methods of pairwise sequence alignment
  • window-based methods
  • dynamic programming approaches

Sequence 1: T A C A T T A C G T A C

Sequence 2: A T T C A C A T A

Pairwise Sequence Alignment: How to?

Dotplot:

[Figure: dotplot of Sequence 1 (T A C A T T A C G T A C) against Sequence 2 (A T T C A C A T A)]

A dotplot gives an overview of all possible alignments

Dotplot:

[Figure: dotplot of Sequence 1 (T A C A T T A C G T A C) against Sequence 2 (A T T C A C A T A), showing one possible alignment]

In a dotplot each diagonal corresponds to a possible (ungapped) alignment
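The idea that each diagonal is one ungapped alignment can be sketched in a few lines of Python. This is a minimal illustration, not the code of any dotplot program; the function names `dotplot`, `render` and `diagonal_matches` are made up for this example, and the sequences are the two shown on the slide.

```python
def dotplot(seq1, seq2):
    """Return grid[j][i] == True where seq2[j] matches seq1[i]."""
    return [[a == b for a in seq1] for b in seq2]


def render(seq1, seq2):
    """Draw the dotplot as text: seq1 across the top, seq2 down the side."""
    lines = ["  " + " ".join(seq1)]
    for b, row in zip(seq2, dotplot(seq1, seq2)):
        lines.append(b + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


def diagonal_matches(seq1, seq2, offset):
    """Count matches on the diagonal i - j == offset.

    Each offset is one ungapped alignment: seq2 shifted `offset`
    positions to the right underneath seq1.
    """
    return sum(
        1
        for j, b in enumerate(seq2)
        if 0 <= j + offset < len(seq1) and seq1[j + offset] == b
    )


if __name__ == "__main__":
    seq1, seq2 = "TACATTACGTAC", "ATTCACATA"
    print(render(seq1, seq2))
    # Score every diagonal and report the one with the most matches.
    offsets = range(-(len(seq2) - 1), len(seq1))
    best = max(offsets, key=lambda k: diagonal_matches(seq1, seq2, k))
    print("best offset:", best, "matches:", diagonal_matches(seq1, seq2, best))
```

Scanning all offsets is the programmatic equivalent of looking for the longest diagonal run of dots by eye.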

• Methods of pairwise sequence alignment
  • window-based methods
  • dynamic programming approaches

Window-based Approaches

• Word Size

• Window / Stringency

Word Size Algorithm

Sequence 1: T A C G G T A T G

Sequence 2: A C A G T A T C

[Figure: Sequence 2 is slid along Sequence 1; a dot is placed wherever a word of Sequence 1 is identical to a word of Sequence 2]

Word Size = 3
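A minimal Python sketch of the word size method, assuming a dot is recorded at the start position of every pair of identical words. The function name `word_hits` is hypothetical; the sequences and the word size are the ones from the slide.

```python
def word_hits(seq1, seq2, word_size):
    """Return (i, j) pairs where seq1[i:i+w] and seq2[j:j+w] are identical words."""
    # Index every word of seq2 by the positions at which it starts.
    index = {}
    for j in range(len(seq2) - word_size + 1):
        index.setdefault(seq2[j:j + word_size], []).append(j)

    # Look up every word of seq1 in that index.
    hits = []
    for i in range(len(seq1) - word_size + 1):
        for j in index.get(seq1[i:i + word_size], []):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Positions are 0-based; the shared words are GTA and TAT.
    print(word_hits("TACGGTATG", "ACAGTATC", 3))  # [(4, 3), (5, 4)]
```

The two hits lie on the same diagonal (i - j = 1), which is what shows up as a short diagonal line in the dotplot.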

Window / Stringency

Sequence 1: T A C G G T A T G

Sequence 2: T C A G T A T C

[Figure: Sequence 2 is slid along Sequence 1; a dot is placed wherever at least "stringency" positions match inside a window]

Window = 5 / Stringency = 4
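The window / stringency rule can be sketched the same way: a dot is recorded when at least `stringency` of the `window` aligned positions match. The function name `window_hits` is an assumption of this sketch; the sequences and parameters are taken from the slide.

```python
def window_hits(seq1, seq2, window, stringency):
    """Return (i, j) pairs whose windows share at least `stringency` matching positions."""
    hits = []
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            # Compare the two windows position by position.
            matches = sum(
                a == b
                for a, b in zip(seq1[i:i + window], seq2[j:j + window])
            )
            if matches >= stringency:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    seq1, seq2 = "TACGGTATG", "TCAGTATC"
    print(window_hits(seq1, seq2, 5, 4))  # [(2, 1), (3, 2), (4, 3)]
    # Requiring all 5 positions to match is an exact 5-letter word search.
    print(window_hits(seq1, seq2, 5, 5))  # []
```

With one mismatch allowed per window, three consecutive dots appear on one diagonal; with none allowed, the same window length finds nothing.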

Considerations

• The window/stringency method is more sensitive than the word size method (mismatches within the window are permitted).

• The smaller the window, the larger the weight of statistical (unspecific) matches.

• With large windows the sensitivity for short sequences is reduced.

• Insertions/deletions are not treated explicitly.

Insertions / Deletions in a Dotplot

[Figure: dotplot of Sequence 1 (T A C T G T T C A T) against Sequence 2]

Dotplot (Window = 130 / Stringency = 9)

[Figure: dotplot comparing hemoglobin chains; output of the programs Compare and DotPlot]

Dotplot (Window = 18 / Stringency = 10)

[Figure: dotplot comparing hemoglobin chains; output of the programs Compare and DotPlot]

• Methods of pairwise sequence alignment
  • window-based approaches
  • dynamic programming approaches
    • Needleman and Wunsch
    • Smith and Waterman

Automatic procedure that finds the best alignment, with an optimal score that depends on the chosen parameters.

Dynamic Programming

Recursive solutions: we solve smaller problems first and use those solutions to solve larger problems. Intermediate solutions are stored in a table (matrix).

Basic principles of dynamic programming

- Initialization of alignment matrix: the scoring model

- Stepwise calculation of score values (creation of an alignment path matrix)

- Backtracking (evaluation of the optimal path)

Initialization of Matrix (BLOSUM50): substitution scores

    H   E   A   G   A   W   G   H   E   E
P  -2  -1  -1  -2  -1  -4  -2  -2  -1  -1
A  -2  -1   5   0   5  -3   0  -2  -1  -1
W  -3  -3  -3  -3  -3  15  -3  -3  -3  -3
H  10   0  -2  -2  -2  -3  -2  10   0   0
E   0   6  -1  -3  -1  -3  -3   0   6   6
A  -2  -1   5   0   5  -3   0  -2  -1  -1
E   0   6  -1  -3  -1  -3  -3   0   6   6
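As a sketch of how such a score table is produced, the Python snippet below looks up s(a, b) for every pair of positions. Only the BLOSUM50 entries that appear in the table are included; the names `BLOSUM50_SUBSET`, `s` and `score_table` are assumptions of this example, not part of any library.

```python
# BLOSUM50 entries copied from the slide's table; only the residue pairs
# that occur in this example are included (columns H, E, A, G, W).
COLUMNS = "HEAGW"
BLOSUM50_SUBSET = {
    "P": (-2, -1, -1, -2, -4),
    "A": (-2, -1, 5, 0, -3),
    "W": (-3, -3, -3, -3, 15),
    "H": (10, 0, -2, -2, -3),
    "E": (0, 6, -1, -3, -3),
}


def s(a, b):
    """Substitution score for residue a (from sequence 1) and b (from sequence 2)."""
    return BLOSUM50_SUBSET[b][COLUMNS.index(a)]


def score_table(x, y):
    """One row per residue of y, one column per residue of x."""
    return [[s(a, b) for a in x] for b in y]


if __name__ == "__main__":
    x, y = "HEAGAWGHEE", "PAWHEAE"
    print("  " + " ".join(f"{a:>3}" for a in x))
    for b, row in zip(y, score_table(x, y)):
        print(b + " " + " ".join(f"{v:>3}" for v in row))
```

A full implementation would load the complete 20 x 20 matrix; the subset keeps the example self-contained.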

Needleman and Wunsch (global alignment)

Sequence 1: H E A G A W G H E E

Sequence 2: P A W H E A E

Scoring parameters: BLOSUM50 matrix

Gap penalty: Linear gap penalty of 8

Creation of an alignment path matrix

Idea: Build up an optimal alignment using previous solutions for optimal alignments of smaller subsequences

• Construct matrix F indexed by i and j (one index for each sequence)

• F(i,j) is the score of the best alignment between the initial segment x1...i of x up to xi and the initial segment y1...j of y up to yj

• Build F(i,j) recursively beginning with F(0,0) = 0

Alignment path matrix F (Sequence 1 across the top, Sequence 2 down the side):

          H    E    A    G    A    W    G    H    E    E
     0   -8  -16  -24  -32  -40  -48  -56  -64  -72  -80
P   -8   -2   -9  -17  -25  -33  -41  -49  -57  -65  -73
A  -16  -10   -3   -4  -12  -20  -28  -36  -44  -52  -60
W  -24  -18  -11   -6   -7  -15   -5  -13  -21  -29  -37
H  -32  -14  -18  -13   -8   -9  -13   -7   -3  -11  -19
E  -40  -22   -8  -16  -16   -9  -12  -15   -7    3   -5
A  -48  -30  -16   -3  -11  -11  -12  -12  -15   -5    2
E  -56  -38  -24  -11   -6  -12  -14  -15  -12   -9    1

Optimal global alignment:

HEAGAWGHE-E
--P-AW-HEAE

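$$
F(i, j) = \max
\begin{cases}
F(i-1, j-1) + s(x_i, y_j) \\
F(i-1, j) - d \\
F(i, j-1) - d
\end{cases}
$$

[Figure: the cell F(i, j) and the three neighbours it is computed from]

F(i-1, j-1)    F(i, j-1)
F(i-1, j)      F(i, j)

The diagonal step from F(i-1, j-1) adds s(x_i, y_j); the steps from F(i-1, j) and F(i, j-1) each subtract the gap penalty d.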

• If F(i-1,j-1), F(i-1,j) and F(i,j-1) are known we can calculate F(i,j)

• Three possibilities:

• xi and yj are aligned, F(i,j) = F(i-1,j-1) + s(xi ,yj)

• xi is aligned to a gap, F(i,j) = F(i-1,j) - d

• yj is aligned to a gap, F(i,j) = F(i,j-1) - d

• The best score up to (i,j) will be the largest of the three options
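The three options translate directly into a nested loop. The sketch below is a minimal Python version under the slide's assumptions (BLOSUM50 scores, linear gap penalty d = 8); the names `nw_matrix`, `s` and `BLOSUM50_SUBSET` are made up for this example, and only the BLOSUM50 entries needed for the two example sequences are included.

```python
# BLOSUM50 entries for the residue pairs in the example (columns H, E, A, G, W).
COLUMNS = "HEAGW"
BLOSUM50_SUBSET = {
    "P": (-2, -1, -1, -2, -4),
    "A": (-2, -1, 5, 0, -3),
    "W": (-3, -3, -3, -3, 15),
    "H": (10, 0, -2, -2, -3),
    "E": (0, 6, -1, -3, -3),
}


def s(a, b):
    """Substitution score for residue a of x and residue b of y."""
    return BLOSUM50_SUBSET[b][COLUMNS.index(a)]


def nw_matrix(x, y, d=8):
    """Fill F, where F[i][j] scores the best alignment of x[:i] with y[:j]."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):       # x[:i] aligned to gaps only
        F[i][0] = -i * d
    for j in range(1, m + 1):       # y[:j] aligned to gaps only
        F[0][j] = -j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # x_i aligned to y_j
                F[i - 1][j] - d,                          # x_i aligned to a gap
                F[i][j - 1] - d,                          # y_j aligned to a gap
            )
    return F


if __name__ == "__main__":
    F = nw_matrix("HEAGAWGHEE", "PAWHEAE")
    print(F[10][7])  # score of the optimal global alignment: 1
```

For the cell in row P, column W the three options are -40 - 4 = -44, -48 - 8 = -56 and -33 - 8 = -41, so this sketch stores -41 there.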

Boundary conditions

F(i, 0) = -i d

F(0, j) = -j d

First row of the matrix, F(i, 0):

          H    E    A    G    A    W    G    H    E    E
     0   -8  -16  -24  -32  -40  -48  -56  -64  -72  -80

Stepwise calculation of score values

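$$
F(i, j) = \max
\begin{cases}
F(i-1, j-1) + s(x_i, y_j) \\
F(i-1, j) - d \\
F(i, j-1) - d
\end{cases}
$$

$$
F(1,1) = \max
\begin{cases}
F(0,0) + s(x_1, y_1) = 0 - 2 = -2 \\
F(0,1) - d = -8 - 8 = -16 \\
F(1,0) - d = -8 - 8 = -16
\end{cases}
= -2
$$

$$
F(2,1) = \max
\begin{cases}
F(1,0) + s(x_2, y_1) = -8 - 1 = -9 \\
F(1,1) - d = -2 - 8 = -10 \\
F(2,0) - d = -16 - 8 = -24
\end{cases}
= -9
$$

$$
F(1,2) = \max
\begin{cases}
F(0,1) + s(x_1, y_2) = -8 - 2 = -10 \\
F(0,2) - d = -16 - 8 = -24 \\
F(1,1) - d = -2 - 8 = -10
\end{cases}
= -10
$$

$$
F(2,2) = \max
\begin{cases}
F(1,1) + s(x_2, y_2) = -2 - 1 = -3 \\
F(1,2) - d = -10 - 8 = -18 \\
F(2,1) - d = -9 - 8 = -17
\end{cases}
= -3
$$

Substitution scores used: s(H, P) = -2, s(E, P) = -1, s(H, A) = -2, s(E, A) = -1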

          H    E    A    G    A    W    G    H    E    E
     0   -8  -16  -24  -32  -40  -48  -56  -64  -72  -80
P   -8   -2   -9  -17  -25  -33  -41  -49  -57  -65  -73
A  -16  -10   -3   -4  -12  -20  -28  -36  -44  -52  -60
W  -24  -18  -11   -6   -7  -15   -5  -13  -21  -29  -37
H  -32  -14  -18  -13   -8   -9  -13   -7   -3  -11  -19
E  -40  -22   -8  -16  -16   -9  -12  -15   -7    3   -5
A  -48  -30  -16   -3  -11  -11  -12  -12  -15   -5    2
E  -56  -38  -24  -11   -6  -12  -14  -15  -12   -9    1

Backtracking

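[Figure: the path is traced back from the last cell F(10, 7) = 1 to F(0, 0), building the alignment from its end]

Optimal global alignment:

HEAGAWGHE-E
--P-AW-HEAE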
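Backtracking can be sketched as a walk from F(n, m) back to F(0, 0), at each cell asking which of the three options produced the stored value. The Python below is an illustrative version, not a reference implementation: the names are hypothetical, only the needed BLOSUM50 entries are included, and ties are broken in favour of the diagonal step (an assumption; other tie-breaking rules give other, equally optimal alignments).

```python
# BLOSUM50 entries for the residue pairs in the example (columns H, E, A, G, W).
COLUMNS = "HEAGW"
BLOSUM50_SUBSET = {
    "P": (-2, -1, -1, -2, -4),
    "A": (-2, -1, 5, 0, -3),
    "W": (-3, -3, -3, -3, 15),
    "H": (10, 0, -2, -2, -3),
    "E": (0, 6, -1, -3, -3),
}


def s(a, b):
    return BLOSUM50_SUBSET[b][COLUMNS.index(a)]


def needleman_wunsch(x, y, d=8):
    """Return (score, aligned x, aligned y) for a global alignment."""
    n, m = len(x), len(y)
    # Fill the matrix, boundaries first.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * d
    for j in range(1, m + 1):
        F[0][j] = -j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)

    # Backtracking: rebuild the alignment from its last column to its first.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            top.append(x[i - 1]); bottom.append(y[j - 1])   # x_i aligned to y_j
            i, j = i - 1, j - 1
        elif j > 0 and F[i][j] == F[i][j - 1] - d:
            top.append("-"); bottom.append(y[j - 1])        # y_j aligned to a gap
            j -= 1
        else:
            top.append(x[i - 1]); bottom.append("-")        # x_i aligned to a gap
            i -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    score, a, b = needleman_wunsch("HEAGAWGHEE", "PAWHEAE")
    print(score)  # 1
    print(a)      # HEAGAWGHE-E
    print(b)      # --P-AW-HEAE
```

Storing a pointer per cell while filling the matrix would avoid recomputing the three options during the walk; recomputing keeps the sketch short.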

Smith and Waterman (local alignment)

Two differences from Needleman and Wunsch:

1. F(i, j) is set to 0 whenever all other options are negative, so an alignment can start anywhere in the matrix

2. An alignment can now end anywhere in the matrix

Example:

Sequence 1: H E A G A W G H E E

Sequence 2: P A W H E A E

Scoring parameters: Log-odds ratios

Gap penalty: Linear gap penalty of 8

$$
F(i, j) = \max
\begin{cases}
0 \\
F(i-1, j-1) + s(x_i, y_j) \\
F(i-1, j) - d \\
F(i, j-1) - d
\end{cases}
$$

        H   E   A   G   A   W   G   H   E   E
    0   0   0   0   0   0   0   0   0   0   0
P   0   0   0   0   0   0   0   0   0   0   0
A   0   0   0   5   0   5   0   0   0   0   0
W   0   0   0   0   2   0  20  12   4   0   0
H   0  10   2   0   0   0  12  18  22  14   6
E   0   2  16   8   0   0   4  10  18  28  20
A   0   0   8  21  13   5   0   4  10  20  27
E   0   0   6  13  18  12   4   0   4  16  26

Smith Waterman alignment

Optimal local alignment (score 28):

AWGHE
AW-HE
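A minimal Python sketch of the local variant, assuming the same scores and gap penalty as the global example. Compared with Needleman and Wunsch, it adds 0 as a fourth option, starts backtracking at the highest cell and stops at the first 0. The function and variable names are hypothetical, and only the BLOSUM50 entries needed here are included.

```python
# BLOSUM50 entries for the residue pairs in the example (columns H, E, A, G, W).
COLUMNS = "HEAGW"
BLOSUM50_SUBSET = {
    "P": (-2, -1, -1, -2, -4),
    "A": (-2, -1, 5, 0, -3),
    "W": (-3, -3, -3, -3, 15),
    "H": (10, 0, -2, -2, -3),
    "E": (0, 6, -1, -3, -3),
}


def s(a, b):
    return BLOSUM50_SUBSET[b][COLUMNS.index(a)]


def smith_waterman(x, y, d=8):
    """Return (score, aligned part of x, aligned part of y) for the best local alignment."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]   # boundaries stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,                                      # start afresh
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)
            if F[i][j] > best:                                    # may end anywhere
                best, best_pos = F[i][j], (i, j)

    # Backtrack from the highest cell until a cell with score 0 is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            top.append(x[i - 1]); bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i][j - 1] - d:
            top.append("-"); bottom.append(y[j - 1])
            j -= 1
        else:
            top.append(x[i - 1]); bottom.append("-")
            i -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))  # (28, 'AWGHE', 'AW-HE')
```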

Extended Smith & Waterman

To get multiple local alignments:

• delete regions around best path

• repeat backtracking

[Figure: the Smith-Waterman matrix with the cells around the best path removed; backtracking is repeated on the remaining cells, starting from the highest remaining score, 21]

Second best local alignment:

HEA
HEA

Further Extensions of Dynamic Programming

• Overlap matches

• Alignment with affine gap scores

• Pairwise sequence comparison
  • global / local alignments
  • parameters
  • scoring systems
  • insertions / deletions

• Methods of pairwise sequence alignment
  • dotplot
  • window-based methods
  • dynamic programming
  • algorithm complexity

Multiple Alignment

Progressive Alignment: Methods of Pairwise Comparison

Programs perform global alignments:

• Needleman & Wunsch: (Pileup, Tree, Clustal)

• Word Size Method: (Clustal)

• X. Huang (MAlign) (modified N-W)

Construction of a Guide Tree

Similarity Matrix: displays scores of all sequence pairs (sequences 1 to 5).

The similarity matrix is transformed into a distance matrix . . . . .

Construction of a Guide Tree

Distance Matrix → Guide Tree

The guide tree is built with the Neighbour-Joining Method or with UPGMA (unweighted pair group method with arithmetic mean)
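A compact Python sketch of UPGMA, to show how a distance matrix becomes a guide tree: the two closest clusters are merged repeatedly, and the distance from the merged cluster to every other cluster is the size-weighted average of the two old distances. The function name `upgma`, the nested-tuple tree representation and the four-sequence distance matrix are all invented for this illustration.

```python
def upgma(labels, matrix):
    """Build a guide tree (nested tuples) from a symmetric distance matrix."""
    # Each cluster id maps to (subtree, number of sequences in it).
    clusters = {i: (labels[i], 1) for i in range(len(labels))}
    # Distances between cluster ids, keyed by (smaller id, larger id).
    d = {(i, j): matrix[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)

    while len(clusters) > 1:
        a, b = min(d, key=d.get)                 # closest pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            # Size-weighted mean = average over all original sequence pairs.
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        del d[(a, b)]
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1

    tree, _ = next(iter(clusters.values()))
    return tree


if __name__ == "__main__":
    # Hypothetical distances between four sequences A, B, C and D.
    distances = [
        [0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0],
    ]
    print(upgma(list("ABCD"), distances))  # ('D', ('C', ('A', 'B')))
```

Read from the inside out, the nested tuples give the order of the progressive alignment: A with B first, then C, then D.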

Multiple Alignment

[Figure: guide tree]

Pairwise alignment:

G T C C G - C A G G
T T - C G C C - G G

Columns - once aligned - are never changed . . . . and new gaps are inserted.

T T A C T T C C A G G
G T C C G - - C A G G
T T - C G C - C - G G

Step 3:

A T C T - - C A A T
C T G T C C C T A G

T T - C G C - C - G G
A T C - T - - C A A T
C T G - T C C C T A G
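The rule that aligned columns never change while new gaps are inserted can be sketched as a single operation on an existing alignment: a gap column is added to every row at once. The helper name `insert_gap_column` is an assumption of this example; the rows are the pairwise alignments from the slides.

```python
def insert_gap_column(alignment, column):
    """Insert a gap column before 0-based `column` in every row.

    All rows receive the gap at the same place, so residues that were
    in the same column before are still in the same column afterwards.
    """
    return [row[:column] + "-" + row[column:] for row in alignment]


if __name__ == "__main__":
    pair = ["GTCCG-CAGG", "TT-CGCC-GG"]
    widened = insert_gap_column(pair, 6)
    print(widened)  # ['GTCCG--CAGG', 'TT-CGC-C-GG']

    # Removing the gaps recovers the original sequences unchanged.
    assert [r.replace("-", "") for r in widened] == [r.replace("-", "") for r in pair]

    step3 = insert_gap_column(["ATCT--CAAT", "CTGTCCCTAG"], 3)
    print(step3)    # ['ATC-T--CAAT', 'CTG-TCCCTAG']
```

Where the gap column goes is decided by aligning the group against the next sequence or group; the sketch only shows how the decision is applied.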

Sub-sequence alignments

A K-means like clustering problem

Clustering resulting model

Clustering predictions

Assignments

• Describe a pairwise alignment with a different gap penalization.

• Provide an example and perform a multiple global alignment. Describe the recipe.

• Provide an example and perform a multiple alignment of subsequences. Describe the recipe.

• Algorithms Order (polynomial, exponential, NP)

Algorithmic Complexity

How does an algorithm's performance in CPU time and required memory storage scale with the size of the problem?

Needleman & Wunsch

• Storing (n+1)x(m+1) numbers

• Each number costs a constant number of calculations to compute (three sums and a max)

• Algorithm takes O(nm) memory and O(nm) time

• Since n and m are usually comparable: O(n^2)
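The counts behind these bullets can be written down directly. The small Python helper below (its name, `needleman_wunsch_cost`, is made up for this sketch) returns how many numbers are stored and how many cells are computed with the recurrence for sequences of length n and m.

```python
def needleman_wunsch_cost(n, m):
    """Return (numbers stored, cells computed) for sequences of length n and m."""
    stored = (n + 1) * (m + 1)   # every F(i, j), boundary row and column included
    updates = n * m              # interior cells: three sums and a max each
    return stored, updates


if __name__ == "__main__":
    print(needleman_wunsch_cost(10, 7))    # (88, 70) for HEAGAWGHEE vs PAWHEAE
    # Doubling both lengths quadruples the work: quadratic scaling.
    print(needleman_wunsch_cost(20, 14))   # (315, 280)
```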
