Allegro, a new computer program for linkage analysis. Guy Grebla.



Overview

What is Allegro
Allegro vs. Genehunter
Reduced inheritance vectors
Founder couple reduction
Fast tree traversal
    Formalization
    Calculation of Spairs
Single locus probability calculation (if time permits)


What is Allegro

Allegro is based on Genehunter.

Allegro runs faster than Genehunter due to algorithmic improvements.


Allegro vs. Genehunter (1)

Allegro runs much faster than Genehunter: the speedup is typically 20-40 fold, and in many cases as high as 100 fold.

If necessary, Allegro can cut memory requirements by a factor of 20-60 compared with Genehunter, at a cost of 10-30% in run time.


Allegro vs. Genehunter (2)

Recall that the time complexity of Genehunter is exponential in the size of the pedigree, so it is infeasible to run Genehunter on large pedigrees.

Due to the algorithmic improvements, Allegro can handle significantly larger pedigrees (even though its time complexity is still exponential in the size of the pedigree).


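Reduced inheritance vectors - the idea

The idea is based on the symmetry that exists between the two alleles of a founder.

[Figure: a founder with allele nodes n1 and n2, shown with two inheritance vectors, V=(0,1,1,0) and V=(1,1,0,0), that illustrate the symmetry.]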


Reduced inheritance vectors

For a male (female) founder, the corresponding paternal (maternal) bit of his (her) first child is set to 0 and is not expressed in the reduced vector (it is called hidden).

Result: with m non-founders and f founders, the vector length is reduced from 2m to 2m-f.
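The symmetry can be checked on a toy case. The sketch below is only an illustration under stated assumptions: a hypothetical nuclear family with two children, a vector layout of (paternal bit, maternal bit) per child, and founder alleles numbered 0-1 for the father and 2-3 for the mother. None of these names come from Allegro itself. Inverting every paternal bit only relabels the father's two alleles, so the IBD sharing between the children does not change, and fixing the first child's paternal bit to 0 keeps one representative of each such pair.

```python
from itertools import combinations, product

# Hypothetical layout: v = (p1, m1, p2, m2, ...), one paternal and one
# maternal bit per child. Father carries alleles 0/1, mother carries 2/3.
def child_alleles(v):
    """Return the pair of founder alleles each child carries under v."""
    children = []
    for k in range(0, len(v), 2):
        paternal = v[k]            # father's allele 0 or 1
        maternal = 2 + v[k + 1]    # mother's allele 2 or 3
        children.append((paternal, maternal))
    return children

def ibd_sharing(v):
    """Number of alleles shared IBD by each pair of children."""
    kids = child_alleles(v)
    return [len(set(p) & set(q)) for p, q in combinations(kids, 2)]

def flip_paternal(v):
    """Swap the roles of the father's two alleles: invert every paternal bit."""
    return tuple(b ^ 1 if k % 2 == 0 else b for k, b in enumerate(v))

v = (0, 1, 1, 0)
w = flip_paternal(v)
print(w, ibd_sharing(v), ibd_sharing(w))
```

With this layout, the two vectors shown on the previous slide, (0,1,1,0) and (1,1,0,0), are exactly such a pair.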


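Reduced inheritance vectors (Cont.)

[Figure: three-generation pedigree (generations I, II, III) with founder allele nodes n1 to n6, genotypes a/b, a/b, a/b, a/c and b/c, and the inheritance bits of each non-founder; some bits are shown in brackets.]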


Founder couple reduction

Consider a founder couple such that:
    they have at least one grandchild;
    neither of them is genotyped;
    neither of them is married twice.


Founder couple reduction (Cont.)

v* is like v, except that:
    the corresponding bit of each of the grandchildren is inverted;
    the paternal and maternal bits of each child are switched.

[Figure: fragment of the example pedigree with founder allele nodes n1 to n4, marking the "corresponding bit".]

v and v* have the same probability.


Founder couple reduction - results

With the founder couple reduction, the effective number of bits is 2m-f-c, where c is the number of founder couples satisfying the stated conditions.

Therefore, we have improved by a factor of 2^c over the previous reduction.
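To make the bit counts concrete, here is a minimal sketch that just evaluates the formulas above. The function names and the pedigree sizes (10 non-founders, 4 founders, 1 qualifying couple) are hypothetical and chosen only for illustration.

```python
def vector_bits(m, f, c=0):
    """Effective bits: 2m in full, minus one per founder,
    minus one per founder couple that satisfies the conditions."""
    return 2 * m - f - c

def vector_count(m, f, c=0):
    """Number of inheritance vectors that have to be visited."""
    return 2 ** vector_bits(m, f, c)

# Hypothetical pedigree sizes, for illustration only.
m, f, c = 10, 4, 1
full = 2 ** (2 * m)              # unreduced vectors
reduced = vector_count(m, f)     # after the founder reduction
coupled = vector_count(m, f, c)  # after the founder couple reduction
print(full, reduced, coupled, reduced // coupled)
```

The last printed value is the factor 2^c gained by the founder couple reduction.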


Fast tree traversal

The basic structure of the algorithms implemented in the Genehunter program loops over inheritance vectors in the outermost loop and over people in the pedigree in an inner loop.

Drawback: for vectors that differ only in some branches of the pedigree, part of the calculation is duplicated.


Fast tree traversal (Cont.)

Idea: change the order of the loops to avoid the repeated calculations.


Fast tree traversal - naïve example

Say we want to calculate, for each vector v of length n, the number of 1's in v.

"Genehunter" method: for each vector, calculate the number of 1's by adding each bit of the vector to the sum.

"Allegro" method: pass over the vectors and save partial calculations along the way.


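naïve example - Allegro method

[Figure: binary tree of depth 2. Each edge is labeled with the bit that is added (0 or 1) and each node with the running count of 1's: root 0, second level 0 and 1, leaves 0, 1, 1, 2.]

Fewer additions!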
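The saving can be counted directly. The sketch below is a toy comparison, not Allegro code: both functions and their names are made up for this example. The first sums every vector from scratch, the second descends a binary tree and reuses the parent's partial sum, so each node costs a single addition.

```python
from itertools import product

def count_ones_per_vector(n):
    """'Genehunter'-style: sum the bits of each vector independently."""
    additions = 0
    result = {}
    for v in product((0, 1), repeat=n):
        total = 0
        for bit in v:
            total += bit
            additions += 1
        result[v] = total
    return result, additions

def count_ones_tree(n):
    """'Allegro'-style: descend a binary tree, reusing the parent's sum."""
    additions = 0
    result = {}

    def descend(prefix, partial):
        nonlocal additions
        if len(prefix) == n:
            result[prefix] = partial
            return
        for bit in (0, 1):
            additions += 1               # one addition per tree edge
            descend(prefix + (bit,), partial + bit)

    descend((), 0)
    return result, additions

flat, flat_adds = count_ones_per_vector(10)
tree, tree_adds = count_ones_tree(10)
print(flat == tree, flat_adds, tree_adds)
```

For vectors of length n the first method performs n * 2^n additions, while the tree needs one per edge, 2^(n+1) - 2 in total.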


Fast tree traversal - formalization

For each inheritance vector v, S(v) is known.
We traverse the pedigree from the top down.
When a child is born:
    If it has i hidden bits, there are $2^{2-i}$ possibilities for its bits.
    For each possibility, the inheritance vector is appropriately updated and the branch is descended.
We add a bit b to update vector v to v+.
D(v) is a collection of data.
$N = 2^{2m-f}$ is the number of possible inheritance vectors.


Fast tree traversal - formalization (2)

Recursive algorithm:

addbit(v, D, b):
    for b = 0, 1 do
        set v+ = (v, b) and calculate D+ = D(v+)
        if there are more bits, addbit(v+, D+, next bit),
        else D+ contains data for s(v+)

If the calculations of D+ and s are both O(1), then the total time complexity of the calculation is O(N).
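The recursion can be sketched in a few lines. This is a generic illustration under assumptions, not Allegro's implementation: `traverse`, `update` and `score` are hypothetical names, and the toy data D = (last bit, number of bit changes) is chosen only because it can be updated in O(1) when a bit is appended.

```python
def traverse(n_bits, d0, update, score):
    """Visit all vectors of n_bits bits (n_bits >= 1), carrying data D.

    update(D, b) returns D+ for the extended vector (v, b);
    score(D) extracts s(v) once the vector is complete.
    """
    results = {}

    def addbit(v, d):
        for b in (0, 1):
            v_plus = v + (b,)
            d_plus = update(d, b)        # O(1) work per tree node
            if len(v_plus) < n_bits:
                addbit(v_plus, d_plus)   # more bits: descend the branch
            else:
                results[v_plus] = score(d_plus)

    addbit((), d0)
    return results

# Toy data: D = (last bit seen, number of positions where the bit changed).
def update(d, b):
    last, changes = d
    if last is None:
        return (b, 0)
    return (b, changes + (b != last))

results = traverse(4, (None, 0), update, score=lambda d: d[1])
print(results[(0, 1, 1, 0)], len(results))
```

Each tree node does a constant amount of work and the tree has fewer than 2N nodes, which is where the O(N) total comes from.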


Example – calculation of Spairs

$\phi_{ij}(p,q) = 1$ if allele i of p and allele j of q are IBD, and 0 otherwise.

$S_{pq}(v) = \sum_{i=0}^{1} \sum_{j=0}^{1} \phi_{ij}(p,q)$

$S_{pairs}(v) = \sum_{(p,q)} S_{pq}(v)$, where the sum runs over all pairs (p,q) of affecteds.

$k_i$: the number of times founder allele i turns up among the affecteds.

s: the value of Spairs for the traversed portion.

$D = (s, k_1, k_2, \ldots, k_{2f})$


Example (Cont.)

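Let i and j be the founder alleles that the added person carries under v.

When an unaffected person is added, do nothing ($s^+ = s$, $k_i^+ = k_i$, $k_j^+ = k_j$).

When an affected person is added, perform:
    $s^+ \leftarrow s + k_i + k_j$
    $k_i^+ \leftarrow k_i + 1$
    $k_j^+ \leftarrow k_j + 1$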
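The update rule is short enough to run by hand. The sketch below is a minimal illustration: `add_person` is a hypothetical helper, and the allele pairs (1,3), (3,4), (4,5), (4,6) are an assumption, chosen because they are consistent with the k values in the worked example on the next slide.

```python
from collections import defaultdict

def add_person(s, k, alleles, affected):
    """Apply the update for one added person and return (s, k).

    alleles is the pair (i, j) of founder alleles the person carries under v.
    k is modified in place; a real tree traversal would copy or undo it.
    """
    if not affected:
        return s, k            # unaffected: nothing changes
    i, j = alleles
    s += k[i] + k[j]           # IBD matches with the earlier affecteds
    k[i] += 1
    k[j] += 1
    return s, k

s, k = 0, defaultdict(int)
trace = []
# Assumed allele pairs of four affecteds, in the order they are added.
for alleles in [(1, 3), (3, 4), (4, 5), (4, 6)]:
    s, k = add_person(s, k, alleles, affected=True)
    trace.append(s)
print(trace, dict(k))
```

After the second person the state is s=1, k1=1, k3=2, k4=1, and the two later additions give s=2 and s=4, matching the example.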


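Example (Cont.)

[Figure: three-generation pedigree (generations I, II, III) with founder allele nodes n1 to n6, genotypes a/b, a/b, a/b, a/c and b/c, and the inheritance bits of each non-founder; some bits are shown in brackets.]

V=(0,1,1,1,1)

Init (no vector bits): s=1, k1=1, k3=2, k4=1

III1 is added: s=2, k1=1, k3=2, k4=2, k5=1

III2 is added: s=4, k1=1, k3=2, k4=3, k5=1, k6=1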


Spairs calculation - Genehunter vs. Allegro

Genehunter calculates Spairs by calculating Spq for each affected pair and adding it to Spairs.

This process requires $O(N\alpha^2)$ time, where $\alpha$ is the number of affecteds.

With the O(N) tree traversal, we saved a factor of $\alpha^2$ (!)
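For comparison, the pairwise computation follows the definition of Spq directly. This is a sketch of the definition, not Genehunter's actual code: each affected is represented by the pair of founder alleles he or she carries under a given vector, and the allele pairs used here are assumed values for illustration.

```python
from itertools import combinations

def s_pq(p, q):
    """Count allele pairs (one from p, one from q) that are IBD,
    i.e. copies of the same founder allele."""
    return sum(1 for a in p for b in q if a == b)

def s_pairs_pairwise(affecteds):
    """Sum S_pq over all pairs of affecteds: the number of pair
    evaluations grows quadratically with the number of affecteds."""
    return sum(s_pq(p, q) for p, q in combinations(affecteds, 2))

# Assumed founder alleles carried by four affecteds under one vector.
affecteds = [(1, 3), (3, 4), (4, 5), (4, 6)]
print(s_pairs_pairwise(affecteds))
```

This work is repeated for every one of the N vectors, whereas the incremental update touches each added person once per tree node.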


Additional improvements

Allegro uses the FFT for matrix multiplication; some classical computational techniques are used to speed up the FFT by a factor of three or four.


References

“Fast multipoint linkage analysis and the program Allegro”, Daniel F. Gudbjartsson, Kristjan Jonasson, Michael L. Frigge, Augustine Kong.

“Allegro, a new computer program for linkage analysis”, Gudbjartsson DF, Jonasson K, Frigge ML, Kong A. Nat Genet. 2000 May;25(1):12-3.


BACKUP


Single locus probability calculation

Goal: compute $\Pr[m_l \mid v_l]$ at locus l, for every vector $v_l$, where:
    $m_l$ is the marker data at this locus (evidence);
    $v_l$ is a certain inheritance vector.


Single locus probability calculation (Cont.)

In general: $p(m_l \mid v_l) = \sum_{a \in P} \prod_{i=1}^{2f} p(a_i)$, where P is the set of possible allele assignments $a = (a_1, \ldots, a_{2f})$ to $(n_1, \ldots, n_{2f})$.

This probability may be calculated for each $v_l$ using fast tree traversal.

Denote $p(m_l \mid v_l)$ as q(v).
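The sum can be evaluated by brute force on a tiny case, which is useful as a reference when checking a faster method. The sketch below assumes that "possible" means compatible with every observed genotype; the function name, the allele frequencies and the observations are all hypothetical. Each observation pairs the two founder nodes (i, j) that a genotyped person inherits from under v with that person's unordered genotype.

```python
from itertools import product
from math import prod

def q_brute_force(n_nodes, freqs, observations):
    """Sum prod p(a_i) over all allele assignments to the founder nodes
    that are compatible with every observed genotype."""
    total = 0.0
    for a in product(freqs, repeat=n_nodes):
        compatible = all(
            sorted((a[i], a[j])) == sorted(genotype)
            for (i, j), genotype in observations
        )
        if compatible:
            total += prod(freqs[x] for x in a)
    return total

# Hypothetical allele frequencies and two founders (four founder nodes).
freqs = {"a": 0.5, "b": 0.3, "c": 0.2}
one_child = [((0, 2), ("a", "b"))]
two_children = one_child + [((0, 3), ("a", "c"))]
print(q_brute_force(4, freqs, one_child))
print(q_brute_force(4, freqs, two_children))
```

The enumeration is exponential in the number of founder nodes, so it only serves as a check on small pedigrees.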


Single locus probability - notations

[Figure: the three-generation pedigree (generations I, II, III) with genotypes a/b, a/b, a/b, a/c and b/c; the allele nodes n1 to n6 are marked "Founder nodes".]

Assume our founder nodes are numbered: node ni is numbered i.


Single locus probability - notations (2)

Founder nodes are classified into 3 disjoint sets:
    A: assigned nodes.
    E: contains edges; each edge is labeled with 2 distinct alleles.
    U: unassigned nodes.

$a_i$: the allele assigned to i ($i \in A$).


Single locus probability - initialization

Init:
    E ← nodes of genotyped founders (edges).
    U ← rest of the founder nodes.
    A ← nil (empty).
    q(v) ← 0.

Goal: build a founder graph. From the graph we can calculate q(v).


Single locus probability - algorithm

When a person genotyped a / b is added:

The value of v (so far) determines the sources of the alleles of the person among the founders.

Denote the corresponding founders by i and j, and consider the edge (i,j).


Single locus probability - algorithm (2)

6 options for edge (i,j):

[Figure: the six cases, numbered 1 to 6, drawn according to which of the sets A, U and E the nodes i and j belong to.]


Single locus probability - case by case

Case 1: Put (i,j) in E, remove i and j from U.

Case 2: Check whether $\{a, b\} = \{a_i, a_j\}$.

Case 3: Check if $a_i$ is one of a and b, and if it is, assign the other to $a_j$ and move j from U to A.


Single locus probability - case by case (2)

Case 4:
    Check if $a_i$ is one of a and b.
    Check if the other one is consistent with the labeling of an edge (j,k) in E, and if it is consistent, force the assignment.

Cases 5, 6: May need another loop.
    Set $a_i = a$, $a_j = b$, check and handle consistency.
    Set $a_i = b$, $a_j = a$, check and handle consistency.


Single locus probability - algorithm (3)

After the last bit of the vector has been added, the probability calculation needs a product over the edges in E.

Let $(a_e, b_e)$ be the allele labels of edge $e \in E$.

q(v) is updated by adding to it: $\prod_{i \in A} p(a_i) \prod_{e \in E} 2\,p(a_e)\,p(b_e)$
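The term that is added to q(v) can be written as a small function. This is a sketch under assumptions: `graph_term`, the dictionary representation of A and the list representation of E are hypothetical choices, and the allele frequencies are made up. Nodes left in U contribute a factor of 1, because the frequencies of all alleles sum to 1.

```python
from math import prod

def graph_term(assigned, edges, freqs):
    """Contribution of one consistent founder graph to q(v).

    assigned maps each node in A to its allele; edges lists the allele
    labels (a_e, b_e) of each edge in E. The factor 2 reflects the two
    ways of placing the two alleles on the endpoints of an edge.
    """
    node_part = prod(freqs[allele] for allele in assigned.values())
    edge_part = prod(2 * freqs[a] * freqs[b] for a, b in edges)
    return node_part * edge_part

# Hypothetical allele frequencies and a graph with one assigned node
# and one remaining edge.
freqs = {"a": 0.5, "b": 0.3, "c": 0.2}
term = graph_term({0: "a"}, [("b", "c")], freqs)
print(term)
```

For a single edge labeled a / b and nothing assigned, the term is 2 p(a) p(b), which is what a direct enumeration over the two endpoint assignments gives.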