Towards optimal distance functions for stochastic substitution models


Ilan Gronau, Shlomo Moran, Irad Yavneh

Technion, Israel

Preview: The Phylogenetic Reconstruction Problem

[Figure: DNA sequences of the species, e.g. AATCCTG, ATAGCTG, GAACGTA, …]

Evolution is modeled by a Tree

(All our sequences are DNA sequences, consisting of {A,G,C,T})

[Figure: the sequences placed at the leaves of the evolutionary tree]

Phylogenetic Reconstruction

B : AATCCTG
C : ATAGCTG
A : AATGGGC
D : GAACGTA
E : AAACCGA
J : ACCGTTG
G : TCTGGGA
H : TCCGGAA
I : AGCCGTG
F : GGGGATT

Goal: reconstruct the ‘true’ tree as accurately as possible


Road Map
• Distance based reconstruction algorithms
• The Kimura 2 Parameter (K2P) Model
• Performance of distance methods in the K2P model
• Substitution models and substitution rate functions
• Properties of SR functions
• Unified Substitution Models
• Optimizing Distances in the K2P model
• Simulation results

Distance Based Phylogenetic Reconstruction: Exact vs. Noisy Distances

[Figure: edge-weighted ‘true’ tree → distance estimation using finite sampling → reconstructed tree]

Exact (additive) distances between species: $D = \big(d(u,v)\big)_{u,v \in S}$.

Estimated distances, obtained from finite sampling: $\hat D = \big(\hat d(u,v)\big)_{u,v \in S}$.

Challenge: minimize the effect of the noise introduced by the sampling.

Road Map (next): The Kimura 2 Parameter (K2P) Model

The Kimura 2 Parameter (K2P) model [Kimura80]: each edge corresponds to a “rate matrix”

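Nucleotides form two classes, the purines {A, G} and the pyrimidines {C, T}. Substitutions within a class are transitions; substitutions between the classes are transversions.

K2P generic rate matrix (rows and columns ordered A, G, C, T; each diagonal entry makes its row sum to zero):

$$\begin{pmatrix} -(\alpha+2\beta) & \alpha & \beta & \beta \\ \alpha & -(\alpha+2\beta) & \beta & \beta \\ \beta & \beta & -(\alpha+2\beta) & \alpha \\ \beta & \beta & \alpha & -(\alpha+2\beta) \end{pmatrix}$$

Transitions/transversions ratio: $R = \alpha / 2\beta$.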

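K2P standard distance: $\Delta_{total}$ = total substitution rate.

The total substitution rate of a K2P rate matrix $R_{uv}$ is
$$\Delta_{total}(R_{uv}) = \tfrac{1}{4}\big(\text{sum of off-diagonal entries of } R_{uv}\big) = \alpha + 2\beta.$$

This is the expected number of mutations per site. It is an additive distance: if the edge (u,v) has total rate α + 2β and the edge (v,w) has total rate α′ + 2β′, the total rate from u to w is (α + α′) + 2(β + β′).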
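A minimal Python sketch of the total-rate computation, assuming the nucleotide order A, G, C, T and diagonal entries chosen so that each row sums to zero; the function names are hypothetical. It also checks additivity along a path of two edges with different K2P parameters.

```python
def k2p_rate_matrix(alpha: float, beta: float) -> list[list[float]]:
    """K2P rate matrix with rows/columns ordered A, G, C, T.

    Transitions (A<->G, C<->T) have rate alpha, each transversion has rate beta,
    and the diagonal entry makes every row sum to zero.
    """
    diag = -(alpha + 2 * beta)
    return [
        [diag, alpha, beta, beta],
        [alpha, diag, beta, beta],
        [beta, beta, diag, alpha],
        [beta, beta, alpha, diag],
    ]


def total_rate(rate_matrix: list[list[float]]) -> float:
    """Delta_total(R) = (1/4) * (sum of the off-diagonal entries of R)."""
    off_diagonal = sum(
        rate_matrix[i][j] for i in range(4) for j in range(4) if i != j
    )
    return off_diagonal / 4


# A path u -> v -> w whose two edges have different K2P parameters.
r_uv = k2p_rate_matrix(alpha=0.30, beta=0.05)  # alpha + 2*beta = 0.40
r_vw = k2p_rate_matrix(alpha=0.10, beta=0.02)  # alpha' + 2*beta' = 0.14
path_rate = total_rate(r_uv) + total_rate(r_vw)  # (alpha+alpha') + 2(beta+beta')
print(round(total_rate(r_uv), 6), round(path_rate, 6))  # 0.4 0.54
```

Dividing by 4 averages over the four (equally frequent) nucleotides, so the result is the per-site rate α + 2β.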

Estimation of $\Delta_{total}(R_{uv}) = d_{K2P}(u,v)$ is a noisy stochastic process:

u: A A C A … G T C T T C G A G G C C C
v: A G C A … G C C T A T G C G A C C T

The K2P total rate “distance correction” procedure maps the aligned sequences to an estimate $\hat d_{K2P}(u,v) = \hat\alpha + 2\hat\beta$.
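A minimal Python sketch of the distance correction step. It assumes the standard closed form of Kimura's two-parameter correction in terms of the observed transition and transversion fractions (which matches the eigenvalue formulas for λ_P and μ_P used later in this talk); the function name is hypothetical, and gaps or ambiguous bases are not handled.

```python
import math

PURINES = frozenset("AG")


def k2p_distance(u: str, v: str) -> float:
    """Standard K2P distance correction for two aligned DNA sequences (A, C, G, T only).

    P = fraction of sites that differ by a transition (A<->G, C<->T),
    Q = fraction of sites that differ by a transversion (purine <-> pyrimidine).
    With lam = 1 - 2Q and mu = 1 - 2P - Q, the estimate is
    d_K2P = -(ln(lam) + 2*ln(mu)) / 4, i.e. alpha_hat + 2*beta_hat.
    Returns math.inf when the estimate saturates (lam <= 0 or mu <= 0).
    """
    if len(u) != len(v) or not u:
        raise ValueError("expected two non-empty aligned sequences of equal length")
    transitions = transversions = 0
    for a, b in zip(u.upper(), v.upper()):
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1    # both purines or both pyrimidines
        else:
            transversions += 1  # one purine, one pyrimidine
    n = len(u)
    p, q = transitions / n, transversions / n
    lam, mu = 1 - 2 * q, 1 - 2 * p - q
    if lam <= 0 or mu <= 0:
        return math.inf
    return -(math.log(lam) + 2 * math.log(mu)) / 4


print(round(k2p_distance("AAAA", "AGCA"), 4))  # 0.8664
```

The `math.inf` return value makes “site saturation” explicit instead of raising a domain error inside `math.log`.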

Road Map (next): Performance of distance methods in the K2P model

Check the performance of K2P “standard” distances in resolving quartet splits

There are 3 possible quartet topologies: AB|CD, AC|BD and AD|BC.

• Distance methods reconstruct the true split by the 4-point condition. For exact distances on a quartet with split AB|CD,
$$d_{K2P}(A,B) + d_{K2P}(C,D) + 2w_{sep} = d_{K2P}(A,C) + d_{K2P}(B,D) = d_{K2P}(A,D) + d_{K2P}(B,C),$$
where $w_{sep}$ is the weight of the middle (separating) edge.

The 4-point condition for noisy distances is:
$$\hat d_{K2P}(A,B) + \hat d_{K2P}(C,D) < \min\{\hat d_{K2P}(A,C) + \hat d_{K2P}(B,D),\ \hat d_{K2P}(A,D) + \hat d_{K2P}(B,C)\}$$
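The 4-point condition can be sketched in a few lines of Python. The dictionary keyed by unordered leaf pairs and the function names are illustrative assumptions; the example uses exact additive distances (external edges of weight 1, middle edge w_sep = 0.5) and then a noisy version in which the estimation error exceeds 2·w_sep.

```python
def four_point_split(d: dict[frozenset, float]) -> str:
    """Return the quartet split selected by the 4-point condition.

    d maps each unordered leaf pair, e.g. frozenset({"A", "B"}), to a distance.
    The split xy|zw with the smallest d(x,y) + d(z,w) is returned; ties are
    broken in the fixed order AB|CD, AC|BD, AD|BC.
    """
    splits = {
        "AB|CD": (("A", "B"), ("C", "D")),
        "AC|BD": (("A", "C"), ("B", "D")),
        "AD|BC": (("A", "D"), ("B", "C")),
    }
    sums = {
        name: d[frozenset(pair1)] + d[frozenset(pair2)]
        for name, (pair1, pair2) in splits.items()
    }
    return min(sums, key=sums.get)


def pair(x: str, y: str) -> frozenset:
    return frozenset((x, y))


# Exact additive distances: external edges of weight 1, middle edge w_sep = 0.5.
exact = {pair("A", "B"): 2.0, pair("C", "D"): 2.0,
         pair("A", "C"): 2.5, pair("A", "D"): 2.5,
         pair("B", "C"): 2.5, pair("B", "D"): 2.5}
print(four_point_split(exact))  # AB|CD

# Noise that lowers d(A,C) + d(B,D) by more than 2*w_sep flips the answer.
noisy = dict(exact)
noisy[pair("A", "C")] -= 0.6
noisy[pair("B", "D")] -= 0.6
print(four_point_split(noisy))  # AC|BD
```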

We evaluate the accuracy of the K2P distance estimation by a split resolution test:

[Figure: the template quartet, with edge lengths 10t (external edges) and t (middle edge) and the same K2P rate matrix (α, β) on every edge]

t is “evolutionary time”. The diameter of the quartet is 22t.

Phase A: simulate evolution. A sequence (e.g. CCCGGAGCTTCTG…ACAA) evolves along the quartet, producing sequences at the leaves A, B, C, D.

Phase B: reconstruct the split by the 4p condition:
1. estimate the distances between the leaf sequences, $\hat D(i,j) = \hat d_{K2P}(s_i, s_j)$;
2. apply the 4p condition;
3. check whether the correct split was found.

Repeat this process 10,000 times, and count the number of failures.

The split resolution test was applied to the model quartet with various diameters. For each diameter, we mark the fraction (percentage) of the simulations in which the 4p condition failed.
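The split resolution test can be sketched in Python as below. Assumptions, all made for this sketch: the quartet ((A,B),(C,D)) has external edges of length `ext` and a middle edge of length `mid`, all edges share the same K2P parameters, the root sequence is uniform, and each edge uses the closed form of exp(tR) for K2P. All names are hypothetical; this is not the authors' simulation code.

```python
import math
import random

BASES = "AGCT"
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def evolve(seq: str, alpha: float, beta: float, t: float, rng: random.Random) -> str:
    """Evolve seq along one K2P edge of length t, using closed-form probabilities."""
    lam = math.exp(-4 * beta * t)
    mu = math.exp(-2 * (alpha + beta) * t)
    p_ti = (1 + lam - 2 * mu) / 4  # probability of the (single) transition
    p_tv = (1 - lam) / 2           # probability of either of the two transversions
    out = []
    for x in seq:
        r = rng.random()
        if r < p_ti:
            out.append(TRANSITION[x])
        elif r < p_ti + p_tv:
            out.append(rng.choice(TRANSVERSIONS[x]))
        else:
            out.append(x)
    return "".join(out)


def k2p_distance(u: str, v: str) -> float:
    """Standard K2P distance; math.inf if the estimate saturates."""
    n = len(u)
    p = sum(TRANSITION[a] == b for a, b in zip(u, v)) / n
    q = sum(a != b and TRANSITION[a] != b for a, b in zip(u, v)) / n
    lam, mu = 1 - 2 * q, 1 - 2 * p - q
    if lam <= 0 or mu <= 0:
        return math.inf
    return -(math.log(lam) + 2 * math.log(mu)) / 4


def split_failure_rate(alpha, beta, ext, mid, n_sites, trials, seed=0):
    """Fraction of trials in which the 4p condition misses the true split AB|CD.

    Quartet ((A,B),(C,D)) with external edges of length ext and a middle edge
    of length mid; all edges share the K2P parameters alpha, beta.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        x = "".join(rng.choice(BASES) for _ in range(n_sites))  # uniform root
        y = evolve(x, alpha, beta, mid, rng)                    # other end of middle edge
        s = {"A": evolve(x, alpha, beta, ext, rng), "B": evolve(x, alpha, beta, ext, rng),
             "C": evolve(y, alpha, beta, ext, rng), "D": evolve(y, alpha, beta, ext, rng)}

        def d(i, j):
            return k2p_distance(s[i], s[j])

        ab_cd = d("A", "B") + d("C", "D")
        if not ab_cd < min(d("A", "C") + d("B", "D"), d("A", "D") + d("B", "C")):
            failures += 1
    return failures / trials


print(split_failure_rate(alpha=0.2, beta=0.01, ext=1.0, mid=0.1, n_sites=500, trials=100))
```

Sweeping `ext` and `mid` at a fixed ratio and plotting the returned fraction against the quartet diameter gives curves of the kind shown in the next figures; ties and saturated (infinite) AB|CD sums are counted as failures.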

[Figure: performance of the K2P standard distance method in resolving quartets, small diameters 0.01 to 0.2; x-axis: quartet diameter (total rate between furthest leaves); template quartet, R = 10]

[Figure: performance for larger diameters 0.2 to 2; x-axis: quartet diameter (mutation rate between furthest leaves); quartet ratio 0.1, R = 10; annotation: “site saturation”]

When β < α, we can postpone the “site saturation” effect. For this, we use another distance function for the same model, $\Delta_{tv}$, which counts only transversions (substitutions between the purines {A, G} and the pyrimidines {C, T}). This is actually the CFN model [Cavender78, Farris73, Neyman71].

Apply the same split resolution test on the transversions-only distance $\hat d_{tr}(u,v)$, computed by a transversions-only distance correction procedure.

Transversions only performs better on large rates and worse on small rates.

[Figure: performance of distance methods in resolving quartets, R = 10; x-axis: quartet diameter, 0 to 2; curves: transversions only, total K2P rate]

Conclusion: Distance based reconstruction methods should be adaptive: find a distance function d which is good for the input, and use the estimated distances $\hat D = \big(\hat d(u,v)\big)$.

We take a small step in this direction:

Input: an alignment of the sequences at u, v.

Output: a (near-)optimal distance function, which minimizes the expected noise in the estimation procedure.

Example: an adaptive distance method (max-optimal) based on this talk:

[Figure: failures vs. quartet diameter (0 to 2) for max-optimal, standard K2P and transversions only]

Road Map (next): Substitution models and substitution rate functions

Steps in finding optimal distance functions:
1. Define the substitution model.
2. Characterize the available distance functions.
3. Select a function which is optimal for the input sequences, i.e., least sensitive to stochastic noise.

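From Rate matrices to Substitution matrices

[Figure: evolution of a finite sequence u: A A C A … G T C T T C G A G G C C C along an edge with a K2P rate matrix; the model parameters α, β are unknown]

Rate matrices imply stochastic substitution matrices, $P_{uv} = \exp(R_{uv})$. For K2P, $P_{uv}$ has the same symmetric shape as the rate matrix (order A, G, C, T):
$$P_{uv} = \begin{pmatrix} 1-p_\alpha-2p_\beta & p_\alpha & p_\beta & p_\beta \\ p_\alpha & 1-p_\alpha-2p_\beta & p_\beta & p_\beta \\ p_\beta & p_\beta & 1-p_\alpha-2p_\beta & p_\alpha \\ p_\beta & p_\beta & p_\alpha & 1-p_\alpha-2p_\beta \end{pmatrix},$$
where $p_\alpha$ is the probability of the transition and $p_\beta$ the probability of each transversion.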

Each edge (u, v) carries a stochastic substitution matrix $P_{uv}$.

A substitution model M: a set of stochastic substitution matrices, closed under matrix product:
$$P, Q \in M \Rightarrow PQ \in M.$$

Motivation for the definition: $P_{uw} = P_{uv}P_{vw}$.

Also required: $P > 0$ (all entries positive) and $0 < \det(P) < 1$ for all $P \in M$.
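A small Python check of the closure property for K2P. It assumes the closed form of exp(tR) for the K2P rate matrix (unit edge time by default) and the order A, G, C, T; the helper names are hypothetical. It verifies that the product of two K2P substitution matrices is again the K2P matrix whose parameters are the sums of the two edges' parameters.

```python
import math


def k2p_substitution_matrix(alpha: float, beta: float, t: float = 1.0) -> list[list[float]]:
    """P = exp(t*R) for the K2P rate matrix R (order A, G, C, T), in closed form."""
    lam = math.exp(-4 * beta * t)           # eigenvalue lambda_P
    mu = math.exp(-2 * (alpha + beta) * t)  # eigenvalue mu_P (multiplicity 2)
    p_a = (1 + lam - 2 * mu) / 4            # transition probability p_alpha
    p_b = (1 - lam) / 4                     # probability of each transversion p_beta
    s = 1 - p_a - 2 * p_b                   # probability of no substitution
    return [
        [s, p_a, p_b, p_b],
        [p_a, s, p_b, p_b],
        [p_b, p_b, s, p_a],
        [p_b, p_b, p_a, s],
    ]


def matmul(x: list[list[float]], y: list[list[float]]) -> list[list[float]]:
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


# P_uw = P_uv P_vw: the product is again a K2P matrix, whose parameters add up.
p_uv = k2p_substitution_matrix(alpha=0.30, beta=0.05)
p_vw = k2p_substitution_matrix(alpha=0.10, beta=0.02)
p_uw = matmul(p_uv, p_vw)
expected = k2p_substitution_matrix(alpha=0.40, beta=0.07)
print(max(abs(a - b) for ra, rb in zip(p_uw, expected) for a, b in zip(ra, rb)) < 1e-12)  # True
```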

Model tree over M = <Tree Topology> + a substitution matrix from M on each edge, with the uniform distribution at the root.

Distances for a given model are defined by Substitution Rate functions:

$\Delta: M \to \mathbb{R}$ is an SR function for M iff for all P, Q in M:
1. $\Delta(PQ) = \Delta(P) + \Delta(Q)$ (additivity)
2. $\Delta(P) > 0$ (positivity)

Road Map (next): Properties of SR functions

1st question: Given a model M, what are its SR functions?

SR functions are additive functions which are strictly positive.

Example 1: The logdet function [Lake94, Steel93] is an SR function for the most general model, $M_{univ}$:

$M_{univ} = \{P : P \text{ is a stochastic } 4\times 4 \text{ matrix},\ 0 < \det(P) < 1\}$.

The function $\Delta_{logdet}(P) = -\ln(\det(P))$ is an additive function for $M_{univ}$, since $\det(PQ) = \det(P)\det(Q)$; and because $0 < \det(P) < 1$, $d_{logdet}(u,v) = -\ln(\det(P_{uv}))$ is an SR function for $M_{univ}$.
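A self-contained Python check of the additivity of logdet. The matrices P and Q are arbitrary illustrative stochastic matrices (not data from the talk), and `det`/`matmul` are small hand-written helpers rather than a library API.

```python
import math


def det(m: list[list[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def logdet(p: list[list[float]]) -> float:
    """Delta_logdet(P) = -ln(det(P)), positive whenever 0 < det(P) < 1."""
    d = det(p)
    if not 0 < d < 1:
        raise ValueError("logdet requires 0 < det(P) < 1")
    return -math.log(d)


# Two illustrative (non-K2P) stochastic matrices; rows sum to 1.
P = [[0.90, 0.05, 0.03, 0.02], [0.04, 0.90, 0.03, 0.03],
     [0.02, 0.03, 0.90, 0.05], [0.03, 0.02, 0.05, 0.90]]
Q = [[0.80, 0.10, 0.05, 0.05], [0.10, 0.80, 0.05, 0.05],
     [0.05, 0.05, 0.80, 0.10], [0.05, 0.05, 0.10, 0.80]]
print(math.isclose(logdet(matmul(P, Q)), logdet(P) + logdet(Q)))  # True
```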

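Example 2: The log eigenvalue function.

Assume a model M with the following property: there is a vector v which is an eigenvector of every P ∈ M, i.e., $Pv = \lambda_v(P)\,v$.

Then $PQv = \lambda_v(P)\lambda_v(Q)\,v$, so the function $\Delta_v(P) = -\ln(|\lambda_v(P)|)$ is an additive function for M [e.g. Gu&L…].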

Both the “logdet” and the “log eigenvalue” functions are special cases of a general technique, the Generalized logdet, which is given below.

Definition: Let P be a 4 by 4 matrix. A subspace H of $\mathbb{R}^4$ is P-invariant if $PH \subseteq H$. If H is P-invariant, then P defines a linear transformation $P|_H$ on H, and $\det(P|_H)$ is the determinant of this linear transformation.

Lemma GLD (Generalized LogDet): If H is P-invariant for all P ∈ M, then
$$\Delta_H(P) = -\ln(|\det(P|_H)|)$$
is an additive function for M.

Linearity of additive functions: if $\Delta_1$ and $\Delta_2$ are additive functions for M, so is $c_1\Delta_1 + c_2\Delta_2$.

The set of additive functions for M therefore forms a vector space, denoted $AD_M$. Dimension($AD_M$) is the dimension of this vector space; a large dimension implies more “independent” distance functions.

If dimension($AD_M$) = 1, then M admits a single distance function (up to multiplication by a scalar), and selecting the best SR function in such a model is trivial. Thus, the adaptive approach is useful only when dimension($AD_M$) > 1.

Road Map (next): Unified Substitution Models, the models for which the adaptive approach is potentially useful

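Unified Substitution Models:

Def: A model M is unified if there is a matrix U such that for each P ∈ M it holds that:
$$U^{-1}PU = \begin{pmatrix} 1 & ? & ? & ? \\ 0 & \lambda_1(P) & ? & ? \\ 0 & 0 & \lambda_2(P) & ? \\ 0 & 0 & 0 & \lambda_3(P) \end{pmatrix}$$
(entries marked ? are arbitrary).

Using Lemma GLD, we have:

Thm: if M is unified, then for each 3 constants $c_1, c_2, c_3$, the function
$$\Delta(P) = \sum_{i=1}^{3} c_i \ln(|\lambda_i(P)|)$$
is an additive function for M.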

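Strongly Unified Substitution Models

Def: A model M is strongly unified if there is a matrix U such that for each P ∈ M it holds that:
$$U^{-1}PU = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \lambda_1(P) & 0 & 0 \\ 0 & 0 & \lambda_2(P) & 0 \\ 0 & 0 & 0 & \lambda_3(P) \end{pmatrix}$$

Thm: if M is strongly unified, then the additive functions of M are of the form
$$\Delta(P) = \sum_{i} c_i \ln(\lambda_i(P)).$$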

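A simple strongly unified model: the Jukes Cantor model [1969]

$$M_{JC} = \left\{ \begin{pmatrix} 1-3p & p & p & p \\ p & 1-3p & p & p \\ p & p & 1-3p & p \\ p & p & p & 1-3p \end{pmatrix} : 0 < p < 0.25 \right\}$$

$M_{JC}$ is strongly unified by a fixed matrix U: for all P ∈ $M_{JC}$,
$$U^{-1}PU = \mathrm{diag}(1, \lambda_p, \lambda_p, \lambda_p), \qquad \lambda_p = 1 - 4p.$$

Claim: dimension($AD_{M_{JC}}$) = 1.

Hence the adaptive approach is irrelevant to this model.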

Another model M for which dimension($AD_M$) = 1

Recall: $M_{univ}$ consists of all DNA substitution matrices (stochastic 4×4 matrices with 0 < det(P) < 1).

Claim 2: dimension($AD_{M_{univ}}$) = 1.

This means that all the additive functions of $M_{univ}$ are proportional to logdet. Hence the adaptive approach is irrelevant also to this model.

Luckily, the additive functions of “intermediate” unified models have dimension > 1, hence the adaptive approach is useful for them. Next we return to the Kimura 2 parameter model.

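Back to K2P: for every K2P substitution matrix P,
$$U^{-1}PU = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \lambda_P & 0 & 0 \\ 0 & 0 & \mu_P & 0 \\ 0 & 0 & 0 & \mu_P \end{pmatrix},$$
where U is the U of the JC model and
$$\lambda_P = 1 - 4p_\beta = e^{-4\beta}, \qquad \mu_P = 1 - 2p_\beta - 2p_\alpha = e^{-2\alpha-2\beta},$$
with $0 < \lambda_P < 1$ and $0 < \mu_P < 1$ ($p_\alpha$ is the transition probability and $p_\beta$ the probability of each transversion in P).

Conclusion: dimension($AD_{M_{K2P}}$) = 2.

The functions $\Delta_\lambda(P) = -\ln(\lambda_P)$ and $\Delta_\mu(P) = -\ln(\mu_P)$ form a basis of $AD_{K2P}$. Each positive function of the form
$$\Delta(P) = c_1\ln(\lambda_P) + c_2\ln(\mu_P)$$
is an SR function for the K2P model.

The standard “total rate” distance is:
$$\Delta_{K2P}(P) = -\big(\ln(\lambda_P) + 2\ln(\mu_P)\big)/4 = \Delta_{logdet}(P)/4,$$
where $\Delta_{logdet}(P) = -\ln(\det(P)) = -\big(\ln(\lambda_P) + 2\ln(\mu_P)\big)$.

The “transversion only” distance is:
$$\Delta_{tr}(P) = -\ln(\lambda_P)/4.$$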
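The eigen-structure above can be checked numerically. This Python sketch assumes that U may be taken to be the 4×4 Hadamard matrix with nucleotide order A, G, C, T (its columns are eigenvectors of every K2P and JC matrix) and that α, β already include the edge time; helper names are hypothetical.

```python
import math

# Assumed choice of U: the 4x4 Hadamard matrix (order A, G, C, T). Its columns
# are eigenvectors of every K2P (and JC) matrix, and U^{-1} = U / 4.
U = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, 1, -1],
     [1, -1, -1, 1]]
U_INV = [[entry / 4 for entry in row] for row in U]


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def k2p_matrix(p_alpha: float, p_beta: float) -> list[list[float]]:
    """K2P substitution matrix from the transition / per-transversion probabilities."""
    s = 1 - p_alpha - 2 * p_beta
    return [[s, p_alpha, p_beta, p_beta],
            [p_alpha, s, p_beta, p_beta],
            [p_beta, p_beta, s, p_alpha],
            [p_beta, p_beta, p_alpha, s]]


alpha, beta = 0.30, 0.05
lam, mu = math.exp(-4 * beta), math.exp(-2 * alpha - 2 * beta)
p_beta = (1 - lam) / 4            # from lambda_P = 1 - 4 p_beta
p_alpha = (1 + lam - 2 * mu) / 4  # from mu_P = 1 - 2 p_beta - 2 p_alpha
P = k2p_matrix(p_alpha, p_beta)

diagonalized = matmul(matmul(U_INV, P), U)  # ~ diag(1, lam, mu, mu)
delta_k2p = -(math.log(lam) + 2 * math.log(mu)) / 4  # total rate = alpha + 2*beta
delta_tr = -math.log(lam) / 4                        # transversion-only = beta
print([round(diagonalized[i][i], 6) for i in range(4)])
print(round(delta_k2p, 6), round(delta_tr, 6))  # 0.4 0.05
```

With this normalization, $\Delta_{K2P}$ recovers α + 2β and $\Delta_{tr}$ recovers β.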

Road Map (next): Optimizing Distances in the K2P model

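K2P distance estimation: where the noise comes from

1. Compute $\hat P_{uv}$, an estimation of $P_{uv}$ (inherent noise).
2. Compute $\hat\lambda_{uv} = \lambda(\hat P_{uv})$ and $\hat\mu_{uv} = \mu(\hat P_{uv})$, estimations of $\lambda(P_{uv})$ and $\mu(P_{uv})$ (implied noise propagation).
3. Compute $\hat\Delta(\hat P_{uv}) = c_1\ln(\hat\lambda_{uv}) + c_2\ln(\hat\mu_{uv})$, an estimation of $\Delta(P_{uv}) = c_1\ln(\lambda_{uv}) + c_2\ln(\mu_{uv})$ (“user controlled” noise propagation).

Selection of $c_1, c_2$: given $\hat P_{uv}$, we look for $c_1, c_2$ such that
$$\hat d(u,v) = \hat\Delta(\hat P_{uv}) = c_1\ln(\hat\lambda_{uv}) + c_2\ln(\hat\mu_{uv})$$
has a small expected relative error.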

Estimated distance = true distance + error.

Expected relative error = expected error / true distance.

Minimizing the expected relative error

Let $d(u,v) = \Delta(P_{uv})$ be the exact distance; $\hat d(u,v) = \Delta(\hat P_{uv})$ is the estimated (stochastic) distance.

We would like to minimize the “Normalized Mean Square Error”:
$$NMSE(\hat d) = \frac{E\big[(\hat d(u,v) - d(u,v))^2\big]}{d(u,v)^2}.$$

In the plots we use $NRMSE = \sqrt{NMSE(\hat d)}$.

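A basic property of the Normalized Mean Square Error: the NMSE of a distance function
$$\hat\Delta(\hat P) = c_1\ln(\hat\lambda) + c_2\ln(\hat\mu)$$
depends only on the ratio $c = c_1/c_2$, because multiplying $c_1$ and $c_2$ by the same constant scales both the error and the true distance by that constant. This means that equivalent SR functions have the same NMSE.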

A Proper Disclosure on our optimal functions:

Since ln(·) is non-linear, we only find the c which minimizes the NMSE of a linear approximation of $\hat\Delta$ (using the “delta method”), namely the 1st term in the Taylor expansion of $\ln(\hat\lambda)$ and $\ln(\hat\mu)$ around the true eigenvalues, and derive from it the optimal c for a K2P matrix.

Hence, our approximation is imprecise when some of the (true) eigenvalues are very small.
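As a hedged illustration, here is one way to carry out the delta-method minimization in Python. The assumptions are this sketch's own: sites are i.i.d., the observed transition and transversion fractions are multinomial, and the distance is written as c·ln(λ̂) + ln(μ̂), i.e. c = c₁/c₂. The closed form returned by `optimal_c` is the stationary point of a ratio of quadratic forms (hence its minimizer); it is derived for this sketch and is not necessarily the authors' formula. Function names are hypothetical.

```python
import math


def delta_method_moments(p: float, q: float):
    """Linearized (co)variances of ln(lam_hat), ln(mu_hat), multiplied by n.

    Assumes n i.i.d. sites, so the observed transition fraction P_hat and
    transversion fraction Q_hat are multinomial proportions with means p, q,
    and uses lam_hat = 1 - 2*Q_hat, mu_hat = 1 - 2*P_hat - Q_hat.
    """
    lam, mu = 1 - 2 * q, 1 - 2 * p - q
    var_p, var_q, cov_pq = p * (1 - p), q * (1 - q), -p * q
    var_lam = 4 * var_q
    var_mu = 4 * var_p + var_q + 4 * cov_pq
    cov_lam_mu = 4 * cov_pq + 2 * var_q
    # Delta method: ln(x_hat) - ln(x) ~ (x_hat - x) / x.
    return (math.log(lam), math.log(mu),
            var_lam / lam**2, var_mu / mu**2, cov_lam_mu / (lam * mu))


def approx_nmse(c: float, p: float, q: float, n: int) -> float:
    """Linearized NMSE of c*ln(lam_hat) + ln(mu_hat) (bias ignored)."""
    a, b, sxx, syy, sxy = delta_method_moments(p, q)
    return (c * c * sxx + 2 * c * sxy + syy) / (n * (c * a + b) ** 2)


def optimal_c(p: float, q: float) -> float:
    """Minimizer of approx_nmse over c (it does not depend on n)."""
    a, b, sxx, syy, sxy = delta_method_moments(p, q)
    return (a * syy - b * sxy) / (b * sxx - a * sxy)


for p, q in [(1e-4, 1e-5), (0.4, 0.1)]:
    c = optimal_c(p, q)
    print(f"p={p}, q={q}: c_opt={c:.3f}, c/(1+c)={c / (1 + c):.3f}")
```

For tiny p and q the optimum is close to c = 1/2 (the total-rate function, c/(1+c) = 1/3); for p = 0.4, q = 0.1 it moves far toward the transversions-only end (c/(1+c) ≈ 0.94).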

Relation between c and SR functions:

Function name          Function                      c      c/(1+c)
Total rate (logdet)    -ln(λ_P) - 2 ln(μ_P)          1/2    1/3
Transversions only     -ln(λ_P)                      ∞      1

As c/(1+c) grows from 1/3 to 1, the optimal rate function gradually changes from total rate to transversions only.

[Figure: optimal values of $c_{opt}/(1+c_{opt})$ for ti/tv ratio = 10 (α = 20β); x-axis: total substitution rate, 0 to 3]

As the rate grows, the relative weight of the “transversion” coefficient increases.

[Figure: optimal values of $c_1/(c_1+c_2)$ for various transition/transversion rates (α = 2β, 4β, 20β, 200β); x-axis: total substitution rate, 0 to 3; annotation: α >> β, rate > 2]

[Figure: expected relative error of various distance functions, theoretical prediction, R = 2; curves: total rate, transversions, optimal]

Road Map (next): Simulation results

[Figure: expected relative error of various distance functions, simulations; x-axis: 0 to 2; curves: standard formula (c = 0.5), 'transversions only' (c = ∞), actually used SR functions, and the predicted errors for the standard formula, for 'transversions only' and for the optimal SR function; annotation: “small eigenvalue distortion”]

Back to the K2P quartet resolution

A heuristic distance method (max-optimal) based on this talk: select a distance function which is optimal w.r.t. the largest of the six observed distances of the quartet (i.e., the largest $c_{opt}$).
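A hedged Python sketch of the max-optimal heuristic. It assumes i.i.d. sites and a delta-method approximation of $c_{opt}$ (derived for this sketch, not necessarily the formula used in the talk), leaves named A, B, C, D, and pairs that show both transitions and transversions without saturation. All function names are hypothetical, and the toy quartet is made up for illustration.

```python
import math
from itertools import combinations

TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def observed_fractions(u: str, v: str) -> tuple[float, float]:
    """(P_hat, Q_hat): fractions of sites differing by a transition / a transversion."""
    n = len(u)
    p = sum(TRANSITION[a] == b for a, b in zip(u, v)) / n
    q = sum(a != b and TRANSITION[a] != b for a, b in zip(u, v)) / n
    return p, q


def optimal_c(p: float, q: float) -> float:
    """Delta-method optimum of c for c*ln(lam) + ln(mu), assuming i.i.d. sites."""
    lam, mu = 1 - 2 * q, 1 - 2 * p - q
    a, b = math.log(lam), math.log(mu)
    sxx = 4 * q * (1 - q) / lam**2
    syy = (4 * p * (1 - p) + q * (1 - q) - 4 * p * q) / mu**2
    sxy = (2 * q * (1 - q) - 4 * p * q) / (lam * mu)
    return (a * syy - b * sxy) / (b * sxx - a * sxy)


def max_optimal_split(seqs: dict[str, str]) -> tuple[float, str]:
    """Hypothetical max-optimal heuristic for leaves named A, B, C, D.

    One c is used for the whole quartet: the largest c_opt over the six pairs.
    Distances are -(c*ln(lam_hat) + ln(mu_hat)); any positive rescaling would
    not change the 4p decision.
    """
    frac = {frozenset(pair): observed_fractions(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(sorted(seqs), 2)}
    c = max(optimal_c(p, q) for p, q in frac.values())

    def dist(x: str, y: str) -> float:
        p, q = frac[frozenset((x, y))]
        return -(c * math.log(1 - 2 * q) + math.log(1 - 2 * p - q))

    sums = {"AB|CD": dist("A", "B") + dist("C", "D"),
            "AC|BD": dist("A", "C") + dist("B", "D"),
            "AD|BC": dist("A", "D") + dist("B", "C")}
    return c, min(sums, key=sums.get)


quartet = {"A": "GAGTACGTACGTACGTACGT", "B": "ACAGACGTACGTACGTACGT",
           "C": "ACGTACGTGTACTCGTGGGT", "D": "ACGTACGTGTACTCGTACAA"}
c, split = max_optimal_split(quartet)
print(round(c, 2), split)  # 3.02 AB|CD
```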

Recall the performance of the two known distance functions on the “template quartet”. When α ≠ β, the suggested heuristic performs better than both known methods.

[Figure: failures vs. quartet diameter (0 to 2) for max-optimal, standard K2P and transversions only]

Summary
• Adaptive approach to distance based reconstructions: adjust the distance function to the input sequences.
• Distance functions for stochastic evolutionary models are defined by SR functions.
• SR functions can be constructed by Generalized Logdet.
• When the dimension of the space of SR functions is greater than 1, the adaptive approach is applicable.
• The adaptive approach is applicable to non-trivial unified models.
• Most common models are unified.
• An analysis of the simplest non-trivial unified model, K2P, shows a significant improvement in accuracy from the adaptive approach.

Further Research
• Prove/disprove: for any substitution model M, all the additive functions of M are GLD functions.
• In the K2P model: define and find optimal SR functions for two distances, quartets and general trees.
• Find optimal SR functions for non-homogeneous model trees.
• Find optimal SR functions for variable rates across sites.
• Find optimal SR functions for more general evolutionary models (Tamura Nei) (analytic/heuristic methods).
• Empirical/analytical study of “plugging” adaptive distances into common reconstruction algorithms (e.g. NJ).
• Study the improvement in performance on real biological data.
• Devise algorithms which use distance vectors.

Further research questions
• We have infinitely many additive distance functions for the K2P model.
• Which one should we use for reconstructing the tree?
• If we have the exact substitution matrices for all pairs of taxa, then all functions are equally good.
• But we have only finite sequences, whose alignments provide only estimations of the true substitution matrices.

Distances are defined by Substitution Rate functions

For each tree path from u through v to w, it holds that D(u,v) + D(v,w) = D(u,w).

Part 3.1: From substitution models to additive distances

The aligned sequences provide, for each pair of DNA letters, say A and G, how many times A was mutated to G (i.e., aligned with G). This defines a joint distribution matrix F.

Aligned sequences → joint distribution matrices:

       A      G      T      C
A     0.2    0.05   0.01   0.02
G     0.02   0.25   0.01   0.01
T     0.02   0.01   0.16   0.02
C     0.01   0.01   0.01   0.2

A is aligned with G in 5% of the pairs.
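A minimal Python sketch of how F can be computed from two aligned sequences, assuming only the letters A, G, T, C occur (gaps are not handled). Row-normalizing F gives the plain empirical estimate of the probability that a nucleotide X in u is replaced by Y in v; this simple estimator is an illustration, not necessarily the K2P-constrained estimate used in the talk. Function names are hypothetical.

```python
from collections import Counter

ORDER = "AGTC"  # row/column order of the table above


def joint_distribution(u: str, v: str) -> dict[str, dict[str, float]]:
    """F[x][y] = fraction of aligned sites with letter x in u and letter y in v."""
    if len(u) != len(v) or not u:
        raise ValueError("expected two non-empty aligned sequences of equal length")
    counts = Counter(zip(u, v))
    n = len(u)
    return {x: {y: counts[(x, y)] / n for y in ORDER} for x in ORDER}


def row_normalize(F: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Estimate of P(letter y in v | letter x in u); rows with no mass are skipped."""
    result = {}
    for x in ORDER:
        total = sum(F[x].values())
        if total > 0:
            result[x] = {y: F[x][y] / total for y in ORDER}
    return result


F = joint_distribution("AAGTCA", "AGGTCA")
print(F["A"]["G"], row_normalize(F)["A"]["G"])
```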

Joint distribution matrices are converted to distances by substitution models. These models describe how DNA sequences are transformed during evolution. The tool used for this is called “Markovian processes”; in the following we sketch it. Additional reading is recommended…


K2P distinguishes between two mutation types:
• Transitions: A ↔ G, C ↔ T
• Transversions: {A, G} ↔ {C, T}

Different biological models impose restrictions on the substitution matrices.

Our model is the Kimura 2 Parameter (K2P) model. K2P rate matrices have the following shape: all transitions have rate α, and all transversions have rate β.

Part 3.2: Distance functions for K2P (Linear Algebra in the service of Biology)

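Let P, Q be two matrices in K2P. Then
$$U^{-1}PU = \mathrm{diag}(1, \lambda_P, \mu_P, \mu_P), \qquad U^{-1}QU = \mathrm{diag}(1, \lambda_Q, \mu_Q, \mu_Q),$$
and therefore
$$U^{-1}PQU = (U^{-1}PU)(U^{-1}QU) = \mathrm{diag}(1, \lambda_P\lambda_Q, \mu_P\mu_Q, \mu_P\mu_Q).$$
Hence $\ln(\lambda_P)$ and $\ln(\mu_P)$ are additive functions for K2P.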


The joint distribution of each pair of vertices provides an approximation of the substitution matrices.

The common theme of all projects: start with input sequences for two or more taxa, and find a distance function which minimizes the inaccuracy (noise) introduced by the sampling process.

[Figure: two adjacent edges with K2P rate matrices with parameters (α, β) and (α′, β′); rows and columns A, G, C, T]


K2P model tree = <Tree Topology> + a K2P substitution matrix on each edge, with the uniform distribution (0.25, 0.25, 0.25, 0.25) over A, G, C, T at the root. (The Jukes Cantor matrix, with 1-3p on the diagonal and p elsewhere, is a special case.)


Given sequences at two adjacent vertices, we define the edge length in two steps. First, align the sequences (e.g., …TCTGGGA… at v against …GGGGATT…).

Natural evolutionary distance: total substitution rate

Each edge is associated with a time t and a K2P rate matrix S. The total substitution rate along an edge of length t is t(α + 2β). The total substitution rate between species is the sum of the rates over the path connecting them.

Total substitution rates are exact distances, which we try to reconstruct from the observed joint distribution of the sequences at u and v.

How do we estimate $D_{K2P}(u,v)$?

Our input is the aligned sequences at u and v. They can be used to estimate the probability that a nucleotide X in u will be replaced by a nucleotide Y in v.

First step in distance estimation: estimate $P_{uv}$ from the joint distributions (maximum likelihood). The substitution matrix is estimated from the observed differences between the sequences.


Errors in distance estimations are amplified when:
• the rate is small: the signal is too weak (in extreme cases, there are no substitutions whatsoever);
• the rate is large: recent substitutions overwrite older ones.

How reliable…

Consider “balanced” quartets. Define the “quartet ratio” to be the ratio between the middle edge and two external edges.

The rate matrix $S_{uv}$ implies a stochastic substitution matrix $P_{uv} = \exp(S_{uv})$. $P_{uv}$ defines the joint distribution of the sequences at u, v.

Performance of the standard distance method in reconstructing the split from estimated distances

For the true distances $d_T$:
$$d_T(A,B) + d_T(C,D) + 2w_{sep} = d_T(A,C) + d_T(B,D) = d_T(A,D) + d_T(B,C).$$

• Distance based 4-point method (FPM): reconstruction will fail if
$$\hat d(A,B) + \hat d(C,D) \ge \min\{\hat d(A,C) + \hat d(B,D),\ \hat d(A,D) + \hat d(B,C)\}.$$

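Minimizing the expected relative error

Since ln(·) is non-linear, we only find the c which minimizes the NMSE of a linear approximation of $\hat\Delta$ (using the “delta method”):
$$NMSE(c) = \frac{E\Big[\big(c\ln(\hat\lambda) + \ln(\hat\mu) - c\ln(\lambda) - \ln(\mu)\big)^2\Big]}{\big(c\ln(\lambda) + \ln(\mu)\big)^2},$$
and the optimal c is the minimizer of this linearized expression.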

Distance based methods: the general scheme
- Compute distances between all taxon pairs.
- Find an (edge-weighted) tree best describing the distances.

[Figure: input sequences → distance matrix $D(i,j) = d(s_i, s_j)$ → tree]

This talk: an adaptive distance based algorithm for the K2P model. Find constants $\{c_1, c_2\}$ such that the SR function
$$\Delta(P) = c_1\ln(\lambda_P) + c_2\ln(\mu_P)$$
is best for the input, and use it to compute $D(i,j) = \hat d(s_i, s_j)$.


Distance based methods: an adaptive scheme
- Find a distance function d which is good for the input (this work).
- Compute $D(i,j) = d(s_i, s_j)$.

Promotion: Make Distance based methods adaptive

Summary of previous slides: the SR functions for K2P are of the form
$$\Delta(P) = c_1\ln(\lambda_P) + c_2\ln(\mu_P),$$
and $c = c_1/c_2$ gives the weight the function puts on the transversions. Next we show how this weight is affected by the total substitution rate and by the transition/transversion ratio.
