Species Trees & Constraint Programming
Ongoing work with
Ian Gent, Barbara Smith, Wu Wei (Christine)
The Tree of Life
A central goal of systematics
• construct the tree of life
• a tree that represents the relationship between all living things
• including constraint programmers
• The leaf nodes of the tree are species
• The interior nodes are hypothesized species
• extinct, where species diverged
Properties of a Species Tree
• We have a set of leaf nodes, each labelled with a species
• the interior nodes have no labels
• each interior node has 2 children and one parent
• except the root (it has no parent)
• if we have n leaf nodes we then have n - 1 interior nodes
• it is a bifurcating tree
Super Trees
• We are given two trees, T1 and T2
• T1 has leaf set S1 and T2 has leaf set S2
• remember, leaves are species!
• But S1 and S2 have a non-empty intersection
• why? How can that happen?
• We want to combine T1 and T2
• so, why is that a problem?
Most Recent Common Ancestors (mrca)
[Figure: a tree on leaves a, b, c in which a and b are siblings]
We have 3 species, a, b, and c
Species a and b are more closely related to each other than they are to c
The most recent common ancestor of a and b is further from the root than the most recent common ancestor of a and c (and of b and c)
• mrca(a,b) ≠ mrca(a,c)
• mrca(a,b) ≠ mrca(b,c)
• mrca(a,c) = mrca(b,c)
ab|c
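The mrca relations can be checked mechanically. Here is a minimal sketch, assuming a tree is written as nested Python tuples with strings as leaves (that representation and the helper names `leaves`, `mrca_depth` and `satisfies_triple` are illustrative, not from the slides).

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def mrca_depth(tree, x, y, depth=0):
    """Distance from the root to the most recent common ancestor of x and y.

    Assumes x and y are distinct leaves that both occur in the tree.
    """
    if isinstance(tree, str):
        raise ValueError("x and y must be distinct leaves of the tree")
    for child in tree:
        # If one child holds both species, their mrca lies further down.
        if {x, y} <= leaves(child):
            return mrca_depth(child, x, y, depth + 1)
    return depth


def satisfies_triple(tree, a, b, c):
    """True if mrca(a,b) is strictly deeper than mrca(a,c) = mrca(b,c)."""
    return (mrca_depth(tree, a, b) > mrca_depth(tree, a, c)
            and mrca_depth(tree, a, c) == mrca_depth(tree, b, c))


tree = (("a", "b"), "c")
print(satisfies_triple(tree, "a", "b", "c"))  # the triple ab|c
print(satisfies_triple(tree, "a", "c", "b"))  # the triple ac|b
```

Comparing depths is enough here because mrca(a,c) and mrca(b,c) both sit on the path from the root to c, so equal depth means the same node.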
Triples (and Fans)
[Figure: two rooted triples, ab|c and bc|d]
Species trees are frequently presented as a set of triples (and fans)
{ab|c, bc|d}
Triples (and Fans)
[Figure: the triples ab|c and bc|d, and the tree on a, b, c, d that combines them]
BreakUp & OneTree (circa 1996)
Algorithm BreakUp takes a species tree and produces a set of rooted triples R that define that tree.
Algorithm OneTree takes a set of species and a set of rooted triples, and builds a tree that respects those triples, or reports that no tree exists (in polytime)
OneTree is a specialisation of Build, an algorithm proposed by Aho, Sagiv, Szymanski, and Ullman in 1981
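To get a feel for going from a tree to triples, here is a rough sketch. It is not the BreakUp algorithm itself: it simply lists every rooted triple a bifurcating tree displays, which is usually more than is needed to define the tree. The nested-tuple tree representation and the name `all_triples` are assumptions made for illustration.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def all_triples(tree):
    """Every rooted triple (x, y, z), meaning xy|z, displayed by the tree."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    triples = all_triples(left) | all_triples(right)
    # Two species on one side of this node, one species on the other side:
    # the pair meets below this node, the outsider joins at this node.
    for inside, outside in ((left, right), (right, left)):
        for x, y in combinations(sorted(leaves(inside)), 2):
            for z in leaves(outside):
                triples.add((x, y, z))
    return triples


tree = ((("a", "b"), "c"), "d")
for x, y, z in sorted(all_triples(tree)):
    print(f"{x}{y}|{z}")
```

Every set of three leaves yields exactly one triple in a bifurcating tree, so a tree with n leaves gives n choose 3 triples this way.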
The Flavour of OneTree
Given a set of species S and rooted triples R
• produce a node N
• construct a graph G
• with vertices in S
• and edge (x,y) if triple xy|z is in R
• if G is a single component fail
• else recursively build
• on the left with one component
• with S’ and R’ (the set of species and triples in that component)
• on the right, with the other components
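The steps above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' OneTree code: triples are tuples `(x, y, z)` meaning xy|z, the result is a nested tuple, components are found with a small union-find, and the function name `one_tree` is made up for illustration.

```python
def one_tree(species, triples):
    """Return a nested-tuple tree respecting every triple, or None if none exists."""
    species = set(species)
    if len(species) == 1:
        return next(iter(species))
    # R': only triples whose three species are all in this set.
    local = [t for t in triples if set(t) <= species]

    # Graph G via union-find: join x and y for every triple xy|z.
    parent = {s: s for s in species}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for x, y, _ in local:
        parent[find(x)] = find(y)

    components = {}
    for s in species:
        components.setdefault(find(s), set()).add(s)
    if len(components) == 1:
        return None  # G is a single component: fail

    # Sorted only so that the output is reproducible.
    parts = sorted(components.values(), key=sorted)
    left = one_tree(parts[0], local)                   # one component
    right = one_tree(set().union(*parts[1:]), local)   # the other components
    if left is None or right is None:
        return None
    return (left, right)


print(one_tree("abcd", [("a", "b", "c"), ("b", "c", "d")]))
print(one_tree("abcd", [("a", "b", "c"), ("b", "c", "d"), ("c", "d", "a")]))
```

The first call uses S = {a, b, c, d} and R = {ab|c, bc|d}; the second adds cd|a, which makes G a single path a-b-c-d, so no tree exists.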
The Flavour of OneTree
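S = {a, b, c, d}
R = {ab|c, bc|d}
[Figure: graph G on vertices d, a, c, b]
S = {a, b, c}
R = {ab|c}
[Figure: graph on vertices a, c, b]
S = {d}
R = {}
[Figure: the single vertex d]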
Min-cut Super Trees
• What happens if OneTree fails?
• Give the best tree you can
• by breaking some triples (resulting in fans)
• by excluding some species
• There are polytime algorithms for this
• but they are greedy and biased
Constraint Programming solutions to building a species tree from a set of rooted triples
A naïve constraint encoding
• n-1 variables as interior nodes
• v[i] = j means parent(v[i]) = v[j]
• no loops/cycles
• Barbara used set variables (ILOG)
• Patrick used a specialised constraint (Choco)
• Francois then encoded set variables!
• n variables as leaf nodes
• each takes a value respecting triples
• I am sparing you (and me) the details
Why was this a naïve constraint encoding?
• It produced the right number of trees when there are no triples
• the Catalan number
• symmetry breaking
• It would produce a tree if one existed
A 2 stage process
• (1) build a tree from the interior nodes
• there are Catalan many of these
• (2) given an “interior tree” place the leaf nodes
• there are n! ways to do this
• if step (2) fails generate the next interior tree in (1)
Yikes! That’s expensive.
Imagine {ab|c,bc|d,cd|a}
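A quick back-of-the-envelope calculation shows how fast the two-stage search grows. This sketch assumes the interior-tree shapes are counted by the Catalan number C(n-1), the number of ordered binary trees with n - 1 interior nodes; the slides only say "Catalan many", so that index is an assumption, as are the function names.

```python
from math import comb, factorial


def catalan(k):
    """The k-th Catalan number."""
    return comb(2 * k, k) // (k + 1)


def naive_search_space(n):
    """Interior-tree shapes times leaf placements for n species.

    Assumes shapes are counted by Catalan(n - 1); the slides do not say which index.
    """
    return catalan(n - 1) * factorial(n)


for n in (4, 6, 8, 10):
    print(n, naive_search_space(n))
```

For an inconsistent set such as {ab|c, bc|d, cd|a}, every one of these candidates has to be generated and rejected before the search can report failure.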
Ultrametric Trees & Species Trees
What is an ultrametric tree?
• We are given a 2d symmetric matrix D
• D[i,j] is the time of divergence of species i and j
• D[i,j] labels mrca(i,j) with the time of divergence
• D[i,j] is the value of mrca(i,j)
• Build a bifurcating tree
• n leaves and n - 1 interior nodes
• interior nodes labeled with entries from D
• any path from the root is a strictly decreasing sequence
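Ultrametric Trees: here’s one I (well, Dan Gusfield actually) prepared earlier
[Figure: ultrametric tree. The root is labelled 8; its children are labelled 3 (with leaves B and C) and 5 (with leaf D and a node labelled 3 whose leaves are E and A)]
    E  D  C  B  A
E   0
D   5  0
C   8  8  0
B   8  8  3  0
A   3  5  8  8  0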
Note: if the sequence increases, we have min-ultrametric tree
Ultrametric Matrix: necessary & sufficient conditions
• cannot have more than n - 1 distinct values
• because there are n - 1 interior nodes
• For every 3 indices i,j,k
• there is a tie for the maximum between D[i,j], D[i,k], D[j,k]
Given an ultrametric matrix, an ultrametric tree can be constructed in O(n^2)
… see Dan Gusfield’s book “Algorithms on Strings, Trees, and Sequences”
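The two conditions are easy to test on the five-species example. This is a small sketch: the entries in `PAIRS` are the example matrix as best it can be read from the slide, and the names `dist`, `is_ultrametric` and `distinct_values` are illustrative. It checks the conditions only; it does not build the tree.

```python
from itertools import combinations

SPECIES = "ABCDE"
# Off-diagonal entries of the example matrix (as read from the slide).
PAIRS = {
    ("A", "B"): 8, ("A", "C"): 8, ("A", "D"): 5, ("A", "E"): 3,
    ("B", "C"): 3, ("B", "D"): 8, ("B", "E"): 8,
    ("C", "D"): 8, ("C", "E"): 8,
    ("D", "E"): 5,
}


def dist(x, y):
    """Symmetric lookup into PAIRS."""
    return PAIRS[(x, y)] if (x, y) in PAIRS else PAIRS[(y, x)]


def is_ultrametric(species, d):
    """For every i, j, k the maximum of the three distances must be tied."""
    for i, j, k in combinations(species, 3):
        values = sorted((d(i, j), d(i, k), d(j, k)))
        if values[1] != values[2]:
            return False
    return True


def distinct_values(species, d):
    """The set of off-diagonal values in the matrix."""
    return {d(i, j) for i, j in combinations(species, 2)}


print(is_ultrametric(SPECIES, dist))
print(sorted(distinct_values(SPECIES, dist)), "<= n - 1 =", len(SPECIES) - 1)
```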
A CP encoding of D
• We have a 2 dimensional matrix of constrained integer variables D
• We must ensure that for any i,j,k the following holds
D[i,j] ≥ D[i,k] = D[j,k]
∨ D[i,k] ≥ D[i,j] = D[j,k]
∨ D[j,k] ≥ D[i,j] = D[i,k]
[Figure: a triangle on i, j, k with sides labelled D[i,j], D[j,k], D[i,k]]
Think isosceles triangles, allowing equilateral
An ultrametric space, composed of isosceles triangles
A CP encoding of D
Any instantiation of the variables in D is now guaranteed to be min-ultrametric
We get Catalan number of min-ultrametric solutions
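The isosceles condition can be written as a plain check on a filled-in matrix. A minimal sketch, assuming D is a symmetric list of lists and reading "min-ultrametric" as "the two smallest of the three values tie, equilateral allowed"; the function names are illustrative, and this checks an instantiation rather than posting constraints in a solver.

```python
from itertools import combinations


def isosceles_min(x, y, z):
    """True if the two smallest of the three values tie (equilateral allowed)."""
    lo, mid, _ = sorted((x, y, z))
    return lo == mid


def is_min_ultrametric(D):
    """Check the disjunction for every i < j < k of a symmetric matrix D."""
    n = len(D)
    return all(isosceles_min(D[i][j], D[i][k], D[j][k])
               for i, j, k in combinations(range(n), 3))


# Depths of the mrca in the tree ((a, b), c): mrca(a,b) is deeper than the root.
good = [[0, 2, 1],
        [2, 0, 1],
        [1, 1, 0]]
# Three different values: no isosceles triangle, so not min-ultrametric.
bad = [[0, 1, 2],
       [1, 0, 3],
       [2, 3, 0]]
print(is_min_ultrametric(good), is_min_ultrametric(bad))
```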
How can we exploit this?
• We are given triples and fans, but not distances!
• But we can consider a triple ij|k as a constraint
[Figure: the triple ij|k, with i and j as siblings and k outside]
D[i,j] > D[i,k] ∧ D[i,j] > D[j,k] ∧ D[i,k] = D[j,k]
Note: our tree is min-ultrametric!
This over-rides the disjunctions posted across the matrix
The CP encoding (contd)
• we have the “blanket” disjunctive constraint to ensure min-ultrametric
• triples are constraints that break the disjunctions
• a solution (if one exists) is min-ultrametric respecting triples
• we can then produce tree from the matrix, as a post process
• NOTE: we need a pre-process to break up trees into triples
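Putting the pieces together on a toy instance: the sketch below uses brute-force enumeration as a stand-in for a real constraint solver, so it only works for a handful of species. The domain 1..n-1 for each D entry, the dictionary keyed by `frozenset` pairs, and the names `solve` and `to_tree` are all assumptions made for illustration.

```python
from itertools import combinations, product


def tie_for_min(x, y, z):
    """The blanket disjunction: the two smallest values tie."""
    lo, mid, _ = sorted((x, y, z))
    return lo == mid


def solve(species, triples):
    """Return {frozenset({x, y}): value} respecting all constraints, or None."""
    pairs = [frozenset(p) for p in combinations(species, 2)]
    # Assumed domain: each entry takes a value in 1..n-1.
    for values in product(range(1, len(species)), repeat=len(pairs)):
        D = dict(zip(pairs, values))
        d = lambda x, y: D[frozenset((x, y))]
        blanket = all(tie_for_min(d(i, j), d(i, k), d(j, k))
                      for i, j, k in combinations(species, 3))
        # Triple ab|c: D[a,b] > D[a,c] = D[b,c].
        respects = all(d(a, b) > d(a, c) == d(b, c) for a, b, c in triples)
        if blanket and respects:
            return D
    return None


def to_tree(species, D):
    """Post-process: turn a min-ultrametric matrix into a nested-tuple tree."""
    species = sorted(species)
    if len(species) == 1:
        return species[0]
    d = lambda x, y: D[frozenset((x, y))]
    shallowest = min(d(x, y) for x, y in combinations(species, 2))
    groups = []
    for s in species:
        for g in groups:
            # Species whose mrca lies below this node stay together.
            if d(s, g[0]) > shallowest:
                g.append(s)
                break
        else:
            groups.append([s])
    return tuple(to_tree(g, D) for g in groups)


D = solve("abcd", [("a", "b", "c"), ("b", "c", "d")])
print(to_tree("abcd", D))
print(solve("abcd", [("a", "b", "c"), ("b", "c", "d"), ("c", "d", "a")]))
```

If the matrix contains an equilateral triangle, `to_tree` returns a node with more than two children, which corresponds to a fan.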
So where are we?
Good question:
• we have not yet tried real data
• we have a number of different micro-encodings
• Are we in P for decision?
• Not sure yet
• How about optimisation?
• We can see a way, by introducing penalties
• Wu Wei is coding up BreakUp and OneTree
• so we have something real to compare with
• We need real data to check this out
• I need to get funding for this
• write a grant proposal with DRG I think!
Questions?