Complexity and Approximation of the Minimum Recombinant Haplotype Configuration Problem
Authors: Lan Liu, Xi Chen, Jing Xiao & Tao Jiang
Outline
- Introduction and problem definition
- Deciding the complexity of binary-tree-MRHC
- Approximation of MRHC with missing data
- Approximation of MRHC without missing data
- Approximation of bounded MRHC
- Conclusion
Introduction
[Figure: a member's alleles at several loci, labeled Genotype, Haplotype, and Locus, with example loci marked PS value = 1 and PS value = 0.]
Basic concepts
Mendelian law: each member inherits one haplotype from the mother and the other from the father.
Example: Mendelian experiment
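As a small illustration of the Mendelian law, the sketch below checks whether a child's genotype can be explained by its parents' genotypes, one locus at a time. The representation (a genotype as a list of unordered allele pairs) and the function names are assumptions made for this example, not notation from the talk.

```python
def locus_consistent(father, mother, child):
    """Single-locus check: one child allele must come from each parent.

    Each argument is an unordered pair of alleles, e.g. (1, 2).
    """
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def trio_consistent(father, mother, child):
    """Check every locus of a father-mother-child trio independently."""
    return all(
        locus_consistent(f, m, c) for f, m, c in zip(father, mother, child)
    )


if __name__ == "__main__":
    father = [(1, 2), (2, 2)]
    mother = [(2, 2), (1, 2)]
    print(trio_consistent(father, mother, [(1, 2), (1, 2)]))  # consistent
    print(trio_consistent(father, mother, [(1, 1), (1, 2)]))  # locus 1 fails
```

This per-locus test only covers a single trio; checking a whole pedigree has to account for how the choices interact across members.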
Notations and Recombinant
[Figure: two Father-Mother-child examples. In the first, the child's haplotypes are inherited with 0 recombinants; in the second, with 1 recombinant. Labels: Father, Mother, Genotype, Haplotype Configuration.]
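To make the notion of a recombinant concrete, here is a minimal sketch that counts the fewest recombinants needed for a parent with two given haplotypes to transmit a given haplotype. It assumes a recombinant is a switch between the parent's two haplotypes at adjacent loci; the function name and the tuple representation are hypothetical.

```python
def min_recombinants(parent_hap1, parent_hap2, transmitted):
    """Fewest switches between the parent's haplotypes that yield `transmitted`.

    Returns None when the transmitted haplotype cannot be copied from the
    parent at all (some locus matches neither parental haplotype).
    """
    inf = float("inf")
    haps = (parent_hap1, parent_hap2)
    # cost[s]: fewest switches so far if the current locus is copied from haps[s]
    cost = [0 if h[0] == transmitted[0] else inf for h in haps]
    for j in range(1, len(transmitted)):
        match = [h[j] == transmitted[j] for h in haps]
        cost = [
            min(cost[s], cost[1 - s] + 1) if match[s] else inf
            for s in (0, 1)
        ]
    best = min(cost)
    return None if best == inf else best


if __name__ == "__main__":
    print(min_recombinants((1, 1, 1, 1), (2, 2, 2, 2), (1, 1, 1, 1)))
    print(min_recombinants((1, 1, 1, 1), (2, 2, 2, 2), (1, 1, 2, 2)))
```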
Pedigree
An example: British Royal Family
[Figure: pedigree chart with members Elizabeth II of the United Kingdom; Prince Philip, Duke of Edinburgh; Prince Charles, Prince of Wales; Diana, Princess of Wales; Camilla, Duchess of Cornwall; Prince William of Wales; Prince Henry of Wales; Princess Anne, Princess Royal; Captain Mark Phillips; Commander Timothy Laurence; Peter Phillips; Zara Phillips; Prince Andrew, Duke of York; Sarah Margaret Ferguson; Princess Beatrice of York; Princess Eugenie of York; Prince Edward, Earl of Wessex; Sophie Rhys-Jones; Lady Louise Windsor.]
Haplotype Reconstruction
- Haplotype: useful, expensive
- Genotype: cheaper
Reconstruct haplotypes from genotypes
[Figure: genotypes of a mother (M) and a child (C), followed by two possible haplotype configurations, (a) and (b).]
Problem Definition
MRHC problem: Given a pedigree and the genotype information for each member, find a haplotype configuration for each member that obeys the Mendelian law, such that the number of recombinants is minimized.
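The problem statement can be made concrete with a brute-force sketch for the smallest interesting pedigree, a nuclear family (two parents and their children). It tries every haplotype configuration and keeps the fewest recombinants. This is exponential-time illustration only, not an algorithm from the talk; the data representation and all function names are assumptions.

```python
from itertools import product

INF = float("inf")


def phasings(genotype):
    """Yield every ordered haplotype pair consistent with a genotype.

    A genotype is a list of unordered allele pairs, one pair per locus.
    """
    per_locus = [sorted({(a, b), (b, a)}) for a, b in genotype]
    for combo in product(*per_locus):
        yield tuple(x for x, _ in combo), tuple(y for _, y in combo)


def min_switches(h1, h2, target):
    """Fewest recombinants needed to copy `target` from haplotypes h1, h2."""
    cost = [0 if h[0] == target[0] else INF for h in (h1, h2)]
    for j in range(1, len(target)):
        match = [h[j] == target[j] for h in (h1, h2)]
        cost = [
            min(cost[s], cost[1 - s] + 1) if match[s] else INF
            for s in (0, 1)
        ]
    return min(cost)


def child_cost(father_haps, mother_haps, child_genotype):
    """Best phasing of one child, given fixed parental haplotypes."""
    return min(
        min_switches(*father_haps, paternal) + min_switches(*mother_haps, maternal)
        for paternal, maternal in phasings(child_genotype)
    )


def nuclear_family_mrhc(father, mother, children):
    """Minimum total recombinants, or None if no configuration is Mendelian."""
    best = INF
    for f in phasings(father):
        for m in phasings(mother):
            best = min(best, sum(child_cost(f, m, c) for c in children))
    return None if best == INF else best


if __name__ == "__main__":
    father = [(1, 2), (1, 2)]
    mother = [(1, 1), (1, 1)]
    children = [[(1, 1), (1, 1)], [(1, 1), (1, 2)]]
    print(nuclear_family_mrhc(father, mother, children))
```

In the usage example the two children force the father to transmit haplotypes 1-1 and 1-2, so whichever way the father is phased, one of the two transmissions needs a recombinant.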
Problem Definition: Variants of MRHC
- Tree-MRHC: no mating loop
- Binary-tree-MRHC: 1 mate, 1 child
- 2-locus-MRHC: 2 loci
- 2-locus-MRHC*: 2 loci with missing data
Previous Work
The known hardness results for Mendelian law checking
Loop?   Multi-allelic?   Hardness
Yes     Yes              NP-hard [AHI+03]
Yes     No               P [AHI+03]
No      -                P [AHI+03]
The known hardness results for MRHC
Problem: 2-locus-MRHC; Tree-MRHC with bounded #members; Tree-MRHC with bounded #loci; Tree-MRHC
Hardness: NP-hard [LJ03]; P [LJ03]; P [DLJ03]; NP-hard [DLJ03]
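Our hardness and approximation results
- Binary-tree-MRHC: NP-hard.
- 2-locus-MRHC*: lower bound of approx. ratio: any f(n); assumption: P ≠ NP; the lower bound holds for 2-locus-MRHC*(4,1).
- Binary-tree-MRHC*: lower bound of approx. ratio: any f(n); assumption: P ≠ NP; the lower bound holds for Binary-tree-MRHC*(1,1).
- 2-locus-MRHC: lower bound of approx. ratio: any constant; assumption: P ≠ NP and the Unique Games Conjecture [Khot02]; upper bound of approx. ratio: O(√(log n)); the lower bound holds for 2-locus-MRHC(16,15).
- Tree-MRHC: lower bound of approx. ratio: any constant; assumption: P ≠ NP and the Unique Games Conjecture; the lower bound holds for Tree-MRHC(1,u) and Tree-MRHC(u,1).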
A verifier for ≠3SAT (1)
Given a truth assignment for the literals in a 3CNF formula:
- Consistency checking for each variable
- Satisfiability checking for each clause
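The two checks above can be sketched as a small verifier. This sketch assumes that ≠3SAT denotes not-all-equal 3SAT, so a clause passes when its literals are not all assigned the same value; that reading, the literal encoding `(variable, negated)`, and the function names are assumptions for illustration.

```python
def literal_assignment(values):
    """Expand a variable assignment into an assignment for every literal."""
    out = {}
    for var, val in values.items():
        out[(var, False)] = val       # the positive literal x
        out[(var, True)] = not val    # the negated literal, not x
    return out


def consistent(assignment, variables):
    """Consistency check: x and its negation must get opposite values."""
    return all(assignment[(v, False)] != assignment[(v, True)] for v in variables)


def clauses_ok(assignment, clauses):
    """Clause check (assumed NAE): each clause sees both True and False."""
    return all(len({assignment[lit] for lit in clause}) == 2 for clause in clauses)


def verify(assignment, variables, clauses):
    return consistent(assignment, variables) and clauses_ok(assignment, clauses)


if __name__ == "__main__":
    variables = ["x", "y", "z"]
    clauses = [(("x", False), ("y", False), ("z", False))]
    good = literal_assignment({"x": True, "y": True, "z": False})
    bad = literal_assignment({"x": True, "y": True, "z": True})
    print(verify(good, variables, clauses), verify(bad, variables, clauses))
```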
Binary-tree-MRHC is NP-hard
[Figure: (A) C's genotype; (B) two haplotype configurations; panels (a), (b), (c) show a mother M and a child C under different haplotype configurations.]
C can check whether M has a certain haplotype configuration!
Binary-tree-MRHC is NP-hard
The pedigree
[Figure: a binary-tree pedigree with members O1, O2, A1 ... At+3m, B1 ... Bt+3m, M1 ... Mt+3m, and C1 ... Ct+3m. Part 1 performs consistency checking (#recombinants >= 0); Part 2 performs satisfiability checking (#recombinants >= #clauses).]
≠3SAT is satisfiable ⇔ OPT(MRHC) = #clauses
Inapproximability of 2-locus-MRHC*
Definition: A minimization problem R cannot be approximated if
- there is no approximation algorithm with ratio f(n) unless P = NP, where
- f(n) is any polynomial-time computable function.
Fact: If it is NP-hard to decide whether OPT(R) = 0, then R cannot be approximated unless P = NP.
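The reasoning behind the Fact can be written out in one step. The sketch below assumes solution costs are nonnegative and that an algorithm $A$ with ratio $f(n)$ returns a feasible solution of cost $A(I)$ on instance $I$; the symbols $A$ and $I$ are introduced here for illustration.

```latex
\[
  \mathrm{OPT}(I) \;\le\; A(I) \;\le\; f(n)\cdot \mathrm{OPT}(I)
\]
\[
  \mathrm{OPT}(I) = 0 \;\Rightarrow\; A(I) \le f(n)\cdot 0 = 0 \;\Rightarrow\; A(I) = 0,
  \qquad
  \mathrm{OPT}(I) > 0 \;\Rightarrow\; A(I) \ge \mathrm{OPT}(I) > 0 .
\]
\[
  \text{Hence } A(I) = 0 \iff \mathrm{OPT}(I) = 0 .
\]
```

So a polynomial-time algorithm with any ratio f(n) would decide whether OPT(R) = 0 in polynomial time, which is impossible unless P = NP when that decision is NP-hard.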
Inapproximability of 2-locus-MRHC*
Reduce 3SAT to 2-locus-MRHC*
[Figure: (A) gadget for variable x, with two haplotype configurations labeled True and False; (B) gadget for a clause over x, y, z. Some genotype entries in the gadgets are marked *.]
3SAT is satisfiable ⇔ OPT(2-locus-MRHC*) = 0
2-locus-MRHC* cannot be approximated unless P = NP!
Upper Bound of 2-locus-MRHC
Main idea: use a Boolean variable to capture the configuration; use clauses to capture the recombinants.
An example
[Figure: two members A and B; each member's two possible haplotype configurations are labeled True and False, and a clause over A and B captures the recombinant.]
Upper Bound of 2-locus-MRHC
The reduction from 2-locus-MRHC to Min 2CNF Deletion
[Table: columns Genotype of the Mother (A), Genotype of the Father (B), Genotype of the Child (C), and 2CNF Constraint. Each row gives a combination of genotypes and the 2CNF clauses over the variables A, B, C that capture its recombinants.]
Upper Bound of 2-locus-MRHC
Recently, Agarwal et al. [STOC05] presented an O(√(log n)) randomized approximation algorithm for Min 2CNF Deletion.
2-locus-MRHC has an O(√(log n)) approximation algorithm.
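For readers unfamiliar with the target problem, Min 2CNF Deletion asks for the fewest clauses to delete from a 2CNF formula so that the rest is satisfiable, which equals the fewest clauses any assignment leaves unsatisfied. The exhaustive sketch below only illustrates the objective on tiny formulas; it is not the approximation algorithm of Agarwal et al., and the integer literal encoding is an assumption.

```python
from itertools import product


def min_2cnf_deletion(num_vars, clauses):
    """Fewest clauses left unsatisfied by any assignment (brute force).

    A literal is a nonzero integer: +i means variable i, -i its negation.
    A clause is a pair of literals; a unit clause can be written (i, i).
    """
    best = len(clauses)
    for bits in product([False, True], repeat=num_vars):

        def value(lit):
            v = bits[abs(lit) - 1]
            return v if lit > 0 else not v

        unsatisfied = sum(1 for a, b in clauses if not (value(a) or value(b)))
        best = min(best, unsatisfied)
    return best


if __name__ == "__main__":
    # All four clauses over two variables: every assignment violates one.
    print(min_2cnf_deletion(2, [(1, 2), (1, -2), (-1, 2), (-1, -2)]))
```

Under the reduction described above, each deleted clause would correspond to a recombinant, so an approximation for this objective carries over to 2-locus-MRHC.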
Approximation Hardness of bounded MRHC
Bound #mates and #children:
- 2-locus-MRHC: (16,15)
- 2-locus-MRHC*: (4,1)
- tree-MRHC: (u,1) or (1,u)
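To show what a bound such as (16,15) measures, the sketch below computes the largest number of mates and the largest number of children of any member in a pedigree. It assumes a pedigree is given as (father, mother, child) triples and that children are counted per member; both are assumptions for this example, as is the function name.

```python
from collections import defaultdict


def mates_and_children_bounds(trios):
    """Return (max #mates, max #children) over all members of a pedigree."""
    mates = defaultdict(set)
    children = defaultdict(set)
    for father, mother, child in trios:
        mates[father].add(mother)
        mates[mother].add(father)
        children[father].add(child)
        children[mother].add(child)
    max_mates = max((len(s) for s in mates.values()), default=0)
    max_children = max((len(s) for s in children.values()), default=0)
    return max_mates, max_children


if __name__ == "__main__":
    pedigree = [
        ("F", "M", "C1"),
        ("F", "M", "C2"),
        ("C1", "S", "G"),
    ]
    print(mates_and_children_bounds(pedigree))
```

A pedigree whose result is at most (1, 1) matches the "1 mate, 1 child" condition given for binary-tree pedigrees.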
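Conclusion
Our hardness and approximation results
- Binary-tree-MRHC: NP-hard.
- 2-locus-MRHC*: lower bound of approx. ratio: any f(n); assumption: P ≠ NP; the lower bound holds for 2-locus-MRHC*(4,1).
- Binary-tree-MRHC*: lower bound of approx. ratio: any f(n); assumption: P ≠ NP; the lower bound holds for Binary-tree-MRHC*(1,1).
- 2-locus-MRHC: lower bound of approx. ratio: any constant; assumption: P ≠ NP and the Unique Games Conjecture; upper bound of approx. ratio: O(√(log n)); the lower bound holds for 2-locus-MRHC(16,15).
- Tree-MRHC: lower bound of approx. ratio: any constant; assumption: P ≠ NP and the Unique Games Conjecture; the lower bound holds for Tree-MRHC(1,u) and Tree-MRHC(u,1).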
Thanks for your time and attention!