Bioinformatica Catene di Markov, HMM – GenScan – EasyBack Dr. Giuseppe Pigola – [email protected].
Prof. Mario Pavone Computazione Naturale CdL Magistrale in Informatica [email protected]
-
Upload
jenna-mckay -
Category
Documents
-
view
38 -
download
2
description
Transcript of Prof. Mario Pavone Computazione Naturale CdL Magistrale in Informatica [email protected]
Prof. Mario PavoneProf. Mario PavoneComputazione NaturaleComputazione Naturale
CdL Magistrale in InformaticaCdL Magistrale in Informatica
[email protected]://www.dmi.unict.it/mpavone/
Teoria della Complessità:Teoria della Complessità:la classe NPla classe NP
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
Bibliography
Una descrizione delle classi NP-complete possono essere trovate in:M. R. Garey and D. S. Johnson, Computers and Computers and
Intractability: A Guide to the Theory of NP-Intractability: A Guide to the Theory of NP-CompletnessCompletness, W. H. Freeman, 1st ed. (1979)
Cormen, Leiserson, Rivest and Stein, Introduzione agli Algoritmi e Strutture DatiIntroduzione agli Algoritmi e Strutture Dati, McGraw-Hill (capitolo 34)
A compendium of NP optimization problemsA compendium of NP optimization problems at at the link:the link:
http://www.csc.kth.se/~viggo/problemlist/http://www.csc.kth.se/~viggo/problemlist/
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
Search Space
Given a combinatorial problem PP, a search search spacespace associated to a mathematical formulation of PP is defined by a couple (S,(S,f f )) where SS is a finite set of configurations (or nodes
or points) andff a cost functioncost function which associates a real number
to each configurations of SS.For this structure two most common measures
are the minimum and the maximum coststhe minimum and the maximum costs. In this case we have the combinatorial combinatorial optimization problemsoptimization problems.
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
Example: K-SATAn instance of the K-SAT problem K-SAT problem consists of a
set V of variables, a collection C of clauses over V such that each clause c C has |c|= K
The problem is to find a satisfying truth assignment for C
The search space search space for the 2-SATfor the 2-SAT with |V|=2 is (S,(S,f f ) ) where S={ (T,T), (T,F), (F,T), (F,F) }S={ (T,T), (T,F), (F,T), (F,F) } andthe cost functionthe cost function for 2-SAT2-SAT computes only the
number of satisfied clauses
ffsatsat (s)= #SatisfiedClauses(F,s), s (s)= #SatisfiedClauses(F,s), s S S
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
Example: SEARCH SPACE of K-SAT
Let we consider F = (A B) ( A B)
A BA B ffsatsat(F,s)(F,s)
T T 1
T F 2
F T 1
F F 2
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
Problem: the start point
PROBLEMPROBLEM: to refer to a question such as: "Is a given graph G=(V,E) K-colorable?"
INSTANCEINSTANCE of a problem is a list of arguments, one argument for each parameter of the problem
a problem is a binary relationa problem is a binary relation on a set II of problem instancesproblem instances, and a set SS of problem problem solutionssolutions: Q: I → SQ: I → S
decision problemsdecision problems: Q: I → {0, 1}Q: I → {0, 1}in general, if we can solve an optimization in general, if we can solve an optimization
problem quickly, we can also solve its related problem quickly, we can also solve its related decision problem quicklydecision problem quickly
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
Problem: the start point
a problem is UNDECIDABLEUNDECIDABLE if there is no algorithm that takes as input an instance of the problem and determines whether the answer to that instance is "yes""yes" or "no""no"
problems that are solvable by polinomial-time solvable by polinomial-time algorithmsalgorithms as being TRACTABLETRACTABLE; ones that require superpolynomial timesuperpolynomial time as being INTRACTABLEINTRACTABLE
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
Problem: the start pointNP-complete problems are intractableNP-complete problems are intractable
If any single NP-complete problem can be If any single NP-complete problem can be solved in polynomial time then every NP-solved in polynomial time then every NP-complete problem has a polynomial-time complete problem has a polynomial-time algorithmalgorithm
If you can establish a problem as NP-complete you provide good evidence for its intractabilityits intractability
You would then do better spending your time developing an approximation algorithmdeveloping an approximation algorithm rather than searching for a fast algorithm that solves the problem exactly
Polynomial-time solvable problems are generally Polynomial-time solvable problems are generally regarded as tractableregarded as tractable
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
Complexity Class PAn algorithm solves a problem in time An algorithm solves a problem in time O(T(n))O(T(n)) if,
when it is provided a problem instance problem instance ii of of length length n=|i|n=|i|, the algorithm can produce the the algorithm can produce the solution in at most solution in at most O(T(n))O(T(n)) time time
A problem is polynomial-time solvableproblem is polynomial-time solvable if there exists an algorithm to solve it in time O(nan algorithm to solve it in time O(nkk)) for some constant k
COMPLEXITY CLASS PCOMPLEXITY CLASS P: the set of decision problems that are solvable in polynomial time
A function function ff is polynomial-time computable is polynomial-time computable if there exists a polynomial-time algorithm AA that, given any input x x I I, produces as output f(x)f(x)
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
Example: problem in P
Dato un grafo G=(V,E) e una funzione peso sugli archi w(u,v) che specifica il costo; determinare un sottoinsieme T E di peso minimo è detto problema problema dell'albero di connessione minimodell'albero di connessione minimo (Minimum (Minimum Spanning Tree)Spanning Tree)
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
Example: problem in P
Minimum Weight Spanning Tree.
Input: Graph G, integer k.
Decision problem: Does G have a spanning tree of Does G have a spanning tree of weight at most k? weight at most k?
If you are provided with a tree with weight at most k as part of the solution, the answer can be verified in O(n2) time.
Complexity Class NP
Some problems are intractable: as they grow large, we are unable to solve them in reasonable time
What constitutes reasonable time? Standard working definition: polynomial timeOn an input of size n the worst-case running time
is O(nk) for some constant kPolynomial time: O(n2), O(n3), O(1), O(n lg n) Not in polynomial time: O(2n), O(nn), O(n!)
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
Complexity Class NPA decision problemdecision problem (yes/no question) is in the class NPclass NP if it has a nondeterministic polynomial time algorithm
Informally, such an algorithm:
Guesses a solution (nondeterministically).
Checks deterministically in polynomial time that the answer is correct.
Or equivalently, when the answer is "yes", there is a certificate (a solution meeting the criteria) that can be verified in polynomial time (deterministically)
The Complexity Class NPComplexity Class NP is the class of problems that can can be verified by a polynomial-time algorithmbe verified by a polynomial-time algorithm
If Q Q PP then Q Q NPNP => P P NPNP
Open question is:Open question is: P = NP P = NP
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
Nondeterminism
Think of a non-deterministic computer as a computer that magically “guesses” a solution, then has to verify that it is correctIf a solution exists, computer always guesses itOne way to imagine it: a parallel computer that
can freely spawn an infinite number of processesHave one processor work on each possible solutionAll processors attempt to verify that their solution worksIf a processor finds it has a working solution
So: NPNP = problems = problems verifiableverifiable in polynomial time in polynomial time
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
Review: P and NP
Summary so far:P = problems that can be solved in polynomial
timeNP = problems for which a solution can be
verified in polynomial timeUnknown whether P = NP (most suspect not)
Hamiltonian-cycle problem is in NP:Cannot solve in polynomial timeEasy to verify solution in polynomial time
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
Hamiltonian Cycle Problem
A hamiltonian cycle of an undirected graph G=(V,E) is a simple cycle that contains each vertex in V.
A graph that contains a hamiltonian cycle is said to be hamiltonianhamiltonianNOT ALL GRAPHS ARE HAMILTONIANNOT ALL GRAPHS ARE HAMILTONIAN
Hamiltonian-cycle problem: “Does a graph G “Does a graph G have a hamiltonian cycle?”have a hamiltonian cycle?”
What is the running time of this algorithm?– There are There are m!m! possible permutations of the possible permutations of the
verticesvertices
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
Hamiltonian Cycle ProblemHamiltonian Cycle Hamiltonian Cycle (input: a graph G)Does G have a Hamiltonian cycle?Does G have a Hamiltonian cycle?
Solution:Solution:
0, 1, 2, 11, 10, 9, 8, 7, 6, 5, 14, 0, 1, 2, 11, 10, 9, 8, 7, 6, 5, 14, 15, 6, 17, 18, 19, 12, 13, 3, 415, 6, 17, 18, 19, 12, 13, 3, 4
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
Hamiltonian Cycle Problem
algorithmalgorithm: lists all permutations of the vertices and check each permutation to see if it is a hamiltonian paththere are there are m!m! Possible permutations of the vertices Possible permutations of the vertices
CHECK if the provided cycle is hamiltonian: if it is a permutation of the vertices of V and each consecutive edges along the cycle exists in the graph
This verification can be implemented to run in O(nverification can be implemented to run in O(n22) ) timetime => a hamiltonian cycle exists in a graph can be verified in polynomial timen is the lenght of the econding of G
NP-Complete Problems
The NP-Complete problems are an interesting class of problems whose status is unknown No polynomial-time algorithm has been
discovered for an NP-Complete problemNo suprapolynomial lower bound has been
proved for any NP-Complete problem, eitherWe call this the P = NP open question
The biggest open problem in CS
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
NP Complete Problems
The class of problems in NP which are the problems in NP which are the “hardest”“hardest” are called the NP-completeNP-complete problemsproblems
A problem QQ in NP classNP class is NP-complete NP-complete problemproblem if the existence of a polynomial time algorithm for QQ implies the existence of existence of a polynomial time algorithm for all problems a polynomial time algorithm for all problems in in NPNP
Steve Cook in 1971 proved that SAT is NP-Steve Cook in 1971 proved that SAT is NP-complete. complete.
– Other problems: use reductions.Other problems: use reductions.
NP-Complete Problems
NP-Complete problems are the “hardest” problems in NP:If any one NP-Complete problem can be solved in
polynomial time……then every NP-Complete problem can be solved
in polynomial time……and in fact every problem in NP can be solved
in polynomial time (which would show P = NP)Thus: solve hamiltonian-cycle in O(n100) time,
you’ve proved that P = NP. Retire rich & famous.
Reduction
The crux of NP-Completeness is reducibilityInformally, a problem P can be reduced to a problem P can be reduced to
another problem Qanother problem Q if any instance of P can be “easily rephrased” as an instance of Q, the solution to which provides a solution to the instance of PWhat do you suppose “easily” means?This rephrasing is called transformation
Intuitively: If P reduces to Q, P is “no harder to solve” than Q
Reducibility
An example:P: Given a set of Booleans, is at least one TRUE?Q: Given a set of integers, is their sum positive?Transformation: (x1, x2, …, xn) = (y1, y2, …, yn)
where yi = 1 if xi = TRUE, yi = 0 if xi = FALSE
Another example: Solving linear equations is reducible to solving
quadratic equationsGiven an instance ax+b=0, we transform it to 0x2+ax+b=0
Using ReductionsIf P is polynomial-time reducible to Q, we denote
this P P pp Q Q
Definition of NP-Complete: If P is NP-Complete, P NP and all problems R are
reducible to PFormally: R R pp P P R NP
If P If P pp Q and P is NP-Complete, Q is also Q and P is NP-Complete, Q is also NP-CompleteNP-Complete
Polynomial-time reductions provide a formal means for showing that one problem is at least as hard as another, to within a polynomial factor.
NP-Completeness
If P P pp Q Q then P is not more than a polynomial P is not more than a polynomial factor harder than Qfactor harder than Q
A problem P is NP-Complete if P P NP, and NP, and P' P' pp P, for every P' P, for every P' NP NP
If all problems Q all problems Q NP are reducible to P NP are reducible to P, then we say that P is NP-HardP is NP-Hard
A problem P is NP-Complete if it is in the A problem P is NP-Complete if it is in the NP class and is NP-Hard (P is NP-Hard and NP class and is NP-Hard (P is NP-Hard and P P NP) NP)
Why Prove NP-Completeness?
Though nobody has proven that P != NP, if you prove a problem NP-Complete, most people accept that it is probably intractable
Therefore it can be important to prove that a problem is NP-CompleteDon’t need to come up with an efficient
algorithmCan instead work on approximation algorithms
Proving NP-Completeness
What steps do we have to take to prove a problem P is NP-Complete?Pick a known NP-Complete problem QReduce Q to P
Describe a transformation that maps instances of Q to instances of P, s.t. “yes” for P = “yes” for Q
Prove the transformation worksProve it runs in polynomial time
Oh yeah, prove P NP (What if you can’t?)
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
Teorema
DefinizioneUn problema B é detto NP-completo se:• B NP
• ANP vale che A≤PB
TeoremaTeorema
Se B é un problema NP-completo tale che Se B é un problema NP-completo tale che B≤B≤PPC, per qualche problema C, allora C é NP-C, per qualche problema C, allora C é NP-hard. Se poi C ∈ NP, allora vale anche che C é hard. Se poi C ∈ NP, allora vale anche che C é NP-completo.NP-completo.
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
SAT ProblemDefiniamo:
• Una variabile é un elemento di {x1, x2, . . . , xn}
• Un letterale é una variabile xi o il suo negato
• Una clausola é una disgiunzione di k letterali
C = • Una formula booleana é una congiunzione di clausole
cfn = forma normale congiunta
PROBLEMA: Soddisfacibilità (SAT)
ISTANZA: Una formula booleana DOMANDA: Esiste una assegnazione di valori di verità
alle variabili di che rendono vera?
ix
ikii xxx ...21
mCCC ...21
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
ESEMPIO:
• La formula é soddisfacibile, semplicemente ponendo x1=1, x2=0, e x3=0.
• Infatti
• La seguente formula, invece, non é soddisfacibile
(provare tutte le 8 possibile assegnazioni di verità)
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
TEOREMA DI COOK-LEVINSAT è NP-CompletoSAT è NP-Completo
Dimostrazione
In breve, il teorema di Cook-Levin fa vedere che:– il problema SAT in NP può essere deciso da un opportuno modello di calcolo: la Macchina di Turing non Deterministica (NTM) – data la descrizione di una NTM nn e un input w per nn, si può costruire una espressione in forma normale congiuntiva w che è soddisfacibile se e solo se l'output di nn su input w è accettante– la lunghezza della formula w è polinomiale in |nn| e in |w|
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
PRIMO ESEMPIO: 3SAT3SAT é analogo al problema della soddisfacibilitá di formule booleane (SAT), con però la restrizione che ogni clausola della formula ha al piú 3 letterali.
Teorema
3SAT é NP-completo
Dimostrazione:
Proveremo che 3SAT ∈ NP, successivamente proveremo che SAT ≤P 3SAT. Dal Teorema 1 discenderà che 3SAT é NP-completo. Per mostrare che 3SAT ∈ NP, osserviamo che un certificato (prova) consiste di una assegnazione di valori di verità alle variabili di una formula può essere facilmente verificata in tempo polinomiale. La verifica infatti consiste nel sostituire i valori all’interno della formula e controllare il risultato che vien fuori.
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
SECONDO ESEMPIO: CLIQUE
ISTANZA: Un grafo G = (V, E) ed un intero k
DOMANDA: G ha una clique di taglia k?
(ovvero ∃ V’⊆ V, |V’| = k, tale che ∀u, v ∈ V’, u v vale che (u, v) ∈ E)
Teorema
CLIQUE é NP-completoCLIQUE é NP-completoDimostrazione
Proveremo che CLIQUE ∈ NP, successivamente proveremo che
3SAT ≤P CLIQUE. Dal Teorema 1 discenderà che CLIQUE é NP-completo.
Un algoritmo di verifica per il problema CLIQUE é il seguente: avendo in
input la coppia (G, k) e come certificato un sottoinsieme X ⊆ V , con |X| = k (potenziale soluzione al problema CLIQUE), l’algoritmo controlla tutte le coppie di vertici (u, v), con u, v ∈ X, u v, e produce in output SI se e solo se ciascuna di queste coppie é connessa da un arco di E. Il tempo di esecuzione dell’algoritmo é O(n2), se n = |V |.
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
3SAT ≤P CLIQUE• Sia = una istanza di 3SAT in
una istanza (G , k) di Clique come segue:
1. Per ogni letterale scriviamo un nodo2. Colleghiamo tutti i nodi ad eccezione:
Nodi nella medesima clausola Nodi che rappresentano letterali opposti
kCCC ...21
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
ESEMPIO
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
TERZO ESEMPIO: VERTEX-COVER
ISTANZA: Un grafo G = (V,E) ed un intero kDOMANDA: esiste V ′ ⊆ V , |V ′| = k, tale che ∀ (u, v) ∈ E o vale che u ∈ V ′ oppure v ∈ V ′?
Teorema Vertex-CoverVertex-Cover é NP-completo é NP-completo
DimostrazioneProveremo che Vertex-Cover ∈ NP, successivamente proveremo che 3SAT ≤P Vertex-Cover. Dal Teorema 1 discenderà che Vertex-Cover é NP-completo.Che il problema VERTEX COVER sia in NP é ovvio. Un algoritmo di verifica prende in input l’istanza di input (G, k) ed un certificato V ′ ⊆ V ,|V ′| = k, e controlla se ∀(u, v) ∈ E vale che u ∈ V ′ oppure che v ∈ V ′. Il controllo può essere fatto in tempo O(n2).
Mario Pavone, AA. 2010/2011 – Computazione Naturale – CdL Magistrale in Informatica – SDAI – DMI - UniCt
3SAT ≤P VERTEX-COVER
Per ogni variabile xi associamo una coppia di nodi collegati da un arco
Per ogni clausola associamo 3 nodi mutuamente legati da un arco
Per ogni letterale identico tra clausola e coppia si metta un arco.Se consta di m variabili e t clausole G consterà di 2m+3t nodi. Sia k=m+2t
xi ix
xi-1 ix
xi+1