Lecture 2 on NP-Completeness.

The classical reference is Computers and Intractability: A Guide to the Theory of NP-Completeness, Michael Garey and David Johnson, W. H. Freeman, 1979.

1. Recap: NP is the class of decision problems whose YES instances have polynomial-time checkable proofs. A problem A is NP-Hard if EVERY problem in NP has a polynomial-time reduction to A. A problem A is NP-Complete if it is NP-Hard AND it belongs to NP. Remember that not all problems have polynomially-checkable solutions. Take, for instance, the Halting Problem, which is undecidable, or generalized Chess, which is not believed to be in NP.

2. CSAT and NP-Completeness. To get the theory going, we need to establish at least one NPC problem. We will argue that Circuit-SAT (CSAT) is such a problem. First, CSAT is clearly in NP; given an instance C and a "satisfying" setting of the boolean inputs x1, x2, ..., xn, we just simulate the circuit and check that xout = 1. The hard part is to show that EVERY other problem in NP reduces to CSAT. Recall the implications of A <= CSAT: if CSAT is easy, then all A's are easy too; if ANY A is hard, then CSAT is hard too. This is Cook's Theorem (1971), a difficult result. We will give an informal proof sketch, based on the following idea.

3. Suppose A is a decision problem that is solvable in p(n) time by some program P, where n is the input size (in bits). Then, for every fixed n, there is a circuit Cn of size about O((p(n))^2 (log p(n))^k), for some constant k, such that for every input x = (x1, ..., xn), A(x) = Cn(x1, ..., xn). That is, the circuit Cn solves problem A on all inputs of length n.

Proof Idea. Assume the program P is written in some low-level machine language (or compile it). Because P takes at most p(n) steps, it can access at most p(n) memory cells. So, at any step, the "global state" of the program is given by the contents of these p(n) cells plus O(1) program counters. No register/cell needs to hold a number longer than O(log p(n)) = O(log n) bits. Let q(n) = (p(n) + O(1)) * O(log n) be the size of the global state.
We maintain a tableau with p(n) rows and q(n) columns that describes the computation. Row i of the tableau is the global state at time i. Each row can be computed from the previous row by a circuit of size O(q(n)). (In fact, the microprocessor that executes the program P is such a circuit.) Stacking the p(n) row-to-row circuits gives Cn. End of Proof.

There are a bunch of technical details and fine points, but this is the general idea. NOTE that we don't need to know anything about P; the proof only uses the fact that it runs in time p(n).

Now, to show that every NP problem reduces to CSAT, take the example of HAM cycle. Given a graph G with n vertices and m edges, we build a circuit that outputs 1 on some input iff G is Hamiltonian. How? Well, there is a computer program that checks in polynomial time whether a given sequence of edges is a Hamiltonian cycle of G. So, there is a circuit that can do the same. Then, we "hard wire" G into the circuit, leaving the candidate sequence of edges as the free inputs. The circuit is a YES instance of CSAT if and only if G is Hamiltonian.

4. Proving More NP-Completeness Results. If A <= B and B <= C, then A <= C. The former gives a poly-time function f such that A(x) = B(f(x)). The latter gives a poly-time function g such that B(y) = C(g(y)). Thus, A(x) = C(g(f(x))), and g(f()) is clearly poly-time.

Lemma. Suppose C is an NP-C problem, and A is some problem in NP. If we can show that C reduces to A, then A must be NP-C. (Since C is NP-C, all problems in NP reduce to C; since C reduces to A, by the previous observation all problems in NP also reduce to A.)

5. NP-Completeness of SAT. CSAT is not the most convenient problem to work with. CNF formula Satisfiability (abbreviated SAT) is better. Cook showed that SAT is NP-Complete. Even more convenient than SAT is 3-SAT, where each clause has exactly 3 literals. It is easy to show hardness of 3-SAT using SAT. We take an instance of SAT and convert it into a poly-size instance of 3-SAT, as follows. Leave alone any clause with exactly 3 literals.
For the rest: If a clause has exactly one literal, as (x), then replace it with
    (x V y1 V y2) ^ (x V y1 V !y2) ^ (x V !y1 V y2) ^ (x V !y1 V !y2)
where y1, y2 are 2 new variables introduced just for this clause. If a clause has length 2, such as (x1 V x2), we replace it with
    (x1 V x2 V y) ^ (x1 V x2 V !y)
where y is a new variable. If a clause has 4 or more literals, as (x1 V x2 V ... V xk), we replace it with
    (x1 V x2 V y1) ^ (!y1 V x3 V y2) ^ (!y2 V x4 V y3) ^ ... ^ (!y(k-3) V x(k-1) V xk)
where y1, ..., y(k-3) are new variables. In each case, the new clauses can be satisfied by some setting of the new variables exactly when the original clause is satisfied.

6. Some Famous NP-Complete Graph Problems.

a. MAXIMUM INDEPENDENT SET: Given an undirected, unweighted graph G = (V, E), an Independent Set is a subset I of V in which no two vertices are adjacent. (A classical problem, modeling conflicts among tasks, with many applications.) Decision Version: Given a graph G and an integer k, does G have an Ind Set of size at least k? Membership in NP is easy; given a candidate set I of k vertices, just check that no two vertices of I have an edge between them. To show NP-Completeness, we reduce 3SAT to MIS. Starting with a formula with n vars x1, ..., xn and m clauses, construct a graph with 3m vertices. This graph has an IS of size m if and only if the 3SAT formula is satisfiable.

Construction: The graph G has a triangle for every clause. The vertices in the triangle correspond to the 3 literals in that clause. Vertices in different triangles are joined by an edge IFF they correspond to literals that are negations of each other.

EXAMPLE: (x1 V !x5 V !x3) ^ (!x1 V x3 V x4) ^ (x3 V x2 V x4)

Proof of Correctness. Formula satisfiable ==> IS of size m. Suppose the formula is satisfiable. Then, in each clause, at least one literal is true. We construct the IS by picking exactly one vertex corresponding to a true literal in each clause; break ties arbitrarily. Note that no two such vertices can be adjacent: they lie in different triangles, and cross-triangle edges join a literal to its negation, which cannot both be true. So, we have an IS with m vertices. IS of size m ==> Satisfiable. Suppose we have an IS with m vertices. It can contain at most one vertex from each triangle, since any two vertices of a triangle are adjacent; with m triangles, it contains exactly one from each.
We make the variable assignment so that these chosen literals are true; variables that are not mentioned can be set arbitrarily. The assignment is consistent because a literal and its negation are joined by an edge, so they cannot both be chosen. Every clause contains a chosen literal, so all the clauses are satisfied. End of proof.

7. MAX CLIQUE. Given a graph G (undirected, unweighted), a CLIQUE is a set of vertices K such that EVERY pair is adjacent. Decision Version: Given G and an integer k, does G contain a CLIQUE of size at least k? Membership in NP is easy. For the NP-Completeness, we reduce the MIS problem to CLIQUE. Take an instance (G, k) of MIS. Construct the complement graph G', which has the same vertex set as G, but there is an edge (u,v) in G' EXACTLY when (u,v) is NOT an edge in G. Now, every IS in G is a clique in G', and every clique in G' is an IS in G. Thus, G' has a k-clique if and only if G has a k-vertex IS.

8. VERTEX COVER: Given a graph G, a vertex cover C is a subset of vertices such that every edge (u,v) in G has either u or v (or both) in C. In the MIN VC problem, we want to find the cover of smallest size. In the Decision Version, we want to decide if G has a vertex cover of size at most k. Again, membership in NP is straightforward; just check that all edges have at least one endpoint in the cover. For the NP-Completeness, we show a reduction from MIS.

Lemma. Suppose I is an Independent Set in G = (V, E). Then the set of vertices (V - I) forms a vertex cover in G. Furthermore, if C is a vertex cover in G, then (V - C) is an independent set in G.

Proof. Let C = V - I, and suppose C is not a vertex cover. Then there is at least one edge (u,v) for which neither u nor v is in C. This means that both u and v are in I, but that's a contradiction because u and v are adjacent and I is independent. For the furthermore part, let I = V - C, and suppose I is not an independent set; then there is some edge (u,v) for which both u and v are in I. But then C does not cover (u,v), contradicting the assumption that C is a vertex cover.

So, the reduction is easy: an MIS instance (G, k) is mapped to a (G, n-k) instance of VC, where n = |V|. G has an independent set of size at least k if and only if it has a vertex cover of size at most n-k.
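
As a sanity check on the reductions in Sections 6 to 8, here is a minimal Python sketch; it is not part of the original notes. It assumes literals are encoded as nonzero integers (i for xi, -i for !xi), a convention chosen here only for illustration, and every function name is hypothetical. The searches are brute force and take exponential time, so they are only meant for tiny examples such as the formula in Section 6.

```python
from itertools import combinations, product


def sat_to_independent_set(clauses):
    """Section 6 construction: one triangle per clause, plus an edge between
    complementary literals in different clauses. Returns (vertices, edges, m)."""
    # A vertex is a (clause index, position in clause) pair.
    vertices = [(c, p) for c, clause in enumerate(clauses)
                for p in range(len(clause))]
    edges = set()
    for u, v in combinations(vertices, 2):
        same_clause = u[0] == v[0]
        negations = clauses[u[0]][u[1]] == -clauses[v[0]][v[1]]
        if same_clause or negations:
            edges.add(frozenset((u, v)))
    return vertices, edges, len(clauses)


def is_independent(subset, edges):
    return all(frozenset(pair) not in edges for pair in combinations(subset, 2))


def is_clique(subset, edges):
    return all(frozenset(pair) in edges for pair in combinations(subset, 2))


def is_vertex_cover(subset, edges):
    chosen = set(subset)
    return all(chosen & edge for edge in edges)


def complement(vertices, edges):
    """Edge set of the complement graph G' from Section 7."""
    return {frozenset(pair) for pair in combinations(vertices, 2)} - edges


def find_independent_set(vertices, edges, k):
    """Brute-force search (assumption: tiny inputs only)."""
    for subset in combinations(vertices, k):
        if is_independent(subset, edges):
            return subset
    return None


def brute_force_sat(clauses):
    """Try every assignment; used only to cross-check the reduction."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False


def assignment_from_independent_set(clauses, ind_set):
    """Make each chosen literal true; unmentioned variables stay free."""
    assignment = {}
    for c, p in ind_set:
        lit = clauses[c][p]
        assignment[abs(lit)] = lit > 0
    return assignment


# Example formula from Section 6:
# (x1 V !x5 V !x3) ^ (!x1 V x3 V x4) ^ (x3 V x2 V x4)
example = [(1, -5, -3), (-1, 3, 4), (3, 2, 4)]
vertices, edges, m = sat_to_independent_set(example)
ind_set = find_independent_set(vertices, edges, m)
rest = set(vertices) - set(ind_set)

print("independent set:", ind_set)
print("assignment:", assignment_from_independent_set(example, ind_set))
print("V - I is a vertex cover:", is_vertex_cover(rest, edges))
print("I is a clique in G':", is_clique(ind_set, complement(vertices, edges)))
```

The last three lines mirror the lemmas directly: the complement of the independent set is tested as a vertex cover of size n - m, and the same set is tested as a clique in the complement graph. Representing each edge as a frozenset keeps the graph undirected without storing both orientations.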