Detecting a Network Failure
Jon Kleinberg
Internet Mathematics, 1, 37-56.

1. How should we choose good measurement or monitoring locations in a large network, especially when the network is unstructured and full knowledge of its topology is unavailable? A promising approach is to take measurements at multiple points and aggregate them to infer network-wide properties. Examples include structural analysis of networks, Internet tomography, and the estimation of many network performance metrics. In this lecture, we consider one concrete problem, detecting a "break in connectivity", to illustrate the ideas.

2. Consider a large network modeled as an undirected graph G = (V, E) with n nodes, which is initially connected. For a given parameter eps > 0, we want to detect eps-partitions: failures of network elements (nodes and/or links) after which there are two subsets of nodes A and B, each of size at least eps*n, such that no node in A has a path to any node in B. For a parameter k > 0, we wish to detect eps-partitions that are caused by the failure of up to k (adversarially chosen) network elements.

3. The approach we take is to designate a subset of nodes D as "monitoring agents". The monitoring nodes periodically exchange pairwise "I am alive" messages among themselves. If at some point a pair of nodes u, v in D fails to communicate, this implies a break in network connectivity. Of course, if we choose D = V, then this is indeed a detection set. Our goal, however, is to see whether a much smaller set D can work.

4. More precisely, we will only consider edge failures. (Kleinberg's paper also considers node failures, but the analysis is more complicated.) We say that a set of edges Z \subset E is an (eps, k)-partitioning set if (1) |Z| <= k, and (2) deletion of Z disconnects the graph into disjoint vertex sets A and B, each of size at least eps*n. We also say that two sets A and B are separated if there is no path joining any node in A to any node in B.
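The definition of an (eps, k)-partitioning set can be checked mechanically on a small graph. The following is only a minimal Python sketch under stated assumptions: the graph is given as a list of nodes and a list of undirected edges, the function names component_sizes and is_partitioning_set are hypothetical (they are not from Kleinberg's paper), and the sets A and B are taken to be unions of components of G\Z, found by a subset-sum over component sizes.

```python
def component_sizes(nodes, edges):
    """Return the sizes of the connected components of an undirected graph."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # iterative depth-first search
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    return sizes


def is_partitioning_set(nodes, edges, Z, eps, k):
    """Check conditions (1) and (2) of the definition for the edge set Z.

    Assumption of this sketch: A is a union of components of G\\Z and B is
    the union of the remaining components, so we look for a subset of
    components whose total size s satisfies eps*n <= s <= n - eps*n.
    """
    if len(Z) > k:  # condition (1): |Z| <= k
        return False
    removed = {frozenset(e) for e in Z}
    kept = [e for e in edges if frozenset(e) not in removed]
    n = len(nodes)
    reachable = {0}  # achievable total sizes of unions of components
    for s in component_sizes(nodes, kept):
        reachable |= {r + s for r in reachable}
    return any(eps * n <= r <= n - eps * n for r in reachable)
```

For example, on a path with 8 nodes, deleting the middle edge is a (1/2, 1)-partitioning set, while deleting an end edge only splits off a single node and so is not even a (1/4, 1)-partitioning set.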
A set of nodes D \subset V is an (eps, k)-detection set if for every (eps, k)-partitioning set Z, there are two nodes u, v in D that lie in different components of G\Z. (These nodes act as witnesses to the partition: they will not be able to communicate, and we can infer that a partition has occurred.)

Question: Given k and eps, what is the smallest size of an (eps, k)-detection set?

5. We will prove the following theorem:

Theorem 1. If we choose O( (k/eps) log(1/eps) + (1/eps) log(1/del) ) random nodes in G, then this set is an (eps, k)-detection set for k edge failures, with prob at least 1-del.

The non-obvious, and most striking, aspect of this result is that the size of the detection set does NOT depend on the size of the graph G.

6. Alternative Notions of Detection. Before we prove the theorem, let's look at why the definition of a detection set must involve both eps and k. Removing either one makes it impossible to say anything non-trivial.

First, suppose we drop eps, and just ask to detect ANY disconnection of G. Consider a 2-connected d-regular graph, with d = k. Then, by removing all edges incident to a node u, the adversary can disconnect u from the rest of the graph. The only way to detect this failure is to include u in the detection set. Since this holds for every node u, we would have to choose D = V.

Second, suppose we remove the parameter k, and simply ask for a set that can detect any eps-partition, no matter how many edges fail. Consider the star network. If D has fewer than (1-eps)n nodes, then an adversary can delete all the edges to nodes NOT in D. Now G clearly contains two separated subsets, each of size >= eps*n, but all of D lies in one component, so D cannot detect the partition. Indeed, increasing k gives more power to the adversary, making it harder to design small detection sets.

7. A Simple Bound. A simple bound, which is NOT independent of the graph size, can be obtained easily from basic principles. Suppose we choose D by randomly selecting c vertices from V, so c = |D|.
Let's analyze the prob that D is unable to detect a particular (eps, k)-partitioning set Z. The prob that ALL vertices of D were selected from a single component of G\Z (whose size is at most (1-eps)n) is at most

    (1 - eps)^c

Altogether there are at most (m choose k) partitioning sets, where m = |E|; there are only so many ways to choose k edges out of m. So, the prob that at least one (eps, k)-partitioning set is not detected by D is

    <= (m choose k) * (1 - eps)^c

Suppose we wish this prob to be less than some specified del. Since (1 - eps)^{1/eps} <= 1/e, it suffices to have

    (m choose k) (1 - eps)^c <= (em/k)^k ((1 - eps)^{1/eps})^{eps*c} <= (em/k)^k e^{-c*eps} < del

Take logs of both sides:

    k log(em/k) - c*eps < log del

Simplifying, we get

    c > (k/eps) log(em/k) + (1/eps) log(1/del)

Thus, for fixed del, a random subset of O( (k/eps) log m ) vertices is an (eps, k)-detection set with prob at least 1 - del.

8. We now prove the result that the detection set size can be made *independent* of the graph size. Let us say that a subset of nodes S is k-edge-separable if there exists a set Z of at most k edges whose removal leaves S as a union of components of G\Z. The following fact is easy to check:

Lemma 1. If D \subset V intersects every k-edge-separable set of size at least eps*n, then D is an (eps, k)-detection set.

Proof. Let Z be a set of at most k edges whose deletion creates two disjoint sets A and B, each of size at least eps*n, that are separated in G\Z. Let A' be the union of the components of G\Z that meet A, and put all the remaining components into B'; since A and B are separated, B lies inside B'. Then A' is a k-edge-separable set of size at least eps*n, and so D must include at least one vertex of A'. Similarly, D must include a vertex of B'. These two vertices lie in different components of G\Z, so they are a witness that D can detect Z.

9. VC Dimension. We now come to the main tool used in the proof. This is an important device introduced in

    V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16, 1971.
VC dimension (named after the authors' initials) has found many applications in machine learning, computational geometry, databases, and networking.

Let O be a finite set, and let X be a collection of subsets of O. We say that a set A \subset O is SHATTERED by X if, for all B \subset A, there exists a set S in X such that B = A \cap S. That is, every subset of A can be produced by intersecting A with some set in X. The VC dimension of the set system (O, X) is the MAX cardinality of a subset of O that is shattered.

10. Examples.

    O = a set of n points in the plane
    X = the subsets of O cut out by circles (disks)

Then no set of 4 points of O can be shattered, but any three non-collinear points can be shattered.

A set system with unbounded VC dimension:

    O = a set of n points in the plane
    X = the subsets of O cut out by convex polygons

If the n points are in convex position, every subset is cut out by its own convex hull, so all of O is shattered.

11. eps-Nets. A set of elements N \subset O is called an eps-net for the set system (O, X) if, for every set S \in X of cardinality at least eps|O|, N has a non-empty intersection with S. That is, every large enough set in X includes at least one member of N (the eps-net).

VC-DIMENSION THEOREM: Let (O, X) be a set system with VC dimension d. Then a random subset of size O( (d/eps) log(1/eps) + (1/eps) log(1/del) ) is an eps-net with prob at least 1-del.

Again, the eps-net size is independent of |O|.

12. We will end up showing that the set system for the (eps, k)-detection problem has bounded VC dimension. The underlying set O will be the vertex set V. The sets in X will be the k-edge-separable sets. If we use an eps-net as our detection set, it follows that the detection set will include a vertex from each k-edge-separable set of size eps*n or more! To pull this off, we need to bound the VC dimension of this set system. The first step is the following simple, but important, observation.

Lemma 2. Let H = (V', E') be a connected graph, and let T \subset V' be a set of 2l terminal nodes, for some integer l. Then, we can always pair these nodes up using edge-disjoint paths in H.
That is, H always contains l mutually edge-disjoint paths P1, P2, ..., Pl such that each node in T appears as one end of exactly one of these paths.

Proof. Wlog, we can assume that H is a tree; otherwise, just work with a spanning tree of H. The proof is by induction on the number of nodes in H. The basis of induction is n = 2 nodes, in which case we have two terminals and one edge, so the claim holds. For n > 2, we may assume that *all* leaves of H are terminals; if any leaf is not a terminal, we can simply delete it. Now, consider an arbitrary leaf node v, and let w be the node adjacent to v. We have two cases:

(i) If w \in T, then we can "pair up" v and w using the edge (v,w). Delete the leaf v to get a smaller tree H', and let T' = T - {v,w}. By induction, H' has edge-disjoint paths pairing up the remaining 2l-2 terminals. Those paths, together with the edge (v,w), form the l edge-disjoint paths.

(ii) If w is not in T, then we delete the leaf v together with the edge (v,w), and consider the problem of joining the terminals T' = T - v + w in this smaller tree H'; note that |H'| < |H| and |T'| = 2l. By induction, there are l edge-disjoint paths joining the terminals of T' in H'; we convert that solution into a solution for H by appending the edge (w,v) to the end of the path that terminates at w. QED.

13. Now we come to the crux of the proof. We consider the set system (V, KS), where KS are the k-edge-separable sets of V.

THEOREM. The VC dimension of the set system (V, KS) is at most 2k+1.

PROOF. We will show that no set of size 2k+2 can be shattered by KS. Consider one such set A. We use Lemma 2 to join A's vertices by k+1 edge-disjoint paths: each node in A appears as one endpoint of exactly one path. For convenience, label the vertices so that these paths join (a_1, a_{k+2}), (a_2, a_{k+3}), ..., (a_{k+1}, a_{2k+2}); that is, path P_i joins a_i and a_{i+k+1}. We claim that the set B = {a_1, a_2, ..., a_{k+1}} cannot be written as A \cap S, for any S \in KS. Suppose it could, and let Z be the witness set of at most k edges whose deletion yields S as a union of components of G\Z. Then each a_i (which is in S) is separated from a_{i+k+1} (which is not in S). But since |Z| < k+1 and the paths are edge-disjoint, there is at least one path P_i none of whose edges are deleted by Z.
Thus, both endpoints of P_i are in the same component, contradicting the hypothesis that a_i is separated from a_{i+k+1}. Thus, A cannot be shattered. QED.

14. We are done! Because the VC dimension of (V, KS) is at most 2k+1, a random subset of size O( (k/eps) log(1/eps) + (1/eps) log(1/del) ) is an eps-net, and hence, by Lemma 1, an (eps, k)-detection set for G, with prob at least 1-del.

15. Theorem: There exist graphs for which the VC dimension of (V, KS) is 2k+1.

Proof. Consider the star graph G = K_{1, 2k+1}, and let A be the set of leaves of G. We claim that A is shattered by (V, KS). Indeed, let B be any subset of A. If |B| <= k, then B is k-edge-separable, as we may delete the set Z of edges incident to nodes in B. On the other hand, if |B| > k, then |A\B| <= k. In this case, we can delete all edges incident to nodes in A\B. One of the components in the resulting graph is V - (A\B), and notice that B = A \cap [V - (A\B)]. QED.

16. When the adversary can delete both nodes and edges, the argument is trickier, but Kleinberg does show that a detection set exists whose size is independent of |G| and depends only on k and eps. One thing that goes wrong is that the set system whose separable sets are created by deleting vertices can have unbounded VC dimension. Consider the star network K_{1,n}. Any subset of the leaves can be created as a union of components by deleting just the center node. So a subset of leaves of any size is shattered. Kleinberg therefore uses more complicated set systems, and bounds their VC dimension.

17. A different model and result. Deterministic Detection Sets of size O(k/eps). [Anupam Gupta.]

A set D is called a weak detection set if, for every (eps, k)-partitioning set Z (here Z is a set of at most k failed nodes), (i) either Z intersects D (the adversary kills a detection node), or (ii) there are two nodes of D that lie in different components of G\Z (they cannot communicate, and hence act as witnesses). This is different from Kleinberg's model: we need complete knowledge of the graph.
Nevertheless, it turns out that this weaker definition allows for efficient construction of small (OPTIMAL SIZE) detection sets.

CONSTRUCTION: Given G, compute a rooted spanning tree T. Set alpha = (eps*n)/(2k). (This is the number of vertices each detector is responsible for.) For a node v, let C(v) be the children of v in T. The algorithm processes nodes from the bottom up, and computes a weight W(v) for each node as follows.

    W'(v) = 1 + [\sum_{u \in C(v)} W(u)].
    If W'(v) >= alpha, then add v to the detector set D, and put W(v) = 0.
    Otherwise, let W(v) = W'(v).

Each detector accounts for at least alpha vertices that are counted for no other detector, so |D| <= n/alpha = 2k/eps.

Lemma: The detector set D output by this algorithm is a weak (eps, k)-detection set.

Proof. Let COMP be the set of connected components in T\D. First, we note that every connected component in COMP has size at most alpha. If the size were > alpha, look at the node in this component that is the ancestor of all its other nodes in the tree. This node has weight > alpha, which implies that it would have been added to D. A contradiction!

Consider any subset Z \subset V of size at most k, where Z \cap D is empty and G\Z contains two disjoint subsets C1, C2, each of size at least eps*n, that are disconnected from each other. We show that the situation in which all vertices of Z are deleted can be detected. If D intersects both C1 and C2, we are done. So, consider the other case, and let C be one of C1 and C2 that has empty intersection with D. Because |C| >= eps*n, C contains no detectors, and each component of COMP has size at most alpha = (eps*n)/(2k), there are at least 2k components A1, A2, ..., of COMP that intersect C. In order to disconnect (C \cap Ai) from G\C without including any detectors, the adversary must delete at least one vertex in Ai. Hence, at least 2k > k vertices must be deleted to disconnect C, which is a contradiction. QED.
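The CONSTRUCTION in item 17 can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the lecture: it assumes the graph is connected and given as an adjacency dictionary, it uses a BFS tree as the rooted spanning tree T (any rooted spanning tree would do), and the name weak_detection_set is hypothetical.

```python
from collections import deque


def weak_detection_set(adj, root, eps, k):
    """Bottom-up weight computation on a rooted spanning tree.

    adj  : dict mapping each node to an iterable of its neighbours
           (assumed to describe a connected undirected graph)
    root : node used as the root of the BFS spanning tree (an assumption)
    Returns the detector set D.
    """
    n = len(adj)
    alpha = eps * n / (2 * k)  # vertices each detector is responsible for

    # Build a rooted spanning tree by BFS; 'order' lists parents before children.
    parent = {root: None}
    order = [root]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                order.append(v)
                queue.append(v)

    # child_sum[v] accumulates the sum of W(u) over the children u of v.
    child_sum = {v: 0 for v in order}
    detectors = set()
    for v in reversed(order):  # children are processed before their parent
        w_prime = 1 + child_sum[v]  # W'(v)
        if w_prime >= alpha:
            detectors.add(v)
            w = 0  # W(v) = 0 once v becomes a detector
        else:
            w = w_prime  # W(v) = W'(v)
        if parent[v] is not None:
            child_sum[parent[v]] += w
    return detectors
```

On a path with 12 nodes rooted at one end, with eps = 1/2 and k = 1, we get alpha = 3, and the sketch places a detector at every third node, which matches the bound |D| <= 2k/eps = 4.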