Network Flows

1. When one thinks about a network (communication, social, transportation, computer networks, etc.), many fundamental questions naturally arise: How well connected is it? How much "data" (commodity) can it transport? Where are its bottlenecks? In the next few lectures, I will describe in some detail the Theory of Network Flows. It is elegant in its mathematics, and also quite general and powerful for modeling a variety of practical problems. The theory dates back to the 1950s (well before the internet or the web), when Ford and Fulkerson described an augmentation-based method for finding maximum flows in a capacitated network, with transportation being the underlying motivation.

2. Flow will be our abstract term for the traffic. It is an abstract entity that originates at the source nodes and is absorbed at the sink nodes. We will first consider a simplified model, which includes
   a. a directed graph G = (V,E);
   b. a non-negative capacity c(e) for each edge e;
   c. a single source node s; and
   d. a single sink node t.
(Single source, single sink is not a serious limitation.) We will simplify our discussion by assuming that (i) no edge enters the source, (ii) no edge leaves the sink, (iii) at least one edge is incident to each node, and (iv) all capacities are integers. These assumptions preserve all essential issues, but remove some annoying pathologies.

FIG: (network)

3. Flow Definition. We now mathematize the notion of traffic flow in a network. An s-t flow is a function f that assigns a non-negative real number to each edge. The value f(e) intuitively represents the amount of flow carried by edge e. The flow f must satisfy the following two properties:
   (i) Capacity Constraint: 0 <= f(e) <= c(e), for all e \in E.
   (ii) Flow Conservation: For each node v, other than s and t, we have
        \sum_{e into v} f(e) = \sum_{e out of v} f(e).
The former (latter) sum ranges over all edges that are directed into (out of) v.
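The two properties are easy to check mechanically. The following is a minimal Python sketch, not part of the original notes: the function name is_valid_flow, the dictionary representation of edges, and the small five-edge network are all illustrative assumptions. It assumes integer flow values so that exact equality tests are safe.

```python
from collections import defaultdict


def is_valid_flow(capacity, flow, s, t):
    """Return True if `flow` is a valid s-t flow for `capacity`.

    Both arguments map a directed edge (u, v) to a number; edges missing
    from `flow` carry zero flow. (Hypothetical helper for illustration.)
    """
    # Flow may only be placed on edges that exist in the network.
    if any(edge not in capacity for edge in flow):
        return False
    net = defaultdict(int)  # net[v] = flow into v minus flow out of v
    for (u, v), c in capacity.items():
        f = flow.get((u, v), 0)
        # (i) Capacity constraint: 0 <= f(e) <= c(e).
        if not 0 <= f <= c:
            return False
        net[v] += f
        net[u] -= f
    # (ii) Flow conservation at every node other than s and t.
    return all(net[v] == 0 for v in net if v not in (s, t))


# A made-up example network and a flow on it.
capacity = {("s", "a"): 2, ("s", "b"): 1, ("a", "b"): 1,
            ("a", "t"): 1, ("b", "t"): 2}
flow = {("s", "a"): 2, ("s", "b"): 1, ("a", "b"): 1,
        ("a", "t"): 1, ("b", "t"): 2}
valid = is_valid_flow(capacity, flow, "s", "t")
```

In this example node a receives 2 units and sends 1 + 1, and node b receives 1 + 1 and sends 2, so both properties hold.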
Obviously, the flow on an edge should not exceed the capacity of the edge, and except for s and t, the net flow at a node should be zero. There are no such constraints for s and t. Sometimes, a lower bound l(e) for an edge is also specified.

EXAMPLE. (flow)

The value of the flow, denoted v(f), is the total amount of flow generated at the source:
        v(f) = \sum_{e out of s} f(e).

4. The Max Flow Problem. What is the maximum amount of flow that can be sustained in G? Clearly, the only obstacles to the flow are the capacities of the edges in G. The bottleneck does not necessarily occur at the edges originating at s, or those terminating at t. It can arise from a complicated interaction among the edges, as flow snakes through G.

5. Modeling Power of Network Flows. Three examples show how network-flow-type problems can be used to model a rich variety of combinatorial optimization problems: Transportation, Matrix Rounding, and Scheduling.

6. Ford Fulkerson Method.

7. Polynomial Schemes for Maxflow

8. Preflow Push Method.

9. Applications of Maxflow-Mincut theorem.

9A. Max cardinality bipartite matching

Given a bipartite graph G = (X, Y, E), a matching is a subset of edges M \subset E in which no two edges have a common vertex (i.e., a vertex-disjoint set of edges). This can be thought of as the unweighted assignment problem: jobs and workers, with edges representing compatibility. Assign as many workers to jobs as possible. It is easy to see that simple-minded greedy methods do not work. (However, at least for the unweighted case, it is easy to show that greedy finds a matching of at least 1/2 the optimal size.)

We can transform the matching problem to a maxflow problem in a fairly transparent way: Introduce a source s, and join it to all nodes in X. Direct all edges from X to Y. Introduce a sink t, and join all nodes of Y to t. Give every edge capacity 1. Compute a maxflow in this network. We claim that the value of the maxflow equals the cardinality of the max bipartite matching. The equivalence of the two is fairly straightforward.
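The construction just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the notes do not prescribe a particular maxflow routine, so the sketch uses a plain breadth-first augmenting-path search for brevity. The names max_flow and bipartite_matching and the node labels "s" and "t" are hypothetical, and the labels in X and Y are assumed to be distinct from each other and from "s" and "t".

```python
from collections import deque


def max_flow(cap, s, t):
    """Max flow by repeated BFS augmenting paths.

    `cap` is a dict of dicts of residual capacities and is modified in
    place. (One possible maxflow routine, assumed here for illustration.)
    """
    total = 0
    while True:
        # Breadth-first search for an s-t path with spare capacity.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        # Recover the path, find its bottleneck, and augment along it.
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= delta
            cap[v][u] += delta
        total += delta


def bipartite_matching(X, Y, edges):
    """Build the unit-capacity network s -> X -> Y -> t and run maxflow."""
    s, t = "s", "t"  # assumed not to clash with labels in X or Y
    cap = {v: {} for v in [s, t, *X, *Y]}

    def add_edge(u, v):
        cap[u][v] = 1            # forward edge with capacity 1
        cap[v].setdefault(u, 0)  # reverse residual edge

    for x in X:
        add_edge(s, x)
    for x, y in edges:
        add_edge(x, y)
    for y in Y:
        add_edge(y, t)
    size = max_flow(cap, s, t)
    # An edge (x, y) carries one unit exactly when its capacity is used up.
    matching = [(x, y) for x, y in edges if cap[x][y] == 0]
    return size, matching


# Greedy would get stuck after picking (x1, y1); maxflow does not.
size, matching = bipartite_matching(
    ["x1", "x2"], ["y1", "y2"],
    [("x1", "y1"), ("x1", "y2"), ("x2", "y1")])
```

On this small instance the second augmenting path runs backward over the edge (x1, y1), which is how the flow undoes the greedy choice.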
Suppose there is a matching of k edges (xi, yi). Then we can produce a flow of value k, in which 1 unit of flow goes through each of the k paths of the form (s, xi, yi, t). It is easy to check that this is a valid flow (it respects capacity and conservation).

Conversely, suppose there is a flow of value k. By the integrality theorem, there exists an integer-valued flow of value k. Because the capacity of each edge is 1, the only possible flow values on each edge are 0 and 1. Consider all edges of the form (x,y) with flow 1. Call this set M'.

Fact 1. M' contains exactly k edges.
Proof. Consider the cut (A, B), where A = {s} \cup X. The value of the flow is the total flow leaving A minus the flow entering A. There are no edges entering A, and each edge leaving A carries flow 0 or 1, so exactly k edges leaving A carry 1 unit of flow. These edges are precisely M', so |M'| = k.

Fact 2. Each node in X is the tail of at most one edge in M'.
Proof. Suppose, instead, a node x \in X is the tail of two edges in M'. Because the flow is integer valued, this would mean that at least 2 units of flow leave x. By flow conservation, at least 2 units must enter x, but that is impossible because the only edge into x, namely (s,x), has capacity 1.

Similarly, each node in Y is the head of at most one edge in M'. Therefore, the edges of M' form a matching of size k.

9B. Disjoint Paths.

In many applications, we want to compute many edge-disjoint paths between s and t (for fault tolerance, parallelism, etc.). In addition, the flow as discussed so far has a static feel: we just describe it as a number associated with each edge. It does not describe how the flow "travels" through the network, as traffic does. It would be useful to obtain this kind of traffic picture.

We say that a set of paths (each path starting at s and ending at t) is edge-disjoint if no two paths have a common edge. Given a directed graph G and two nodes s and t, the Edge-Disjoint Path problem is to find the maximum number of edge-disjoint paths from s to t.
(We can also consider the undirected version, which we can convert to the directed version by replacing each undirected edge by two oppositely directed copies.)

We will prove: the maximum number of edge-disjoint paths between s and t equals the maxflow between s and t in G, where every edge is given capacity 1.

=> If there are k edge-disjoint paths, then the maxflow is at least k. Simply ship 1 unit of flow along each path. This easily satisfies the capacity and conservation properties.

<= Conversely, if the maxflow is k, the integrality theorem gives a 0-1 valued flow of value k. We show that if f is a 0-1 valued flow of value k, then the set of edges with flow f(e) = 1 contains k edge-disjoint paths. We prove this by induction on the number of edges that carry non-zero flow. If k = 0, the claim is trivial. Otherwise, there is at least one edge (s,u) that carries a unit of flow. We now "trace out" a path of edges that must carry the flow: by flow conservation, some edge (u,v) must have flow, and so on, until we either reach t or return to a node v for the second time.

In the former case, we now have a path p from s to t, which we add to our list. We reduce the flow on the edges of p to zero. The new flow has value k-1, and it has fewer edges of non-zero flow, so by induction it yields the remaining k-1 paths.

In the latter case, we have a cycle C. If we decrease the flow on all edges of C to zero, the total flow from s to t is unaffected, but there are fewer edges with non-zero flow, so induction completes the proof. QED.

9C. MENGER's Theorem: What we just showed is equivalent to Menger's Theorem. In a directed graph, the max number of edge-disjoint s-t paths is equal to the minimum number of edges whose removal disconnects s from t.

If the removal of F separates s from t, then each s-t path must use at least one edge from F. Thus, the maximum number of edge-disjoint paths is at most |F|.

Conversely, the max number of edge-disjoint paths equals the maxflow. If this value is k, then the maxflow-mincut theorem says that there is an s-t cut (A,B) of capacity k. Let F be the set of edges that go from A to B.
Since each edge has capacity 1, F has exactly k edges, and removing F disconnects s from t. This proves Menger's theorem: the max number of edge-disjoint s-t paths equals the min number of edges whose removal disconnects s from t.

A similar statement holds about *node disjoint* s-t paths. An easy transformation converts the problem to the edge-disjoint version: vertex transformation....

10. Extensions of the Maxflow Problem.

10A. Circulations with Demands.

Suppose we have multiple sources and sinks, instead of a single s-t pair. Rather than maximize the total flow (which can be tricky to agree on due to fairness among different flows), we work with a *fixed* set of demands and supplies. Each node v has an associated demand dv. If dv > 0, we say v is a demand node; if dv = 0, v is a transshipment node; if dv < 0, v is a supply node (it supplies -dv units).

A circulation is a function f such that
   (i) Capacity Constraint: 0 <= f(e) <= c(e), for all edges e;
   (ii) Demand Condition: fin(v) - fout(v) = dv, for all nodes v.

So, instead of *optimization*, we now have a feasibility problem (satisfying demands at all nodes subject to capacity constraints). Clearly, if a feasible circulation exists then \sum_v dv = 0. This is because \sum_v dv = \sum_v ( fin(v) - fout(v) ). In the summation on the right, each edge e appears twice, once as +f(e) in fin of its head and once as -f(e) in fout of its tail, and the two terms cancel.

Algorithm. We convert the circulation problem into a flow problem. Introduce a source s, and join it to each supply node v, with edge capacity equal to -dv. Similarly, add a sink node t, and join each demand node v to t, with edge capacity dv. Now, the circulation is feasible if and only if the maxflow has value exactly \sum_{v demand node} dv.

Example.

10B. Circulation with Demands and Lower Bounds.

In some applications, a certain amount of flow is forced on some edges. That is, f(e) >= l(e), and so there is a lower bound on the value of flow at some edges.
The conditions for the flow, with demands, now change to:
   (i) l(e) <= f(e) <= c(e), for all edges e;
   (ii) fin(v) - fout(v) = dv, for all nodes v.
Now, again, decide if such a circulation is feasible. We reduce this to a circulation problem *without* lower bounds, in two steps.

I. First, set an initial flow f_0(e) = l(e), for all edges. This flow clearly satisfies the capacity constraint (both upper and lower bounds), but perhaps violates the demands. In particular, let
        L(v) = f_0^in (v) - f_0^out (v) = \sum_{e into v} l(e) - \sum_{e out of v} l(e).
If L(v) = dv, we have satisfied the demand at v.

II. Otherwise, we need to *superimpose* another circulation that will clear the imbalance introduced by f_0. So, we need to find a circulation f_1, where for each node v,
        f_1^in (v) - f_1^out (v) = dv - L(v).
And how much capacity do we have to work with? Edge e has available capacity c(e) - l(e). So, we basically compute a circulation in the network G', where edge e has capacity c(e) - l(e), and node v has demand dv - L(v). The original problem is feasible if and only if G' has a feasible circulation f_1, in which case f_0 + f_1 is a feasible circulation for the original problem.

Example.

PROBLEMS: Page 435 (ad hoc) Kleinberg-Tardos Page 444 (free trade)

------------------------------------------------------------------------

11. Minimum Cost Flow Problem.

G = (V, E) is a flow network. An edge ij has capacity u_ij and cost c_ij. We write this as the tuple (c_ij, u_ij). If the edge ij has x units of flow through it, then it costs x * c_ij.

Each node v has a demand or supply b(v). If b(v) > 0, it has net supply; if b(v) < 0, it has net demand; and if b(v) = 0, it must conserve the flow through it. Assume that the total supply equals the total demand: \sum_v b(v) = 0. In general, it may be enough to assume that supply >= demand, and use an artificial sink to absorb the excess supply at zero cost.

We wish to find a flow in which each node v has net outflow b(v) and the total cost of the flow is minimized.

The shortest path problem is a special case of MCF: capacity = 1 for all edges; b(s) = 1, and b(t) = -1 (all other nodes have b(v) = 0).
The min cost matching is also a special case: each vertex in X has b(x) = 1; each vertex in Y has b(y) = -1; edges have capacity 1. The max flow problem is a special case when costs are zero.

An Illustrative Application: Caterer's Problem.
----------------------------------------------

A caterer needs to provide d_i napkins on each of the next n days. He can either buy new napkins at the price of $x per napkin, or reuse dirty napkins by having them laundered. He has two choices for laundry: $y per napkin for a 1-day service, and $z per napkin for a 2-day service. Assume that x >= y >= z. Compute an optimal (least cost) purchasing/laundry policy for the caterer.

Create a source node s, which acts as a supply node for new napkins. For each day i, create two nodes: p_i and q_i.
   Set b(p_i) = -d_i; this is the demand vertex for day i.
   Set b(q_i) = +d_i; this is the supply vertex that can provide dirty napkins for future days.
   Set b(s) = \sum_i d_i; this is the total demand over n days.

From node s, draw an edge to each p_i, with cost $x and capacity d_i; this edge encodes purchasing up to d_i new napkins for day i. From node q_i, we draw one edge to node p_{i+2}, with cost $y and capacity d_i, and one edge to node p_{i+3}, with cost $z and capacity d_i. These edges encode the fact that napkins of day i can be reused on day i+2 at cost $y, and on day i+3 at cost $z. The total supply of dirty napkins from day i is clearly upper bounded by d_i. Total supply (at s and the q_i) exceeds total demand, so, as noted earlier, an artificial sink absorbs the unused supply at zero cost.

One can see that a Min Cost Flow satisfying the demands and supplies solves the caterer's problem.

MIN COST FLOW ALGORITHM
-----------------------

First, the concept of the residual graph is the same as before. However, if the forward edge (i,j) has cost c_ij, then the reverse edge (j,i) has cost -c_ij. This corresponds to the fact that canceling a flow reduces the cost.

A simple MCF algorithm starts by computing a MAX FLOW, ignoring the costs entirely.
We create a single source vertex s, and add edges from s to each supply node v with capacity |b(v)|. We also create a sink node t, and add edges from each demand node v to t with capacity |b(v)|. If the maxflow does not saturate all the edges coming out of s, then there is no feasible solution. Otherwise, the solution is feasible, but the maxflow computed may have very high cost.

In order to reduce the cost of the maxflow, we construct the residual graph G_f, and check to see if it has a *negative cycle*. (We can use Bellman-Ford for this step.) Remember that reverse edges in G_f have negative costs.

If G_f does have a negative cycle, then we can reduce the cost of the flow by pushing flow around the cycle, until we cannot push any more flow. Pushing a flow of \delta around the cycle reduces the total cost of the flow by \delta * |C|, where C is the (negative) cost of the cycle. Pushing the flow around a cycle does not change the net flow at any vertex, so flow remains conserved, and all demands and supplies are satisfied. We only reduce the cost of the flow.

What if there is no negative cycle in G_f? Then, the flow must have min cost!

Theorem. A flow f is a min cost flow if and only if G_f has no negative cycles in it.

Proof. If G_f has a negative cycle, then we can reduce the cost of the flow, so f cannot be a min cost flow.

Conversely, suppose G_f does not have a negative cycle, and let f* be a min cost flow satisfying the same supplies and demands. Consider the difference (f* - f). Since both flows have the same net flow at every node, this difference is a circulation in G_f, which decomposes into a set of cycles in G_f. By pushing flow around these cycles, we can convert f to f*. However, since all the cycles have non-negative costs, this cannot decrease the cost. Thus, c(f) <= c(f*), and because f* is optimal, we must have c(f) = c(f*). end.

---------------------------------------------------------------

Multi-Commodity Flows:

1. Throughout the preceding discussion, we assumed that the commodity flowing in the network (from all the sources) is identical.
This holds true as long as we do not need to look within the traffic class itself. For instance, if we are interested in computing the total flow of road traffic, or natural gas as a homogeneous commodity, this view works.

2. When the traffic is of multiple types, but there is no interaction among them (for instance, each type has its own capacity constraints), we can still model the problem as multiple instances of single commodity flows.

3. Only when there are multiple different commodities AND they need to share the network resources (without explicit capacity allocation) does the problem become trickier. In the general case of the multi-commodity problem, we assume there are k different types of goods. Each node can be the source or sink of one or more such commodities. The edge capacity is the upper bound on the TOTAL sum of the commodities flowing across that edge. The general problem can be formulated as a Linear Program, and solved in polynomial time. We will return to this topic later.

4. However, unlike the integrality of single-commodity flow, integrality in multi-commodity flows is harder to achieve. In fact, with the requirement that individual flows be integer-valued, the problem becomes NP-Complete. Here are some simple instances of this intractability.

5. DISJOINT PATHS. Given a graph G = (V, E), and a collection of disjoint vertex pairs (s1, t1), (s2, t2), ..., (sk, tk). Decide if G contains k mutually vertex-disjoint paths, one connecting each si to its matching ti. The problem is NP-complete whether G is directed or undirected, even for planar graphs.

DIRECTED 2-Commodity Integral Flow. Given a directed graph G = (V, E), and two s-t pairs, (s1, t1) and (s2, t2). There is a positive integer capacity c(e) for each edge, and integer demands R1, R2 at nodes t1, t2. Are there integer flows f1, f2, such that
(i) for each edge e, f1(e) + f2(e) <= c(e);
(ii) each flow fi is conserved at each node v other than si and ti;
(iii) for i=1,2, the net flow into ti under flow fi is at least Ri. This is also NP-complete, even if c(e) = 1 for all edges, and R1 = 1.
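
For contrast, if the integrality requirement is dropped, conditions (i)-(iii) are linear in the unknowns f1(e), f2(e), which is why the fractional problem is a Linear Program. A sketch of that feasibility LP, written with the same symbols as above (the explicit non-negativity row is an assumption that the statement above leaves implicit):

```latex
\begin{align*}
\text{find } \quad & f_1, f_2 : E \to \mathbb{R} \\
\text{subject to } \quad
  & f_1(e) + f_2(e) \le c(e)
      && \forall e \in E \\
  & \sum_{e \text{ into } v} f_i(e) = \sum_{e \text{ out of } v} f_i(e)
      && i = 1, 2, \ \forall v \in V \setminus \{s_i, t_i\} \\
  & \sum_{e \text{ into } t_i} f_i(e) - \sum_{e \text{ out of } t_i} f_i(e) \ge R_i
      && i = 1, 2 \\
  & f_i(e) \ge 0
      && i = 1, 2, \ \forall e \in E
\end{align*}
```

The NP-completeness above comes entirely from additionally requiring every f_i(e) to be an integer.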