The Small World Phenomenon: An Algorithmic Perspective
Jon Kleinberg

1. A matter of folklore: we are all linked by short chains of acquaintances. In social (natural) networks, there are short paths between any two nodes. The study of this began with a famous experiment by Stanley Milgram (1960s): a source person in Nebraska was given a letter to be delivered to a target person in Massachusetts. The source was told just the basic information: the name, address, and occupation of the target. The source could forward the letter to an "acquaintance" (someone he knew on a first-name basis), who then repeated the process until the letter was delivered to the target. Over many trials, the average number of hops (intermediate steps) in successful transmissions was between 5 and 6! This has come to be known as the "6 Degrees of Separation" principle (also a movie and play).

Mathematics and Actors: Small communities, such as mathematicians and actors, have also been found to be densely connected by short chains of associations. In mathematics, the Erdos number describes a mathematician's distance to Paul Erdos, measured through chains of co-authored publications. Paul Erdos (March 26, 1913 -- September 20, 1996) was an immensely prolific and famously eccentric Hungarian-born mathematician. With hundreds of collaborators, he worked on problems in combinatorics, graph theory, number theory, classical analysis, approximation theory, set theory, and probability theory. Erdos was one of the most prolific publishers of papers in mathematical history, comparable only to Leonhard Euler; Erdos published more papers, while Euler published more pages. He wrote around 1,500 mathematical articles in his lifetime, mostly with co-authors. He had 511 different collaborators (The Erdos Number Project Data Files), and strongly believed in (and obviously practiced) mathematics as a social activity.

Finally, the WWW is the obvious domain for such investigations.
The link structure in the Web can be viewed as a social graph: the links that users create reflect their interests, and act as proxies for their acquaintances.

2. A natural analytical question: Why should there exist short paths between arbitrary strangers? Much of the early work took the view that social networks were random graphs, which tend to have small diameter. But this view has a logical problem: if A and B are two nodes with a common friend, it is quite likely that A and B are themselves friends. This type of transitivity does not arise in random graphs. It suggests that social networks are non-random and so clustered that one would not expect them to have a small diameter.

3. A recent model by Watts and Strogatz. Edges are divided into two types: local and long-range. A typical example is a ring network: a set of n points spaced uniformly around a circle. Join each point by an edge to each of its k nearest neighbors, for a small constant k. These are the local contacts. Then introduce a small number of edges whose endpoints are chosen uniformly at random: these are the long-range contacts. Justification for the model: most people know mainly their neighbors, plus a few long-distance contacts. Watts and Strogatz show that this network also has a small diameter. This model has been used for analysis of the hyperlink structure in the WWW.

4. Even if these network models have small diameters, that doesn't explain why people should be able to discover these short paths using *only local information.* This is the ALGORITHMIC question. Milgram's experiment shows there must be *latent navigational cues* embedded in the network that let users discover short paths.

5. We will study a model of social networks (proposed by Jon Kleinberg), which sheds light on both the structural and algorithmic aspects of the small-world phenomenon.

6. MODEL: Designed around a simple framework that encapsulates the paradigm of Watts-Strogatz: rich in local connections, with a few long-range contacts.
Kleinberg even allows edges to be directed; this is a generalization, and directed links are clearly present in real-world social networks (you may know, or know of, say, your local mayor or the CEO of a company, without the link being symmetric).

Assume an n x n grid, where the lattice distance between two nodes is the number of lattice steps between them. That is, d((i,j), (k,l)) = |k-i| + |l-j|. (Keep in mind the number of nodes in this graph is n^2.)

Local Contacts: For a constant p >= 1, each node u has directed edges to all nodes that lie within lattice distance p of u.

Long Range: For universal constants q >= 0 and r >= 0, we create directed edges from u to q other nodes using independent random trials. The ith edge from u has endpoint v with probability proportional to [d(u,v)]^{-r}. (We normalize these quantities to turn them into probabilities. That is, divide by D = \sum_{v \neq u} [d(u,v)]^{-r}.)

Geographical Interpretation of the Model. Individuals live on a grid; each node knows its neighbors up to some distance in each direction. They also have a few acquaintances distributed across the grid. If we keep p and q fixed, then we get a 1-parameter family of social networks by tuning the parameter r >= 0. Think of r as a decay parameter that controls how "widely networked" the society of nodes is. When r = 0, the long-range contacts are uniformly distributed. (This is basically the Watts-Strogatz model.) As r increases, the long-range contacts become more heavily clustered around u's location. We are now ready to tackle the algorithmic question: decentralized routing.

7. Decentralized Routing. Consider two arbitrary nodes s and t. The goal is to transmit a message from s to t in as FEW STEPS AS POSSIBLE. Figure of Merit: the expected delivery time of the message. In a decentralized algorithm, the message is passed sequentially from the current message holder to one of its (local or long-range) contacts, *using only local information*.
Specifically, the message holder u at any time has the following information:
a. the underlying grid structure; so it knows the local contacts of all the nodes in the network.
b. the location on the lattice of the target t.
c. the location and long-range contacts of all the nodes that the message has come in contact with.
NOTE that u does not know the long-range contacts of nodes that have not yet handled the message. In other words, u only knows the local history of the message's path. If the entire graph were known to u, then the shortest path could be computed simply by running BFS.

8. Main Results:
Theorem 1. When r = 0, the expected delivery time of ANY decentralized algorithm is Omega(n^{2/3}), where the constant in Omega depends on p and q, but not on n.
Theorem 2. When r = 2 and p = q = 1, there is a decentralized algorithm with expected delivery time O(log^2 n).
Theorem 3. When r > 2, the expected delivery time of ANY decentralized algorithm is Omega(n^{(r-2)/(r-1)}).

Discussion: With r = 0, the graph is basically the grid graph with a few random links thrown in. Random graph results show that this graph has expected diameter O(poly(log n)), so there DO EXIST short paths, but no decentralized algorithm can find them. As r increases, the geographical structure implicit in the network allows the algorithm to find short paths. At the same time, with growing r, the long-range contacts become less useful for moving the message a large distance. The critical value of r happens to be 2: the inverse-square distribution. We will focus mainly on Theorem 2.

9. Proof of Theorem 2. The decentralized algorithm works as follows: in each step, the current node u forwards the message to the contact that is as close to t as possible (by lattice distance). We have p = q = 1. That is, each node has links only to its (at most four) local neighbors, plus one long-range contact, generated using the inverse-square distribution.
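The greedy rule just described can be sketched in Python. This is only an illustrative simulation, not part of the proof: the function names (lattice_distance, local_contacts, sample_long_range_contact, greedy_route), the 0-indexed grid coordinates, and the choice to draw each node's long-range contact at the moment the message arrives there are all assumptions made for this sketch.

```python
import random


def lattice_distance(a, b):
    """Lattice (Manhattan) distance between grid nodes a and b."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def local_contacts(u, n):
    """Grid neighbors of u at lattice distance 1 (the case p = 1)."""
    i, j = u
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for a, b in candidates if 0 <= a < n and 0 <= b < n]


def sample_long_range_contact(u, n, r, rng):
    """Pick one node v != u with probability proportional to d(u, v)^(-r)."""
    nodes = [(i, j) for i in range(n) for j in range(n) if (i, j) != u]
    weights = [lattice_distance(u, v) ** (-r) for v in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


def greedy_route(s, t, n, r=2.0, seed=0):
    """Route from s to t on an n x n grid with q = 1 long-range contact.

    At every step the holder forwards the message to whichever of its
    contacts (local or long-range) is closest to t in lattice distance.
    Returns the list of nodes visited, starting with s and ending with t.
    """
    rng = random.Random(seed)
    path = [s]
    u = s
    while u != t:
        contacts = local_contacts(u, n)
        contacts.append(sample_long_range_contact(u, n, r, rng))
        u = min(contacts, key=lambda v: lattice_distance(v, t))
        path.append(u)
    return path
```

For example, greedy_route((0, 0), (19, 19), n=20) returns the nodes visited, and its length minus one is the number of steps. Because some local contact is always one step closer to t, the loop ends after at most 2n - 2 steps whatever the long-range contacts turn out to be.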
For simplicity of proofs, we will invoke the Principle of Deferred Decisions: we generate the long-range contact of u on demand (when the message reaches u) instead of assuming that it is precomputed. (This is not essential for understanding the analysis: all arguments are based on the prob of membership in a subset.)

Suppose the message has arrived at node u. We estimate the probability that a particular node v is u's long-range contact. Let D = \sum_{w \neq u} d(u,w)^{-2}, where w ranges over all nodes other than u. Then the prob that v is the long-range contact of u is d(u,v)^{-2} / D. There are at most 4j nodes at lattice distance j from u, and the largest lattice distance is 2n-2, so
D <= \sum_{j=1}^{2n-2} (4j)/j^2 = 4 \sum_{j=1}^{2n-2} 1/j <= 4(1 + ln(2n-2)) <= 4 ln(6n).
(Harmonic series approximation: \sum_{j=1}^{m} 1/j <= 1 + ln(m). For the last inequality, note that 1 = ln(e) with e = 2.71... < 3, so 1 + ln(2n-2) <= ln(3) + ln(2n) = ln(6n).)
Thus, the prob that v is chosen is >= 1 / [4 ln(6n) d(u,v)^2].

PHASES of the Algorithm: we say that the execution of the algorithm is in phase j when the lattice distance from the current node to t is > 2^j and <= 2^{j+1} (log denotes the base-2 logarithm). Thus initially, j <= log n, since the lattice distance is at most 2n-2 < 2^{log n + 1}. When the lattice distance to t is at most 2, we say we are in phase 0. Observation: In each step, the lattice distance to t strictly decreases (some local contact is always one step closer) ==> no node becomes the message holder more than once.

10. The most important case is the range of phases j with log(log n) <= j < log n. Suppose we are in phase j, and u is the current holder. Let's consider the probability that phase j ends in this step. This requires the new distance to t to be at most 2^j. Let B_j be the set of nodes within lattice distance 2^j of t. A conservative lower bound on its size is:
|B_j| >= 1 + \sum_{i=1}^{2^j} i = 1 + (2^{2j} + 2^j)/2 > 2^{2j-1}.
Furthermore, every node in B_j is within distance 2^{j+1} + 2^j < 2^{j+2} of u. Thus, each of them has prob at least 1/[4 ln(6n) 2^{2j+4}] of being the long-range contact of u. If any of these nodes is the long-range contact of u, then u has a contact within distance 2^j of t, and the greedy choice moves the message into B_j.
Therefore, the message enters the set B_j with prob at least 2^{2j-1} / [4 ln(6n) 2^{2j+4}] = 1 / [128 ln(6n)]. (Since q = 1, the events "v is the long-range contact of u" are disjoint for different v, so their probabilities add.)

11. Let X_j denote the total number of steps spent in phase j, where log log n <= j < log n. Then we have
E(X_j) = \sum_{i=1}^{\infty} Pr[X_j >= i] <= \sum_{i=1}^{\infty} (1 - 1/[128 ln(6n)])^{i-1} = 128 ln(6n).
(This can also be seen directly: if the prob of success is p, then the expected number of independent tosses to get the first success is 1/p.) Similarly, when j = log n, we also get E(X_j) <= 128 ln(6n). Finally, when j < log log n, we have the easy bound E(X_j) <= log n; this follows because the distance from u to t is at most 2^{log log n} = log n, and the algorithm reaches t within log n steps even if it follows only local contacts.

12. Finally, let X denote the total number of steps spent by the algorithm. Then X = \sum_{j=0}^{log n} X_j. So, by linearity of expectation, E(X) = \sum_{j=0}^{log n} E(X_j) <= (1 + log n) * 128 ln(6n) <= c (log n)^2, for a constant c. QED.

---------------------------------------------------------------------
Lower Bound Theorems.

1. To complement this upper bound, we now prove the lower bound claimed in Theorem 1. Kleinberg shows that r = 2 is a critical parameter value: for values of r less than 2, short paths may exist but decentralized algorithms are not good at finding them; for r > 2, the network becomes too clustered and short paths may not even exist. At r = 2 these two opposing forces are in balance.

2. Making precise what the algorithm knows: every lower bound is stated for a particular model of algorithms; in other words, it shows the limitation of a particular class of algorithms. The weaker the assumptions on the algorithm (that is, the larger the class of algorithms allowed), the stronger the lower bound. In the present case, we make minimal assumptions about what the state of the decentralized routing algorithm includes, and how it makes its routing decisions.
A.
The algorithm knows the overall grid structure of the network, and thus the local contacts of all the nodes; it also knows the locations of s and t.
B. Let S_i denote the set of nodes that have touched the message up to this moment. We allow the algorithm to know all the long-range contacts of the nodes in S_i.
C. Based on this information, the algorithm can choose to forward the message to any contact v of any node in S_i, provided v has not yet received the message. Note: v doesn't have to be a contact of the current message holder. This ability, together with the requirement that the message go to a new node in each step, is essentially equivalent to *not counting the backtracking steps.*
D. Now S_{i+1} has one more element than S_i. The algorithm iterates until t is reached.

3. We will prove the following
Theorem: Suppose 0 <= r < 2. Then the expected delivery time of any decentralized scheme is Omega(n^{(2-r)/3}).
I will actually focus on the case r = 0 and show the lower bound of Omega(n^{2/3}); the general case is basically the same proof with slightly more technical detail.

4. Assume that s and t are chosen uniformly at random from the grid. Given a node u, what is the prob that it chooses a specific node v as one of its q long-range contacts? By assumption, for each of the q choices this is d(u,v)^{-r} / NormalizingTerm. Let us calculate the NormalizingTerm, using the fact that at least j nodes lie at distance j from u for every j <= n/2:
NormalizingTerm = \sum_{v \neq u} d(u,v)^{-r}
  >= \sum_{j=1}^{n/2} (number of nodes at dist j from u) * j^{-r}
  >= \sum_{j=1}^{n/2} j * j^{-r}
  >= \int_{1}^{n/2} x^{1-r} dx
  = [ x^{2-r} / (2-r) ]_{1}^{n/2}
  = ((n/2)^{2-r} - 1) / (2-r).
We may assume that n is large enough to satisfy 2^{2-r} < (1/2) * n^{2-r}, that is, 1 < (1/2) * (n/2)^{2-r}. With this, we get
NormalizingTerm >= (1/2) * (n/2)^{2-r} / (2-r) = n^{2-r} / [(2-r) * 2^{3-r}].
For r = 0 this gives NormalizingTerm >= n^2/16.

5. In fact, for r = 0, we can calculate the NormalizingTerm exactly, more directly and simply: every node v \neq u has weight d(u,v)^0 = 1, so NormalizingTerm = n^2 - 1 = Theta(n^2), and u chooses each of the remaining n^2 - 1 nodes with equal probability 1/(n^2 - 1).

6.
Let U denote the set of nodes within lattice distance p n^{2/3} of t. Remember p is the parameter controlling the local contacts. How many nodes are in this "neighborhood" of t?
|U| <= 1 + \sum_{j=1}^{p n^{2/3}} (4j) <= 4 p^2 n^{4/3}.

7. We will fix the parameter L in a minute. Define E' to be the event that, within L * n^{2/3} steps, the message reaches a node (other than t) that has a long-range contact in U. Let E'_i be the event that this happens in step i. Then, clearly, E' = Union E'_i, where the union is over i <= L * n^{2/3}. (Think of E' as a bad case for us: the algorithm gets lucky!)

8. Use the Method of Deferred Decisions to generate the long-range contacts online. Thus,
Pr[E'_i] <= q * |U| / NormalizingTerm <= q * 4 * p^2 * n^{4/3} / (n^2/16) = 64 * p^2 * q / n^{2/3}.

9. By the union bound,
Pr[E'] <= \sum_{i <= L * n^{2/3}} Pr[E'_i] <= L * n^{2/3} * 64 * p^2 * q / n^{2/3} = 64 * p^2 * q * L.

10. If we choose L = 1 / (256 * p^2 * q), then we get Pr[E'] <= 1/4. Thus, for a given s and t pair, the prob that within L * n^{2/3} = O(n^{2/3}) steps the message reaches a node with a long-range contact in U is at most 1/4.

11. We now argue about the prob of s and t being reasonably far apart. That is easy. Let F be the event that the chosen s and t are more than n/4 (lattice steps) apart. Then, Pr(F) >= 1/2. (One way to see this: for any node u, at most 1 + \sum_{j=1}^{n/4} (4j) = 1 + (n/2)(n/4 + 1), roughly n^2/8, of the n^2 nodes lie within distance n/4 of u, so t falls that close to s with prob well below 1/2.) Thus, the prob that s and t are more than n/4 apart and the event E' does not occur (the good case for the lower bound) is
P(F and not-E') = 1 - P(not-F or E') >= 1 - [P(not-F) + P(E')] >= 1 - (1/2 + 1/4) = 1/4.

12. Let X be the number of steps taken by the algorithm to deliver the message from s to t. Suppose F occurs and E' does not, and n is large enough that n/4 > p n^{2/3}, so that s lies outside U. Within the first L * n^{2/3} steps, no edge that moves the message into or inside U can be a long-range edge, so the message must cover the last p n^{2/3} of lattice distance to t using local contacts only, which takes at least n^{2/3} steps (each local step covers distance at most p). Since L < 1, the delivery therefore takes at least L * n^{2/3} steps. Let Y1 be this event (s and t are more than n/4 steps apart and the delivery takes at least L * n^{2/3} steps), and let Y2 be the event covering the remaining cases.
Then, E(X) >= Prob[Y1 occurs] * L * n^{2/3} + Prob[Y2 occurs] * 0 >= (1/4) * L * n^{2/3} = Omega(n^{2/3}).
-------------------------------------------------------------------
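The counting facts used in the lower-bound argument (the bound on |U|, Pr(F) >= 1/2, and Pr[E'] <= 1/4) can be checked numerically on a small grid. The sketch below is only a sanity check under stated assumptions, not part of the proof: the helper names (ball_size, prob_far_apart, union_bound_on_E) are hypothetical, the grid is 0-indexed, and r = 0 with the n^2/16 lower bound on the NormalizingTerm is assumed.

```python
def ball_size(center, radius, n):
    """Number of nodes of the n x n grid within lattice distance `radius` of `center`."""
    ci, cj = center
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if abs(i - ci) + abs(j - cj) <= radius
    )


def prob_far_apart(n):
    """Exact Pr(F): s and t independent and uniform on the grid, d(s, t) > n/4."""
    # Distances are integers, so d <= n/4 is the same as d <= n // 4.
    close_pairs = sum(
        ball_size((i, j), n // 4, n) for i in range(n) for j in range(n)
    )
    return 1 - close_pairs / n**4


def union_bound_on_E(n, p=1, q=1):
    """Union bound on Pr[E'] for r = 0, using L = 1 / (256 p^2 q)."""
    size_U_bound = 4 * p**2 * n ** (4 / 3)   # |U| <= 4 p^2 n^{4/3}
    normalizing_lower = n**2 / 16            # NormalizingTerm >= n^2 / 16
    per_step = q * size_U_bound / normalizing_lower
    L = 1 / (256 * p**2 * q)
    return L * n ** (2 / 3) * per_step
```

With n = 27 (so that n^{2/3} = 9 and p n^{2/3} is an integer for p = 1), ball_size((13, 13), 9, 27) counts the nodes of U around a central target and can be compared against 4 * 27 ** (4 / 3), while prob_far_apart(27) and union_bound_on_E(27) can be compared against 1/2 and 1/4. The union bound is evaluated in floating point, so compare it with a small tolerance.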