CS290I Lecture notes -- Ramsey Numbers: Top Hits Played at Parties Thrown by Paul Erdos


You may not think about it much, but like most demanding disciplines, Math and Science have their share of heroes. During the 20th century, Paul Erdos (pronounced "err-dosh") was clearly one of them. He was an extremely prolific researcher. So prolific, in fact, that mathematicians invented a metric describing the publishing relationship that the math community as a whole has with him: the Erdos Number. If you have published a paper with Paul, your Erdos number is 1. If you have published a paper with someone who has published a paper with Paul, your Erdos number is 2, and so on. Paul also had no home. Everything he owned, he carried in a single suitcase. He would arrive at the home of some mathematician he admired and simply move in for two or three months. Then he'd leave and move on. It was considered a tremendous badge of recognition to be visited by Erdos and, by all accounts, he was a terrible house guest. Toward the end of his life, he spent a great deal of time staying with Ron Graham while he was in North America.

Paul lived for mathematics.

Besides being a genius with a tremendous capacity for collaboration (he'd literally flit from problem to problem "helping" other mathematicians with their work), he had a knack for posing extremely simple-sounding problems that defy mathematical solution. The purpose of this lecture is to introduce you to one of them, called The Party Problem.


We'll begin by illustrating The Party Problem with an example. Imagine that you are throwing a party.

What is the smallest number of people you can invite such that either there must be a group of three people at the party, all of whom know each other, or there must be a group of three people at the party, all of whom are complete strangers?

You have to get your head around what the question is actually asking before we can go further. Say you invite 15 people, and you can invite anyone from the world's population. Surely you could invite 15 strangers or 15 mutual friends, since you are free to pick anyone. The question is,

Is it possible to invite a set of people so that there is NOT a group of three complete strangers and there is NOT a group of three mutual friends?

If you can make the invitations such that both of these NOTs are true, then you know that the smallest number referred to in the question is greater than 15, that is, at least 16.

More formally, the question can be stated thus:

Find the minimum number of guests that must be invited so that at least m will know each other or at least n will not know each other. The solutions are known as Ramsey numbers. This definition is due to Wolfram Research. In this class, we will study only symmetric Ramsey numbers, where m == n; in the example, both are equal to 3.

You can and should bend your brain around the English-language statements of the problem, which sound to me like those brain teaser puzzles in Reader's Digest, or you can resort to graph theory for a picture. Let's define a node to be a person, and an edge between two nodes to represent the acquaintance relationship (known or not known) between the two people at its ends. Notice that, for any group of people (nodes), there is an edge connecting every pair, since two people either know each other or do not know each other. That is, the graph is fully connected (complete). We'll further stipulate that the relationship is symmetric: if I know you, you know me. The graph is, therefore, undirected. Another term for such a complete graph is a clique.

Further, we can color the edges so that they signify the binary acquaintance relationship. We'll let red indicate that the people (nodes) at the ends of an edge are strangers, and green indicate that they are acquainted. For a party of five people, the question can be restated as

Is it possible to color the edges of a 5-clique such that there is no subclique of size 3 in which all of the edges are the same color?

If the answer to this question is "yes," then you know that the 3rd symmetric Ramsey number R(3,3) is bigger than 5. Why? Because if such a coloring exists, then you could invite 5 people among whom there is no group of 3 complete strangers (which would appear as a red subclique of size 3) and no group of 3 mutual friends (which would appear as a green subclique of size 3). Therefore, the smallest number for which one or the other must be true is bigger than 5.

On the LHS of this figure, a coloring like the one described above is depicted. Notice that it does not contain a triangle (also called a 3-clique) that is all one color. As such, R(3,3) is greater than 5.

Now let's ask the same question about 6. Each node in a 6-clique has five edges, and with only two colors, at least three of those five edges must be the same color (red or green). Pick a node (1 in the example) and a color for those edges (say green). Consider the colors of the dotted edges in the figure, which join the far ends of the three green edges to one another. If any one of them is also green, then it forms a green triangle with two of the three green edges (the solid lines we said were green). If none of them is green, then they all must be red, forming a red triangle. It is, therefore, impossible to color the edges of a complete graph on six nodes without introducing a monochromatic 3-clique. Since it is possible on 5 and not possible on 6, R(3,3) is 6.
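The pigeonhole argument can also be checked by brute force, because K5 has only 10 edges (1,024 colorings) and K6 only 15 (32,768 colorings). The sketch below is an illustration, not course code: count_good_colorings and its helpers are hypothetical names, and each coloring is encoded as a bitmask with one bit per edge (set for green, clear for red).

```c
#include <assert.h>

/* Illustrative sketch; all names here are hypothetical. */

/* Bit position of edge {i,j} (i < j) when the n(n-1)/2 edges of K_n are
   numbered row by row through the upper triangle. */
static int edge_bit(int n, int i, int j)
{
    return i * n - i * (i + 1) / 2 + (j - i - 1);
}

/* 1 if the coloring in mask (bit set = green, clear = red) has a
   triangle whose three edges are all the same color, else 0. */
static int has_mono_triangle(unsigned long mask, int n)
{
    for (int a = 0; a < n; a++)
        for (int b = a + 1; b < n; b++)
            for (int c = b + 1; c < n; c++) {
                unsigned long ab = (mask >> edge_bit(n, a, b)) & 1UL;
                unsigned long ac = (mask >> edge_bit(n, a, c)) & 1UL;
                unsigned long bc = (mask >> edge_bit(n, b, c)) & 1UL;
                if (ab == ac && ac == bc)
                    return 1;
            }
    return 0;
}

/* Count the 2-colorings of K_n with no monochromatic triangle.
   n <= 7 keeps the edge count (at most 21) inside an unsigned long. */
long count_good_colorings(int n)
{
    assert(n >= 1 && n <= 7);
    int edges = n * (n - 1) / 2;
    long good = 0;
    for (unsigned long mask = 0; mask < (1UL << edges); mask++)
        if (!has_mono_triangle(mask, n))
            good++;
    return good;
}
```

With this sketch, count_good_colorings(5) returns 12 (each surviving coloring is a green 5-cycle whose complement, also a 5-cycle, is red) and count_good_colorings(6) returns 0, consistent with R(3,3) = 6.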

What this means is that if you throw a party for six (or more) people, and you can invite anyone, from any time, who has ever lived, on any planet, there must be a group of three complete strangers or a group of three mutual acquaintances, or both. Neat, huh?

It Gets Hard Fast

While the problem statement is relatively simple and the proof of R(3,3)=6 straightforward, it is a problem in graduate-level combinatorics to prove that R(4,4) is 18. Anecdotally, two very capable Ph.D.-level mathematician friends of mine decided to work out the proof for R(4,4) as a twisted form of entertainment. It took about 6 hours without the help of a combinatorics book on the subject.

R(5,5) is currently unknown, as is R(k,k) for any integer value of k greater than 4. In math terms, this situation is called an "open problem." You should think of it as a really open problem. When math knows the answer for 3 (it is easy) and 4 (it is much harder) but not for 5 or greater (while knowing that the number must exist, unlike Fermat's Last Theorem), you know you have just stepped into the deep end of the pool. Indeed, in the book "The Man Who Loved Only Numbers" (a book about Paul Erdos), many of the mathematicians interviewed said that they believed a new kind of mathematics (as yet unseen) was necessary to attack this problem successfully. One even claimed that it would be at least 50 years before it came to fruition.


The Current State of Things

The current Ramsey scores for various colors and subclique sizes are given on this lovely page by Eric Weisstein and Wolfram Research. There are a couple of generalizations of the symmetric problem that people have worked on as well. The first is to consider asymmetric colorings using only two colors: R(m,n) is the smallest number of nodes that forces a subclique of size m in one color or a subclique of size n in the other (or both). The other generalization is to add colors. By far the coolest Ramsey numbers, though, are the symmetric ones.

At present, R(5,5) is known to be between 43 and 49. I do not know how those bounds were determined, but knowing it to be one of 7 numbers and not being able to say which one is a remarkable situation in mathematics. In 1997, two friends of mine and I generated counter examples for all sizes up through 42 for R(5,5) as a diversion at SC97. Some day, we'll get serious about it. Some day soon.

Ladies and Gentlemen, Start your Engines...

So what does this have to do with you and this class? Your job is to try to improve the known lower bound on R(10,10) using every possible resource available to you. The current bounds are 798 to 23556. The team that produces the largest counter-example for R(10,10) wins.

In the event of a tie (more than one team comes up with the same largest counter-example), the team that uses the widest variety of resources (from the largest number of separate sites) wins.

Your Race Car

At the end of the class you will need to make a short presentation that describes four things. You will also turn in the counter-example, your code, and your logs for grading.

Many of the clouds you will encounter will limit or quota your access, possibly without warning. You will need to ensure that your distributed program can "survive" many different kinds of outages, or your hard work may be lost.

Thus a good solution requires careful logging mechanisms, fault tolerant design and implementation, and good cloud programming skills.
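One common way to keep a long-running search from losing its best graph when an instance is quota-limited or killed is to checkpoint with a write-then-rename pattern: write the new checkpoint to a temporary file, flush it, and only then rename it over the old one, so a crash mid-write leaves the previous checkpoint intact. The sketch below is a hypothetical helper (save_checkpoint), not course code; it reuses the submission file format shown under Graph Format and assumes the second header number is the monochromatic-subclique count. The atomic replacement relies on POSIX rename() semantics; ISO C leaves renaming onto an existing file implementation-defined.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: save an n-node graph (row-major n x n int matrix,
   colors in the upper triangle) to `path` in the submission format, going
   through "<path>.tmp" so an interrupted write never clobbers the last
   good checkpoint. Returns 0 on success, -1 on any I/O error. */
int save_checkpoint(const char *path, const int *g, int n, long count)
{
    char tmp[4096];
    assert(path != NULL && g != NULL && n > 0);
    int len = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (len < 0 || len >= (int)sizeof tmp)
        return -1;                          /* path too long */

    FILE *f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    /* header: node count, then the clique count (assumed meaning) */
    fprintf(f, "%d %ld\n", n, count);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)         /* 0 on and below the diagonal */
            fprintf(f, "%d%c", j > i ? g[i * n + j] : 0, j + 1 < n ? ' ' : '\n');

    /* A production version would also fsync() here (POSIX, not ISO C). */
    if (fflush(f) != 0 || ferror(f)) {
        fclose(f);
        remove(tmp);
        return -1;
    }
    if (fclose(f) != 0) {
        remove(tmp);
        return -1;
    }
    /* POSIX rename() atomically replaces an existing `path` on the same
       file system; ISO C leaves that case implementation-defined. */
    if (rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

Calling it every time the search finds a new best graph, and logging each call, gives you something to restart from after an outage.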

The Details

Logically, each graph can be represented by the upper or lower triangle of a square adjacency matrix, where each element of the matrix is a binary value indicating the edge's color. For example, in my implementation

0 0 1 0 1 1 0 1 1 0 0 0 1 0 1 0 1 0 1 0
0 0 0 0 1 1 0 0 1 0 1 1 1 0 1 0 1 0 1 0
0 0 0 0 0 1 0 1 0 0 1 1 1 0 1 0 1 0 1 0
0 0 0 0 1 1 1 1 0 0 1 0 1 0 1 0 1 0 1 0
0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0
0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 1 0 1 0
0 0 0 0 0 0 0 1 1 0 1 1 1 0 1 0 1 0 1 0
0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 1 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

represents a two-color graph on 20 nodes. A "0" is a red edge and a "1" is a green edge. Notice that only the elements on one side of the diagonal (here, the upper triangle) matter. That is because [2,6] and [6,2] mean the same thing: the graph is undirected, so the edge from node 2 to node 6 is the same edge as the one from node 6 to node 2. Also, the entries along the diagonal do not matter, since nodes are not connected to themselves by an edge.
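Because only one triangle matters, an n-node graph has n(n-1)/2 meaningful entries, which for n = 800 is 319,600 instead of 640,000 matrix cells. Below is a minimal sketch of packing just the upper triangle row by row, assuming that layout; the helper name tri_index is hypothetical, not from the author's implementation. Row i starts after the n-1, n-2, ..., n-i entries of the earlier rows, which sum to i*n - i*(i+1)/2.

```c
#include <assert.h>

/* Hypothetical helper: position of edge {i,j} in a packed array holding
   the upper triangle of an n x n adjacency matrix row by row (row 0 has
   n-1 entries, row 1 has n-2, ...). Arguments may come in either order. */
long tri_index(long n, long i, long j)
{
    assert(i != j && i >= 0 && j >= 0 && i < n && j < n);
    if (i > j) {            /* [6,2] names the same edge as [2,6] */
        long t = i;
        i = j;
        j = t;
    }
    return i * n - i * (i + 1) / 2 + (j - i - 1);
}
```

For the 20-node example, tri_index(20, 2, 6) and tri_index(20, 6, 2) both return 40, and the last edge, [18,19], lands at 189, the final slot of the 190-entry array.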

To figure out whether the 20-node graph above is a counter-example for R(10,10), you simply need to run a set of nested loops that counts the monochromatic subcliques of size 10 it contains. The following routine, written in C, uses integers to represent the color of each edge in the adjacency matrix.

Consider the code found here.

This routine takes 2 arguments:

It returns the number of monochromatic subcliques of size 10. If this number is 0, then the graph is a counter-example for R(10,10).
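The routine itself is only linked from these notes, not reproduced. As a stand-in, here is a minimal sketch of a counter that does the same job; it is not the author's code, and its name (count_mono_cliques), its signature, and the storage convention (a row-major n x n int array whose upper triangle holds the colors, 0 for red and 1 for green, as in the matrix above) are assumptions. It builds each clique in increasing node order, so every subclique is counted exactly once, and it abandons a partial clique as soon as a candidate node joins it with an edge of the wrong color.

```c
#include <assert.h>

/* Illustrative sketch; all names here are hypothetical. */
#define MAXK 64   /* largest clique size this sketch supports */

/* Color of edge {i,j}: stored in the upper triangle of row-major g. */
static int edge_color(const int *g, int n, int i, int j)
{
    return i < j ? g[i * n + j] : g[j * n + i];
}

/* Count the ways to grow clique[0..size-1] (every edge of color c) into a
   k-clique of color c using only nodes >= start. */
static long grow(const int *g, int n, int k, int c,
                 int *clique, int size, int start)
{
    if (size == k)
        return 1;
    long total = 0;
    /* stop early when too few nodes remain to reach size k */
    for (int v = start; v <= n - (k - size); v++) {
        int ok = 1;
        for (int m = 0; m < size && ok; m++)
            ok = edge_color(g, n, clique[m], v) == c;
        if (ok) {
            clique[size] = v;
            total += grow(g, n, k, c, clique, size + 1, v + 1);
        }
    }
    return total;
}

/* Number of k-node subcliques whose edges all share one color (k >= 2). */
long count_mono_cliques(const int *g, int n, int k)
{
    int clique[MAXK];
    assert(k >= 2 && k <= MAXK);
    return grow(g, n, k, 0, clique, 0, 0) + grow(g, n, k, 1, clique, 0, 0);
}
```

Calling count_mono_cliques(g, n, 10) on a candidate graph gives the number this section describes. The cost of exhaustive counting grows very quickly with n and k, so for large graphs you will probably want to count incrementally rather than from scratch after every change.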

Your job will be to find graphs where the dimension is as large as possible and the count is 0. If the dimension is 799 or larger, you will have achieved a new result in combinatorics.

For example, the graph on 20 nodes shown above is a counter-example for R(10,10). That is, the number of single-color subcliques of size 10 in this graph is 0. However, since R(10,10) is known to lie between 798 and 23556, a counter-example on 20 nodes is not surprising and is not a new result.

Some helpful hints from Juan Manuel Fangio

Ramsey counter-examples have some interesting properties. For example, a counter-example on n nodes has embedded in it a counter-example on n-1 nodes. To see why, imagine that you remove one node, and all of the edges incident on it, from a counter-example on n nodes. Every subclique of the remaining graph is also a subclique of the original graph, so removing a node cannot create a monochromatic subclique; the remaining graph on n-1 nodes must therefore also be a counter-example.
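The deletion step is easy to express on the adjacency matrix. Here is a minimal sketch, assuming the row-major n x n layout used above; remove_node is a hypothetical helper, not part of the course code.

```c
#include <assert.h>

/* Hypothetical helper: copy the n-node graph g into out, an
   (n-1) x (n-1) row-major matrix, dropping node v and every edge incident
   on it. Nodes after v shift down by one. Any monochromatic subclique of
   out would also be one of g, so if g is a counter-example, out is too. */
void remove_node(const int *g, int n, int v, int *out)
{
    assert(n >= 2 && v >= 0 && v < n);
    int m = n - 1;
    for (int i = 0; i < m; i++)
        for (int j = 0; j < m; j++) {
            int oi = i < v ? i : i + 1;   /* original index of row i */
            int oj = j < v ? j : j + 1;   /* original index of column j */
            out[i * m + j] = g[oi * n + oj];
        }
}
```

Applied to a 5-cycle coloring of K5, removing any node leaves a 4-node graph with no monochromatic triangle, just as the argument predicts.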

However, not all counter examples on n nodes are embedded in a counter example on n+1 nodes. Some are, but some are not.

Nonetheless, one strategy to consider is to find a counter-example on a small number of nodes to start with, and then to add a node and a set of edges to make a graph one dimension bigger. If the smaller counter-example is embedded in a larger one, then only the new edges need to be colored.
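Here is a minimal sketch of that growth step, assuming the same row-major storage as above. Because the old graph already has no monochromatic k-subclique, any such subclique in the bigger graph must include the new node, so only those need to be counted. The names count_with_new_node and try_extend are hypothetical, and trying all 2^n colorings of the new node's edges is only practical for tiny graphs; at realistic sizes you would choose those colors with a heuristic search instead.

```c
#include <assert.h>
#include <string.h>

/* Illustrative sketch; all names here are hypothetical. */
#define MAXN 64   /* largest graph this sketch handles */

/* Color of edge {i,j} in a row-major N x N matrix (upper triangle used). */
static int col(const int *g, int N, int i, int j)
{
    return i < j ? g[i * N + j] : g[j * N + i];
}

/* Count ways to grow clique[0..size-1] (all edges color c) to `need`
   nodes, drawing new nodes from cand[from..ncand-1] in order. */
static long grow_in(const int *g, int N, int c, const int *cand, int ncand,
                    int from, int *clique, int size, int need)
{
    if (size == need)
        return 1;
    long total = 0;
    for (int t = from; t <= ncand - (need - size); t++) {
        int v = cand[t], ok = 1;
        for (int m = 0; m < size && ok; m++)
            ok = col(g, N, clique[m], v) == c;
        if (ok) {
            clique[size] = v;
            total += grow_in(g, N, c, cand, ncand, t + 1, clique, size + 1, need);
        }
    }
    return total;
}

/* Monochromatic k-subcliques of the N-node graph g that contain node N-1.
   A c-colored one is node N-1 plus a c-colored (k-1)-clique among the
   nodes joined to N-1 by a c-colored edge. */
long count_with_new_node(const int *g, int N, int k)
{
    int cand[MAXN], clique[MAXN];
    long total = 0;
    assert(N >= 1 && N <= MAXN && k >= 2);
    for (int c = 0; c <= 1; c++) {
        int ncand = 0;
        for (int v = 0; v < N - 1; v++)
            if (col(g, N, v, N - 1) == c)
                cand[ncand++] = v;
        total += grow_in(g, N, c, cand, ncand, 0, clique, 0, k - 1);
    }
    return total;
}

/* Extend the n-node counter-example `old` by one node: copy its edges into
   the (n+1) x (n+1) matrix `out`, then try every coloring of the n new
   edges. Returns 1 (with `out` filled in) as soon as no monochromatic
   k-subclique contains the new node, or 0 if no coloring works. */
int try_extend(const int *old, int n, int k, int *out)
{
    int N = n + 1;
    assert(n >= 1 && n <= 20);    /* 2^n colorings: small n only */
    memset(out, 0, sizeof(int) * (size_t)N * (size_t)N);
    for (int i = 0; i < n; i++)   /* old edges are kept unchanged */
        for (int j = i + 1; j < n; j++)
            out[i * N + j] = old[i * n + j];
    for (long mask = 0; mask < (1L << n); mask++) {
        for (int v = 0; v < n; v++)
            out[v * N + n] = (int)((mask >> v) & 1);
        if (count_with_new_node(out, N, k) == 0)
            return 1;
    }
    return 0;
}
```

For k = 3, a 4-node counter-example (a 5-cycle coloring with one node removed) extends to 5 nodes, while try_extend on a 5-node counter-example returns 0 because R(3,3) = 6.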

Another thing to realize is that the count of monochromatic subcliques is a kind of "fitness function." That is, a graph with fewer monochromatic subcliques is, in some sense, "better" than one with more, the best being 0 subcliques. A search strategy is then to repeatedly make small changes, such as flipping the color of a single edge, that lower the count.

However, such a greedy search won't get very far before no move decreases the count. In that case you need to make an "uphill" move so that you can explore a different part of the space.
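The sketch below illustrates that idea on the much smaller triangle problem (k = 3), so it runs instantly; for R(10,10) the fitness would count 10-cliques instead. It is a simple stochastic local search, essentially simulated annealing with a fixed acceptance probability, and only one of many ways to make uphill moves. The function names, the 1/4 uphill probability, and the use of rand() are illustrative assumptions. Flipping edge {i,j} can only change triangles that contain both i and j, so the change in fitness is computed from those alone instead of recounting the whole graph.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative sketch; all names here are hypothetical. */

/* Color of edge {i,j} in a row-major n x n matrix (upper triangle used). */
static int color_of(const int *g, int n, int i, int j)
{
    return i < j ? g[i * n + j] : g[j * n + i];
}

/* Fitness: number of monochromatic triangles (0 means counter-example). */
static long mono_triangles(const int *g, int n)
{
    long t = 0;
    for (int a = 0; a < n; a++)
        for (int b = a + 1; b < n; b++)
            for (int c = b + 1; c < n; c++) {
                int ab = color_of(g, n, a, b);
                if (ab == color_of(g, n, a, c) && ab == color_of(g, n, b, c))
                    t++;
            }
    return t;
}

/* Change in fitness if edge {i,j} were flipped; only triangles {i,j,m}
   are affected. */
static long flip_delta(const int *g, int n, int i, int j)
{
    int c = color_of(g, n, i, j);
    long before = 0, after = 0;
    for (int m = 0; m < n; m++) {
        if (m == i || m == j)
            continue;
        int im = color_of(g, n, i, m), jm = color_of(g, n, j, m);
        if (im == jm) {
            if (im == c)
                before++;        /* monochromatic now, broken by the flip */
            else
                after++;         /* becomes monochromatic after the flip */
        }
    }
    return after - before;
}

/* Start from a random coloring, then repeatedly pick a random edge and
   flip it if that does not raise the count, or with probability 1/4 even
   if it does (an "uphill" move). Returns the final count; 0 means g now
   holds a counter-example. */
long local_search(int *g, int n, long max_steps, unsigned seed)
{
    assert(n >= 3);
    srand(seed);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            g[i * n + j] = j > i ? rand() % 2 : 0;
    long count = mono_triangles(g, n);
    for (long step = 0; step < max_steps && count > 0; step++) {
        int i = rand() % n, j = rand() % (n - 1);
        if (j >= i)
            j++;                 /* pick j uniformly among nodes other than i */
        if (i > j) {
            int t = i;
            i = j;
            j = t;
        }
        long d = flip_delta(g, n, i, j);
        if (d <= 0 || rand() % 4 == 0) {
            g[i * n + j] ^= 1;
            count += d;
        }
    }
    return count;
}
```

With n = 5 this search should quickly reach a count of 0 (a 5-cycle coloring); with n = 6 it never can, since R(3,3) = 6.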

However, the strategy or strategies you use are up to you. Any algorithm you wish to use is fine, as long as what you produce as your "best" answer is, in fact, a counter-example for R(10,10). The team with the largest such counter-example wins.

Graph Format

When you turn in your counter-example, it must be in a text file with the following format. For example, if the largest counter-example you were able to find was on 20 nodes, your submitted graph would look like

20 0
0 0 1 0 1 1 0 1 1 0 0 0 1 0 1 0 1 0 1 0
0 0 0 0 1 1 0 0 1 0 1 1 1 0 1 0 1 0 1 0
0 0 0 0 0 1 0 1 0 0 1 1 1 0 1 0 1 0 1 0
0 0 0 0 1 1 1 1 0 0 1 0 1 0 1 0 1 0 1 0
0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0
0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 1 0 1 0
0 0 0 0 0 0 0 1 1 0 1 1 1 0 1 0 1 0 1 0
0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 1 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

I will run a clique-checker on your submitted graphs to ensure that they are, in fact, counter examples. They must be in this format for the checker to work properly.
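Before submitting, it may help to confirm that your file parses the way the example does. This is a minimal reader sketch, not the grader's clique-checker. It assumes the first line holds the node count followed by a second integer (0 in the example, presumably the monochromatic-subclique count; the notes do not spell this out), followed by n rows of n values that must each be 0 or 1. The name read_graph is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a graph in the submission format from f.
   On success returns a malloc'd row-major n x n matrix and stores the
   header values in *n and *second; on malformed input returns NULL.
   The caller frees the matrix. */
int *read_graph(FILE *f, int *n, long *second)
{
    assert(f != NULL && n != NULL && second != NULL);
    if (fscanf(f, "%d %ld", n, second) != 2 || *n <= 0)
        return NULL;
    int *g = malloc(sizeof(int) * (size_t)*n * (size_t)*n);
    if (g == NULL)
        return NULL;
    long total = (long)*n * *n;
    for (long i = 0; i < total; i++) {
        if (fscanf(f, "%d", &g[i]) != 1 || (g[i] != 0 && g[i] != 1)) {
            free(g);             /* short file or a value other than 0/1 */
            return NULL;
        }
    }
    return g;
}
```

Passing the returned matrix to a clique counter with k = 10 and getting 0 is a reasonable final self-check before you turn the file in.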