Introduction to Game Theory

Game Theory and Strategy, Philip Straffin

1. Some of you may have heard of game theory; others may not have. But perhaps all of you are wondering what game theory has to do with Computer Science. A fair question, which we hope is answered satisfactorily by the end of the course. But just to be explicit, this is not a course about *computer games*. It is, however, about strategies and behavior of intelligent agents, which might help you write computer games if you so desire.

2. Let's consider a simple toy game as an example.

DOLLAR AUCTION. I want to auction this $1 bill. It's an honest auction: no min bid; the person with the highest bid wins. Here are the rules of this auction (game):

   o. It's an open cry auction, so you can bid multiple times.
   o. Any (positive) bid is fine.
   o. The $1 goes to the highest bidder.
   o. The 2nd highest bidder also pays his/her bid (and gets nothing). :-)

It is this last condition that complicates the game, and often generates interesting responses. What would you do? How would you bid? What do you think is the highest bid one would make? If you think it's .99, reason from the perspective of the second bidder. (A runner-up at .98 loses .98 by stopping but loses nothing by bidding 1.00; the .99 bidder, now the runner-up, then prefers losing .01 with a bid of 1.01 to losing .99.) This bidding/counter-bidding can continue, and in a perfectly rational world, there is NO upper bound on the highest bid the $1 can produce!!!! In some academic experiments, the bids have typically gone as high as $4-5!!!

3. Game theory was invented as a framework for *analysis of conflict* (competition, self-interest), the kind of situation the Dollar Auction presents. What kinds of conflicts are there in computer science, you may ask? Well, before I answer that, let's take a brief look at the history of GT. Initially conceived as a mathematical foundation for parlor games, it now forms the *very basis* for all of mathematical economics. Historically, economics has been the subject of supply and demand, and the classical analytic tool for studying economics is Price Theory: Adam Smith's invisible hand of the market.
But even in the 1800s, economists recognized that price-based theory was too limited, and that competition and incentives were more important factors. Following the invention of game theory, and the influential work of von Neumann and Nash, *non-cooperative game theory* now forms the general basis for studying economics. The driving force in game theory is *incentives* and *self-interest*.

4. Over the past few decades, computer science has also undergone a revolution: not only is CS now a major factor in any economy, but the nature of the CS infrastructure has reached a state where many of the central questions are best studied using a game-theoretic, incentive-based analysis. Through the world-wide web, we have all now become players in an Information and Computational Economy. Instead of physical goods, we trade information: data, music, videos; we participate in electronic commerce, auctions, social networks, etc. I would even say that the world of computing is an even more exciting economy than the classical one.

5. As users, we also have preferences. As rational human beings, we tend to prefer "what is good for us" to "what is bad for us", and one can imagine this leading to "conflicts" when we have to share "scarce" resources. We will see many concrete examples later, but in general, we prefer more bandwidth to less (higher quality streaming video to lower quality); faster response to slower; we all prefer our "computational jobs" or "print jobs" to finish first rather than last; when bidding on eBay, we would like to win at a low price, not a high one; and so on. Of course, it's only logical to believe that "others" feel the same way, yet not all can be first or the fastest. So, when I say GT is about conflict, what I really mean is "conflict of our preferences". What may be optimal for me is not necessarily optimal for you, and vice versa.

6.
Immediately, we see that even one of the most fundamental concepts of traditional CS is in question: what is an optimal solution when the outcome affects many, not one? In traditional CS analysis, we tend to go with "global" optimality; think of it as a "single user" perspective.

   o. When we use Dijkstra's algorithm to compute the Shortest Path, we are looking after the interests of that user.
   o. When we compute a Minimum Spanning Tree, the goal is to minimize the cost for that user.
   o. When we compute an optimal job schedule, the goal can be to max system throughput.

But in a distributed, decentralized computer system with many users (e.g. Internet, P2P, computational grid, etc), what solution would be deemed *optimal* by all? Generally, there is no single solution that satisfies all. Worse yet, users can try to influence the outcome by strategic, lying or deceitful behavior!! In the Internet, users (software agents) cannot be trusted to follow the rules, so a user will only accept a solution that serves his interest. This sounds like "game theory"!

With that preamble, let's get started on real game theory, and see what it's all about, and what it can teach us.

7. Very abstractly, a "game" is defined to be any situation that has:

   a. 2 or more "players".
   b. each player has 1 or more "strategies" (actions).
   c. the actions of all the players determine the "outcome" of the game.
   d. each outcome leads to "payoffs" to the players (different amounts to different players).

These ideas will become more concrete and their subtleties clearer as we progress, but we want to first focus on the intuition behind the main ideas of game theory.

8. Rationality. A key assumption in GT is that players act "rationally": they always choose their actions with the single-minded goal of maximizing their payoff. I.e. payoffs are the *only* value that players get out of playing the game. Each player can *influence* the outcome with his strategy, but the outcome of the game is a function of *everyone's* strategies.
This is where conflict and cooperation arise. Indeed, each player also bases his strategy on what he thinks the others will do; the others, of course, are doing the same thing.

First a word about "rational" behavior. Yes, it's a common criticism: humans are not perfectly rational, and there are other social factors that sometimes come into play. There are several responses to this criticism:

   o. the degree of rationality is a difficult thing to model, and trying to model it makes mathematical analysis far less tractable; by studying ideal behavior, GT hopes to shed light on the impact of rational behavior.
   o. in CS, we expect players to be "software agents/programs" that act on behalf of their designers. In that setting, strategies etc have to be pre-programmed, and a rational model seems quite acceptable.

Nevertheless, to illustrate the sometimes counter-intuitive behavior that emerges from rationality, let me consider a toy example:

8b. GUESS THE NUMBER. Everyone in the room, guess a number (integer) between 0 and 99, and write it down. I will collect all the numbers, and compute the average A. Then, the winner is the one whose guess is closest to 2/3*A. What will you guess?

Suppose you were the only *perfectly rational* person in the room; the rest didn't think too hard. You will guess that A will be about 50, so you should guess 100/3 (about 33). But if everyone else reasons this way too, the average will drift down from 50 to 33. Now you *foresee* this in your 2nd iteration of reasoning, so you will need to guess (2/3)*(2/3)*50 (about 22). But this speculation-counter-speculation continues, and the end result is that *all rational* agents should guess 0!!!!!

The point of these two examples is to illustrate some pitfalls and subtleties of *rational* thinking... as irrational as they may seem :-)

* * * *

9. The basic 2-person zero-sum game.
The analysis of this game by John von Neumann marks the birth of Game Theory, so it's only appropriate that we start with this. A 2P-ZS game has two players whose interests are diametrically opposed: a win for one means a loss for the other. Many recreational or parlor games fit in this category: Tic-Tac-Toe, Chess, Tennis, Soccer, etc. This game models the ideal conflict between two nations, corporations, businesses, enemies, etc.

10. The Matrix Representation: The Normal Form. We can represent the strategies and payoffs for such a game as a matrix. A convenient, mnemonic name for players is Rose R (for row) and Colin C (for column). Let's consider a purely abstract toy example of such a game:

                      Colin
                    A       B
                ------------------
             A  |   2      -3    |
      Rose   B  |   0       2    |
             C  |  -5      10    |
                ------------------
                      Fig 1.

What do the entries mean? R has three possible strategies (A,B,C), and C has two (A,B); the strategies A and B for R are not necessarily the same as for C; they are just names for each player's actions. (Suppose the R player is the striker and C is the goalie. R can strike left, middle, or right. This particular goalie makes only two moves: dive left or dive right.)

Each entry represents the "payoff to R"; because the game is zero-sum, the payoff to C is the negative of this. So, for strategy pair (A,B), R will lose $3 and C will win $3.

The assumption here is that both players have *complete* knowledge of the payoff matrix; each knows all of the other's *possible* moves. But each has to decide on their move independently and simultaneously: they announce their moves, and the matrix tells them what the payoff is.

Consider the thinking process of these "rational players". Rose likes the payoff of 10, so she should play C, hoping Colin will play B. But Colin can anticipate that and play A, giving Rose a payoff of -5. Anticipating that, Rose should play A, and since Colin anticipates that, he can play B. And we get into this endless chain of reasoning...
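The chain of anticipation in the game of Fig 1 can be traced mechanically. The following is a minimal Python sketch, not part of the original notes: the function names (`colin_best_response`, `rose_best_response`, `trace_anticipation`) are made up for illustration, and the "each player best-responds to the other's last move" loop is only one simple way to model the reasoning described above.

```python
# Payoffs to Rose for the game of Fig 1 (rows: Rose A, B, C; columns: Colin A, B).
FIG1 = [
    [2, -3],
    [0, 2],
    [-5, 10],
]
ROSE = "ABC"
COLIN = "AB"


def colin_best_response(matrix, row):
    """Column that minimizes Rose's payoff against a fixed row."""
    payoffs = matrix[row]
    return min(range(len(payoffs)), key=lambda j: payoffs[j])


def rose_best_response(matrix, col):
    """Row that maximizes Rose's payoff against a fixed column."""
    return max(range(len(matrix)), key=lambda i: matrix[i][col])


def trace_anticipation(matrix, start_row, steps):
    """Alternate best responses, starting from Rose's row, and record each pair visited."""
    history = []
    row = start_row
    for _ in range(steps):
        col = colin_best_response(matrix, row)
        history.append((ROSE[row], COLIN[col], matrix[row][col]))
        row = rose_best_response(matrix, col)
    return history


# Start from Rose's hopeful choice C and follow four rounds of anticipation.
for rose, colin, payoff in trace_anticipation(FIG1, start_row=2, steps=4):
    print(f"Rose {rose}, Colin {colin}: payoff to Rose {payoff}")
```

Starting from Rose C, the trace alternates between (C, A) and (A, B) and never settles, which is the endless chain of reasoning in a small loop.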
What would you advise Rose and Colin to do, and what *assurance* can you offer justifying your recommendation? This is the crux of GT: are there choices for players that they will not *regret*?

11. Dominance and Saddle Points. In order to get there, we will need to introduce a few basic concepts. For that, consider the following slightly larger game: Rose and Colin, both with 4 strategies or actions.

                          Colin
                    A      B      C      D
                ------------------------------
             A  |  12     -1      1      0   |
             B  |   5      1      7    -20   |
      Rose   C  |   3      2      4      3   |
             D  | -16      0      0     16   |
                ------------------------------
                          Fig. 2

Rose prefers large payoffs (e.g. 16 would be ideal), while Colin prefers small payoffs (-20).

12. Dominated Strategies. Some strategies are just not worth playing *ever*. For instance, should Colin ever play strategy C? No! Because strategy B serves Colin's interest at least as well as C against every one of Rose's strategies, and strictly better against Rose A, B and C (the two tie only against Rose D). We call C a *dominated* strategy for Colin.

13. Dominance Principle. A rational player will (should) never play a dominated strategy. So we can eliminate it from our matrix.

While clearly very useful, this by itself does not always reduce the complexity of the problem. In fact, there are no dominated strategies in the example of Fig 1.

If you play the game of Fig. 2 for a while, you will notice that Rose uses strategy C quite often, and Colin uses B frequently. Why is that? At a superficial level, we can call them the *most cautious* strategies: with C, Rose is assured of winning at least 2, while with her other strategies she can even *lose* sometimes. With B, Colin is assured of losing at most 2, while with his other strategies his losses can be much larger.

But the important point is that Rose-C and Colin-B are in fact very good strategies *even if* Rose and Colin are not cautious players. There are two "justifications" for this:

(i) Equilibrium: Rose C--Colin B is an equilibrium point.
This means that as long as Colin is playing B, C is Rose's optimal strategy, and similarly as long as Rose plays C, Colin's optimal strategy is to play B. This is an EQUILIBRIUM: if both are playing this pair of strategies, neither one can benefit by switching.

(ii) Safety Nets: By playing C, Rose can ensure she will win *at least* 2 NO MATTER what Colin plays. By playing B, Colin can ensure he loses *at most* 2 NO MATTER what Rose does. If Rose wins < 2, she would regret not playing C. If Rose wins > 2, Colin would regret not playing B. Thus, by playing anything else, they each risk a *worse* payoff than the equilibrium.

14. Best Response. According to GT, Rose C--Colin B is the "rational" or "optimal" play for this game. The specific structure of the solution in this example is this: its payoff is simultaneously the MIN in its row and the MAX in its column. What does that mean? If Rose knew Colin was playing B, row C would maximize her payoff; if Colin knew Rose was playing C, col B would maximize his payoff. Strategies leading to this outcome are the players' *best responses* to each other.

15. Saddle Point: An outcome in a matrix game is called a *saddle point* if the entry is the minimum in its row and the maximum in its column. (Visually, it resembles the surface of a saddle.)

Saddle Point Principle: If a matrix game has a saddle point, then the optimal (rational) strategy for both players is to play the saddle point (i.e. choose the strategies that lead to this outcome).

16. Value of a Game: If in a matrix game there is a number v s.t. Rose has a strategy that guarantees a payoff of at least v, and Colin has a strategy that guarantees a payoff (to Rose) of at most v, then v is called the *value of the game*.

Clearly, if a game has a saddle point, then the payoff at the saddle point is the value of the game. Not every game matrix has a saddle point, and even if it has one, it need not be unique.
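The dominance and equilibrium claims made for the game of Fig. 2 can be checked directly. The following is a minimal Python sketch, not part of the original notes: the helper names (`colin_column_dominates`, `is_equilibrium`) are hypothetical, and dominance is taken here to mean "never worse and at least once strictly better", an assumption made because columns B and C tie against Rose D.

```python
# Payoffs to Rose for the game of Fig. 2 (rows: Rose A-D; columns: Colin A-D).
FIG2 = [
    [12, -1, 1, 0],
    [5, 1, 7, -20],
    [3, 2, 4, 3],
    [-16, 0, 0, 16],
]
A, B, C, D = range(4)


def colin_column_dominates(matrix, better, worse):
    """True if column `better` is never worse for Colin than `worse` and sometimes strictly better.

    Colin wants Rose's payoff to be small, so "better" means a smaller entry.
    """
    pairs = [(row[better], row[worse]) for row in matrix]
    return all(b <= w for b, w in pairs) and any(b < w for b, w in pairs)


def is_equilibrium(matrix, i, j):
    """True if neither player gains by deviating alone from (row i, column j)."""
    entry = matrix[i][j]
    rose_cannot_improve = all(row[j] <= entry for row in matrix)
    colin_cannot_improve = all(other >= entry for other in matrix[i])
    return rose_cannot_improve and colin_cannot_improve


print("Colin B dominates Colin C:", colin_column_dominates(FIG2, B, C))
print("Rose C / Colin B is an equilibrium:", is_equilibrium(FIG2, C, B))
print("Rose's guarantee from row C:", min(FIG2[C]))
print("Colin's cap from column B:", max(row[B] for row in FIG2))
```

Rose's guarantee from row C and Colin's cap from column B coincide at 2, which is the value of the game in the sense of item 16.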
Example of non-unique saddle points:

                          Colin
                    A      B      C      D        row-wise minimum
                ------------------------------
             A  |   4      2*     5      2*  |       2   <- max-min
             B  |   2      1     -1    -20   |     -20
      Rose   C  |   3      2*     4      2*  |       2   <- max-min
             D  | -16      0     16      1   |     -16
                ------------------------------
      col max       4      2     16      2
                           ^             ^
                           |             |
                             min-max

In this case, there are 4 saddle points. They all have equal value; however, not all outcomes with value 2 are saddle points: B-A is not a saddle point.

%----------------- SKIP if Running Low on Time ---------------------%

17. Saddle Point Theorem: Any two saddle points in a game have the same value. Further, they all appear as corners of a rectangle.

Proof. We show that if R and C each play a strategy containing a saddle point, then the result will always be a saddle point. Suppose a and b are saddle points, and c,d are the other corners of the rectangle containing a,b.

      a ........ c
      .          .
      .          .
      d ........ b

a is the min in its row, and b is the max in its col --> a <= c <= b.
b is the min in its row, and a is the max in its col --> b <= d <= a.

Taken together, it must be that a=b=c=d, and so c,d also are min in their rows and max in their columns. QED.

18. Finding Saddle Points.

   o. For each row, find the min; mark the rows with the *largest* MIN. (This is MAX-MIN.)
   o. For each col, find the max; mark the cols with the *smallest* MAX. (This is MIN-MAX.)

If max-min = min-max, then the saddle points are exactly the entries where a marked row meets a marked column. This can be done efficiently, in time linear in the size of the matrix.

%-----------------------------------------------------------------------%

18b. However, in other games, max-min does not equal min-max. E.g.:

                      Colin
                    A       B         row minimum
                ------------------
             A  |   2      -3    |      -3
      Rose   B  |   0       2    |       0   <- max-min
             C  |  -5      10    |      -5
                ------------------
      col maximum   2      10
                    ^
                 min-max

When this occurs, the game does not have a saddle point. Basically, Rose can guarantee that she wins at least 0, while Colin can ensure she does not win more than 2, but in that range, the game is open and unresolved.
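The max-min / min-max procedure of item 18 can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the original notes; the function name `saddle_points` and the returned tuple layout are assumptions made here.

```python
def saddle_points(matrix):
    """Return (max-min, min-max, saddle point positions) for a payoff matrix.

    Positions are (row index, column index) pairs; the list is empty
    when max-min differs from min-max.
    """
    row_minima = [min(row) for row in matrix]
    col_maxima = [max(col) for col in zip(*matrix)]
    maximin = max(row_minima)
    minimax = min(col_maxima)
    saddles = []
    if maximin == minimax:
        # Every marked row meets every marked column at a saddle point.
        saddles = [
            (i, j)
            for i in range(len(matrix))
            for j in range(len(matrix[0]))
            if row_minima[i] == maximin and col_maxima[j] == minimax
        ]
    return maximin, minimax, saddles


# The 4x4 example with non-unique saddle points (entries as printed in the matrix).
NON_UNIQUE = [
    [4, 2, 5, 2],
    [2, 1, -1, -20],
    [3, 2, 4, 2],
    [-16, 0, 16, 1],
]

# The game of Fig 1, which has no saddle point.
FIG1 = [
    [2, -3],
    [0, 2],
    [-5, 10],
]

print(saddle_points(NON_UNIQUE))
print(saddle_points(FIG1))
```

For the 4x4 example the function reports max-min = min-max = 2 with four saddle positions (rows A and C against columns B and D); for Fig 1 it reports max-min 0, min-max 2, and no saddle point. Each entry is visited a constant number of times when computing the row minima and column maxima, so the work is linear in the size of the matrix.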
Also, the intersection of the max-min row and the min-max column, (B,A), is *not* an equilibrium: Rose would want to switch to strategy A. This was the landscape in which John von Neumann's famous theorem appeared, laying the foundation of GT.

* * * * * * * * *

19. MIXED STRATEGIES

The question facing von Neumann was this: in a game without saddle points, what strategy should a rational player choose? It's instructive to consider one of the simplest games without a saddle point: MATCHING PENNIES.

                     Colin
                    H      T
                ----------------
             H  |   1     -1   |
      Rose   T  |  -1      1   |
                ----------------

On a match, Rose wins 1; on a mismatch, Colin wins 1. There is no saddle point in the game. (Check!)

What would you do in such a game? Even without an iota of game theory, you would do the right thing: *randomize*! If you don't randomize, the opponent can always take advantage of it, and beat you. By randomizing, each play is equally likely to be a match or a mismatch, so each player has *expected payoff* 0, which is the right answer in this unbiased game.

When there is no saddle point, neither player would want to use a single strategy with certainty, since the other player could then take advantage of it. The only sensible thing is to use a random device to choose from a set of strategies. Such a plan, which involves playing a *mixture* of strategies according to some probability distribution, is called a *MIXED STRATEGY*. Formally, a mixed strategy is a prob dist over strategies. By contrast, playing one strategy with certainty (p=1) is called a *PURE STRATEGY*.

To understand the effects of both players using mixed strategies, we use the concept of *expected value*:

20. EXPECTED PAYOFF: Suppose the prob of getting payoff a_i is p_i, for i=1,2,..,k. Then the expected payoff with respect to these probabilities is:

      \sum_i a_i * p_i

(It is just the weighted sum of the payoffs, weighted by probabilities.)

In order to understand how to rationally play mixed strategies, suppose for a minute we know what (mixed) strategy Colin is using.
Say, Colin is playing (0.5, 0.5): equal prob for his two strategies.

                     Colin
                    A      B
                ----------------
             A  |   2     -3   |
      Rose   B  |   0      3   |
                ----------------

How should Rose play? Well, Rose can calculate the *expected* payoffs for her 2 strategies:

      payoff(A) = 2*1/2 - 3*1/2 = -1/2
      payoff(B) = 0*1/2 + 3*1/2 = 3/2

So, clearly Rose should play B. This reasoning is summarized by the following principle.

21. EXPECTED VALUE PRINCIPLE: If you know your opponent is playing a given mixed strategy, then you should play your strategy with the largest expected payoff.

Question: Keeping Colin's strategy fixed, can Rose do better by randomizing?

Now, consider the situation from Colin's view. He *knows* that the mixed strategy (0.5, 0.5) can be exploited by Rose to get a payoff of 3/2. So, perhaps he should try a different combination. But, for any of his mixed strategies, Rose would choose the strategy that gives her the *best* payoff. So, the best Colin can hope for is to choose a strategy that cannot be exploited by Rose, i.e. one for which Rose's expected payoffs are the same for all her moves.

So, suppose Colin plays A with prob x and B with prob (1-x), for x \in [0,1]. Rose's expected payoffs are:

      Rose(A) = 2x - 3(1-x) = 5x - 3
      Rose(B) = 0x + 3(1-x) = -3x + 3

By setting them equal, we get 8x = 6 --> x = 3/4. Thus, if Colin plays (3/4, 1/4), Rose's expected payoffs for both A and B are 3/4.

Now, consider the same game from Rose's perspective. She also knows that for any of her mixed strategies, Colin will try to choose a column that *minimizes* her payoff. So, the best she can hope for is to neutralize this threat, by choosing a mixed strategy where all actions of Colin are *equalized*. Suppose she plays (y, 1-y). Then:

      Colin(A) = 2y + 0(1-y) = 2y
      Colin(B) = -3y + 3(1-y) = 3 - 6y

By setting them equal, we get 8y = 3 --> y = 3/8. So, if Rose plays (3/8, 5/8), then no matter which strategy Colin plays, Rose's expected payoff is 2y = 3 - 6y = 3/4.

22.
Solution of the Game: Do you notice some similarity between this solution and the saddle point? Game theory would then prescribe:

   o. 3/4 as the value of this game
   o. (3/4 A, 1/4 B) as Colin's optimal strategy
   o. (3/8 A, 5/8 B) as Rose's optimal strategy

Together these two strategies are called the *solution* of the game.

23. General 2-person-0-sum Games: Arbitrary Number of Strategies

With 2 strategies per player (and no saddle point), we can always solve the game as above: each player's equalizing condition is one linear equation in one unknown probability, giving a unique solution. But what happens when each player has several pure strategies? In general, the game matrix is an m*n matrix, where Rose has m strategies and Colin has n strategies. In this case, we don't get a uniquely solvable set of simultaneous linear equations. In fact, considering the picture from Colin's position, equalizing Rose's payoffs, we get m equations in n variables!

This was the starting point in the founding of game theory: John von Neumann proved in 1928 that *every* m x n matrix game has a solution.

24. The MiniMax Theorem. Von Neumann showed that every such game has a unique number v, the value of the game, and optimal strategies (pure or mixed) for both players such that

   (i) if Rose plays her optimal strategy, her expected payoff is >= v, no matter what Colin does.
   (ii) if Colin plays his optimal strategy, Rose's expected payoff is <= v, no matter what Rose does.

25. Proof Idea. We will not prove the minimax theorem. Von Neumann's original proof used Brouwer's fixed point theorem in topology. A particularly simple proof is based on linear programming. In fact, the minimax theorem is an easy corollary of the Duality Theorem of LP. I will not prove that theorem, but as a teaser, let me at least show you the formulation:

Rose wants to find a mixed strategy maximizing her expected payoff. So, suppose this strategy is (p_1, p_2, ..., p_m).
Then, we can write an *optimization* problem:

   max v
   subject to
      \sum_{i=1}^m p_i * a_i1 >= v      (exp if C plays strategy 1)
      \sum_{i=1}^m p_i * a_i2 >= v      (exp if C plays 2)
      ...
      \sum_{i=1}^m p_i * a_in >= v      (exp if C plays n)
      p_1 + p_2 + ... + p_m = 1 and p_i >= 0

Assuming v > 0 (which we can always arrange by adding a large enough constant to every entry of the matrix), we can normalize things by introducing new variables: x_i = p_i/v. Then, the constraints become:

      \sum_{i=1}^m x_i * a_i1 >= 1
      \sum_{i=1}^m x_i * a_i2 >= 1
      ...
      \sum_{i=1}^m x_i * a_in >= 1
      x_1 + x_2 + ... + x_m = 1/v and x_i >= 0

Now, since maximizing v is the same as *minimizing* 1/v, we can write simply:

Problem A:
   minimize x_1 + x_2 + ... + x_m
   subject to
      \sum_{i=1}^m x_i * a_i1 >= 1
      \sum_{i=1}^m x_i * a_i2 >= 1
      ...
      \sum_{i=1}^m x_i * a_in >= 1
      x_i >= 0

This kind of optimization problem, with a linear objective function and linear constraints, is called Linear Programming.

It turns out we can also write a symmetric problem from Colin's side. Colin wants a mixed strategy (q_1, ..., q_n) that minimizes the largest expected payoff v he concedes to Rose; with y_i = q_i/v, minimizing v becomes *maximizing* 1/v:

Problem B:
   maximize y_1 + y_2 + ... + y_n
   subject to
      \sum_{i=1}^n y_i * a_1i <= 1
      \sum_{i=1}^n y_i * a_2i <= 1
      ...
      \sum_{i=1}^n y_i * a_mi <= 1
      y_i >= 0

These two problems are called DUALs of each other, and a remarkable theorem of LP is that their optimal solutions have the identical value.
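To connect the LP formulation back to the 2x2 game of item 21, the following minimal Python sketch recomputes the equalizing strategies and then checks them against the normalized constraints. It is an illustration under stated assumptions, not a general LP solver: the function names (`solve_2x2`, `check_lp_certificates`) are hypothetical, the closed-form solution only applies to 2x2 games without a saddle point, and Colin's side is checked in the standard LP dual form (maximize the sum of the y's subject to each row constraint being at most 1).

```python
from fractions import Fraction as F

# Payoffs to Rose in the 2x2 game of item 21 (rows: Rose A, B; columns: Colin A, B).
GAME = [[F(2), F(-3)],
        [F(0), F(3)]]


def solve_2x2(m):
    """Equalizing mixed strategies for a 2x2 game with no saddle point.

    Returns (Rose's strategy p, Colin's strategy q, value v).
    """
    (a, b), (c, d) = m
    denom = a - b - c + d
    y = (d - c) / denom  # Rose's probability of row A: equalizes Colin's two columns
    x = (d - b) / denom  # Colin's probability of column A: equalizes Rose's two rows
    v = (a * d - b * c) / denom
    return (y, 1 - y), (x, 1 - x), v


def check_lp_certificates(m, p, q, v):
    """Scale p and q by 1/v (requires v > 0) and test both sets of LP constraints.

    Returns (Rose's constraints hold, Colin's constraints hold, sum of x, sum of y).
    """
    xs = [pi / v for pi in p]
    ys = [qj / v for qj in q]
    columns = list(zip(*m))
    rose_ok = all(x >= 0 for x in xs) and all(
        sum(x * a for x, a in zip(xs, col)) >= 1 for col in columns)
    colin_ok = all(y >= 0 for y in ys) and all(
        sum(a * y for a, y in zip(row, ys)) <= 1 for row in m)
    return rose_ok, colin_ok, sum(xs), sum(ys)


p, q, v = solve_2x2(GAME)
rose_ok, colin_ok, sum_x, sum_y = check_lp_certificates(GAME, p, q, v)
print("Rose:", [str(t) for t in p], "Colin:", [str(t) for t in q], "value:", v)
print("Rose feasible:", rose_ok, "Colin feasible:", colin_ok)
print("sum x =", sum_x, "sum y =", sum_y, "1/v =", 1 / v)
```

Both scaled strategies satisfy their constraints, and both objectives equal 4/3 = 1/v. Since any feasible x-sum is at least any feasible y-sum, equality shows that both are optimal, which is the duality statement in miniature for this one game.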