Difference between revisions of "Combinatorial game theory"

From HexWiki
Jump to: navigation, search
(Added a remark that values are not numbers.)
(Comparison games: Added proof for soundness of comparison game.)
 
(2 intermediate revisions by the same user not shown)
Line 254: Line 254:
  
 
The English explanation of this value is something every Hex player will agree with: If Red moves first in the region, Red moves at x and connects all three terminals. After this, the region is [[settled region|settled]], i.e., neither Blue nor Red wants to make another move there. If Blue moves first in the region, Blue moves at x and ''threatens'' to connect all of Blue's terminals, but Red can reply at y to achieve outcome a. In this latter case, Red connects terminals 2 and 3, but not terminal 1.
 
The English explanation of this value is something every Hex player will agree with: If Red moves first in the region, Red moves at x and connects all three terminals. After this, the region is [[settled region|settled]], i.e., neither Blue nor Red wants to make another move there. If Blue moves first in the region, Blue moves at x and ''threatens'' to connect all of Blue's terminals, but Red can reply at y to achieve outcome a. In this latter case, Red connects terminals 2 and 3, but not terminal 1.
 +
 +
== Comparison games ==
 +
 +
=== Comparing positions ===
 +
 +
When we consider two different Hex positions in the same region, we often ask questions like: are the positions equivalent? Is one of them better for Red than the other? We can formalize this notion by defining an ''order'' on positions.
 +
 +
Given two positions P and Q in the same Hex region, we say that P ≤ Q if Q is at least as good for Red as P. The symbol "≤" is pronounced "less than or equal", and is called an ''order relation''. Note that the order is always defined from Red's point of view, i.e., by "better", we mean "better for Red". But of course Blue has the opposite point of view: from Blue's point of view, P ≤ Q means that P is at least as good for Blue as Q.
 +
 +
'''Remark.''' The ordering of positions is not total: one can have positions P and Q such that neither P ≤ Q nor Q ≤ P holds. For example, P could be better for Red in some situations, and better for Blue in others. Such positions are called ''incomparable''.
 +
 +
When two positions are such that both P ≤ Q and Q ≤ P hold, then P is at least as good for Red as Q and vice versa. In that case, we say that the positions are ''equivalent''.
 +
 +
So far, we have not been very precise about what we mean by "at least as good". One way to define this is by considering the region to be part of a larger game, and then check to see who is winning that game. Specifically, P ≤ Q means that for every way of embedding the region inside a larger game, and no matter whose turn it is, if P is winning for Red, then Q is also winning for Red.
 +
 +
=== The comparison game ===
 +
 +
In practice, we do not have time to think about all of the infinitely many ways that the rest of the game could be affecting our region. It turns out that there is a much simpler way to prove that P ≤ Q, which only requires us to think about P and Q, and not about the rest of the board. This is done by playing the so-called comparison game, which we now describe.
 +
 +
Let P and Q be two positions in the same Hex region. The ''comparison game'' is a game played by two players, who are called ''Truthifier'' and ''Falsifier''. Truthifier's goal is to prove that P ≤ Q and Falsifier's goal is to prove that P ≰ Q. Therefore, to prove P ≤ Q, we must describe a winning strategy for Truthifier.
 +
 +
The game is played as follows. The positions P and Q are set up next to each other. The players alternate, with Falsifier going first.
 +
A move by Falsifier consists of either placing a red stone in P or placing a blue stone in Q. A move by Truthifier consists of either placing a blue stone in P or placing a red stone in Q. Thus, whenever Falsifier plays in one component (P or Q), Truthifier has the option to either respond in the same component (using the opposite color) or in the opposite component (using the same color). The game continues until both P and Q are completely filled with stones, i.e., there are no more empty cells. At this point, it is easy to check whether P ≤ Q, because it amounts to checking that for every red path through P, there is a corresponding red path through Q. If this is the case at the end of the game, Truthifier wins; otherwise, Falsifier wins.
 +
 +
'''Remark.''' The above description of the comparison game works for Hex. For some other classes of combinatorial games, the description might be slightly more complicated, depending on such details as if and when passing is allowed. We will get back to this point in the mathematical section below.
 +
 +
'''Example.''' The page [[theorems about templates]] claims that B is at least as good for Red as A.
 +
A: <hexboard size="2x2"
 +
  float="inline"
 +
  edges="bottom"
 +
  coords="none"
 +
  visible="-a1"
 +
  contents="B b1 b2 E x:a2"
 +
  /> B: <hexboard size="2x2"
 +
  float="inline"
 +
  edges="bottom"
 +
  coords="none"
 +
  visible="-a1"
 +
  contents="B a2 R b2 E y:b1"
 +
  />
 +
We now show how to prove A ≤ B using the comparison game. We must describe a winning strategy for Truthifier. Falsifier makes the first move. If Falsifier plays a red stone at x, Truthifier responds with a red stone at y. Since this kills the blue stone, the outcome of B is better for Red than that of A, so Truthifier wins. On the other hand, if Falsifier plays a blue stone at y, Truthifier responds with a blue stone at x. Since B has a red stone and A doesn't, B again has a better outcome than A, so Truthifier wins. Since Truthifier wins the comparison game no matter how Falsifier plays, we have proved A ≤ B.
 +
 +
Note that in this example, Truthifier always responds to Falsifier's move by playing in the opposite component. There are no other choices, since each component only has a single empty cell. Let us now consider a more complicated example.
 +
 +
'''Example.''' Here is another claim from [[theorems about templates]]. We claim that A ≤ C, where
 +
A: <hexboard size="3x4"
 +
  float="inline"
 +
  edges="none"
 +
  coords="none"
 +
  visible="area(a2,b3,d1,c1)"
 +
  contents="R c1 b2 b3 B a2 d1 E w:c2"
 +
  /> C: <hexboard size="4x4"
 +
  float="inline"
 +
  edges="none"
 +
  coords="none"
 +
  visible="area(a2,b3,d1,c1)"
 +
  contents="R c1 b3 B a2 E x:b2 y:c2 z:d1"
 +
  />
 +
To prove it, we consider every possible move by Falsifier.
 +
* If Falsifier plays Red w, Truthifier responds with Red y, killing x, and leaving A ≤ C (since C has an empty cell where A has a blue stone).
 +
* If Falsifier plays Blue x, Truthifier responds with Red y, killing x and leaving A ≤ C.
 +
* If Falsifier plays Blue z, Truthifier responds with Red x, leaving the two regions equal.
 +
* If Falsifier plays Blue y, Truthifier responds with Red x. Now if Falsifier plays Blue z, Truthifier responds with Blue w, leaving the two regions equal. If Falsifier instead plays Red w, Truthifier responds with Red z, capturing y and leaving C better than A.
 +
 +
'''Remark.''' In the last example, we did not always play the game to the end. Indeed, as soon as Truthifier reaches a position in which it is clear that A ≤ C, we can stop the comparison game, as we already know that Truthifier has a winning strategy from that point onwards.
 +
 +
=== Proof that the comparison game works ===
 +
 +
Let P and Q be two positions in the same Hex region, and suppose that Truthifier has a winning strategy for the comparison game. We claim that P ≤ Q in the sense that for any position of the rest of the board, if Red has a winning strategy for P in that position, then Red has a winning strategy for Q in that position. Indeed, Red can win in Q by following the following procedure.
 +
Red sets up two Hex boards. The left board contains the game with position P in the region, and the right board contains the same game, but with position Q in the region. It is the same color's turn on both boards. Red's goal is to win against Blue on the right board. The left board is only for Red's private use.
 +
 +
Case 1: It is Blue's turn. Then Blue makes a move on the right board. Case 1.1: Blue's move is outside the region. Then Red copies the same move, using a blue stone, on the left board. Now it is Red's turn on both boards. Case 1.2: Blue's move is within the region Q. Then Blue just played one of Falsifier's moves in the comparison game. By assumption, Truthifier has a winning strategy for the comparison game. If Truthifier's strategy is to make a red move in Q, then Red makes the corresponding move in Q. It is now Blue's turn again on both boards. If Truthifier's strategy is to make a blue move in P, then Red makes the corresponding move, using a blue stone, in P. If is now Red's turn on both boards. Case 2: It is Red's turn. By assumption, Red has a winning strategy on the left board. So Red plays a winning move on the left board. Case 2.1: Red's move is outside the region. Then Red makes the same move on the right board. It is now Blue's turn on both boards. Case 2.2: Red's move is in the region P. Then Red just played one of Falsifier's move in the comparison game. Truthifier has a winning strategy in the comparison game. If Truthifier's winning strategy is to make a blue move in P, then Red makes that move (using a blue stone) on the left board. It is now Red's turn again on both boards. If Truthifier's winning strategy is to make a red move in Q, then Red makes that move on the right board. It is now Blue's turn on both boards.
 +
 +
Red can continue playing on both boards in the above manner until both boards are filled. Since Red followed a winning strategy on the left board, Red has a winning path. Since both boards are identical outside the region, and Q (now being completely filled) is at least as good as P, Red also has a winning path on the right board. Therefore Red wins the game against Blue.
 +
 +
'''Remark.''' We showed that if Truthifier has a winning strategy for the comparison game, then P ≤ Q. The converse is not always true. In other words, it is possible for Truthifier to lose the comparison game, even though Q is at least as good as P in all contexts of larger Hex games. However, such situations are relatively rare.
  
 
== Mathematical details ==
 
== Mathematical details ==

Latest revision as of 00:18, 12 May 2024

This page is about the combinatorial game theory of Hex. For combinatorial game theory in general, see Wikipedia.

Combinatorial game theory is a formalism that can be used to analyze sequential two-player perfect information games. It was developed by Berlekamp, Conway, and Guy starting in the 1960s, and has since been applied to a variety of games, including Hex. One of the central ideas of combinatorial game theory is that games can be split into multiple components (for example, separate regions of the Hex board), and each component analyzed separately. Combinatorial game theory can be used to reason about such concepts as domination, reversibility, inferior moves, and equivalence of positions.

Concepts

In this section, we introduce some of the main concepts of combinatorial game theory, as it relates to Hex. We do this mostly by example. A more mathematical treatment follows later.

How the game ends

Normally, a game of Hex ends either when a player has connected their edges, or when a player resigns. However, for the game theoretic analysis, it is easier if we imagine that the game continues until the board is completely filled with stones. Notice that after one player has a winning path, filling the rest of board with stones can no longer change who the winner is. So we are free to assume that the game continues until there are no more empty cells on the board. We will make this assumption throughout this article.

Regions

By a region of the Hex board, we just mean a set of one or more cells on the board. The region may include special features such as board edges, and it can optionally be pre-populated with some stones.

Examples.

  • The whole board is a region.
  • A single empty cell is a region.
  • The 3-cell corner is a region.
  • The following region is completely surrounded by red and blue stones, with three groups or red stones (and therefore three groups of blue stones) in the boundary. This kind of region is called a 3-terminal region.

Outcomes

By an outcome for a region, we mean one of the possible ways of completely filling it with stones. Since we assume that the game continues until the board is completely filled, the outcome describes what the region may look like at the end of the game. We say that two outcomes for the same region are equivalent if for every way of filling the rest of the board (i.e., the part outside of the region) with stones, the first outcome produces the same winner as the second one. Since equivalent outcomes produce the same winner in all situations, we often consider them to be equal, i.e., when we say "the same outcome", we mean "equivalent outcomes".

Examples.

  • When the region is the whole Hex board, there are only two possible outcomes, namely, "Red wins" and "Blue wins".
  • When the region is a single empty cell, there are two possible outcomes, namely, "Red gets the cell" and "Blue gets the cell".
  • For the 3-cell corner, there are 8 possible ways of filling it with red and blue stones. However, most of these are equivalent, and we end up with only 3 non-equivalent outcomes:
  • Outcome 1: Red gets everything. This outcome can be achieved in 3 possible ways; notice that they are equivalent because the blue stone is dead in all cases.
    Outcome 2: Blue gets everything. This outcome can also be achieved in 3 possible ways.
    Outcome 3: Red and Blue each get half of the corner. This outcome can be achieved in 2 possible ways, because the corner cell is dead.
  • For a 3-terminal region, there are 5 possible outcomes. Although there are many ways of filling the region with red and blue stones, at the end of the game, all that matters is which terminals are connected to each other. Suppose Red's terminals have been numbered 1, 2, and 3. Then from Red's point of view, the 5 outcomes are:
    • Red connects all 3 terminals.
      213
    • Red connects only terminals 1 and 2.
      213
    • Red connects only terminals 1 and 3.
      213
    • Red connects only terminals 2 and 3.
      213
    • Red connects none of her terminals.
      213

Positions and options

Consider a region of the Hex board. By a position in the region, we mean the state of the region after zero or more moves have been played, or in other words, a version of the region in which zero or more of its emtpy hexes have been filled with stones (of any color).

Players change positions by making moves. In a given position, we say that Red's options are all positions that Red can reach by making a single move in the region, and Blue's options are all the positions that Blue can reach by making a single move in the region.

Example. Consider the position

G =

In this position, Red has two options:

G₁ =
and G₂ =

Also, Blue has two options:

H₁ =
and H₂ =

Combinatorial game notation

In combinatorial game theory, the players are called Left and Right. In Hex, Red is Left and Blue is Right. We therefore call Red's options left options and Blue's options right options.

In combinatorial game theory, the word game is used to mean position. Thus, an unfinished game has left and right options that are themselves games.

The notation for a combinatorial game is G = {G₁, …, Gₙ | H₁, ..., Hₘ}, where G₁, …, Gₙ are the left options and H₁, ..., Hₘ are the right options. Note that the game is enclosed in curly braces, and a vertical bar is used to separate the left options from the right options.

Example. Consider the position from the previous example:

G =

As we saw, G has two left options G₁ and G₂ and two right options H₁ and H₂. We can write

G = {G₁, G₂ | H₁, H₂} = CGT-example1.png

Recursively, each of the positions G₁, G₂, H₁, H₂ is also a combinatorial game with a left and right option. For example,

G₁ = CGT-example2.png = CGT-example3.png

By recursively decomposing all options, we arrive at this representation of the game G:

G = CGT-example4.png

When a position has no empty cells, it is called an atomic position. Atomic positions have no left or right options and can cannot be further decomposed. Instead, we can replace each atomic position by the associated outcome. For example, the position G in this example is a 3-terminal position. Let us write "⊤" (pronounced "top") for the outcome "Red connects all three terminals", "⊥" (pronounced "bottom") for the outcome "Red connects no terminals", and "a" for the outcome "Red connects the right two terminals, but not the left one". With these notations, the combinatorial game G can be more succinctly described as:

G = {​{⊤|⊤}, {⊤|a} | {a|⊥}, {⊤|⊥}}.

Remark. In a combinatorial game, the players do not always alternate. For example, in the game G, Red can move to the left option G₂ = {⊤|a}. Then Red can move again to the left option ⊤. The reason that players are not necessarily alternating is that the combinatorial game is meant to represent play in a region. For example, Red might move in the region, Blue might move somewhere outside the region, and then Red might move in the region again. Thus, although the players were alternating in the game at large, within the region, Red has made two consecutive moves.

Simplification of combinatorial games

The combinatorial game notation we introduced in the previous section is not very compact. In fact, it is basically a representation of the entire game tree, i.e., a description of all moves that both players could make in every possible position of the game. For a position with more than a few empty cells, the game tree is much too large to write down in its entirety.

Fortunately, combinatorial games can be simplified. Much like the rules by which school children learn to simplify (5 + 8) * 2 to 13 * 2 and then to 26, there is a specific set of rules for simplifying combinatorial games and calculating their value. These rules are a bit complicated and we will describe them in more detail in the mathematical section below. However, the main point is that the rules are mechanical and universal: every finite combinatorial game can be simplified to a unique value, and this can be done either by hand or by using a "CGT calculator". Moreover, when a Hex game has been decomposed into several disjoint regions, we can in principle calculate and simplify the value of each region independently, and then combine them to determine the value, and therefore the winner, of the entire game. In practice, this calculation is too hard when there is lots of empty space on the board, but can be feasible for certain end game positions.

Remark. Unlike in commerce, where the value of everything can be expressed in terms of money, the value of a Hex position is not simply a number. Rather, it is a specially simplified combinatorial game that is a concise description of what each player can get from the position. When two positions have the same value, then they are equivalent, even if they look very different on the Hex board. The following example shows the value of a Hex position, and also explains what that value means in English.

Example. Once again, we consider the position from the previous example:

G =
21xy3

As we saw, the combinatorial notation for this position is

G = {​{⊤|⊤}, {⊤|a} | {a|⊥}, {⊤|⊥}}.

Using the simplification rules we will discuss in more detail later, this can be simplified (or "evaluated") to

G = {⊤ | {a | ⊥}}.

The English explanation of this value is something every Hex player will agree with: If Red moves first in the region, Red moves at x and connects all three terminals. After this, the region is settled, i.e., neither Blue nor Red wants to make another move there. If Blue moves first in the region, Blue moves at x and threatens to connect all of Blue's terminals, but Red can reply at y to achieve outcome a. In this latter case, Red connects terminals 2 and 3, but not terminal 1.

Comparison games

Comparing positions

When we consider two different Hex positions in the same region, we often ask questions like: are the positions equivalent? Is one of them better for Red than the other? We can formalize this notion by defining an order on positions.

Given two positions P and Q in the same Hex region, we say that P ≤ Q if Q is at least as good for Red as P. The symbol "≤" is pronounced "less than or equal", and is called an order relation. Note that the order is always defined from Red's point of view, i.e., by "better", we mean "better for Red". But of course Blue has the opposite point of view: from Blue's point of view, P ≤ Q means that P is at least as good for Blue as Q.

Remark. The ordering of positions is not total: one can have positions P and Q such that neither P ≤ Q nor Q ≤ P holds. For example, P could be better for Red in some situations, and better for Blue in others. Such positions are called incomparable.

When two positions are such that both P ≤ Q and Q ≤ P hold, then P is at least as good for Red as Q and vice versa. In that case, we say that the positions are equivalent.

So far, we have not been very precise about what we mean by "at least as good". One way to define this is by considering the region to be part of a larger game, and then check to see who is winning that game. Specifically, P ≤ Q means that for every way of embedding the region inside a larger game, and no matter whose turn it is, if P is winning for Red, then Q is also winning for Red.

The comparison game

In practice, we do not have time to think about all of the infinitely many ways that the rest of the game could be affecting our region. It turns out that there is a much simpler way to prove that P ≤ Q, which only requires us to think about P and Q, and not about the rest of the board. This is done by playing the so-called comparison game, which we now describe.

Let P and Q be two positions in the same Hex region. The comparison game is a game played by two players, who are called Truthifier and Falsifier. Truthifier's goal is to prove that P ≤ Q and Falsifier's goal is to prove that P ≰ Q. Therefore, to prove P ≤ Q, we must describe a winning strategy for Truthifier.

The game is played as follows. The positions P and Q are set up next to each other. The players alternate, with Falsifier going first. A move by Falsifier consists of either placing a red stone in P or placing a blue stone in Q. A move by Truthifier consists of either placing a blue stone in P or placing a red stone in Q. Thus, whenever Falsifier plays in one component (P or Q), Truthifier has the option to either respond in the same component (using the opposite color) or in the opposite component (using the same color). The game continues until both P and Q are completely filled with stones, i.e., there are no more empty cells. At this point, it is easy to check whether P ≤ Q, because it amounts to checking that for every red path through P, there is a corresponding red path through Q. If this is the case at the end of the game, Truthifier wins; otherwise, Falsifier wins.

Remark. The above description of the comparison game works for Hex. For some other classes of combinatorial games, the description might be slightly more complicated, depending on such details as if and when passing is allowed. We will get back to this point in the mathematical section below.

Example. The page theorems about templates claims that B is at least as good for Red as A.

A:
x
B:
y

We now show how to prove A ≤ B using the comparison game. We must describe a winning strategy for Truthifier. Falsifier makes the first move. If Falsifier plays a red stone at x, Truthifier responds with a red stone at y. Since this kills the blue stone, the outcome of B is better for Red than that of A, so Truthifier wins. On the other hand, if Falsifier plays a blue stone at y, Truthifier responds with a blue stone at x. Since B has a red stone and A doesn't, B again has a better outcome than A, so Truthifier wins. Since Truthifier wins the comparison game no matter how Falsifier plays, we have proved A ≤ B.

Note that in this example, Truthifier always responds to Falsifier's move by playing in the opposite component. There are no other choices, since each component only has a single empty cell. Let us now consider a more complicated example.

Example. Here is another claim from theorems about templates. We claim that A ≤ C, where

A:
w
C:
zxy

To prove it, we consider every possible move by Falsifier.

  • If Falsifier plays Red w, Truthifier responds with Red y, killing x, and leaving A ≤ C (since C has an empty cell where A has a blue stone).
  • If Falsifier plays Blue x, Truthifier responds with Red y, killing x and leaving A ≤ C.
  • If Falsifier plays Blue z, Truthifier responds with Red x, leaving the two regions equal.
  • If Falsifier plays Blue y, Truthifier responds with Red x. Now if Falsifier plays Blue z, Truthifier responds with Blue w, leaving the two regions equal. If Falsifier instead plays Red w, Truthifier responds with Red z, capturing y and leaving C better than A.

Remark. In the last example, we did not always play the game to the end. Indeed, as soon as Truthifier reaches a position in which it is clear that A ≤ C, we can stop the comparison game, as we already know that Truthifier has a winning strategy from that point onwards.

Proof that the comparison game works

Let P and Q be two positions in the same Hex region, and suppose that Truthifier has a winning strategy for the comparison game. We claim that P ≤ Q in the sense that for any position of the rest of the board, if Red has a winning strategy for P in that position, then Red has a winning strategy for Q in that position. Indeed, Red can win in Q by following the following procedure. Red sets up two Hex boards. The left board contains the game with position P in the region, and the right board contains the same game, but with position Q in the region. It is the same color's turn on both boards. Red's goal is to win against Blue on the right board. The left board is only for Red's private use.

Case 1: It is Blue's turn. Then Blue makes a move on the right board. Case 1.1: Blue's move is outside the region. Then Red copies the same move, using a blue stone, on the left board. Now it is Red's turn on both boards. Case 1.2: Blue's move is within the region Q. Then Blue just played one of Falsifier's moves in the comparison game. By assumption, Truthifier has a winning strategy for the comparison game. If Truthifier's strategy is to make a red move in Q, then Red makes the corresponding move in Q. It is now Blue's turn again on both boards. If Truthifier's strategy is to make a blue move in P, then Red makes the corresponding move, using a blue stone, in P. If is now Red's turn on both boards. Case 2: It is Red's turn. By assumption, Red has a winning strategy on the left board. So Red plays a winning move on the left board. Case 2.1: Red's move is outside the region. Then Red makes the same move on the right board. It is now Blue's turn on both boards. Case 2.2: Red's move is in the region P. Then Red just played one of Falsifier's move in the comparison game. Truthifier has a winning strategy in the comparison game. If Truthifier's winning strategy is to make a blue move in P, then Red makes that move (using a blue stone) on the left board. It is now Red's turn again on both boards. If Truthifier's winning strategy is to make a red move in Q, then Red makes that move on the right board. It is now Blue's turn on both boards.

Red can continue playing on both boards in the above manner until both boards are filled. Since Red followed a winning strategy on the left board, Red has a winning path. Since both boards are identical outside the region, and Q (now being completely filled) is at least as good as P, Red also has a winning path on the right board. Therefore Red wins the game against Blue.

Remark. We showed that if Truthifier has a winning strategy for the comparison game, then P ≤ Q. The converse is not always true. In other words, it is possible for Truthifier to lose the comparison game, even though Q is at least as good as P in all contexts of larger Hex games. However, such situations are relatively rare.

Mathematical details

To be written.

References

  • J. H. Conway, "On Numbers and Games". 1st edition 1976. 2nd edition, A. K. Peters, 2001.
  • P. Selinger, "On the combinatorial value of Hex positions", Integers 22:G3, 2022. Also available from arxiv:2101.06694