Typical AI run

# Introduction

In this article, we develop a simple AI for the game 2048 using the Expectimax algorithm and “weight matrices”, which will be described below, to determine the best possible move at each turn.

The implementation of the AI described in this article can be found here. The source files for the implementation can be found here.

# Recursivity of score function

We define some symbols and functions in order to construct a function to compute the score of any one game state (i.e. grid).

• Game state: A $4 \times 4$ square matrix $\mathbf{A}$ where $\mathbf{A}_{ij}$ is equal to the value of the cell in the game grid at row $i$ and column $j$.
• $\text{score}(\mathbf{A})$ : The numerical score of the game state $\mathbf{A}$.
• $D_\mathbf{A}$ : The set of possible move directions (up, down, left, right) for $\mathbf{A}$.
• $\text{move}(\mathbf{A}, d)$ : Produces a game state by applying a move operation to $\mathbf{A}$ in the direction $d \in D_\mathbf{A}$.
• $S_\mathbf{A}$ : The set of possible game states from randomly spawning a new tile on $\mathbf{A}$.
• $P(\mathbf{A}', \mathbf{A})$ : The probability of randomly spawning a new tile on $\mathbf{A}$ to produce $\mathbf{A}'$.

The score function of game state $\mathbf{A}$ can then be written as

$\displaystyle \text{score}(\mathbf{A}) = \sum_{A' \in S_\mathbf{A}}P(\mathbf{A}', \mathbf{A})\cdot\max_{d \in D_\mathbf{A'}}\text{score}(\text{move}(\mathbf{A}', d))$

# Score function at terminal states

The following score function will be used instead when the recursive calculation described above reaches a termination state:

$\displaystyle \text{score}_\text{terminal}(\mathbf{A}) = \underset{\mathbf{W}' \equiv \mathbf{W}}{\max} \mathbf{W}' \circ \mathbf{A}$

where

• $\mathbf{W}$ : A $4 \times 4$ square weight matrix to be defined by the programmer.
• $\mathbf{W} \circ \mathbf{A}$ : The entrywise product of $\mathbf{W}$ and $\mathbf{A}$; that is, $(\mathbf{W} \circ \mathbf{A})_{ij} = \mathbf{W}_{ij}\mathbf{A}_{ij}$.
• $\mathbf{W}' \equiv \mathbf{W}$ is said to hold true if and only if $\mathbf{W}'$ can be produced from $\mathbf{W}$ by some sequence of clockwise-rotating (such that $\mathbf{Y}_{ij} = \mathbf{X}_{(j - 5)i)}$) and transposing (such that $\mathbf{Y}_{ij} = \mathbf{X}_{ji}$).

The recursive calculation can reach a terminal state under two circumstances. The first corresponds to a “game over” situation in the game – a random tile is spawned such that the user can no longer make any moves. That is, $D_\mathbf{A} = \emptyset$.

The second circumstance occurs when the calculation reaches a recursion depth limit predefined by the programmer. Such a search limit is necessary because it is practically very hard if not impossible for the AI to go through all the possible “game over” situations – the number of such cases increases exponentially with recursion depth.  As such it is crucial for a recursion depth limit to be in place in order for the AI to compute the best move fast enough.

# Score function formula and decision function

Putting the recursive score function and terminal score function together, we have our combined score function

$\displaystyle \text{score}(\mathbf{A}) = \sum_{A' \in S_\mathbf{A}}P(\mathbf{A}', \mathbf{A})\cdot\begin{cases}\underset{\mathbf{W}' \equiv \mathbf{W}}{\max} \mathbf{W}' \circ \mathbf{A}', & \text{if terminate}\\ \underset{d \in D_\mathbf{A'}}{\max}~\text{score}(\text{move}(\mathbf{A}', d)), & \text{otherwise} \end{cases}$

The above score function can then be used to decide which move to make at each turn after a new tile has been spawned. We simply pick the move that results in the highest scoring game state. Explicitly: denote the game state, after the new tile has been spawned, as $\mathbf{A}$. Then the best move is given by

$\displaystyle \text{decision}(\mathbf{A}) = \underset{d \in D_\mathbf{A}}{\text{argmax}}~\text{score}(\text{move}(\mathbf{A}, d))$

It seems that our AI is more or less complete… except that we have not decided on the values that the weight matrix $\mathbf{W}$ should contain!

# The weight matrix

A common strategy, is to push the bigger tiles near any one corner and the smaller tiles away from that corner. With a weight matrix, we can mimic this strategy by setting the values of $\mathbf{W}$ such that the weight decreases from the top left to the bottom right. (It can also be from the top right to the bottom left, but they are the same because all possible rotations and transpositions of $\mathbf{W}$ are checked during the computation of the terminal scores.

A simple example is

$\mathbf{W} = \left(\begin{matrix}7&6&5&4\\6&5&4&3\\5&4&3&2\\4&3&2&1\end{matrix}\right)$

With the above example, the AI tends to make moves such that the bigger tiles are closer to the corner than the smaller tiles area, which agrees with the desired strategy. Of course, there is no reason for the corner tile to be $\frac{7}{6}\approx$ times more “attractive” than its adjacent tiles, and the same applies for the other tiles. As such the above weight matrix may not be the most ideal.

An optimization search carried out using randomly generated, diagonally monotone decreasing weight matrices produces the following matrix as the most optimal (among the candidate matrices):

$\displaystyle \mathbf{W} = \left( \begin{matrix} 0.135759&0.121925&0.102812&0.099937\\ 0.0997992&0.0888405&0.076711&0.0724143\\ 0.060654&0.0562579&0.037116&0.0161889\\ 0.0125498&0.00992495&0.00575871&0.00335193 \end{matrix}\right)$

It is possible to use other optimization methods, such as Genetic Algorithms and particle swarm optimization (PSO), but the details off these other methods will not be described here.

# Results and conclusion

Using the described score function and weight matrix, an AI was successfully implemented (here) which can achieve the 4096 tile more than 40% of the time with recursion depth of 6, and at least the 8192 tile more than 30% of the time with recursion depth 8. One may also wish to view a full run of the AI with recursion depth of 8 over here on Youtube.

The AI described here is considerably simple but its performance may indeed be limited. In order to improve on the AI, one can consider other heuristics and strategies, such as those described in the discussion over here at StackExchange.

Nonetheless, we hope that you have enjoyed this article, and that it has provided you with some valuable insight of the basic, general workings of a 2048 AI, specifically the Expectimax algorithm.