Minesweeper Identical Boards Problem, Part I

Some time ago it it was reported that people are getting identical Minesweeper boards. Damien Moore's Minesweeper website shows one board which has been reported at least five times. Considering the astronomical number of possible Minesweeper game boards, any chance occurrence of such an event can be ruled out. The obvious explanation suggests a problem with the pseudo random number generator (PRNG) used by Minesweeper. I decided to reverse engineer Minesweeper to learn about its internals and uncover the exact problem.

A PRNG generates pseudo random numbers as the name indicates. These numbers are not actually random but for most purposes they are good enough. The PRNG depends on its initial state to generate a sequence of numbers. Having the same initial state causes the PRNG to generate the same sequence of numbers. To produce different results on different invocations of a program, a PRNG is initialized with a random input before use.

I expected a problem in the initialization of the Minesweeper PRNG. A programmer mistake could have caused a small set of numbers to be used over and over again in the PRNG initialization. This in turn would lead to only a small set of boards being produced. With a small set of possible boards, eventually some people are bound to get identical boards (collisions).

The first task in learning about the PRNG was to extract text strings embedded inside the Minesweeper binary (program file.) To accomplish this I used the the Unix strings program. The strings program isn't strictly necessary. Viewing the binary with a good text or hex editor is sufficient, although a lot more time consuming.

The strings program extracted more than 1800 lines of text from my Windows 2000 Minesweeper binary. Thankfully most of it was garbage. The interesting keywords were easy to identify. They correspond to library functions and are summarized in the table below:

Keyword         Description
srand           initializes the pseudo random number generator (PRNG)
rand            generates next pseudo random number
GetTickCount    number of milliseconds elapsed since system startup

Typically the srand function is invoked at the start of a program with a random integer to initialize the PRNG. After this step the rand function can be called as needed to generate pseudo random numbers. Most programmers call srand with the current time. The current time is a non-random value but is a good choice as it varies for every invocation of a program. I expected a call to a time related function but did not find one in the strings output. In the absence of such a call the only other plausible choice left as input to srand is the value returned by the GetTickCount function.

The above information is sufficient to hypothesize that the Minesweeper PRNG is being initialized once at the start of the Minesweeper session using the GetTickCount function. The collisions (identical boards) are occurring as some people are starting Minesweeper right after booting their machines.

Starting Minesweeper quickly after bootup is central to the above hypothesis. It ensures that the value of GetTickCount (tickcount) is in a small range. The time taken to bootup is variable but can be reasonably expected to be between 30-60 seconds. Let's assume further that the people who reported the collisions have similar boot times. Now we can expect them to boot and start Minesweeper in a twenty second interval most of the time, with bulk of the variation due to random factors. The random factors can be events such as, typing a bad password, delay in traversing menus, etc. This gives us a set of 20,000 (20s = 20,000 ms) potential inputs for srand.

The set of inputs to srand is actually much smaller. The tickcount is changed in increments. The increment is 55 ms for Win95 (Windows 95/98/ME operating systems,) and 10 ms for WinXP (Windows 2000/XP operating systems.) This implies that for Win95 only 363 (20,000 / 55) of the 20,000 values are relevant. One of these 363 values is most likely leading to the board that is being reported.

The board being reported has to be very special. It needs to be solvable almost 100 % of the time and should always lead to an exceptional score. A cursory look at the board with multiple reported collisions shows that it indeed satisfies this criteria.

With only 363 possible inputs for srand in the 20 second interval, it is not hard to imagine collisions occurring. For the case of WinXP almost 2,000 values need to be considered. This is a 5.5 times increase over Win95. Collisions under this scenario are still likely but much less so than before.

To confirm the above hypothesis I used WinDbg (a Windows debugger) to trace through Minesweeper. I started a Minesweeper session and played a couple of games with the debugger running. As expected only one call to srand was made and that too when Minesweeper was initially loading. I then Disassembled the code around the call to srand. The relevant disassembled code follows:

call    dword ptr [winmine!_imp__GetTickCount (01001090)]
movzx   eax,ax
push    eax
call    dword ptr [winmine!_imp__srand (010010d4)]

The first line of the assembly code calls the GetTickCount function to get the time in milliseconds since the system started. The second line discards the 16 most significant bits of that count. The third line of the code prepares the modified value to be passed as an argument to the srand function. The last line calls the srand function and initializes the PRNG.

The Minesweeper programmers' decision to discard 16 bits of information from the tickcount is rather unfortunate. It does not achieve anything good and instead restricts all possible initializations of the PRNG to 65536 values. As a consequence the tickcount used by Minesweeper now reaches 65536 in around 65.5 seconds and then cycles. Previously it would have taken 49.7 days to cycle.

Initially I thought the the total number of possible tickcount values for Win95 to be:

65536 / 55 = 1191

I was thinking that only the values in the sequence 0,55,110... can occur. This turns out not to be the case. The tickcount does not become 0 after exceeding 65535. It increments according to the sequence:

0 55 110 ... 65505 24 79 124 ...

Adding 55 to 65505 is equivalent to 24 after discarding the most significant bits. Therefore, instead of repeating the same old numbers, a whole set of new numbers is covered, i.e. 24,79,124... Even this is not enough to ensure that all numbers from 0-65535 eventually appear as input to srand. One can imagine a case where after a few cycles 0 appears again and causes 0,55,110... to repeat.

Fortunately the above scenario can be modeled using a group (a particular kind of mathematical structure.) Using elementary group theory it can be proven that in the above scenario all the numbers from 0-65535 will appear before the sequence repeats. Moreover, the increment of the GetTickCount is not precise all the time and can occasionally be 56.

All of this means that if the user starts Minesweeper after the computer has been running for a while, she will get an srand input which is more or less randomly chosen from the 65536 possible values.

The new information is consistent with the original hypothesis. However, the case for an alternate hypothesis can be made. It is possible the collisions are occurring simply because of the small number (65536) of inputs to srand.

In Part II of this article I will examine both hypotheses in detail and provide insight as to why the second hypothesis is unlikely to be true.

by Usman Latif  [Sep 30, 2003]