Random Permutation Generation

Many problems require the use of randomly generated permutations, kpermutations, and ksubsets. Simple and efficient algorithms exist for producing these combinatorial objects; however, these algorithms are somewhat subtle and most people do not intuitively understand them. The algorithm for generating random permutations is the most significant of these algorithms, and the other algorithms are simple variants of this algorithm.

Permutations are rearrangements of items where the order the items are listed in is important. For example, the rearrangements {1,3,2} and {1,2,3} are two different permutations of the numbers [1,2,3]. A permutation has the property that it contains all the items in a given set without any repetitions. For instance, the lists {1,2}, and {1,2,2,3} are not permutations of [1,2,3].

It is helpful to visualize permutations as paths in a permutation tree. A permutation tree models the choices available when forming a permutation. The colored edges of the permutation tree denote the items that are being permuted. A particular item corresponds to edges of a specific color. Following an edge in the permutation tree is equivalent to selecting an item, and permutations are formed by following paths from the root to the leaves. Figure 1, shows a permutation tree for four items.

permutation tree for four items
Figure 1, a permutation tree for 4 items

A permutation tree of n items is generated recursively; at the root all n items are available for selection, so the root has n edges of different colors leading to n subtrees. At the subtrees of the root only n-1 items are available (one item has been selected), so the subtrees of the root each have n-1 edges leading to n-1 subtrees. The color of the missing edge corresponds to the the item that has been selected, and each subtree itself is a permutation tree for n-1 items (see Figure 1). This process is repeated with the subtrees of the subtrees, and so on, until the leaves are reached. At that point all items have been selected; therefore, no edges are available to continue the process.

This construction of the permutation tree assures that all permutations of n items correspond to paths in the tree. Also, no two distinct paths correspond to the same permutation of items. This implies that there are as many permutations of n items as there are paths in a permutation tree for n items.

Counting the number of paths in a permutation tree is fairly straightforward; at the root n paths emerge, and each of these n paths splits into n-1 paths at the subtrees of the root. This gives n*(n-1) paths, and each one of these splits into a further n-2 paths at the next level in the permutation tree. The paths terminate at the leaves by which time the original n paths have split into n! (n factorial) paths, and that equals:

n! = n*(n-1)*(n-2)*..*1

An algorithm for selecting random permutations must assure that it has equal chance of picking any one of the n! possible permutations. Figure 2, suggests a simple algorithm for generating random permutations; to get a permutation of n items, simply pick a random path from the root to the leaves in a permutation tree for n items.

A random path can be picked by starting at the root, randomly picking one of the available edges to get to a subtree, and repeating that process to get to successive subtrees until a leaf is reached. The animation below illustrates this process.

animation for picking a random permutation
Figure 2, Picking a random path in a permutation tree

Intuitively, this process works because the random path picking algorithm is not biased towards any particular path; therefore, all n! paths have equal chance of being picked. More formally, the probability of picking a path consisting of edges {E1,E2,..,En} is the probability of picking the path leading to edge En, and multiplying by the probability of picking edge En given that the path leading to En has been picked.

P({E1,E2,..,En}) = P({E1,E2,..,En-1}) * P(En|{E1,E2,..,En-1})

The above equation can be recursively expanded by substituting in the value for P({E1,E2,..,Ek}) in the right hand side of the equation.

P({E1,E2,..,En}) = P({E1,E2,..,En-1}) * P(En|{E1,E2,..,En-1}) 
                 = P({E1,E2,..,En-2}) * P(En-1|{E1,E2,..,En-2}) * P(En|{E1,E2,..,En-1}) 
                 = P(E1|{}) * P(E2|{E1}) * .. * P(En|{E1,E2,..,En-1})
                 = P(E1)    * P(E2|{E1}) * .. * P(En|{E1,E2,..,En-1})

The probability P(Ek+1|{E1,E2,..,Ek}) can be computed readily. At the point where edge Ek+1 is in consideration k edges have been selected, and edges Ek+1 to En are available. Edge Ek+1 is randomly chosen from all the available choices, so there are n-k choices for it. Therefore, the probability of picking edge Ek+1 is 1 / (n-k). The following expression gives the probability of picking a particular path in a permutation tree for n items:

P({E1,E2..En}) = P(E1) * P(E2|{E1}) * .. * P(En|{E1,E2,..En-1})
               = 1/n   * 1/(n-1)    * .. * 1
               = 1/n!

The above expression implies that the path picking algorithm described above picks any path with probability 1/n!; therefore, the algorithm picks all permutations with equal probability.

The random path picking algorithm needs to be efficient; generating the whole permutation tree in advance is not practical. Fortunately, a complete permutation tree is not needed. Each iteration of the algorithm only needs to keep track of the path chosen so far, and the possible selection of choices for the next edge of the path; Figure 2, illustrates this idea.

There are a maximum of n outgoing edges at any node of the permutation tree, and these can be modeled as numbers 1 to n in an array. Selecting an item frees up space for one edge, and a clever algorithm can use that space to store the path. An algorithm that implements these ideas is given below:

for (i=1 to n) ar[i] = i;
for (i=1 to n) swap(ar[i], ar[Random(i,n)]);

The above algorithm partitions the elements of an array of n elements into two parts, positions 1 to i-1 and positions i to n. Positions 1 to i-1 contain the segment of the path chosen so far, and positions i to n contain the unselected edges. The first loop of the algorithm is just initialization of the array elements, and the second loop does all the work in this algorithm.

Each iteration of the second loop extends the path selected so far by one edge. This is done by picking a random edge from the unselected edges and moving it to position i. Of course, the edge at position i has to go somewhere; therefore, the edge at position i is moved to the former position of the selected edge. The table below illustrates this algorithm for picking a 5 item permutation.

Iteration Selected Edge Selected Path Unselected Edges
? 1 2 3 4 5
1 4 4 2 3 1 5
2 1 4 1 3 2 5
3 3 4 1 3 2 5
4 5 4 1 3 5 2
5 2 4 1 3 5 2

In the above table, array ar is the concatenation of the last two columns. At each iteration one edge is chosen and moved to position i. Unlike the permutation tree, unselected edges in array ar are in no specific order. This is all right as the algorithm chooses amongst the unselected edges randomly, and any specific ordering of the edges is irrelevant.

Another thing to notice is that after the second last iteration of the algorithm the contents of the array ar do not change. At the last iteration only one edge is left unselected; going through the iteration simply swaps it with itself and results in no change. This suggests that the last iteration of the algorithm is redundant and the algorithm can be changed to the following:

for (i=1 to n) ar[i] = i;
for (i=1 to (n-1)) swap(ar[i], ar[Random(i,n)]);

Multiple random permutations can be generated by repeating the second loop. As long as array ar contains a permutation of numbers 1 to n, it can be reused without reinitialization.

Choosing a random k-permutation turns out to be fairly simple as well. A kpermutation is a rearrangement of k items selected from a total of n items. Kpermutations are a generalization of permutations. A permutation is simply a rearrangement of all n items; a permutation is equivalent to a kpermutation when k equals n.

kpermutation tree for n =5, and k = 3
Figure 3, a kpermutation tree for n = 5 and k = 3

Kpermutations too can be visualized as paths in a permutation tree. Kpermutations correspond to paths of length k from the root of the tree. A kpermutation tree can be generated by pruning the permutation tree so that all possible paths from the root to the leaves are of length k. Figure 3, shows such a tree for n = 5 and k = 3. This correspondence with paths suggests the following algorithm for generating kpermutations:

for (i=1 to n) ar[i] = i;
for (i=1 to k) swap(ar[i], ar[Random(i,n)]);

The algorithm above is a modified version of the algorithm for generating permutations; that algorithm yields a path of length n while this one yields a path of length k. Also, it is not possible to optimize this algorithm to cut down one iteration as was the case with the previous algorithm. The optimization in the algorithm for random permutations was possible only because at the last iteration only one edge was left unselected. However, with n less than k more than 1 edges will be available for selection at the kth iteration, and randomly choosing one of the edges will make a difference.

The final algorithm is for choosing a random ksubset. A ksubset is simply a kpermutation where the order of items is unimportant. The lists {1,3}, {3,1} are different kpermutations of the items [1,2,3], but they are the same ksubset. While the lists {1,2} and {1,3} are different ksubsets as well as different kpermutations of [1,2,3].

Paths from the root to the leaves in the kpermutation tree enumerate all possible kpermutations; therefore, they also enumerate all possible ksubsets of n items. The only problem is that each ksubset is enumerated multiple times. This is not an issue as each ksubset appears exactly k! times; once for each permutation of the items in the ksubset. As each ksubset occurs as often as any other all ksubsets have the same probability of being picked at random.

Some problems require ksubsets that are arranged in a sorted order. For example, {1,2,3} and {1,3,2} are the same ksubset but {1,2,3} is in ascending order while {1,3,2} is unsorted. For small k, sorting is inexpensive; therefore the best approach to generating ordered ksubsets in such cases is to simply sort the elements of the ksubset after generating it.

The algorithms for random kpermutations and ksubsets use O(n) space and O(n) time to generate objects of size O(k). O(n) space is needed for the array to store the chosen path and the unselected edges, and O(n) time is needed for initializing the elements of this array. The time and space complexity is not a problem when n and k do not differ by too much, but if n is sufficiently large and k is small, the algorithms can be very inefficient. Part II of this article will discuss techniques to bring the space and time complexity of these algorithms to O(k).

by Usman Latif  [Feb 13, 2004]

Related Links:

Random KPermutations, and KSubsets in O(k) space