Sampling
The most common way to sample from a categorical distribution uses a type of inverse transform sampling:
Assume we are given a distribution expressed as "proportional to" some expression, with unknown normalizing constant. Then, before taking any samples, we prepare some values as follows:
- Compute the unnormalized value of the distribution for each category.
- Sum them up and divide each value by this sum, in order to normalize them.
- Impose some sort of order on the categories (e.g. by an index that runs from 1 to k, where k is the number of categories).
- Convert the values to a cumulative distribution function (CDF) by replacing each value with the sum of all of the previous values. This can be done in time O(k). The resulting value for the first category will be 0.
Then, each time it is necessary to sample a value:
- Pick a uniformly distributed number between 0 and 1.
- Locate the greatest number in the CDF whose value is less than or equal to the number just chosen. This can be done in time O(log(k)), by binary search.
- Return the category corresponding to this CDF value.
If it is necessary to draw many values from the same categorical distribution, the following approach is more efficient. It draws n samples in O(n) time (assuming an O(1) approximation is used to draw values from the binomial distribution).
function draw_categorical(n) // where n is the number of samples to draw from the categorical distribution r = 1 s = 0 for i from 1 to k // where k is the number of categories v = draw from a binomial(n, p / r) distribution // where p is the probability of category i for j from 1 to v z = i // where z is an array in which the results are stored n = n - v r = r - p shuffle (randomly re-order) the elements in z return zRead more about this topic: Categorical Distribution