# Random Deviates of Non-uniform Distributions

Most (if not all) programming languages allow you to draw a [pseudo]-random deviate from a uniform distribution. In many scientific situations, though, we need random deviates drawn from a different probability distribution. In this article, I derive relations telling us how to generate these non-uniformly distributed random deviates.

## Introduction

The most common situation is that a programming language or analysis library provides a function which returns a floating-point number chosen with uniform probability in the range $[0,1)$ (the notable exception being C/C++’s rand() function, which returns a uniformly distributed integer between 0 and RAND_MAX).

Scientific applications often require random deviates drawn from a variety of other distributions. One extremely common case is simulating a measurement error. By the Central Limit Theorem, it’s typical for errors to be distributed with a Gaussian probability density. If we want to model this error with a pseudo-random number generator (PRNG), we need to know how to transform the uniform random deviates provided by the programming language’s PRNG functions into ones drawn from another probability distribution.

### Code Interface

A good, generic solution should use a common interface that lets the caller specify the desired non-uniform probability distribution. This immediately motivates defining a function to serve as that interface. We do this in the form of a couple of wrappers around Julia’s rand() function, covering a couple of the simplest and most useful cases. (It is left as an exercise to the reader to extend this simple wrapper to all function prototypes and situations.)

What we’d like is a simple interface for generating a single random deviate (in analogy with rand()), generating an array of deviates (i.e. rand(100) or rand(3,3)), or in some cases transforming an existing array of uniform random deviates into non-uniformly distributed ones. The code below provides exactly this interface, assuming we can supply the input invcdf, which depends on the desired non-uniform distribution. We’ll do this for several common examples in later sections.

```julia
# Create an interface for non-uniform distributions of PRNG deviates

# This one returns a single random deviate
randdist(invcdf::Function) = invcdf(rand());

# These two return [multi-dimensional] arrays of random deviates
randdist(invcdf::Function, dims::Dims) = randdist!(invcdf, rand(Float64, dims));
randdist(invcdf::Function, dims::Int...) = randdist(invcdf, dims);

# This one modifies an existing array in-place
function randdist!{T}(invcdf::Function, A::Array{T})
    for ii = 1:length(A)
        A[ii] = invcdf(A[ii]);
    end
    A
end
```


Note that the code and plots shown here are all available for download in the form of an IJulia notebook.

## Probability Distributions

We consider only distributions of continuous variables here; a probability distribution can be defined far more generally, but at the expense that some of the conclusions drawn here are no longer valid.

### Theory

We assume that we can define a smooth function $\mathcal{P}(x)$, the probability density function (PDF). From it, we define the cumulative distribution function (CDF) $\mathcal{C}(x)$ by

\begin{align} \mathcal{C}(x) \equiv \int_{-∞}^x \mathcal{P}(x') \, dx' \end{align}

Because the CDF accumulates probability, we expect and require the total probability to sum to unity:

\begin{align*} \mathcal{C}(∞) = 1 = \int_{-∞}^∞ \mathcal{P}(x') \, dx' \end{align*}

This may seem like an obvious requirement since probabilities greater than 1 are nonsensical, but it’s worth stating explicitly because it is actually the key to understanding how to draw random deviates from a non-uniform PDF. Take for example an arbitrary PDF and its corresponding CDF.

If we simply swap the axes,

we see that the CDF (in green) is an invertible function (i.e. it passes the “vertical line test”). Even more importantly, it has the desirable feature that the highest-density regions of the PDF cover a larger fraction of the domain in $(0,1)$. Therefore, if we pick random deviates uniformly on the codomain of the CDF (i.e. the vertical axis in the first figure or the horizontal axis of the second) and then map them back to the domain, we naturally select points in the domain more or less often according to the PDF.

Mathematically, all we want to do is generate a uniformly distributed random deviate $y$ and generate the non-uniformly distributed random deviate $x$ by inverting the CDF:

\begin{align} x = \mathcal{C}^{-1}(y) \end{align}

The work, therefore, is in defining the relevant domains of a chosen PDF, calculating the CDF, and then finding the CDF’s inverse function.
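To make the recipe concrete before tackling specific distributions, here is a minimal sketch in Python (used only as a self-contained illustration; the article’s own code is Julia) for the exponential distribution $\mathcal{P}(x) = \lambda e^{-\lambda x}$, whose CDF $\mathcal{C}(x) = 1 - e^{-\lambda x}$ inverts in closed form to $\mathcal{C}^{-1}(y) = -\ln(1-y)/\lambda$:

```python
import math
import random

def exponential_deviate(lam, u):
    """Invert the exponential CDF C(x) = 1 - exp(-lam*x) at a uniform deviate u."""
    return -math.log(1.0 - u) / lam

rng = random.Random(42)
lam = 2.0
samples = [exponential_deviate(lam, rng.random()) for _ in range(100_000)]

# The exponential distribution has mean 1/lam, so the sample mean should be near 0.5.
mean = sum(samples) / len(samples)
print(mean)  # close to 0.5
```

The sample mean converging to $1/\lambda$ is exactly the behavior the inverted CDF is supposed to deliver.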

## Uniform Distribution

The simplest starting example is that of a uniform probability density over an arbitrary domain $[a,b]$.

\begin{align*} \mathcal{P}_\mathrm{uniform}(x) &= \begin{cases} \alpha & a \le x \le b \\ 0 & \text{otherwise} \end{cases} \\ \mathcal{C}_\mathrm{uniform}(x) &= \begin{cases} 0 & x < a \\ \alpha(x-a) & a \le x \le b \\ 1 & x > b \end{cases} \end{align*}

The PDF integrated over the entire domain must equal 1 to conserve total probability. Solving, we find $\alpha = 1/(b-a)$, and the complete definitions become

\begin{align} \mathcal{P}_\mathrm{uniform}(x) &= \begin{cases} \frac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases} \\ \mathcal{C}_\mathrm{uniform}(x) &= \begin{cases} 0 & x < a \\ \frac{x-a}{b-a} & a \le x \le b \\ 1 & x > b \end{cases} \end{align}

Inverting $\mathcal{C}_\mathrm{uniform}(x)$ is easy since it is a simple linear function on $[a,b]$. We find that we can extend the rand() function from $[0,1)$ to $[a,b)$ by the formula,

\begin{align} x = \mathcal{C}^{-1}_\mathrm{uniform}(y) = a + (b - a)y \end{align}

Most programmers will immediately recognize this formula since it merely shifts and scales the distribution, but it serves as an important and easily understood example of how the CDF inversion performs as required.
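As a quick numerical sanity check of the shift-and-scale formula (a Python sketch; the bounds $a = 5$, $b = 20$ are hypothetical example values):

```python
import random

rng = random.Random(0)
a, b = 5.0, 20.0   # hypothetical bounds for illustration

# Map uniform deviates on [0, 1) to [a, b) with the inverted CDF a + (b - a) y.
samples = [a + (b - a) * rng.random() for _ in range(100_000)]
mean = sum(samples) / len(samples)

# Every deviate lands in [a, b), and the mean approaches (a + b)/2.
print(mean)  # close to 12.5
```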

### White Noise function

To demonstrate how easy it is to produce the invcdf function needed by our generic interface, we’ll start with the trivial case of returning a uniform-distributed random deviate over a general domain $[f_\mathrm{low}, f_\mathrm{high})$. We’ll call this distribution WhiteNoise for the physical reason that white noise has a uniform power spectrum at all frequencies.

```julia
# White Noise Distribution
function WhiteNoise(lowf::Float64, highf::Float64)
    (x::Float64) -> lowf + (highf - lowf)*x;
end
```


What we’ve done here is produce a generator function which will return an anonymous closure that describes how the input random deviate x should be manipulated to produce the output random deviate. The use of a generator lets us simply define the parameters of a given distribution as inputs, and then a function with the desired properties is built and returned for use.
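For readers unfamiliar with the pattern, here is a minimal Python analogue of this generator-returning-a-closure idea (the name white_noise is just for illustration, mirroring the Julia WhiteNoise above):

```python
def white_noise(lowf, highf):
    """Generator: return a closure mapping a uniform deviate x in [0, 1) to [lowf, highf)."""
    def invcdf(x):
        return lowf + (highf - lowf) * x
    return invcdf

# The returned closure remembers lowf and highf from the call that built it.
invcdf = white_noise(5.0, 20.0)
print(invcdf(0.0), invcdf(0.5))  # 5.0 12.5
```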

We can then verify that this does what we want. We simply ask for 100,000 random deviates and plot the histogram over the domain. If everything works, we should see a nearly uniform (i.e. flat) distribution.

```julia
lowf  = 5.0;
highf = 20.0;
range = (lowf - 5, highf + 5);

theoryx = linspace(range[1], range[2], 1000);
theoryy = [(i < lowf || i >= highf) ? 0 : 1/(highf - lowf) for i in theoryx];

whitenoise = randdist(WhiteNoise(lowf, highf), 100000);
```

Normalized histogram of 100,000 white-noise random deviates. Note how well the empirical distribution agrees with the theoretical PDF.

The close agreement between this empirical test and the theoretical PDF for a uniform-distributed random deviate suggests that the generic interface we’ve developed is working. Now we can move on to more complex examples.

## Pink Noise

A more complicated, but equally more interesting, example is generating Pink Noise, or $1/f$ noise. Pink noise is a ubiquitous type of noise which has been found and described in a vast array of situations and systems. In physics and engineering, it often describes the power spectrum of the noise seen in electronic circuits and other devices. For this reason, it is an important noise spectrum to be able to model.

In terms of a power spectrum $P(f)$ and associated cumulative power $C(f)$, pink noise is defined as

\begin{align} P_\mathrm{pink}(f) &= \frac{1}{f} \\ C_\mathrm{pink}(f) &= \ln f \end{align}

In principle, pink noise has a domain of $f \in [0,∞)$. In practice, though, we must restrict our domain since $P(0) \to ∞$ and $C(∞) \to ∞$. The natural choice is to consider the domain of possible frequencies which can be measured.

In the general case, we’ll choose some pair of minimum and maximum frequencies $f_\mathrm{min}$ and $f_\mathrm{max}$. Normalization to preserve total probability requires that

\begin{align*} 1 &= \lambda \int_{f_\mathrm{min}}^{f_\mathrm{max}} \frac{1}{f} \, df \\ \lambda &= \frac{1}{\ln(f_\mathrm{max}/f_\mathrm{min})} \end{align*}

which implies after substitution and integration,

\begin{align} P_\mathrm{pink}(f) &= \frac{1}{f \ln(f_\mathrm{max}/f_\mathrm{min})} \\ C_\mathrm{pink}(f) &= \frac{\ln(f/f_\mathrm{min})}{\ln(f_\mathrm{max}/f_\mathrm{min})} \end{align}
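We can confirm this normalization numerically by integrating $P_\mathrm{pink}$ over the band with a simple trapezoidal rule (a Python sketch; the band limits here are arbitrary example values):

```python
import math

f_min, f_max = 0.001, 0.5   # example band limits (arbitrary)
norm = math.log(f_max / f_min)

def pdf(f):
    """Normalized band-limited pink-noise PDF."""
    return 1.0 / (f * norm)

# Trapezoidal integration of the PDF over [f_min, f_max].
n = 100_000
h = (f_max - f_min) / n
total = 0.5 * (pdf(f_min) + pdf(f_max))
total += sum(pdf(f_min + i * h) for i in range(1, n))
total *= h
print(total)  # very nearly 1
```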

Then, solving for the inverse CDF,

\begin{align*} x &= \frac{\ln(f/f_\mathrm{min})}{\ln(f_\mathrm{max}/f_\mathrm{min})} \\ \left( \frac{f_\mathrm{max}}{f_\mathrm{min}} \right)^x &= \frac{f}{f_\mathrm{min}} \\ f &= f_\mathrm{min} \left( \frac{f_\mathrm{max}}{f_\mathrm{min}} \right)^x \end{align*}

Therefore, our transformation function is

\begin{align} C_\mathrm{pink}^{-1}(x) = f_\mathrm{min} \left( \frac{f_\mathrm{max}}{f_\mathrm{min}} \right)^x \end{align}
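A quick distributional check of this inversion (a Python sketch with arbitrary example band limits): since $x = 0.5$ maps to the geometric mean $\sqrt{f_\mathrm{min} f_\mathrm{max}}$, that value should be the median of the generated deviates.

```python
import math
import random

rng = random.Random(7)
f_min, f_max = 0.001, 0.5   # example band limits (arbitrary)

# Draw pink-noise deviates via the inverted CDF.
samples = [f_min * (f_max / f_min) ** rng.random() for _ in range(100_000)]

# x = 0.5 maps to sqrt(f_min * f_max), so roughly half the deviates fall below it.
median = math.sqrt(f_min * f_max)
frac_below = sum(1 for f in samples if f < median) / len(samples)
print(frac_below)  # close to 0.5
```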

Implementing another generator function for randdist(), we can again test the implementation empirically.

```julia
# Pink Noise (aka 1/f Noise) Distribution
function PinkNoise(lowf::Float64, highf::Float64)
    (x::Float64) -> lowf * (highf / lowf)^x
end
```

The pink noise PDF (blue) and CDF (green) overplotted with a test of 5,000 random deviates drawn from a $1/f$ distribution. The data agrees well with the expected counts.

A more useful case is to consider a set of $N$ data points that are recorded in the time domain. Then by simple properties of a discrete-time Fourier Transform, we know that the minimum frequency is a mode which has half a period stretching across the data series:

\begin{align*} \frac{1}{2} \lambda = N \implies \frac{1}{2f_\mathrm{min}} = N \implies f_\mathrm{min} = \frac{1}{2N} \end{align*}

Similarly, the maximum frequency that can be supported is one where the mode oscillates between extrema on every sample. The wavelength then must be only two samples long:

\begin{align*} \lambda = 2 \implies \frac{1}{f_\mathrm{max}} = 2 \implies f_\mathrm{max} = \frac{1}{2} \end{align*}

If we then plug in these two particular cases, we can simplify the inverse CDF further to

\begin{align*} C_\mathrm{pink}^{-1}(x) = \frac{1}{2} N^{x-1} \end{align*}

One final simplification is possible because we know $x$ is uniformly distributed in $[0,1)$: the deviates $x$ and $1 - x$ are identically distributed, so we may substitute $x \longleftrightarrow 1 - x$ without loss of generality. Pushing the $N$ term to the denominator and making this substitution,

\begin{align*} C_\mathrm{pink}^{-1}(x) = \frac{1}{2N^x} \end{align*}
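These algebraic steps are easy to verify numerically (a Python sketch): substituting $f_\mathrm{min} = 1/2N$ and $f_\mathrm{max} = 1/2$ into the general inverse CDF reproduces $\tfrac{1}{2}N^{x-1}$, and applying $x \to 1 - x$ turns that into $1/(2N^x)$.

```python
import math

# Check the specialized inverse-CDF algebra for a few N and x values.
for N in (16, 1024, 100_000):
    f_min, f_max = 1.0 / (2 * N), 0.5
    for x in (0.0, 0.25, 0.5, 0.9, 1.0):
        general    = f_min * (f_max / f_min) ** x  # general band-limited inverse CDF
        simplified = 0.5 * N ** (x - 1.0)          # after substituting f_min and f_max
        assert math.isclose(general, simplified, rel_tol=1e-12)
        # Substituting x -> 1 - x in the simplified form gives 1/(2 N^x).
        assert math.isclose(0.5 * N ** ((1.0 - x) - 1.0), 0.5 / N ** x, rel_tol=1e-12)
print("identities verified")
```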

Because this case is of particular use, we can create a Fourier Transform specialized generator function FTPinkNoise.

```julia
# Pink Noise distribution, specialized for the Fourier Transform
function FTPinkNoise(N::Int)
    (x::Float64) -> 0.5 / N^x
end
```

In the specific case of Fourier-Transform-compatible noise, the histogram bins are unequal in size on the frequency axis. The discrete nature of time-series data implies a discrete frequency space, and the data here has been binned to coincide with the permitted values.

Note that although the form of the inverse CDF has been specialized for the Fourier Transform by depending only on the number of data points, there are still other mathematical restrictions which have not been enforced in its implementation: namely, the discrete time domain means only a discrete set of frequencies can occur. A real application of this noise-generating scheme would require either adding processing to the inverse CDF or post-processing the random deviates, but since these details add nothing to the discussion of generating non-uniformly distributed random deviates, they are left as an exercise to the reader.