Algorithms — Justin Willmert

Notes on calculating online statistics

In this article, I collect simple derivations that demonstrate how to calculate several statistical quantities using point-by-point online statistics. Particular emphasis is placed on supporting multiple weighting schemes (not just uniform weights).
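As a rough illustration of what "point-by-point" means here, the following is a minimal sketch of a weighted running mean and variance. It uses the well-known incremental update usually attributed to West, which may differ from the derivations in the article; the class and attribute names are my own and purely illustrative.

```python
class OnlineWeightedStats:
    """Running weighted mean and (population) variance, one point at a time."""

    def __init__(self):
        self.wsum = 0.0  # sum of the weights seen so far
        self.mean = 0.0  # current weighted mean
        self.m2 = 0.0    # weighted sum of squared deviations from the mean

    def update(self, x, w=1.0):
        # Assumption: weights are strictly positive.
        if w <= 0:
            raise ValueError("weight must be positive")
        self.wsum += w
        delta = x - self.mean
        self.mean += (w / self.wsum) * delta
        # Uses both the old deviation (delta) and the new one (x - self.mean).
        self.m2 += w * delta * (x - self.mean)

    @property
    def variance(self):
        # Weighted population variance; undefined before the first update.
        return self.m2 / self.wsum


stats = OnlineWeightedStats()
for x, w in zip([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]):
    stats.update(x, w)
print(stats.mean, stats.variance)
```

With all weights equal to one, this reduces to the familiar uniform-weight running mean and variance.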

[Read more →]

Numerically computing the exponential function with polynomial approximations

Numerically computing any non-trivial function (from the trigonometric functions through to special functions like the complex gamma function) is a large field of numerical computing, and one in which I would like to expand my knowledge. In this article, I describe my recent exploration of how the exponential function can be implemented numerically.
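To give a flavor of the problem, here is a minimal sketch of one common approach: reduce the argument to a small interval around zero, then evaluate a polynomial there. The polynomial below is a plain truncated Taylor series, chosen only for simplicity; it is not necessarily the approximation used in the article, and the function name and the `terms` parameter are hypothetical.

```python
import math

LN2 = math.log(2.0)


def exp_approx(x, terms=14):
    """Approximate exp(x) by range reduction plus a Taylor polynomial."""
    # Range reduction: write x = k*ln(2) + r with |r| <= ln(2)/2,
    # so that exp(x) = 2**k * exp(r).
    k = round(x / LN2)
    r = x - k * LN2
    # Horner-style evaluation of 1 + r + r**2/2! + ... + r**terms/terms!
    p = 1.0
    for n in range(terms, 0, -1):
        p = 1.0 + r * p / n
    # Scale by 2**k without computing the power explicitly.
    return math.ldexp(p, k)


for x in (-10.0, 0.5, 10.0):
    print(x, exp_approx(x), math.exp(x))
```

Because the reduced argument is small, a low-degree polynomial is already accurate; a production implementation would also need to handle overflow, underflow, and non-finite inputs, which this sketch ignores.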

[Read more →]

Linear model regression matrices

An extremely common operation on data series is to regress the data against a particular model. Often the desired model is a linear combination of known basis functions, and when this is true, the regression of a data series can be encapsulated as a matrix operator. Describing the process as a matrix operation (rather than just using the regression coefficients) isn’t always useful, but its description is rarer. Because I needed a regression in this form for my research, I have chosen to write up the solution here.
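As a sketch of the idea, assume an ordinary least-squares fit in which the columns of a matrix $\mathbf{A}$ are the basis functions sampled at the data points, $\mathbf{y}$ is the data series, and $\mathbf{A}^\top \mathbf{A}$ is invertible (these symbols are my own and may not match the article's notation). The regression coefficients $\mathbf{c}$ and the fitted series $\hat{\mathbf{y}}$ are then

$$
\mathbf{c} = \left(\mathbf{A}^\top \mathbf{A}\right)^{-1} \mathbf{A}^\top \mathbf{y},
\qquad
\hat{\mathbf{y}} = \mathbf{A}\mathbf{c} = \mathbf{R}\,\mathbf{y},
\qquad
\mathbf{R} \equiv \mathbf{A} \left(\mathbf{A}^\top \mathbf{A}\right)^{-1} \mathbf{A}^\top .
$$

The single matrix $\mathbf{R}$ maps the data directly onto the fitted model, so the regression can be applied without ever forming the coefficients explicitly.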

[Read more →]

Bellman k-segmentation algorithm

The Bellman $k$-segmentation algorithm generates a segmented constant-line fit to a data series. In trying to learn and implement this algorithm, though, I found it difficult to find descriptions of the segmentation algorithm rather than the (apparently more common) $k$-means algorithm, so in this article I describe and provide code for the $k$-segmentation algorithm.
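For orientation, the following is a minimal sketch of a dynamic-programming $k$-segmentation, written independently of the article's own code. It assumes the cost of a segment is the sum of squared deviations from the segment mean; the function name and return convention are hypothetical.

```python
def k_segmentation(data, k):
    """Split data into k contiguous segments minimizing total squared error.

    Returns (bounds, cost), where segment m covers data[bounds[m]:bounds[m+1]].
    """
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(data)")

    # Prefix sums of x and x**2 give any segment's cost in constant time.
    s1, s2 = [0.0], [0.0]
    for x in data:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        # Squared error of fitting data[i:j] by its mean.
        s = s1[j] - s1[i]
        return (s2[j] - s2[i]) - s * s / (j - i)

    inf = float("inf")
    # best[m][j]: minimum cost of splitting data[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # back[m][j]: start index of the last segment in that optimal split.
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i

    # Walk the back-pointers to recover the segment boundaries.
    bounds = [n]
    j = n
    for m in range(k, 0, -1):
        j = back[m][j]
        bounds.append(j)
    bounds.reverse()
    return bounds, best[k][n]


bounds, total = k_segmentation([1, 1, 1, 5, 5, 5, 9, 9], 3)
print(bounds, total)
```

This straightforward version runs in $O(k n^2)$ time, which is fine for short series but may be too slow for long ones.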

[Read more →]

Choosing a computationally efficient distance function

One of the simplest statistical properties of a data set is its mean. The next step is often to quantify how well the mean represents the data. A variety of techniques exist, but in this article, I show why the mean-squared error is an excellent choice for dynamic programming algorithms. The mean-squared error has the advantage that it coincides well with our intuitive idea of distance (being closely related to Euclidean distance) as well as admitting a computationally efficient implementation.
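One way to see where the efficiency comes from (stated here in my own notation, which may differ from the article's) is the standard identity for the squared error of $n$ points $x_1, \ldots, x_n$ about their mean $\bar{x}$:

$$
\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
= \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left(\sum_{i=1}^{n} x_i\right)^2 .
$$

The mean-squared error is this quantity divided by $n$. Since the right-hand side depends only on the sums of $x_i$ and $x_i^2$, keeping running totals of both lets the error of any contiguous segment be evaluated in constant time, which is exactly the kind of repeated query a dynamic programming algorithm makes.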

[Read more →]

Random Deviates of Non-uniform Distributions

Most (if not all) programming languages allow you to draw a (pseudo-)random deviate from a uniform distribution. In many scientific situations, though, we want random deviates drawn from a different probability distribution. In this article, I derive relations telling us how to generate these non-uniformly distributed random deviates.
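As a small illustration, the sketch below uses inverse transform sampling, one standard way of turning uniform deviates into non-uniform ones, to draw from an exponential distribution. The choice of distribution and the function name are my own assumptions for the example and are not taken from the article.

```python
import math
import random


def exponential_deviate(rate, rng=random):
    """Draw one deviate from an exponential distribution with the given rate."""
    # The CDF is F(x) = 1 - exp(-rate * x); solving u = F(x) for x gives
    # x = -ln(1 - u) / rate, where u is uniform on [0, 1).
    u = rng.random()
    # 1 - u lies in (0, 1], so the logarithm is always defined.
    return -math.log(1.0 - u) / rate


rng = random.Random(0)
samples = [exponential_deviate(2.0, rng) for _ in range(100_000)]
print(sum(samples) / len(samples))  # should be close to 1 / rate
```

The same recipe works for any distribution whose cumulative distribution function can be inverted in closed form.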

[Read more →]