# Numerically computing the exponential function with polynomial approximations

Numerically computing any non-trivial function (spanning the trigonometric functions through to special functions like the complex gamma function) is a large field of numerical computing, and one I am interested in expanding my knowledge within. In this article, I describe my recent exploration of how the exponential function can be implemented numerically.

## Introduction

A recent pull request (PR 36761) in the Julia repository aims to speed up the calculation of the exponential function ($$e^x$$) for double-precision floating point values. While reading through the code and comments, I found myself wondering how some of the “magic constants” were derived, especially because they are similar — but not identical — to the Taylor expansion coefficients any mathematician or physicist is likely to recognize on sight. Curiosity led me to devote the past few days to investigating some of the standard techniques used in implementing numerical approximations of special functions, and this article serves as a useful reference for myself (and hopefully others) on some of the techniques I’ve learned.

## Taylor Series

To any physicist, the first obvious choice for approximating any (well behaved) function is its Taylor series expansion. For an infinitely differentiable function $$f(x)$$ at $$x = a$$, the Taylor series is defined as the infinite sum \begin{align*} f(x) = \sum_{n = 0}^{\infty} \frac{1}{n!} \frac{d^n f(a)}{dx^n} (x - a)^n \end{align*} where $$!$$ denotes the factorial function and $$d^nf(a)/dx^n$$ is the $$n$$-th derivative of $$f(x)$$ evaluated at $$x = a$$.1 The approximation is made by truncating the infinite series to a finite series with some arbitrary number of terms, giving a finite-degree polynomial approximation to the function $$f(x)$$.

There is an immediately obvious problem with just using the Taylor series, though — what value of $$x = a$$ do you choose to expand about? The exponential function is defined on the entire real line, but you only have one point about which the function is well approximated, with the approximation growing poorer the further from $$a$$ you go. The span of the well-approximated region can be increased by including more terms in the series, but this is a strategy of diminishing returns given that increasing the accuracy to arbitrary precision everywhere requires an infinite series.
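
To make the degradation concrete, here is a minimal sketch (an illustration only, not part of the final implementation) comparing a fixed degree-5 expansion about $$a = 0$$ at points near and far from the expansion point:

```julia
# Degree-5 Taylor polynomial for exp about a = 0, coefficients in ascending order
taylor5(x) = evalpoly(x, (1.0, 1.0, 1/2, 1/6, 1/24, 1/120))

abs(taylor5(0.1) - exp(0.1)) # ≈ 1.4e-9, tiny near the expansion point
abs(taylor5(3.0) - exp(3.0)) # ≈ 1.7, badly wrong far from it
```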

In general, there are also issues of ensuring convergence of the series (i.e. the series does not have to converge everywhere for a given choice of $$a$$), and the rapidly growing size of $$n!$$ and of the high powers $$(x-a)^n$$ quickly leads to difficulty in computing their values numerically without underflow or overflow.
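
The factorial growth, in particular, is easy to run into. For example, Julia's integer factorial already overflows 64-bit integers beyond $$20!$$:

```julia
factorial(20) # 2432902008176640000, the largest factorial representable in an Int64
factorial(21) # throws OverflowError; needs factorial(big(21)) instead
```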

Let us set these problems aside for now, though, and proceed to investigate the family of Taylor series approximations $$P_n(x)$$ of degree $$n$$. The Taylor series for $$e^x$$ about $$x = 0$$ is \begin{align} e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \ldots \end{align} which can be easily coded up as a generated function (so that the factorial coefficients need only be calculated once):

```julia
@generated function exp_taylor(x::Float64, ::Val{N}) where {N}
    coeff = ntuple(i -> inv(factorial(i-1)), N+1) # (1/0!, 1/1!, ..., 1/N!)
    return :(evalpoly(x, $(coeff)))
end
```
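
The range-reduced drivers below also call an absolute-error kernel, exp_absminimax_kernel. Its definition is analogous to the relative-error kernel that follows; as a sketch (assuming the same remez() interface, where minimizing the absolute error corresponds to a unit weight function):

```julia
# Minimizes absolute error in |x| ≤ log(2)/2 (a sketch: unit weight function)
@generated function exp_absminimax_kernel(x::Float64, ::Val{N}) where {N}
    ln2by2 = log(2.0) / 2
    _, coeff = remez(exp, -ln2by2, ln2by2, N, x -> 1.0)
    return :(evalpoly(x, $(tuple(coeff...))))
end
```
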
```julia
# Minimizes relative error in |x| ≤ log(2)/2
@generated function exp_relminimax_kernel(x::Float64, ::Val{N}) where {N}
    ln2by2 = log(2.0) / 2
    _, coeff = remez(exp, -ln2by2, ln2by2, N, x -> exp(-x))
    return :(evalpoly(x, $(tuple(coeff...))))
end

function exp_reduce_kernel(x::Float64, n::Val{N}, kernel) where {N}
    k, r = reduce_exp(x)
    return exp2(k) * kernel(r, n) # 2^k * exp(r)
end

exp_absminimax(x::Float64, n::Val{N}) where {N} = exp_reduce_kernel(x, n, exp_absminimax_kernel)
exp_relminimax(x::Float64, n::Val{N}) where {N} = exp_reduce_kernel(x, n, exp_relminimax_kernel)
```

First, both variations of the minimax polynomial eliminate the systematic bias in the odd-order polynomials that was present when using the Taylor series approximation. Second, we can see that both variations do minimize the absolute and relative error, respectively. (This is probably easiest to see by comparing the last one and a half range-reducing periods, from about $$x = 1$$ to the right side.) In the left column, which shows the absolute minimax approximation, the left and right end points of each range-reducing period have the same deviation. (Compare this against Fig. 3.1, where the lower and upper endpoints of the period have different absolute errors.) In the right column, which shows the relative minimax approximation, we see a similar per-period relationship, but now in the relative error measurement.

For the relative error case, we also gain continuity of the approximation across range-reducing periods in the odd-degree approximations; this is a direct consequence of the error polynomial being a leading-order-even function together with constraining the relative errors to be equal on the lower and upper edges of the range-reducing period.

Finally, let us directly compare the known Taylor series coefficients against the minimax polynomial coefficients (since, as mentioned in the introduction, the observation that the Julia pull request contained not-quite Taylor series coefficients is what initially motivated my exploration of this topic). The following table shows the Taylor series and relative minimax polynomial coefficients up to degree 3, along with the difference (minimax minus Taylor) and the relative percent change with respect to the Taylor series coefficients.

| Degree | Taylor series | Minimax poly | Abs. (Rel.) change |
|--------|--------------------|--------------------|--------------------------|
| 0 | 1.0000000000000000 | 0.9999280752600668 | -7.192e-05 (-0.00719 %) |
| 1 | 1.0000000000000000 | 1.0001641903948264 | 0.0001642 (0.0164 %) |
| 2 | 0.5000000000000000 | 0.5049632650961922 | 0.004963 (0.993 %) |
| 3 | 0.1666666666666667 | 0.1656683995499798 | -0.0009983 (-0.599 %) |

The bottom-line conclusion is that only very small (less than 1%) tweaks to the Taylor series coefficients are necessary to transform the Taylor polynomial into the minimax polynomial, and the coefficients observed in the Julia pull request could reasonably be due to a minimax-like optimization.

## A more complex minimax example — expm1

Before we conclude, let’s take a quick look at a slightly more complex example of defining minimax polynomials. A well known issue with numerically calculating the exponential function is that for $$|x| \ll 1$$, the answer is very nearly $$1$$, and accuracy in the result is lost due to the limits of finite floating point representation. Namely, calculating exp(x) for a value very nearly $$0$$ and then subtracting 1.0 is subject to so-called “catastrophic” cancellation. Therefore, getting a high-accuracy result requires direct calculation of a special form of the function — expm1(x), which is defined to be $$e^x - 1$$ — without intermediately using exp(x) and subtracting. The expm1 function is commonly provided in math libraries (including in the C99 standard and, of course, by Julia’s base library).
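
The severity of the cancellation is easy to demonstrate at the REPL. In this quick check (exact digits may vary in the last place, but the behavior is representative), the round trip through exp(x) leaves only about eight accurate digits:

```julia
exp(1e-10) - 1.0 # 1.000000082740371e-10 (only ~8 accurate digits survive)
expm1(1e-10)     # 1.00000000005e-10     (accurate to full precision)
```
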
Given the simple mathematical definition, we can consider how to form a numerical approximation by expressing the function in terms of the exponential’s Taylor series and simplifying: \begin{align*} e^x - 1 &= \left(\sum_{k=0}^{\infty} \frac{x^k}{k!}\right) - 1 \\ {} &= \left(1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \ldots\right) - 1 \\ {} &= x \left(1 + \frac{x}{2} + \frac{x^2}{6} + \ldots \right) \\ e^x - 1 &= x \sum_{k=0}^{\infty} \frac{x^k}{(k+1)!} \approx x P_n(x) \end{align*} Thus we have identified $$e^x - 1$$ as the product of $$x$$ and an infinite power series which can be truncated and approximated by some $$n$$-th order polynomial.

To have the remez() function above solve for the minimax polynomial, we must manipulate the error between $$e^x - 1$$ and $$x P_n(x)$$, for either absolute or relative error minimization, back into the standard form $$w(x)\left[ f(x) - P_n(x) \right]$$. Doing so for the relative case, we simply divide through by the extra factor of $$x$$ and make the appropriate identifications: \begin{align*} \frac{\left[e^x - 1\right] - \left[x P_n(x)\right]}{\left[e^x - 1\right]} &= \frac{x}{e^x - 1}\left[ \frac{e^x - 1}{x} - P_n(x) \right] \\ {} &= w(x) \left[ f(x) - P_n(x) \right] \\ {} &\qquad\text{where}\quad f(x) \equiv \frac{1}{w(x)} \equiv \frac{e^x - 1}{x} \\ \end{align*} Note that both $$f(x)$$ and $$w(x)$$ limit to unity as $$x \rightarrow 0$$, so the functions are well-defined.

For a generic solution, we also need to replace the range-reduction step with one which accounts for the subtraction of one. With a bit of algebra, it’s easy to show that for $$x = k\ln 2 + r$$ as previously, we have $$e^x - 1 = 2^k e^r - 1$$, and adding and subtracting $$2^k$$ gives \begin{align} e^x - 1 = 2^k(e^r - 1) + 2^k - 1 \end{align} Translating these definitions into code:

```julia
# Minimizes relative error in |x| ≤ log(2)/2
@generated function expm1_minimax_kernel(x::Float64, ::Val{N}) where {N}
    ln2by2 = log(2.0) / 2
    expm1_kernel(x) = iszero(x) ? one(x) : expm1(x) / x
    expm1_weight(x) = iszero(x) ? one(x) : x / expm1(x)
    # Solve for one fewer degree than requested to make comparisons with
    # exp(x)-1 fair — i.e. the highest degree in (x * P_n(x)) == N
    _, coeff = remez(expm1_kernel, -ln2by2, ln2by2, N - 1, expm1_weight)
    return :(x * evalpoly(x, $(tuple(coeff...)))) # Note the leading mult. by x
end
```

```julia
function expm1_minimax(x::Float64, n::Val{N}) where {N}
    k, r = reduce_exp(x)
    s = exp2(k) # 2^k
    # 2^k * (e^r - 1) + (2^k - 1), with the final additions fused by muladd
    return muladd(s, expm1_minimax_kernel(r, n), s - 1.0)
end
```
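
As a quick sanity check of the range-reduction identity, using only Base functions rather than the kernels above:

```julia
x = 3.7
k = round(x / log(2))                # k = 5.0 for this x
r = x - k * log(2)                   # remainder with |r| ≤ log(2)/2
exp2(k) * expm1(r) + (exp2(k) - 1.0) # ≈ 39.4473, matching expm1(3.7)
```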


and then directly comparing a third-order relative minimax approximation of exp(x)-1 versus expm1(x):

The catastrophic cancellation for small $$x$$ is visible in the bottom panel of the left column, where the fractional error diverges as $$x \rightarrow 0$$. This is because the absolute error (middle row of the left column) is non-zero near and at $$x = 0$$; since $$\lim_{x\rightarrow 0} (e^x - 1) = 0$$, the fractional error is unbounded. In contrast, the right column shows that the direct minimax optimization of expm1(x) actually constructs a minimax polynomial that approaches $$0$$ as $$x \rightarrow 0$$, and therefore the relative error is tightly bounded.

## Conclusions

For myself (and likely many physicists), the Taylor series is a well-known and familiar method of constructing polynomial approximations of arbitrary functions. Numerically, though, the minimax polynomial is an alternative construction that provides more desirable properties, and it is a concept that any numerically-oriented person should probably have in their toolbox.

In this article, I ignored the problem of bootstrapping the implementation — i.e. we optimized minimax polynomials against Julia’s built-in exp(x::Float64) and expm1(x::Float64) functions — but this isn’t a deal-breaking problem. In some cases you may want to build a custom, fast approximation of an intricate function which can be accurately (but too slowly) calculated in terms of known functions. Then this is just a performance optimization, and there’s no issue of how to bootstrap the process.

If the initial bootstrapping of a special function is the goal, though, then we already have a perfectly serviceable definition — just calculate the exact answers to machine precision using a large number of terms in the Taylor series, and optimize for a minimax polynomial that requires fewer terms to achieve the same accuracy. In practice, the use of extended-precision arithmetic will almost certainly be required to avoid loss of precision in standard floating point implementations, but that is not an insurmountable problem.
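
As a concrete sketch of that bootstrapping step (a minimal illustration using only Base and BigFloat, not code from any production implementation), one can sum the Taylor series in extended precision and round the result back to Float64:

```julia
# Reference value of exp(0.25) computed via the Taylor series in 256-bit
# arithmetic; 60 terms is far more than needed for |x| ≤ log(2)/2.
setprecision(BigFloat, 256) do
    x = big"0.25"
    s = term = one(BigFloat)
    for n in 1:60
        term *= x / n # term == x^n / n!
        s += term
    end
    Float64(s)        # correctly rounded double for exp(0.25)
end
```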

Finally, this is far from a comprehensive explanation of how to write accurate numerical calculations of special functions. For instance, our range reduction for exp only contracted the domain down to the interval $$|x| \le \frac 1 2 \ln 2$$, but a commonly-referenced extension of this idea is Tang’s method, which makes use of a pre-computed table of non-integer powers of two to further reduce the working range. This reduces the polynomial degree required to calculate all answers to machine precision. (The Julia pull request linked to in the introduction uses a similar scheme, and the range is reduced to $$|x| \le \ln(2)/512$$.) There is also an entire domain of implementation details to consider to ensure the calculation is actually performant on real hardware — e.g. optimizing for branch (mis-)prediction, instruction pipelining, auto-vectorization, etc. — which is outside the scope of this article.

1. Note that for $$n = 0$$, the $$0$$-th derivative of $$f$$ is defined to be the function itself (i.e. $$d^0 f(x)/dx^0 = f(x)$$), and $$0! = 1$$, so the first term always simplifies to just $$f(a)$$.

2. This is technically not true for all possible ways of storing floating point values, but the IEEE 754 standard is used almost everywhere most users are likely to encounter floating point values.

3. Using the exp2() function is not a subtle cheat or a circular definition that itself depends on a properly defined generic exp() function when the argument is an integer type. You can verify from the Julia source code that it does exactly as I claimed: it uses bit-level manipulations to directly construct the floating point value equal to $$2^k$$.
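
A sketch of that bit-level idea (an illustration of the technique, not Julia's actual exp2 code), valid for integer $$-1022 \le k \le 1023$$:

```julia
# Place the biased exponent k + 1023 into bits 52–62 of an IEEE 754 double.
pow2(k::Int) = reinterpret(Float64, UInt64(k + 1023) << 52)

pow2(3)  # 8.0
pow2(-2) # 0.25
```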

4. The Chebyshev polynomial zeros are all contained within the interval $$-1$$ to $$1$$ (excluding the endpoints) and are distributed such that they correspond to evenly distributed points on the unit circle mapped onto the $$x$$-axis. The left-hand panel below shows the $$n$$ Chebyshev zeros for a 12th degree polynomial as the blue points, with the two endpoints added in as the red points. Notice that the spacing between the first and second points and between the penultimate and last points is not equal to the spacing between the rest of the successive pairs.
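
Explicitly, the zeros of the degree-$$n$$ Chebyshev polynomial of the first kind are \begin{align*} x_k = \cos\left( \frac{2k - 1}{2n} \pi \right) \quad\text{for}\quad k = 1, \ldots, n \end{align*} i.e. the projections onto the $$x$$-axis of $$n$$ equally spaced points on the upper half of the unit circle.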

The initial distribution of points used in remez() is very similar but modified to directly produce $$n + 2$$ points distributed uniformly across the unit circle with the endpoints included. This case is shown in the right-hand panel.