The main `Legendre.jl`

series contains a detailed article considering the numerical
accuracy of the recurrence implementation as a whole.
This short addendum presents a mathematical transformation which is used to improve the
accuracy of one of the recurrence relation coefficients for the spherical harmonic
normalization.

## Introduction¶

In part III of the Legendre.jl series, I introduced how to compute
pre-normalized associated Legendre polynomials by inserting the normalization condition
into the recurrence relations that are iterated to calculate the polynomial values.
Specifically, the standard associated Legendre polynomial recurrence relations define
an unnormalized “unity” case (so called because `$P_0^0(x) = 1$`

for all `$x$`

), and then
we derived and added a “spherical harmonic” normalization which bakes in the appropriate
factor required to combine the Legendre polynomials with a complex exponential in the
definition of the spherical harmonics.

Then in part IV of the series, I concentrated on introducing a couple of algorithmic transformations that are independent of any specific normalization scheme to improve the numerical accuracy of the results in general. In contrast to that generic case, though, there are also numerical concerns specific to the normalization conditions themselves, and I specifically alluded to that fact in that article when I made the comment,

… (Of course, the normalization-specific constants should also be calculated in a numerically accurate way, but since that’s specific to each case, we’ll just assume that has been done expertly here.) …

At the time of writing that article, I already knew about an improvement that could be made to the spherical harmonic normalization’s recurrence coefficients, but I thought the digression to explain the modification was out of scope for both of those articles. This note is to simply introduce the transformation I have identified.

## The Numerical Problem¶

The coefficient of concern is the `$\mu_\ell$`

term which is used to boost along the
“diagonal”, taking `$\lambda_\ell^\ell(x) \rightarrow \lambda_{\ell+1}^{\ell+1}(x)$`

.
For the spherical harmonic normalization, the straight-forward derivation gives the simple
expression
`\begin{align} \mu_\ell &= \sqrt{1 + \frac{1}{2\ell}} \end{align}`

which was quoted in Part III.
This particular expression, though, is a victim of some numerical accuracy problems that
are similar to those explored in Part IV.

First, the dynamic part of the expression (the `$(2\ell)^{-1}$`

factor) decreases in
significance compared to the constant `$1$`

factor.
This is a concern because the relative change from `$\mu_{\ell-1}$`

to `$\mu_\ell$`

is all
that distinguishes each step of the recurrence from the previous, and the change
`$\mu_\ell - \mu_{\ell-1}$`

is evocative of the catastrophic cancellation that was
extensively discussed in Part IV.
If the “identity” of each step of the recurrence fades as `$\ell \rightarrow \infty$`

,
we can surmise that this could decrease the numerical accuracy at high `$\ell$`

.

The prior concern is exacerbated by another familiar numerical concern from Part IV —
square roots of floating point quantities can introduce numerical error to the calculation.
In this case, the problem arises because the `$(2\ell)^{-1}$`

factor is increasingly
contained in the least significant digits of the quantity, and the square root operation
progressively has fewer and fewer digits to work with (as they are truncated away in the
finite representation of `$1 + \epsilon$`

for small `$\epsilon$`

).

Before sharing the solution to these problems, let us first visualize the problem.
The following figure shows the difference between the values of `$\mu_\ell$`

computed
using both 32-bit and 64-bit floating point numbers versus a reference value computed
using extended-precision `BigFloat`

values.

There are two obvious conclusions to be drawn from this figure.
First, this particular expression is not able to produce correctly-rounded coefficients
for either 32-bit or 64-bit floating point values, and instead the values fluctuation
between `$-1$`

, `$0$`

, and `$1$`

ulps of error.
Ideally the true error would be less than half an ulp of error so that during rounding,
the finite floating point value would be correctly rounded to the nearest representable
value.

The second obvious (and potentially more serious) problem is that for `$\ell \gtrsim 1000$`

,
the error in the 32-bit floating point values is biased low (i.e. the error is always
either $0$ or $-1$ ulps).
We might expect this to introduce a systematic bias in the Legendre polynomials at high
`$\ell$`

as well due to this fact.^{1}

## Stabilizing the recurrence coefficient¶

With an understanding of the two underlying causes, we can employ a bit of algebra to solve the problem.

The first issue wouldn’t be a problem if the constant bias by `$1$`

didn’t exist.
Floating point numbers actually have variable density across the real number line, and
the density increases logarithmically approaching `$0$`

.
Therefore, let us consider the modified coefficient
`\begin{align*} \mu'_\ell &= \mu_\ell - 1 = \sqrt{1 + \frac{1}{2\ell}} - 1 \end{align*}`

If we ignore the problem of actually numerically computing the intermediate steps without
rounding, the modified coefficient `$\mu'_\ell$`

has better numerical behavior because
`$\displaystyle\lim_{\ell\rightarrow0} \mu'_\ell = 0$`

and therefore is able to take
advantage of the logarithmic density of floating point values.
Said differently, `$\mu_\ell$`

mostly changes in the least-significant bits of the
significand because there’s always the constant `$1$`

which must be represented by the
exponent `$2^0$`

.
In contrast, `$\mu'_\ell$`

can “reset” its scale with changes to the exponent, so the
significand will on average have all of its bits available to actually store the value’s
binary representation.

In reality, we **do** have to worry about the intermediate calculations, and in this
form we have actually added a catastrophic cancellation problem between the factors of
`$\sqrt{1 + \frac{1}{2\ell}}$`

and `$1$`

.
We can eliminate the cancellation, though, by completing the square so that the `$1$`

inside the square root and outside cancel out.
Including some simplifications, we have:
`\begin{align*} \left( \sqrt{1 + \frac{1}{2\ell}} - 1 \right) \times \frac{\sqrt{1 + \frac{1}{2\ell}} + 1}{\sqrt{1 + \frac{1}{2\ell}} + 1} &= \frac{\left(1 + \frac{1}{2\ell}\right) - 1}{\sqrt{1 + \frac{1}{2\ell}} + 1} \\ {} &= \frac{\frac{1}{2\ell}}{\sqrt{1 + \frac{1}{2\ell}} + 1} \\ \mu'_\ell &= \frac{1}{\sqrt{2\ell(2\ell + 1)} + 2\ell} \end{align*}`

What we’ve gained by this transformation is (a) eliminating the catastrophic cancellation
we just created, and (b) the square root is now taken on a quantity which grows
quadratically rather than the asymptoting quantity (which translates to always having more
bits of significance to work with).

To finish, we just rewrite `$\mu_\ell$`

in terms of `$\mu'_\ell$`

to arrive at the modified
expression
`\begin{align} \mu_\ell &= 1 + \left[ 2\ell + \sqrt{2\ell(2\ell + 1)} \right]^{-1} \end{align}`

There is still an unavoidable numerical problem with the second term always decreasing
in significance with respect to the constant factor of unity, but the growth of the
error is slower due to moving the constant outside of the square root.

Substituting this new form into the exercise shown in Fig. 2.1, we again plot the
relative error (in ulps) of 32-bit and 64-bit floating point calculations against the
extended `BigFloat`

calculation.

The same biasing of error terms should also appear in the 64-bit floating point case as well, but at much higher

`$\ell$`

. 32-bit floats have only 23 bits in the significand, and the representation of`$1 + \frac{1}{2\ell}$`

changes only in the least-signficant 12 bits or so after`$\ell \approx 1000$`

(i.e.`$23 - \left[\log_2 1.0 - \log_2 \frac{1}{2\ell}\right] \lesssim 12$`

for`$\ell \gtrsim 1000$`

). To push the factor of`$(2\ell)^{-1}$`

into the bottom 12 or so bits of a 64-bit floating point number with 52 bits in the significand requires pushing $\ell$ out to values on the order of`$2\ell \approx 2^{52 - 12}$`

, e.g.`$\ell \gtrsim 5.5\times 10^{11}$`

. ↩︎