The Legendre.jl package accumulates the features shown throughout this series (and more) into a package expressly designed for calculating numerically accurate Legendre polynomials in a performant manner. In this article I formally introduce the package and demonstrate how it can be used to quickly prototype a couple of useful computations that I have encountered in CMB analysis.

## Introduction¶

This series of articles has introduced the algorithmic underpinings of practically calculating the Associated Legendre polynomials. After introducing the mathematical definition in part I, the basic algorithm of running recurrence relations was explain in part II. Then in part III, we looked at how to modify the recurrence relations to support multiple styles of normalizing the Legendre polynomials, with the express purpose of avoiding numerical over/underflow that rapidly occurs for the unnormalized functions. Finally, the previous article in part IV delved into numerical implementation details that are critical to maintaining numerical accuracy to high degrees and orders.

All of these prior articles have been written using knowledge I have acquired while developing
the `Legendre.jl`

package, which (at the time of writing this article) has
just been tagged with its version 0.2.0 release.
In this article, I want to show some of the extra features the package provides that have
not yet been discussed, and then I’d like to conclude by showing a couple of example use
cases.

## Using `Legendre.jl`

¶

The `Legendre.jl`

package is not (at this time) a registered package in Julia’s General
repository, so the package must be installed manually.
(There are appropriate instructions in the package README on Github.)
After installing, calculating the Legendre polynomial values looks very similar to
the toy examples shown in previous articles — to calculate the value
`\(\lambda_{25}^{2}(\cos 0.25^\circ)\)`

, the `legendre`

function is invoked with the
requested `LegendreSphereNorm()`

trait for the spherical-harmonic normalization, the
degree and order, and the argument:

```
using Legendre
legendre(LegendreSphereNorm(), 25, 2, cosd(0.25))
```

```
0.0031082973212164973
```

One of the extra features of the package implementation over the sample algorithms shown
here in previous articles is that calculating each degree and order separately (i.e.
independent of any extra requested degree/order) is a wasteful process.
Evaluating the recurrence necessarily computes Legendre polynomial terms of lower
degrees and orders on its way to the target, so an easy way to improve the computational
performance is to calculate a set of terms simultaneously and store the intermediate
results at each step.
`Legendre.jl`

supports two specialized cases:

- Calculating all degrees
`\(0 \le \ell \le \ell_\mathrm{max}\)`

for a fixed order`\(m\)`

. - Calculating all degrees
`\(0 \le \ell \le \ell_\mathrm{max}\)`

and all orders`\(0 \le m \le m_\mathrm{max}\)`

(where`\(m_\mathrm{max} \le \ell_\mathrm{max}\)`

).

For instance, calculating the order `\(m = 25\)`

polynomials for all `\(\ell < 700\)`

(where the
terms with `\(\ell < m\)`

are implicitly defined to be zero^{1}) independently
is roughly 300 times slower (on my machine) than running the recurrence only once and
storing all values along the way:

```
using BenchmarkTools
x = cosd(0.25)
m = 25
lmax = 700
@btime [l < m ? 0.0 : legendre(LegendreSphereNorm(), l, 25, x) for l in 0:lmax];
@btime legendre(LegendreSphereNorm(), 0:lmax, 25, x);
```

```
1.457 ms (5115 allocations: 85.59 KiB)
4.672 μs (7 allocations: 5.73 KiB)
```

The package also provides an in-place interface so that allocation^{2} of the output
array can be avoided and reused, and it can do so with argument arrays of `\(x\)`

of any shape
(given an appropriately shaped output array):

```
lmax, mmax = 2, 2
x = rand(5, 2, 4)
polys = zeros(axes(x)..., lmax+1, mmax+1) # N.B. 0-start for degrees/orders
size(legendre!(LegendreSphereNorm(), polys, lmax, mmax, x))
# leading 3 dims are shape of x
# trailing 2 dims are ℓ and m axes
```

```
(5, 2, 4, 3, 3)
```

Finally, the package has been carefully designed to support generic number types throughout
the calculation.
This allows, for instance, arbitrary precision calculations with Julia’s default-provided
`BigFloat`

type (note the power of `\(\sim 10^{-22441}\)`

at the end!!)…

```
legendre(LegendreUnitNorm(), 45_000, 10_000, BigFloat(1.0 - eps(1.0)))
```

```
2.353712043124011996421479164920801921796897844034162933107271074638387312112269e-22441
```

…or using more “exotic” number types like `ForwardDiff.jl`

’s `Dual`

numbers to simultaneously calculate the derivative of the Legendre polynomials:

```
using ForwardDiff: Dual, value, partials
x = range(-1, 1, length=1000)
x′ = @. Dual(x, 1.0) # turns range x into a set of dual numbers
# calculates λ_25^12(x) for all x′
tmp = legendre(LegendreSphereNorm(), 25, 12, x′)
# extracting value and first derivative
λ = value.(tmp)
dλdx = partials.(tmp, 1)
using Plots
plot(plot(x, λ,
ylabel = raw"$λ_{25}^{12}(x)$", xticks = (-1:0.5:1, [""]),
title = raw"Legendre function (top) and derivative (bottom)"),
plot(x, dλdx,
ylabel=raw"$\frac{dλ_{25}^{12}}{dx}(x)$", xlabel=raw"$x$"),
layout = (2, 1), legend = false, link = :x
)
```

Because of Julia’s ability to recompile functions on demand for new input types,
the implementation for calculating Legendre polynomials is sufficient to also calculate
the first derivative (with respect to the argument `\(x\)`

) by leveraging a dual-number type.
This is one reason why writing a new implementation of a common special function purely
in Julia is a worthwhile endeavor — a wrapped call to a foreign library function cannot
benefit from the multiple dispatch and type generality that a pure-Julia implementation
can achieve.

For a more complete introduction to everything the package can do, see the package documentation which is linked from the Github README.

## Use Case: Spherical Harmonics¶

In part III of this series, we motivated the need for a prenormalized recurrence relation
as part of calculating the Spherical Harmonics.
With `Legendre.jl`

, we can demonstrate the ability to synthesize a field on the sphere from
its harmonic representation in terms of the coefficients `\(\{a_\mathrm{\ell m}\}\)`

.

Synthesis of the field `\(f(\theta,\phi)\)`

from its harmonic coefficients is a straight-forward
superposition of weighted spherical harmonics:
`\begin{align} f(\theta,\phi) &= \sum_{\ell=0}^{\ell_\mathrm{max}} \sum_{m=-\ell}^{\ell} a_{\ell m} Y_{\ell m}(\theta,\phi) \\ {} &= \sum_{\ell=0}^{\ell_\mathrm{max}} \sum_{m=-\ell}^{\ell} a_{\ell m} \lambda_\ell^m (\cos \theta) e^{im\phi} \end{align}`

The only non-trivial part of this sum is the evaluation of the spherical-harmonic normalized
Legendre functions `\(\lambda_\ell^m(\cos\theta)\)`

, which is solved by `Legendre.jl`

.
(Note that explicitly calculating the sums as written is an inefficient but simple synthesis
algorithm. A future posting will review the most common mechanism for accelerating the
calculation which leverages a fast-Fourier transform to avoid the explicit sum over the
orders `\(m\)`

and calculating many complex exponentials.)

As a simple concrete example, let us generate a random Gaussian field realization which has
a power spectrum which is consistent with `\(C_\ell = \ell^{-1}\)`

.
Being a Gaussian random field means the individual harmonic coefficients are random deviates
drawn from a normal distribution, with the expectation value of the variance being the
power spectrum at each degree: `\(\langle |a_{\ell m}|^2 \rangle = C_\ell\)`

.

```
lmax = 1000
alms = zeros(ComplexF64, lmax+1, lmax+1)
for l in 1:lmax
# m == 0 is real-only by symmetry
alms[l,1] = randn(Float64) / l
# m > 0 are complex random deviates
alms[l,2:l+1] .= randn.(ComplexF64) ./ l
end
```

For the [arbitrarily] chosen `\(\ell^{-1}\)`

spectrum, there is more power in low-`\(\ell\)`

modes
than in high-`\(\ell\)`

, so the expectation is that the field on the sphere will be dominated
by large-scale features, with the small-scale features “refining” the image with more detail.
(In contrast, a flat `\(C_\ell = 1\)`

spectrum would produce something looking like white noise.)

Defering the implementation of the synthesis routine (and foregoing the plotting code for
brevity), the following figure shows the same set of `\(\{a_{\ell m}\}\)`

s synthesized onto the
sphere with two different cut-offs: the left-hand panel is restricted to only the first
100 orders (`\(\ell \le 100\)`

) and shows clearly the smooth, large-scale features being produced
by the early low-`\(\ell\)`

modes.
Then the right-hand panel synthesizes all modes (`\(\ell \le 1000\)`

), and we see that much
smaller-scale features have added fine details on top of the existing background.

## Use Case: Pixel Window Functions¶

`\( \newcommand{\mat}[1]{\mathbf{#1}} \newcommand{\alm}{a_{\ell m}} \newcommand{\Ylm}{Y_{\ell m}} \newcommand{\llm}{\lambda_\ell^m} \newcommand{\ellmax}{\ell_{\mathrm{max}}} \newcommand{\wpix}{w_{\mat p}} \newcommand{\wl}{w_{\ell}} \newcommand{\wlm}{w_{\ell m}} \newcommand{\dx}{\frac{\Delta x}{2}} \newcommand{\dy}{\frac{\Delta y}{2}} \newcommand{\dz}{\frac{\Delta z}{2}} \newcommand{\dth}{\frac{\Delta\theta}{2}} \newcommand{\df}{\frac{\Delta\phi}{2}} \newcommand{\sinc}{\operatorname{sinc}} \)`

The underlying motivation for taking a deep dive through the details of numerical accuracy
that was discussed in part IV was the calculation of so-called pixel window functions (PWF).
That article used it to only motivate why we care about numerical accuracy of very small numbers
up to very high orders, but with the efficient implementation given by `Legendre.jl`

, we can
explore the mathematical problem itself.

A PWF is a quantification of the fact that discretizing the sphere into pixels means there is loss of information. The idea is easiest to see qualitatively in two opposite extremes. For spatial features much, much smaller than the size of the pixel, the pixelized image obviously contains no useful information from which the specific small-scale features can be recovered. At the opposite extreme, a constant value across the entire pixelized image can always be recovered no matter the size of the pixels because the image never changes (e.g. it is still always a constant). Between the two extremes, the ability to resolve features is suppressed, but by how much depends on various factors like the size of the feature compared to the pixels and the shape of the pixels.

The generic, quantitative description of this effect is
given in the documentation
of the HEALPix^{3} library (both a pixelization format and analysis library for the
sphere which is popular in CMB analysis).
I will forego including the derivation of the general formula since it is already well
described in the HEALPix documentation, and instead we’ll concentrate on calculating the
specific result for pixels on the sphere corresponding to an
equidistant cylindrical projection
(ECP).

The reason the ECP pixelization is interesting is that it is the pixelization used in the
Bicep/*Keck Array* analysis, and therefore it is what I was concerned with for my graduate
school work.
A couple of years ago, I dug into this calculation because I noticed a discrepancy in
Fig. 22 of the “matrix paper” (BK-VII^{4}), which is reproduced here:

Hopefully, the problem is clear given the quantitative description I just gave — the
“map” line (which corresponds to the pixel window function) does not limit to unity as
`\(\ell \rightarrow 0\)`

as I just stated.
The issue in the paper’s figure was that there was a coding error made in the function
that produced that line.

It needs to be clearly stated that this error in the calculation has no consequence on any results or conclusions drawn from the data — the explicit form of the PWF is never invoked. Instead, the Monte Carlo-based approach used in the collaboration’s analysis implicitly accounts for the PWF, and the “map” line in the figure is purely for visualizing one of the components of the total transfer function that the figure is presenting.

I proved the implementation error by rederiving the expressions for an ECP pixelization
and redoing the calculation using an early prototype of `Legendre.jl`

.
Therefore, I’d like to present the working implementation here.

The PWF `\(W_\ell\)`

is defined as the quantity
`\begin{align} W_\ell &\equiv \left\{ \frac{1}{N_{\mathrm{pix}}} \sum_{p=1}^{N_\mathrm{pix}} \left( \frac{4\pi}{2\ell + 1} \sum_{m=-\ell}^\ell \left| \wlm(\mat p) \right|^2 \right) \right\}^{1/2} &\text{where}\quad \wlm(\mat p) &= \int \wpix(\mat r) \Ylm(\mat r) \,d\mat r \label{eqn:pixelwindow:defn} \end{align}`

and where `\(\wpix(\mat r)\)`

is a normalized
*indicator function*
for the `\(p\)`

-th pixel `\(\mat p\)`

as a function of position `\(\mat r\)`

on the sphere.
The inner loop defined a pixel-specific window function `\(W_\ell(\mat p)\)`

, and because working
with thousands to millions of individual window functions is impractical in most cases, often
only the average over all pixels (outer loop) is used.

In the ECP pixelization, the pixels are squares in latitude and longitude and spaced
in a uniformly separated grid (in angle), so the pixel indicator functions
`\(\wpix(\mat r)\)`

are trivially defined:
`\begin{align} \wpix(\mat r) = \begin{cases} \Omega_p^{-1} & \theta \in \theta_p \pm \dth \quad\text{and}\quad \phi \in \phi_p \pm \df \\ 0 & \text{otherwise} \end{cases} \end{align}`

where `\(\mat r\)`

is a colatitude-azimuth coordinate pair `\((\theta,\phi)\)`

and each pixel
`\(\mat p\)`

is centered at the coordinates `\((\theta_p, \phi_p)\)`

with width `\(\Delta\theta\)`

and
height `\(\Delta\phi\)`

(covering coordinate area `\(\Omega_p = \Delta\theta \Delta\phi\)`

).
The summations are straight-forward to implement, so we will concentrate on evaluating
the integral.

Because of the simplicity of the pixel definition (and that it lines up
along the directions of integration), the integral is separable in each
direction.
Letting `\(z = \cos\theta\)`

(with appropriately defined `\(\Delta z\)`

) and substituting into
the integral `\(\Ylm(\theta,\phi) = \llm(\cos\theta) e^{im\phi}\)`

,
`\begin{align*} \wlm(\mat p) &= \int_{z_p-\dz}^{z_p+\dz} \int_{\phi_p-\df}^{\phi+\df} \Omega_p^{-1} \llm(z) e^{im\phi} \,dz\,d\phi \\ {} &= \Omega_p^{-1} \left( \int_{\phi_p-\df}^{\phi_p+\df} e^{im\phi} \,d\phi \right) \left( \int_{z_p-\dz}^{z_p+\dz} \llm(z) \,dz \right) \end{align*}`

The integral in `\(\phi\)`

has a closed form solution:
`\begin{align*} \int_{\phi_p-\df}^{\phi_p+\df} e^{im\phi} \,d\phi &= 2 e^{im\phi_p} \frac{\sin\left(m\df\right)}{m} = \Delta\phi \sinc\left(\frac{m\Delta\phi}{2}\right) e^{im\phi_p} \end{align*}`

where we’ve made use of the unnormalized sinc function,
`\(\displaystyle \sinc(x) \equiv \begin{cases} \hfil 1\hfil & \text{if } x = 0 \\ \frac{\sin(x)}{x} & \text{if } x \neq 0 \end{cases}\)`

.
There isn’t a trivial integral for the Legendre polynomials, so we’ll compute that quantity
numerically.

Only the conjugate-square quantity `\(\left|\wlm(\mat p)\right|^2\)`

is used, so we’ll make two
immediate simplifications.
First, the complex exponential `\(e^{im\phi_p}\)`

always cancels to unity with its complex
conjugate, so that term can be dropped.
Second, there’s a symmetry in the Legendre polynomials relating negative and positive
orders as `\(\lambda_\ell^{-m}(x) = (-1)^m \lambda_\ell^{+m}(x)\)`

, so we can truncate the sum
over orders `\(m\)`

to only `\(m \ge 0\)`

by including a multiplicity factor of `\(2\)`

when `\(m > 0\)`

.

Therefore, written in full, the specific PWFs are
`\begin{align} W_\ell(\mat p) &= \left\{ \sum_{m=0}^{\ell} \frac{\mu_m}{2\ell+1} \frac{4\pi \Delta\phi^2}{\Omega_p^2} \sinc^2 \left(\frac{m\Delta\phi}{2}\right) I_{\ell m}^2(\mat p) \right\}^{1/2} \\ &\qquad\qquad\text{where}\quad \begin{aligned} \mu_m &\equiv \begin{cases} \,1 & \text{if }m = 0 \\ \,2 & \text{otherwise} \end{cases} \\ I_{\ell m}(\mat p) &\equiv \int_{z_p-\dz}^{z_p+\dz} \llm(z)\,dz \end{aligned} \nonumber \end{align}`

for the pixels of an equidistant cylindrical projection, and the average PWF is the
quadrature average `\(W_\ell = \sqrt{N_\mathrm{pix}^{-1}\sum_{\mat p} W_\ell^2(\mat p)}\)`

.

To do the numerical integration of the Legendre functions, we’ll make use of the
`HCubature.jl`

package.
(In practice, trapezoidal and Simpson’s rule integration using
a couple dozen to couple hundred iterations, respectively, is also sufficient.)
The rest of the function is pretty straight-forward, especially with Julia’s broadcasting
(as long as you take care to setup the shapes of `ℓrng`

and `mrng`

correctly):

```
using Legendre: λlm
using LinearAlgebra: norm
using HCubature
# _sinc(x) == sinc(x/π) without the extra division
_sinc(x) = iszero(x) ? one(sin(zero(x))) : sin(x) / x
function ecp_pixel_window_function(θmin, θmax, ϕmin, ϕmax, lmax)
ℓrng, mrng = 0:lmax, (0:lmax)'
zmin, zmax = cos(θmax), cos(θmin)
Δz, Δϕ = zmax - zmin, ϕmax - ϕmin
# Individual squared integrals
Φint = @. Δϕ^2 * _sinc(mrng * Δϕ / 2)^2
Λint, _ = hquadrature(z -> λlm(0:lmax, 0:lmax, z), zmin, zmax,
norm = x -> norm((e^2 for e in x), 2)) # converge on integral squared
Λint .^= 2
# product of factors that depend on m
intp = @. Φint * Λint
intp[:,2:end] .*= 2 # μ_m factor for m > 0 terms
Wl = dropdims(sum(intp, dims = 2), dims = 2)
# remaining factors that only depend on ℓ
@. Wl = sqrt(4π * Wl / (2ℓrng + 1) / (Δz * Δϕ)^2)
return Wl
end
```

The Bicep/*Keck Array* pixelization is defined as an ECP grid with the standard parallel at
`\(-57.5^\circ\)`

where square pixels have sides lengths of `\(0.25^\circ\)`

of arc, and the
map spans the area in right ascension `\(-55^\circ \le \text{RA} \le 55^\circ\)`

and
declination `\(-70^\circ \le \text{Dec} \le -45^\circ\)`

.
An important subtlety here is that the pixelization is defined is in terms of physical arc
lengths, not coordinates lengths.
Mapping latitude/longitude onto the sphere compresses lines of longitude nearer to one
another toward the poles, so the coordinate step `\(\Delta\lambda > 0.25^\circ\)`

.
In fact, at `\(\delta = -57.5^\circ\)`

, we have
`\begin{align} \Delta\delta &= 0.25^\circ & \rightarrow \Delta\theta &\approx 4.36 \cdot 10^{-3}\,\mathrm{rad}\\ \Delta\lambda &= \frac{0.25^\circ}{\cos(-57.5^\circ)} \approx 0.465^\circ & \rightarrow \Delta\phi &\approx 8.12 \cdot 10^{-3}\,\mathrm{rad} \end{align}`

```
# pixel sizes and pixel centers
Δδ, Δλ = (0.250, 0.250/cosd(-57.5))
δ₀, λ₀ = (-57.5, 0.000) .+ 0.5 .* (Δδ, Δλ)
# convert from RA/Dec in deg to θ/ϕ in rad
θmin, ϕmin = deg2rad(90 - (δ₀ + 0.5Δδ)), deg2rad(λ₀ - 0.5Δλ)
θmax, ϕmax = deg2rad(90 - (δ₀ - 0.5Δδ)), deg2rad(λ₀ + 0.5Δλ)
lmax = 350
Wl = ecp_pixel_window_function(θmin, θmax, ϕmin, ϕmax, lmax)
```

Comparing this recalculated PWF `\(W_\ell\)`

against that of Fig. 4.1, we now find the
expected behavior at very low `\(\ell\)`

.
(Note that the quantity being plotted is squared to correspond to the suppression effect
in power spectrum space, whereas the PWF is defined as how power is suppressed for feature
of a given angular size in real (“map”) space.)

```
plot(0:lmax, Wl .^ 2,
ylims = (0, 1.05), xlims = (0, 350),
size = (400, 250), legend = false,
title = "BICEP pixel window function at dec -57.5°", titlefont = Plots.font(10),
ylabel = "suppression (\$C_\\ell\$)",
xlabel = "multipole \$\\ell\$")
```

A final interesting detail is that by symmetry, pixels of any azimuth `\(\phi\)`

have the same
PWF (only factors of `\(\Delta\phi\)`

appear in the expression), but only within an iso-latitude
ring since there *is* a dependence on the colatitude `\(\theta\)`

.
Therefore, there is actually a family of PWFs based on the span across declination of the
pixelized map that should in principle be averaged over as well.

That is left as an “exercise for the reader” 😄.

## Conclusions¶

Writing `Legendre.jl`

has been a pretty long diversion from what I originally intended to be
working on, but as soon as I saw the various use cases that a generic implementation would
provide, I felt it was a task I should complete.
Hopefully this article has convinced you of the same.

Going forward, I’d like to return to my original goal of demonstrating a new implementation
of the Bicep/*Keck Array* purification process, so `Legendre.jl`

will return in those articles.

Implicitly defining the polynomial terms

`\(P_\ell^m(x) = 0\)`

for`\(\ell < m\)`

is convenient for two primary reasons. First, indexing the array to a specific degree is independent of the order`\(m\)`

. Second, the all-`\(\ell\)`

/all-`\(m\)`

case is naturally described as a matrix of values where only the lower-triangle is filled with values. One could pack the matrix case to decrease the storage by roughly a factor of two, but (a) indexing then requires extra logic to calculate the correct offset, and (b) the fixed-`\(m\)`

case would no longer be just a slice of the all-`\(m\)`

array if the values were packed. ↩︎The in-place interface avoids allocation of the output array, but there are still allocations that occur due to the need for some internal state. Despite the lack of being allocation free, eliding the output allocation will typically be an advantage because it is often much larger than the state variables — the state is a constant multiple of the argument array size, while the output will be anywhere from

`\(\mathcal{O}(1)\)`

(for single (`\(\ell, m\)`

) calculation) to`\(\mathcal{O}(\ell_\mathrm{max}\cdot m_\mathrm{max})\)`

(for the all-`\(\ell\)`

/all-`\(m\)`

case) times larger than the argument array. It is also planned that a future version will address this by including an interface for providing a reusable state buffer. ↩︎K. M. Górski

*et al*. “HEALPix: A Framework for High-Resolution Discretization and Fast Analysis of Data Distributed on the Sphere”. In*Astrophys. J.*622 (Apr. 2005), pp. 759–771. ads: 2005ApJ…622..759G. ↩︎The Bicep2 and

*Keck Array*Collaborations. " Bicep2/*Keck Array*VII: Matrix Based*E*/*B*Separation Applied to Bicep2 and the*Keck Array*". In*Astrophys. J.*825, 66 (July 2016), p. 66. ads: 2016ApJ…825…66A ↩︎