# Introducing Legendre.jl

Legendre.jl Series: Parts 1, 2, 3, 4, 5

The Legendre.jl package accumulates the features shown throughout this series (and more) into a package expressly designed for calculating numerically accurate Legendre polynomials in a performant manner. In this article I formally introduce the package and demonstrate how it can be used to quickly prototype a couple of useful computations that I have encountered in CMB analysis.

## Introduction¶

This series of articles has introduced the algorithmic underpinings of practically calculating the Associated Legendre polynomials. After introducing the mathematical definition in part I, the basic algorithm of running recurrence relations was explain in part II. Then in part III, we looked at how to modify the recurrence relations to support multiple styles of normalizing the Legendre polynomials, with the express purpose of avoiding numerical over/underflow that rapidly occurs for the unnormalized functions. Finally, the previous article in part IV delved into numerical implementation details that are critical to maintaining numerical accuracy to high degrees and orders.

All of these prior articles have been written using knowledge I have acquired while developing the Legendre.jl package, which (at the time of writing this article) has just been tagged with its version 0.2.0 release. In this article, I want to show some of the extra features the package provides that have not yet been discussed, and then I’d like to conclude by showing a couple of example use cases.

## Using Legendre.jl¶

The Legendre.jl package is not (at this time) a registered package in Julia’s General repository, so the package must be installed manually. (There are appropriate instructions in the package README on Github.) After installing, calculating the Legendre polynomial values looks very similar to the toy examples shown in previous articles — to calculate the value $$\lambda_{25}^{2}(\cos 0.25^\circ)$$, the legendre function is invoked with the requested LegendreSphereNorm() trait for the spherical-harmonic normalization, the degree and order, and the argument:

using Legendre

legendre(LegendreSphereNorm(), 25, 2, cosd(0.25))

0.0031082973212164973


One of the extra features of the package implementation over the sample algorithms shown here in previous articles is that calculating each degree and order separately (i.e. independent of any extra requested degree/order) is a wasteful process. Evaluating the recurrence necessarily computes Legendre polynomial terms of lower degrees and orders on its way to the target, so an easy way to improve the computational performance is to calculate a set of terms simultaneously and store the intermediate results at each step. Legendre.jl supports two specialized cases:

1. Calculating all degrees $$0 \le \ell \le \ell_\mathrm{max}$$ for a fixed order $$m$$.
2. Calculating all degrees $$0 \le \ell \le \ell_\mathrm{max}$$ and all orders $$0 \le m \le m_\mathrm{max}$$ (where $$m_\mathrm{max} \le \ell_\mathrm{max}$$).

For instance, calculating the order $$m = 25$$ polynomials for all $$\ell < 700$$ (where the terms with $$\ell < m$$ are implicitly defined to be zero1) independently is roughly 300 times slower (on my machine) than running the recurrence only once and storing all values along the way:

using BenchmarkTools

x = cosd(0.25)
m = 25
lmax = 700

@btime [l < m ? 0.0 : legendre(LegendreSphereNorm(), l, 25, x) for l in 0:lmax];
@btime legendre(LegendreSphereNorm(), 0:lmax, 25, x);

  1.457 ms (5115 allocations: 85.59 KiB)
4.672 μs (7 allocations: 5.73 KiB)


The package also provides an in-place interface so that allocation2 of the output array can be avoided and reused, and it can do so with argument arrays of $$x$$ of any shape (given an appropriately shaped output array):

lmax, mmax = 2, 2
x = rand(5, 2, 4)
polys = zeros(axes(x)..., lmax+1, mmax+1) # N.B. 0-start for degrees/orders

size(legendre!(LegendreSphereNorm(), polys, lmax, mmax, x))
# leading 3 dims are shape of x
# trailing 2 dims are ℓ and m axes

(5, 2, 4, 3, 3)


Finally, the package has been carefully designed to support generic number types throughout the calculation. This allows, for instance, arbitrary precision calculations with Julia’s default-provided BigFloat type (note the power of $$\sim 10^{-22441}$$ at the end!!)…

legendre(LegendreUnitNorm(), 45_000, 10_000, BigFloat(1.0 - eps(1.0)))

2.353712043124011996421479164920801921796897844034162933107271074638387312112269e-22441


…or using more “exotic” number types like ForwardDiff.jl’s Dual numbers to simultaneously calculate the derivative of the Legendre polynomials:

using ForwardDiff: Dual, value, partials

x  = range(-1, 1, length=1000)
x′ = @. Dual(x, 1.0) # turns range x into a set of dual numbers

# calculates λ_25^12(x) for all x′
tmp = legendre(LegendreSphereNorm(), 25, 12, x′)
# extracting value and first derivative
λ    = value.(tmp)
dλdx = partials.(tmp, 1)

using Plots
plot(plot(x, λ,
ylabel = raw"$λ_{25}^{12}(x)$", xticks = (-1:0.5:1, [""]),
title = raw"Legendre function (top) and derivative (bottom)"),
plot(x, dλdx,
ylabel=raw"$\frac{dλ_{25}^{12}}{dx}(x)$", xlabel=raw"$x$"),
layout = (2, 1), legend = false, link = :x
)


Because of Julia’s ability to recompile functions on demand for new input types, the implementation for calculating Legendre polynomials is sufficient to also calculate the first derivative (with respect to the argument $$x$$) by leveraging a dual-number type. This is one reason why writing a new implementation of a common special function purely in Julia is a worthwhile endeavor — a wrapped call to a foreign library function cannot benefit from the multiple dispatch and type generality that a pure-Julia implementation can achieve.

For a more complete introduction to everything the package can do, see the package documentation which is linked from the Github README.

## Use Case: Spherical Harmonics¶

In part III of this series, we motivated the need for a prenormalized recurrence relation as part of calculating the Spherical Harmonics. With Legendre.jl, we can demonstrate the ability to synthesize a field on the sphere from its harmonic representation in terms of the coefficients $$\{a_\mathrm{\ell m}\}$$.

Synthesis of the field $$f(\theta,\phi)$$ from its harmonic coefficients is a straight-forward superposition of weighted spherical harmonics: \begin{align} f(\theta,\phi) &= \sum_{\ell=0}^{\ell_\mathrm{max}} \sum_{m=-\ell}^{\ell} a_{\ell m} Y_{\ell m}(\theta,\phi) \\ {} &= \sum_{\ell=0}^{\ell_\mathrm{max}} \sum_{m=-\ell}^{\ell} a_{\ell m} \lambda_\ell^m (\cos \theta) e^{im\phi} \end{align} The only non-trivial part of this sum is the evaluation of the spherical-harmonic normalized Legendre functions $$\lambda_\ell^m(\cos\theta)$$, which is solved by Legendre.jl. (Note that explicitly calculating the sums as written is an inefficient but simple synthesis algorithm. A future posting will review the most common mechanism for accelerating the calculation which leverages a fast-Fourier transform to avoid the explicit sum over the orders $$m$$ and calculating many complex exponentials.)

As a simple concrete example, let us generate a random Gaussian field realization which has a power spectrum which is consistent with $$C_\ell = \ell^{-1}$$. Being a Gaussian random field means the individual harmonic coefficients are random deviates drawn from a normal distribution, with the expectation value of the variance being the power spectrum at each degree: $$\langle |a_{\ell m}|^2 \rangle = C_\ell$$.

lmax = 1000
alms = zeros(ComplexF64, lmax+1, lmax+1)
for l in 1:lmax
# m == 0 is real-only by symmetry
alms[l,1] = randn(Float64) / l
# m > 0 are complex random deviates
alms[l,2:l+1] .= randn.(ComplexF64) ./ l
end


For the [arbitrarily] chosen $$\ell^{-1}$$ spectrum, there is more power in low-$$\ell$$ modes than in high-$$\ell$$, so the expectation is that the field on the sphere will be dominated by large-scale features, with the small-scale features “refining” the image with more detail. (In contrast, a flat $$C_\ell = 1$$ spectrum would produce something looking like white noise.)

Defering the implementation of the synthesis routine (and foregoing the plotting code for brevity), the following figure shows the same set of $$\{a_{\ell m}\}$$s synthesized onto the sphere with two different cut-offs: the left-hand panel is restricted to only the first 100 orders ($$\ell \le 100$$) and shows clearly the smooth, large-scale features being produced by the early low-$$\ell$$ modes. Then the right-hand panel synthesizes all modes ($$\ell \le 1000$$), and we see that much smaller-scale features have added fine details on top of the existing background.

## Use Case: Pixel Window Functions¶

The underlying motivation for taking a deep dive through the details of numerical accuracy that was discussed in part IV was the calculation of so-called pixel window functions (PWF). That article used it to only motivate why we care about numerical accuracy of very small numbers up to very high orders, but with the efficient implementation given by Legendre.jl, we can explore the mathematical problem itself.

A PWF is a quantification of the fact that discretizing the sphere into pixels means there is loss of information. The idea is easiest to see qualitatively in two opposite extremes. For spatial features much, much smaller than the size of the pixel, the pixelized image obviously contains no useful information from which the specific small-scale features can be recovered. At the opposite extreme, a constant value across the entire pixelized image can always be recovered no matter the size of the pixels because the image never changes (e.g. it is still always a constant). Between the two extremes, the ability to resolve features is suppressed, but by how much depends on various factors like the size of the feature compared to the pixels and the shape of the pixels.

The generic, quantitative description of this effect is given in the documentation of the HEALPix3 library (both a pixelization format and analysis library for the sphere which is popular in CMB analysis). I will forego including the derivation of the general formula since it is already well described in the HEALPix documentation, and instead we’ll concentrate on calculating the specific result for pixels on the sphere corresponding to an equidistant cylindrical projection (ECP).

The reason the ECP pixelization is interesting is that it is the pixelization used in the Bicep/Keck Array analysis, and therefore it is what I was concerned with for my graduate school work. A couple of years ago, I dug into this calculation because I noticed a discrepancy in Fig. 22 of the “matrix paper” (BK-VII4), which is reproduced here:

Hopefully, the problem is clear given the quantitative description I just gave — the “map” line (which corresponds to the pixel window function) does not limit to unity as $$\ell \rightarrow 0$$ as I just stated. The issue in the paper’s figure was that there was a coding error made in the function that produced that line.

It needs to be clearly stated that this error in the calculation has no consequence on any results or conclusions drawn from the data — the explicit form of the PWF is never invoked. Instead, the Monte Carlo-based approach used in the collaboration’s analysis implicitly accounts for the PWF, and the “map” line in the figure is purely for visualizing one of the components of the total transfer function that the figure is presenting.

I proved the implementation error by rederiving the expressions for an ECP pixelization and redoing the calculation using an early prototype of Legendre.jl. Therefore, I’d like to present the working implementation here.

The PWF $$W_\ell$$ is defined as the quantity \begin{align} W_\ell &\equiv \left\{ \frac{1}{N_{\mathrm{pix}}} \sum_{p=1}^{N_\mathrm{pix}} \left( \frac{4\pi}{2\ell + 1} \sum_{m=-\ell}^\ell \left| \wlm(\mat p) \right|^2 \right) \right\}^{1/2} &\text{where}\quad \wlm(\mat p) &= \int \wpix(\mat r) \Ylm(\mat r) \,d\mat r \label{eqn:pixelwindow:defn} \end{align} and where $$\wpix(\mat r)$$ is a normalized indicator function for the $$p$$-th pixel $$\mat p$$ as a function of position $$\mat r$$ on the sphere. The inner loop defined a pixel-specific window function $$W_\ell(\mat p)$$, and because working with thousands to millions of individual window functions is impractical in most cases, often only the average over all pixels (outer loop) is used.

In the ECP pixelization, the pixels are squares in latitude and longitude and spaced in a uniformly separated grid (in angle), so the pixel indicator functions $$\wpix(\mat r)$$ are trivially defined: \begin{align} \wpix(\mat r) = \begin{cases} \Omega_p^{-1} & \theta \in \theta_p \pm \dth \quad\text{and}\quad \phi \in \phi_p \pm \df \\ 0 & \text{otherwise} \end{cases} \end{align} where $$\mat r$$ is a colatitude-azimuth coordinate pair $$(\theta,\phi)$$ and each pixel $$\mat p$$ is centered at the coordinates $$(\theta_p, \phi_p)$$ with width $$\Delta\theta$$ and height $$\Delta\phi$$ (covering coordinate area $$\Omega_p = \Delta\theta \Delta\phi$$). The summations are straight-forward to implement, so we will concentrate on evaluating the integral.

Because of the simplicity of the pixel definition (and that it lines up along the directions of integration), the integral is separable in each direction. Letting $$z = \cos\theta$$ (with appropriately defined $$\Delta z$$) and substituting into the integral $$\Ylm(\theta,\phi) = \llm(\cos\theta) e^{im\phi}$$, \begin{align*} \wlm(\mat p) &= \int_{z_p-\dz}^{z_p+\dz} \int_{\phi_p-\df}^{\phi+\df} \Omega_p^{-1} \llm(z) e^{im\phi} \,dz\,d\phi \\ {} &= \Omega_p^{-1} \left( \int_{\phi_p-\df}^{\phi_p+\df} e^{im\phi} \,d\phi \right) \left( \int_{z_p-\dz}^{z_p+\dz} \llm(z) \,dz \right) \end{align*} The integral in $$\phi$$ has a closed form solution: \begin{align*} \int_{\phi_p-\df}^{\phi_p+\df} e^{im\phi} \,d\phi &= 2 e^{im\phi_p} \frac{\sin\left(m\df\right)}{m} = \Delta\phi \sinc\left(\frac{m\Delta\phi}{2}\right) e^{im\phi_p} \end{align*} where we’ve made use of the unnormalized sinc function, $$\displaystyle \sinc(x) \equiv \begin{cases} \hfil 1\hfil & \text{if } x = 0 \\ \frac{\sin(x)}{x} & \text{if } x \neq 0 \end{cases}$$. There isn’t a trivial integral for the Legendre polynomials, so we’ll compute that quantity numerically.

Only the conjugate-square quantity $$\left|\wlm(\mat p)\right|^2$$ is used, so we’ll make two immediate simplifications. First, the complex exponential $$e^{im\phi_p}$$ always cancels to unity with its complex conjugate, so that term can be dropped. Second, there’s a symmetry in the Legendre polynomials relating negative and positive orders as $$\lambda_\ell^{-m}(x) = (-1)^m \lambda_\ell^{+m}(x)$$, so we can truncate the sum over orders $$m$$ to only $$m \ge 0$$ by including a multiplicity factor of $$2$$ when $$m > 0$$.

Therefore, written in full, the specific PWFs are \begin{align} W_\ell(\mat p) &= \left\{ \sum_{m=0}^{\ell} \frac{\mu_m}{2\ell+1} \frac{4\pi \Delta\phi^2}{\Omega_p^2} \sinc^2 \left(\frac{m\Delta\phi}{2}\right) I_{\ell m}^2(\mat p) \right\}^{1/2} \\ &\qquad\qquad\text{where}\quad \begin{aligned} \mu_m &\equiv \begin{cases} \,1 & \text{if }m = 0 \\ \,2 & \text{otherwise} \end{cases} \\ I_{\ell m}(\mat p) &\equiv \int_{z_p-\dz}^{z_p+\dz} \llm(z)\,dz \end{aligned} \nonumber \end{align} for the pixels of an equidistant cylindrical projection, and the average PWF is the quadrature average $$W_\ell = \sqrt{N_\mathrm{pix}^{-1}\sum_{\mat p} W_\ell^2(\mat p)}$$.

To do the numerical integration of the Legendre functions, we’ll make use of the HCubature.jl package. (In practice, trapezoidal and Simpson’s rule integration using a couple dozen to couple hundred iterations, respectively, is also sufficient.) The rest of the function is pretty straight-forward, especially with Julia’s broadcasting (as long as you take care to setup the shapes of ℓrng and mrng correctly):

using Legendre: λlm
using LinearAlgebra: norm
using HCubature

# _sinc(x) == sinc(x/π) without the extra division
_sinc(x) = iszero(x) ? one(sin(zero(x))) : sin(x) / x

function ecp_pixel_window_function(θmin, θmax, ϕmin, ϕmax, lmax)
ℓrng, mrng = 0:lmax, (0:lmax)'
zmin, zmax = cos(θmax), cos(θmin)
Δz, Δϕ = zmax - zmin, ϕmax - ϕmin

# Individual squared integrals
Φint = @. Δϕ^2 * _sinc(mrng * Δϕ / 2)^2
Λint, _ = hquadrature(z -> λlm(0:lmax, 0:lmax, z), zmin, zmax,
norm = x -> norm((e^2 for e in x), 2)) # converge on integral squared
Λint .^= 2
# product of factors that depend on m
intp = @. Φint * Λint
intp[:,2:end] .*= 2 # μ_m factor for m > 0 terms
Wl = dropdims(sum(intp, dims = 2), dims = 2)
# remaining factors that only depend on ℓ
@. Wl = sqrt(4π * Wl / (2ℓrng + 1) / (Δz * Δϕ)^2)
return Wl
end


The Bicep/Keck Array pixelization is defined as an ECP grid with the standard parallel at $$-57.5^\circ$$ where square pixels have sides lengths of $$0.25^\circ$$ of arc, and the map spans the area in right ascension $$-55^\circ \le \text{RA} \le 55^\circ$$ and declination $$-70^\circ \le \text{Dec} \le -45^\circ$$. An important subtlety here is that the pixelization is defined is in terms of physical arc lengths, not coordinates lengths. Mapping latitude/longitude onto the sphere compresses lines of longitude nearer to one another toward the poles, so the coordinate step $$\Delta\lambda > 0.25^\circ$$. In fact, at $$\delta = -57.5^\circ$$, we have \begin{align} \Delta\delta &= 0.25^\circ & \rightarrow \Delta\theta &\approx 4.36 \cdot 10^{-3}\,\mathrm{rad}\\ \Delta\lambda &= \frac{0.25^\circ}{\cos(-57.5^\circ)} \approx 0.465^\circ & \rightarrow \Delta\phi &\approx 8.12 \cdot 10^{-3}\,\mathrm{rad} \end{align}

# pixel sizes and pixel centers
Δδ, Δλ = (0.250,  0.250/cosd(-57.5))
δ₀, λ₀ = (-57.5,  0.000) .+ 0.5 .* (Δδ, Δλ)
# convert from RA/Dec in deg to θ/ϕ in rad
θmin, ϕmin = deg2rad(90 - (δ₀ + 0.5Δδ)), deg2rad(λ₀ - 0.5Δλ)
θmax, ϕmax = deg2rad(90 - (δ₀ - 0.5Δδ)), deg2rad(λ₀ + 0.5Δλ)

lmax = 350
Wl = ecp_pixel_window_function(θmin, θmax, ϕmin, ϕmax, lmax)


Comparing this recalculated PWF $$W_\ell$$ against that of Fig. 4.1, we now find the expected behavior at very low $$\ell$$. (Note that the quantity being plotted is squared to correspond to the suppression effect in power spectrum space, whereas the PWF is defined as how power is suppressed for feature of a given angular size in real (“map”) space.)

plot(0:lmax, Wl .^ 2,
ylims = (0, 1.05), xlims = (0, 350),
size = (400, 250), legend = false,
title = "BICEP pixel window function at dec -57.5°", titlefont = Plots.font(10),
ylabel = "suppression (\$C_\\ell\$)",
xlabel = "multipole \$\\ell\$")


A final interesting detail is that by symmetry, pixels of any azimuth $$\phi$$ have the same PWF (only factors of $$\Delta\phi$$ appear in the expression), but only within an iso-latitude ring since there is a dependence on the colatitude $$\theta$$. Therefore, there is actually a family of PWFs based on the span across declination of the pixelized map that should in principle be averaged over as well.

That is left as an “exercise for the reader” 😄.

## Conclusions¶

Writing Legendre.jl has been a pretty long diversion from what I originally intended to be working on, but as soon as I saw the various use cases that a generic implementation would provide, I felt it was a task I should complete. Hopefully this article has convinced you of the same.

Going forward, I’d like to return to my original goal of demonstrating a new implementation of the Bicep/Keck Array purification process, so Legendre.jl will return in those articles.

1. Implicitly defining the polynomial terms $$P_\ell^m(x) = 0$$ for $$\ell < m$$ is convenient for two primary reasons. First, indexing the array to a specific degree is independent of the order $$m$$. Second, the all-$$\ell$$/all-$$m$$ case is naturally described as a matrix of values where only the lower-triangle is filled with values. One could pack the matrix case to decrease the storage by roughly a factor of two, but (a) indexing then requires extra logic to calculate the correct offset, and (b) the fixed-$$m$$ case would no longer be just a slice of the all-$$m$$ array if the values were packed. ↩︎

2. The in-place interface avoids allocation of the output array, but there are still allocations that occur due to the need for some internal state. Despite the lack of being allocation free, eliding the output allocation will typically be an advantage because it is often much larger than the state variables — the state is a constant multiple of the argument array size, while the output will be anywhere from $$\mathcal{O}(1)$$ (for single ($$\ell, m$$) calculation) to $$\mathcal{O}(\ell_\mathrm{max}\cdot m_\mathrm{max})$$ (for the all-$$\ell$$/all-$$m$$ case) times larger than the argument array. It is also planned that a future version will address this by including an interface for providing a reusable state buffer. ↩︎

3. K. M. Górski et al. “HEALPix: A Framework for High-Resolution Discretization and Fast Analysis of Data Distributed on the Sphere”. In Astrophys. J. 622 (Apr. 2005), pp. 759–771. ads: 2005ApJ…622..759G↩︎

4. The Bicep2 and Keck Array Collaborations. " Bicep2/Keck Array VII: Matrix Based E/B Separation Applied to Bicep2 and the Keck Array". In Astrophys. J. 825, 66 (July 2016), p. 66. ads: 2016ApJ…825…66A ↩︎