Notes on Calculating the Spherical Harmonics

In this article I review the critical properties of the Spherical Harmonics. In particular, I concentrate on filling in a couple of details regarding numerical computation of the spherical harmonic transformations that I have found to be unstated or less clear than ideal in the literature.


Introduction

I briefly introduced the spherical harmonics in Part I of the Legendre.jl series, but that quickly moved on to focusing on the calculation of the Associated Legendre Polynomials. In this article, I now want to enumerate some of the most useful properties of the spherical harmonics themselves and make a few clarifications on some calculational aspects that I have had a hard time finding written explicitly in the common literature.

The spherical harmonics arise from solving Laplace’s equation \begin{align} \nabla^2 \psi = 0 \end{align} in spherical coordinates. The equation is separable into a radial component \(R(r)\) and an angular part \(Y(\theta,\phi)\) such that the total solution is \(\psi(r,\theta,\phi) \equiv R(r)Y(\theta,\phi)\). As before, we’ll ignore the radial component and continue with only the angular part.

The second-order differential equation for the angular component, written in the standard physicists' form where \(\theta\) is the colatitude and \(\phi\) the azimuth angles, is \begin{align} \frac{1}{\sin\theta} \frac{\partial}{\partial\theta} \left( \sin\theta \frac{\partial}{\partial\theta} Y \right) + \frac{1}{\sin^2\theta} \frac{\partial^2}{\partial\varphi^2} Y + \ell(\ell+1) Y &= 0 \end{align} The differential equation is further separable among the two angular coordinates (see Legendre.jl Part I), with the result being a family of solutions parameterized by the integer constants \(\ell > 0\) and \(|m| < \ell\). Including normalization factors1, the spherical harmonics \(Y_{\ell m}(\theta,\phi)\) can be written as \begin{align} Y_{\ell m}(\theta, \phi) &\equiv \sqrt{\frac{2\ell+1}{4\pi} \frac{(\ell-m)!}{(\ell+m)!}} P_\ell^m(\cos\theta) \, e^{im\phi} = \lambda_\ell^m(\cos\theta) \, e^{im\phi} \end{align} where \(P_\ell^m(\cos\theta)\) are the Associated Legendre Polynomials — for which I have recently written an entire series, wherein the pre-normalized functional form \(\lambda_\ell^m(\cos\theta)\) is defined — and \(e^{im\phi}\) are the complex exponentials.

Properties and Symmetries

The spherical harmonics are a complete, orthonormal basis for functions on the sphere \((\theta,\phi) \in \mathbb{S} = [0,\pi]\times[0,2\pi]\). Therefore they satisfy the condition that \begin{align*} \int_\mathbb{S} Y_{\ell m}(\theta,\phi) Y_{\ell' m'}(\theta,\phi) \,d\Omega &= \delta_{\ell\ell'} \delta_{mm'} \end{align*} where \(d\Omega = \sin\theta \,d\theta\,d\phi\).

It then follows that an arbitrary function on the sphere \(f(\theta,\phi)\) can be harmonically decomposed into an ensemble of coefficients \(a_{\ell m}\): \begin{align} a_{\ell m} &= \int_\mathbb{S} f(\theta, \phi) \overline{Y_{\ell m}}(\theta,\phi) \,d\Omega \end{align} where the overline on \(\overline{Y_{\ell m}}\) denotes complex conjugation. The function is resynthesized from the \(a_{\ell m}\)s as the linear combination of the spherical harmonics: \begin{align} f(\theta,\phi) &= \sum_{\ell=0}^{\ell_\mathrm{max}} \sum_{m=-\ell}^{\ell} a_{\ell m} Y_{\ell m}(\theta,\phi) \label{eqn:complex_synthesis} \end{align}

In general, \(f\) may be a complex function (i.e. \(f : \mathbb{S} \rightarrow \mathbb{C}\)), but if it is a real function, then there are some additional symmetries of note. This is especially useful in the case of CMB analysis since the sky is observed as a real field, and we can take advantage of the following symmetry properties.

First, due to an underlying property of the Legendre polynomials, the negative orders (\(m < 0\)) are related to the positive orders according to \begin{align} Y_{\ell (-m)}(\theta,\phi) &= (-1)^m \,\overline{Y_{\ell (+m)}}(\theta,\phi) \end{align} Given this relationship between the positive and negative orders, the harmonic coefficients have a similar property as well, with \begin{align} a_{\ell(-m)} &= (-1)^m \,\overline{a_{\ell(+m)}} \end{align} This directly constraints the order \(m = 0\) modes to being real coefficients since \(x = \overline x\) if and only if \(x\) is real. Furthermore, this permits the synthesis to be performed using a sum over the range of integers \(m = [0, \ell]\) instead of \([-\ell, \ell]\) because by pairwise summing the postive and negative orders with each other, we find: \begin{align*} a_{\ell(-m)} Y_{\ell (-m)} + a_{\ell(+m)} Y_{\ell (+m)} &= (-1)^m \,\overline{a_{\ell(+m)}} \cdot (-1)^m \,\overline{Y_{\ell (+m)}} + a_{\ell(+m)} Y_{\ell (+m)} \\ {} &= \overline{a_{\ell(+m)} Y_{\ell (+m)}} + a_{\ell(+m)} Y_{\ell (+m)} \\ {} &= 2 \operatorname{Re}\left[ a_{\ell(+m)} Y_{\ell (+m)} \right] \end{align*} Note that this excludes the \(m = 0\) cases which have no pair so there needs to be an extra multiplicity factor in the summation — for instance, choosing a notation that uses the Kronecker delta, \((2-\delta_{m0}) = \begin{cases} 1 & \text{if } m=0 \\ 2 & \text{otherwise} \end{cases}\), and \begin{align} f(\theta,\phi) &= \sum_{\ell=0}^{\ell_\mathrm{max}} \sum_{m=0}^{\ell} (2 - \delta_{m0}) a_{\ell m} Y_{\ell m}(\theta,\phi) \end{align} A similar argument applies to harmonic analysis of a field into the \(a_{\ell m}\) coefficients except there is no need for an extra multiplicity factor; instead you simply run the integration for only the non-negative orders.

The second useful symmetry is that the Legendre polynomials are even/odd for even/odd combinations of orders and degrees, so the spherical harmonics inherit the even/odd symmetry across the equator: \begin{align} P_\ell^m(-z) = (-1)^{\ell+m} P_\ell^m(z) \quad\Rightarrow\quad Y_{\ell m}(\pi - \theta, \phi) = (-1)^{\ell+m} Y_{\ell m}(\theta,\phi) \end{align} This is a useful property when it comes to designing pixelization schemes on the sphere since symmetry over the equator allows for reuse in the calculation of Legendre polynomials for pairs of latitudes with only a much [computationally] cheaper negation required of one of half of the pair.

Spherical Harmonic Synthesis

In the remainder of this article I want to discuss the synthesis of real fields from a set \(a_{\ell m}\)s. We will leave analysis (the inverse operation) to a future article since there are extra complications from the need to approximate a continuous integral with finite sums, and that topic deserves a more thorough, dedicated discussion.

We’ll start with a simple, reference implementation of the synthesis expression, where the equation below has replaced \(Y_{\ell m}(\theta,\phi)\) with its definition in terms of the pre-normalized Legendre polynomials \(\lambda_\ell^m(\cos\theta)\) and the complex exponentials \(e^{im\phi}\). \begin{align} f(\theta,\phi) &= \sum_{\ell=0}^{\ell_\mathrm{max}} \sum_{m=0}^{\ell} (2 - \delta_{m0}) a_{\ell m} \lambda_\ell^m(\cos\theta) e^{im\phi} \label{eqn:real_synthesis} \end{align} The goal of this reference implementation is to directly transcribe the mathematical definition into code, foregoing any optimizations so that we can be more confident that the results are accurate. The simple triple-loop structure should be straight-forward to follow: the inner two correspond to the summations in the equation above, and the final outer loop is over the coordinates \((\theta,\phi)\).

using Legendre: λlm!

# Range of pixel centers that cover the interval between `a` and `b` in `n` steps.
function crange(a, b, n)
    r = range(a, b, length = n+1)
    s = step(r)
    return range(a + s/2, b - s/2, length = n)
end

function synthesize_reference(alms::Matrix{C}, , ) where {C<:Complex}
    lmax, mmax = size(alms) .- 1
    θ = crange(0.0, 1.0π, )
    ϕ = crange(0.0, 2.0π, )
    
    R = real(C)
    Λ = Matrix{R}(undef, size(alms)...)
    Φ = Vector{C}(undef, mmax + 1)
    map = zeros(R, , )

    for y in 1:, x in 1:
        λlm!(Λ, lmax, mmax, cos(θ[y])) # λ_ℓ^m(cos θ) factors
        Φ .= cis.((0:mmax) .* ϕ[x])    # e^{imϕ} factors

        acc = zero(R)
        # Σ_{ℓ = 0}^{ℓmax}
        for  in 0:lmax
            # Σ_{m = 0}^{mmax}
            acc += real(alms[+1,1] * Λ[+1,1])
            for m in 1:min(, mmax)
                acc += 2 * real(alms[+1,m+1] * Λ[+1,m+1] * Φ[m+1])
            end
        end
        map[y, x] = acc
    end
    return map
end

For simplicity, synthesize_reference takes only an array of \(a_{\ell m}\)s and the number of pixels in each of the colatitude and azimuth directions to cover the entire sphere. (The output array is effectively an equidistant cylindrical projection of the sphere into a 2D image.)

Similar to the demonstration in Part V of the Legendre.jl series, we can draw a random distribution of \(a_{\ell m}\)s consistent with a red (\(\ell^{-1}\)) power specrum, and synthesize the corresponding field.

In particular, note that even with a very low image resolution of only \(720 \times 360\) pixels and a low \(\ell_\mathrm{max} = 100\), the synthesis still takes over 9 seconds (after warmup) on my machine.

alms = zeros(ComplexF64, 101, 101)
for  in 1:size(alms,1)-1
    alms[+1,1] = randn(Float64) / 
    alms[+1,2:+1] .= randn.(ComplexF64) ./ 
end

,  = 360, 720
ref = @time synthesize_reference(alms, , )
  9.781254 seconds (1.49 M allocations: 31.540 MiB)

We can very roughly explain the relatively long time it took to synthesize a very modest resolution map with some rough approximations of the computational complexity. For a map covering the entire sohere, let us assume that both coordinate directions have approximately \(N\) pixels and scale together. The outer loop over every pixel alone has an asymptotic complexity of \(\bigO(N^2)\). Then add in the requirements to calculate the basis functions. Very, very roughly speaking, \(N\) pixels have a “Nyquist” frequency that implies \(\ellmax \sim N/2\) (and \(\mmax = \ellmax\)), so calculating the Legendre polynomials has an asymptotic complexity of \(\bigO(N^2)\), and the complex exponentials over the orders \(m\) scale as \(\bigO(N)\). The basis functions are calculated for every pixel, so put all together, the total complexity is something like \begin{align*} \mathtt{synthesize\_reference} &\sim \underbrace{\bigO(N^2) \cdot \bigO(N^2)}_{\text{Legendre part}} + \underbrace{\bigO(N^2) \cdot \bigO(N)}_{\text{exponential part}} = \bigO(N^4) \end{align*} which bodes poorly for achieving higher pixel densities and/or higher spatial resolution modes.

Of course, this reference implementation is quite wasteful in obvious ways. For instance, we can drop an entire factor of \(N\) in the Legendre part by only updating the Legendre polynomials once per row of constant-latitude (isolatitude) pixels. This is one of the core optimizations which are often made to make fast spherical harmonic transforms — pixelization schemes are purposely designed to have isolatitude rings among which computational work can be shared. In the following section, we’ll look at a couple more techniques that make transforms over large numbers of pixels and up to high degrees/orders possible.

Fast Spherical Harmonic Synthesis

Popular analysis libraries make several optimizations to the naive harmonic transform algorithm that make practical use of spherical harmonic transforms possible. I was specifically introduced to the following techniques via being asked to create Matlab MEX bindings4 to existing packages used in the wider CMB community, with Reinecke et al5 being my choice of reference document for much of my understanding.

Per-ring Legendre polynomials

First, as already mentioned, only calculating the Legendre polynomials once for each row of isolatitude pixels will already reduce the complexity cost of that part of the process from \(\bigO(N^4)\) to \(\bigO(N^3)\). The only requirement is that we must invoke knowledge of the pixel layout, so in principle there is small decrease in the generality of the implementation (but in practice many popular pixeliation schemes already meet this constraint).

For the case of an ECP grid as used previously, the rows of the output matrix are by definition isolatitude rings, so a new synthesize_ecp function can be constructed from the reference implementation with only a couple of line changes to split up the Cartesian loop into two distinct loops. The [whitespace agnostic] diff between the two functions is:

@@ -7,7 +7,7 @@
-function synthesize_reference(alms::Matrix{C}, nθ, nϕ) where {C<:Complex}
+function synthesize_ecp(alms::Matrix{C}, nθ, nϕ) where {C<:Complex}
     lmax, mmax = size(alms) .- 1
     θ = crange(0.0, 1.0π, nθ)
     ϕ = crange(0.0, 2.0π, nϕ)
@@ -17,8 +17,10 @@
     Φ = Vector{C}(undef, mmax + 1)
     map = zeros(R, nθ, nϕ)
 
-    for y in 1:nθ, x in 1:nϕ
+    for y in 1:nθ
         λlm!(Λ, lmax, mmax, cos(θ[y])) # λ_ℓ^m(cos θ) factors
+
+        for x in 1:nϕ
             Φ .= cis.((0:mmax) .* ϕ[x])    # e^{imϕ} factors
 
             acc = zero(R)
@@ -32,6 +34,7 @@
             end
             map[y, x] = acc
         end
+    end
     return map
 end

With just this one simple change, we can already cut the runtime by about 80% for the same resolution and \(\ellmax\) parameters as used in the previous section.

@time synthesize_ecp(alms, , )
  1.820308 seconds (200.21 k allocations: 12.100 MiB, 0.34% gc time)

FFT-based ring synthesis

The second common optimization is to reduce the cost of calculating the complex exponentials by reducing the operation to a Fast Fourier transform (FFT). Start with Eqn. \(\ref{eqn:complex_synthesis}\) above, and change the order of summation so that the sums over degrees \(\ell\) are done before the orders \(m\). (Note that we are starting without the assumption of a real field because it simplifies the derivations initially.) \begin{align*} f(\theta,\phi) &= \sum_{m=-\ellmax}^{\ellmax} \left[ \sum_{\ell=|m|}^{\ellmax} a_{\ell m} \lambda_\ell^m(\cos\theta) \right] e^{im\phi} \end{align*} I have purposely written the groupings such that the implication of an FFT is more obvious — the inner summation are almost a set of Fourier coefficients that could then be transformed via an FFT to quickly perform the outer summation. Using an FFT reduces the computational complexity from requiring \(\bigO(N^2)\) calculations (\(N\) pixels by \(N\) orders) per pixel row to \(\bigO(N \log N)\) per row. Therefore, the FFT makes the exponential component again subdominant to the Legendre part (\(\bigO(N^2 \log N)\) versus \(\bigO(N^3)\), respectively, over all pixels).

Bandlimited Case

Actually making use of an FFT requires that a couple of specific conditions are met. The isolatitude ring of pixels must be uniformly spaced in azimuth so that the sequence of pixel azimuths are \begin{align*} \phi_k = \phi_0 + \frac{2\pi k}{N_\phi} \end{align*} where \(\phi_0\) is the offset of the first pixel in the ring away from \(\phi = 0\), \(N_\phi\) is the number of pixels in the ring, and \(k\) is an integer in \([0, N_\phi - 1]\). Then the expression can be rewritten as \begin{align*} f(\theta,\phi_k) &= \sum_{m=-\ellmax}^{\ellmax} g_m e^{2\pi imk/N_\phi} &\text{where } g_m \equiv e^{im\phi_0} \sum_{\ell=|m|}^{\ellmax} a_{\ell m} \lambda_\ell^m(\cos\theta) \end{align*} When \(N_\phi = 2\ellmax + 1\), the correspondance to the FFT is exact, and \(f(\theta, \phi_k) = \fft^{-1}(g_m)\), assuming using FFTW’s sign convention and \(\fft\) and \(\fft^{-1}\) are the foward and inverse FFT operations, respectively. Also recall that “negative frequencies” are just a matter of interpretation and are equivalent to a higher positive frequency one period later. For instance, the \(m = -1\) term is equivalent in the Fourier transform to the \((N_\phi - 1)\)-th Fourier frequency.

The case where \(2\ellmax + 1 < N_\phi\) is also easy to handle — we can just assume that the frequency axis is zero-extended for \(|m| > \ellmax\) as necessary since the extra terms will not contribute to the sum.

Before discussing the final case where \(2\ellmax + 1 > N_\phi\), let us reintroduce the knowledge that the output field is assumed to be real. The conjugate symmetry of harmonic coefficients discussed in the previous section applies to the Fourier transform as well, and because the case is very common, FFTW implements a special case for dealing with FFTs of real functions. The convenient fact about the FFTW implementation is that it handles the extra multiplicity factor of the \(m \neq 0\) modes6 for us, so using the notation that \(\fft_R\) and \(\fft_R^{-1}\) are the forward and inverse real-FFT operations, respectively, we can again restrict the summations to be only over positive and negative orders.

\begin{align*} f(\theta,\phi_k) = \fft_R^{-1}(g_m) &= \sum_{m=0}^{\ellmax} g_m e^{2\pi imk/N_\phi} \\ \text{where } g_m &\equiv e^{im\phi_0} \sum_{\ell=m}^{\ellmax} a_{\ell m} \lambda_\ell^m(\cos\theta) \end{align*}

Implementing these changes, almost the entirety of the inner loops are rewritten — the loops over \(\ell\) and \(m\) are exchanged, and the explicit loop over \(\phi\) pixels is eliminated since it occurs inside the brfft() function.

@@ -1,4 +1,5 @@
 using Legendre: λlm!
+using FFTW
 
 # Range of pixel centers that cover the interval between `a` and `b` in `n` steps.
 function crange(a, b, n)
@@ -10,30 +11,27 @@
 function synthesize_ecp(alms::Matrix{C}, nθ, nϕ) where {C<:Complex}
     lmax, mmax = size(alms) .- 1
     θ = crange(0.0, 1.0π, nθ)
-    ϕ = crange(0.0, 2.0π, nϕ)
+    ϕ₀ = π / nϕ # == first(crange(0.0, 2.0π, nϕ))
+    nϕr = nϕ ÷ 2 + 1 # length real-only half of FFT axis; see `rfft()`
 
     R = real(C)
     Λ = Matrix{R}(undef, size(alms)...)
-    Φ = Vector{C}(undef, mmax + 1)
+    g = zeros(C, nϕr)
     map = zeros(R, nθ, nϕ)
 
     for y in 1:nθ
         λlm!(Λ, lmax, mmax, cos(θ[y])) # λ_ℓ^m(cos θ) factors
-
-        for x in 1:nϕ
-            Φ .= cis.((0:mmax) .* ϕ[x])    # e^{imϕ} factors
-
-            acc = zero(R)
-            # Σ_{ℓ = 0}^{ℓmax}
-            for ℓ in 0:lmax
-                # Σ_{m = 0}^{mmax}
-                acc += real(alms[ℓ+1,1] * Λ[ℓ+1,1])
-                for m in 1:min(ℓ, mmax)
-                    acc += 2 * real(alms[ℓ+1,m+1] * Λ[ℓ+1,m+1] * Φ[m+1])
+        for m in 0:mmax
+            acc = zero(C)
+            # Σ_{ℓ = m}^{lmax}
+            for ℓ in m:lmax
+                acc += alms[ℓ+1,m+1] * Λ[ℓ+1,m+1]
             end
+            acc *= cis(m * ϕ₀)         # ring offset rotation
+            g[m+1] = acc
         end
-            map[y, x] = acc
-        end
+        map[y, :] = brfft(g, nϕ)
+        fill!(g, zero(C))              # reset g_m factors
     end
     return map
 end

This leads to a further \(\sim 95\%\) reduction in the runtime (or compared to the original reference case, we are now about \(100\times\) faster now).

@time synthesize_ecp(alms, , )
  0.112625 seconds (159.43 k allocations: 12.393 MiB)

Frequency-aliasing Case

Now let us return to the unhandled case where \(2\ellmax + 1 > N_\phi\). As written, if we decrease the requested output resolution of the map enough, we actually get a runtime exception due to out-of-bounds access on the \(g_m\) factors array:

@time synthesize_ecp(alms,  ÷ 8,  ÷ 8)
BoundsError: attempt to access 46-element Array{Complex{Float64},1} at index [47]
...

So the question becomes what to do about the out of bounds frequencies. There are two obvious choices:

  1. Ignore invalid indices, with the reasoning being that very high frequency information cannot be unambiguously reconstructed from a low-resolution map.
  2. Alias the harmonic coefficients into a different frequency bin and sythesize the aliased field.

This is not a purely academic question, either. A well-known pixelization format in CMB analysis is the HEALPix7 format, and for this discussion, the relevant detail is that the pixel rings near the north and south poles vary in length, increasing from only 4 pixels in length at the first/last row and growing by 4 pixels per row progressing toward the equator through a polar cap region. Therefore, synthesis on a HEALPix grid necessarily must handle the situation where the \(2\ellmax + 1 > N_\phi\) (for any non-trivial \(\ellmax\)).

My intution prior to investigating was that option 2 with aliasing of the frequencies was the correct choice, and some simple trials with existing public libraries for spherical harmonic transforms confirmed my suspicions. Unfortunately, even though the libsharp package by Reinecke et al5 demonstrably performs aliased synthesis, the paper neither includes any mention of the behavior nor gives any mathematical details of the correct aliasing calculation.

This is the crux of the issue that motivated this entire article.

The answer lies in careful application of the aliasing behavior of the discrete Fourier transform (DFT), with some added complexity from the assumption of a real-only output field and using only half-length (positive-only frequencies) arrays.

Let \(m \in [0, \mmax]\) be an index where \(\mmax > N_\phi\). The first property is that for a full length harmonic coefficient vector (\(N_\phi\) elements, encompassing both postive and negative frequencies), indexing is periodic in the frequency domain. Therefore any arbitrary index \(m\) can be lowered to the range of valid frequencies by simple modular arithmetic: \(i = m \pmod{N_\phi}\). (This is where thinking of negative frequencies in terms of positive indices makes the concept easier to see.)

There are two special cases in the aliasing operation which requires further consideration. The first case occurs when the aliased index is the \(0\)-frequency (DC) component, and the second case is when \(i = N_\phi / 2\) for even \(N_\phi\). In both of these situations, the symmetries in the DFT mean the coefficient is real-only, and it follows that without a complex conjugate pair, there isn’t a factor of 2 multiplicity in the forward transform (going from real to frequency space). Because we’ve aliased a mode which does have a complex conjugate pair into a frequency bin where it is assumed it does not, we have to explicitly account for the multiplicity factor.

Thus far it has been easiest to calculate indices onto the frequency axis that spans both positive and negative frequencies, but we want everything represented within the positive-only half of the axis. We accomplish this by another aliasing (folding) of \(i\) over \(\lfloor N_\phi / 2 \rfloor\) if \(i > \lfloor N_\phi / 2 \rfloor\). This, though, moves the harmonic coefficient from an entry which would have beeen interpreted as a negative frequency in a full-length inverse FFT to an index where both full-length and half-length FFTs interpret it as a positive frequency, so we have to account for the change in the conjugate symmetry in the frequency folding and explicitly conjugate the coefficient.

The following alias_harmonic() function implements the aliasing behavior just described.

@inline function alias_harmonic(len, i, coeff)
    nyq = len ÷ 2
    i < nyq && return (i, coeff)                # No need for alias analysis
    i = mod(i, len)
    if i > nyq                                  # If in the negative-frequency half
        i = len - i                             #   fold back into positive frequencies
        coeff = conj(coeff)                     #   and conjugate to account for -/+ switch
    elseif i == 0 || (iseven(len) && i == nyq)  # If aliased to a real-only term
        coeff = complex(2real(coeff))           #   manually account for 2× multiplicity
    end
    return (i, coeff)
end

The change to synthesize_ecp is almost trivial once all the aliasing behavior has been sorted out. The only (maybe subtle) point is that the aliasing is performed after the ring offset [complex phase] rotation, and the complex exponential is not changed to use the aliased frequency (keeping its phase factor of \(m\phi_0\)).

@@ -28,7 +41,8 @@ function synthesize_ecp(alms::Matrix{C}, nθ, nϕ) where {C<:Complex}
                 acc += alms[ℓ+1,m+1] * Λ[ℓ+1,m+1]
             end
             acc *= cis(m * ϕ₀)         # ring offset rotation
-            g[m+1] = acc
+            i, acc = alias_harmonic(nϕ, m, acc)
+            g[i+1] += acc
         end
         map[y, :] = brfft(g, nϕ)
         fill!(g, zero(C))              # reset g_m factors

This allows us to synthesize a field with arbitrary values of \((\ellmax,\mmax)\), and more importantly, it reproduces the field as constructed by the brute-force reference implementation where none of these aliasing concerns had to be explicitly handled. (The small numerical difference is expected due to the latest function using an FFT while the reference implementation explicitly evaluates the complex exponentials, probably to poorer precision than the FFT.)

aliased_ref  = synthesize_reference(alms,  ÷ 8,  ÷ 8)
aliased_fast = synthesize_ecp(alms,  ÷ 8,  ÷ 8)
maximum(abs, aliased_fast - aliased_ref)
3.295280714965543e-14

Synthesis over pairs of rings

The only remaining symmetry mentioned in Section II that we haven’t invoked yet is the fact that the Legendre polynomials in are even/odd over as a function of \(\ell + m\). We can very easily take advantage of this last symmetry in an ECP grid because we know that by construction the rings are symmetrically located across the equator, so we can synthesize rings in the northern and southern hemispheres simultaneously. Doing so only requires duplicating the bookkeeping at each step.

@@ -8,17 +8,17 @@ function crange(a, b, n)
     return range(a + s/2, b - s/2, length = n)
 end
 
-@inline function alias_harmonic(len, i, coeff)
+@inline function alias_harmonic(len, i, coeffs)
     nyq = len ÷ 2
-    i < nyq && return (i, coeff)                # No need for alias analysis
+    i < nyq && return (i, coeffs)               # No need for alias analysis
     i = mod(i, len)
     if i > nyq                                  # If in the negative-frequency half
         i = len - i                             #   fold back into positive frequencies
-        coeff = conj(coeff)                     #   and conjugate to account for -/+ switch
+        coeffs = conj.(coeffs)                  #   and conjugate to account for -/+ switch
     elseif i == 0 || (iseven(len) && i == nyq)  # If aliased to a real-only term
-        coeff = complex(2real(coeff))           #   manually account for 2× multiplicity
+        coeffs = complex.(2 .* real.(coeffs))   #   manually account for 2× multiplicity
     end
-    return (i, coeff)
+    return (i, coeffs)
 end
 
 function synthesize_ecp(alms::Matrix{C}, nθ, nϕ) where {C<:Complex}
@@ -26,26 +26,34 @@ function synthesize_ecp(alms::Matrix{C}, nθ, nϕ) where {C<:Complex}
     θ = crange(0.0, 1.0π, nθ)
     ϕ₀ = π / nϕ # == first(crange(0.0, 2.0π, nϕ))
     nϕr = nϕ ÷ 2 + 1 # length real-only half of FFT axis; see `rfft()`
+    nθh = (nθ + 1) ÷ 2 # number of rings in northern hemisphere
 
     R = real(C)
     Λ = Matrix{R}(undef, size(alms)...)
-    g = zeros(C, nϕr)
+    gn = zeros(C, nϕr) # phase factors for northern ring
+    gs = zeros(C, nϕr) # phase factors for southern ring
     map = zeros(R, nθ, nϕ)
 
-    for y in 1:nθ
+    for y in 1:nθh
+        y′ = nθ - y + 1 # southern ring index
         λlm!(Λ, lmax, mmax, cos(θ[y])) # λ_ℓ^m(cos θ) factors
         for m in 0:mmax
-            acc = zero(C)
+            accn, accs = zero(C), zero(C)
             # Σ_{ℓ = m}^{lmax}
             for ℓ in m:lmax
-                acc += alms[ℓ+1,m+1] * Λ[ℓ+1,m+1]
+                term = alms[ℓ+1,m+1] * Λ[ℓ+1,m+1]
+                accn += term
+                accs += isodd(ℓ + m) ? -term : term
             end
-            acc *= cis(m * ϕ₀)         # ring offset rotation
-            i, acc = alias_harmonic(nϕ, m, acc)
-            g[i+1] += acc
+            accn, accs = (accn, accs) .* cis(m * ϕ₀) # ring offset rotation
+            i, (accn, accs) = alias_harmonic(nϕ, m, (accn, accs))
+            gn[i+1] += accn
+            gs[i+1] += accs
         end
-        map[y, :] = brfft(g, nϕ)
-        fill!(g, zero(C))              # reset g_m factors
+        map[y, :] = brfft(gn, nϕ)
+        fill!(gn, zero(C))
+        map[y′, :] = brfft(gs, nϕ)
+        fill!(gs, zero(C))
     end
     return map
 end

This time, the runtime improvement is pretty modest at only about \(\sim 15\%\).

using BenchmarkTools
@btime synthesize_ecp(alms, 10 * , )
# each ring
  355.481 ms (226806 allocations: 51.77 MiB)
# ring pairs
  294.663 ms (217807 allocations: 51.64 MiB)

In the asymptotic limit, you’d hope for a factor of two improvement (since we’re now calculating the most expensive operation — evaluation of the Legendre polynomials — in half), but presumably we’re far from the asymptotic regime and seeing the limits imposed by other parts of the algorithm. For instance, the λlm! is not allocation-free, so throughput may now be constrained by performing memory management. At this point, though, further improvements are more to do with implementation details and less to do with mathematical symmetries.

Conclusions

This has ended up being a much longer article than I had originally anticipated. Hopefully the summary of spherical harmonic properties listed in Section II can serve as a useful reference, and especially hope that the explicit discussion of frequency aliasing of modes in Section IV can eliminate any confusion others may have in how to treat high-\(m\) terms when implementing the FFT-based fast spherical harmonic transform.


  1. Be aware that there are multiple conventions for how to define the spherical harmonics that differ in the details of the normalization factors. For instance, the so-called Condon-Shortley phase factor of \((-1)^m\) can either be omitted entirely, included as an explicit term in the normalization of the spherical harmonics, or included as an explicit symmetry in the definition of the Legendre polynomials. We use this last option here and in part II of the Legendre.jl series, which is also consistent with typical physicists' experience in e.g. Classical Electrodynamics by J.D. Jackson2. See also Abramowitz & Stegun3↩︎

  2. J. D. Jackson. Classical Electrodynamics. 3rd ed. John Wiley & Sons, Inc., 1999. isbn: 978-0-471-30932-1. ↩︎

  3. M. Abramowitz and I. A. Stegun, eds. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Applied Mathematics Series 55. Washington, D.C.: U.S. Government Printing Office, June 1964. 10th printing, Dec 1972, with corrections. ↩︎

  4. The first time, I built a cross-language binding package (almpure) to the S2HAT and pS2HAT libraries. More recently, I started a framework (healmex) which makes available the functions in the healpix library for use in Matlab (in analogy with how healpy package provides the features to Python users). (To be honest, healmex is only partially complete at this point as I stopped working on it to write and finish my thesis, but I believe there’s enough of the core interop framework in place that hooking up the rest of the API should be relatively straight forward.) ↩︎

  5. M. Reinecke and D.S. Seljebotn. “Libsharp - spherical harmonic transforms revisited”. In: Astron. & Astrophy. 554, (Jun 2013) ads: 2013A&A…554A.112R ↩︎

  6. If you are familiar with the “half-complex” FFT, you will probably also know that there is also a real-only coefficient of Fourier frequency \(N / 2\) when \(N\) is even. We will reencounter this fact later when discussing aliasing, but for the moment it can be ignored since the frequency axis over orders \(m\) will always be odd since there are always \((2\ell + 1)\)-many \(m\) terms. ↩︎

  7. K. M. Górski et al. “HEALPix: A Framework for High-Resolution Discretization and Fast Analysis of Data Distributed on the Sphere”. In: Astrophys J. 662 (Apr 2005) ads: 2005ApJ…622..759G ↩︎